A long time ago, I made a little script that converts whatever you type into animals. You can try it here. And you can download and play with the code here. And you can learn more about how it was done below…

My friend recently asked me how I make word clouds for presentations. Wordle is definitely a good choice. WordPress automatically makes word clouds out of my tags in the sidebar. But sometimes you can’t or don’t want to upload your data to places like WordPress or Wordle and you just want to use R (because you use R for everything else, so why not? Or is that just me?).

In a typical word cloud, word frequency is what determines the size of the word. As of this writing, the word cloud in my sidebar (over there) has “linguistics” and “programming” as clearly the largest words. Tags like “video games,” “language,” and “education” are also pretty big. There are also really small words like “Navajo” and “handwriting.” This reflects the frequency of each tag. Bigger tags are more frequent, so I write about linguistics a lot but not so much about Navajo in particular.
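To make the frequency-to-size idea concrete, here is a minimal sketch in Python. It is not how WordPress or Wordle actually do it. The function name `font_sizes`, the size range of 10 to 40, and the tag counts are all made up for illustration. It just scales each tag's count linearly between a smallest and largest font size.

```python
from collections import Counter


def font_sizes(tags, min_size=10, max_size=40):
    """Map each tag to a font size that grows linearly with its frequency.

    min_size and max_size are arbitrary illustrative values, not anything
    a real word cloud tool is known to use.
    """
    counts = Counter(tags)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    sizes = {}
    for tag, n in counts.items():
        if hi == lo:
            # Every tag is equally frequent, so give them all the same size.
            sizes[tag] = float(max_size)
        else:
            sizes[tag] = min_size + (n - lo) * (max_size - min_size) / (hi - lo)
    return sizes


# Hypothetical tag counts, invented for this example.
tags = ["linguistics"] * 5 + ["programming"] * 5 + ["education"] * 3 + ["Navajo"]
sizes = font_sizes(tags)
for tag, size in sorted(sizes.items(), key=lambda item: -item[1]):
    print(f"{tag}: {size:.0f}pt")
```

The most frequent tags get the largest size and the rarest get the smallest, with everything else in between. Real tools often scale by the logarithm of the count instead, so a couple of huge tags don't squash everything else.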


When I enrolled as a freshman in college, I registered as a linguistics major but I had a notion that I would minor in computer science. Computer science seemed interesting and well-paying and I didn’t even know computational linguistics was a thing at the time. I just liked computers. I never had a problem switching between Macs and PCs. I liked to peek inside computers and replace the RAM and things like that. I had poked around with HTML editors. The classes on things like graphic design and artificial intelligence and stuff seemed really cool.

I looked up the prerequisites and found that to minor in CS you had to actually get pretty far in math, at least through Calculus C and one or two courses of Linear Algebra. So, naturally, I signed up for Calculus A my fall term.


I’m finally getting into GitHub, partially thanks to Coursera’s Data Science specialization, which requires it. Anyway, I blogged about my Twitter bot, @AllTheLanguages, here and here, and now you can download, fork, watch, star, or whatever it is that kids do to code on GitHub, here.

A few weeks ago, I posted about how to build a Twitter bot. I wish it stopped there. Unfortunately, all code has bugs (ahem, I mean, features). There are two bugs in my code. The first one I understand, and could probably fix if I tried, but I haven’t because it’s probably more trouble than it’s worth. The second one I don’t understand, but I do know how to fix it.

I started reading a book about artificial intelligence. It’s an older book, but only $4 and it came highly recommended as a starting point, since a lot of the basic concepts are still the same. Based on the things I was reading and this xkcd, I figured it might be within my capabilities to write a program that plays Tic Tac Toe. And I have. Sort of. You can play it here. Kinda.

See, the concepts of artificial intelligence and the basics of programming aren’t so hard. What’s hard is making it work in the “real world.”

A few months ago, I created a bot on Twitter. @AllTheLanguages tweets a new language from the Ethnologue database once every hour or so, and will do so for about a year. Give or take. Sometimes the bot goes down and I have to reboot it. And there are some other bugs too. But more on that in another post…

When I tell people that I made a Twitter bot, the first thing they ask (after “why?”) is “how?” Well, today, I’m going to answer that! Why? Because it was fun! How? Well, it’s complicated…

Boo is boolean, apparently.


Language has a pretty interesting property known as Zipf’s Law. That is, language data (and even subsets of language data) have a Zipfian distribution. There are a small number of highly frequent words, and a large number of highly infrequent words. Moreover, the frequent words tend to be short grammatical words (words that are grammatically required but don’t really mean anything), and the infrequent words tend to be longer lexical words (words like nouns and verbs, which have some sort of referent or meaning).

What does this mean? Well, to show you, I downloaded all of the English Wikipedia (and you can too here).
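If you don’t want to download all of Wikipedia, you can see the same shape on any text you have lying around. The following is a rough sketch in Python, not the analysis from this post. The function name `rank_frequency` and the toy sentence are made up for illustration, and the tokenizer is deliberately crude. It counts words, ranks them from most to least frequent, and prints the count next to the rank. In the usual statement of Zipf’s Law, a word’s frequency is roughly proportional to 1 divided by its rank, so rank times count should stay in the same ballpark on a large enough text.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first.

    Tokenization is a crude assumption: lowercase everything and keep runs
    of letters and apostrophes. Ties keep first-seen order.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return [
        (rank, word, count)
        for rank, (word, count) in enumerate(counts.most_common(), start=1)
    ]


# Toy input, invented for this example. Use a much larger text to see the
# long tail of rare words.
sample = "the cat and the dog and the bird"
table = rank_frequency(sample)
for rank, word, count in table:
    print(rank, word, count, rank * count)
```

On a toy sentence like this the numbers don’t mean much. Paste in a book chapter or a few long articles and the top of the list should fill up with short grammatical words like “the” and “of,” followed by a long tail of words that occur only once.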