This week, my students choose their topics for their research papers. Since I’m teaching a general education writing class which is required of all freshmen, the topic can be literally anything. I have to approve it, but I’ve been pretty flexible. After all, I might learn something interesting! Not to mention reading 25 papers on the same thing is eye-numbingly boring.

I’m not gonna lie, I’m somewhat jealous of #thegiftofdata. This couple tracked their text messages for a whole year of dating and a whole year of marriage, and got some pretty cool word clouds out of it!

The news coverage of the Ebola outbreak is interesting, and probably worth some linguistic analysis at some later point. My impression is that it oscillates between causing panic and placating it. One story will decry sanitary conditions in hospitals and demand the borders be closed to West African countries, so everyone panic, because we’re all about to die horrible deaths. The next story will emphasize that it’s hard to contract Ebola unless you’re handling bodily fluids or eating African fruit bats, so don’t worry, everything is going to be okay.

Language has a pretty interesting property known as Zipf’s Law. That is, language data (and even subsets of language data) have a Zipfian distribution: there are a small number of highly frequent words and a large number of highly infrequent words. Moreover, the frequent words tend to be short, grammatical words (words that are grammatically required but don’t really mean anything), and the infrequent words tend to be longer, lexical words (words like nouns and verbs, which have some sort of referent or meaning).
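
To make the idea concrete, here is a minimal Python sketch of the kind of rank-and-count table a Zipfian distribution shows up in. The function name `rank_frequency`, the simple regex tokenizer, and the toy `sample` sentence are all my own assumptions for illustration. In a Zipfian distribution, frequency is roughly proportional to 1/rank, so rank times count should stay in the same ballpark as you move down the list. A sample this tiny won’t show that cleanly; you need a lot of text for the pattern to emerge.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return [
        (rank, word, count)
        for rank, (word, count) in enumerate(counts.most_common(), start=1)
    ]


# Toy text, made up for illustration; real corpora are far larger.
sample = (
    "the cat sat on the mat and the dog sat on the rug "
    "while the bird watched the cat and the dog"
)

table = rank_frequency(sample)

# Print the top few rows, plus rank * count for a rough Zipf check.
for rank, word, count in table[:5]:
    print(rank, word, count, rank * count)

# Words that occur exactly once make up the long tail of infrequent words.
singletons = [word for _, word, count in table if count == 1]
print(len(singletons), "of", len(table), "distinct words occur only once")
```

Even in this toy text, the most frequent word is a short grammatical one, and the words that occur only once are mostly longer lexical ones.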

What does this mean? Well, to show you, I downloaded all of the English Wikipedia (and you can too here).