Of all the books I read in high school to prepare for college, How to Lie with Statistics by Darrell Huff was, by far, the best. It doesn’t actually encourage lying with statistics. Rather, it encourages you to think when you see a statistic, and to question whether the data are presented in a way that leads the reader to a false conclusion. It teaches the kind of critical thinking and questioning of authority that’s essential for success in academic research, and it’s written at a level that most high schoolers could understand.

Let me demonstrate with two small datasets: More »

My friend recently asked me how I make word clouds for presentations. Wordle is definitely a good choice. WordPress automatically makes word clouds out of my tags in the sidebar. But sometimes you can’t or don’t want to upload your data to places like WordPress or Wordle and you just want to use R (because you use R for everything else, so why not? Or is that just me?).

In a typical word cloud, word frequency determines the size of the word. As of this writing, the word cloud in my sidebar (over there) has “linguistics” and “programming” as clearly the largest words. Tags like “video games,” “language,” and “education” are also pretty big. There are also really small words like “Navajo” and “handwriting.” This reflects the frequency of each tag: bigger tags are more frequent, so I write about linguistics a lot but not so much about Navajo in particular.

More »

Let’s just get this out of the way: There are, in fact, differences in the way men and women think, speak, act, etc. How much of that difference is due to nature and how much is due to nurture is up for debate. But that is not what this post is about.

This post is about a particular language myth that, for whatever reason, will not die. There are literally dozens of peer-reviewed, scientific studies refuting this myth, and yet popular culture clings to it.

The myth I’m referring to is the idea that women talk more than men. More »

Last week, Jennifer Lawrence and Conan O’Brien had a little spat about whether the past tense of “sneak” is “snuck” or “sneaked.”

So which is it?

More »

In case you didn’t know, FiveThirtyEight is an awesome blog about statistics. Recently, they posted a challenge against the new Words With Friends Artificial Intelligence. For the sake of science, I decided to replicate their study.

I’m an avid WWF player. As of this writing, I have played exactly 1800 games since October 2010 (which amounts to a little over one game per day). Of those, I’ve won 930, lost 864, and tied 6. Yes, I win more than I lose, but the difference isn’t statistically significant (χ² = 2.4281, df = 1, p-value = 0.1192). In other words, a gap this small could easily be due to chance.
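For anyone who wants to check that arithmetic, here is a minimal sketch in Python. It assumes the test is a goodness-of-fit test of wins versus losses against an even 50/50 split, with the 6 ties left out. That setup is my assumption, not something stated above. The function name `chi_square_equal_split` is made up for illustration.

```python
import math


def chi_square_equal_split(wins, losses):
    """Chi-square goodness-of-fit test against a 50/50 split (df = 1).

    Assumption: ties are excluded, so only decided games are counted.
    """
    decided = wins + losses
    expected = decided / 2  # expected wins (and losses) if each is equally likely
    chi2 = ((wins - expected) ** 2 + (losses - expected) ** 2) / expected
    # With one degree of freedom, the upper-tail chi-square probability
    # reduces to erfc(sqrt(chi2 / 2)), so the standard library is enough.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = chi_square_equal_split(930, 864)
print(f"chi2 = {chi2:.4f}, df = 1, p = {p_value:.4f}")
```

Under those assumptions, the result should line up with the statistic and p-value quoted above. Counting the ties differently would give different numbers.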

More »

This week, my students choose their topics for their research papers. Since I’m teaching a general education writing class which is required of all freshmen, the topic can be literally anything. I have to approve it, but I’ve been pretty flexible. After all, I might learn something interesting! Not to mention reading 25 papers on the same thing is eye-numbingly boring.  More »

The news coverage of the Ebola outbreak is interesting, and probably worth some linguistic analysis at some later point. My impression is that it oscillates between causing and placating panic. One story will decry sanitary conditions in hospitals and demand the borders be closed to West African countries, so everyone panic because we’re all about to die horrible deaths. The next story will emphasize that it’s hard to contract Ebola unless you’re handling bodily fluids or eating African fruit bats, so don’t worry, everything is going to be okay.
More »

One of the first things many people learn in introductory linguistics classes is that there are about 7,000 languages in the world, plus or minus 1,000, depending on whether you’re counting only living languages or dead ones too, and on where you draw the line between languages and dialects. Counting languages is difficult, even for professional linguists. But what about for laypeople? Does the average Joe even realize how many languages there are in the world? Are linguists doing a good job of educating the public? More »

I’m teaching my first class as instructor of record this summer. I’ve TA’d and graded for a number of classes, and tutored, and given talks at conferences and so on, but this is the first class that’s been completely mine! The final exam is this week, and one thing I’ve worried about is grade inflation.

More »

Language has a pretty interesting property known as Zipf’s Law. That is, language data (and even subsets of language data) have a Zipfian distribution: there are a small number of highly frequent words and a large number of highly infrequent words. Moreover, the frequent words tend to be short and grammatical (words that are grammatically required but don’t really mean anything), while the infrequent words tend to be longer and lexical (words like nouns and verbs, which have some sort of referent or meaning).
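To make the “few frequent words, many rare words” pattern concrete, here is a rough sketch in Python that ranks the words of a text by frequency. The function name `rank_frequency` and the toy sentence are made up for illustration. A sample this tiny only hints at the shape. You need a large corpus to see a real Zipfian curve.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return [
        (rank, word, count)
        for rank, (word, count) in enumerate(counts.most_common(), start=1)
    ]


# Hypothetical toy text; swap in any large text file to see the long tail.
sample = "the cat sat on the mat and the dog sat on the log"
table = rank_frequency(sample)

for rank, word, count in table:
    print(rank, word, count)

# Words that occur exactly once make up the long tail of the distribution.
singletons = [word for _, word, count in table if count == 1]
print(len(singletons), "of", len(table), "distinct words occur only once")
```

Even here, the top-ranked word is a short grammatical one (“the”), and most of the distinct words show up only once.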

What does this mean? Well, to show you, I downloaded all of the English Wikipedia (and you can too here). More »