Wednesday, July 2, 2014

Who Wanted America To Lose?

To answer this question, I monitored Twitter during the US-Belgium soccer game, tracking a diverse set of hashtags [1] that included everything from the normal (“worldcup”, “usavbel”) to the strange (“waffles”, “murca”). I collected more than 2 million tweets: at peak, more than 10,000 tweets a minute. (For a sense of how much traffic that is: there were more tweets about the game in one minute than there were tweets mentioning any kind of alcohol in three hours. It was enough that Twitter began limiting the number of tweets I could collect, so the 2 million figure is an underestimate.)


So who wanted America to lose? One way to answer this question is to look at how often tweeters say “USA” vs “BEL” or “Belgium” during the game. While not every tweeter who mentions a country will be rooting for it, of course, systematic disparities are indicative:


Timezones with most USAs | “USAs” for every “BEL” or “Belgium” | Timezones with Least USAs | “BEL” or “Belgium”s for every “USA”
--- | --- | --- | ---
America/Detroit | 8.4 | Europe/Brussels | 3.7
America/Indianapolis | 4.4 | Europe/Skopje | 2.4
America/Halifax | 4.0 | Europe/Ljubljana | 1.3
America/New York | 3.9 | Europe/Lisbon | 1.2
America/Phoenix | 3.8 | Asia/Kuala Lumpur | 1.1
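
For readers who want to try this kind of count themselves, here is a minimal sketch of how per-timezone ratios could be computed. It is not the original analysis code: the function name `mention_ratios`, the `(timezone, text)` input shape, and the choice to match whole words only (so “believe” does not count as “Bel”) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: whole-word, case-insensitive matches. "#usa" counts, "believe" does not.
USA_RE = re.compile(r"\busa\b", re.IGNORECASE)
BEL_RE = re.compile(r"\b(?:bel|belgium)\b", re.IGNORECASE)


def mention_ratios(tweets):
    """Return {timezone: USA mentions / Belgium mentions}.

    `tweets` is an iterable of (timezone, text) pairs (a hypothetical format).
    Timezones with no Belgium mentions are skipped to avoid dividing by zero.
    """
    usa = Counter()
    bel = Counter()
    for timezone, text in tweets:
        usa[timezone] += len(USA_RE.findall(text))
        bel[timezone] += len(BEL_RE.findall(text))
    return {tz: usa[tz] / bel[tz] for tz in usa if bel[tz] > 0}
```

Inverting the ratio (Belgium mentions per “USA”) gives the right-hand columns of the table.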


One thing that emerges, besides the obvious geographic trend, is that the people chanting “USA” are more enthusiastic than the people chanting “Belgium”: the ratios are more extreme. This will not be surprising to anyone who watched the game with a bunch of Americans who were drinking, as I did. (Well, they were drinking; I was writing code.) The most “USA”s in a single tweet: 35. The most theoretically possible, with spaces, given Twitter’s 140-character limit: 35.
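
The arithmetic behind that ceiling is easy to check: n copies of a word separated by single spaces take n × (word length) + (n − 1) characters. A small sketch (the helper name `max_repeats` is made up for this example):

```python
def max_repeats(word, limit=140, sep=" "):
    """Largest n such that n copies of `word` joined by `sep` fit in `limit` characters."""
    # n*len(word) + (n-1)*len(sep) <= limit  =>  n <= (limit + len(sep)) / (len(word) + len(sep))
    return (limit + len(sep)) // (len(word) + len(sep))


n = max_repeats("USA")
print(n, len(" ".join(["USA"] * n)))  # 35 139
```

A 36th “USA” would bring the tweet to 143 characters, over the limit.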

Also, in 82 of 88 timezones, the number of “USA”s exceeds the number of “Belgium”s and “Bel”s combined, and that’s without counting “America”, “Murca”, or “Merica”. So it’s tempting to declare Twitter victory -- but this is premature, given the biases in this dataset. As an American, I’m not intimately familiar with all the hashtags Belgian fans use to describe themselves, some of which may not be in English: this will bias the sample in favor of fans of America.


A second way to see the Europe-America rivalry is to look at when people on each continent shouted “Goo...al” with more than one “o” -- on the assumption that people who do that are either a) television announcers or b) happy their team scored. In Europe, these shouts spike after the first two goals, which Belgium scored; in America, they spike after the third, which America scored.
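
A minimal sketch of how such shouts could be detected and counted per minute. The pattern and the `(minute, text)` input format are assumptions for illustration, not the original code; in particular, this pattern only catches the English spelling, so “Goool” would be missed.

```python
import re
from collections import Counter

# "goal" with two or more o's, allowing stretched a's and l's: "Gooal", "GOOOOAAALLL".
GOAL_RE = re.compile(r"\bgo{2,}a+l+\b", re.IGNORECASE)


def is_goal_shout(text):
    """True if the tweet contains a stretched-out 'goal'."""
    return GOAL_RE.search(text) is not None


def shouts_per_minute(tweets):
    """Count goal shouts per minute; `tweets` is an iterable of (minute, text) pairs."""
    return Counter(minute for minute, text in tweets if is_goal_shout(text))
```

Running `shouts_per_minute` separately on European and American timezones would give the two curves to compare.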


Of course, you don’t need big data to figure out that Belgium wanted America to lose, and I don’t want to go all the way to Brussels to get revenge. Can we use Twitter to pick out more local Belgium fans? I filtered for tweeters in North American timezones, then looked at the words in tweeters’ profiles that were associated with the most “Belgium”/“Bel” tweets relative to “USA” tweets. The top word was “Ronaldo”, indicating either people seeking revenge for Portugal or people who care about soccer independently of the World Cup and don’t just want an excuse to drink and shout about America. (Not that there's anything wrong with that.) Substantiating the second hypothesis, the second word was “chelseafc”, an abbreviation of “Chelsea Football Club”. Also near the top were “canada” and “canadian”, on which I have no comment. None of these words was associated with a strong bias in favor of Belgium, though -- the split was about 50-50 -- probably because most people in America don’t want to publicly support Belgium. The most strongly biased American word was “txst”, indicating tweeters affiliated with Texas State University: 98% of their tweets had more “USA”s than “Bel” or “Belgium”s. This was followed by “dawgs” -- mostly University of Georgia fans -- and “offer” -- mostly local businesses, who probably decided that tweeting for Belgium would not be profitable. (One could also try filtering for North Americans who shouted “Gooal” after Belgian goals but not after American ones: unfortunately, this yielded too small a sample to draw conclusions.)
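
One way the profile-word comparison could be sketched: for each word in a tweeter's profile, compute the share of that word's tweets with more “USA”s than “Bel”/“Belgium”s. Everything here is an assumption for illustration, including the function name `usa_share_by_profile_word`, the `(profile_text, tweet_text)` input, the decision to skip tied tweets, and the `min_tweets` cutoff.

```python
import re
from collections import defaultdict

USA_RE = re.compile(r"\busa\b", re.IGNORECASE)
BEL_RE = re.compile(r"\b(?:bel|belgium)\b", re.IGNORECASE)
WORD_RE = re.compile(r"[a-z0-9]+")


def usa_share_by_profile_word(tweets, min_tweets=1):
    """Return {profile word: fraction of its tweets with more USA than Belgium mentions}.

    `tweets` is an iterable of (profile_text, tweet_text) pairs. Tweets with
    equal counts (including zero of each) are ignored.
    """
    usa_wins = defaultdict(int)
    total = defaultdict(int)
    for profile, text in tweets:
        usa = len(USA_RE.findall(text))
        bel = len(BEL_RE.findall(text))
        if usa == bel:
            continue
        # Count each distinct profile word once per tweet.
        for word in set(WORD_RE.findall(profile.lower())):
            total[word] += 1
            usa_wins[word] += usa > bel
    return {w: usa_wins[w] / total[w] for w in total if total[w] >= min_tweets}
```

Raising `min_tweets` drops rare words, whose shares would otherwise be extreme just by chance.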


We can also look at how people’s moods changed over the course of the game, using a technique called sentiment analysis that looks for positive or negative words (this is what was used in the recent Facebook study). For example, a large number of Americans tweeted about waffles during the game, probably because that is one of the few things Americans can associate with Belgium. But near the end of the game, the tweets involving waffles get both negative and profane. I was disappointed that we lost, of course, but this still seems unfair to me. What did the waffles do to you?
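
In its simplest form, word-list sentiment analysis just counts positive words and subtracts negative ones. The sketch below shows the idea; the tiny word lists are placeholders invented for this example, whereas real analyses use much larger published lexicons.

```python
import re

# Placeholder word lists, for illustration only.
POSITIVE = {"love", "great", "win", "delicious", "happy"}
NEGATIVE = {"hate", "lose", "lost", "awful", "sad", "terrible"}


def sentiment(text):
    """Positive-word count minus negative-word count for one tweet."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


print(sentiment("I love waffles"))         # 1
print(sentiment("I hate waffles, awful"))  # -2
```

Averaging this score over all waffle tweets in each minute would trace how the mood shifted during the game.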

This is obviously just a first look at the data -- I wanted to do a quick analysis while everyone was still excited about soccer -- so let me know if there are things which seem off to you or if there are things you'd like me to look further at! Thanks to everyone who suggested tweets to follow.


Notes:

[1] The full list: worldcup, usavbel, letsdothis, ibelievethatwewillwin, usa, belgian, belgiumfacts, ibelieve, areyouready, onenationoneteam, belgium, bel, merica, murica, king_leopold, Stellartois, saynotoracism, ibelievethatwewilllose, waffles, belgianwaffles, 1n1t, usmnt. Some of these -- Stellartois, for example -- were just on the list because I thought they might be interesting, and accounted for a very small fraction of tweets.
