Saturday, August 29, 2015

Using Twitter Data to Study Men and Women's Drinking Habits


My analysis of half a million drunk tweets, and more than two million tweets about alcohol, was just published by Quartz; you can read it here (and the coverage by Slate here, if you speak French).

Friday, August 7, 2015

I Look Like an Engineer

This week tens of thousands of female engineers tweeted pictures of themselves and explanations of the work they do under the hashtag #ILookLikeAnEngineer. The movement made the front page of the New York Times, and so I decided to see what I could do with the tweets. Because this is a post about female engineers, I will describe the engineering steps in a little more detail:
  1. I used a program to scrape roughly 100,000 #ILookLikeAnEngineer tweets.
  2. From each tweet, I extracted any links to images, filtering out retweets and duplicate links.
  3. Using what Mark Zuckerberg in The Social Network would call “a little wget magic”, I downloaded all the pictures from the links. This gave me roughly 10,000 pictures (1.2 GB). Then I created a site so that drunken fraternity men could rate the attractiveness of the women...no, never mind.
  4. I programmatically cropped all 10,000 images into squares of uniform size.
  5. I wanted to see if I could create mosaics: large images composed of tiled smaller images. There are websites that do this, but I was pretty sure they would choke on 1.2 GB of pictures and not give me the freedom to experiment with parameters. The better solution was to write code to do it, and the lazier solution was to see if someone else had already done that, which they had. This allowed me to create mosaics using only one line of code, but I didn’t like the initial output, so I added a bunch of my code to their code until it did what I wanted.
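
For the curious, steps 2, 4 and 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code actually used for these mosaics: the tweet fields `is_retweet` and `image_urls` are hypothetical names, images are reduced to a single average RGB color, and tile matching is a plain nearest-color search, which may differ from what the borrowed mosaic code does.

```python
from typing import Iterable, List, Tuple

Color = Tuple[int, int, int]


def extract_image_links(tweets: Iterable[dict]) -> List[str]:
    """Step 2: collect unique image links, skipping retweets.

    The keys `is_retweet` and `image_urls` are hypothetical; a real
    scraper's output will use its own field names.
    """
    seen = set()
    links = []
    for tweet in tweets:
        if tweet.get("is_retweet"):
            continue
        for url in tweet.get("image_urls", []):
            if url not in seen:
                seen.add(url)
                links.append(url)
    return links


def center_square_box(width: int, height: int) -> Tuple[int, int, int, int]:
    """Step 4: (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def closest_tile(target: Color, tile_colors: List[Color]) -> int:
    """Step 5: index of the tile whose average color is nearest the target."""
    def squared_distance(color: Color) -> int:
        return sum((a - b) ** 2 for a, b in zip(color, target))

    return min(range(len(tile_colors)),
               key=lambda i: squared_distance(tile_colors[i]))


def build_mosaic(target_cells: List[List[Color]],
                 tile_colors: List[Color]) -> List[List[int]]:
    """Map every cell of the target image to the index of its best tile."""
    return [[closest_tile(cell, tile_colors) for cell in row]
            for row in target_cells]
```

The crop box is in the (left, top, right, bottom) order that most image libraries expect, so it can be passed to a crop call before resizing every picture to the same tile size. Because `build_mosaic` reuses tiles freely, a real version would probably want to penalize repeats so the same face does not fill a whole region.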

Here are the final results. From far away, the mosaics look like Seurat: for example, here is the woman who started it all.

Zoom in and you get lost in the individual pictures.

Here are some high resolution versions (warning: the files are large; zoom in). Please feel free to use them, with attribution, to persuade women to become engineers or do other socially useful things. (Do me a favor and let me know about it!)

I was working on this while watching the Republican debate, and at some point I got tired of hearing overconfident men pretend to know more than they did. (The line that did me in was Huckabee’s assertion that scientists agree personhood begins at conception because of a fetus’s “DNA schedule”. What the hell is a DNA schedule?) So I went home and wrote code. I’m often comforted by the fact that, however loud and annoying the person lecturing me may be, they cannot get inside my skull: the silent sanctity of those few inches of space, the infinite freedom to reflect and create, remain my own.

And yet. It’s naive to think that freedom of thought is enough. My work requires a computer, which I need economic freedom to buy. And Huckabee’s proposed restrictions on contraception and abortion will reduce women’s economic freedom. My work is funded by government science agencies which Huckabee wants to cut. So even code is cold comfort at the moment.

Apologies for the slightly bleak ending. If you have a custom image you’d like female-engineerified, or you’d like higher-resolution versions of these images, I’m happy to do that. If you are one of the women portrayed here and are uncomfortable having your face composed of many smaller women, let me know and I will take your picture down. And if you have ideas for cool things to do with this dataset or ways to improve the mosaics, please let me know! (Some pretty obvious improvements one could make are a) filtering out non-faces and b) filtering out duplicate images.)
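
Improvement (b) is the easier of the two. Here is a minimal sketch, assuming the downloaded pictures are available as (filename, raw bytes) pairs; the function name is made up for illustration. It only catches byte-for-byte identical files, so the same photo re-encoded or resized would still slip through and would need something like a perceptual hash instead.

```python
import hashlib
from typing import Iterable, List, Tuple


def drop_exact_duplicates(images: Iterable[Tuple[str, bytes]]) -> List[str]:
    """Return filenames of the first occurrence of each distinct file.

    Two files count as duplicates only if their bytes are identical,
    which is checked by comparing SHA-256 digests.
    """
    seen = set()
    kept = []
    for name, data in images:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(name)
    return kept
```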