Montréal’s Language Distribution According to Twitter…


Over the span of roughly eight months, I collected 1,116,442 tweets using the Tweepy Python library, which provides easy access to the free Twitter API. If eight months sounds like a long time for only ~1M tweets, that’s because the free Twitter API only gives access to roughly 1% of the total Twitter stream, or “firehose”. Since saving the actual tweet goes against the user agreement, I saved only a lat/lon location and a language type for each tweet in a single SQL database. Of all these tweets, only 131,773 (11.8%) were geotagged. Of the geotagged tweets, 79,565 (60.38%) were in English, 39,508 (29.98%) were in French, and 12,700 (9.64%) were in some other language. Within this “other” category, Spanish represented the largest portion at 2.14% of the geotagged tweets. Using just the English and French geotagged tweets resulted in a map containing 119,073 unique tweet locations, of which roughly 33% were French and 67% were English. The map was created using QGIS and Bing Maps satellite imagery.
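For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the percentages from the counts reported above. The helper name `percentage_shares` and the dictionary keys are my own illustrative choices, not part of the original collection script.

```python
def percentage_shares(counts):
    """Return each count as a percentage of the total, rounded to 2 decimals."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


# Counts taken directly from the text above.
total_tweets = 1_116_442
geotagged = {"english": 79_565, "french": 39_508, "other": 12_700}

# Share of all collected tweets that carried a location.
geotagged_total = sum(geotagged.values())
geotagged_pct = round(100 * geotagged_total / total_tweets, 1)

# Language shares among all geotagged tweets.
geotagged_shares = percentage_shares(geotagged)

# Language shares among only the English and French tweets used for the map.
mapped = {k: v for k, v in geotagged.items() if k != "other"}
mapped_shares = percentage_shares(mapped)

print(geotagged_total, geotagged_pct)
print(geotagged_shares)
print(sum(mapped.values()), mapped_shares)
```

The first dictionary reproduces the 60.38 / 29.98 / 9.64 split, and the second gives about 66.8% English and 33.2% French for the 119,073 mapped locations.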