I gave a talk on “Text Geolocation and (Text) Dating” last week at Johns Hopkins University’s Center for Language and Speech Processing. It covers some of our recent work at UT Austin on automatically geolocating and time-stamping documents. It also includes some initial thoughts on how this can help us move toward quasi-grounded models of word meaning, which is a research area that Katrin Erk and I have recently begun to work on. Here’s the abstract:
It used to be that computational linguists had to collaborate with roboticists in order to work on grounding language in the real world. However, since the advent of the internet, and particularly in the last decade, the world has been brought within digital scope. People’s social and business interactions are increasingly mediated through a medium that is dominated by text. They tweet from places, express their opinions openly, give descriptions of photos, and generally reveal a great deal about themselves in doing so, including their location, gender, age, social status, relationships and more. In this talk, I’ll discuss work on geolocation and dating of texts, that is, identifying sets of latitude-longitude pairs and time periods that a document is about or related to. These applications and the models developed for them set the stage for deeper investigations into computational models of word meaning that go beyond standard word vectors and into augmented multi-component representations that include dimensions connected to the real world via geographic and temporal values and beyond.
CLSP records all their seminars, and the video for my talk has been posted.
Unfortunately, the slides are not visible in the video, so I’ve posted them here: slides for Text Geolocation and Dating.
Errata: I believe I say at one point that Alessandro Moschitti had created the TweetToLife application that I used to get temporal distributions on Twitter over the day, but it was actually Marco Baroni I had in mind (joint work with Amaç Herdağdelen). Getting my Italians mixed up! (I had just read a nice paper by Moschitti, so my prior was off…)