Recognising and disambiguating place names in text documents

Ellie Robins


Supervised by Chris B Jones; Moderated by Surya Thottam Valappil

To index documents with regard to geographic space, it is necessary to recognise and geocode (i.e. attach coordinates to) place names in the textual content. This information retrieval project is concerned with developing machine learning methods to perform this task. A gazetteer consisting of a list of place names will be used to assist in the process. The task can be challenging because genuine place names are difficult to distinguish from other terms, such as the names of people and organisations, and because some place names (such as Newport) are ambiguous, with different places sharing the same name. The machine learning process will employ various types of evidence that a name is a place name, such as whether it occurs in a gazetteer and whether it is preceded by a spatial relation such as "near" or "towards". It will also employ evidence of which particular place a name refers to, such as whether the name occurs alongside words that are associated with a particular location.
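
As an illustration only, the kinds of evidence described above could be computed along the following lines. This is a minimal sketch, not the project's implementation: the toy gazetteer, its approximate coordinates, the context words, the list of spatial relations, and the function names are all hypothetical. The disambiguation step uses a simple hand-written word-overlap count in place of a trained model.

```python
import re

# Hypothetical toy gazetteer: lower-cased name -> candidate places.
# Coordinates are approximate and the context words are invented for illustration.
GAZETTEER = {
    "newport": [
        {"place": "Newport, Wales", "lat": 51.58, "lon": -3.00,
         "context": {"wales", "usk", "gwent"}},
        {"place": "Newport, Rhode Island", "lat": 41.49, "lon": -71.31,
         "context": {"rhode", "island", "naval"}},
    ],
}

# Hypothetical list of spatial relation words that may precede a place name.
SPATIAL_RELATIONS = {"near", "towards", "in", "from"}


def tokenize(text):
    """Split text into alphabetic tokens, keeping the original case."""
    return re.findall(r"[A-Za-z]+", text)


def recognition_features(tokens, i):
    """Evidence that tokens[i] is a place name, as a feature dictionary."""
    token = tokens[i]
    previous = tokens[i - 1].lower() if i > 0 else ""
    return {
        "in_gazetteer": token.lower() in GAZETTEER,
        "after_spatial_relation": previous in SPATIAL_RELATIONS,
        "capitalised": token[:1].isupper(),
    }


def disambiguate(tokens, i):
    """Pick the gazetteer candidate whose context words overlap most with the text."""
    candidates = GAZETTEER.get(tokens[i].lower(), [])
    if not candidates:
        return None
    words = {t.lower() for t in tokens}
    # On a tie, max() returns the first candidate listed in the gazetteer.
    return max(candidates, key=lambda c: len(c["context"] & words))


tokens = tokenize("We stayed near Newport on the River Usk in Wales")
position = tokens.index("Newport")
features = recognition_features(tokens, position)
best = disambiguate(tokens, position)
```

In a machine learning setting, feature dictionaries like the one returned by `recognition_features` would be passed to a classifier trained on annotated text rather than inspected directly, and the overlap count in `disambiguate` would be one feature among several.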

Initial Plan (03/02/2020) [Zip Archive]

Final Report (05/06/2020) [Zip Archive]

Publication Form