Note that the student is on the HPC course, so this project must contain an HPC element!
Twitter data, Flickr photos, and social media in general are increasingly used to understand and analyse what is happening in the world around us. For this purpose, it is especially useful that many tweets and Flickr photos carry explicit metadata, such as geographical coordinates, a time stamp, or (in the case of Flickr) descriptive tags.
The aim of this project is to analyse data from Twitter (and/or Flickr) to discover which significant events have occurred in a given spatial region during a given period. To this end, the number of tweets (or photos) per day will be analysed for a number of cities in the UK. If an abnormally high number of tweets (or photos) is posted in a city on a given day, some kind of event has likely occurred there. In such a case, we can look at the terms that occur abnormally often in the associated tweets (or as tags on the associated Flickr photos) to find out the nature of the event.
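The two analysis steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class and method names (EventDetectionSketch, abnormalDays, overrepresentedTerms) are hypothetical, "abnormally high" is taken to mean more than k standard deviations above the mean daily count, and "abnormally often" is taken to mean a relative term frequency at least a given ratio above the background frequency. The project description does not prescribe either test, and other definitions may suit the data better.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; the thresholds and names are assumptions, not part of the project brief.
final class EventDetectionSketch {

    /** Returns, in date order, the days whose count exceeds mean + k * standard deviation. */
    static List<LocalDate> abnormalDays(Map<LocalDate, Integer> countsPerDay, double k) {
        List<LocalDate> result = new ArrayList<>();
        int n = countsPerDay.size();
        if (n == 0) {
            return result;
        }
        double sum = 0;
        for (int count : countsPerDay.values()) {
            sum += count;
        }
        double mean = sum / n;
        double squaredDeviations = 0;
        for (int count : countsPerDay.values()) {
            squaredDeviations += (count - mean) * (count - mean);
        }
        double stdDev = Math.sqrt(squaredDeviations / n); // population standard deviation
        double cutoff = mean + k * stdDev;
        for (Map.Entry<LocalDate, Integer> entry : new TreeMap<>(countsPerDay).entrySet()) {
            if (entry.getValue() > cutoff) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    /**
     * Returns, in alphabetical order, the terms whose relative frequency on the
     * event day is at least `ratio` times their relative frequency in the background.
     */
    static List<String> overrepresentedTerms(Map<String, Integer> eventCounts,
                                             Map<String, Integer> backgroundCounts,
                                             double ratio) {
        List<String> result = new ArrayList<>();
        long eventTotal = total(eventCounts);
        long backgroundTotal = total(backgroundCounts);
        if (eventTotal == 0) {
            return result;
        }
        for (Map.Entry<String, Integer> entry : new TreeMap<>(eventCounts).entrySet()) {
            double eventFreq = (double) entry.getValue() / eventTotal;
            // Adding one to the count keeps the ratio finite for terms never seen in the background.
            double backgroundFreq =
                    (backgroundCounts.getOrDefault(entry.getKey(), 0) + 1.0) / (backgroundTotal + 1.0);
            if (eventFreq / backgroundFreq >= ratio) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    private static long total(Map<String, Integer> counts) {
        long sum = 0;
        for (int count : counts.values()) {
            sum += count;
        }
        return sum;
    }
}
```

In this sketch, abnormalDays would be called once per city with that city's daily counts, and overrepresentedTerms would then compare the term counts of a flagged day against the term counts of the remaining days for the same city.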
Good Java programming skills are required for this project.
Edit by Adam: After talking to Steven Schockaert, we agreed that I will approach this from a big data angle. Although my data set will not be particularly large, it will be large enough that it will be impossible to store the tweets in memory. I will also attempt to solve the problem using Hadoop.
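To illustrate the memory constraint mentioned above, the following is a minimal sketch of counting tweets per city and day while reading the input one line at a time, so that only the counters are held in memory and never the tweets themselves. It deliberately uses plain Java rather than the Hadoop API, although the shape is the same as a map step that emits a (city, day) key followed by a reduce step that sums the ones. The class name StreamingTweetCounter, the method name, and the tab-separated input format (city, ISO date, tweet text) are all assumptions made for the example, not the format of the real data set.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; the input format "city<TAB>yyyy-MM-dd<TAB>text" is an assumption.
final class StreamingTweetCounter {

    /** Counts tweets per "city|day" key, reading the input line by line. */
    static Map<String, Integer> countPerCityAndDay(Reader input) {
        Map<String, Integer> counts = new TreeMap<>();
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split("\t", 3);
                if (fields.length < 3) {
                    continue; // skip malformed lines
                }
                // "Map" step: derive the key; "reduce" step: add one to its running total.
                String key = fields[0] + "|" + fields[1];
                counts.merge(key, 1, Integer::sum);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }
}
```

The method could be fed a java.io.FileReader or any other Reader over the tweet file. Because the result holds one counter per city and day, its size depends on the number of cities and days rather than on the number of tweets.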