Akka Streams integration in CSV-W Validator tool to improve performance

Ajay Joseph


Supervised by Martin Caminada; Moderated by Dr Daniela Tsaneva

This dissertation project was carried out in collaboration with the Office for National Statistics (ONS), the producer of official statistics in the United Kingdom. The ONS publishes large volumes of data every week, most of it as spreadsheets. Its publishing team is moving towards a cleaner way of publishing data by adopting CSV-W (Comma Separated Values on the Web), a World Wide Web Consortium standard that is emerging as a leading format for linked data and is being adopted by governments across the world. Because CSV-W defines a large set of rules, tools are needed to validate files against the standard. The Integrated Data Platform (IDP) team at the ONS is at the forefront of this move to CSV-W. The main aim of this project is to enhance the CSV-W validation tool currently used by the IDP team. Specifically, it investigates whether the Actor model (a computing model for concurrent processing) can be integrated into the existing tool. The project matters because the ONS has to handle CSV-W tables containing millions of rows each, for which ordinary synchronous, row-by-row validation is at times inadequate.

Final Report (26/11/2021) [Zip Archive]
