Generating Differentially Private Datasets Using Deep Learning (in collaboration with the Office for National Statistics)

Samuel Wincott


Supervised by George Theodorakopoulos; Moderated by Yukun Lai

Government organisations, businesses, academia, members of the public and other decision-making bodies need access to a wide variety of administrative and survey data to make informed and accurate decisions. However, the bodies that collect this data are often unable to share it without breaching the confidentiality and consent agreements under which it was obtained. Researchers have therefore proposed many methods for generating synthetic data that can replace the raw data for processing and analysis. A good synthetic dataset has two properties: it is representative of the original data, and it provides strong guarantees about privacy. This project, carried out in collaboration with the Office for National Statistics (ONS), applies differential privacy to the generation of synthetic data using deep learning techniques. The ONS requires that the project's code be openly available on GitHub.

Initial Plan (03/02/2020) [Zip Archive]

Final Report (15/05/2020) [Zip Archive]

Publication Form