Smart Home Activity Inference using Network Data

Mary Zacharias


Supervised by Charith Perera; Moderated by Carolina Fuentes Toro

This paper explores the development of a smart home-based activity recognition tool through a multi-stage device-state classification approach, using encrypted network data.

Through a comprehensive literature review and findings from subsequent experimentation, this paper demonstrates that commercial off-the-shelf (COTS) devices, and device activities, can be automatically recognised through patterns in their network activity. It further finds that more complex human activities can be constructed from this data almost as successfully as, if not as successfully as, with traditional sensor-based methods. It also posits that this traffic-based methodology, through its open-source nature, offers the benefit of being almost fully hardware and software agnostic. It is therefore put forward as a feasible, time- and cost-effective alternative to device-based approaches.

This Thesis uses this methodology to develop a layman-friendly activity recognition tool as a counterpart to the diagnostics component of current open-source smart home hubs like Home Assistant and OpenHAB. It finds that the challenge with such applications is their currently niche (technical) user base, and/or their requirement for periodic monetary commitments to connect to certain devices. The tool proposed in this Thesis navigates these challenges through its easy set-up and simple UI, neither of which requires prior technical skills. The proposal also eliminates any monetary commitment by making use of open-source network sniffing tools to identify all IP-connected devices and their current states. The tool goes a step further by identifying devices through robust fingerprinting methods, instead of the commonly used identifiers such as MAC and IP addresses, so as to prevent spoofing attacks.

To implement this tool, this Thesis extends current privacy-centred literature on passive activity detection techniques and brings it into a production-ready environment. It identifies the database requirements, outlines tools to capture and process network traffic, recommends top-performing binary and multi-class classifiers, and offers ways to retrain models using user input. It also highlights the challenges and limitations of using this software development technique.

For evaluation, this Thesis uses metrics such as precision, recall and F-measure, which provide insight into the quality of the classification. The performance of these probabilistic models is evaluated, and the results provide a baseline for comparison with other recognition methods in live settings.
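As a rough illustration of the evaluation metrics named above, the following minimal sketch computes per-class precision, recall and F-measure (taken here as F1, the harmonic mean of the two) from true and predicted labels. The function name `per_class_metrics` and the device-state labels are hypothetical and chosen only for this example; they are not taken from the Thesis, which may compute these metrics differently.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrongly assigned
            fn[truth] += 1   # true class was missed

    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]   # times the label was predicted
        actual = tp[label] + fn[label]      # times the label truly occurred
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Hypothetical device-state labels, made up for illustration only.
y_true = ["on", "on", "off", "idle", "off", "on"]
y_pred = ["on", "off", "off", "idle", "off", "on"]

for label, (p, r, f) in per_class_metrics(y_true, y_pred).items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Reporting the metrics per class, rather than as a single accuracy figure, makes it easier to see which device states a multi-class classifier confuses with one another.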

Final Report (21/10/2022) [Zip Archive]
