This project aims to develop a deep learning model for an acoustic side-channel attack, capable of identifying and classifying keystrokes from the sounds produced by a keyboard during typing. The system builds on recent advances in audio processing and machine learning, specifically convolutional and self-attention networks for classification, mel-spectrograms as an image-like representation of keystroke audio, and SpecAugment for data augmentation.
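
The sketch below illustrates the front-end stage implied by this pipeline: turning a recorded keystroke into a log-mel spectrogram and applying SpecAugment-style frequency and time masking before it is passed to the classifier. It is a minimal example using `torchaudio`; the sample rate, transform parameters, and the `keystroke.wav` file name are illustrative assumptions, not values taken from this project.

```python
# Minimal sketch: keystroke audio -> log-mel spectrogram -> SpecAugment masking.
# Parameters and file names below are assumptions for illustration only.
import torch
import torchaudio
import torchaudio.transforms as T

SAMPLE_RATE = 44_100  # assumed recording sample rate

# Mel-spectrogram: the "image" representation fed to the CNN / self-attention model
mel_transform = T.MelSpectrogram(
    sample_rate=SAMPLE_RATE,
    n_fft=1024,
    hop_length=256,
    n_mels=64,
)
to_db = T.AmplitudeToDB()

# SpecAugment: random frequency and time masking, applied only during training
freq_mask = T.FrequencyMasking(freq_mask_param=8)
time_mask = T.TimeMasking(time_mask_param=16)

def keystroke_to_spectrogram(waveform: torch.Tensor, train: bool = True) -> torch.Tensor:
    """Convert a mono keystroke waveform of shape (1, num_samples) into a
    log-mel spectrogram of shape (1, n_mels, num_frames), optionally augmented."""
    spec = to_db(mel_transform(waveform))
    if train:
        spec = time_mask(freq_mask(spec))
    return spec

if __name__ == "__main__":
    # Hypothetical clip containing a single keystroke
    waveform, sr = torchaudio.load("keystroke.wav")
    if sr != SAMPLE_RATE:
        waveform = T.Resample(sr, SAMPLE_RATE)(waveform)
    spec = keystroke_to_spectrogram(waveform)
    print(spec.shape)  # e.g. torch.Size([1, 64, num_frames])
```

The resulting spectrogram tensors can then be batched and fed to the convolutional or self-attention classifier; the masking step is typically disabled at inference time.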