Interpretable Machine Learning aims at designing machine learning models that are human-interpretable [A]. Computer vision is a particularly challenging domain: many state-of-the-art models tend to learn surface statistical regularities rather than concepts that are meaningful to humans. This causes not only interpretability problems but also robustness problems. For example, researchers in adversarial machine learning have demonstrated that the interpretation of traffic signs can change depending on the lighting conditions, or that people can be impersonated by wearing colorful glasses.

In order to overcome these limitations, new architectures have been designed that introduce a human-interpretable layer between the pattern-recognition and classification modules of deep learning architectures. One interesting example is ProtoPNet [B], which introduces a so-called prototype layer. Each prototype in the prototype layer is associated with a patch from the training data, so it can be visualized. The classification is then based on the presence of prototypes in the input. Intuitively, the prototypes should represent prototypical parts of the classified objects, such as the beak, wing or tail of a bird. ProtoPNet thus allows explaining the classification by statements of the form this-looks-like-that.

We recently introduced ProtoArgNet [C], an extension of ProtoPNet that increases its expressiveness along different dimensions. The architecture of ProtoArgNet roughly consists of (see the sketch below)
1. a convolutional backbone that recognizes visual patterns,
2. a prototype layer that recognizes "prototypical parts",
3. a super-prototype layer that combines prototypical parts into a "class representation",
4. a classifier that can be interpreted as a sparse MLP or as a quantitative argumentation framework.

The goal of this project is to study the interpretability and plausibility of ProtoArgNet in more detail. To this end, the project first aims at building a user interface for ProtoArgNet that allows visualizing the architecture and facilitating the explanation process. For visualizing the classification component, the project can (but does not have to) build on PySparx [D], a visualization library for the classification module used in ProtoArgNet. Based on the visualization, the project then aims at systematically studying the plausibility of the learnt prototypes and super-prototypes and the interpretability of the classification of input images.
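To make the four components above more concrete, here is a minimal, illustrative sketch in a PyTorch style. All module names, dimensions, and the specific similarity computation are assumptions chosen for exposition; they do not reproduce the actual ProtoArgNet implementation.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class ProtoArgNetSketch(nn.Module):
    """Rough sketch of the four-stage architecture; not the actual ProtoArgNet code."""

    def __init__(self, num_prototypes=2000, num_super_prototypes=200,
                 num_classes=200, proto_dim=128):
        super().__init__()
        # 1. Convolutional backbone that recognizes visual patterns.
        backbone = models.resnet34(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.add_on = nn.Conv2d(512, proto_dim, kernel_size=1)
        # 2. Prototype layer: learnable vectors compared against feature patches.
        self.prototypes = nn.Parameter(torch.rand(num_prototypes, proto_dim))
        # 3. Super-prototype layer: combines prototypical parts into class representations.
        self.super_proto = nn.Linear(num_prototypes, num_super_prototypes)
        # 4. Classifier, interpretable as a sparse MLP / quantitative argumentation framework.
        self.classifier = nn.Linear(num_super_prototypes, num_classes)

    def forward(self, x):
        f = self.add_on(self.features(x))                       # (B, D, H, W)
        patches = f.flatten(2).transpose(1, 2)                  # (B, H*W, D)
        protos = self.prototypes.unsqueeze(0).expand(patches.size(0), -1, -1)  # (B, P, D)
        dists = torch.cdist(patches, protos)                    # (B, H*W, P)
        sims = -dists.min(dim=1).values                         # best-matching patch per prototype -> (B, P)
        supers = torch.relu(self.super_proto(sims))             # (B, S)
        return self.classifier(supers)                          # (B, C)
```

The sketch only shows the data flow: in the real architecture, each prototype is additionally tied to a training patch so that it can be visualized, and the final layer carries the argumentative interpretation.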
Possible interesting visualizations include
- visualizing the recognized prototypes in the input image (e.g., region highlighting/blurring depending on the activation of the prototype; see the sketch after this list),
- visualizing super-prototypes in the input image (e.g., a red/green overlay for positive/negative prototypical parts),
- visualizing the classification (building on ideas from PySparx).
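As a starting point for the first visualization idea, the snippet below overlays a prototype's activation map on the input image. It is only a sketch: it assumes the activation map is already available as a 2D float array of similarity scores, and leaves open how that array is extracted from ProtoArgNet's prototype layer.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from PIL import Image

def overlay_prototype_activation(image_path, activation_map, alpha=0.5):
    """Overlay a coarse (h, w) prototype activation map on the input image.

    `activation_map` is assumed to be a 2D float array of prototype
    similarity scores (hypothetical interface to the prototype layer).
    """
    img = np.asarray(Image.open(image_path).convert("RGB"), dtype=np.float32) / 255.0
    # Upsample the coarse activation map to the image resolution.
    act = np.asarray(Image.fromarray(activation_map.astype(np.float32)).resize(
        (img.shape[1], img.shape[0]), resample=Image.BILINEAR))
    # Normalize to [0, 1] so it can be mapped to a color scale.
    act = (act - act.min()) / (act.max() - act.min() + 1e-8)
    heat = cm.jet(act)[..., :3]                         # RGB heatmap
    overlay = (1 - alpha) * img + alpha * heat          # blend heatmap into the image
    plt.imshow(overlay)
    plt.axis("off")
    plt.show()
```

A blurring variant could instead blur the regions with low activation, and the red/green super-prototype overlay could reuse the same blending step with a diverging colormap.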
Possible interesting questions about plausibility and interpretability include
- how many prototypical parts are conceptually meaningful for typical input images?
- how plausible are the different super-prototypes for typical input images?
- how interpretable is the overall classification (can the network be made sufficiently sparse without decreasing classification performance too much)? A small sparsity/accuracy analysis is sketched after this list.
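For the last question, a simple starting point is to prune small classifier weights at increasing thresholds and record how the accuracy changes. The sketch below assumes a model with a `classifier` linear layer and a user-supplied `evaluate` function; both are hypothetical placeholders rather than the actual ProtoArgNet interface.

```python
import copy
import torch

def sparsity_accuracy_curve(model, eval_loader, evaluate,
                            thresholds=(0.0, 0.01, 0.05, 0.1, 0.2)):
    """Zero out classifier weights below each threshold and record the resulting accuracy."""
    results = []
    for t in thresholds:
        pruned = copy.deepcopy(model)
        with torch.no_grad():
            w = pruned.classifier.weight
            mask = w.abs() >= t
            w.mul_(mask)                                 # prune small weights in place
        sparsity = 1.0 - mask.float().mean().item()      # fraction of zeroed weights
        accuracy = evaluate(pruned, eval_loader)         # user-supplied evaluation routine
        results.append((t, sparsity, accuracy))
    return results
```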
This is a joint project with colleagues from Imperial College London, and we are planning to have some joint meetings (via Teams) during the project to discuss the visualizations and observations.

[A] https://link.springer.com/chapter/10.1007/978-3-030-65965-3_28
[B] https://proceedings.neurips.cc/paper/2019/hash/adf7ee2dcf142b0e11888e72b43fcb75-Abstract.html
[C] https://arxiv.org/abs/2311.15438
[D] https://cgi.cse.unsw.edu.au/~eptcs/Published/ICLP2023/Proceedings.pdf
A basic understanding of machine learning and familiarity with Python are necessary for the project. Familiarity with deep learning libraries and/or argumentation frameworks is helpful.