When automating decision-making processes that affect humans, it is important that each decision can be explained to the affected user, both to meet legal obligations (e.g., under the GDPR) and to maintain the trust of clients. Counterfactual explanations [1] are one interesting class of explanation methods that can be applied to arbitrary models (they are "model-agnostic"). Roughly speaking, a counterfactual explanation tells a user what would have to change in the input for the decision to change. For example, in a loan application scenario, a counterfactual explanation may tell the applicant that existing debts have to be decreased before the application can be accepted. In the case of a system that predicts the risk of heart failure, the system may tell a user to be more physically active in order to reduce the risk. Counterfactual explanations are computed by changing the input in a minimal way such that the desired decision is made.

Random forests [2] are state-of-the-art machine learning models that often achieve competitive classification performance at low cost. They are composed of a large collection of decision trees (hence "forest"), each of which uses a random subset of the available features and data (hence "random"). The final classification is based on a majority vote among the trees. Since the decision trees discretize the available features, a random forest effectively always works on discrete features, even if (some of) the original features are continuous. For random forests, computing a counterfactual can therefore be seen as a combinatorial optimization problem over the discrete feature values induced by the trees' split thresholds.
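To make the combinatorial view concrete, here is a minimal sketch in Python using scikit-learn. It exploits the discretization argument above: since every tree only compares a feature against a finite set of split thresholds, it suffices to consider candidate values just on either side of each threshold. The greedy loop below is only an illustrative stand-in for the combinatorial-optimization algorithm the project asks for, and all function names (`forest_thresholds`, `greedy_counterfactual`) and parameter choices are our own assumptions.

```python
# Illustrative sketch: greedy counterfactual search over the threshold grid
# that a random forest induces on each feature. Not the required algorithm,
# just a baseline to show the discrete search space.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

def forest_thresholds(forest, n_features):
    """Collect, per feature, every split threshold used anywhere in the forest."""
    grids = [set() for _ in range(n_features)]
    for tree in forest.estimators_:
        t = tree.tree_
        for f, thr in zip(t.feature, t.threshold):
            if f >= 0:  # negative feature index marks a leaf node
                grids[f].add(float(thr))
    return [sorted(g) for g in grids]

def greedy_counterfactual(forest, x, target, grids, eps=1e-6, max_changes=5):
    """Repeatedly change the single (feature, value) candidate that most
    increases the predicted probability of `target`; stop once the forest's
    decision flips. Returns the modified input, or None on failure."""
    x = x.copy()
    # Candidate values sit just below/above each threshold seen in the forest.
    cands = [(f, thr + s) for f, thrs in enumerate(grids)
             for thr in thrs for s in (-eps, eps)]
    for _ in range(max_changes):
        if forest.predict(x.reshape(1, -1))[0] == target:
            return x
        X_try = np.tile(x, (len(cands), 1))
        for i, (f, v) in enumerate(cands):
            X_try[i, f] = v
        probs = forest.predict_proba(X_try)[:, target]
        best = int(np.argmax(probs))
        if probs[best] <= forest.predict_proba(x.reshape(1, -1))[0, target]:
            return None  # greedy search is stuck
        f, v = cands[best]
        x[f] = v
    return x if forest.predict(x.reshape(1, -1))[0] == target else None

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)
grids = forest_thresholds(forest, X.shape[1])
x = X[y == 0][0]  # an instance of class 0; ask for class 1 instead
cf = greedy_counterfactual(forest, x, target=1, grids=grids)
print("counterfactual found" if cf is not None else "no counterfactual found")
```

Note that this greedy baseline changes one feature per step and ignores the cost of the change; the combinatorial formulation in the project should instead search for a provably minimal modification.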
The goal of this project is to familiarize yourself with counterfactual explanations and random forests, and to implement an algorithm for computing counterfactuals for random forests based on combinatorial optimization ideas. Your solution should be evaluated on standard benchmarks (e.g., from https://archive.ics.uci.edu/).
[1] Wachter, S., Mittelstadt, B., & Russell, C. (2017). Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harvard Journal of Law & Technology, 31, 841.

[2] Breiman, L. (2001). Random forests. Machine Learning, 45, 5-32.