Museums and herbaria hold millions of records of biological samples, many of which include a natural language description of the habitat where the sample was collected. These descriptions have the potential to be used to reconstruct historical habitats. The project will experiment with machine learning methods that predict habitat type from such text. Classifiers will be trained using ground truth habitat data in the form of digital land cover maps from recent decades, with the aim of measuring how well the text in biological collection records can predict standard habitat types.
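As a rough illustration of the setup described above, the sketch below trains a small bag-of-words naive Bayes classifier on habitat descriptions paired with land cover classes, then measures accuracy on held-out descriptions. Everything specific in it is an assumption made for illustration: the example descriptions, the class names, the helper functions `tokenize`, `train`, `predict` and `accuracy`, and the choice of naive Bayes itself, which is only one of many methods the project might try.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical training pairs: (habitat description from a specimen record,
# land cover class read from a digital map at the collection site).
# Texts and class names are invented for illustration only.
TRAIN = [
    ("wet peat bog with sphagnum moss", "wetland"),
    ("marshy ground beside reed bed", "wetland"),
    ("oak woodland on a shaded slope", "woodland"),
    ("under beech trees in old forest", "woodland"),
    ("dry chalk grassland, grazed by sheep", "grassland"),
    ("open meadow with tall grass", "grassland"),
]

# Hypothetical held-out pairs used to estimate predictive accuracy.
TEST = [
    ("sphagnum bog near a pool", "wetland"),
    ("beech forest on a slope", "woodland"),
    ("grazed meadow", "grassland"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(pairs):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in pairs:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior under
    multinomial naive Bayes with add-one (Laplace) smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, pairs):
    """Fraction of held-out descriptions assigned the map-derived class."""
    correct = sum(predict(model, text) == label for text, label in pairs)
    return correct / len(pairs)


model = train(TRAIN)
print(predict(model, "sphagnum bog near a pool"))
print(accuracy(model, TEST))
```

With real data, the labels would come from overlaying each record's collection coordinates on a land cover map, and the held-out evaluation would need far more records than this toy set to say anything meaningful about how well the text predicts habitat.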