This project presents a computer-aided diagnosis (CAD) system for detecting skin cancer in dermatoscopic images, using traditional image processing and machine learning techniques. Particular focus is placed on detecting melanoma, the deadliest form of skin cancer. Dermatoscopic images have been acquired from two reputable datasets, HAM10000 and Ganster, enabling both single-dataset and cross-dataset evaluation. A pre-processing pipeline has been implemented, comprising hair removal, contrast enhancement and black border removal. Four segmentation methods have then been implemented and compared: Otsu thresholding, GrabCut, Watershed and Fuzzy C-Means. Their accuracy has been assessed using Intersection over Union (IoU) against both manually created and pre-existing ground truth masks. Hand-crafted features have been extracted based on the ABCD rule of dermatology (Asymmetry, Border irregularity, Colour, Diameter), in addition to texture features extracted from Grey Level Co-Occurrence Matrices (GLCM). Recursive Feature Elimination and an ablation study have been used for feature selection, to identify the most informative features and reduce dimensionality. Class imbalance within each dataset has been addressed by undersampling the training sets only, leaving the testing sets untouched so that their class distribution resembles real-world clinical scenarios. A range of classifiers has been tested, including SVM, KNN, Random Forest and XGBoost, with XGBoost consistently achieving the highest sensitivity in every classification task.
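Two of the steps described above, scoring a segmentation with Intersection over Union and undersampling only the training set, can be illustrated with a minimal sketch. This is not the project's actual code: the function names `iou` and `undersample_indices`, the toy masks and the toy labels are all hypothetical, and the sketch uses only the Python standard library so that it stays self-contained. Random undersampling to the size of the smallest class is assumed here; the project may use a different undersampling strategy.

```python
import random
from collections import defaultdict


def iou(pred_mask, true_mask):
    """Intersection over Union of two binary masks (equal-shape nested lists)."""
    intersection = union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly; this also avoids dividing by zero.
    return intersection / union if union else 1.0


def undersample_indices(labels, seed=0):
    """Return sorted indices keeping an equal number of samples per class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # Every class is reduced to the size of the smallest class.
    target = min(len(idxs) for idxs in by_class.values())
    rng = random.Random(seed)
    kept = []
    for idxs in by_class.values():
        kept.extend(rng.sample(idxs, target))
    return sorted(kept)


# Hypothetical toy masks: 1 = lesion pixel, 0 = background.
pred = [[0, 1, 1],
        [0, 1, 0]]
truth = [[0, 1, 0],
         [0, 1, 1]]
score = iou(pred, truth)

# Hypothetical toy labels: 1 = melanoma, 0 = benign.
train_labels = [0, 0, 0, 0, 0, 0, 1, 1]
test_labels = [0, 0, 0, 1]

# Only the training set is balanced; test_labels is deliberately left as-is.
balanced = undersample_indices(train_labels)
balanced_train_labels = [train_labels[i] for i in balanced]
```

Keeping the test labels untouched means that sensitivity is measured on a class distribution closer to what a clinician would encounter, rather than on an artificially balanced one.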
The project uses datasets introduced in the following papers:
H. Ganster, A. Pinz, et al., Automated melanoma recognition, IEEE Trans. on Medical Imaging, 2001.
P. Tschandl, C. Rosendahl, H. Kittler, The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions, Scientific Data, 2018.