A Tutorial Overview of Machine Learning, Feature Selection
and Classification With Application to Cancer Detection
by Art Wetzel and Sylvester Czanner
The PSC has recently begun a collaboration with Dr. Michael
Lotze and his team at the UPMC Hillman Cancer Center to assist
with the analysis of data for cancer detection. This presentation,
motivated by the Dec 17 visit of Dr. Chip Petricoin to UPMC, is a
tutorial overview of computational methods that may be useful for
identifying cancer biomarkers in proteomic data such as Surface
Enhanced Laser Desorption/Ionization (SELDI) mass spectra.
During the past year, Dr. Petricoin's group at the NCI has published
very promising results in distinguishing cancer from normal cases in
blood serum using SELDI mass spectrometry together with computational
classification based on genetic algorithm (GA) feature selection feeding
into a Kohonen Self-Organizing Map (SOM). Our talk will review
the benefits and limitations of the computational classification
methods used in the Petricoin and similar studies and will suggest
some alternatives that may provide additional understanding of
the relative significance of different biomarkers. We will include
a general discussion of supervised and unsupervised learning methods,
various classification techniques, the importance of feature selection,
and a special section on Support Vector Machine (SVM) classifiers.
We have used some of these methods to achieve 96% correct
classification on a test using publicly available ovarian cancer data.
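
As a rough illustration of why feature selection matters before
classification, the following minimal sketch ranks features on toy
data and then classifies held-out samples using only the top-ranked
ones. It is not the GA/SOM procedure of the Petricoin study, nor the
method behind the 96% figure above: it substitutes a simple t-statistic
filter and a nearest-centroid classifier, and the synthetic "spectra",
the function names, and all parameters (feature count, informative
indices, shift) are assumptions made up for this example.

```python
import random
import statistics


def make_toy_spectra(n_per_class=40, n_features=50, informative=(3, 17),
                     shift=3.0, seed=0):
    """Toy data (assumption): Gaussian noise everywhere, plus a mean shift
    on a few 'informative' features for class 1 (the 'cancer' class)."""
    rng = random.Random(seed)
    samples, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            if label == 1:
                for j in informative:
                    row[j] += shift
            samples.append(row)
            labels.append(label)
    return samples, labels


def t_scores(samples, labels):
    """Absolute two-sample (Welch-style) t statistic for each feature."""
    scores = []
    for j in range(len(samples[0])):
        a = [s[j] for s, y in zip(samples, labels) if y == 0]
        b = [s[j] for s, y in zip(samples, labels) if y == 1]
        se = (statistics.variance(a) / len(a)
              + statistics.variance(b) / len(b)) ** 0.5
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / se)
    return scores


def select_top_features(samples, labels, k):
    """Filter-style selection: keep the k features with the largest score."""
    scores = t_scores(samples, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def fit_centroids(samples, labels, features):
    """Per-class mean vector, restricted to the selected features."""
    centroids = {}
    for label in sorted(set(labels)):
        rows = [s for s, y in zip(samples, labels) if y == label]
        centroids[label] = [statistics.mean([r[j] for r in rows])
                            for j in features]
    return centroids


def predict(sample, centroids, features):
    """Assign the class whose centroid is nearest in squared distance."""
    def sq_dist(center):
        return sum((sample[j] - c) ** 2 for j, c in zip(features, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(samples, labels, centroids, features):
    """Fraction of samples whose predicted class matches the true label."""
    hits = sum(predict(s, centroids, features) == y
               for s, y in zip(samples, labels))
    return hits / len(labels)


if __name__ == "__main__":
    # Features are selected on the training set only, so the held-out
    # accuracy is not inflated by information from the test samples.
    train_x, train_y = make_toy_spectra(seed=0)
    test_x, test_y = make_toy_spectra(seed=1)
    features = select_top_features(train_x, train_y, k=2)
    centroids = fit_centroids(train_x, train_y, features)
    print("selected features:", sorted(features))
    print("held-out accuracy:", accuracy(test_x, test_y, centroids, features))
```

Selecting features on the training samples alone and scoring on a
separate set is the design choice worth noting here; ranking features
on all of the data before splitting would tend to overstate accuracy.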