Dataset Shift in Machine Learning
Introduction: DataRobot’s Peter Prettenhofer gave an engrossing talk at the recent ODSC UK conference on the problem of dataset shift in Machine Learning. He opened with a brief look at the mathematics of supervised learning and an outline of dataset shift, the situation in which the data a model sees after deployment no longer follows the distribution it was trained on. An interactive illustration gave a clear visual display of the problem.
Mr. Prettenhofer also touched on a few ways in which dataset shift is induced. Two of these are sample selection bias, particularly when working with imbalanced datasets, and domain adaptation in Natural Language Processing.
Next came a look at identifying the problem. The focus was on unsupervised approaches, since the methods for mitigating the issue in supervised learning are more familiar. Mr. Prettenhofer went through three methods: Statistical Distance, Novelty Detection, and Discriminative Distance.
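To make the first of these methods concrete, here is a minimal sketch of a statistical distance check on a single feature. It assumes the two-sample Kolmogorov-Smirnov statistic as the distance. That choice, the function name `ks_statistic`, and the synthetic Gaussian data are illustrative assumptions and are not taken from the talk.

```python
import random
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the two empirical CDFs,
    a value between 0 (identical samples) and 1 (no overlap).
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    max_gap = 0.0
    # The empirical CDFs only change at observed values, so it is
    # enough to compare them at every point of the pooled sample.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Synthetic data (an assumption for illustration only).
rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(1000)]
test_same = [rng.gauss(0.0, 1.0) for _ in range(1000)]      # same distribution
test_shifted = [rng.gauss(1.0, 1.0) for _ in range(1000)]   # mean moved by 1

d_same = ks_statistic(train, test_same)
d_shifted = ks_statistic(train, test_shifted)
print(f"no shift: {d_same:.3f}  shifted: {d_shifted:.3f}")
```

A distance near zero suggests the feature looks the same in both samples, while a large value flags a feature worth inspecting. In practice this would be run per feature, with a threshold or significance test chosen for the application.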
Discussing solutions was the natural way to end the talk, and here Mr. Prettenhofer covered the concepts of Importance Re-weighting and Feature Engineering to put a cap on an incredibly informative presentation.
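As a rough illustration of Importance Re-weighting, the sketch below weights each training point by an estimate of the density ratio p_test(x) / p_train(x), so that the re-weighted training set resembles the test set. The histogram-based estimator, the function name `importance_weights`, and the synthetic data are assumptions made for this example. The talk's own estimator is not described here.

```python
import random
from collections import Counter


def importance_weights(train_x, test_x, n_bins=10):
    """Estimate w(x) = p_test(x) / p_train(x) for every training point.

    Both samples are binned on a shared, equal-width grid over one
    feature, and the ratio of bin frequencies is used as the weight.
    """
    lo = min(min(train_x), min(test_x))
    hi = max(max(train_x), max(test_x))
    width = (hi - lo) / n_bins or 1.0  # guard against a zero-width range

    def bin_of(x):
        # Clamp so that the maximum value falls in the last bin.
        return min(int((x - lo) / width), n_bins - 1)

    train_counts = Counter(bin_of(x) for x in train_x)
    test_counts = Counter(bin_of(x) for x in test_x)

    weights = []
    for x in train_x:
        b = bin_of(x)
        p_train = train_counts[b] / len(train_x)  # > 0: x itself is in bin b
        p_test = test_counts[b] / len(test_x)
        weights.append(p_test / p_train)
    return weights


# Synthetic covariate shift (an assumption for illustration only).
rng = random.Random(0)
train_x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
test_x = [rng.gauss(1.0, 1.0) for _ in range(5000)]

weights = importance_weights(train_x, test_x, n_bins=20)
plain_mean = sum(train_x) / len(train_x)
weighted_mean = sum(w * x for w, x in zip(weights, train_x)) / sum(weights)
print(f"plain mean: {plain_mean:.2f}  re-weighted mean: {weighted_mean:.2f}")
```

The re-weighted training mean moves toward the test mean, which is the effect the technique aims for. Weights like these are typically passed to a learner as per-sample weights during training. Histogram estimates become unreliable in many dimensions, where a classifier-based ratio estimate is a common alternative.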