Learning to Recognize Plankton

T. Luo, K. Kramer, D. Goldgof, L. Hall, S. Samson, A. Remsen, T. Hopkins

Journal of Machine Learning Research

JMLR 6, April 2005, pages 589-613

Abstract

This paper presents an active learning method that reduces domain experts' labeling effort when applying support vector machines to recognize underwater zooplankton in higher-resolution, new-generation SIPPER II images. Most previous work on active learning with support vector machines deals only with two-class problems. In this paper, we propose an active learning approach, "breaking ties", for multi-class support vector machines that use the one-vs-one approach with a probability approximation. Experimental results indicate that our approach often requires significantly fewer labeled images to reach a given accuracy than either the least certainty active learning method or random sampling. It can also run in batch mode with an accuracy comparable to labeling one image at a time and retraining.
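The difference between the two selection rules named in the abstract can be illustrated with a minimal sketch. It assumes the usual reading of the terms: "breaking ties" picks the unlabeled images whose two highest class probabilities are closest together, while "least certainty" picks those whose highest class probability is smallest. The function names and the probability values below are hypothetical and chosen only for illustration; in the paper the probabilities come from the one-vs-one support vector machine with a probability approximation, which is not reproduced here.

```python
from typing import List, Sequence


def breaking_ties_scores(probs: Sequence[Sequence[float]]) -> List[float]:
    """Gap between the two largest class probabilities of each example.

    A small gap means the classifier is nearly tied between two classes.
    """
    scores = []
    for p in probs:
        top = sorted(p, reverse=True)
        scores.append(top[0] - top[1])
    return scores


def least_certainty_scores(probs: Sequence[Sequence[float]]) -> List[float]:
    """Largest class probability of each example (small means uncertain)."""
    return [max(p) for p in probs]


def select_batch(scores: Sequence[float], batch_size: int) -> List[int]:
    """Indices of the batch_size smallest scores, most ambiguous first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return order[:batch_size]


# Hypothetical class probabilities for 4 unlabeled images over 3 classes.
probs = [
    [0.50, 0.45, 0.05],
    [0.40, 0.30, 0.30],
    [0.90, 0.05, 0.05],
    [0.34, 0.33, 0.33],
]

# Batch mode: ask the expert to label 2 images per round before retraining.
bt = select_batch(breaking_ties_scores(probs), 2)
lc = select_batch(least_certainty_scores(probs), 2)
print("breaking ties picks:", bt)
print("least certainty picks:", lc)
```

On this toy input the two rules agree on the most ambiguous image but differ on the second: breaking ties prefers the image that is nearly tied between two classes, whereas least certainty prefers the image whose probability is spread more evenly over all three.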

Data Sets

Two data sets are used in the paper: MasterTestImages and ValidationImages. Both are available in three formats: C4.5, ARFF, and Sparse. The ARFF version of each data set includes the image file name of the plankton for each example. The images from which the two data sets are derived are available in two separate zip files, one for test and the other for validation.

Feature Data Sets: C4.5, ARFF, Sparse
Image Files: Test Images, Validation Images
Feature Descriptions: Word, PS