Some Topics in Spam Filtering
D. Sculley, Tufts University

Abstract

This talk will examine several recent inquiries in machine-learning-based filtering of spam email. First, we will examine a long-standing debate in the spam filtering community, and show that online support vector machines (SVMs) do indeed give state-of-the-art performance on spam filtering tasks. Second, we show how to reduce the cost of online SVMs with several relaxations, which yield nearly equivalent results at greatly reduced computational cost. Third, we investigate the use of online active learning methods for spam filtering, which both reduce the number of labels needed for strong filtering performance and enable a variety of useful user-interface options. Finally, we investigate the problem of one-sided feedback, which arises when a potentially lazy user labels only messages that appear in the inbox and never gives feedback on messages that are predicted to be spam.

Bio

Venue, Date, and Time

Venue: NSH 1507

Date: Monday, September 24

Time: 12:00 noon

Slides