Cut Once
A Thunderbird extension for Recipient Prediction and Leak Detection
Cut Once is an extension to Mozilla Thunderbird - a popular open
source email client. Cut Once implements methods for Email Leak
Detection and Recipient Recommendation based on the
papers [SDM-2007][ECIR-2008]
by Vitor Carvalho and William Cohen from Carnegie Mellon
University. The extension is written entirely in JavaScript. Some
initial comments on Cut Once can be found here. Details on the associated user study can be found here.
Authors & Contact
Ramnath Balasubramanyan, Vitor R. Carvalho and William Cohen from Carnegie Mellon University. Please send all questions to email.research.cmu@gmail.com.
Usage
Cut Once can be downloaded from here: latest version (04/06/2008), cut_once-0.0.7-tb.xpi. Save it as a .xpi file, NOT as a .zip file.
It is compatible with Thunderbird versions 2.0.0.0 or later.
Thunderbird extensions are distributed as .xpi packages. To install Cut Once:
- Open Thunderbird
- From the top menu, select "Tools"
- In the "Tools" menu, select "Add-ons"
- In the "Add-ons" window, click on the "Install" button
- Select the .xpi file (CutOncev0.0.xx.xpi)
- The installation will take a few seconds. If successful, you'll be asked to restart Mozilla Thunderbird.
- After restart, the main window should show the "Train" and the "Send Feedback" buttons on the top right. It should look like this:
After installation, Cut Once must be trained before it can make recipient
predictions. To train it:
- With your mouse, select the directory that contains your sent messages (typically called "Sent Mail", "Outbox", or "sent")
- Note: The extension will NOT work properly if it is trained on a different directory.
- Note: The extension cannot be trained on more than one sent directory (for example, if you have multiple email accounts).
- Click on the "Train" button on the top right of the window.
- Click "Okay".
- The training window should look like the screenshot below.
The time taken for training depends on the number of
messages in the sent folder, the speed of the processor, etc. A rough
estimate is 150 messages per minute.
- Note: DO NOT READ OR WRITE EMAILS USING THUNDERBIRD DURING TRAINING.
- After the message "Trained Successfully" is displayed, click on the "Close" button.
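As a quick back-of-the-envelope aid, the rough rate of 150 messages per minute quoted above can be turned into a time estimate. This is only a sketch: the function name `estimateTrainingMinutes` is made up for illustration, and actual training speed depends on the machine.

```javascript
// Rough estimate only: 150 messages/minute is the approximate figure quoted above.
const MESSAGES_PER_MINUTE = 150;

// Hypothetical helper: approximate training time in whole minutes, rounded up.
function estimateTrainingMinutes(messageCount) {
  return Math.ceil(messageCount / MESSAGES_PER_MINUTE);
}

console.log(estimateTrainingMinutes(3000)); // 20
```

Under this estimate, a sent folder with 3000 messages would take about 20 minutes to train on.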
Once training is complete, a model file (called thunderbird_infoleak_model.dat) is
created in the user’s home directory. Cut Once reads the model file
every time Thunderbird starts up. A weekly reminder encourages
users to retrain on a regular basis.
Cut Once recipient predictions can be seen in two different ways. In
the first method, the user can explicitly seek recommendations by
hitting the "Recommend Recipient" button on the toolbar in the Compose window.
Clicking a recommended email address adds the address to the recipient
list in the Compose window.
In the second method, a dialog box pops up when the user hits
the Send button. This dialog box highlights possible leaks and also lists other
recommended recipients. A leak is an email address that the user has chosen as a
recipient but that is unlikely to be a valid recipient for the composed message,
based on the history of past communication with that address. A
countdown timer ensures that the message is sent automatically after 10 seconds if the user chooses not to use the dialog. The "Pause" button freezes the 10-second counter. The "Cancel" button closes the dialog and returns to the message under composition.
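The two lists in this dialog can be pictured with a small sketch. It is not the extension's actual code: it assumes that each address seen during training already has a relevance score between 0 and 1 for the message being composed, and the function name `reviewRecipients`, the threshold value, and the example addresses are all hypothetical.

```javascript
// Hypothetical helper: split addresses into possible leaks and recommendations.
// `scores` is a Map from known address to an assumed relevance score in [0, 1].
function reviewRecipients(chosen, scores, leakThreshold, maxSuggestions) {
  const chosenSet = new Set(chosen);

  // A chosen recipient is flagged when its score is below the threshold.
  // Addresses with no score (never seen in training) are treated as 0 here.
  const leaks = chosen
    .map(addr => ({ address: addr, score: scores.has(addr) ? scores.get(addr) : 0 }))
    .filter(r => r.score < leakThreshold)
    .sort((a, b) => a.score - b.score);

  // Recommendations are the best-scoring addresses the user has not chosen yet.
  const recommendations = Array.from(scores.keys())
    .filter(addr => !chosenSet.has(addr))
    .map(addr => ({ address: addr, score: scores.get(addr) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, maxSuggestions);

  return { leaks, recommendations };
}

const exampleScores = new Map([
  ["alice@example.com", 0.82],
  ["bob@example.com", 0.05],
  ["carol@example.com", 0.64],
]);
const review = reviewRecipients(
  ["alice@example.com", "bob@example.com"], exampleScores, 0.1, 3);
console.log(review.leaks.map(r => r.address));           // [ 'bob@example.com' ]
console.log(review.recommendations.map(r => r.address)); // [ 'carol@example.com' ]
```

A fixed threshold is the simplest possible rule; the papers cited at the top of this page describe the methods that Cut Once actually implements.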
An example is illustrated below: (1) the Compose window. (2) the Predictions by Cut Once on this message.
Another example is shown in the picture below:
The model file (thunderbird_infoleak_model.dat) created during the training process stores the following
pieces of information about the user’s Sent folder.
• Centroids: For each email address to which a message was sent, a centroid
is computed as the mean vector over all the messages
addressed to that address. Each email is represented by a TFIDF
vector over the words in its subject and body.
• Document frequencies: A table of words and their
corresponding document frequencies, where a word's document frequency is the number of messages in
which the word occurred. This is needed to compute TFIDF vectors for
messages at runtime.
• Recency and Frequency Ranks: Candidate email addresses in the
Sent folder are ranked by recency and frequency to establish a baseline
ranking. The ranks assigned to each email address are saved in the
model file so that Cut Once can display a baseline ranking at
runtime.
The training procedure trims the size of the model by
discarding words whose document frequency is below a threshold and by
discarding email addresses which have very few messages addressed to
them.
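The document frequency table and the per-address centroids described above can be sketched in JavaScript. This is a minimal illustration under stated assumptions, not the extension's actual code: the function names, the simple tokenizer, the `tf * log(N / df)` weighting, and the message shape `{ recipients, text }` (with `text` standing for subject plus body) are all chosen for the example.

```javascript
// Simplistic tokenizer (an assumption): lowercase alphanumeric words.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// For each word, count the number of messages that contain it.
function documentFrequencies(messages) {
  const df = new Map();
  for (const msg of messages) {
    for (const word of new Set(tokenize(msg.text))) {
      df.set(word, (df.get(word) || 0) + 1);
    }
  }
  return df;
}

// Sparse TFIDF vector (word -> weight), using tf * log(N / df) as an assumed weighting.
// Words missing from the table (for example, trimmed ones) are skipped.
function tfidfVector(text, df, numMessages) {
  const counts = new Map();
  for (const word of tokenize(text)) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  const vec = new Map();
  for (const [word, tf] of counts) {
    if (df.has(word)) vec.set(word, tf * Math.log(numMessages / df.get(word)));
  }
  return vec;
}

// Centroid per address: the mean TFIDF vector over all messages sent to it.
function centroids(messages, df) {
  const sums = new Map(); // address -> { vec, count }
  for (const msg of messages) {
    const vec = tfidfVector(msg.text, df, messages.length);
    for (const addr of msg.recipients) {
      if (!sums.has(addr)) sums.set(addr, { vec: new Map(), count: 0 });
      const entry = sums.get(addr);
      entry.count += 1;
      for (const [word, w] of vec) {
        entry.vec.set(word, (entry.vec.get(word) || 0) + w);
      }
    }
  }
  const result = new Map();
  for (const [addr, { vec, count }] of sums) {
    const mean = new Map();
    for (const [word, w] of vec) mean.set(word, w / count);
    result.set(addr, mean);
  }
  return result;
}

// Made-up sent messages for illustration.
const sent = [
  { recipients: ["alice@example.com"], text: "budget review meeting" },
  { recipients: ["alice@example.com", "bob@example.com"], text: "budget draft" },
  { recipients: ["bob@example.com"], text: "lunch on friday" },
];
const df = documentFrequencies(sent);
const model = centroids(sent, df);
console.log(df.get("budget"));  // 2
console.log(model.size);        // 2
```

In this toy data, "budget" appears in two of the three messages, so it receives a lower weight than words that appear only once, and each address ends up with one averaged vector.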
The "Einstein" button: Helping researchers at Carnegie Mellon
User actions within the recipient recommendations dialog and the dialog
box opened after the Send button is hit are logged. This includes
information such as the rank of a recommendation that the user clicks on, the time taken
by the user to accept a recommendation, and the position in the list of a
leak that is removed by the user. No personal or private information (such as email
content or recipients) from you or from any of your contacts is logged.
Users are asked every week if they would like to
send this log file to the researchers who developed the extension. Log files can also be
sent explicitly by hitting the "Mail statistics" button (the Einstein button) on
the main window.
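To make the scope of the logging concrete, here is a hypothetical sketch of what one logged event might contain. The actual log format is not documented on this page, so the function name `makeLogEvent`, the field names, and the event kinds are assumptions; the point is only that an event carries a position and a duration, never an address or message text.

```javascript
// Hypothetical event builder: keeps only positional and timing data.
// No email addresses or message content are passed in or stored.
function makeLogEvent(kind, position, shownAtMs, actedAtMs) {
  const allowedKinds = ["recommendation_accepted", "leak_removed"];
  if (!allowedKinds.includes(kind)) {
    throw new Error("unknown event kind: " + kind);
  }
  return { kind: kind, position: position, elapsedMs: actedAtMs - shownAtMs };
}

// Example: the user accepted the recommendation ranked 2nd after 3.5 seconds.
const logEvent = makeLogEvent("recommendation_accepted", 2, 1000, 4500);
console.log(JSON.stringify(logEvent));
// {"kind":"recommendation_accepted","position":2,"elapsedMs":3500}
```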
Under the protocol number HS08-026, this research was approved by the IRB
(Institutional Review Board), a group formally
designated by the United States government to approve, monitor and
review research studies with the stated aim of protecting the rights of
the research subjects. Please contact email.research.cmu@gmail.com with
any questions or concerns.
Comments, Reviews and Related Links: