Kim et al., COLING 2004
From ScribbleWiki: Analysis of Social Media
Determining the Sentiment of Opinions
This paper describes a system that identifies opinions and their orientations for a given topic. The authors define an opinion as the quadruple (Topic, Holder, Claim, Sentiment). Sentiment can be positive, negative, or neutral. Their system architecture is shown at the bottom of this page. Their process is as follows:
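The quadruple can be pictured as a small record type. The following is a minimal Python sketch; the class names, field names, and the example opinion are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Sentiment(Enum):
    """The three sentiment values named in the definition."""
    POSITIVE = "+"
    NEGATIVE = "-"
    NEUTRAL = "0"


@dataclass(frozen=True)
class Opinion:
    """An opinion as the quadruple (Topic, Holder, Claim, Sentiment)."""
    topic: str
    holder: str
    claim: str
    sentiment: Sentiment


# Hypothetical opinion, constructed only for illustration.
op = Opinion(
    topic="the new tax law",
    holder="the senator",
    claim="the new tax law will hurt small businesses",
    sentiment=Sentiment.NEGATIVE,
)
print(op.sentiment.value)  # -
```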
1. Find the sentences containing both the topic and a holder
2. Delimit the region for the holder
Holders are identified either manually or by named entity extraction for PERSON and ORGANIZATION. If multiple holders are found, the one closest to the topic is chosen. The authors experimented with various region sizes and found that the region spanning the topic and the holder and extending to the end of the sentence gave the best performance.
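Holder selection and region delimiting can be sketched over a tokenized sentence, assuming the holder candidates and the topic have already been located as token indices (for example by a named entity tagger). The function names, the window labels, and the example sentence are hypothetical; `between_to_end` is this sketch's reading of the best-performing region, starting at whichever of holder or topic comes first and running to the end of the sentence.

```python
def nearest_holder(holder_positions, topic_position):
    """Return the candidate holder index closest to the topic, or None if there is none."""
    if not holder_positions:
        return None
    return min(holder_positions, key=lambda h: abs(h - topic_position))


def candidate_regions(tokens, holder_pos, topic_pos):
    """Build a few opinion-region variants around a holder and a topic (labels are hypothetical)."""
    lo, hi = sorted((holder_pos, topic_pos))
    return {
        "full_sentence": tokens,
        "between": tokens[lo:hi + 1],    # only the span from holder to topic
        "between_to_end": tokens[lo:],   # that span extended to the end of the sentence
    }


# Hypothetical example: holder "Smith" at index 2, topic "tax" at index 6.
tokens = ["Yesterday", ",", "Smith", "said", "that", "the", "tax", "plan", "is", "bad", "."]
holder = nearest_holder([2], topic_position=6)
regions = candidate_regions(tokens, holder, topic_pos=6)
print(regions["between"])         # ['Smith', 'said', 'that', 'the', 'tax']
print(regions["between_to_end"])  # ['Smith', 'said', 'that', 'the', 'tax', 'plan', 'is', 'bad', '.']
```

In this toy sentence the sentiment word "bad" falls outside the holder-to-topic span, so only the region extended to the end of the sentence can see it.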
3. Classify the sentiment of individual words (separately for each part of speech)
They start with a seed list of positive and negative words and grow it using WordNet's synonym and antonym relations. Because some words appear in both positive and negative phrases, and some are neutral, they quantify the sentiment as P(sentiment | new word) using two models:
a. the product of the probabilities of the seed words that appear in the new word's synset
b. the fraction of the new word's synset members that appear in each sentiment's seed list
The probability is computed for both the positive (+) and negative (-) classes, and the maximum-likelihood sentiment is chosen.
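The two word-level models can be sketched as below. This is a simplified interpretation of the description above, not the paper's exact formulation: the seed lists and the synset are toy hypothetical data, model (a) is written as a naive-Bayes-style product with add-one smoothing (the smoothing is an assumption), and returning neutral on a tie is also an assumption of this sketch. A real system would take synsets from WordNet and keep separate seed lists per part of speech.

```python
import math

# Hypothetical toy seed lists (the real system used larger, per-part-of-speech lists).
SEEDS = {
    "+": {"good", "excellent", "great", "beneficial"},
    "-": {"bad", "terrible", "awful", "harmful"},
}


def model_a(synset, seeds=SEEDS, alpha=1.0):
    """Model (a) sketch: P(c) times the product of P(s | c) over the synset words s.

    P(s | c) uses add-alpha smoothing over a vocabulary of all seed words plus
    the synset; this smoothing choice is an assumption of the sketch.
    """
    vocab = set().union(*seeds.values()) | set(synset)
    total = sum(len(words) for words in seeds.values())
    scores = {}
    for c, words in seeds.items():
        prior = len(words) / total
        denom = len(words) + alpha * len(vocab)
        likelihood = math.prod(((1 if s in words else 0) + alpha) / denom for s in synset)
        scores[c] = prior * likelihood
    return scores


def model_b(synset, seeds=SEEDS):
    """Model (b) sketch: fraction of the synset's words found in each seed list."""
    if not synset:
        return {c: 0.0 for c in seeds}
    return {c: sum(s in words for s in synset) / len(synset) for c, words in seeds.items()}


def classify(scores):
    """Return the maximum-likelihood class, or "0" (neutral) when all classes tie."""
    best = max(scores, key=scores.get)
    if all(math.isclose(v, scores[best]) for v in scores.values()):
        return "0"  # no evidence either way; treating this as neutral is an assumption
    return best


# Hypothetical synset for an unseen word such as "superb".
synset = ["superb", "excellent", "great", "splendid"]
print(classify(model_a(synset)))  # +
print(classify(model_b(synset)))  # +
```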
4. Combine the word sentiments to get the sentiment of the sentence
Word sentiments are combined according to their proximity to the holder; if no holder is found, the word sentiments are ignored. To aggregate the word sentiments within a region, the authors experimented with the product of signs, an average, and a geometric mean, and found that the product of signs gave the best results.
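The aggregation choices can be compared on a toy region. In this sketch the word-level output of step 3 is a hypothetical lexicon mapping each word to a polarity (+1 or -1) and a strength in (0, 1]. The product-of-signs rule follows the description above; how the two averaging variants turn strengths into a polarity (comparing the mean strength of positive words with that of negative words) is an assumption, as is returning 0 for a region with no sentiment words.

```python
import math
from statistics import fmean, geometric_mean

# Hypothetical word-level output of step 3: word -> (polarity, strength).
LEXICON = {
    "bad": (-1, 0.9),
    "helpful": (+1, 0.6),
    "problem": (-1, 0.5),
}


def product_of_signs(scored):
    """+1 or -1: the product of the polarities of all sentiment words in the region."""
    return math.prod(polarity for polarity, _ in scored)


def mean_vote(scored, mean):
    """Assumed variant: pick the polarity whose words have the larger mean strength."""
    by_class = {+1: [], -1: []}
    for polarity, strength in scored:
        by_class[polarity].append(strength)
    means = {p: (mean(s) if s else 0.0) for p, s in by_class.items()}
    return +1 if means[+1] >= means[-1] else -1


def sentence_sentiment(region, holder, method="signs", lexicon=LEXICON):
    if holder is None:
        return None  # no holder found: the word sentiments are ignored
    scored = [lexicon[w] for w in region if w in lexicon]
    if not scored:
        return 0  # no sentiment-bearing words: neutral (an assumption)
    if method == "signs":
        return product_of_signs(scored)
    if method == "average":
        return mean_vote(scored, fmean)
    if method == "geometric":
        return mean_vote(scored, geometric_mean)
    raise ValueError(f"unknown method: {method}")


region1 = ["the", "tax", "plan", "is", "bad"]
region2 = ["the", "problem", "looks", "bad"]
print(sentence_sentiment(region1, holder="Smith"))                    # -1
print(sentence_sentiment(region2, holder="Smith"))                    # 1
print(sentence_sentiment(region2, holder="Smith", method="average"))  # -1
```

The second region shows a property of the product-of-signs rule: an even number of negative words yields a positive result, while both averaging variants still report negative.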
Word-level and sentence-level sentiment classification are evaluated separately, both on human-annotated data. In all cases, human-machine agreement is very close to human-human agreement, and recall is high as well. The best system achieved 81% with manually provided holders and 67% with automatically identified holders.
They also discuss the system's errors, which mostly stem from limited context and the resulting incorrect assignments in different parts of the system.
- BibTeX
@inproceedings{1220555,
  author = {Soo-Min Kim and Eduard Hovy},
  title = {Determining the sentiment of opinions},
  booktitle = {COLING '04: Proceedings of the 20th international conference on Computational Linguistics},
  year = {2004},
  pages = {1367},
  location = {Geneva, Switzerland},
  doi = {http://dx.doi.org/10.3115/1220355.1220555},
  publisher = {Association for Computational Linguistics},
  address = {Morristown, NJ, USA},
}
Annotated by Mehrbod