Popescu and Etzioni, HLT-EMNLP 2005
Extracting Product Features and Opinions from Reviews
This paper describes OPINE, a system built on top of KnowItAll, a web-based information extraction system. OPINE relies on feature assessment and relaxation labeling. Compared with Minqing Hu and Bing Liu (KDD 2004), the authors report a 22% improvement in precision on feature extraction, with a 3% loss in recall.
Their process is as follows:
1. Parse the reviews using the MINIPAR dependency parser.
2. Find explicit features.
2a. Extract candidate features with KnowItAll, which uses automatically generated extraction rules. This step automatically finds the product's parts and properties using WordNet's meronymy (part/whole) and IS-A relationships.
2b. Feature assessor: computes the pointwise mutual information (PMI) between each extracted noun phrase and discriminator phrases (e.g., "of scanner"). PMI is computed both within the reviews and on the Web, and the two variants are evaluated separately.
3. Extract opinion phrases and determine their polarity using relaxation labeling; performance is reported for each task.
[4. They omit the descriptions of opinion clustering, implicit feature finding, and result ranking.]
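The feature assessor in step 2b can be sketched as below. This is a minimal sketch assuming the standard PMI definition, log2(P(x, y) / (P(x) P(y))), estimated from hit counts; the `assess` helper, the made-up counts, and the rule "keep a candidate if its best PMI with any discriminator is above a threshold" are hypothetical simplifications, and the paper's exact scoring may differ. The same function applies whether the counts come from the review corpus or from Web hits.

```python
import math
from typing import Dict, List, Tuple


def pmi(joint: int, count_x: int, count_y: int, total: int) -> float:
    """Standard PMI, log2(P(x, y) / (P(x) * P(y))), estimated from raw hit counts."""
    if joint == 0 or count_x == 0 or count_y == 0:
        return float("-inf")  # never co-occur: treat as no association
    return math.log2(joint * total / (count_x * count_y))


def assess(candidates: List[str],
           discriminators: List[str],
           joint_hits: Dict[Tuple[str, str], int],
           hits: Dict[str, int],
           total: int,
           threshold: float = 0.0) -> List[str]:
    """Keep candidates whose best PMI with any discriminator exceeds the threshold.

    Thresholding the maximum score is a hypothetical simplification.
    """
    kept = []
    for phrase in candidates:
        best = max(
            pmi(joint_hits.get((phrase, d), 0), hits[phrase], hits[d], total)
            for d in discriminators
        )
        if best > threshold:
            kept.append(phrase)
    return kept


# Made-up counts standing in for review-corpus or Web hit counts.
TOTAL = 1_000_000
HITS = {"lid": 2_000, "weekend": 50_000, "of scanner": 1_000, "scanner has": 800}
JOINT_HITS = {
    ("lid", "of scanner"): 40, ("lid", "scanner has"): 20,
    ("weekend", "of scanner"): 5, ("weekend", "scanner has"): 2,
}

if __name__ == "__main__":
    print(assess(["lid", "weekend"], ["of scanner", "scanner has"], JOINT_HITS, HITS, TOTAL))
```

With these toy counts, "lid" co-occurs with the scanner discriminators far more often than chance predicts (positive PMI) and is kept, while "weekend" scores below zero and is rejected.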
The data set is the same as in Minqing Hu and Bing Liu, KDD 2004 (1,621 reviews of 7 products in 5 product classes). For feature extraction (step 2), precision was 22% higher; of this gain, 6% came from PMI assessment within the reviews and 14.5% from Web PMI.
They also ran the extraction on hotel review data, with similarly good performance, to show that the method is domain-independent.
For finding opinion phrases and their polarity (step 3), they extract opinion phrases using extraction rules and then apply relaxation labeling, a technique that finds an optimal (i.e., maximum-likelihood) assignment of a set of labels (positive, negative, neutral) to a set of objects, given a set of neighborhood constraints. This is done in three steps:
1. Assign labels to words.
2. Given the word labels, assign labels to (word, feature) tuples.
3. Given the tuple labels, assign labels to (sentence, word, feature) tuples.
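The word-level stage (step 1) can be sketched with the classic iterative relaxation labeling update, p_i(a) ← p_i(a)(1 + q_i(a)) / Z, where the support q_i(a) sums each neighbor's label probabilities weighted by a compatibility r(a, b). This is a minimal sketch under stated assumptions: the two relation types, the compatibility values, the uniform neighbor weights, the priors, and the `relax` helper are all hypothetical, not OPINE's actual parameters. This update converges to a locally consistent labeling and is not guaranteed to reach a global optimum. Later stages could, in principle, reuse the same routine with tuples as the objects and the previous stage's labels as priors.

```python
from typing import Dict, List, Tuple

LABELS = ("positive", "negative", "neutral")
Dist = Dict[str, float]

# Hypothetical compatibilities r(a, b) in [-1, 1] for two relation types:
# "same" favours equal labels; "opposite" favours positive/negative swaps
# (and neutral/neutral).
COMPAT = {
    "same": {(a, b): 1.0 if a == b else -1.0 for a in LABELS for b in LABELS},
    "opposite": {
        (a, b): 1.0 if {a, b} == {"positive", "negative"} or a == b == "neutral" else -1.0
        for a in LABELS for b in LABELS
    },
}


def relax(priors: Dict[str, Dist],
          edges: List[Tuple[str, str, str]],
          iterations: int = 20) -> Dict[str, Dist]:
    """Synchronous relaxation labeling:
    p_i(a) <- p_i(a) * (1 + q_i(a)) / Z,
    q_i(a) = sum_j w_ij * sum_b r_ij(a, b) * p_j(b), with w_ij = 1 / degree(i).
    """
    neighbours: Dict[str, List[Tuple[str, str]]] = {w: [] for w in priors}
    for a, b, relation in edges:
        neighbours[a].append((b, relation))
        neighbours[b].append((a, relation))

    p = {w: dict(dist) for w, dist in priors.items()}
    for _ in range(iterations):
        updated: Dict[str, Dist] = {}
        for word, dist in p.items():
            nbrs = neighbours[word]
            if not nbrs:
                updated[word] = dict(dist)  # no constraints: keep the prior
                continue
            weight = 1.0 / len(nbrs)
            support = {}
            for label in LABELS:
                q = sum(
                    weight * sum(COMPAT[rel][(label, m)] * p[other][m] for m in LABELS)
                    for other, rel in nbrs
                )
                support[label] = dist[label] * (1.0 + q)  # q in [-1, 1], so support >= 0
            z = sum(support.values())
            updated[word] = {k: v / z for k, v in support.items()} if z > 0 else dict(dist)
        p = updated
    return p


def best_label(dist: Dist) -> str:
    return max(dist, key=dist.get)


# Made-up priors: "great" starts mostly positive, the other words are uninformative.
uniform = {label: 1 / 3 for label in LABELS}
priors = {
    "great": {"positive": 0.8, "negative": 0.1, "neutral": 0.1},
    "sturdy": dict(uniform),
    "flimsy": dict(uniform),
}
edges = [("great", "sturdy", "same"), ("great", "flimsy", "opposite")]

if __name__ == "__main__":
    result = relax(priors, edges)
    print({w: best_label(d) for w, d in result.items()})
```

Here "sturdy" inherits a positive label through its "same" link to "great", while "flimsy" is pushed toward negative through the "opposite" link.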
The neighborhood features used as constraints in relaxation labeling are based on the following relationships between a given word and its neighbors:
1. Conjunction and disjunction
2. Dependency parsing rule templates
3. Morphological relationships between words
4. WordNet synonymy, antonymy, IS-A and morphological information
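Two of these relationship types can be turned into labeled neighbor links as sketched below. This is a minimal sketch assuming a common heuristic ("and"/"or" link words of the same orientation, "but" links words of opposite orientation) and a tiny hand-written lexicon standing in for WordNet; `conjunction_edges`, `lexicon_edges`, and the toy data are hypothetical, and the dependency-rule and morphological relationships are not covered.

```python
import re
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (word, neighbour, "same" | "opposite")


def conjunction_edges(sentence: str, opinion_words: Set[str]) -> List[Edge]:
    """Heuristic: 'X and Y' / 'X or Y' suggest the same orientation,
    'X but Y' suggests the opposite orientation."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    edges: List[Edge] = []
    for left, connective, right in zip(tokens, tokens[1:], tokens[2:]):
        if left in opinion_words and right in opinion_words:
            if connective in ("and", "or"):
                edges.append((left, right, "same"))
            elif connective == "but":
                edges.append((left, right, "opposite"))
    return edges


def lexicon_edges(words: Set[str],
                  synonyms: Dict[str, Set[str]],
                  antonyms: Dict[str, Set[str]]) -> List[Edge]:
    """Synonyms -> same orientation, antonyms -> opposite (a stand-in for WordNet)."""
    edges: List[Edge] = []
    for word in sorted(words):
        for other in sorted(synonyms.get(word, set()) & words):
            edges.append((word, other, "same"))
        for other in sorted(antonyms.get(word, set()) & words):
            edges.append((word, other, "opposite"))
    return edges


# Toy lexicon standing in for WordNet lookups (hypothetical data).
SYNONYMS = {"sturdy": {"solid"}}
ANTONYMS = {"sturdy": {"flimsy"}}

if __name__ == "__main__":
    words = {"sturdy", "light", "noisy", "solid", "flimsy"}
    print(conjunction_edges("The lid is sturdy and light, but noisy.", words))
    print(lexicon_edges(words, SYNONYMS, ANTONYMS))
```

Only adjacent "X connective Y" patterns are matched; a real system would read such relations off the dependency parse instead of raw token order.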
OPINE is compared with PMI++ (Turney's PMI method, extended to take context into account) and Hu++ (Hu's method, which originally considered only adjectives, extended to other parts of speech). Precision/recall is 79%/76% for opinion phrase extraction and 86%/89% for polarity detection. A breakdown by part of speech is also reported.
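For a single-number summary, the reported precision/recall pairs can be combined into F1, the harmonic mean of precision and recall. F1 is not reported in this summary; it is derived here only as an illustration from the figures above.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs as reported in the summary.
REPORTED = {
    "opinion phrase extraction": (0.79, 0.76),
    "polarity detection": (0.86, 0.89),
}

if __name__ == "__main__":
    for task, (p, r) in REPORTED.items():
        print(f"{task}: F1 = {f1(p, r):.3f}")
```

This gives an F1 of roughly 0.77 for opinion phrase extraction and roughly 0.87 for polarity detection.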
- Bibtex
@inproceedings{1220618,
  author = {Ana-Maria Popescu and Oren Etzioni},
  title = {Extracting product features and opinions from reviews},
  booktitle = {HLT '05: Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing},
  year = {2005},
  pages = {339--346},
  location = {Vancouver, British Columbia, Canada},
  doi = {http://dx.doi.org/10.3115/1220575.1220618},
  publisher = {Association for Computational Linguistics},
  address = {Morristown, NJ, USA},
}
Annotated by Mehrbod