Sentiment analysis

From Brede Wiki
Sentiment analysis

Text sentiment analysis
Opinion mining

Category: Sentiment analysis

Text mining
Affective computing


Twitter sentiment analysis
Wikipedia sentiment analysis



Text sentiment analysis (or usually just sentiment analysis) is a text mining technique for analyzing the sentiment of the writer or the sentiment expressed toward the topic written about.

Bo Pang and Lillian Lee have written a lengthy introduction to sentiment analysis: Opinion mining and sentiment analysis.[1]

Sentiment analysis may be combined with another text-mining technique, topic mining, in what is called topic-sentiment analysis.[2]


Methods

Sentiment analysis may employ machine learning techniques. One often applied method is the naïve Bayes classifier, where the algorithm is trained on a labeled data set. The Python package NLTK includes a classic sentiment analysis data set (movie reviews) as well as general machine learning methods for sentiment classification. Some of the earliest papers on this approach are probably
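As an illustration of the naïve Bayes approach, the following is a minimal sketch written from scratch with the Python standard library. It is not NLTK's implementation; the function names, the whitespace tokenizer, the add-one smoothing and the four toy training sentences are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train_naive_bayes(labeled_texts):
    """Collect document counts per label and word counts per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_texts:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            smoothed = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data, not drawn from any real corpus.
TRAINING_DATA = [
    ("a wonderful and moving film", "pos"),
    ("great acting and a wonderful story", "pos"),
    ("a dull and boring film", "neg"),
    ("terrible acting and a boring story", "neg"),
]

model = train_naive_bayes(TRAINING_DATA)
print(classify("a wonderful story", model))    # pos
print(classify("boring and terrible", model))  # neg
```

With a real corpus such as the movie reviews data set, the same steps apply, but the tokenizer and feature selection would normally need more care.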

Another approach is to use a word list where each word has been scored for positivity/negativity or sentiment strength. There exist several word lists: ANEW is the oldest and has around 1,000 words, AFINN is newer and has around 2,500, while labMT has over 10,000 words scored.
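A minimal sketch of word-list scoring follows. The six-entry lexicon is hypothetical and only imitates the style of a valence-scored list with integer scores from -5 to +5; the function name and the regular-expression tokenizer are likewise assumptions for the example.

```python
import re

# Hypothetical miniature lexicon in the style of a valence-scored word list
# (integer scores from -5 to +5); these entries are illustrative only.
TOY_LEXICON = {
    "good": 3,
    "great": 3,
    "love": 3,
    "bad": -3,
    "terrible": -3,
    "hate": -3,
}


def sentiment_score(text, lexicon=TOY_LEXICON):
    """Sum the valence of every token that appears in the lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)


print(sentiment_score("I love this great phone"))       # 6
print(sentiment_score("Terrible battery, bad screen"))  # -6
print(sentiment_score("The box arrived on Tuesday"))    # 0
```

Summing favors long texts; dividing by the number of tokens or of matched words is a common alternative when texts differ much in length.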

One way to extend word lists is to use word co-occurrence or a word ontology such as WordNet.[3] The method may go back to 1957.[4]
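The co-occurrence idea can be sketched crudely as follows. This is not the method of the cited papers: it simply gives each unscored word the mean valence of the seed words found in the same documents. The seed lexicon, the documents, the function name and the min_evidence threshold are all hypothetical.

```python
from collections import defaultdict


def extend_lexicon(seed_lexicon, documents, min_evidence=2):
    """Score unseen words by the mean valence of seed words they co-occur with."""
    evidence = defaultdict(list)
    for document in documents:
        tokens = set(document.lower().split())
        seed_scores = [seed_lexicon[t] for t in tokens if t in seed_lexicon]
        if not seed_scores:
            continue  # no scored word in this document, so no evidence
        doc_mean = sum(seed_scores) / len(seed_scores)
        for token in tokens:
            if token not in seed_lexicon:
                evidence[token].append(doc_mean)
    extended = dict(seed_lexicon)
    for token, scores in evidence.items():
        # Require several documents before trusting a new score.
        if len(scores) >= min_evidence:
            extended[token] = sum(scores) / len(scores)
    return extended


# Hypothetical seed list and documents.
SEED = {"good": 3, "bad": -3}
DOCS = [
    "good superb service",
    "superb and good food",
    "bad dreadful service",
    "dreadful and bad food",
]

extended = extend_lexicon(SEED, DOCS)
print(extended["superb"])    # 3.0
print(extended["dreadful"])  # -3.0
print(extended["service"])   # 0.0
```

Words that appear equally often in positive and negative contexts, such as "service" here, end up near zero, which is the intended behavior for neutral words.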

Corpora

Affective Text 
"Affective Text: Data Annotated for Emotions and Polarity" Rada Mihalcea [3]
Darmstadt Service Review Corpus 
"consumer reviews annotated with opinion related information at the sentence and expression levels."[6]
Movie reviews 
A classic data set in sentiment analysis by Bo Pang and Lillian Lee. It is included in the NLTK Python package as nltk.corpus.movie_reviews.
Multi-Domain Sentiment Dataset
Manually-labeled Twitter posts. The data is described here, but it is unclear if the data is publicly available.
[5] In 2013 there was a Twitter sentiment analysis task with several thousand labeled postings and SMS text messages.
Sentiment140 corpora 
Two data sets from Twitter, one with 498 labeled tweets [6]. See also the description: [7].
Stanford Sentiment Treebank[7]
Twitter sentiment analysis. Self-driving cars 
[8] CrowdFlower's dataset with 7,015 tweets.
Twitter Sentiment Corpus 
[9] by Niek Sanders "consists of 5513 hand-classified tweets".
TASS Corpus 
[10] consists of 70000 tweets in Spanish, annotated with global polarity.
UMICH SI650 - Sentiment Classification 
[11] Twitter corpus described as "the training data contains 7086 sentences, already labeled with 1 (positive sentiment) or 0 (negative sentiment). The test data contains 33052 sentences that are unlabeled."

Several researchers have crawled IMDb and downloaded movie review texts and star ratings.[8]

Affective word lists

Sentiment analysis may use word lists annotated for their arousal and their valence, i.e., whether they are positive or negative. Some word lists are listed and commented on in section 7.3 of the Pang/Lee monograph. Some of the word lists are:

Affective Norms for English Words (ANEW)
An English word list constructed by Bradley and Lang[9] and available from University of Florida [12]. There are 1034 words rated for valence, arousal and dominance. It is "solely for use in academic, not-for-profit research at recognized educational institutions". (It is associated with a program by Greg Siegle.) SPANEW, Spanish ANEW.[10] DANEW, Dutch ANEW.[11]
AFINN
An English word list with 2477 words (previously 1468 words) constructed by Finn Årup Nielsen for sentiment analysis of Twitter messages (though also used for other texts) and available with a share-alike license: [13]. Each word is rated with a valence value from -5 to +5. An evaluation of the word list was described in A new ANEW: evaluation of a word list for sentiment analysis in microblogs, and the word list was used in Good friends, bad news - affect and virality in Twitter. For a simple example of using the list with Python see [14].
Balanced Affective Word List ("original")
An older version of the Balanced Affective Word List with 277 English words, associated with the program of Greg Siegle (the original URL has gone; see the Internet Archive version). The valence is coded as 1=positive, 2=negative, 3=anxious, 4=neutral. The words were aggregated from two lists: one list collected by Greg Siegle and Mark Shibley and another list of 240 words by Carolyn H. John from the publication Emotionality ratings and free-association norms of 240 emotional and non-emotional words.[12]
Berlin Affective Word List (BAWL) 
A word list of 2,200 German words with emotional valence and imageability.[13] A research project took some of these words as part of the basis for an annotated word list of 300 English words.[14]
Berlin Affective Word List Reloaded (BAWL-R) 
A newer version of BAWL with the addition of arousal ratings for the words.[15]
Bilingual Finnish Affective Norms 
210 British English and Finnish nouns, including taboo words.[15] [16]
Compass DeRose Guide to Emotion Words 
English emotional words collected by Steven J. DeRose and categorized but without valence or arousal.
Dictionary of Affect in Language (DAL)
Constructed by Cynthia M. Whissell. A description of it seems to be available as a chapter in the book Emotion: theory, research, and experience (pp. 113-131), edited by Robert Plutchik and Henry Kellerman and published by Academic Press. One Web service uses DAL: [16] The list has also been called "Whissell's Dictionary of Affect in Language" (WDAL).[17]
General Inquirer 
Has several dictionaries, e.g., a "positive" list with 1,915 words and a "negative" list with 2,291 words.
Hu-Liu opinion lexicon (HL)
Around 6800 words in a negative and a positive list [17]. Collected over the years, starting with the paper Mining and summarizing customer reviews.
A large word list
Leipzig Affective Norms for German (LANG) 
"A list of 1,000 German nouns that have been rated for emotional valence, arousal, and concreteness".[18]
Linguistic Inquiry and Word Count 
[18] Commercial ($90) word lists with a computer program to extract basic counts and ratios. Contains dictionaries for English, German, Spanish, Dutch, and Italian. Extracts around 60 different word categories, including "positive emotions" and "negative emotions". The program can be purchased; their site also allows you to analyze texts one by one.
Loughran and McDonald Financial Sentiment Dictionaries 
[19] Dictionaries with negative, positive, uncertainty, litigious and modal words, especially for financial texts, by Tim Loughran and Bill McDonald. The lists are "Not for commercial use without authorization". Described in When is a liability not a liability? textual analysis, dictionaries, and 10-Ks.
Dutch words with valence, dominance and arousal.[19]
NRC Emotion Lexicon 
(EmoLex) A large word list constructed by Saif M. Mohammad through Amazon Mechanical Turk.
NRC Hashtag Sentiment Lexicon 
[20] Large list of words created from 775,310 tweets with a positive or negative hashtag.[20]
NTU Sentiment Dictionary 
(Listed by Pang and Lee)
Luis von Ahn's Offensive/Profane Word List 
[21]. "1,300+ English terms that could be found offensive."
OpinionFinder's Subjectivity Lexicon 
8221 words scored for polarity (positive or negative) and subjectivity. Entries are distinguished by part-of-speech tag.[21] It is sometimes referred to as MPQA.
The Pattern Python package includes sentiment.xml, which has 2888 words scored for polarity, subjectivity, intensity and reliability. The words are mostly adjectives. There are no nouns.
Sentiment140 Lexicon 
[22] Large list built from tweets.[22]
Subjectivity and Sentiment Analysis of Social Media Arabic by Muhammad Abdul-Mageed and Mona T. Diab. Not clear whether it is available. See also Toward building a large-scale Arabic sentiment lexicon.
SentiSense 
[23] It "consists of 5,496 words and 2,190 synsets labeled with an emotion from a set of 14 emotional categories"[23]
SentiWordNet 
Assigns three sentiment scores to each WordNet synset: positivity, negativity and objectivity. The license was "only for research, non-profit purposes",[24] but has now changed to CC-BY-SA.[25] The 3.0 version was described in 2010.[26] See also Python interface at
Taboada and Grieve's Turney adjective list 
(listed in Pang and Lee) available through Yahoo! sentimentAI group.
[24] 13,915 English words with valence, arousal and dominance collected with Amazon Mechanical Turk. The word list is licensed under CC-BY-NC-ND.[28]
WordNet-Affect 
An English list.[29] Originally "freely available, for research purposes".[30] Now part of WordNet Domains which is distributed under CC-BY.[31] See and

For comparison of the different word lists see Enhancing lexicon-based review classification by merging and revising sentiment dictionaries and A new ANEW: evaluation of a word list for sentiment analysis in microblogs.

Tools

  1. AFINN, an affective word list. Code exists in several programming languages.
  2. Pattern, Python library.
  3. sasa-tool, [25], USC SAIL/AIL sentiment analysis tool.
  4. Semantria. Commercial service
  5. SentiStrength
  6. Senti by CrowdFlower [26], commercial crowd-based service
  7. Umigon, by Clement Levallois. See also Umigon: sentiment analysis on tweets based on terms lists and heuristics.

See also list by Seth Grimes in What are the most powerful open-source sentiment-analysis tools?

Online services

  1. - a browser plug-in that automatically analyzes social media content (including sentiment)
  2. Sentiment-topic mining
  3. - does this work?
  4. ConveyAPI. As of June 2013 seemingly vaporware-ish: "currently offering free a evaluation of the ConveyAPI to select companies." [27]
  5. Bitext, demo available at

Evaluation

The measured performance of a sentiment analysis system may depend on the corpus and on the annotation. The annotation may be a sentiment strength for each text or a categorical variable: 2-class (positive/negative), 3-class (positive/negative/neutral) or 4-class (positive/negative/both/neutral).[32]

Human performance in sentiment analysis has been reported to be 82-90%.[33]
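For categorical annotation, one simple way to summarize performance is the fraction of texts labeled correctly, together with counts of which labels get confused. The sketch below assumes a 3-class scheme; the gold labels, the system output and the function names are hypothetical, and accuracy is only one of several possible measures.

```python
from collections import Counter


def accuracy(gold, predicted):
    """Fraction of texts whose predicted label equals the gold label."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equally long")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def confusion_counts(gold, predicted):
    """Count (gold, predicted) label pairs to show where errors fall."""
    return Counter(zip(gold, predicted))


# Hypothetical 3-class annotations and system output.
GOLD = ["positive", "negative", "neutral", "positive", "negative"]
PREDICTED = ["positive", "negative", "positive", "positive", "neutral"]

print(accuracy(GOLD, PREDICTED))                                    # 0.6
print(confusion_counts(GOLD, PREDICTED)[("neutral", "positive")])   # 1
```

When the annotation is a sentiment strength rather than a class, a correlation between predicted and annotated strengths would be the more natural measure.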

Events

  1. Workshop on sentiment and subjectivity in text COLING ACL 2006
    1. Extracting opinions, opinion holders, and topics expressed in online news media text
  2. First International CIKM Workshop on Topic-Sentiment Analysis for Mass Opinion Measurement, 2009.
  3. 1st Workshop on Opinion Mining and Sentiment Analysis, 2009.
  4. ICDM11 workshop on opinion mining and sentiment analysis

Researchers

  1. Bing Liu
  2. Finn Årup Nielsen, AFINN
  3. Mike Thelwall, SentiStrength
  4. Peter D. Turney, unsupervised sentiment analysis
  5. Saif M. Mohammad, NRC Emotion Lexicon, SemEval winner.
  6. ...

Papers

Reviews

  1. A survey on opinion mining and sentiment analysis: Tasks, approaches and applications (2015)
  2. Opinion mining and sentiment analysis
  3. Sentiment analysis: detecting valence, emotions, and other affectual states from text (2015)

Original

  1. A new ANEW: evaluation of a word list for sentiment analysis in microblogs
  2. Building lexicon for sentiment analysis from massive collection of HTML documents
  3. Combining social network analysis and sentiment analysis to explore the potential for online radicalisation
  4. Crowd sentiment detection during disasters and crises
  5. Determining the sentiment of opinions
  6. Domain specific affective classification of documents
  7. Good friends, bad news - affect and virality in Twitter
  8. Large-scale sentiment analysis for news and blogs
  9. Leveraging textual sentiment analysis with social network modeling
  10. Micro-blogging sentiment detection by collaborative online learning
  11. Mining the peanut gallery: opinion extraction and semantic classification of product reviews
  12. Negative emotions accelerating users activity in BBC Forum
  13. Pattern for Python
  14. Quantitative analysis of bloggers collective behavior powered by emotions
  15. Robust sentiment detection on Twitter from biased and noisy data
  16. Semi-supervised recursive autoencoders for predicting sentiment distributions
  17. Sentiment analysis with global topics and local dependency
  18. Sentiment strength detection in short informal text
  19. Tweetin' in the rain: exploring societal-scale effects of weather on mood
  20. Using emoticons to reduce dependency in machine learning techniques for sentiment classification
  21. Using verbs and adjectives to automatically classify blog sentiment

See also

  1. Sentiment-based text segmentation

External link

  1. Machine Learning Lecture 2: Sentiment Analysis (text classification), YouTube video.

References

  1. Bo Pang, Lillian Lee (2008). "Opinion mining and sentiment analysis". Foundations and Trends in Information Retrieval 2(1-2): 1-135. [1].
  2. Qiaozhu Mei, Xu Ling, Matthew Wondra, Hang Su, ChengXiang Zhai (2007). "Topic sentiment mixture: modeling facets and opinions in weblogs".
  3. Words with attitude
  4. The measurement of meaning
  5. I. Ounis, C. MacDonald, I. Soboroff. "Overview of the TREC-2008 blog track". The TREC 2008 Proceedings.
  6. Sentence and expression level annotation of opinions in user-generated discourse
  7. Recursive deep models for semantic compositionality over a sentiment treebank
  8. Domain specific affective classification of documents
  9. Margaret M. Bradley, Peter J. Lang. (1999). Affective norms for English words (ANEW). Gainesville, FL. The NIMH Center for the Study of Emotion and Attention, University of Florida.
  10. The Spanish adaptation of ANEW (affective norms for English words)
  11. The QWERTY effect: how typing shapes the meanings of words
  12. Carolyn H. John (1988). "Emotionality ratings and free-association norms of 240 emotional and non-emotional words". Cognition & Emotion 2(1): 49-70. doi: 10.1080/02699938808415229.
  13. Melissa L.-H. Võ, Arthur M. Jacobs, Markus Conrad (2006). "Cross-validating the Berlin Affective Word List". Behavior Research Methods 38(4): 606-609.
  14. Evaluation of lexical and semantic features for English emotion words
  15. Melissa L.-H. Võ, Markus Conrad, Lars Kuchinke, Karolina Urton, Markus J. Hofmann, Arthur M. Jacobs (2009). "The Berlin Affective Word List Reloaded (BAWL-R)". Behavior Research Methods 41: 534-538. doi: 10.3758/BRM.41.2.534.
  16. Tiina M. Eilola, Jelena Havelka (2010). "Affective norms for 210 British English and Finnish nouns". Behavior Research Methods 42(1): 134-140. PMID: 20160293.
  17. Let me listen to poetry, let me see emotions
  18. P. Kanske, S. A. Kotz (2010). "Leipzig Affective Norms for German: A reliability study". Behav Res Methods 42(4): 987-991. PMID: 21139165.
  19. Norms of valence, arousal, dominance, and age of acquisition for 4300 Dutch words
  20. NRC-Canada: building the state-of-the-art in sentiment analysis of tweets
  21. Theresa Wilson, Janyce Wiebe, Paul Hoffmann (2005). "Recognizing contextual polarity in phrase-level sentiment analysis". Proc. of HLT-EMNLP-2005.
  22. NRC-Canada: building the state-of-the-art in sentiment analysis of Tweets
  23. SentiSense: an easily scalable concept-based affective lexicon for sentiment analysis
  24. Andrea Esuli, Fabrizio Sebastiani. "SentiWordNet: a publicly available lexical resource for opinion mining".
  26. Stefano Baccianella, Andrea Esuli, Fabrizio Sebastiani (2010). "SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining". Pages 2200-2204 in Proceedings of LREC-10, 7th Conference on Language Resources and Evaluation.
  27. Gökçay D., Smith MA., "TÜDADEN:Türkçede Duygusal ve Anlamsal Değerlendirmeli Norm Veri Tabanı", Proceedings of Brain-Computer Workshop 4, 2008, Istanbul.
  28. Norms of valence, arousal, and dominance for 13,915 English lemmas
  29. WordNet-Affect: an affective extension of WordNet
  30. A. Valitutti, C. Strapparava, O. Stock (2004). "Developing affective lexical resources". PsychNology Journal 2(1): 61-83. [2].
  32. Recognizing contextual polarity in phrase-level sentiment analysis
  33. Recognizing contextual polarity in phrase-level sentiment analysis

Other

  1. Carlo Strapparava, Rada Mihalcea (2008). "Learning to identify emotions in text". Pages 1556-1560 in PSAC '08: Proceedings of the 2008 ACM symposium on Applied computing. doi: [28]
  2. Understanding sentiment of people from news articles: temporal sentiment analysis of social events