3 Feb 2018

Turney's Algorithm

Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews

The paper can be found here

What is this about?

Put simply: the paper presents a way to classify text without any annotated data (i.e. unsupervised), using only minimal domain knowledge. The paper works in the domain of reviews, where the domain knowledge amounts to knowing that excellent signals positive sentiment while poor signals negative sentiment.

Basic Idea

Computing the semantic closeness of “important phrases” in the text to some pre-defined reference words - here ‘excellent’ and ‘poor’ - can be used to determine the class of the review text.

"The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives  or  adverbs."

Measure of semantic orientation

Semantic orientation of a given phrase ‘p’

SemanticOrientation(Phrase p) = PMI(p, 'excellent') - PMI(p, 'poor')

Semantic Orientation of a review

SemanticOrientation(Review r) = Average( SemanticOrientation(Phrase p) )   for all Phrases p in the Review r

Steps in the algorithm

  • PoS tagging to identify and extract phrases/bi-grams containing adverbs or adjectives.
  • Compute the semantic orientation of each extracted phrase.
  • Compute the average semantic orientation of the review.
  • Classify the review as positive if its average semantic orientation is greater than 0, else negative (see the sketch below).
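Putting the steps together, here is a minimal sketch in Python. The helpers extract_phrases and pmi are assumptions (versions of both are discussed in the sections below); the rest follows the steps above directly.

```python
def semantic_orientation(phrase, pmi):
    # SO(p) = PMI(p, 'excellent') - PMI(p, 'poor')
    return pmi(phrase, "excellent") - pmi(phrase, "poor")

def classify_review(review, extract_phrases, pmi):
    """Classify a review as 'positive' or 'negative' by its average SO."""
    phrases = extract_phrases(review)                 # step 1: extract phrases
    if not phrases:
        return "positive"  # arbitrary fallback (my assumption, not from the paper)
    so_values = [semantic_orientation(p, pmi) for p in phrases]  # step 2
    avg_so = sum(so_values) / len(so_values)          # step 3: average SO
    return "positive" if avg_so > 0 else "negative"   # step 4: threshold at 0
```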

How to extract the phrases?

First we run a part-of-speech tagger to tag the words in a review. To extract the phrases, we look for the following patterns, where JJ, NN, and RB are the usual Penn Treebank PoS tags (adjective, noun, and adverb respectively).

The patterns for the extracted two-word phrases (reconstructed from Table 1 of the paper; the third word is only a constraint and is not part of the extracted phrase):

    First word          Second word             Third word (not extracted)
    JJ                  NN or NNS               anything
    RB, RBR, or RBS     JJ                      not NN nor NNS
    JJ                  JJ                      not NN nor NNS
    NN or NNS           JJ                      not NN nor NNS
    RB, RBR, or RBS     VB, VBD, VBN, or VBG    anything
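A sketch of the extraction step using NLTK's off-the-shelf tagger (the paper itself used Brill's tagger, so this is only an approximation):

```python
import nltk  # requires the 'punkt' and 'averaged_perceptron_tagger' data

ADJ = {"JJ"}
NOUN = {"NN", "NNS"}
ADV = {"RB", "RBR", "RBS"}
VERB = {"VB", "VBD", "VBN", "VBG"}

def matches(t1, t2, t3):
    # The five two-word patterns from the table above; the third tag
    # is only a constraint and the third word is not extracted.
    return ((t1 in ADJ and t2 in NOUN) or
            (t1 in ADV and t2 in ADJ and t3 not in NOUN) or
            (t1 in ADJ and t2 in ADJ and t3 not in NOUN) or
            (t1 in NOUN and t2 in ADJ and t3 not in NOUN) or
            (t1 in ADV and t2 in VERB))

def extract_phrases(review):
    tagged = nltk.pos_tag(nltk.word_tokenize(review))
    tagged.append(("<END>", "<END>"))  # sentinel so the last bigram has a third tag
    return [f"{w1} {w2}"
            for (w1, t1), (w2, t2), (_, t3) in zip(tagged, tagged[1:], tagged[2:])
            if matches(t1, t2, t3)]
```

For example, on "a very nice camera" this extracts 'nice camera' (JJ NN) but not 'very nice', since the word after 'nice' is a noun and the RB JJ pattern forbids that.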

What is PMI (Pointwise Mutual Information)?

PMI, or Pointwise Mutual Information, between two words measures how much more often they co-occur than they would if they were independent.

Mathematically, it is:

PMI(word1, word2) = log2 [ p(word1 & word2) / ( p(word1) * p(word2) ) ]
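As a toy check of the definition (the probabilities here are made up): two words that co-occur four times more often than chance have a PMI of 2 bits.

```python
import math

def pmi(p_xy, p_x, p_y):
    # PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) )
    return math.log2(p_xy / (p_x * p_y))

print(pmi(p_xy=0.004, p_x=0.1, p_y=0.01))  # 2.0, i.e. 4x more often than chance
```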

Mutual Information between two random variables

MI (often written as I), or mutual information, between two random variables X and Y measures, intuitively, how much knowing X tells you about Y.

Mathematically it is given as:

I(X; Y) = Σ_x Σ_y p(x, y) * log2 [ p(x, y) / ( p(x) * p(y) ) ]

This, I guess, makes clear why pointwise mutual information has the word pointwise in it: PMI is the term inside the sum for a single pair of values (a single point), while MI is the expectation of PMI over the whole joint distribution.
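That relationship is easy to verify in code. A small sketch, using a made-up joint distribution over two binary variables, computes MI as the expected PMI:

```python
import math

# A made-up joint distribution p(x, y) over two binary variables.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in (0, 1)}

# I(X; Y) = sum over (x, y) of p(x, y) * PMI(x, y)
mi = sum(p * math.log2(p / (p_x[x] * p_y[y])) for (x, y), p in joint.items())
print(mi)  # ~0.278 bits for this table
```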

What is PMI-IR?

PMI-IR uses Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of pairs of words or phrases. PMI-IR estimates PMI by issuing queries to a search engine (hence the IR in PMI-IR) and noting the number of hits (matching documents).
Note: AltaVista was used for this paper since it supports the NEAR operator, which restricts hits to documents where the two terms occur close together.

Let hits(query) be the number of hits returned, given the query. Then the estimate of SO can be given as follows:

SO(phrase) = log2 [ ( hits(phrase NEAR 'excellent') * hits('poor') ) / ( hits(phrase NEAR 'poor') * hits('excellent') ) ]
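A sketch of the estimator in Python. Here hits() is a hypothetical stand-in for querying a search engine that supports NEAR (AltaVista's operator at the time); the small additive constant follows the paper's trick of adding 0.01 to the hit counts to avoid division by zero.

```python
import math

def hits(query):
    # Hypothetical: send `query` to a search engine and return the
    # number of matching documents. Not implemented here.
    raise NotImplementedError

def estimate_so(phrase):
    # SO(p) = log2( hits(p NEAR excellent) * hits(poor)
    #             / (hits(p NEAR poor) * hits(excellent)) )
    # 0.01 is added to each count to avoid division by zero (per the paper).
    return math.log2(
        (hits(f'"{phrase}" NEAR excellent') + 0.01) * (hits("poor") + 0.01)
        / ((hits(f'"{phrase}" NEAR poor') + 0.01) * (hits("excellent") + 0.01))
    )
```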

Why use this algorithm?

Unlike Hatzivassiloglou and McKeown’s method, which is designed for isolated adjectives, this method uses phrases containing adjectives and adverbs. This is advantageous because, although an isolated adjective may indicate subjectivity, it lacks the context needed to determine the semantic orientation (compare ‘unpredictable’, which is negative in ‘unpredictable steering’ but positive in ‘unpredictable plot’). Hence, this method may perform better on some datasets.

Discussion

The paper illustrates a very simple and straightforward strategy for classifying reviews in an unsupervised manner. The algorithm is easy to implement and may quickly give you a baseline, or even good results, on your dataset. It also doesn’t face the sparsity problem that general bi-gram models face, because of its treatment of features: it doesn’t have to maintain counts for the features, and instead uses a search engine to compute the PMI between two words or phrases. Can these kinds of methods be used to avoid the sparsity problem that n-gram models in general face, and still unleash their power? Something we might look into in the future.

