Discussion of Remaining Sections of TextRank
Section 4 of TextRank: Bringing Order Into Texts (http://www.cs.unt.edu/~rada/papers/mihalcea.emnlp04.pdf) by Mihalcea and Tarau continues the paper with another application of the TextRank algorithm. Taking the keyword extraction process to the next level, the authors apply the algorithm to Sentence Extraction. The goal of Sentence Extraction is to identify the sentences that best represent a document, i.e. the sentences that summarize it. For sentence extraction, the graph nodes are sentences, and the edges are weighted by a similarity measure that reflects how much content two sentences have in common. Specifically, the similarity is the number of tokens the two sentences share, normalized by the sentence lengths so that long sentences are not favored. As with keyword extraction, filters can be applied so that only certain tokens are counted when calculating overlap. Once the graph is in place, the ranking algorithm is run until it converges and the top-ranking sentences are returned. TextRank does quite well in this task, and section 4.2 provides a discussion of the results. While it isn't the best performer in the comparison, it is in the top 5. Add in that it is a fully unsupervised process and you have a pretty compelling argument for adopting the approach.
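To make the steps above concrete, here is a minimal Python sketch of sentence extraction in the TextRank style. It is my own illustration, not the authors' code: the function names (tokenize, similarity, rank_sentences, summarize) are hypothetical, the tokenizer is a plain regex standing in for whatever token filter you prefer, and the damping factor of 0.85 and the convergence threshold are assumed defaults. The similarity function divides the shared-token count by the sum of the logs of the two sentence lengths, which is how I read the normalization in the paper.

```python
import math
import re


def tokenize(sentence):
    """Lowercase word tokens; a simple stand-in for a real token filter."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def similarity(tokens_a, tokens_b):
    """Count of shared distinct tokens, normalized by log sentence lengths."""
    if not tokens_a or not tokens_b:
        return 0.0
    denom = math.log(len(tokens_a)) + math.log(len(tokens_b))
    if denom == 0:  # both sentences have exactly one token
        return 0.0
    return len(set(tokens_a) & set(tokens_b)) / denom


def rank_sentences(sentences, damping=0.85, tol=1e-4, max_iter=100):
    """Score each sentence with a weighted PageRank-style iteration.

    damping, tol and max_iter are assumed defaults for this sketch.
    """
    n = len(sentences)
    tokens = [tokenize(s) for s in sentences]

    # Undirected weighted graph stored as a symmetric matrix.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            weights[i][j] = weights[j][i] = similarity(tokens[i], tokens[j])

    # Total edge weight leaving each node, used to normalize its votes.
    out_sum = [sum(row) for row in weights]

    scores = [1.0] * n
    for _ in range(max_iter):
        new_scores = []
        for i in range(n):
            incoming = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if weights[j][i] > 0
            )
            new_scores.append((1 - damping) + damping * incoming)
        delta = max((abs(a - b) for a, b in zip(new_scores, scores)), default=0.0)
        scores = new_scores
        if delta < tol:  # scores have stopped changing: converged
            break
    return scores


def summarize(sentences, k=2):
    """Return the k highest-scoring sentences in their original order."""
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling summarize(list_of_sentences, k=3) returns the three top-ranked sentences in document order. Splitting the document into sentences is left to the caller, and a sentence that shares no tokens with any other ends up with the minimum score of 1 - damping.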
Section 5 discusses why TextRank works from a conceptual point of view. The idea is that nodes end up recommending other nodes based on the strength of the connections between them, much like how people build up a structural understanding of concepts.
The biggest strength of TextRank is that it is almost completely portable and requires no knowledge of the domain other than how you want to model your nodes and relations. No training data is needed.
TextRank, keyword extraction, sentence extraction, document summarization, text summarization, PageRank, Mihalcea, Tarau, Paper of the Week

