Discussion of Section 3 of TextRank
Section 3 of TextRank: Bringing Order Into Texts covers the first application of the TextRank approach proposed in section 2. The authors have chosen keyword extraction to demonstrate the capabilities of the approach. Keyword extraction is the problem of determining the keywords that best describe a document. It can be thought of as a precursor to document summarization, I guess. Those familiar with academic papers might imagine a system that automatically assigns keywords to a paper. In the blogosphere, you can imagine how nice it would be to not have to pick your own keywords, just let the system pick them for you (hmmm, that gives me an idea… anyone want to write a wordpress plugin?)
With the problem defined, Mihalcea describes the preprocessing steps for building the graph. Namely, words are tokenized and part of speech is assigned. These words are then added as nodes to the graph (note, they optionally can filter out certain types of words, such as all adverbs or all nouns) and edges are added between two nodes if the two words co-occur within some variable number of words, usually somewhere between 2 and 10. Once the graph is setup, the convergence algorithm is run until some approximated error threshold is met and the results are ranked and returned.
Performance wise, the algorithm does quite well, beating the best reported supervised result of the time. It should be noted that this approach is completely unsupervised, which is a real benefit. In my mind, if you can get better results without supervision, you have a winner. The rest of section 3 is a discussion of the TextRank results versus other comparable systems. See the paper for details.
Next post we will discuss the approach applied to Sentence extraction for automatic summarization, and we will end the week with the remainder of the paper on Friday.
TextRank, PageRank, Google, Mihalcea, Keyword Extraction, Natural Language Processing, NLP, machine learning, algorithms, computer science
Popularity: 6% [?]
Technorati Tags: TextRank, PageRank, Google, Mihalcea, Keyword Extraction, Natural Language Processing, NLP, machine learning, algorithms, computer science


redsandred wrote,
iam trying to implement it.iam using JGraphT library for making graph.but iam not able to put TextRank in it.can u please help.
Link | March 6th, 2007 at 12:41 am
grant.ingersoll wrote,
Can you post more on where you are having the problem? Is it in building the graph or doing the actual iterative algorithm?
Link | March 6th, 2007 at 7:14 am
redsandred wrote,
iam doing it on weighted.actually iam doing project on text summarization.i almost got seperated appropriate words from paragraph,but i couldn’t find how to get weights for graph.can u please help for this problem.iam implementing a paper by mihalcea
“TextRank:bringing order in text”.
Link | March 6th, 2007 at 11:00 pm
grant.ingersoll wrote,
I assume you are referring to the weights in the W(S(V[i])) formula on page 2. These weights are per application basis. I guess I would start with a weight of 1 to get things working. Then, if you are doing co-occurrence calculations for Keyword matching, perhaps you would consider using the distance between words as the weight. Another option would be to assign more weight to terms of a particular class of words, like Nouns or verbs. That part really is up to you.
Link | March 7th, 2007 at 2:27 pm
redsandred wrote,
can i get the code for this.iam applying for particular class of ,like nouns and verbs.
Link | March 7th, 2007 at 3:31 pm
grant.ingersoll wrote,
I don’t have code for that, as I said, it really is up to your application. Personally, I would try something out. See what happens when you weight nouns twice as much as other words. I think the Mihalcea paper does a pretty decent job of explaining the different filtering mechanisms they use. She (Mihalcea) has also published other papers on TextRank which might give you more insight. The weight factors are assigned as you build the graph
If you want me to look at some specific code that you wrote, I can do that at some point.
Good luck!
Link | March 7th, 2007 at 9:15 pm
redsandred wrote,
ok ,thank u for ur help.i will contact u if i had a problem.
Link | March 7th, 2007 at 10:36 pm
redsandred wrote,
can u send me what u had done on this key word extraction by using textrank.please help me by sending ur code.iam doing it for my class project,but i coud’nt get through well.so please help me to complete it.thanku.
Link | March 8th, 2007 at 12:33 pm
redsandred wrote,
package org.jgrapht.summ;
import java.net.*;
import org.jgrapht.*;
import org.jgrapht.graph.*;
/**
*
* @author 03bce105
*/
public class Summ {
/** Creates a new instance of Summ */
public Summ() {
}
public static void main(String [] args)
{
WeightedGraph stringGraph = createStringGraph();
// note undirected edges are printed as: {,}
System.out.println(stringGraph.toString());
}
private static WeightedGraph createStringGraph()
{
WeightedGraph g =
new WeightedGraph(Edge.class);
String v1 = “v1″;
String v2 = “v2″;
String v3 = “v3″;
String v4 = “v4″;
// add the vertices
g.addVertex(v1);
g.addVertex(v2);
g.addVertex(v3);
g.addVertex(v4);
// add edges to create a circuit
Edge e1=g.addEdge(v1, v2);
Edge e2=g.addEdge(v2, v3);
Edge e3=g.addEdge(v3, v4);
Edge e4=g.addEdge(v4, v1);
g.setEdgeWeight(e1,3.0);
g.setEdgeWeight(e2,5.0);
g.setEdgeWeight(e3,7.0);
g.setEdgeWeight(e4,8.0);
return g;
}
}
first i gave 4 words and their weights as input,and icoudn’t get graph for it.see my code and help me.
Link | March 9th, 2007 at 12:27 am
srinivaskurma wrote,
hi sir,
me too imlementing same paper( TextRank: Bringing Order Into Texts). sir i am unable do it.. as i have to submit code of that paper on 16th of this month..i am doing that project using JGraphT.if i wont submit code of hat paper i cant complete my degree. sir its my humble request pls send some code to complete my degree.
as my input of my project is some text paragraph and output is important keywords..based on above input and output i need some code which will work.
sir if i wont submit some code they will fail me in this project and unable to complete my degree…
pls reply soon sir… thanking you sir
Link | March 9th, 2007 at 12:55 am
grant.ingersoll wrote,
Srinivas,
I don’t know what school you go to or what kind of standards they have, but I very much suggest you buckle down and figure it out. I’m more than happy to answer questions you have (and you should ask your professor, too, since that is partially what they are paid for), as I have done for others, but I am not going to blindly send you my code (which I can’t anyway) just so you can get a degree. What’s the point of getting a degree if you don’t actually learn anything? I surely wouldn’t want to hire you. I know, you probably think, like so many others, if I can just get that piece of paper, I’ll be all set, but in reality, people will see right through you the minute you are asked to do anything real. The way I see it, the 16th is more than enough time for you to figure it out. It is maybe, tops, 2-3 days worth of coding. I suggest you re-read the paper and then start asking appropriate questions to your professor.
Link | March 9th, 2007 at 8:26 am
grant.ingersoll wrote,
Hi Sandeep,
I’m not sure what you are asking here (http://www.paperoftheweek.com/2007/01/31/discussion-of-section-3-of-textrank/#comment-248). It looks like you have successfully added the 4 vertices and edges, but I’m not sure what that means. Now you need to run them through the ranking algorithm.
Link | March 9th, 2007 at 8:30 am
Adriano wrote,
I’m a graduate student and is working on a Web Page Recommender System learning, based on user’s bookmark file. I’m interested in TextRank for Keyword Extraction and would highly appreciate if you could help me regarding where may find algorithm’s pseudocode. Actually, I have studied this paper: “TextRank: Bringing Order into Texts. In Proc. of EMNLP 2004, Barcelona, Spain, July 2004.”. Thanks for answer.
Link | April 27th, 2007 at 4:48 am
grant.ingersoll wrote,
I have some pseudocode at http://www.paperoftheweek.com/2007/02/03/addendum-to-textrank/
Link | April 27th, 2007 at 7:17 am
Adriano wrote,
Thanks for the link. I wanted to ask if the algorithm TextRank can be implemented in Javascript for one extension of firefox?
Link | April 27th, 2007 at 7:56 am