<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Paper of the Week &#187; text summarization</title>
	<atom:link href="http://www.paperoftheweek.com/category/text-summarization/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.paperoftheweek.com</link>
	<description>Read. Learn. Discuss.</description>
	<lastBuildDate>Tue, 14 Aug 2007 01:35:31 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Discussion of LexRank by Erkan and Radev</title>
		<link>http://www.paperoftheweek.com/2007/02/22/discussion-of-lexrank-by-erkan-and-radev/</link>
		<comments>http://www.paperoftheweek.com/2007/02/22/discussion-of-lexrank-by-erkan-and-radev/#comments</comments>
		<pubDate>Fri, 23 Feb 2007 02:03:40 +0000</pubDate>
		<dc:creator>grant.ingersoll</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[bibliometrics]]></category>
		<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Graph Theory]]></category>
		<category><![CDATA[linear algebra]]></category>
		<category><![CDATA[Natural Language Processing (NLP)]]></category>
		<category><![CDATA[text summarization]]></category>

		<guid isPermaLink="false">http://www.paperoftheweek.com/2007/02/22/discussion-of-lexrank-by-erkan-and-radev/</guid>
		<description><![CDATA[POTW 2/18/07: LexRank: Graph-based Lexical Centrality as Salience in Text Summarization The LexRank paper by Erkan and Radev is another PageRank/Graph Theory based approach to working with text, this time applied to the task of summarization. Key parts of sections 1 and 2 discuss the general problem of corpus-based summarization.  Unlike TextRank by Mihalcea, Erkan [...]]]></description>
			<content:encoded><![CDATA[<p>POTW 2/18/07: <a href="http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume22/erkan04a-html/erkan04a.html">LexRank: Graph-based Lexical Centrality as Salience in Text Summarization</a></p>
<p>The LexRank paper by Erkan and Radev is another PageRank/Graph Theory based approach to working with text, this time applied to the task of summarization.</p>
<p>Key parts of sections 1 and 2 discuss the general problem of corpus-based summarization.  Unlike TextRank by Mihalcea, Erkan and Radev are interested in summarizing groups of documents (although the approach can be applied to individual documents as well).  They propose a &#8220;centroid-based&#8221; model whereby they try to determine the most important sentences in a group of documents.  In this approach, sentences that contain more of the words near the center of the cluster of documents are deemed more important and are therefore given higher weight.  In section 3, the authors then lay out their methods for determining the similarity between two sentences using a TF-IDF formula (think vector space search).  From these similarity scores, a matrix can be constructed that holds the pairwise similarities between all the sentences.</p>
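<p>To make the similarity step concrete, here is a minimal sketch in Python, not the authors' code.  It treats each sentence as its own document when computing IDF and adds 1 to every IDF value so that no vector collapses to zero; both choices, along with the function names and the toy sentences, are my assumptions for illustration.</p>

```python
import math
from collections import Counter


def tokenize(sentence):
    # Naive whitespace tokenizer (an assumption); strips basic punctuation.
    words = (w.strip(".,;:!?()").lower() for w in sentence.split())
    return [w for w in words if w]


def idf_table(sentences):
    # Assumption: each sentence counts as one "document" for IDF.
    n = len(sentences)
    df = Counter()
    for s in sentences:
        df.update(set(tokenize(s)))
    # The +1.0 is a smoothing choice made for this sketch only.
    return {w: math.log(n / df[w]) + 1.0 for w in df}


def tfidf_vector(sentence, idf):
    # Sparse TF-IDF vector stored as a word-to-weight dict.
    tf = Counter(tokenize(sentence))
    return {w: tf[w] * idf[w] for w in tf}


def cosine(a, b):
    # Cosine similarity between two sparse vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(sentences):
    # Pairwise similarities between all sentences, as nested lists.
    idf = idf_table(sentences)
    vectors = [tfidf_vector(s, idf) for s in sentences]
    return [[cosine(u, v) for v in vectors] for u in vectors]


# Hypothetical toy input, not taken from the paper.
sentences = [
    "The cat sat on the mat.",
    "The cat lay on the rug.",
    "Stocks fell sharply today.",
]
matrix = similarity_matrix(sentences)
```

<p>Each row of the result holds one sentence's similarity to every sentence in the input, itself included, which is the matrix the rest of the approach builds on.</p>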
<p>After doing some fancy linear algebra, they come to define LexRank, which is really just Google&#8217;s PageRank applied to this particular problem.  Section 3.2 provides all the gory details on the math and how the problem can be described in terms of stochastic matrices and Markov chains, which all seems to boil down to the lovely formula that Brin and Page gave us.  This section also lays out in more detail the iterative algorithm for calculating PageRank/TextRank/LexRank/YouFillInTheBlankRank.</p>
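<p>For reference, that formula can be written in the usual damped random-walk form below.  The symbol names are my choice for illustration and may differ from the paper's: p(u) is the score of sentence u, N is the number of sentences, d is the probability of jumping to a random sentence, adj[u] is the set of sentences linked to u, and deg(v) is the number of links on sentence v.</p>

```latex
p(u) = \frac{d}{N} + (1 - d) \sum_{v \in \mathrm{adj}[u]} \frac{p(v)}{\deg(v)}
```

<p>When the edges carry similarity weights, the 1/deg(v) term is replaced by the weight of the edge between u and v divided by the total weight of all edges leaving v.</p>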
<p>Taking this matrix, we treat it as a graph in which each sentence is a node and each edge is weighted by the similarity between its two sentences, and then run the iterative algorithm over it.</p>
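<p>The iterative step can be sketched as a plain power iteration in Python.  This is a sketch under stated assumptions rather than the paper's implementation: the function name, the default jump probability of 0.15, the tolerance, and the small hand-made similarity matrix are all hypothetical.</p>

```python
def lexrank_scores(similarity, damping=0.15, tol=1e-8, max_iter=1000):
    # 'damping' is the probability of jumping to a random sentence
    # (an assumed default, chosen for illustration).
    n = len(similarity)

    # Row-normalize so each row sums to 1 (a row-stochastic matrix).
    rows = []
    for row in similarity:
        total = sum(row)
        rows.append([x / total for x in row] if total else [1.0 / n] * n)

    # Start from the uniform distribution and iterate until it settles.
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new = [
            damping / n
            + (1.0 - damping) * sum(rows[v][u] * scores[v] for v in range(n))
            for u in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if tol >= delta:
            break
    return scores


# Hypothetical similarity matrix for three sentences; the first two
# resemble each other more than either resembles the third.
similarity = [
    [1.0, 0.5, 0.1],
    [0.5, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
scores = lexrank_scores(similarity)
```

<p>The scores sum to 1, and the sentences with the most total similarity to the others end up ranked highest, so a summary can be built by taking the top-scoring sentences.</p>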
<p>I know this sounds like skimping on my part, but the rest of the paper is just about the experiments they ran and I don&#8217;t feel like rehashing those.  Guess what?  It did pretty well.  One interesting thing is that the MEAD summarization system is available at <a href="http://www.summarization.com/">http://www.summarization.com/</a>.</p>
<p>Like most of the graph-based approaches we have seen so far, it does well with noisy data, which is one of the big selling points for me.</p>
<p>Next week, on to something new: question answering!  See you then!</p>
<p>Technorati Tags: <a href="http://technorati.com/tag/LexRank" rel="tag">LexRank</a>, <a href="http://technorati.com/tag/Radev" rel="tag">Radev</a>, <a href="http://technorati.com/tag/Erkan" rel="tag">Erkan</a>, <a href="http://technorati.com/tag/PageRank" rel="tag">PageRank</a>, <a href="http://technorati.com/tag/Brin" rel="tag">Brin</a>, <a href="http://technorati.com/tag/Page" rel="tag">Page</a>, <a href="http://technorati.com/tag/Google" rel="tag">Google</a>, <a href="http://technorati.com/tag/linear+algebra" rel="tag">linear algebra</a>, <a href="http://technorati.com/tag/graph+theory" rel="tag">graph theory</a></p>]]></content:encoded>
			<wfw:commentRss>http://www.paperoftheweek.com/2007/02/22/discussion-of-lexrank-by-erkan-and-radev/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>POTW 2/18/07: LexRank: Graph-based Lexical Centrality as Salience in Text Summarization</title>
		<link>http://www.paperoftheweek.com/2007/02/18/potw-21807-lexrank-graph-based-lexical-centrality-as-salience-in-text-summarization/</link>
		<comments>http://www.paperoftheweek.com/2007/02/18/potw-21807-lexrank-graph-based-lexical-centrality-as-salience-in-text-summarization/#comments</comments>
		<pubDate>Sun, 18 Feb 2007 16:57:41 +0000</pubDate>
		<dc:creator>grant.ingersoll</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Graph Theory]]></category>
		<category><![CDATA[Natural Language Processing (NLP)]]></category>
		<category><![CDATA[text summarization]]></category>

		<guid isPermaLink="false">http://www.paperoftheweek.com/2007/02/18/potw-21807-lexrank-graph-based-lexical-centrality-as-salience-in-text-summarization/</guid>
		<description><![CDATA[The POTW for 2/18/07 is another graph-based approach, this time by another leader in this area, Dragomir Radev.  The paper is LexRank: Graph-based Lexical Centrality as Salience in Text Summarization Enjoy!]]></description>
			<content:encoded><![CDATA[<p>The POTW for 2/18/07 is another graph-based approach, this time by another leader in this area, Dragomir Radev.  The paper is</p>
<p><a href="http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume22/erkan04a-html/erkan04a.html">LexRank: Graph-based Lexical Centrality as Salience in Text Summarization</a></p>
<p>Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.paperoftheweek.com/2007/02/18/potw-21807-lexrank-graph-based-lexical-centrality-as-salience-in-text-summarization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

