<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Paper of the Week &#187; email</title>
	<atom:link href="http://www.paperoftheweek.com/category/email/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.paperoftheweek.com</link>
	<description>Read. Learn. Discuss.</description>
	<lastBuildDate>Tue, 14 Aug 2007 01:35:31 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>POTW 2/11/07: Discussion of sections 5-8 of Minkov et al.</title>
		<link>http://www.paperoftheweek.com/2007/02/16/potw-21107-discussion-of-sections-5-8-of-minkov-et-al/</link>
		<comments>http://www.paperoftheweek.com/2007/02/16/potw-21107-discussion-of-sections-5-8-of-minkov-et-al/#comments</comments>
		<pubDate>Fri, 16 Feb 2007 11:54:09 +0000</pubDate>
		<dc:creator>grant.ingersoll</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Graph Theory]]></category>
		<category><![CDATA[Information Retrieval]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Natural Language Processing (NLP)]]></category>
		<category><![CDATA[disambiguation]]></category>
		<category><![CDATA[email]]></category>

		<guid isPermaLink="false">http://www.paperoftheweek.com/2007/02/16/potw-21107-discussion-of-sections-5-8-of-minkov-et-al/</guid>
		<description><![CDATA[POTW 2/11/07: Contextual Search and Name Disambiguation in Email Using Graphs Discussion of Sections 5 through 8 The remaining sections of this paper are discussions of two applications of the algorithms plus the body of related work and conclusions that can be drawn from the work. Section 5 gives the details on what corpora were [...]]]></description>
			<content:encoded><![CDATA[<p>POTW 2/11/07: <a href="http://www.cs.cmu.edu/%7Eeinat/sigir-06.pdf">Contextual Search and Name Disambiguation in Email Using Graphs</a></p>
<p><strong>Discussion of Sections 5 through 8</strong></p>
<p>The remaining sections of this paper are discussions of two applications of the algorithms plus the body of related work and conclusions that can be drawn from the work.  Section 5 gives the details on what corpora were used (Enron, plus an internal email set not publicly available) before proceeding to the task of name disambiguation.</p>
<p>Name disambiguation in email is the task of correlating the mention of a name in an email with the actual person. While this is fairly straightforward in most cases for people reading their own email, it becomes difficult when reading other people&#8217;s email, since one may not know everyone the other person knows. Multiply this by a collection of emails filled with nicknames and initials and it is not hard to see why this is difficult. The task is useful for establishing social networks as well as other applications. One could easily imagine an automated system that retrieves relevant information about a person mentioned in an email (bio, address, phone, past conversations, etc.) and makes it available to the reader for instant access.</p>
<p>The remaining parts of section 5 go into the details of applying the algorithm and the results that are achieved.   The interesting thing in my mind is that the graph often connects different types of nodes and associates different probabilities with the transitions from one node to the other.  Applying the graph walking strategy then leads to the desired results.  Suffice it to say, the new approach performs better than the baseline approach!</p>
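<p>To make the idea of per-edge-type transition probabilities concrete, here is a minimal sketch. It is my own illustration, not the paper&#8217;s exact formulation: the graph layout, the <code>label_weights</code> parameter and the rule &#8220;split probability across edge labels by weight, then evenly across the targets of each label&#8221; are all assumptions.</p>

```python
def transition_probabilities(graph, label_weights=None):
    """Turn a typed graph into per-node transition probabilities.

    graph: {node: {edge_label: set_of_target_nodes}} (hypothetical layout).
    label_weights: optional {edge_label: positive weight}; missing labels get 1.0.
    Returns {node: {target: probability}} where each row sums to 1.
    """
    label_weights = label_weights or {}
    transitions = {}
    for node, edges in graph.items():
        total_weight = sum(label_weights.get(label, 1.0) for label in edges)
        row = {}
        for label, targets in edges.items():
            # Share of the probability mass assigned to this edge type.
            label_share = label_weights.get(label, 1.0) / total_weight
            for target in targets:
                # Within one edge type, spread the share evenly over targets.
                row[target] = row.get(target, 0.0) + label_share / len(targets)
        transitions[node] = row
    return transitions
```

<p>Raising the weight of one edge label (say, a hypothetical <code>sent-from</code> label) makes the walk prefer that kind of link, which is one way different node and edge types can end up with different transition probabilities.</p>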
<p>Identifying threads of emails is the second application the authors use to demonstrate the approach. Threading is the problem of identifying one or more messages that are related to some chosen email. Many email systems do a basic job of this by comparing subject lines, especially those that use the &#8220;RE:&#8221; prefix. However, we all know people often treat the subject line differently. Furthermore, people tend to quote the previous messages in the thread differently: some use &#8220;&gt;&#8221;, others use &#8220;|&#8221;, and still others use nothing at all. Add in inline replies, which are especially common on mailing lists, and you see why the problem becomes difficult. Section 5.4 lays out the graph walk approach and compares it to a TF-IDF IR approach; the graph walk, of course, does better, especially when combined with a machine learning re-ranking step.</p>
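<p>For reference, a TF-IDF baseline for threading can be sketched roughly as follows. This is a generic cosine-similarity ranker written for illustration, assuming whitespace tokenization and a plain log IDF; it is not the exact baseline configuration used in the paper, and the function names are my own.</p>

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Map each document (a string) to a {term: tf * idf} dictionary."""
    tokenized = [doc.lower().split() for doc in documents]
    # Document frequency: in how many messages does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    total = len(documents)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({t: c * math.log(total / doc_freq[t]) for t, c in counts.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def related_messages(documents, chosen_index):
    """Rank every other message by TF-IDF cosine similarity to the chosen one."""
    vectors = tfidf_vectors(documents)
    others = [i for i in range(len(documents)) if i != chosen_index]
    return sorted(others, key=lambda i: cosine(vectors[chosen_index], vectors[i]), reverse=True)
```

<p>A ranker like this only sees shared words, so it has no way to use who sent what to whom, which is the kind of structure the graph walk can exploit.</p>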
<p>The rest of the paper is on related work and conclusions.  I am glad the authors address the performance in terms of scalability in the conclusions section, as I had my doubts about how well the approach could perform on large amounts of data.  In fact, I find a lot of papers in the NLP realm fail to account for performance, so it is refreshing to see it addressed.</p>
<p>Technorati Tags: <a href="http://technorati.com/tag/graph+theory" rel="tag">graph theory</a>, <a href="http://technorati.com/tag/email" rel="tag">email</a>, <a href="http://technorati.com/tag/natural+language+processing" rel="tag">natural language processing</a>, <a href="http://technorati.com/tag/NLP" rel="tag">NLP</a>, <a href="http://technorati.com/tag/information+retrieval" rel="tag">information retrieval</a>, <a href="http://technorati.com/tag/IR" rel="tag">IR</a>, <a href="http://technorati.com/tag/threading" rel="tag">threading</a>, <a href="http://technorati.com/tag/name+disambiguation" rel="tag">name disambiguation</a></p>]]></content:encoded>
			<wfw:commentRss>http://www.paperoftheweek.com/2007/02/16/potw-21107-discussion-of-sections-5-8-of-minkov-et-al/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>POTW 2/11/07: Discussion of sections 1-4 of Minkov et al.</title>
		<link>http://www.paperoftheweek.com/2007/02/15/potw-21107-discussion-of-sections-1-4-of-minkov-et-al/</link>
		<comments>http://www.paperoftheweek.com/2007/02/15/potw-21107-discussion-of-sections-1-4-of-minkov-et-al/#comments</comments>
		<pubDate>Thu, 15 Feb 2007 13:40:26 +0000</pubDate>
		<dc:creator>grant.ingersoll</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Information Retrieval]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Natural Language Processing (NLP)]]></category>
		<category><![CDATA[disambiguation]]></category>
		<category><![CDATA[email]]></category>

		<guid isPermaLink="false">http://www.paperoftheweek.com/2007/02/15/potw-21107-discussion-of-sections-1-4-of-minkov-et-al/</guid>
		<description><![CDATA[Intro This week we are reading Contextual Search and Name Disambiguation in Email Using Graphs by Einat Minkov and William W. Cohen of Carnegie Mellon University and Andrew Y. Ng of Stanford University. Like the past few papers, this one focuses on how to use graph theory to solve some common NLP problems. Unlike the past few [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Intro</strong></p>
<p>This week we are reading <a href="http://www.cs.cmu.edu/~einat/sigir-06.pdf">Contextual Search and Name Disambiguation in Email Using Graphs</a><br />
by Einat Minkov and William W. Cohen of Carnegie Mellon University and Andrew Y. Ng of Stanford University. Like the past few papers, this one focuses on how to use graph theory to solve some common NLP problems. Unlike the past few by Kleinberg, Brin and Page, and Mihalcea, this one adds a machine learning component and is also much more current, appearing at SIGIR &#8217;06, although I first ran across an earlier version at the TextGraphs workshop at HLT &#8217;06.</p>
<p><strong>Discussion</strong></p>
<p>Section 1 provides the introduction to the paper and posits that in most current information retrieval applications, the documents being searched are connected to both structured and unstructured items, such as other documents and metadata. This linkage suggests that a graph-based approach similar to PageRank would be useful in finding documents that are similar to a user&#8217;s query. The authors&#8217; distinguishing factor, to quote, is:</p>
<blockquote><p>&#8220;In contrast to this previous work [Brin and Page, others], we consider schemes for propagating similarity across a graph that naturally models a structured dataset like an email corpus&#8230;&#8221;</p></blockquote>
<p>The authors then propose to demonstrate this on two email related tasks: name disambiguation and email threading.</p>
<p>The next section, section 2, provides us with the details of how to construct a graph out of email. This is much more involved than my first guess, as they account not only for the social networking aspects of the email (to, from, cc, etc.) when creating the graph but also for the text in the mail messages. Table 1 in the paper lays out the 5 different source types they can have for nodes in the graph, along with what types of edges and targets can be linked to from the given source type. This is another differentiation from past approaches: they have several different types of nodes and edges. Additionally, they define the inverse relation for all edge labels, which results in a cyclic graph. It is not clear to me yet what the benefit of mixing node types in one graph is, but I&#8217;m sure that is to come.</p>
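<p>As a rough picture of what such a graph might look like in code, here is a minimal sketch. The node types, edge labels and message fields below are placeholders I made up for illustration; Table 1 in the paper defines the actual schema. The one detail taken from the paper is that every edge also gets an inverse edge.</p>

```python
from collections import defaultdict


def build_email_graph(messages):
    """Build a typed graph from email messages.

    messages: list of dicts with hypothetical keys "id", "from", "to", "body".
    Nodes are (node_type, value) tuples.
    Returns {source_node: {edge_label: set_of_target_nodes}}.
    """
    graph = defaultdict(lambda: defaultdict(set))

    def add_edge(source, label, target):
        graph[source][label].add(target)
        # Every edge label gets an inverse, which is what makes the graph cyclic.
        graph[target][label + "-inv"].add(source)

    for msg in messages:
        file_node = ("file", msg["id"])
        # Header information: the social-network side of the graph.
        add_edge(file_node, "sent-from", ("email-address", msg["from"]))
        for recipient in msg["to"]:
            add_edge(file_node, "sent-to", ("email-address", recipient))
        # Message text: each distinct term becomes a node too.
        for term in set(msg["body"].lower().split()):
            add_edge(file_node, "has-term", ("term", term))
    return graph
```

<p>Because of the inverse edges, a walk can start at a term, hop to the messages containing it, and from there reach the addresses on those messages.</p>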
<p>To determine the similarity between two nodes in the graph, the authors propose to do a &#8220;lazy walk&#8221; of the graph. Whereas Brin and Page and others propose a never-ending graph walk, Minkov et al. incorporate a probability that the walk halts at any given step, in addition to the probability of transitioning to another node. Sections 3.1 and 3.2 have the math to support this. The walk, the math shows, boils down to matrix multiplication on a set of sparse matrices. Finally, queries are described as a distribution over the nodes (think of a vector, just like in the traditional vector space model; at least I think this is right) plus a statement about what type of output is desired. The result is then a list of nodes of that type. Section 3.3 lays out the case that this formulation is very similar to the traditional TF-IDF approach. I guess, though, that the added benefit here is the ability to get other output types.</p>
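<p>Here is how I picture the lazy walk, as a minimal sketch rather than the paper&#8217;s exact equations. I am assuming a fixed halting probability (<code>halt_prob</code>), a fixed number of steps, nodes stored as (type, value) tuples, and dictionaries standing in for the sparse matrices; all of these names are mine.</p>

```python
def lazy_walk(transitions, query, halt_prob=0.5, steps=10):
    """Score nodes by where a walk that may halt at every step ends up.

    transitions: {node: {neighbor: probability}}, each row summing to 1.
    query: {node: probability}, the starting distribution over nodes.
    Returns {node: probability that the walk halted on that node}.
    """
    current = dict(query)
    scores = {}
    for _ in range(steps):
        # Mass that halts at this step is credited to the node it sits on.
        for node, mass in current.items():
            scores[node] = scores.get(node, 0.0) + halt_prob * mass
        # The remaining mass moves along outgoing edges; this loop is a
        # sparse vector-matrix product written out by hand.
        moved = {}
        for node, mass in current.items():
            for neighbor, prob in transitions.get(node, {}).items():
                moved[neighbor] = moved.get(neighbor, 0.0) + (1.0 - halt_prob) * mass * prob
        current = moved
    return scores


def rank_by_type(scores, node_type):
    """Keep only nodes of the requested output type, best first."""
    hits = [(node, score) for node, score in scores.items() if node[0] == node_type]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

<p>A query for the term &#8220;andy&#8221; with output type &#8220;person&#8221; would then be <code>rank_by_type(lazy_walk(transitions, {("term", "andy"): 1.0}), "person")</code>. A higher halting probability keeps the scores concentrated near the query nodes; a lower one lets similarity spread further across the graph.</p>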
<p>Section 4 describes how a learning approach can be used to rerank the nodes.  I&#8217;m not sure I understand the details just yet, so it probably is worth reading the reference given in the paper that goes into more depth.  Essentially, I think they are just using a classifier to improve upon the results they get from the initial search.</p>
<p>Technorati Tags: <a href="http://technorati.com/tag/graph+theory" rel="tag">graph theory</a>, <a href="http://technorati.com/tag/machine+learning" rel="tag">machine learning</a>, <a href="http://technorati.com/tag/email" rel="tag">email</a>, <a href="http://technorati.com/tag/disambiguation" rel="tag">disambiguation</a>, <a href="http://technorati.com/tag/NLP" rel="tag">NLP</a>, <a href="http://technorati.com/tag/natural+language+processing" rel="tag">natural language processing</a>, <a href="http://technorati.com/tag/Paper+of+the+Week" rel="tag">Paper of the Week</a>, <a href="http://technorati.com/tag/POTW" rel="tag">POTW</a>, <a href="http://technorati.com/tag/information+retrieval" rel="tag">information retrieval</a>, <a href="http://technorati.com/tag/PageRank" rel="tag">PageRank</a></p>]]></content:encoded>
			<wfw:commentRss>http://www.paperoftheweek.com/2007/02/15/potw-21107-discussion-of-sections-1-4-of-minkov-et-al/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>POTW 2/11/07: Contextual Search and Name Disambiguation in Email Using Graphs by Minkov et al.</title>
		<link>http://www.paperoftheweek.com/2007/02/11/potw-21107-contextual-search-and-name-disambiguation-in-email-using-graphs-by-minkov-et-al/</link>
		<comments>http://www.paperoftheweek.com/2007/02/11/potw-21107-contextual-search-and-name-disambiguation-in-email-using-graphs-by-minkov-et-al/#comments</comments>
		<pubDate>Sun, 11 Feb 2007 18:39:04 +0000</pubDate>
		<dc:creator>grant.ingersoll</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Graph Theory]]></category>
		<category><![CDATA[Information Retrieval]]></category>
		<category><![CDATA[Natural Language Processing (NLP)]]></category>
		<category><![CDATA[disambiguation]]></category>
		<category><![CDATA[email]]></category>

		<guid isPermaLink="false">http://www.paperoftheweek.com/2007/02/11/potw-21107-contextual-search-and-name-disambiguation-in-email-using-graphs-by-minkov-et-al/</guid>
		<description><![CDATA[This week&#8217;s paper is another graph theory paper (do you sense a trend?) by Minkov et al., titled &#8220;Contextual Search and Name Disambiguation in Email Using Graphs&#8221;, which appeared in SIGIR &#8217;06. email, graph theory, disambiguation Technorati Tags: email, graph theory, disambiguation]]></description>
			<content:encoded><![CDATA[<p>This week&#8217;s paper is another graph theory paper (do you sense a trend?) by Minkov et al., titled &#8220;<a href="http://www.cs.cmu.edu/~einat/sigir-06.pdf">Contextual Search and Name Disambiguation in Email Using Graphs</a>&#8221;, which appeared in SIGIR &#8217;06.</p>
<p>Technorati Tags: <a href="http://technorati.com/tag/email" rel="tag">email</a>, <a href="http://technorati.com/tag/graph+theory" rel="tag">graph theory</a>, <a href="http://technorati.com/tag/disambiguation" rel="tag">disambiguation</a></p>]]></content:encoded>
			<wfw:commentRss>http://www.paperoftheweek.com/2007/02/11/potw-21107-contextual-search-and-name-disambiguation-in-email-using-graphs-by-minkov-et-al/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
