<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Paper of the Week &#187; Algorithms</title>
	<atom:link href="http://www.paperoftheweek.com/category/computer-science/algorithms/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.paperoftheweek.com</link>
	<description>Read. Learn. Discuss.</description>
	<lastBuildDate>Tue, 14 Aug 2007 01:35:31 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>POTW 6/24/07: &#8220;Support-Vector Networks&#8221; by Cortes and Vapnik</title>
		<link>http://www.paperoftheweek.com/2007/06/25/potw-62407-support-vector-networks-by-cortes-and-vapnik/</link>
		<comments>http://www.paperoftheweek.com/2007/06/25/potw-62407-support-vector-networks-by-cortes-and-vapnik/#comments</comments>
		<pubDate>Mon, 25 Jun 2007 18:27:22 +0000</pubDate>
		<dc:creator>grant.ingersoll</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Natural Language Processing (NLP)]]></category>
		<category><![CDATA[SVM]]></category>
		<category><![CDATA[Statistical Approach]]></category>
		<category><![CDATA[Text Categorization]]></category>
		<category><![CDATA[classification]]></category>
		<category><![CDATA[support vector machines]]></category>
		<category><![CDATA[text mining]]></category>

		<guid isPermaLink="false">http://www.paperoftheweek.com/2007/06/25/potw-62407-support-vector-networks-by-cortes-and-vapnik/</guid>
		<description><![CDATA[Long paper this week, but it is the original paper on Support Vector Machines: Support-Vector Networks by Cortes and Vapnik.  Given my schedule, I may spread this out over two weeks.]]></description>
			<content:encoded><![CDATA[<p>Long paper this week, but it is the original paper on Support Vector Machines: <a href="http://citeseer.ist.psu.edu/rd/0%2C500489%2C1%2C0.25%2CDownload/http://citeseer.ist.psu.edu/cache/papers/cs/23317/http:zSzzSzwww.research.att.comzSz%7EcorinnazSzpaperszSzsupport.vector.pdf/cortes95supportvector.pdf">Support-Vector Networks</a> by Cortes and Vapnik.  Given my schedule, I may spread this out over two weeks.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.paperoftheweek.com/2007/06/25/potw-62407-support-vector-networks-by-cortes-and-vapnik/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>POTW 6/11/07: Discussion of &#8220;A Sequential Algorithm for Training Text Classifiers&#8221; by Lewis and Gale</title>
		<link>http://www.paperoftheweek.com/2007/06/17/potw-61107-discussion-of-a-sequential-algorithm-for-training-text-classifiers-by-lewis-and-gale/</link>
		<comments>http://www.paperoftheweek.com/2007/06/17/potw-61107-discussion-of-a-sequential-algorithm-for-training-text-classifiers-by-lewis-and-gale/#comments</comments>
		<pubDate>Mon, 18 Jun 2007 03:58:12 +0000</pubDate>
		<dc:creator>grant.ingersoll</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Information Retrieval]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Natural Language Processing (NLP)]]></category>
		<category><![CDATA[Statistical Approach]]></category>
		<category><![CDATA[Text Categorization]]></category>
		<category><![CDATA[classification]]></category>
		<category><![CDATA[naive bayes]]></category>

		<guid isPermaLink="false">http://www.paperoftheweek.com/2007/06/17/potw-61107-discussion-of-a-sequential-algorithm-for-training-text-classifiers-by-lewis-and-gale/</guid>
		<description><![CDATA[In &#8220;A Sequential Algorithm for Training Text Classifiers&#8221; by David D. Lewis and William Gale, the authors put forth a new (at the time) method for training text classifiers using an approach they call &#8220;uncertainty sampling.&#8221; Section 1 outlines the problem of training, namely obtaining a good sample of text to be labeled for the trainer. [...]]]></description>
			<content:encoded><![CDATA[<p>In &#8220;A Sequential Algorithm for Training Text Classifiers&#8221; by David D. Lewis and William Gale, the authors put forth a new (at the time) method for training text classifiers using an approach they call &#8220;uncertainty sampling.&#8221;</p>
<p>Section 1 outlines the problem of training, namely obtaining a good sample of text to be labeled for the trainer.  After disposing of several other methods of garnering samples (random and relevance-feedback-based sampling), Lewis and Gale introduce an iterative approach for manually labeling examples.</p>
<p>Section 2 then discusses the theoretical benefits of &#8220;learning by query,&#8221; namely the possibility of reducing the error rate very quickly relative to the number of queries asked.</p>
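<p>To get a feel for why choosing your own queries can pay off so quickly, here is a classic toy illustration (my own, not taken from the paper): points on a line are positive at or above some unknown threshold and negative below it. A learner that simply labels points as they come may need many labels before it pins the boundary down exactly, but a learner that picks its own queries can binary-search, halving the remaining range with every label, so 1,000 points need at most 10 queries. The function name and the <code>oracle</code> (a stand-in for the human judge) are hypothetical.</p>

```python
def find_boundary_by_query(points, oracle):
    """Binary-search sorted, unlabeled points for the first positive one.

    Assumes labels are monotone: every point at or above some unknown
    threshold is positive and every point below it is negative.
    Returns (index of the first positive point, number of oracle queries).
    If no point is positive, the index equals len(points).
    """
    lo, hi = 0, len(points)  # the boundary index lies somewhere in [lo, hi]
    queries = 0
    while hi > lo:
        mid = (lo + hi) // 2
        queries += 1
        if oracle(points[mid]):
            hi = mid  # boundary is at mid or to its left
        else:
            lo = mid + 1  # boundary is strictly to the right of mid
    return lo, queries


points = list(range(1000))
index, queries = find_boundary_by_query(points, lambda x: x >= 637)
print(index, queries)
```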
<p>Figure 1 (described in Section 3) outlines their basic approach, which relies on having a human judge the subset of examples that the current classifier is least certain about.  This process is iterated until the human is satisfied with the results.  One caveat of this approach is that the classifier must not only predict the class but also give a measure of its certainty about that prediction.</p>
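<p>Here is a minimal sketch of that loop, assuming a binary task and a classifier that reports an estimated probability of the positive class. Ranking examples by how close that probability is to 0.5 is one common way to measure uncertainty; I believe it is in the spirit of Figure 1, but check the paper for its exact selection rule. The <code>oracle</code> function stands in for the human judge, a fixed number of rounds replaces &#8220;until the human is satisfied,&#8221; and the one-dimensional midpoint classifier is a hypothetical toy, not the classifier from the paper.</p>

```python
import math
import random


def uncertainty_sampling(pool, oracle, train, prob_positive, seed_labeled,
                         batch_size=1, rounds=10):
    """Sketch of an uncertainty-sampling loop for a binary classifier.

    pool: unlabeled examples
    oracle: stands in for the human judge, example -> True/False
    train: list of (example, label) pairs -> model
    prob_positive: (model, example) -> estimated P(positive | example)
    seed_labeled: small initial labeled set containing both classes
    A fixed number of rounds replaces "until the human is satisfied".
    """
    labeled = list(seed_labeled)
    already = {x for x, _ in labeled}
    unlabeled = [x for x in pool if x not in already]
    model = train(labeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        # Most uncertain first: estimated P(positive) closest to 0.5.
        unlabeled.sort(key=lambda x: abs(prob_positive(model, x) - 0.5))
        batch, unlabeled = unlabeled[:batch_size], unlabeled[batch_size:]
        labeled.extend((x, oracle(x)) for x in batch)  # the "human" labels the batch
        model = train(labeled)  # retrain on everything labeled so far
    return model, labeled


def train_midpoint(labeled):
    """Hypothetical toy classifier for 1-D points: threshold halfway between class means."""
    pos = [x for x, y in labeled if y]
    neg = [x for x, y in labeled if not y]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def prob_midpoint(threshold, x, scale=5.0):
    """Squash the signed distance from the threshold into a pseudo-probability."""
    return 1.0 / (1.0 + math.exp(-scale * (x - threshold)))


random.seed(0)
pool = [random.random() for _ in range(200)]
seed = [(0.05, False), (0.95, True)]
model, labeled = uncertainty_sampling(pool, lambda x: x > 0.3, train_midpoint,
                                      prob_midpoint, seed, rounds=20)
print(len(labeled), round(model, 3))
```

<p>Because each round asks about the points the current model is least sure of, the labels should tend to cluster near the current decision boundary rather than being spread evenly over the pool, which is the intuition behind the savings in labeled examples.</p>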
<p>Continuing into Section 4, we learn how to build a classifier and train it with uncertainty sampling.  Most of the section details the probability theory behind the classifier, finishing up with how to do the sampling.  One thing I always wish for in these papers is concrete examples (maybe as an appendix or a reference) that work through the math on an actual toy problem.  Section 5 does provide a concrete example, laying out an experiment and discussing its details, though minus the math, which probably suits most people just fine.</p>
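<p>In that spirit, here is the kind of toy calculation I was wishing for. It uses plain naive Bayes with add-one smoothing as a stand-in, so the classifier in Section 4 may well differ in its details, and the four training &#8220;documents&#8221; are made up. For a document containing &#8220;stock&#8221; and &#8220;rally&#8221;, the relevant class scores 0.5 × 3/4 × 2/4 = 0.1875 and the non-relevant class scores 0.5 × 1/4 × 2/4 = 0.0625, so the posterior is 0.1875 / (0.1875 + 0.0625) = 0.75. A document containing only &#8220;falls&#8221; comes out at exactly 0.5, which makes it the most uncertain case and exactly the kind of example uncertainty sampling would hand to the human.</p>

```python
import math
from collections import Counter

# Hypothetical toy training data: (set of words in the document, is_relevant).
TRAIN = [
    ({"stock", "market", "falls"}, True),
    ({"market", "rally", "stock"}, True),
    ({"football", "score", "rally"}, False),
    ({"football", "match", "falls"}, False),
]


def fit(train):
    """Count documents per class and, per class, how many documents contain each word."""
    n_pos = sum(1 for _, relevant in train if relevant)
    n_neg = len(train) - n_pos
    pos_counts, neg_counts = Counter(), Counter()
    for words, relevant in train:
        (pos_counts if relevant else neg_counts).update(words)
    vocab = set(pos_counts) | set(neg_counts)
    return n_pos, n_neg, pos_counts, neg_counts, vocab


def prob_relevant(model, words):
    """P(relevant | words) by Bayes' rule under the naive independence assumption.

    Uses add-one smoothing and, as a simplification, only the words that are
    present in the document; words never seen in training are skipped.
    """
    n_pos, n_neg, pos_counts, neg_counts, vocab = model
    log_pos = math.log(n_pos / (n_pos + n_neg))  # log prior, relevant
    log_neg = math.log(n_neg / (n_pos + n_neg))  # log prior, not relevant
    for w in words:
        if w in vocab:
            log_pos += math.log((pos_counts[w] + 1) / (n_pos + 2))
            log_neg += math.log((neg_counts[w] + 1) / (n_neg + 2))
    # Normalize the two joint scores into a posterior probability.
    return 1.0 / (1.0 + math.exp(log_neg - log_pos))


model = fit(TRAIN)
print(prob_relevant(model, {"stock", "rally"}))  # 0.75 (up to floating-point rounding)
print(prob_relevant(model, {"falls"}))           # 0.5: maximally uncertain
```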
<p>Section 7 has an excellent discussion of the results, the pay dirt being that using this new method significantly reduces the number of examples required for training, at the cost of having a human in the loop.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.paperoftheweek.com/2007/06/17/potw-61107-discussion-of-a-sequential-algorithm-for-training-text-classifiers-by-lewis-and-gale/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>POTW 6/11/07: &#8220;A Sequential Algorithm for Training Text Classifiers&#8221; by Lewis and Gale</title>
		<link>http://www.paperoftheweek.com/2007/06/11/potw-61107-a-sequential-algorithm-for-training-text-classifiers-by-lewis-and-gale/</link>
		<comments>http://www.paperoftheweek.com/2007/06/11/potw-61107-a-sequential-algorithm-for-training-text-classifiers-by-lewis-and-gale/#comments</comments>
		<pubDate>Mon, 11 Jun 2007 12:18:22 +0000</pubDate>
		<dc:creator>grant.ingersoll</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Natural Language Processing (NLP)]]></category>
		<category><![CDATA[Statistical Approach]]></category>
		<category><![CDATA[Text Categorization]]></category>
		<category><![CDATA[classification]]></category>
		<category><![CDATA[naive bayes]]></category>

		<guid isPermaLink="false">http://www.paperoftheweek.com/2007/06/11/potw-61107-a-sequential-algorithm-for-training-text-classifiers-by-lewis-and-gale/</guid>
		<description><![CDATA[More on text classification: &#8220;A Sequential Algorithm for Training Text Classifiers&#8221; by David Lewis and William Gale.  A little bit of an older paper, but still looks to be a good one.]]></description>
			<content:encoded><![CDATA[<p>More on text classification: &#8220;<a href="http://citeseer.ist.psu.edu/rd/52437760%2C100508%2C1%2C0.25%2CDownload/http://coblitz.codeen.org:3125/citeseer.ist.psu.edu/cache/papers/cs/508/http:zSzzSzwww.research.att.comzSz%7ElewiszSzpaperszSzlewis94c.pdf/lewis94sequential.pdf">A Sequential Algorithm for Training Text Classifiers</a>&#8221; by David Lewis and William Gale.  A little bit of an older paper, but still looks to be a good one.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.paperoftheweek.com/2007/06/11/potw-61107-a-sequential-algorithm-for-training-text-classifiers-by-lewis-and-gale/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>POTW 6/3/07: Discussion of &#8220;A Comparison of Event Models for Naive Bayes Text Classification&#8221; by Andrew McCallum and Kamal Nigam</title>
		<link>http://www.paperoftheweek.com/2007/06/09/potw-6307-discussion-of-a-comparison-of-event-models-for-naive-bayes-text-classification-by-andrew-mccallum-and-kamal-nigam/</link>
		<comments>http://www.paperoftheweek.com/2007/06/09/potw-6307-discussion-of-a-comparison-of-event-models-for-naive-bayes-text-classification-by-andrew-mccallum-and-kamal-nigam/#comments</comments>
		<pubDate>Sun, 10 Jun 2007 01:34:53 +0000</pubDate>
		<dc:creator>grant.ingersoll</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Natural Language Processing (NLP)]]></category>
		<category><![CDATA[Statistical Approach]]></category>
		<category><![CDATA[classification]]></category>
		<category><![CDATA[naive bayes]]></category>

		<guid isPermaLink="false">http://www.paperoftheweek.com/2007/06/09/potw-6307-discussion-of-a-comparison-of-event-models-for-naive-bayes-text-classification-by-andrew-mccallum-and-kamal-nigam/</guid>
		<description><![CDATA[We are reading &#8220;A Comparison of Event Models for Naive Bayes Text Classification&#8221; by McCallum and Nigam. Text classification is the process of assigning a document to one or more categories (we looked at classification/categorization earlier when exploring Support Vector Machines, SVMs).  My understanding of the difference between categorization and classification is that categorization has [...]]]></description>
			<content:encoded><![CDATA[<p>We are reading &#8220;<a href="http://citeseer.ist.psu.edu/rd/0%2C489994%2C1%2C0.25%2CDownload/http://coblitz.codeen.org:3125/citeseer.ist.psu.edu/cache/papers/cs/24415/http:zSzzSzlans.ece.utexas.eduzSzulgzSzpaperszSznigam-mccallum-bayes.pdf/mccallum98comparison.pdf">A Comparison of Event Models for Naive Bayes Text Classification</a>&#8221; by McCallum and Nigam.</p>
<p>Text classification is the process of assigning a document to one or more categories (we looked at classification/categorization earlier when exploring Support Vector Machines, SVMs).  My understanding of the difference between categorization and classification is that categorization has a set number of categories, whereas classification does not.  At any rate, this paper compares two different event models for a <a href="http://en.wikipedia.org/wiki/Naive_Bayes_classifier">naive Bayes</a> classifier.  The naive Bayes approach assumes that all attributes of the examples we are studying are independent of each other given the class.  Even though this is rarely true in the real world for text (after all, if I chose all the words of this post independently we would probably have gibberish), it turns out that it still works pretty well in practice.</p>
<p>But I digress&#8230; The two approaches being compared are the multi-variate Bernoulli model and the multinomial model.  The Bernoulli model treats the document as the event and represents it as a vector of binary attributes recording, for every vocabulary word, whether or not it occurs in the document.  It DOES NOT take into account the number of times a word occurs.  In the multinomial model, each word occurrence is an event, so term frequency does matter.  The next couple of sections after the introduction lay out the common ground between the two approaches as well as the differences, which come down to how the probabilities are calculated.</p>
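<p>To make the difference between the two event models concrete, here is a minimal sketch that scores a document under each one and picks the class with the highest log prior plus log likelihood; summing per-word log probabilities is exactly where the naive independence assumption comes in. The word probabilities are made up for illustration, and estimating them from training data (with smoothing) is left out.</p>

```python
import math
from collections import Counter


def bernoulli_log_likelihood(doc_words, word_probs):
    """Multi-variate Bernoulli model: one binary event per vocabulary word.

    word_probs[w] is P(w appears in a document | class). Every vocabulary word
    contributes: log(p) if present, log(1 - p) if absent. Repeats are ignored.
    """
    present = set(doc_words)
    return sum(math.log(p) if w in present else math.log(1.0 - p)
               for w, p in word_probs.items())


def multinomial_log_likelihood(doc_words, word_probs):
    """Multinomial model: each word occurrence is an event.

    word_probs[w] is P(a given word position holds w | class), summing to 1.
    Absent words contribute nothing; each repeat adds log(p) again. The
    multinomial coefficient is the same for every class, so it is dropped.
    """
    counts = Counter(doc_words)
    return sum(n * math.log(word_probs[w]) for w, n in counts.items() if w in word_probs)


def classify(doc_words, priors, class_word_probs, log_likelihood):
    """Naive Bayes decision: argmax over classes of log P(c) + log P(doc | c)."""
    return max(priors, key=lambda c: math.log(priors[c])
               + log_likelihood(doc_words, class_word_probs[c]))


# Made-up parameters for illustration only (not estimated from any collection).
priors = {"sports": 0.5, "finance": 0.5}
bernoulli = {"sports": {"goal": 0.6, "match": 0.5, "stock": 0.1},
             "finance": {"goal": 0.1, "match": 0.2, "stock": 0.7}}
multinomial = {"sports": {"goal": 0.5, "match": 0.4, "stock": 0.1},
               "finance": {"goal": 0.1, "match": 0.2, "stock": 0.7}}

doc = ["goal", "goal", "stock"]
print(classify(doc, priors, bernoulli, bernoulli_log_likelihood))      # finance
print(classify(doc, priors, multinomial, multinomial_log_likelihood))  # sports
```

<p>With these made-up numbers the two models disagree on the same document: the multinomial model counts &#8220;goal&#8221; twice and leans toward sports, while the Bernoulli model sees &#8220;goal&#8221; only once and also charges sports for the absent &#8220;match&#8221;, so it prefers finance.</p>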
<p>There is also some interesting discussion of feature selection using mutual information that is worth digging into if you have the time.  Feature selection reduces the size of the vocabulary, which speeds things up, hopefully without losing too much information.</p>
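<p>As a rough sketch of the idea, the following computes the mutual information between the class label and a word&#8217;s presence or absence in a document, then keeps the top-scoring words. This is the simple document-level version; the paper&#8217;s exact formulation may differ, and the function names and tiny corpus are hypothetical.</p>

```python
import math
from collections import Counter


def mutual_information(docs, word):
    """Mutual information, in bits, between the class label and whether `word` occurs.

    docs is a list of (set_of_words, class_label) pairs. Cells with a zero
    joint count are skipped, since p * log(p) goes to 0 as p goes to 0.
    """
    n = len(docs)
    joint = Counter((label, word in words) for words, label in docs)
    class_totals = Counter(label for _, label in docs)
    presence_totals = Counter(word in words for words, _ in docs)
    mi = 0.0
    for (label, present), count in joint.items():
        p_joint = count / n
        p_class = class_totals[label] / n
        p_presence = presence_totals[present] / n
        mi += p_joint * math.log2(p_joint / (p_class * p_presence))
    return mi


def select_features(docs, k):
    """Keep the k vocabulary words with the highest mutual information (ties broken alphabetically)."""
    vocab = set().union(*(words for words, _ in docs))
    return sorted(vocab, key=lambda w: (-mutual_information(docs, w), w))[:k]


docs = [
    ({"goal", "the"}, "sports"),
    ({"goal", "match", "the"}, "sports"),
    ({"stock", "the"}, "finance"),
    ({"stock", "market", "the"}, "finance"),
]
print(mutual_information(docs, "goal"))  # 1.0: perfectly predicts the class
print(mutual_information(docs, "the"))   # 0.0: appears everywhere, says nothing
print(select_features(docs, 2))          # ['goal', 'stock']
```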
<p>The next sections are where the rubber meets the road: the authors do a side-by-side comparison of the two approaches on five different collections.  You can see the results on pages 5 and 6.  The discussion of the results follows on pages 6 and 7, the bottom line being that the multinomial model appears to &#8220;be almost uniformly better than the multi-variate Bernoulli model.&#8221;</p>
<p>For those interested, <a href="http://www.cs.waikato.ac.nz/~ml/weka/index.html">Weka</a> has tools for building naive Bayes classifiers.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.paperoftheweek.com/2007/06/09/potw-6307-discussion-of-a-comparison-of-event-models-for-naive-bayes-text-classification-by-andrew-mccallum-and-kamal-nigam/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>POTW 6/3/07: &#8220;A Comparison of Event Models for Naive Bayes Text Classification&#8221; by Andrew McCallum and Kamal Nigam</title>
		<link>http://www.paperoftheweek.com/2007/06/04/potw-6307-a-comparison-of-event-models-for-naive-bayes-text-classification-by-andrew-mccallum-and-kamal-nigam/</link>
		<comments>http://www.paperoftheweek.com/2007/06/04/potw-6307-a-comparison-of-event-models-for-naive-bayes-text-classification-by-andrew-mccallum-and-kamal-nigam/#comments</comments>
		<pubDate>Mon, 04 Jun 2007 12:42:56 +0000</pubDate>
		<dc:creator>grant.ingersoll</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Natural Language Processing (NLP)]]></category>
		<category><![CDATA[Statistical Approach]]></category>
		<category><![CDATA[Text Categorization]]></category>
		<category><![CDATA[classification]]></category>
		<category><![CDATA[naive bayes]]></category>

		<guid isPermaLink="false">http://www.paperoftheweek.com/2007/06/04/potw-6307-a-comparison-of-event-models-for-naive-bayes-text-classification-by-andrew-mccallum-and-kamal-nigam/</guid>
		<description><![CDATA[Paper of the week for the week of June 3, 2007 is &#8220;A Comparison of Event Models for Naive Bayes Text Classification&#8221; by Andrew McCallum and Kamal Nigam. This paper promises to shed some light on different ways of using Bayesian classifiers. It might be useful to do some background reading on naive Bayes starting [...]]]></description>
			<content:encoded><![CDATA[<p>Paper of the week for the week of June 3, 2007 is &#8220;<a href="http://citeseer.ist.psu.edu/rd/0%2C489994%2C1%2C0.25%2CDownload/http://coblitz.codeen.org:3125/citeseer.ist.psu.edu/cache/papers/cs/24415/http:zSzzSzlans.ece.utexas.eduzSzulgzSzpaperszSznigam-mccallum-bayes.pdf/mccallum98comparison.pdf">A Comparison of Event Models for Naive Bayes Text Classification</a>&#8221; by Andrew McCallum and Kamal Nigam.  This paper promises to shed some light on different ways of using Bayesian classifiers.  It might be useful to do some background reading on naive Bayes starting <a href="http://en.wikipedia.org/wiki/Naive_Bayes_classifier">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.paperoftheweek.com/2007/06/04/potw-6307-a-comparison-of-event-models-for-naive-bayes-text-classification-by-andrew-mccallum-and-kamal-nigam/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>POTW 5/21/07: &#8220;A Study on Retrospective and On-Line Event Detection&#8221; by Yang, Pierce and Carbonell</title>
		<link>http://www.paperoftheweek.com/2007/05/28/potw-52107-a-study-on-retrospective-and-on-line-event-detection-by-yang-pierce-and-carbonell-2/</link>
		<comments>http://www.paperoftheweek.com/2007/05/28/potw-52107-a-study-on-retrospective-and-on-line-event-detection-by-yang-pierce-and-carbonell-2/#comments</comments>
		<pubDate>Tue, 29 May 2007 02:52:43 +0000</pubDate>
		<dc:creator>grant.ingersoll</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[event detection]]></category>
		<category><![CDATA[trend analysis]]></category>

		<guid isPermaLink="false">http://www.paperoftheweek.com/2007/05/28/potw-52107-a-study-on-retrospective-and-on-line-event-detection-by-yang-pierce-and-carbonell-2/</guid>
		<description><![CDATA[Because of the Memorial Day weekend and visiting family, I am going to extend this paper through this week.]]></description>
			<content:encoded><![CDATA[<p>Because of the Memorial Day weekend and visiting family, I am going to extend this paper through this week.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.paperoftheweek.com/2007/05/28/potw-52107-a-study-on-retrospective-and-on-line-event-detection-by-yang-pierce-and-carbonell-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>POTW 5/21/07: &#8220;A Study on Retrospective and On-Line Event Detection&#8221; by Yang, Pierce and Carbonell</title>
		<link>http://www.paperoftheweek.com/2007/05/21/potw-52107-a-study-on-retrospective-and-on-line-event-detection-by-yang-pierce-and-carbonell/</link>
		<comments>http://www.paperoftheweek.com/2007/05/21/potw-52107-a-study-on-retrospective-and-on-line-event-detection-by-yang-pierce-and-carbonell/#comments</comments>
		<pubDate>Tue, 22 May 2007 01:32:36 +0000</pubDate>
		<dc:creator>grant.ingersoll</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Natural Language Processing (NLP)]]></category>
		<category><![CDATA[clustering]]></category>
		<category><![CDATA[event detection]]></category>

		<guid isPermaLink="false">http://www.paperoftheweek.com/2007/05/21/potw-52107-a-study-on-retrospective-and-on-line-event-detection-by-yang-pierce-and-carbonell/</guid>
		<description><![CDATA[Paper of the Week for May 20, 2007 is &#8220;A Study on Retrospective and On-Line Event Detection&#8221; by Yiming Yang, Tom Pierce and Jaime Carbonell.]]></description>
			<content:encoded><![CDATA[<p>Paper of the Week for May 20, 2007 is &#8220;<a href="http://citeseer.ist.psu.edu/rd/0%2C51293%2C1%2C0.25%2CDownload/http://citeseer.ist.psu.edu/cache/papers/cs/1982/http:zSzzSzwww.cs.cmu.eduzSz%7EyimingzSzpapers.yyzSzsigir98.pdf/yang98study.pdf">A Study on Retrospective and On-Line Event Detection</a>&#8221; by Yiming Yang, Tom Pierce and Jaime Carbonell.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.paperoftheweek.com/2007/05/21/potw-52107-a-study-on-retrospective-and-on-line-event-detection-by-yang-pierce-and-carbonell/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>POTW 5/14/07: Discussion of &#8220;Discovering Trends in Text Databases&#8221; by Lent et al.</title>
		<link>http://www.paperoftheweek.com/2007/05/18/potw-51407-discussion-of-discovering-trends-in-text-databases-by-lent-et-al/</link>
		<comments>http://www.paperoftheweek.com/2007/05/18/potw-51407-discussion-of-discovering-trends-in-text-databases-by-lent-et-al/#comments</comments>
		<pubDate>Fri, 18 May 2007 13:28:33 +0000</pubDate>
		<dc:creator>grant.ingersoll</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Natural Language Processing (NLP)]]></category>
		<category><![CDATA[text mining]]></category>
		<category><![CDATA[trend analysis]]></category>

		<guid isPermaLink="false">http://www.paperoftheweek.com/2007/05/18/potw-51407-discussion-of-discovering-trends-in-text-databases-by-lent-et-al/</guid>
		<description><![CDATA[This week&#8217;s paper, &#8220;Discovering Trends in Text Databases&#8221; by Lent et al., is my first look at some text mining tools and applications. The paper discusses a method for identifying trends in text databases. In this case, a trend is defined as &#8220;a specific subsequence of the history of a phrase that satisfies the users&#8217; query over the [...]]]></description>
			<content:encoded><![CDATA[<p>This week&#8217;s paper, &#8220;<a href="http://citeseer.ist.psu.edu/rd/52437760%2C29718%2C1%2C0.25%2CDownload/http://citeseer.ist.psu.edu/cache/papers/cs/1451/http:zSzzSzwww.almaden.ibm.comzSzcszSzpeoplezSzragrawalzSzpaperszSzkdd97_trends.pdf/lent97discovering.pdf">Discovering Trends in Text Databases</a>&#8221; by Lent et al., is my first look at some text mining tools and applications.  The paper discusses a method for identifying trends in text databases.  In this case, a trend is defined as &#8220;a specific subsequence of the history of a phrase that satisfies the users&#8217; query over the histories&#8221;.  Essentially, the authors identify phrases in timestamped text and then match those phrases against users&#8217; queries about things like spikes in the usage of particular phrases.</p>
<p>After covering some related work on Latent Semantic Indexing (I suppose I should look into that some day), the authors delve into the methodology for identifying phrases and their histories.  There are three steps to the process: 1) identifying frequent phrases, 2) generating histories for those phrases, and 3) identifying the phrases that match a given trend.</p>
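<p>A stripped-down sketch of the first two steps might look like the following. It assumes phrases are plain contiguous n-grams (the paper&#8217;s phrases, including k-phrases, are considerably more general), counts support as the number of documents containing a phrase, and buckets documents by a given time period; the function names and documents are hypothetical.</p>

```python
from collections import Counter


def ngrams(words, n):
    """All contiguous n-word sequences in a tokenized document."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def frequent_phrases(docs, n, min_support):
    """Step 1 (simplified): n-grams that occur in at least min_support documents."""
    support = Counter()
    for _, words in docs:
        support.update(set(ngrams(words, n)))  # count a phrase once per document
    return {phrase for phrase, count in support.items() if count >= min_support}


def phrase_histories(docs, phrases, n, periods):
    """Step 2: per phrase, the number of documents containing it in each time period."""
    position = {period: i for i, period in enumerate(periods)}
    history = {phrase: [0] * len(periods) for phrase in phrases}
    for period, words in docs:
        for phrase in set(ngrams(words, n)):
            if phrase in history:
                history[phrase][position[period]] += 1
    return history


# Hypothetical timestamped documents: (year, tokens).
docs = [
    (2005, "data mining tools".split()),
    (2005, "web search engines".split()),
    (2006, "data mining for patents".split()),
    (2006, "data mining at scale".split()),
    (2007, "data mining again".split()),
]
phrases = frequent_phrases(docs, n=2, min_support=3)
print(phrases)                                                 # {('data', 'mining')}
print(phrase_histories(docs, phrases, 2, [2005, 2006, 2007]))  # {('data', 'mining'): [1, 2, 1]}
```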
<p>Phrases in this paper go beyond simple sequences of terms: the authors introduce the notion of a &#8220;k-phrase&#8221;.  A k-phrase is essentially a nesting of phrases, and k-phrases can span sentences when appropriate.  For the histories, each word gets a transaction id and associated timestamps.  Given this information, the authors use a shape query language to mine the phrases and their histories.  The shape query language lets the user express interest in phrases whose usage is &#8220;spiking&#8221; or &#8220;trending downward&#8221;, and so on.  The paper includes a reference for the shape language.</p>
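<p>For the third step, here is a toy stand-in for shape queries: two hypothetical predicates, &#8220;spiking&#8221; and &#8220;trending downward&#8221;, applied to phrase histories represented as per-period counts. The paper&#8217;s shape language is far more expressive; the definitions, thresholds, and names here are my own assumptions.</p>

```python
def is_spiking(history, factor=2.0, window=3):
    """Hypothetical 'spike' shape: the latest value is at least `factor` times
    the mean of the `window` values just before it."""
    if window >= len(history):
        return False  # not enough history to compare against
    baseline = sum(history[-1 - window:-1]) / window
    return history[-1] > 0 and history[-1] >= factor * baseline


def is_trending_down(history, length=3):
    """Hypothetical 'trending downward' shape: the last `length` values strictly decrease."""
    if length > len(history):
        return False
    tail = history[-length:]
    return all(a > b for a, b in zip(tail, tail[1:]))


def match_shape(histories, shape):
    """Return, in sorted order, the phrases whose history satisfies the shape predicate."""
    return sorted(phrase for phrase, history in histories.items() if shape(history))


# Made-up per-period document counts for three phrases.
histories = {
    "data mining": [2, 3, 2, 9],
    "expert systems": [9, 6, 4, 1],
    "neural networks": [5, 5, 5, 5],
}
print(match_shape(histories, is_spiking))        # ['data mining']
print(match_shape(histories, is_trending_down))  # ['expert systems']
```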
<p>The paper ends with a discussion of how IBM used the approach in a patent mining system to identify trends in patents from the US Patent Office.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.paperoftheweek.com/2007/05/18/potw-51407-discussion-of-discovering-trends-in-text-databases-by-lent-et-al/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>POTW 5/14/07: &#8220;Discovering Trends in Text Databases&#8221; by Lent et al.</title>
		<link>http://www.paperoftheweek.com/2007/05/14/potw-51407-discovering-trends-in-text-databases-by-lent-et-al/</link>
		<comments>http://www.paperoftheweek.com/2007/05/14/potw-51407-discovering-trends-in-text-databases-by-lent-et-al/#comments</comments>
		<pubDate>Mon, 14 May 2007 13:37:37 +0000</pubDate>
		<dc:creator>grant.ingersoll</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Natural Language Processing (NLP)]]></category>
		<category><![CDATA[text mining]]></category>
		<category><![CDATA[trend analysis]]></category>

		<guid isPermaLink="false">http://www.paperoftheweek.com/2007/05/14/potw-51407-discovering-trends-in-text-databases-by-lent-et-al/</guid>
		<description><![CDATA[Ah, good to be back!  This week&#8217;s paper is &#8220;Discovering Trends in Text Databases&#8221; by Brian Lent, Rakesh Agrawal and Ramakrishnan Srikant.]]></description>
			<content:encoded><![CDATA[<p>Ah, good to be back!  This week&#8217;s paper is &#8220;<a href="http://citeseer.ist.psu.edu/rd/52437760%2C29718%2C1%2C0.25%2CDownload/http://citeseer.ist.psu.edu/cache/papers/cs/1451/http:zSzzSzwww.almaden.ibm.comzSzcszSzpeoplezSzragrawalzSzpaperszSzkdd97_trends.pdf/lent97discovering.pdf">Discovering Trends in Text Databases</a>&#8221; by Brian Lent, Rakesh Agrawal and Ramakrishnan Srikant.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.paperoftheweek.com/2007/05/14/potw-51407-discovering-trends-in-text-databases-by-lent-et-al/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>POTW to return next week</title>
		<link>http://www.paperoftheweek.com/2007/05/08/potw-to-return-next-week/</link>
		<comments>http://www.paperoftheweek.com/2007/05/08/potw-to-return-next-week/#comments</comments>
		<pubDate>Tue, 08 May 2007 12:39:57 +0000</pubDate>
		<dc:creator>grant.ingersoll</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Natural Language Processing (NLP)]]></category>

		<guid isPermaLink="false">http://www.paperoftheweek.com/2007/05/08/potw-to-return-next-week/</guid>
		<description><![CDATA[I will be returning to writing next week after a great ApacheCon Europe conference last week.  My &#8220;Advanced Lucene&#8221; slides are available at http://www.cnlp.org/presentations/present.asp?show=conference Next week, I think I am going to start looking into things like event detection, etc.  However, I am also considering looking into some non-NLP areas related to data mining, so [...]]]></description>
			<content:encoded><![CDATA[<p>I will be returning to writing next week after a great <a href="http://www.eu.apachecon.com">ApacheCon Europe</a> conference last week.  My &#8220;Advanced Lucene&#8221; slides are available at <a href="http://www.cnlp.org/presentations/present.asp?show=conference">http://www.cnlp.org/presentations/present.asp?show=conference</a></p>
<p>Next week, I think I am going to start looking into topics like event detection.  However, I am also considering looking into some non-NLP areas related to data mining, so if you have a preference, let me know.  I am open to exploring many new ideas in the field.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.paperoftheweek.com/2007/05/08/potw-to-return-next-week/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
