<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Paper of the Week &#187; classification</title>
	<atom:link href="http://www.paperoftheweek.com/category/classification/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.paperoftheweek.com</link>
	<description>Read. Learn. Discuss.</description>
	<lastBuildDate>Tue, 14 Aug 2007 01:35:31 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>POTW 6/24/07: &#8220;Support-Vector Networks&#8221; by Cortes and Vapnik</title>
		<link>http://www.paperoftheweek.com/2007/06/25/potw-62407-support-vector-networks-by-cortes-and-vapnik/</link>
		<comments>http://www.paperoftheweek.com/2007/06/25/potw-62407-support-vector-networks-by-cortes-and-vapnik/#comments</comments>
		<pubDate>Mon, 25 Jun 2007 18:27:22 +0000</pubDate>
		<dc:creator>grant.ingersoll</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Natural Language Processing (NLP)]]></category>
		<category><![CDATA[SVM]]></category>
		<category><![CDATA[Statistical Approach]]></category>
		<category><![CDATA[Text Categorization]]></category>
		<category><![CDATA[classification]]></category>
		<category><![CDATA[support vector machines]]></category>
		<category><![CDATA[text mining]]></category>

		<guid isPermaLink="false">http://www.paperoftheweek.com/2007/06/25/potw-62407-support-vector-networks-by-cortes-and-vapnik/</guid>
		<description><![CDATA[Long paper this week, but it's the original paper on Support Vector Machines: Support-Vector Networks by Cortes and Vapnik.  Given my schedule, I may spread this out over two weeks.]]></description>
			<content:encoded><![CDATA[<p>Long paper this week, but it's the original paper on Support Vector Machines: <a href="http://citeseer.ist.psu.edu/rd/0%2C500489%2C1%2C0.25%2CDownload/http://citeseer.ist.psu.edu/cache/papers/cs/23317/http:zSzzSzwww.research.att.comzSz%7EcorinnazSzpaperszSzsupport.vector.pdf/cortes95supportvector.pdf">Support-Vector Networks</a> by Cortes and Vapnik.  Given my schedule, I may spread this out over two weeks.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.paperoftheweek.com/2007/06/25/potw-62407-support-vector-networks-by-cortes-and-vapnik/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>POTW 6/11/07: Discussion of &#8220;A Sequential Algorithm for Training Text Classifiers&#8221; by Lewis and Gale</title>
		<link>http://www.paperoftheweek.com/2007/06/17/potw-61107-discussion-of-a-sequential-algorithm-for-training-text-classifiers-by-lewis-and-gale/</link>
		<comments>http://www.paperoftheweek.com/2007/06/17/potw-61107-discussion-of-a-sequential-algorithm-for-training-text-classifiers-by-lewis-and-gale/#comments</comments>
		<pubDate>Mon, 18 Jun 2007 03:58:12 +0000</pubDate>
		<dc:creator>grant.ingersoll</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Information Retrieval]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Natural Language Processing (NLP)]]></category>
		<category><![CDATA[Statistical Approach]]></category>
		<category><![CDATA[Text Categorization]]></category>
		<category><![CDATA[classification]]></category>
		<category><![CDATA[naive bayes]]></category>

		<guid isPermaLink="false">http://www.paperoftheweek.com/2007/06/17/potw-61107-discussion-of-a-sequential-algorithm-for-training-text-classifiers-by-lewis-and-gale/</guid>
		<description><![CDATA[In &#8220;A Sequential Algorithm for Training Text Classifiers&#8221; by David D. Lewis and William Gale, the authors put forth a new (at the time) method for training text classifiers using an approach they call &#8220;uncertainty sampling.&#8221; Section 1 outlines the problem of training, namely obtaining a good sample of text to be labeled for the trainer. [...]]]></description>
			<content:encoded><![CDATA[<p>In &#8220;A Sequential Algorithm for Training Text Classifiers&#8221; by David D. Lewis and William Gale, the authors put forth a new (at the time) method for training text classifiers using an approach they call &#8220;uncertainty sampling.&#8221;</p>
<p>Section 1 outlines the problem of training, namely obtaining a good sample of text to be labeled for the trainer.  After disposing of several other ways of gathering samples (random sampling, relevance-feedback-based sampling), Lewis and Gale introduce an iterative approach for manually labeling examples.</p>
<p>Section 2 then discusses the theoretical benefits of &#8220;learning by query,&#8221; namely the possibility of reducing the error rate very quickly relative to the number of queries required.</p>
<p>Figure 1 (described in section 3) outlines their basic approach, which relies on having a human judge a subset of the examples that the current classifier is least certain about.  This process is repeated until the human is satisfied with the results.  One caveat of this approach is that the classifier must not only predict the class but also give a measure of how certain it is of that class.</p>
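<p>To make the loop in Figure 1 concrete, here is a minimal Python sketch, assuming a two-class problem where &#8220;least certain&#8221; means an estimated probability closest to 0.5.  Everything in it is a hypothetical stand-in rather than the authors' code: <code>ThresholdClassifier</code> is a toy one-dimensional classifier (not the classifier from the paper), <code>uncertainty_sampling</code> is an illustrative name, and the human judge is replaced by an <code>oracle</code> function.</p>

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class ThresholdClassifier:
    """Toy 1-D classifier: P(positive | x) rises smoothly around a learned threshold."""

    def __init__(self, scale: float = 0.05):
        self.scale = scale
        self.threshold = 0.5

    def fit(self, xs, ys):
        # Place the threshold midway between the largest negative and smallest positive example.
        neg = [x for x, y in zip(xs, ys) if not y]
        pos = [x for x, y in zip(xs, ys) if y]
        lo = max(neg) if neg else 0.0
        hi = min(pos) if pos else 1.0
        self.threshold = (lo + hi) / 2.0
        return self

    def prob(self, x: float) -> float:
        # Estimated probability that x belongs to the positive class.
        return sigmoid((x - self.threshold) / self.scale)


def uncertainty_sampling(pool, oracle, seed, rounds, batch_size=1):
    """Repeatedly train, then ask the oracle to label the examples the classifier is least sure of."""
    labeled = {i: oracle(pool[i]) for i in seed}
    clf = ThresholdClassifier()
    for _ in range(rounds):
        clf.fit([pool[i] for i in labeled], [labeled[i] for i in labeled])
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        # Least certain = estimated probability closest to 0.5.
        unlabeled.sort(key=lambda i: abs(clf.prob(pool[i]) - 0.5))
        for i in unlabeled[:batch_size]:
            labeled[i] = oracle(pool[i])  # stands in for the human judge
    clf.fit([pool[i] for i in labeled], [labeled[i] for i in labeled])
    return clf, labeled


if __name__ == "__main__":
    pool = [i / 100 for i in range(101)]
    clf, labeled = uncertainty_sampling(pool, oracle=lambda x: x > 0.305, seed=[0, 100], rounds=10)
    print(f"threshold={clf.threshold:.3f} after {len(labeled)} labels")
```

<p>On this toy pool of 101 points, ten rounds of queries pin the threshold down to about 0.305 using 12 labels instead of 101.  The point is the shape of the loop (fit, score the unlabeled pool, ask about the least certain examples, repeat), not the toy classifier.</p>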
<p>Continuing into section 4, we learn how to build a classifier and use uncertainty sampling to train it.  Most of the section details the probability theory behind the classifier, finishing up with how to do the sampling.  One thing I always wish for in these papers is concrete examples (maybe as an appendix or a reference) that work through the math on an actual toy problem.  Section 5 does something close to this, laying out an experiment and discussing the details, minus the math, which probably suits most people just fine.</p>
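<p>Uncertainty sampling needs a classifier that outputs a probability, not just a label.  As a rough illustration only (a sketch in the spirit of section 4, not the exact model or estimates from the paper; check the section itself for those), one way to get such a number is to add up naive Bayes log-likelihood ratios for a document's words and squash the total through a logistic function.  The calibration parameters <code>a</code> and <code>b</code>, the word-probability tables, and the function names are all hypothetical.</p>

```python
import math


def logistic(z: float) -> float:
    """Numerically stable 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_likelihood_ratio(doc_words, p_in, p_out, default=1e-3):
    """Sum over the document's words of log(P(w | class) / P(w | not class)).

    p_in / p_out map words to estimated probabilities; unseen words get the same
    hypothetical default in both tables, so they contribute nothing.
    """
    return sum(math.log(p_in.get(w, default) / p_out.get(w, default)) for w in doc_words)


def prob_in_class(doc_words, p_in, p_out, a=0.0, b=1.0):
    """Estimated P(class | document); a and b are assumed, already-fitted calibration parameters."""
    return logistic(a + b * log_likelihood_ratio(doc_words, p_in, p_out))


if __name__ == "__main__":
    p_in = {"goal": 0.20, "vote": 0.01}
    p_out = {"goal": 0.05, "vote": 0.10}
    for doc in (["goal"], ["vote"], ["goal", "vote"]):
        p = prob_in_class(doc, p_in, p_out)
        # Documents with p near 0.5 are the ones uncertainty sampling would ask about.
        print(doc, round(p, 3), "distance from 0.5:", round(abs(p - 0.5), 3))
```

<p>With <code>a = 0</code> and <code>b = 1</code> this is just the two-class naive Bayes posterior with equal priors; fitting <code>a</code> and <code>b</code> is one way to correct the overconfident probabilities naive Bayes tends to produce.</p>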
<p>Section 7 has an excellent discussion of the results, the pay dirt being that this new method significantly reduces the number of examples required for training, at the cost of having a human in the loop.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.paperoftheweek.com/2007/06/17/potw-61107-discussion-of-a-sequential-algorithm-for-training-text-classifiers-by-lewis-and-gale/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>POTW 6/11/07: &#8220;A Sequential Algorithm for Training Text Classifiers&#8221; by Lewis and Gale</title>
		<link>http://www.paperoftheweek.com/2007/06/11/potw-61107-a-sequential-algorithm-for-training-text-classifiers-by-lewis-and-gale/</link>
		<comments>http://www.paperoftheweek.com/2007/06/11/potw-61107-a-sequential-algorithm-for-training-text-classifiers-by-lewis-and-gale/#comments</comments>
		<pubDate>Mon, 11 Jun 2007 12:18:22 +0000</pubDate>
		<dc:creator>grant.ingersoll</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Natural Language Processing (NLP)]]></category>
		<category><![CDATA[Statistical Approach]]></category>
		<category><![CDATA[Text Categorization]]></category>
		<category><![CDATA[classification]]></category>
		<category><![CDATA[naive bayes]]></category>

		<guid isPermaLink="false">http://www.paperoftheweek.com/2007/06/11/potw-61107-a-sequential-algorithm-for-training-text-classifiers-by-lewis-and-gale/</guid>
		<description><![CDATA[More on text classification: &#8220;A Sequential Algorithm for Training Text Classifiers&#8221; by David Lewis and William Gale.  A little bit of an older paper, but still looks to be a good one.]]></description>
			<content:encoded><![CDATA[<p>More on text classification: &#8220;<a href="http://citeseer.ist.psu.edu/rd/52437760%2C100508%2C1%2C0.25%2CDownload/http://coblitz.codeen.org:3125/citeseer.ist.psu.edu/cache/papers/cs/508/http:zSzzSzwww.research.att.comzSz%7ElewiszSzpaperszSzlewis94c.pdf/lewis94sequential.pdf">A Sequential Algorithm for Training Text Classifiers</a>&#8221; by David Lewis and William Gale.  A little bit of an older paper, but still looks to be a good one.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.paperoftheweek.com/2007/06/11/potw-61107-a-sequential-algorithm-for-training-text-classifiers-by-lewis-and-gale/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>POTW 6/3/07: Discussion of &#8220;A Comparison of Event Models for Naive Bayes Text Classification&#8221; by Andrew McCallum and Kamal Nigam</title>
		<link>http://www.paperoftheweek.com/2007/06/09/potw-6307-discussion-of-a-comparison-of-event-models-for-naive-bayes-text-classification-by-andrew-mccallum-and-kamal-nigam/</link>
		<comments>http://www.paperoftheweek.com/2007/06/09/potw-6307-discussion-of-a-comparison-of-event-models-for-naive-bayes-text-classification-by-andrew-mccallum-and-kamal-nigam/#comments</comments>
		<pubDate>Sun, 10 Jun 2007 01:34:53 +0000</pubDate>
		<dc:creator>grant.ingersoll</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Natural Language Processing (NLP)]]></category>
		<category><![CDATA[Statistical Approach]]></category>
		<category><![CDATA[classification]]></category>
		<category><![CDATA[naive bayes]]></category>

		<guid isPermaLink="false">http://www.paperoftheweek.com/2007/06/09/potw-6307-discussion-of-a-comparison-of-event-models-for-naive-bayes-text-classification-by-andrew-mccallum-and-kamal-nigam/</guid>
		<description><![CDATA[We are reading &#8220;A Comparison of Event Models for Naive Bayes Text Classification&#8221; by McCallum and Nigam. Text classification is the process of assigning a document to one or more categories (we looked at classification/categorization earlier when exploring Support Vector Machines, SVMs).  My understanding of the difference between categorization and classification is that categorization has [...]]]></description>
			<content:encoded><![CDATA[<p>We are reading &#8220;<a href="http://citeseer.ist.psu.edu/rd/0%2C489994%2C1%2C0.25%2CDownload/http://coblitz.codeen.org:3125/citeseer.ist.psu.edu/cache/papers/cs/24415/http:zSzzSzlans.ece.utexas.eduzSzulgzSzpaperszSznigam-mccallum-bayes.pdf/mccallum98comparison.pdf">A Comparison of Event Models for Naive Bayes Text Classification</a>&#8221; by McCallum and Nigam.</p>
<p>Text classification is the process of assigning a document to one or more categories (we looked at classification/categorization earlier when exploring Support Vector Machines, SVMs).  My understanding of the difference between categorization and classification is that categorization has a set number of categories, whereas classification does not.  At any rate, this paper compares two different event models for a <a href="http://en.wikipedia.org/wiki/Naive_Bayes_classifier">naive Bayes</a> classifier.  The naive Bayes approach assumes that all attributes of the examples we are studying are independent of each other given the class.  Even though this is rarely true for real-world text (after all, if I chose all the words of this post independently, we would probably have gibberish), it turns out to work pretty well in practice.  But I digress&#8230;</p>
<p>The two approaches being compared are the multi-variate Bernoulli model and the multinomial model.  The Bernoulli model treats the document as the event and builds a vector of binary attributes recording whether or not each vocabulary term occurs in the document.  It DOES NOT take into account the number of times a word occurs.  In the multinomial model, each word occurrence is an event, so term frequency does matter.  The next couple of sections after the introduction lay out the common ground between the two approaches as well as the differences, which come down to how the probabilities are calculated.</p>
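<p>To see where the two event models part ways, here is a minimal Python sketch of both, assuming tokenized documents, a fixed vocabulary, and add-one (Laplace) smoothing.  The function names are hypothetical and this is my reading of the two models, not the authors' code.  The Bernoulli score walks the whole vocabulary and only asks whether each word is present; the multinomial score walks the document's word occurrences, so a repeated word counts every time.</p>

```python
import math
from collections import Counter


def train(docs, labels, vocab):
    """Estimate class priors and per-class word parameters for both event models (add-one smoothing)."""
    model = {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        prior = len(class_docs) / len(docs)
        # Bernoulli: (smoothed) fraction of class documents that contain each word at all.
        doc_freq = Counter(w for d in class_docs for w in set(d))
        bern = {w: (1 + doc_freq[w]) / (2 + len(class_docs)) for w in vocab}
        # Multinomial: (smoothed) fraction of all word occurrences in the class that are w.
        term_freq = Counter(w for d in class_docs for w in d)
        total = sum(term_freq[w] for w in vocab)
        multi = {w: (1 + term_freq[w]) / (len(vocab) + total) for w in vocab}
        model[c] = (prior, bern, multi)
    return model


def bernoulli_log_score(doc, prior, bern, vocab):
    # Every vocabulary word contributes, present or absent; how often it occurs is ignored.
    present = set(doc)
    return math.log(prior) + sum(
        math.log(bern[w]) if w in present else math.log(1.0 - bern[w]) for w in vocab
    )


def multinomial_log_score(doc, prior, multi, vocab):
    # Only occurring words contribute, once per occurrence; class-independent terms are dropped.
    return math.log(prior) + sum(math.log(multi[w]) for w in doc if w in vocab)


def classify(doc, model, vocab, event_model):
    def score(c):
        prior, bern, multi = model[c]
        if event_model == "bernoulli":
            return bernoulli_log_score(doc, prior, bern, vocab)
        return multinomial_log_score(doc, prior, multi, vocab)

    return max(model, key=score)


if __name__ == "__main__":
    docs = [["goal", "goal", "ball"], ["ball", "team"], ["vote", "election"], ["vote", "party", "party"]]
    labels = ["sports", "sports", "politics", "politics"]
    vocab = sorted({w for d in docs for w in d})
    model = train(docs, labels, vocab)
    for em in ("bernoulli", "multinomial"):
        print(em, classify(["goal", "ball"], model, vocab, em), classify(["vote", "vote"], model, vocab, em))
```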
<p>There is also some interesting discussion of feature selection using mutual information that is worth digging into if you have the time.  Feature selection reduces the size of the vocabulary, which speeds things up, hopefully without losing too much information.</p>
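<p>As a rough sketch of mutual-information feature selection, assuming each word is treated as a binary present/absent variable per document (one common formulation; check the paper for exactly how the authors compute it), the score below is the mutual information, in nats, between a word's presence and the class label, and only the top <code>k</code> words are kept.  The function names are hypothetical.</p>

```python
import math
from collections import Counter


def mutual_information(docs, labels, word):
    """I(W; C) in nats, where W is whether `word` appears in a document and C is its class."""
    n = len(docs)
    presence = [word in set(d) for d in docs]
    joint = Counter(zip(presence, labels))
    p_word = Counter(presence)
    p_class = Counter(labels)
    mi = 0.0
    for (present, c), count in joint.items():
        p_joint = count / n
        # Only observed (presence, class) pairs are summed; empty cells contribute 0.
        mi += p_joint * math.log(p_joint / ((p_word[present] / n) * (p_class[c] / n)))
    return mi


def select_features(docs, labels, vocab, k):
    """Keep the k vocabulary words with the highest mutual information with the class."""
    scores = {w: mutual_information(docs, labels, w) for w in vocab}
    return sorted(vocab, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    docs = [["the", "goal", "ball"], ["the", "ball", "team"],
            ["the", "vote", "election"], ["the", "vote", "party"]]
    labels = ["sports", "sports", "politics", "politics"]
    vocab = sorted({w for d in docs for w in d})
    for w in vocab:
        print(w, round(mutual_information(docs, labels, w), 3))
    print(select_features(docs, labels, vocab, 2))
```

<p>In this toy corpus &#8220;the&#8221; appears in every document and scores 0, while &#8220;ball&#8221; and &#8220;vote&#8221; each separate the two classes perfectly and score log 2 (about 0.693), so they are the two words kept.</p>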
<p>The next sections are where the rubber meets the road: the authors do a side-by-side comparison of the two approaches using five different collections.  You can see the results on pages 5 and 6.  Finally, the results are discussed on pages 6 and 7, with the bottom line being that the multinomial model seems to &#8220;be almost uniformly better than the multi-variate Bernoulli model.&#8221;</p>
<p>For those interested, <a href="http://www.cs.waikato.ac.nz/~ml/weka/index.html">Weka</a> has tools for building Naive Bayes classifiers.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.paperoftheweek.com/2007/06/09/potw-6307-discussion-of-a-comparison-of-event-models-for-naive-bayes-text-classification-by-andrew-mccallum-and-kamal-nigam/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>POTW 6/3/07: &#8220;A Comparison of Event Models for Naive Bayes Text Classification&#8221; by Andrew McCallum and Kamal Nigam</title>
		<link>http://www.paperoftheweek.com/2007/06/04/potw-6307-a-comparison-of-event-models-for-naive-bayes-text-classification-by-andrew-mccallum-and-kamal-nigam/</link>
		<comments>http://www.paperoftheweek.com/2007/06/04/potw-6307-a-comparison-of-event-models-for-naive-bayes-text-classification-by-andrew-mccallum-and-kamal-nigam/#comments</comments>
		<pubDate>Mon, 04 Jun 2007 12:42:56 +0000</pubDate>
		<dc:creator>grant.ingersoll</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Natural Language Processing (NLP)]]></category>
		<category><![CDATA[Statistical Approach]]></category>
		<category><![CDATA[Text Categorization]]></category>
		<category><![CDATA[classification]]></category>
		<category><![CDATA[naive bayes]]></category>

		<guid isPermaLink="false">http://www.paperoftheweek.com/2007/06/04/potw-6307-a-comparison-of-event-models-for-naive-bayes-text-classification-by-andrew-mccallum-and-kamal-nigam/</guid>
		<description><![CDATA[Paper of the week for the week of June 3, 2007 is &#8220;A Comparison of Event Models for Naive Bayes Text Classification&#8221; by Andrew McCallum and Kamal Nigam. This paper promises to shed some light on different ways of using Bayesian classifiers. It might be useful to do some background reading on naive Bayes starting [...]]]></description>
			<content:encoded><![CDATA[<p>Paper of the week for the week of June 3, 2007 is &#8220;<a href="http://citeseer.ist.psu.edu/rd/0%2C489994%2C1%2C0.25%2CDownload/http://coblitz.codeen.org:3125/citeseer.ist.psu.edu/cache/papers/cs/24415/http:zSzzSzlans.ece.utexas.eduzSzulgzSzpaperszSznigam-mccallum-bayes.pdf/mccallum98comparison.pdf">A Comparison of Event Models for Naive Bayes Text Classification</a>&#8221; by Andrew McCallum and Kamal Nigam.  This paper promises to shed some light on different ways of using Bayesian classifiers.  It might be useful to do some background reading on naive Bayes starting <a href="http://en.wikipedia.org/wiki/Naive_Bayes_classifier">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.paperoftheweek.com/2007/06/04/potw-6307-a-comparison-of-event-models-for-naive-bayes-text-classification-by-andrew-mccallum-and-kamal-nigam/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
