<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: An Evaluation of Statistical Approaches to Text Categorization &#8211; Yang (ResearchIndex)</title>
	<atom:link href="http://www.paperoftheweek.com/2007/01/08/an-evaluation-of-statistical-approaches-to-text-categorization-yang-researchindex/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.paperoftheweek.com/2007/01/08/an-evaluation-of-statistical-approaches-to-text-categorization-yang-researchindex/</link>
	<description>Read. Learn. Discuss.</description>
	<lastBuildDate>Mon, 18 Jun 2007 15:05:50 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: Paper of the Week &#187; Blog Archive &#187; Discussion of Sections 5-7 of Yang 97</title>
		<link>http://www.paperoftheweek.com/2007/01/08/an-evaluation-of-statistical-approaches-to-text-categorization-yang-researchindex/comment-page-1/#comment-4</link>
		<dc:creator>Paper of the Week &#187; Blog Archive &#187; Discussion of Sections 5-7 of Yang 97</dc:creator>
		<pubDate>Sat, 13 Jan 2007 02:12:05 +0000</pubDate>
		<guid isPermaLink="false">http://www.paperoftheweek.com/2007/01/08/an-evaluation-of-statistical-approaches-to-text-categorization-yang-researchindex/#comment-4</guid>
		<description>[...] Whew, I think we&#8217;ve made it through our first paper, or we are about to, anyway. If you recall, we are working our way through Yang 97 and have made it through the first four sections so far, which are covered here. This leaves us with the meat of the paper, I guess: the actual experiments. [...]</description>
		<content:encoded><![CDATA[<p>[...] Whew, I think we&#8217;ve made it through our first paper, or we are about to, anyway. If you recall, we are working our way through Yang 97 and have made it through the first four sections so far, which are covered here. This leaves us with the meat of the paper, I guess: the actual experiments. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Paper of the Week &#187; Blog Archive &#187; Discussion of Sections 1-4 of Yang 97</title>
		<link>http://www.paperoftheweek.com/2007/01/08/an-evaluation-of-statistical-approaches-to-text-categorization-yang-researchindex/comment-page-1/#comment-2</link>
		<dc:creator>Paper of the Week &#187; Blog Archive &#187; Discussion of Sections 1-4 of Yang 97</dc:creator>
		<pubDate>Thu, 11 Jan 2007 02:31:33 +0000</pubDate>
		<guid isPermaLink="false">http://www.paperoftheweek.com/2007/01/08/an-evaluation-of-statistical-approaches-to-text-categorization-yang-researchindex/#comment-2</guid>
		<description>[...] So, hopefully everyone has read the paper (http://www.paperoftheweek.com/2007/01/08/an-evaluation-of-statistical-approaches-to-text-categorization-yang-researchindex/) at least once. The first four sections are fairly easy to follow, in my opinion, as they define the problem of text categorization and lay the framework for the experiments. Digging into the details of the various implementations will be left as an exercise for the reader at this point (damn, I&#8217;ve always wanted to say that ever since wading through math proofs back in college). In a nutshell, the author sets out to evaluate 14 different text categorization methods on a few &#8220;standard&#8221; collections. The real effort here is to compare apples to apples, since prior research on these systems used a variety of evaluation approaches, which prevented direct comparison. In Section 3.2, note the discussion of how the corpora are used. I am sure that, as we go forward on many of these topics, we will come across these corpora again and again, especially the Reuters collection (heck, we even use it for benchmarking in Lucene). Section 4, on performance measures, covers another topic that recurs throughout the literature: evaluation. Recall and precision are very common measures, and the paper does a good job of explaining how recall, precision, the break-even point and the F measure are derived from the truth table generated by a binary classifier. The recall and precision definitions are worth repeating here, I think, as we will see many variations of them when discussing IR, etc.: [...]</description>
		<content:encoded><![CDATA[<p>[...] So, hopefully everyone has read the paper (<a href="http://www.paperoftheweek.com/2007/01/08/an-evaluation-of-statistical-approaches-to-text-categorization-yang-researchindex/" rel="nofollow">http://www.paperoftheweek.com/2007/01/08/an-evaluation-of-statistical-approaches-to-text-categorization-yang-researchindex/</a>) at least once. The first four sections are fairly easy to follow, in my opinion, as they define the problem of text categorization and lay the framework for the experiments. Digging into the details of the various implementations will be left as an exercise for the reader at this point (damn, I&#8217;ve always wanted to say that ever since wading through math proofs back in college). In a nutshell, the author sets out to evaluate 14 different text categorization methods on a few &#8220;standard&#8221; collections. The real effort here is to compare apples to apples, since prior research on these systems used a variety of evaluation approaches, which prevented direct comparison. In Section 3.2, note the discussion of how the corpora are used. I am sure that, as we go forward on many of these topics, we will come across these corpora again and again, especially the Reuters collection (heck, we even use it for benchmarking in Lucene). Section 4, on performance measures, covers another topic that recurs throughout the literature: evaluation. Recall and precision are very common measures, and the paper does a good job of explaining how recall, precision, the break-even point and the F measure are derived from the truth table generated by a binary classifier. The recall and precision definitions are worth repeating here, I think, as we will see many variations of them when discussing IR, etc.: [...]</p>
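<p>For readers who want to play with these measures, here is a minimal Python sketch of deriving them from a binary classifier's truth table. It is not Yang's code: the cell names tp (assigned and correct), fp (assigned but wrong) and fn (missed) and all function names are hypothetical, and the break-even routine (sweep a threshold down the ranked scores and average precision and recall where they are closest) is one common approximation, assumed here rather than taken from the paper.</p>

```python
from typing import List, Tuple


def precision(tp: int, fp: int) -> float:
    # Of the documents assigned to the category, the fraction that truly belong to it.
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    # Of the documents that truly belong to the category, the fraction that were assigned.
    return tp / (tp + fn) if tp + fn else 0.0


def f_measure(p: float, r: float, beta: float = 1.0) -> float:
    # Weighted harmonic mean of precision and recall; beta = 1 gives F1 = 2pr / (p + r).
    if p == 0.0 and r == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


def break_even(scored: List[Tuple[float, bool]]) -> float:
    # Assumed approximation: rank (score, is_relevant) pairs by score, try every cutoff,
    # and return the average of precision and recall at the cutoff where they are closest.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    total_relevant = sum(1 for _, rel in ranked if rel)
    best_gap, best_value = float("inf"), 0.0
    tp = 0
    for k, (_, rel) in enumerate(ranked, start=1):
        if rel:
            tp += 1
        p = tp / k  # precision when the top k documents are assigned
        r = tp / total_relevant if total_relevant else 0.0
        gap = abs(p - r)
        if gap < best_gap:
            best_gap, best_value = gap, (p + r) / 2
    return best_value


if __name__ == "__main__":
    p, r = precision(tp=8, fp=2), recall(tp=8, fn=4)
    print(round(p, 3), round(r, 3), round(f_measure(p, r), 3))
```

<p>For example, a category with 8 correct assignments, 2 false alarms and 4 misses gives precision 0.8, recall of about 0.667 and F1 of about 0.727. Raising beta above 1 weights recall more heavily than precision.</p>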
]]></content:encoded>
	</item>
</channel>
</rss>

