An Evaluation of Statistical Approaches to Text Categorization – Yang (ResearchIndex) is the first paper of the week that I am going to tackle.
Finding a first paper was harder than I thought. I originally intended to start with some of the seminal papers in IR, such as the work of Salton or Spärck Jones, but it seems many of these works are “locked up” in books that I don’t have access to at the moment. I guess I may need to renew my ACM membership to get access to more content, or see if the local library has them. In the long run, I think I will probably end up reading book chapters as well, but I first want to start with what is freely available on the web.
This paper, however, strikes me as a good intro to some of the statistical approaches to text categorization, so it seems like a good starting place for me. It comes recommended by Foundations of Statistical Natural Language Processing by Manning and Schütze (Chapter 16), which is the book we used when I took Dr. Liddy’s NLP course at Syracuse University a few years back.