Collection of Articles On Text Categorization

The following articles helped me a lot in my work on Text Classification. Only the titles are listed here, as I didn't want to break any copyright laws, but you can find most of these papers by searching for their titles on Google.

Section 0

Rennie, Rifkin: Improving Multiclass Text Classification with the Support Vector Machine (Oct. 2001) (using 20 Newsgroups Data Set)
Georges Siolas, Florence d'Alche-Buc: Support Vector Machines based on a Semantic Kernel for Text Categorization (using 20 Newsgroups Data Set)
Burges: A Tutorial on Support Vector Machines
Osuna et al.: Support Vector Machines, Training and Applications
Ngai Tang: Text Categorisation using Support Vector Machines (interesting dissertation, 30 August 2001)
Section 1

Domingos, Pazzani: On the Optimality of the Simple Bayesian Classifier under Zero-One Loss
Fabrizio Sebastiani: A Tutorial on Automated Text Categorisation
Fabrizio Sebastiani: Machine Learning in Automated Text Categorization
Fabrizio Sebastiani: Machine Learning in Automated Text Categorization (differently formatted, i.e. 55 pages instead of 63)
Galavotti, Sebastiani, Simi: Experiments on the Use of Feature Selection and Negative Evidence in Automated Text Categorization
Automatic Web Page Categorization by Link and Context Analysis
Categorisation by Context
Guest Editors' Introduction to the Special Issue on Automated Text Categorization
Caropreso, Matwin, Sebastiani: A Learner-Independent Evaluation of the Usefulness of Statistical Phrases for Automated Text Categorization
Lewis et al.: Naive (Bayes) at Forty
Section 2

Jason D.M. Rennie: Improving Multi-class Text Classification with Naive Bayes (Master's Thesis)
McCallum, Nigam, Rennie, Seymore: A Machine Learning Approach to Building Domain-Specific Search Engines
Nigam, McCallum, Thrun, Mitchell: Learning to Classify Text from Labeled and Unlabeled Documents
Nigam, McCallum, Thrun, Mitchell: Learning to Classify Text from Labeled and Unlabeled Documents (condensed version)
McCallum, Nigam: A Comparison of Event Models for Naive Bayes Text Classification
McCallum: Multi-Label Text Classification with a Mixture Model Trained by EM
McCallum, Nigam: Employing EM and Pool-Based Active Learning for Text Classification
Craven, DiPasquo, Freitag, McCallum, Mitchell, Nigam, Slattery: Learning to Extract Symbolic Knowledge from the WWW
Baker, McCallum: Distributional Clustering of Words for Text Classification (newer?)
Baker, McCallum: Distributional Clustering of Words for Text Classification
Using Maximum Entropy for Text Classification
Andrew McCallum, Dayne Freitag, Fernando Pereira: Maximum Entropy Markov Models for Information Extraction and Segmentation
D'Alessio, Murray, Schiaffino: The Effect of Using Hierarchical Classifiers in Text Categorization
Section 3

David Yarowsky: Word-Sense Disambiguation Using Statistical Models of Roget's Categories Trained on Large Corpora
Ide, Veronis: Word Sense Disambiguation: The State of the Art
Schütze: Automatic Word Sense Discrimination
Mladenic, Grobelnik: Word Sequences as Features in Text-Learning
Yang et al.: Learning Approaches for Detecting and Tracking News Events
Section 4

Apte, Damerau, Weiss: Automated Learning of Decision Rules for Text Categorization
Susan Dumais, Hao Chen: Hierarchical Classification of Web Content
Section 5

Lewis, Jones: Natural Language Processing for Information Retrieval
Wiener, Pedersen, Weigend: A Neural Network Approach to Topic Spotting
Gorniak, Peter: Sorting Email Messages by Topic
Gorniak, Peter: MailMind, A Connectionist E-Mail Sorting Client
Section 6

Vijay Boyapati: Towards a Comprehensive Topic Hierarchy for News
Moulinier, Raskinis, Ganascia: Text Categorization: a Symbolic Approach
Quasthoff, Wolff: Effizientes Dokumentclustering durch niederfrequente Terme
Section 7

Yang, Pedersen: A Comparative Study on Feature Selection in Text Categorization
Yang, Liu: A re-examination of Text Categorisation Methods
Improving Text Classification by Shrinkage in a Hierarchy of Classes
John, Kohavi, Pfleger: Irrelevant Features and the Subset Selection Problem
Martijn Spitters: Comparing feature sets for learning text categorization
Ellen Riloff: Little Words Can Make a Big Difference
Fuka, Hanka: Feature Set Reduction for Document Classification Problems
Feature subset selection in text-learning
Ruiz, Srinivasan: Hierarchical Neural Networks for Text Categorization
Section 8 (some only in print)

An Algorithm for Suffix Stripping
Hsu, Lang: Feature Reduction and Database Maintenance in NETNEWS Classification
Thomas Hofmann: Learning and Representing Topic
Seminar paper: Advanced Information Retrieval Methods
Section 9

Sam Scott: Feature Engineering for a Symbolic Approach to Text Classification
Kermit et al.: Automatic Complexity Management: Personalised Document Retrieval from the World Wide Web
Michie et al.: Machine Learning, Neural and Statistical Classification (298 pages! Review of different approaches to text classification)
Joachims: A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization
Meghini et al.: A Model of Multimedia Information Retrieval
Slonim, Tishby: Document Clustering using Word Clusters via the Information Bottleneck Method
Mladenic: Turning Yahoo into an Automatic Web-Page Classifier
Mladenic, Grobelnik: Assigning keywords to documents using machine learning
Fuhr et al.: AIR/X a Rule-Based Multistage Indexing System for Large Subject Fields
Mitchell: Machine Learning, Slides for instructors
Various

Articles by Junker
Mladenic: Text-Learning and Related Intelligent Agents: A Survey
Compression: A Key for Next-Generation Text Retrieval Systems
Chang: Enabling Concept-Based Relevance Feedback for Information Retrieval on the WWW

Whatever you do will be insignificant, but it is very important that you do it.
(Mahatma Gandhi)