Collection of Articles On Text Categorization


The following articles helped me a lot in my work on text classification. Only the references are listed here, not the papers themselves, as I didn't want to break any copyright laws. You can find most of these papers by using their titles as search keywords on Google.

Section 0
  • Rennie, Rifkin: Improving Multiclass Text Classification with the Support Vector Machine (Oct. 2001) (using 20 Newsgroups Data Set)
  • Georges Siolas, Florence d'Alché-Buc: Support Vector Machines based on a Semantic Kernel for Text Categorization (using 20 Newsgroups Data Set)
  • Burges: A Tutorial on Support Vector Machines
  • Osuna et al.: Support Vector Machines, Training and Applications
  • Ngai Tang: Text Categorisation using Support Vector Machines (interesting dissertation, 30 August 2001)
Section 1
  • Domingos, Pazzani: On the Optimality of the Simple Bayesian Classifier under Zero-One Loss
  • Fabrizio Sebastiani: A Tutorial on Automated Text Categorisation
  • Fabrizio Sebastiani: Machine Learning in Automated Text Categorization
  • Fabrizio Sebastiani: Machine Learning in Automated Text Categorization (differently formatted, i.e. 55 pages instead of 63)
  • Galavotti, Sebastiani, Simi: Experiments on the Use of Feature Selection and Negative Evidence in Automated Text Categorization
  • Automatic Web Page Categorization by Link and Context Analysis
  • Categorisation by Context
  • Guest Editors' Introduction to the Special Issue on Automated Text Categorization
  • Caropreso, Matwin, Sebastiani: A Learner-Independent Evaluation of the Usefulness of Statistical Phrases for Automated Text Categorization
  • Lewis et al.: Naive (Bayes) at Forty
Section 2
  • Jason D.M. Rennie: Improving Multi-class Text Classification with Naive Bayes (Master's Thesis)
  • McCallum, Nigam, Rennie, Seymore: A Machine Learning Approach to Building Domain-Specific Search Engines
  • Nigam, McCallum, Thrun, Mitchell: Learning to Classify Text from Labeled and Unlabeled Documents
  • Nigam, McCallum, Thrun, Mitchell: Learning to Classify Text from Labeled and Unlabeled Documents (condensed version)
  • McCallum, Nigam: A Comparison of Event Models for Naive Bayes Text Classification
  • McCallum: Multi-Label Text Classification with a Mixture Model Trained by EM
  • McCallum, Nigam: Employing EM and Pool-Based Active Learning for Text Classification
  • Craven, DiPasquo, Freitag, McCallum, Mitchell, Nigam, Slattery: Learning to Extract Symbolic Knowledge from the WWW
  • Baker, McCallum: Distributional Clustering of Words for Text Classification (newer?)
  • Baker, McCallum: Distributional Clustering of Words for Text Classification
  • Using Maximum Entropy for Text Classification
  • Andrew McCallum, Dayne Freitag, Fernando Pereira: Maximum Entropy Markov Models for Information Extraction and Segmentation
  • D'Alessio, Murray, Schiaffino: The Effect of Using Hierarchical Classifiers in Text Categorization
Section 3
  • David Yarowsky: Word-Sense Disambiguation, Using Statistical Models of Roget's Categories, Trained on Large Corpora
  • Ide, Veronis: Word Sense Disambiguation: The State of the Art
  • Schütze: Automatic Word Sense Discrimination
  • Mladenic, Grobelnik: Word Sequences as Features in Text-Learning
  • Yang et al.: Learning Approaches for Detecting and Tracking News Events
Section 4
  • Apte, Damerau, Weiss: Automated Learning of Decision Rules for Text Categorization
  • Susan Dumais, Hao Chen: Hierarchical Classification of Web Content
Section 5
  • Lewis, Jones: Natural Language Processing for Information Retrieval
  • Wiener, Pedersen, Weigend: A Neural Network Approach to Topic Spotting
  • Gorniak, Peter: Sorting Email Messages by Topic
  • Gorniak, Peter: MailMind, A Connectionist E-Mail Sorting Client
Section 6
  • Vijay Boyapati: Towards a Comprehensive Topic Hierarchy for News
  • Moulinier, Raskinis, Ganascia: Text Categorization: a Symbolic Approach
  • Quasthoff, Wolff: Effizientes Dokumentclustering durch niederfrequente Therme
Section 7
  • Yang, Pedersen: A Comparative Study on Feature Selection in Text Categorization
  • Yang, Liu: A Re-examination of Text Categorization Methods
  • Improving Text Classification by Shrinkage in a Hierarchy of Classes
  • John, Kohavi, Pfleger: Irrelevant Features and the Subset Selection Problem
  • Martijn Spitters: Comparing feature sets for learning text categorization
  • Ellen Riloff: Little Words Can Make a Big Difference
  • Fuka, Hanka: Feature Set Reduction for Document Classification Problems
  • Feature subset selection in text-learning
  • Ruiz, Srinivasan: Hierarchical Neural Networks for Text Categorization
Section 8 (some only in print)
  • An Algorithm for Suffix Stripping
  • Hsu, Lang: Feature Reduction and Database Maintenance in NETNEWS Classification
  • Thomas Hofmann: Learning and Representing Topic
  • Seminar paper: Advanced Information Retrieval Methods
Section 9
  • Sam Scott: Feature Engineering for a Symbolic Approach to Text Classification
  • Kermit et al.: Automatic Complexity Management: Personalised Document Retrieval from the World Wide Web
  • Michie et al.: Machine Learning, Neural and Statistical Classification (298 pages! A review of different approaches to text classification)
  • Joachims: A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization
  • Meghini et al.: A Model of Multimedia Information Retrieval
  • Slonim, Tishby: Document Clustering using Word Clusters via the Information Bottleneck Method
  • Mladenic: Turning Yahoo into an Automatic Web-Page Classifier
  • Mladenic, Grobelnik: Assigning Keywords to Documents Using Machine Learning
  • Fuhr et al.: AIR/X: a Rule-Based Multistage Indexing System for Large Subject Fields
  • Mitchell: Machine Learning, slides for instructors
Various
  • Articles by Junker
  • Mladenic: Text-Learning and Related Intelligent Agents: A Survey
  • Compression: A Key for Next-Generation Text Retrieval Systems
  • Chang: Enabling Concept-Based Relevance Feedback for Information Retrieval on the WWW


Bernd Klein

© Copyright 1996 - 2018, Bernd Klein