Text Categorization or Classification

Although the automated classification (categorization) of texts has flourished over the last decade or so, it has a history that dates back to about 1960. The enormous increase in online documents, due mostly to the expanding internet, has renewed interest in automated document classification and data mining. In the beginning, text classification was based mainly on heuristic methods, i.e. applying a set of rules based on expert knowledge. Nowadays the focus has turned to fully automatic learning and even clustering methods.

Definition of Text Classification:

Let C = { c₁, c₂, ... c_m} be a set of categories (classes) and D = { d₁, d₂, ... d_n} a set of documents.

The task of text classification consists in assigning to each pair (c_i, d_j) of C x D (with 1 ≤ i ≤ m and 1 ≤ j ≤ n) a value of 0 or 1, i.e. the value 1 if the document d_j belongs to c_i, and the value 0 if it doesn't.

This mapping is sometimes referred to as the decision matrix, where a_ij is the value assigned to the pair (c_i, d_j):

	d₁	...	d_j	...	d_n
c₁	a₁₁	...	a_1j	...	a_1n
...	...	...	...	...	...
c_i	a_i1	...	a_ij	...	a_in
...	...	...	...	...	...
c_m	a_m1	...	a_mj	...	a_mn
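
The decision matrix can be sketched in a few lines of Python. This is only a minimal illustration of the definition above: the category names, the document names and their labels are invented toy data, and the helper `decision_matrix` is a hypothetical name, not part of any library.

```python
# Toy data, invented for illustration only:
# each document is mapped to the set of categories it belongs to.
categories = ["sports", "politics", "science"]
documents = {
    "d1": {"sports"},
    "d2": {"politics", "science"},
    "d3": set(),  # a document that belongs to no category
}


def decision_matrix(categories, documents):
    """Return one row per category c_i; the entry a_ij is 1 if
    document d_j belongs to c_i and 0 otherwise."""
    return [
        [1 if category in labels else 0 for labels in documents.values()]
        for category in categories
    ]


matrix = decision_matrix(categories, documents)
for category, row in zip(categories, matrix):
    print(category, row)
# sports [1, 0, 0]
# politics [0, 1, 0]
# science [0, 1, 0]
```

In a real classification system the labels are of course not known in advance; the classifier has to predict the entries of this matrix.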

The main approaches to solving this task are:

Naive Bayes
Support Vector Machine
Nearest Neighbour
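
To give an impression of the first approach, here is a minimal sketch of a Naive Bayes text classifier using only the standard library. It assumes whitespace tokenization, add-one (Laplace) smoothing and exactly one category per document. The function names `train` and `classify` and the training sentences are invented for this illustration. It is not the implementation from the Python Course chapter.

```python
import math
from collections import Counter, defaultdict


def train(labelled_docs):
    """Count documents and words per category.
    labelled_docs is a list of (text, category) pairs."""
    doc_counts = Counter()              # number of documents per category
    word_counts = defaultdict(Counter)  # word frequencies per category
    vocabulary = set()
    for text, category in labelled_docs:
        words = text.lower().split()    # assumption: whitespace tokenization
        doc_counts[category] += 1
        word_counts[category].update(words)
        vocabulary.update(words)
    return doc_counts, word_counts, vocabulary


def classify(text, model):
    """Return the category with the highest log-probability score."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        # log of the prior P(c)
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # words never seen in training are ignored
            # log of P(word | c) with add-one smoothing
            score += math.log(
                (word_counts[category][word] + 1)
                / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Invented toy training data
training_data = [
    ("the team won the match", "sports"),
    ("the player scored a goal", "sports"),
    ("the parliament passed the law", "politics"),
    ("the minister won the election", "politics"),
]
model = train(training_data)
print(classify("the player won the match", model))               # sports
print(classify("the parliament passed the election law", model))  # politics
```

Working with logarithms avoids numerical underflow when many small probabilities are multiplied, and the smoothing prevents a single unseen word from forcing a probability of zero.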

You can find more about this topic in the chapter Text Categorization and Classification of our Python Course, which also includes an implementation of a Naive Bayes classifier in Python.

You can find an interesting and exhaustive bibliography on this topic:
Articles on text classification

If you are interested in writing your own text classification system and are looking for a seminar with an expert in both Python and natural language text processing, you can attend one of my courses on "Natural Language Processing" with Python at Bodenseo.


The class on Text Classification is also taught at our training centre in Toronto. Please check our website Python-training-courses.com
