Text Categorization or Classification

Though the automated classification (categorization) of texts has flourished in the last decade or so, it has a history that dates back to about 1960. The enormous increase in online documents, mostly due to the expanding internet, has renewed interest in automated document classification and data mining. In the beginning, text classification was based mainly on heuristic methods, i.e. applying a set of rules based on expert knowledge; nowadays the focus has turned to fully automatic learning and even clustering methods.

Definition of Text Classification: Let C = { c1, c2, ... cm } be a set of categories (classes) and D = { d1, d2, ... dn } a set of documents. The task of text classification consists in assigning to each pair ( ci, dj ) of C x D (with 1 ≤ i ≤ m and 1 ≤ j ≤ n) a value of 0 or 1, i.e. the value 1 if the document dj belongs to ci, and the value 0 if it doesn't. This mapping is sometimes referred to as the decision matrix: an m x n matrix whose entry in row i and column j is the value assigned to ( ci, dj ).
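To make the definition concrete, here is a minimal sketch in Python that fills such a decision matrix for a toy example. The categories, documents, keyword sets and the helper names `belongs` and `decision_matrix` are all made up for illustration. The keyword rule merely stands in for a real classifier, in the spirit of the early rule-based methods mentioned above.

```python
# Hypothetical categories C and documents D, chosen only for illustration.
categories = ["sports", "politics"]
documents = {
    "d1": "the team won the match",
    "d2": "parliament passed the budget vote",
    "d3": "the minister attended the match",
}

# Hypothetical hand-written keyword rules standing in for a real classifier.
keywords = {
    "sports": {"team", "match"},
    "politics": {"parliament", "minister", "vote"},
}


def belongs(category, text):
    """Return 1 if the text contains a keyword of the category, else 0."""
    return int(bool(keywords[category] & set(text.split())))


def decision_matrix(categories, documents):
    """Build the matrix: one row per category ci, one column per document dj."""
    return [[belongs(c, text) for text in documents.values()]
            for c in categories]


matrix = decision_matrix(categories, documents)
for category, row in zip(categories, matrix):
    print(category, row)
```

Each row of `matrix` corresponds to one category and each column to one document, so `matrix[i][j]` is the value assigned to the pair ( ci, dj ). Note that a document may belong to more than one category, or to none.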
You can read more about this topic in the chapter Text Categorization and Classification of our Python Course, where you can also find an implementation of a Naive Bayes classifier in Python. An interesting and exhaustive bibliography on this topic can be found under: Articles on text classification
If you are interested in writing your own text classification system and are looking for a seminar with an expert in both Python and natural language text processing, you can attend one of my courses on "Natural Language Processing" with Python at Bodenseo. The class on Text Classification is also taught at our training centre in Toronto. Please check our website Python-training-courses.com

© Copyright 1996 - 2018, Bernd Klein