Text Categorization or Classification

Though the automated classification (categorization) of texts has been flourishing in the last decade or so, it has a history that dates back to about 1960. The enormous increase in online documents, due mostly to the expanding internet, has renewed interest in automated document classification and data mining. In the beginning, text classification was based mainly on heuristic methods, i.e. applying a set of rules based on expert knowledge. Nowadays the focus has turned to fully automatic learning and even clustering methods.
Definition of Text Classification:
Let $C = \{c_1, c_2, \ldots, c_m\}$ be a set of categories (classes) and $D = \{d_1, d_2, \ldots, d_n\}$ a set of documents. The task of text classification consists in assigning to each pair $(c_i, d_j) \in C \times D$ (with $1 \le i \le m$ and $1 \le j \le n$) a value of 0 or 1: the value 1 if the document $d_j$ belongs to $c_i$, and the value 0 if it doesn't.
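This mapping is sometimes referred to as the decision matrix. It is an $m \times n$ matrix whose entry $a_{ij}$ in row $c_i$ and column $d_j$ is the value assigned to the pair $(c_i, d_j)$:

$$
\begin{array}{c|cccc}
 & d_1 & d_2 & \cdots & d_n \\ \hline
c_1 & a_{11} & a_{12} & \cdots & a_{1n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
c_m & a_{m1} & a_{m2} & \cdots & a_{mn}
\end{array}
\qquad a_{ij} \in \{0, 1\}
$$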
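To make the definition concrete, here is a minimal Python sketch that builds such a decision matrix, with one row per category and one column per document. The names `decision_matrix` and `keyword_rule` are hypothetical, and the keyword rule (a document belongs to a category if the category name occurs as a word in it) is only a toy stand-in, assumed here for illustration. It is not a real classification method.

```python
from typing import Callable, List, Sequence


def decision_matrix(
    categories: Sequence[str],
    documents: Sequence[str],
    belongs: Callable[[str, str], bool],
) -> List[List[int]]:
    """Return the m x n matrix a with a[i][j] = 1 if documents[j]
    belongs to categories[i] according to `belongs`, else 0."""
    return [[1 if belongs(c, d) else 0 for d in documents] for c in categories]


# Hypothetical toy membership rule (an assumption, not a real classifier):
# a document belongs to a category if the category name appears as a word.
def keyword_rule(category: str, document: str) -> bool:
    return category in document.lower().split()


categories = ["sports", "politics", "science"]
documents = [
    "The sports team won",
    "Politics and science funding debate",
    "A new science result",
]

matrix = decision_matrix(categories, documents, keyword_rule)
for c, row in zip(categories, matrix):
    print(c, row)
# sports [1, 0, 0]
# politics [0, 1, 0]
# science [0, 1, 1]
```

In practice, `belongs` would be replaced by a rule set built from expert knowledge or by a learned classifier. Because the definition assigns 0 or 1 to each pair independently, a document may have a 1 in more than one row, as the second document does here.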
© Copyright 1996 - 2009, Bernd Klein