Term and Vector-based Approach
Documents are represented as "bags of words": essentially the conflated word stems, plus statistical information about the words that appear in the document.
- "bus accidents over the holidays",
  "car accidents during New Year's Eve",
  "nuclear accidents in Ukraine"
- The 1st and 2nd "accidents" should be more similar to each other than the 1st and the 3rd.
- "New Year's Eve" should be treated as a single word.
Structure information is needed for higher precision.
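The limitation can be illustrated with a minimal sketch that builds bag-of-words vectors for the three phrases and compares them by cosine similarity. The stop-word list, the toy stemmer, the choice of cosine similarity, and the names `stem`, `bag_of_words`, and `cosine` are all assumptions made for illustration; the slide does not specify any of them.

```python
import math
import re
from collections import Counter

# Assumed stop-word list; not taken from the slide.
STOP_WORDS = {"over", "the", "during", "in"}


def stem(word):
    """Toy conflation: drop a possessive 's, then a trailing plural s."""
    if word.endswith("'s"):
        word = word[:-2]
    if word.endswith("s") and len(word) > 3:
        word = word[:-1]
    return word


def bag_of_words(text, phrases=()):
    """Return term counts; each entry in `phrases` is kept as one token."""
    text = text.lower()
    for phrase in phrases:
        # Join the phrase into a single token, e.g. "new_years_eve".
        joined = phrase.lower().replace("'", "").replace(" ", "_")
        text = text.replace(phrase.lower(), joined)
    tokens = re.findall(r"[a-z_']+", text)
    return Counter(stem(t) for t in tokens if t not in STOP_WORDS)


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b)


docs = [
    "bus accidents over the holidays",
    "car accidents during New Year's Eve",
    "nuclear accidents in Ukraine",
]

# Plain bag of words: "New Year's Eve" is split into three terms.
d1, d2, d3 = (bag_of_words(d) for d in docs)
sim_12 = cosine(d1, d2)
sim_13 = cosine(d1, d3)

# Same comparison with "New Year's Eve" treated as a single term.
d2_phrase = bag_of_words(docs[1], phrases=["New Year's Eve"])
sim_12_phrase = cosine(d1, d2_phrase)

print(f"sim(1,2) = {sim_12:.3f}")
print(f"sim(1,3) = {sim_13:.3f}")
print(f"sim(1,2) with phrase = {sim_12_phrase:.3f}")
```

Under these assumptions the only shared term in every pair is "accident", so the 1st and 3rd phrases score about 0.333 while the 1st and 2nd score about 0.258, the opposite of the desired ordering. Treating "New Year's Eve" as one term raises the 1st/2nd score to about 0.333, which only ties it with the 1st/3rd pair; term statistics alone cannot separate the two kinds of accident.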