Software: SWATH - Thai Word Segmentation



SWATH (Smart Word Analysis for THai) is a word segmentation program for Thai. Swath offers three algorithms: Longest Matching, Maximal Matching, and Part-of-Speech Bigram. The algorithms are briefly described in [1] and [2]. The program supports several input file formats, including HTML, RTF, and LaTeX, as well as plain text.
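
To illustrate how the first two algorithms differ, here is a minimal Python sketch. It is not Swath's actual code: the function names, the six-word toy dictionary, and the handling of unknown characters (each one becomes its own token) are all assumptions made for this example. Longest Matching greedily takes the longest dictionary word at each position, while Maximal Matching is sketched here as choosing the segmentation with the fewest tokens.

```python
# Toy dictionary, chosen for illustration only (not Swath's dictionary).
DICTIONARY = {"ไป", "หา", "หาม", "มเหสี", "เห", "สี"}
TEXT = "ไปหามเหสี"


def longest_matching(text, dictionary):
    """Greedy left-to-right: always take the longest dictionary word."""
    max_len = max(map(len, dictionary))
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                break
        else:
            j = i + 1  # assumption: an unknown character is its own token
        words.append(text[i:j])
        i = j
    return words


def maximal_matching(text, dictionary):
    """Dynamic programming: pick the segmentation with the fewest tokens."""
    n = len(text)
    max_len = max(map(len, dictionary))
    # best[j] = (token count, start of last token) for the prefix text[:j]
    best = [(0, 0)] + [None] * n
    for j in range(1, n + 1):
        # Fallback: treat the single character text[j-1] as a token.
        candidates = [(best[j - 1][0] + 1, j - 1)]
        for i in range(max(0, j - max_len), j):
            if text[i:j] in dictionary:
                candidates.append((best[i][0] + 1, i))
        best[j] = min(candidates)
    # Walk the stored cut points backwards to recover the words.
    words, j = [], n
    while j > 0:
        i = best[j][1]
        words.append(text[i:j])
        j = i
    return words[::-1]


print(longest_matching(TEXT, DICTIONARY))  # ['ไป', 'หาม', 'เห', 'สี']
print(maximal_matching(TEXT, DICTIONARY))  # ['ไป', 'หา', 'มเหสี']
```

With this dictionary, the greedy pass commits to the longer word "หาม" and is then forced into four tokens, whereas the fewest-tokens search finds the three-word segmentation. Ties between equally short segmentations are broken arbitrarily in this sketch; a real segmenter needs an explicit tie-breaking rule or a statistical model such as the Part-of-Speech Bigram option.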

Download

Swath 2.0.1 (binary) for Win32 (compiled on October 7, 2003)
Swath for Linux
Manual

Open Source

An open-source version of Swath is available under the GPL.
Download: ftp://linux.thai.net/pub/thailinux/cvs/software/swath
or
Check out from CVS:
cvs -d :pserver:anonymous@linux.thai.net:/home/cvs co software/swath

References

  1. Paisarn Charoenpornsawat. 1999. Feature-based Thai Word Segmentation. Master's Thesis, Computer Engineering, Chulalongkorn University, Bangkok, Thailand. (in Thai).

  2. Surapant Meknavin, Paisarn Charoenpornsawat, and Boonserm Kijsirikul. 1997. Feature-based Thai Word Segmentation. In Proceedings of the Natural Language Processing Pacific Rim Symposium 1997 (NLPRS’97), Phuket, Thailand.


