|
|
Machine Learning Algorithm
|
The model for handwriting identification was built from training data using Support Vector Machines (SVM), a supervised learning method that can be used for both classification and regression.
Support vector machines map input vectors to a higher-dimensional space where separating hyperplanes can be constructed to section off each particular set of data points. The region bounded by these hyperplanes defines the range of values for each classification. Each separating hyperplane is chosen to maximize the distance between two parallel hyperplanes drawn on either side of it, which minimizes the empirical classification error and maximizes the geometric margin. An example in 2-dimensional space (2 attributes) can be seen below.
In this research, there were originally 248 attributes, which were filtered down to the 35 most distinctive ones so that attributes yielding no gain, or even a loss, in accuracy would not be present to degrade the model. We also used feature space normalization and second power hyperplanes.
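As a rough numeric illustration of the margin idea in two dimensions, the sketch below compares two candidate separating lines on a small made-up data set. The points, labels, and line coefficients are hypothetical and are not taken from the handwriting data; the function name geometric_margin is likewise only illustrative.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any labeled point to the line w.x + b = 0."""
    norm = math.hypot(*w)
    return min(
        y * (w[0] * x[0] + w[1] * x[1] + b) / norm
        for x, y in zip(points, labels)
    )

# Hypothetical toy data: two attributes per sample, labels -1 / +1.
points = [(1.0, 1.0), (2.0, 0.0), (4.0, 4.0), (5.0, 3.0)]
labels = [-1, -1, +1, +1]

# Two candidate lines; both classify every point correctly.
margin_a = geometric_margin((1.0, 1.0), -5.0, points, labels)  # x1 + x2 = 5
margin_b = geometric_margin((1.0, 0.0), -3.0, points, labels)  # x1 = 3

print(margin_a, margin_b)
```

Both lines separate the two classes, but the first leaves a wider gap to the nearest point, so a maximum-margin method such as SVM would prefer it.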
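The two preprocessing choices mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: the text does not say which normalization was used, so min-max scaling to [0, 1] is assumed here, and "second power hyperplanes" is read as a second-degree polynomial kernel. The sample values, the coef0 parameter, and the function names are hypothetical.

```python
def min_max_normalize(rows):
    """Rescale every attribute (column) to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def poly2_kernel(x, z, coef0=1.0):
    """Second-degree polynomial kernel: (x . z + coef0) ** 2."""
    return (sum(a * b for a, b in zip(x, z)) + coef0) ** 2

# Hypothetical samples with two attributes on very different scales.
samples = [[2.0, 100.0], [4.0, 300.0], [6.0, 200.0]]
scaled = min_max_normalize(samples)

# Kernel value between the second and third normalized samples.
k = poly2_kernel(scaled[1], scaled[2])
```

Normalizing first keeps attributes with large raw ranges from dominating the dot product inside the kernel, which is the usual motivation for scaling features before training an SVM.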
For more information on SVM please check out the Wikipedia page.
|
How was it used?
|
The machine learning algorithm, SVM, was employed in 2 ways throughout the project. During the initial development and proof-of-concept stages, a generalized machine learning tool called Weka was used. Weka was chosen initially because it supports a wide variety of machine learning algorithms, including decision trees, Bayesian classifiers, regression models, and SVM. It allowed for feature selection on the data, keeping those identifiers that increase model performance and filtering out those that could have damaged it. It also allowed for clear analysis of model performance and easy model and algorithm optimization to find the best model for the situation. Weka allowed us to optimize and choose a final algorithm to use in further research of the problem.
As the project developed further, we incorporated a public domain library called SVM.NET, developed by Mathew Johnson, that allowed for built-in modeling using training set data and prediction of test data. However, since training with so many features proved impractical using this SVM library, we switched back to Weka for our modeling and testing. This also afforded us better diagnostics and result feedback, not to mention more freedom during testing.
|
Zuye Zheng | Ananda Gunawardena |
|