A JavaScript Implementation of Logistic Regression and C4.5 Decision Tree Algorithms

Author: Yandong Liu. Email: yandongl @ cs.cmu.edu. Date: 2013.5

Update: I've updated the data loading logic so that it now reads CSV-format files. The previous version is still accessible but is no longer supported.


Drop a CSV file (such as these training and test files) onto the areas below and see the learning in action!
Drop training data file here Drop test data file here

Introduction: A JavaScript implementation of several machine learning algorithms, so far including Decision Tree and Logistic Regression. More to come.

Data format: Input files need to be in CSV format, with the first line giving the feature names. One of the features has to be called 'label'. E.g.

outlook, temp, humidity, wind, label
text, real, text, text, feature_type
'Sunny',80,'High', 'Weak', 'No'
'Sunny',82,'High', 'Strong', 'No'
'Overcast',73,'High', 'Weak', 'Yes'

The second line is optional and gives the feature types; in that line the 'label' column must contain 'feature_type'. This is useful when feature types are mixed (for example, text and real, as above).
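
To make the format concrete, here is a minimal sketch of how such a file could be parsed. It is not the loader from data_util.js: the function name parseLabeledCsv and the shape of the returned object are hypothetical, and it assumes that values contain no embedded commas.

```javascript
// Hypothetical helper, not part of data_util.js: splits a small CSV string
// into feature names, optional feature types, data rows, and labels.
function parseLabeledCsv(text) {
  // Split one line on commas, trim whitespace, and strip single quotes.
  // Assumption: no commas appear inside quoted values.
  function splitLine(line) {
    return line.split(',').map(function (cell) {
      return cell.trim().replace(/^'(.*)'$/, '$1');
    });
  }

  var lines = text.split(/\r?\n/).filter(function (l) { return l.trim() !== ''; });
  var header = splitLine(lines[0]);
  var labelIdx = header.indexOf('label');
  if (labelIdx === -1) {
    throw new Error("one column must be named 'label'");
  }

  // The optional second line is recognized by 'feature_type' in the label column.
  var types = null;
  var firstDataLine = 1;
  if (lines.length > 1 && splitLine(lines[1])[labelIdx] === 'feature_type') {
    types = splitLine(lines[1]);
    firstDataLine = 2;
  }

  var data = [];
  var targets = [];
  lines.slice(firstDataLine).forEach(function (line) {
    var row = [];
    splitLine(line).forEach(function (cell, i) {
      if (i === labelIdx) {
        targets.push(cell);
      } else {
        // Convert to a number only when the type row says 'real'.
        row.push(types && types[i] === 'real' ? parseFloat(cell) : cell);
      }
    });
    data.push(row);
  });

  function dropLabel(arr) {
    return arr.filter(function (_, i) { return i !== labelIdx; });
  }

  return {
    featureNames: dropLabel(header),
    featureTypes: types ? dropLabel(types) : null,
    data: data,
    targets: targets
  };
}

var sample = [
  "outlook, temp, humidity, wind, label",
  "text, real, text, text, feature_type",
  "'Sunny',80,'High', 'Weak', 'No'",
  "'Overcast',73,'High', 'Weak', 'Yes'"
].join('\n');

var parsed = parseLabeledCsv(sample);
console.log(parsed.featureNames, parsed.featureTypes, parsed.data, parsed.targets);
```

Without the type line, this sketch keeps every value as a string, which is one reason the type line matters when real and text features are mixed.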

Usage:

  • Data loading: data_util.js provides three loading methods (loadString and loadRealFile are used in the examples below). In the loading callback function you obtain a data object D, to which you can apply the learning methods. Note that only Decision Tree supports both real and categorical features; Logistic Regression works on real features only.

  • Use in browser:
    loadString(content, function(D) {
      var tree = new learningjs.tree();
      tree.train(D, function(model, err){
        if(err) {
          console.log(err);
        } else {
          model.calcAccuracy(D.data, D.targets, function(acc, correct, total){
            console.log( 'training: got '+correct +' correct out of '+total+' examples. accuracy:'+(acc*100.0).toFixed(2)+'%');
          });
        }
      });
    }); 
    
    Check the source code of this page and see how it works on the dropped files.

  • Use in nodejs:
    data_util.loadRealFile(fn_csv, function(D) {
    
      //normalize data
      data_util.normalize(D.data, D.nfeatures); 
    
      //logistic regression. following params are optional
      D.optimizer = 'sgd'; //default choice. other choice is 'gd'
      D.learning_rate = 0.005;
      D.l2_weight = 0.000001;
      D.iterations = 1000; //increase number of iterations for better performance
    
      new learningjs.logistic().train(D, function(model, err){
        if(err) {
          console.log(err);
        } else {
          model.calcAccuracy(D.data, D.targets, function(acc, correct, total){
            console.log('training: got '+correct +' correct out of '+total+' examples. accuracy:'+(acc*100.0).toFixed(2)+'%');
          });
          data_util.loadRealFile(fn_test, function(T) {
            model.calcAccuracy(T.data, T.targets, function(acc, correct, total){
              console.log('    test: got '+correct +' correct out of '+total+' examples. accuracy:'+(acc*100.0).toFixed(2)+'%');
            });
          });
        }
      });
    }); 
    
    Here are sample code files for the tree and for logistic regression that show how to use them in Node.js.
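
For readers wondering what the optional parameters in the Node.js example control, the following is a rough, self-contained sketch of logistic regression trained with per-example gradient steps and an L2 penalty. It is not the learningjs implementation: the function names trainLogisticSgd and predict are hypothetical, the option names are borrowed from the example above, and the examples are visited in a fixed order, whereas SGD implementations commonly shuffle them.

```javascript
// Illustrative sketch only; not the learningjs implementation.
function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

// Trains binary logistic regression with one gradient step per example.
// X: array of numeric feature rows, y: array of 0/1 labels.
// opts uses the parameter names from the example above:
// learning_rate, l2_weight, iterations (passes over the data).
function trainLogisticSgd(X, y, opts) {
  var lr = opts.learning_rate;
  var l2 = opts.l2_weight;
  var nfeatures = X[0].length;
  var w = new Array(nfeatures).fill(0);
  var b = 0;

  for (var it = 0; it < opts.iterations; it++) {
    for (var i = 0; i < X.length; i++) {
      var z = b;
      for (var j = 0; j < nfeatures; j++) z += w[j] * X[i][j];
      var err = sigmoid(z) - y[i]; // gradient of the log loss with respect to z
      for (var k = 0; k < nfeatures; k++) {
        // The L2 term shrinks each weight toward zero; the bias is not penalized.
        w[k] -= lr * (err * X[i][k] + l2 * w[k]);
      }
      b -= lr * err;
    }
  }
  return { weights: w, bias: b };
}

// Predicts class 1 when the estimated probability is at least 0.5.
function predict(model, row) {
  var z = model.bias;
  for (var j = 0; j < row.length; j++) z += model.weights[j] * row[j];
  return sigmoid(z) >= 0.5 ? 1 : 0;
}

// Toy, linearly separable data with features already scaled to [0, 1].
var X = [[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]];
var y = [0, 0, 1, 1];
var model = trainLogisticSgd(X, y, {
  learning_rate: 0.5,
  l2_weight: 0.000001,
  iterations: 1000
});
var correct = X.filter(function (row, i) {
  return predict(model, row) === y[i];
}).length;
console.log('training: got ' + correct + ' correct out of ' + X.length + ' examples.');
```

In this sketch a larger learning_rate takes bigger steps per example, a larger l2_weight pulls the weights harder toward zero, and more iterations mean more passes over the training data.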

License: MIT

Also see the source code