Using Raw or Preprocessed Data

The advantage of using raw data (plain samples) is that you can apply any preprocessing on the fly after reading the file, during training or in any other situation. If your feature files contain plain samples, you have access to all the recorded information. The disadvantages are that raw data may take up (much) more space on the mass storage device than preprocessed data, and that preprocessing costs time, which may be significant for you.
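
To get a feeling for the storage trade-off, the following is a rough back-of-the-envelope sketch. All numbers in it (16 kHz sampling, 16-bit samples, 100 frames per second, 13 coefficients per frame stored as 4-byte floats) are illustrative assumptions and not values prescribed by this document; the function names are hypothetical. Your own front end may produce larger or smaller feature files.

```python
# Rough storage estimate: raw samples vs. preprocessed feature vectors.
# Every default value below is an illustrative assumption.

def raw_bytes_per_second(sample_rate_hz=16000, bytes_per_sample=2):
    """Bytes needed to store one second of plain samples."""
    return sample_rate_hz * bytes_per_sample

def feature_bytes_per_second(frames_per_second=100,
                             coeffs_per_frame=13,
                             bytes_per_coeff=4):
    """Bytes needed to store one second of preprocessed feature vectors."""
    return frames_per_second * coeffs_per_frame * bytes_per_coeff

raw = raw_bytes_per_second()        # 16000 * 2 = 32000 bytes
feat = feature_bytes_per_second()   # 100 * 13 * 4 = 5200 bytes
ratio = raw / feat                  # how many times larger the raw data is

print(f"raw: {raw} B/s, features: {feat} B/s, ratio: {ratio:.1f}")
```

Under these assumptions the raw data is roughly six times larger than the features. With a front end that emits many coefficients per frame, the difference can shrink considerably.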

If you know that your acoustic front end won't change for a very long time, or if you desperately need a very fast acoustic front end, or if you don't have enough space on your hard disk to store all the raw speech files, then you had better use preprocessed data.

If you want to experiment with your front end, or if you don't care about disk usage and the speed of the acoustic front end is of minor importance, then you should use raw speech files.
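
The flexibility that raw data gives you can be illustrated with a minimal sketch. It is not the toolkit's own front end: the helper names (frames, log_energy, zero_crossings, preprocess) and the toy frame sizes are hypothetical and chosen only for illustration. The point is that with plain samples at hand, swapping the front end means passing a different function, without regenerating any files.

```python
import math

def frames(samples, frame_len, frame_shift):
    """Split a list of samples into (possibly overlapping) frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_shift)]

def log_energy(frame):
    """One candidate feature: log of the frame energy."""
    energy = sum(s * s for s in frame)
    return math.log(energy + 1e-10)  # small offset avoids log(0) on silence

def zero_crossings(frame):
    """Another candidate feature: number of sign changes in the frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))

def preprocess(samples, feature_fn, frame_len=4, frame_shift=2):
    """Apply a front end on the fly to raw samples (toy frame sizes)."""
    return [feature_fn(f) for f in frames(samples, frame_len, frame_shift)]

# Hypothetical raw samples standing in for the contents of a speech file.
samples = [0, 3, -2, 5, -1, 4, -3, 2]

energy_feats = preprocess(samples, log_energy)   # one front end
zc_feats = preprocess(samples, zero_crossings)   # a different one, same data
```

Had only the log-energy values been stored, the zero-crossing feature could not be computed afterwards, because the sample-level information would no longer be available.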