Smaller word clusters used in experiments:
filename    #wordtypes  #tweets     #tokens    #clusters  min count  tweet source
6mpaths     111,844     ~6,000,000  1,575,589  800        10         10k tweet/day sample, 9/10/08 to 7/18/12
3mpaths     124,731     3,000,000   1,006,324  800        5          subsample
750kpaths   50,780      750,000     ?          800        5          subsample
100kpaths   21,345      100,000     ?          800        3          subsample
10kpaths    6,944       10,000      ?          800        2          subsample
1kpaths     4,142       1,000       15,159     800        1          subsample