Tasks
Task 1-1. Write a program which reads in a text file and counts the different words occurring in this text file. The program outputs the list of different words together with the number of occurrences (separated by blanks). Caution: The first field of each line in this text file contains an utterance ID, which should not be counted by the program.
Download count.tcl here
# ===============================================================
#  JANUS-SR   Janus Speech Recognition Toolkit
#             ---------------------------------------------------
#             Advanced Lab Speech Recognition and Understanding
#
#  Author  :  tanja@cs.cmu.edu
#  Module  :  count.tcl
#  Date    :  Feb 16 2001
#
#  Remarks :
# ===============================================================

# with exactly one argument, -help can only appear at index 0
if { $argc != 1 || [lindex $argv 0] == "-help" } {
    puts stderr "USAGE: $argv0 'inputfile'"
    exit
}
set filename [lindex $argv 0]

# --------------------
# define useful procs
# --------------------
proc FreqSort {word1 word2} {
    global count
    if {$count($word1) < $count($word2)} { return  1 }
    if {$count($word1) > $count($word2)} { return -1 }
    # in case of same frequency, sort alphabetically
    return [string compare $word1 $word2]
}

# --------------------
# read in text file
# --------------------
set fp [open $filename r]
while {[gets $fp line] > -1} {
    # skip the first field (utterance ID)
    set useline [lrange $line 1 end]
    foreach word $useline {
        if {[info exists count($word)]} {
            incr count($word)
        } else {
            set count($word) 1
        }
    }
}
close $fp

# --------------------
# outputs array
# --------------------
foreach entry [array names count] {
    puts "$entry $count($entry)"
}

# --------------------
# outputs freq sorted
# array ; requires
# proc FreqSort
# --------------------
foreach entry [lsort -command FreqSort [array names count]] {
    puts "$entry $count($entry)"
}
exit

janusS tools/count.tcl steps/data/transcripts gives the transcript output.
Question 1-1: What is the output if you use steps/data/dict as the input file instead?
janusS tools/count.tcl steps/data/dict gives you the frequencies of the phones used in dict (dict output).
(I changed the file dict on Feb-22, so if your output is older than Feb-22, please run your count.tcl on the changed version in order to compare to the solution given here.)
Task 1-2. Write a program countPairs.tcl which reads in a text file and counts the occurring word pairs. The program outputs the list of different word pairs together with the number of occurrences (separated by blanks). Use the existing program count.tcl as a sample.
Caution: As in count.tcl, the utterance ID should not be considered a word. For calculating the number of word pairs we would like to know which words occur at the beginning or the end of an utterance. Therefore, we define the symbol <s> to be the marker of the beginning of an utterance, and </s> to be the marker of the end of an utterance.
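To illustrate the pair definition, here is a minimal sketch of how one transcript line is split into pairs. The proc name linePairs and the sample line are made up for illustration and are not part of the lab material; the pair format "word1,word2" follows the solution program.

```tcl
# Hypothetical helper (not part of countPairs.tcl): returns the list of
# word pairs for one transcript line, skipping the utterance ID.
proc linePairs {line} {
    set pairs {}
    set prev <s>
    # Element 0 is the utterance ID; the remaining elements are the words.
    foreach word [lrange $line 1 end] {
        lappend pairs "$prev,$word"
        set prev $word
    }
    # Close the utterance with the end marker.
    lappend pairs "$prev,</s>"
    return $pairs
}

# Made-up sample line: utterance ID "utt001" followed by three words.
puts [linePairs "utt001 hello world hello"]
# prints: <s>,hello hello,world world,hello hello,</s>
```

An utterance with n words therefore contributes n+1 pairs, and a line that contains only the utterance ID contributes the single pair <s>,</s>.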
Download countPairs.tcl here
# ===============================================================
#  JANUS-SR   Janus Speech Recognition Toolkit
#             ---------------------------------------------------
#             Advanced Lab Speech Recognition and Understanding
#
#  Author  :  tanja@cs.cmu.edu
#  Module  :  countPairs.tcl
#  Date    :  Feb 20 2001
#
#  Remarks :  Read in textfile and count different word pairs
# ===============================================================

# with exactly one argument, -help can only appear at index 0
if { $argc != 1 || [lindex $argv 0] == "-help" } {
    puts stderr "USAGE: $argv0 'textfile'"
    exit
}
set filename [lindex $argv 0]

# ---------------------
# proc count
# ---------------------
proc countWord {word} {
    global count
    if {[info exists count($word)]} {
        incr count($word)
    } else {
        set count($word) 1
    }
}

# --------------------
# read in text file
# --------------------
set fp [open $filename r]
while {[gets $fp line] > -1} {
    # skip the first field (utterance ID)
    set useline [lrange $line 1 end]
    set prev <s>
    foreach word $useline {
        countWord $prev,$word
        set prev $word
    }
    countWord $prev,</s>
}
close $fp

# ---------------------
# output sorted list
# ---------------------
foreach word [lsort [array names count]] {
    puts "$word $count($word)"
}
exit
janusS
% llength [types] gives you 124 (or 125 for the new janus version)
Question 2-2: What are the semantics of this syntax?
Create the transposed matrix m^T and the inverse matrix m^-1, and build the new matrix k = m^-1 * m^T.
% DMatrix m {{3 1 0} {-1 0 1} {1 0 0}}
% m
{ 3.00000000e+00 1.00000000e+00 0.00000000e+00 }
{ -1.00000000e+00 0.00000000e+00 1.00000000e+00 }
{ 1.00000000e+00 0.00000000e+00 0.00000000e+00 }
% [DMatrix mt] trans m
% [DMatrix mi] inv m
% [DMatrix k] mul mi mt
% k
{ -2.23859558e-16 1.00000000e+00 -2.34321386e-17 }
{ 3.00000000e+00 -4.00000000e+00 1.00000000e+00 }
{ 1.00000000e+00 1.00000000e+00 -5.55111512e-17 }

Question 2-3: What's the resulting matrix k?
    /  0   1   0 \
k = (  3  -4   1 )
    \  1   1   0 /

Question 2-4: Verify E = A * A^-1 using your matrix m.
% [DMatrix E] mul m mi
% E
{ 1.00000000e+00 1.49933236e-16 2.66453526e-15 }
{ -3.20790126e-17 1.00000000e+00 -2.22044605e-16 }
{ -2.34321386e-17 -1.53563143e-16 1.00000000e+00 }

    / 1  0  0 \
E = ( 0  1  0 )
    \ 0  0  1 /
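The results for k and E can also be checked by hand. The worked calculation below uses only the matrix m defined above. The tiny entries in the Janus output (for example -2.2e-16) are floating-point rounding errors and correspond to exact zeros.

```latex
\[
m = \begin{pmatrix} 3 & 1 & 0 \\ -1 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix},
\qquad
m^{T} = \begin{pmatrix} 3 & -1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix},
\qquad
\det m = 1,
\qquad
m^{-1} = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & -3 \\ 0 & 1 & 1 \end{pmatrix}
\]
\[
k = m^{-1} m^{T}
  = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & -3 \\ 0 & 1 & 1 \end{pmatrix}
    \begin{pmatrix} 3 & -1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}
  = \begin{pmatrix} 0 & 1 & 0 \\ 3 & -4 & 1 \\ 1 & 1 & 0 \end{pmatrix}
\]
\[
E = m \, m^{-1}
  = \begin{pmatrix} 3 & 1 & 0 \\ -1 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}
    \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & -3 \\ 0 & 1 & 1 \end{pmatrix}
  = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]
```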
Create two features by typing in the following lines:
% FeatureSet fs
% fs FMatrix m {{1 2 3} {4 5 6} {7 8 9}}
% fs SVector s {1 2 3 4 5 6 7 8 9}

Question 2-5: What happens, i.e. what's the meaning of the following commands:
fs
    FeatureSet object (method puts)

fs :
    outputs the content of the FeatureSet (method :)
    Output: m s

fs:
    same as 'fs :'
    Output: m s

fs.
    outputs the subobjects of the FeatureSet (method .)
    Output: Can't access sub-object '.'

fs:m
    outputs the Feature object m in FeatureSet fs
    Output: m {useN 0} {type FMatrix} {frameN 3} {coefN 3}

fs:m type
    outputs the type of object m (method type)
    Output: Feature

fs:m.
    outputs the subobjects of object fs:m (method .)
    Output: data

fs:m. data
    outputs the subobject 'data' of object fs:m (FMatrix)
    Output:
    { 1.000000e+00 2.000000e+00 3.000000e+00 }
    { 4.000000e+00 5.000000e+00 6.000000e+00 }
    { 7.000000e+00 8.000000e+00 9.000000e+00 }

fs:m.data
    same as 'fs:m. data'

fs:s.data
    outputs the subobject 'data' of object fs:s (SVector)
    Output: 1 2 3 4 5 6 7 8 9

fs:s.data -help
    prints out all methods of object fs:s.data (method -help)

fs:m.data := [fs:s.data]
    the coefficients of the SVector are interpreted as FMatrix content (the [] returns the ASCII output) and are assigned to fs:m.data (method :=)

Question 2-6: Is there more information in the features?
Use the method configure:

fs:m configure
    {-samplingRate 16.000000} {-shift 0.000000} {-frameN 3} {-coeffN 3} {-dcoeffN 0} {-trans 0}

fs:s configure
    {-samplingRate 16.000000} {-shift 0.000000} {-sampleN 9} {-dcoeffN 0} {-trans 0}

Last modified: Fri Feb 23 11:29:54 EST 2001
Maintainer: tanja@cs.cmu.edu.