Solution Homework 1 Feb-16

Tasks

Task 1-1. Write a program that reads in a text file and counts the different words occurring in it. The program outputs the list of different words together with their number of occurrences (separated by blanks). Caution: The first field of each line in this text file contains an utterance ID, which should not be counted by the program.

Download count.tcl here

# ===============================================================
#  JANUS-SR   Janus Speech Recognition Toolkit
#             ---------------------------------------------------
#             Advanced Lab Speech Recognition and Understanding
#
#  Author  : tanja@cs.cmu.edu
#  Module  : count.tcl
#  Date    : Feb 16 2001
# 
#  Remarks :
# ===============================================================
if { $argc != 1 || [lindex $argv 0] == "-help"} {
     puts stderr "USAGE: $argv0 'inputfile'"
     exit
}
set filename [lindex $argv 0]

# --------------------
#  define useful procs
# --------------------
proc FreqSort {word1 word2} {
    global count;

    if {$count($word1) < $count($word2)} { return 1  }
    if {$count($word1) > $count($word2)} { return -1 }
    # in case of same frequency, sort alphabetically
    return [string compare $word1 $word2];
}



# --------------------
#  read in text file
# --------------------
set fp [open $filename r]
while {[gets $fp line] > -1} {
    set useline [lrange $line 1 end]
    foreach word $useline {
	if [info exists count($word)] {
	    incr count($word)
	} else {
	    set count($word) 1
	}
    }
}
close $fp
# --------------------
#  outputs array
# --------------------
foreach entry [array names count] {
    puts "$entry $count($entry)"
}

# --------------------
#  outputs freq sorted 
#  array ; requires 
#  proc FreqSort
# --------------------

foreach entry [lsort -command FreqSort [array names count]] {
    puts "$entry $count($entry)"
}


exit

Running janusS tools/count.tcl steps/data/transcripts gives the transcript output.
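
To see the counting and sorting logic without a transcript file, here is a minimal sketch that runs in a plain tclsh. The two sample lines and the utterance IDs utt001 and utt002 are made up for illustration; they are not taken from steps/data/transcripts.

```tcl
# Hypothetical sample lines; the first field of each line is the utterance ID.
set lines {
    "utt001 hello world hello"
    "utt002 world of speech"
}

# Count every word after the utterance ID.
foreach line $lines {
    foreach word [lrange $line 1 end] {
        if {[info exists count($word)]} {
            incr count($word)
        } else {
            set count($word) 1
        }
    }
}

# Sort by descending frequency; ties are sorted alphabetically
# (same idea as FreqSort in count.tcl).
proc FreqSort {word1 word2} {
    global count
    if {$count($word1) < $count($word2)} { return 1 }
    if {$count($word1) > $count($word2)} { return -1 }
    return [string compare $word1 $word2]
}

set result {}
foreach entry [lsort -command FreqSort [array names count]] {
    lappend result "$entry $count($entry)"
}
puts [join $result "\n"]
```

With this sample input, hello and world are each counted twice and are listed before of and speech, which occur once each.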

Question 1-1: What is the output if you are using steps/data/dict as input file instead?
janusS tools/count.tcl steps/data/dict gives you the frequencies of the phones used in dict (dict output).
(I changed the file dict on Feb-22, so if your output is older than Feb-22, please run your count.tcl on the changed version in order to compare to the solution given here.)

Task 1-2. Write a program countPairs.tcl that reads in a text file and counts the word pairs occurring in it. The program outputs the list of different word pairs together with their number of occurrences (separated by blanks). Use the existing program count.tcl as a template.
Caution: As in count.tcl, the utterance ID should not be considered a word. For counting word pairs we also want to know which words occur at the beginning or the end of an utterance. Therefore, we define the symbol <s> to be the marker of the beginning of an utterance, and </s> to be the marker of the end of an utterance.

Download countPairs.tcl here

# ===============================================================
#  JANUS-SR   Janus Speech Recognition Toolkit
#             ---------------------------------------------------
#             Advanced Lab Speech Recognition and Understanding
#
#  Author  : tanja@cs.cmu.edu
#  Module  : countPairs.tcl
#  Date    : Feb 20 2001
# 
#  Remarks :  Read in textfile and count different word pairs
#========================================================================
if { $argc != 1 || [lindex $argv 0] == "-help"} {
      puts stderr "USAGE: $argv0 'textfile'" 
      exit
}
set filename [lindex $argv 0]

# ---------------------
#  proc countWord
# ---------------------
proc countWord {word} {
    global count
    if [info exists count($word)] { 
       incr count($word)
    } else { 
       set count($word) 1 
    }
}

# --------------------
#  read in text file
# --------------------
set fp [open $filename r]
while {[gets $fp line] > -1} {
    set useline [lrange $line 1 end]
    set prev <s>
    foreach word $useline {
	countWord $prev,$word
	set prev $word
    }
    countWord $prev,</s>
}
close $fp

# ---------------------
#  output sorted list
# ---------------------
foreach word [lsort [array names count]] {
    puts "$word $count($word)"
}

exit
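
The handling of the boundary markers can be tried in a plain tclsh with the following minimal sketch. The sample lines are hypothetical, and the array name pairCount is chosen here for illustration (countPairs.tcl itself uses count).

```tcl
# Hypothetical sample lines; the first field is the utterance ID.
set lines {
    "utt001 a b a b"
    "utt002 a b"
}

# Increment the counter for one pair key, creating it on first use.
proc countWord {word} {
    global pairCount
    if {[info exists pairCount($word)]} {
        incr pairCount($word)
    } else {
        set pairCount($word) 1
    }
}

foreach line $lines {
    # Every utterance starts with the begin marker <s> ...
    set prev <s>
    foreach word [lrange $line 1 end] {
        countWord $prev,$word
        set prev $word
    }
    # ... and its last word is paired with the end marker </s>.
    countWord $prev,</s>
}

set pairs {}
foreach key [lsort [array names pairCount]] {
    lappend pairs "$key $pairCount($key)"
}
puts [join $pairs "\n"]
```

For this input the pair a,b is counted three times, <s>,a and b,</s> twice each, and b,a once. Each utterance contributes one more pair than it has words, because of the two markers.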

Question 2-1: How many object classes do you have in your Janus?

janusS % llength [types] gives you 124 (or 125 for the new janus version)

Create a matrix m of type DMatrix (double float coefficients) of the following content: {{3 1 0} {-1 0 1} {1 0 0}}

Question 2-2: What are the semantics of this syntax?

Create the transposed matrix mt and the inverse matrix m^-1 (called mi below), and build the new matrix k = m^-1 * mt

% DMatrix m {{3 1 0} {-1 0 1} {1 0 0}}
% m
{  3.00000000e+00  1.00000000e+00  0.00000000e+00 }
{ -1.00000000e+00  0.00000000e+00  1.00000000e+00 }
{  1.00000000e+00  0.00000000e+00  0.00000000e+00 }

% [DMatrix mt] trans m
% [DMatrix mi] inv m
% [DMatrix k] mul mi mt
% k
{ -2.23859558e-16  1.00000000e+00 -2.34321386e-17 }
{  3.00000000e+00 -4.00000000e+00  1.00000000e+00 }
{  1.00000000e+00  1.00000000e+00 -5.55111512e-17 }
Question 2-3: What's the resulting matrix k?

        / 0  1  0 \
   k = (  3 -4  1  )
        \ 1  1  0 / 
Question 2-4: Verify E = A * A^-1 using your matrix m

% [DMatrix E] mul m mi
% E
{  1.00000000e+00  1.49933236e-16  2.66453526e-15 }
{ -3.20790126e-17  1.00000000e+00 -2.22044605e-16 }
{ -2.34321386e-17 -1.53563143e-16  1.00000000e+00 }

        / 1  0  0 \
   E = (  0  1  0  )
        \ 0  0  1 / 
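
The entries of magnitude 1e-16 in the Janus output are floating-point rounding noise, which is why k and E are written with exact zeros above. As a cross-check that does not need Janus, here is a minimal sketch in plain Tcl using integer arithmetic. The procs matMul and matTrans are helpers written for this sketch, not Janus methods, and the inverse mi was worked out by hand (det m = 1) rather than copied from Janus.

```tcl
# Multiply two matrices given as lists of rows.
proc matMul {a b} {
    set result {}
    foreach row $a {
        set outRow {}
        for {set j 0} {$j < [llength [lindex $b 0]]} {incr j} {
            set sum 0
            for {set i 0} {$i < [llength $row]} {incr i} {
                set sum [expr {$sum + [lindex $row $i] * [lindex [lindex $b $i] $j]}]
            }
            lappend outRow $sum
        }
        lappend result $outRow
    }
    return $result
}

# Transpose a matrix given as a list of rows.
proc matTrans {a} {
    set result {}
    for {set j 0} {$j < [llength [lindex $a 0]]} {incr j} {
        set outRow {}
        foreach row $a {
            lappend outRow [lindex $row $j]
        }
        lappend result $outRow
    }
    return $result
}

set m  {{3 1 0} {-1 0 1} {1 0 0}}
# Inverse derived by hand; an assumption of this sketch.
set mi {{0 0 1} {1 0 -3} {0 1 1}}

set E [matMul $m $mi]
set k [matMul $mi [matTrans $m]]
puts "E = $E"
puts "k = $k"
```

Because all entries are integers, E comes out as the exact identity matrix and k as {0 1 0} {3 -4 1} {1 1 0}, matching the rounded Janus results.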

Create two features by typing in the following lines:
% FeatureSet fs
% fs FMatrix m {{1 2 3} {4 5 6} {7 8 9}}
% fs SVector s {1 2 3 4 5 6 7 8 9}

Question 2-5: What happens, i.e. what's the meaning of the following commands:

fs                        FeatureSet Object (method puts)
fs :                      outputs content of FeatureSet (method :) % m s
fs:                       same as 'fs :' % m s
fs.                       outputs subobjects of FeatureSet (method .) % Can't access sub-object '.'
fs:m                      outputs Feature-object m in FeatureSet fs % m {useN 0} {type FMatrix} {frameN 3} {coefN 3}
fs:m type                 outputs type of object m (method type) % Feature
fs:m.                     outputs subobject of object fs:m (method .) % data
fs:m. data                outputs subobject 'data' of object fs:m (FMatrix)
                          {  1.000000e+00  2.000000e+00  3.000000e+00 }
                          {  4.000000e+00  5.000000e+00  6.000000e+00 }
                          {  7.000000e+00  8.000000e+00  9.000000e+00 }
fs:m.data                 same as fs:m. data
fs:s.data                 outputs subobject 'data' of object fs:s (SVector) % 1 2 3 4 5 6 7 8 9
fs:s.data -help           prints out all methods of object fs:s.data (method -help)
fs:m.data := [fs:s.data]  the coefficients of SVector are interpreted as FMatrix content ([] ascii output)
                          and are assigned to fs:m.data (method :=)

Question 2-6: Is there more information in the features?

Use the method configure:

fs:m configure
{-samplingRate 16.000000} {-shift 0.000000} {-frameN 3} {-coeffN 3} {-dcoeffN 0} {-trans 0} 
fs:s configure
{-samplingRate 16.000000} {-shift 0.000000} {-sampleN 9} {-dcoeffN 0} {-trans 0} 

Last modified: Fri Feb 23 11:29:54 EST 2001
Maintainer: tanja@cs.cmu.edu.