1a: MLE argmax_h P(data|h) --- what makes the data most likely
    MAP argmax_h P(h|data) --- what hypothesis does the data make most
                               likely; requires prior on h
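
A minimal sketch of the 1a distinction, assuming a coin with unknown heads probability h. The Beta(a, b) prior and the function names are assumptions made for illustration; they are not part of the exam question.

```python
from fractions import Fraction

def mle_heads(heads, tails):
    # argmax_h P(data|h) for a Bernoulli coin: the observed frequency
    return Fraction(heads, heads + tails)

def map_heads(heads, tails, a=2, b=2):
    # argmax_h P(h|data) under an assumed Beta(a, b) prior with a, b >= 1:
    # the mode of the Beta(heads + a, tails + b) posterior
    return Fraction(heads + a - 1, heads + tails + a + b - 2)

# Three heads, no tails: MLE commits to h = 1, MAP is pulled toward the prior
mle = mle_heads(3, 0)
map_estimate = map_heads(3, 0)
```

With a uniform prior (a = b = 1) the two estimates coincide, which is one way to see that the prior on h is what separates them.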

1b: VC(H) = |X|

1c (i):
             Z
           /   \
          /     \
          V     V
 
          X     Y

1c(ii) 5 params

1c(iii) 7 params

1d : FALSE: estimate is optimistic because it might not be paying for training
            set noise

1e: TRUE

1f: TRUE (though not covered in 2003)

1g: FALSE: Both can hit local minima

1h: FALSE: MDPs have fully observed state whereas HMMs have observation
           symbols that are stochastically dependent on state

2a Initialize with if [Attribute value 1] then class

   Taking the next example...

     If it is correctly classified do nothing
     Otherwise place at the root one of the attributes with the corresponding
       classification

2b (Assuming lists are of depth d)

        4^d * k! / (k - d)!
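
The 2b count can be evaluated directly. This sketch takes the factor of 4 per node straight from the formula and does not assume what it counts; the function name is hypothetical.

```python
import math

def count_decision_lists(k, d):
    # k! / (k - d)! ordered choices of d distinct attributes out of k,
    # times the factor of 4 per node from the formula in 2b
    return 4 ** d * math.perm(k, d)

n_lists = count_decision_lists(3, 2)
```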

2c m >= (1/eps) ( log |H| + log (1/delta) ) with eps = 0.1, delta = 0.1

2d |H| = 2^k and m >= 10 ( k log 2 + log 10 )
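
A short sketch of plugging numbers into the bound from 2c and 2d, assuming natural logarithms. The function name and the example value k = 10 are assumptions, not taken from the exam.

```python
import math

def pac_sample_bound(k, eps=0.1, delta=0.1):
    # |H| = 2^k, so log |H| = k log 2 (natural log assumed)
    return (1.0 / eps) * (k * math.log(2) + math.log(1.0 / delta))

# Smallest integer number of examples satisfying the bound for k = 10
m = math.ceil(pac_sample_bound(10))
```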

3a The centroids found by k-means can get pushed away from each other so
   they are farther apart than the true means that generated the data.

3d One difference is that GMM elements may be long and thin

4a 0.8

4b 0.18

4c 0.44

4d 0.2 (Since P(yell) is 0.2 no matter what the state)

4e HAAAA

5a: A and C

5b: 

    P( A ^ B | C ) = 1/8         P( A ^ B | ~C ) = 2/8
    P(~A ^ B | C ) = 4/8         P(~A ^ B | ~C ) = 1/8
    P( A ^~B | C ) =   0         P( A ^~B | ~C ) = 5/8
    P(~A ^~B | C ) = 3/8         P(~A ^~B | ~C ) =   0

    P(C) = 1/2    P(~C) = 1/2

5c: 
    P(C) = 1/2    P(~C) = 1/2

    P(A|C) = 1/8   P(A|~C) = 7/8 
    P(B|C) = 5/8   P(B|~C) = 3/8   [ P(~A|..) and P(~B|..) defined implicitly]
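
The 5c parameters follow from the 5b table by summing out the other attribute. A small check, with the dictionary layout and names chosen here for illustration:

```python
from fractions import Fraction as F

# Joint P(A, B | class) from 5b, keyed by (a, b) with 1 = true, 0 = false
joint_given_c = {(1, 1): F(1, 8), (0, 1): F(4, 8), (1, 0): F(0), (0, 0): F(3, 8)}
joint_given_not_c = {(1, 1): F(2, 8), (0, 1): F(1, 8), (1, 0): F(5, 8), (0, 0): F(0)}

def marginals(joint):
    # P(A | class) and P(B | class), each obtained by summing out the other attribute
    p_a = sum(p for (a, b), p in joint.items() if a == 1)
    p_b = sum(p for (a, b), p in joint.items() if b == 1)
    return p_a, p_b

p_a_c, p_b_c = marginals(joint_given_c)
p_a_not_c, p_b_not_c = marginals(joint_given_not_c)
```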

5d: C=1

5e: C=1

6b: w = (0,2) b = -5

6c: It's possible


   - - - - - - -                  - + + + + 
   x=-1               x=0              x=+1

                |------------------|

   For small C would choose this margin

7a:  3
7b:  30/7
7c:  18/4
7d:  18/4

7e: Yes. TRAINING set error is zero because each point is closest to itself
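
A minimal sketch of the 7e argument, assuming the classifier is 1-nearest-neighbour and no two training points coincide with different labels. The data below is made up for illustration.

```python
def nearest_label(query, points, labels):
    # 1-nearest-neighbour: return the label of the closest stored point
    dists = [sum((q - p) ** 2 for q, p in zip(query, pt)) for pt in points]
    return labels[dists.index(min(dists))]

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]  # made-up training data
labels = [0, 1, 1, 0]

# Each training point is at distance 0 from itself, so it retrieves its own label
train_errors = sum(
    nearest_label(p, points, labels) != y for p, y in zip(points, labels)
)
```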

7f: Same answer

8a: 0
8b: 3/8
8c: 1/4
8d: 1/4
8e: 1/4
8f: 0

9a: No arcs

9b: A ---> C <---- B

9c: Full connection

10,11: Still trying to find the figure that goes with these questions!!