Lab Partner Names: ________________________________________________

15-494/694 Cognitive Robotics Lab 7:
PyTorch and Neural Networks

I. Software Update and Initial Setup

Note: You can do this lab/homework assignment either individually, or in teams of two.

At the beginning of every lab you should update your copy of the cozmo-tools package. Do this:

$ cd ~/cozmo-tools
$ git pull

II. Experiments with the MNIST Dataset and Linear Models

Make a lab7 directory.
Download the files mnist1.py, mnist2.py, mnist3.py into your lab7 directory.
Skim the mnist1.py source code. This is a linear neural network with one layer of trainable weights.
Run the model by typing "python3 -i mnist1.py". The "-i" switch tells Python not to exit after running the program. Press Enter to see each output unit's weight matrix, or type Control-C and Enter to abort that part.
Try typing the following expressions to Python:
- model
- params = list(model.parameters())
- params
- [p.size() for p in params]
The first parameter is the 784x10 weight matrix; the second one is the 10 biases.
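As a rough illustration of what these expressions return, here is a minimal sketch that uses a single nn.Linear layer as a stand-in for the mnist1 model (an assumption; the model in mnist1.py may be defined differently). Note that nn.Linear stores its weight as (out_features, in_features), so a layer built this way reports the weight size as 10x784.

```python
import torch.nn as nn

# Stand-in for the mnist1 model: one linear layer, 784 inputs -> 10 outputs.
# (Assumption: the real model in mnist1.py may be defined differently.)
model = nn.Linear(784, 10)

params = list(model.parameters())
sizes = [tuple(p.size()) for p in params]
print(sizes)

# numel() gives the number of scalar entries in each parameter tensor.
total = sum(p.numel() for p in params)
print(total)
```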
How long did each epoch of training take, on average? ________________
Modify the model to use the GPU instead of the CPU. (You just have to uncomment one line and comment out another.)
Run the model on the GPU. How long does each epoch take now? ________________
Are you surprised? GPUs don't help for small models. A few thousand weights is small.
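One common way to switch between CPU and GPU and to time an epoch is sketched below. The model, the random data, and the timed_epoch helper are all hypothetical stand-ins, not the code in mnist1.py, which only needs one line commented out and another uncommented. The sketch falls back to the CPU when no GPU is present.

```python
import time
import torch
import torch.nn as nn

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in model and random data; not the code in mnist1.py.
model = nn.Linear(784, 10).to(device)
images = torch.randn(1000, 784, device=device)
labels = torch.randint(0, 10, (1000,), device=device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

def timed_epoch(batch_size=100):
    """Run one pass over the data and return the elapsed seconds."""
    start = time.perf_counter()
    for i in range(0, len(images), batch_size):
        optimizer.zero_grad()
        loss = loss_fn(model(images[i:i + batch_size]), labels[i:i + batch_size])
        loss.backward()
        optimizer.step()
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
    return time.perf_counter() - start

epoch_times = [timed_epoch() for _ in range(3)]
print(device, sum(epoch_times) / len(epoch_times))
```

Both the model and the data have to live on the same device, which is why the tensors above are created with device=device.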
Skim the code for the mnist2 model. This model has a hidden layer with 20 units. Each hidden unit is fully connected to the input and output layers.
Run the mnist2 model. How long does each epoch of training take, on average? ________________
You can use the show_hidden_weights() and show_output_weights() functions to display the learned weights.
Modify the mnist2 code to run on the GPU. How long does each epoch take now? ________________

III. Experiments with the MNIST Dataset and a Convolutional Model

Skim the code for the mnist3 model.
Run the model. You can ignore the "THCudaCheck FAIL" message. Look at some of the kernels the model learns.
What are the parameters of this model? ________________________________________________
How many weights does it have? ________________
How many parameters does this model have, where each parameter is a tensor? ________________
How many total weights are in the model? (Show your calculation.) ________________________________________________
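To check a hand calculation like this one, the parameter tensors can be listed and their sizes summed. The sketch below uses a small hypothetical convolutional model, deliberately not the architecture in mnist3.py, so the numbers it prints are not the answers to the questions above.

```python
import torch.nn as nn

# Hypothetical small convolutional model, NOT the architecture in mnist3.py.
model = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=5),   # weight 4x1x5x5, bias 4
    nn.ReLU(),
    nn.MaxPool2d(2),                  # 24x24 -> 12x12, no parameters
    nn.Flatten(),
    nn.Linear(4 * 12 * 12, 10),       # weight 10x576, bias 10
)

# Each parameter is one tensor; numel() counts its scalar entries.
for name, p in model.named_parameters():
    print(name, tuple(p.shape), p.numel())

num_tensors = len(list(model.parameters()))
total_weights = sum(p.numel() for p in model.parameters())
print(num_tensors, total_weights)
```

For this stand-in model the calculation is (4*1*5*5 + 4) + (10*576 + 10) = 104 + 5770 = 5874 weights in 4 tensors.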
This model runs on the GPU. How long did each epoch of training take, on average? ________________
If you modify the model to run on the CPU, how long does an epoch take now? (You don't need to run the model to completion.) ________________

IV. Homework Problem: Digit Recognition

In this problem you're going to have Cozmo recognize handwritten digits. You can assume that the digits are separated by whitespace; they do not overlap. They will be drawn with a fat magic marker on a white sheet of paper that fills the camera image so there is no background clutter.
By combining code from load_mnist3.py and Lab7.fsm you can write a Cozmo behavior that captures camera images and does digit recognition.
You will need to resize the camera image to 28x28 in order to fit the neural network's input requirements. See this web page for help on resizing an image using the cv2.resize() method from OpenCV.
Since the original image is 320x240, which is not square, you can't just resize it to 28x28 because that will introduce distortion.
Another issue is that all the data used to train the neural net was normalized: each digit was scaled to a uniform size and centered in the image. But if you're holding up a sheet of paper to Cozmo, the digit will vary in size based on distance, and may not be centered. Therefore, you will need to write some code to find the bounding box of the digit, allow for a bit of white space around it, and rescale the resulting region to 28x28 so it looks like the training data. You can assume that the input is well-formed, i.e., there is a single digit on a white background. But your code must work on real grayscale images from Cozmo's camera, so when finding the bounding box it cannot assume that the background pixels are perfectly white, or that there is no noise in the image.
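A minimal sketch of one possible bounding-box approach follows, using only NumPy. The digit_bounding_box helper, its margin and min_ink parameters, and the contrast floor of 50 are all assumptions chosen for illustration. The threshold is placed halfway between the darkest and brightest pixels so the paper need not be pure white, and a row or column only counts if it contains several dark pixels, which suppresses isolated noise. Otsu thresholding in OpenCV would be a more robust alternative.

```python
import numpy as np

def digit_bounding_box(gray, margin=4, min_ink=3):
    """Return (top, bottom, left, right) of the dark digit in a grayscale image.

    gray is a 2-D uint8 array with a light background. Bounds are slice-style
    (bottom and right are exclusive). Returns None if no digit is found.
    """
    # Threshold halfway between the darkest and brightest pixels, so the
    # background does not need to be exactly 255.
    lo, hi = int(gray.min()), int(gray.max())
    if hi - lo < 50:        # assumed contrast floor: treat the image as blank
        return None
    ink = gray < (lo + hi) / 2
    # Require several ink pixels per row/column so isolated noise is ignored.
    rows = np.flatnonzero(ink.sum(axis=1) >= min_ink)
    cols = np.flatnonzero(ink.sum(axis=0) >= min_ink)
    if rows.size == 0 or cols.size == 0:
        return None
    top = max(rows[0] - margin, 0)
    bottom = min(rows[-1] + 1 + margin, gray.shape[0])
    left = max(cols[0] - margin, 0)
    right = min(cols[-1] + 1 + margin, gray.shape[1])
    return int(top), int(bottom), int(left), int(right)

# Synthetic test image: off-white noisy background, dark block, one noise speck.
rng = np.random.default_rng(0)
image = rng.integers(200, 231, size=(240, 320)).astype(np.uint8)
image[100:160, 140:170] = 30     # the "digit"
image[10, 10] = 0                # a single dark noise pixel
box = digit_bounding_box(image)
print(box)
```

The returned box is usually taller than it is wide, so it still needs to be padded to a square before rescaling to 28x28, or the digit will be stretched.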
Write code to take the current camera image, segment out the digits, rescale each digit to 28x28, run it through the mnist3 model you trained previously, and have Cozmo speak the digit.
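The classification step might look roughly like the sketch below. The to_mnist_tensor helper is hypothetical, and an untrained stand-in network takes the place of the trained mnist3 model. MNIST digits are light strokes on a dark background, the opposite of marker on paper, so the patch is inverted before it is fed to the network.

```python
import numpy as np
import torch
import torch.nn as nn

def to_mnist_tensor(patch):
    """Convert a 28x28 uint8 patch (dark digit, light paper) to a 1x1x28x28 tensor."""
    # MNIST digits are light strokes on a dark background, so invert.
    inverted = 255 - patch.astype(np.float32)
    scaled = inverted / 255.0            # range 0..1
    # Assumption: apply here whatever normalization the training script used.
    return torch.from_numpy(scaled).reshape(1, 1, 28, 28)

# Hypothetical stand-in classifier with untrained weights; in the assignment
# this would be the trained mnist3 model with its saved weights loaded.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
model.eval()

patch = np.full((28, 28), 255, dtype=np.uint8)
patch[4:24, 12:16] = 0                   # crude vertical stroke

with torch.no_grad():
    scores = model(to_mnist_tensor(patch))
digit = int(scores.argmax(dim=1).item())
print(digit)
```

Because the stand-in weights are random, the printed digit is meaningless; only the tensor shapes and the argmax step carry over.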
You can assume that there is one row of digits written on the sheet, so to segment out the digits you could do something simple like apply thresholding and then scan columns of the image, or something fancier like using cv2.connectedComponents.
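The column-scanning idea can be sketched as follows, assuming the image has already been thresholded into a boolean array that is True where there is ink. The split_columns helper and its min_ink and min_width parameters are hypothetical names chosen for this example.

```python
import numpy as np

def split_columns(ink, min_ink=2, min_width=3):
    """Return (left, right) column ranges, right exclusive, one per digit.

    ink is a 2-D boolean array that is True where the marker is.
    """
    has_ink = ink.sum(axis=0) >= min_ink
    segments = []
    start = None
    for x, filled in enumerate(has_ink):
        if filled and start is None:
            start = x                      # entering a digit
        elif not filled and start is not None:
            if x - start >= min_width:     # drop slivers caused by noise
                segments.append((start, x))
            start = None
    # Close a digit that runs to the right edge of the image.
    if start is not None and len(has_ink) - start >= min_width:
        segments.append((start, len(has_ink)))
    return segments

# Synthetic thresholded image with two "digits" and one noise pixel.
ink = np.zeros((240, 320), dtype=bool)
ink[80:160, 40:90] = True
ink[80:160, 200:260] = True
ink[5, 150] = True
segments = split_columns(ink)
print(segments)
```

Each column range can then be cropped out and passed through the same bounding-box and rescaling steps used for a single digit.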

Hand In

Hand in a collection of files sufficient to run your demo without further training. That includes your fsm file, your model definition, and your saved weights. Note: do not hand in a model definition that tries to load the training set; that step wastes time and is unnecessary when running on real camera images.