15-494/694 Cognitive Robotics Lab 8:
Cozmo and the GPU

Note: You can do this lab/homework assignment either individually or in teams of two.

At the beginning of every lab you should update your copy of the cozmo-tools package. Do this:

$ cd ~/cozmo-tools
$ git pull

Make a lab8 directory.
Download the files mnist3.py and load_mnist3.py into your lab8 directory.
Skim the mnist3.py source code; it's slightly different from the version we used previously. With this version, you must call train() to train the network. At the end of training, it saves the weights in a file called mnist3-saved.pt.
Run the model by typing "python3 -i mnist3.py". The "-i" switch tells Python to stay in the interactive interpreter after running the program instead of exiting. After 15 epochs of training it will save the weights.
If you want to see some of the trained kernels, you can type display() after training finishes.
Skim the code for load_mnist3.py. Note that this code has changed slightly since the lab on Friday; be sure to grab the latest copy.
Run load_mnist3.py and observe that it loads the saved weights, and the reconstituted network classifies the training instance correctly.

Open cozmo_fsm/program.py in a text editor and search for "user_image". This function is automatically called by the StateMachineProgram machinery on every camera image received from the robot. The first argument is the raw three-channel RGB image; the second argument is a single-channel grayscale image.
Copy and run the file Lab8.fsm. This program captures one image from Cozmo's camera and displays it using matplotlib.

By combining code from load_mnist3.py and Lab8.fsm you can write a Cozmo behavior that captures camera images and does digit recognition.
You will need to resize the camera image to 28x28 in order to fit the neural network's input requirements. See this web page for help on resizing an image using the cv2.resize() method from OpenCV.
The original image is 320x240, which is not square, so you can't just resize it to 28x28: squeezing the wider dimension would distort the digit. The region you resize must itself be square.
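One simple way to avoid that distortion is to cut out a square region before resizing. The sketch below uses only NumPy and assumes the grayscale image arrives as a 2-D array laid out rows by columns, so a 320x240 frame has shape (240, 320). The helper name center_crop_square is made up for this illustration and is not part of cozmo-tools.

```python
import numpy as np

def center_crop_square(gray):
    """Return the largest centered square region of a 2-D grayscale image."""
    height, width = gray.shape[:2]
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    return gray[top:top + side, left:left + side]

# Stand-in for a 320x240 camera frame (240 rows by 320 columns).
frame = np.zeros((240, 320), dtype=np.uint8)
square = center_crop_square(frame)
print(square.shape)  # (240, 240)
```

The square result can then be passed to cv2.resize() with a target size of (28, 28) without changing the digit's aspect ratio. A center crop only helps if the digit is near the middle of the frame, so it is a starting point rather than a full solution.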
Another issue is that all the data used to train the neural net was normalized: each digit was scaled to a uniform size and centered in the image. But if you're holding up a sheet of paper to Cozmo, the digit will vary in size based on distance, and may not be centered. Therefore, you will need to write some code to find the bounding box of the digit, allow for a bit of white space around it, and rescale the resulting region to 28x28 so it looks like the training data. You can assume that the input is well-formed, i.e., there is a single digit on a white background. But your code must work on real grayscale images from Cozmo's camera, so when finding the bounding box it cannot assume that the background pixels are perfectly white, or that there is no noise in the image.
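The bounding-box step described above could be sketched roughly as follows. This is one possible approach, not the expected solution. It estimates the background level with the median, thresholds halfway between that and the darkest pixel, and ignores rows or columns with fewer than min_count dark pixels so isolated specks do not stretch the box. The function name crop_digit and all of its parameters and default values are assumptions chosen for illustration; they would need tuning on real camera images.

```python
import numpy as np

def crop_digit(gray, margin_frac=0.15, min_count=2, min_contrast=40):
    """Return a square crop around a dark digit, or None if no digit is found."""
    gray = np.asarray(gray, dtype=np.float32)
    # The background covers most of the image, so the median approximates it.
    background = float(np.median(gray))
    darkest = float(gray.min())
    if background - darkest < min_contrast:
        return None  # too little contrast to contain a digit
    # Pixels darker than the midpoint between ink and background count as ink.
    ink = gray < (background + darkest) / 2.0
    # Keep only rows and columns with several ink pixels, to reject specks.
    rows = np.flatnonzero(ink.sum(axis=1) >= min_count)
    cols = np.flatnonzero(ink.sum(axis=0) >= min_count)
    if rows.size == 0 or cols.size == 0:
        return None
    top, bottom = int(rows[0]), int(rows[-1]) + 1
    left, right = int(cols[0]), int(cols[-1]) + 1
    # Square side: the larger box dimension plus a margin on each side.
    side = int(round(max(bottom - top, right - left) * (1 + 2 * margin_frac)))
    cy, cx = (top + bottom) // 2, (left + right) // 2
    # Pad with the background level so the square can extend past the frame.
    padded = np.pad(gray, side, mode="constant", constant_values=background)
    y0 = cy - side // 2 + side
    x0 = cx - side // 2 + side
    return padded[y0:y0 + side, x0:x0 + side]

# Synthetic test frame: noisy light background, one dark block, one speck.
rng = np.random.default_rng(0)
frame = 200 + rng.integers(-10, 11, size=(240, 320))
frame[100:140, 150:170] = 40
frame[5, 5] = 30
crop = crop_digit(frame)
print(crop.shape)  # (52, 52)
```

The square crop can then be rescaled to 28x28 with cv2.resize(). Standard MNIST digits are bright on a dark background, so if mnist3.py trains on unmodified MNIST data the crop would probably also need to be inverted and scaled the same way as the training inputs; check how mnist3.py prepares its data before relying on this.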
Your code should take one camera image per second, normalize it, run it through the neural network, and display the classification result on the console.
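Because user_image is called on every camera frame, one way to get the once-per-second rate is to skip frames until enough time has passed. The sketch below is a small, hypothetical helper (the class name FrameThrottle is not part of cozmo-tools) that uses only the standard library. The clock is passed in as a parameter so the logic can be checked without waiting.

```python
import time

class FrameThrottle:
    """Report whether enough time has passed to process another frame."""

    def __init__(self, interval=1.0, clock=time.monotonic):
        self.interval = interval  # minimum seconds between processed frames
        self.clock = clock        # injectable time source
        self.last = None          # time of the most recently processed frame

    def ready(self):
        now = self.clock()
        if self.last is None or now - self.last >= self.interval:
            self.last = now
            return True
        return False

# Simulated frame arrival times instead of a real clock.
times = iter([0.0, 0.3, 0.9, 1.0, 1.5, 2.1])
throttle = FrameThrottle(1.0, clock=lambda: next(times))
results = [throttle.ready() for _ in range(6)]
print(results)  # [True, False, False, True, False, True]
```

In a real program, an image handler would call throttle.ready() at the top and return immediately when it is False, then normalize and classify the frame otherwise. A timer-driven state machine transition would be another reasonable design.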

Collect your fsm and py files into a zip file and hand it in via Autolab.

Last modified: Wed Mar 8 01:47:37 EST 2017