Name: ________________________________________________
15-494/694 Cognitive Robotics Lab 8: Convolutional Neural Networks
I. Software Update and Initial Setup
At the beginning of every lab you should update your copy of the
cozmo-tools package. Do this:
$ cd ~/cozmo-tools
$ git pull
II. Experiments with the MNIST Dataset and Linear Models
You can do this part in teams of two if you wish.
- Make a lab8 directory.
- Download the files
mnist_data.zip,
mnist1.py,
mnist2.py,
mnist3.py
into your lab8 directory.
- Unzip the mnist_data.zip file.
- Skim the mnist1.py source code. This is a linear neural
network with one layer of trainable weights.
- Have a look at the PyTorch documentation, and specifically the
documentation for torch.nn.Linear.
- Run the model by typing "python3 -i mnist1.py". The "-i" switch tells python not to exit
after running the program. Press Enter to see each output unit's weight matrix, or
type control-C and Enter to abort that part.
- Try typing the following expressions to Python:
- model
- params = list(model.parameters())
- params
- [p.size() for p in params]
The first parameter is the weight matrix, which PyTorch stores as 10x784 (out_features x in_features); the second one is the vector of 10 biases.
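For reference, a minimal sketch of such a one-layer model (in the spirit of mnist1.py, not its exact code) and the parameter shapes it reports:

    import torch.nn as nn

    # One layer of trainable weights: 784 flattened pixels in, 10 digit classes out.
    model = nn.Linear(784, 10)
    params = list(model.parameters())
    print([p.size() for p in params])
    # prints [torch.Size([10, 784]), torch.Size([10])] -- the weight matrix and the 10 biases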
- How long did each epoch of training take, on average? ________________
- Modify the model to use the CPU instead of the GPU. (You just have to uncomment
one line and comment out another; see the sketch after the next item for a typical toggle.)
- Run the model on the CPU. How long does each epoch take now? ________________
Are you surprised? GPUs offer little benefit for small models, and a few thousand weights is small.
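A typical device toggle looks like the sketch below; the variable names are illustrative and the exact lines in mnist1.py may differ.

    import torch
    import torch.nn as nn

    use_gpu = torch.cuda.is_available()        # set to False to force the CPU
    device = torch.device("cuda" if use_gpu else "cpu")

    model = nn.Linear(784, 10).to(device)      # parameters move to the chosen device
    x = torch.randn(64, 784, device=device)    # inputs must live on the same device
    print(model(x).device)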
- Skim the code for the mnist2 model. This model has a hidden layer with 20 units.
Each hidden unit is fully connected to the input and output layers.
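In code, such a network might look like the following sketch (the actual mnist2.py may use a different nonlinearity or structure):

    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(784, 20),   # input -> hidden: 784*20 weights + 20 biases
        nn.Sigmoid(),         # hidden nonlinearity (an assumption; check mnist2.py)
        nn.Linear(20, 10),    # hidden -> output: 20*10 weights + 10 biases
    )
    print([p.size() for p in model.parameters()])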
- Run the mnist2 model on the GPU (the default). How long does each epoch of
training take, on average? ________________
- You can use the show_hidden_weights() and show_output_weights() functions to display
the learned weights.
- Modify the mnist2 code to run on the CPU. How long does each epoch take now? ________________
III. Experiments with the MNIST Dataset and a Convolutional Model
You can do this part in teams of two if you wish.
- Skim the code for the mnist3 model.
- Run the model on the GPU, not the CPU. You can ignore the
"THCudaCheck FAIL" message. Look at some of the kernels the model
learns.
- How many parameters does this model have, where each parameter
is a tensor? ________________
- What are the parameters of this model? Describe them in English. ________________________________________________
________________________________________________________________
- Note that two of the parameters come from the BatchNorm2d layer
(its learnable scale and shift; the running means and variances it
tracks are stored separately as buffers). The rest are weights.
(Biases are considered to be weights.) Looking at the sizes of the
various weight and bias tensors, how many total weights does this
model have? Show your calculation. ____________________________________
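A generic way to answer these questions for any PyTorch model is shown below on a small stand-in network (the real mnist3 architecture has different layers and sizes); note that the running means and variances tracked by BatchNorm2d appear under named_buffers() rather than named_parameters().

    import torch.nn as nn

    model = nn.Sequential(                 # stand-in model, not mnist3
        nn.Conv2d(1, 8, kernel_size=5),    # 28x28 input -> 8 feature maps of 24x24
        nn.BatchNorm2d(8),
        nn.ReLU(),
        nn.Flatten(),
        nn.Linear(8 * 24 * 24, 10),
    )
    for name, p in model.named_parameters():
        print(name, tuple(p.size()))                    # each parameter tensor and its shape
    print(sum(p.numel() for p in model.parameters()))   # total number of weights (incl. biases)
    for name, b in model.named_buffers():
        print(name, tuple(b.size()))                    # BatchNorm running mean/variance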
A convolutional neural network is a "virtual" network where each
kernel is replicated many times, but we don't actually build out
all the units and connections as individual data structures, since
they share the same weights. When running data through the
network, though, we still have to do all the multiply and
accumulate operations as if we had built out the network, so the
number of "effective" weights is many times the number of weight
parameters. How many effective weights are in the mnist3 model?
Show your calculation.
________________________________________________
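As a worked example of the distinction (with illustrative numbers, not mnist3's actual architecture): a single Conv2d layer with 8 kernels of size 5x5 applied to a 28x28 single-channel image produces a 24x24 output map per kernel, so:

    in_channels, out_channels, k = 1, 8, 5     # illustrative layer, not mnist3's
    out_h, out_w = 24, 24                      # valid 5x5 convolution on a 28x28 input

    stored = out_channels * (in_channels * k * k + 1)   # weights + one bias per kernel = 208
    effective = stored * out_h * out_w                  # replicated at every output position = 119808
    print(stored, effective)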
- This model runs on the GPU. How long did each epoch of
training take, on average? ________________
- If you modify the model to run on the CPU, how long does an
epoch take now? (You don't need to run the model to completion.)
________________
IV. Object Recognition with MobileNet
You can do this part in teams of two if you wish.
- Run the MobileNet demo
on Cozmo. Note: to install this demo you must download both MobileNet.fsm
and the labels.py file found in the same directory.
- Use your cellphone to call up a picture of a cat and show it to Cozmo.
- Type "tm" to tell the program to proceed with recognition. Did it recognize the cat?
- Try some dog breeds, and some other object classes such as airplanes or cars.
V. Homework Problem: Digit Recognition
This part must be done on your own, not as a team.
- In this problem you're going to have Cozmo recognize
handwritten digits. You can assume that the digits are separated by
whitespace; they do not overlap. They will be drawn with a fat magic
marker on a white sheet of paper that fills the camera image so there
is no background clutter.
- You can use the torch.save call in mnist3_train.py to save the
trained weights. Likewise, you can use the torch.load call in
mnist3_test.py to load the weights so you don't have to retrain from
scratch every time you test your program.
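A typical save/load pattern is sketched below; the file name and the stand-in model are illustrative, and mnist3_train.py / mnist3_test.py have their own versions of these calls.

    import torch
    import torch.nn as nn

    model = nn.Linear(784, 10)                           # stand-in for the mnist3 model
    torch.save(model.state_dict(), "mnist3_weights.pt")  # after training

    # ... later, in your recognition program ...
    model.load_state_dict(torch.load("mnist3_weights.pt"))
    model.eval()                                         # see the note on model.eval() below
    with torch.no_grad():                                # also skip gradient bookkeeping at test time
        scores = model(torch.zeros(1, 784))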
- Note that when testing rather than training a model, it is
important to call model.eval() to switch the batch normalization
layer from per-batch statistics to the running statistics it
accumulated over the training set. (model.eval() does not itself turn
off gradient computation; wrap inference in torch.no_grad() if you
want that too.) Calling model.eval() is done in mnist3_test.py and
should be done in your code as well.
- By combining code
from mnist3_test.py
and Lab8.fsm you can write a Cozmo behavior
that captures camera images and does digit recognition.
- Start by assuming the paper contains a single digit. You will
need to resize the camera image to 28x28 in order to fit the
neural network's input requirements. See this web page for help on
resizing an image using the cv2.resize() method from OpenCV.
Note that since the original image is 320x240,
which is not square, you can't just resize it to 28x28 because
that will introduce distortion.
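One way to avoid the distortion is to crop a square region before resizing, as in the sketch below (for the homework you will instead crop around the digit's bounding box, as described next):

    import cv2
    import numpy as np

    frame = np.zeros((240, 320), dtype=np.uint8)    # stand-in for a Cozmo camera image
    h, w = frame.shape
    side = min(h, w)                                # 240
    x0 = (w - side) // 2
    square = frame[:, x0:x0 + side]                 # centered 240x240 crop
    small = cv2.resize(square, (28, 28), interpolation=cv2.INTER_AREA)
    print(small.shape)                              # (28, 28)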
- Another issue is that all the data used to train the neural net
was normalized: each digit was scaled to a uniform size and centered
in the image. But if you're holding up a sheet of paper to Cozmo,
the digit will vary in size based on distance, and may not be
centered. Therefore, you will need to write some code to find the
bounding box of the digit, allow for a bit of white space around it,
and rescale the resulting region to 28x28 so it looks like the
training data. You can assume that the input is well-formed, i.e.,
there is a single digit on a white background. But your code must
work on real grayscale images from Cozmo's camera, so when finding
the bounding box it cannot assume that the background pixels are
perfectly white, or that there is no noise in the image.
Furthermore, the network was trained on light digits against a dark
background; you will have to invert your camera images to match.
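One possible normalization routine is sketched below; the Otsu threshold, the amount of padding, and the helper name are all illustrative choices, not requirements.

    import cv2
    import numpy as np

    def normalize_digit(gray):
        # Threshold with Otsu's method so we don't assume the paper is perfectly
        # white; THRESH_BINARY_INV makes the ink light and the background dark,
        # matching the MNIST training data.
        _, binary = cv2.threshold(gray, 0, 255,
                                  cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        ys, xs = np.nonzero(binary)
        if len(xs) == 0:
            return None                                  # no ink found
        x0, x1, y0, y1 = xs.min(), xs.max(), ys.min(), ys.max()
        pad = max(x1 - x0, y1 - y0) // 5                 # a little margin around the digit
        x0, y0 = max(x0 - pad, 0), max(y0 - pad, 0)
        x1 = min(x1 + pad, gray.shape[1] - 1)
        y1 = min(y1 + pad, gray.shape[0] - 1)
        crop = binary[y0:y1 + 1, x0:x1 + 1]
        # Center the crop on a square canvas so resizing to 28x28 does not distort it.
        side = max(crop.shape)
        canvas = np.zeros((side, side), dtype=np.uint8)
        yoff, xoff = (side - crop.shape[0]) // 2, (side - crop.shape[1]) // 2
        canvas[yoff:yoff + crop.shape[0], xoff:xoff + crop.shape[1]] = crop
        return cv2.resize(canvas, (28, 28), interpolation=cv2.INTER_AREA)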
- Once you are recognizing a single digit successfully, modify your
code to take the current camera image, segment out the individual
digits, rescale each digit to 28x28, run it through the mnist3 model
you trained previously, and have Cozmo speak the digit.
- You can assume that there is one row of digits written on the
sheet, so to segment out the digits you could do something simple like
apply thresholding and then scan columns of the image, or something
fancier like using cv2.connectedComponents.
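The simple column-scan approach might look like this sketch (it assumes a thresholded image with ink = 255 and background = 0; the helper name is illustrative):

    import numpy as np

    def split_digits(binary):
        ink_per_column = np.sum(binary, axis=0)        # total ink in each column
        in_digit, start, spans = False, 0, []
        for x, ink in enumerate(ink_per_column):
            if ink > 0 and not in_digit:               # left edge of a digit
                in_digit, start = True, x
            elif ink == 0 and in_digit:                # right edge of a digit
                in_digit = False
                spans.append((start, x))
        if in_digit:
            spans.append((start, binary.shape[1]))
        return [binary[:, x0:x1] for (x0, x1) in spans]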
- UPDATED for 2024: Your program must make what it's doing
visible to the person who runs it. That means (1) displaying the
thresholded image, and (2) displaying each of the 28x28 extracted
digit images that it is feeding to the neural network.
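One possible way to satisfy the display requirement is sketched below; the window names and the enlargement factor are arbitrary choices.

    import cv2

    def show_results(thresholded, digit_images):
        cv2.imshow("thresholded", thresholded)
        for i, digit in enumerate(digit_images):
            big = cv2.resize(digit, (112, 112), interpolation=cv2.INTER_NEAREST)
            cv2.imshow("digit %d" % i, big)            # enlarged 28x28 crop, for visibility
        cv2.waitKey(1)                                 # let the windows refresh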
Hand In
Hand in a collection of files sufficient to run your digit recognition
demo without further training. That includes your fsm file, your
model definition, and your saved weights. Note: do not hand in a
model definition that tries to load the training set; that step wastes
time and is unnecessary when running on real camera images.