This page describes my final project for 15-694: Cognitive Robotics, a course in the Carnegie Mellon University Computer Science Department that I took in the Spring 2020 semester. The main goal of the project is to give the Cozmo robot the ability to detect a partially visible cube in its camera view using deep learning, and then turn toward it so that the full cube comes into view.
For this project I used the Cozmo robot. Details on how Cozmo works can be found here. The base code that operates Cozmo for this project and this course can be found here.
I used OpenCV for image handling.
For training and testing the partial cube detector, I used PyTorch.
Although Professor Touretzky suggested collecting the image data with Cozmo's camera using his TakePictures.fsm code, I decided to collect the data with my smartphone (iPhone 7+), since placing Cozmo, checking its view on the computer, and then taking a picture requires much more labor and time. However, data collected with Cozmo would match the deployment conditions more closely and could help train a better neural network. If time permits, I will collect data with Cozmo as well and compare the performance.
All raw images taken with the smartphone are 4032 pixels wide and 3024 pixels tall, so the aspect ratio of every raw image is 4:3.
These are some of my raw images with cubes.
These are some of my raw images with no cube.
As mentioned above, these images were taken with a smartphone camera instead of Cozmo's camera. At test time, images taken directly from Cozmo's camera will be the input to the neural network, so the training images need to be made to resemble images taken by Cozmo.
This is how Cozmo sees the world through its camera. The image is 320 pixels wide and 240 pixels tall, so the aspect ratio is 4:3, and the image is grayscale.
Fortunately, images from Cozmo's camera have the same aspect ratio as images from the smartphone, so we only need to resize each image (dividing both the width and the height by 12.6) and then convert it from RGB to grayscale.
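As a minimal sketch, the whole conversion can be done with a couple of OpenCV calls; the file names below are placeholders, not the project's actual script:

```python
import cv2

# Load a raw 4032 x 3024 smartphone photo (path is just an example).
raw = cv2.imread("raw_cube_photo.jpg")

# Divide both dimensions by 12.6 to match Cozmo's 320 x 240 camera resolution.
h, w = raw.shape[:2]
small = cv2.resize(raw, (round(w / 12.6), round(h / 12.6)),
                   interpolation=cv2.INTER_AREA)

# Convert to grayscale, like Cozmo's camera image.
gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)

cv2.imwrite("cozmo_like_photo.jpg", gray)
```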
My classmates Evan, Kirubel, Carrie, Ting-Yu, Tyler, Krithika, and Hita, who worked on the same topic with different approaches, also collected image data. We merged everything into a common dataset so that everyone could benefit. The full common dataset can be found here.
Before training, I preprocess all of the training images. Following Professor Touretzky's suggestions, I crop the right half of each image, convert it to grayscale, apply a random crop, and then normalize it.
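A rough sketch of this pipeline using torchvision transforms is shown below; the random-crop size and the normalization constants are assumptions, since the exact values are not listed here:

```python
import torchvision.transforms as T
import torchvision.transforms.functional as TF

# Keep only the right half of the 320 x 240 frame (columns 160..320).
crop_right_half = T.Lambda(lambda img: TF.crop(img, top=0, left=160,
                                               height=240, width=160))

preprocess = T.Compose([
    crop_right_half,                       # right half of the image
    T.Grayscale(num_output_channels=1),    # convert to grayscale
    T.RandomCrop(120),                     # random crop (size is an assumption)
    T.ToTensor(),                          # PIL image -> tensor in [0, 1]
    T.Normalize(mean=[0.5], std=[0.5]),    # normalization constants are assumptions
])
```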
General structure of my network architecture.
PyTorch code to implement my architecture.
After the experiments in Section 5, this network gave me the best results on both the training data and the test data.
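Since the exact layer sizes are not reproduced here, the following is only a rough PyTorch sketch of a network in this spirit: two convolution + max-pooling stages with ReLU activations, followed by a small stack of linear layers. The channel counts, the 1 x 120 x 120 input (matching the preprocessing sketch above), and the 2-way output are assumptions, not the project's actual values.

```python
import torch.nn as nn

class PartialCubeNet(nn.Module):
    """Sketch only: layer sizes are assumptions, not the project's exact values."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                      # first max-pooling stage
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                      # second max-pooling stage
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 30 * 30, 64),          # assumes a 1 x 120 x 120 input
            nn.ReLU(),
            nn.Linear(64, 2),                     # e.g. cube present / absent
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```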
In this section, I varied several factors of the CNN and compared performance across the resulting conditions. Each experiment was run as three training runs, and the reported result is the average over the three runs.
In this section, I compared the performance of the SGD optimizer and the Adam optimizer under otherwise identical conditions. The charts above show the results.
The Adam optimizer tends to perform better when all other factors are held fixed.
Interestingly, both SGD and Adam perform best when the learning rate is 0.0005 and the number of epochs is 250.
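In PyTorch the comparison only requires swapping the optimizer construction; a sketch under the settings above (the SGD momentum value is an assumption):

```python
import torch.optim as optim

model = PartialCubeNet()   # the network sketched earlier

# Same learning rate for both runs; 0.0005 gave the best results in each case.
sgd_optimizer = optim.SGD(model.parameters(), lr=0.0005, momentum=0.9)
adam_optimizer = optim.Adam(model.parameters(), lr=0.0005)
```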
Max pooling is used twice in the optimal network; I switched both occurrences to average pooling to see how that affects performance. I ran 250 epochs and used the Adam optimizer.
Max pooling outperformed average pooling on both the training set and the test set, so for the partial cube detection problem max pooling appears to be the better choice.
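The swap itself is a single-layer change in the model definition, e.g.:

```python
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2)    # used in the optimal network
# pool = nn.AvgPool2d(kernel_size=2)  # swapped in for this comparison
```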
In this section, I compared the performance of the ReLU and Sigmoid activation functions.
I also examined how performance changes as more linear layers are added at the end of the network.
Overall, ReLU tends to perform better than Sigmoid.
With Sigmoid activations, adding more linear layers tends to make performance worse.
With ReLU activations, adding more linear layers tends to make performance better.
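One way to run this comparison is to build the final stack of linear layers with a small helper so that the activation and the depth can be varied together; the helper below is hypothetical, and its layer widths are assumptions:

```python
import torch.nn as nn

def make_head(in_features, num_hidden, activation):
    """Hypothetical helper: a classifier head with a variable number of
    hidden linear layers and a chosen activation (nn.ReLU or nn.Sigmoid)."""
    layers, width = [], in_features
    for _ in range(num_hidden):
        layers += [nn.Linear(width, 64), activation()]
        width = 64
    layers.append(nn.Linear(width, 2))
    return nn.Sequential(*layers)

# Example: three hidden linear layers with each activation.
relu_head = make_head(32 * 30 * 30, num_hidden=3, activation=nn.ReLU)
sigmoid_head = make_head(32 * 30 * 30, num_hidden=3, activation=nn.Sigmoid)
```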
The image below shows Cozmo's camera view before and after running the program.
The image below shows Cozmo's camera view before and after running the program.
The image below shows Cozmo's camera view before and after running the program.
The image below shows Cozmo's camera view before and after running the program.
I want to express huge thanks to Professor Dave Touretzky. Through his wonderful course I was able to learn many aspects of robotics with Cozmo, as well as deep learning. His meticulous feedback and suggestions really helped me improve the results of this final project.
I also want to thank the Computer Science Department at Carnegie Mellon University. After the COVID-19 pandemic arose, its decision to send a new Cozmo robot to each student in the class allowed us to keep learning, and ultimately made it possible for me to complete this final project.