This page describes my final project for 15-694: Cognitive Robotics, a course in the Carnegie Mellon University Computer Science Department that I took in the Spring 2020 semester. The main goal of the project is to give the Cozmo robot the ability to detect a partially visible cube in its camera view using deep learning, and then turn toward it so that the full cube comes into view.
For this project I used the Cozmo robot. Details on how Cozmo works can be found here. The base code that operates Cozmo for this project and this course can be found here.
I used OpenCV for image handling.
For training and testing the partial cube detector, I used PyTorch.
Although Professor Touretzky suggested collecting the image data with Cozmo's camera using his TakePictures.fsm code, I decided to collect the data with my smartphone (iPhone 7+), since placing Cozmo, checking its view on the computer, and then taking a picture requires much more labor and time. However, data collected with Cozmo would match the deployment conditions more closely and could help train a better neural network. If time permits, I will collect data with Cozmo as well and compare the performance.
All raw images taken with the smartphone are 4032 pixels wide and 3024 pixels tall, so the aspect ratio of every raw image is 4:3.
These are some of my raw images with cubes.
These are some of my raw images with no cube.
As mentioned above, these images were taken with a smartphone camera instead of Cozmo's camera. At test time, images taken directly from Cozmo's camera will be the input to the neural network, so the training images need to be made to resemble images taken by Cozmo.
This is how Cozmo sees the world through its camera. The image is 320 pixels wide and 240 pixels tall, so the aspect ratio is 4:3, and the image is grayscale.
Fortunately, images from Cozmo's camera have the same aspect ratio as images from the smartphone, so we only need to resize each image (dividing both the width and the height by 12.6) and then convert it from RGB to grayscale.
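As a minimal sketch, the whole conversion can be done with a couple of OpenCV calls; the file names below are placeholders, not the project's actual script:

```python
import cv2

# Load a raw 4032 x 3024 smartphone photo (path is just an example).
raw = cv2.imread("raw_cube_photo.jpg")

# Divide both dimensions by 12.6 to match Cozmo's 320 x 240 camera resolution.
h, w = raw.shape[:2]
small = cv2.resize(raw, (round(w / 12.6), round(h / 12.6)),
                   interpolation=cv2.INTER_AREA)

# Convert to grayscale, like Cozmo's camera image.
gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)

cv2.imwrite("cozmo_like_photo.jpg", gray)
```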
My classmates Evan, Kirubel, Carrie, Ting-Yu, Tyler, Krithika, and Hita, who worked on the same topic with different approaches, also collected image data. We merged everything into a common dataset so that everyone could benefit. The full common dataset can be found here.
Before training, I preprocess all of the training images. Following Professor Touretzky's suggestions, I crop the right half of each image, convert it to grayscale, apply a random crop, and then normalize it.
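A rough sketch of this pipeline using torchvision transforms is shown below; the random-crop size and the normalization constants are assumptions, since the exact values are not listed here:

```python
import torchvision.transforms as T
import torchvision.transforms.functional as TF

# Keep only the right half of the 320 x 240 frame (columns 160..320).
crop_right_half = T.Lambda(lambda img: TF.crop(img, top=0, left=160,
                                               height=240, width=160))

preprocess = T.Compose([
    crop_right_half,                       # right half of the image
    T.Grayscale(num_output_channels=1),    # convert to grayscale
    T.RandomCrop(120),                     # random crop (size is an assumption)
    T.ToTensor(),                          # PIL image -> tensor in [0, 1]
    T.Normalize(mean=[0.5], std=[0.5]),    # normalization constants are assumptions
])
```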
General structure of my network architecture.
PyTorch code to implement my architecture.
After the experiments in Section 5, this network gave me the best results on both the training data and the test data.
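Since the exact layer sizes are not reproduced here, the following is only a rough PyTorch sketch of a network in this spirit: two convolution + max-pooling stages with ReLU activations, followed by a small stack of linear layers. The channel counts, the 1 x 120 x 120 input (matching the preprocessing sketch above), and the 2-way output are assumptions, not the project's actual values.

```python
import torch.nn as nn

class PartialCubeNet(nn.Module):
    """Sketch only: layer sizes are assumptions, not the project's exact values."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                      # first max-pooling stage
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                      # second max-pooling stage
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 30 * 30, 64),          # assumes a 1 x 120 x 120 input
            nn.ReLU(),
            nn.Linear(64, 2),                     # e.g. cube present / absent
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```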
In this section, I varied several factors of the CNN and compared performance across the resulting conditions. Each experiment was run as three training runs, and the reported result is the average over the three runs.
In this section, I compared the performance of the SGD optimizer and the Adam optimizer under otherwise identical conditions. The charts above show the results.
The Adam optimizer tends to perform better when all other factors are held fixed.
Interestingly, both SGD and Adam perform best when the learning rate is 0.0005 and the number of epochs is 250.
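In PyTorch the comparison only requires swapping the optimizer construction; a sketch under the settings above (the SGD momentum value is an assumption):

```python
import torch.optim as optim

model = PartialCubeNet()   # the network sketched earlier

# Same learning rate for both runs; 0.0005 gave the best results in each case.
sgd_optimizer = optim.SGD(model.parameters(), lr=0.0005, momentum=0.9)
adam_optimizer = optim.Adam(model.parameters(), lr=0.0005)
```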
Max pooling is used twice in the optimal network; I switched both occurrences to average pooling to see how that affects performance. I ran 250 epochs and used the Adam optimizer.
Max pooling outperformed average pooling on both the training set and the test set, so for the partial cube detection problem max pooling appears to be the better choice.
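The swap itself is a single-layer change in the model definition, e.g.:

```python
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2)    # used in the optimal network
# pool = nn.AvgPool2d(kernel_size=2)  # swapped in for this comparison
```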
In this section, I compared the performance of the ReLU and Sigmoid activation functions.
I also examined how performance changes as more linear layers are added at the end of the network.
Overall, ReLU tends to perform better than Sigmoid.
With Sigmoid activations, adding more linear layers tends to make performance worse.
With ReLU activations, adding more linear layers tends to make performance better.
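One way to run this comparison is to build the final stack of linear layers with a small helper so that the activation and the depth can be varied together; the helper below is hypothetical, and its layer widths are assumptions:

```python
import torch.nn as nn

def make_head(in_features, num_hidden, activation):
    """Hypothetical helper: a classifier head with a variable number of
    hidden linear layers and a chosen activation (nn.ReLU or nn.Sigmoid)."""
    layers, width = [], in_features
    for _ in range(num_hidden):
        layers += [nn.Linear(width, 64), activation()]
        width = 64
    layers.append(nn.Linear(width, 2))
    return nn.Sequential(*layers)

# Example: three hidden linear layers with each activation.
relu_head = make_head(32 * 30 * 30, num_hidden=3, activation=nn.ReLU)
sigmoid_head = make_head(32 * 30 * 30, num_hidden=3, activation=nn.Sigmoid)
```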
The image below shows Cozmo's camera view before and after running the program.
The image below shows Cozmo's camera view before and after running the program.
The image below shows Cozmo's camera view before and after running the program.
The image below shows Cozmo's camera view before and after running the program.
I want to express huge thanks to Professor Dave Touretzky. Through his wonderful course I was able to learn many aspects of robotics with Cozmo, as well as deep learning. His meticulous feedback and suggestions really helped me improve the results of this final project.
I also want to thank the Computer Science Department at Carnegie Mellon University. After the COVID-19 pandemic arose, its decision to send a new Cozmo robot to each student in the class allowed us to keep learning, and ultimately made it possible for me to complete this final project.