From: Philipp Michel
Subject: Hints for assignment 2

Hi guys,

I'm going to try to clear up some of the confusion people are having about the vision assignment. Don't be put off by the length of this email; I just wanted to include all the pointers I could think of.

First off, the purpose of this assignment is simply to make you a bit familiar with how you access images from a camera and perform simple manipulations on the real-time video. The optical flow code in the provided example is already pretty advanced. You're supposed to have some fun hacking around with vision code. Those of you not too comfy with programming could focus on making a good, detailed writeup describing how you *would* solve the problem. This should include some algorithmic thinking. Keep in mind that the only thing you really have to work with is pixels: what steps would it take to "extract" from a 2D pixel grid the information you need to solve the task?

A neat thing to do, for example, would be to "segment out" your piece of cardboard, figure out the angle at which it lies in the image, and calculate how you would have to turn the camera to line it up. You could then move the cam along a zig-zag, mobot-like paper course and the program would tell you how to steer. Another nice thing would be to track objects of a particular color: again, you could do color segmentation and then draw bounding boxes onto the camera output around all the color blobs you've detected.

Here are some hints about how to do color segmentation (e.g. to detect a brightly colored red / pink / blue / whatever piece of cardboard):

- In the sample program, frame1 is the full color image captured from the camera and is of the OpenCV datatype IplImage*. Almost all OpenCV operations that work with images operate on IplImages. The data structure contains a header with maintenance info (size, number of channels, color arrangement scheme, etc.) and a pointer to an array of actual pixel values.
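To make the "header plus pixel array" idea concrete, here's a minimal self-contained sketch. MockImage and pixel_at are made-up names for illustration only (the real IplImage has many more fields); what matters is the widthStep-based addressing arithmetic, which is the same one the pixel-access hints below use:

    #include <stdio.h>
    #include <stdlib.h>

    /* A tiny mock of the IplImage idea (NOT the real OpenCV struct):
     * a header describing the geometry plus a pointer to the raw pixel
     * array. widthStep is the number of bytes per image row; it can be
     * larger than width*3 because rows are often padded for alignment. */
    typedef struct {
        int width, height;
        int widthStep;              /* bytes per row, including padding */
        unsigned char *imageData;   /* interleaved 8-bit channel values */
    } MockImage;

    /* Pointer to the first channel byte of pixel (x, y) -- the same
     * arithmetic used on a real IplImage's imageData. */
    static unsigned char *pixel_at(MockImage *img, int x, int y) {
        return (unsigned char*)(img->imageData + img->widthStep * y) + x * 3;
    }

    int main(void) {
        MockImage img;
        img.width = 4;
        img.height = 3;
        img.widthStep = 16;  /* 4 pixels * 3 bytes = 12, padded to 16 */
        img.imageData = calloc((size_t)img.height * img.widthStep, 1);

        /* Write channel values (10, 20, 30) into pixel (2, 1)... */
        unsigned char *p = pixel_at(&img, 2, 1);
        p[0] = 10; p[1] = 20; p[2] = 30;

        /* ...and read them back with the same addressing. */
        printf("%d %d %d\n", pixel_at(&img, 2, 1)[0],
               pixel_at(&img, 2, 1)[1], pixel_at(&img, 2, 1)[2]);

        free(img.imageData);
        return 0;
    }

This prints "10 20 30". Note that if you walk the buffer with width*3 instead of widthStep, everything shifts after the first row on padded images, which is a classic bug.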
- To access pixels directly: for our 8-bit, 3-channel image representing the current frame (IplImage* frame1), you can access the three channels of pixel (x,y) by:

      int blue  = ((uchar*)(frame1->imageData + frame1->widthStep*y))[x*3];
      int green = ((uchar*)(frame1->imageData + frame1->widthStep*y))[x*3+1];
      int red   = ((uchar*)(frame1->imageData + frame1->widthStep*y))[x*3+2];

- The width and height of the image are given by frame1->width and frame1->height.

- Here's some code that will, for every pixel in the image, check whether it falls within some color threshold:

      unsigned char *p_rgb, *row_rgb;
      unsigned char ch_r, ch_g, ch_b;
      int x, y;

      row_rgb = (unsigned char*)frame1->imageData;
      for (y = 0; y < frame1->height; y++) {
          p_rgb = row_rgb;
          for (x = 0; x < frame1->width; x++) {
              // The blue, green and red values for the pixel at (x,y)
              // (note the BGR byte order -- see the next hint)
              ch_b = *(p_rgb++);
              ch_g = *(p_rgb++);
              ch_r = *(p_rgb++);
              if (ISPINK(ch_r, ch_g, ch_b)) {
                  // This pixel is part of your cardboard. You should
                  // write a function that checks whether each of the
                  // values falls within some color range that you've defined
                  // for the particular piece of cardboard / paper that you have.
                  // You could also set a corresponding pixel in a new, binary
                  // image to 1 and use that as your pink / not-pink mask.
              }
          }
          row_rgb += frame1->widthStep;
      }

- The camera might not spit back an RGB image, but rather a BGR image (because of the way the hardware works). You can easily convert using OpenCV as follows:

      cvCvtColor(frame1, frame1, CV_BGR2RGB);

The point of all of this is not to freak you out, but rather to just let you have some fun hacking around with vision code. There is no "right" solution. If there are any more specific questions, don't hesitate to email.

Cheers,
-Phil
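P.S. In case it helps, here's a self-contained sketch of the thresholding idea on a tiny hand-built BGR buffer. is_pink and its ranges are made up for illustration; you'd write your own test and tune the ranges for your particular cardboard:

    #include <stdio.h>

    /* A hypothetical color test in the spirit of ISPINK(): returns 1 when
     * each channel falls inside a range chosen for the target object.
     * These ranges are invented -- tune them for your own cardboard. */
    static int is_pink(unsigned char r, unsigned char g, unsigned char b) {
        return r > 180 && g < 120 && b > 100 && b < 200;
    }

    int main(void) {
        /* A 2x2 BGR frame: one "pink" pixel, three others. */
        enum { W = 2, H = 2, STEP = W * 3 };
        unsigned char frame[H * STEP] = {
            /* row 0: (B,G,R) (B,G,R) */
            150,  80, 220,    0, 255,   0,
            /* row 1 */
            255,   0,   0,   50,  50,  50,
        };
        unsigned char mask[H * W];  /* the binary pink / not-pink mask */

        unsigned char *row = frame;
        for (int y = 0; y < H; y++) {
            unsigned char *p = row;
            for (int x = 0; x < W; x++) {
                unsigned char b = *p++, g = *p++, r = *p++;
                mask[y * W + x] = is_pink(r, g, b) ? 1 : 0;
            }
            row += STEP;
        }

        int count = 0;
        for (int i = 0; i < H * W; i++) count += mask[i];
        printf("pink pixels: %d\n", count);  /* pink pixels: 1 */
        return 0;
    }

From a mask like this you can then compute a bounding box or a centroid of the "pink" pixels to drive your steering or box drawing.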