From: Philipp Michel
Subject: Hints for assignment 2

Hi guys,

I'm going to try to clear up some of the confusion people are having about the vision assignment. Don't be put off by the length of this email; I just wanted to include all the pointers I could think of.

First off, the purpose of this assignment is simply to make you a bit familiar with how you access images from a camera and perform simple manipulations on the real-time video. The optical flow code in the provided example is already pretty advanced. You're supposed to have some fun hacking around with vision code. Those of you not too comfy with programming could focus on making a good, detailed writeup describing how you *would* solve the problem. This should include some algorithmic thinking. Keep in mind that the only thing you really have to work with is pixels: what steps would it take to "extract" from a 2D pixel grid the information you need to solve the task?

A neat thing to do, for example, would be to "segment out" your piece of cardboard, figure out the angle at which it lies in the image, and calculate how you would have to turn the camera to line it up. You could then move the cam along a zig-zag, mobot-like paper course and the program would tell you how to steer. Another nice thing would be to track objects of a particular color: again, you could do color segmentation and then draw bounding boxes onto the camera output around all the color blobs you've detected.

Here are some hints about how to do color segmentation (e.g. to detect a brightly colored red / pink / blue / whatever piece of cardboard):

- In the sample program, frame1 is the full color image captured from the camera and is of the OpenCV datatype IplImage*. Almost all OpenCV operations that work with images operate on IplImages. The data structure contains a header with maintenance info (size, number of channels, color arrangement scheme, etc.) and a pointer to an array of actual pixel values.
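To make the "header plus pixel array" idea concrete, here's a minimal self-contained sketch. MockImage and pixel_at are made-up names for illustration only (the real IplImage has many more fields); what matters is the widthStep-based addressing arithmetic, which is the same one the pixel-access hints below use:

    #include <stdio.h>
    #include <stdlib.h>

    /* A tiny mock of the IplImage idea (NOT the real OpenCV struct):
     * a header describing the geometry plus a pointer to the raw pixel
     * array. widthStep is the number of bytes per image row; it can be
     * larger than width*3 because rows are often padded for alignment. */
    typedef struct {
        int width, height;
        int widthStep;              /* bytes per row, including padding */
        unsigned char *imageData;   /* interleaved 8-bit channel values */
    } MockImage;

    /* Pointer to the first channel byte of pixel (x, y) -- the same
     * arithmetic used on a real IplImage's imageData. */
    static unsigned char *pixel_at(MockImage *img, int x, int y) {
        return (unsigned char*)(img->imageData + img->widthStep * y) + x * 3;
    }

    int main(void) {
        MockImage img;
        img.width = 4;
        img.height = 3;
        img.widthStep = 16;  /* 4 pixels * 3 bytes = 12, padded to 16 */
        img.imageData = calloc((size_t)img.height * img.widthStep, 1);

        /* Write channel values (10, 20, 30) into pixel (2, 1)... */
        unsigned char *p = pixel_at(&img, 2, 1);
        p[0] = 10; p[1] = 20; p[2] = 30;

        /* ...and read them back with the same addressing. */
        printf("%d %d %d\n", pixel_at(&img, 2, 1)[0],
               pixel_at(&img, 2, 1)[1], pixel_at(&img, 2, 1)[2]);

        free(img.imageData);
        return 0;
    }

This prints "10 20 30". Note that if you walk the buffer with width*3 instead of widthStep, everything shifts after the first row on padded images, which is a classic bug.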
- To access pixels directly: for our 8-bit, 3-channel image representing the current frame (IplImage* frame1), you can access the three channels of pixel (x,y) by:

      int blue  = ((uchar*)(frame1->imageData + frame1->widthStep*y))[x*3];
      int green = ((uchar*)(frame1->imageData + frame1->widthStep*y))[x*3+1];
      int red   = ((uchar*)(frame1->imageData + frame1->widthStep*y))[x*3+2];

- The width and height of the image are given by frame1->width and frame1->height.

- Here's some code that will, for every pixel in the image, check whether it falls within some color threshold:

      unsigned char *p_rgb, *row_rgb;
      unsigned char ch_r, ch_g, ch_b;
      int x, y;

      row_rgb = (unsigned char*)frame1->imageData;
      for (y = 0; y < frame1->height; y++) {
          p_rgb = row_rgb;
          for (x = 0; x < frame1->width; x++) {
              // The blue, green and red values for the pixel at (x,y)
              // (note the BGR byte order -- see the next hint)
              ch_b = *(p_rgb++);
              ch_g = *(p_rgb++);
              ch_r = *(p_rgb++);
              if (ISPINK(ch_r, ch_g, ch_b)) {
                  // This pixel is part of your cardboard. You should
                  // write a function that checks whether each of the
                  // values falls within some color range that you've defined
                  // for the particular piece of cardboard / paper that you have.
                  // You could also set a corresponding pixel in a new, binary
                  // image to 1 and use that as your pink / not-pink mask.
              }
          }
          row_rgb += frame1->widthStep;
      }

- The camera might not spit back an RGB image, but rather a BGR image (because of the way the hardware works). You can easily convert using OpenCV as follows:

      cvCvtColor(frame1, frame1, CV_BGR2RGB);

The point of all of this is not to freak you out, but rather to just let you have some fun hacking around with vision code. There is no "right" solution. If there are any more specific questions, don't hesitate to email.

Cheers,
-Phil
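P.S. In case it helps, here's a self-contained sketch of the thresholding idea on a tiny hand-built BGR buffer. is_pink and its ranges are made up for illustration; you'd write your own test and tune the ranges for your particular cardboard:

    #include <stdio.h>

    /* A hypothetical color test in the spirit of ISPINK(): returns 1 when
     * each channel falls inside a range chosen for the target object.
     * These ranges are invented -- tune them for your own cardboard. */
    static int is_pink(unsigned char r, unsigned char g, unsigned char b) {
        return r > 180 && g < 120 && b > 100 && b < 200;
    }

    int main(void) {
        /* A 2x2 BGR frame: one "pink" pixel, three others. */
        enum { W = 2, H = 2, STEP = W * 3 };
        unsigned char frame[H * STEP] = {
            /* row 0: (B,G,R) (B,G,R) */
            150,  80, 220,    0, 255,   0,
            /* row 1 */
            255,   0,   0,   50,  50,  50,
        };
        unsigned char mask[H * W];  /* the binary pink / not-pink mask */

        unsigned char *row = frame;
        for (int y = 0; y < H; y++) {
            unsigned char *p = row;
            for (int x = 0; x < W; x++) {
                unsigned char b = *p++, g = *p++, r = *p++;
                mask[y * W + x] = is_pink(r, g, b) ? 1 : 0;
            }
            row += STEP;
        }

        int count = 0;
        for (int i = 0; i < H * W; i++) count += mask[i];
        printf("pink pixels: %d\n", count);  /* pink pixels: 1 */
        return 0;
    }

From a mask like this you can then compute a bounding box or a centroid of the "pink" pixels to drive your steering or box drawing.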