Simple webcam access in C

I have been wanting to write something that detects things in real-time on images streamed from a webcam. People who have watched the RoboCop, Terminator, or Iron Man movies will probably remember the rectangles or circles drawn around objects in video, with a description next to them. While having a rectangle around my head would be a good start, I’d like my computer to also recognize other things, such as a book, a key, or any other object. But to do this, I needed a simple interface to my webcam, so that I can read in a frame from the webcam with a single function call.

Video4Linux

Video4Linux (v4l) is the Linux application programming interface for webcams, TV tuners, and other video devices. As it is supposed to support a broad range of devices, it offers a lot of functionality, most of which is not applicable to webcams, or at least not to what I wanted to do with mine.

There is example code in the v4l documentation that lets you grab a frame from the webcam. Based on this, I made a small interface library that does the following things:

  • Open and close a webcam
  • Set the frame resolution
  • Start/stop streaming
  • Grab a frame

For these, the following functions have been defined in the header file:

webcam_t *webcam_open(const char *dev);
void webcam_close(webcam_t *w);
void webcam_resize(webcam_t *w, uint16_t width, uint16_t height);
void webcam_stream(webcam_t *w, bool flag);
buffer_t webcam_grab(webcam_t *w);

The structures used are:

typedef struct buffer {
    uint8_t *start;
    size_t  length;
} buffer_t;
 
typedef struct webcam {
    char            *name;
    int             fd;
    buffer_t        *buffers;
    uint8_t         nbuffers;
 
    buffer_t        frame;
    pthread_t       thread;
    pthread_mutex_t mtx_frame;
 
    uint16_t        width;
    uint16_t        height;
    uint8_t         colorspace;
 
    char            formats[16][5];
    bool            streaming;
} webcam_t;

The buffer structure is used to store the pixel information, and the webcam structure is for the webcam instance. Each webcam instance has its own thread for capturing the image and storing it in its buffers. This thread also converts the current buffer into a RGB frame and stores it into the frame buffer.

YUV format and converting to RGB

The code I have written has so far only been tested with my own webcam, so it probably needs to be extended to support the operations needed for other types of webcams. At the moment, the buffers are memory-mapped, and the format of the captured frames is YUV422, in the YUYV order. The YUYV order means that the byte values are stored as follows in the buffer:

0x00 Y U Y V Y U Y V Y U Y V Y U Y V
0x10 Y U Y V Y U Y V Y U Y V Y U Y V
0x20 Y U Y V Y U Y V Y U Y V Y U Y V
... etc

The first pixel uses the Y value at position 0x00 and the U value at position 0x01; since there is no preceding V value, V falls back to 0x80 (128) for this first pixel. The second pixel uses the Y value at position 0x02, the U value of the previous pixel at position 0x01, and the V value at position 0x03. The third pixel then uses the Y value at position 0x04, the U value at position 0x05, and the V value of the previous pixel at position 0x03. So, say we have Y0..Yn−1 for n pixels. Then a Ui exists for every even i in [0, n), and a Vj exists for every odd j in [0, n). For every pixel i, its YUV values are therefore (Yi, Ui, Vi−1) when i is even, and (Yi, Ui−1, Vi) when i is odd.

I tried displaying the captured YUYV frames using ImageMagick’s display, but the images looked very reddish. According to the v4l2 documentation, you need to convert this colorspace into RGB using a specific set of formulas. This conversion is implemented in the function convertToRGB (keep the YUYV byte order from the previous paragraph in mind):

static void convertToRGB(struct buffer buf, struct buffer *frame)
{
    size_t i;
    uint8_t y, u, v;
 
    int uOffset = 0;
    int vOffset = 0;
 
    double R, G, B;
    double Y, Pb, Pr;
 
    // Initialize frame
    if (frame->start == NULL) {
        frame->length = buf.length / 2 * 3;
        frame->start = calloc(frame->length, sizeof(char));
    }
 
    // Go through the YUYV buffer and calculate RGB pixels
    for (i = 0; i < buf.length; i += 2)
    {
        uOffset = (i % 4 == 0) ? 1 : -1;
        vOffset = (i % 4 == 2) ? 1 : -1;
 
        y = buf.start[i];
        u = (i + uOffset > 0 && i + uOffset < buf.length) ? buf.start[i + uOffset] : 0x80;
        v = (i + vOffset > 0 && i + vOffset < buf.length) ? buf.start[i + vOffset] : 0x80;
 
        Y =  (255.0 / 219.0) * (y - 0x10);
        Pb = (255.0 / 224.0) * (u - 0x80);
        Pr = (255.0 / 224.0) * (v - 0x80);
 
        R = 1.0 * Y + 0.000 * Pb + 1.402 * Pr;
        G = 1.0 * Y - 0.344 * Pb - 0.714 * Pr;
        B = 1.0 * Y + 1.772 * Pb + 0.000 * Pr;
 
        frame->start[i / 2 * 3    ] = clamp(R);
        frame->start[i / 2 * 3 + 1] = clamp(G);
        frame->start[i / 2 * 3 + 2] = clamp(B);
    }
}

The resulting frame can be viewed with ImageMagick’s display as follows:

# display -size <w>x<h> -depth 8 -colorspace rgb file.rgb

with <w> and <h> being the width and height of the frame as set with webcam_resize, and file.rgb the file you wrote the frame buffer to.

The thread, and the mutex

The little library uses a POSIX thread that, while the webcam is in streaming mode, stores the image in the memory-mapped buffers, converts it into RGB, and stores the result in the frame buffer. To make sure this thread is not writing into the frame buffer while the main thread is reading from it during webcam_grab, we need a mutual exclusion mechanism, also called a “mutex”. The thread checks the boolean streaming and calls the webcam_read function as long as it is true. The webcam_stream function sets it to true and creates the thread:

w->streaming = true;
pthread_create(&w->thread, NULL, webcam_streaming, (void *)w);

This thread runs the following function:

static void *webcam_streaming(void *ptr)
{
    webcam_t *w = (webcam_t *)ptr;
    while (w->streaming) webcam_read(w);
    return NULL;
}

The webcam_read function locks the mutex here, before writing the contents into the frame buffer:

pthread_mutex_lock(&w->mtx_frame);
convertToRGB(w->buffers[buf.index], &w->frame);
pthread_mutex_unlock(&w->mtx_frame);

And webcam_grab does the same here, before copying out the contents:

buffer_t ret;
 
pthread_mutex_lock(&w->mtx_frame);
ret.length = w->frame.length;
ret.start = calloc(ret.length, sizeof(uint8_t));
memcpy(ret.start, w->frame.start, w->frame.length);
pthread_mutex_unlock(&w->mtx_frame);

Finally, when the false flag is passed to webcam_stream, it sets streaming to false and waits for the webcam thread to finish:

w->streaming = false;
pthread_join(w->thread, NULL);

Now what?

I am not sure what to extend this little piece of code [ latest ] with, so suggestions are welcome. The next step would be to use the SDL application framework [ github ], and make something awesome that recognizes things in the images. Real-time image processing, here I come!

What are your thoughts?