I have been wanting to write something that detects things in real-time, on images streamed from a webcam. People who have watched the RoboCop, Terminator, or Iron Man movies will probably remember the rectangles or circles drawn around objects in video, with a description next to them. While having a rectangle around my head would be a good start, I'd like my computer to also recognize other things, such as a book, a key, or any other object. But to do this, I needed a simple interface to my webcam, so that I can read in a frame with a single function call.
Video4Linux
Video4Linux (v4l) is the Linux application programming interface for webcams, TV tuners, and other video devices. As it is supposed to support a broad range of devices, it offers a lot of functionality, most of which is not applicable to webcams, or at least not to what I wanted to do with mine.
The v4l documentation contains example code that grabs a frame from the webcam. Based on this, I made my small interface library that does the following things:
- Open and close a webcam
- Set the frame resolution
- Start/stop streaming
- Grab a frame
For these, the following functions have been defined in the header file:
webcam_t *webcam_open(const char *dev);
void      webcam_close(webcam_t *w);
void      webcam_resize(webcam_t *w, uint16_t width, uint16_t height);
void      webcam_stream(webcam_t *w, bool flag);
buffer_t  webcam_grab(webcam_t *w);
The structures used are:
typedef struct buffer {
    uint8_t *start;
    size_t   length;
} buffer_t;

typedef struct webcam {
    char            *name;
    int              fd;
    buffer_t        *buffers;
    uint8_t          nbuffers;
    buffer_t         frame;
    pthread_t        thread;
    pthread_mutex_t  mtx_frame;
    uint16_t         width;
    uint16_t         height;
    uint8_t          colorspace;
    char             formats[16][5];
    bool             streaming;
} webcam_t;
The buffer structure is used to store the pixel information, and the webcam structure represents a webcam instance. Each webcam instance has its own thread for capturing the image and storing it in its buffers. This thread also converts the current buffer into an RGB frame and stores it in the frame buffer.
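To give an idea of how these pieces fit together, here is a minimal sketch of a capture program. The header name webcam.h and the device path /dev/video0 are assumptions on my part, and error handling is left out:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include "webcam.h"   /* assumed header name for the library above */

int main(void)
{
    /* Open the device, pick a resolution, and start the capture thread */
    webcam_t *w = webcam_open("/dev/video0");   /* assumed device path */
    webcam_resize(w, 640, 480);
    webcam_stream(w, true);

    /* The capture thread needs a moment to produce its first frame,
       so keep grabbing until we get a non-empty RGB buffer */
    buffer_t frame = webcam_grab(w);
    while (frame.length == 0) {
        free(frame.start);
        frame = webcam_grab(w);
    }

    /* Dump the raw RGB frame to disk (see the display command below) */
    FILE *out = fopen("file.rgb", "wb");
    fwrite(frame.start, 1, frame.length, out);
    fclose(out);
    free(frame.start);

    /* Stop streaming and clean up */
    webcam_stream(w, false);
    webcam_close(w);
    return 0;
}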
YUV format and converting to RGB
The code I have written has so far only been tested with my own webcam, so it probably needs to be extended to support the operations needed for other types of webcams. At the moment, the buffers are memory-mapped, and the format of the captured frames is YUV 4:2:2, in YUYV order. The YUYV order means that the byte values are stored as follows in the buffer:
0x00  Y U Y V Y U Y V Y U Y V Y U Y V
0x10  Y U Y V Y U Y V Y U Y V Y U Y V
0x20  Y U Y V Y U Y V Y U Y V Y U Y V
...   etc
The first pixel uses the Y value at position 0x00 and the U value at position 0x01. The V value is set to 0x80 (128) for this first pixel. The second pixel, though, uses the Y value at position 0x02, the U value of the previous pixel at position 0x01, and the V value at position 0x03. The third pixel then uses the Y value at position 0x04, the U value at position 0x05, and the V value of the previous pixel at position 0x03. So, say we have Y0, ..., Yn-1 for n pixels. Then we have a Ui for every even i in [0, n) (that is, U0, U2, ...), and a Vj for every odd j in [0, n) (that is, V1, V3, ...). So for every pixel i, its YUV values are (Yi, Ui, Vi-1) when i is even, and (Yi, Ui-1, Vi) when i is odd.
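To make the indexing rule concrete, here is a small helper sketch (my own illustration, not part of the library) that computes where pixel i's samples live in the YUYV buffer:

/* Byte offsets of the Y, U and V samples of pixel i in a YUYV buffer.
 * An offset of -1 means there is no such sample and the neutral chroma
 * value 0x80 should be used instead (the V of pixel 0). */
static void yuyv_offsets(long i, long *yOff, long *uOff, long *vOff)
{
    *yOff = 2 * i;
    if (i % 2 == 0) {           /* even pixel: own U, previous pixel's V */
        *uOff = 2 * i + 1;
        *vOff = 2 * i - 1;      /* -1 for pixel 0 */
    } else {                    /* odd pixel: previous pixel's U, own V */
        *uOff = 2 * i - 1;
        *vOff = 2 * i + 1;
    }
}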
I tried displaying the captured YUYV frames using ImageMagick's display, but the images looked very reddish. According to the v4l2 documentation, you need to convert this colorspace into RGB using a specific function. This function can be seen in the code as convertToRGB (keep in mind the YUYV byte order as explained in the previous paragraph):
static void convertToRGB(struct buffer buf, struct buffer *frame)
{
    size_t i;
    uint8_t y, u, v;
    int uOffset = 0;
    int vOffset = 0;
    double R, G, B;
    double Y, Pb, Pr;

    // Initialize frame
    if (frame->start == NULL) {
        frame->length = buf.length / 2 * 3;
        frame->start = calloc(frame->length, sizeof(char));
    }

    // Go through the YUYV buffer and calculate RGB pixels
    for (i = 0; i < buf.length; i += 2) {
        // Even pixels have their own U and borrow the previous V;
        // odd pixels have their own V and borrow the previous U
        uOffset = (i % 4 == 0) ? 1 : -1;
        vOffset = (i % 4 == 2) ? 1 : -1;

        y = buf.start[i];
        u = (i + uOffset > 0 && i + uOffset < buf.length) ? buf.start[i + uOffset] : 0x80;
        v = (i + vOffset > 0 && i + vOffset < buf.length) ? buf.start[i + vOffset] : 0x80;

        // Scale from the limited (studio swing) range to full range
        Y  = (255.0 / 219.0) * (y - 0x10);
        Pb = (255.0 / 224.0) * (u - 0x80);
        Pr = (255.0 / 224.0) * (v - 0x80);

        // ITU-R BT.601 YPbPr to RGB
        R = 1.0 * Y + 0.000 * Pb + 1.402 * Pr;
        G = 1.0 * Y - 0.344 * Pb - 0.714 * Pr;
        B = 1.0 * Y + 1.772 * Pb + 0.000 * Pr;

        frame->start[i / 2 * 3    ] = clamp(R);
        frame->start[i / 2 * 3 + 1] = clamp(G);
        frame->start[i / 2 * 3 + 2] = clamp(B);
    }
}
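The clamp helper is not shown in the excerpt; assuming it simply saturates the computed value to the valid byte range, a minimal version would be:

/* Clamp a double to the representable byte range [0, 255]
 * (assumed behavior; the original helper is not excerpted here) */
static uint8_t clamp(double v)
{
    if (v < 0.0)
        return 0;
    if (v > 255.0)
        return 255;
    return (uint8_t)v;
}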
The resulting frame can be viewed with ImageMagick's display as follows:

# display -size wxh -depth 8 -colorspace rgb file.rgb

with w and h being the size of the frame as instructed with webcam_resize, and file.rgb the file where you wrote the frame buffer to.
The thread, and the mutex
While the webcam is in streaming mode, the little library uses a POSIX thread to store the image in the memory-mapped buffers, convert it into RGB, and store the result in the frame buffer. To make sure this thread is not writing into the frame buffer while the main thread is reading from it during webcam_grab, we need a mutual exclusion mechanism, also called a "mutex". The thread checks the boolean streaming and calls the webcam_read function as long as it is true. The webcam_stream function sets it to true and creates the thread:
w->streaming = true;
pthread_create(&w->thread, NULL, webcam_streaming, (void *)w);
This thread runs the following function:
static void *webcam_streaming(void *ptr)
{
    webcam_t *w = (webcam_t *)ptr;

    while (w->streaming)
        webcam_read(w);

    return NULL;
}
The webcam_read function locks the mutex here, before writing the contents into the frame buffer:
pthread_mutex_lock(&w->mtx_frame);
convertToRGB(w->buffers[buf.index], &w->frame);
pthread_mutex_unlock(&w->mtx_frame);
And webcam_grab does the same, before copying out the contents:
buffer_t ret;

pthread_mutex_lock(&w->mtx_frame);
ret.length = w->frame.length;
ret.start = calloc(ret.length, sizeof(uint8_t));
memcpy(ret.start, w->frame.start, w->frame.length);
pthread_mutex_unlock(&w->mtx_frame);
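Note that this hands back a heap-allocated copy, so the caller is responsible for freeing ret.start once it is done with the frame.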
Finally, when webcam_stream is called with the false flag, it sets streaming to false and waits for the webcam thread to finish:
w->streaming = false;
pthread_join(w->thread, NULL);
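One detail these fragments do not show is initializing the mutex. Presumably this happens once in webcam_open, before the thread is ever created (an assumption on my part, since that code is not excerpted here):

pthread_mutex_init(&w->mtx_frame, NULL);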
Now what?
I am not sure what to extend this little piece of code [ latest ] with, so suggestions are welcome. The next step will be to use the SDL application framework [ github ] and make something awesome that recognizes things in the images. Real-time image processing, here I come!