06 Feb 2021 - tsp
Last update 06 Feb 2021
The API used on most Unixoid operating systems (i.e. Linux, FreeBSD, etc.) is
Video4Linux. It basically consists of a specification for device naming (i.e.
the /dev/videoN devices) as well as the associated data structures and ioctl
calls. These are realized using the standard Unix read/write and ioctl
APIs as usual. V4L does not only support webcams but also tuners, video capture,
satellite receivers, etc. - this page only focuses on cameras, though most of the
operations are the same for other video capture devices.
The specification for V4L2 can be found online.
For webcams there are three different methods that can be used to read or stream frames from the camera:

- The read syscall, indicated by the capability flag V4L2_CAP_READWRITE. Using this API no metadata is passed besides image information (i.e. no frame counters, timestamps, etc.) which would be required when synchronizing with other frames or detecting frame dropping. This is the most simple I/O method.
- Streaming via mmap. This mode is supported whenever the V4L2_CAP_STREAMING flag is set and the mmap mode is supported by VIDIOC_REQBUFS. This has been one of the most efficient streaming modes and is usually widely supported. The application can provide multiple buffers to allow seamless streaming.
- Streaming via usermode pointers. This mode is supported whenever V4L2_CAP_STREAMING is set and the usermode pointer mode is supported by VIDIOC_REQBUFS. The main difference to mmap is that the application allocates the buffers itself, so they can for example easily be shared with different processes or be swapped out - the application just passes a pointer to the driver, the driver then locks the buffer if required and reads data into the application's memory space. Metadata is passed in an extra structure.

To my knowledge USB webcams currently only support the mmap mode, so this is
what this blog post will look into first. Note that the V4L2 specification
does not specify any mandatory interface, so for a truly portable application
it would be a good idea to support both streaming methods as well as the
method based on read/write.
All Video4Linux2 methods and data types are defined in a single header file
that's usually included as linux/videodev2.h.
The first thing is obviously opening the device file. The naming is specified by the Video4Linux specification, but it's a good idea to allow the device name to be overridden by the user anyways - since one usually has to support systems with multiple capture devices, this is not a huge problem.
The devices are usually named:

- /dev/video0 to /dev/video63 for video capture devices. There might also be a /dev/video device for the default capture device, though this doesn't always exist.
- /dev/bttv0 as well as /dev/vbi0 to /dev/vbi31 for VBI devices.
- /dev/radio0 up to /dev/radio63 and the optional default device /dev/radio for radio devices.
- /dev/vtx0 up to /dev/vtx31 and the optional default device /dev/vtx for teletext devices.
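Since this numbering is just a convention, a small probe loop over the conventional names is sometimes handy. The following sketch (the devName buffer and the loop bound are just illustrative) simply builds the candidate names and would hand each one to the open routine shown below:

char devName[32];
int i;

for(i = 0; i < 64; i = i + 1) {
    /* Build the conventional capture device name ... */
    snprintf(devName, sizeof(devName), "/dev/video%d", i);
    /* ... and try to stat() / open it as shown below */
}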
Before one opens the device it’s a good idea to check if the file exists and is really a device file:
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

enum cameraError deviceOpen(
    int* lpDeviceOut,
    char* deviceName
) {
    struct stat st;
    int hHandle;

    if(lpDeviceOut == NULL) { return cameraE_InvalidParam; }
    (*lpDeviceOut) = -1;
    if(deviceName == NULL) { return cameraE_InvalidParam; }

    /* Check if the device exists */
    if(stat(deviceName, &st) == -1) {
        return cameraE_UnknownDevice;
    }

    /* Check if it's a character device file */
    if(!S_ISCHR(st.st_mode)) {
        return cameraE_UnknownDevice;
    }

    hHandle = open(deviceName, O_RDWR | O_NONBLOCK, 0);
    if(hHandle < 0) {
        switch(errno) {
            case EACCES: return cameraE_PermissionDenied;
            case EPERM: return cameraE_PermissionDenied;
            default: return cameraE_Failed;
        }
    }

    (*lpDeviceOut) = hHandle;
    return cameraE_Ok;
}
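Using this routine is straightforward; a minimal (hypothetical) call site could look like this:

int hCamera;
enum cameraError e;

/* Open the first capture device (name is just an example) */
e = deviceOpen(&hCamera, "/dev/video0");
if(e != cameraE_Ok) {
    /* Handle the error ... */
}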
Since we opened the device using open, we have to close it at the end using close:
enum cameraError deviceClose(
    int hHandle
) {
    close(hHandle);
    return cameraE_Ok;
}
The next step is to query the capabilities of the opened device. This is first done
via the VIDIOC_QUERYCAP ioctl. This call fills a struct v4l2_capability
structure. This structure contains:

- the driver name (driver)
- the device name (card)
- the bus location of the device (bus_info)
- the driver version (version)
- the capability flags of the physical device (capabilities)
- the capability flags of the opened device node (device_caps)

The most important field is the capabilities field. This can be checked against
some interesting flags:
- V4L2_CAP_VIDEO_CAPTURE identifies a capture device - which is what one's looking for when looking for a webcam.
- V4L2_CAP_READWRITE is set if the read and write syscalls are supported to read and write data.
- V4L2_CAP_ASYNCIO signals support for asynchronous I/O mechanisms. Since this is usually not supported by V4L2 this is usually not of any interest.
- V4L2_CAP_STREAMING is required to support streaming input and output which includes userspace buffer pointers and memory mapping.
- V4L2_CAP_VIDEO_OUTPUT and V4L2_CAP_VIDEO_OVERLAY would identify video output and overlay devices, V4L2_CAP_VBI_CAPTURE and V4L2_CAP_VBI_OUTPUT raw VBI devices. In the same category are V4L2_CAP_SLICED_VBI_CAPTURE and V4L2_CAP_SLICED_VBI_OUTPUT.
- V4L2_CAP_RDS_CAPTURE devices allow one to capture RDS packets, V4L2_CAP_RDS_OUTPUT identifies an RDS encoder.
- V4L2_CAP_VIDEO_OUTPUT_OVERLAY signals that the device supports video output overlay.
- V4L2_CAP_HW_FREQ_SEEK signals support for hardware frequency seeking.
- V4L2_CAP_VIDEO_CAPTURE_MPLANE and V4L2_CAP_VIDEO_OUTPUT_MPLANE signal input and output support for multiplanar formats.
- V4L2_CAP_VIDEO_M2M_MPLANE indicates multiplanar format support on memory to memory devices.
- V4L2_CAP_VIDEO_M2M identifies a memory to memory device.
- V4L2_CAP_TUNER for tuner support, V4L2_CAP_AUDIO for audio as well as V4L2_CAP_RADIO for radio and V4L2_CAP_MODULATOR for modulator support.

The first thing to check for when capturing from a webcam or video camera is
that the device really supports V4L2_CAP_VIDEO_CAPTURE and either
the V4L2_CAP_READWRITE mode for single frame capture or V4L2_CAP_STREAMING
for mmap or userptr mode.
Since the ioctl calls can be interrupted, which is indicated by an EINTR
error code, libraries usually supply an xioctl method that retries the ioctl
until it either succeeds or fails:
static int xioctl(int fh, int request, void *arg) {
    int r;
    do {
        r = ioctl(fh, request, arg);
    } while ((r == -1) && (errno == EINTR));
    return r;
}
To fetch the capability flags one simply uses this xioctl
method and
checks for the required flags:
struct v4l2_capability cap;
bool bReadWriteSupported = false;
bool bStreamingSupported = false;

if(xioctl(hHandle, VIDIOC_QUERYCAP, &cap) == -1) {
    return cameraE_Failed; /* Failed to fetch capabilities */
}
if((cap.capabilities & V4L2_CAP_VIDEO_CAPTURE) == 0) {
    return cameraE_InvalidParam; /* We are not a capture device */
}
if((cap.capabilities & V4L2_CAP_READWRITE) != 0) { bReadWriteSupported = true; }
if((cap.capabilities & V4L2_CAP_STREAMING) != 0) { bStreamingSupported = true; }
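On newer kernels the driver may also set the V4L2_CAP_DEVICE_CAPS flag; in that case the device_caps field describes the capabilities of the opened node itself and is the better field to check. A small, hedged refinement of the check above:

/* Prefer the per-node capability set when the driver provides one */
__u32 effectiveCaps = cap.capabilities;
if((cap.capabilities & V4L2_CAP_DEVICE_CAPS) != 0) {
    effectiveCaps = cap.device_caps;
}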
The next step is to query cropping capabilities and pixel aspects. This is done using
the VIDIOC_CROPCAP call. This call requires a pointer to a struct v4l2_cropcap
to be filled that's initialized to the requested stream type. Since the task of this
blog post is to describe video capture, the buffer type will be V4L2_BUF_TYPE_VIDEO_CAPTURE.
Now one can simply call the driver:
struct v4l2_cropcap cropcap;

memset(&cropcap, 0, sizeof(cropcap));
cropcap.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;

if(xioctl(hHandle, VIDIOC_CROPCAP, &cropcap) == -1) {
    /*
        Note that some applications simply ignore this error
        and simply don't set any cropping rectangle later on
        since there are drivers that don't support cropping.
    */
    return cameraE_Failed; /* Failed to fetch crop capabilities */
}
The v4l2_cropcap structure contains three interesting members:

- bounds is a struct v4l2_rect that specifies the boundary of the window in which cropping is possible - this is the maximum possible window size.
- defrect is the default cropping rectangle that should cover the whole image. For a pixel aspect ratio of 1:1 this would be for example 640 × 480 for NTSC images.
- pixelaspect, which is a struct v4l2_fract. This specifies the aspect ratio (y/x) when no scaling is applied. This is the ratio required to get square pixels.

Each rect contains left, top, width and height fields.
After querying one can initialize cropping - for example to the default cropping
rectangle that should usually cover the whole image. This is done using
the VIDIOC_S_CROP call supplying a struct v4l2_crop. Usually this
should not be required, but since there are drivers that do not initialize to
the default cropping rectangle it's a good idea anyways. The structure basically
only contains a cropping rectangle c.
struct v4l2_crop crop;

/*
    Note that this should only be done if VIDIOC_CROPCAP was successful
*/
crop.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
crop.c = cropcap.defrect;

if(xioctl(hHandle, VIDIOC_S_CROP, &crop) == -1) {
    /* Failed. Maybe cropping is just not supported (EINVAL) */
}
To be able to negotiate a format one should usually query the formats supported by the device to locate one supported by the application. The code sample accompanying this blog post does not perform this negotiation but simply assumes the webcam supports the YUYV color model and at least 640x480 resolution to keep the code easier to read. But I'll cover format negotiation here - it's rather simple.
The first thing one has to know is that there are two major basic representations for colors in use:

- Luma and chroma based models (like the YUV family) that encode brightness and color information separately.
- RGB models that encode each pixel as a combination of red, green and blue intensities.

The main advantage of luma and chroma based models is that one immediately has a grayscale image available when just looking at the luma channel. This is also how these encoding schemes emerged historically - YUV models just added two subcarrier-encoded chroma channels to transmit color information in addition to backwards compatible grayscale images for TV usage.
RGB models on the other hand are usually easier to use on modern input and output devices.
All color models basically carry the same information but, depending on their encoding, support different resolutions and scales. Nearly all models allow one to add an optional alpha channel that covers transparency. Since we're interested in video capture, alpha channels usually don't play a role.
The biggest difference between formats is the way they encode the data. Again there are two major encoding methods: packed formats that store all channels of a pixel (or pixel group) next to each other, and planar formats that store each channel in a separate plane.
Depending on the chosen format the amount of information per channel may be the same for each pixel or may differ. For the widely used YUYV format (that's also selected by the example and is often called YUV422) there are, for each group of two pixels, two luminance values but only one U and one V coordinate shared by both. The idea is that the human eye is more sensitive to luminance changes than to chroma changes, so one can encode much less chromatic information. These four values then occupy - for YUYV - four bytes in a specific pattern that has to be decoded.
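To illustrate the packing: decoding one YUYV group is just indexing into a 4 byte macropixel (buf is an assumed pointer into the image data):

/* One YUYV macropixel: two pixels sharing one chroma pair */
unsigned char y0 = buf[0]; /* luminance of the first pixel */
unsigned char u  = buf[1]; /* shared U (Cb) chroma value */
unsigned char y1 = buf[2]; /* luminance of the second pixel */
unsigned char v  = buf[3]; /* shared V (Cr) chroma value */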
There is a huge number of supported formats - the usual way to handle this inside media processing libraries is to decide on one or two internally supported formats and decode as well as re-encode at the application boundaries. For example I personally usually settle on a small set of general purpose formats, plus a few more specialized ones for particular algorithms.
To determine which formats a capture device supports one can use the VIDIOC_ENUM_FMT
function call. This is built around the struct v4l2_fmtdesc structure:
struct v4l2_fmtdesc {
    __u32              index;
    enum v4l2_buf_type type;
    __u32              flags;
    __u8               description[32];
    __u32              pixelformat;
    __u32              reserved[4];
};
The basic idea is that the application just fills the index and type
fields, calls the VIDIOC_ENUM_FMT function and the driver fills the remaining fields
with the available information. To query information about our capture device
one iterates the index value from 0 upwards until the driver fails with an
error code of EINVAL. The type has to be set to V4L2_BUF_TYPE_VIDEO_CAPTURE:
for(int idx = 0;; idx = idx + 1) {
    struct v4l2_fmtdesc fmt;
    memset(&fmt, 0, sizeof(fmt));

    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    fmt.index = idx;

    if(xioctl(hHandle, VIDIOC_ENUM_FMT, &fmt) < 0) {
        /* Failed, usually one should check the error code (EINVAL means "no more formats") ... */
        break;
    }

    /* We got some format information. For demo purposes just display it */
    printf("Detected format %08x (is compressed: %s): %s\n", fmt.pixelformat, ((fmt.flags & V4L2_FMT_FLAG_COMPRESSED) != 0) ? "yes" : "no", fmt.description);
}
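The pixelformat is a FourCC, i.e. four ASCII characters packed into a 32 bit word with the first character in the least significant byte. A tiny helper (not part of the V4L2 API, just an illustrative addition) makes the printf output above more readable:

/* Print a V4L2 FourCC pixel format as its four ASCII characters */
static void printFourcc(unsigned int fourcc) {
    printf(
        "%c%c%c%c",
        (char)(fourcc & 0xFF),
        (char)((fourcc >> 8) & 0xFF),
        (char)((fourcc >> 16) & 0xFF),
        (char)((fourcc >> 24) & 0xFF)
    );
}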
The next step is setting the desired format. There are three calls involved with setting, trying or getting the format:

- VIDIOC_G_FMT queries the current format.
- VIDIOC_S_FMT sets the format (might change the width and height though).
- VIDIOC_TRY_FMT passes a format to the driver like S_FMT but does not change driver state. It fails if the format is not supported and might change width/height just as S_FMT does. Note that drivers are not required to implement this call, so it might also fail unconditionally.

Setting the format usually requires negotiation, but most webcams
support the YUYV color format and an interlaced pixel layout. This can be set
in a struct v4l2_format:
struct v4l2_format fmt;
unsigned int width, height;

/*
    Select 640 x 480 resolution (you should use dimensions
    as previously set while setting cropping parameters),
    YUYV color format and interlaced field order
*/
memset(&fmt, 0, sizeof(fmt));
fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
fmt.fmt.pix.width = 640;
fmt.fmt.pix.height = 480;
fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;
fmt.fmt.pix.field = V4L2_FIELD_INTERLACED;

if(xioctl(hHandle, VIDIOC_S_FMT, &fmt) == -1) {
    /* Failed to set format ... */
}

/* Now one should query the real size the driver decided on ... */
width = fmt.fmt.pix.width;
height = fmt.fmt.pix.height;
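If one wants to probe a format without modifying driver state, VIDIOC_TRY_FMT can be used in the same way - keeping in mind, as noted above, that drivers are not required to implement it. A sketch:

struct v4l2_format tryFmt;

memset(&tryFmt, 0, sizeof(tryFmt));
tryFmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
tryFmt.fmt.pix.width = 640;
tryFmt.fmt.pix.height = 480;
tryFmt.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;
tryFmt.fmt.pix.field = V4L2_FIELD_INTERLACED;

if(xioctl(hHandle, VIDIOC_TRY_FMT, &tryFmt) == -1) {
    /* Format rejected - or TRY_FMT simply not implemented */
}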
In some code like v4l2grab there is some additional handling of buggy
drivers. Since webcams are usually cheap products with sometimes buggy
drivers, on Linux these tools check that fmt.fmt.pix.bytesperline is at least
two times fmt.fmt.pix.width and that fmt.fmt.pix.sizeimage
is at least 2 * fmt.fmt.pix.width * fmt.fmt.pix.height (YUYV uses two
bytes per pixel).
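Translated into code, such a workaround could look like this (following the approach described above; the exact recovery strategy is up to the application):

/* Work around buggy drivers: YUYV needs two bytes per pixel */
unsigned int minBytesPerLine = fmt.fmt.pix.width * 2;
if(fmt.fmt.pix.bytesperline < minBytesPerLine) {
    fmt.fmt.pix.bytesperline = minBytesPerLine;
}
if(fmt.fmt.pix.sizeimage < fmt.fmt.pix.bytesperline * fmt.fmt.pix.height) {
    fmt.fmt.pix.sizeimage = fmt.fmt.pix.bytesperline * fmt.fmt.pix.height;
}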
The interface supported by most webcams is streaming I/O using memory mapped
buffers. This has been the most efficient streaming method for a long time - allowing
an application to map device memory areas (for example memory contained
on a PCI capture card) directly into application memory. Later on a second method
using userptr has been added that also allows one to exploit DMA transfers
into main memory on devices supporting busmastering. For cheap USB
webcams this usually doesn't make a difference, and the userptr streaming I/O
mode is usually not supported by this hardware anyways.
Note that there is no way for a driver to indicate which type of streaming method it supports other than actually requesting allocation of buffers.
The basic idea is:

- Buffers for the mmap method have to be allocated using the V4L2_MEMORY_MMAP memory type using the VIDIOC_REQBUFS ioctl. Note that though the buffer descriptors seem to contain real memory offsets these are just some kind of magic cookie that is used by the driver to recognize the allocated buffers (for example these might be real addresses or just opaque identifiers).
- The parameters of each buffer are queried using VIDIOC_QUERYBUF and the buffer is then mapped into the application's address space using mmap.
- Buffers are enqueued for capture using VIDIOC_QBUF. Whenever a buffer has been written successfully it has to be dequeued using VIDIOC_DQBUF.
- One can use the select, poll or kqueue event notification frameworks to determine readiness of new frames.
- Streaming is started and stopped using VIDIOC_STREAMON and VIDIOC_STREAMOFF.
There is a common structure used by the queue and dequeue operations that's
called struct v4l2_buffer. This structure contains:

- index. This is a linear index into the sequence of allocated buffers - used only with memory mapped buffers.
- type, which identifies either input (V4L2_BUF_TYPE_VIDEO_CAPTURE) or output (V4L2_BUF_TYPE_VIDEO_OUTPUT) buffers.
- the buffer size (length). The allocated buffer has to be able to contain a full frame of the requested data. After dequeueing a capture buffer the driver has also set bytesused, which might be equal to or smaller than length. For output buffers bytesused is set by the application to indicate the real used data size.
- field, which indicates the field order for interlaced data.
- timestamp might be set to indicate when the buffer has been captured. For output, the timestamp can specify at which point in time the buffer should be transmitted by the output device.
- timecode is another method to determine the position inside the data stream.
- sequence allows tracking of lost frames. It's a monotonically increasing sequence number.
- memory indicates the type of the buffer (memory mapped or userptr).
- userptr or offset, contained in the same union, provide a way for the driver to identify either the location inside the application's user mode memory range or a cookie to pass to mmap.
- input would allow switching between multiple supported data sources on the same device.
- flags can be a combination of:
  - V4L2_BUF_FLAG_MAPPED indicates that a buffer is mapped into the application's address space.
  - V4L2_BUF_FLAG_QUEUED indicates a buffer is currently enqueued for the device driver to be used. The application should not modify the buffer. The buffer is said to be in the driver's incoming queue.
  - V4L2_BUF_FLAG_DONE indicates that a buffer has already been processed by the driver and is waiting to be dequeued by the application.
  - V4L2_BUF_FLAG_KEYFRAME signals that a buffer contains a keyframe - which is interesting when resynchronizing within compressed streams.
  - V4L2_BUF_FLAG_PFRAME marks a predicted frame (compressed streams only).
  - V4L2_BUF_FLAG_BFRAME marks a difference frame (compressed streams only).
  - V4L2_BUF_FLAG_TIMECODE is set whenever the timecode field is valid.
  - V4L2_BUF_FLAG_INPUT is only set if the input field is valid.

As shown in the outline above, the first step is to request buffers from the device driver. One can request multiple buffers - the driver itself determines the lower (!) and upper bound on the number of buffers that can be requested. It's a good idea to support a variable number of buffers in case the driver requires one to use more or fewer of them.
To request buffers one can use the VIDIOC_REQBUFS ioctl that corresponds to
the driver-side handler int (*vidioc_reqbufs) (struct file *file, void *private_data, struct v4l2_requestbuffers *req);

The struct v4l2_requestbuffers structure contains:

- the number of buffers (count). This is an input and output field that might be increased or decreased arbitrarily by the driver. Note that setting count to 0 has the special meaning of releasing all buffers.
- the type (V4L2_BUF_TYPE_VIDEO_CAPTURE or V4L2_BUF_TYPE_VIDEO_OUTPUT) of the buffers.
- the memory specifier. This identifies if the memory area is mapped into userspace. In this case the V4L2_MEMORY_MMAP constant is used. If one would use userptr-style DMA transfers one would set the constant to V4L2_MEMORY_USERPTR.

If the driver does not support mmap (or, when the userptr mode has been requested,
does not support that) it will return EINVAL. This is the only way to determine the
supported streaming data transfer mode.
struct v4l2_requestbuffers rqBuffers;

/*
    Request bufferCount buffers (a single buffer would be
    simple but not seamless, usually use 3+) ...
*/
memset(&rqBuffers, 0, sizeof(rqBuffers));
rqBuffers.count = bufferCount;
rqBuffers.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
rqBuffers.memory = V4L2_MEMORY_MMAP;

if(xioctl(hHandle, VIDIOC_REQBUFS, &rqBuffers) == -1) {
    printf("%s:%u Requesting buffers failed!\n", __FILE__, __LINE__);
    deviceClose(hHandle);
    return 2;
}

/* The driver may have adjusted the number of buffers */
bufferCount = rqBuffers.count;
After the buffers have been requested they have to be mapped into memory. To do
so one has to VIDIOC_QUERYBUF each buffer to determine the parameters that
will be passed to mmap - in the same way as mapping from a memory mapped file.
On entry into QUERYBUF one just has to pass type and index.
/*
    Bookkeeping structure for our mapped buffers (definition
    inferred from its usage in the original sample)
*/
struct imageBuffer {
    void* lpBase;
    size_t sLen;
};

struct imageBuffer* lpBuffers;
{
    lpBuffers = calloc(bufferCount, sizeof(struct imageBuffer));
    if(lpBuffers == NULL) {
        printf("%s:%u Out of memory\n", __FILE__, __LINE__);
        deviceClose(hHandle);
        return 2;
    }

    int iBuf;
    for(iBuf = 0; iBuf < bufferCount; iBuf = iBuf + 1) {
        struct v4l2_buffer vBuffer;
        memset(&vBuffer, 0, sizeof(struct v4l2_buffer));

        /*
            Query a buffer identifying magic cookie from the driver
        */
        vBuffer.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        vBuffer.memory = V4L2_MEMORY_MMAP;
        vBuffer.index = iBuf;

        if(xioctl(hHandle, VIDIOC_QUERYBUF, &vBuffer) == -1) {
            printf("%s:%u Failed to query buffer %d\n", __FILE__, __LINE__, iBuf);
            deviceClose(hHandle);
            return 2;
        }

        /*
            Use the mmap syscall to map the driver's buffer into our
            address space at an arbitrary location.
        */
        lpBuffers[iBuf].lpBase = mmap(NULL, vBuffer.length, PROT_READ|PROT_WRITE, MAP_SHARED, hHandle, vBuffer.m.offset);
        lpBuffers[iBuf].sLen = vBuffer.length;
        if(lpBuffers[iBuf].lpBase == MAP_FAILED) {
            printf("%s:%u Failed to map buffer %d\n", __FILE__, __LINE__, iBuf);
            deviceClose(hHandle);
            return 2;
        }
    }
}
Then one has to enqueue all buffers that one wants to provide to the driver (typically
all of them before starting the processing loop) using the VIDIOC_QBUF
function. One just has to supply type and index when using memory
mapped buffers.
{
    int iBuf;
    for(iBuf = 0; iBuf < bufferCount; iBuf = iBuf + 1) {
        struct v4l2_buffer buf;
        memset(&buf, 0, sizeof(struct v4l2_buffer));

        buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        buf.memory = V4L2_MEMORY_MMAP;
        buf.index = iBuf;

        if(xioctl(hHandle, VIDIOC_QBUF, &buf) == -1) {
            printf("%s:%u Queueing buffer %d failed ...\n", __FILE__, __LINE__, iBuf);
            deviceClose(hHandle);
            return 2;
        }
    }
}
Whenever the device is ready, the processing loop will use VIDIOC_DQBUF to
pop the oldest filled buffer from the output queue. This is a blocking call that
can also be realized using the standard select, epoll or kqueue
asynchronous processing functions in case O_NONBLOCK had been set during
the open. Usually one wants to re-enqueue the buffer after having
finished processing it or having copied the data for further processing.
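The capture loop below uses a kqueue handle kq (FreeBSD) that is assumed to have been created beforehand - this setup is not part of the original snippet. A minimal sketch registering the camera handle for read events could look like this:

/* Create a kqueue and watch the camera handle for readable events */
int kq = kqueue();
struct kevent kchange;

EV_SET(&kchange, hHandle, EVFILT_READ, EV_ADD | EV_ENABLE, 0, 0, NULL);
if(kevent(kq, &kchange, 1, NULL, 0, NULL) == -1) {
    /* Failed to register the event ... */
}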
int iFrames = 0;
while(iFrames < numFrames) {
    struct kevent kev;
    struct v4l2_buffer buf;

    /* Wait for the next event on the kqueue created above */
    int r = kevent(kq, NULL, 0, &kev, 1, NULL);
    if(r < 0) {
        printf("%s:%u kevent failed\n", __FILE__, __LINE__);
        deviceClose(hHandle);
        return 2;
    }
    if(r > 0) {
        /* We got our frame or EOF ... try to dequeue */
        memset(&buf, 0, sizeof(struct v4l2_buffer));
        buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        buf.memory = V4L2_MEMORY_MMAP;

        if(xioctl(hHandle, VIDIOC_DQBUF, &buf) == -1) {
            if(errno == EAGAIN) { continue; }
            printf("%s:%u DQBUF failed\n", __FILE__, __LINE__);
            deviceClose(hHandle);
            return 2;
        }
        printf("%s:%u Dequeued buffer %d\n", __FILE__, __LINE__, buf.index);

        /* ToDo: Process image ... */

        /* Re-enqueue the buffer for the next capture */
        if(xioctl(hHandle, VIDIOC_QBUF, &buf) == -1) {
            printf("%s:%u Queueing buffer %d failed ...\n", __FILE__, __LINE__, buf.index);
            deviceClose(hHandle);
            return 2;
        }
        iFrames = iFrames + 1;
    }
}
The last two important functions start and stop the stream processing. These
are VIDIOC_STREAMON
and VIDIOC_STREAMOFF
. Of course one should
start streaming before running the event processing loop.
{
    /* Enable streaming */
    enum v4l2_buf_type type;

    type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    if(xioctl(hHandle, VIDIOC_STREAMON, &type) == -1) {
        printf("%s:%u Stream on failed\n", __FILE__, __LINE__);
        deviceClose(hHandle);
        return 2;
    }
}
{
    /* Disable streaming */
    enum v4l2_buf_type type;

    type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    if(xioctl(hHandle, VIDIOC_STREAMOFF, &type) == -1) {
        printf("%s:%u Stream off failed\n", __FILE__, __LINE__);
        deviceClose(hHandle);
        return 2;
    }
}
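For a clean shutdown - not shown as a complete sequence in the original code - one should also unmap the buffers and release them again; requesting a buffer count of zero releases the driver-side buffers as mentioned earlier. A sketch reusing the variables from the snippets above:

/* Unmap all previously mapped buffers */
int iBuf;
for(iBuf = 0; iBuf < bufferCount; iBuf = iBuf + 1) {
    munmap(lpBuffers[iBuf].lpBase, lpBuffers[iBuf].sLen);
}
free(lpBuffers);

/* Requesting a count of 0 releases the buffers inside the driver */
rqBuffers.count = 0;
rqBuffers.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
rqBuffers.memory = V4L2_MEMORY_MMAP;
xioctl(hHandle, VIDIOC_REQBUFS, &rqBuffers);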
The usage of the read/write interface will (hopefully) be added in the near future. Note that it's usually not supported by webcams on FreeBSD anyways.
The process of writing a raw image into a JPEG file has been discussed
in a previous blog post. The major remaining
task is to convert the captured image into the format accepted by libjpeg. In
my application I had to convert the YUV422 format into RGB888. In YUV422 there
are always two luminance values as well as a single set of chroma values per
sample - two pixels share the chroma values but have different luminance values.
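A possible conversion routine is sketched below. It uses the common integer approximation of the BT.601 transform - the function name, the clamping helper and the buffer layout (tightly packed YUYV in, tightly packed RGB888 out) are illustrative assumptions, not necessarily the exact code used in my application:

/* Clamp an intermediate value into the valid 8 bit range */
static unsigned char clamp255(int v) {
    return (v < 0) ? 0 : ((v > 255) ? 255 : (unsigned char)v);
}

/* Convert a tightly packed YUYV image into tightly packed RGB888
   using an integer approximation of the BT.601 transform */
static void yuyvToRgb888(
    const unsigned char* lpIn,
    unsigned char* lpOut,
    unsigned int width,
    unsigned int height
) {
    unsigned long int i, o;

    for(i = 0, o = 0; i < (unsigned long int)width * height * 2; i += 4, o += 6) {
        int y0 = lpIn[i+0];
        int u  = lpIn[i+1] - 128;
        int y1 = lpIn[i+2];
        int v  = lpIn[i+3] - 128;

        /* First pixel */
        lpOut[o+0] = clamp255(y0 + (351 * v) / 256);
        lpOut[o+1] = clamp255(y0 - (86 * u + 179 * v) / 256);
        lpOut[o+2] = clamp255(y0 + (443 * u) / 256);

        /* Second pixel shares the chroma values */
        lpOut[o+3] = clamp255(y1 + (351 * v) / 256);
        lpOut[o+4] = clamp255(y1 - (86 * u + 179 * v) / 256);
        lpOut[o+5] = clamp255(y1 + (443 * u) / 256);
    }
}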