[libcamera-devel] Camera mode selection, "hints" and field of view

Tue Apr 20 11:41:46 CEST 2021

Hi everyone

I wanted to return to the subject of camera mode selection and mode
hints, and this time, make some actual proposals. This is going to get
rather long, so I apologise in advance, but I think the process of
working your way through the various issues is essential.

To help the discussion, here are some use cases that I have in mind:

#1 Choosing a mode with the largest bit-depth.

#2 Choosing a mode with a fast framerate.

#3 Imagine that a sensor has modes for full resolution, 1080p and
 720p, and that each mode has a progressively smaller field of view
 and a faster framerate. How would you get 720p output but with the
 same FoV as the 1080p output?

#4 Choosing a mode with a faster readout (and lower resolution) but
 full FoV. (For example, for switching from preview to capture.)

A Note on Camera Modes

I talk about "camera modes" a lot. Here, I really just mean the
resolution, bit-depth and field of view (which depends on
binning/scaling and cropping), but there doesn't have to be a
predetermined list of them (even if, in reality, it may work like
that).

How it works currently

The camera mode is chosen based on the output resolution requested,
usually that of the first or perhaps the largest of the streams
specified. Mostly, pipeline handlers choose the mode closest to this
resolution with a minimum of (or no) upscaling, though the behaviour
can vary here.
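
To make the current behaviour concrete, here is a rough sketch in
Python (not libcamera code). The Mode type, the pick_mode() function
and the mode list are all made up for illustration, and real pipeline
handlers differ in the details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    """Hypothetical description of a sensor mode (output size only)."""
    width: int
    height: int


def pick_mode(modes, out_w, out_h):
    """Pick the smallest mode that needs no upscaling for the output.

    If every mode is smaller than the requested output, fall back to
    the largest mode so that upscaling is kept to a minimum.
    """
    big_enough = [m for m in modes if m.width >= out_w and m.height >= out_h]
    if big_enough:
        return min(big_enough, key=lambda m: m.width * m.height)
    return max(modes, key=lambda m: m.width * m.height)


# Made-up mode list: full resolution, 1080p and 720p.
MODES = [Mode(4056, 3040), Mode(1920, 1080), Mode(1280, 720)]
```

With this list, a 1280x720 request selects the 720p mode, and a
1600x900 request selects the 1080p mode and scales down.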

Bit-Depth

Bit-depth is easily determined from the format code. A "hint" to
choose formats with the highest possible bit-depth is easily encoded
as a single bit.

I think this covers my use-case #1.

Framerate

A "hint" to choose formats with the highest framerate is also easily
encoded with a single bit. These hints are probably best stored in the
CameraConfiguration rather than with any of the streams, as the
meaning of these hints can be mutually exclusive (a high bit-depth
will tend to go with a lower framerate). And they do, fundamentally,
apply to the sensor as a whole.

Again, this simple "hint" mechanism would seem to cover use-case #2.

It's not clear to me how to determine which modes have a high
framerate. Can V4L2 tell us this other than by setting the format
explicitly and querying the vblanking?
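
For reference, the arithmetic behind "querying the vblanking" looks
roughly like this. It is only a sketch: the function name is made up,
and it assumes the driver exposes the V4L2_CID_PIXEL_RATE,
V4L2_CID_HBLANK and V4L2_CID_VBLANK controls and that the minimum
vblank has been read after setting the format. It does not avoid the
need to set the format first.

```python
def max_framerate(pixel_rate, width, height, hblank, min_vblank):
    """Estimate the maximum framerate of a mode from its blanking.

    pixel_rate is in pixels per second; the other values are in
    pixels (width, hblank) and lines (height, min_vblank).
    """
    line_length = width + hblank        # pixels per line, incl. blanking
    frame_length = height + min_vblank  # lines per frame, incl. blanking
    return pixel_rate / (line_length * frame_length)
```

For example, a pixel rate of 200000000 with a 1920x1080 mode, an
hblank of 80 and a minimum vblank of 170 gives 80 frames per second.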

Other Hints

We've sometimes discussed other "hints", such as "I will/won't give
you raw stream buffers". It feels like these could reasonably be
incorporated into the same (for the sake of argument) bitfield.
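
As a sketch of what such a bitfield might look like (in Python for
brevity; the names ModeHint, Mode and apply_hints() are invented here
and the mode list is made up), the hints could simply narrow down the
candidate modes before the usual resolution-based choice is made.
Because the hints can conflict, some precedence is needed; in this
sketch bit-depth is applied first, which is an arbitrary choice.

```python
from dataclasses import dataclass
from enum import Flag


class ModeHint(Flag):
    """Hypothetical hint bits stored in the camera configuration."""
    NONE = 0
    HIGH_BIT_DEPTH = 1
    HIGH_FRAMERATE = 2
    NO_RAW_BUFFERS = 4  # "I won't give you raw stream buffers"


@dataclass(frozen=True)
class Mode:
    width: int
    height: int
    bit_depth: int
    max_fps: float


def apply_hints(modes, hints):
    """Return the modes that remain candidates under the given hints."""
    candidates = list(modes)
    if ModeHint.HIGH_BIT_DEPTH in hints:
        best = max(m.bit_depth for m in candidates)
        candidates = [m for m in candidates if m.bit_depth == best]
    if ModeHint.HIGH_FRAMERATE in hints:
        best = max(m.max_fps for m in candidates)
        candidates = [m for m in candidates if m.max_fps == best]
    return candidates


# Made-up modes: higher bit-depth tends to go with lower framerate.
MODES = [
    Mode(4056, 3040, 12, 10.0),
    Mode(2028, 1520, 12, 40.0),
    Mode(1332, 990, 10, 120.0),
]
```

Setting both bits here leaves only the 2028x1520 mode: the fastest of
the 12-bit modes. With no hints set, all modes remain candidates.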

Field of View

This is the trickiest one. To some degree, raw streams give us some of
this already. For example, if we request an output of 720p but set the
raw stream size to 1080p, then this should cover use-case #3. That is,
we'll get the raw stream that would be used for 1080p but it will be
resized down to 720p.

Actually there's still a little theoretical wrinkle here. What if the
720p mode already has the same FoV as the 1080p mode? By demanding the
1080p raw stream we'd still get the 1080p mode, even though we didn't
need it.

Also, use-case #4 still evades us. To ensure a full FoV we'd have to
request the full resolution raw frames, but that then comes with
(probably) low framerates. Perhaps you could supply the "fast
framerate hint", and interpret this as meaning the raw stream
resolution is actually a FoV, not an actual pixel resolution, but
mixing these concepts together makes me nervous.

Maybe what we need is a FoV in the CameraConfiguration? The
requirement is then that any mode selected must have at least this
FoV. Perhaps you would give it in native pixel resolution.

This too, seems problematic. It's OK for use-case #4 - you'd fill in
the full resolution and not specify a raw stream resolution. But for
use-case #3, how would you know what FoV to ask for? It could be quite
awkward for applications to interrogate the sensor, discover the FoV
for the 1080p mode, and ask for that. Surely application code would
just want an easy way to say "give me a consistent FoV"?

All this has led me to the view that the CameraConfiguration should
somehow specify the FoV, but in a more convenient way. I'd like to
propose specifying a resolution (just a width and height), which we're
going to call the "FoV resolution". When it is specified, the "FoV
resolution" means "Give me at least the FoV I would have got if I had
asked for this output resolution". If the camera actually delivers more
than this, I'd expect the digital crop to be set up to compensate. Our
use-cases now work as follows:

* For use-case #3, I set the "FoV resolution" to 1080p. This no longer
  forces the raw stream up to this size, but does ensure at least this
  FoV.

* For use-case #4, I simply put in the full sensor resolution. The
  pipeline handler will still be able to choose (for example) binned
  modes.

Pipeline handlers would be encouraged to support the notion of "FoV
resolution", though some may not. Note that it remains easy for
applications to use it or to ignore it. If (as now) an application
does not set any value, it would implicitly be taken to be the same as
the output resolution, yielding the same camera mode as before.
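
To illustrate the "FoV resolution" idea, here is a rough Python sketch
(not libcamera code). Everything in it is an assumption made for the
example: the Mode type, the function names, the made-up mode list, and
the simplification that each mode's FoV is a centred crop described by
its size in native sensor pixels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    width: int   # output size of the mode
    height: int
    crop_w: int  # sensor area read out, in native pixels (the FoV)
    crop_h: int


def closest_mode(modes, w, h):
    """Smallest mode needing no upscaling, else the largest mode."""
    big_enough = [m for m in modes if m.width >= w and m.height >= h]
    if big_enough:
        return min(big_enough, key=lambda m: m.width * m.height)
    return max(modes, key=lambda m: m.width * m.height)


def select_mode(modes, output, fov_resolution=None):
    """Pick a mode for `output` with at least the FoV that a request
    for `fov_resolution` would have produced.

    Returns the mode and the FoV (native pixels) that the digital crop
    should be set to if the mode delivers more than was asked for.
    """
    if fov_resolution is None:
        fov_resolution = output  # unset: behave exactly as today
    reference = closest_mode(modes, *fov_resolution)
    wide_enough = [m for m in modes
                   if m.crop_w >= reference.crop_w
                   and m.crop_h >= reference.crop_h]
    return closest_mode(wide_enough, *output), (reference.crop_w,
                                                reference.crop_h)


# Made-up sensor: full resolution, 2x2 binned, 1080p and 720p modes.
MODES = [
    Mode(4000, 3000, 4000, 3000),
    Mode(2000, 1500, 4000, 3000),
    Mode(1920, 1080, 3840, 2160),
    Mode(1280, 720, 2560, 1440),
]
```

With this list, use-case #3 (720p output, FoV resolution 1080p) picks
the 1080p mode, and use-case #4 (720p output, FoV resolution
4000x3000) picks the binned 2000x1500 mode rather than the full
resolution one. Leaving the FoV resolution unset picks the 720p mode,
as before.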

There are some further details and corner cases to think about, but
given that this is already so long, I'd best leave it there for now
and gauge the reaction. As always, your thoughts on this subject
please! (And well done on reaching the end!)

Thanks and best regards

David