[libcamera-devel] Cropping (aka. digital zoom)

Thu Jun 25 05:08:53 CEST 2020

Hi David and Naush,

On Mon, Jun 22, 2020 at 03:18:19PM +0100, David Plowman wrote:
> On Fri, 19 Jun 2020 at 13:27, Naushir Patuck wrote:
> > On Fri, 19 Jun 2020 at 12:36, David Plowman wrote:
> > >
> > > Hi everyone
> > >
> > > Another thing I'd like to discuss is the ability to crop a region of
> > > interest from a camera image. This is another feature that will be
> > > available in many camera systems but which is not currently possible
> > > within libcamera.
> > >
> > > Just to recap, cropping specifies a window in the input image, and
> > > everything outside this window is discarded. However, the used portion
> > > of the image is still scaled to the same final output size, making
> > > this a natural mechanism for implementing digital zoom.
> > >
> > > I would therefore like to propose a new control consisting of four
> > > numbers:
> > >
> > > * The first two are the x and y offsets of the crop window within the
> > >   input image. Having these parameters allows an application to pan
> > >   around within the input image.
> > >
> > > * The next two would be the width and height of the crop window.
> > >
> > > I believe it's sensible for these numbers to be ratios, rather than
> > > (for example) pixels. This makes it much easier for an application to
> > > use. To specify that you want the middle quarter of an image ("2x
> > > digital zoom"), you'd pass the numbers 0.25, 0.25, 0.5 and 0.5. Note
> > > how setting width == height guarantees that you don't mess up your
> > > aspect ratio.

This is an interesting idea. We could also go one step further, and
specify the centre of the crop rectangle instead of the top-left corner.
This would allow zooming in the centre of the image by only changing
width and height, without recomputing x and y.

One issue with this scheme, though, is that rounding wouldn't be
controllable by the application. If you want to pan slowly, one pixel at
a time, isn't there a risk of occasionally skipping a pixel due to
rounding? Using absolute coordinates would ensure full control of the
cropping and scaling, at the cost of more calculation on the application
side. Once again I feel that this may call for a helper to translate a
scaling ratio to a rectangle of absolute coordinates.

Such helpers have been discussed for quite some time; maybe it's time to
give them a try? It could be as simple as creating a libcamera::helpers
namespace and throwing a few functions and classes there. I'm sure we'll
refactor that a few times before the API stabilizes anyway, so there's
no need to spend a month coming up with a perfect design.
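
To make the idea a bit more concrete, here is a rough sketch of what a
helper translating a zoom factor to a rectangle of absolute coordinates
could look like. The Rect struct and the zoomToRect() name are made up
for illustration and are not existing libcamera API. The sketch assumes
the reference size is known to the caller and that the centre is given
as a normalised position.

```cpp
#include <algorithm>
#include <cmath>

/* Hypothetical type, for illustration only (not the libcamera Rectangle). */
struct Rect {
	int x;
	int y;
	int width;
	int height;
};

static long clampTo(long v, long lo, long hi)
{
	return std::max(lo, std::min(v, hi));
}

/*
 * Translate a zoom factor (>= 1.0) and a normalised centre (0.0 to 1.0)
 * into a crop rectangle in pixels of a refWidth x refHeight frame.
 */
Rect zoomToRect(int refWidth, int refHeight, double zoom,
		double cx = 0.5, double cy = 0.5)
{
	zoom = std::max(zoom, 1.0);

	/* Round the size once, independently of the position. */
	long w = std::max(1L, std::lround(refWidth / zoom));
	long h = std::max(1L, std::lround(refHeight / zoom));

	/* Place the window around the centre, then keep it inside the frame. */
	long x = clampTo(std::lround(cx * refWidth - w / 2.0), 0, refWidth - w);
	long y = clampTo(std::lround(cy * refHeight - h / 2.0), 0, refHeight - h);

	return { static_cast<int>(x), static_cast<int>(y),
		 static_cast<int>(w), static_cast<int>(h) };
}
```

An application that wants to pan one pixel at a time would then call the
helper once and increment x or y itself, so no rounding is involved in
the panning.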

> > One question I have, what reference would these ratios be based on?
> > Typically you would choose the dimensions of the input frame to the
> > ISP.  However, we currently do a "hidden" crop to fix aspect ratio
> > when, for example, we want to output 16:9 given a 4:3 input frame.  So
> > would the ratio reference this hidden crop?  If no, then the
> > application must have knowledge of the input frame size (it currently
> > does not), and must crop correctly to adjust aspect ratios.  If yes,
> > then we will be hiding a portion of the input frame that the
> > application will never be able to pan into, but maybe that's not a
> > problem?
> 
> I think it's applied after we've done that "hidden" crop for the aspect
> ratio. Anything else would seem bizarre and devious. You'd wind up
> being able to pan up and down just because you're cropping 16:9
> from a 4:3 sensor mode. It could be amusing, but I don't think the
> average user would be expecting it...!

The sensor size should be reported as a property, see "[PATCH v6]
libcamera: properties: Define pixel array properties" that Jacopo is
working on. I'm tempted to consider that the crop rectangle could be
relative to the active pixel array, with the pipeline handler adding
additional cropping to keep the aspect ratio constant. Other options are
possible, but they also depend on how we end up expressing the crop
rectangle. If we use absolute coordinates, then we need to report the
reference explicitly. If we don't use the active pixel array size as the
reference, another property needs to report the reference.

Regardless of what option we pick, the documentation will need to
specify this very clearly, to avoid any ambiguity.
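
As an illustration of the additional cropping mentioned above, the
following sketch computes the largest centred window of the active pixel
array that matches the output aspect ratio. It is only one possible
interpretation. The Rect struct and the aspectCrop() name are
hypothetical, and a real pipeline handler would likely also have to
honour alignment constraints.

```cpp
#include <cstdint>

/* Hypothetical type, for illustration only (not the libcamera Rectangle). */
struct Rect {
	int x;
	int y;
	int width;
	int height;
};

/*
 * Largest centred window of an arrayW x arrayH pixel array that has the
 * aspect ratio outW:outH. Returns the full array if the ratio is invalid.
 */
Rect aspectCrop(unsigned int arrayW, unsigned int arrayH,
		unsigned int outW, unsigned int outH)
{
	unsigned int w = arrayW;
	unsigned int h = arrayH;

	if (outW && outH) {
		/* Compare arrayW/arrayH with outW/outH by cross-multiplying. */
		std::uint64_t lhs = static_cast<std::uint64_t>(arrayW) * outH;
		std::uint64_t rhs = static_cast<std::uint64_t>(arrayH) * outW;

		if (lhs > rhs)
			/* Array is wider than the output: trim columns. */
			w = static_cast<unsigned int>(rhs / outH);
		else if (lhs < rhs)
			/* Array is taller than the output: trim rows. */
			h = static_cast<unsigned int>(lhs / outW);
	}

	return { static_cast<int>((arrayW - w) / 2),
		 static_cast<int>((arrayH - h) / 2),
		 static_cast<int>(w), static_cast<int>(h) };
}
```

With a 4000x3000 array and a 16:9 output this gives a 4000x2250 window
at (0, 375), which is the region an application could never pan out of
if the crop control were defined relative to it.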

> > > Questions
> > >
> > > 1. How to represent the numbers?
> > >
> > > I see an existing Rectangle class that gets used within controls, so
> > > maybe that makes sense. I'd perhaps go with fixed point values such
> > > that 65536 is equivalent to 1.0? (Though that doesn't leave huge scope
> > > for sub-pixel cropping, where platforms support this...)
> >
> > Would it be better to have a floating point equivalent of the
> > Rectangle class so we can use floats here?
> 
> Well, true, on the other hand I sense that floats are not universally
> loved, and the Rectangle is "just there"....!

I would be fine adding a RectangleF class for this (assuming we agree
that floating point values would be the best choice of course). We would
then also need a SizeF class.

> > > 2. Are there devices that can't pan within the input image?
> > >
> > > How would they signal this? I guess reporting min and max
> > > rectangles with identical x and y values would cover this.

I'm not aware of any such device, but it would be a good idea to be
prepared to support them, just in case. The x and y values would simply
be ignored in that case. As for reporting the pan capability, we have
the option to use min and max indeed, or we could add a specific
property to report this.

If we decide to use min and max, what should they be set to for devices
that can't pan? If x and y represent the centre of the rectangle, then
we would set both the min and max of x and y to 0.5, so it's fairly easy.

> > > 3. Minimum and maximum values?
> > >
> > > A platform could also report "reasonable" minimum width and height
> > > values that indicate roughly what is supported. Values of 1.0 here
> > > (65536) would mean that there is no zoom capability.
> > >
> > > Valid maximum widths and heights of course depend on x and y (or
> > > vice versa). Perhaps we just report 1.0 here.

I think the maximum zoom level is useful to report one way or another.
Do you foresee cases where it could be different in the horizontal and
vertical directions ?

As above, we can use min and max to report this information, or add a
max zoom ratio property (or any other properties).

> > > 4. How to handle awkward cases or requests?
> > >
> > > I think we have to allow implementations silently to coerce the
> > > requested values into the "closest" thing that works. This might
> > > involve changing offsets (for example) to accommodate the requested
> > > width/height, or even adjusting the crop slightly to satisfy
> > > things like pixel alignment requirements.
> > >
> > > Perhaps the metadata returned in a request can indicate a value closer
> > > to what was actually used.

That's what I was going to propose :-)
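
A minimal sketch of such a coercion, assuming absolute pixel coordinates
and a platform that only constrains the minimum size and the alignment.
The Rect and CropLimits structs and the coerceCrop() name are
hypothetical, and the sketch assumes that the reference and minimum
sizes are already multiples of the alignment.

```cpp
#include <algorithm>

/* Hypothetical types, for illustration only (not libcamera API). */
struct Rect {
	int x;
	int y;
	int width;
	int height;
};

struct CropLimits {
	int refWidth;	/* size of the reference frame */
	int refHeight;
	int minWidth;	/* smallest supported crop */
	int minHeight;
	int align;	/* required alignment in pixels, >= 1 */
};

static int alignDown(int v, int a)
{
	return v - v % a;
}

static int clampTo(int v, int lo, int hi)
{
	return std::max(lo, std::min(v, hi));
}

/* Coerce a requested crop into the closest rectangle the platform accepts. */
Rect coerceCrop(const Rect &req, const CropLimits &lim)
{
	Rect r;

	/* Fix the size first, then move the offsets to accommodate it. */
	r.width = alignDown(clampTo(req.width, lim.minWidth, lim.refWidth),
			    lim.align);
	r.height = alignDown(clampTo(req.height, lim.minHeight, lim.refHeight),
			     lim.align);
	r.x = alignDown(clampTo(req.x, 0, lim.refWidth - r.width), lim.align);
	r.y = alignDown(clampTo(req.y, 0, lim.refHeight - r.height), lim.align);

	return r;
}
```

The rectangle returned by coerceCrop() is what would be reported back in
the request metadata, so the application can see what was actually used.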

> > > Finally, I'm sure there are other questions and things I've
> > > overlooked. What do people think?

Nice proposal, thank you for bringing it up.

-- 
Regards,

Laurent Pinchart