[libcamera-devel] Cropping (aka. digital zoom)

David Plowman david.plowman at raspberrypi.com
Fri Jun 26 17:04:22 CEST 2020


Hi Laurent, Jacopo, Naush, everyone...

Thanks for all the discussion and comments! It's all become
rather spread out in that original email, so I thought I'd try and
gather everything up again, taking account of the comments and some
revised opinions. I'll also try and give some concrete numerical
examples, I think that helps!

I'm sorry this is going to be rather long again...

First of all, we're talking about the crop being applied to the input
image; the image is subsequently scaled so that the output resolution
is always the same. Most likely we have the crop being
applied within the ISP. It's not inconceivable that someone might try
and dynamically alter the sensor read-out but there are some
challenges here, and I don't think it alters the basic discussion that
much.

Let's look at two schemes, A and B, for what happens. I'm hoping this
will clarify what Jacopo was asking. (If not, please ask again!)

A: we define the crop as a rectangle from the full sensor output.

Example: imx477 in 2x2 binning mode. Now the sensor may well have an
explicit 1080p mode, but to illustrate the differences let's suppose
it only has a 2028x1520 mode.

Let's suppose we want 1080p, the default crop chosen might be:
(54,220) 1920x1080 (meaning a 1920x1080 rectangle of pixels with its
first pixel taken from (54,220) in the 2028x1520 sensor output)

But the application gets to change all that so it can actually pan
about in the sensor output (mostly up and down) even without any zoom!
e.g. setting the crop to (0,0) 1920x1080 reveals pixels not in the
"default" image. Or it could "zoom out" just very slightly, by
setting the crop to (0,190) 2028x1140.

B: we decide the unzoomed rectangle, and you can only zoom within
that.

Example: imx477 2x2 binning as before.

The system decides upon the same default crop as A. But here the
application crop is restricted to that default crop. You can zoom in
and out anywhere within that initial 1920x1080 rectangle, but not
outside it.

Differences between A and B

For A to work you have to be able to get the sensor dimensions
(2028x1520), and you'd probably also want to know the default system
crop, (54,220) 1920x1080 (or whatever).

These values are (I believe?) only available after configure(). Where
platforms try to crop in the sensor, or have an inline ISP, trying to
control the crop for the first frame could be tricky.

I think you'd also need to be setting the crop in pixels.

Scheme B is simpler, and avoids the behaviour where you can pan
about and discover "new" parts of the image.

It could operate either in pixels, or in ratios. If we wish to use
pixels all we'd need is the size of the rectangle that is cropped out
(1920x1080 in our example, or 2028x1140 if the system had gone for a
maximum FOV crop). Using ratios, you wouldn't need anything.

Pixels, as Laurent pointed out, do give you exact control if that's
what you want. In truth I don't think pixel-level control is what
applications would really use, but you have the option here.

So where have I ended up after all this?

Well, I'm preferring scheme B. Scheme A is certainly more flexible,
but to me it seems a bit unexpected and slightly devious. I think
application writers would scratch their heads over it, and probably
implement something that behaves like scheme B anyway! There
was some talk of helper libraries, and perhaps one of those could
make scheme A behave like scheme B, though at that point it's
starting to look rather fussy, and many applications could end up
duplicating lots of rather similar crop/zoom code...

However, I think I've moved to the view that pixels are a better unit
than ratios. They give you that precise control if you want it, but
they're still easy to use. They also make it very obvious how the
request metadata can return exactly the values it used. But it must be
easy to get those "initial crop dimensions" from the system and I'm
not sure what the mechanism for that is - will these "sensor
properties" do that, bearing in mind it depends on the output image
aspect ratio too?

We also need to be sure we address the question of how to change
the crop even for the very first frame (obviously there's some
overlap with the question of setting exposure/gain before the
camera is started).

Finally, and perhaps least importantly, I'm somewhat against the idea
of representing these crops using centre/size rather than (top left)
offset/size. I'd prefer to represent rectangles the same way
everywhere, and not have some places or controls where they're
different, but that's just a personal opinion. So I'd vote for
offset/size everywhere and go with that existing Rectangle class.

I hope that's all understandable? Thanks for all the help and
head-scratching over this!

Best regards
David

On Thu, 25 Jun 2020 at 10:19, Jacopo Mondi <jacopo at jmondi.org> wrote:
>
> Hello,
>   just a few questions that I have after reading the whole thread
>
> On Fri, Jun 19, 2020 at 12:36:45PM +0100, David Plowman wrote:
> > Hi everyone
> >
> > Another thing I'd like to discuss is the ability to crop a region of
> > interest from a camera image. This is another feature that will be
> > available in many camera systems but which is not currently possible
> > within libcamera.
> >
> > Just to recap, cropping specifies a window in the input image, and
> > everything outside this window is discarded. However, the used portion
> > of the image is still scaled to the same final output size, making
> > this a natural mechanism for implementing digital zoom.
> >
> > I would therefore like to propose a new control consisting of four
> > numbers:
> >
> > * The first two are the x and y offsets of the crop window within the
> >   input image. Having these parameters allows an application to pan
> >   around within the input image.
> >
> > * The next two would be the width and height of the crop window.
>
> How does this play with the current stream dimensions ?
>
> I assume that the stream desired size is the final up-scaled
> image size (or down-scaled, why not), while the proposed control
> specifies the image sensor pixel array portion to process, and
> up(down)-scale through the ISP.
>
> If we consider the control to select the pixel array portion to
> process, I agree it should apply to the pixel array active matrix
> size, being it expressed in pixels, or as a fraction of the active
> area size.
>
> If we consider this a sensor-related control, Naush' question on how to
> take "hidden" crops to maintain the aspect ratio would be answered by
> respecting the sequence of operations that actually takes place, and it
> will be the responsibility of the ISP to further crop the image received
> from the sensor before up(down)-scaling it to the user requested size.
>
> Making out of this a Controls::Sensor::digitalCrop control (we don't
> have control and properties namespaces, yet, but you got the idea)
> what would happen is that the application will have to know the active
> pixel array size dimension (we have a property for that), select the
> portion to extract through the control, and request the final
> (up-scaled) image size through the StreamConfiguration.size parameter
> as it would usually do.
>
> In this case, the Sensor::digitalCrop control would limit the size of
> the available raw frames (ie frames captured before ISP processing at
> the CSI-2 receiver output), but I think that's expected ?
>
> More references on how this is handled on Android:
> https://jmondi.org/android_metadata_tags/docs.html#controls_android.scaler.cropRegion
>
> Thanks
>   j
>
>
> >
> > I believe it's sensible for these numbers to be ratios, rather than
> > (for example) pixels. This makes it much easier for an application to
> > use. To specify that you want the middle quarter of an image ("2x
> > digital zoom"), you'd pass the numbers 0.25, 0.25, 0.5 and 0.5. Note
> > how setting width == height guarantees that you don't mess up your
> > aspect ratio.
> >
> > Questions
> >
> > 1. How to represent the numbers?
> >
> > I see an existing Rectangle class that gets used within controls, so
> > maybe that makes sense. I'd perhaps go with fixed point values such
> > that 65536 is equivalent to 1.0? (Though that doesn't leave huge scope
> > for sub-pixel cropping, where platforms support this...)
> >
> > 2. Are there devices that can't pan within the input image?
> >
> > How would they signal this? I guess reporting min and max
> > rectangles with identical x and y values would cover this.
> >
> > 3. Minimum and maximum values?
> >
> > A platform could also report "reasonable" minimum width and height
> > values that indicate roughly what is supported. Values of 1.0 here
> > (65536) would mean that there is no zoom capability.
> >
> > Valid maximum widths and heights of course depend on x and y (or
> > vice versa). Perhaps we just report 1.0 here.
> >
> > 4. How to handle awkward cases or requests?
> >
> > I think we have to allow implementations silently to coerce the
> > requested values into the "closest" thing that works. This might
> > involve changing offsets (for example) to accommodate the requested
> > width/height, or even adjusting the crop slightly to satisfy
> > things like pixel alignment requirements.
> >
> > Perhaps the metadata returned in a request can indicate a value closer
> > to what was actually used.
> >
> > Finally, I'm sure there are other questions and things I've
> > overlooked. What do people think?
> >
> > Best regards
> > David
> > _______________________________________________
> > libcamera-devel mailing list
> > libcamera-devel at lists.libcamera.org
> > https://lists.libcamera.org/listinfo/libcamera-devel

