[libcamera-devel] [PATCH v3 0/6] Digital zoom

David Plowman david.plowman at raspberrypi.com
Tue Oct 13 10:36:50 CEST 2020


Hi Laurent

On Tue, 13 Oct 2020 at 02:45, Laurent Pinchart
<laurent.pinchart at ideasonboard.com> wrote:
>
> Hi David,
>
> On Mon, Oct 12, 2020 at 07:49:31PM +0100, David Plowman wrote:
> > Hi Laurent
> >
> > Thanks for the comments. Some interesting points.
>
> Sorry about chiming in so late and challenging some of the base
> assumptions though. I'll now try not to introduce any further delay.
>
> > On Sun, 11 Oct 2020 at 22:47, Laurent Pinchart wrote:
> >
> > > Hi David,
> > >
> > > Thank you for the patches. And more than sorry for reviewing the series
> > > so late.
> > >
> > > On Tue, Sep 29, 2020 at 05:39:54PM +0100, David Plowman wrote:
> > > > Hi everyone
> > > >
> > > > Despite not originally intending to, I've actually made a version 3 of
> > > > the digital zoom patches, just to take care of a few things that might
> > > > be a bit annoying otherwise.
> > > >
> > > > 1. I've improved the description of the IspCrop control as was
> > > > suggested.
> > > >
> > > > 2. I've improved the description of the zoom option in cam (if we
> > > > decide to use this patch!), also as was proposed.
> > > >
> > > > 3. There was actually a problem with the "{}" syntax to denote zero
> > > > Rectangles in include/libcamera/ipa/raspberrypi.h. They were failing a
> > > > range type test in ControlInfoMap::generateIdmap() and so the control
> > > > wasn't working. I've replaced "{}" by "Rectangle{}" which seems OK.
> > > >
> > > > 4. There's been a bit of churn in the RPi pipeline handler lately so
> > > > rebasing gave some conflicts. I've fixed those up.
> > > >
> > > > Otherwise everything else remains the same.
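[To illustrate point 3 above: a toy sketch, not libcamera's real classes, of why a bare "{}" can fail a type check where an explicit "Rectangle{}" passes. A braced default goes through the type-erased value's default constructor, which records "no type", while Rectangle{} selects the typed overload.]

```cpp
#include <cassert>

struct Rectangle { int x = 0, y = 0, width = 0, height = 0; };

enum class Type { None, Rectangle };

// Stand-in for a type-erased control value.
struct Value {
	Type type = Type::None;			/* what a bare "{}" ends up as */
	Value() = default;
	Value(const Rectangle &) : type(Type::Rectangle) {}
};

// Stand-in for a min/max range that later gets type-checked.
struct Info {
	Value min, max;
	Info(const Value &mn, const Value &mx) : min(mn), max(mx) {}
	bool typeCheck(Type expected) const
	{
		return min.type == expected && max.type == expected;
	}
};
```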
> > >
> > > I've gone through the different versions of this series, and the first
> > > thing I want to say is that it has improved very nicely over time. Nice
> > > job!
> > >
> > > Configuring zoom through an absolute crop rectangle is, I believe, the way
> > > to go, so the overall approach is good in my opinion. There are however
> > > a few points I'd like to discuss, related to the SensorOutputSize
> > > property and IspCrop control.
> > >
> > > Reading through the discussion, I became increasingly aware of a topic
> > > that was present in the background without ever being named, and that's
> > > the camera pipeline model. As mentioned yesterday in a few other replies
> > > to selected patches in this series and its previous versions, the job of
> > > libcamera is to expose to applications a camera model that abstracts
> > > hardware differences. This is not an easy job, as we have to strike the
> > > right balance between a higher-level inflexible but simple to use model
> > > that can't support many of the less common use cases, and a lower-level
> > > powerful but complex to use model that exposes a very large range of
> > > hardware features. We have implicitly defined the skeleton of such a
> > > model through the API of the Camera and PipelineHandler classes, but
> > > have never made it explicit.
> > >
> > > This needs to change. I see no reason to block the digital zoom feature
> > > until we finish documenting the pipeline model, but I would like to
> > > design the feature while thinking about the bigger picture. Here are the
> > > assumptions I think the pipeline model should make for devices that
> > > support digital zoom.
> > >
> > > - The camera pipeline starts with a camera sensor, which may implement
> > >   binning, skipping and/or cropping.
> > >
> > > - The subsequent blocks in the pipeline may perform additional cropping,
> > >   either at the direct command of the pipeline handler (e.g. cropping at
> > >   the input of the scaler), or automatically to support image processing
> > >   steps (e.g. colour interpolation often drops a few lines and columns
> > >   on all edges of the image).
> > >
> > > - The pipeline ends with a scaler that can implement digital zoom
> > >   through a combination of cropping followed by scaling.
> > >
> > > The exact order of the processing steps at the hardware level doesn't
> > > need to match the pipeline model. For instance, cropping at the input
> > > and output of the scaler are interchangeable (not taking into account
> > > sub-pixel differences). It doesn't matter if the ISP scales before
> > > cropping the output, the hardware scaler parameters and output crop
> > > rectangle can be computed from an abstract input crop rectangle and
> > > output size. This is crucial to consider for the definition of the
> > > pipeline model: we need to design it in a way that ensures all features
> > > can be mapped to how they are implemented in the different types of
> > > hardware we want to support, but we're otherwise free to not map
> > > controls and properties 1:1 with the hardware parameters. When multiple
> > > options are possible, we should be guided by API design criteria such as
> > > coherency, simplicity and flexibility.
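[A rough sketch of the interchangeability argument above, with made-up types rather than libcamera's actual API: an abstract "crop at the scaler input, then scale to the output size" request can be translated into an equivalent output-side crop for hardware that scales the full frame first and crops afterwards.]

```cpp
#include <cassert>

struct Rect { int x, y, width, height; };
struct Size { int width, height; };

// Hardware that crops before scaling can use the abstract request as-is.
// Hardware that scales first can derive the equivalent output-side crop:
// the same scale factors applied to the full frame place the crop at the
// scaled offset (ignoring sub-pixel rounding differences).
Rect outputSideCrop(const Rect &inputCrop, const Size &output)
{
	/* Scale factors the abstract model applies to the cropped region. */
	double sx = static_cast<double>(output.width) / inputCrop.width;
	double sy = static_cast<double>(output.height) / inputCrop.height;

	return Rect{
		static_cast<int>(inputCrop.x * sx),
		static_cast<int>(inputCrop.y * sy),
		output.width,
		output.height,
	};
}
```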
> > >
> > > Coming back to digital zoom, this series exposes a new SensorOutputSize
> > > property and a new IspCrop control. The SensorOutputSize property is
> > > introduced to support the IspCrop control, as the base rectangle in
> > > which the scaler can crop. There are two issues here that bother me:
> > >
> > > - The property, despite being named SensorOutputSize, potentially takes
> > >   into account the cropping added by the CSI-2 receiver and by the ISP
> > >   for operations that consume lines and columns on the edges of the
> > >   image. The naming can create some confusion, which can possibly be
> > >   addressed by a combination of documentation (you're covering that
> > >   already) and possibly a more explicit name for the property. However,
> > >   as the property bundles crop operations performed in different stages
> > >   of the pipeline, I'm worried that it will turn out to be a bit
> > >   ill-defined regardless of how well we document it, with slightly
> > >   different behaviours in different implementations.
> > >
> > > - Ignoring the additional cropping performed in the CSI-2 receiver and
> > >   ISP (if any), the property exposes the sensor binning, skipping and
> > >   cropping. It bundles those three operations together, and thus doesn't
> > >   convey how the cropping affects the field of view as a given output
> > >   size can be achieved with different combinations of skipping/binning
> > >   and cropping.
> > >
> > > For these reasons, I'm concerned that the SensorOutputSize property is an
> > > ad-hoc solution to provide a reference for the IspCrop control, and
> > > will not fit clearly in a pipeline model that should be based on simple,
> > > base building blocks. I would thus like to propose an alternative
> > > option.
> > >
> > > Instead of expressing the IspCrop controls (which I think we should
> > > rename to ScalerCrop) relatively to the SensorOutputSize property, could
> > > we express it relatively to the existing PixelArrayActiveAreas property
> > > ? This would have the advantage, in my opinion, of abstracting binning
> > > and skipping from applications. The pipeline handlers would need to
> > > perform a bit more work to compute the crop rectangle actually applied
> > > to the scaler, in order to take sensor binning/skipping and all
> > > additional cropping in the pipeline into account. The upside is that the
> > > ScalerCrop will directly define how the field of view is affected. It
> > > would also simplify the API, as no intermediate property between
> > > PixelArrayActiveAreas and ScalerCrop would need to be defined, and the
> > > ScalerCrop coordinates wouldn't depend on the active camera
> > > configuration. I think this would be easier to clearly document as part
> > > of a camera pipeline model.
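[The "bit more work" for the pipeline handler under this proposal might look roughly like the following sketch (hypothetical names and fixed factors, not actual libcamera code): mapping a ScalerCrop given in unscaled active-array coordinates down to the rectangle actually programmed into the scaler, accounting for the sensor's analog crop, binning/skipping, and any fixed cropping added upstream by the CSI-2 receiver or ISP.]

```cpp
#include <cassert>

struct Rect { int x, y, width, height; };

// scalerCrop: requested crop in active-array coordinates.
// sensorCrop: analog crop applied by the sensor, active-array coordinates.
// binX/binY: binning or skipping factors in each direction.
// ispCropX/ispCropY: columns/lines consumed upstream of the scaler.
Rect mapToScaler(const Rect &scalerCrop, const Rect &sensorCrop,
		 int binX, int binY, int ispCropX, int ispCropY)
{
	return Rect{
		(scalerCrop.x - sensorCrop.x) / binX - ispCropX,
		(scalerCrop.y - sensorCrop.y) / binY - ispCropY,
		scalerCrop.width / binX,
		scalerCrop.height / binY,
	};
}
```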
> >
> > Renaming IspCrop to ScalerCrop sounds fine to me. It has had so many
> > different names!
> >
> > I'm less sure about trying to derive the SensorOutputSize (or
> > ScalerInputSize or whatever else we want to call it!) from the
> > PixelArrayActiveAreas property. Let me try and take a step back.
> >
> > So first, I think knowing the PixelArrayActiveArea isn't enough. How would
> > you know if the pipeline handler was doing some extra cropping that wasn't
> > "strictly necessary", perhaps to reduce memory traffic, or for a faster
> > framerate? How would the application know not to try and zoom there? It
> > seems to me that this really is a decision for the pipeline handler based
> > on the sensor driver, it isn't available from the properties of the sensor
> > itself.
> >
> > Actually I'd quite like to leave the discussion there for the moment and
> > see if that much is controversial or not. Of course we then have to move on
> > but maybe let's see what we think about that first...
> >
> > Thoughts??
>
> I fully agree with you that the pipeline handler has the last word when
> it comes to the (combined) internal cropping. It may not have any choice
> (if the CFA interpolation eats two or four lines and columns on each
> side of the image due to the hardware implementation, there's nothing
> that can be done against it), or may just want to decide on policies for
> whatever reason it sees fit (while trying not to over-police though, to
> leave options to applications).
>
> My proposal doesn't deny this from the pipeline handler, but tries to
> express the crop rectangle in a way that maps directly to the field of
> view. To do so, binning/skipping would be hidden from applications by
> using a reference for the ScalerCrop property that has no
> binning/skipping applied.

So we certainly agree on the necessity for "something", that's good,
and then it's a case of figuring out what "something" is. I have
several conflicting thoughts here (I frequently disagree with myself):

* Actually, as a basic principle, I think applications should be able
to find out "the truth" if they want it. This would mean the sensor
crop, binning/scaling, and any subsequent cropping.

* I even wonder sometimes whether there shouldn't really be a class
that describes how the camera mode has been derived from the sensor,
with all this information made explicit. But I think that leads us
even deeper into the woods, and I suspect we don't really want to go
there! (Also, I'm not sure if the V4L2 API would even let you discover
all of it, which might be a problem...)

* But I also think digital zoom needs to be easy to use for
applications that don't care about all these details ("the truth", as
I called it!). Actually I feel the overwhelming majority of
applications will fall into this category.

>
> The pipeline handler would restrict the requested ScalerCrop control
> based on the internal cropping that is always applied (combining sensor,
> CSI-2 receiver and ISP), and report the result through the request
> metadata. The application would thus know the exact crop rectangle that
> has been used, in unscaled (binning/skipping) sensor coordinates.

Let me try and paraphrase a bit and you can tell me if I've got it wrong.

You're proposing a property (details in a moment) that can be used to
choose zoom regions for the ScalerCrop control.

The units for this property are the native sensor pixels, ignoring any
binning/scaling that might be going on.

The property would give you a Rectangle that expresses exactly the
area in which you are permitted to zoom? The offset of the rectangle
tells you where in the sensor array this rectangle actually lies.

The term "sensor array" there needs some clarification. I think it
probably means the "PixelArraySize" less any invalid or optical black
pixels. (Do we have a way to get hold of that easily?)

The ScalerCrop requested, and reported back, gives the offset and size
of the zoom rectangle relative to the rectangle given by the property
under discussion here (I think that's the most convenient choice). So
if it reports back offsets of (0,0) then you know you're in the top
left hand corner of what you could possibly ever see in this camera
mode (but not necessarily of the sensor as a whole).
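[The restriction Laurent describes, where the pipeline handler bounds the requested ScalerCrop to the permitted rectangle and reports the result back through metadata, could be sketched like this. These are made-up helpers in the spirit of the geometry helpers in this series, not the real API.]

```cpp
#include <cassert>

struct Rect { int x, y, width, height; };

// Clamp a requested crop so it fits entirely within the permitted bound:
// first shrink it if it is too large, then shift it back inside the edges.
Rect clampTo(Rect r, const Rect &bound)
{
	if (r.width > bound.width)
		r.width = bound.width;
	if (r.height > bound.height)
		r.height = bound.height;
	if (r.x < bound.x)
		r.x = bound.x;
	if (r.y < bound.y)
		r.y = bound.y;
	if (r.x + r.width > bound.x + bound.width)
		r.x = bound.x + bound.width - r.width;
	if (r.y + r.height > bound.y + bound.height)
		r.y = bound.y + bound.height - r.height;
	return r;
}
```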

I think this could work and is a fair compromise between the amount of
information and the ease of use. Most digital zoom applications could
simply request its Size, and go from there. Only a few minor
misgivings:

* Does V4L2 actually give you a way of finding out where a particular
"camera mode" lies in the pixel array? Do you even know if a mode is
binned or heavily cropped... someone who knows rather more about V4L2
needs to answer that one!

* Use of native sensor coords is OK, though a few sensors might
support non-integer scaling, at which point the property would be
slightly "approximate". That seems fairly minor to me, though. (Some
ISPs might even do non-integer cropping, so the problem can exist
regardless.)

* It doesn't really support my assertion that you should be able to
know exactly what's going on, if you want to. (But if we decide this
doesn't matter, then that's fine too.)

* In one of our early discussions we suggested that having the exact
pixel level information might be useful, and you could do things like
pan one pixel at a time. We lose that by going to native sensor
coordinates. I think I said at the time that this ability feels more
useful than I believe it really is, so I wouldn't be too upset.

>
> While in some use cases the actual crop rectangle reported through
> metadata would be enough (for instance, in a GUI that starts with an
> unzoomed view, the application would get the maximum possible crop
> rectangle in the metadata of the first frame), I can imagine we would
> have use cases that need this information before capturing the first

Indeed. And I also think you should be able to set digital zoom to
take effect *on* the very first frame itself (this spills over into
Naush's most recent patch set...)

> frame. I initially thought we could report it through the max value of
> the ControlInfo instance for the ScalerCrop control, but this is
> supposed to be static information. As this would be (in my opinion) a
> neat solution, I'd like to investigate the possibility of making
> ControlInfo dynamic, but maybe we will need a different solution. I
> foresee the addition of an API that will create request templates,
> filling them with default control values based on a requested use case,
> and the maximum ScalerCrop could possibly be reported through that
> mechanism. I'm also fine reporting it through a temporary mechanism (for
> instance adding it to CameraConfiguration, or creating a dynamic
> property) for the time being, if you already have pressing use cases for
> knowing the maximum value before capturing the first frame.
>
> Thoughts ? :-)

I'm glad we're talking about all this stuff!!

>
> > > Two additional points I'd like to consider (and which are orthogonal to
> > > the previous one) are:
> > >
> > > - Should we automatically adjust the ScalerCrop rectangle to always
> > >   output square pixels, or should we allow modifying the aspect ratio
> > >   when scaling ? Most use cases call for square pixels, but I don't
> > >   think we necessarily want to create an artificial limitation here (as
> > >   long as we make it easy for applications to compute the scaling
> > >   parameters that will give them square pixels).
> >
> > Personally I see no reason to restrict what an application can request. We
> > need to make it easy to request the right aspect ratio (hence those
> > geometry helpers), but if people actually have a use-case for something
> > strange...
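[As a sketch of the "make the right aspect ratio easy" idea, something like the following could sit alongside the geometry helpers (hypothetical helper, not one from the series): compute the largest crop with the output's aspect ratio, centred in the available rectangle, so that scaling to the output keeps pixels square.]

```cpp
#include <cassert>

struct Rect { int x, y, width, height; };
struct Size { int width, height; };

// Largest centred crop in `avail` whose aspect ratio matches `output`.
Rect centredCropWithAspect(const Rect &avail, const Size &output)
{
	/* Try full width first; if the height overflows, fit to height. */
	int w = avail.width;
	int h = w * output.height / output.width;
	if (h > avail.height) {
		h = avail.height;
		w = h * output.width / output.height;
	}
	return Rect{ avail.x + (avail.width - w) / 2,
		     avail.y + (avail.height - h) / 2, w, h };
}
```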
>
> Fine with me.
>
> > > - Should we allow a ScalerCrop rectangle smaller than the stream output
> > >   size, or should we restrict scaling to down-scaling only ?
> >
> > I think up-scaling is probably the most common use-case for us (though
> > downscaling will happen too). Think of all those (rubbish) 30x zoom
> > pictures that some phones like to produce...!
>
> Rubbish is exactly the word I would have used :-) I'm tempted to try and
> teach users that they shouldn't do this by disabling this feature, but
> that would be pretentious. I suppose there's no good reason to forbid
> this. Should we put a limit on the upscaling factor though ?

Always tricky. To what extent should we save people from themselves?
I'm pretty sure the Pi imposes some kind of a limit, but don't really
know. In the end, so long as it doesn't blow up, that's kind of OK...
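[If we did impose a limit, one simple form would be a floor on the crop size derived from the output size; the factor here is a made-up policy value, not anything the Pi actually enforces.]

```cpp
#include <cassert>

struct Size { int width, height; };

// Smallest crop allowed for a given output size and maximum upscaling
// factor. Round up so the limit is never exceeded.
Size minCropSize(const Size &output, int maxUpscale)
{
	return Size{ (output.width + maxUpscale - 1) / maxUpscale,
		     (output.height + maxUpscale - 1) / maxUpscale };
}
```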

Best regards
David

>
> > > > David Plowman (6):
> > > >   libcamera: Add SensorOutputSize property
> > > >   libcamera: Initialise the SensorOutputSize property
> > > >   libcamera: Add IspCrop control
> > > >   libcamera: Add geometry helper functions
> > > >   libcamera: pipeline: raspberrypi: Implementation of digital zoom
> > > >   cam: Add command line option to test IspCrop control
> > > >
> > > >  include/libcamera/geometry.h                  |  20 +++
> > > >  include/libcamera/ipa/raspberrypi.h           |   1 +
> > > >  src/cam/capture.cpp                           |  25 +++-
> > > >  src/cam/capture.h                             |   2 +-
> > > >  src/cam/main.cpp                              |   3 +
> > > >  src/cam/main.h                                |   1 +
> > > >  src/ipa/raspberrypi/raspberrypi.cpp           |   7 +
> > > >  src/libcamera/camera_sensor.cpp               |   6 +
> > > >  src/libcamera/control_ids.yaml                |  12 ++
> > > >  src/libcamera/geometry.cpp                    | 129 ++++++++++++++++++
> > > >  .../pipeline/raspberrypi/raspberrypi.cpp      |  47 +++++++
> > > >  src/libcamera/property_ids.yaml               |  19 +++
> > > >  12 files changed, 269 insertions(+), 3 deletions(-)
>
> --
> Regards,
>
> Laurent Pinchart
