[libcamera-devel] [PATCH v2 1/2] libcamera: Implement digital zoom

David Plowman david.plowman at raspberrypi.com
Wed Jul 22 19:58:23 CEST 2020


Hi Kieran and everyone

Thanks for sticking with this gnarly subject! Perhaps I haven't helped
myself with some poor naming choices. You may be right that it
actually wants a "face to face" discussion, but before that maybe I'll
have one more go at an explanation, and then I'll post a new version
of the patches with some tidy-ups and improved nomenclature.

So there are lots of pixels and crops flying about. There may well be
some going on in the sensor, but here I'm talking about the crop that
is applied within the ISP, before it all gets rescaled to create the
output images. (Our pipeline handler shovels all the pixels we receive
from the sensor into the ISP; even those that get discarded are still
useful for things like statistics.)

>From now on (and until someone changes my mind again) I'm going to
call this the *pipeline crop*. How is the "pipeline crop" decided?

Let's suppose we're using the imx477 sensor and we want 1080p
output. The pipeline handler will probably choose the 2x2 binned mode,
giving us 2028x1520 pixels from the sensor.

We're going to feed all of this into the ISP. But the ISP has to apply
a crop before it rescales everything to the output size of 1920x1080 in
our case.

One obvious choice would be to take the largest rectangle from the
2028x1520 pixels that matches the output aspect ratio, and then to
centre this rectangle within the 2028x1520 pixels. This would give us
a "pipeline crop" of 2028x1140 pixels, with the top left pixel taken
from (0,190) in the 2028x1520 array from the sensor.

But the pipeline handler could have chosen something different. If the
thought of not-quite-1-to-1 scaling gives the pipeline handler the
creeps, it might have chosen a "pipeline crop" of 1920x1080, taken
from (54,220).

So the "pipeline crop" depends on: the sensor, the current sensor
mode, the aspect ratio of the output images, and basically what mood
the pipeline handler was in at that moment.

The idea is then that the pipeline crop is given to the application,
which, in our example, was either 2028x1140 or 1920x1080. The
application pans and zooms about within there. Notice how the offsets
- (0,190) or (54,220) - are immaterial to the application; the
pipeline handler will add them on transparently before setting the
V4L2 "selection".

Some additional notes:

1. There's also a policy decision here that you can't pan into regions
of image that were not selected by the pipeline crop. This seems
reasonable to me; allowing that to happen would create something both
slightly surprising and fiddly, and I don't really see any upside.

2. You could implement digital zoom by cropping in the sensor itself
rather than the ISP, though I think this would be unusual. But the
discussion above works just the same, with the pipeline crop meaning
"the largest crop we're prepared to take from the sensor". So it does
muddy the meaning a bit, but I'm stuck as to what else to call it!

Anyway, thanks again everyone and please watch out for a revised
patch set tomorrow.

David

On Wed, 22 Jul 2020 at 14:35, Kieran Bingham
<kieran.bingham at ideasonboard.com> wrote:
>
> Hi David,
>
> On 21/07/2020 12:27, David Plowman wrote:
> > Hi again Kieran, everyone
> >
> > So I've been contemplating some of this a bit more since yesterday,
>
>
> I've been trying to give this a bit more thought too... more discussion
> below:
>
>
> > and I wonder if I'm confusing everything by giving that contentious
> > function the name "getSensorCrop". It makes it sound like it's a
> > property of the sensor when it really isn't - it can change whenever
> > the sensor mode chosen changes, or the size of the requested output
> > changes. It's an interaction between the sensor mode and the behaviour
> > of the pipeline handler, so I wonder if we should be coining a
> > different name for it?
> >
> > How about....
> >
> > getInputCrop
> > getPipelineCrop
> > getPipelineSensorCrop
> > getPipelineSensorArea
> >
> > Of those I think I prefer the last one, it's the area of the sensor
> > that the pipeline handler has chosen that we can use. But does anyone
> > have any better suggestions?
> >
>
>
> I think I actually like getInputCrop(or is it setInputCrop?)
>
> Perhaps I'm getting confused as to whether this addition is to set or
> get the zoom window.
>
>
>
> > I guess I'm also still unclear as to where to store it - I'm assuming
> > that the SensorInfo isn't the right place for something that can
> > change like this. (?)
>
>
> I seem to still be confused as to what data this is representing.
>
> I.e. is it:
>
> A) A crop which reduces the output of the sensor (sensorOutputCrop) and
> further restricts the data output by the sensor reducing the overall
> window further than the analogCrop.
>
> B) A crop which restricts the input to the ISP/Scaler (scalerInputCrop)
> and must be smaller than the analogCrop from the sensor
>
>
>
> Also, I think I have been confused by the fact that there is a Rectangle
> DigitalZoom control added in this patch, which should come later, and
> will 'use'/'set' the crop being defined by this patch?
>
>
> Maybe a video call to discuss this sometime might help me understand
> what parts you are referring to.
>
>
>
> > Thanks!
> > David
> >
> > On Mon, 20 Jul 2020 at 08:19, David Plowman
> > <david.plowman at raspberrypi.com> wrote:
> >>
> >> Hi Kieran
> >>
> >> Thanks for your feedback. Apologies for the delay in replying - I've
> >> been away for the past week!
> >>
> >> I agree there's still some stuff to figure out here! Let me try and
> >> answer some of the questions (and probably pose some more!):
> >>
> >> On Fri, 10 Jul 2020 at 14:12, Kieran Bingham
> >> <kieran.bingham at ideasonboard.com> wrote:
> >>>
> >>> Hi David,
> >>>
> >>> On 09/07/2020 10:15, David Plowman wrote:
> >>>> These changes add a Digital Zoom control, tacking a rectangle as its
> >>>
> >>> s/tacking/taking/
> >>>
> >>>> argument, indicating the region of the sensor output that will be
> >>>> zoomed up to the final output size.
> >>>>
> >>>> Additionally, we need have a method returning the "sensorCrop" which
> >>>
> >>> s/need have/need to have/
> >>>
> >>>> gives the dimensions of the sensor output within which we can pan and
> >>>> zoom.
> >>>
> >>>
> >>> This commit message describes implementing digital zoom, but *this*
> >>> commit is only dealing with the representaion of a crop region in the
> >>> (CameraSensor) which can be used to implement digital zoom.
> >>>
> >>> Perhaps this commit should be titled:
> >>>  "libcamera: camera_sensor: Support cropping regions"
> >>>
> >>> (with that being the assumption that my comments below about moving this
> >>> to the CameraSensor class are correct)
> >>>
> >>>
> >>>
> >>>> ---
> >>>>  include/libcamera/camera.h                    |  2 ++
> >>>>  include/libcamera/internal/pipeline_handler.h |  4 +++
> >>>>  src/libcamera/camera.cpp                      | 26 +++++++++++++++++++
> >>>>  src/libcamera/control_ids.yaml                | 10 +++++++
> >>>>  4 files changed, 42 insertions(+)
> >>>>
> >>>> diff --git a/include/libcamera/camera.h b/include/libcamera/camera.h
> >>>> index 4d1a4a9..dd07f7a 100644
> >>>> --- a/include/libcamera/camera.h
> >>>> +++ b/include/libcamera/camera.h
> >>>> @@ -92,6 +92,8 @@ public:
> >>>>       std::unique_ptr<CameraConfiguration> generateConfiguration(const StreamRoles &roles = {});
> >>>>       int configure(CameraConfiguration *config);
> >>>>
> >>>> +     Size const &getSensorCrop() const;
> >>>> +
> >>>>       Request *createRequest(uint64_t cookie = 0);
> >>>>       int queueRequest(Request *request);
> >>>>
> >>>> diff --git a/include/libcamera/internal/pipeline_handler.h b/include/libcamera/internal/pipeline_handler.h
> >>>> index 22e629a..37e069a 100644
> >>>> --- a/include/libcamera/internal/pipeline_handler.h
> >>>> +++ b/include/libcamera/internal/pipeline_handler.h
> >>>> @@ -89,6 +89,8 @@ public:
> >>>>
> >>>>       const char *name() const { return name_; }
> >>>>
> >>>> +     Size const &getSensorCrop() const { return sensorCrop_; }
> >>>> +
> >>>>  protected:
> >>>>       void registerCamera(std::shared_ptr<Camera> camera,
> >>>>                           std::unique_ptr<CameraData> data);
> >>>> @@ -100,6 +102,8 @@ protected:
> >>>>
> >>>>       CameraManager *manager_;
> >>>>
> >>>> +     Size sensorCrop_;
> >>>> +
> >>>
> >>> I think this needs to go in the CameraSensor class...
> >>> It's 'per sensor' not 'per pipeline handler' right?
> >>
> >> Maybe this is the billion dollar question. See below...!
>
>
> I might also have meant CameraData ... but it depends still...
>
>
> >>>>  private:
> >>>>       void mediaDeviceDisconnected(MediaDevice *media);
> >>>>       virtual void disconnect();
> >>>> diff --git a/src/libcamera/camera.cpp b/src/libcamera/camera.cpp
> >>>> index 69a1b44..6614c93 100644
> >>>> --- a/src/libcamera/camera.cpp
> >>>> +++ b/src/libcamera/camera.cpp
> >>>> @@ -793,6 +793,32 @@ int Camera::configure(CameraConfiguration *config)
> >>>>       return 0;
> >>>>  }
> >>>>
> >>>> +/**
> >>>> + * \brief Return the size of the sensor image being used to form the output
> >>>> + *
> >>>> + * This method returns the size, in pixels, of the raw image read from the
> >>>> + * sensor and which is used to form the output image(s). Note that these
> >>>> + * values take account of any cropping performed on the sensor output so
> >>>> + * as to produce the correct aspect ratio. It would normally be necessary
> >>>> + * to retrieve these values in order to calculate correct parameters for
> >>>> + * digital zoom.
> >>>> + *
> >>>> + * Example: a sensor mode may produce a 1920x1440 output image. But if an
> >>>> + * application has requested a 16:9 image, the values returned here would
> >>>> + * be 1920x1080 - the largest portion of the sensor output that provides
> >>>> + * the correct aspect ratio.
> >>>
> >>> Should this be a Rectangle? Can the crop region be positioned
> >>> arbitrarily or is it always centered?
> >>
> >> As I've implemented this, it is just a pair of dimensions. The
> >> pipeline handler may position it anywhere it likes within the array of
> >> pixels output by the sensor (and takes care of any offset within
> >> that), but the application doesn't need to know. The idea being that
> >> the application gets to choose a *rectangle* from within the
> >> dimensions it is given and doesn't have to bother itself with how it's
> >> offset in the pixel array beyond that.
>
> Surely the position is quite essential to define somehow though,
> otherwise we'll get some zoom which will zoom in on the bottom right
> corner and be perfectly valid, while another zoom chooses the centre,
> and another the top left ... ?
>
> I assume centreing is usually the right thing to do - but I don't know ...
>
> If it is defined as always zooming in on the centre point then indeed a
> Size would be the only thing needed to store.
>
>
>
>
>
>
> >>> What is the relationship between this and the analogCrop in
> >>> CameraSensorInfo? I 'think' that one is more about what the actual valid
> >>> data region is from the camera, so I presume the analogCrop is sort of
> >>> the 'maximum' image size?
> >>
> >> Yes, I think the analogCrop is the maximum usable pixel area from the
> >> sensor. The crop here is what the pipeline handler decides it wants to
> >> use. It's adjusted for aspect ratio (of the requested output), the
> >> actual size that it wants to use (and hence the scaling). I certainly
> >> agree that it might want putting somewhere else... yet it's only the
> >> pipeline handler that can calculate it. It depends on the sensor, but
> >> it also depends on the mode selected, the output size requested, and
> >> "arbitrary" decisions made in the pipeline handler (for example, they
> >> may wish to avoid marginal scaling and prefer 1:1 input to output
> >> pixels).
>
>
> Just to clarify further, do you expect that this control changes the
> mode of the camera at all? I.e. does it crop the data coming from the
> sensor? Or is this purely applying the input crop to the rescaler ?
>
>
> >>>> + *
> >>>> + * \context This function is \threadsafe. It will only return valid
> >>>> + * (non-zero) values when the camera has been configured.
> >>>> + *
> >>>> + * \return The dimensions of the sensor image in use.
> >>>> + */
> >>>> +
> >>>> +Size const &Camera::getSensorCrop() const
> >>>> +{
> >>>> +     return p_->pipe_->getSensorCrop();
> >>>
> >>> Aha, more indication to me that this should go in to the Camera class,
> >>> and it would save the indirection required here...
> >>
> >> Indeed. But given that the pipeline handler calculates it, can we have
> >> it poke its value into the Camera class? Or is that weird if the
> >> number depends on rather more than just the camera? (And will change
> >> every time we start the sensor in a different mode.)
>
> Hrm - ok so that depends on something I assumed. I assumed that this
> would be setting a property on the sensor as well, but now I realise I
> think this is just defining a crop which will determine what window to
> read from at the input to the rescaler.
>
>
> >> Another solution to this that I've talked about previously is simply
> >> to go use ratios, and avoid pixel coordinates altogether. Then you
> >> just don't need this function. The counter argument is that an
> >> application can't control digital pan/zoom down to the precise pixel -
> >> which sounds appealing. Yet I think real applications will have in
> >> mind a _rate_ at which they wish to work (e.g. pan across half the
> >> field of view in 5 seconds) at which point ratios are quite natural.
>
> Yes, I think I am used to seeing zoom in terms of a multiplier, but that
> won't allow positioning/panning within the region.
>
>
>
>
>
> >>
> >> Best regards
> >> David
> >>
> >>>
> >>>
> >>>> +}
> >>>> +
> >>>>  /**
> >>>>   * \brief Create a request object for the camera
> >>>>   * \param[in] cookie Opaque cookie for application use
> >>>> diff --git a/src/libcamera/control_ids.yaml b/src/libcamera/control_ids.yaml
> >>>> index 988b501..5a099d5 100644
> >>>> --- a/src/libcamera/control_ids.yaml
> >>>> +++ b/src/libcamera/control_ids.yaml
> >>>> @@ -262,4 +262,14 @@ controls:
> >>>>          In this respect, it is not necessarily aimed at providing a way to
> >>>>          implement a focus algorithm by the application, rather an indication of
> >>>>          how in-focus a frame is.
> >>>> +
> >>>> +  - DigitalZoom:
> >>>> +      type: Rectangle
> >>>> +      description: |
> >>>> +        Sets the portion of the full sensor image, in pixels, that will be
> >>>> +        used for digital zoom. That is, this part of the sensor output will
> >>>> +        be scaled up to make the full size output image (and everything else
> >>>> +        discarded). To obtain the "full sensor image" that is available, the
> >>>> +        method Camera::getOutputCrop() should be called once the camera is
>
> Aha, so the DigitalZoom is a rectangle that it can zoom in on... I think
> I confused this DigitalZoom rectangle with the getSensorCrop() above.
>
> But that also highlights a confusion in this control documentation, Did
> you mean getSensorCrop when you stated getOutputCrop above?
>
>
> I think this actual DigitalZoom control addition perhaps shouldn't be in
> this patch, and should be in a patch on it's own after, or in patch 2/2
> in the series?
>
> Oh, and now I see both patches have the same title - That might need
> fixing too to clarify intent of each patch ;-) It probably is also why I
> have been confused. I thought this patch was implementing more of the
> digital zoom than I realised, and actually the zooming part is in the
> next patch...
>
>
>
>
>
> >>>> +        configured. An application may pan and zoom within this rectangle.
> >>>>  ...
> >>>>
> >>>
> >>> --
> >>> Regards
> >>> --
> >>> Kieran
>
> --
> Regards
> --
> Kieran


More information about the libcamera-devel mailing list