[libcamera-devel] libcamera transform control

David Plowman david.plowman at raspberrypi.com
Mon Jul 27 13:47:28 CEST 2020


Hi everyone

Thanks for all the comments. I think we can formulate a plan for which
I can produce a set of patches. It looks like this:

1. Define an enum class Transform as described by Laurent. (I prefer
"Transform" to "Transformation": not only is it shorter to type, but it
also feels more natural to me in this context.)

2. There are going to be a few operators and helper functions (a rough
code sketch follows at the end of this item):

operator* to compose Transforms, such that a*b means b first, then a.
I think anything other than the "standard mathematical" interpretation
would seem strange.

Unary operator- for the inverse. Actually this offends me a bit, using *
for compose and - for inverse, but I'm not sure what else to do. Use +
for compose? Or have no inverse at all and instead implement operator/
(so that the inverse of Transform t is given by Transform::Identity / t)?
But none of that feels ideal either...

A rotationToTransform(int32_t) function that makes a Transform from a
rotation property. (Can the rotation have values other than 0, 90,
180 and 270? In which case maybe it needs to return a success/fail
code.) Also perhaps the reverse, namely transformToRotation (again,
with a success/fail code, as not all Transforms are simple rotations).
Maybe we also want something like transformToString for debugging.

(I wonder whether some of this would be tidier if we turned Transform
into a bona fide class, or is that overkill?)
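
To make item 2 concrete, here is a rough, untested sketch of the sort
of thing I mean. All names are provisional; the values follow Laurent's
enum from later in the thread (renamed per item 1), with bit 0 = hflip,
bit 1 = vflip, bit 2 = transpose, and the transpose applied after the
flips:

enum class Transform : int {
	Identity = 0,
	HFlip = 1,
	VFlip = 2,
	HVFlip = 3,		/* = Rot180 */
	Transpose = 4,
	Rot270 = 5,		/* = HFlip | Transpose */
	Rot90 = 6,		/* = VFlip | Transpose */
	Rot180Transpose = 7
};

/* a * b means "apply b first, then a". */
constexpr Transform operator*(Transform a, Transform b)
{
	int ai = static_cast<int>(a), bi = static_cast<int>(b);
	/* If b transposes, a's flips act on the swapped axes. */
	if (bi & 4)
		ai = (ai & 4) | ((ai & 1) << 1) | ((ai & 2) >> 1);
	/* Flip bits XOR together, as do the transpose bits. */
	return static_cast<Transform>(ai ^ bi);
}

/* Unary minus for the inverse. Everything is self-inverse except
 * Rot90/Rot270, where the transpose swaps which flip must be undone. */
constexpr Transform operator-(Transform t)
{
	int ti = static_cast<int>(t);
	if (ti & 4)
		ti = (ti & 4) | ((ti & 1) << 1) | ((ti & 2) >> 1);
	return static_cast<Transform>(ti);
}

const char *transformToString(Transform t)
{
	static const char *names[] = {
		"identity", "hflip", "vflip", "hvflip",
		"transpose", "rot270", "rot90", "transpose+rot180"
	};
	return names[static_cast<int>(t) & 7];
}

(Sanity checks: Rot90 * Rot90 == HVFlip, Rot90 * Rot270 == Identity,
-Rot90 == Rot270.)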

3. We'll put a Transform field in the CameraConfiguration, defaulting
it to Identity. I don't think there's much else to do in the libcamera
core.
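
For item 3, application usage would then look something like this
(a hypothetical sketch, assuming the field is a plain public member
like the existing CameraConfiguration ones):

std::unique_ptr<CameraConfiguration> config =
	camera->generateConfiguration({ StreamRole::Viewfinder });
config->transform = Transform::HFlip;	/* e.g. mirror for local display */
camera->configure(config.get());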

4. (In the Raspberry Pi pipeline handler) Handle the user-supplied
Transform in the CameraConfiguration, combining it with the
properties::Rotation to set the correct h/vflip bits in the camera
driver.
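
In code, item 4 might come out roughly as follows. This is only a
hypothetical sketch: data->properties_, rotationToTransform() and the
composition order are all assumptions to be pinned down in the actual
patches.

/* Combine the user transform with the sensor's mounting rotation. */
int32_t rotation = data->properties_.get(properties::Rotation);
Transform total = config->transform * rotationToTransform(rotation);

bool hflip = static_cast<int>(total) & 1;
bool vflip = static_cast<int>(total) & 2;
/* ... program hflip/vflip into the sensor driver (V4L2_CID_HFLIP,
 * V4L2_CID_VFLIP), and fail if a transpose was requested ... */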

5. Handle the Transform in the ALSC algorithm. I think we put the
Transform in our DeviceStatus metadata and then ALSC has to flip its
various tables in the right way.
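
For item 5's table flipping, a hypothetical helper along these lines
should do (assuming row-major tables; transposes are excluded on our
platform anyway):

#include <algorithm>
#include <vector>

/* Flip a row-major w x h table in place according to the h/v flip
 * bits of a (transpose-less) Transform. */
void flipTable(std::vector<double> &table, unsigned int w,
	       unsigned int h, Transform t)
{
	if (static_cast<int>(t) & 1)	/* hflip: reverse each row */
		for (unsigned int y = 0; y < h; y++)
			std::reverse(table.begin() + y * w,
				     table.begin() + (y + 1) * w);
	if (static_cast<int>(t) & 2)	/* vflip: exchange rows */
		for (unsigned int y = 0; y < h / 2; y++)
			std::swap_ranges(table.begin() + y * w,
					 table.begin() + (y + 1) * w,
					 table.begin() + (h - 1 - y) * w);
}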

I think that's pretty much it. Please watch for some patches in due course!

Thanks and best regards
David

On Fri, 24 Jul 2020 at 00:10, Laurent Pinchart
<laurent.pinchart at ideasonboard.com> wrote:
>
> Hi David,
>
> On Tue, Jul 21, 2020 at 10:19:30AM +0100, David Plowman wrote:
> > Hi again
> >
> > Replying again to my own discussion (I seem to be making a habit of this...)
> >
> > On Mon, 20 Jul 2020 at 08:59, David Plowman wrote:
> > >
> > > Hi Laurent, Naush and everyone!
> > >
> > > Thanks for all the discussion while I was away!
> > >
> > > On Mon, 13 Jul 2020 at 23:27, Laurent Pinchart wrote:
> > >> On Mon, Jul 13, 2020 at 09:16:19AM +0100, Naushir Patuck wrote:
> > >>> Hi Laurent,
> > >>>
> > >>> David is away this week, so I'll reply with some thoughts and comments
> > >>> in his absence.
> > >>>
> > >>> On Sun, 12 Jul 2020 at 00:28, Laurent Pinchart wrote:
> > >>>> On Thu, Jul 09, 2020 at 09:13:56AM +0100, David Plowman wrote:
> > >>>>> Replying to my own post to add comments from Naush and Jacopo!
> > >>>>>
> > >>>>> On Mon, 6 Jul 2020 at 14:32, David Plowman wrote:
> > >>>>>>
> > >>>>>> Hi everyone
> > >>>>>>
> > >>>>>> This email takes the previous discussion of image transformations and
> > >>>>>> moves it into a thread of its own.
> > >>>>>>
> > >>>>>> The problem at hand is how to specify the 2D planar transformation to
> > >>>>>> be applied to a camera image. These are well known to us as flips,
> > >>>>>> rotations and so on (the usual eight members of the dihedral group D4).
> > >>>>>>
> > >>>>>> Laurent's suggestion was to apply them as standard libcamera controls.
> > >>>>>> Some platforms may be able to apply them on a per-frame basis; others
> > >>>>>> will not (they will require the transform to be set before the camera
> > >>>>>> is started - once this mechanism exists). (Is there any way to signal a
> > >>>>>> platform's capabilities in this regard?)
> > >>>>
> > >>>> Not at the moment, but I think we should add that information to the
> > >>>> ControlInfo class. It should be fairly straightforward, we can add a
> > >> flags field, with a single flag defined for now to tell whether the
> > >>>> control can be modified per frame or has to be set before starting the
> > >>>> camera.
> > >>>
> > >>> That would work.  However, I do struggle to see what would require an
> > >>> application to change the orientation on a per-frame basis.  In
> > >>> particular, if the change in orientation requires a transpose
> > >>> operation, then the buffer sizing might also change due to alignment
> > >>> constraints, and these would almost certainly require a set of buffer
> > >>> re-allocations unless you pre-allocate for the largest possible size.
> > >>> Of course, that is not to say that there aren't other controls that
> > >>> can only be set on startup (I cannot think of any specific ones right
> > >>> now).
> > >>
> > >> While we could in theory support a change of rotation at runtime
> > >> (assuming the kernel drivers and APIs would let us do so), I don't think
> > >> that would be particularly valuable. The main use case I could imagine
> > >> would be flipping the image horizontally, a.k.a. mirroring, for video
> > >> conferencing use cases where the user may want to enable or disable
> > >> mirroring for the local display.  One could however argue that platforms
> > >> used for video conferencing with a local display would likely have a GPU
> > >> that would handle the composition, making mirroring easy to implement on
> > >> the display side, but there could be other similar use cases on less
> > >> powerful platforms. We could force a stop/start sequence in that case,
> > >> but are there drawbacks in allowing this to be changed at runtime?
> > >
> > > To me it seems a reasonable compromise to go with the standard Control
> > > mechanism, but perhaps to add a flag indicating whether you have to
> > > apply the control _before_ starting the camera, or whether you can
> > > apply it after. As Naush says, it could get quite hairy trying to do
> > > it in the sensor, or if you had an inline ISP that was doing it, and
> > > figuring out which frame after the request would actually have the
> > > change might be awkward.
> >
> > I wanted to add a couple of further comments to this. Quite commonly,
> > transforms may be implemented in the sensor. This means that whenever
> > you set or change the transform, the pixel format of the raw stream
> > may change too. In fact I've already seen this kind of problem - under
> > some circumstances the Bayer order recorded in qcam's DNG files can be
> > wrong.
> >
> > So we need to consider the interaction of the transform control with
> > the raw stream format. Specifically:
> >
> > * If we change the transform at run time, are we prepared for the raw
> > stream format to change spontaneously? Would requests have to contain
> > metadata saying what the pixel format of the raw stream was for that
> > capture?
> >
> > * We've talked about passing an initial ControlList into either the
> > configure or start method. Whichever approach one takes, you've got to
> > be careful about exactly when the raw stream pixel format may or may
> > not be correct.
> >
> > All this keeps nudging me back to an initial thought that the
> > transform should be a bona fide field in the stream configuration. You
> > set it before calling configure, and it never changes. All these
> > problems simply disappear.
> >
> > Thoughts?
>
> I agree it would make things simpler, at the cost of not supporting some
> use cases that are currently considered uncommon, if not rare. It
> would probably be a field in the CameraConfiguration instead of the
> StreamConfiguration though, as the transformation is mostly applied at
> the sensor level.
>
> I wonder if we should start with this, and later add controls at the
> stream level (which we don't have at the moment) to apply per-stream
> transformations, if we encounter hardware that supports this.
>
> Let's also consider transposition (as a means to implement 90° and 270°
> rotations) in this discussion, even if we don't implement support for it
> now. Transposition, when supported at the hardware level, is most
> likely implemented in the ISP (I haven't seen a sensor that can rotate
> by 90°, but I wouldn't rule out the existence of such a device). It is
> likely even more difficult to change at runtime than the other
> transformations, as it changes the layout of the buffer. It is however
> quite likely to be implemented in DMA engines, and thus at the stream
> level.
>
> To summarize my opinion, I believe we need to consider all
> transformations and the different places where they can occur (at the
> camera level or the stream level), when they can be configured (before
> starting the camera or at runtime), and how transformations at the
> camera level and stream level would interact (for instance we may have
> an offline ISP that performs rotation on the input side, the raw stream
> would thus not support rotation, and all the processed streams would have
> the same rotation value). We can then check that we have all cases
> covered in the design, with clear interactions, and implement a subset
> (for instance leaving the runtime control out to start with). Does this
> make sense?
>
> > Thanks! (and sorry for being awkward...)
>
> It would be the pot calling the kettle black if I blamed you for that
> :-)
>
> > >>>>>> For the time being we are ignoring the possibility that some platforms
> > >>>>>> might be able to apply different transformations to different streams
> > >>>>>> (I believe there is in any case no mechanism for per-stream controls).
> > >>>>
> > >>>> Indeed. That's something we may add in the future, but let's try to
> > >>>> avoid it if possible :-)
> > >>>>
> > >>>>>> Note also that raw streams will always have the orientation with which
> > >>>>>> they come out of the camera, though again I don't believe we have a
> > >>>>>> convenient way for a platform to indicate whether this includes the
> > >>>>>> transform or not (perhaps a stream could indicate its transform
> > >>>>>> relative to the requested transform?).
> > >
> > > How do folks feel about this? It seems like a useful feature to me.
> > >
> > >>>>>> We propose to represent the transforms as "int" controls, in fact
> > >>>>>> being members of an enum with eight entries. Further, we suggest that
> > >>>>>> the first four entries are "identity", "hflip", "vflip", "h+vflip",
> > >>>>>> meaning the control's maximum value can indicate whether transforms
> > >>>>>> that involve a transposition are excluded.
> > >>>>
> > >>>> An enum sounds good. How would you name the next four entries? :-)
> > >
> > > So I think I'd have a class or enum class (or equivalent) using 3 bits
> > > to represent a transform. For example, bit 0 = hflip, bit 1 = vflip,
> > > bit 2 = transpose. By convention we might say that the transpose is
> > > applied after the flips.
>
> Looks good to me.
>
> > > I'd also add some aliases for transforms with
> > > familiar names. And we should define a * operator to compose
> > > transforms. Oh, and an inverse operator too.
>
> operator∘()? :-) It would be interesting if C++ allowed defining
> custom operators.
>
> > > I always hate naming stuff, but here's an example. Maybe use MIRROR
> > > instead of FLIP?
> > >
> > > IDENTITY = ROT0 = 0,
> > > HFLIP = 1,
> > > VFLIP = 2,
> > > HVFLIP = ROT180 = 3,
> > > TRANSPOSE = 4,
> > > ROT270 = 5,
> > > ROT90 = 6,
> > > TRANSPOSE_ROT180 = 7
>
> enum class Transformation {
>         Identity = 0,
>         Rot0 = Identity,
>         HFlip = 1,
>         VFlip = 2,
>         HVFlip = HFlip | VFlip,
>         Transpose = 4,
>         Rot270 = HFlip | Transpose,
>         Rot90 = VFlip | Transpose,
>         Rot180Transpose = HFlip | VFlip | Transpose
> };
>
> constexpr Transformation operator*(Transformation lhs, Transformation rhs)
> {
>         ...
> }
>
> constexpr Transformation operator-(Transformation t)
> {
>         ...
> }
>
> I really like the idea of custom operators, as it's too easy to mix up the
> order of transformations and get things wrong (we need "enum class"
> instead of "enum" to disallow usage of default arithmetic operators).
> There's of course the question of whether a*b should perform a first and
> then b, or the other way around. Mathematically speaking, a ∘ b applies
> b first, then a. I foresee this being confusing to many though.
>
> It seems that the D4 (or D8, depending on the notation) group is also
> commonly constructed from the identity, rot90 and hflip operations (see
> https://en.wikipedia.org/wiki/Dihedral_group_of_order_8). We could base
> the numerical values on that (Identity = 0, Rot90 = 1, HFlip = 2) and
> define the other transformations accordingly. I'm not sure what the pros
> and cons of this would be.
>
> (On a side note, it looks like
> https://en.wikipedia.org/wiki/File:Dih4_cycle_graph.svg has the order of
> the operations reversed, while
> https://en.wikipedia.org/wiki/File:Dihedral_group4_example.png seems
> correct.)
>
> > > You'll notice I have no natural name for the last entry! (any offers?)
>
> Not at the moment I'm afraid.
>
> > >>>>>> Naush:
> > >>>>>>
> > >>>>>> ... the pipeline handlers need to maintain a list of
> > >>>>>> controls supported, like in include/libcamera/ipa/raspberrypi.h
> > >>>>>> (RPiControls).  This then gets exported to the CameraData base class
> > >>>>>> member controlInfo_.  A Request will not allow setting a
> > >>>>>> libcamera::control that is not listed in CameraData::controlInfo_.
> > >>>>>
> > >>>>> Jacopo:
> > >>>>>
> > >>>>> That's correct. It seems to me from this discussion that a ControlInfo
> > >>>>> could be augmented with information about a control being supported
> > >>>>> per-frame or just at configuration time (or else...)
> > >>>>
> > >>>> Seems we agree :-)
> > >>>>
> > >>>> Please also note that include/libcamera/ipa/raspberrypi.h lists controls
> > >>>> that are supported by the IPA. If there are controls that are fully
> > >>>> implemented on the pipeline handler side without involving the IPA (I'm
> > >>>> not suggesting this is or is not the case here, it's a general remark),
> > >>>> they can be added to the control info map by the pipeline handler.
> > >>>>
> > >>>>>> Notes on the Raspberry Pi implementation:
> > >>>>>> Only transpose-less transforms will be allowed, and only before the
> > >>>>
> > >>>> What do you mean by transpose-less transforms?
> > >
> > > So you can look at the list I made above. Another way to think of it
> > > is that a transpose-less transform preserves input rows of pixels as
> > > output rows of pixels. A transform with a transpose in it turns input
> > > rows of pixels into output *columns*.
>
> I've got it now, thanks.
>
> > >>> Anything that involves only a combination of horizontal and vertical
> > >>> flips would be a transpose-less transform, e.g. rot180, rot180 +
> > >>> mirror.  Any rot 90/270 would require a transpose operation, and we
> > >>> cannot do that with the hardware block.
> > >>>
> > >>>>>> camera is started. We will support them by setting the corresponding
> > >>>>>> control bits in the camera driver (so raw streams will include the
> > >>>>>> transform).
> > >>>>
> > >>>> Does this mean configuring h/v flip at the camera sensor level? I
> > >>>> assume that will be the most usual case. I wonder if offline ISPs
> > >>>> typically support h/v flip.
> > >>>
> > >>> The RPi ISP does support it.  However, we do find it much easier
> > >>> overall if the flips occur at the source.
> > >>>
> > >>>>>> This means the ALSC (Auto Lens Shading) algorithm will
> > >>>>>> have to know whether camera images are oriented differently from the
> > >>>>>> calibration images.
> > >>>>
> > >>>> How do you think we should convey the orientation of the calibration
> > >>>> data?
> > >>>
> > >>> We could have a field in our tuning file specifying the orientation of
> > >>> the calibration table - or always enforce that calibration must happen
> > >>> at rot 0.  Then any orientation change can be passed into the IPA (via
> > >>> controls?) so that the calibration table can be flipped appropriately.
> > >>> Personally, I prefer to have all calibration data fixed at rot 0, but
> > >>> David may have other opinions :)  However, all this is up to vendors
> > >>> to decide for themselves I suppose.
> > >>
> > >> I think I agree with you here, it seems less confusing and error-prone
> > >> to standardize the calibration rotation, but I may be missing something.
> > >
> > > Down at the Raspberry Pi level, I think it might be natural to include
> > > the camera transform in our DeviceStatus metadata. This makes it
> > > immediately available to any control algorithms that care.
> > >
> > > Doing lens shading calibration with the identity transform certainly
> > > makes sense and reduces confusion. But you know someone somewhere will
> > > get themselves in a muddle and discover that they've got their
> > > calibration images the wrong way round.
> > >
> > > So we _will_ end up with a "calibration transform" field in ALSC
> > > recording the orientation of those calibration images. However, with
> > > those compose and inverse operations defined for transforms, it will
> > > merely be a (slightly head-scratching) one-liner to deduce the right
> > > transform for the output tables!
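> > >
> > > (Hypothetically, that one-liner might be something like
> > >
> > >     Transform tableCorrection = cameraTransform * -calibrationTransform;
> > >
> > > though I would have to double-check the order.)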
>
> --
> Regards,
>
> Laurent Pinchart

