[libcamera-devel] libcamera transform control

Laurent Pinchart laurent.pinchart at ideasonboard.com
Mon Jul 27 13:57:16 CEST 2020


Hi David,

On Mon, Jul 27, 2020 at 12:47:28PM +0100, David Plowman wrote:
> Hi everyone
> 
> Thanks for all the comments. I think we can formulate a plan for which
> I can produce a set of patches. It looks like this:
> 
> 1. Define an enum class Transform as described by Laurent. (I prefer
> "Transform" to "Transformation", not only is it shorter to type but I
> think it feels more natural to me in this context.)
> 
> 2. There are going to be a few operators and helper functions:
> 
> operator* to compose Transforms, such that a*b means b first, then a.
> I think anything other than the "standard mathematical" interpretation
> would seem strange.

Please make sure it's properly documented :-)

> Unary operator- for inverse. Actually this offends me a bit, using *
> for compose and - for inverse, but I'm not sure what else to do. Use +
> for compose? Or not have inverse at all but implement operator/ (so
> that the inverse of Transform t is given by Transform::Identity / t).
> But none of that feels ideal either...
> 
> A rotationToTransform(int32_t) function that makes a Transform from a
> rotation property. (Can the rotation have values other than 0, 90,
> 180, 270? In which case maybe it needs to return a success/fail code.)
> Also perhaps the reverse, namely transformToRotation (again, with
> success/fail code as not all Transforms are a simple rotation).
> 
> Maybe we also want something like transformToString for debugging.
> 
> (I wonder whether some of this would be tidier if we turn Transform
> into a bona fide class, or is that overkill?)

That's exactly what I was going to suggest as I was reading through the
above :-) I think it's a good idea.
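
For illustration, rotationToTransform() with the success/fail code
could be as simple as this (a rough sketch only; the bit encoding and
enumerator names follow the proposal further down in this thread):

bool rotationToTransform(int32_t angle, Transform *transform)
{
        /* Normalize so that negative angles are handled too. */
        switch (((angle % 360) + 360) % 360) {
        case 0:
                *transform = Transform::Identity;
                return true;
        case 90:
                *transform = Transform::Rot90;
                return true;
        case 180:
                *transform = Transform::HVFlip;
                return true;
        case 270:
                *transform = Transform::Rot270;
                return true;
        default:
                return false;
        }
}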

> 3. We'll put a Transform field in the CameraConfiguration, defaulting
> it to Identity. I don't think there's much else to do in the libcamera
> core.

We also need to define the behaviour of CameraConfiguration::validate()
when the requested transformation isn't supported. Adjusting it like we
do for all the other camera configuration fields should be fine, but
maybe I'm missing something, so feel free to propose an alternative.

This should be wired up in the existing documentation as appropriate,
and pipeline handlers need to be updated to adjust the transformation
to Identity in their validate() function.
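
As a sketch of what that could look like in a pipeline handler that
supports only the flip-based transforms (the class and member names are
placeholders, not existing API):

CameraConfiguration::Status MyCameraConfiguration::validate()
{
        Status status = Valid;

        /* This platform cannot transpose, so adjust such requests away. */
        if (static_cast<int>(transform) &
            static_cast<int>(Transform::Transpose)) {
                transform = Transform::Identity;
                status = Adjusted;
        }

        /* ... validate the stream configurations as usual ... */

        return status;
}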

> 4. (In the Raspberry Pi pipeline handler) Handle the user-supplied
> Transform in the CameraConfiguration, combining it with the
> properties::Rotation to set the correct h/vflip bits in the camera
> driver.
> 
> 5. Handle the Transform in the ALSC algorithm. I think we put the
> Transform in our DeviceStatus metadata and then ALSC has to flip its
> various tables in the right way.
> 
> I think that's pretty much it. Please watch for some patches in due course!
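
For step 4, I'd expect the flip programming to boil down to something
like the following (a sketch only; rotationToTransform(), sensor_,
config->transform and mountingRotation are placeholders, not existing
API):

Transform rotation = Transform::Identity;
rotationToTransform(mountingRotation, &rotation);

/* The exact composition order still needs to be decided. */
Transform combined = config->transform * rotation;
int bits = static_cast<int>(combined);

ControlList ctrls(sensor_->controls());
ctrls.set(V4L2_CID_HFLIP,
          static_cast<int32_t>(!!(bits & static_cast<int>(Transform::HFlip))));
ctrls.set(V4L2_CID_VFLIP,
          static_cast<int32_t>(!!(bits & static_cast<int>(Transform::VFlip))));
sensor_->setControls(&ctrls);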
> 
> On Fri, 24 Jul 2020 at 00:10, Laurent Pinchart wrote:
> > On Tue, Jul 21, 2020 at 10:19:30AM +0100, David Plowman wrote:
> >> Hi again
> >>
> >> Replying again to my own discussion (I seem to be making a habit of this...)
> >>
> >> On Mon, 20 Jul 2020 at 08:59, David Plowman wrote:
> >>>
> >>> Hi Laurent, Naush and everyone!
> >>>
> >>> Thanks for all the discussion while I was away!
> >>>
> >>> On Mon, 13 Jul 2020 at 23:27, Laurent Pinchart wrote:
> >>>> On Mon, Jul 13, 2020 at 09:16:19AM +0100, Naushir Patuck wrote:
> >>>>> Hi Laurent,
> >>>>>
> >>>>> David is away this week, so I'll reply with some thoughts and comments
> >>>>> in his absence.
> >>>>>
> >>>>> On Sun, 12 Jul 2020 at 00:28, Laurent Pinchart wrote:
> >>>>>> On Thu, Jul 09, 2020 at 09:13:56AM +0100, David Plowman wrote:
> >>>>>>> Replying to my own post to add comments from Naush and Jacopo!
> >>>>>>>
> >>>>>>> On Mon, 6 Jul 2020 at 14:32, David Plowman wrote:
> >>>>>>>>
> >>>>>>>> Hi everyone
> >>>>>>>>
> >>>>>>>> This email takes the previous discussion of image transformations and
> >>>>>>>> moves it into a thread of its own.
> >>>>>>>>
> >>>>>>>> The problem at hand is how to specify the 2D planar transformation to
> >>>>>>>> be applied to a camera image. These are well known to us as flips,
> >>>>>>>> rotations and so on (the usual eight members of dihedral group D4).
> >>>>>>>>
> >>>>>>>> Laurent's suggestion was to apply them as standard libcamera controls.
> >>>>>>>> Some platforms may be able to apply them on a per-frame basis; others
> >>>>>>>> will not (they will require the transform to be set before the camera
> >>>>>>>> is started - once this mechanism exists). (Is there any way to signal a
> >>>>>>>> platform's capabilities in this regard?)
> >>>>>>
> >>>>>> Not at the moment, but I think we should add that information to the
> >>>>>> ControlInfo class. It should be fairly straightforward, we can add a
> >>>>>> flags field, with a single field defined for now to tell whether the
> >>>>>> control can be modified per frame or has to be set before starting the
> >>>>>> camera.
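
Something along these lines, for instance (a rough sketch, none of this
exists today):

enum class ControlInfoFlag : uint32_t {
        /* The control may be changed while the camera is running. */
        PerFrame = (1 << 0),
};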
> >>>>>
> >>>>> That would work.  However, I do struggle to see what would require an
> >>>>> application to change the orientation on a per-frame basis.  In
> >>>>> particular, if the change in orientation requires a transpose
> >>>>> operation, then the buffer sizing might also change due to alignment
> >>>>> constraints, and these would almost certainly require a set of buffer
> >>>>> re-allocations unless you pre-allocate for the largest possible size.
> >>>>> Of course, that is not to say that there may be other controls that
> >>>>> can only be set on startup (I cannot think of any specific ones right
> >>>>> now).
> >>>>
> >>>> While we could in theory support a change of rotation at runtime
> >>>> (assuming the kernel drivers and APIs would let us do so), I don't think
> >>>> that would be particularly valuable. The main use case I could imagine
> >>>> would be flipping the image horizontally, a.k.a. mirroring, for video
> >>>> conferencing use cases where the user may want to enable or disable
> >>>> mirroring for the local display.  One could however argue that platforms
> >>>> used for video conferencing with a local display would likely have a GPU
> >>>> that would handle the composition, making mirroring easy to implement on
> >>>> the display side, but there could be other similar use cases on less
> >>>> powerful platforms. We could force a stop/start sequence in that case,
> >>>> but are there drawbacks in allowing this to be changed at runtime ?
> >>>
> >>> To me it seems a reasonable compromise to go with the standard Control
> >>> mechanism, but perhaps to add a flag indicating whether you have to
> >>> apply the control _before_ starting the camera, or whether you can
> >>> apply it after. As Naush says, it could get quite hairy trying to do
> >>> it in the sensor, or if you had an inline ISP that was doing it, and
> >>> figuring out which frame after the request would actually have the
> >>> change might be awkward.
> >>
> >> I wanted to add a couple of further comments to this. Quite commonly,
> >> transforms may be implemented in the sensor. This means that whenever
> >> you set or change the transform, the pixel format of the raw stream
> >> may change too. In fact I've already seen this kind of problem - under
> >> some circumstances the Bayer order recorded in qcam's DNG files can be
> >> wrong.
> >>
> >> So we need to consider the interaction of the transform control with
> >> the raw stream format. Specifically:
> >>
> >> * If we change the transform at run time, are we prepared for the raw
> >> stream format to change spontaneously? Would requests have to contain
> >> metadata saying what the pixel format of the raw stream was for that
> >> capture?
> >>
> >> * We've talked about passing an initial ControlList into either the
> >> configure or start method. Whichever approach one takes, you've got to
> >> be careful about exactly when the raw stream pixel format may or may
> >> not be correct.
> >>
> >> All this keeps nudging me back to an initial thought that the
> >> transform should be a bona fide field in the stream configuration. You
> >> set it before calling configure, and it never changes. All these
> >> problems simply disappear.
> >>
> >> Thoughts?
> >
> > I agree it would make things simpler, at the cost of not supporting some
> > use cases that are currently considered uncommon, if not rare. It
> > would probably be a field in the CameraConfiguration instead of the
> > StreamConfiguration though, as the transformation is mostly applied at
> > the sensor level.
> >
> > I wonder if we should start with this, and later add controls at the
> > stream level (which we don't have at the moment) to apply per-stream
> > transformations, if we encounter hardware that supports this.
> >
> > Let's also consider transposition (as a means to implement 90° and 270°
> > rotations) in this discussion, even if we don't implement support for it
> > now. Transposition, when supported at the hardware level, is most
> > likely implemented in the ISP (I haven't seen a sensor that can rotate
> > by 90°, but I wouldn't rule out the existence of such a device). It is
> > likely even more difficult to change at runtime than the other
> > transformations, as it changes the layout of the buffer. It is however
> > quite likely to be implemented in DMA engines, and thus at the stream
> > level.
> >
> > To summarize my opinion, I believe we need to consider all
> > transformations and the different places where they can occur (at the
> > camera level or the stream level), when they can be configured (before
> > starting the camera or at runtime), and how transformations at the
> > camera level and stream level would interact (for instance we may have
> > an offline ISP that performs rotation on the input side, the raw stream
> > would thus not support rotation, and all the processed streams would have
> > the same rotation value). We can then check that we have all cases
> > covered in the design, with clear interactions, and implement a subset
> > (for instance leaving the runtime control out to start with). Does this
> > make sense ?
> >
> >> Thanks! (and sorry for being awkward...)
> >
> > It would be the pot calling the kettle black if I blamed you for that
> > :-)
> >
> >>>>>>>> For the time being we are ignoring the possibility that some platforms
> >>>>>>>> might be able to apply different transformations to different streams
> >>>>>>>> (I believe there is in any case no mechanism for per-stream controls).
> >>>>>>
> >>>>>> Indeed. That's something we may add in the future, but let's try to
> >>>>>> avoid it if possible :-)
> >>>>>>
> >>>>>>>> Note also that raw streams will always have the orientation with which
> >>>>>>>> they come out of the camera, though again I don't believe we have a
> >>>>>>>> convenient way for a platform to indicate whether this includes the
> >>>>>>>> transform or not (perhaps a stream could indicate its transform
> >>>>>>>> relative to the requested transform?).
> >>>
> >>> How do folks feel about this? It seems like a useful feature to me.
> >>>
> >>>>>>>> We propose to represent the transforms as "int" controls, in fact
> >>>>>>>> being members of an enum with eight entries. Further, we suggest that
> >>>>>>>> the first four entries are "identity", "hflip", "vflip", "h+vflip",
> >>>>>>>> meaning the control's maximum value can indicate whether transforms
> >>>>>>>> that involve a transposition are excluded.
> >>>>>>
> >>>>>> An enum sounds good. How would you name the next four entries ? :-)
> >>>
> >>> So I think I'd have a class or enum class (or equivalent) using 3 bits
> >>> to represent a transform. For example, bit 0 = hflip, bit 1 = vflip,
> >>> bit 2 = transpose. By convention we might say that the transpose is
> >>> applied after the flips.
> >
> > Looks good to me.
> >
> >>> I'd also add some aliases for transforms with
> >>> familiar names. And we should define a * operator to compose
> >>> transforms. Oh, and an inverse operator too.
> >
> > operator∘() ? :-) It would be interesting if C++ allowed defining
> > custom operators.
> >
> >>> I always hate naming stuff, but here's an example. Maybe use MIRROR
> >>> instead of FLIP?
> >>>
> >>> IDENTITY = ROT0 = 0,
> >>> HFLIP = 1,
> >>> VFLIP = 2,
> >>> HVFLIP = ROT180 = 3,
> >>> TRANSPOSE = 4,
> >>> ROT270 = 5,
> >>> ROT90 = 6,
> >>> TRANSPOSE_ROT180 = 7
> >
> > enum class Transformation {
> >         Identity = 0,
> >         Rot0 = Identity,
> >         HFlip = 1,
> >         VFlip = 2,
> >         HVFlip = HFlip | VFlip,
> >         Transpose = 4,
> >         Rot270 = HFlip | Transpose,
> >         Rot90 = VFlip | Transpose,
> >         Rot180Transpose = HFlip | VFlip | Transpose
> > };
> >
> > constexpr Transformation operator*(Transformation lhs, Transformation rhs)
> > {
> >         ...
> > }
> >
> > constexpr Transformation operator-(Transformation t)
> > {
> >         ...
> > }
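
For the record, with this encoding (flips applied before the
transpose), both bodies turn out to be short. A sketch, untested, and
needing C++14 relaxed constexpr:

constexpr Transformation operator*(Transformation lhs, Transformation rhs)
{
        /*
         * lhs * rhs applies rhs first, then lhs. Moving lhs's flips past
         * rhs's transpose swaps their horizontal and vertical roles;
         * after that reordering the bit patterns combine by XOR.
         */
        int l = static_cast<int>(lhs);
        int r = static_cast<int>(rhs);

        if (r & static_cast<int>(Transformation::Transpose))
                l = (l & 4) | ((l & 1) << 1) | ((l & 2) >> 1);

        return static_cast<Transformation>(l ^ r);
}

constexpr Transformation operator-(Transformation t)
{
        /*
         * Flip-only transforms are their own inverse; a transform
         * containing a transpose is inverted by swapping its flip bits.
         */
        int i = static_cast<int>(t);

        if (i & static_cast<int>(Transformation::Transpose))
                i = (i & 4) | ((i & 1) << 1) | ((i & 2) >> 1);

        return static_cast<Transformation>(i);
}

As a quick sanity check, Rot90 * Rot90 yields HVFlip (a 180° rotation)
and Rot90 * Rot270 yields Identity.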
> >
> > I really like the idea of custom operators, as it's too easy to mix the
> > order of transformations and get things wrong (we need "enum class"
> > instead of "enum" to disallow usage of default arithmetic operators).
> > There's of course the question of whether a*b should perform a first and
> > then b, or the other way around. Mathematically speaking, a ∘ b applies
> > b first, then a. I foresee this being confusing to many though.
> >
> > It seems that the D4 (or D8, depending on the notation) group is also
> > commonly constructed from the identity, rot90 and hflip operations (see
> > https://en.wikipedia.org/wiki/Dihedral_group_of_order_8). We could base
> > the numerical values on that (Identity = 0, Rot90 = 1, HFlip = 2) and
> > define the other transformations accordingly. I'm not sure what the pros
> > and cons of this would be.
> >
> > (On a side note, it looks like
> > https://en.wikipedia.org/wiki/File:Dih4_cycle_graph.svg has the order of
> > the operations reversed, while
> > https://en.wikipedia.org/wiki/File:Dihedral_group4_example.png seems
> > correct.)
> >
> >>> You'll notice I have no natural name for the last entry! (any offers?)
> >
> > Not at the moment I'm afraid.
> >
> >>>>>>>> Naush:
> >>>>>>>>
> >>>>>>>> ... the pipeline handlers need to maintain a list of
> >>>>>>>> controls supported, like in include/libcamera/ipa/raspberrypi.h
> >>>>>>>> (RPiControls).  This then gets exported to the CameraData base class
> >>>>>>>> member controlInfo_.  A Request will not allow setting a
> >>>>>>>> libcamera::control that is not listed in CameraData::controlInfo_
> >>>>>>>
> >>>>>>> Jacopo:
> >>>>>>>
> >>>>>>> That's correct. It seems to me from this discussion that a ControlInfo
> >>>>>>> could be augmented with information about a control being supported
> >>>>>>> per-frame or just at configuration time (or else...)
> >>>>>>
> >>>>>> Seems we agree :-)
> >>>>>>
> >>>>>> Please also note that include/libcamera/ipa/raspberrypi.h lists controls
> >>>>>> that are supported by the IPA. If there are controls that are fully
> >>>>>> implemented on the pipeline handler side without involving the IPA (I'm
> >>>>>> not suggesting this is or is not the case here, it's a general remark),
> >>>>>> they can be added to the control info map by the pipeline handler.
> >>>>>>
> >>>>>>>> Notes on the Raspberry Pi implementation:
> >>>>>>>> Only transpose-less transforms will be allowed, and only before the
> >>>>>>
> >>>>>> What do you mean by transpose-less transforms ?
> >>>
> >>> So you can look at the list I made above. Another way to think of it
> >>> is that a transpose-less transform preserves input rows of pixels as
> >>> output rows of pixels. A transform with a transpose in it turns input
> >>> rows of pixels into output *columns*.
> >
> > I've got it now, thanks.
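
With the bit encoding proposed above it's also a trivial test: a
transform is transpose-less iff its transpose bit is clear, e.g.

bool transposeLess = !(static_cast<int>(t) &
                       static_cast<int>(Transformation::Transpose));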
> >
> >>>>> Anything that involves only a combination of horizontal and vertical
> >>>>> flips would be a transpose-less transform, e.g. rot180, rot180 +
> >>>>> mirror.  Any rot 90/270 would require a transpose operation, and we
> >>>>> cannot do that with the hardware block.
> >>>>>
> >>>>>>>> camera is started. We will support them by setting the corresponding
> >>>>>>>> control bits in the camera driver (so raw streams will include the
> >>>>>>>> transform).
> >>>>>>
> >>>>>> Does this mean configuring h/v flip at the camera sensor level ? I
> >>>>>> assume that will be the most usual case. I wonder if offline ISPs
> >>>>>> typically support h/v flip.
> >>>>>
> >>>>> The RPi ISP does support it.  However, we find it much easier
> >>>>> overall if the flips occur at the source.
> >>>>>
> >>>>>>>> This means the ALSC (Auto Lens Shading Correction) algorithm will
> >>>>>>>> have to know whether camera images are oriented differently from the
> >>>>>>>> calibration images.
> >>>>>>
> >>>>>> How do you think we should convey the orientation of the calibration
> >>>>>> data ?
> >>>>>
> >>>>> We could have a field in our tuning file specifying the orientation of
> >>>>> the calibration table - or always enforce the calibration must happen
> >>>>> at rot 0.  Then any orientation change can be passed into the IPA (via
> >>>>> controls?) so that the calibration table can be flipped appropriately.
> >>>>> Personally, I prefer to have all calibration data fixed at rot 0, but
> >>>>> David may have other opinions :)  However, all this is up to vendors
> >>>>> to decide for themselves I suppose.
> >>>>
> >>>> I think I agree with you here, it seems less confusing and error-prone
> >>>> to standardize the calibration rotation, but I may be missing something.
> >>>
> >>> Down at the Raspberry Pi level, I think it might be natural to include
> >>> the camera transform in our DeviceStatus metadata. This makes it
> >>> immediately available to any control algorithms that care.
> >>>
> >>> Doing lens shading calibration with the identity transform certainly
> >>> makes sense and reduces confusion. But you know someone somewhere will
> >>> get themselves in a muddle and discover that they've got their
> >>> calibration images the wrong way round.
> >>>
> >>> So we _will_ end up with a "calibration transform" field in ALSC
> >>> recording the orientation of those calibration images. However, with
> >>> those compose and inverse operations defined for transforms, it will
> >>> merely be a (slightly head-scratching) one-liner to deduce the right
> >>> transform for the output tables!
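
For reference, with the compose and inverse operators, that one-liner
would presumably be something like (sketch):

/*
 * Undo the transform the calibration images were captured with, then
 * apply the transform of the current camera images.
 */
Transform tableTransform = cameraTransform * -calibrationTransform;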

-- 
Regards,

Laurent Pinchart

