[libcamera-devel] libcamera transform control

Laurent Pinchart laurent.pinchart at ideasonboard.com
Fri Jul 24 01:10:03 CEST 2020


Hi David,

On Tue, Jul 21, 2020 at 10:19:30AM +0100, David Plowman wrote:
> Hi again
> 
> Replying again to my own discussion (I seem to be making a habit of this...)
> 
> On Mon, 20 Jul 2020 at 08:59, David Plowman wrote:
> >
> > Hi Laurent, Naush and everyone!
> >
> > Thanks for all the discussion while I was away!
> >
> > On Mon, 13 Jul 2020 at 23:27, Laurent Pinchart wrote:
> >> On Mon, Jul 13, 2020 at 09:16:19AM +0100, Naushir Patuck wrote:
> >>> Hi Laurent,
> >>>
> >>> David is away this week, so I'll reply with some thoughts and comments
> >>> in his absence.
> >>>
> >>> On Sun, 12 Jul 2020 at 00:28, Laurent Pinchart wrote:
> >>>> On Thu, Jul 09, 2020 at 09:13:56AM +0100, David Plowman wrote:
> >>>>> Replying to my own post to add comments from Naush and Jacopo!
> >>>>>
> >>>>> On Mon, 6 Jul 2020 at 14:32, David Plowman wrote:
> >>>>>>
> >>>>>> Hi everyone
> >>>>>>
> >>>>>> This email takes the previous discussion of image transformations and
> >>>>>> moves it into a thread of its own.
> >>>>>>
> >>>>>> The problem at hand is how to specify the 2D planar transformation to
> >>>>>> be applied to a camera image. These are well known to us as flips,
> >>>>>> rotations and so on (the usual eight members of dihedral group D4).
> >>>>>>
> >>>>>> Laurent's suggestion was to apply them as standard libcamera controls.
> >>>>>> Some platforms may be able to apply them on a per-frame basis; others
> >>>>>> will not (they will require the transform to be set before the camera
> >>>>>> is started - once this mechanism exists). (Is there any way to signal a
> >>>>>> platform's capabilities in this regard?)
> >>>>
> >>>> Not at the moment, but I think we should add that information to the
> >>>> ControlInfo class. It should be fairly straightforward, we can add a
> >>>> flags field, with a single field defined for now to tell whether the
> >>>> control can be modified per frame or has to be set before starting the
> >>>> camera.
> >>>
> >>> That would work.  However, I do struggle to see what would require an
> >>> application to change the orientation on a per-frame basis.  In
> >>> particular, if the change in orientation requires a transpose
> >>> operation, then the buffer sizing might also change due to alignment
> >>> constraints, and these would almost certainly require a set of buffer
> >>> re-allocations unless you pre-allocate for the largest possible size.
> >>> Of course, that is not to say that there may not be other controls
> >>> that can only be set at startup (I cannot think of any specific ones
> >>> right now).
> >>
> >> While we could in theory support a change of rotation at runtime
> >> (assuming the kernel drivers and APIs would let us do so), I don't think
> >> that would be particularly valuable. The main use case I could imagine
> >> would be flipping the image horizontally, a.k.a. mirroring, for video
> >> conferencing use cases where the user may want to enable or disable
> >> mirroring for the local display.  One could however argue that platforms
> >> used for video conferencing with a local display would likely have a GPU
> >> that would handle the composition, making mirroring easy to implement on
> >> the display side, but there could be other similar use cases on less
> >> powerful platforms. We could force a stop/start sequence in that case,
> >> but are there drawbacks in allowing this to be changed at runtime ?
> >
> > To me it seems a reasonable compromise to go with the standard Control
> > mechanism, but perhaps to add a flag indicating whether you have to
> > apply the control _before_ starting the camera, or whether you can
> > apply it after. As Naush says, it could get quite hairy trying to do
> > it in the sensor, or if you had an inline ISP that was doing it, and
> > figuring out which frame after the request would actually have the
> > change might be awkward.
> 
> I wanted to add a couple of further comments to this. Quite commonly,
> transforms may be implemented in the sensor. This means that whenever
> you set or change the transform, the pixel format of the raw stream
> may change too. In fact I've already seen this kind of problem - under
> some circumstances the Bayer order recorded in qcam's DNG files can be
> wrong.
> 
> So we need to consider the interaction of the transform control with
> the raw stream format. Specifically:
> 
> * If we change the transform at run time, are we prepared for the raw
> stream format to change spontaneously? Would requests have to contain
> metadata saying what the pixel format of the raw stream was for that
> capture?
> 
> * We've talked about passing an initial ControlList into either the
> configure or start method. Whichever approach one takes, you've got to
> be careful about exactly when the raw stream pixel format may or may
> not be correct.
> 
> All this keeps nudging me back to an initial thought that the
> transform should be a bona fide field in the stream configuration. You
> set it before calling configure, and it never changes. All these
> problems simply disappear.
> 
> Thoughts?

I agree it would make things simpler, at the cost of not supporting some
use cases that are currently considered uncommon, if not rare. It
would probably be a field in the CameraConfiguration instead of the
StreamConfiguration though, as the transformation is mostly applied at
the sensor level.
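
As a very rough sketch (the field name and exact semantics are of
course up for discussion, and the Transformation type is the one
drafted further down in this mail), it could be as simple as

class CameraConfiguration
{
public:
	/* ... existing members ... */

	/*
	 * Transformation to apply to the camera image, fixed at
	 * configure() time. Defaults to the identity.
	 */
	Transformation transform = Transformation::Identity;
};

with validate() resetting the field to Identity, and flagging the
configuration as adjusted, when the platform can't honour the
requested value.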

I wonder if we should start with this, and later add controls at the
stream level (which we don't have at the moment) to apply per-stream
transformations, if we encounter hardware that supports this.

Let's also consider transposition (as a means to implement 90° and 270°
rotations) in this discussion, even if we don't implement support for it
now. Transposition, when supported at the hardware level, is most
likely implemented in the ISP (I haven't seen a sensor that can rotate
by 90°, but I wouldn't rule the existence of such a device out). It is
likely even more difficult to change at runtime than the other
transformations, as it changes the layout of the buffer. It is however
quite likely to be implemented in DMA engines, and thus at the stream
level.

To summarize my opinion, I believe we need to consider all
transformations and the different places where they can occur (at the
camera level or the stream level), when they can be configured (before
starting the camera or at runtime), and how transformations at the
camera level and stream level would interact (for instance we may have
an offline ISP that performs rotation on the input side, the raw stream
would thus not support rotation, and all the processed streams would have
the same rotation value). We can then check that we have all cases
covered in the design, with clear interactions, and implement a subset
(for instance leaving the runtime control out to start with). Does this
make sense ?

> Thanks! (and sorry for being awkward...)

It would be the pot calling the kettle black if I blamed you for that
:-)

> >>>>>> For the time being we are ignoring the possibility that some platforms
> >>>>>> might be able to apply different transformations to different streams
> >>>>>> (I believe there is in any case no mechanism for per-stream controls).
> >>>>
> >>>> Indeed. That's something we may add in the future, but let's try to
> >>>> avoid it if possible :-)
> >>>>
> >>>>>> Note also that raw streams will always have the orientation with which
> >>>>>> they come out of the camera, though again I don't believe we have a
> >>>>>> convenient way for a platform to indicate whether this includes the
> >>>>>> transform or not  (perhaps a stream could indicate its transform
> >>>>>> relative to the requested transform?).
> >
> > How do folks feel about this? It seems like a useful feature to me.
> >
> >>>>>> We propose to represent the transforms as "int" controls, in fact
> >>>>>> being members of an enum with eight entries. Further, we suggest that
> >>>>>> the first four entries are "identity", "hflip", "vflip", "h+vflip",
> >>>>>> meaning the control's maximum value can indicate whether transforms
> >>>>>> that involve a transposition are excluded.
> >>>>
> >>>> An enum sounds good. How would you name the next four entries ? :-)
> >
> > So I think I'd have a class or enum class (or equivalent) using 3 bits
> > to represent a transform. For example, bit 0 = hflip, bit 1 = vflip,
> > bit 2 = transpose. By convention we might say that the transpose is
> > applied after the flips.

Looks good to me.

> > I'd also add some aliases for transforms with
> > familiar names. And we should define a * operator to compose
> > transforms. Oh, and an inverse operator too.

operator∘() ? :-) It would be interesting if C++ allowed defining
custom operators.

> > I always hate naming stuff, but here's an example. Maybe use MIRROR
> > instead of FLIP?
> >
> > IDENTITY = ROT0 = 0,
> > HFLIP = 1,
> > VFLIP = 2,
> > HVFLIP = ROT180 = 3,
> > TRANSPOSE = 4,
> > ROT270 = 5,
> > ROT90 = 6,
> > TRANSPOSE_ROT180 = 7

enum class Transformation {
	Identity = 0,
	Rot0 = Identity,
	HFlip = 1,
	VFlip = 2,
	HVFlip = HFlip | VFlip,
	Transpose = 4,
	Rot270 = HFlip | Transpose,
	Rot90 = VFlip | Transpose,
	Rot180Transpose = HFlip | VFlip | Transpose
};

constexpr Transformation operator*(Transformation lhs, Transformation rhs)
{
	...
}

constexpr Transformation operator-(Transformation t)
{
	...
}

I really like the idea of custom operators, as it's too easy to mix up
the order of transformations and get things wrong (we need "enum class"
instead of "enum" to disallow usage of the default arithmetic operators).
There's of course the question of whether a*b should perform a first and
then b, or the other way around. Mathematically speaking, a ∘ b applies
b first, then a. I foresee this being confusing to many though.
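
To make this concrete, here's one possible implementation of the
elided bodies, assuming the bit layout above (bit 0 = hflip, bit 1 =
vflip, bit 2 = transpose, with the transpose applied after the flips)
and the mathematical convention that a * b applies b first:

constexpr Transformation operator*(Transformation a, Transformation b)
{
	unsigned int ua = static_cast<unsigned int>(a);
	unsigned int ub = static_cast<unsigned int>(b);

	/*
	 * If b transposes, a's flips act on swapped axes: exchange a's
	 * hflip and vflip bits before combining.
	 */
	if (ub & 4)
		ua = (ua & 4) | ((ua & 1) << 1) | ((ua & 2) >> 1);

	/* Flips and transposes along fixed axes compose by XOR. */
	return static_cast<Transformation>(ua ^ ub);
}

constexpr Transformation operator-(Transformation t)
{
	unsigned int u = static_cast<unsigned int>(t);

	/*
	 * Every transpose-less transformation and every reflection is
	 * its own inverse; only Rot90 and Rot270 aren't, and swapping
	 * the flip bits exchanges them.
	 */
	if (u & 4)
		u = 4 | ((u & 1) << 1) | ((u & 2) >> 1);

	return static_cast<Transformation>(u);
}

static_assert(Transformation::Rot90 * Transformation::Rot90 ==
	      Transformation::HVFlip, "rot90 twice must equal rot180");
static_assert(-Transformation::Rot90 == Transformation::Rot270,
	      "rot90 and rot270 must be inverses of each other");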

It seems that the D4 (or D8, depending on the notation) group is also
commonly constructed from the identity, rot90 and hflip operations (see
https://en.wikipedia.org/wiki/Dihedral_group_of_order_8). We could base
the numerical values on that (Identity = 0, Rot90 = 1, HFlip = 2) and
define the other transformations accordingly. I'm not sure what the pros
and cons of this would be.
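
As a quick sanity check, and independently of which numbering we pick,
the operators sketched above let one verify that rot90 and hflip do
generate all eight elements:

#include <cassert>

void checkGenerators()
{
	bool seen[8] = {};

	/*
	 * Enumerate hflip^m * rot90^k (rot90 applied first, k times,
	 * then hflip m times) and check that all eight elements of D4
	 * are produced.
	 */
	for (unsigned int m = 0; m < 2; ++m) {
		for (unsigned int k = 0; k < 4; ++k) {
			Transformation t = Transformation::Identity;

			for (unsigned int i = 0; i < k; ++i)
				t = Transformation::Rot90 * t;
			if (m)
				t = Transformation::HFlip * t;

			seen[static_cast<unsigned int>(t)] = true;
		}
	}

	for (bool s : seen)
		assert(s);
}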

(On a side note, it looks like
https://en.wikipedia.org/wiki/File:Dih4_cycle_graph.svg has the order of
the operations reversed, while
https://en.wikipedia.org/wiki/File:Dihedral_group4_example.png seems
correct.)

> > You'll notice I have no natural name for the last entry! (any offers?)

Not at the moment I'm afraid.

> >>>>>> Naush:
> >>>>>>
> >>>>>> ... the pipeline handlers need to maintain a list of
> >>>>>> controls supported, like in include/libcamera/ipa/raspberrypi.h
> >>>>>> (RPiControls).  This then gets exported to the CameraData base class
> >>>>>> member controlInfo_.  A Request will not allow setting a
> >>>>>> libcamera::control that is not listed in CameraData::controlInfo_
> >>>>>
> >>>>> Jacopo:
> >>>>>
> >>>>> That's correct, it seems to me from this discussion, a ControlInfo
> >>>>> could be augmented with information about a control being supported
> >>>>> per-frame or just at configuration time (or else...)
> >>>>
> >>>> Seems we agree :-)
> >>>>
> >>>> Please also note that include/libcamera/ipa/raspberrypi.h lists controls
> >>>> that are supported by the IPA. If there are controls that are fully
> >>>> implemented on the pipeline handler side without involving the IPA (I'm
> >>>> not suggesting this is or is not the case here, it's a general remark),
> >>>> they can be added to the control info map by the pipeline handler.
> >>>>
> >>>>>> Notes on the Raspberry Pi implementation:
> >>>>>> Only transpose-less transforms will be allowed, and only before the
> >>>>
> >>>> What do you mean by transpose-less transforms ?
> >
> > So you can look at the list I made above. Another way to think of it
> > is that a transpose-less transform preserves input rows of pixels as
> > output rows of pixels. A transform with a transpose in it turns input
> > rows of pixels into output *columns*.

I've got it now, thanks.

> >>> Anything that involves only a combination of horizontal and vertical
> >>> flips would be a transpose-less transform, e.g. rot180, rot180 +
> >>> mirror.  Any rot 90/270 would require a transpose operation, and we
> >>> cannot do that with the hardware block.
> >>>
> >>>>>> camera is started. We will support them by setting the corresponding
> >>>>>> control bits in the camera driver (so raw streams will include the
> >>>>>> transform).
> >>>>
> >>>> Does this mean configuring h/v flip at the camera sensor level ? I
> >>>> assume that will be the most usual case. I wonder if offline ISPs
> >>>> typically support h/v flip.
> >>>
> >>> The RPi ISP does support it.  However, we do find it much, much easier
> >>> overall if the flips occur at the source.
> >>>
> >>>>>> This means the ALSC (Auto Lens Shading Correction) algorithm will
> >>>>>> have to know whether camera images are oriented differently from the
> >>>>>> calibration images.
> >>>>
> >>>> How do you think we should convey the orientation of the calibration
> >>>> data ?
> >>>
> >>> We could have a field in our tuning file specifying the orientation of
> >>> the calibration table - or always enforce that calibration must happen
> >>> at rot 0.  Then any orientation change can be passed into the IPA (via
> >>> controls?) so that the calibration table can be flipped appropriately.
> >>> Personally, I prefer to have all calibration data fixed at rot 0, but
> >>> David may have other opinions :)  However, all this is up to vendors
> >>> to decide for themselves I suppose.
> >>
> >> I think I agree with you here, it seems less confusing and error-prone
> >> to standardize the calibration rotation, but I may be missing something.
> >
> > Down at the Raspberry Pi level, I think it might be natural to include
> > the camera transform in our DeviceStatus metadata. This makes it
> > immediately available to any control algorithms that care.
> >
> > Doing lens shading calibration with the identity transform certainly
> > makes sense and reduces confusion. But you know someone somewhere will
> > get themselves in a muddle and discover that they've got their
> > calibration images the wrong way round.
> >
> > So we _will_ end up with a "calibration transform" field in ALSC
> > recording the orientation of those calibration images. However, with
> > those compose and inverse operations defined for transforms, it will
> > merely be a (slightly head-scratching) one-liner to deduce the right
> > transform for the output tables!

-- 
Regards,

Laurent Pinchart

