[libcamera-devel] [PATCH 0/1] Proposal of mapping between camera configurations and requested configurations

Tomasz Figa tfiga at chromium.org
Mon Sep 7 17:33:34 CEST 2020


On Fri, Sep 4, 2020 at 2:53 PM Jacopo Mondi <jacopo at jmondi.org> wrote:
>
> Hi Tomasz,
>
> On Thu, Sep 03, 2020 at 02:36:47AM +0200, Tomasz Figa wrote:
> > Hi Jacopo,
> >
> > On Tue, Sep 1, 2020 at 6:05 PM Jacopo Mondi <jacopo at jmondi.org> wrote:
> > >
> > > Hi Hiro,
> > >    first of all I'm very sorry for the unacceptable delay in giving
> > > you a reply.
> > >
> > > If that's of any consolation we have not ignored your email, but it
> > > has gone through several internal discussions, as it came at the
> > > time where the JPEG support was being merged and the two things
> > > collided a bit. Add a small delay due to leaves, and here you have a
> > > month of delay. Again, we're really sorry for this.
> > >
> > > > On Thu, Aug 06, 2020 at 03:17:05PM +0900, Hirokazu Honda wrote:
> > > > This is a proposal about how to map camera configurations and
> > > > requested configurations in Android Camera HAL adaptation layer.
> > > > Please also see the sample code in the following patch.
> > > >
> > > > # Software Stream Processing in libcamera
> > > >
> > > > _hiroh at chromium.org / Draft: 2020-08-06_
> > > >
> > > >
> > >
> > > As an initial and un-related note looking at the patch, I can see you
> > > are following the ChromeOS coding style. Please note that libcamera
> > > has its own code style, which you can find documented at
> > >
> > > - https://www.libcamera.org/coding-style.html#coding-style-guidelines
> > >
> > > And we have a style checker, which can assist with this. The best way to
> > > use the style checker is to install it as a git-hook.
> > >
> > > I understand that this is an RFC, but we will need this style to be
> > > followed to be able to integrate any future patches.
> > >
> > > >
> > > > # Objective
> > > >
> > > > Perform frame processing in libcamera to achieve requested stream
> > > > configurations that are not supported natively by the camera
> > > > hardware, but required by the Android Camera HAL interface.
> > > >
> > >
> > > As you can see in the camera_device.cpp file we have tried to list the
> > > resolution and image formats that the Android Camera3 specification
> > > lists as mandatory or suggested.
> > >
> > > Do you have a list of additional requirements to add ?
> > > Are there ChromeOS specific requirements ?
> > > Or is this meant to fulfill the above stated requirements on
> > > platforms that cannot satisfy them ?
> > >
> >
> > There can be per-device resolutions that should be supported due to
> > product requirements. Our current HAL implementations use
> > configuration files which define the required configurations.
> >
> > That said, I think it's an independent problem, which we can likely
> > ignore for now, and I believe what Hiro had in mind was the latter -
> > platforms that cannot satisfy them. This also includes the cases you
> > mentioned below, when a number of streams greater than the number of
> > native hardware streams is requested.
> >
> > As usual, the Android Camera2 API documentation is the authoritative
> > source of information here:
> > https://developer.android.com/reference/android/hardware/camera2/CameraDevice.html#createCaptureSession(android.hardware.camera2.params.SessionConfiguration)
> >
> > The tables lower on the page include required stream combinations for
> > various capability levels.
> >
>
> Those are the requirements I think should be encoded.
> So far, as a reference for the supported formats and resolutions I
> used as reference the documentation of the scaler.availableStreamConfigurations
> metadata tag
>

Yeah, the various pieces of the documentation are scattered across
many places sadly. Some bits are quite difficult to discover and often
show up only when trying to get CTS to pass...

> > > >
> > > > # Background
> > > >
> > > >
> > > > ### Libcamera
> > > >
> > > > In addition to its native API, libcamera[^1] provides a number of
> > > > camera APIs, for example, V4L2 Webcam API and Android Camera HAL3.
> > > > The platform specific implementations are wrapped in libcamera core
> > > > and a caller of libcamera doesn’t have to take care the platform.
> > > >
> > > >
> > > > ### Android Camera HAL
> > > >
> > > > Chrome OS camera stack uses Android Camera HAL[^2] interface.
> > > > Libcamera provides Android Camera HAL with an adaptation layer[^3]
> > > > between libcamera core part and Android HAL, which is called
> > > > Android HAL adaptation layer in this document.
> > > >
> > > > To present a uniform set of capabilities to the API users, Android
> > > > Camera HAL API[^4] allows caller to request stream configurations
> > > > that are beyond the device capabilities. For example, while a
> > > > camera device is able to produce a single stream, a HAL caller
> > > > requests three possibly different resolution streams (PRIV, YUV,
> > > > JPEG). However, libcamera core implementation produces
> > > > camera-capable streams. Therefore, we have to create three streams
> > > > from the single stream produced by libcamera.
> > > >
> > > > Requests beyond the device capability is supported only in Android
> > > > HAL at this moment. I describe the design in this document that the
> > > > stream processing is performed in Android HAL adaptation layer.
> > > >
> > > >
> > > > # Overview
> > > >
> > > >
> > > > ## Current implementation
> > > >
> > > > The requested stream configuration is given by
> > > > _camera3_device_t->ops->configure_streams()_ in Android Camera HAL.
> > > > This delegates CameraDevice::configureStreams()[^5] in libcamera.
> > > > The current implementation attempts all the given configurations
> > > > and succeeds if and only if the camera device can produce them
> > > > without any adjustments.
> > > >
> > > >
> > > > ### libcamera::CameraConfiguration
> > > >
> > > > It is CameraConfiguration[^6] that judges whether adjustments are
> > > > required, or even requested configurations are infeasible.
> > > >
> > > > The procedure of configuration is that CameraDevice
> > > >
> > > >
> > > >
> > > > 1. Adds every configuration by
> > > >    CameraConfiguration::addConfiguration().
> > > > 2. Assorts the added configurations by
> > > >    CameraConfiguration::validate().
> > > >
> > > > CameraConfiguration, especially for validate(), is implemented per
> > > > pipeline. For instance, the CameraConfiguration implementation for
> > > > IPU3 is IPU3CameraConfiguration[^7].
> > > >
> > > > validate() returns one of the below,
> > > >
> > > >
> > > >
> > > > *   Valid
> > > >     *   A camera can produce streams with requested
> > > >         configurations.
> > > > *   Adjusted
> > > >     *   A camera cannot produce streams with requested
> > > >         configurations as-is, but can produce streams with
> > > >         different pixel formats or resolutions.
> > > > *   Invalid
> > > >     *   A camera cannot produce streams with either requested
> > > >         configurations or different pixel formats and
> > > >         resolutions. For instance, this is returned when the
> > > >         larger resolution is requested than the maximum
> > > >         supported one?
> > > >
> > > > What we need to resolve is, when Adjusted is returned, to map
> > > > adjusted camera streams to requested camera streams and required
> > > > processing.
> > > >
> > > >
> > > > ## Stream processing
> > > >
> > > > The processing to be thought of are followings.
> > > >
> > > >
> > > >
> > > > *   Down-scaling
> > > >     *   We don’t perform up-scaling because it affects stream
> > > >         qualities
> > > >     *   Down-scaling is allowed for the same ratio to avoid
> > > >         producing distorted frames. For instance, scaling from
> > > >         1280x720 (16:9) to 480x360 (4:3) is not allowed.
> > > > *   Cropping
> > > >     *   Cropping is executed only to change the frame ratio.
> > > >         Thus it must be done after down-scaling if required.
> > > >         For example, to convert 1280x720 to 480x360, first
> > > >         down-scale to 640x360 and then crop to 480x360.
> > > > *   Format conversion
> > > >     *   Pixel format conversion
> > > >     *   JPEG encoding
> > > >
> > > >
> > > > # Proposal
> > > >
> > > > Basically we only need to consider a mapping algorithm after
> > > > validate(). However, to obtain less processing and better stream
> > > > qualities, we should reorder given configurations within
> > > > validate().
> > >
> > > >
> > >
> > > The way the HAL layer works, and I agree something has changed since
> > > the recent merge of the JPEG support, is slightly more complex, and
> > > boils down to the following steps
> > >
> > > 1) Build the list of supported configuration
> > >
> > > When a CameraDevice is initialized, a list of supported stream
> > > configuration is built, in order to be able to report to Android
> > > what it could ask. See CameraDevice::initializeStreamConfigurations().
> > >
> > > We currently report the libcamera::Camera supported formats and
> > > size, plus additional JPEG streams which are produced in the HAL.
> > > This creates the first distinction between HAL-only-streams and
> > > libcamera-streams, that you correctly identified in your summary.
> > >
> > > Here, as we do (naively at the moment) for JPEG, you should inspect
> > > the libcamera-streams and pass them through your code that infer
> > > what kind of HAL-only-streams can be produced from the available
> > > libcamera ones. If I'm not mistaken Android only asks for stream
> > > combinations reported through the
> > > ANDROID_SCALER_AVAILABLE_STREAM_CONFIGURATIONS_OUTPUT metadata, and
> > > if you do not augment that list at initialization time, you won't
> > > ever be asked for non-native streams later.
> >
> > I'm not entirely sure about this, because there are mandatory stream
> > configurations defined for the Camera2 API. If something is mandatory,
> > I suspect there is no need to query for the availability of it.
>
> Well, we need to query the libcamera::Camera to know which of the
> required streams could be natively produced and which ones instead has
> to be produced in the HAL layer.
>

I was referring to the Android side, i.e. even if something is not
reported in ANDROID_SCALER_AVAILABLE_STREAM_CONFIGURATIONS_OUTPUT, the
client could still possibly request it if the spec defines it as
mandatory.

> >
> > That said, I'd assume that CTS verifies whether all the required
> > configurations are both reported and supported, so perhaps there isn't
> > much to worry about here.
> >
> > >
> > > 2) Camera configuration
> > >
> > > That's the part you focused on, and a good part of what you wrote
> > > could indeed be used to move forward.
> > >
> > > The problem here can be summarized as: 'for each stream android
> > > requested, the ones that cannot be natively produced by the
> > > libcamera::Camera shall be mapped on the closest possible native
> > > stream' (and here we could apply your implementation that identifies
> > > the 'best matching' stream)
> > >
> > > Unfortunately the problem breaks down into several others:
> > >
> > > 1) How to identify if a stream is a native or an HAL only one ?
> > > Currently we get away with a trivial "if (!JPEG)" as all the non-JPEG
> > > streams are native ones. This should be made smarter.
> > >
> > > 2) How to best map HAL-streams to libcamera-streams. Assume to
> > > receive a request for two YUV streams in 1080p and 720p resolutions.
> > > The libcamera::Camera claims to be able to support both, so we can
> > > simply go and ask for those two streams. Then we receive a request
> > > for the same streams plus a full-size JPEG one. What we have to do is
> > > ask for the full-size YUV stream and use it to produce JPEG, and one
> > > 1080p YUV to produce both the YUV streams in 1080p and 720p
> > > resolutions. In the case we'll then have to crop one YUV stream, and
> > > dedicate a full-size YUV one to JPEG. Alternatively we can produce
> > > 1080p from the same full-size YUV used to produce JPEG, and ask for a
> > > 720p stream to the camera.
> > >
> >
> > Right, there are multiple possible choices. I've discussed this and
> > concluded that there might be some help needed from the pipeline
> > handler to tell the client which configuration is better from the
> > hardware point of view.
> >
>
> I don't think the HAL could do all by itself, I agree. The number of
> combination to test would be large and there's currently no way to get
> a taste of what would be the better combination for the HW.
>
> Have you already thought how this can be improved ?
>

While it would be really nice to have some smart heuristics for this,
I think it might be difficult to have something automatic that works
for everyone, because there could also be business decisions involved
in the process. For example, scaling at component X could result in a
sharper but possibly noisier image, while scaling at component Y could
give a less noisy, but also less sharp, one. The decision is certainly
a matter of someone's preferences.

I know we're trying to run away from configuration files as much as
possible, but I think this might be one of the places we need some way
to express integration-specific preferences.

>
> > > Now, Android specifies some format/size requirements in the Camera3
> > > specification, I assume ChromeOS has maybe others. As we tried to
> > > record the Camera3 requirements and satisfy them in the code, I
> > > think the additional streams that are required should be somehow
> > > listed first, in order to be able to create only the -required-
> > > additional streams.
> > >
> > > For an example, have a look at CameraDevice::camera3Resolutions and
> > > CameraDevice::camera3FormatsMap, these encode the Camera3
> > > specification requirements.
> > >
> > > Once the additional requirements have been encoded, I would then
> > > proceed to divide them in 3 categories (there might very well be
> > > others):
> >
> > I believe we don't have any additional requirements for now.
> >
> > >
> > >   1) Format conversions: Convert to one pixel format to the other. What
> > >      happens today with JPEG more or less. We have an Encode interface for
> > >      that purpose and I guess format converter should be implemented
> > >      according to it, but that has to be discussed.
> > >
> >
> > One thing that is also missing today is MJPEG decoding. This is also
> > required to fulfill the stream configuration requirements, since it's
> > assumed that the formats are displayable and explicit YUV streams are
> > included as well.
> >
>
> s/Encoder/Transcoder ?
>
> Decoding and encoding fall in the same category to me, but I agree
> this represents a nice use case to start implementing something. I
> assume the USB HAL has already some of that in place, right ?
>

Right, the handling from the HAL3 API point of view is implemented
there, but the decoding itself is implemented in Chromium and there is
an IPC-based API exposed for the HALs to use. The layer in Chromium
supports multiple backends (software, V4L2, VAAPI) and also has
clients other than the camera.

> > >   2) Down-scale/crop: Assuming it happens in the HAL using maybe some
> > >      external components, down-scaling/cropping produce additional
> > >      resolutions from the list of natively supported ones. Given a
> > >      powerful enough implementation we could produce ANY format <= a given
> > >      native format, but that's not what we want I guess. We shall
> > >      establish a list of additional resolutions we want to report to the
> > >      framework layer, and find out how to produce them from the native
> > >      streams.
> >
> > Given the above, we should be able to stick to the resolutions we have
> > already supported in the adaptation layer.
> >
>
> This then boils down again to improve how we identify what could be
> natively produced by the camera and has to be produced in the HAL
> using one of the native streams.
>
> I think Hiro's proposal addresses the second part (streams
> identification) but the first one has to be taken into account,
> probably in first place or at least in parallel ?
>

I understood his proposal as that we ask the pipeline handler to
adjust the requested configuration to the best supported by the
hardware and then the HAL decides what to do further with them.

The proposal mentions that the pipeline handler specifically has to
sort the requested streams by aspect ratios and resolutions, to
provide as many of the requested aspect ratios as possible and highest
requested resolutions to avoid upscaling. However I wonder if it's not
oversimplified. Let's consider the example below, on RKISP1.

1) PRIV 1280x640 (preview)
2) YUV 1920x1080 (record)
3) JPEG 1920x1440 (still, full sensor resolution)

Also note the Android cropping requirements:

"
* In all cases, the stream crop must be centered within the full crop region,
* and each stream is only either cropped horizontally or vertical relative to
* the full crop region, never both.
"

(https://android.googlesource.com/platform/hardware/libhardware/+/master/include/hardware/camera3.h#988)
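
To make the quoted rule concrete, here is a minimal sketch of how the centered crop for one stream could be computed from the full crop region. The function name and the tuple layout are made up for illustration; this is not code from libcamera or from camera3.h, and it assumes the crop is the largest centered rectangle with the stream's aspect ratio.

```python
from fractions import Fraction


def centered_stream_crop(full_w, full_h, stream_w, stream_h):
    """Return (left, top, width, height) of the stream crop.

    Hypothetical helper: the crop is the largest centered rectangle inside
    the full crop region that has the stream's aspect ratio, so only one
    axis is ever cropped.
    """
    full_ratio = Fraction(full_w, full_h)
    stream_ratio = Fraction(stream_w, stream_h)

    if stream_ratio >= full_ratio:
        # Stream is at least as wide as the region: keep the full width
        # and crop vertically.
        crop_w = full_w
        crop_h = int(full_w / stream_ratio)
    else:
        # Stream is narrower than the region: keep the full height and
        # crop horizontally.
        crop_w = int(full_h * stream_ratio)
        crop_h = full_h

    left = (full_w - crop_w) // 2
    top = (full_h - crop_h) // 2
    return left, top, crop_w, crop_h


# The three streams from the example, against a 1920x1440 full crop region.
for size in [(1280, 640), (1920, 1080), (1920, 1440)]:
    print(size, centered_stream_crop(1920, 1440, *size))
```

For the example streams this crops only vertically: the 1920x1080 stream maps to a 1920x1080 window starting at row 180, and the 1280x640 stream to a 1920x960 window starting at row 240, which then still needs down-scaling.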

The ISP can produce two streams. If we apply the sorting by resolution
and ratio, we get:

1) 1920x1440
2) 1920x1080
3) 1280x640

If we select the first two, we don't end up producing the most
efficient setup, because we need to scale 1920x1080->1280x720 in
software, which wouldn't be necessary if we selected 1920x1440
(croppable to 1920x1080) and 1280x640.

How about something like this:

1) Sort the horizontal resolution.

1920x1080
1920x1440
1280x640

2) Sort the vertical resolution.

1920x1440
1920x1080
1280x640

3) Discard entries with the same horizontal resolution, but smaller
vertical resolutions, until the number of streams is small enough to
be supported by the hardware.

1920x1440
1280x640

For RKISP1 we would be fine here, but if we have even more constrained
hardware, like a UVC camera, we would have to go even further.

4) If we still have more streams than we can support, sort them by
their aspect ratio and resolution and eliminate all except the highest
resolution of each aspect ratio.

1920x1440
1280x640

5) Sort by resolutions again and fold the lowest resolution streams
into higher resolution streams by expanding their FoV to cover the
sensor area required by both streams. Repeat until the number of
streams is low enough.

1920x1440

Of course the above prefers scaling higher resolution images in the
hardware, which could already be a business decision rather than a
universal choice. The selection may also depend on the availability of
additional hardware, like a V4L2 mem2mem image processor to do the
scaling.

The above also lacks handling of any platform-specific constraints,
such as min/max scaling ratio, resolution limits of hardware streams,
etc., which is where it needs to rely on the pipeline handler.
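
As a rough illustration, steps 1) to 5) could be sketched as follows. All names here are hypothetical and nothing is taken from the libcamera code base. The sketch makes a few assumptions that the text above leaves open: step 3 drops the smallest candidate first, step 5 folds the lowest resolution stream into the next larger one, the sensor size is passed in explicitly, and the folded stream keeps the higher pixel density of the two so that no upscaling is needed later. Platform constraints are ignored entirely.

```python
from fractions import Fraction
from math import ceil


def fov(sensor, size):
    """Centered sensor crop (width, height) with the aspect ratio of size."""
    sensor_w, sensor_h = sensor
    ratio = Fraction(size[0], size[1])
    if ratio >= Fraction(sensor_w, sensor_h):
        return Fraction(sensor_w), sensor_w / ratio  # full width
    return sensor_h * ratio, Fraction(sensor_h)      # full height


def fold(sensor, a, b):
    """Merge two streams into one that covers both fields of view."""
    fov_a, fov_b = fov(sensor, a), fov(sensor, b)
    # Output pixels per sensor pixel; keep the higher one (assumption).
    density = max(a[0] / fov_a[0], b[0] / fov_b[0])
    width = max(fov_a[0], fov_b[0]) * density
    height = max(fov_a[1], fov_b[1]) * density
    return ceil(width), ceil(height)


def reduce_streams(requested, max_streams, sensor):
    """Reduce requested (width, height) sizes to at most max_streams."""
    if max_streams < 1:
        raise ValueError("max_streams must be at least 1")

    # Steps 1 and 2: sort by width, then by height, both descending.
    streams = sorted(set(requested), key=lambda s: (s[0], s[1]), reverse=True)

    # Step 3: among entries of equal width, drop the shorter ones,
    # smallest first, until the hardware limit is met.
    while len(streams) > max_streams:
        shorter = [i for i in range(1, len(streams))
                   if streams[i][0] == streams[i - 1][0]]
        if not shorter:
            break
        del streams[shorter[-1]]

    # Step 4: keep only the highest resolution of each aspect ratio.
    if len(streams) > max_streams:
        best = {}
        for s in streams:
            ratio = Fraction(s[0], s[1])
            if ratio not in best or s[0] * s[1] > best[ratio][0] * best[ratio][1]:
                best[ratio] = s
        streams = sorted(best.values(), key=lambda s: (s[0], s[1]), reverse=True)

    # Step 5: fold the lowest resolution stream into the next larger one.
    while len(streams) > max_streams:
        streams.sort(key=lambda s: s[0] * s[1], reverse=True)
        lowest = streams.pop()
        streams[-1] = fold(sensor, streams[-1], lowest)

    return streams


requested = [(1280, 640), (1920, 1080), (1920, 1440)]
print(reduce_streams(requested, 2, (1920, 1440)))  # two hardware streams
print(reduce_streams(requested, 1, (1920, 1440)))  # single-stream camera
```

With two hardware streams this yields 1920x1440 and 1280x640, and with a single stream only 1920x1440, matching the lists written out in the steps.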

> > >
> > >    3) Image transformations A bit a lateral issue, but I assume some
> > >       'transformations' can be performed by HAL only components. This
> > >       mostly depends on handling streams with some specific metadata
> > >       associated, which needs to be handled in the HAL. The most trivial
> > >       example is rotation. If the libcamera::Camera is for whatever reason
> > >       unable to rotate the images, they have to be software rotated in the
> > >       HAL. This won't require any stream mapping, but rather inspecting
> > >       metadata and pass the native streams through an additional processing
> > >       layer.
> >
> > Right. I honestly hope we won't need to do software rotation on any
> > reasonable hardware platform, but AFAIK we still have some in Chrome
> > OS for which we do, in some specific cases, like a tablet with the
> > camera in landscape orientation, but the device in portrait
> > orientation.
> >
>
> I hope it's a corner case as well, but who knows, Android runs on a
> great variety of platforms nowadays, some of them might not be that
> 'reasonable' ? I agree this is a corner case at the moment though
>

Sadly, it's quite the opposite and we need to support it as well as we
can. Ideally GLES or a simple V4L2 mem2mem device could be used to
perform the rotation.

> > >
> > > 3) Buffer allocation/handling:
> > >    When performing any conversions between a HAL stream and a libcamera
> > >    stream we may need to allocate an intermediate buffer to provide storage
> > >    for processing the frame in libcamera, with the conversion entity
> > >    reading from the libcamera buffer and writing into the android buffer.
> > >    This is likely possible with the existing FrameBufferAllocator classes,
> > >    but may have extra requirements.
> >
> > I suppose we could have 3 types of buffers here:
> > 1) buffers written by hardware driven by libcamera
> >  - without any software processing these would be directly provided by
> > Android and imported to libcamera,
> >  - with processing, I assume libcamera would have to allocate its own,
> > but I guess that would just end up being V4L2 MMAP buffers?
>
> These days we're looking at this part with the idea of exploiting the
> FrameBufferAllocator abstraction we also provide to applications.
>
> This would end up in
> 1) Allocating buffers in the video devices (the pipeline handler
> decides which one)
> 2) Exporting them as dmabuf file descriptor
> 3) Re-importing them in video devices at Request processing time.
>
> > 2) buffers between processing steps - if software only, an arbitrary
> > malloc buffer could be used.
>
> I don't see this distinct from the previous point. Whenever we need an
> intermediate buffer that has to be processed (and the result of the
> processing written to the Android provided buffer) it need to be
> allocated in libcamera.
>
> Or do you mean buffers for additional processing steps, in example a
> scratch buffer used during an encoding procedure ? In that case I
> think the transcoder implementation would deal with that as they
> prefer, in example an HW accelerated component might need to allocate
> buffers accessible by the CPU and the accelerator, and in this case it
> will provide buffers to libcamera to import as well it will allocate
> intermediate buffers if it requires any.

Let's say we do the following:

camera output --1--> software scale --2--> JPEG encode --3--> client

The buffer at 1) would be libcamera-allocated DMAble memory and the one
at 3) an Android-allocated DMA-buf. The one at 2) needs to be
libcamera-allocated, but with no expectations about DMAbility, and it
likely shouldn't be a DMA-buf, because on ARM that would currently mean
an uncached mapping, significantly affecting the performance of the
software processing.
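
A small sketch of that classification, purely for illustration. The enum and function names are invented, and the rule is a generalization of the single example above rather than anything libcamera implements: the last buffer always comes from Android, a buffer touched by a hardware stage on either side must be DMA-capable, and a buffer between two software stages is plain cached memory.

```python
from enum import Enum


class Stage(Enum):
    HARDWARE = "hardware"
    SOFTWARE = "software"


class BufferKind(Enum):
    LIBCAMERA_DMA = "libcamera-allocated, DMA-capable"
    LIBCAMERA_CACHED = "libcamera-allocated, plain cached memory"
    ANDROID_DMABUF = "Android-provided DMA-buf"


def buffer_kinds(stages):
    """Return the kind of buffer written by each stage, in order.

    stages lists the processing steps from the camera to the client.
    The mapping below is an assumed rule of thumb, not a libcamera API.
    """
    kinds = []
    for i, stage in enumerate(stages):
        if i == len(stages) - 1:
            # The end result always lands in the buffer Android provided.
            kinds.append(BufferKind.ANDROID_DMABUF)
        elif Stage.HARDWARE in (stage, stages[i + 1]):
            # Written or read by hardware, so it has to be DMA-capable.
            kinds.append(BufferKind.LIBCAMERA_DMA)
        else:
            # Software on both sides: cached memory keeps CPU access fast.
            kinds.append(BufferKind.LIBCAMERA_CACHED)
    return kinds


# camera output -> software scale -> JPEG encode -> client
chain = [Stage.HARDWARE, Stage.SOFTWARE, Stage.SOFTWARE]
for number, kind in enumerate(buffer_kinds(chain), start=1):
    print(number, kind.value)
```

With no software processing at all, the chain has a single hardware stage and the only buffer is the Android-provided one, imported directly.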

>
> > 3) buffers for the end results - always provided by Android and
> >  - without processing they are the same thing as 1),
> >  - with processing they need to be mapped in the adaptation layer and
> > wouldn't reach libcamera itself.
>
> Again this sounds the same as you 1.2 point. I feel I missed something
> :)
>

The point is that for some processing steps, the buffer might be
either imported or allocated. Moreover, for the allocation, there
might be different requirements, like I described above.

> >
> > For consistency, one might be tempted to use some external allocator,
> > like gralloc, and import the hardware buffers to libcamera in both
> > cases, but there are limitations - gralloc only knows how to allocate
> > the end result buffers, so it wouldn't give us arbitrary buffers
> > without some ugly hacks and DMA-buf heaps still need some time to gain
> > adoption. Therefore I think we might need to live with special cases
> > like this until the world improves.
> >
>
> Well, for sure we cannot depend on gralloc :)
>
> For the time being, we have to address 1) first, as we have a few test
> cases that requires an intermediate buffer to be allocated by the HAL
> and provided to libcamera. This part does not concern me too much.
>

Do we have any ideas on how to allocate those?

> > >
> > > My take is still that we should try to solve one problem at the time:
> > >
> > > 1) formalize additional requirements that are not expressed by our
> > >    CameraDevice::camera3Resolutions and CameraDevice::camera3FormatsMap
> >
> > This hopefully shouldn't be needed, although we might want to double
> > check if those fully cover the Android requirements.
> >
>
> I know they don't fully do that at the moment (we still don't enforce
> mandatory resolutions to be supported in example). And they need to be
> versioned depending on the reported HW level. There's indeed space for
> development there.
>
> > > 2) if no other requirements are necessary, identify a use case that
> > >    cannot be satisfied by the current pipeline implementations we
> > >    have. In example, a UVC camera that cannot produce NV12 and need
> > >    conversion might be a good start
> >
> > The use cases we encountered in practice:
> > a) a UVC camera which outputs only MJPEG for higher resolutions and
> > needs decoding (and possibly one more extra conversion) to output YUV
> > 4:2:0.
> > b) a UVC camera which only outputs 1 stream, while Android requires up
> > to 2 YUV streams + JPEG for the LIMITED capability level.
>
> These seem like interesting use cases to start applying some of the
> proposed implementation to an actual use case, aren't they ?
>

Indeed.

> > c) IPU3/RKISP1 which can output up to 2 streams, but there is a stream
> > configuration that requires 3 streams which could have different
> > resolutions - 2 YUV up to but not necessarily equal max PREVIEW size
> > and 1 JPEG with MAXIMUM resolution.
>
> Interesting. This will require software down-scaling of the YUV stream
> at max resolution used to produce JPEG. Unless we encode JPEG from
> Bayer RAW (I'm not even sure it's possible :)
>

Not necessarily. One stream could be made full resolution, while the
other takes the larger of the two remaining requested sizes and is then
downscaled and/or cropped to produce the smaller one. I think the
heuristic I described above should work at least for software-only
processing.
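
For the software side, deriving one stream from another could be planned roughly like this. The helper is hypothetical and follows the ordering from Hiro's proposal, down-scaling with the aspect ratio preserved first and cropping afterwards, and it refuses to upscale. Rounding up the intermediate size is an assumption of the sketch.

```python
from fractions import Fraction
from math import ceil


def derive_plan(src, dst):
    """List the steps needed to produce dst from src, both (width, height).

    Returns a list of ("scale", size) and ("crop", size) tuples. The
    down-scale keeps the source aspect ratio and stops as soon as one
    axis matches dst; the crop then trims the other axis.
    """
    src_w, src_h = src
    dst_w, dst_h = dst
    if dst_w > src_w or dst_h > src_h:
        raise ValueError("up-scaling is not allowed")

    # Largest reduction that still covers the destination on both axes.
    factor = max(Fraction(dst_w, src_w), Fraction(dst_h, src_h))
    scaled = (ceil(src_w * factor), ceil(src_h * factor))

    steps = []
    if scaled != src:
        steps.append(("scale", scaled))
    if scaled != dst:
        steps.append(("crop", dst))
    return steps


print(derive_plan((1280, 720), (480, 360)))     # example from the proposal
print(derive_plan((1920, 1440), (1920, 1080)))  # crop only
print(derive_plan((1920, 1080), (1280, 720)))   # scale only
```

The first call reproduces the example from the proposal, scaling 1280x720 to 640x360 and then cropping to 480x360.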

Actually, with the dual pipe mode, IPU3 should be able to handle this
setup natively, but probably some work would be needed in the pipeline
handler to handle it. RKISP1 is still limited to 2 streams, though.

Best regards,
Tomasz

