[libcamera-devel] [PATCH 0/1] Proposal of mapping between camera configurations and requested configurations

Jacopo Mondi jacopo at jmondi.org
Fri Sep 4 14:57:24 CEST 2020


Hi Tomasz,

On Thu, Sep 03, 2020 at 02:36:47AM +0200, Tomasz Figa wrote:
> Hi Jacopo,
>
> On Tue, Sep 1, 2020 at 6:05 PM Jacopo Mondi <jacopo at jmondi.org> wrote:
> >
> > Hi Hiro,
> >    first of all I'm very sorry for the unacceptable delay in giving
> > you a reply.
> >
> > If that's of any consolation we have not ignored your email, but it
> > has gone through several internal discussions, as it came at the
> > time when the JPEG support was being merged and the two things
> > collided a bit. Add a small delay due to people being on leave, and
> > here you have a month of delay. Again, we're really sorry for this.
> >
> > > On Thu, Aug 06, 2020 at 03:17:05PM +0900, Hirokazu Honda wrote:
> > > This is a proposal about how to map camera configurations and
> > > requested configurations in Android Camera HAL adaptation layer.
> > > Please also see the sample code in the following patch.
> > >
> > > # Software Stream Processing in libcamera
> > >
> > > _hiroh at chromium.org / Draft: 2020-08-06_
> > >
> > >
> >
> > As an initial and unrelated note, looking at the patch I can see you
> > are following the ChromeOS coding style. Please note that libcamera
> > has its own code style, which you can find documented at
> >
> > - https://www.libcamera.org/coding-style.html#coding-style-guidelines
> >
> > And we have a style checker, which can assist with this. The best way to
> > use the style checker is to install it as a git-hook.
> >
> > I understand that this is an RFC, but we will need this style to be
> > followed to be able to integrate any future patches.
> >
> > >
> > > # Objective
> > >
> > > Perform frame processing in libcamera to achieve requested stream
> > > configurations that are not supported natively by the camera
> > > hardware, but required by the Android Camera HAL interface.
> > >
> >
> > As you can see in the camera_device.cpp file we have tried to list the
> > resolutions and image formats that the Android Camera3 specification
> > lists as mandatory or suggested.
> >
> > Do you have a list of additional requirements to add?
> > Are there ChromeOS-specific requirements?
> > Or is this meant to fulfil the above-stated requirements on
> > platforms that cannot satisfy them?
> >
>
> There can be per-device resolutions that should be supported due to
> product requirements. Our current HAL implementations use
> configuration files which define the required configurations.
>
> That said, I think it's an independent problem, which we can likely
> ignore for now, and I believe what Hiro had in mind was the latter -
> platforms that cannot satisfy them. This also includes the cases you
> mentioned below, when a number of streams greater than the number of
> native hardware streams is requested.
>
> As usual, the Android Camera2 API documentation is the authoritative
> source of information here:
> https://developer.android.com/reference/android/hardware/camera2/CameraDevice.html#createCaptureSession(android.hardware.camera2.params.SessionConfiguration)
>
> The tables lower on the page include required stream combinations for
> various capability levels.
>

Those are the requirements I think should be encoded.
So far, as a reference for the supported formats and resolutions, I
have used the documentation of the scaler.availableStreamConfigurations
metadata tag.

> > >
> > > # Background
> > >
> > >
> > > ### Libcamera
> > >
> > > In addition to its native API, libcamera[^1] provides a number of
> > > camera APIs, for example the V4L2 Webcam API and Android Camera HAL3.
> > > The platform-specific implementations are wrapped in the libcamera
> > > core, and a caller of libcamera doesn’t have to take care of the
> > > platform.
> > >
> > >
> > > ### Android Camera HAL
> > >
> > > The Chrome OS camera stack uses the Android Camera HAL[^2] interface.
> > > libcamera implements the Android Camera HAL through an adaptation
> > > layer[^3] between the libcamera core and the Android HAL, which is
> > > called the Android HAL adaptation layer in this document.
> > >
> > > To present a uniform set of capabilities to API users, the Android
> > > Camera HAL API[^4] allows callers to request stream configurations
> > > that are beyond the device capabilities. For example, while a
> > > camera device may only be able to produce a single stream, a HAL
> > > caller can request three streams of possibly different resolutions
> > > (PRIV, YUV, JPEG). However, the libcamera core implementation only
> > > produces streams the camera is capable of. Therefore, we have to
> > > create three streams from the single stream produced by libcamera.
> > >
> > > Requests beyond the device capability are supported only in the
> > > Android HAL at the moment. This document describes a design where
> > > the stream processing is performed in the Android HAL adaptation
> > > layer.
> > >
> > >
> > > # Overview
> > >
> > >
> > > ## Current implementation
> > >
> > > The requested stream configuration is given by
> > > _camera3_device_t->ops->configure_streams()_ in the Android Camera
> > > HAL. This delegates to CameraDevice::configureStreams()[^5] in
> > > libcamera. The current implementation attempts all the given
> > > configurations and succeeds if and only if the camera device can
> > > produce them without any adjustments.
> > >
> > >
> > > ### libcamera::CameraConfiguration
> > >
> > > It is CameraConfiguration[^6] that judges whether adjustments are
> > > required, or whether the requested configurations are infeasible
> > > altogether.
> > >
> > > The configuration procedure is that CameraDevice
> > >
> > > 1. Adds every configuration with CameraConfiguration::addConfiguration().
> > > 2. Validates the added configurations with CameraConfiguration::validate().
> > >
> > > CameraConfiguration, especially for validate(), is implemented per
> > > pipeline. For instance, the CameraConfiguration implementation for
> > > IPU3 is IPU3CameraConfiguration[^7].
> > >
> > > validate() returns one of the below:
> > >
> > > *   Valid
> > >     *   The camera can produce streams with the requested
> > >         configurations.
> > > *   Adjusted
> > >     *   The camera cannot produce streams with the requested
> > >         configurations as-is, but can produce streams with different
> > >         pixel formats or resolutions.
> > > *   Invalid
> > >     *   The camera cannot produce streams with either the requested
> > >         configurations or different pixel formats and resolutions. For
> > >         instance, is this returned when a resolution larger than the
> > >         maximum supported one is requested?
> > >
> > > What we need to resolve is how, when Adjusted is returned, to map the
> > > adjusted camera streams to the requested streams and determine the
> > > required processing.
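
As a side note for whoever follows this thread without knowing the
native API, the flow described above corresponds roughly to the
following. This is a minimal sketch of the API usage, not the actual
CameraDevice::configureStreams() code, and error handling is reduced to
the bare minimum.

#include <errno.h>

#include <memory>

#include <libcamera/camera.h>
#include <libcamera/formats.h>
#include <libcamera/stream.h>

int configureExample(std::shared_ptr<libcamera::Camera> camera)
{
	using namespace libcamera;

	/* Generate a default configuration for a single viewfinder stream. */
	std::unique_ptr<CameraConfiguration> config =
		camera->generateConfiguration({ StreamRole::Viewfinder });
	if (!config)
		return -EINVAL;

	/* Overwrite it with what the caller (e.g. Android) asked for. */
	StreamConfiguration &cfg = config->at(0);
	cfg.pixelFormat = formats::NV12;
	cfg.size = { 1920, 1080 };

	switch (config->validate()) {
	case CameraConfiguration::Valid:
		break;
	case CameraConfiguration::Adjusted:
		/*
		 * The camera cannot produce the stream as requested: this is
		 * the point where the HAL has to map the adjusted stream back
		 * to what was asked for and schedule additional processing.
		 */
		break;
	case CameraConfiguration::Invalid:
		return -EINVAL;
	}

	return camera->configure(config.get());
}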
> > >
> > >
> > > ## Stream processing
> > >
> > > The processing steps to be considered are the following.
> > >
> > >
> > >
> > > *   Down-scaling
> > >     *   We don’t perform up-scaling because it affects stream quality.
> > >     *   Down-scaling is only allowed at the same aspect ratio, to avoid
> > >         producing distorted frames. For instance, scaling from 1280x720
> > >         (16:9) to 480x360 (4:3) is not allowed.
> > > *   Cropping
> > >     *   Cropping is executed only to change the frame aspect ratio. Thus
> > >         it must be done after down-scaling if required. For example, to
> > >         convert 1280x720 to 480x360, first down-scale to 640x360 and
> > >         then crop to 480x360.
> > > *   Format conversion
> > >     *   Pixel format conversion
> > >     *   JPEG encoding
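
On the scale-then-crop rule above: the intermediate size could be
computed by a small helper along these lines. This is only a sketch,
the helper doesn't exist anywhere in the HAL today and its name is made
up; only libcamera::Size is real.

#include <algorithm>

#include <libcamera/geometry.h>

/*
 * Given a native source size and a requested output size, compute the
 * intermediate size to down-scale to before cropping, preserving the
 * source aspect ratio and never up-scaling.
 */
static libcamera::Size scaledSizeBeforeCrop(const libcamera::Size &source,
					    const libcamera::Size &output)
{
	/* Scale factor needed on each axis to cover the output. */
	double scaleW = static_cast<double>(output.width) / source.width;
	double scaleH = static_cast<double>(output.height) / source.height;

	/*
	 * Use the larger of the two factors so that both output dimensions
	 * are covered, but never exceed 1.0 (no up-scaling).
	 */
	double scale = std::min(1.0, std::max(scaleW, scaleH));

	return libcamera::Size(static_cast<unsigned int>(source.width * scale),
			       static_cast<unsigned int>(source.height * scale));
}

With a 1280x720 source and a 480x360 output this gives 640x360, matching
the example above; cropping then produces the final 480x360 frame.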
> > >
> > >
> > > # Proposal
> > >
> > > Basically we only need to consider a mapping algorithm after
> > > validate(). However, to reduce the amount of processing and obtain
> > > better stream quality, we should reorder the given configurations
> > > within validate().
> >
> > >
> >
> > The way the HAL layer works (and I agree something has changed since
> > the recent merge of the JPEG support) is slightly more complex, and
> > boils down to the following steps:
> >
> > 1) Build the list of supported configurations
> >
> > When a CameraDevice is initialized, a list of supported stream
> > configurations is built, in order to be able to report to Android
> > what it can ask for. See CameraDevice::initializeStreamConfigurations().
> >
> > We currently report the libcamera::Camera supported formats and
> > sizes, plus the additional JPEG streams which are produced in the HAL.
> > This creates the first distinction between HAL-only-streams and
> > libcamera-streams, which you correctly identified in your summary.
> >
> > Here, as we do (naively at the moment) for JPEG, you should inspect
> > the libcamera-streams and pass them through your code that infers
> > what kind of HAL-only-streams can be produced from the available
> > libcamera ones. If I'm not mistaken Android only asks for stream
> > combinations reported through the
> > ANDROID_SCALER_AVAILABLE_STREAM_CONFIGURATIONS_OUTPUT metadata, and
> > if you do not augment that list at initialization time, you won't
> > ever be asked for non-native streams later.
>
> I'm not entirely sure about this, because there are mandatory stream
> configurations defined for the Camera2 API. If something is mandatory,
> I suspect there is no need to query for the availability of it.

Well, we need to query the libcamera::Camera to know which of the
required streams can be natively produced and which ones instead have
to be produced in the HAL layer.
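
To make that a bit more concrete, the kind of probe I have in mind is
sketched below. isNativelySupported() is a made-up name that doesn't
exist in the HAL, but StreamConfiguration::formats() and the
StreamFormats helpers it relies on are real, and are what
CameraDevice::initializeStreamConfigurations() already builds upon.

#include <algorithm>
#include <memory>
#include <vector>

#include <libcamera/camera.h>
#include <libcamera/stream.h>

/*
 * Ask the libcamera::Camera whether a given format/size pair can be
 * produced natively; if not, the corresponding Android stream has to
 * be marked as a HAL-produced one.
 */
static bool isNativelySupported(libcamera::Camera *camera,
				const libcamera::PixelFormat &format,
				const libcamera::Size &size)
{
	using namespace libcamera;

	std::unique_ptr<CameraConfiguration> config =
		camera->generateConfiguration({ StreamRole::Viewfinder });
	if (!config || config->empty())
		return false;

	const StreamFormats &formats = config->at(0).formats();

	std::vector<Size> sizes = formats.sizes(format);
	return std::find(sizes.begin(), sizes.end(), size) != sizes.end();
}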

>
> That said, I'd assume that CTS verifies whether all the required
> configurations are both reported and supported, so perhaps there isn't
> much to worry about here.
>
> >
> > 2) Camera configuration
> >
> > That's the part you focused on, and a good part of what you wrote
> > could indeed be used to move forward.
> >
> > The problem here can be summarized as: 'for each stream Android
> > requested, the ones that cannot be natively produced by the
> > libcamera::Camera shall be mapped onto the closest possible native
> > stream' (and here we could apply your implementation that identifies
> > the 'best matching' stream)
> >
> > Unfortunately the problem breaks down into several others:
> >
> > 1) How to identify whether a stream is a native or a HAL-only one?
> > Currently we get away with a trivial "if (!JPEG)" as all the non-JPEG
> > streams are native ones. This should be made smarter.
> >
> > 2) How to best map HAL-streams to libcamera-streams. Assume we
> > receive a request for two YUV streams in 1080p and 720p resolutions.
> > The libcamera::Camera claims to be able to support both, so we can
> > simply go and ask for those two streams. Then we receive a request
> > for the same streams plus a full-size JPEG one. What we have to do is
> > ask for the full-size YUV stream and use it to produce JPEG, and one
> > 1080p YUV to produce both the YUV streams in 1080p and 720p
> > resolutions. In that case we'll then have to crop one YUV stream, and
> > dedicate a full-size YUV one to JPEG. Alternatively we can produce
> > 1080p from the same full-size YUV used to produce JPEG, and ask the
> > camera for a 720p stream.
> >
>
> Right, there are multiple possible choices. I've discussed this and
> concluded that there might be some help needed from the pipeline
> handler to tell the client which configuration is better from the
> hardware point of view.
>

I don't think the HAL could do it all by itself, I agree. The number of
combinations to test would be large and there's currently no way to get
a sense of which combination would be best for the HW.
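
Just to illustrate what the HAL could do completely on its own, a
naive 'best match' selection could look like the sketch below. The
scoring is made up and obviously knows nothing about what is cheap or
expensive for the hardware, which is exactly the information only the
pipeline handler has.

#include <cstddef>
#include <limits>
#include <vector>

#include <libcamera/geometry.h>

/*
 * Pick, among the stream sizes the camera produces natively, the one a
 * HAL-only stream should be generated from: the smallest native size
 * that still covers the requested size on both axes, so that we only
 * ever down-scale/crop. Returns -1 if no native size is large enough.
 */
static int bestNativeSource(const std::vector<libcamera::Size> &nativeSizes,
			    const libcamera::Size &requested)
{
	int best = -1;
	unsigned int bestArea = std::numeric_limits<unsigned int>::max();

	for (size_t i = 0; i < nativeSizes.size(); ++i) {
		const libcamera::Size &native = nativeSizes[i];

		/* Never up-scale: the source must cover the request. */
		if (native.width < requested.width ||
		    native.height < requested.height)
			continue;

		unsigned int area = native.width * native.height;
		if (area < bestArea) {
			bestArea = area;
			best = static_cast<int>(i);
		}
	}

	return best;
}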

Have you already thought about how this can be improved?


> > Now, Android specifies some format/size requirements in the Camera3
> > specification, and I assume ChromeOS maybe has others. As we tried to
> > record the Camera3 requirements and satisfy them in the code, I
> > think the additional streams that are required should be somehow
> > listed first, in order to be able to create only the -required-
> > additional streams.
> >
> > As an example, have a look at CameraDevice::camera3Resolutions and
> > CameraDevice::camera3FormatsMap; these encode the Camera3
> > specification requirements.
> >
> > Once the additional requirements have been encoded, I would then
> > proceed to divide them into 3 categories (there might very well be
> > others):
>
> I believe we don't have any additional requirements for now.
>
> >
> >   1) Format conversions: Convert from one pixel format to another. What
> >      happens today with JPEG, more or less. We have an Encoder interface
> >      for that purpose and I guess a format converter should be implemented
> >      according to it, but that has to be discussed.
> >
>
> One thing that is also missing today is MJPEG decoding. This is also
> required to fulfill the stream configuration requirements, since it's
> assumed that the formats are displayable and explicit YUV streams are
> included as well.
>

s/Encoder/Transcoder ?

Decoding and encoding fall in the same category to me, but I agree
this represents a nice use case to start implementing something. I
assume the USB HAL already has some of that in place, right?
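
Just to put something on the table, the kind of interface I would
imagine is sketched below. It is purely hypothetical (names included)
and only loosely modelled after the JPEG Encoder interface we already
have in the HAL, so please read it as a conversation starter rather
than a proposal.

#include <cstdint>

#include <libcamera/buffer.h>
#include <libcamera/span.h>
#include <libcamera/stream.h>

/*
 * A 'transcoder' covering both directions, e.g. MJPEG -> NV12 decoding
 * and NV12 -> JPEG encoding, configured with the input and output
 * stream configurations it has to convert between.
 */
class Transcoder
{
public:
	virtual ~Transcoder() = default;

	virtual int configure(const libcamera::StreamConfiguration &input,
			      const libcamera::StreamConfiguration &output) = 0;

	/*
	 * Convert one frame: source is the libcamera-produced buffer,
	 * destination is the (mapped) Android-provided buffer.
	 */
	virtual int transcode(const libcamera::FrameBuffer &source,
			      libcamera::Span<uint8_t> destination) = 0;
};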

> >   2) Down-scale/crop: Assuming it happens in the HAL using maybe some
> >      external components, down-scaling/cropping produce additional
> >      resolutions from the list of natively supported ones. Given a
> >      powerful enough implementation we could produce ANY format <= a given
> >      native format, but that's not what we want I guess. We shall
> >      establish a list of additional resolutions we want to report to the
> >      framework layer, and find out how to produce them from the native
> >      streams.
>
> Given the above, we should be able to stick to the resolutions we have
> already supported in the adaptation layer.
>

This then boils down again to improving how we identify what can be
natively produced by the camera and what has to be produced in the HAL
using one of the native streams.

I think Hiro's proposal addresses the second part (stream
identification), but the first one has to be taken into account too,
probably first or at least in parallel?

> >
> >    3) Image transformations: A bit of a lateral issue, but I assume some
> >       'transformations' can be performed by HAL-only components. This
> >       mostly depends on handling streams with some specific metadata
> >       associated, which needs to be handled in the HAL. The most trivial
> >       example is rotation. If the libcamera::Camera is for whatever reason
> >       unable to rotate the images, they have to be software rotated in the
> >       HAL. This won't require any stream mapping, but rather inspecting
> >       metadata and passing the native streams through an additional
> >       processing layer.
>
> Right. I honestly hope we won't need to do software rotation on any
> reasonable hardware platform, but AFAIK we still have some in Chrome
> OS for which we do, in some specific cases, like a tablet with the
> camera in landscape orientation, but the device in portrait
> orientation.
>

I hope it's a corner case as well, but who knows, Android runs on a
great variety of platforms nowadays, and some of them might not be that
'reasonable'? I agree this is a corner case at the moment though.

> >
> > 3) Buffer allocation/handling:
> >    When performing any conversions between a HAL stream and a libcamera
> >    stream we may need to allocate an intermediate buffer to provide storage
> >    for processing the frame in libcamera, with the conversion entity
> >    reading from the libcamera buffer and writing into the Android buffer.
> >    This is likely possible with the existing FrameBufferAllocator classes,
> >    but may have extra requirements.
>
> I suppose we could have 3 types of buffers here:
> 1) buffers written by hardware driven by libcamera
>  - without any software processing these would be directly provided by
> Android and imported to libcamera,
>  - with processing, I assume libcamera would have to allocate its own,
> but I guess that would just end up being V4L2 MMAP buffers?

These days we're looking at this part with the idea of exploiting the
FrameBufferAllocator abstraction we also provide to applications.

This would amount to the following (see the sketch below):
1) Allocating buffers in the video devices (the pipeline handler
decides which one)
2) Exporting them as dmabuf file descriptors
3) Re-importing them into the video devices at Request processing time.
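
Something along these lines, to make the three steps concrete. This is
a rough sketch against the current application-facing API; it assumes
the camera has already been configured and started, and skips most of
the error handling.

#include <errno.h>

#include <memory>
#include <vector>

#include <libcamera/buffer.h>
#include <libcamera/camera.h>
#include <libcamera/framebuffer_allocator.h>
#include <libcamera/request.h>
#include <libcamera/stream.h>

int allocateAndQueue(std::shared_ptr<libcamera::Camera> camera,
		     libcamera::Stream *stream)
{
	using namespace libcamera;

	FrameBufferAllocator allocator(camera);

	/* 1) Allocate buffers in the video device chosen by the pipeline. */
	int ret = allocator.allocate(stream);
	if (ret < 0)
		return ret;

	/*
	 * 2) Each plane of each buffer is backed by a dmabuf file descriptor
	 *    that can be handed to an external processing component or
	 *    mmap()ed in the HAL.
	 */
	const std::vector<std::unique_ptr<FrameBuffer>> &buffers =
		allocator.buffers(stream);
	int fd = buffers[0]->planes()[0].fd.fd();
	(void)fd;

	/* 3) Re-import the buffer in the video device at Request time. */
	Request *request = camera->createRequest();
	if (!request)
		return -ENOMEM;

	request->addBuffer(stream, buffers[0].get());
	return camera->queueRequest(request);
}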

> 2) buffers between processing steps - if software only, an arbitrary
> malloc buffer could be used.

I don't see this as distinct from the previous point. Whenever we need
an intermediate buffer that has to be processed (and the result of the
processing written to the Android-provided buffer), it needs to be
allocated in libcamera.

Or do you mean buffers for additional processing steps, for example a
scratch buffer used during an encoding procedure? In that case I think
the transcoder implementation would deal with that as it prefers; for
example a HW-accelerated component might need to allocate buffers
accessible by both the CPU and the accelerator, and in this case it
will provide buffers to libcamera to import, as well as allocate
intermediate buffers if it requires any.

> 3) buffers for the end results - always provided by Android and
>  - without processing they are the same thing as 1),
>  - with processing they need to be mapped in the adaptation layer and
> wouldn't reach libcamera itself.

Again this sounds the same as your point 1.2. I feel I missed something
:)

>
> For consistency, one might be tempted to use some external allocator,
> like gralloc, and import the hardware buffers to libcamera in both
> cases, but there are limitations - gralloc only knows how to allocate
> the end result buffers, so it wouldn't give us arbitrary buffers
> without some ugly hacks and DMA-buf heaps still need some time to gain
> adoption. Therefore I think we might need to live with special cases
> like this until the world improves.
>

Well, for sure we cannot depend on gralloc :)

For the time being, we have to address 1) first, as we have a few test
cases that require an intermediate buffer to be allocated by the HAL
and provided to libcamera. This part does not concern me too much.

> >
> > My take is still that we should try to solve one problem at a time:
> >
> > 1) formalize additional requirements that are not expressed by our
> >    CameraDevice::camera3Resolutions and CameraDevice::camera3FormatsMap
>
> This hopefully shouldn't be needed, although we might want to double
> check if those fully cover the Android requirements.
>

I know they don't fully do that at the moment (we still don't enforce
mandatory resolutions to be supported, for example). And they need to
be versioned depending on the reported HW level. There's indeed room
for development there.

> > 2) if no other requirements are necessary, identify a use case that
> >    cannot be satisfied by the current pipeline implementations we
> >    have. For example, a UVC camera that cannot produce NV12 and needs
> >    conversion might be a good start
>
> The use cases we encountered in practice:
> a) a UVC camera which outputs only MJPEG for higher resolutions and
> needs decoding (and possibly one more extra conversion) to output YUV
> 4:2:0.
> b) a UVC camera which only outputs 1 stream, while Android requires up
> to 2 YUV streams + JPEG for the LIMITED capability level.

These seem like interesting use cases to start applying some of the
proposed implementation to, don't they?

> c) IPU3/RKISP1, which can output up to 2 streams, but there is a stream
> configuration that requires 3 streams which could have different
> resolutions - 2 YUV up to, but not necessarily equal to, the max
> PREVIEW size and 1 JPEG at MAXIMUM resolution.

Interesting. This will require software down-scaling of the YUV stream
at max resolution used to produce JPEG. Unless we encode JPEG from
Bayer RAW (I'm not even sure it's possible :)

>
> I don't remember if we in the end had to deal with it, but I recall also:
> d) hardware platform that doesn't support one of the smaller required
> resolutions due to max scaling factor constraints.
>

Seems like a use case for a software downscaler too.

> > 3) Address the buffer allocation issues, which I understand are
> >    still open.
>
> Agreed.
>
> >
> > Sorry for the wall of text. Hope it helps.
>
> Yep, thanks for starting the discussion.

Thank you for the useful feedback.

Looking forward to new developments in this area. As you've seen,
there are quite a few patches in flight for the HAL; I know it's
complicated to start new development on such a fast-moving base...

Thanks
  j

>
> Best regards,
> Tomasz

