[libcamera-devel] apply methods in IPARPi

Sebastian Fricke sebastian.fricke at posteo.net
Tue May 25 07:32:16 CEST 2021


Hey Naushir, David and Laurent + CC Jean-Michel,

sorry for taking such a long time to reply. I am still new to the whole
media subsystem and libcamera, so I needed a bit more time to think
things through (and I am probably still wrong on some subjects).

I have tried to address the general statistics problem with a design
idea plus a minimal implementation of that idea. I am sorry that this
response grew quite long, but I really wanted to discuss my idea
properly. If you would like me to present future proposals in a
different format (PDF, markdown, presentation etc.), then please just
tell me. :)

On 06.05.2021 14:45, Naushir Patuck wrote:
>On Thu, 6 May 2021 at 14:32, Laurent Pinchart <
>laurent.pinchart at ideasonboard.com> wrote:
>
>> Hello,
>>
>> On Thu, May 06, 2021 at 09:45:38AM +0100, David Plowman wrote:
>> > On Thu, 6 May 2021 at 09:04, Naushir Patuck <naush at raspberrypi.com>
>> wrote:
>> > > On Wed, 5 May 2021 at 19:08, Sebastian Fricke <
>> sebastian.fricke at posteo.net> wrote:
>> > >>
>> > >> Hello Naushir and David,
>> > >>
>> > >> I am currently performing a lot of investigation and preparatory work
>> > >> for a few generalizations in form of an IPAFramework class, that
>> reuses
>> > >> some of the great concepts you introduced to libcamera. Along the way
>> I
>> > >> noticed, that you have implemented the apply methods (applyAGC,
>> applyAWB,
>> > >> etc.) as part of the IPARPi class.
>> > >>
>> > >> I was wondering if it would be possible to move those to the
>> controller
>> > >> and specific algorithm implementation, as they mostly get the status
>> from
>> > >> the algorithm, perform a few calculations with it and insert the
>> results
>> > >> of those calculations into the ISP controls.
>> > >>
>> > >> The upside of doing so, from my point of view, would be that we could
>> > >> call `apply` similar to the way we call `initialize`, `process` and
>> > >> `prepare`.
>> > >> That way, I would be able to generalize the whole step of applying the
>> > >> algorithm status to a set of controls, which can then be used for
>> setting
>> > >> the ISP controls.
>> > >> Now, the reason why I ask you is that I want to be sure, that I didn't
>> > >> miss a key concept.
>> > >
>> > >
>> > > This was indeed something I did consider doing some time back, but
>> > > never got to it.  There are a couple of considerations with this
>> > > proposal that we need to address:
>> > >
>> > > 1) My original intention with keeping apply* out of the Controllers
>> > > was to *somewhat* decouple the BCM2835 ISP parameters from the
>> > > Controller algorithms.  I say somewhat because some algorithms do
>> > > have to know the statistics structure defined in the bcm2835-isp.h
>> > > header, so there is already a hardware dependency there.  Secondly,
>> > > I wanted to try and not include any libcamera dependency in the
>> > > Controller code.  We do now have libcamera logging in the controller

Could you elaborate on why you wanted to keep the libcamera
dependencies out of it? Is it just to decouple it from libcamera
elements that might change and break stuff? I am asking because my plan
is to make the Controller class generally accessible to all IPAs, so if
I touch the class it is probably best to do so without breaking the
initial design.

>> > > source files, so again, this is also not strictly true.
>> > >
>> > > 2) applyAGC() is a special one in that it runs after
>> > > Controller::Process() whereas all the others run after
>> > > Controller::Prepare.  applyAGC() also runs on IPA::start() so this
>> > > must be treated differently somehow.

Okay, so `Prepare` gets the current status of the algorithm and puts it
into the Metadata to be accessed. This means that, in the case of the
AGC algorithm, `apply` uses the status from the last frame.

So, if I understand the problem properly: no matter whether we call
`Apply` before fetching the statistics data for the current frame, or
after receiving the statistics data and calling `Prepare`, we still
want to call the same `Apply` method (meaning we don't need different
versions of `Apply` depending on the moment it is called). It is
therefore more a matter of deciding which algorithm status to apply to
the controls at a given moment and which not.

>> > >
>> > > In a way, having a Controller::Apply() might be quite nice, if we
>> > > are able to somehow add a HW abstraction layer (through templated
>> > > functions perhaps?).  Or does it even matter? David, what are your
>> > > thoughts on this?
>> >
>> > So I remain quite strongly in favour in keeping hardware specific code
>> > away from the algorithms. As Naush says, they do get a bit mixed
>> > together because of the statistics, but for the moment we have to live
>> > with this. We could imagine a formal API which takes the statistics
>> > from the hardware (whatever that is) and produces the numbers the
>> > algorithms want, though sometimes you could imagine tweaking the
>> > algorithm slightly might be better than going through lots of hoops
>> > with statistics, so that's probably one to think about.
>>
>> Keeping the implementation of algorithms hardware-agnostic would be
>> great, but it's a hard problem. We could imagine translating all
>> statistics to a standard format before passing them to algorithms, but
>> that will have a performance impact, and could also result in
>> information loss as a format that would be a superset of all possible
>> hardware features is likely impossible. We need to figure out some
>> middleground, perhaps in the form of device-agnostic algorithm "pieces"
>> that would be assembled by device-specific algorithm "implementations".
>> There's room for research and clever designs.

Proposal for general statistic data:
====================================

General idea:
=============

Okay, so I have thought about this, and while I am not perfectly
satisfied with my current result, I still hope that it might lead in
the right direction.
My first approach is to define a set of inputs expected by a particular
algorithm to produce a well-defined output. The idea behind that
thought is that the algorithm should declare what data it expects, and
not the other way around, with the hardware dictating which data the
algorithm has to process.
The advantage, from my point of view, is that we could define
generalized algorithms to be used on multiple platforms (like the first
algorithms in the libipa/ folder from JM (https://rb.gy/o5afyv)) and
that the algorithms don't care about the hardware implementation.
And while the statistics data may look different on each platform, if
the IPA algorithm implementation uses the same underlying algorithm it
will most likely require an equal set of input data.
This means the idea rests on the assumption that if an algorithm
requires x and y to generate z, then we only have two possibilities for
each platform: A) we can use the algorithm, because the statistical
data can be converted to x and y, or B) we cannot use the algorithm on
this platform.

Describing the idea for the AWB algorithm:
==========================================

Let me explain my strategy using the AWB algorithms as an example. For
calculating the correct color temperature we have multiple algorithms
available, but let's say we use the grey world algorithm and the Bayes
algorithm. This would mean that we define two alternative data formats
for the AWB algorithm implementation (the following does not claim to
be complete):

```
#include <variant>
#include <vector>

struct awbGreyWorld {
        unsigned int counted;
        unsigned long long redSum;
        unsigned long long greenSum;
        unsigned long long blueSum;
};

struct awbBayes {
        unsigned int counted;
        unsigned long long redSum;
        unsigned long long greenSum;
        unsigned long long blueSum;
        long long lux;
};

/* The grey world variant carries per-zone sums, the Bayes variant a
 * single accumulated set plus the lux estimate. */
struct statsData {
        std::variant<
                std::vector<awbGreyWorld>,
                awbBayes
        > awbStats;
};
```

Each algorithm is then associated with a fitting data format, which is
declared with a tag in the tuning file. Similar to how we already work
with algorithms, the formats should be created with a static function
during start-up and retrieved later by their name. If we implement a
new algorithm, we therefore have to add an additional data format.
The data format acts as an additional layer that converts the data from
specific to general and from general to specific, which means each
platform implements a data format for each algorithm it uses.

I have created a minimal example of that procedure here:
https://gitlab.com/-/snippets/2123926

The snippet also contains diagrams that try to visualize my ideas.
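
To make the conversion layer a bit more concrete, here is a minimal
sketch of the specific-to-general direction, building on the `awbBayes`
struct from above. All the names in it (`PlatformAwbRegion`,
`AwbBayesFormat`, `convertStats`) are illustrative assumptions of mine:

```
#include <cstdint>
#include <vector>

/* Stand-in for whatever the platform delivers per AWB zone, e.g. the
 * AWB regions from bcm2835-isp.h on the Raspberry Pi. */
struct PlatformAwbRegion {
        uint32_t counted;
        uint64_t rSum, gSum, bSum;
};

/* Hypothetical platform data format for the Bayes AWB algorithm: it
 * folds the per-zone platform statistics into the generalized awbBayes
 * input. The reverse direction (general results back to ISP controls)
 * would live in the same class. */
class AwbBayesFormat
{
public:
        awbBayes convertStats(const std::vector<PlatformAwbRegion> &regions,
                              long long lux) const
        {
                awbBayes stats{};
                for (const PlatformAwbRegion &region : regions) {
                        stats.counted += region.counted;
                        stats.redSum += region.rSum;
                        stats.greenSum += region.gSum;
                        stats.blueSum += region.bSum;
                }
                stats.lux = lux;
                return stats;
        }
};
```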

The controller then receives the `struct statsData` structure, which
contains the generalized statistics data for all used algorithms, via
the `controller_->Process()` call.

In case the hardware implementation cannot provide the required data
for the chosen algorithm (no data format exists for the algorithm on
that platform), the algorithm throws a run time error: it uses a
visitor pattern to make sure that the variant holds the structure it
expects.
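
A minimal sketch of that check, assuming the `statsData` struct from
above (the `processBayes` helper is hypothetical):

```
#include <stdexcept>
#include <type_traits>
#include <variant>

/* The Bayes AWB implementation accepts only the awbBayes alternative
 * of the variant and throws if the platform provided no matching data
 * format. */
void processBayes(statsData &stats)
{
        std::visit([](auto &&data) {
                using T = std::decay_t<decltype(data)>;
                if constexpr (std::is_same_v<T, awbBayes>) {
                        /* Run the actual Bayes calculation on data. */
                } else {
                        throw std::runtime_error(
                                "no Bayes data format for this platform");
                }
        }, stats.awbStats);
}
```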

The `controller_->Prepare()` call writes the internal results to the
Metadata. Finally, the `Apply` routines convert the Metadata into
general results for the specific algorithm.
As mentioned by David, it might be beneficial to implement two apply
routines (one that creates controls from the last frame's Metadata and
one that creates them from the current Metadata). But instead of
setting the controls directly, we write the finished calculations to a
general `struct algorithmResults`, which contains the results of the
used algorithms.
This further ensures that we encapsulate the controller and algorithms
from the hardware details. It is the job of the specific data format to
convert those results to the proper controls, as it is allowed to know
about the hardware.
I am not entirely convinced about that part though, as the results look
very similar to the Metadata, so maybe we should just use the Metadata
directly within the data formats.
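
For illustration, such a results structure could look roughly like
this; the fields are only an assumption of mine, mirroring the existing
status structs:

```
/* Hypothetical general results written by the Apply routines; the
 * platform data format translates them into ISP controls. */
struct awbResult {
        double gainR;
        double gainB;
        double colourTemperature;
};

struct algorithmResults {
        awbResult awb;
        /* agcResult agc; ... one entry per used algorithm. */
};
```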

My goal, in this case, would be to have exactly one `greyworld` algorithm
in the libcamera source tree and not one algorithm for each platform doing
roughly the same thing.

But the question is how we provide data that might not be available
when processing the statistics data (for example the lux value in the
RPi AWB Bayes algorithm). The visible problem for me is that we might
receive that data through different routes. For example, on the RPi we
get the lux value from another algorithm by calculating:
```
double estimated_lux = shutter_speed_ratio * gain_ratio *
               aperture_ratio * aperture_ratio *
               Y_ratio * reference_lux_;
```
What would we do if the illuminance were directly part of the ISP
statistics data? In that case, it would be a detour to first save that
value into the lux status before the AWB Bayes algorithm receives it.
One idea would be to use the illuminance value directly as part of the
generalized statistics data for the AWB Bayes algorithm. In that case
direct data from the ISP could be used, and if the ISP doesn't provide
statistical data for illuminance, we could fall back to the calculation
from the last frame (Metadata) or to the default value from the tuning
file.
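
As a sketch of that fallback chain (the `resolveLux` helper and the
`hasLuxStat` flag are assumptions of mine; `Metadata::Get` and
`struct LuxStatus` are modelled on the existing RPi controller API):

```
#include "metadata.hpp"   /* RPiController::Metadata */
#include "lux_status.h"   /* struct LuxStatus */

/* Prefer a direct ISP lux statistic, then the lux status computed for
 * the last frame, then the tuning file default. */
double resolveLux(bool hasLuxStat, double luxStat,
                  RPiController::Metadata &metadata, double tuningDefault)
{
        if (hasLuxStat)
                return luxStat;

        struct LuxStatus luxStatus;
        if (metadata.Get("lux.status", luxStatus) == 0)
                return luxStatus.lux;

        return tuningDefault;
}
```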
Do you see potential complications if we rely on the statistical data
from the last frame?

Different possible locations for the conversion:
================================================

I have two variations in mind for how to implement the new generalized
routines; I have tried to visualize them here: https://rb.gy/pb1qk9

The first version presents the main idea: it makes the controller and
algorithms hardware-agnostic by adding an abstraction layer between the
IPA class and the controller. That layer is used to convert the
statistics data and the resulting controls.
The second version is a bit more ambitious: here I thought about
performing that step within the pipeline handler. While thinking about
that solution, I got quite excited by the thought of maybe making the
whole IPA class hardware-agnostic.
I know that plan would involve a lot more work than just the statistics
data handling, but I thought I should mention it, as having a single
IPA handler for all platforms sounds great (in theory).

New folder structure:
=====================

My current design would lead to a change of the folder structure within
`src/ipa`, here is a quick sketch of the new structure:
```
libcamera
└── src
    ├── ipa
    │   ├── ipu3
    │   │   ├── formats
    │   │   │   └── ...
    │   │   ├── data
    │   │   └── ipu3.cpp
    │   ├── libipa
    │   │   ├── algorithms
    │   │   │   ├── awbGreyWorld.cpp
    │   │   │   ├── ...
    │   │   │   └── blackLevelSubtraction.cpp
    │   │   ├── controller.cpp
    │   │   └── algorithm.cpp
    │   ├── raspberrypi
    │   │   ├── formats
    │   │   │   └── ...
    │   │   ├── data
    │   │   └── raspberrypi.cpp
    │   ├── rkisp1
    │   │   ├── formats
    │   │   │   ├── awbGreyWorldFormat.cpp
    │   │   │   ├── ...
    │   │   │   └── blackLevelSubtractionFormat.cpp
    │   │   ├── data
    │   │   └── rkisp1.cpp
    │   └── vimc
    │       └── data
    ├── libcamera
    └── ...
```

I am quite certain that you have already thought about the further
structure of the IPA folder as well, so I am anticipating your ideas :).

Potential problems:
===================

One problem that does arise when we split every algorithm into its own
file and assign a specific data format to it is that we cannot switch
as smoothly between different algorithms as, for example, in
`src/ipa/raspberrypi/controller/rpi/awb.cpp`, where a simple boolean
determines which algorithm we choose. From my point of view, it would
be cleaner to split each algorithm into a separate file for
maintainability and reusability, so I will have to find a solution for
this. In the current code we fall back to the greyworld algorithm in
case the color temperature curve or the priors were not configured.
Just to get the simplest possibility out of the way: this seems to be a
way to prioritize the Bayes algorithm over the greyworld algorithm when
the user supplied the correct configuration. If the user explicitly
chooses the Bayes algorithm within the tuning file and has a malformed
configuration, we could simply abort without using a fallback, right?
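
A small sketch of what I mean, with hypothetical class and tag names:

```
#include <memory>
#include <stdexcept>
#include <string>

/* Hypothetical algorithm types, standing in for the real classes. */
struct Algorithm { virtual ~Algorithm() = default; };
struct AwbBayes : Algorithm {};
struct AwbGreyWorld : Algorithm {};

/* With an explicit choice in the tuning file, a malformed
 * configuration aborts instead of silently falling back. Without an
 * explicit choice, Bayes is prioritized when the configuration allows
 * it. */
std::unique_ptr<Algorithm> createAwbAlgorithm(const std::string &name,
                                              bool configValid)
{
        if (name == "awb.bayes") {
                if (!configValid)
                        throw std::runtime_error(
                                "Bayes AWB chosen but CT curve or priors are malformed");
                return std::make_unique<AwbBayes>();
        }
        if (name == "awb.greyworld")
                return std::make_unique<AwbGreyWorld>();

        /* No explicit choice in the tuning file. */
        if (configValid)
                return std::make_unique<AwbBayes>();
        return std::make_unique<AwbGreyWorld>();
}
```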

Another area that I have not looked into so far is the whole IPA
isolation topic. My current attempts mostly focused on making it
possible to create general open source algorithms to be used by all,
but maybe some manufacturers would like to create an algorithm
implementation for their platform and keep it closed source. If we want
to continue with my idea, I will have to dive deeper into that, I
suppose.

And my current proof of concept doesn't involve any concurrency
handling so far.

Summary:
========

So in summary, each platform is responsible for the conversion of data
in order to isolate the algorithms from the hardware layer. I see a few
pros and cons with my approach:

Pros:
-----
- Abstract the Controller & the algorithms completely from the hardware
- Easy addition of new platforms: all you have to do is determine which
  algorithms can be used with the given statistical data, and implement
  a data format for each chosen algorithm to prepare the data for it.
- Reduce the codebase size by not repeating algorithm boilerplate code
  and implementations for every platform

Cons:
-----
- It adds an additional layer and might therefore hurt performance
  (even though at least the input conversion would have to be done
  anyway)
- Some platforms might not be able to share that many algorithms, as
  their statistical data cannot be converted to the input required by a
  specific algorithm

>>
>> > Nonetheless, I really don't want more hardware knowledge getting into
>> > the algorithms, nor would I want to see algorithms making direct calls
>> > that will set hardware.
>>
>> Fully agreed on the second part, the IPA design explicitly forbids
>> accessing the device from algorithms.
>>
>
>I think the idea here is to allow Controller::Apply() to only populate the
>ControlList items, and the IPA to send that ControlList to the PH to apply
>the parameters as V4L2 controls, as is done today.
>
>
>>
>> > Similarly, I really don't want one algorithm
>> > to make direct calls that affect another algorithm - this always
>> > starts quite benignly but can ultimately tend towards spaghetti. The
>> > rule that we have is that this is all communicated through metadata.
>>
>> While exception may sometimes be needed, I really want to follow that
>> rule.
>>
>> > But I think that still leaves us with options. Controller::Apply()
>> > sounds like a reasonable thing to have, so long as it doesn't touch
>> > the algorithms and only uses the metadata that they produced. Perhaps
>> > we need a "PreApply" and a "PostApply", to allow for them to operate
>> > either before or after the ISP.

>> > One more point to consider is what we'd do with imaging pipelines that
>> > have a different architecture. One obvious case would be the "straight
>> > through" ISP where the image buffer goes straight from the CSI-2
>> > receiver into the ISP and only reaches memory after that. Another
>> > variant on that might be a hybrid architecture, where some processing
>> > happens in this "straight through" manner, but then there's a "main"
>> > ISP which runs memory-to-memory.
>>
>> The former is what the rkisp1 implements, the latter is also needed to
>> support new platforms. We usually call the "straight through" part the
>> "inline" ISP, and the memory-to-memory part the "offline" ISP. There
>> will usually be a single inline ISP, but we could have multiple offline
>> components (hardware ISPs, general-purpose DSPs, GPUs, ...) in the
>> pipeline.
>>
>> > How would that look? It's probably
>> > worth giving these some consideration now, rather than inventing
>> > something for our current hardware platforms and finding ourselves in
>> > a pickle later!

Alright, so let me try to understand the issue correctly.
Case 1: Raspberry Pi: (offline ISP)
    Send request to camera -> get image/embedded data -> prepare ISP
    controls -> send request to ISP -> get processed image/statistics
    data -> send statistics data to IPAs -> process new metadata ->
    repeat

Case 2: RkISP1: (inline ISP)
    Send request to main-/self-path device -> Receive image/statistics
    data -> send statistics to IPAs -> process new metadata -> create
    parameter buffer and send to ISP to set controls for the next frame
    -> repeat

Case 3: (hybrid ISP)
    send request to inline ISP device -> Receive image/statistics data
    -> send statistics to IPAs -> process new metadata -> set controls
    for inline ISP (either directly or per buffer) -> send request to
    offline ISP -> Receive image/statistics data -> send statistics to
    IPAs -> process new metadata -> set controls for offline ISP ->
    etc... -> repeat

I didn't have an example for case 3, but I think it could look like
that; am I utterly wrong?

From my point of view we always have the same cycle:
- Send a request to the ISP (Either we fetched the image data from the
  camera or that part is handled internally, but it all boils down to,
  'send raw data to the ISP')
- Receive the processed image data and the statistics
- Process the statistics
- Set the settings of the ISP for the next frame

And the IPAs shouldn't care what differentiates the hardware, as all
they do is transform input data into output data. I guess if we have
multiple ISPs in the pipeline, we probably need a controller instance
for each ISP so that they don't affect each other. But I currently
don't see a completely different workflow.
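
As a tiny sketch of the per-ISP idea (the stage key and `handleStats`
are hypothetical; `Controller::Process` and `StatisticsPtr` are meant
to match the existing RPi controller API):

```
#include <map>
#include <memory>
#include <string>

#include "controller.hpp"  /* RPiController::Controller, StatisticsPtr */

/* One Controller per ISP stage in a hybrid pipeline, so the inline
 * and offline stages keep separate algorithm state. */
std::map<std::string, std::unique_ptr<RPiController::Controller>> controllers_;

void handleStats(const std::string &stage, /* e.g. "inline" or "offline" */
                 RPiController::StatisticsPtr stats,
                 RPiController::Metadata &metadata)
{
        controllers_.at(stage)->Process(stats, &metadata);
}
```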

Could you elaborate on which potential problems you had in mind?

>> >
>> > >> --
>> > >>
>> > >> Additionally, off-topic to the topic mentioned above. While
>> > >> investigating the RPiStream abstraction, I noticed the following
>> comment
>> > >> within: `rpi_stream.cpp`:116
>> > >> ```
>> > >>   * A nullptr buffer implies an external stream, but no external
>> > >>   * buffer has been supplied in the Request. So, pick one from the
>> > >>   * availableBuffers_ queue.
>> > >>   ```
>> > >>
>> > >>   And within `rpi_steam.h`:139
>> > >>   ```
>> > >>   * A nullptr indicates any internal buffer can be used (from
>> availableBuffers_),
>> > >>   * whereas a valid pointer indicates an external buffer to be queued.
>> > >>   ```
>> > >>
>> > >>   Which confused me a little bit, could you enlighten me in this
>> matter?
>> > >>   Does a nullptr buffer imply an external stream and therefore an
>> > >>   external buffer? Or does a nullptr buffer imply an internal buffer
>> and
>> > >>   therefore an internal stream?
>> > >
>> > >
>> > > I can see why this caused some confusion :-)
>> > >
>> > > Firstly, my definitions of external and internal streams may need
>> > > clarification.  The former is a stream that *may* be exported to the
>> > > application, and the latter is a stream that can never be exported
>> > > to the application.  For external streams, the buffers used may be
>> > > provided by the application, and these buffers will be returned back
>> > > to the application once they have been processed.  However, we can
>> > > also use internally allocated buffers for external streams that will
>> > > never go back to the application.  This is needed when:
>> > >
>> > > 1) The application has not provided an external buffer to use for
>> > > that stream in a Request.  Example of this is when we have a RAW
>> > > stream set to external, but that application will only request a RAW
>> > > image once every N frames.
>> > > 2) If we need to drop frames as instructed by the IPA, we cannot use
>> > > external buffers as they will eventually be dropped and not returned
>> > > to the application.  In this case, we queue an internal buffer that
>> > > can be dropped safely.  The alternative is to requeue external
>> > > buffers once dropped, but will add a ton of latency to the capture
>> > > cycle.
>> > >
>> > > Hope that helps clarify things a bit.

Indeed, that helps me understand the situation, thank you!

>>
>> --
>> Regards,
>>
>> Laurent Pinchart
>>
