[libcamera-devel] apply methods in IPARPi

Sebastian Fricke sebastian.fricke at posteo.net
Tue May 25 19:11:40 CEST 2021


Hey David,

On 25.05.2021 15:09, David Plowman wrote:
>Hi Sebastian
>
>Thanks for your message. This is such a huge topic, you have my admiration!
>
>Before getting too much into the details, I was wondering whether it
>might be worth setting up a conference call to discuss some of these
>things with less reply latency? Besides yourself and Laurent, would
>anyone else be interested in joining such a call?
>
>I know we'd like to understand the bigger picture here, so I've listed
>some high-level discussion points, bundled under a few different
>headings. Please feel free to add items to this list!
>
>"One size fits all algorithms"?
>
>* Are algorithms tuned to particular pipelines?
>* Do we risk "lowest common denominator" algorithms?
>* Will 3rd parties be able to implement their own?
>
>Testing
>
>* Testing control algorithms is really hard.
>* How would you test a platform that you don't have? Or a sensor you don't have?
>
>Ownership
>
>* What if company ABC breaks a feature company XYZ relies on?
>* Would they fork the entire repo? Or should we encourage
>vendor-specific implementations, effectively "forked" within the code
>tree?
>
>Algorithms other than AWB...
>
>* AEC/AGC is heavily dependent on the type of statistics it gets. I
>shan't list all the variations here, but can we handle them?
>* People tend to forget ALSC...
>
>Camera Tuning
>
>* Most algorithms require tuning. Where and how will we store this?
>* How to cope with different modules that use the same sensor? And
>module-to-module variation?
>* Algorithms that we supply will need a tuning tool, right?
>
>Anyway, let me know if you'd like me to set up a call. Would later
>this week or early next week suit? (Monday is a public holiday in the
>UK)

I would like to have a call with you and the others, but I am on holiday
until Monday without access to a computer. I would also like some time
to think about all the mentioned topics and bring up a few of my own,
so I would prefer the week from the 7th to the 12th of June.
On a weekday I am usually available between 5-9 pm CEST (4-8 pm BST),
and on the weekend I am available all day. If that time window does not
work, please suggest another one and I will try to arrange it.

>
>Thanks and best regards
>
>David

Thank you for the swift reply,

Sebastian

>
>On Tue, 25 May 2021 at 06:32, Sebastian Fricke
><sebastian.fricke at posteo.net> wrote:
>>
>> Hey Naushir, David and Laurent + CC Jean-Michel,
>>
>> Sorry for taking such a long time to reply. I am still new to the whole
>> media subsystem and libcamera, so I needed a bit more time to think
>> things through (and I am probably still wrong on some subjects).
>>
>> I have tried to address the general statistics problem with a design
>> idea plus a minimal implementation of that idea. I am sorry that this
>> response got quite large, but I really wanted to discuss my idea
>> properly. If you would like me to propose ideas in a different format in
>> the future (pdf, markdown, presentation etc.), then please just tell me. :)
>>
>> On 06.05.2021 14:45, Naushir Patuck wrote:
>> >On Thu, 6 May 2021 at 14:32, Laurent Pinchart <laurent.pinchart at ideasonboard.com> wrote:
>> >
>> >> Hello,
>> >>
>> >> On Thu, May 06, 2021 at 09:45:38AM +0100, David Plowman wrote:
>> >> > On Thu, 6 May 2021 at 09:04, Naushir Patuck <naush at raspberrypi.com> wrote:
>> >> > > On Wed, 5 May 2021 at 19:08, Sebastian Fricke <sebastian.fricke at posteo.net> wrote:
>> >> > >>
>> >> > >> Hello Naushir and David,
>> >> > >>
>> >> > >> I am currently performing a lot of investigation and preparatory work
>> >> > >> for a few generalizations in the form of an IPAFramework class, that
>> >> > >> reuses some of the great concepts you introduced to libcamera. Along
>> >> > >> the way I noticed that you have implemented the apply methods
>> >> > >> (applyAGC, applyAWB, etc.) as part of the IPARPi class.
>> >> > >>
>> >> > >> I was wondering if it would be possible to move those to the
>> >> > >> controller and specific algorithm implementation, as they mostly get
>> >> > >> the status from the algorithm, perform a few calculations with it and
>> >> > >> insert the results of those calculations into the ISP controls.
>> >> > >>
>> >> > >> The upside of doing so, from my point of view, would be that we could
>> >> > >> call `apply` similar to the way we call `initialize`, `process` and
>> >> > >> `prepare`.
>> >> > >> That way, I would be able to generalize the whole step of applying
>> >> > >> the algorithm status to a set of controls, which can then be used for
>> >> > >> setting the ISP controls.
>> >> > >> Now, the reason why I ask you is that I want to be sure that I didn't
>> >> > >> miss a key concept.
>> >> > >
>> >> > >
>> >> > > This was indeed something I did consider doing some time back, but
>> >> > > never got to it.  There are a couple of considerations with this
>> >> > > proposal that we need to address:
>> >> > >
>> >> > > 1) My original intention with keeping apply* out of the Controllers
>> >> > > was to *somewhat* decouple the BCM2835 ISP parameters from the
>> >> > > Controller algorithms.  I say somewhat because some algorithms do
>> >> > > have to know the statistics structure defined in the bcm2835-isp.h
>> >> > > header, so there is already a hardware dependency there.  Secondly,
>> >> > > I wanted to try and not include any libcamera dependency in the
>> >> > > Controller code.  We do now have libcamera logging in the controller
>>
>> Could you elaborate on why you wanted to keep the libcamera dependencies
>> out of it? Is it just to decouple it from libcamera elements that might
>> change and break things? I am asking because my plan is to make the
>> Controller class generally accessible to all IPAs, and so if I touch the
>> class it is probably best if I do so without breaking the initial
>> design.
>>
>> >> > > source files, so again, this is also not strictly true.
>> >> > >
>> >> > > 2) applyAGC() is a special one in that it runs after
>> >> > > Controller::Process() whereas all the others run after
>> >> > > Controller::Prepare.  applyAGC() also runs on IPA::start() so this
>> >> > > must be treated differently somehow.
>>
>> Okay, so `Prepare` gets the current status of the algorithm and puts it
>> into the Metadata to be accessed, which means that in the case of the
>> AGC algorithm we apply the status from the last frame.
>>
>> So, if I understand the problem properly, whether we call `Apply` before
>> fetching the statistics data of the current frame or after receiving the
>> statistics data and calling `Prepare`, we still want to call the same
>> `Apply` method (meaning we don't need different versions of `Apply`
>> depending on when it is called). It is therefore more a matter of
>> deciding which algorithm status to apply to the controls at a given
>> moment and which not.
>>
>> >> > >
>> >> > > In a way, having a Controller::Apply() might be quite nice, if we
>> >> > > are able to somehow add a HW abstraction layer (through templated
>> >> > > functions perhaps?).  Or does it even matter? David, what are your
>> >> > > thoughts on this?
>> >> >
>> >> > So I remain quite strongly in favour of keeping hardware-specific code
>> >> > away from the algorithms. As Naush says, they do get a bit mixed
>> >> > together because of the statistics, but for the moment we have to live
>> >> > with this. We could imagine a formal API which takes the statistics
>> >> > from the hardware (whatever that is) and produces the numbers the
>> >> > algorithms want, though sometimes you could imagine tweaking the
>> >> > algorithm slightly might be better than going through lots of hoops
>> >> > with statistics, so that's probably one to think about.
>> >>
>> >> Keeping the implementation of algorithms hardware-agnostic would be
>> >> great, but it's a hard problem. We could imagine translating all
>> >> statistics to a standard format before passing them to algorithms, but
>> >> that will have a performance impact, and could also result in
>> >> information loss as a format that would be a superset of all possible
>> >> hardware features is likely impossible. We need to figure out some
>> >> middleground, perhaps in the form of device-agnostic algorithm "pieces"
>> >> that would be assembled by device-specific algorithm "implementations".
>> >> There's room for research and clever designs.
>>
>> Proposal for general statistics data:
>> =====================================
>>
>> General idea:
>> =============
>>
>> Okay, so I have thought about this and while I am not perfectly
>> satisfied with my current result, I still hope that it might lead in
>> the right direction.
>> My first approach is to define a set of inputs expected by a particular
>> algorithm to produce a well-defined output. The idea behind that thought
>> is that the algorithm should declare what data it expects, rather than
>> the hardware dictating which data the algorithm has to process.
>> The advantage, from my point of view, is that we could define generalized
>> algorithms to be used for multiple platforms (like the first algorithms
>> in the libipa/ folder from JM (https://rb.gy/o5afyv)) and that the
>> algorithms don't care about the hardware implementation.
>> And while the statistics data may look different for each platform, if
>> the IPA algorithm implementation uses the same underlying algorithm it
>> will most likely require the same set of input data.
>> This means the idea rests on the assumption that if an algorithm
>> requires x and y to generate z, then there are only two possibilities
>> for each platform: A) we can use the algorithm because the statistical
>> data can be converted to x and y, or B) we cannot use the algorithm on
>> this platform.
>>
>> Describing the idea for the AWB algorithm:
>> ==========================================
>>
>> I will try to explain my strategy using the AWB algorithms as an example.
>> For calculating the correct color temperature we have multiple algorithms
>> available, but let's say we use the grey world algorithm and the Bayes
>> algorithm. This would mean that we define two alternative data formats
>> for the AWB algorithm implementation (this does not claim to be
>> complete):
>>
>> ```
>> #include <variant>
>> #include <vector>
>>
>> struct awbGreyWorld {
>>         unsigned int counted;
>>         unsigned long long redSum;
>>         unsigned long long greenSum;
>>         unsigned long long blueSum;
>> };
>>
>> struct awbBayes {
>>         unsigned int counted;
>>         unsigned long long redSum;
>>         unsigned long long greenSum;
>>         unsigned long long blueSum;
>>         long long lux;
>> };
>>
>> struct statsData {
>>         std::variant<
>>                 std::vector<struct awbGreyWorld>,
>>                 struct awbBayes
>>         > awbStats;
>> };
>> ```
>>
>> Each algorithm is then associated with a fitting data format, which is
>> declared with a tag within the tuning file. Similar to how we already
>> work with algorithms, the data formats should be created with a static
>> function during start-up and retrieved later by their name. If we
>> implement a new algorithm, we therefore have to add an additional data
>> format.
>> The data format acts as an additional layer that converts the data from
>> specific to general and from general to specific, which means each
>> platform implements a data format for each used algorithm.
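>>
>> To make this a bit more concrete, a very rough sketch of such a
>> data-format class could look like the following (all names here are just
>> placeholders and not the actual snippet code; it assumes the statsData
>> struct from above, and HwStats/HwControls stand in for the
>> platform-specific statistics and ISP parameters):
>>
>> ```
>> struct HwStats;
>> struct HwControls;
>>
>> class DataFormat
>> {
>> public:
>>         virtual ~DataFormat() = default;
>>
>>         /* Specific -> general: fill the generalized statistics. */
>>         virtual statsData convertStats(const HwStats &hw) const = 0;
>>
>>         /* General -> specific: turn the general algorithm results into
>>          * the controls/parameters of this platform's ISP. */
>>         virtual void applyResults(const struct algorithmResults &results,
>>                                   HwControls &ctrls) const = 0;
>> };
>> ```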
>>
>> I have created a minimal example of that procedure here:
>> https://gitlab.com/-/snippets/2123926
>>
>> The snippet also contains diagrams that try to visualize my ideas.
>>
>> The controller then receives the `struct statsData` structure, which
>> contains the generalized statistics data for all used algorithms, via
>> the `controller_->Process()` call.
>>
>> In case the hardware implementation cannot provide the required data for
>> the chosen algorithm (no data format exists for the algorithm on that
>> platform), the algorithm will throw a run-time error, because it checks
>> with a visitor pattern that the variant holds the correct structure.
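>>
>> A minimal sketch of that check (here simply with std::holds_alternative
>> instead of a full visitor, and assuming the structs from above):
>>
>> ```
>> #include <stdexcept>
>> #include <variant>
>>
>> /*
>>  * Inside the Bayes AWB algorithm: make sure the generalized statistics
>>  * actually contain the awbBayes alternative before using it.
>>  */
>> const struct awbBayes &getBayesStats(const struct statsData &stats)
>> {
>>         if (!std::holds_alternative<struct awbBayes>(stats.awbStats))
>>                 throw std::runtime_error("no Bayes AWB statistics on this platform");
>>
>>         return std::get<struct awbBayes>(stats.awbStats);
>> }
>> ```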
>>
>> The `controller_->Prepare()` call writes the internal results to the
>> Metadata, and finally the `Apply` routines convert the Metadata into the
>> general results of the specific algorithm.
>> As mentioned by David, it might be beneficial to implement two apply
>> routines (one that creates controls from the last frame's Metadata and
>> one that creates them from the current Metadata). But instead of setting
>> the controls directly we write the finished calculations to a general
>> `struct algorithmResults`, which contains the results of the used
>> algorithms.
>> This further ensures that we encapsulate the controller and algorithms
>> from the hardware details. It is the job of the specific data format to
>> convert those results to the proper controls, as it is allowed to know
>> about the hardware.
>> But I am not entirely convinced about that part, as the results look
>> very similar to the Metadata, so maybe we just use the Metadata directly
>> within the data formats.
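>>
>> Just to sketch what I have in mind for that container (the fields here
>> are purely illustrative placeholders, not actual status definitions):
>>
>> ```
>> /* Illustrative per-algorithm result structs. */
>> struct awbResult { double gainR, gainB, colourTemperature; };
>> struct agcResult { double shutter, analogueGain; };
>>
>> struct algorithmResults {
>>         struct awbResult awb;
>>         struct agcResult agc;
>>         /* ... one entry per algorithm that produced a result ... */
>> };
>> ```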
>>
>> My goal, in this case, would be to have exactly one `greyworld` algorithm
>> in the libcamera source tree and not one algorithm for each platform doing
>> roughly the same thing.
>>
>> But the question is how we provide data that might not be available
>> when processing the statistics data (for example, the lux value in the
>> RPi AWB Bayes algorithm). The visible problem for me is that we might
>> receive that data through different routes. For example, on the RPi we
>> get the lux value from another algorithm by calculating:
>> ```
>> double estimated_lux = shutter_speed_ratio * gain_ratio *
>>                aperture_ratio * aperture_ratio *
>>                Y_ratio * reference_lux_;
>> ```
>> What would we do if the illuminance is directly part of the ISP
>> statistics data? In that case, it would be a detour to first save that
>> value into the lux status before it is picked up by the AWB Bayes
>> algorithm.
>> One idea would be to make the illuminance value directly part of the
>> generalized statistics data for the AWB Bayes algorithm. In that case
>> direct data from the ISP could be used, and if the ISP doesn't provide
>> statistical data for the illuminance we could use the calculation from
>> the last frame (Metadata) or the default value from the tuning file.
>> Do you see potential complications if we rely on the statistical data
>> from the last frame?
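>>
>> In pseudo code the fallback order I have in mind would be roughly this
>> (just a sketch, all the names are made up):
>>
>> ```
>> #include <optional>
>>
>> /* Hypothetical sketch: choose the lux value fed into the Bayes AWB input. */
>> double selectLux(std::optional<double> ispLux,      /* from the ISP statistics */
>>                  std::optional<double> metadataLux, /* lux status of the last frame */
>>                  double tuningDefaultLux)           /* default from the tuning file */
>> {
>>         if (ispLux)
>>                 return *ispLux;
>>         if (metadataLux)
>>                 return *metadataLux;
>>         return tuningDefaultLux;
>> }
>> ```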
>>
>> Different possible locations for the conversion:
>> ================================================
>>
>> I have two variations in mind for how to implement the new generalized
>> routines; I have tried to visualize them here: https://rb.gy/pb1qk9
>>
>> The first version presents the main idea, which makes the controller and
>> algorithms hardware-agnostic by adding an abstraction layer between
>> the IPA class and the controller. That layer is used to convert the
>> statistics data and the resulting controls.
>> The second version is a bit more ambitious. Here I thought about
>> performing that step within the Pipeline Handler. While thinking about
>> that solution, I got quite excited by the thought of maybe making the
>> whole IPA class hardware-agnostic.
>> I know that plan would involve a lot more work than just the statistics
>> data handling, but I thought I should mention it, as having a single IPA
>> handler for all platforms sounds great (in theory).
>>
>> New folder structure:
>> =====================
>>
>> My current design would lead to a change of the folder structure within
>> `src/ipa`; here is a quick sketch of the new structure:
>> ```
>> libcamera
>> └── src
>>     ├── ipa
>>     │   ├── ipu3
>>     │   │   ├── formats
>>     │   │   │   └── ...
>>     │   │   ├── data
>>     │   │   └── ipu3.cpp
>>     │   ├── libipa
>>     │   │   ├── algorithms
>>     │   │   │   ├── awbGreyWorld.cpp
>>     │   │   │   ├── ...
>>     │   │   │   └── blackLevelSubtraction.cpp
>>     │   │   ├── controller.cpp
>>     │   │   └── algorithm.cpp
>>     │   ├── raspberrypi
>>     │   │   ├── formats
>>     │   │   │   └── ...
>>     │   │   ├── data
>>     │   │   └── raspberrypi.cpp
>>     │   ├── rkisp1
>>     │   │   ├── formats
>>     │   │   │   ├── awbGreyWorldFormat.cpp
>>     │   │   │   ├── ...
>>     │   │   │   └── blackLevelSubtractionFormat.cpp
>>     │   │   ├── data
>>     │   │   └── rkisp1.cpp
>>     │   └── vimc
>>     │       └── data
>>     ├── libcamera
>>         ...
>> ```
>>
>> I am quite certain that you have already thought about the further
>> structure of the IPA folder as well, so I am anticipating your ideas :).
>>
>> Potential problems:
>> ===================
>>
>> One problem that does arise when we split every algorithm into its own
>> file and assign a specific data format to it is that we cannot switch
>> as smoothly between different algorithms as, for example, in
>> `src/ipa/raspberrypi/controller/rpi/awb.cpp`, where a simple boolean
>> determines which algorithm we choose. From my point of view, it would be
>> cleaner to split each algorithm into a separate file for maintainability
>> and reusability, so I will have to find a solution for this. In the
>> current code we fall back to the greyworld algorithm in case the color
>> temperature curve or the priors were not configured.
>> Just to get the simplest possibility out of the way: this seems to be a
>> way to prioritize the Bayes algorithm over the greyworld algorithm when
>> the user supplied the correct configuration. If the user explicitly
>> chooses the Bayes algorithm within the tuning file and has a malformed
>> configuration, we could simply abort without using a fallback, right?
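>>
>> As a sketch (all placeholder names again, not real libcamera types), the
>> selection could then look roughly like this:
>>
>> ```
>> #include <memory>
>> #include <stdexcept>
>> #include <string>
>>
>> /* Hypothetical tuning data and algorithm classes, for illustration only. */
>> struct TuningConfig {
>>         std::string awbAlgorithm;
>>         bool hasCtCurve;
>>         bool hasPriors;
>> };
>>
>> class AwbAlgorithm { public: virtual ~AwbAlgorithm() = default; };
>> class AwbGreyWorld : public AwbAlgorithm {};
>> class AwbBayes : public AwbAlgorithm {};
>>
>> std::unique_ptr<AwbAlgorithm> createAwb(const TuningConfig &config)
>> {
>>         if (config.awbAlgorithm == "bayes") {
>>                 /* Bayes was requested explicitly: abort on a broken config. */
>>                 if (!config.hasCtCurve || !config.hasPriors)
>>                         throw std::runtime_error("incomplete Bayes AWB configuration");
>>                 return std::make_unique<AwbBayes>();
>>         }
>>
>>         /* Otherwise fall back to greyworld, which needs no extra configuration. */
>>         return std::make_unique<AwbGreyWorld>();
>> }
>> ```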
>>
>> Another area that I have not looked into so far is the whole IPA
>> isolation topic. My current attempts mostly focused on making it
>> possible to create general open source algorithms to be used by all, but
>> maybe some manufacturers would like to create an algorithm
>> implementation for their platform and keep it closed source. If we want
>> to continue with my idea, I will have to dive deeper into that, I
>> suppose.
>>
>> And my current proof of concept doesn't involve any concurrency
>> handling so far.
>>
>> Summary:
>> ========
>>
>> So in summary, each platform is responsible for the conversion of data
>> in order to isolate the algorithms from the hardware layer. I see a few
>> pros and cons with my approach:
>>
>> Pros:
>> -----
>> - Abstract the Controller & the algorithms completely from the hardware
>> - Easy addition of new platforms: all you have to do is determine which
>>   algorithms can be used with the given statistical data and implement
>>   data formats for each chosen algorithm to prepare the data for it.
>> - Reduce the codebase size by not repeating algorithm boilerplate code
>>   and implementations for every platform
>>
>> Cons:
>> -----
>> - It adds an additional layer, therefore it might hurt performance
>>   (even though at least the input conversion would have to be done
>>   anyway)
>> - Some platforms might not be able to share that many algorithms, as
>>   their statistical data cannot be converted to the required input for a
>>   specific algorithm
>>
>> >>
>> >> > Nonetheless, I really don't want more hardware knowledge getting into
>> >> > the algorithms, nor would I want to see algorithms making direct calls
>> >> > that will set hardware.
>> >>
>> >> Fully agreed on the second part, the IPA design explicitly forbids
>> >> accessing the device from algorithms.
>> >>
>> >
>> >I think the idea here is to allow Controller::Apply() to only populate the
>> >ControlList items, and the IPA to send that ControlList to the PH to apply
>> >the parameters as V4L2 controls, as is done today.
>> >
>> >
>> >>
>> >> > Similarly, I really don't want one algorithm
>> >> > to make direct calls that affect another algorithm - this always
>> >> > starts quite benignly but can ultimately tend towards spaghetti. The
>> >> > rule that we have is that this is all communicated through metadata.
>> >>
>> >> While exceptions may sometimes be needed, I really want to follow that
>> >> rule.
>> >>
>> >> > But I think that still leaves us with options. Controller::Apply()
>> >> > sounds like a reasonable thing to have, so long as it doesn't touch
>> >> > the algorithms and only uses the metadata that they produced. Perhaps
>> >> > we need a "PreApply" and a "PostApply", to allow for them to operate
>> >> > either before or after the ISP.
>>
>> >> > One more point to consider is what we'd do with imaging pipelines that
>> >> > have a different architecture. One obvious case would be the "straight
>> >> > through" ISP where the image buffer goes straight from the CSI-2
>> >> > receiver into the ISP and only reaches memory after that. Another
>> >> > variant on that might be a hybrid architecture, where some processing
>> >> > happens in this "straight through" manner, but then there's a "main"
>> >> > ISP which runs memory-to-memory.
>> >>
>> >> The former is what the rkisp1 implements, the latter is also needed to
>> >> support new platforms. We usually call the "straight through" part the
>> >> "inline" ISP, and the memory-to-memory part the "offline" ISP. There
>> >> will usually be a single inline ISP, but we could have multiple offline
>> >> components (hardware ISPs, general-purpose DSPs, GPUs, ...) in the
>> >> pipeline.
>> >>
>> >> > How would that look? It's probably
>> >> > worth giving these some consideration now, rather than inventing
>> >> > something for our current hardware platforms and finding ourselves in
>> >> > a pickle later!
>>
>> Alright, so let me try to understand the issue correctly.
>> Case 1: Raspberry Pi: (offline ISP)
>>     Send request to camera -> get image/embedded data -> prepare ISP
>>     controls -> send request to ISP -> get processed image/statistics
>>     data -> send statistics data to IPAs -> process new metadata ->
>>     repeat
>>
>> Case 2: RkISP1: (inline ISP)
>>     Send request to main-/self-path device -> Receive image/statistics
>>     data -> send statistics to IPAs -> process new metadata -> create
>>     parameter buffer and send to ISP to set controls for the next frame
>>     -> repeat
>>
>> Case 3: (hybrid ISP)
>>     send request to inline ISP device -> Receive image/statistics data
>>     -> send statistics to IPAs -> process new metadata -> set controls
>>     for inline ISP (either directly or per buffer) -> send request to
>>     offline ISP -> Receive image/statistics data -> send statistics to
>>     IPAs -> process new metadata -> set controls for offline ISP ->
>>     etc... -> repeat
>>
>> I didn't have an example for case 3, but I think it could look like that,
>> am I utterly wrong?
>>
>> From my point of view, we always have the same cycle:
>> - Send a request to the ISP (Either we fetched the image data from the
>>   camera or that part is handled internally, but it all boils down to,
>>   'send raw data to the ISP')
>> - Receive the processed image data and the statistics
>> - Process the statistics
>> - Set the settings of the ISP for the next frame
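>>
>> Expressed as a sketch (everything here is made-up placeholder code, just
>> to show the shape of the loop):
>>
>> ```
>> /* Placeholder types and functions, only to illustrate the common cycle. */
>> struct Frame {};
>> struct Stats {};
>> struct Metadata {};
>>
>> Frame sendToIsp(const Frame &raw);               /* 'send raw data to the ISP' */
>> Stats fetchStatistics();                         /* statistics for that frame */
>> Metadata processStatistics(const Stats &stats);  /* run the IPA algorithms */
>> void applyControls(const Metadata &metadata);    /* program the ISP for the next frame */
>>
>> void captureLoop(const Frame &raw, bool &running)
>> {
>>         while (running) {
>>                 Frame processed = sendToIsp(raw);
>>                 (void)processed; /* handed back to the application */
>>                 Stats stats = fetchStatistics();
>>                 Metadata metadata = processStatistics(stats);
>>                 applyControls(metadata);
>>         }
>> }
>> ```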
>>
>> And the IPAs shouldn't care what differentiates the hardware as all
>> they do is transform input data into output data. I guess if we have
>> multiple ISPs in the pipeline, we probably need a controller instance
>> for each ISP so that they don't affect each other. But I currently
>> don't see a completely different workflow.
>>
>> Could you elaborate on which potential problems you had in mind?
>>
>> >> >
>> >> > >> --
>> >> > >>
>> >> > >> Additionally, a question off-topic from the above. While
>> >> > >> investigating the RPiStream abstraction, I noticed the following
>> >> > >> comment within `rpi_stream.cpp`:116
>> >> > >> ```
>> >> > >>   * A nullptr buffer implies an external stream, but no external
>> >> > >>   * buffer has been supplied in the Request. So, pick one from the
>> >> > >>   * availableBuffers_ queue.
>> >> > >>   ```
>> >> > >>
>> >> > >>   And within `rpi_stream.h`:139
>> >> > >>   ```
>> >> > >>   * A nullptr indicates any internal buffer can be used (from availableBuffers_),
>> >> > >>   * whereas a valid pointer indicates an external buffer to be queued.
>> >> > >>   ```
>> >> > >>
>> >> > >>   Which confused me a little bit; could you enlighten me in this
>> >> > >>   matter? Does a nullptr buffer imply an external stream and
>> >> > >>   therefore an external buffer? Or does a nullptr buffer imply an
>> >> > >>   internal buffer and therefore an internal stream?
>> >> > >
>> >> > >
>> >> > > I can see why this caused some confusion :-)
>> >> > >
>> >> > > Firstly, my definitions of external and internal streams may need
>> >> > > clarification.  The former is a stream that *may* be exported to the
>> >> > > application, and the latter is a stream that can never be exported
>> >> > > to the application.  For external streams, the buffers used may be
>> >> > > provided by the application, and these buffers will be returned back
>> >> > > to the application once they have been processed.  However, we can
>> >> > > also use internally allocated buffers for external streams that will
>> >> > > never go back to the application.  This is needed when:
>> >> > >
>> >> > > 1) The application has not provided an external buffer to use for
>> >> > > that stream in a Request.  Example of this is when we have a RAW
>> >> > > stream set to external, but that application will only request a RAW
>> >> > > image once every N frames.
>> >> > > 2) If we need to drop frames as instructed by the IPA, we cannot use
>> >> > > external buffers as they will eventually be dropped and not returned
>> >> > > to the application.  In this case, we queue an internal buffer that
>> >> > > can be dropped safely.  The alternative is to requeue external
>> >> > > buffers once dropped, but that will add a ton of latency to the
>> >> > > capture cycle.
>> >> > >
>> >> > > Hope that helps clarify things a bit.
>>
>> Indeed, that helps to understand the situation, thank you!
>>
>> >>
>> >> --
>> >> Regards,
>> >>
>> >> Laurent Pinchart
>> >>
>>
>>

