[libcamera-devel] apply methods in IPARPi

David Plowman david.plowman at raspberrypi.com
Tue May 25 16:09:08 CEST 2021


Hi Sebastian

Thanks for your message. This is such a huge topic, you have my admiration!

Before getting too much into the details, I was wondering whether it
might be worth setting up a conference call to discuss some of these
things with less reply latency? Besides yourself and Laurent, would
anyone else be interested in joining such a call?

I know we'd like to understand the bigger picture here, so I've listed
some high-level discussion points, bundled under a few different
headings. Please feel free to add items to this list!

"One size fits all algorithms"?

* Are algorithms tuned to particular pipelines?
* Do we risk "lowest common denominator" algorithms?
* Will 3rd parties be able to implement their own?

Testing

* Testing control algorithms is really hard.
* How would you test a platform that you don't have? Or a sensor you don't have?

Ownership

* What if company ABC breaks a feature company XYZ relies on?
* Would they fork the entire repo? Or should we encourage
vendor-specific implementations, effectively "forked" within the code
tree?

Algorithms other than AWB...

* AEC/AGC is heavily dependent on the type of statistics it gets. I
shan't list all the variations here, but can we handle them?
* People tend to forget ALSC...

Camera Tuning

* Most algorithms require tuning. Where and how will we store this?
* How to cope with different modules that use the same sensor? And
module-to-module variation?
* Algorithms that we supply will need a tuning tool, right?

Anyway, let me know if you'd like me to set up a call. Would later
this week or early next week suit? (Monday is a public holiday in the
UK)

Thanks and best regards

David

On Tue, 25 May 2021 at 06:32, Sebastian Fricke
<sebastian.fricke at posteo.net> wrote:
>
> Hey Naushir, David and Laurent + CC Jean-Michel,
>
> sorry for taking such a long time to reply, I am still new to the whole
> media subsystem and libcamera, so I needed a bit more time to think
> stuff through (and I am probably still wrong on some subjects).
>
> I have tried to address the general statistics problem, with a design
> idea + a minimal implementation of that idea. I am sorry that this
> response got quite large, but I really wanted to discuss my idea
> properly. If you would like me to propose ideas in a different format
> in the future (pdf, markdown, presentation, etc.), then please just tell me. :)
>
> On 06.05.2021 14:45, Naushir Patuck wrote:
> >On Thu, 6 May 2021 at 14:32, Laurent Pinchart <
> >laurent.pinchart at ideasonboard.com> wrote:
> >
> >> Hello,
> >>
> >> On Thu, May 06, 2021 at 09:45:38AM +0100, David Plowman wrote:
> >> > On Thu, 6 May 2021 at 09:04, Naushir Patuck <naush at raspberrypi.com>
> >> wrote:
> >> > > On Wed, 5 May 2021 at 19:08, Sebastian Fricke <
> >> sebastian.fricke at posteo.net> wrote:
> >> > >>
> >> > >> Hello Naushir and David,
> >> > >>
> >> > >> I am currently performing a lot of investigation and preparatory
> >> > >> work for a few generalizations in the form of an IPAFramework
> >> > >> class that reuses some of the great concepts you introduced to
> >> > >> libcamera. Along the way I noticed that you have implemented the
> >> > >> apply methods (applyAGC, applyAWB, etc.) as part of the IPARPi
> >> > >> class.
> >> > >>
> >> > >> I was wondering if it would be possible to move those to the
> >> > >> controller and specific algorithm implementation, as they mostly
> >> > >> get the status from the algorithm, perform a few calculations
> >> > >> with it and insert the results of those calculations into the
> >> > >> ISP controls.
> >> > >>
> >> > >> The upside of doing so, from my point of view, would be that we
> >> > >> could call `apply` similar to the way we call `initialize`,
> >> > >> `process` and `prepare`.
> >> > >> That way, I would be able to generalize the whole step of
> >> > >> applying the algorithm status to a set of controls, which can
> >> > >> then be used for setting the ISP controls.
> >> > >> Now, the reason why I ask you is that I want to be sure that I
> >> > >> didn't miss a key concept.
> >> > >
> >> > >
> >> > > This was indeed something I did consider doing some time back, but
> >> > > never got to it.  There are a couple of considerations with this
> >> > > proposal that we need to address:
> >> > >
> >> > > 1) My original intention with keeping apply* out of the Controllers
> >> > > was to *somewhat* decouple the BCM2835 ISP parameters from the
> >> > > Controller algorithms.  I say somewhat because some algorithms do
> >> > > have to know the statistics structure defined in the bcm2835-isp.h
> >> > > header, so there is already a hardware dependency there.  Secondly,
> >> > > I wanted to try and not include any libcamera dependency in the
> >> > > Controller code.  We do now have libcamera logging in the controller
>
> Could you elaborate why you wanted to keep the libcamera dependencies
> out of it? Is it just to decouple it from libcamera elements that might
> change and break stuff? I am just asking because my plan is to make the
> Controller class generally accessible to all IPAs, and so if I touch the
> class it is probably best to do so without breaking the initial
> design.
>
> >> > > source files, so again, this is also not strictly true.
> >> > >
> >> > > 2) applyAGC() is a special one in that it runs after
> >> > > Controller::Process() whereas all the others run after
> >> > > Controller::Prepare.  applyAGC() also runs on IPA::start() so this
> >> > > must be treated differently somehow.
>
> Okay, so `Prepare` gets the current status of the algorithm and puts it
> into the Metadata to be accessed, which means that in the case of the
> AGC algorithm, during apply we use the status from the last frame.
>
> So, if I understand the problem properly, no matter whether we call
> `Apply` before fetching the statistics data for the current frame, or
> after receiving the statistics data and calling `Prepare`, we still
> want to call the same `Apply` method (meaning we don't need different
> versions of `Apply` depending on the moment it is called). It is
> therefore more a matter of deciding which algorithm status to apply to
> the controls at a given moment and which not.
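>
> To make this concrete, here is a minimal sketch of what I mean (the
> signature and member names are made up, not a proposal for the final
> API):
>
> ```
> /* Sketch: a single Apply entry point; the caller decides which
>  * Metadata snapshot to feed it. */
> void Controller::Apply(Metadata *imageMetadata, ControlList *ctrls)
> {
>         for (auto &algorithm : algorithms_)
>                 algorithm->Apply(imageMetadata, ctrls);
> }
>
> /* On IPA::start() and after Process(): apply the last frame's status. */
> controller_.Apply(&lastFrameMetadata_, &ctrls);
> /* After Prepare(): apply the current frame's status. */
> controller_.Apply(&currentMetadata_, &ctrls);
> ```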
>
> >> > >
> >> > > In a way, having a Controller::Apply() might be quite nice, if we
> >> > > are able to somehow add a HW abstraction layer (through templated
> >> > > functions perhaps?).  Or does it even matter? David, what are your
> >> > > thoughts on this?
> >> >
> >> > So I remain quite strongly in favour in keeping hardware specific code
> >> > away from the algorithms. As Naush says, they do get a bit mixed
> >> > together because of the statistics, but for the moment we have to live
> >> > with this. We could imagine a formal API which takes the statistics
> >> > from the hardware (whatever that is) and produces the numbers the
> >> > algorithms want, though sometimes you could imagine tweaking the
> >> > algorithm slightly might be better than going through lots of hoops
> >> > with statistics, so that's probably one to think about.
> >>
> >> Keeping the implementation of algorithms hardware-agnostic would be
> >> great, but it's a hard problem. We could imagine translating all
> >> statistics to a standard format before passing them to algorithms, but
> >> that will have a performance impact, and could also result in
> >> information loss as a format that would be a superset of all possible
> >> hardware features is likely impossible. We need to figure out some
> >> middleground, perhaps in the form of device-agnostic algorithm "pieces"
> >> that would be assembled by device-specific algorithm "implementations".
> >> There's room for research and clever designs.
>
> Proposal for general statistic data:
> ====================================
>
> General idea:
> =============
>
> Okay, so I have thought about this and while I am not perfectly
> satisfied with my current result, I still hope that it might lead in
> the right direction.
> My first approach is to define the set of inputs a particular algorithm
> expects in order to produce a well-defined output. The idea behind that
> is that the algorithm should declare what data it expects, rather than
> the hardware dictating which data the algorithm has to process.
> The advantage from my point of view is that we could define generalized
> algorithms to be used for multiple platforms (like the first algorithms
> in the libipa/ folder from JM (https://rb.gy/o5afyv)) and that the
> algorithms don't care about the hardware implementation.
> And while the statistics data may look different for each platform, if
> the IPA algorithm implementation uses the same underlying algorithm it
> will most likely require an equal set of input data.
> This means the idea rests on the assumption that if an algorithm
> requires x and y to generate z, then we only have two possibilities for
> each platform: A) we can use the algorithm because the statistical data
> can be converted to x and y, or B) we cannot use the algorithm for this
> platform.
>
> Describing the idea for the AWB algorithm:
> ==========================================
>
> I will try to explain my strategy using the AWB algorithms as an example.
> For calculating the correct color temperature, we have multiple available
> algorithms but let's say we use the grey world algorithm and the Bayes
> algorithm. This would mean that we define two alternative data formats
> for the AWB algorithm implementation:
> (This does not claim to be complete)
>
> ```
> #include <variant>
> #include <vector>
>
> struct awbGreyWorld {
>         unsigned int counted;
>         unsigned long long redSum;
>         unsigned long long greenSum;
>         unsigned long long blueSum;
> };
>
> struct awbBayes {
>         unsigned int counted;
>         unsigned long long redSum;
>         unsigned long long greenSum;
>         unsigned long long blueSum;
>         long long lux;
> };
>
> struct statsData {
>         std::variant<
>                 std::vector<awbGreyWorld>,
>                 awbBayes
>         > awbStats;
> };
> ```
>
> Each algorithm is then associated with a fitting data format, which is
> declared with a tag within the tuning file. Similar to how we already
> work with algorithms, the formats should be created by a static function
> during start-up and retrieved later by their name. If we implement a
> new algorithm, we therefore have to add an additional data format.
> The data format acts as an additional layer that converts the data from
> specific to general and from general to specific, which means each
> platform implements a data format for each used algorithm.
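>
> To illustrate the registration idea, here is a rough sketch (all names
> are hypothetical and only meant to show the mechanism):
>
> ```
> #include <functional>
> #include <map>
> #include <memory>
> #include <string>
>
> /* Hypothetical interface: converts hardware statistics into the
>  * generalized structure the algorithm expects. */
> class DataFormat
> {
> public:
>         virtual ~DataFormat() = default;
>         virtual statsData Convert(const void *hwStats) = 0;
> };
>
> using FormatFactory = std::function<std::unique_ptr<DataFormat>()>;
>
> /* Start-up registration by name, mirroring how the algorithms are
>  * already created from their tuning file tag. */
> static std::map<std::string, FormatFactory> &FormatRegistry()
> {
>         static std::map<std::string, FormatFactory> registry;
>         return registry;
> }
>
> std::unique_ptr<DataFormat> CreateFormat(const std::string &name)
> {
>         auto it = FormatRegistry().find(name);
>         return it != FormatRegistry().end() ? it->second() : nullptr;
> }
> ```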
>
> I have created a minimal example of that procedure here:
> https://gitlab.com/-/snippets/2123926
>
> The snippet also contains diagrams that try to visualize my ideas.
>
> The controller then receives the `struct statsData` structure,
> which will contain the generalized statistics data for all used
> algorithms, via the `controller_->Process()` call.
>
> In case the hardware implementation cannot provide the required data for
> the chosen algorithm (no data format exists for the algorithm on the
> platform), the algorithm will throw a runtime error, because it verifies
> with a visitor pattern that the variant holds the correct structure.
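>
> In code, that check could look roughly like this (a sketch using
> std::get_if rather than a full visitor, but the idea is the same; the
> method signature is made up):
>
> ```
> #include <stdexcept>
> #include <variant>
>
> void AwbBayes::Process(const statsData &stats, Metadata *imageMetadata)
> {
>         /* Verify the variant holds the structure this algorithm needs. */
>         const awbBayes *data = std::get_if<awbBayes>(&stats.awbStats);
>         if (!data)
>                 throw std::runtime_error("AwbBayes: no data format for this platform");
>
>         /* ... use data->redSum, data->greenSum, data->blueSum, data->lux ... */
> }
> ```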
>
> The `controller_->Prepare()` call writes the internal results to the
> Metadata. And finally the `Apply` routines convert the Metadata into
> general results for the specific algorithm.
> As mentioned by David, it might be beneficial to implement two apply
> routines (one that creates controls from the last frame's Metadata and
> one that creates them from the current Metadata). But instead of setting
> the controls directly, we write the finished calculations to a general
> `struct algorithmResults`, which contains the results of the used
> algorithms.
> This further ensures that we encapsulate the controller and algorithms
> from the hardware details. It is the job of the specific data format
> to convert those results to the proper controls, as it is allowed to
> know about the hardware.
> But I am not entirely convinced about that part, as the results look
> very similar to the Metadata, so maybe we should just use the Metadata
> directly within the data formats.
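>
> As a rough sketch, such a structure might look like this (the fields
> are only examples and by no means complete):
>
> ```
> /* General, hardware-agnostic results, filled by the Apply routines and
>  * converted into ISP controls by the platform's data formats. */
> struct algorithmResults {
>         struct {
>                 double gainR;
>                 double gainB;
>                 double colourTemperature;
>         } awb;
>         struct {
>                 double analogueGain;
>                 double shutterSpeed; /* in microseconds */
>         } agc;
>         /* ... one entry per used algorithm ... */
> };
> ```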
>
> My goal, in this case, would be to have exactly one `greyworld` algorithm
> in the libcamera source tree and not one algorithm for each platform doing
> roughly the same thing.
>
> But the question is how we provide data that might not be available
> when processing the statistics data (for example, the lux value in the
> RPi AWB Bayes algorithm). The visible problem for me is that we might
> receive that data through different routes. For example, on the RPi we
> get the lux value from another algorithm by calculating:
> ```
> double estimated_lux = shutter_speed_ratio * gain_ratio *
>                aperture_ratio * aperture_ratio *
>                Y_ratio * reference_lux_;
> ```
> What would we do if the illuminance is directly part of the ISP
> statistics data? In that case, it would be a detour to first save that
> value into the lux status before the AWB Bayes algorithm receives
> it.
> One idea would be to use the illuminance value directly as part of the
> generalized statistics data for the AWB Bayes algorithm, in which case
> direct data from the ISP could be used, and if the ISP doesn't provide
> statistical data for illuminance, we could use the calculation from the
> last frame (Metadata) or the default value from the tuning file.
> Do you see potential complications if we rely on the statistical data
> from the last frame?
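>
> A sketch of that fallback chain (it reuses `Metadata::Get()` and
> `LuxStatus` from the RPi controller; `lookupLux` itself is a made-up
> helper):
>
> ```
> double lookupLux(const statsData &stats, Metadata *lastFrameMetadata,
>                  double tuningDefault)
> {
>         /* 1) Prefer an illuminance value measured directly by the ISP. */
>         if (const awbBayes *data = std::get_if<awbBayes>(&stats.awbStats)) {
>                 if (data->lux > 0)
>                         return data->lux;
>         }
>
>         /* 2) Fall back to the lux status computed for the last frame. */
>         struct LuxStatus luxStatus;
>         if (lastFrameMetadata &&
>             lastFrameMetadata->Get("lux.status", luxStatus) == 0)
>                 return luxStatus.lux;
>
>         /* 3) Last resort: the default value from the tuning file. */
>         return tuningDefault;
> }
> ```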
>
> Different possible locations for the conversion:
> ================================================
>
> I have two variations in mind for how to implement the new generalized
> routines; I have tried to visualize them here: https://rb.gy/pb1qk9
>
> The first version presents the main idea, which makes the controller and
> algorithms hardware-agnostic by adding an abstraction layer in between
> the IPA class and the controller. That layer is used to convert the
> statistics data and the resulting controls.
> The second version is a bit more ambitious. Here I thought about
> performing that step within the Pipeline Handler. While thinking about
> that solution, I got quite excited by the thought of maybe making the
> whole IPA class hardware-agnostic.
> I know that plan would involve a lot more work than just
> the statistics data handling, but I thought I should mention it, as
> having a single IPA handler for all platforms sounds great (in theory).
>
> New folder structure:
> =====================
>
> My current design would lead to a change of the folder structure within
> `src/ipa`, here is a quick sketch of the new structure:
> ```
> libcamera
> └── src
>     ├── ipa
>     │   ├── ipu3
>     │   │   ├── formats
>     │   │   │   └── ...
>     │   │   ├── data
>     │   │   └── ipu3.cpp
>     │   ├── libipa
>     │   │   ├── algorithms
>     │   │   │   ├── awbGreyWorld.cpp
>     │   │   │   ├── ...
>     │   │   │   └── blackLevelSubtraction.cpp
>     │   │   ├── controller.cpp
>     │   │   └── algorithm.cpp
>     │   ├── raspberrypi
>     │   │   ├── formats
>     │   │   │   └── ...
>     │   │   ├── data
>     │   │   └── raspberrypi.cpp
>     │   ├── rkisp1
>     │   │   ├── formats
>     │   │   │   ├── awbGreyWorldFormat.cpp
>     │   │   │   ├── ...
>     │   │   │   └── blackLevelSubtractionFormat.cpp
>     │   │   ├── data
>     │   │   └── rkisp1.cpp
>     │   └── vimc
>     │       └── data
>     ├── libcamera
>     │   ...
> ```
>
> I am quite certain that you have already thought about the further
> structure of the IPA folder as well, so I am anticipating your ideas :).
>
> Potential problems:
> ===================
>
> One problem that does arise when we split every algorithm into its own
> file and assign a specific data format to it, is that we cannot
> switch as smoothly between different algorithms as, for example, in
> `src/ipa/raspberrypi/controller/rpi/awb.cpp`, where a simple boolean
> determines which algorithm we choose. From my point of view, it
> would be cleaner to split each algorithm into a separate file for
> maintainability and reusability, so I will have to find a solution for
> this. In the current code we fall back to the greyworld algorithm in
> case the color temperature curve or the priors were not configured.
> Just to get the simplest possibility out of the way: this seems to be a
> way of prioritizing the Bayes algorithm over the greyworld algorithm
> when the user has supplied the correct configuration. If the user
> explicitly chooses the Bayes algorithm within the tuning file but has a
> malformed configuration, we could simply abort without using a fallback,
> right?
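>
> As a sketch, that selection could boil down to something like this
> (all names are hypothetical):
>
> ```
> std::unique_ptr<Algorithm> createAwb(const TuningData &tuning)
> {
>         /* An explicit choice in the tuning file wins; a malformed
>          * configuration is then a hard error, not a fallback. */
>         if (tuning.algorithm("awb") == "bayes") {
>                 if (!tuning.hasCtCurve() || !tuning.hasPriors())
>                         throw std::runtime_error("awb: Bayes requested but not fully configured");
>                 return std::make_unique<AwbBayes>(tuning);
>         }
>
>         /* Default: grey world, which needs no extra configuration. */
>         return std::make_unique<AwbGreyWorld>(tuning);
> }
> ```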
>
> Another area that I have not looked into so far is the whole IPA
> isolation topic. My current attempts mostly focused on making it
> possible to create general open source algorithms to be used by all, but
> maybe some manufacturers would like to create an algorithm
> implementation for their platform and keep it closed source. If we want
> to continue with my idea, I will have to dive deeper into that, I
> suppose.
>
> And my current proof of concept doesn't involve any concurrency
> handling so far.
>
> Summary:
> ========
>
> So in summary, each platform is responsible for the conversion of data
> in order to isolate the algorithms from the hardware layer. I see a few
> pros and cons with my approach:
>
> Pros:
> -----
> - Abstract the Controller & the algorithms completely from the hardware
> - Easy addition of new platforms: all you have to do is determine which
>   algorithms can be used with the given statistical data and implement
>   a data format for each chosen algorithm to prepare the data for it.
> - Reduce the codebase size by not repeating algorithm boilerplate code
>   and implementations for every platform
>
> Cons:
> -----
> - It adds an additional layer and therefore might hurt performance
>   (even though at least the input conversion would have to be done
>   anyway)
> - Some platforms might not be able to share many algorithms, as
>   their statistical data cannot be converted to the data required by a
>   specific algorithm
>
> >>
> >> > Nonetheless, I really don't want more hardware knowledge getting into
> >> > the algorithms, nor would I want to see algorithms making direct calls
> >> > that will set hardware.
> >>
> >> Fully agreed on the second part, the IPA design explicitly forbids
> >> accessing the device from algorithms.
> >>
> >
> >I think the idea here is to allow Controller::Apply() to only populate the
> >ControlList items, and the IPA to send that ControlList to the PH to apply
> >the parameters as V4L2 controls, as is done today.
> >
> >
> >>
> >> > Similarly, I really don't want one algorithm
> >> > to make direct calls that affect another algorithm - this always
> >> > starts quite benignly but can ultimately tend towards spaghetti. The
> >> > rule that we have is that this is all communicated through metadata.
> >>
> >> While exceptions may sometimes be needed, I really want to follow that
> >> rule.
> >>
> >> > But I think that still leaves us with options. Controller::Apply()
> >> > sounds like a reasonable thing to have, so long as it doesn't touch
> >> > the algorithms and only uses the metadata that they produced. Perhaps
> >> > we need a "PreApply" and a "PostApply", to allow for them to operate
> >> > either before or after the ISP.
>
> >> > One more point to consider is what we'd do with imaging pipelines that
> >> > have a different architecture. One obvious case would be the "straight
> >> > through" ISP where the image buffer goes straight from the CSI-2
> >> > receiver into the ISP and only reaches memory after that. Another
> >> > variant on that might be a hybrid architecture, where some processing
> >> > happens in this "straight through" manner, but then there's a "main"
> >> > ISP which runs memory-to-memory.
> >>
> >> The former is what the rkisp1 implements, the latter is also needed to
> >> support new platforms. We usually call the "straight through" part the
> >> "inline" ISP, and the memory-to-memory part the "offline" ISP. There
> >> will usually be a single inline ISP, but we could have multiple offline
> >> components (hardware ISPs, general-purpose DSPs, GPUs, ...) in the
> >> pipeline.
> >>
> >> > How would that look? It's probably
> >> > worth giving these some consideration now, rather than inventing
> >> > something for our current hardware platforms and finding ourselves in
> >> > a pickle later!
>
> Alright, so let me try to understand the issue correctly.
> Case 1: Raspberry Pi: (offline ISP)
>     Send request to camera -> get image/embedded data -> prepare ISP
>     controls -> send request to ISP -> get processed image/statistics
>     data -> send statistics data to IPAs -> process new metadata ->
>     repeat
>
> Case 2: RkISP1: (inline ISP)
>     Send request to main-/self-path device -> Receive image/statistics
>     data -> send statistics to IPAs -> process new metadata -> create
>     parameter buffer and send to ISP to set controls for the next frame
>     -> repeat
>
> Case 3: (hybrid ISP)
>     send request to inline ISP device -> Receive image/statistics data
>     -> send statistics to IPAs -> process new metadata -> set controls
>     for inline ISP (either directly or per buffer) -> send request to
>     offline ISP -> Receive image/statistics data -> send statistics to
>     IPAs -> process new metadata -> set controls for offline ISP ->
>     etc... -> repeat
>
> I didn't have an example for case 3, but I think it could look like that.
> Am I utterly wrong?
>
> From my point of view, we always have the same cycle:
> - Send a request to the ISP (Either we fetched the image data from the
>   camera or that part is handled internally, but it all boils down to,
>   'send raw data to the ISP')
> - Receive the processed image data and the statistics
> - Process the statistics
> - Set the settings of the ISP for the next frame
>
> And the IPAs shouldn't care what differentiates the hardware as all
> they do is transform input data into output data. I guess if we have
> multiple ISPs in the pipeline, we probably need a controller instance
> for each ISP so that they don't affect each other. But I currently
> don't see a completely different workflow.
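>
> Sketched as code, I imagine something along these lines (purely
> illustrative, none of this exists yet):
>
> ```
> /* One Controller per ISP stage (inline or offline), each running the
>  * same generic cycle on the statistics of its own stage. */
> std::vector<Controller> controllers_;
>
> void IPA::processStats(unsigned int ispIndex, const statsData &stats)
> {
>         Controller &controller = controllers_[ispIndex];
>         controller.Process(stats, &metadata_);
>         controller.Prepare(&metadata_);
>         /* The stage's data formats then convert the general results
>          * into this ISP's controls for the next frame. */
> }
> ```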
>
> Could you elaborate on which potential problems you had in mind?
>
> >> >
> >> > >> --
> >> > >>
> >> > >> Additionally, off-topic to the topic mentioned above. While
> >> > >> investigating the RPiStream abstraction, I noticed the following
> >> > >> comment within `rpi_stream.cpp`:116
> >> > >> ```
> >> > >>   * A nullptr buffer implies an external stream, but no external
> >> > >>   * buffer has been supplied in the Request. So, pick one from the
> >> > >>   * availableBuffers_ queue.
> >> > >>   ```
> >> > >>
> >> > >>   And within `rpi_stream.h`:139
> >> > >>   ```
> >> > >>   * A nullptr indicates any internal buffer can be used (from
> >> > >>   * availableBuffers_), whereas a valid pointer indicates an
> >> > >>   * external buffer to be queued.
> >> > >>   ```
> >> > >>
> >> > >>   Which confused me a little bit, could you enlighten me in this
> >> > >>   matter?
> >> > >>   Does a nullptr buffer imply an external stream and therefore an
> >> > >>   external buffer? Or does a nullptr buffer imply an internal
> >> > >>   buffer and therefore an internal stream?
> >> > >
> >> > >
> >> > > I can see why this caused some confusion :-)
> >> > >
> >> > > Firstly, my definitions of external and internal streams may need
> >> > > clarification.  The former is a stream that *may* be exported to the
> >> > > application, and the latter is a stream that can never be exported
> >> > > to the application.  For external streams, the buffers used may be
> >> > > provided by the application, and these buffers will be returned back
> >> > > to the application once they have been processed.  However, we can
> >> > > also use internally allocated buffers for external streams that will
> >> > > never go back to the application.  This is needed when:
> >> > >
> >> > > 1) The application has not provided an external buffer to use for
> >> > > that stream in a Request.  Example of this is when we have a RAW
> >> > > stream set to external, but that application will only request a RAW
> >> > > image once every N frames.
> >> > > 2) If we need to drop frames as instructed by the IPA, we cannot use
> >> > > external buffers as they will eventually be dropped and not returned
> >> > > to the application.  In this case, we queue an internal buffer that
> >> > > can be dropped safely.  The alternative is to requeue external
> >> > > buffers once dropped, but will add a ton of latency to the capture
> >> > > cycle.
> >> > >
> >> > > Hope that helps clarify things a bit.
>
> Indeed, that helps me understand the situation, thank you!
>
> >>
> >> --
> >> Regards,
> >>
> >> Laurent Pinchart
> >>
>

