Soft ISP TODO list

Mon Apr 22 14:42:24 CEST 2024

Hi,

On 4/22/24 2:12 PM, Laurent Pinchart wrote:
> Hello,
> 
> On Mon, Apr 22, 2024 at 01:38:58PM +0200, Hans de Goede wrote:
>> On 4/22/24 1:24 PM, Milan Zamazal wrote:
>>> Hi,
>>>
>>> thank you all for your comments and clarifications.  Let me summarize the
>>> discussion, with my notes merged in, before I get lost in it:
>>>
>>> - Since our current black level and color gain operations are linear, we average
>>>   only the same color pixels and the lookup table is applied after averaging the
>>>   pixels, the current debayering implementation is correct.  It doesn't matter
>>>   whether we do "average pixels -> subtract black -> multiply by color gain ->
>>>   apply gamma" or "subtract black -> multiply by color gain -> average pixels ->
>>>   apply gamma".
>>
>> This is not entirely true, because of the clamping involved in black-level
>> subtraction (and the same for clamping at the top for gains), e.g.
>>
>> Lets say that for the red component we average 2 pixels with a blacklevel
>> compensation of 8 for all colors and the value of the 2 pixels is:
>> 6 and 10, then if we average first and then do blacklevel compensation the
>> result of blacklevel correction is 2 pixels with a value of 0 and 2,
>> which averages to 1.  Where as if we first average we get an average of 8
>> and then after blacklevel compensation we end up with 0.
>>
>>> - This may no longer hold if we change the operations, for example:
>>>
>>> - The optimization suggested by Hans (preprocessing while copying lines from the
>>>   input buffer).  It may or may not have a significant impact on the performance
>>>   and may or may not have a significant impact on the image quality.  I think we
>>>   need some experiments here.  Performance on the CPU is generally challenging,
>>>   so if we can demonstrate a performance benefit, we should consider including
>>>   it (most likely configurable), if it doesn't turn the implementation into a
>>>   complete mess.
>>
>> I think that if we want to move to applying blacklevel + gain before
>> debayering that we then should do this unconditionally, which means also
>> unconditionally doing the memcpy to a temporary line buffer.
> 
> If we can move it before debayering with acceptable performance, then I
> wouldn't make it configurable.
> 
>>> - Or if we add support for non-linear gains, as mentioned by Pavel.
> 
> Sensors are mostly linear these days. That is, until they reach
> saturation of course. I don't think the soft ISP needs a linearization
> LUT.
> 
>>> - Laurent suggested adding a CCM matrix between debayering and gamma to achieve
>>>   a reasonable image quality.  According to Hans, this is a heavy operation on
>>>   CPUs.  Laurent still sees CPU processing as useful for experiments.  I think
>>>   we can make this simply configurable (if it gets implemented).
>>
>> I agree that adding a CCM matrix as an optional step sounds interesting
>> and this should indeed be configurable. This can be done as an optional
>> (gaurded by a single if per line) post-debayer step run on the output buffer.
> 
> Given that the debayering currently produces an RGB value, do you think
> an additional multiplication by a 3x3 matrix (ideally accelerated by
> SIMD) would be that costly ?

Currently we do max 4 additions + 1 divide per pixel. this adds * 9 *
multiplications + 6 adds per pixel. With FHD at 30 FPS that is
~560 million multiplications per second!

And many sensors do way higher resolutions/fps.

Maybe with SIMD we can do some of these in parallel and it certainly
is interesting to try at least on some beefy x86 laptops (or upcoming
beefy ARM laptops) but this is significantly going to hurt performance.

I'm not saying this is not interesting to experiment with, but I have
doubts this is something which we will enable by default anywhere.

Regards,

Hans

> 
>>> - If we split the processing to pre-bayer and post-bayer parts, we should
>>>   probably work with uint16_t or float's, which may have impact on performance.
>>>
>>> - Pavel couldn't get a better performance by using SIMD CPU instructions for
>>>   debayering.  Applying a CCM matrix may be a different matter.  Anyway, SIMD on
>>>   CPU is hard to use and may differ on architectures, so the question is whether
>>>   it's worth to invest into it.
> 
> Good question :-)
> 
>>> - GPU processing should make many of these things easier and more things
>>>   possible.
>>>
>>> - Do we already know how to map the buffers to GPU efficiently?
> 
> I haven't looked into it yet personally.
> 
>>> My conclusions are:
>>>
>>> - The current CPU debayering computation itself is fine at the moment and we
>>>   shouldn't change it without a good reason.  It can be replaced in future if
>>>   needed, once we have a better understanding of what and how we can
>>>   realistically achieve.  General cleanups, like moving table computations
>>>   elsewhere, would be still useful already now.
>>>
>>> - We can experiment with whatever mentioned above to get the better
>>>   understanding, but:
>>>
>>> - GPU processing is clearly a priority.
>>>
>>> - We have also other items in the TODO file!  (I already work on some of them.)
> 
> On this topic, I'm having a look at how we could use the soft ISP with
> the vimc pipeline handler.
> 
>>> - We should probably change the e-mail thread subject.
> 
> Done :-)
> 
>> I agree with all of your conclusions :)
>