[libcamera-devel] softISP for libcamera

Laurent Pinchart laurent.pinchart at ideasonboard.com
Wed Dec 6 20:52:20 CET 2023


On Wed, Dec 06, 2023 at 07:15:06PM +0100, Hans de Goede via libcamera-devel wrote:
> Hi Pavel,
> 
> On 12/5/23 22:49, Pavel Machek wrote:
> > Hi!
> > 
> >>> (I'd really like ability to collect statistics without doing the
> >>> debayering,
> >>
> >> That is actually already a part of my refactoring, the stats
> >> collection is now done separately (on a line by line basis
> >> to keep the input data hot in the cache for debayering.
> > 
> > #define SWISP_LINARO_START_LINE_STATS()                 \
> >         uint8_t r, g1, g2, b;                           \
> > 
> > ...
> > 
> > void SwIspLinaro::IspWorker::statsBGGR10PLine0(const uint8_t *src0)
> > void SwIspLinaro::IspWorker::statsGBRG10PLine0(const uint8_t *src0)
> > 
> > ...
> > 
> > This is quite evil code. Could we have struct for statistics and
> > inline functions instead?
> 
> I thought about this, but with a struct with say the R + G + B
> sums used for white-balancing, the compiler will take the address
> of the pointer passed to that struct and then deref that.
> 
> Where we really want these accumulators to be in registers
> in the inner loop.
> 
> I guess the compiler might be smart enough to replace
> a struct which is only used locally and passed around by reference
> to static inline helpers with putting all the struct members into
> registers and directly access them but I would rather not count
> on this.
> 
> 
> > There must be a better way then this.
> > 
> > What about
> > 
> > void inline statsBayer10PLine0(const uint8_t *src0, int bayer_type)
> > 
> > ... have a switch for BGGR/GBRG, ... variants passed as bayer_type. Then
> > 
> > void SwIspLinaro::IspWorker::statsBGGR10PLine0(const uint8_t *src0)
> > {
> >     statsBayer10PLine0(src0, 0);
> > }
> > 
> > and rely on optimizer doing the right thing? Having 4 copies is
> > ... not fun. Especially because I'll need to do one more copy for 8
> > bit support...
> 
> Note that with the 3:
> 
> SWISP_LINARO_START_LINE_STATS()
> SWISP_LINARO_ACCUMULATE_LINE_STATS()
> SWISP_LINARO_FINISH_LINE_STATS()
> 
> helpers at least for the stats most of the work is done in
> these 3 macros and those macros work for your 8 bits per
> pixel bayer fmt unmodified. What remeans is actually reading
> the r + g + b values from the bayer-grid which we need
> separate code for for each variant regardless.
> 
> (using offsets is not a good idea because ideally we
> want to read the bytes in memory order).
> 
> So in a way these ugly macros actually make handling different
> bayer fmts easy, at least for the statistics gathering.
> 
> For the debayering itself having to type out everything
> manually admittedly is a bit painful, but note how
> with the 10bits / pixel packed fmt this allows skipping
> the byte with the least-significant bits in an effecient
> manner. Where as Andrey's generic debayer code was
> full of offset calculations for each separate pixel
> surrounding the currently processed pixel.
> 
> I agree having to type all this out this way is not
> ideal, but it leads to significantly faster code and
> we only need to write all this once. I plan to "hide"
> all of this in a separate debayer class and then we
> only need to write all this once and after that
> I expect this to not need a lot of maintenance
> (speaking from my experience with libv4lconvert).

Throwing in a random idea, I wonder if we should consider using
https://github.com/GStreamer/orc.

> I realize my approach leads to pretty verbose
> code, but we only need to write + review this
> once (per x-bits per pixel variant) and I would
> rather focus on all the other work which needs
> to be done then try to come up with a clever
> solution for this.

Another random idea, I don't know if it can be of any help, but
generating code at build time (using C++ templates, macros, or jinja
templates) could also be an option instead of manually writing
similar-but-different sources manually.

> With that said, if you find a clever solution
> which is both equally readable and as fast as
> my verbose solution I won't object against that.
> 
> A possible candidate for inspiration here is:
> 
> https://git.linuxtv.org/v4l-utils.git/tree/lib/libv4lconvert/bayer.c
> 
> which at least for 8 bits per pixel has some nice
> pretty generic code. Although I would still prefer
> separate functions for the inner loop for
> blue first vs not blue first, removing the
> if for this so that we don't mess up the branch
> predictor and so that chances are better that
> the entire inner-loop will fiy in the u-op cache.

-- 
Regards,

Laurent Pinchart


More information about the libcamera-devel mailing list