[PATCH v5 10/10] libcamera: software_isp: Apply CCM in debayering

Laurent Pinchart laurent.pinchart at ideasonboard.com
Tue Feb 4 21:02:54 CET 2025


Hi Milan,

On Tue, Feb 04, 2025 at 05:52:07PM +0100, Milan Zamazal wrote:
> Laurent Pinchart writes:
> > On Thu, Jan 30, 2025 at 07:14:47PM +0100, Milan Zamazal wrote:
> >> This patch applies color correction matrix (CCM) in debayering if the
> >> CCM is specified.  Not using CCM must still be supported for performance
> >> reasons.
> >> 
> >> The CCM is applied as follows:
> >> 
> >>             [r1 r2 r3]
> >>   [r g b] * [g1 g2 g3]
> >>             [b1 b2 b3]
> >> 
> >> The CCM matrix (the right side of the multiplication) is constant during
> >> single frame processing, while the input pixel (the left side) changes.
> >> Because each of the color channels is only 8-bit in software ISP, we can
> >> make 9 lookup tables with 256 input values for multiplications of each
> >> of the r_i, g_i, b_i values.  This way we don't have to multiply each
> >> pixel, we can use table lookups and additions instead.  Gamma (which is
> >> non-linear and thus cannot be a part of the 9 lookup tables values) is
> >> applied on the final values rounded to integers using another lookup
> >> table.
> >> 
> >> We use int16_t to store the precomputed multiplications.  This seems to
> >> be noticeably (>10%) faster than `float' for the price of slightly less
> >> accuracy and it covers the range of values that sane CCMs produce.  The
> >> selection and structure of data is performance critical, for example
> >> using bytes would add significant (>10%) speedup but would be too short
> >> to cover the value range.
> >> 
> >> The color lookup tables can be represented either as unions,
> >> accommodating tables for both the CCM and non-CCM cases, or as separate
> >> tables for each of the cases, leaving the tables for the other case
> >> unused.  The latter is selected as a matter of preference.
> >> 
> >> The tables are copied (as before), which is not elegant but also not a
> >> big problem.  There are patches posted that use shared buffers for
> >> parameters passing in software ISP (see software ISP TODO #5) and they
> >> can be adjusted for the new parameter format.
> >> 
> >> Color gains from white balance are supposed not to be a part of the
> >> specified CCM.  They are applied on it using matrix multiplication,
> >> which is simple and in correspondence with future additions in the form
> >> of matrix multiplication, like saturation adjustment.
> >> 
> >> With this patch, the reported per-frame slowdown when applying CCM is
> >> about 45% on Debix Model A and about 75% on TI AM69 SK.
> >> 
> >> Using std::clamp in debayering adds some performance penalty (a few
> >> percent).  The clamping is necessary to eliminate out of range values
> >> possibly produced by the CCM.  If it could be avoided by adjusting the
> >> precomputed tables some way then performance could be improved a bit.
> >> 
> >> Signed-off-by: Milan Zamazal <mzamazal at redhat.com>
> >> ---
> >>  .../internal/software_isp/debayer_params.h    | 15 ++++-
> >>  src/ipa/simple/algorithms/lut.cpp             | 65 ++++++++++++++-----
> >>  src/ipa/simple/algorithms/lut.h               |  1 +
> >>  src/libcamera/software_isp/debayer.cpp        | 52 ++++++++++++++-
> >>  src/libcamera/software_isp/debayer_cpu.cpp    | 35 ++++++++--
> >>  src/libcamera/software_isp/debayer_cpu.h      |  4 ++
> >>  6 files changed, 145 insertions(+), 27 deletions(-)
> >> 
> >> diff --git a/include/libcamera/internal/software_isp/debayer_params.h b/include/libcamera/internal/software_isp/debayer_params.h
> >> index 7d8fdd48..9a7dbe93 100644
> >> --- a/include/libcamera/internal/software_isp/debayer_params.h
> >> +++ b/include/libcamera/internal/software_isp/debayer_params.h
> >> @@ -1,6 +1,6 @@
> >>  /* SPDX-License-Identifier: LGPL-2.1-or-later */
> >>  /*
> >> - * Copyright (C) 2023, 2024 Red Hat Inc.
> >> + * Copyright (C) 2023-2025 Red Hat Inc.
> >>   *
> >>   * Authors:
> >>   * Hans de Goede <hdegoede at redhat.com>
> >> @@ -18,11 +18,24 @@ namespace libcamera {
> >>  struct DebayerParams {
> >>  	static constexpr unsigned int kRGBLookupSize = 256;
> >>  
> >> +	struct CcmColumn {
> >> +		int16_t r;
> >> +		int16_t g;
> >> +		int16_t b;
> >> +	};
> >> +
> >>  	using ColorLookupTable = std::array<uint8_t, kRGBLookupSize>;
> >> +	using CcmLookupTable = std::array<CcmColumn, kRGBLookupSize>;
> >> +	using GammaLookupTable = std::array<uint8_t, kRGBLookupSize>;
> >>  
> >>  	ColorLookupTable red;
> >>  	ColorLookupTable green;
> >>  	ColorLookupTable blue;
> >> +
> >> +	CcmLookupTable redCcm;
> >> +	CcmLookupTable greenCcm;
> >> +	CcmLookupTable blueCcm;
> >> +	GammaLookupTable gammaLut;
> >
> > Given that the ColorLookupTable and GammaLookupTable types are the same,
> > would it make sense to save a bit of memory by combining the combining
> > the colour and gamma tables ? I'm thinking about something like
> >
> > 	struct ColorLookupTables {
> > 		CcmLookupTable ccm;
> > 		GammaLookupTable gamma;
> > 	}
> >
> > 	ColorLookupTables red;
> > 	ColorLookupTables green;
> > 	ColorLookupTables blue;
> >
> > The three gamma tables would be identical when CCM is disabled. 
> 
> Hm, this looks messy.  Let's think a bit in term of a common type
> 
>   using LookupTable = std::array<uint8_t, kRGBLookupSize>
> 
> If CCM is disabled, we need three LookupTable's for red, green, blue and
> that's all.
> 
> If CCM is enabled, we need three CcmLookupTable's for red, green, blue
> and a single LookupTable for gamma.
> 
> This means in your proposal it should actually be
> 
>   struct ColorLookupTables {
>           CcmLookupTable ccm;
>           LookupTable colorOrGamma;
>   }
> 
> With CCM disabled, `ccm' would be unused and colorOrGamma used.  With
> CCM enabled, `ccm' would be used and colorOrGamma would be also used but
> it would be the same for all three colors.  Such an arrangement makes
> little sense to me.

I agree it's not perfect.

> It's obvious the current version is also not crystal clear so I'll think
> whether it could be done a bit better to make this patch more
> satisfactory.

Maybe adding some extra comments would be enough. Documenting what
processing steps are handled by each table in the CCM and !CCM case
would work for me.

> The rest of the series is already settled down (with your
> other suggestions incorporated in v6).  I'll also fix the bugs below in
> v6.
> 
> > If that causes performance issues (possibly because the three
> > identical gamma tables would take more cache space), then we can keep
> > them separate. I'm tempted to merge the ColorLookupTable and
> > GammaLookupTable into a single type.
> >
> > You should also document somewhere which tables are used in which case
> > (<color> when CCM is disabled, combining gains and gamma, and <color>Ccm
> > and gammaLut when CCM is enabled, with <color>Ccm covering CCM and
> > gains, and gammaLut covering gamma). I think this would be easier to
> > document with the ColorLookupTables type, as you can then document what
> > the ccm and gamma fields cover.
> >
> > Either way, with this addressed,
> >
> > Reviewed-by: Laurent Pinchart <laurent.pinchart at ideasonboard.com>
> >
> >>  };
> >>  
> >>  } /* namespace libcamera */
> >> diff --git a/src/ipa/simple/algorithms/lut.cpp b/src/ipa/simple/algorithms/lut.cpp
> >> index e5e995c7..7719aecc 100644
> >> --- a/src/ipa/simple/algorithms/lut.cpp
> >> +++ b/src/ipa/simple/algorithms/lut.cpp
> >> @@ -1,6 +1,6 @@
> >>  /* SPDX-License-Identifier: LGPL-2.1-or-later */
> >>  /*
> >> - * Copyright (C) 2024, Red Hat Inc.
> >> + * Copyright (C) 2024-2025, Red Hat Inc.
> >>   *
> >>   * Color lookup tables construction
> >>   */
> >> @@ -80,6 +80,11 @@ void Lut::updateGammaTable(IPAContext &context)
> >>  	context.activeState.gamma.contrast = contrast;
> >>  }
> >>  
> >> +int16_t Lut::ccmValue(unsigned int i, float ccm) const
> >> +{
> >> +	return std::round(i * ccm);
> >> +}
> >> +
> >>  void Lut::prepare(IPAContext &context,
> >>  		  [[maybe_unused]] const uint32_t frame,
> >>  		  [[maybe_unused]] IPAFrameContext &frameContext,
> >> @@ -91,28 +96,52 @@ void Lut::prepare(IPAContext &context,
> >>  	 * observed, it's not permanently prone to minor fluctuations or
> >>  	 * rounding errors.
> >>  	 */
> >> -	if (context.activeState.gamma.blackLevel != context.activeState.blc.level ||
> >> -	    context.activeState.gamma.contrast != context.activeState.knobs.contrast)
> >> +	const bool gammaUpdateNeeded =
> >> +		context.activeState.gamma.blackLevel != context.activeState.blc.level ||
> >> +		context.activeState.gamma.contrast != context.activeState.knobs.contrast;
> >> +	if (gammaUpdateNeeded)
> >>  		updateGammaTable(context);
> >>  
> >>  	auto &gains = context.activeState.awb.gains;
> >>  	auto &gammaTable = context.activeState.gamma.gammaTable;
> >>  	const unsigned int gammaTableSize = gammaTable.size();
> >> -
> >> -	for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) {
> >> -		const double div = static_cast<double>(DebayerParams::kRGBLookupSize) /
> >> -				   gammaTableSize;
> >> -		/* Apply gamma after gain! */
> >> -		unsigned int idx;
> >> -		idx = std::min({ static_cast<unsigned int>(i * gains.r() / div),
> >> -				 gammaTableSize - 1 });
> >> -		params->red[i] = gammaTable[idx];
> >> -		idx = std::min({ static_cast<unsigned int>(i * gains.g() / div),
> >> -				 gammaTableSize - 1 });
> >> -		params->green[i] = gammaTable[idx];
> >> -		idx = std::min({ static_cast<unsigned int>(i * gains.b() / div),
> >> -				 gammaTableSize - 1 });
> >> -		params->blue[i] = gammaTable[idx];
> >> +	const double div = static_cast<double>(DebayerParams::kRGBLookupSize) /
> >> +			   gammaTableSize;
> >> +
> >> +	if (!context.ccmEnabled) {
> >> +		for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) {
> >> +			/* Apply gamma after gain! */
> >> +			unsigned int idx;
> >> +			idx = std::min({ static_cast<unsigned int>(i * gains.r() / div),
> >> +					 gammaTableSize - 1 });
> >> +			params->red[i] = gammaTable[idx];
> >> +			idx = std::min({ static_cast<unsigned int>(i * gains.g() / div),
> >> +					 gammaTableSize - 1 });
> >> +			params->green[i] = gammaTable[idx];
> >> +			idx = std::min({ static_cast<unsigned int>(i * gains.b() / div),
> >> +					 gammaTableSize - 1 });
> >> +			params->blue[i] = gammaTable[idx];
> >> +		}
> >> +	} else if (context.activeState.ccm.changed || gammaUpdateNeeded) {
> >> +		Matrix<double, 3, 3> gainCcm = { { gains.r(), 0, 0,
> >> +						   0, gains.g(), 0,
> >> +						   0, 0, gains.b() } };
> >> +		auto ccm = gainCcm * context.activeState.ccm.ccm;
> >> +		auto &red = params->redCcm;
> >> +		auto &green = params->greenCcm;
> >> +		auto &blue = params->blueCcm;
> >> +		for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) {
> >> +			red[i].r = ccmValue(i, ccm[0][0]);
> >> +			red[i].g = ccmValue(i, ccm[1][0]);
> >> +			red[i].b = ccmValue(i, ccm[2][0]);
> >> +			green[i].r = ccmValue(i, ccm[0][1]);
> >> +			green[i].g = ccmValue(i, ccm[1][1]);
> >> +			green[i].b = ccmValue(i, ccm[2][1]);
> >> +			blue[i].r = ccmValue(i, ccm[0][2]);
> >> +			blue[i].g = ccmValue(i, ccm[1][2]);
> >> +			blue[i].b = ccmValue(i, ccm[2][2]);
> 
> This needs to be transposed to conform with the CCM format in the tuning
> files as clarified by Stefan.
> 
> >> +			params->gammaLut[i] = gammaTable[i / div];
> >> +		}
> >>  	}
> >>  }
> >>  
> >> diff --git a/src/ipa/simple/algorithms/lut.h b/src/ipa/simple/algorithms/lut.h
> >> index 889f864b..77324800 100644
> >> --- a/src/ipa/simple/algorithms/lut.h
> >> +++ b/src/ipa/simple/algorithms/lut.h
> >> @@ -33,6 +33,7 @@ public:
> >>  
> >>  private:
> >>  	void updateGammaTable(IPAContext &context);
> >> +	int16_t ccmValue(unsigned int i, float ccm) const;
> >>  };
> >>  
> >>  } /* namespace ipa::soft::algorithms */
> >> diff --git a/src/libcamera/software_isp/debayer.cpp b/src/libcamera/software_isp/debayer.cpp
> >> index 45fe6960..4c84a556 100644
> >> --- a/src/libcamera/software_isp/debayer.cpp
> >> +++ b/src/libcamera/software_isp/debayer.cpp
> >> @@ -23,9 +23,39 @@ namespace libcamera {
> >>   * \brief Size of a color lookup table
> >>   */
> >>  
> >> +/**
> >> + * \struct DebayerParams::CcmColumn
> >> + * \brief Type of a single column of a color correction matrix (CCM)
> >> + */
> >> +
> >> +/**
> >> + * \var DebayerParams::CcmColumn::r
> >> + * \brief Red (first) component of a CCM column
> >> + */
> >> +
> >> +/**
> >> + * \var DebayerParams::CcmColumn::g
> >> + * \brief Green (second) component of a CCM column
> >> + */
> >> +
> >> +/**
> >> + * \var DebayerParams::CcmColumn::b
> >> + * \brief Blue (third) component of a CCM column
> >> + */
> >> +
> >>  /**
> >>   * \typedef DebayerParams::ColorLookupTable
> >> - * \brief Type of the lookup tables for red, green, blue values
> >> + * \brief Type of the simple lookup tables for red, green, blue values
> >> + */
> >> +
> >> +/**
> >> + * \typedef DebayerParams::CcmLookupTable
> >> + * \brief Type of the CCM lookup tables for red, green, blue values
> >> + */
> >> +
> >> +/**
> >> + * \typedef DebayerParams::GammaLookupTable
> >> + * \brief Type of the gamma lookup tables for CCM
> >>   */
> >>  
> >>  /**
> >> @@ -43,6 +73,26 @@ namespace libcamera {
> >>   * \brief Lookup table for blue color, mapping input values to output values
> >>   */
> >>  
> >> +/**
> >> + * \var DebayerParams::redCcm
> >> + * \brief CCM lookup table for red color, mapping input values to output values
> >> + */
> >> +
> >> +/**
> >> + * \var DebayerParams::greenCcm
> >> + * \brief CCM lookup table for green color, mapping input values to output values
> >> + */
> >> +
> >> +/**
> >> + * \var DebayerParams::blueCcm
> >> + * \brief CCM lookup table for blue color, mapping input values to output values
> >> + */
> >> +
> >> +/**
> >> + * \var DebayerParams::gammaLut
> >> + * \brief Gamma lookup table used with color correction matrix
> >> + */
> >> +
> >>  /**
> >>   * \class Debayer
> >>   * \brief Base debayering class
> >> diff --git a/src/libcamera/software_isp/debayer_cpu.cpp b/src/libcamera/software_isp/debayer_cpu.cpp
> >> index 0cd03a8f..5ddfdcf6 100644
> >> --- a/src/libcamera/software_isp/debayer_cpu.cpp
> >> +++ b/src/libcamera/software_isp/debayer_cpu.cpp
> >> @@ -11,6 +11,7 @@
> >>  
> >>  #include "debayer_cpu.h"
> >>  
> >> +#include <algorithm>
> >>  #include <stdlib.h>
> >>  #include <sys/ioctl.h>
> >>  #include <time.h>
> >> @@ -51,8 +52,12 @@ DebayerCpu::DebayerCpu(std::unique_ptr<SwStatsCpu> stats)
> >>  	enableInputMemcpy_ = true;
> >>  
> >>  	/* Initialize color lookup tables */
> >> -	for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++)
> >> +	for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) {
> >>  		red_[i] = green_[i] = blue_[i] = i;
> >> +		redCcm_[i] = { 1, 0, 0 };
> >> +		greenCcm_[i] = { 0, 1, 0 };
> >> +		blueCcm_[i] = { 0, 0, 1 };
> >> +	}
> >>  }
> >>  
> >>  DebayerCpu::~DebayerCpu() = default;
> >> @@ -62,12 +67,24 @@ DebayerCpu::~DebayerCpu() = default;
> >>  	const pixel_t *curr = (const pixel_t *)src[1] + xShift_; \
> >>  	const pixel_t *next = (const pixel_t *)src[2] + xShift_;
> >>  
> >> -#define STORE_PIXEL(b, g, r)        \
> >> -	*dst++ = blue_[b];          \
> >> -	*dst++ = green_[g];         \
> >> -	*dst++ = red_[r];           \
> >> -	if constexpr (addAlphaByte) \
> >> -		*dst++ = 255;       \
> >> +#define GAMMA(value) \
> >> +	*dst++ = gammaLut_[std::clamp(value, 0, static_cast<int>(gammaLut_.size()) - 1)]
> >> +
> >> +#define STORE_PIXEL(b_, g_, r_)                                        \
> >> +	if constexpr (ccmEnabled) {                                    \
> >> +		const DebayerParams::CcmColumn &blue = blueCcm_[b_];   \
> >> +		const DebayerParams::CcmColumn &green = greenCcm_[g_]; \
> >> +		const DebayerParams::CcmColumn &red = redCcm_[r_];     \
> >> +		GAMMA(blue.r + blue.g + blue.b);                       \
> 
> This should be GAMMA(red.b + green.b + blue.b).  Similarly for the other
> colors.  (It's embarrassing I haven't noticed this earlier but the wrong
> version often produces believable outputs.  I identified the problem
> when working on a saturation control when color casts in a supposedly
> monochrome output were no longer plausible :-).)
> 
> Unfortunately, the fixed version causes some slowdown (close to 10% in
> my environment), apparently due to worse locality.
> 
> >> +		GAMMA(green.r + green.g + green.b);                    \
> >> +		GAMMA(red.r + red.g + red.b);                          \
> >> +	} else {                                                       \
> >> +		*dst++ = blue_[b_];                                    \
> >> +		*dst++ = green_[g_];                                   \
> >> +		*dst++ = red_[r_];                                     \
> >> +	}                                                              \
> >> +	if constexpr (addAlphaByte)                                    \
> >> +		*dst++ = 255;                                          \
> >>  	x++;
> >>  
> >>  /*
> >> @@ -757,6 +774,10 @@ void DebayerCpu::process(uint32_t frame, FrameBuffer *input, FrameBuffer *output
> >>  	green_ = params.green;
> >>  	red_ = swapRedBlueGains_ ? params.blue : params.red;
> >>  	blue_ = swapRedBlueGains_ ? params.red : params.blue;
> >> +	greenCcm_ = params.greenCcm;
> >> +	redCcm_ = swapRedBlueGains_ ? params.blueCcm : params.redCcm;
> >> +	blueCcm_ = swapRedBlueGains_ ? params.redCcm : params.blueCcm;
> 
> Additionally, `red' and 'blue' must be swapped in redCcm_ and blueCcm_
> items when swapRedBlueGains_ is true.  This became visible after fixing
> the bug above.
> 
> >> +	gammaLut_ = params.gammaLut;
> >>  
> >>  	/* Copy metadata from the input buffer */
> >>  	FrameMetadata &metadata = output->_d()->metadata();
> >> diff --git a/src/libcamera/software_isp/debayer_cpu.h b/src/libcamera/software_isp/debayer_cpu.h
> >> index 21c08a2d..9f73a2fd 100644
> >> --- a/src/libcamera/software_isp/debayer_cpu.h
> >> +++ b/src/libcamera/software_isp/debayer_cpu.h
> >> @@ -141,6 +141,10 @@ private:
> >>  	DebayerParams::ColorLookupTable red_;
> >>  	DebayerParams::ColorLookupTable green_;
> >>  	DebayerParams::ColorLookupTable blue_;
> >> +	DebayerParams::CcmLookupTable redCcm_;
> >> +	DebayerParams::CcmLookupTable greenCcm_;
> >> +	DebayerParams::CcmLookupTable blueCcm_;
> >> +	DebayerParams::GammaLookupTable gammaLut_;
> >>  	debayerFn debayer0_;
> >>  	debayerFn debayer1_;
> >>  	debayerFn debayer2_;

-- 
Regards,

Laurent Pinchart


More information about the libcamera-devel mailing list