[PATCH v5 10/10] libcamera: software_isp: Apply CCM in debayering

Milan Zamazal mzamazal at redhat.com
Tue Feb 4 17:52:07 CET 2025


Hi Laurent,

thank you for review.

Laurent Pinchart <laurent.pinchart at ideasonboard.com> writes:

> Hi Milan,
>
> Thank you for the patch.
>
> On Thu, Jan 30, 2025 at 07:14:47PM +0100, Milan Zamazal wrote:
>> This patch applies color correction matrix (CCM) in debayering if the
>> CCM is specified.  Not using CCM must still be supported for performance
>> reasons.
>> 
>> The CCM is applied as follows:
>> 
>>             [r1 r2 r3]
>>   [r g b] * [g1 g2 g3]
>>             [b1 b2 b3]
>> 
>> The CCM matrix (the right side of the multiplication) is constant during
>> single frame processing, while the input pixel (the left side) changes.
>> Because each of the color channels is only 8-bit in software ISP, we can
>> make 9 lookup tables with 256 input values for multiplications of each
>> of the r_i, g_i, b_i values.  This way we don't have to multiply each
>> pixel, we can use table lookups and additions instead.  Gamma (which is
>> non-linear and thus cannot be a part of the 9 lookup tables values) is
>> applied on the final values rounded to integers using another lookup
>> table.
>> 
>> We use int16_t to store the precomputed multiplications.  This seems to
>> be noticeably (>10%) faster than `float' for the price of slightly less
>> accuracy and it covers the range of values that sane CCMs produce.  The
>> selection and structure of data is performance critical, for example
>> using bytes would add significant (>10%) speedup but would be too short
>> to cover the value range.
>> 
>> The color lookup tables can be represented either as unions,
>> accommodating tables for both the CCM and non-CCM cases, or as separate
>> tables for each of the cases, leaving the tables for the other case
>> unused.  The latter is selected as a matter of preference.
>> 
>> The tables are copied (as before), which is not elegant but also not a
>> big problem.  There are patches posted that use shared buffers for
>> parameters passing in software ISP (see software ISP TODO #5) and they
>> can be adjusted for the new parameter format.
>> 
>> Color gains from white balance are supposed not to be a part of the
>> specified CCM.  They are applied on it using matrix multiplication,
>> which is simple and in correspondence with future additions in the form
>> of matrix multiplication, like saturation adjustment.
>> 
>> With this patch, the reported per-frame slowdown when applying CCM is
>> about 45% on Debix Model A and about 75% on TI AM69 SK.
>> 
>> Using std::clamp in debayering adds some performance penalty (a few
>> percent).  The clamping is necessary to eliminate out of range values
>> possibly produced by the CCM.  If it could be avoided by adjusting the
>> precomputed tables some way then performance could be improved a bit.
>> 
>> Signed-off-by: Milan Zamazal <mzamazal at redhat.com>
>> ---
>>  .../internal/software_isp/debayer_params.h    | 15 ++++-
>>  src/ipa/simple/algorithms/lut.cpp             | 65 ++++++++++++++-----
>>  src/ipa/simple/algorithms/lut.h               |  1 +
>>  src/libcamera/software_isp/debayer.cpp        | 52 ++++++++++++++-
>>  src/libcamera/software_isp/debayer_cpu.cpp    | 35 ++++++++--
>>  src/libcamera/software_isp/debayer_cpu.h      |  4 ++
>>  6 files changed, 145 insertions(+), 27 deletions(-)
>> 
>> diff --git a/include/libcamera/internal/software_isp/debayer_params.h b/include/libcamera/internal/software_isp/debayer_params.h
>> index 7d8fdd48..9a7dbe93 100644
>> --- a/include/libcamera/internal/software_isp/debayer_params.h
>> +++ b/include/libcamera/internal/software_isp/debayer_params.h
>> @@ -1,6 +1,6 @@
>>  /* SPDX-License-Identifier: LGPL-2.1-or-later */
>>  /*
>> - * Copyright (C) 2023, 2024 Red Hat Inc.
>> + * Copyright (C) 2023-2025 Red Hat Inc.
>>   *
>>   * Authors:
>>   * Hans de Goede <hdegoede at redhat.com>
>> @@ -18,11 +18,24 @@ namespace libcamera {
>>  struct DebayerParams {
>>  	static constexpr unsigned int kRGBLookupSize = 256;
>>  
>> +	struct CcmColumn {
>> +		int16_t r;
>> +		int16_t g;
>> +		int16_t b;
>> +	};
>> +
>>  	using ColorLookupTable = std::array<uint8_t, kRGBLookupSize>;
>> +	using CcmLookupTable = std::array<CcmColumn, kRGBLookupSize>;
>> +	using GammaLookupTable = std::array<uint8_t, kRGBLookupSize>;
>>  
>>  	ColorLookupTable red;
>>  	ColorLookupTable green;
>>  	ColorLookupTable blue;
>> +
>> +	CcmLookupTable redCcm;
>> +	CcmLookupTable greenCcm;
>> +	CcmLookupTable blueCcm;
>> +	GammaLookupTable gammaLut;
>
> Given that the ColorLookupTable and GammaLookupTable types are the same,
> would it make sense to save a bit of memory by combining the combining
> the colour and gamma tables ? I'm thinking about something like
>
> 	struct ColorLookupTables {
> 		CcmLookupTable ccm;
> 		GammaLookupTable gamma;
> 	}
>
> 	ColorLookupTables red;
> 	ColorLookupTables green;
> 	ColorLookupTables blue;
>
> The three gamma tables would be identical when CCM is disabled. 

Hm, this looks messy.  Let's think a bit in term of a common type

  using LookupTable = std::array<uint8_t, kRGBLookupSize>

If CCM is disabled, we need three LookupTable's for red, green, blue and
that's all.

If CCM is enabled, we need three CcmLookupTable's for red, green, blue
and a single LookupTable for gamma.

This means in your proposal it should actually be

  struct ColorLookupTables {
          CcmLookupTable ccm;
          LookupTable colorOrGamma;
  }

With CCM disabled, `ccm' would be unused and colorOrGamma used.  With
CCM enabled, `ccm' would be used and colorOrGamma would be also used but
it would be the same for all three colors.  Such an arrangement makes
little sense to me.

It's obvious the current version is also not crystal clear so I'll think
whether it could be done a bit better to make this patch more
satisfactory.  The rest of the series is already settled down (with your
other suggestions incorporated in v6).  I'll also fix the bugs below in
v6.

> If that causes performance issues (possibly because the three
> identical gamma tables would take more cache space), then we can keep
> them separate. I'm tempted to merge the ColorLookupTable and
> GammaLookupTable into a single type.
>
> You should also document somewhere which tables are used in which case
> (<color> when CCM is disabled, combining gains and gamma, and <color>Ccm
> and gammaLut when CCM is enabled, with <color>Ccm covering CCM and
> gains, and gammaLut covering gamma). I think this would be easier to
> document with the ColorLookupTables type, as you can then document what
> the ccm and gamma fields cover.
>
> Either way, with this addressed,
>
> Reviewed-by: Laurent Pinchart <laurent.pinchart at ideasonboard.com>
>
>>  };
>>  
>>  } /* namespace libcamera */
>> diff --git a/src/ipa/simple/algorithms/lut.cpp b/src/ipa/simple/algorithms/lut.cpp
>> index e5e995c7..7719aecc 100644
>> --- a/src/ipa/simple/algorithms/lut.cpp
>> +++ b/src/ipa/simple/algorithms/lut.cpp
>> @@ -1,6 +1,6 @@
>>  /* SPDX-License-Identifier: LGPL-2.1-or-later */
>>  /*
>> - * Copyright (C) 2024, Red Hat Inc.
>> + * Copyright (C) 2024-2025, Red Hat Inc.
>>   *
>>   * Color lookup tables construction
>>   */
>> @@ -80,6 +80,11 @@ void Lut::updateGammaTable(IPAContext &context)
>>  	context.activeState.gamma.contrast = contrast;
>>  }
>>  
>> +int16_t Lut::ccmValue(unsigned int i, float ccm) const
>> +{
>> +	return std::round(i * ccm);
>> +}
>> +
>>  void Lut::prepare(IPAContext &context,
>>  		  [[maybe_unused]] const uint32_t frame,
>>  		  [[maybe_unused]] IPAFrameContext &frameContext,
>> @@ -91,28 +96,52 @@ void Lut::prepare(IPAContext &context,
>>  	 * observed, it's not permanently prone to minor fluctuations or
>>  	 * rounding errors.
>>  	 */
>> -	if (context.activeState.gamma.blackLevel != context.activeState.blc.level ||
>> -	    context.activeState.gamma.contrast != context.activeState.knobs.contrast)
>> +	const bool gammaUpdateNeeded =
>> +		context.activeState.gamma.blackLevel != context.activeState.blc.level ||
>> +		context.activeState.gamma.contrast != context.activeState.knobs.contrast;
>> +	if (gammaUpdateNeeded)
>>  		updateGammaTable(context);
>>  
>>  	auto &gains = context.activeState.awb.gains;
>>  	auto &gammaTable = context.activeState.gamma.gammaTable;
>>  	const unsigned int gammaTableSize = gammaTable.size();
>> -
>> -	for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) {
>> -		const double div = static_cast<double>(DebayerParams::kRGBLookupSize) /
>> -				   gammaTableSize;
>> -		/* Apply gamma after gain! */
>> -		unsigned int idx;
>> -		idx = std::min({ static_cast<unsigned int>(i * gains.r() / div),
>> -				 gammaTableSize - 1 });
>> -		params->red[i] = gammaTable[idx];
>> -		idx = std::min({ static_cast<unsigned int>(i * gains.g() / div),
>> -				 gammaTableSize - 1 });
>> -		params->green[i] = gammaTable[idx];
>> -		idx = std::min({ static_cast<unsigned int>(i * gains.b() / div),
>> -				 gammaTableSize - 1 });
>> -		params->blue[i] = gammaTable[idx];
>> +	const double div = static_cast<double>(DebayerParams::kRGBLookupSize) /
>> +			   gammaTableSize;
>> +
>> +	if (!context.ccmEnabled) {
>> +		for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) {
>> +			/* Apply gamma after gain! */
>> +			unsigned int idx;
>> +			idx = std::min({ static_cast<unsigned int>(i * gains.r() / div),
>> +					 gammaTableSize - 1 });
>> +			params->red[i] = gammaTable[idx];
>> +			idx = std::min({ static_cast<unsigned int>(i * gains.g() / div),
>> +					 gammaTableSize - 1 });
>> +			params->green[i] = gammaTable[idx];
>> +			idx = std::min({ static_cast<unsigned int>(i * gains.b() / div),
>> +					 gammaTableSize - 1 });
>> +			params->blue[i] = gammaTable[idx];
>> +		}
>> +	} else if (context.activeState.ccm.changed || gammaUpdateNeeded) {
>> +		Matrix<double, 3, 3> gainCcm = { { gains.r(), 0, 0,
>> +						   0, gains.g(), 0,
>> +						   0, 0, gains.b() } };
>> +		auto ccm = gainCcm * context.activeState.ccm.ccm;
>> +		auto &red = params->redCcm;
>> +		auto &green = params->greenCcm;
>> +		auto &blue = params->blueCcm;
>> +		for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) {
>> +			red[i].r = ccmValue(i, ccm[0][0]);
>> +			red[i].g = ccmValue(i, ccm[1][0]);
>> +			red[i].b = ccmValue(i, ccm[2][0]);
>> +			green[i].r = ccmValue(i, ccm[0][1]);
>> +			green[i].g = ccmValue(i, ccm[1][1]);
>> +			green[i].b = ccmValue(i, ccm[2][1]);
>> +			blue[i].r = ccmValue(i, ccm[0][2]);
>> +			blue[i].g = ccmValue(i, ccm[1][2]);
>> +			blue[i].b = ccmValue(i, ccm[2][2]);

This needs to be transposed to conform with the CCM format in the tuning
files as clarified by Stefan.

>> +			params->gammaLut[i] = gammaTable[i / div];
>> +		}
>>  	}
>>  }
>>  
>> diff --git a/src/ipa/simple/algorithms/lut.h b/src/ipa/simple/algorithms/lut.h
>> index 889f864b..77324800 100644
>> --- a/src/ipa/simple/algorithms/lut.h
>> +++ b/src/ipa/simple/algorithms/lut.h
>> @@ -33,6 +33,7 @@ public:
>>  
>>  private:
>>  	void updateGammaTable(IPAContext &context);
>> +	int16_t ccmValue(unsigned int i, float ccm) const;
>>  };
>>  
>>  } /* namespace ipa::soft::algorithms */
>> diff --git a/src/libcamera/software_isp/debayer.cpp b/src/libcamera/software_isp/debayer.cpp
>> index 45fe6960..4c84a556 100644
>> --- a/src/libcamera/software_isp/debayer.cpp
>> +++ b/src/libcamera/software_isp/debayer.cpp
>> @@ -23,9 +23,39 @@ namespace libcamera {
>>   * \brief Size of a color lookup table
>>   */
>>  
>> +/**
>> + * \struct DebayerParams::CcmColumn
>> + * \brief Type of a single column of a color correction matrix (CCM)
>> + */
>> +
>> +/**
>> + * \var DebayerParams::CcmColumn::r
>> + * \brief Red (first) component of a CCM column
>> + */
>> +
>> +/**
>> + * \var DebayerParams::CcmColumn::g
>> + * \brief Green (second) component of a CCM column
>> + */
>> +
>> +/**
>> + * \var DebayerParams::CcmColumn::b
>> + * \brief Blue (third) component of a CCM column
>> + */
>> +
>>  /**
>>   * \typedef DebayerParams::ColorLookupTable
>> - * \brief Type of the lookup tables for red, green, blue values
>> + * \brief Type of the simple lookup tables for red, green, blue values
>> + */
>> +
>> +/**
>> + * \typedef DebayerParams::CcmLookupTable
>> + * \brief Type of the CCM lookup tables for red, green, blue values
>> + */
>> +
>> +/**
>> + * \typedef DebayerParams::GammaLookupTable
>> + * \brief Type of the gamma lookup tables for CCM
>>   */
>>  
>>  /**
>> @@ -43,6 +73,26 @@ namespace libcamera {
>>   * \brief Lookup table for blue color, mapping input values to output values
>>   */
>>  
>> +/**
>> + * \var DebayerParams::redCcm
>> + * \brief CCM lookup table for red color, mapping input values to output values
>> + */
>> +
>> +/**
>> + * \var DebayerParams::greenCcm
>> + * \brief CCM lookup table for green color, mapping input values to output values
>> + */
>> +
>> +/**
>> + * \var DebayerParams::blueCcm
>> + * \brief CCM lookup table for blue color, mapping input values to output values
>> + */
>> +
>> +/**
>> + * \var DebayerParams::gammaLut
>> + * \brief Gamma lookup table used with color correction matrix
>> + */
>> +
>>  /**
>>   * \class Debayer
>>   * \brief Base debayering class
>> diff --git a/src/libcamera/software_isp/debayer_cpu.cpp b/src/libcamera/software_isp/debayer_cpu.cpp
>> index 0cd03a8f..5ddfdcf6 100644
>> --- a/src/libcamera/software_isp/debayer_cpu.cpp
>> +++ b/src/libcamera/software_isp/debayer_cpu.cpp
>> @@ -11,6 +11,7 @@
>>  
>>  #include "debayer_cpu.h"
>>  
>> +#include <algorithm>
>>  #include <stdlib.h>
>>  #include <sys/ioctl.h>
>>  #include <time.h>
>> @@ -51,8 +52,12 @@ DebayerCpu::DebayerCpu(std::unique_ptr<SwStatsCpu> stats)
>>  	enableInputMemcpy_ = true;
>>  
>>  	/* Initialize color lookup tables */
>> -	for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++)
>> +	for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) {
>>  		red_[i] = green_[i] = blue_[i] = i;
>> +		redCcm_[i] = { 1, 0, 0 };
>> +		greenCcm_[i] = { 0, 1, 0 };
>> +		blueCcm_[i] = { 0, 0, 1 };
>> +	}
>>  }
>>  
>>  DebayerCpu::~DebayerCpu() = default;
>> @@ -62,12 +67,24 @@ DebayerCpu::~DebayerCpu() = default;
>>  	const pixel_t *curr = (const pixel_t *)src[1] + xShift_; \
>>  	const pixel_t *next = (const pixel_t *)src[2] + xShift_;
>>  
>> -#define STORE_PIXEL(b, g, r)        \
>> -	*dst++ = blue_[b];          \
>> -	*dst++ = green_[g];         \
>> -	*dst++ = red_[r];           \
>> -	if constexpr (addAlphaByte) \
>> -		*dst++ = 255;       \
>> +#define GAMMA(value) \
>> +	*dst++ = gammaLut_[std::clamp(value, 0, static_cast<int>(gammaLut_.size()) - 1)]
>> +
>> +#define STORE_PIXEL(b_, g_, r_)                                        \
>> +	if constexpr (ccmEnabled) {                                    \
>> +		const DebayerParams::CcmColumn &blue = blueCcm_[b_];   \
>> +		const DebayerParams::CcmColumn &green = greenCcm_[g_]; \
>> +		const DebayerParams::CcmColumn &red = redCcm_[r_];     \
>> +		GAMMA(blue.r + blue.g + blue.b);                       \

This should be GAMMA(red.b + green.b + blue.b).  Similarly for the other
colors.  (It's embarrassing I haven't noticed this earlier but the wrong
version often produces believable outputs.  I identified the problem
when working on a saturation control when color casts in a supposedly
monochrome output were no longer plausible :-).)

Unfortunately, the fixed version causes some slowdown (close to 10% in
my environment), apparently due to worse locality.

>> +		GAMMA(green.r + green.g + green.b);                    \
>> +		GAMMA(red.r + red.g + red.b);                          \
>> +	} else {                                                       \
>> +		*dst++ = blue_[b_];                                    \
>> +		*dst++ = green_[g_];                                   \
>> +		*dst++ = red_[r_];                                     \
>> +	}                                                              \
>> +	if constexpr (addAlphaByte)                                    \
>> +		*dst++ = 255;                                          \
>>  	x++;
>>  
>>  /*
>> @@ -757,6 +774,10 @@ void DebayerCpu::process(uint32_t frame, FrameBuffer *input, FrameBuffer *output
>>  	green_ = params.green;
>>  	red_ = swapRedBlueGains_ ? params.blue : params.red;
>>  	blue_ = swapRedBlueGains_ ? params.red : params.blue;
>> +	greenCcm_ = params.greenCcm;
>> +	redCcm_ = swapRedBlueGains_ ? params.blueCcm : params.redCcm;
>> +	blueCcm_ = swapRedBlueGains_ ? params.redCcm : params.blueCcm;

Additionally, `red' and 'blue' must be swapped in redCcm_ and blueCcm_
items when swapRedBlueGains_ is true.  This became visible after fixing
the bug above.

>> +	gammaLut_ = params.gammaLut;
>>  
>>  	/* Copy metadata from the input buffer */
>>  	FrameMetadata &metadata = output->_d()->metadata();
>> diff --git a/src/libcamera/software_isp/debayer_cpu.h b/src/libcamera/software_isp/debayer_cpu.h
>> index 21c08a2d..9f73a2fd 100644
>> --- a/src/libcamera/software_isp/debayer_cpu.h
>> +++ b/src/libcamera/software_isp/debayer_cpu.h
>> @@ -141,6 +141,10 @@ private:
>>  	DebayerParams::ColorLookupTable red_;
>>  	DebayerParams::ColorLookupTable green_;
>>  	DebayerParams::ColorLookupTable blue_;
>> +	DebayerParams::CcmLookupTable redCcm_;
>> +	DebayerParams::CcmLookupTable greenCcm_;
>> +	DebayerParams::CcmLookupTable blueCcm_;
>> +	DebayerParams::GammaLookupTable gammaLut_;
>>  	debayerFn debayer0_;
>>  	debayerFn debayer1_;
>>  	debayerFn debayer2_;



More information about the libcamera-devel mailing list