[PATCH v4 9/9] libcamera: software_isp: Apply CCM in debayering

Tue Jan 28 16:37:01 CET 2025

Hi Laurent,

thank you for review.

Laurent Pinchart <laurent.pinchart at ideasonboard.com> writes:

> Hi Milan,
>
> Thank you for the patch.
>
> On Mon, Jan 13, 2025 at 02:51:06PM +0100, Milan Zamazal wrote:
>> This patch applies color correction matrix (CCM) in debayering if the
>> CCM is specified.  Not using CCM must still be supported for performance
>> reasons.
>> 
>> The CCM is applied as follows:
>> 
>>             [r1 r2 r3]
>>   [r g b] * [g1 g2 g3]
>>             [b1 b2 b3]
>> 
>> The CCM matrix (the right side of the multiplication) is constant during
>> single frame processing, while the input pixel (the left side) changes.
>> Because each of the color channels is only 8-bit in software ISP, we can
>> make 9 lookup tables with 256 input values for multiplications of each
>> of the r_i, g_i, b_i values.  This way we don't have to multiply each
>> pixel, we can use table lookups and additions instead.  Gamma (which is
>> non-linear and thus cannot be a part of the 9 lookup tables values) is
>> applied on the final values rounded to integers using another lookup
>> table.
>
> What's the speedup compared to using fixed-point integer multiplications
> ?

Just a quick test by replacing the table lookups with multiplications by
arbitrary constant integers + right shift on the final result increases
the measured time per frame by ~25% in my environment.  (When on it, I
tried using floats, without the final shift, and the slowdown was close
to 70%.)

>> We use int16_t to store the precomputed multiplications.  This seems to
>> be noticeably (>10%) faster than `float' for the price of slightly less
>> accuracy and it covers the range of values that sane CCMs produce.  The
>> selection and structure of data is performance critical, for example
>> using bytes would add significant (>10%) speedup but would be too short
>> to cover the value range.
>> 
>> The color lookup tables are changed to unions to be able to serve as
>> both simple lookup tables or CCM lookup tables.  This is arguable but it
>> keeps the code easier to understand if nothing else.  It is important to
>> make the unions on the whole lookup tables rather than their values
>> because the latter might cause a noticeable slowdown on simple lookup
>> due to the sheer fact that the table is not compact.  Using unions makes
>> the initialization of the tables dubious because it is done before the
>> decision about using CCM is made but the initial values are dubious
>> anyway.
>
> I have a hard time feeling sympathy towards that argument. It sounds
> like "it's not great already, so let's make it worse".

OK, OK, ...  It'll be resolved by not using the union.

>> Using std::variant instead of unions would be even more
>> difficult, consider e.g. mmap allocation and the followup type-casting
>> of the debayer params.
>> 
>> The tables are copied (as before), which is not elegant but also not a
>> big problem.  There are patches posted that use shared buffers for
>> parameters passing in software ISP (see software ISP TODO #5) and they
>> can be adjusted for the new parameter format.
>> 
>> Color gains from white balance are supposed not to be a part of the
>> specified CCM.  They are applied on it using matrix multiplication,
>> which is simple and in correspondence with future additions in the form
>> of matrix multiplication, like saturation adjustment.
>> 
>> With this patch, the reported per-frame slowdown when applying CCM is
>> about 45% on Debix Model A and about 75% on TI AM69 SK.
>> 
>> Using std::clamp in debayering adds some performance penalty (a few
>> percent).  The clamping is necessary to eliminate out of range values
>> possibly produced by the CCM.  If it could be avoided by adjusting the
>> precomputed tables some way then performance could be improved a bit.
>> 
>> Signed-off-by: Milan Zamazal <mzamazal at redhat.com>
>> ---
>>  .../internal/software_isp/debayer_params.h    | 20 +++++-
>>  src/ipa/simple/algorithms/lut.cpp             | 63 ++++++++++++++-----
>>  src/ipa/simple/algorithms/lut.h               |  1 +
>>  src/libcamera/software_isp/debayer.cpp        | 50 +++++++++++++++
>>  src/libcamera/software_isp/debayer_cpu.cpp    | 34 +++++++---
>>  src/libcamera/software_isp/debayer_cpu.h      |  7 ++-
>>  src/libcamera/software_isp/software_isp.cpp   |  6 +-
>>  7 files changed, 147 insertions(+), 34 deletions(-)
>> 
>> diff --git a/include/libcamera/internal/software_isp/debayer_params.h b/include/libcamera/internal/software_isp/debayer_params.h
>> index 7d8fdd48..f8731eae 100644
>> --- a/include/libcamera/internal/software_isp/debayer_params.h
>> +++ b/include/libcamera/internal/software_isp/debayer_params.h
>> @@ -18,11 +18,25 @@ namespace libcamera {
>>  struct DebayerParams {
>>  	static constexpr unsigned int kRGBLookupSize = 256;
>>  
>> +	struct CcmColumn {
>> +		int16_t r;
>> +		int16_t g;
>> +		int16_t b;
>> +	};
>> +
>>  	using ColorLookupTable = std::array<uint8_t, kRGBLookupSize>;
>> +	using CcmLookupTable = std::array<CcmColumn, kRGBLookupSize>;
>> +	using GammaLookupTable = std::array<uint8_t, kRGBLookupSize>;
>> +
>> +	union LookupTable {
>> +		ColorLookupTable simple;
>> +		CcmLookupTable ccm;
>> +	};
>>  
>> -	ColorLookupTable red;
>> -	ColorLookupTable green;
>> -	ColorLookupTable blue;
>> +	LookupTable red;
>> +	LookupTable green;
>> +	LookupTable blue;
>> +	GammaLookupTable gammaLut;
>
> This is difficult to understand, especially without comments. If I'm not
> mistaken, the idea is that the CPU implementation of the software ISP
> will use either <color>.simple, or <color>.ccm + gammaLut. The former is
> used when CCM is disabled, and combines colour gains and gamma in a
> single LUT. The latter is used when CCM is enabled, and combines colour
> gains and CCM in the CCM LUT, with the gamma LUT applying gamma only.

Yes.

> I think it would be simpler to understand if you specified all LUTs for
> all colour components, unconditionally.
>
> 	struct ColorLookupTable {
> 		CcmLookupTable ccm;
> 		GammaLookupTable gamma;
> 	}
>
> 	ColorLookupTable red;
> 	ColorLookupTable green;
> 	ColorLookupTable blue;

Something like this was the other option I originally considered, so I'm
fine with such a change.  Gamma is applied only on the final result
(with CCM), so it must be kept separately.  And non-CCM and CCM lookup
tables should be separated.  I'll do it in v5. 

>>  };
>>  
>>  } /* namespace libcamera */
>> diff --git a/src/ipa/simple/algorithms/lut.cpp b/src/ipa/simple/algorithms/lut.cpp
>> index 243f0818..ddb2adf1 100644
>> --- a/src/ipa/simple/algorithms/lut.cpp
>> +++ b/src/ipa/simple/algorithms/lut.cpp
>> @@ -80,6 +80,11 @@ void Lut::updateGammaTable(IPAContext &context)
>>  	context.activeState.gamma.contrast = contrast;
>>  }
>>  
>> +int16_t Lut::ccmValue(unsigned int i, float ccm) const
>> +{
>> +	return std::round(i * ccm);
>> +}
>> +
>>  void Lut::prepare(IPAContext &context,
>>  		  [[maybe_unused]] const uint32_t frame,
>>  		  [[maybe_unused]] IPAFrameContext &frameContext,
>> @@ -91,28 +96,52 @@ void Lut::prepare(IPAContext &context,
>>  	 * observed, it's not permanently prone to minor fluctuations or
>>  	 * rounding errors.
>>  	 */
>> -	if (context.activeState.gamma.blackLevel != context.activeState.blc.level ||
>> -	    context.activeState.gamma.contrast != context.activeState.knobs.contrast)
>> +	const bool gammaUpdateNeeded =
>> +		context.activeState.gamma.blackLevel != context.activeState.blc.level ||
>> +		context.activeState.gamma.contrast != context.activeState.knobs.contrast;
>> +	if (gammaUpdateNeeded)
>>  		updateGammaTable(context);
>>  
>>  	auto &gains = context.activeState.awb.gains;
>>  	auto &gammaTable = context.activeState.gamma.gammaTable;
>>  	const unsigned int gammaTableSize = gammaTable.size();
>> -
>> -	for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) {
>> -		const double div = static_cast<double>(DebayerParams::kRGBLookupSize) /
>> -				   gammaTableSize;
>> -		/* Apply gamma after gain! */
>> -		unsigned int idx;
>> -		idx = std::min({ static_cast<unsigned int>(i * gains.red / div),
>> -				 gammaTableSize - 1 });
>> -		params->red[i] = gammaTable[idx];
>> -		idx = std::min({ static_cast<unsigned int>(i * gains.green / div),
>> -				 gammaTableSize - 1 });
>> -		params->green[i] = gammaTable[idx];
>> -		idx = std::min({ static_cast<unsigned int>(i * gains.blue / div),
>> -				 gammaTableSize - 1 });
>> -		params->blue[i] = gammaTable[idx];
>> +	const double div = static_cast<double>(DebayerParams::kRGBLookupSize) /
>> +			   gammaTableSize;
>> +
>> +	if (!context.activeState.ccm.enabled) {
>> +		for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) {
>> +			/* Apply gamma after gain! */
>> +			unsigned int idx;
>> +			idx = std::min({ static_cast<unsigned int>(i * gains.red / div),
>> +					 gammaTableSize - 1 });
>> +			params->red.simple[i] = gammaTable[idx];
>> +			idx = std::min({ static_cast<unsigned int>(i * gains.green / div),
>> +					 gammaTableSize - 1 });
>> +			params->green.simple[i] = gammaTable[idx];
>> +			idx = std::min({ static_cast<unsigned int>(i * gains.blue / div),
>> +					 gammaTableSize - 1 });
>> +			params->blue.simple[i] = gammaTable[idx];
>> +		}
>> +	} else if (context.activeState.ccm.changed || gammaUpdateNeeded) {
>> +		Matrix<double, 3, 3> gainCcm = { { gains.red, 0, 0,
>> +						   0, gains.green, 0,
>> +						   0, 0, gains.blue } };
>> +		auto ccm = gainCcm * context.activeState.ccm.ccm;
>> +		auto &red = params->red.ccm;
>> +		auto &green = params->green.ccm;
>> +		auto &blue = params->blue.ccm;
>> +		for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) {
>> +			red[i].r = ccmValue(i, ccm[0][0]);
>> +			red[i].g = ccmValue(i, ccm[1][0]);
>> +			red[i].b = ccmValue(i, ccm[2][0]);
>> +			green[i].r = ccmValue(i, ccm[0][1]);
>> +			green[i].g = ccmValue(i, ccm[1][1]);
>> +			green[i].b = ccmValue(i, ccm[2][1]);
>> +			blue[i].r = ccmValue(i, ccm[0][2]);
>> +			blue[i].g = ccmValue(i, ccm[1][2]);
>> +			blue[i].b = ccmValue(i, ccm[2][2]);
>> +			params->gammaLut[i] = gammaTable[i / div];
>> +		}
>>  	}
>>  }
>>  
>> diff --git a/src/ipa/simple/algorithms/lut.h b/src/ipa/simple/algorithms/lut.h
>> index 889f864b..77324800 100644
>> --- a/src/ipa/simple/algorithms/lut.h
>> +++ b/src/ipa/simple/algorithms/lut.h
>> @@ -33,6 +33,7 @@ public:
>>  
>>  private:
>>  	void updateGammaTable(IPAContext &context);
>> +	int16_t ccmValue(unsigned int i, float ccm) const;
>>  };
>>  
>>  } /* namespace ipa::soft::algorithms */
>> diff --git a/src/libcamera/software_isp/debayer.cpp b/src/libcamera/software_isp/debayer.cpp
>> index 45fe6960..559de3e6 100644
>> --- a/src/libcamera/software_isp/debayer.cpp
>> +++ b/src/libcamera/software_isp/debayer.cpp
>> @@ -23,11 +23,56 @@ namespace libcamera {
>>   * \brief Size of a color lookup table
>>   */
>>  
>> +/**
>> + * \struct DebayerParams::CcmColumn
>> + * \brief Type of a single column of a color correction matrix
>> + */
>> +
>> +/**
>> + * \var DebayerParams::CcmColumn::r
>> + * \brief Red (first) component of a CCM column
>> + */
>> +
>> +/**
>> + * \var DebayerParams::CcmColumn::g
>> + * \brief Green (second) component of a CCM column
>> + */
>> +
>> +/**
>> + * \var DebayerParams::CcmColumn::b
>> + * \brief Blue (third) component of a CCM column
>> + */
>> +
>>  /**
>>   * \typedef DebayerParams::ColorLookupTable
>> + * \brief Type of the simple lookup tables for red, green, blue values
>> + */
>> +
>> +/**
>> + * \typedef DebayerParams::CcmLookupTable
>> + * \brief Type of the CCM lookup tables for red, green, blue values
>> + */
>> +
>> +/**
>> + * \typedef DebayerParams::GammaLookupTable
>> + * \brief Type of the gamma lookup tables for CCM
>> + */
>> +
>> +/**
>> + * \union DebayerParams::LookupTable
>>   * \brief Type of the lookup tables for red, green, blue values
>>   */
>>  
>> +/**
>> + * \var DebayerParams::LookupTable::simple
>> + * \brief Simple lookup table for red, green, blue values
>> + */
>> +
>> +/**
>> + * \var DebayerParams::LookupTable::ccm
>> + * \brief CCM lookup table for red, green, blue values
>> + */
>> +
>>  /**
>>   * \var DebayerParams::red
>>   * \brief Lookup table for red color, mapping input values to output values
>> @@ -43,6 +88,11 @@ namespace libcamera {
>>   * \brief Lookup table for blue color, mapping input values to output values
>>   */
>>  
>> +/**
>> + * \var DebayerParams::gammaLut
>> + * \brief Gamma lookup table used with color correction matrix
>> + */
>> +
>>  /**
>>   * \class Debayer
>>   * \brief Base debayering class
>> diff --git a/src/libcamera/software_isp/debayer_cpu.cpp b/src/libcamera/software_isp/debayer_cpu.cpp
>> index 3c6597f9..38715b72 100644
>> --- a/src/libcamera/software_isp/debayer_cpu.cpp
>> +++ b/src/libcamera/software_isp/debayer_cpu.cpp
>> @@ -11,6 +11,7 @@
>>  
>>  #include "debayer_cpu.h"
>>  
>> +#include <algorithm>
>>  #include <stdlib.h>
>>  #include <sys/ioctl.h>
>>  #include <time.h>
>> @@ -51,8 +52,12 @@ DebayerCpu::DebayerCpu(std::unique_ptr<SwStatsCpu> stats)
>>  	enableInputMemcpy_ = true;
>>  
>>  	/* Initialize color lookup tables */
>> -	for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++)
>> -		red_[i] = green_[i] = blue_[i] = i;
>> +	for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) {
>> +		red_.simple[i] = green_.simple[i] = blue_.simple[i] = i;
>> +		red_.ccm[i].r = red_.ccm[i].g = red_.ccm[i].b = 0;
>> +		green_.ccm[i].r = green_.ccm[i].g = green_.ccm[i].b = 0;
>> +		blue_.ccm[i].r = blue_.ccm[i].g = blue_.ccm[i].b = 0;
>> +	}
>>  }
>>  
>>  DebayerCpu::~DebayerCpu() = default;
>> @@ -62,12 +67,24 @@ DebayerCpu::~DebayerCpu() = default;
>>  	const pixel_t *curr = (const pixel_t *)src[1] + xShift_; \
>>  	const pixel_t *next = (const pixel_t *)src[2] + xShift_;
>>  
>> -#define STORE_PIXEL(b, g, r)        \
>> -	*dst++ = blue_[b];          \
>> -	*dst++ = green_[g];         \
>> -	*dst++ = red_[r];           \
>> -	if constexpr (addAlphaByte) \
>> -		*dst++ = 255;       \
>> +#define GAMMA(value) \
>> +	*dst++ = gammaLut_[std::clamp(value, 0, static_cast<int>(gammaLut_.size()) - 1)]
>
> What's the point in specifying the CCM lookup table values as 16-bit
> integers if you clamp to 8 bits anyway ? Is that because you need signed
> values ?

The table values are uint8_t values multiplied by matrix elements, which
may be arbitrary numbers.  Each multiplication result can be within
uint8_t range or negative or bigger.  Three multiplication results are
summed so they may return back to uint8_t range and only then clamping
can be applied.

>> +
>> +#define STORE_PIXEL(b_, g_, r_)                                   \
>> +	if constexpr (ccmEnabled) {                               \
>> +		DebayerParams::CcmColumn &blue = blue_.ccm[b_];   \
>> +		DebayerParams::CcmColumn &green = green_.ccm[g_]; \
>> +		DebayerParams::CcmColumn &red = red_.ccm[r_];     \
>
> const

Ack.

>> +		GAMMA(blue.r + blue.g + blue.b);                  \
>> +		GAMMA(green.r + green.g + green.b);               \
>> +		GAMMA(red.r + red.g + red.b);                     \
>> +	} else {                                                  \
>> +		*dst++ = blue_.simple[b_];                        \
>> +		*dst++ = green_.simple[g_];                       \
>> +		*dst++ = red_.simple[r_];                         \
>> +	}                                                         \
>> +	if constexpr (addAlphaByte)                               \
>> +		*dst++ = 255;                                     \
>>  	x++;
>>  
>>  /*
>> @@ -752,6 +769,7 @@ void DebayerCpu::process(uint32_t frame, FrameBuffer *input, FrameBuffer *output
>>  	green_ = params.green;
>>  	red_ = swapRedBlueGains_ ? params.blue : params.red;
>>  	blue_ = swapRedBlueGains_ ? params.red : params.blue;
>> +	gammaLut_ = params.gammaLut;
>>  
>>  	/* Copy metadata from the input buffer */
>>  	FrameMetadata &metadata = output->_d()->metadata();
>> diff --git a/src/libcamera/software_isp/debayer_cpu.h b/src/libcamera/software_isp/debayer_cpu.h
>> index b2ec8f1b..35c51e4f 100644
>> --- a/src/libcamera/software_isp/debayer_cpu.h
>> +++ b/src/libcamera/software_isp/debayer_cpu.h
>> @@ -137,9 +137,10 @@ private:
>>  	/* Max. supported Bayer pattern height is 4, debayering this requires 5 lines */
>>  	static constexpr unsigned int kMaxLineBuffers = 5;
>>  
>> -	DebayerParams::ColorLookupTable red_;
>> -	DebayerParams::ColorLookupTable green_;
>> -	DebayerParams::ColorLookupTable blue_;
>> +	DebayerParams::LookupTable red_;
>> +	DebayerParams::LookupTable green_;
>> +	DebayerParams::LookupTable blue_;
>> +	DebayerParams::GammaLookupTable gammaLut_;
>>  	debayerFn debayer0_;
>>  	debayerFn debayer1_;
>>  	debayerFn debayer2_;
>> diff --git a/src/libcamera/software_isp/software_isp.cpp b/src/libcamera/software_isp/software_isp.cpp
>> index 6a07de85..40e11b9e 100644
>> --- a/src/libcamera/software_isp/software_isp.cpp
>> +++ b/src/libcamera/software_isp/software_isp.cpp
>> @@ -82,9 +82,9 @@ SoftwareIsp::SoftwareIsp(PipelineHandler *pipe, const CameraSensor *sensor,
>>  	for (unsigned int i = 0; i < 256; i++)
>>  		gammaTable[i] = UINT8_MAX * std::pow(i / 256.0, 0.5);
>>  	for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) {
>> -		debayerParams_.red[i] = gammaTable[i];
>> -		debayerParams_.green[i] = gammaTable[i];
>> -		debayerParams_.blue[i] = gammaTable[i];
>> +		debayerParams_.red.simple[i] = gammaTable[i];
>> +		debayerParams_.green.simple[i] = gammaTable[i];
>> +		debayerParams_.blue.simple[i] = gammaTable[i];
>>  	}
>>  
>>  	if (!dmaHeap_.isValid()) {