[PATCH 4/5] libipa: histogram: Fix interQuantileMean() for small ranges

Tue Apr 1 12:38:52 CEST 2025

Hi Laurent,

Thank you for the review. 

On Tue, Apr 01, 2025 at 03:02:14AM +0300, Laurent Pinchart wrote:
> Hi Stefan,
> 
> Thank you for the patch.
> 
> On Mon, Mar 24, 2025 at 06:07:39PM +0100, Stefan Klug wrote:
> > The interQuantileMean() is supposed to return a weighted mean value
> > between two quantiles. This works for reasonably fine histograms, but
> > fails for coarse histograms and small quantile ranges because the weight
> > is always taken from the lower border of the bin.
> > 
> > Fix that by rewriting the algorithm to calculate a lower and upper bound
> > for every (partial) bin that goes into the mean calculation and weight
> > the bins by the middle of these bounds.
> > 
> > Signed-off-by: Stefan Klug <stefan.klug at ideasonboard.com>
> > ---
> >  src/ipa/libipa/histogram.cpp | 20 +++++++++++---------
> >  1 file changed, 11 insertions(+), 9 deletions(-)
> > 
> > diff --git a/src/ipa/libipa/histogram.cpp b/src/ipa/libipa/histogram.cpp
> > index c19a4cbbf3cd..31f017af3458 100644
> > --- a/src/ipa/libipa/histogram.cpp
> > +++ b/src/ipa/libipa/histogram.cpp
> > @@ -153,22 +153,24 @@ double Histogram::interQuantileMean(double lowQuantile, double highQuantile) con
> >  	double lowPoint = quantile(lowQuantile);
> >  	/* Proportion of pixels which lies below highQuantile */
> >  	double highPoint = quantile(highQuantile, static_cast<uint32_t>(lowPoint));
> 
> Those two variables can now be const. You can write

Does that technically help in any way? For compact algorithms I hadn't
thought about adding const anywhere. Anyway, I added it.

> 
> 	ASSERT(highQuantile > lowQuantile);
> 	
> 	/* Proportion of pixels which lies below lowQuantile and highQuantile. */
> 	const double lowPoint = quantile(lowQuantile);
> 	const double highPoint = quantile(highQuantile, static_cast<uint32_t>(lowPoint));
> 
> > -	double sumBinFreq = 0, cumulFreq = 0;
> > +	double sumBinFreq = 0;
> > +	double cumulFreq = 0;
> > +
> 
> Let's document the algorithm (and see if I understand it correctly :-)).
> 
> 	/*
> 	 * Calculate the mean pixel value between the low and high points by
> 	 * summing all the pixels between the two points, and dividing the sum
> 	 * by the number of pixels. Given the discrete nature of the histogram
> 	 * data, the sum of the pixels is approximated by accumulating the
> 	 * product of the bin values (calculated as the mid point of the bin) by
> 	 * the number of pixels they contain, for each bin in the interval.
> 	 */

That nicely summarizes it. And actually it took me quite a while to
understand the algorithm. So that really helps. Thanks.
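
For reference, here is a minimal standalone sketch of how I read the
algorithm. It is not the libipa code: meanBetweenPoints() is a hypothetical
helper that takes the cumulative array and the two fractional bin positions
directly instead of calling quantile(), and it assumes the interval contains
at least one pixel.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

/*
 * Hypothetical standalone helper, not the libipa API. cumulative[i] holds the
 * number of pixels in bins [0, i), so it has one more entry than there are
 * bins. lowPoint and highPoint are fractional bin positions.
 */
double meanBetweenPoints(const std::vector<uint64_t> &cumulative,
			 double lowPoint, double highPoint)
{
	assert(cumulative.size() >= 2);
	assert(highPoint > lowPoint);
	assert(lowPoint >= 0 && highPoint <= cumulative.size() - 1);

	double sumBinFreq = 0;
	double cumulFreq = 0;

	for (unsigned int bin = static_cast<unsigned int>(std::floor(lowPoint));
	     bin < std::ceil(highPoint); bin++) {
		/* Clamp the bin to the requested interval. */
		const double lowBound = std::max<double>(bin, lowPoint);
		const double highBound = std::min<double>(bin + 1, highPoint);

		/* Pixels of this bin, scaled by the covered fraction. */
		const double freq = (cumulative[bin + 1] - cumulative[bin])
				  * (highBound - lowBound);

		/* Weight by the mid point of the covered part of the bin. */
		sumBinFreq += (highBound + lowBound) / 2 * freq;
		cumulFreq += freq;
	}

	/* Assumes cumulFreq is non-zero, i.e. the interval is not empty. */
	return sumBinFreq / cumulFreq;
}
```

With two bins of 10 pixels each (cumulative = { 0, 10, 20 }) and the points
0.5 and 1.0, only the upper half of bin 0 contributes, so the sketch returns
0.75. As far as I can tell, the old loop weighted that partial bin by its
lower border and then added 0.5, which gives 0.5 for the same input.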

> 
> > +	for (int bin = std::floor(lowPoint); bin < std::ceil(highPoint); bin++) {
> 
> It looks like bin can be unsigned.

I don't like unsigned :-) ... anyways, changed it.

> 
> > +		double lowBound = std::max(static_cast<double>(bin), lowPoint);
> 
> I think you can also write
> 
> 		double lowBound = std::max<double>(bin, lowPoint);

Oh yes, that looks way nicer.
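
A small illustration of why the explicit template argument is enough here.
clampLow() is a hypothetical helper used only for this example.

```cpp
#include <algorithm>
#include <cassert>

/* Hypothetical helper, for illustration only. */
double clampLow(unsigned int bin, double lowPoint)
{
	/*
	 * std::max(bin, lowPoint) does not compile: template argument
	 * deduction sees unsigned int and double and cannot pick a single T.
	 * Naming T explicitly converts both arguments to double, so the
	 * static_cast on the first argument is no longer needed.
	 */
	return std::max<double>(bin, lowPoint);
}
```

The result is copied into a double by value, so the temporary created for
the converted bin argument is not a lifetime problem in this form.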

> 
> Same for the next line. Up to you. Oh, and you can make them const too.
> 
> > +		double highBound = std::min(static_cast<double>(bin + 1), highPoint);
> 
> If I understand the code correctly, this is only meaningful for the
> first and last iterations. I can't easily find a better construct that
> wouldn't need to be run for each iteration, so this seems fine.
> 
> >  
> > -	for (double p_next = floor(lowPoint) + 1.0;
> > -	     p_next <= ceil(highPoint);
> > -	     lowPoint = p_next, p_next += 1.0) {
> > -		int bin = floor(lowPoint);
> >  		double freq = (cumulative_[bin + 1] - cumulative_[bin])
> > -			* (std::min(p_next, highPoint) - lowPoint);
> > +			* (highBound - lowBound);
> 
> 		/*
> 		 * The low and high quantile may not lie at bin boundaries, so
> 		 * the first and last bins need to be weighted accordingly. The
> 		 * best available approximation is to multiply the number of
> 		 * pixels by the partial bin width.
> 		 */
> 		const double freq = (cumulative_[bin + 1] - cumulative_[bin])
> 				  * (highBound - lowBound);
> 
> >  
> >  		/* Accumulate weighted bin */
> > -		sumBinFreq += bin * freq;
> > +		sumBinFreq += 0.5 * (highBound + lowBound) * freq;
> 
> I wondered for a moment where the 0.5 came from. I think
> 
> 		sumBinFreq += (highBound + lowBound) / 2 * freq;
> 
> would better reflect the intent.
> 
> > +
> >  		/* Accumulate weights */
> >  		cumulFreq += freq;
> 
> I wonder if we should rename sumBinFreq to sumPixelValues and cumulFreq
> to numPixels.

Depends on the background. The math people only talk about frequency and
we even have a cumulativeFrequency() function. In the docs we often use
pixels as that is mostly what we count.

I left it as is, as that's not wrong either.

> 
> Reviewed-by: Laurent Pinchart <laurent.pinchart at ideasonboard.com>

Thank you!

Best regards,
Stefan

> 
> >  	}
> > -	/* add 0.5 to give an average for bin mid-points */
> > -	return sumBinFreq / cumulFreq + 0.5;
> > +
> > +	return sumBinFreq / cumulFreq;
> >  }
> >  
> >  } /* namespace ipa */
> 
> -- 
> Regards,
> 
> Laurent Pinchart