[PATCH 4/5] libipa: histogram: Fix interQuantileMean() for small ranges

Tue Apr 1 15:22:08 CEST 2025

On Tue, Apr 01, 2025 at 12:38:52PM +0200, Stefan Klug wrote:
> On Tue, Apr 01, 2025 at 03:02:14AM +0300, Laurent Pinchart wrote:
> > On Mon, Mar 24, 2025 at 06:07:39PM +0100, Stefan Klug wrote:
> > > The interQuantileMean() is supposed to return a weighted mean value
> > > between two quantiles. This works for reasonably fine histograms, but
> > > fails for coarse histograms and small quantile ranges because the weight
> > > is always taken from the lower border of the bin.
> > > 
> > > Fix that by rewriting the algorithm to calculate a lower and upper bound
> > > for every (partial) bin that goes into the mean calculation and weight
> > > the bins by the middle of these bounds.
> > > 
> > > Signed-off-by: Stefan Klug <stefan.klug at ideasonboard.com>
> > > ---
> > >  src/ipa/libipa/histogram.cpp | 20 +++++++++++---------
> > >  1 file changed, 11 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/src/ipa/libipa/histogram.cpp b/src/ipa/libipa/histogram.cpp
> > > index c19a4cbbf3cd..31f017af3458 100644
> > > --- a/src/ipa/libipa/histogram.cpp
> > > +++ b/src/ipa/libipa/histogram.cpp
> > > @@ -153,22 +153,24 @@ double Histogram::interQuantileMean(double lowQuantile, double highQuantile) con
> > >  	double lowPoint = quantile(lowQuantile);
> > >  	/* Proportion of pixels which lies below highQuantile */
> > >  	double highPoint = quantile(highQuantile, static_cast<uint32_t>(lowPoint));
> > 
> > Those two variables can now be const. You can write
> 
> Does that technically help in any way? On compact algorithms I didn't
> think about putting const anywhere. Anyways I added it.

It's not mandatory. I think I like it because it shows what variables
are not meant to be modified, and I can classify such variable
declarations in my mind as aliases when reading the code.

> > 
> > 	ASSERT(highQuantile > lowQuantile);
> > 	
> > 	/* Proportion of pixels which lies below lowQuantile and highQuantile. */
> > 	const double lowPoint = quantile(lowQuantile);
> >         const double highPoint = quantile(highQuantile, static_cast<uint32_t>(lowPoint));
> > 
> > > -	double sumBinFreq = 0, cumulFreq = 0;
> > > +	double sumBinFreq = 0;
> > > +	double cumulFreq = 0;
> > > +
> > 
> > Let's document the algorithm (and see if I understand it correctly :-)).
> > 
> > 	/*
> > 	 * Calculate the mean pixel value between the low and high points by
> > 	 * summing all the pixels between the two points, and dividing the sum
> > 	 * by the number of pixels. Given the discrete nature of the histogram
> > 	 * data, the sum of the pixels is approximated by accummulating the
> > 	 * product of the bin values (calculated as the mid point of the bin) by
> > 	 * the number of pixels they contain, for each bin in the internal.
> > 	 */
> 
> That nicely summarizes it. And actually it took me quite a while to
> understand the algorithm. So that really helps. Thanks.

It took me way too long to write those few lines :-)

> > 
> > > +	for (int bin = std::floor(lowPoint); bin < std::ceil(highPoint); bin++) {
> > 
> > It looks like bin can be unsigned.
> 
> I don't like unsigned :-) ... anyways, changed it.
> 
> > 
> > > +		double lowBound = std::max(static_cast<double>(bin), lowPoint);
> > 
> > I think you can also write
> > 
> > 		double lowBound = std::max<double>(bin, lowPoint);
> 
> Oh yes. that looks way nicer.
> 
> > 
> > Same for the next line. Up to you. Oh, and you can make them const too.
> > 
> > > +		double highBound = std::min(static_cast<double>(bin + 1), highPoint);
> > 
> > If I understand the code correctly, this is only meaningful for the
> > first and last iterations. I can't easily find a better construct that
> > wouldn't need to be run for each iteration, so this seems fine.
> > 
> > >  
> > > -	for (double p_next = floor(lowPoint) + 1.0;
> > > -	     p_next <= ceil(highPoint);
> > > -	     lowPoint = p_next, p_next += 1.0) {
> > > -		int bin = floor(lowPoint);
> > >  		double freq = (cumulative_[bin + 1] - cumulative_[bin])
> > > -			* (std::min(p_next, highPoint) - lowPoint);
> > > +			* (highBound - lowBound);
> > 
> > 	 	/*
> > 		 * The low and high quantile may not lie at bin boundaries, so
> > 		 * the first and last bins need to be weighted accordingly. The
> > 		 * best available approximation is to multiply the number of
> > 		 * pixels by the partial bin width.
> > 		 */
> > 		const double freq = (cumulative_[bin + 1] - cumulative_[bin])
> > 				  * (highBound - lowBound);
> > 
> > >  
> > >  		/* Accumulate weighted bin */
> > > -		sumBinFreq += bin * freq;
> > > +		sumBinFreq += 0.5 * (highBound + lowBound) * freq;
> > 
> > I wondered for a moment where the 0.5 came from. I think
> > 
> > 		sumBinFreq += (highBound + lowBound) / 2 * freq;
> > 
> > would better reflect the intent.
> > 
> > > +
> > >  		/* Accumulate weights */
> > >  		cumulFreq += freq;
> > 
> > I wonder if we should rename sumBinFreq to sumPixelValues and numPixels.
> 
> Depends on the background. The math people only talk about frequency and
> we even have a cumulativeFrequency() function. In the docs we often use
> pixels as that is mostly what we count.
> 
> I left it as is, as that's not wrong either.
> 
> > Reviewed-by: Laurent Pinchart <laurent.pinchart at ideasonboard.com>
> 
> Thank you!
> 
> > >  	}
> > > -	/* add 0.5 to give an average for bin mid-points */
> > > -	return sumBinFreq / cumulFreq + 0.5;
> > > +
> > > +	return sumBinFreq / cumulFreq;
> > >  }
> > >  
> > >  } /* namespace ipa */

-- 
Regards,

Laurent Pinchart