[PATCH v6 06/18] libcamera: software_isp: Add SwStatsCpu class

Hans de Goede hdegoede at redhat.com
Wed Mar 27 16:18:45 CET 2024


Hi Laurent,

On 3/27/24 2:51 PM, Laurent Pinchart wrote:
> Hi Milan and Hans,
> 
> Thank you for the patch.
> 
> I understand all this is work in progress, so I won't comment on all the
> improvements we could add in the future, but focus on merging this as
> quickly as possible (aiming for one last new version for the series to
> address simple issues only).

Thank you for the review. Milan is going to prepare a v7 of this
series, so I'm going to <snip> all the trivial remarks and only
comment on the remaining remarks.

> 
> On Tue, Mar 19, 2024 at 01:35:53PM +0100, Milan Zamazal wrote:
>> From: Hans de Goede <hdegoede at redhat.com>
>>
>> Add a CPU based SwStats implementation for SoftwareISP / SoftIPA use.
>>
>> This implementation offers a configure function + functions to gather
>> statistics on a line by line basis. This allows CPU based software
>> debayering to call into interlace debayering and statistics gathering
> 
> I think you meant interleave, not interlace.

Ack.

<snip>

>> +struct SwIspStats {
>> +	/**
>> +	 * \brief Holds the sum of all sampled red pixels.
> 
> Does it wrap around or saturate in case of overflow ?

Wrap, the idea being that it needs to be big enough to
handle max-pixel-value * number-of-samples pixels.

Saturate would require an conditional for each samples
pixel and benchmarking has shown that adding any conditionals
which run for each pixel greatly slow things down.

> 
>> +	 */
>> +	unsigned long sumR_;
> 
> Any reason for using a platform-dependent type instead of uint32_t or
> uint64_t ?

Not really, the sums really should be uint64_t-s to avoid
the mentioned wrapping. Milan can you change the type here
to uint64_t please.


> 
>> +	/**
>> +	 * \brief Holds the sum of all sampled green pixels.
>> +	 */
>> +	unsigned long sumG_;
>> +	/**
>> +	 * \brief Holds the sum of all sampled blue pixels.
>> +	 */
>> +	unsigned long sumB_;
>> +	/**
>> +	 * \brief Number of bins in the yHistogram.
>> +	 */
>> +	static constexpr unsigned int kYHistogramSize = 16;
>> +	/**
>> +	 * \brief A histogram of luminance values.
>> +	 */
>> +	std::array<unsigned int, kYHistogramSize> yHistogram;
> 
> Same comments here.

This also wraps, but since a single bin is increased by 1
for a single pixel in the worst case scenario a single bin
gets increased to the total amount of sampled pixels which
will easily fit in an uint32_t, and this is an array so
size matters (for cacheline usage) so lets make this
an uint32_t array.

>> +/**
>> + * \brief Reset state to start statistics gathering for a new frame.
>> + *
>> + * This may only be called after a successful setWindow() call.
> 
> It would be nice to design the API in a way that makes this kind of
> logic error impossible, or at least more difficult.

Ack, I guess we can check for an empty window and then log
an error?

> 
>> + */
>> +void SwStatsCpu::startFrame(void)
>> +{
>> +	stats_.sumR_ = 0;
>> +	stats_.sumB_ = 0;
>> +	stats_.sumG_ = 0;
>> +	stats_.yHistogram.fill(0);
>> +}
>> +
>> +/**
>> + * \brief Finish statistics calculation for the current frame.
>> + *
>> + * This may only be called after a successful setWindow() call.
>> + */
>> +void SwStatsCpu::finishFrame(void)
>> +{
>> +	*sharedStats_ = stats_;
> 
> Is it more efficient to copy the stats instead of operating directly on
> the shared memory ?

I inherited doing things this way from Andrey. I kept this because
we don't really have any synchronization with the IPA reading this.

So the idea is to only touch this when the next set of statistics
is ready since we don't know when the IPA is done with accessing
the previous set of statistics ...

This is both something which seems mostly a theoretic problem,
yet also definitely something which I think we need to fix.

Maybe use a ringbuffer of stats buffers and pass the index into
the ringbuffer to the emit signal ?

Milan can you add a SoftwareISP-TODO document for the next
version and mention this as something which we need to look at
in the future ?

> 
>> +	statsReady.emit(0);

<snip>


>> +	int configure(const StreamConfiguration &inputCfg);
>> +	void setWindow(Rectangle window);
> 
> const Rectangle &window
> 
> Any reason to separate the two functions ?

The inputCfg gets set once before starting streaming, the window
can be updated on the fly, to e.g. track a face (if we ever add
face tracking).

> 
>> +	void startFrame();
>> +	void finishFrame();
>> +
>> +	/**
>> +	 * \brief Process line 0.
>> +	 * \param[in] y The y coordinate.
>> +	 * \param[in] src The input data.
>> +	 *
>> +	 * This function processes line 0 for input formats with patternSize height == 1.
>> +	 * It'll process line 0 and 1 for input formats with patternSize height >= 2.
>> +	 * This function may only be called after a successful setWindow() call.
>> +	 */
>> +	void processLine0(unsigned int y, const uint8_t *src[])
>> +	{
>> +		if ((y & ySkipMask_) || y < (unsigned int)window_.y ||
> 
> C++ casts please.
> 
>> +		    y >= (window_.y + window_.height))
>> +			return;
>> +
>> +		(this->*stats0_)(src);
>> +	}
> 
> Does making the function inline improve performances ?

That is the idea yes. I've not measured this separately but this
was part of a bigger rework for performance reasons and that
entire rework did speed up things by a factor 2x - 3x .


> 
>> +
>> +	/**
>> +	 * \brief Process line 2 and 3.
>> +	 * \param[in] y The y coordinate.
>> +	 * \param[in] src The input data.
>> +	 *
>> +	 * This function processes line 2 and 3 for input formats with patternSize height == 4.
>> +	 * This function may only be called after a successful setWindow() call.
>> +	 */
>> +	void processLine2(unsigned int y, const uint8_t *src[])
>> +	{
>> +		if ((y & ySkipMask_) || y < (unsigned int)window_.y ||
>> +		    y >= (window_.y + window_.height))
>> +			return;
>> +
>> +		(this->*stats2_)(src);
>> +	}
>> +
>> +	/**
>> +	 * \brief Signals that the statistics are ready.
>> +	 *
>> +	 * The int parameter isn't actually used.
> 
> Then drop it :-)
> 
> 	Signal<> statsReady;
> 
> is perfectly valid.

Ah so just empty? This was remarked on a previous version (possibly by you?)
and I think I tried adding <void> as the more of a C-programmer that I am
and that definitely did not work.

Milan can you give this a try.

> But better, I wonder if the signal could be dropped completely. The
> SwStatsCpu class does not operate asynchronously. Shouldn't whoever
> calls the finishFrame() function then handle emitting the signal ?
> 
> Now, the trouble is that this would be the DebayerCpu class, whose name
> doesn't indicate as a prime candidate to handle stats. However, it
> already exposes a getStatsFD() function, so we're already calling for
> trouble :-) Either that should be moved to somewhere else, or the class
> should be renamed. Considering that the class applies colour gains in
> addition to performing the interpolation, it may be more of a naming
> issue.
> 
> Removing the signal and refactoring those classes doesn't have to be
> addressed now, I think it would be part of a larger refactoring
> (possibly also considering platforms that have no ISP but can produce
> stats in hardware, such as the i.MX7), but please keep it on your radar.

Right, I think this is all stuff to add to the SoftwareISP-TODO doc
which I suggested above.

Regards,

Hans




More information about the libcamera-devel mailing list