[PATCH v6 08/18] libcamera: software_isp: Add DebayerCpu class

Wed Mar 27 19:08:02 CET 2024

Hi Laurent,

On 3/27/24 3:12 PM, Laurent Pinchart wrote:
> Hi Milan and Hans,
> 
> Thank you for the patch.

Same as with the previous review: Thank you for the review.

Milan is going to prepare a v7 of this series, so I'm going to <snip>
all the trivial remarks and only comment on the remaining remarks.

> On Tue, Mar 19, 2024 at 01:35:55PM +0100, Milan Zamazal wrote:
>> From: Hans de Goede <hdegoede at redhat.com>

<snip>

>> +{
>> +#ifdef __x86_64__
>> +	enableInputMemcpy_ = false;
>> +#else
>> +	enableInputMemcpy_ = true;
>> +#endif
> 
> This is very much *not* nice :-( It needs to at least be explained
> clearly somewhere.

Yeah this really needs to be a parameter to the SoftwareISP
constructor because some non x86 designs may also have
fully cached buffers, so they can also skip the extra
memcpy step.

My suggestion would be to introduce a struct SoftwareIspParams
with only a bool enableInputMemcpy_ in there for now and
then in the simplepipeline handler instead of having a bool
flag to indicate if the SoftwareISP should be list for a specific
media-controller driver-name, have a pointer to this struct and
have that pointer being non NULL indicate the swisp should
be used.

For now we would then have 2 SoftwareIspParams structs
this pointer can point to (uncached and cached) but in the future
we may very well have 1 SoftwareIspParams struct per supported
mediactl driver.

Actually since the IPU6 kernel side has not been merged yet,
for now we can just replace this with a hardcoded:

	enableInputMemcpy_ = true;

Milan, lets do that for the v7 submission.

Then I can add some mechanism to config this as a follow up
patch when IPU6 has actually landed in the kernel and thus
we can enable it in libcamera upstream.

<snip>

>> +void DebayerCpu::debayer10P_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])
>> +{
>> +	const int width_in_bytes = window_.width * 5 / 4;
> 
> snakeCase here too, ane where applicable everywhere else.
> 
>> +	const uint8_t *prev = (const uint8_t *)src[0];
> 
> Is the cast needed ? If so it should be a C++ cast.

For the set of debayer10P_* functions introduced as the first
supported bayer fmt these casts are not necessary and can
be dropped.

In "[PATCH v6 14/18] libcamera: debayer_cpu: Add support for 8, 10
and 12 bpp unpacked bayer input" there are more casts and these
are actually necessary and will need to be replaced with
static c++ style casts.

>> +	LOG(Debayer, Info)
> 
> Shouldn't this be an Error mesage ? Same below.

These functions are called when the simple pipeline handler
is discovering supported formats so on CSI receivers or sensors
supporting multiple formats this happen for one of the fmts
is sorta expected behavior.

I don't feel strongly about this though, so if you want this
to be upgraded to an Error we can do that.

> 
>> +		<< "Unsupported input format " << inputFormat.toString();
>> +	return -EINVAL;
>> +}

<snip>

>> +	/* Measure before emitting signals */
>> +	if (measuredFrames_ < DebayerCpu::kLastFrameToMeasure &&
>> +	    ++measuredFrames_ > DebayerCpu::kFramesToSkip) {
>> +		timespec frameEndTime = {};
>> +		clock_gettime(CLOCK_MONOTONIC_RAW, &frameEndTime);
>> +		frameProcessTime_ += timeDiff(frameEndTime, frameStartTime);
>> +		if (measuredFrames_ == DebayerCpu::kLastFrameToMeasure) {
>> +			const unsigned int measuredFrames = DebayerCpu::kLastFrameToMeasure -
>> +							    DebayerCpu::kFramesToSkip;
>> +			LOG(Debayer, Info)
>> +				<< "Processed " << measuredFrames
>> +				<< " frames in " << frameProcessTime_ / 1000 << "us, "
>> +				<< frameProcessTime_ / (1000 * measuredFrames)
>> +				<< " us/frame";
>> +		}
>> +	}
> 
> Is this debugging leftovers, or should it be cleaned up ? It seems quite
> a bit of a hack. If we want to enable measurements, we shouldn't
> hardcode them to frames 30 to 60.

This is meant to be kept around, also see:

[PATCH v6 16/18] libcamera: Add "Software ISP benchmarking" documentation

the SoftwareISP is sensitive to performance regressions so this is
intended as a simple builtin benchmark to check for those.

So that means this falls under the "or should it be cleaned up" part
of your question.

You indicate you don't like the hardcoded frames 30 - 60. The idea is
to warm up the caches a bit and then measure a bunch of frames, which
is just what this does. What alternative approach are you suggesting?

>> +	/**
>> +	 * \brief Get the file descriptor for the statistics.
>> +	 *
>> +	 * \return the file descriptor pointing to the statistics.
>> +	 */
>> +	const SharedFD &getStatsFD() { return stats_->getStatsFD(); }
> 
> This,

Note the statistics pass-through stuff is sort of a necessary evil
since we want one main loop going over the data line by line and
doing both debayering as well as stats while the line is still
hot in the l2 cache. And things like the process2() and process4()
loops are highly CPU debayering specific so I don't think we should
move those out of the CpuDebayer code.

> plus the fact that this class handles colour gains and gamma,
> makes me thing we have either a naming issue, or an architecture issue.

I agree that this does a bit more then debayering, although
the debayering really is the main thing it does.

I guess the calculation of the rgb lookup tables which do the
color gains and gamma could be moved outside of this class,
that might even be beneficial for GPU based debayering assuming
that that is going to use rgb lookup tables too (it could
implement actual color gains + gamma correction in some different
way).

I think this falls under the lets wait until we have a GPU
based SoftISP MVP/POC and then do some refactoring to see which
bits should go where.

Regards,

Hans