[PATCH v6 08/18] libcamera: software_isp: Add DebayerCpu class

Wed Mar 27 20:37:29 CET 2024

Hi Hans,

On Wed, Mar 27, 2024 at 07:08:02PM +0100, Hans de Goede wrote:
> On 3/27/24 3:12 PM, Laurent Pinchart wrote:
> > Hi Milan and Hans,
> > 
> > Thank you for the patch.
> 
> Same as with the previous review: Thank you for the review.
> 
> Milan is going to prepare a v7 of this series, so I'm going to <snip>
> all the trivial remarks and only comment on the remaining remarks.
> 
> > On Tue, Mar 19, 2024 at 01:35:55PM +0100, Milan Zamazal wrote:
> >> From: Hans de Goede <hdegoede at redhat.com>
> 
> <snip>
> 
> >> +{
> >> +#ifdef __x86_64__
> >> +	enableInputMemcpy_ = false;
> >> +#else
> >> +	enableInputMemcpy_ = true;
> >> +#endif
> > 
> > This is very much *not* nice :-( It needs to at least be explained
> > clearly somewhere.
> 
> Yeah this really needs to be a parameter to the SoftwareISP
> constructor because some non x86 designs may also have
> fully cached buffers, so they can also skip the extra
> memcpy step.
> 
> My suggestion would be to introduce a struct SoftwareIspParams
> with only a bool enableInputMemcpy_ in there for now and
> then in the simplepipeline handler instead of having a bool
> flag to indicate if the SoftwareISP should be list for a specific
> media-controller driver-name, have a pointer to this struct and
> have that pointer being non NULL indicate the swisp should
> be used.
> 
> For now we would then have 2 SoftwareIspParams structs
> this pointer can point to (uncached and cached) but in the future
> we may very well have 1 SoftwareIspParams struct per supported
> mediactl driver.
> 
> Actually since the IPU6 kernel side has not been merged yet,
> for now we can just replace this with a hardcoded:
> 
> 	enableInputMemcpy_ = true;
> 
> Milan, lets do that for the v7 submission.

Could you please also add a comment that explains why we want the memcpy
to be configurable ?

> Then I can add some mechanism to config this as a follow up
> patch when IPU6 has actually landed in the kernel and thus
> we can enable it in libcamera upstream.
> 
> <snip>
> 
> >> +void DebayerCpu::debayer10P_BGBG_BGR888(uint8_t *dst, const uint8_t *src[])
> >> +{
> >> +	const int width_in_bytes = window_.width * 5 / 4;
> > 
> > snakeCase here too, ane where applicable everywhere else.
> > 
> >> +	const uint8_t *prev = (const uint8_t *)src[0];
> > 
> > Is the cast needed ? If so it should be a C++ cast.
> 
> For the set of debayer10P_* functions introduced as the first
> supported bayer fmt these casts are not necessary and can
> be dropped.
> 
> In "[PATCH v6 14/18] libcamera: debayer_cpu: Add support for 8, 10
> and 12 bpp unpacked bayer input" there are more casts and these
> are actually necessary and will need to be replaced with
> static c++ style casts.

Right, I hadn't seen that yet when reviewing this patch.

> >> +	LOG(Debayer, Info)
> > 
> > Shouldn't this be an Error mesage ? Same below.
> 
> These functions are called when the simple pipeline handler
> is discovering supported formats so on CSI receivers or sensors
> supporting multiple formats this happen for one of the fmts
> is sorta expected behavior.
> 
> I don't feel strongly about this though, so if you want this
> to be upgraded to an Error we can do that.

If it's an expected behaviour, I would make this a debug message then.

> >> +		<< "Unsupported input format " << inputFormat.toString();
> >> +	return -EINVAL;
> >> +}
> 
> <snip>
> 
> >> +	/* Measure before emitting signals */
> >> +	if (measuredFrames_ < DebayerCpu::kLastFrameToMeasure &&
> >> +	    ++measuredFrames_ > DebayerCpu::kFramesToSkip) {
> >> +		timespec frameEndTime = {};
> >> +		clock_gettime(CLOCK_MONOTONIC_RAW, &frameEndTime);
> >> +		frameProcessTime_ += timeDiff(frameEndTime, frameStartTime);
> >> +		if (measuredFrames_ == DebayerCpu::kLastFrameToMeasure) {
> >> +			const unsigned int measuredFrames = DebayerCpu::kLastFrameToMeasure -
> >> +							    DebayerCpu::kFramesToSkip;
> >> +			LOG(Debayer, Info)
> >> +				<< "Processed " << measuredFrames
> >> +				<< " frames in " << frameProcessTime_ / 1000 << "us, "
> >> +				<< frameProcessTime_ / (1000 * measuredFrames)
> >> +				<< " us/frame";
> >> +		}
> >> +	}
> > 
> > Is this debugging leftovers, or should it be cleaned up ? It seems quite
> > a bit of a hack. If we want to enable measurements, we shouldn't
> > hardcode them to frames 30 to 60.
> 
> This is meant to be kept around, also see:
> 
> [PATCH v6 16/18] libcamera: Add "Software ISP benchmarking" documentation
> 
> the SoftwareISP is sensitive to performance regressions so this is
> intended as a simple builtin benchmark to check for those.
> 
> So that means this falls under the "or should it be cleaned up" part
> of your question.
> 
> You indicate you don't like the hardcoded frames 30 - 60. The idea is
> to warm up the caches a bit and then measure a bunch of frames, which
> is just what this does. What alternative approach are you suggesting?

I wonder if there would be a way to control at runtime when/how to
perform those measurements. Maybe that's a bit overkill. In any case, it
would be something to address later.

> >> +	/**
> >> +	 * \brief Get the file descriptor for the statistics.
> >> +	 *
> >> +	 * \return the file descriptor pointing to the statistics.
> >> +	 */
> >> +	const SharedFD &getStatsFD() { return stats_->getStatsFD(); }
> > 
> > This,
> 
> Note the statistics pass-through stuff is sort of a necessary evil
> since we want one main loop going over the data line by line and
> doing both debayering as well as stats while the line is still
> hot in the l2 cache. And things like the process2() and process4()
> loops are highly CPU debayering specific so I don't think we should
> move those out of the CpuDebayer code.

Yes, that I understood from the review. "necessary evil" is indeed the
right term :-) I expect it will take quite some design skills to balance
the need for performances and the need for a maintainable architecture.

> > plus the fact that this class handles colour gains and gamma,
> > makes me thing we have either a naming issue, or an architecture issue.
> 
> I agree that this does a bit more then debayering, although
> the debayering really is the main thing it does.
> 
> I guess the calculation of the rgb lookup tables which do the
> color gains and gamma could be moved outside of this class,
> that might even be beneficial for GPU based debayering assuming
> that that is going to use rgb lookup tables too (it could
> implement actual color gains + gamma correction in some different
> way).
> 
> I think this falls under the lets wait until we have a GPU
> based SoftISP MVP/POC and then do some refactoring to see which
> bits should go where.

As long as we record this in the todo file (with a short summary of the
above), that's fine with me.

-- 
Regards,

Laurent Pinchart