[PATCH 03/27] libcamera: dma_buf_allocator: Favour udmabuf over cma heap allocations

Fri Apr 25 17:07:01 CEST 2025

On Fri, Apr 25, 2025 at 09:01:33AM -0400, Nicolas Dufresne wrote:
> Le jeudi 24 avril 2025 à 17:14 +0300, Laurent Pinchart a écrit :
> > On Thu, Apr 24, 2025 at 02:51:42PM +0100, Bryan O'Donoghue wrote:
> > > On 24/04/2025 14:17, Nicolas Dufresne wrote:
> > > > > even if the CPU only 'reads' the data ?
> > > > Its any CPU write -> GPU that in serious trouble on recent Intel. That
> > > > being said, if nothing either flush or invalidate the cache, if you
> > > > read before, write with GPU, and read again after that same buffer, the
> > > > data you'll read may be from old cache. In practice, GPU don't do
> > > > anything in-place, so that is non-issue here.
> > > 
> > > Well we have the GPU write to its own frame buffer and currently then 
> > > get a GBM pointer to that surface and memcpy() out into the libcamera 
> > > buffer be that CMA or UDMABuf.
> > > 
> > > +	gbm_surface_lock_front_buffer(gbm_surface_);
> > > +
> > > +	sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ;
> > > +	ioctl(bo_fd_, DMA_BUF_IOCTL_SYNC, &sync);
> > > +
> > > +	if (data_len > framesize_) {
> > > +		LOG(GBM, Error) << "Invalid read size " << data_len << " max is " << framesize_;
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	memcpy(data, map_, data_len);
> > > +
> > > +	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ;
> > > +	ioctl(bo_fd_, DMA_BUF_IOCTL_SYNC, &sync);
> > > +
> > > +	gbm_surface_release_buffer(gbm_surface_, gbm_bo_);
> > > +
> > > 
> > > That I believe should be cache-coherent for both CMA and UDMAbuf across 
> > > architectures.
> > > 
> > > We had an interesting discussion on how render-to-texture would work - 
> > > if you created the render texture using eglCreateImageKHR.
> > > 
> > > // render to texture
> > > 
> > > dmabuf_handle = handle to CMA or UDMABuf buffer.
> > > 
> > > EGLint attrs[] = {
> > > 	EGL_DMA_BUF_PLANE0_FD_EXT, dmabuf_handle
> > > };
> > > 
> > > mytexture-output = glCreateImageKHR(display, context,
> > >                                      EGL_LINUX_DMA_BUF_EXT,
> > >                                      NULL, attrs);
> > > 
> > > glBindFramebuffer(GL_FRAMBUFFER, someframebuffer);
> > > glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
> > >                         GL_TEXTURE_2D, mytexture-output);
> > > 
> > > 
> > > if dmabuf_handle points to CMA heap - who is responsible to flush the 
> > > cache before the CPU accessess the content of the DMA buf handle..
> > > 
> > > actually as I type that, I think the answer is that it is always the 
> > > responsibility of the CPU side to manage its own cache so..
> > > 
> > > glDraw();
> > > 
> > > sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ;
> > > ioctl(dmabuf_handle, DMA_BUF_IOCTL_SYNC, &sync);
> > > 
> > > sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ;
> > > ioctl(dmabuf_handle, DMA_BUF_IOCTL_SYNC, &sync);
> > > 
> > > something like that ..
> > > 
> > > > > For GPU ISP - the only access the CPU should do is read the data to
> > > > > generate some statistics. And even that - I would hope in the future
> > > > > will be turned into operations performed by the GPU ...
> > > > So read bayer, process in GPU, output YUV should just work. Thanks for
> > > > clarification, running out of CMA is pretty common, defaults CMA
> > > > reservation is usually very small. Appart from the buggy drivers,
> > > > virtual memory is a much better choice, and this is mostly what this
> > > > patch is about here.
> > > 
> > > Thinking about this patch.
> > > 
> > > A user on a system such as imx/hantro is free to configure the system to 
> > > support CMA and UDMABuf.
> > > 
> > > If you never pass that buffer to the video encoder - where the encoder 
> > > requires phys contig/cma heap memory - then you probably want udmabuf.
> > 
> > Add the display controller here, it's not just the video encoder.
> > 
> > Have you tried displaying images with cam (using the -D flag) on a
> > system without an IOMMU, with this patch applied ? Does it still work ?
> 
> In this case glCreateImageKHR() is expected to fail once it discover
> that the memory is scattered. In my experience, this part is well
> handled.

It will only fail if the GPU has no IOMMU, right ? What if the GPU has
an IOMMU, but the display controller doesn't ?

> What's missing sometimes is verification that memory
> allocation have happened with CPU cached enabled. Without the mmu,
> there is nothing to handle cached pages. Recent Intel graphics happily
> succeed, resulting in a funny mess on screen.
> 
> I assume the code here is just for example, and that there is failure
> and fallback in place.
> 
> > > The reverse is also true.
> > > 
> > > Its almost use-case specific. If you want the encoder you need CMA and 
> > > if you just want to say - display the output in cheese you want UDMAbuf.
> > > 
> > > Probably the right thing to do is to leave CMA heap as default and leave 
> > > it to the system designer to configure the system as they wish.
> > > 
> > > In my case, switching off the CMA heap or perhaps a libcamera config/env 
> > > variable to specify which to use...

-- 
Regards,

Laurent Pinchart