[PATCH 03/27] libcamera: dma_buf_allocator: Favour udmabuf over cma heap allocations

Bryan O'Donoghue bryan.odonoghue at linaro.org
Thu Apr 24 15:51:42 CEST 2025


On 24/04/2025 14:17, Nicolas Dufresne wrote:
>> even if the CPU only 'reads' the data ?
> It's any CPU write -> GPU that is in serious trouble on recent Intel.
> That being said, if nothing either flushes or invalidates the cache,
> and you read before, write with the GPU, and then read that same
> buffer again, the data you read may come from a stale cache. In
> practice the GPU doesn't do anything in-place, so that is a non-issue
> here.

Well, we have the GPU write to its own frame buffer; currently we then 
get a GBM pointer to that surface and memcpy() out into the libcamera 
buffer, be that CMA or UDMABuf.

+	gbm_surface_lock_front_buffer(gbm_surface_);
+
+	sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ;
+	ioctl(bo_fd_, DMA_BUF_IOCTL_SYNC, &sync);
+
+	if (data_len > framesize_) {
+		LOG(GBM, Error) << "Invalid read size " << data_len
+				<< " max is " << framesize_;
+		return -EINVAL;
+	}
+
+	memcpy(data, map_, data_len);
+
+	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ;
+	ioctl(bo_fd_, DMA_BUF_IOCTL_SYNC, &sync);
+
+	gbm_surface_release_buffer(gbm_surface_, gbm_bo_);
+

That, I believe, should be cache-coherent for both CMA and UDMABuf across 
architectures.
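
(For the CPU-write -> GPU direction Nicolas mentions above, the same 
bracketing applies with DMA_BUF_SYNC_WRITE. A generic sketch only, not 
code from this patch - dmabuf_cpu_write() and its parameters are 
illustrative:)

#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* CPU fills a mapped dma-buf before handing it to the GPU/ISP; the
 * exporter (CMA heap or udmabuf) does the cache maintenance it needs. */
static int dmabuf_cpu_write(int buf_fd, void *map, const void *src, size_t len)
{
	struct dma_buf_sync sync;

	sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE;
	if (ioctl(buf_fd, DMA_BUF_IOCTL_SYNC, &sync) < 0)
		return -errno;

	memcpy(map, src, len);

	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE;
	if (ioctl(buf_fd, DMA_BUF_IOCTL_SYNC, &sync) < 0)
		return -errno;

	return 0;
}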

We had an interesting discussion on how render-to-texture would work if 
you created the render texture using eglCreateImageKHR.

// render to texture

dmabuf_handle = fd of the CMA or UDMABuf buffer

EGLint attrs[] = {
	EGL_DMA_BUF_PLANE0_FD_EXT, dmabuf_handle,
	/* width/height/fourcc/offset/pitch attributes elided */
	EGL_NONE
};

EGLImageKHR myimage_output = eglCreateImageKHR(display, EGL_NO_CONTEXT,
                                               EGL_LINUX_DMA_BUF_EXT,
                                               NULL, attrs);

/* bind the EGLImage to a GL texture before attaching it to the FBO */
glBindTexture(GL_TEXTURE_2D, mytexture_output);
glEGLImageTargetTexture2DOES(GL_TEXTURE_2D, myimage_output);

glBindFramebuffer(GL_FRAMEBUFFER, someframebuffer);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, mytexture_output, 0);


If dmabuf_handle points to CMA heap memory - who is responsible for 
flushing the cache before the CPU accesses the content of the dma-buf..

Actually, as I type that, I think the answer is that it is always the 
responsibility of the CPU side to manage its own cache, so..

glDraw();

sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ;
ioctl(dmabuf_handle, DMA_BUF_IOCTL_SYNC, &sync);

sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ;
ioctl(dmabuf_handle, DMA_BUF_IOCTL_SYNC, &sync);

something like that ..
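
i.e. fleshed out a touch - purely a sketch, assuming the dma-buf was 
mmap()ed earlier as 'map', 'stats'/'framesize' are illustrative names, 
and a glFinish() is an acceptable (if heavy-handed) way to wait for the 
render to complete before the CPU touches the data:

glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
glFinish();			/* make sure the GPU has finished writing */

struct dma_buf_sync sync = { 0 };

sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ;
ioctl(dmabuf_handle, DMA_BUF_IOCTL_SYNC, &sync);

/* CPU reads the rendered output, e.g. to compute statistics */
memcpy(stats, map, framesize);

sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ;
ioctl(dmabuf_handle, DMA_BUF_IOCTL_SYNC, &sync);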

>> For GPU ISP - the only access the CPU should do is read the data to
>> generate some statistics. And even that - I would hope in the future
>> will be turned into operations performed by the GPU ...
> So read bayer, process on the GPU, output YUV should just work. Thanks
> for the clarification; running out of CMA is pretty common, as the
> default CMA reservation is usually very small. Apart from the buggy
> drivers, virtual memory is a much better choice, and that is mostly
> what this patch is about.
Thinking about this patch.

A user on a system such as imx/hantro is free to configure the system to 
support CMA and UDMABuf.

If you never pass that buffer to the video encoder - where the encoder 
requires phys contig/cma heap memory - then you probably want udmabuf.

The reverse is also true.

It's almost use-case specific. If you want the encoder you need CMA, and 
if you just want to, say, display the output in Cheese, you want UDMABuf.
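
To make that concrete - a rough sketch of the two allocation paths 
(heap node names vary by platform, error handling is trimmed, and this 
is not the libcamera allocator code):

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/dma-heap.h>
#include <linux/udmabuf.h>

/* CMA heap: physically contiguous pages, what a stateful encoder
 * such as hantro typically requires. */
static int alloc_cma(size_t size)
{
	struct dma_heap_allocation_data alloc = { 0 };
	int heap = open("/dev/dma_heap/linux,cma", O_RDWR | O_CLOEXEC);

	alloc.len = size;
	alloc.fd_flags = O_RDWR | O_CLOEXEC;
	ioctl(heap, DMA_HEAP_IOCTL_ALLOC, &alloc);
	close(heap);

	return alloc.fd;				/* dma-buf fd */
}

/* udmabuf: plain pages behind a sealed memfd, no CMA reservation
 * needed; size must be page aligned. */
static int alloc_udmabuf(size_t size)
{
	struct udmabuf_create create = { 0 };
	int memfd = memfd_create("frame", MFD_ALLOW_SEALING);
	int dev = open("/dev/udmabuf", O_RDWR | O_CLOEXEC);
	int buf_fd;

	ftruncate(memfd, size);
	fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);	/* udmabuf insists on this */

	create.memfd = memfd;
	create.flags = UDMABUF_FLAGS_CLOEXEC;
	create.offset = 0;
	create.size = size;

	buf_fd = ioctl(dev, UDMABUF_CREATE, &create);	/* dma-buf fd */
	close(dev);
	close(memfd);

	return buf_fd;
}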

Probably the right thing to do is to leave the CMA heap as the default 
and let the system designer configure the system as they wish.

In my case that would mean switching off the CMA heap, or perhaps adding 
a libcamera config/env variable to specify which allocator to use...
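
Something along these lines, to be clear purely hypothetical - 
LIBCAMERA_DMABUF_HEAP is not an existing libcamera option, and 
alloc_cma()/alloc_udmabuf() are the sketches from above:

#include <stdlib.h>
#include <string.h>

/* Hypothetical: let the system designer pick the allocator via the
 * environment, defaulting to the CMA heap as suggested above. */
static int alloc_frame_buffer(size_t size)
{
	const char *heap = getenv("LIBCAMERA_DMABUF_HEAP");

	if (heap && !strcmp(heap, "udmabuf"))
		return alloc_udmabuf(size);

	return alloc_cma(size);
}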

---
bod

