[PATCH 03/27] libcamera: dma_buf_allocator: Favour udmabuf over cma heap allocations
Laurent Pinchart
laurent.pinchart at ideasonboard.com
Thu Apr 24 16:12:08 CEST 2025
On Thu, Apr 24, 2025 at 09:17:48AM -0400, Nicolas Dufresne wrote:
> Le mercredi 23 avril 2025 à 14:07 +0100, Kieran Bingham a écrit :
> > Quoting Nicolas Dufresne (2025-04-23 13:50:03)
> > > Le mercredi 23 avril 2025 à 10:36 +0100, Kieran Bingham a écrit :
> > > > Quoting Laurent Pinchart (2025-04-22 23:22:31)
> > > > > On Tue, Apr 22, 2025 at 10:58:56PM +0100, Bryan O'Donoghue wrote:
> > > > > > When /dev/dma_heap/linux,cma or /dev/dma_heap/system exists, we
> > > > > > currently favour allocation from that type of heap over /dev/udmabuf.
> > > > > >
> > > > > > We ought to favour udmabuf, though:
> > > > > >
> > > > > > - udmabuf is the preferred method by various distros for security reasons
> > > > > > - Contiguous memory is a scarce resource
> > > > > >
> > > > > > Change the ordering of the allocator lookup so that the udmabuf lookup
> > > > > > comes first.
> > > > >
> > > > > This means that on a system where CMA allocation is possible from
> > > > > userspace, the buffers allocated by libcamera for the virtual pipeline
> > > > > handler and ISP won't be shareable without copies with a consumer that
> > > > > requires contiguous memory (e.g. most KMS devices on Arm platforms).
> > > > > Isn't that an issue? It seems to even count as a regression.
> > > >
> > > > On my X13s, everything works better for me on udmabuf. If we try to use
> > > > the cma allocator things fail.
> > > >
> > > > That /could/ be a separate topic/issue - but at least might explain some
> > > > of the rationale behind this.
> > > >
> > > > I usually end up with something like this on my system.
> > >
> > > Did you mean to add a link? I'm quite curious, since here on Meteor
> >
> > No, sorry - In the text above, I meant "I usually end up with something
> > like ... 'this patch' ... on my system". Meaning I configure my x13s to
> > prefer udmabuf over cma because cma allocations frequently fail for me.
> >
> > It's not clear if it's a resource leak or indeterminate usage by other
> > components (hence it might be something else to look into), but udmabuf
> > has always worked with softisp on the x13s so far, so just use it.
> >
> > > Lake GPU, passing over anything that has been touched by the CPU
> > > results in a big mess. This used to happen only when passing these to
> > > the display controller; now the GPU also does not handle CPU-coherent
> > > memory properly.
> >
> > Even if the CPU only 'reads' the data?
>
> It's any CPU write -> GPU path that is in serious trouble on recent
> Intel. That being said, if nothing either flushes or invalidates the
> cache, and you read before, write with the GPU, and then read that same
> buffer again, the data you read may come from a stale cache. In
> practice, GPUs don't do anything in-place, so that is a non-issue here.
To clarify, we have two different access patterns here.
GPU input buffers contain raw images, and are currently captured from a
V4L2 device (typically a DMA engine at the output of a CSI-2 receiver).
They are
- Allocated on a V4L2 device
- Written by the V4L2 capture device (no cache involved)
- Read by the CPU (to compute statistics, CPU cache involved)
- Read by the GPU (to process them, GPU cache involved)
Note that, in the future, we may want to use the GPU for pre- or
post-processing with a hardware ISP, or to process data generated by the
CPU. This will need to be taken into account for cache management.
GPU output buffers contain processed (RGB, or in the future YUV) images.
They are
- Allocated through udmabuf or DMA heaps
- Written by the GPU
- Read by consumers out of our control (CPU, GPU or other hardware,
typically through DRM, KMS or V4L2)
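
As a side note on cache management for the input-buffer pattern above: any
CPU read of a dma-buf that a device has just written should be bracketed
with the kernel's DMA_BUF_IOCTL_SYNC ioctl so the CPU caches are brought
back in sync. A minimal sketch of that uapi usage (illustrative only, not
the libcamera implementation; sumPixels() is a made-up helper):

/*
 * Sketch: CPU read access to a dma-buf written by a capture device,
 * bracketed by DMA_BUF_IOCTL_SYNC start/end calls (linux/dma-buf.h).
 * Error handling is intentionally minimal.
 */
#include <linux/dma-buf.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

#include <cstddef>
#include <cstdint>

static uint64_t sumPixels(int dmabufFd, std::size_t length)
{
	void *mem = mmap(nullptr, length, PROT_READ, MAP_SHARED, dmabufFd, 0);
	if (mem == MAP_FAILED)
		return 0;

	/* Start of CPU read access: invalidates stale CPU cache lines. */
	struct dma_buf_sync sync = {};
	sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ;
	ioctl(dmabufFd, DMA_BUF_IOCTL_SYNC, &sync);

	uint64_t sum = 0;
	const uint8_t *pixels = static_cast<const uint8_t *>(mem);
	for (std::size_t i = 0; i < length; ++i)
		sum += pixels[i];

	/* End of the CPU access. */
	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ;
	ioctl(dmabufFd, DMA_BUF_IOCTL_SYNC, &sync);

	munmap(mem, length);
	return sum;
}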
> > For GPU ISP - the only access the CPU should do is read the data to
> > generate some statistics. And even that - I would hope in the future
> > will be turned into operations performed by the GPU ...
>
> So read Bayer, process on the GPU, output YUV should just work. Thanks
> for the clarification. Running out of CMA is pretty common, as the
> default CMA reservation is usually very small. Apart from the buggy
> drivers, virtual memory is a much better choice, and that is mostly
> what this patch is about here.
That depends on the consumer though. We want to enable zero-copy
operation, and udmabuf will prevent that on systems without an IOMMU if
the consumer is a hardware device. This issue needs to eventually be
solved by a centralized allocator, but in the meantime, this patch
introduces a regression by breaking zero-copy.
> > > > > > Fixes: ea4baaacc325 ("libcamera: DmaBufAllocator: Support allocating from /dev/udmabuf")
> > > > > > Signed-off-by: Bryan O'Donoghue <bryan.odonoghue at linaro.org>
> > > > > > ---
> > > > > > src/libcamera/dma_buf_allocator.cpp | 2 +-
> > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/src/libcamera/dma_buf_allocator.cpp b/src/libcamera/dma_buf_allocator.cpp
> > > > > > index d8c62dd6..722ffd46 100644
> > > > > > --- a/src/libcamera/dma_buf_allocator.cpp
> > > > > > +++ b/src/libcamera/dma_buf_allocator.cpp
> > > > > > @@ -45,10 +45,10 @@ static constexpr std::array<DmaBufAllocatorInfo, 4> providerInfos = { {
> > > > > > * /dev/dma_heap/linux,cma is the CMA dma-heap. When the cma heap size is
> > > > > > * specified on the kernel command line, this gets renamed to "reserved".
> > > > > > */
> > > > > > + { DmaBufAllocator::DmaBufAllocatorFlag::UDmaBuf, "/dev/udmabuf" },
> > > > > > { DmaBufAllocator::DmaBufAllocatorFlag::CmaHeap, "/dev/dma_heap/linux,cma" },
> > > > > > { DmaBufAllocator::DmaBufAllocatorFlag::CmaHeap, "/dev/dma_heap/reserved" },
> > > > > > { DmaBufAllocator::DmaBufAllocatorFlag::SystemHeap, "/dev/dma_heap/system" },
> > > > > > - { DmaBufAllocator::DmaBufAllocatorFlag::UDmaBuf, "/dev/udmabuf" },
> > > > > > } };
> > > > > >
> > > > > > LOG_DEFINE_CATEGORY(DmaBufAllocator)
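
For completeness, the udmabuf path that this reordering now tries first
essentially backs the buffer with a sealed memfd and turns it into a
dma-buf through the UDMABUF_CREATE ioctl from linux/udmabuf.h. A rough
sketch of that uapi (illustrative only; allocUdmaBuf() is a made-up
helper, not the actual DmaBufAllocator code):

/*
 * Sketch: allocate a non-contiguous dma-buf through /dev/udmabuf.
 * The pages come from a memfd sealed against shrinking; offset and size
 * must be page-aligned.
 */
#include <fcntl.h>
#include <linux/udmabuf.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

#include <cstddef>

static int allocUdmaBuf(int udmabufDevFd, std::size_t size)
{
	int memfd = memfd_create("frame-buffer", MFD_ALLOW_SEALING);
	if (memfd < 0)
		return -1;

	if (ftruncate(memfd, size) < 0 ||
	    fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK) < 0) {
		close(memfd);
		return -1;
	}

	struct udmabuf_create create = {};
	create.memfd = memfd;
	create.flags = UDMABUF_FLAGS_CLOEXEC;
	create.offset = 0;
	create.size = size;

	/* On success this returns a dma-buf fd backed by the memfd pages. */
	int dmabufFd = ioctl(udmabufDevFd, UDMABUF_CREATE, &create);

	close(memfd);
	return dmabufFd;
}

The downside, as discussed above, is that the resulting pages are not
physically contiguous, so consumers without an IOMMU cannot use them
directly.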
--
Regards,
Laurent Pinchart