[PATCH] libcamera: debayer_cpu: Sync output buffer
Naushir Patuck
naush at raspberrypi.com
Tue Sep 3 09:21:54 CEST 2024
On Mon, 2 Sept 2024 at 21:32, Laurent Pinchart
<laurent.pinchart at ideasonboard.com> wrote:
>
> On Mon, Sep 02, 2024 at 12:56:31PM +0200, Robert Mader wrote:
> > On 01.09.24 13:39, Robert Mader wrote:
> > >
> > > On 01.09.24 13:07, Laurent Pinchart wrote:
> > >> Hans, would you be able to test this on an IPU6-based device, and check
> > >> the performance impact ? I don't expect expensive cache management
> > >> operations on an x86 device.
> > >>
> > >> Bryan, could you do the same with camss ?
> >
> > Heads up that in my initial testing around different GStreamer pipelines
> > on arm64 I saw mixed results:
> >
> > 1. Cases involving successful dmabuf import to the GPU are (much) less
> > prone to glitches while not seeming to regress much in terms of frame
> > rates. This includes running Gnome-Snapshot or waylandsink on devices
> > like the Librem5, PinePhone or Pixel 3a (generally qcom).
> >
> > 2. Cases where Gst mmaps the buffers seem to get a noticeable
> > performance hit.
> >
> > Crucially this applies to common fallback paths like in the following example:
> >
> > - glupload tries to import the buffer as dmabuf
> >
> > - fails due to stride requirements...
> >
> > - uses the "raw" importer that mmaps the buffer
> >
> > This case is almost tragic IMO. The buffer data ends up only getting
> > accessed by the CPU but we flush the caches/sync to the GPU *twice* -
> > just to upload a copy in the end.
> >
> > And while I see potential to improve this scenario in the other parts of
> > the stack, I don't see anything we can do about it in libcamera right now
> > (apart from not landing a patch like this).
>
> It's a bit late, but maybe there's a possibility to submit a lightning
> talk/BoF topic for LPC in two weeks ? Cache handling is a topic that
> crosses many subsystem boundaries, and I think we'll have quite a few
> people with relevant expertise in Vienna.
>
This is quite a complicated topic indeed. The RPi camera stack
switched to using cacheable dma bufs for performance reasons (> 10%
uplift in certain use cases) and we had to be very careful with how to
handle the DMA_BUF_IOCTL_SYNC calls at the application level.
However, I don't think handling this in MappedFrameBuffer is the right
thing for hardware-based ISPs because of unexpected stale data
flushing/invalidation. I can expand on this during our F2F in Vienna.
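
For anyone not familiar with the ioctl, here is a minimal sketch of what
bracketing CPU access to a cacheable dma-buf looks like at the application
level. This is only an illustration and not the actual RPi code: the helper
names (dmabuf_sync, dmabuf_begin_cpu_read, dmabuf_end_cpu_read) are
hypothetical, and it assumes fd is a dma-buf file descriptor that the
application has already mmap()ed. Only struct dma_buf_sync, the
DMA_BUF_SYNC_* flags and DMA_BUF_IOCTL_SYNC come from <linux/dma-buf.h>.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/*
 * Hypothetical helper: issue a single DMA_BUF_IOCTL_SYNC on a dma-buf fd,
 * retrying if the call is interrupted. Returns 0 on success, -1 on error
 * with errno set by ioctl().
 */
static int dmabuf_sync(int fd, uint64_t flags)
{
	struct dma_buf_sync sync = { .flags = flags };
	int ret;

	do {
		ret = ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);
	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

	return ret;
}

/* Call before the CPU reads a buffer that a device has written to. */
static int dmabuf_begin_cpu_read(int fd)
{
	return dmabuf_sync(fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ);
}

/* Call once the CPU has finished reading, before the buffer is reused. */
static int dmabuf_end_cpu_read(int fd)
{
	return dmabuf_sync(fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ);
}
```

The application would call dmabuf_begin_cpu_read() before touching the
mapping and dmabuf_end_cpu_read() afterwards. Where exactly those calls
belong in the stack is the open question in this thread.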
Naush
> --
> Regards,
>
> Laurent Pinchart