[PATCH] libcamera: debayer_cpu: Sync output buffer
Laurent Pinchart
laurent.pinchart at ideasonboard.com
Tue Sep 3 15:27:32 CEST 2024
On Tue, Sep 03, 2024 at 08:21:54AM +0100, Naushir Patuck wrote:
> On Mon, 2 Sept 2024 at 21:32, Laurent Pinchart wrote:
> > On Mon, Sep 02, 2024 at 12:56:31PM +0200, Robert Mader wrote:
> > > On 01.09.24 13:39, Robert Mader wrote:
> > > > On 01.09.24 13:07, Laurent Pinchart wrote:
> > > >> Hans, would you be able to test this on an IPU6-based device, and check
> > > >> the performance impact ? I don't expect expensive cache management
> > > >> operations on an x86 device.
> > > >>
> > > >> Bryan, could you do the same with camss ?
> > >
> > > Heads up that in my initial testing around different GStreamer pipelines
> > > on arm64 I saw mixed results:
> > >
> > > 1. Cases involving successful dmabuf import to the GPU are (much) less
> > > prone to glitches while not seeming to regress much in terms of frame
> > > rates. This includes running Gnome-Snapshot or waylandsink on devices
> > > like the Librem5, PinePhone or Pixel 3a (generally qcom).
> > >
> > > 2. Cases where Gst mmaps the buffers seem to get a noticeable
> > > performance hit.
> > >
> > > Crucially, this applies to common fallback paths, as in the following example:
> > >
> > > - glupload tries to import the buffer as dmabuf
> > >
> > > - fails due to stride requirements...
> > >
> > > - uses the "raw" importer that mmaps the buffer
> > >
> > > This case is almost tragic IMO. The buffer data ends up only getting
> > > accessed by the CPU, but we flush the caches/sync to the GPU *twice*,
> > > just to upload a copy in the end.
> > >
> > > And while I see potential to improve this scenario in the other parts of
> > > the stack, I don't see anything we can do about it in libcamera right now
> > > (apart from not landing a patch like this).
> >
> > It's a bit late, but maybe there's a possibility to submit a lightning
> > talk/BoF topic for LPC in two weeks ? Cache handling is a topic that
> > crosses many subsystem boundaries, and I think we'll have quite a few
> > people with relevant expertise in Vienna.
>
> This is quite a complicated topic indeed. The RPi camera stack
> switched to using cacheable dma bufs for performance reasons (> 10%
> uplift in certain use cases) and we had to be very careful with how to
> handle the DMA_BUF_IOCTL_SYNC calls at the application level.
> However, I don't think handling this in MappedFrameBuffer is the right
> thing for hardware based ISPs because of unexpected stale data
> flushing/invalidation. I can expand on this during our F2F in Vienna.
That seems a good discussion topic to me.
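For readers less familiar with the DMA_BUF_IOCTL_SYNC handling mentioned above, here is a minimal sketch of how CPU access to an mmap()ed dmabuf is usually bracketed. It assumes a Linux system with `<linux/dma-buf.h>`; the helper names (`dmabuf_sync`, `dmabuf_cpu_write`, `dmabuf_cpu_read`) are hypothetical and not libcamera or GStreamer API. Whether SYNC_START/SYNC_END actually trigger cache invalidation or write-back is up to the exporter, so the comments describe what the exporter *may* do. Pairing a CPU producer with a CPU consumer shows how the fallback path described earlier ends up paying for two sets of sync calls on a buffer that never leaves the CPU.

```c
/* Sketch only: helper names are hypothetical; the ioctl, struct and flags
 * come from the Linux UAPI header <linux/dma-buf.h>. */
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Issue one sync call, restarting it if interrupted (EINTR/EAGAIN).
 * Returns 0 on success or a negative errno value. */
static int dmabuf_sync(int fd, uint64_t flags)
{
	struct dma_buf_sync sync = { .flags = flags };
	int ret;

	do {
		ret = ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);
	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

	return ret < 0 ? -errno : 0;
}

/* Producer side (e.g. a CPU debayer writing its output): SYNC_END gives
 * the exporter a chance to write back CPU caches before device access. */
static int dmabuf_cpu_write(int fd, void *map, const void *src, size_t len)
{
	int ret = dmabuf_sync(fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE);
	if (ret < 0)
		return ret; /* skip the copy if access could not be started */

	memcpy(map, src, len);

	return dmabuf_sync(fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE);
}

/* Consumer side (e.g. a "raw" mmap-based upload): SYNC_START gives the
 * exporter a chance to invalidate stale cache lines before the CPU reads. */
static int dmabuf_cpu_read(int fd, const void *map, void *dst, size_t len)
{
	int ret = dmabuf_sync(fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ);
	if (ret < 0)
		return ret;

	memcpy(dst, map, len);

	return dmabuf_sync(fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ);
}
```

The START/END flags must carry the same direction (READ, WRITE or RW) for a given access window. Keeping these calls at the application level, rather than hiding them in a generic mapping helper, lets the caller decide when a sync is really needed, which is the trade-off raised in the thread.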
--
Regards,
Laurent Pinchart