[PATCH] libcamera: debayer_cpu: Sync output buffer
Robert Mader
robert.mader at collabora.com
Tue Sep 3 15:39:29 CEST 2024
On 03.09.24 15:27, Laurent Pinchart wrote:
> On Tue, Sep 03, 2024 at 08:21:54AM +0100, Naushir Patuck wrote:
>> On Mon, 2 Sept 2024 at 21:32, Laurent Pinchart wrote:
>>> On Mon, Sep 02, 2024 at 12:56:31PM +0200, Robert Mader wrote:
>>>> On 01.09.24 13:39, Robert Mader wrote:
>>>>> On 01.09.24 13:07, Laurent Pinchart wrote:
>>>>>> Hans, would you be able to test this on an IPU6-based device, and check
>>>>>> the performance impact ? I don't expect expensive cache management
>>>>>> operations on an x86 device.
>>>>>>
>>>>>> Bryan, could you do the same with camss ?
>>>> Heads up that in my initial testing around different GStreamer pipelines
>>>> on arm64 I saw mixed results:
>>>>
>>>> 1. Cases involving successful dmabuf import to the GPU are (much) less
>>>> prone to glitches while not seeming to regress much in terms of frame
>>>> rates. This includes running Gnome-Snapshot or waylandsink on devices
>>>> like the Librem5, PinePhone or Pixel 3a (generally qcom).
>>>>
>>>> 2. Cases where Gst mmaps the buffers seem to get a noticeable
>>>> performance hit.
>>>>
>>>> Crucially this applies to common fallback paths, as in the following example:
>>>>
>>>> - glupload tries to import the buffer as dmabuf
>>>>
>>>> - fails due to stride requirements...
>>>>
>>>> - uses the "raw" importer that mmaps the buffer
>>>>
>>>> This case is almost tragic IMO. The buffer data ends up only getting
>>>> accessed by the CPU but we flush the caches/sync to the GPU *twice* -
>>>> just to upload a copy in the end.
>>>>
>>>> And while I see potential to improve this scenario in the other parts of
>>>> the stack, I don't see anything we can do about it in libcamera right now
>>>> (apart from not landing a patch like this).
>>> It's a bit late, but maybe there's a possibility to submit a lightning
>>> talk/BoF topic for LPC in two weeks ? Cache handling is a topic that
>>> crosses many subsystem boundaries, and I think we'll have quite a few
>>> people with relevant expertise in Vienna.
>> This is quite a complicated topic indeed. The RPi camera stack
>> switched to using cacheable dma bufs for performance reasons (> 10%
>> uplift in certain use cases) and we had to be very careful with how to
>> handle the DMA_BUF_IOCTL_SYNC calls at the application level.
>> However, I don't think handling this in MappedFrameBuffer is the right
>> thing for hardware based ISPs because of unexpected stale data
>> flushing/invalidation. I can expand on this during our F2F in Vienna.
> That seems a good discussion topic to me.
>
Unfortunately I won't be able to join that, but I'd be really interested
to hear ideas about what could be done to improve the situation. :/
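
For readers following along, the DMA_BUF_IOCTL_SYNC calls mentioned above
bracket CPU access to an mmap'ed dmabuf roughly as in the sketch below. This
is only an illustration of the kernel uAPI from <linux/dma-buf.h>, not the
code from the patch; the helper names (dmabuf_sync, dmabuf_begin_cpu_access,
dmabuf_end_cpu_access) are hypothetical, and the retry on EINTR/EAGAIN
follows what the kernel documentation describes for this ioctl.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/*
 * Hypothetical helper: issue one DMA_BUF_IOCTL_SYNC call on a dmabuf fd.
 * 'phase' is DMA_BUF_SYNC_START or DMA_BUF_SYNC_END, 'access' is
 * DMA_BUF_SYNC_READ, DMA_BUF_SYNC_WRITE or DMA_BUF_SYNC_RW.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int dmabuf_sync(int fd, uint64_t phase, uint64_t access)
{
	struct dma_buf_sync sync = { .flags = phase | access };
	int ret;

	assert(phase == DMA_BUF_SYNC_START || phase == DMA_BUF_SYNC_END);
	assert(access != 0 && (access & ~(uint64_t)DMA_BUF_SYNC_RW) == 0);

	/* The ioctl may be interrupted and has to be restarted in that case. */
	do {
		ret = ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);
	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

	return ret;
}

/* Call before the CPU touches the mapping. */
static int dmabuf_begin_cpu_access(int fd, uint64_t access)
{
	return dmabuf_sync(fd, DMA_BUF_SYNC_START, access);
}

/* Call after the CPU is done, with the same access flags as the begin call. */
static int dmabuf_end_cpu_access(int fd, uint64_t access)
{
	return dmabuf_sync(fd, DMA_BUF_SYNC_END, access);
}
```

A CPU writer such as the software debayer would call
dmabuf_begin_cpu_access(fd, DMA_BUF_SYNC_WRITE) before filling the output
buffer and dmabuf_end_cpu_access(fd, DMA_BUF_SYNC_WRITE) afterwards. On
cached mappings each of these can imply cache maintenance, which is the cost
being discussed for the mmap fallback path.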
--
Robert Mader
Consultant Software Developer
Collabora Ltd.
Platinum Building, St John's Innovation Park, Cambridge CB4 0DS, UK
Registered in England & Wales, no. 5513718