[libcamera-devel] Docs about buffers

Thu Nov 18 13:11:30 CET 2021

Hi Dorota,

Thanks for digging in here. I know we've tried to add as much
documentation as we can, but it comes from people who already work with
V4L2 so it's very easy for us to make lots of assumptions.

Fresh eyes are /really/ helpful.

I'm replying below to the things I think I can talk about. I may have
said the same thing as Jacopo in some cases ... but hopefully I can add
some value too.

Quoting Jacopo Mondi (2021-11-18 09:51:06)
> Hi Dorota
> 
> On Wed, Nov 17, 2021 at 04:46:54PM +0100, Dorota Czaplejewicz wrote:
> > Hi,
> >
> > thanks to Jacopo, I read the pipeline handler guide. Most of it was
> > great, confirming some of my reasoning about the code, but I got
> > stuck there in the same place that gave me the most trouble in code:
> > frame buffers.
> >
> > I got the impression that docs about frame buffers are written for
> > someone with a different background than I have. Perhaps it assumes
> > some knowledge about V4L2 internals, whereas I'm just a lowly
> > consumer of those APIs, and I haven't had to deal with media-device
> > complexity with my driver (no ISP of any kind). As a result, I was
> > unable to understand the vocabulary, and I didn't learn much from
> > the docs that I couldn't guess already. Without false humility, I
> > think that folks like me can be useful in writing pipeline handlers,
> > so I wanted to give my perspective on what gave me trouble about
> > frame buffers.
> 
> Yes indeed there is some assumed knowledge about the underlying
> mechanism. Not that much about anything related to MC but rather about
> a few key memory handling concepts, only some of them V4L2-specific.
> 
> I can't explain all the details as I would probably get something
> wrong and confuse you more, but I'll leave you some pointers to
> hopefully clarify your questions and get to fix the documentation
> where it's not good enough.
> 
> >
> > When I look at some code, I ask myself the crucial questions of
> > "what is the intention?", "how does this work?", "why this way?",
> > and I will focus on that.
> >
> > I had some smaller difficulties with the FrameBuffer class itself.
> > https://libcamera.org/api-html/classlibcamera_1_1FrameBuffer.html
> >
> > While the description does quite a good job at explaining the
> > intention and its contents, it's leaving parts of the other
> > questions unanswered. For example, where is the actual data stored?
> > Some other place mentions that the buffer is a DMABUF buffer, which
> > presumably means that the data is referenced using Plane::fd. Since
> > buffers are usually just plain in-memory arrays, this could also use
> > a "why?" explanation (to allow zero-copy between x and y?). I know
> > that there's a sentence about that:
> >
> > > The planes are specified when creating the FrameBuffer and are
> > > expressed as a set of dmabuf file descriptors, offset and length.
> >
> > but I think the language is overly complicated, when it could say
> > sth like "The frame buffer data is stored in the dmabuf file
> > descriptors for each Plane object" to easily answer "where is the
> > data?". (I'm saying that cause I missed it myself while tired after
> > reading the guide.)
> 
> The thing is that, in my understanding at least, saying "the data is
> stored in the dmabuf" is technically wrong. dmabuf is a cross-device
> memory sharing and representation mechanism.
> 
> I'm afraid the "where is the data" very much depends on where memory
> is allocated and how the application makes use of the API.
> 
> Generally speaking the memory (but I'm sure this is partial) for the
> frame buffer can be allocated
> - By another device (DRM/KMS for example) as a dmabuf descriptor and
>   passed to libcamera wrapped in a FrameBuffer
> - In the camera video device itself and then exported to other
>   consumers (including the Camera itself, see the FrameBufferAllocator
>   class)

The memory behind a dma-buf is indeed some memory stored in RAM
somewhere.  But as Jacopo says, we may not know where. And we may not
even have access to it from the CPU at all.

The FD is simply a handle that says 'This is a reference to a chunk of
memory that contains what you want, you can pass it to someone else, or
if you know what's in it, and how to read it - you can try to mmap it'
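
To make the 'you can try to mmap it' part concrete, here is a rough
sketch. PlaneDesc, Mapping, mapPlane() and unmapPlane() are made-up
names for illustration, not libcamera API, and the sketch assumes the
fd is one the CPU is actually allowed to map, which is not guaranteed
for every dma-buf. The only real subtlety is that mmap() wants a
page-aligned offset, so you map from the start of the page and step
forward.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical description of one chunk of memory: which fd it is in,
// where it starts inside that fd, and how long it is.
struct PlaneDesc {
	int fd;
	std::size_t offset;
	std::size_t length;
};

// A CPU mapping. 'base' and 'mapLength' are what munmap() needs, 'data'
// points at the first byte described by the PlaneDesc.
struct Mapping {
	void *base = MAP_FAILED;
	std::size_t mapLength = 0;
	std::uint8_t *data = nullptr;
};

// Map the described memory read-only. mmap() offsets must be page
// aligned, so map from the containing page and add the remainder.
Mapping mapPlane(const PlaneDesc &plane)
{
	assert(plane.length > 0);

	Mapping m;
	const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
	const std::size_t aligned = plane.offset - (plane.offset % page);
	const std::size_t delta = plane.offset - aligned;

	m.mapLength = plane.length + delta;
	m.base = mmap(nullptr, m.mapLength, PROT_READ, MAP_SHARED, plane.fd,
		      static_cast<off_t>(aligned));
	if (m.base == MAP_FAILED) {
		// Not every fd can be mapped by the CPU.
		std::perror("mmap");
		return m;
	}

	m.data = static_cast<std::uint8_t *>(m.base) + delta;
	return m;
}

// Drop the CPU mapping. The memory itself stays alive as long as
// somebody still holds an fd for it.
void unmapPlane(Mapping &m)
{
	if (m.base != MAP_FAILED)
		munmap(m.base, m.mapLength);
	m = Mapping{};
}
```

Passing the fd to another device instead of mapping it needs none of
this, which is the zero-copy case.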

And yes, indeed the purpose of using dma-buf is to allow zero-copy
sharing.

In the context of a FrameBuffer - we might need 1, 2, or even 3 separate
'pieces of memory' (each represented by an FD/dma-buf handle) to store a
full image. Which is why they are referenced as Planes. Each plane must
store the fd that it is in, and the length, but also the offset, because
there can also be more than one plane stored in a single contiguous
piece of memory, which is referenced by a single dma-buf/fd.
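
As an illustration of why the offset matters, here is a sketch for an
NV12 image (a Y plane followed by an interleaved CbCr plane at half the
size). The PlaneDesc struct and both helper functions are hypothetical
stand-ins, not the libcamera types, and the sizes assume even
dimensions with no line padding.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a plane description: fd, offset, length.
struct PlaneDesc {
	int fd;
	std::size_t offset;
	std::size_t length;
};

// Both planes in one contiguous piece of memory: same fd, and the second
// plane starts where the first one ends.
std::vector<PlaneDesc> nv12SingleBuffer(int fd, std::size_t width,
					std::size_t height)
{
	assert(width % 2 == 0 && height % 2 == 0);

	const std::size_t ySize = width * height;
	const std::size_t uvSize = width * height / 2;

	return {
		{ fd, 0, ySize },
		{ fd, ySize, uvSize },
	};
}

// The same image with each plane in its own piece of memory: two fds,
// both offsets are zero.
std::vector<PlaneDesc> nv12TwoBuffers(int yFd, int uvFd, std::size_t width,
				      std::size_t height)
{
	assert(width % 2 == 0 && height % 2 == 0);

	return {
		{ yFd, 0, width * height },
		{ uvFd, 0, width * height / 2 },
	};
}
```

Either layout describes the same picture. A consumer that only looks at
(fd, offset, length) per plane does not need to care which one it got.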

Even with all that from me, and Jacopo, I am struggling to see if that
actually helps answer your question above or clarify it. I hope so - but
certainly keep pushing for more detail if you need it.

> > The description to access the data is placed in the docs for Plane,
> > which makes it slightly harder to discover. Perhaps it would be
> > beneficial to place all the docs regarding Plane and FrameBuffer in
> > one place, since the two classes are basically two aspects of one
> > thing. Having one description would make writing easier: no more
> > questions "should I mention memory mapping here or reference the
> > docs for Plane instead?".
> 
> That could be indeed considered, care to send a patch ?

One thing I'd really like to see sometime is more documentation /
introductions of higher levels from the front of the API documentation.
We don't even have a start page there - so it is certainly difficult to
get going.

There's definitely scope for a whole 'page'/'section'? discussing
memory/framebuffer concepts. But even then we need to make sure those
pages are easy to find as well, as you have mentioned.

> > A related thing that I think should get more visibility is to stress
> > that the buffers don't necessarily live on the CPU. (Where do they
> > live? Is FrameBuffer suitable to represent data flowing from the
> > sensor to the MIPI controller? Can it live in any memory area
> > accessible via DMA?)
> 
> I'm afraid I'm not fully getting this.
> 
> With 'living on the CPU' do you mean they have a CPU-accessible
> address that can be accessed by userspace and that gets obtained by
> mmap ? Because that's again a representation and where the memory area
> is allocated depends on the usage of the API.
> 
> When it comes to the second part, I'm afraid there's no such thing as
> 'data flowing from the sensor to the MIPI controller'. I mean, there of
> course is a stream of serial data flowing on the bus, but there's no
> representation of those 'live' data which can be accessed as those
> data are not 'memory' yet.

Buffers certainly don't "live" on a CPU. They are stored in Memory. That
memory can be CPU accessible, or it might be only accessible by a
dma-controller or specific peripheral.

There's also a different type of 'buffer' which is *not* stored in
memory. That is the case with inline ISPs. (Yes, the ISP likely has some
internal buffering in that case) ...

In this instance - the host CPU isn't even aware of the presence of a
'buffer' or memory as the pixel data could be fed directly into the next
hardware block - without being written to memory ('DDR').

This would be the case for any image data between the sensor and the
CSI2 receiver. All of that happens 'inline' and flows into the CSI2
receiver.

However the CSI2 receiver can then do one of two things ... it could ...
  - flow those pixels 'immediately' into an ISP (inline ISP)
  - Write them into a buffer, and make the whole buffer available to the
    next component.

So a FrameBuffer is not suitable or possible to represent pixel data or
frames between the Sensor and CSI2 receiver (MIPI controller), or any
interactions between a CSI2 receiver and an inline ISP.

A FrameBuffer is only able to describe frames which get stored in
memory, and can be passed between components.

> It's rather the CSI-2 receiver which, depending on the SoC
> architecture, buffers those data and transfers them in chunks in some
> DMA accessible memory areas which are later accessible to application
> as a memory mapped cpu-address or a dmabuf descriptor.
> 
> The above "DMA accessible memory area" is where the buffers are
> actually allocated and again it really depends on what the application
> does.
> 
> >
> >
> > Another problem is the FrameBuffer::cancel() method. It says:
> >
> > > Marks the buffer as cancelled.  If a buffer is not used by a
> > > request, it shall be marked as cancelled to indicate that the
> > > metadata is invalid.
> >
> > I'm missing the intention/purpose of this. Why do I want to indicate
> > the metadata is invalid? What can bring metadata to invalidity? I
> > can imagine how it works (some flag), but I also don't know why it
> > changes based on request. What if the buffer *is* used on request?
> > Will the call get silently ignored?
> 
> I guess the Metadata being cancelled implies the whole frame is
> cancelled. I think we wanted to expose that through the Metadata, but
> maybe it's confusing.

'FrameBuffers' can be queued to hardware, and there may be for instance
4 FrameBuffers waiting to be used by an ISP to write processed image
data into.

If the pipeline is stopped, those buffers that are not yet used do not
contain any data (or perhaps only half a buffer if one was stopped while
processing?).

In this event, the buffers are returned from the kernel drivers, and
they are not 'valid' - they have been 'cancelled'.

The cancel method is there to indicate that on the object itself, so
that when an application gets the FrameBuffer back - it can know if the
buffer contains a usable picture, and metadata (such as timestamp) which
is valid - or - if it would be meaningless to read from any of the
fields.
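
From the application side that boils down to 'check the status before
trusting anything else'. A rough sketch, with simplified stand-in types
that are only loosely modelled on the real ones (Status, Metadata,
Buffer and usableTimestamp() are made up for this example):

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-ins, not the libcamera definitions.
enum class Status { Success, Error, Cancelled };

struct Metadata {
	Status status = Status::Success;
	std::uint64_t timestamp = 0;
	unsigned int sequence = 0;
};

struct Buffer {
	Metadata metadata;

	// What happens to buffers that come back from the driver without
	// having been filled, for instance when the pipeline is stopped.
	void cancel() { metadata.status = Status::Cancelled; }
};

// Only hand out the timestamp when the buffer really completed. For a
// cancelled buffer the other metadata fields are meaningless.
bool usableTimestamp(const Buffer &buffer, std::uint64_t *out)
{
	assert(out != nullptr);

	if (buffer.metadata.status != Status::Success)
		return false;

	*out = buffer.metadata.timestamp;
	return true;
}
```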

> > It's also not clear if the FrameBuffer is focused on the memory
> > allocation (mutable place to put data, reusable), or on the data
> > itself (no need to be mutable, used once). I suspect the former, but
> > it could be spelled out.
> 
> FrameBuffer is just a representation of a memory area in the form of a
> set of dmabuf descriptors with an associated length.

Yes, a FrameBuffer is about the specific piece of memory allocation, and
can be reused for storing/passing multiple images.

But it's important to remember that the FrameBuffer object is just a
'description' of that memory - it is not the allocation itself. So there
is nothing preventing us (except inefficiencies) from having multiple
FrameBuffers that might 'point' to the exact same image. Or they could
be created, and destroyed - without affecting the data beneath (as long
as someone else, like the application or allocator also keeps a
reference to the dma-buf handles).
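
The 'description versus allocation' split is the same thing you get
with plain file descriptors, so here is a sketch using nothing but
dup() and close(). Description, duplicate() and destroy() are invented
names, and an ordinary fd stands in for a dma-buf handle, which is an
assumption made to keep the example self-contained.

```cpp
#include <cassert>
#include <cstdio>
#include <unistd.h>

// Hypothetical description: it refers to memory through an fd and owns
// nothing but that handle.
struct Description {
	int fd;
};

// Make a second, independent description of the same underlying memory.
// dup() returns a new fd that refers to the same open file description.
Description duplicate(const Description &d)
{
	assert(d.fd >= 0);

	int fd = dup(d.fd);
	if (fd < 0)
		std::perror("dup");

	return Description{ fd };
}

// Destroying a description closes only its own handle. The memory stays
// around for as long as anybody else still holds a reference to it.
void destroy(Description &d)
{
	if (d.fd >= 0)
		close(d.fd);
	d.fd = -1;
}
```

Create one description, duplicate it, destroy the first: the data is
still readable through the second.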

So ... a FrameBuffer is 'our' description of a memory buffer that
represents a Frame. But ... a Frame is not always some 'pixel data' or
an image. It could also be a parameter buffer, or metadata or statistics
from the ISP. Those buffers are also passed around using the same
'FrameBuffer' type.

> > * * *
> >
> > Now, on to the parts that I can't get through: allocateBuffers,
> > importBuffers, exportBuffers, and releaseBuffers.
> >
> > I know what "allocate" means, but when I try to read the
> > description, I'm immediately confused:
> >
> > > This function wraps buffer allocation with the V4L2 MMAP memory
> > > type.
> >
> > I'm not sure what "V4L2 memory types" are, somehow I managed to
> > avoid coming across them. The next sentence is better:
> 
> I'm afraid there's no other suggestion I can give than reading about
> REQBUFS and the memory mapped, user pointer or DMABUF based I/O
> methods described there.
> 
> https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/vidioc-reqbufs.html
> 
> The class you're looking at models a V4L2 device so it is tightly coupled
> with the V4L2 infrastructure and unfortunately requires some
> background knowledge about the V4L2 internals.
> 
> When it comes to documentation, there's no point in us trying to
> explain again here what is already documented in the proper place.
> Also adding a link would not be enough imho, as we would need plenty
> of them :)

Maybe we need a glossary of terms with related links to external
documentation somewhere?

> > > It requests count buffers from the driver, allocating the
> > > corresponding memory, and exports them as a set of FrameBuffer
> > > objects in buffers.
> >
> > I'm not really sure about where the allocations take place, but I'm
> > reading that as "this allocates `count` buffers on the device, and
> > makes them accessible to the CPU (to DMA?) via FrameBuffer objects.
> > The FrameBuffer objects are appended to the vector `buffers`".
> >
> > Next is "export", which is like "allocate", but:
> >
> > > Unlike allocateBuffers(), this function leaves the driver's
> > > internal buffer management uninitialized.
> >
> > Together, the purpose/intention is missing: when should one be
> > chosen and when the other? Why do driver internals matter? Maybe
> > it's worth to link some external resource here.
> 
> A few requisites:
> - V4L2 buffer orphaning (in the REQBUFS documentation)
> - V4L2 expbuf IOCTL
>   https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/vidioc-expbuf.html
> - Laurent's cover letter for the patch series that introduced the use
>   of orphaned buffers
>   https://lists.libcamera.org/pipermail/libcamera-devel/2020-March/007224.html
> 
> For your above question about when to use one or the other, for each
> function in the documentation pasted below there is a paragraph that
> explains the intended use case. How can those paragraphs be clarified
> in your opinion ?
> 
> 
>  * - The allocateBuffers() function wraps buffer allocation with the V4L2
>  *   MMAP memory type. It requests buffers from the driver, allocating the
>  *   corresponding memory, and exports them as a set of FrameBuffer
>  *   objects. Upon successful return the driver's internal buffer
>  *   management is initialized in MMAP mode, and the video device is ready
>  *   to accept queueBuffer() calls.
>  *
>  *   This is the most traditional V4L2 buffer management, and is mostly
>  *   useful to support internal buffer pools in pipeline handlers, either
>  *   for CPU consumption (such as statistics or parameters pools), or for
>  *   internal image buffers shared between devices.
> 
>  [for example this last sentence could be changed imho, as for internal
>  buffer pools allocating is not enough: one side would need to
>  export (== allocate, export and orphan) + import (== set the queue in
>  dmabuf mode) and the other side to import]
> 
>  *
>  * - The exportBuffers() function operates similarly to allocateBuffers(),
>  *   but leaves the driver's internal buffer management uninitialized. It
>  *   uses the V4L2 buffer orphaning support to allocate buffers with the
>  *   MMAP method, export them as a set of FrameBuffer objects, and reset
>  *   the driver's internal buffer management. The video device shall be
>  *   initialized with importBuffers() or allocateBuffers() before it can
>  *   accept queueBuffer() calls. The exported buffers are directly usable
>  *   with any V4L2 video device in DMABUF mode, or with other dmabuf
>  *   importers.
>  *
>  *   This method is mostly useful to implement buffer allocation helpers or
>  *   to allocate ancillary buffers, when a V4L2 video device is used in
>  *   DMABUF mode but no other source of buffers is available. An example
>  *   use case would be allocation of scratch buffers to be used in case of
>  *   buffer underruns on a video device that is otherwise supplied with
>  *   external buffers.
>  *
>  * - The importBuffers() function initializes the driver's buffer
>  *   management to import buffers in DMABUF mode. It requests buffers from
>  *   the driver, but doesn't allocate memory. Upon successful return, the
>  *   video device is ready to accept queueBuffer() calls. The buffers to be
>  *   imported are provided to queueBuffer(), and may be supplied
>  *   externally, or come from a previous exportBuffers() call.
>  *
>  *   This is the usual buffers initialization method for video devices
>  *   whose buffers are exposed outside of libcamera. It is also typically
>  *   used on one of the two video device that participate in buffer sharing
>  *   inside pipelines, the other video device typically using
>  *   allocateBuffers().
>  *
>  * - The releaseBuffers() function resets the driver's internal buffer
>  *   management that was initialized by a previous call to
>  *   allocateBuffers() or importBuffers(). Any memory allocated by
>  *   allocateBuffers() is freed. Buffer exported by exportBuffers() are not
>  *   affected by this function.
> 
> >
> > About "import", I would have guessed that it works in reverse to
> > export, that is: takes buffers on the CPU, and makes them accessible
> > to the device. Then it would take a collection of buffers, but
> > instead it takes... a number? I'm unable to make any sense out of
> > this alone, not the intention, not how it works, not when to use it.
> 
> Hopefully reading how REQBUFS in V4L2_DMABUF mode works will help
> 
> Understanding also how the FrameBufferAllocator works might help
> understanding the 'allocate-export-import' pattern

I think the part that is hard to grasp is also the fact that there is a
'buffer' object for V4L2, as well as a dma-buf handle, and our
FrameBuffer object.

When a buffer is allocated by V4L2 - the V4L2 'buffer object' is
allocated, and we get the information we need to use the memory (the fd
from which the FrameBuffer is constructed).

But after that, the V4L2 buffer object doesn't have to be used for the
same 'memory'.

We try to make sure that regardless of whether the buffers are allocated
on the V4L2 device, or if they come from external allocators, the importing
is the same. But as the V4L2 device isn't actually allocating anything
at the 'import' time - it only needs to know how many 'V4L2 Buffer'
instances it needs.
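
One way to picture it, as a toy model only: the 'V4L2 Buffer' instances
are numbered slots, and which memory a slot carries is decided each
time something is queued. SlotTable below is a made-up class for
illustration, not how the V4L2VideoDevice is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of the driver-side 'V4L2 buffer' slots set up at import time.
// A slot is just an index. It holds no memory of its own.
class SlotTable
{
public:
	// 'Importing' only needs to know how many slots to prepare.
	explicit SlotTable(std::size_t count) : slots_(count, kFree) {}

	// Queue the memory behind 'fd': bind it to the first free slot.
	// Returns the slot index, or -1 if every slot is in flight.
	int queue(int fd)
	{
		for (std::size_t i = 0; i < slots_.size(); ++i) {
			if (slots_[i] == kFree) {
				slots_[i] = fd;
				return static_cast<int>(i);
			}
		}

		return -1;
	}

	// The device is done with this slot. It can carry different
	// memory the next time around.
	void dequeue(int index)
	{
		assert(index >= 0 &&
		       static_cast<std::size_t>(index) < slots_.size());
		slots_[index] = kFree;
	}

private:
	enum : int { kFree = -1 };

	std::vector<int> slots_;
};
```

With two slots, a third queue() fails until something is dequeued, and
the slot that comes free is then happy to take a completely different
fd.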

I've recently learned that there's a maximum of 32, and that these are
really cheap - so I'm thinking about sending a patch to remove the need
for 'importing' altogether. I think the V4L2 device should just prepare
itself for handling up to '32' buffers.

That's a bit of a digression, I hope it doesn't complicate the
discussion.

> > "release" is also somewhat confusing, in that it's not a dual to
> > "allocate", in that it seems to release ~all buffers associated with
> > the device. What happens if those buffers are used anyway?
> 
> Again from the V4L2 REQBUFS ioctl documentation
> 
> Applications can call ioctl VIDIOC_REQBUFS again to change the number
> of buffers. Note that if any buffers are still mapped or exported via
> DMABUF, then ioctl VIDIOC_REQBUFS can only succeed if the
> V4L2_BUF_CAP_SUPPORTS_ORPHANED_BUFS capability is set. Otherwise ioctl
> VIDIOC_REQBUFS will return the EBUSY error code. If
> V4L2_BUF_CAP_SUPPORTS_ORPHANED_BUFS is set, then these buffers are
> orphaned and will be freed when they are unmapped or when the exported
> DMABUF fds are closed. A count value of zero frees or orphans all
> buffers, after aborting or finishing any DMA in progress, an implicit
> VIDIOC_STREAMOFF.
> 
> >
> >
> > I haven't found a more general guide to buffers: when to allocate
> > them, when to reuse them, how to pass them between devices, who owns
> > FrameBuffer objects, etc. I think having something like this would
> > make first encounters go smoothly.
> 
> I'm afraid we've tried but have not been able to completely separate
> the V4L2 specifics implemented by the V4L2VideoDevice class from a
> more general description of FrameBuffer.
> 
> If you're not interested in dealing with the V4L2 internals you
> shouldn't be required to understand what happens in the video device,
> but you should indeed be allowed to easily use the FrameBuffer class.
> 
> >
> > Overall, I don't want to come out as a complainer :) So in exchange
> > for an explanation over email, I offer including what I learn in the
> > docs.
> 
> Absolutely, that's helpful feedback and I hope the references here
> help your understanding of the background and lead to a better
> documentation of the FrameBuffer class.

Seconded. This isn't complaining at all. It's valuable review and input
from my perspective.

Kieran