<div dir="ltr"><div dir="ltr">Hi all,<div><br></div><div>Few thoughts for me below.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 16 Dec 2022 at 09:05, Jacopo Mondi via libcamera-devel <<a href="mailto:libcamera-devel@lists.libcamera.org">libcamera-devel@lists.libcamera.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Umang<br>

   thanks for starting the thread.<br>

<br>

On Fri, Dec 16, 2022 at 11:30:39AM +0530, Umang Jain via libcamera-devel wrote:<br>

> Hi everyone,<br>

><br>

> Goal:<br>

> I am kicking-off a thread to discuss the Request queuing behaviour that can<br>

> be expected by libcamera  from application's side. In the recent<br>

> discussions, we have found that we don't yet advertise or document this<br>

> aspect explicitly. Hence, it would be helpful to do so, especially it helps<br>

> with internal plumbing of requests routing through pipeline-handlers(s), IPA<br>

> and various requests holding placeholders for state tracking (queues,<br>

> arrays, FCQueue etc.)  or even post-processing.<br>

><br>

> Assumptions up till this point:<br>

> Atleast what _I_ have assumed up until this point is that applications queue<br>

> a certain number of requests (let's say 4) and then continuing queuing the<br>

> request from the `requestCompleted` callback handler (slot). I see this<br>

> behaviour implemented in our `libcamera/src/apps`. This assumption<br>

> inherently suggests that these applications, are not trying to over-queue to<br>

> the hardware hence, respecting a certain depth of the hardware pipeline.<br>

> They makes sure new requests get queued when previous ones gets completed.<br>

> Other argument can be that, reuse of Request objects are cheaper than<br>

> constructing a new one for each queueRequest() and queuing all at once.<br>

><br>

> Defining the explicit behaviour by applications:<br>

> Note, in the last segment - "Assumption". We do not explicitly say that the<br>

> applications should queue the requests in a certain manner.<br>

><br>

<br>

I agree this is the canonical way, but it's certainly not the only one<br>

<br>

> Hence, it is time to defining our expectations on what applications are<br>

> allowed or not allowed to do, with respect of queuing requests. Below are a<br>

> few documentation headers to move the discussion forwards.<br>

><br>

> - Exposing the hardware capabilities<br>

> Should libcamera expose the underlying hardware capabilities/constraints to<br>

> applications and expect them to rate-limit the queuing of requests based?<br>

><br></blockquote><div><br></div><div>An application ought to know to submit a minimum number of requests in order to<br>get the underlying hardware to generate an output stream.  However, I don't feel<br>that an application should constrain itself on the maximum number of requests.<br>This ought to be done in the framework, just like your patch has done.  This<br>sounds like what you describe in your "No limit" section (*) below?</div><div><br></div><div>Additionally, my example of the application queueing 20 in-flight requests and<br>then just waiting for completion without recycling any further requests may<br>sound a bit contrived, but it does highlight that an application is free to<br>do it, and chances are one of them will!  If we want to make restrictions here,<br>it must be baked into the API.<br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

Let's start by clearly enumerating these, to see how we handle them<br>

today.<br>

<br>

I would start by listing the requirements of the current libcamera<br>

backend (V4L2) and see how these have surfaced to the current API and<br>

what we might want to change.<br>

<br>

<br>

1) Startup:<br>

<br>

   V4L2 video devices have a requirement of having a min number of buffers<br>

   queued before streaming is started. The requirement comes from the<br>

   need of having enough memory buffer available to sustain memory<br>

   transfer and data capture operations.<br>

<br>

   The number of min buffers queued is a platform-specific parameter<br>

<br>

   The only documentation I found is in the vb2 framework documentation<br>

   <a href="https://www.kernel.org/doc/html/latest/driver-api/media/v4l2-videobuf2.html?highlight=min_buffers_needed" rel="noreferrer" target="_blank">https://www.kernel.org/doc/html/latest/driver-api/media/v4l2-videobuf2.html?highlight=min_buffers_needed</a><br>

<br>

   libcamera currently queues buffers to the capture video devices<br>

   at queueRequest time only, hence applications need to queue enough<br>

   requests before streaming is actually started. The Camera state<br>

   machine demands that Camera::start() is called before any<br>

   Camera::queueRequest() can be called, but if applications do not<br>

   queue enough requests, streaming will never be started.<br>

<br>

   We currently do not have a clear way to express that, if not by<br>

   validating StreamConfiguration::bufferCount which is set by<br>

   pipelines to a known valid values for the platform, but nothing in<br>

   our API tells applications "you have to queue enough requests to<br>

   have the camera actually started" as far as I can see ?<br>

<br>

2) Runtime<br>

<br>

   Following the principle of "1 request - 1 buffer", which I guess we<br>

   aim to change in future, you can't queue more requests to a camera<br>

   than the number of buffers requested on a video device.<br>

<br>

   The number of requests buffers is again usually defined by<br>

   StreamConfiguration::buffersCount and it usually also dictates the<br>

   number of intermediate buffers the pipeline handler allocates<br>

   (think of the CIO2 buffers, or the ISP parameter buffers as<br>

   examples).<br></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>

   Please note this is an issue for application that allocates buffers<br>

   from elsewhere (ie a dmabuf-heap pool or on the display) and import<br>

   them in libcamera. If applications allocates buffers in the Camera<br>

   they will be limited to work with those buffers, and by definition,<br>

   they can't use more than what they have allocated.<br>

<br>

   It's interesting to notice how some pipeline had to take of that by<br>

   themselves<br>

<br>

   <a href="https://git.libcamera.org/libcamera/libcamera.git/tree/src/libcamera/pipeline/ipu3/ipu3.cpp#n853" rel="noreferrer" target="_blank">https://git.libcamera.org/libcamera/libcamera.git/tree/src/libcamera/pipeline/ipu3/ipu3.cpp#n853</a><br>

   89dae5844964 ("libcamera: pipeline: ipu3: Store requests in the case a buffer shortage")<br>

<br>

   Your work on throttling the number of requests in the base<br>

   PipelineHandler class according to the depth of the frame contexts<br>

   queue could be re-considered in this new context: up to bufferCount<br>

   requests can be queued -to the device- at the same time and the<br>

   PipelineHandler base class caches them on behalf of the pipeline<br>

   handlers. This will make StreamConfiguration::bufferCount a central<br>

   part of our API, and in that case the implications of chosing a<br>

   non-accurate bufferCount should be clearly documented.<br></blockquote><div><br></div>I've always been a bit unsure of the intended use of StreamConfiguration::buffersCount!<br>We set a value for each StreamRole based on what we think might be a "reasonable"<br>number of buffers for a particular use case.  The application is free to use this<br>number, or override with how many buffers it actually wants.<br><br>You could equally take the view that buffersCount gives the absolute minimum<br>number of buffers/requests in-flight for a stream to produce output.  But then<br>your stream will generate output, but might compromise on performance.  In the<br>RPi platform, StreamConfiguration::buffersCount == 1 in this case, but I would<br>never want an application to actually only ever use 1 request as we essentially<br>will run at half our intended rate.<br><br>Additionally, in the RPi platform, this number was also set/updated by the<br>application to provide guidance on the number of internal buffer allocations to<br>use for a given stream.  But that's been removed now, particularly with my<br>pending series at <a href="https://patchwork.libcamera.org/project/libcamera/list/?series=3663">https://patchwork.libcamera.org/project/libcamera/list/?series=3663</a><br><br>If buffersCount is used going forward, we probably need a stricter definition.<br>Also should mention, this field is marked to be removed with:<br><div><a href="https://patchwork.libcamera.org/patch/13470/">https://patchwork.libcamera.org/patch/13470/</a></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>

   We can also say we expect applications to deal with that by<br>

   themselves, but that's not realistic at the moment, more over in<br>

   the perspective that libcamera will increasingly be used by other<br>

   frameworks and adaptation layers, and all of them should<br>

   re-implement the same safety mechanism.<br>

<br>

3) Stop<br>

<br>

   That's where we offer gurantees: requests will all be completed,<br>

   and will be completed in order. We don't expect applications to do<br>

   much here.<br>

<br>

Do you agree with my understaning on these parts ? Are there other<br>

aspects I have not considered ?<br>

<br>

All in all, I feel like we have one big topic here: we had the V4L2<br>

buffer handling requirments surfaced to our API, and it has surfaced<br>

not much in the API signature, but rather in how the library behaves<br>

with too many implicit behaviour and an assumption of a certain<br>

understanding of V4L2 from application developers.<br>

<br>

<br>

> - No limit<br>

> Advertise explicitly that there are no constraints in queuing Request to<br>

> libcamera *whatsoever*, from the applications side?<br>

><br>

> - Number of requests to be queued by the application, to get X number of<br>

> frames<br>

><br>

>         X   =    1 Frame<br>

>         X <=    frames less than processing blocks on a platform<br>

>         X   >    frames less than processing blocks on a platform<br>

<br>

I'm not sure I got this part :)<br></blockquote><div><br></div><div>* I think this "No limit" case matches my understanding of application constraints</div><div>described above?<br></div><div><br></div><div>Regards,</div><div>Naush</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

><br>

> Depending on the request queuing behaviour libcamera expects, I see that<br>

> documenting the aforementioned might be useful for applications developers<br>

> as well (maybe this goes in per-platform information?). lc-compliance can<br>

> incorporate checks for these as well.<br>

><br>

> These are just on top off my head, new discussions points are welcomed as<br>

> well.<br>

<br>

Let's start by clarifying how things work today and what implicit<br>

requirements we have today and idealy make a plan if our API should<br>

stabilize on the '1 request - 1 buffer' model.<br>

<br>

Thanks<br>

  j<br>

<br>

><br>

> Thanks,<br>

> Umang<br>

</blockquote></div></div>