Hi everyone,

On Mon, 26 Apr 2021 at 03:40, Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote:
> Hi Nícolas,
>
> Thank you for the patch.
>
> On Wed, Apr 21, 2021 at 01:51:36PM -0300, Nícolas F. R. A. Prado wrote:
> > The QueueDepth property reports the minimum amount of requests needed in
> > the camera pipeline.
> >
> > Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
> > ---
> >  src/libcamera/pipeline/ipu3/ipu3.cpp               | 4 ++++
> >  src/libcamera/pipeline/raspberrypi/raspberrypi.cpp | 4 ++++
> >  src/libcamera/pipeline/rkisp1/rkisp1.cpp           | 2 ++
> >  src/libcamera/pipeline/rkisp1/rkisp1_path.h        | 4 ++--
> >  src/libcamera/pipeline/simple/simple.cpp           | 6 ++++--
> >  src/libcamera/pipeline/uvcvideo/uvcvideo.cpp       | 2 ++
> >  src/libcamera/pipeline/vimc/vimc.cpp               | 3 +++
> >  src/libcamera/property_ids.yaml                    | 5 +++++
> >  8 files changed, 26 insertions(+), 4 deletions(-)
> >
> > diff --git a/src/libcamera/pipeline/ipu3/ipu3.cpp b/src/libcamera/pipeline/ipu3/ipu3.cpp
> > index 51446fcf5bc1..6067db2f37a3 100644
> > --- a/src/libcamera/pipeline/ipu3/ipu3.cpp
> > +++ b/src/libcamera/pipeline/ipu3/ipu3.cpp
> > @@ -1049,6 +1049,10 @@ int PipelineHandlerIPU3::registerCameras()
> >  		/* Initialize the camera properties. */
> >  		data->properties_ = cio2->sensor()->properties();
> >
> > +		/* TODO This can be changed to CIO2 after configuration, but
> > +		 * both are 4 currently */
>
> 	/*
> 	 * \todo This can be changed to CIO2 after configuration, but
> 	 * both are 4 currently
> 	 */
>
> The \todo is doxygen style and is used throughout the code base. The
> placement of /* */ is our standard coding style. Same below.
>
> > +
> > +
> > +		data->properties_.set(properties::QueueDepth, 4);
>
> I think you can already lower the value, to 1 or 2 depending on the
> definition of the property (see below).
>
> > +
> >  		ret = initControls(data.get());
> >  		if (ret)
> >  			continue;
> > diff --git a/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp b/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> > index 2a917455500f..8d1ade3a4352 100644
> > --- a/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> > +++ b/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> > @@ -1035,6 +1035,10 @@ bool PipelineHandlerRPi::match(DeviceEnumerator *enumerator)
> >  	/* Initialize the camera properties. */
> >  	data->properties_ = data->sensor_->properties();
> >
> > +	/* TODO Can be 1, 2 or 4 depending on configuration, for now use the max
> > +	 * which is 4 */
> > +	data->properties_.set(properties::QueueDepth, 4);
>
> Same here, Naush commented that this should be 1, but depending on how
> we define the control, 2 could be a better value.
>
> > +
> >  	/*
> >  	 * Set a default value for the ScalerCropMaximum property to show
> >  	 * that we support its use, however, initialise it to zero because
> > diff --git a/src/libcamera/pipeline/rkisp1/rkisp1.cpp b/src/libcamera/pipeline/rkisp1/rkisp1.cpp
> > index 549f4a4e61a8..7d876e9387d7 100644
> > --- a/src/libcamera/pipeline/rkisp1/rkisp1.cpp
> > +++ b/src/libcamera/pipeline/rkisp1/rkisp1.cpp
> > @@ -21,6 +21,7 @@
> >  #include <libcamera/ipa/core_ipa_interface.h>
> >  #include <libcamera/ipa/rkisp1_ipa_interface.h>
> >  #include <libcamera/ipa/rkisp1_ipa_proxy.h>
> > +#include <libcamera/property_ids.h>
> >  #include <libcamera/request.h>
> >  #include <libcamera/stream.h>
> >
> > @@ -940,6 +941,7 @@ int PipelineHandlerRkISP1::createCamera(MediaEntity *sensor)
> >
> >  	/* Initialize the camera properties. */
> >  	data->properties_ = data->sensor_->properties();
> > +	data->properties_.set(properties::QueueDepth, RKISP1_BUFFER_COUNT);
>
> Here 3 would be a better value, to match the driver. However, I wonder
> if the hardware really needs 3 buffers. Dafna, Helen, do you have
> information about this ? A quick glance at the driver makes me think we
> could lower the number of buffers to 2. Lowering it to 1 may be nice,
> but probably more tricky.
>
> >
> >  	/*
> >  	 * \todo Read dealy values from the sensor itself or from a
> > diff --git a/src/libcamera/pipeline/rkisp1/rkisp1_path.h b/src/libcamera/pipeline/rkisp1/rkisp1_path.h
> > index 3b3e37d258d0..7540dd41ad84 100644
> > --- a/src/libcamera/pipeline/rkisp1/rkisp1_path.h
> > +++ b/src/libcamera/pipeline/rkisp1/rkisp1_path.h
> > @@ -26,6 +26,8 @@ class V4L2Subdevice;
> >  struct StreamConfiguration;
> >  struct V4L2SubdeviceFormat;
> >
> > +static constexpr unsigned int RKISP1_BUFFER_COUNT = 4;
> > +
> >  class RkISP1Path
> >  {
> >  public:
> > @@ -56,8 +58,6 @@ public:
> >  	Signal<FrameBuffer *> &bufferReady() { return video_->bufferReady; }
> >
> >  private:
> > -	static constexpr unsigned int RKISP1_BUFFER_COUNT = 4;
> > -
> >  	const char *name_;
> >  	bool running_;
> >
> > diff --git a/src/libcamera/pipeline/simple/simple.cpp b/src/libcamera/pipeline/simple/simple.cpp
> > index f6095d38e97a..6ee24f2f14e8 100644
> > --- a/src/libcamera/pipeline/simple/simple.cpp
> > +++ b/src/libcamera/pipeline/simple/simple.cpp
> > @@ -22,6 +22,7 @@
> >  #include <linux/media-bus-format.h>
> >
> >  #include <libcamera/camera.h>
> > +#include <libcamera/property_ids.h>
> >  #include <libcamera/request.h>
> >  #include <libcamera/stream.h>
> >
> > @@ -141,6 +142,8 @@ static const SimplePipelineInfo supportedDevices[] = {
> >
> >  } /* namespace */
> >
> > +static constexpr unsigned int kNumInternalBuffers = 3;
> > +
> >  class SimpleCameraData : public CameraData
> >  {
> >  public:
> > @@ -238,8 +241,6 @@ protected:
> >  	int queueRequestDevice(Camera *camera, Request *request) override;
> >
> >  private:
> > -	static constexpr unsigned int kNumInternalBuffers = 3;
> > -
>
> Is this change needed ?
>
> >  	SimpleCameraData *cameraData(const Camera *camera)
> >  	{
> >  		return static_cast<SimpleCameraData *>(
> > @@ -424,6 +425,7 @@ int SimpleCameraData::init()
> >  	}
> >
> >  	properties_ = sensor_->properties();
> > +	properties_.set(properties::QueueDepth, kNumInternalBuffers);
> >
> >  	return 0;
> >  }
> > diff --git a/src/libcamera/pipeline/uvcvideo/uvcvideo.cpp b/src/libcamera/pipeline/uvcvideo/uvcvideo.cpp
> > index b6c6ade5ebaf..591f46b60d23 100644
> > --- a/src/libcamera/pipeline/uvcvideo/uvcvideo.cpp
> > +++ b/src/libcamera/pipeline/uvcvideo/uvcvideo.cpp
> > @@ -525,6 +525,8 @@ int UVCCameraData::init(MediaDevice *media)
> >  	properties_.set(properties::PixelArraySize, resolution);
> >  	properties_.set(properties::PixelArrayActiveAreas, { Rectangle(resolution) });
> >
> > +	properties_.set(properties::QueueDepth, 4);
>
> You can set this to 1 or 2 (again, see below).
>
> > +
> >  	/* Initialise the supported controls. */
> >  	ControlInfoMap::Map ctrls;
> >
> > diff --git a/src/libcamera/pipeline/vimc/vimc.cpp b/src/libcamera/pipeline/vimc/vimc.cpp
> > index 8f5f4ba30953..605b3fe89152 100644
> > --- a/src/libcamera/pipeline/vimc/vimc.cpp
> > +++ b/src/libcamera/pipeline/vimc/vimc.cpp
> > @@ -20,6 +20,7 @@
> >  #include <libcamera/formats.h>
> >  #include <libcamera/ipa/ipa_interface.h>
> >  #include <libcamera/ipa/ipa_module_info.h>
> > +#include <libcamera/property_ids.h>
> >  #include <libcamera/request.h>
> >  #include <libcamera/stream.h>
> >
> > @@ -516,6 +517,8 @@ int VimcCameraData::init()
> >  	/* Initialize the camera properties. */
> >  	properties_ = sensor_->properties();
> >
> > +	properties_.set(properties::QueueDepth, 4);
>
> And here, 2 should be enough.
>
> > +
> >  	return 0;
> >  }
> >
> > diff --git a/src/libcamera/property_ids.yaml b/src/libcamera/property_ids.yaml
> > index 104e9aaf4fa3..0b7d1271a26b 100644
> > --- a/src/libcamera/property_ids.yaml
> > +++ b/src/libcamera/property_ids.yaml
> > @@ -678,6 +678,11 @@ controls:
> >          \todo Turn this property into a "maximum control value" for the
> >          ScalerCrop control once "dynamic" controls have been implemented.
> >
> > +  - QueueDepth:
> > +      type: int32_t
> > +      description: |
> > +        Minimum amount of requests needed in the camera pipeline.
>
> I think this needs to be better defined.
>
> First of all, the name QueueDepth doesn't really relate to concepts that
> are explicitly defined in libcamera. We should thus either define those
> concepts, or, if we want to define the property in a self-contained way,
> change the name to match the description of the property. This could for
> instance become MinNumRequests.
>
> Then, we should also expand the definition of the property. This requires
> a bit of design work. What do we want to convey here ? Let's start the
> brainstorming session, which means you can challenge anything I'll
> propose here, and present any new idea.
>
> There are multiple reasons why we need a minimum number of requests
> queued:
>
> - Fulfilling driver requirements, which are related to hardware
>   requirements.
>
>   It is common for video capture hardware to require multiple buffers to
>   be queued to the hardware. A very common situation (mostly seen in
>   low-end hardware) is to have two buffer address registers for the
>   capture DMA engine, and ping-pong between the two of them. The
>   hardware in that case can't deal with buffer underruns, so the driver
>   needs to supply two buffers before capture can be started. The driver
>   is also limited by the hardware when it comes to buffer completion.
>   When a frame capture completes, a new buffer is needed to replace the
>   completed one. Drivers usually hold on to the completed buffer if no
>   new buffer is available, so that the DMA engine will always have two
>   buffers available. This means that queueing two buffers would start
>   the capture, and a third buffer is needed for the driver to send
>   completed buffers back to userspace. A similar situation occurs when
>   the hardware uses shadow registers instead of an explicit ping-pong
>   mechanism.
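>
>   To make this concrete, here is a rough, untested sketch of such a
>   completion handler (purely illustrative, not modelled on any real
>   driver; all names are invented). With only two buffers ever queued,
>   the underrun branch always runs and no frame reaches userspace,
>   which is why a third buffer is needed:
>
>   	#include <deque>
>   	#include <iostream>
>
>   	struct Buffer { int id; };
>
>   	/* Two DMA address slots (A/B) and the queue of buffers supplied
>   	 * by userspace that are not yet programmed into a register. */
>   	struct PingPongDma {
>   		Buffer *slot[2] = {};
>   		std::deque<Buffer *> pending;
>   	};
>
>   	/* Called when the frame in 'slot' (0 = A, 1 = B) completes. */
>   	void captureCompleteIrq(PingPongDma &dma, unsigned int slot)
>   	{
>   		Buffer *done = dma.slot[slot];
>
>   		if (!dma.pending.empty()) {
>   			/* A third buffer is queued: replace and complete. */
>   			dma.slot[slot] = dma.pending.front();
>   			dma.pending.pop_front();
>   			std::cout << "buffer " << done->id << " -> userspace\n";
>   		} else {
>   			/* Underrun: keep 'done' programmed, drop the frame. */
>   			std::cout << "buffer " << done->id << " held, frame dropped\n";
>   		}
>   		/* Reprogram address register 'slot' with dma.slot[slot]. */
>   	}
>
>   	int main()
>   	{
>   		Buffer b1{1}, b2{2}, b3{3};
>   		PingPongDma dma{ { &b1, &b2 }, { &b3 } };
>
>   		captureCompleteIrq(dma, 0);	/* buffer 1 -> userspace */
>   		captureCompleteIrq(dma, 1);	/* no 4th buffer: dropped */
>   	}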
>
>   There are several things that drivers can (and should) do to minimize
>   the impact on userspace. One of them would be to allocate a scratch
>   buffer to be used when userspace doesn't provide buffers fast enough.
>   This is costly in terms of memory. Userspace is in a better position
>   to handle scratch buffers, and skip them if it can guarantee that
>   there will be no buffer queue underrun. However, when the device is
>   behind an IOMMU, a single physical page can be allocated and mapped to
>   the device as a large DMA area that spans a full frame, which makes
>   scratch buffers cheap if handled by the kernel.
>
>   Another contingency measure is possible in the ping-pong case. If the
>   hardware requires buffers A and B to be available, capture can start
>   immediately when two buffers (let's call them 1 and 2) are queued.
>   When capture to A completes, the driver needs to reprogram the address
>   of buffer A. If a third buffer is available in the queue at that time,
>   it will use it. Otherwise, it could reuse buffer 2 to program address
>   A. This means that, when capture to B completes, that frame will be
>   dropped, as the buffer will be reused immediately for A. This makes it
>   possible to dequeue buffers with only two buffers queued instead of
>   three. Using fewer than three buffers would however still result in
>   frames being dropped, as sketched below.
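>
>   The reuse-and-drop variant only changes the underrun handling of the
>   sketch above (again invented code, just to illustrate the idea):
>
>   	/* Variant: on underrun, reuse the other slot's buffer so 'done'
>   	 * can be returned; the reused buffer's frame is sacrificed. */
>   	void captureCompleteIrqReuse(PingPongDma &dma, unsigned int slot)
>   	{
>   		Buffer *done = dma.slot[slot];
>
>   		if (!dma.pending.empty()) {
>   			dma.slot[slot] = dma.pending.front();
>   			dma.pending.pop_front();
>   		} else {
>   			dma.slot[slot] = dma.slot[1 - slot];
>   		}
>
>   		if (done == dma.slot[1 - slot]) {
>   			/* 'done' was already reused for the other slot: its
>   			 * contents will be overwritten, drop the frame. */
>   			std::cout << "frame in buffer " << done->id << " dropped\n";
>   		} else {
>   			std::cout << "buffer " << done->id << " -> userspace\n";
>   		}
>   	}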
>
>   Note that these constraints should be applicable to inline hardware
>   only. For offline ISPs (memory-to-memory), the hardware shouldn't have
>   such constraints, and should be fine with starting processing as soon
>   as one buffer is available, and with pausing processing until a new
>   buffer is available in case of a buffer queue underrun. That's at
>   least the case for sane hardware, and we may not be able to completely
>   exclude insanity on the hardware side.
>
> - Peeking ahead in the request parameters.
>
>   It can take time for some parameters to be applied. For instance, it's
>   quite typical for the camera sensor's exposure time and gain to
>   require 2 and 1 frames respectively to be applied. This means that, to
>   guarantee proper function of per-frame control, we need to have
>   parameters available in advance. When running with auto-exposure
>   enabled this may be less of a problem for the exposure time and gain,
>   but if the application wants to control those parameters manually, we
>   need enough requests in the queue, as the sketch below illustrates.
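>
>   A toy model of the look-ahead this implies (illustrative only, with
>   made-up latencies; the real sensor delays live in the pipeline
>   handlers and IPAs):
>
>   	#include <iostream>
>   	#include <map>
>   	#include <string>
>
>   	/* A value written to the sensor at frame N takes effect at
>   	 * frame N + latency. */
>   	const std::map<std::string, unsigned int> latencies = {
>   		{ "ExposureTime", 2 },
>   		{ "AnalogueGain", 1 },
>   	};
>
>   	/* For the controls of request N to take effect on frame N, they
>   	 * must be written 'latency' frames ahead, so request N must
>   	 * already be queued when frame N - latency starts. */
>   	int main()
>   	{
>   		unsigned int frame = 10;
>
>   		for (const auto &[name, latency] : latencies)
>   			std::cout << "frame " << frame << ": write " << name
>   				  << " from request " << frame + latency
>   				  << "\n";
>   	}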
>
>   Generally speaking, a camera pipeline can be composed of multiple
>   processing stages. A typical case has a camera sensor connected to an
>   inline ISP (or just a CSI-2 receiver) to capture raw images, followed
>   by an offline (memory-to-memory) ISP to process raw frames and output
>   YUV. The result can then be encoded to JPEG by a memory-to-memory
>   encoder (this is only relevant for the Android camera HAL
>   implementation, the libcamera core doesn't deal with JPEG encoding).
>   While frame N is being encoded to JPEG, frame N+1 is being processed
>   by the offline ISP, frame N+2 is being read out from the sensor (and
>   possibly processed by the inline ISP), and frame N+3 may be exposed.
>   Even if the hardware doesn't require more than one buffer to be queued
>   to function, if request N contains parameters for the camera sensor,
>   for the offline ISP, and for the JPEG encoder, those parameters need
>   to be available to libcamera three frames earlier to be applied at the
>   right time.
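>
>   As a back-of-the-envelope illustration with the stage depths from the
>   example above (not an API proposal, just the arithmetic):
>
>   	#include <algorithm>
>   	#include <iostream>
>
>   	int main()
>   	{
>   		/* Frames between a request's frame reaching the last
>   		 * stage and each consumer needing its parameters. */
>   		const unsigned int sensorDelay = 3;	/* exposed at N+3 */
>   		const unsigned int offlineIspDelay = 1;	/* processed at N+1 */
>   		const unsigned int jpegDelay = 0;	/* encoded at N */
>
>   		unsigned int lookAhead =
>   			std::max({ sensorDelay, offlineIspDelay, jpegDelay });
>   		std::cout << "requests in flight: " << lookAhead + 1 << "\n";
>   	}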
>
>   Note that some processing steps could in theory take more than the
>   time of one frame to complete, if they're pipelined at the hardware
>   level, or if they can be run in parallel for multiple frames (for
>   instance we could have two JPEG encoders that take the time of one
>   frame and a half to encode a frame). I'm not sure if this would impact
>   what we need to expose to applications.
>
> The question is how to translate this to requirements exposed to
> applications. Going back to the IPU3 and RPi pipeline handlers, both
> have the same hardware architecture, with a CSI-2 receiver capturing to
> memory, and an offline ISP. The hardware and drivers will operate with a
> single buffer, but the inline part will drop frames if we don't have at
> least two buffers for raw frames (which may be internal buffers if the
> application only captures YUV streams, or application-provided buffers
> if the application also captures a RAW stream). A minimum of 1 request
> will be able to capture frames in suboptimal conditions. 2 requests
> would be needed to avoid dropping frames, and 3 requests to ensure
> manual exposure time can be applied early enough (a 4th request would
> then be needed to handle the JPEG encoding, but as that's for the
> Android camera HAL implementation only, the libcamera core doesn't need
> to consider it). Note that, if IPA algorithms are available, we could
> possibly get away with fewer requests as the IPA could compute
> parameters earlier, but we'd be screwed if a request disables
> algorithms, which we can't really predict.
>
> Let's also note that this doesn't mean that we need to have 3 or 4
> buffers available. The buffers are only needed on the output side of the
> pipeline, later than the parameters, but we currently have no way to
> decouple buffers and controls.
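>
> From the application's point of view, whatever we report would be
> consumed along these lines (a sketch against the property as proposed
> in this patch; its final name and semantics are exactly what's under
> discussion here, and attaching buffers to the requests is elided):
>
> 	#include <memory>
> 	#include <vector>
>
> 	#include <libcamera/camera.h>
> 	#include <libcamera/property_ids.h>
> 	#include <libcamera/request.h>
>
> 	using namespace libcamera;
>
> 	/* Pre-queue as many requests as the property asks for before
> 	 * calling Camera::start(). 'camera' is assumed to be acquired
> 	 * and configured already. */
> 	void preQueue(std::shared_ptr<Camera> camera,
> 		      std::vector<std::unique_ptr<Request>> &requests)
> 	{
> 		int32_t depth = camera->properties().get(properties::QueueDepth);
>
> 		for (int32_t i = 0; i < depth; i++) {
> 			requests.push_back(camera->createRequest());
> 			/* ... attach a FrameBuffer to the request here ... */
> 			camera->queueRequest(requests.back().get());
> 		}
> 	}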
>
> What should we report in such a case ? I think we need to provide enough
> information for applications to implement the optimal case (no frame
> dropped, controls applied on the right frame, ...). We probably also
> want to let applications know what the bare minimum is in order to
> capture frames in suboptimal conditions. For instance, an application
> may want to capture frames with a very low frame rate, or even capture a
> single snapshot, and it would probably be overkill to require queueing
> more than one request in that case. That leaves open the question of how
> to deal with drivers that need more than one buffer to operate (the
> non-IOMMU case above). Do we want to support them or can we draw the
> line there (I think we should support them, unfortunately) ? Do we want
> to allocate scratch buffers in the pipeline handler to make this
> completely transparent to the application, which would result in a waste
> of memory if the application provides us with enough buffers at all
> times (and would also give us "interesting" problems to solve, such as
> deciding when to queue a scratch buffer at start time) ? In many cases
> the IPA needs to run for a few frames before the algorithms stabilize,
> which would require multiple requests to be queued anyway (a single
> snapshot use case could be implemented with a helper on top of
> libcamera), but if an application disables all algorithms and provides
> manual values for all controls, it could be feasible to capture a proper
> snapshot with a single buffer and a single request.
>
> Naush, David, your input would be welcome here, both on the explanation
> of the issue (are my understanding and explanation correct ?), and on
> the questions that follow.

I think Laurent's explanation provides a comprehensive summary of what
this control might need to encompass. To provide a bit more context: for
Raspberry Pi, our CSI-2 device driver (Unicam) does use a scratch buffer
if a framebuffer is not provided by the application. This scratch buffer
is 16k (single page) in size, as our hardware allows us to truncate or
run in circular buffer mode. I intentionally left allocation and control
of the scratch buffer within the kernel driver, as buffer handling is
driven by the device interrupt handler for timing guarantees. Given that
Unicam writes out to memory, and our ISP is purely a memory-to-memory
device, I have suggested our minimum queue depth be set to 1 for
guaranteed operation.

Having said that, there is an interesting point raised by Laurent about
manual exposure and gain controls. I believe we have always said that
these controls don't need to be consumed by the buffer returned in the
Request that set them (i.e. the gain/exposure updates can return N
frames later). Is that still correct? Of course, setting manual
exposure/gain before starting streaming would (should!) guarantee that
the buffer returned in the Request will have the correct values set.

In my opinion, I don't see much value in tying this QueueDepth property
to sensor delays, which can and will be different per sensor. If an
application has requested manual exposure/gain values *while in a
streaming state*, it might just have to scan the metadata on each
returned framebuffer until it has the frame with the values it
requested. This can easily be done in its request/return loop, without
needing to know the frame delays, so there is no need to adjust
QueueDepth for this. A sketch of such a loop follows.

Regards,
Naush

> Nícolas, don't let this scare you, you don't have to solve this issue as
> part of this patch series :-) We can merge it as a first step, and
> continue working on top. I would however like this patch to define the
> QueueDepth property a bit better. Based on the above, you can decide
> what you want to represent with QueueDepth, and document and name it
> accordingly. If you're not sure, we can of course discuss it.
>
> > +
> >  # ----------------------------------------------------------------------------
> >  # Draft properties section
> >
>
> --
> Regards,
>
> Laurent Pinchart