[libcamera-devel] [PATCH v3 1/4] libcamera: property: Add QueueDepth property

Laurent Pinchart laurent.pinchart at ideasonboard.com
Mon Apr 26 04:40:36 CEST 2021


Hi Nícolas,

Thank you for the patch.

On Wed, Apr 21, 2021 at 01:51:36PM -0300, Nícolas F. R. A. Prado wrote:
> The QueueDepth property reports the minimum amount of requests needed in
> the camera pipeline.
> 
> Signed-off-by: Nícolas F. R. A. Prado <nfraprado at collabora.com>
> ---
>  src/libcamera/pipeline/ipu3/ipu3.cpp               | 4 ++++
>  src/libcamera/pipeline/raspberrypi/raspberrypi.cpp | 4 ++++
>  src/libcamera/pipeline/rkisp1/rkisp1.cpp           | 2 ++
>  src/libcamera/pipeline/rkisp1/rkisp1_path.h        | 4 ++--
>  src/libcamera/pipeline/simple/simple.cpp           | 6 ++++--
>  src/libcamera/pipeline/uvcvideo/uvcvideo.cpp       | 2 ++
>  src/libcamera/pipeline/vimc/vimc.cpp               | 3 +++
>  src/libcamera/property_ids.yaml                    | 5 +++++
>  8 files changed, 26 insertions(+), 4 deletions(-)
> 
> diff --git a/src/libcamera/pipeline/ipu3/ipu3.cpp b/src/libcamera/pipeline/ipu3/ipu3.cpp
> index 51446fcf5bc1..6067db2f37a3 100644
> --- a/src/libcamera/pipeline/ipu3/ipu3.cpp
> +++ b/src/libcamera/pipeline/ipu3/ipu3.cpp
> @@ -1049,6 +1049,10 @@ int PipelineHandlerIPU3::registerCameras()
>  		/* Initialize the camera properties. */
>  		data->properties_ = cio2->sensor()->properties();
>  
> +		/* TODO This can be changed to CIO2 after configuration, but
> +		 * both are 4 currently */

		/*
		 * \todo This can be changed to CIO2 after configuration, but
		 * both are 4 currently
		 */

The \todo is doxygen style and is used throughout the code base. The
placement of /* */ follows our standard coding style. Same below.

> +
> +
> +		data->properties_.set(properties::QueueDepth, 4);

I think you can already lower the value to 1 or 2, depending on the
definition of the property (see below).

> +
>  		ret = initControls(data.get());
>  		if (ret)
>  			continue;
> diff --git a/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp b/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> index 2a917455500f..8d1ade3a4352 100644
> --- a/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> +++ b/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> @@ -1035,6 +1035,10 @@ bool PipelineHandlerRPi::match(DeviceEnumerator *enumerator)
>  	/* Initialize the camera properties. */
>  	data->properties_ = data->sensor_->properties();
>  
> +	/* TODO Can be 1, 2 or 4 depending on configuration, for now use the max
> +	 * which is 4 */
> +	data->properties_.set(properties::QueueDepth, 4);

Same here. Naush commented that this should be 1, but depending on how
we define the property, 2 could be a better value.

> +
>  	/*
>  	 * Set a default value for the ScalerCropMaximum property to show
>  	 * that we support its use, however, initialise it to zero because
> diff --git a/src/libcamera/pipeline/rkisp1/rkisp1.cpp b/src/libcamera/pipeline/rkisp1/rkisp1.cpp
> index 549f4a4e61a8..7d876e9387d7 100644
> --- a/src/libcamera/pipeline/rkisp1/rkisp1.cpp
> +++ b/src/libcamera/pipeline/rkisp1/rkisp1.cpp
> @@ -21,6 +21,7 @@
>  #include <libcamera/ipa/core_ipa_interface.h>
>  #include <libcamera/ipa/rkisp1_ipa_interface.h>
>  #include <libcamera/ipa/rkisp1_ipa_proxy.h>
> +#include <libcamera/property_ids.h>
>  #include <libcamera/request.h>
>  #include <libcamera/stream.h>
>  
> @@ -940,6 +941,7 @@ int PipelineHandlerRkISP1::createCamera(MediaEntity *sensor)
>  
>  	/* Initialize the camera properties. */
>  	data->properties_ = data->sensor_->properties();
> +	data->properties_.set(properties::QueueDepth, RKISP1_BUFFER_COUNT);

Here 3 would be a better value, to match the driver. However, I wonder
if the hardware really needs 3 buffers. Dafna, Helen, do you have
information about this ? A quick glance at the driver makes me think we
could lower the number of buffers to 2. Lowering it to 1 would be nice,
but is probably trickier.

>  
>  	/*
>  	 * \todo Read dealy values from the sensor itself or from a
> diff --git a/src/libcamera/pipeline/rkisp1/rkisp1_path.h b/src/libcamera/pipeline/rkisp1/rkisp1_path.h
> index 3b3e37d258d0..7540dd41ad84 100644
> --- a/src/libcamera/pipeline/rkisp1/rkisp1_path.h
> +++ b/src/libcamera/pipeline/rkisp1/rkisp1_path.h
> @@ -26,6 +26,8 @@ class V4L2Subdevice;
>  struct StreamConfiguration;
>  struct V4L2SubdeviceFormat;
>  
> +static constexpr unsigned int RKISP1_BUFFER_COUNT = 4;
> +
>  class RkISP1Path
>  {
>  public:
> @@ -56,8 +58,6 @@ public:
>  	Signal<FrameBuffer *> &bufferReady() { return video_->bufferReady; }
>  
>  private:
> -	static constexpr unsigned int RKISP1_BUFFER_COUNT = 4;
> -
>  	const char *name_;
>  	bool running_;
>  
> diff --git a/src/libcamera/pipeline/simple/simple.cpp b/src/libcamera/pipeline/simple/simple.cpp
> index f6095d38e97a..6ee24f2f14e8 100644
> --- a/src/libcamera/pipeline/simple/simple.cpp
> +++ b/src/libcamera/pipeline/simple/simple.cpp
> @@ -22,6 +22,7 @@
>  #include <linux/media-bus-format.h>
>  
>  #include <libcamera/camera.h>
> +#include <libcamera/property_ids.h>
>  #include <libcamera/request.h>
>  #include <libcamera/stream.h>
>  
> @@ -141,6 +142,8 @@ static const SimplePipelineInfo supportedDevices[] = {
>  
>  } /* namespace */
>  
> +static constexpr unsigned int kNumInternalBuffers = 3;
> +
>  class SimpleCameraData : public CameraData
>  {
>  public:
> @@ -238,8 +241,6 @@ protected:
>  	int queueRequestDevice(Camera *camera, Request *request) override;
>  
>  private:
> -	static constexpr unsigned int kNumInternalBuffers = 3;
> -

Is this change needed ?

>  	SimpleCameraData *cameraData(const Camera *camera)
>  	{
>  		return static_cast<SimpleCameraData *>(
> @@ -424,6 +425,7 @@ int SimpleCameraData::init()
>  	}
>  
>  	properties_ = sensor_->properties();
> +	properties_.set(properties::QueueDepth, kNumInternalBuffers);
>  
>  	return 0;
>  }
> diff --git a/src/libcamera/pipeline/uvcvideo/uvcvideo.cpp b/src/libcamera/pipeline/uvcvideo/uvcvideo.cpp
> index b6c6ade5ebaf..591f46b60d23 100644
> --- a/src/libcamera/pipeline/uvcvideo/uvcvideo.cpp
> +++ b/src/libcamera/pipeline/uvcvideo/uvcvideo.cpp
> @@ -525,6 +525,8 @@ int UVCCameraData::init(MediaDevice *media)
>  	properties_.set(properties::PixelArraySize, resolution);
>  	properties_.set(properties::PixelArrayActiveAreas, { Rectangle(resolution) });
>  
> +	properties_.set(properties::QueueDepth, 4);

You can set this to 1 or 2 (again, see below).

> +
>  	/* Initialise the supported controls. */
>  	ControlInfoMap::Map ctrls;
>  
> diff --git a/src/libcamera/pipeline/vimc/vimc.cpp b/src/libcamera/pipeline/vimc/vimc.cpp
> index 8f5f4ba30953..605b3fe89152 100644
> --- a/src/libcamera/pipeline/vimc/vimc.cpp
> +++ b/src/libcamera/pipeline/vimc/vimc.cpp
> @@ -20,6 +20,7 @@
>  #include <libcamera/formats.h>
>  #include <libcamera/ipa/ipa_interface.h>
>  #include <libcamera/ipa/ipa_module_info.h>
> +#include <libcamera/property_ids.h>
>  #include <libcamera/request.h>
>  #include <libcamera/stream.h>
>  
> @@ -516,6 +517,8 @@ int VimcCameraData::init()
>  	/* Initialize the camera properties. */
>  	properties_ = sensor_->properties();
>  
> +	properties_.set(properties::QueueDepth, 4);

And here, 2 should be enough.

> +
>  	return 0;
>  }
>  
> diff --git a/src/libcamera/property_ids.yaml b/src/libcamera/property_ids.yaml
> index 104e9aaf4fa3..0b7d1271a26b 100644
> --- a/src/libcamera/property_ids.yaml
> +++ b/src/libcamera/property_ids.yaml
> @@ -678,6 +678,11 @@ controls:
>          \todo Turn this property into a "maximum control value" for the
>          ScalerCrop control once "dynamic" controls have been implemented.
>  
> +  - QueueDepth:
> +      type: int32_t
> +      description: |
> +        Minimum amount of requests needed in the camera pipeline.

I think this needs to be better defined.

First of all, the name QueueDepth doesn't really relate to concepts that
are explicitly defined in libcamera. We should thus either define those
concepts, or, if we want to define the property in a self-contained way,
change the name to match the description of the property. This could for
instance become MinNumRequests.

Then, we should also expand the definition of the property. This requires
a bit of design work. What do we want to convey here ? Let's start the
brainstorming session, which means you can challenge anything I'll
propose here, and present any new ideas.

There are multiple reasons why we need a minimum number of requests
queued:

- Fulfilling driver requirements, which are related to hardware
  requirements.

  It is common for video capture hardware to require multiple buffers to
  be queued to the hardware. A very common situation (mostly seen in
  low-end hardware) is to have two buffer address registers for the
  capture DMA engine, and ping-pong between the two of them. The
  hardware in that case can't deal with buffer underruns, so the driver
  needs to supply two buffers before capture can be started. The driver
  is also limited by the hardware when it comes to buffer completion.
  When a frame capture completes, a new buffer is needed to replace the
  completed one. Drivers usually hold on to the completed buffer if no new
  buffer is available, so that the DMA engine will always have two
  buffers available. This means that queueing two buffers would start
  the capture, and a third buffer is needed for the driver to send
  completed buffers back to userspace. A similar situation occurs when
  the hardware uses shadow registers instead of an explicit ping-pong
  mechanism.

  There are several things that drivers can (and should) do to minimize
  the impact on userspace. One of them would be to allocate a scratch
  buffer to be used when userspace doesn't provide buffers fast enough.
  This is costly in terms of memory. Userspace is in a better position
  to handle scratch buffers, and skip them if it can guarantee that
  there will be no buffer queue underrun. However, when the device is
  behind an IOMMU, a single physical page can be allocated and mapped to
  the device as a large DMA area that spans a full frame, which makes
  scratch buffers cheap if handled by the kernel.

  Another contingency measure is possible in the ping-pong case. If the
  hardware requires buffers A and B to be available, capture can start
  immediately when two buffers (let's call them 1 and 2) are queued.
  When capture to A completes, the driver needs to reprogram the address
  of buffer A. If a third buffer is available in the queue at that time,
  it will use it.  Otherwise, it could reuse buffer 2 to program address
  A. This means that, when capture to B completes, that frame will be
  dropped, as the buffer will be reused immediately for A. This makes it
  possible to dequeue buffers with only two buffers queued instead of
  three. Using less than three buffers would however still result in
  frames being dropped.

  Note that these constraints should be applicable to inline hardware
  only. For offline ISPs (memory-to-memory), the hardware shouldn't have
  such constraints, and should be fine with starting processing as soon
  as one buffer is available, and with stopping processing until a new
  buffer is available in case of a buffer queue underrun. That's at least
  the case for sane hardware, and we may not be able to completely exclude
  insanity on the hardware side.

- Peeking ahead in the request parameters.

  It can take time for some parameters to be applied. For instance, it's
  quite typical for the camera sensor's exposure time and gain to
  require 2 and 1 frames to be applied. This means that, to guarantee
  proper function of per-frame control, we need to have parameters
  available in advance. When running with auto-exposure enabled this may
  be less of a problem for the exposure time and gain, but if the
  application wants to control those parameters manually, we need enough
  requests in the queue.

  Generally speaking, a camera pipeline can be composed of multiple
  processing stages. A typical case has a camera sensor connected to an
  inline ISP (or just a CSI-2 receiver) to capture raw images, followed
  by an offline (memory-to-memory) ISP to process raw frames and output
  YUV. The result can then be encoded to JPEG by a memory-to-memory
  encoder (this is only relevant for the Android camera HAL
  implementation, the libcamera core doesn't deal with JPEG encoding).
  While frame N is being encoded to JPEG, frame N+1 is being processed
  by the offline ISP, frame N+2 is being read out from the sensor (and
  possibly processed by the inline ISP), and frame N+3 may be exposed.
  Even if the hardware doesn't require more than one buffer to be queued
  to function, if request N contains parameters for the camera sensor,
  for the offline ISP, and for the JPEG encoder, those parameters need
  to be available to libcamera three frames earlier to be applied at the
  right time (the small sketch below tries to illustrate this arithmetic).

  Note that some processing steps could in theory take more than the
  time of one frame to complete, if they're pipelined at the hardware
  level, or if they can be run in parallel for multiple frames (for
  instance we could have two JPEG encoders that take the time of one
  frame and a half to encode a frame). I'm not sure if this would impact
  what we need to expose to applications.
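
To make the arithmetic above concrete, here is a minimal standalone sketch
(not libcamera code; the stage names and the one-frame-per-stage assumption
are purely illustrative):

#include <iostream>
#include <string>
#include <vector>

int main()
{
	/* Pipeline stages that each hold one frame at a time. */
	std::vector<std::string> stages = {
		"sensor exposure / readout",   /* frame N+3 */
		"CSI-2 receiver / inline ISP", /* frame N+2 */
		"offline ISP",                 /* frame N+1 */
		"JPEG encoder (Android HAL)",  /* frame N */
	};

	/*
	 * With one frame per stage, stages.size() consecutive requests are
	 * in flight at any given time, so the parameters of request N must
	 * be known stages.size() - 1 frames before frame N leaves the
	 * pipeline.
	 */
	std::cout << "Requests in flight: " << stages.size() << "\n"
		  << "Parameters needed " << stages.size() - 1
		  << " frames in advance" << std::endl;

	return 0;
}

With the four stages above this prints 4 and 3, matching the "three frames
earlier" example for the sensor parameters.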

The question is how to translate this to requirements exposed to
applications. Going back to the IPU3 and RPi pipeline handlers, both
have the same hardware architecture, with a CSI-2 receiver capturing to
memory, and an offline ISP. The hardware and drivers will operate with a
single buffer, but the inline part will drop frames if we don't have at
least two buffers for raw frames (which may be internal buffers if the
application only captures YUV streams, or application-provided buffers
if the application also captures a RAW stream). With a minimum of 1
request, frames can be captured in suboptimal conditions. 2 requests
would be needed to avoid dropping frames, and 3 requests to ensure
manual exposure time can be applied early enough (a 4th request would
then be needed to handle the JPEG encoding, but as that's for the
Android camera HAL implementation only, the libcamera core doesn't need
to consider it). Note that, if IPA algorithms are available, we could
possibly get away with fewer requests as the IPA could compute parameters
earlier, but we'd be screwed if a request disables algorithms, which we
can't really predict.

Let's also note that this doesn't mean that we need to have 3 or 4
buffers available: the buffers are only needed on the output side of the
pipeline, later than the parameters, but we currently have no way to
decouple buffers and controls.

What should we report in such a case ? I think we need to provide enough
information for applications to implement the optimal case (no frame
dropped, controls applied on the right frame, ...). We probably also
want to let applications know what the bare minimum is in order to
capture frames in suboptimal conditions. For instance, an application
may want to capture frames with a very low frame rate, or even capture a
single snapshot, and it would probably be overkill to require queuing
more than one request in that case.

That leaves open the question of how to deal with drivers that need more
than one buffer to operate (the non-IOMMU case above). Do we want to
support them or can we draw the line there (I think we should support
them, unfortunately) ? Do we want to allocate scratch buffers in the
pipeline handler to make this completely transparent to the application,
which would result in a waste of memory if the application provides us
with enough buffers at all times (and would also give us "interesting"
problems to solve, such as deciding when to queue a scratch buffer at
start time) ?

In many cases the IPA needs to run for a few frames before the
algorithms stabilize, which would require multiple requests to be queued
anyway (a single snapshot use case could be implemented with a helper on
top of libcamera), but if an application disables all algorithms and
provides manual values for all controls, it could be feasible to capture
a proper snapshot with a single buffer and a single request.

Naush, David, your input would be welcome here, both on the explanation
of the issue (are my understanding and explanation correct ?), and on
the questions that follow.

Nícolas, don't let this scare you, you don't have to solve this issue as
part of this patch series :-) We can merge it as a first step, and
continue working on top. I would however like this patch to define the
QueueDepth property a bit better. Based on the above, you can decide
what you want to represent with QueueDepth, and document and name it
accordingly. If you're not sure, we can of course discuss it.
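
As a purely illustrative strawman (the exact name, wording and \todo below
are only suggestions to make the discussion concrete), the entry in
property_ids.yaml could look along the lines of:

  - MinNumRequests:
      type: int32_t
      description: |
        The minimum number of requests that the application needs to keep
        queued to the camera for the pipeline to operate at its best: no
        frame dropped, and per-frame controls applied to the frame they
        were queued for.

        \todo Clarify whether this should also convey the bare minimum
        needed to capture frames at all (possibly dropping frames or
        applying controls late), or whether that should be a separate
        property.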

> +
>    # ----------------------------------------------------------------------------
>    # Draft properties section
>  

-- 
Regards,

Laurent Pinchart

