[libcamera-devel] [PATCH v3 1/4] libcamera: property: Add QueueDepth property
Naushir Patuck
naush at raspberrypi.com
Mon Apr 26 12:38:06 CEST 2021
Hi everyone,
On Mon, 26 Apr 2021 at 03:40, Laurent Pinchart <
laurent.pinchart at ideasonboard.com> wrote:
> Hi Nícolas,
>
> Thank you for the patch.
>
> On Wed, Apr 21, 2021 at 01:51:36PM -0300, Nícolas F. R. A. Prado wrote:
> > The QueueDepth property reports the minimum amount of requests needed in
> > the camera pipeline.
> >
> > Signed-off-by: Nícolas F. R. A. Prado <nfraprado at collabora.com>
> > ---
> > src/libcamera/pipeline/ipu3/ipu3.cpp | 4 ++++
> > src/libcamera/pipeline/raspberrypi/raspberrypi.cpp | 4 ++++
> > src/libcamera/pipeline/rkisp1/rkisp1.cpp | 2 ++
> > src/libcamera/pipeline/rkisp1/rkisp1_path.h | 4 ++--
> > src/libcamera/pipeline/simple/simple.cpp | 6 ++++--
> > src/libcamera/pipeline/uvcvideo/uvcvideo.cpp | 2 ++
> > src/libcamera/pipeline/vimc/vimc.cpp | 3 +++
> > src/libcamera/property_ids.yaml | 5 +++++
> > 8 files changed, 26 insertions(+), 4 deletions(-)
> >
> > diff --git a/src/libcamera/pipeline/ipu3/ipu3.cpp
> b/src/libcamera/pipeline/ipu3/ipu3.cpp
> > index 51446fcf5bc1..6067db2f37a3 100644
> > --- a/src/libcamera/pipeline/ipu3/ipu3.cpp
> > +++ b/src/libcamera/pipeline/ipu3/ipu3.cpp
> > @@ -1049,6 +1049,10 @@ int PipelineHandlerIPU3::registerCameras()
> > /* Initialize the camera properties. */
> > data->properties_ = cio2->sensor()->properties();
> >
> > + /* TODO This can be changed to CIO2 after configuration,
> but
> > + * both are 4 currently */
>
> /*
> * \todo This can be changed to CIO2 after configuration,
> but
> * both are 4 currently
> */
>
> The \todo is doxygen style and is used through the code base. The
> placement of /* */ is our standard coding style. Same below.
>
> > +
> > +
> > + data->properties_.set(properties::QueueDepth, 4);
>
> I think you can already lower the value, to 1 or 2 depending on the
> definition of the property (see below).
>
> > +
> > ret = initControls(data.get());
> > if (ret)
> > continue;
> > diff --git a/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> b/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> > index 2a917455500f..8d1ade3a4352 100644
> > --- a/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> > +++ b/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> > @@ -1035,6 +1035,10 @@ bool PipelineHandlerRPi::match(DeviceEnumerator
> *enumerator)
> > /* Initialize the camera properties. */
> > data->properties_ = data->sensor_->properties();
> >
> > + /* TODO Can be 1, 2 or 4 depending on configuration, for now use
> the max
> > + * which is 4 */
> > + data->properties_.set(properties::QueueDepth, 4);
>
> Same here, Naush commented that this should be 1, but depending on how
> we define the control, 2 could be a better value.
>
> > +
> > /*
> > * Set a default value for the ScalerCropMaximum property to show
> > * that we support its use, however, initialise it to zero because
> > diff --git a/src/libcamera/pipeline/rkisp1/rkisp1.cpp
> b/src/libcamera/pipeline/rkisp1/rkisp1.cpp
> > index 549f4a4e61a8..7d876e9387d7 100644
> > --- a/src/libcamera/pipeline/rkisp1/rkisp1.cpp
> > +++ b/src/libcamera/pipeline/rkisp1/rkisp1.cpp
> > @@ -21,6 +21,7 @@
> > #include <libcamera/ipa/core_ipa_interface.h>
> > #include <libcamera/ipa/rkisp1_ipa_interface.h>
> > #include <libcamera/ipa/rkisp1_ipa_proxy.h>
> > +#include <libcamera/property_ids.h>
> > #include <libcamera/request.h>
> > #include <libcamera/stream.h>
> >
> > @@ -940,6 +941,7 @@ int PipelineHandlerRkISP1::createCamera(MediaEntity
> *sensor)
> >
> > /* Initialize the camera properties. */
> > data->properties_ = data->sensor_->properties();
> > + data->properties_.set(properties::QueueDepth, RKISP1_BUFFER_COUNT);
>
> Here 3 would be a better value, to match the driver. However, I wonder
> if the hardware really needs 3 buffers. Dafna, Helen, do you have
> information about this ? A quick glance at the driver makes me think we
> could lower the number of buffers to 2. Lowering it to 1 may be nice,
> but probably more tricky.
>
> >
> > /*
> > * \todo Read dealy values from the sensor itself or from a
> > diff --git a/src/libcamera/pipeline/rkisp1/rkisp1_path.h
> b/src/libcamera/pipeline/rkisp1/rkisp1_path.h
> > index 3b3e37d258d0..7540dd41ad84 100644
> > --- a/src/libcamera/pipeline/rkisp1/rkisp1_path.h
> > +++ b/src/libcamera/pipeline/rkisp1/rkisp1_path.h
> > @@ -26,6 +26,8 @@ class V4L2Subdevice;
> > struct StreamConfiguration;
> > struct V4L2SubdeviceFormat;
> >
> > +static constexpr unsigned int RKISP1_BUFFER_COUNT = 4;
> > +
> > class RkISP1Path
> > {
> > public:
> > @@ -56,8 +58,6 @@ public:
> > Signal<FrameBuffer *> &bufferReady() { return video_->bufferReady;
> }
> >
> > private:
> > - static constexpr unsigned int RKISP1_BUFFER_COUNT = 4;
> > -
> > const char *name_;
> > bool running_;
> >
> > diff --git a/src/libcamera/pipeline/simple/simple.cpp
> b/src/libcamera/pipeline/simple/simple.cpp
> > index f6095d38e97a..6ee24f2f14e8 100644
> > --- a/src/libcamera/pipeline/simple/simple.cpp
> > +++ b/src/libcamera/pipeline/simple/simple.cpp
> > @@ -22,6 +22,7 @@
> > #include <linux/media-bus-format.h>
> >
> > #include <libcamera/camera.h>
> > +#include <libcamera/property_ids.h>
> > #include <libcamera/request.h>
> > #include <libcamera/stream.h>
> >
> > @@ -141,6 +142,8 @@ static const SimplePipelineInfo supportedDevices[] =
> {
> >
> > } /* namespace */
> >
> > +static constexpr unsigned int kNumInternalBuffers = 3;
> > +
> > class SimpleCameraData : public CameraData
> > {
> > public:
> > @@ -238,8 +241,6 @@ protected:
> > int queueRequestDevice(Camera *camera, Request *request) override;
> >
> > private:
> > - static constexpr unsigned int kNumInternalBuffers = 3;
> > -
>
> Is this change needed ?
>
> > SimpleCameraData *cameraData(const Camera *camera)
> > {
> > return static_cast<SimpleCameraData *>(
> > @@ -424,6 +425,7 @@ int SimpleCameraData::init()
> > }
> >
> > properties_ = sensor_->properties();
> > + properties_.set(properties::QueueDepth, kNumInternalBuffers);
> >
> > return 0;
> > }
> > diff --git a/src/libcamera/pipeline/uvcvideo/uvcvideo.cpp
> b/src/libcamera/pipeline/uvcvideo/uvcvideo.cpp
> > index b6c6ade5ebaf..591f46b60d23 100644
> > --- a/src/libcamera/pipeline/uvcvideo/uvcvideo.cpp
> > +++ b/src/libcamera/pipeline/uvcvideo/uvcvideo.cpp
> > @@ -525,6 +525,8 @@ int UVCCameraData::init(MediaDevice *media)
> > properties_.set(properties::PixelArraySize, resolution);
> > properties_.set(properties::PixelArrayActiveAreas, {
> Rectangle(resolution) });
> >
> > + properties_.set(properties::QueueDepth, 4);
>
> You can set this to 1 or 2 (again, see below).
>
> > +
> > /* Initialise the supported controls. */
> > ControlInfoMap::Map ctrls;
> >
> > diff --git a/src/libcamera/pipeline/vimc/vimc.cpp
> b/src/libcamera/pipeline/vimc/vimc.cpp
> > index 8f5f4ba30953..605b3fe89152 100644
> > --- a/src/libcamera/pipeline/vimc/vimc.cpp
> > +++ b/src/libcamera/pipeline/vimc/vimc.cpp
> > @@ -20,6 +20,7 @@
> > #include <libcamera/formats.h>
> > #include <libcamera/ipa/ipa_interface.h>
> > #include <libcamera/ipa/ipa_module_info.h>
> > +#include <libcamera/property_ids.h>
> > #include <libcamera/request.h>
> > #include <libcamera/stream.h>
> >
> > @@ -516,6 +517,8 @@ int VimcCameraData::init()
> > /* Initialize the camera properties. */
> > properties_ = sensor_->properties();
> >
> > + properties_.set(properties::QueueDepth, 4);
>
> And here, 2 should be enough.
>
> > +
> > return 0;
> > }
> >
> > diff --git a/src/libcamera/property_ids.yaml
> b/src/libcamera/property_ids.yaml
> > index 104e9aaf4fa3..0b7d1271a26b 100644
> > --- a/src/libcamera/property_ids.yaml
> > +++ b/src/libcamera/property_ids.yaml
> > @@ -678,6 +678,11 @@ controls:
> > \todo Turn this property into a "maximum control value" for the
> > ScalerCrop control once "dynamic" controls have been
> implemented.
> >
> > + - QueueDepth:
> > + type: int32_t
> > + description: |
> > + Minimum amount of requests needed in the camera pipeline.
>
> I think this needs to be better defined.
>
> First of all, the name QueueDepth doesn't really relate to concepts that
> are explicitly defined in libcamera. We should thus either define those
> concepts, or, if we want to define the property in a self-contained way,
> change the name to match the description of the property. This could for
> instance become MinNumRequests.
>
> Then, we should also expand the definition of the control. This requires
> a bit of design work. What do we want to convey here ? Let's start the
> brainstorming session, which means you can challenge anything I'll
> propose here, and present any new idea.
>
> There are multiple reasons why we need a minimum number of requests
> queued:
>
> - Fulfilling driver requirements, which are related to hardware
> requirements.
>
> It is common for video capture hardware to require multiple buffers to
> be queued to the hardware. A very common situation (mostly seen in
> low-end hardware) is to have two buffer address registers for the
> capture DMA engine, and ping-pong between the two of them. The
> hardware in that case can't deal with buffer underruns, so the driver
> needs to supply two buffers before capture can be started. The driver
> is also limited by the hardware when it comes to buffer completion.
> When a frame capture complete, a new buffer is needed to replace the
> completed one. Drivers usually hold on the completed buffer if no new
> buffer is available, so that the DMA engine will always have two
> buffers available. This means that queueing two buffers would start
> the capture, and a third buffer is needed for the driver to send
> completed buffers back to userspace. A similar situation occurs when
> the hardware uses shadow registers instead of an explicit ping-pong
> mechanism.
>
> There are several things that drivers can (and should) do to minimize
> the impact on userspace. One of them would be to allocate a scratch
> buffer to be used when userspace doesn't provide buffers fast enough.
> This is costly in terms of memory. Userspace is in a better position
> to handle scratch buffers, and skip them if it can guarantee that
> there will be no buffer queue underrun. However, when the device is
> behind an IOMMU, a single physical page can be allocated and mapped to
> the device as a large DMA area that spans a full frame, which makes
> scratch buffers cheap if handled by the kernel.
>
> Another contingency measure is possible in the ping-pong case. If the
> hardware requires buffers A and B to be available, capture can start
> immediately when two buffers (let's call them 1 and 2) are queued.
> When capture to A completes, the driver needs to reprogram the address
> of buffer A. If a third buffer is available in the queue at that time,
> it will use it. Otherwise, it could reuse buffer 2 to program address
> A. This means that, when capture to B completes, that frame will be
> dropped, as the buffer will be reused immediately for A. This makes it
> possible to dequeue buffers with only two buffers queued instead of
> three. Using less than three buffers would however still result in
> frames being dropped.
>
> Note that these constraints should be applicable to inline hardware
> only. For offline ISPs (memory-to-memory), the hardware shouldn't have
> such constraints, and should be fine with starting processing as soon
> as one buffer is available, and to stop processing until a new buffer
> is available in case of a buffer queue underrun. That's at least the
> case for sane hardware, and we may not be able to completely exclude
> insanity on the hardware side.
>
> - Peeking ahead in the request parameters.
>
> It can take time for some parameters to be applied. For instance, it's
> quite typical for the camera sensor's exposure time and gain to
> require 2 and 1 frames to be applied. This means that, to guarantee
> proper function of per-frame control, we need to have parameters
> available in advance. When running with auto-exposure enabled this may
> be less of a problem for the exposure time and gain, but if the
> application wants to control those parameters manually, we need enough
> requests in the queue.
>
> Generally speaking, a camera pipeline can be composed of multiple
> processing stages. A typical case has a camera sensor connected to an
> inline ISP (or just a CSI-2 receiver) to capture raw images, followed
> by an offline (memory-to-memory) ISP to process raw frames and output
> YUV. The result can then be encoded to JPEG by a memory-to-memory
> encoder (this is only relevant for the Android camera HAL
> implementation, the libcamera core doesn't deal with JPEG encoding).
> While frame N is being encoded to JPEG, frame N+1 is being processed
> by the offline ISP, frame N+2 is being read out from the sensor (and
> possibly processed by the inline ISP), and frame N+3 may be exposed.
> Even if the hardware doesn't require more than one buffer to be queued
> to function, if request N contains parameters for the camera sensor,
> for the offline ISP, and for the JPEG encoder, those parameters need
> to be available to libcamera three frames ealier to be applied at the
> right time.
>
> Note that some processing steps could in theory take more than the
> time of one frame to complete, if they're pipelined at the hardware
> level, or if they can be run in parallel for multiple frames (for
> instance we could have two JPEG encoders that take the time of one
> frame and a half to encode a frame). I'm not sure if this would impact
> what we need to expose to applications.
>
> The question is how to translate this to requirements exposed to
> applications. Going back to the IPU3 and RPi pipeline handlers, both
> have the same hardware architecture, with a CSI-2 receiver capturing to
> memory, and an offline ISP. The hardware and drivers will operate with a
> single buffer, but the inline part will drop frames if we don't have at
> least two buffers for raw frames (which may be internal buffers if the
> application only captures YUV streams, or application-provided buffers
> if the application also captures a RAW stream). A minimum of 1 request
> will be able to capture frames in suboptimal conditions. 2 requests
> would be needed to avoid dropping frames, and 3 requests to ensure
> manual exposure time can be applied early enough (a 4th request would
> then be needed to handle the JPEG encoding, but as that's for the
> Android camera HAL implementation only, the libcamera core doesn't need
> to consider it). Note that, if IPA algorithms are available, we could
> possibly get away with less requests as the IPA could compute parameters
> earlier, but we'd be screwed if a request disables algorithms, which we
> can't really predict.
>
> Let's also note that this doesn't mean that we need to have 3 or 4
> buffers available, the buffers are only needed on the output side of the
> pipeline, later than the parameters, but we currently have no way to
> decouple buffers and controls.
>
> What should we report in such a case ? I think we need to provide enough
> information for applications to implement the optimal case (no frame
> dropped, controls applied on the right frame, ...). We probably also
> want to let applications know what the bare minimum is in order to
> capture frames in suboptimal conditions. For instance, an application
> may want to capture frames with a very low frame rate, or even capture a
> single snapshot, and it would probably be overkill to require queuing
> more than one request in that case. That leaves open the question of how
> to deal with drivers that need more than one buffer to operate (the
> non-IOMMU case above). Do we want to support them or can we draw the
> line there (I think we should support them, unfortunately) ? Do we want
> to allocate scratch buffers in the pipeline handler to make this
> completely transparent to the application, which would result in a waste
> of memory if the application provides us with enough buffers at all
> times (and would also give us "interesting" problems to solve, such as
> deciding when to queue a scratch buffer at start time) ? In many cases
> the IPA needs to run for a few frames before the algorithms stabilize,
> which would require multiple requests to be queued anyway (a single
> snapshot use case could be implemented with a helper on top of
> libcamera), but if an application disables all algorithms and provides
> manual values for all controls, it could be feasible to capture a proper
> snapshot with a single buffer and a single request.
>
> Naush, David, your input would be welcome here, both on the explanation
> of the issue (are my understanding and explanation correct ?), and on
> the questions that follow.
>
I think Laurent's explanation provides a comprehensive summary of what
this control might need to encompass. To provide a bit more context, for
Raspberry Pi, our CSI2 device driver (Unicam) does use a scratch buffer
if a framebuffer is not provided by the application. This scratch buffer is
16k (single page) in size, as our hardware allows us to truncate or run in
circular buffer mode. I intentionally left allocation and control of the
scratch
buffer within the kernel driver as buffer handling is driven by the device
interrupt handler for timing guarantees. Given that Unicam writes out to
memory, and our ISP is purely a memory to memory device, I have suggested
our minimum queue depth be set to 1 for guaranteed operation.
Having said that, there is an interesting point raised by Laurent about
manual
exposure and gain controls. I believe we have always said that these
controls don't need to be consumed by the buffer returned in the Request
that set them (i.e. the gain/exposure updates can return N frames later),
is that still correct? Of course, setting manual exposure/gain before
starting
streaming would (should!) guarantee the buffer returned in the Request
will have the correct values set.
In my opinion, I don't see much value in tying this QueueDepth property with
sensor delays - which can and will be different per sensor. If an
application
has requested manual exposure/gain values *while in a streaming state*
it might just have to scan the metadata on each returned framebuffer until
it has the frame with the values it requested. This can be easily done in
it's request/return loop without needing to know the frame delays, so no
need to adjust QueueDepth for this.
Regards,
Naush
>
> Nícolas, don't let this scare you, you don't have to solve this issue as
> part of this patch series :-) We can merge it as a first step, and
> continue working on top. I would however like this patch to define the
> QueueDepth property a bit better. Based on the above, you can decide
> what you want to represent with QueueDepth, and document and name it
> accordingly. If you're not sure, we can of course discuss it.
>
> > +
> > #
> ----------------------------------------------------------------------------
> > # Draft properties section
> >
>
> --
> Regards,
>
> Laurent Pinchart
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.libcamera.org/pipermail/libcamera-devel/attachments/20210426/9941d81f/attachment-0001.htm>
More information about the libcamera-devel
mailing list