[libcamera-devel] [PATCH] [RFC] libcamera: camera: Define an explicit pipeline model

Jacopo Mondi jacopo at jmondi.org
Mon Nov 9 15:25:16 CET 2020


Hi Laurent,
  a few misc comments I had while reading this extensive
  documentation block.

The first one, which applies to the previous patch as well: is the
Camera class documentation the right place for this? Wouldn't this be
better as general library documentation?

On Tue, Nov 03, 2020 at 05:15:37AM +0200, Laurent Pinchart wrote:
> Expand the existing implicit pipeline model with an explicit pipeline
> API. The aim is to support advanced use cases where an application needs
> to access frames between pipeline stages. In the Android world, this is
> known as reprocessing.
>
> This patch only contains the documentation that defines and details the
> concepts and their usage. The rework of the camera configuration API
> will follow.
>
> Signed-off-by: Laurent Pinchart <laurent.pinchart at ideasonboard.com>
> ---
>  src/libcamera/camera.cpp | 232 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 232 insertions(+)
>
> diff --git a/src/libcamera/camera.cpp b/src/libcamera/camera.cpp
> index eff999ec322a..83be4202735a 100644
> --- a/src/libcamera/camera.cpp
> +++ b/src/libcamera/camera.cpp
> @@ -99,6 +99,238 @@
>   * on the crop rectangle and the output stream size. The crop rectangle is
>   * expressed relatively to the full pixel array size and indicates how the field
>   * of view is affected by the pipeline.
> + *
> + * \section pipeline-stages Pipeline Stages
> + *
> + * At the hardware level, pipelines are often more complex. A camera is usually
> + * made of multiple independent stages chained together. For instance, a common
> + * pattern seen in camera hardware architectures splits the image processing,
> + * after the camera sensor, in two parts:
> + *
> + * - The first hardware processing stage is connected to the camera sensor and
> + *   captures raw frames to memory, possibly applying image processing to the
> + *   raw data (such as black level subtraction or lens shading correction).
> + *   This is referred to as inline processing, as frames are processed as they
> + *   arrive, in real time.
> + *
> + * - The second hardware processing stage reads the raw frames from memory,
> + *   applies demosaicing, color space conversion and other processing steps,
> + *   and stores the processed frames in memory in YUV format. This is referred
> + *   to as offline processing, as the timing constraints are not driven by a
> + *   live input.

What about platforms that inline capture and processing without going
through memory? Do they sit in the middle, as they process frames 'in
real time' but also apply the processing steps listed here in the
second bullet?

In general, do you think it is useful to associate typical ISP
processing (like color space conversion) with the location frames are
read from?

I would describe in-line and off-line pipelines first, then list the
transformations that can be applied to raw frames and the typical ISP
functionalities.

> + *
> + * More offline processing stages may be chained after the first one to produce
> + * the final images. In libcamera, control of the pipeline stages happens by
> + * default behind the scenes in pipeline handlers to hide the complexity from
> + * applications.
> + *
> + * \subsection pipeline-stages-control Explicit Control of Pipeline Stages
> + *
> + * Applications may have use cases that require explicit control of the
> + * pipeline stages. In the previous example, an application may need to apply
> + * custom processing to the raw images between the inline and offline stages.
> + * libcamera supports this feature by making the pipelines explicit.
> + *
> + * The pipeline concept introduced previously is generalized as a logical view
> + * of processing operations applied to frames, covering one or multiple
> + * hardware stages. Each pipeline receives frames from a single input and
> + * produces one or multiple output streams of frames. The input corresponds to
> + * either the camera sensor, or frames stored in memory. A pipeline that
> + * produces frames generated by the camera sensor is known as a capture
> + * pipeline, while a pipeline that produces frames based on a memory input is
> + * known as a processing pipeline. Not all cameras may support processing
> + * pipelines.

As I read the description of 'processing pipeline' I think of purely
m2m devices, which again leaves out devices with an in-line ISP.

> + *
> + * Pipelines operate on streams, which model an input or output of the pipeline.

Or, as the previous paragraph puts it, do they rather operate on a
single input and produce one or multiple output streams?

> + * With the exception of the stream corresponding to the camera sensor, known
> + * as the live stream, all streams operate on memory. Output streams capture
> + * frames to memory, and input streams fetch frames from memory for further
> + * processing.

Ok, I see where this is going and why you need the previous
distinction. In-line ISPs would then likely not support 'input
streams', as they're fed from the CSI-2 receiver buffers?

> + *
> + * Pipelines are constructed by applications when configuring the camera. To
> + * create a pipeline, applications shall select, among all the streams exposed
> + * by the camera, the streams that best match their use case based on the
> + * capabilities the streams expose. libcamera provides helper functions to

A Stream currently exposes nothing more than information related to
the image format it produces (format, size, stride and bufferCount
(aka pipeline depth?)). What other capabilities are you thinking of?
E.g. 'direction' (input = from memory, output = from live source?)
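
For reference, this is roughly everything an application can inspect
today (a rough, untested sketch against the current StreamConfiguration
fields as I remember them):

#include <iostream>

#include <libcamera/stream.h>

/*
 * Print what a stream's configuration currently exposes: pixel format,
 * size, stride and buffer count. Nothing here describes a direction or
 * the processing capabilities sitting behind the stream.
 */
void dumpStreamConfiguration(const libcamera::StreamConfiguration &cfg)
{
        std::cout << "format: " << cfg.pixelFormat.toString()
                  << " size: " << cfg.size.toString()
                  << " stride: " << cfg.stride
                  << " buffers: " << cfg.bufferCount << std::endl;
}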

> + * assist this streams selection process.
> + *
> + * \todo Provide an example of two pipelines being used concurrently in the
> + * form of a diagram
> + *
> + * \subsection pipeline-resources Resource Sharing
> + *
> + * Within a camera, multiple pipelines may share hardware resources. For
> + * instance, with the typical inline/offline hardware architecture described
> + * above, an application may construct a capture pipeline to capture frames to
> + * memory in both raw format and processed YUV format, and a processing
> + * pipeline to process raw frames from memory. The capture pipeline would use
> + * both the inline stage (to capture raw frames) and the offline stage (to
> + * generate processed YUV frames), and the processing pipeline would use the
> + * same offline stage for memory to memory processing. Those two pipelines may
> + * be operated concurrently by an application, resulting in the offline stage
> + * and its streams being shared between the pipelines.
> + *

oh my /o\

> + * Resource sharing between multiple pipelines is handled by libcamera as
> + * transparently as possible. The camera configuration API exposes information
> + * to inform of any user-visible impact of resource sharing and allow
> + * applications to make appropriate usage decisions.
> + *
> + * \subsection pipeline-stages-model-mapping Mapping to The Pipeline Model
> + *
> + * Depending on which input and output streams it uses, a pipeline usually

nit: 'which combination of input and output streams'?

> + * supports a subset of the operations defined by the
> + * \ref camera-pipeline-model "pipeline model". For instance, a capture
> + * pipeline that ends at a stream capturing raw images may support operations
> + * up to pixel readout, or up to lens shading correction. A processing pipeline
> + * operating on raw frames and outputting YUV frames may start at black level
> + * subtraction or at spatial noise filtering.
> + *
> + * \section camera-use-cases Sample Use Cases
> + *
> + * To better understand the usage of pipelines and streams, this section
> + * presents several common use cases and how they map to pipelines and streams.
> + *
> + * \subsection camera-use-case-viewfinder Viewfinder
> + *
> + * The simplest use case captures a single live stream from the camera to
> + * display it on the screen. This is named a viewfinder, due to its usage in
> + * photo applications to display a preview of what will appear on the picture.
> + *
> + * In this use case, the camera operates with a single capture pipeline,
> + * containing a single output stream. The output stream shall be selected for
> + * its ability to produce a format and a size compatible with the display
> + * requirements. It will thus typically support scaling frames. The pixel
> + * format and size of the output stream are selected by the application.
> + *
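
Maybe each use case could be complemented by a short code example?
For the viewfinder, something like this (rough and untested; the role
and format names, StreamRole::Viewfinder and formats::NV12, are taken
from the current API from memory):

#include <errno.h>
#include <memory>

#include <libcamera/formats.h>
#include <libcamera/libcamera.h>

using namespace libcamera;

/* Viewfinder: a single capture pipeline with one display-compatible
 * output stream. */
int configureViewfinder(Camera *camera)
{
        std::unique_ptr<CameraConfiguration> config =
                camera->generateConfiguration({ StreamRole::Viewfinder });
        if (!config)
                return -EINVAL;

        StreamConfiguration &viewfinder = config->at(0);
        viewfinder.pixelFormat = formats::NV12; /* display-friendly format */
        viewfinder.size = { 1280, 720 };        /* match the display size */

        /* The pipeline handler adjusts whatever it cannot satisfy. */
        if (config->validate() == CameraConfiguration::Invalid)
                return -EINVAL;

        return camera->configure(config.get());
}
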
> + * \subsection camera-use-case-viewfinder-still Viewfinder and Still Image Capture
> + *
> + * A slightly more advanced use case combines the viewfinder from the previous
> + * use case with high resolution still image capture. This is the most common
> + * simple point-and-shoot camera implementation, with the viewfinder offering
> + * live display on the screen, and still images being occasionally captured
> + * based on user input at a high(er) resolution.
> + *
> + * In this use case, the camera operates with a single capture pipeline,
> + * containing two output streams, respectively named viewfinder and still
> + * capture. Note that the stream naming only serves to ease referring to
> + * streams in the documentation of a particular use case, the streams selected
> + * for the viewfinder and still capture roles may support more use cases and may
> + * not be intrinsically dedicated to these roles.
> + *
> + * As in the previous use case, the output streams shall be selected for their
> + * compatibility with the display and still capture requirements. The still
> + * capture stream may not support scaling, but may offer additional image
> + * quality improvements compared to the viewfinder stream (such as higher
> + * quality noise reduction).
> + *
> + * The pixel format and size of both streams are selected by the application.
> + *
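
The same sketch extended with a still capture stream (again rough and
untested, same includes and assumptions as above; the full-resolution
size is only an example):

/* Viewfinder plus still capture: one capture pipeline, two output
 * streams selected by role. */
int configureViewfinderAndStill(Camera *camera)
{
        std::unique_ptr<CameraConfiguration> config =
                camera->generateConfiguration({ StreamRole::Viewfinder,
                                                StreamRole::StillCapture });
        if (!config)
                return -EINVAL;

        StreamConfiguration &viewfinder = config->at(0);
        viewfinder.pixelFormat = formats::NV12;
        viewfinder.size = { 1280, 720 };

        StreamConfiguration &still = config->at(1);
        still.pixelFormat = formats::NV12;
        still.size = { 4056, 3040 };            /* example full resolution */

        if (config->validate() == CameraConfiguration::Invalid)
                return -EINVAL;

        return camera->configure(config.get());
}
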
> + * \subsection camera-use-case-viewfinder-video Viewfinder and Video Capture
> + *
> + * Similarly to the previous use case, this use case combines a viewfinder with
> + * a second stream, this time to capture video. This is a common use case for
> + * video recording or video conferencing applications, with the viewfinder
> + * offering live preview of the video on the screen, and the captured video
> + * being sent to an encoder and recorded on permanent storage or sent over the
> + * network.
> + *
> + * In this use case, the camera operates with a single capture pipeline,
> + * containing two output streams, respectively named viewfinder and video.
> + * Selection of the output streams by the application follows the same process
> + * as before. Both the viewfinder and video streams are typically selected for
> + * their ability to scale the image and output a format compatible with the
> + * display and the encoder respectively. The video stream may offer additional
> + * features such as video stabilization, and the viewfinder stream may support
> + * mirroring the image to present a more usual self view on the screen.
> + *
> + * The video stream in this use case is not limited to being encoded and stored
> + * or streamed. It may be used by the application for other purposes, such as
> + * analysis by computer vision algorithms.

Isn't this just another case of 2 output streams, with minimal
differences from the pipeline configuration point of view?

> + *
> + * \subsection camera-use-case-raw Raw Capture
> + *
> + * In addition to processed frames, cameras may support capturing raw frames as
> + * produced by the camera sensor, with no or minimal processing. Raw frames may
> + * be used to capture in the Digital Negative (DNG) file format, or for the
> + * purpose of camera tuning during system development and integration. This may
> + * be combined with any of the previous use cases.
> + *
> + * In this use case, the camera operates with a single capture pipeline,
> + * containing one raw output stream, in addition to the processed output
> + * streams required for other purposes (such as viewfinder or still image
> + * capture). The raw stream is selected for its ability to generate raw frames.

The last sentence is maybe a bit redundant?

> + *
> + * While applications select the format and size of processed streams, the raw

s/while/if ?

> + * stream typically offers less flexibility. Its pixel format is dictated by
> + * what the camera sensor produces, and may allow selection of a lower bit
> + * depth. The raw stream's size may be fixed when the sensor provides no
> + * scaling capability. Otherwise, it interacts with the size of the processed
> + * streams.

I would explain the constraint better, for example by pointing out
that (up-scaling aside) the raw image should be at least as large as
the largest requested processed output stream.
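
Again, a small sketch might help (untested, same includes and
assumptions as the sketches above); here validate() is left to adjust
the raw stream to whatever the sensor can actually produce:

#include <iostream>

/* Viewfinder plus raw capture: the raw stream follows what the sensor
 * produces, so let validate() adjust it and inspect the result. */
int configureViewfinderAndRaw(Camera *camera)
{
        std::unique_ptr<CameraConfiguration> config =
                camera->generateConfiguration({ StreamRole::Viewfinder,
                                                StreamRole::Raw });
        if (!config)
                return -EINVAL;

        config->at(0).size = { 1920, 1080 };

        /* Request the viewfinder size for the raw stream too... */
        StreamConfiguration &raw = config->at(1);
        raw.size = config->at(0).size;

        /* ...and check how far the pipeline had to adjust it. */
        CameraConfiguration::Status status = config->validate();
        if (status == CameraConfiguration::Invalid)
                return -EINVAL;
        if (status == CameraConfiguration::Adjusted)
                std::cout << "raw stream adjusted to " << raw.toString()
                          << std::endl;

        return camera->configure(config.get());
}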

> + *
> + * \subsection camera-use-case-raw-processing Viewfinder and Still Image Capture with Custom Processing
> + *
> + * In the \ref camera-use-case-viewfinder-still "viewfinder and still image capture"
> + * use case described previously, all the processing applied to the still image
> + * is performed by the device. In order to increase the still image quality
> + * further, additional processing of the raw frame too complex for the
> + * hardware, or simply not supported by the camera, may be desirable. This
> + * includes, for instance, temporal noise reduction that combines multiple
> + * consecutive frames to reduce the average noise.
> + *
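
As a concrete (if naive) example of such application-side processing,
temporal noise reduction can be as simple as averaging frames; a rough
sketch, assuming unpacked 16-bit raw samples of equal size and ignoring
stride, packing and motion compensation:

#include <cstddef>
#include <cstdint>
#include <vector>

/* Average N equally-sized raw frames sample by sample to reduce
 * temporal noise. Assumes at least one frame; real code would also
 * handle alignment and motion. */
std::vector<uint16_t>
averageRawFrames(const std::vector<std::vector<uint16_t>> &frames)
{
        std::vector<uint64_t> sum(frames.front().size(), 0);

        for (const std::vector<uint16_t> &frame : frames)
                for (size_t i = 0; i < frame.size(); ++i)
                        sum[i] += frame[i];

        std::vector<uint16_t> result(sum.size());
        for (size_t i = 0; i < sum.size(); ++i)
                result[i] = static_cast<uint16_t>(sum[i] / frames.size());

        return result;
}
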
> + * Capturing the raw frame and processing it in the application is possible as
> + * explained in the \ref camera-use-case-raw "raw capture" use case. In that
> + * case, however, the application would be responsible for the complete
> + * processing of the raw frame to produce a still capture, severely increasing
> + * the application complexity. To avoid this, cameras can expose processing
> + * pipelines to applications, allowing them to capture raw frames, process them
> + * with custom algorithms, and send those pre-processed frames back to the
> + * camera's processing pipeline to apply all the regular camera processing. The
> + * pre-processing step is in that case fully implemented on the application
> + * side (usually based on custom software running on the main CPU, but nothing
> + * in libcamera would limit the application's ability to offload to a GPU or
> + * another processing engine), while harnessing the full power of the camera's
> + * hardware processing.
> + *
> + * In this use case, the camera operates with two pipelines, a capture pipeline
> + * and a processing pipeline. The capture pipeline contains one viewfinder
> + * output stream and one raw output stream. The processing pipeline contains one
> + * raw input stream and one still image capture output stream. The raw input
> + * stream is selected for its ability to consume raw frames in the same format
> + * as generated by the raw output stream.
> + *
> + * During viewfinder operation, the application uses the capture pipeline only,
> + * to capture viewfinder frames. When a still image capture is needed, the
> + * application additionally captures a raw frame from the capture pipeline,
> + * pre-processes it, and then submits the pre-processed frame to the processing
> + * pipeline to complete the still image capture operation. If the pre-processing
> + * requires more than one frame, the application may capture multiple raw
> + * frames, process them together into one pre-processed raw frame, and submit
> + * that frame back to the processing pipeline.
> + *
> + * \subsection camera-use-case-zsl Viewfinder and Zero Shutter Lag Still Image Capture
> + *
> + * When a user wants to capture a still image in a point-and-shoot camera
> + * application, various delays are involved at all stages of the process. On
> + * the device side, registering a button press or a tap on the screen, and
> + * processing the event, introduces a significant delay. Even if that delay
> + * was to be minimized, the delay between the scene event that needs to be
> + * captured and the user's action on the device is also significant. This often
> + * results in missed shots when trying to capture fast actions.
> + *
> + * To solve this issue, a technique can be used by the application to capture an
> + * image from the past, compensating the system's delays with a "negative
> + * delay". The camera uses the same capture and processing pipelines as in the
> + * previous use case. The capture pipeline is operated differently, with raw
> + * frames being captured continuously to a small ring buffer of frames managed
> + * by the application. When the still image capture is requested, the
> + * application selects the appropriate raw frame from the ring buffer, based on
> + * an evaluation of the capture event delay, and submits it to the processing
> + * pipeline to generate a processed still image. This technique is referred to
> + * as zero shutter lag, or ZSL, due to the apparent removal of all delays.
> + *
> + * Zero shutter lag can be combined with application-side processing of raw
> + * frames, for instance using multiple raw frames from the ring buffer to
> + * perform temporal noise reduction, or using image analysis to pick the best
> + * raw frame from the ring buffer.

Nice!
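
If it helps future readers, the frame selection the application is
responsible for can be sketched as a small ring buffer keyed by sensor
timestamps (untested; 'Frame' stands for whatever handle the
application keeps on a captured raw buffer, and the delay estimate is
entirely the application's business):

#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>

/*
 * Keep the last few raw frames and, when the shutter is pressed at
 * eventTimeNs, select the frame whose timestamp best matches the
 * estimated time of the scene event (eventTimeNs - delayNs).
 */
template<typename Frame>
class ZslRingBuffer
{
public:
        ZslRingBuffer(size_t depth, uint64_t delayNs)
                : depth_(depth), delayNs_(delayNs) {}

        void push(uint64_t timestampNs, Frame frame)
        {
                if (frames_.size() == depth_)
                        frames_.pop_front();
                frames_.emplace_back(timestampNs, std::move(frame));
        }

        /* Assumes at least one frame has been pushed. */
        Frame select(uint64_t eventTimeNs) const
        {
                uint64_t target = eventTimeNs - delayNs_;
                const Frame *best = nullptr;
                uint64_t bestDist = UINT64_MAX;

                for (const auto &entry : frames_) {
                        uint64_t ts = entry.first;
                        uint64_t dist = ts > target ? ts - target : target - ts;
                        if (dist < bestDist) {
                                bestDist = dist;
                                best = &entry.second;
                        }
                }

                return *best;
        }

private:
        size_t depth_;
        uint64_t delayNs_;
        std::deque<std::pair<uint64_t, Frame>> frames_;
};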

Thanks
  j

>   */
>
>  namespace libcamera {
> --
> Regards,
>
> Laurent Pinchart
>
> _______________________________________________
> libcamera-devel mailing list
> libcamera-devel at lists.libcamera.org
> https://lists.libcamera.org/listinfo/libcamera-devel

