[libcamera-devel] [PATCH] [RFC] libcamera: camera: Define an explicit pipeline model

Laurent Pinchart laurent.pinchart at ideasonboard.com
Tue Nov 3 04:15:37 CET 2020


Expand the existing implicit pipeline model with an explicit pipeline
API. The aim is to support advanced use cases where an application needs
to access frames between pipeline stages. In the Android world, this is
known as reprocessing.

This patch only contains the documentation that defines and details the
concepts and their usage. The rework of the camera configuration API
will follow.

Signed-off-by: Laurent Pinchart <laurent.pinchart at ideasonboard.com>
---
 src/libcamera/camera.cpp | 232 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 232 insertions(+)

diff --git a/src/libcamera/camera.cpp b/src/libcamera/camera.cpp
index eff999ec322a..83be4202735a 100644
--- a/src/libcamera/camera.cpp
+++ b/src/libcamera/camera.cpp
@@ -99,6 +99,238 @@
  * on the crop rectangle and the output stream size. The crop rectangle is
  * expressed relatively to the full pixel array size and indicates how the field
  * of view is affected by the pipeline.
+ *
+ * \section pipeline-stages Pipeline Stages
+ *
+ * At the hardware level, pipelines are often more complex. A camera is usually
+ * made of multiple independent stages chained together. For instance, a common
+ * pattern seen in camera hardware architectures splits the image processing
+ * after the camera sensor into two parts:
+ *
+ * - The first hardware processing stage is connected to the camera sensor and
+ *   captures raw frames to memory, possibly applying image processing to the
+ *   raw data (such as black level subtraction or lens shading correction).
+ *   This is referred to as inline processing, as frames are processed as they
+ *   arrive, in real time.
+ *
+ * - The second hardware processing stage reads the raw frames from memory,
+ *   applies demosaicing, color space conversion and other processing steps,
+ *   and stores the processed frames in memory in YUV format. This is referred
+ *   to as offline processing, as the timing constraints are not driven by a
+ *   live input.
+ *
+ * More offline processing stages may be chained after the first one to produce
+ * the final images. In libcamera, pipeline handlers control the pipeline
+ * stages behind the scenes by default, hiding this complexity from
+ * applications.
+ *
+ * \subsection pipeline-stages-control Explicit Control of Pipeline Stages
+ *
+ * Applications may have use cases that require explicit control of the
+ * pipeline stages. In the previous example, an application may need to apply
+ * custom processing to the raw images between the inline and offline stages.
+ * libcamera supports this feature by making the pipelines explicit.
+ *
+ * The pipeline concept introduced previously is generalized as a logical view
+ * of processing operations applied to frames, covering one or multiple
+ * hardware stages. Each pipeline receives frames from a single input and
+ * produces one or multiple output streams of frames. The input corresponds to
+ * either the camera sensor, or frames stored in memory. A pipeline that
+ * produces frames generated by the camera sensor is known as a capture
+ * pipeline, while a pipeline that produces frames based on a memory input is
+ * known as a processing pipeline. Not all cameras may support processing
+ * pipelines.
+ *
+ * Pipelines operate on streams, which model an input or output of the pipeline.
+ * With the exception of the stream corresponding to the camera sensor, known
+ * as the live stream, all streams operate on memory. Output streams capture
+ * frames to memory, and input streams fetch frames from memory for further
+ * processing.
+ *
+ * Pipelines are constructed by applications when configuring the camera. To
+ * create a pipeline, applications shall select, among all the streams exposed
+ * by the camera, the streams that best match their use case based on the
+ * capabilities the streams expose. libcamera provides helper functions to
+ *   assist this stream selection process.
+ *
+ * \todo Provide an example of two pipelines being used concurrently in the
+ * form of a diagram
+ *
+ * \subsection pipeline-resources Resource Sharing
+ *
+ * Within a camera, multiple pipelines may share hardware resources. For
+ * instance, with the typical inline/offline hardware architecture described
+ * above, an application may construct a capture pipeline to capture frames to
+ * memory in both raw format and processed YUV format, and a processing
+ * pipeline to process raw frames from memory. The capture pipeline would use
+ * both the inline stage (to capture raw frames) and the offline stage (to
+ * generate processed YUV frames), and the processing pipeline would use the
+ * same offline stage for memory to memory processing. Those two pipelines may
+ * be operated concurrently by an application, resulting in the offline stage
+ * and its streams being shared between the pipelines.
+ *
+ * Resource sharing between multiple pipelines is handled by libcamera as
+ * transparently as possible. The camera configuration API exposes information
+ * about any user-visible impact of resource sharing, allowing applications to
+ * make appropriate usage decisions.
+ *
+ * \subsection pipeline-stages-model-mapping Mapping to the Pipeline Model
+ *
+ * Depending on which input and output streams it uses, a pipeline usually
+ * supports a subset of the operations defined by the
+ * \ref camera-pipeline-model "pipeline model". For instance, a capture
+ * pipeline that ends at a stream capturing raw images may support operations
+ * up to pixel readout, or up to lens shading correction. A processing pipeline
+ * operating on raw frames and outputting YUV frames may start at black level
+ * subtraction or at spatial noise filtering.
+ *
+ * \section camera-use-cases Sample Use Cases
+ *
+ * To better understand the usage of pipelines and streams, this section
+ * presents several common use cases and how they map to pipelines and streams.
+ *
+ * \subsection camera-use-case-viewfinder Viewfinder
+ *
+ * The simplest use case captures a single live stream from the camera to
+ * display it on the screen. This is named a viewfinder, due to its usage in
+ *   photo applications to display a preview of what will appear in the picture.
+ *
+ * In this use case, the camera operates with a single capture pipeline,
+ * containing a single output stream. The output stream shall be selected for
+ * its ability to produce a format and a size compatible with the display
+ * requirements. It will thus typically support scaling frames. The pixel
+ * format and size of the output stream are selected by the application.
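+ *
+ * As an illustration, assuming the current stream role based configuration
+ * helpers (the exact API is subject to the camera configuration rework that
+ * will follow this documentation), a viewfinder could be configured along
+ * these lines:
+ *
+ * \code{.cpp}
+ * // 'camera' is a previously acquired Camera instance.
+ * std::unique_ptr<CameraConfiguration> config =
+ *         camera->generateConfiguration({ StreamRole::Viewfinder });
+ *
+ * // Adjust the stream to the display requirements.
+ * StreamConfiguration &cfg = config->at(0);
+ * cfg.pixelFormat = formats::NV12;
+ * cfg.size = { 1280, 720 };
+ *
+ * // Let the camera adjust unsupported parameters, then apply the result.
+ * config->validate();
+ * camera->configure(config.get());
+ * \endcode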
+ *
+ * \subsection camera-use-case-viewfinder-still Viewfinder and Still Image Capture
+ *
+ * A slightly more advanced use case combines the viewfinder from the previous
+ * use case with high resolution still image capture. This is the most common
+ * simple point-and-shoot camera implementation, with the viewfinder offering
+ * live display on the screen, and still images being occasionally captured
+ * based on user input at a high(er) resolution.
+ *
+ * In this use case, the camera operates with a single capture pipeline,
+ * containing two output streams, respectively named viewfinder and still
+ * capture. Note that the stream naming only serves to ease referring to
+ * streams in the documentation of a particular use case; the streams selected
+ * for the viewfinder and still capture roles may support more use cases and may
+ * not be intrinsically dedicated to these roles.
+ *
+ * As in the previous use case, the output streams shall be selected for their
+ * compatibility with the display and still capture requirements. The still
+ * capture stream may not support scaling, but may offer additional image
+ * quality improvements compared to the viewfinder stream (such as higher
+ * quality noise reduction).
+ *
+ * The pixel format and size of both streams are selected by the application.
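+ *
+ * Under the same assumptions as in the previous use case, a sketch of a
+ * two-stream configuration, with one stream per requested role, could look as
+ * follows (the sizes are arbitrary examples):
+ *
+ * \code{.cpp}
+ * std::unique_ptr<CameraConfiguration> config = camera->generateConfiguration(
+ *         { StreamRole::Viewfinder, StreamRole::StillCapture });
+ *
+ * // Size the viewfinder for the display and the still capture stream for the
+ * // full sensor resolution.
+ * config->at(0).size = { 1280, 720 };
+ * config->at(1).size = { 4056, 3040 };
+ *
+ * config->validate();
+ * camera->configure(config.get());
+ * \endcode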
+ *
+ * \subsection camera-use-case-viewfinder-video Viewfinder and Video Capture
+ *
+ * Similarly to the previous use case, this use case combines a viewfinder with
+ * a second stream, this time to capture video. This is a common use case for
+ * video recording or video conferencing applications, with the viewfinder
+ * offering live preview of the video on the screen, and the captured video
+ * being sent to an encoder and recorded on permanent storage or sent over the
+ * network.
+ *
+ * In this use case, the camera operates with a single capture pipeline,
+ * containing two output streams, respectively named viewfinder and video.
+ * Selection of the output streams by the application follows the same process
+ * as before. Both the viewfinder and video streams are typically selected for
+ * their ability to scale the image and output a format compatible with the
+ * display and the encoder respectively. The video stream may offer additional
+ * features such as video stabilization, and the viewfinder stream may support
+ * mirroring the image to present a more usual self view on the screen.
+ *
+ * The video stream in this use case is not limited to being encoded and stored
+ * or streamed. It may be used by the application for other purposes, such as
+ * analysis by computer vision algorithms.
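+ *
+ * The configuration pattern is identical to the previous use case, with the
+ * video recording role replacing still capture. The sketch below, still based
+ * on the assumed stream role helpers, additionally shows how an application
+ * can detect an unsatisfiable configuration and retrieve the Stream instances
+ * to associate captured buffers with:
+ *
+ * \code{.cpp}
+ * std::unique_ptr<CameraConfiguration> config = camera->generateConfiguration(
+ *         { StreamRole::Viewfinder, StreamRole::VideoRecording });
+ *
+ * config->at(1).size = { 1920, 1080 };
+ *
+ * if (config->validate() == CameraConfiguration::Invalid)
+ *         return; // The requested streams can't be produced at all.
+ *
+ * camera->configure(config.get());
+ *
+ * Stream *viewfinder = config->at(0).stream();
+ * Stream *video = config->at(1).stream();
+ * \endcode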
+ *
+ * \subsection camera-use-case-raw Raw Capture
+ *
+ * In addition to processed frames, cameras may support capturing raw frames as
+ * produced by the camera sensor, with no or minimal processing. Raw frames may
+ * be stored in the Digital Negative (DNG) file format, or used for the
+ * purpose of camera tuning during system development and integration. This may
+ * be combined with any of the previous use cases.
+ *
+ * In this use case, the camera operates with a single capture pipeline,
+ * containing one raw output stream, in addition to the processed output
+ * streams required for other purposes (such as viewfinder or still image
+ * capture). The raw stream is selected for its ability to generate raw frames.
+ *
+ * While applications select the format and size of processed streams, the raw
+ * stream typically offers less flexibility. Its pixel format is dictated by
+ * what the camera sensor produces, and may allow selection of a lower bit
+ * depth. The raw stream's size may be fixed when the sensor provides no
+ * scaling capability. Otherwise, it interacts with the size of the processed
+ * streams.
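+ *
+ * Still assuming the current stream role helpers, a raw stream could for
+ * instance be requested alongside a viewfinder, with the application reading
+ * the raw format back instead of selecting it:
+ *
+ * \code{.cpp}
+ * std::unique_ptr<CameraConfiguration> config = camera->generateConfiguration(
+ *         { StreamRole::Viewfinder, StreamRole::Raw });
+ *
+ * // The raw pixel format is dictated by the sensor, inspect what the pipeline
+ * // handler selected instead of overriding it.
+ * const StreamConfiguration &raw = config->at(1);
+ * std::cout << "Raw stream: " << raw.toString() << std::endl;
+ *
+ * config->validate();
+ * camera->configure(config.get());
+ * \endcode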
+ *
+ * \subsection camera-use-case-raw-processing Viewfinder and Still Image Capture with Custom Processing
+ *
+ * In the \ref camera-use-case-viewfinder-still "viewfinder and still image capture"
+ * use case described previously, all the processing applied to the still image
+ * is performed by the device. To further increase the still image quality, it
+ * may be desirable to apply additional processing to the raw frame that is too
+ * complex for the hardware, or simply not supported by the camera. This
+ * includes, for instance, temporal noise reduction that combines multiple
+ * consecutive frames to reduce the average noise.
+ *
+ * Capturing the raw frame and processing it in the application is possible as
+ * explained in the \ref camera-use-case-raw "raw capture" use case. In that
+ * case, however, the application would be responsible for the complete
+ * processing of the raw frame to produce a still capture, severely increasing
+ * the application complexity. To avoid this, cameras can expose processing
+ * pipelines to applications, allowing them to capture raw frames, process them
+ * with custom algorithms, and send those pre-processed frames back to the
+ * camera's processing pipeline to apply all the regular camera processing. The
+ * pre-processing step is in that case fully implemented on the application
+ * side (usually based on custom software running on the main CPU, but nothing
+ * in libcamera would limit the application's ability to offload to a GPU or
+ * another processing engine), while harnessing the full power of the camera's
+ * hardware processing.
+ *
+ * In this use case, the camera operates with two pipelines, a capture pipeline
+ * and a processing pipeline. The capture pipeline contains one viewfinder
+ * output stream and one raw output stream. The processing pipeline contains one
+ * raw input stream and one still image capture output stream. The raw input
+ * stream is selected for its ability to consume raw frames in the same format
+ * as generated by the raw output stream.
+ *
+ * During viewfinder operation, the application uses the capture pipeline only,
+ * to capture viewfinder frames. When a still image capture is needed, the
+ * application additionally captures a raw frame from the capture pipeline,
+ * pre-processes it, and then submits the pre-processed frame to the processing
+ * pipeline to complete the still image capture operation. If the pre-processing
+ * requires more than one frame, the application may capture multiple raw
+ * frames, process them together into one pre-processed raw frame, and submit
+ * that frame back to the processing pipeline.
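+ *
+ * The processing pipeline API is not defined yet, it is part of the camera
+ * configuration rework that will follow this documentation. The sketch below
+ * is therefore purely hypothetical and only illustrates the intended
+ * application flow, with made-up helper names:
+ *
+ * \code{.cpp}
+ * // Hypothetical sketch: capturePipeline, processingPipeline and the helper
+ * // functions below do not exist in libcamera, they only name the steps.
+ *
+ * // 1. Capture a raw frame (and viewfinder frames) with the capture pipeline.
+ * FrameBuffer *raw = captureRawFrame(capturePipeline);
+ *
+ * // 2. Apply custom processing, on the CPU, a GPU or another engine.
+ * applyCustomProcessing(raw);
+ *
+ * // 3. Submit the pre-processed frame to the processing pipeline to produce
+ * //    the final still image with the regular camera processing.
+ * submitForProcessing(processingPipeline, raw);
+ * \endcode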
+ *
+ * \subsection camera-use-case-zsl Viewfinder and Zero Shutter Lag Still Image Capture
+ *
+ * When a user wants to capture a still image in a point-and-shoot camera
+ * application, various delays are involved at all stages of the process. On
+ * the device side, registering a button press or a tap on the screen, and
+ * processing the event, introduces a significant delay. Even if that delay
+ * were minimized, the delay between the scene event that needs to be
+ * captured and the user's action on the device is also significant. This often
+ * results in missed shots when trying to capture fast actions.
+ *
+ * To solve this issue, the application can use a technique that captures an
+ * image from the past, compensating for the system's delays with a "negative
+ * delay". The camera uses the same capture and processing pipelines as in the
+ * previous use case. The capture pipeline is operated differently, with raw
+ * frames being captured continuously to a small ring buffer of frames managed
+ * by the application. When the still image capture is requested, the
+ * application selects the appropriate raw frame from the ring buffer, based on
+ * an evaluation of the capture event delay, and submits it to the processing
+ * pipeline to generate a processed still image. This technique is referred to
+ * as zero shutter lag, or ZSL, due to the apparent removal of all delays.
+ *
+ * Zero shutter lag can be combined with application-side processing of raw
+ * frames, for instance using multiple raw frames from the ring buffer to
+ * perform temporal noise reduction, or using image analysis to pick the best
+ * raw frame from the ring buffer.
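+ *
+ * Selection of the raw frame only involves data the application already
+ * holds. A minimal, application-side sketch of the ring buffer is shown
+ * below, assuming completed raw buffers are stored with their timestamps and
+ * that requeue() is a hypothetical application helper that resubmits a buffer
+ * to the capture pipeline:
+ *
+ * \code{.cpp}
+ * // Keep the last few completed raw buffers, indexed by capture time.
+ * std::deque<std::pair<uint64_t, FrameBuffer *>> ringBuffer;
+ *
+ * void rawFrameCompleted(FrameBuffer *buffer)
+ * {
+ *         ringBuffer.emplace_back(buffer->metadata().timestamp, buffer);
+ *         if (ringBuffer.size() > 8) {
+ *                 // Recycle the oldest buffer to the capture pipeline.
+ *                 requeue(ringBuffer.front().second);
+ *                 ringBuffer.pop_front();
+ *         }
+ * }
+ *
+ * // On shutter press, pick the frame closest to the estimated scene time.
+ * FrameBuffer *selectFrame(uint64_t sceneTimestamp)
+ * {
+ *         FrameBuffer *best = nullptr;
+ *         uint64_t bestDelta = UINT64_MAX;
+ *
+ *         for (const auto &[ts, buf] : ringBuffer) {
+ *                 uint64_t delta = ts > sceneTimestamp ? ts - sceneTimestamp
+ *                                                      : sceneTimestamp - ts;
+ *                 if (delta < bestDelta) {
+ *                         bestDelta = delta;
+ *                         best = buf;
+ *                 }
+ *         }
+ *
+ *         return best;
+ * }
+ * \endcode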
  */
 
 namespace libcamera {
-- 
Regards,

Laurent Pinchart


