[PATCH 4/6] controls: ipa: rpi: Add CNN controls

Fri Dec 13 10:38:27 CET 2024

Add the following RPi vendor controls to handle Convolutional Neural
Network processing:

CnnOutputTensor
CnnOutputTensorInfo
CnnEnableInputTensor
CnnInputTensor
CnnInputTensorInfo
CnnKpiInfo

These controls will be used to support the new Raspberry Pi AI Camera,
using an IMX500 sensor with on-board neural network processing.

Signed-off-by: Naushir Patuck <naush at raspberrypi.com>
---
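Note for reviewers (not part of the commit): a minimal application-side
sketch of how the CnnOutputTensorInfo bytes might be decoded and used to
split the flat CnnOutputTensor float array. The struct layouts mirror the
ones added to controller.h by this patch. decodeOutputInfo(), splitTensors()
and TensorView are hypothetical helpers, not libcamera API. The sketch also
assumes the tensors are packed back to back in the order of info[], which
this patch does not state explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Mirrors of the structures added to controller.h by this patch.
constexpr unsigned int NetworkNameLen = 64;
constexpr unsigned int MaxNumTensors = 16;
constexpr unsigned int MaxNumDimensions = 16;

struct OutputTensorInfo {
	uint32_t tensorDataNum;
	uint32_t numDimensions;
	uint16_t size[MaxNumDimensions];
};

struct CnnOutputTensorInfo {
	char networkName[NetworkNameLen];
	uint32_t numTensors;
	OutputTensorInfo info[MaxNumTensors];
};

// Hypothetical helper type: one tensor located inside the flat float array.
struct TensorView {
	const float *data;
	std::size_t count;
};

// Hypothetical helper: copy the raw control bytes into the structure.
// memcpy avoids alignment and aliasing problems of a direct pointer cast.
std::optional<CnnOutputTensorInfo> decodeOutputInfo(const uint8_t *bytes,
						    std::size_t len)
{
	if (bytes == nullptr || len < sizeof(CnnOutputTensorInfo))
		return std::nullopt;

	CnnOutputTensorInfo info;
	std::memcpy(&info, bytes, sizeof(info));

	// Reject counts that would index past the fixed-size info[] array.
	if (info.numTensors > MaxNumTensors)
		return std::nullopt;

	return info;
}

// Hypothetical helper: split the flat float array into per-tensor views.
// ASSUMPTION: tensors are stored contiguously in the order of info[].
// Returns an empty vector if the array is shorter than the info claims.
std::vector<TensorView> splitTensors(const CnnOutputTensorInfo &info,
				     const float *data, std::size_t len)
{
	assert(data != nullptr || len == 0);

	std::vector<TensorView> views;
	std::size_t offset = 0;

	for (uint32_t i = 0; i < info.numTensors; i++) {
		std::size_t count = info.info[i].tensorDataNum;
		if (count > len - offset)
			return {};
		views.push_back({ data + offset, count });
		offset += count;
	}

	return views;
}
```

In an application, the byte and float arrays would come from the request
metadata for the two controls. Copying into the structure rather than
casting the pointer keeps the code independent of the buffer's alignment.
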
 src/ipa/rpi/controller/controller.h |  33 +++++++++
 src/libcamera/control_ids_rpi.yaml  | 108 ++++++++++++++++++++++++++++
 2 files changed, 141 insertions(+)
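
Note for reviewers (not part of the commit): a small sketch of how an
application might decode the CnnInputTensorInfo bytes and sanity-check the
length of the CnnInputTensor array against them. The struct mirrors the one
added by this patch. decodeInputInfo(), networkNameOf() and
expectedInputBytes() are hypothetical helpers, not libcamera API. The
expected size assumes one uint8_t per channel per pixel, which is an
assumption, since the control description says the format depends on the
network in use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Mirror of the structure added to controller.h by this patch.
constexpr unsigned int NetworkNameLen = 64;

struct CnnInputTensorInfo {
	char networkName[NetworkNameLen];
	uint32_t width;
	uint32_t height;
	uint32_t numChannels;
};

// Hypothetical helper: copy the raw control bytes into the structure.
bool decodeInputInfo(const uint8_t *bytes, std::size_t len,
		     CnnInputTensorInfo &out)
{
	if (bytes == nullptr || len < sizeof(out))
		return false;

	std::memcpy(&out, bytes, sizeof(out));
	return true;
}

// Hypothetical helper: read the name without assuming NUL termination,
// stopping at the first NUL or at the end of the fixed-size array.
std::string networkNameOf(const CnnInputTensorInfo &info)
{
	const void *nul = std::memchr(info.networkName, '\0', NetworkNameLen);
	std::size_t n = nul ? static_cast<std::size_t>(
				      static_cast<const char *>(nul) -
				      info.networkName)
			    : NetworkNameLen;
	return std::string(info.networkName, n);
}

// Hypothetical helper. ASSUMPTION: one byte per channel per pixel.
uint64_t expectedInputBytes(const CnnInputTensorInfo &info)
{
	return static_cast<uint64_t>(info.width) * info.height *
	       info.numChannels;
}
```

Comparing expectedInputBytes() with the actual length of the CnnInputTensor
array is a cheap way to catch a mismatch before indexing into the pixels.
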

diff --git a/src/ipa/rpi/controller/controller.h b/src/ipa/rpi/controller/controller.h
index 64f93f414524..489188b44d9b 100644
--- a/src/ipa/rpi/controller/controller.h
+++ b/src/ipa/rpi/controller/controller.h
@@ -25,6 +25,39 @@
 
 namespace RPiController {
 
+/*
+ * The following structures are used to export the CNN input/output tensor information
+ * through the rpi::CnnOutputTensorInfo and rpi::CnnInputTensorInfo controls.
+ * Applications must interpret the byte span as exactly these structures.
+ */
+static constexpr unsigned int NetworkNameLen = 64;
+static constexpr unsigned int MaxNumTensors = 16;
+static constexpr unsigned int MaxNumDimensions = 16;
+
+struct OutputTensorInfo {
+	uint32_t tensorDataNum;
+	uint32_t numDimensions;
+	uint16_t size[MaxNumDimensions];
+};
+
+struct CnnOutputTensorInfo {
+	char networkName[NetworkNameLen];
+	uint32_t numTensors;
+	OutputTensorInfo info[MaxNumTensors];
+};
+
+struct CnnInputTensorInfo {
+	char networkName[NetworkNameLen];
+	uint32_t width;
+	uint32_t height;
+	uint32_t numChannels;
+};
+
+struct CnnKpiInfo {
+	uint32_t dnnRuntime;
+	uint32_t dspRuntime;
+};
+
 class Algorithm;
 typedef std::unique_ptr<Algorithm> AlgorithmPtr;
 
diff --git a/src/libcamera/control_ids_rpi.yaml b/src/libcamera/control_ids_rpi.yaml
index 34bbdfc863c5..c0b5f63df525 100644
--- a/src/libcamera/control_ids_rpi.yaml
+++ b/src/libcamera/control_ids_rpi.yaml
@@ -55,4 +55,112 @@ controls:
         official libcamera API support for per-stream controls in the future.
 
         \sa ScalerCrop
+
+  - CnnOutputTensor:
+      type: float
+      size: [n]
+      description: |
+        This control returns a span of floating point values that represent the
+        output tensors from a Convolutional Neural Network (CNN). The size and
+        format of this array of values is entirely dependent on the neural
+        network used, and further post-processing may need to be performed at
+        the application level to generate the final desired output. This control
+        is agnostic of the hardware or software used to generate the output
+        tensors.
+
+        The structure of the span is described by the CnnOutputTensorInfo
+        control.
+
+        \sa CnnOutputTensorInfo
+
+  - CnnOutputTensorInfo:
+      type: uint8_t
+      size: [n]
+      description: |
+        This control returns the structure of the CnnOutputTensor. This structure
+        takes the following form:
+
+        constexpr unsigned int NetworkNameLen = 64;
+        constexpr unsigned int MaxNumTensors = 16;
+        constexpr unsigned int MaxNumDimensions = 16;
+
+        struct CnnOutputTensorInfo {
+          char networkName[NetworkNameLen];
+          uint32_t numTensors;
+          OutputTensorInfo info[MaxNumTensors];
+        };
+
+        with
+
+        struct OutputTensorInfo {
+          uint32_t tensorDataNum;
+          uint32_t numDimensions;
+          uint16_t size[MaxNumDimensions];
+        };
+
+        networkName is the name of the CNN used,
+        numTensors is the number of output tensors returned,
+        tensorDataNum gives the number of elements in each output tensor,
+        numDimensions gives the dimensionality of each output tensor,
+        size gives the size of each dimension in each output tensor.
+
+        \sa CnnOutputTensor
+
+  - CnnEnableInputTensor:
+      type: bool
+      description: |
+        Boolean to control if the IPA returns the input tensor used by the CNN
+        to generate the output tensors via the CnnInputTensor control. Because
+        the input tensor may be relatively large, for efficiency reasons avoid
+        enabling input tensor output unless required for debugging purposes.
+
+        \sa CnnInputTensor
+
+  - CnnInputTensor:
+      type: uint8_t
+      size: [n]
+      description: |
+        This control returns a span of uint8_t pixel values that represent the
+        input tensor for a Convolutional Neural Network (CNN). The size and
+        format of this array of values is entirely dependent on the neural
+        network used, and further post-processing (e.g. pixel normalisations) may
+        need to be performed at the application level to generate the final input
+        image.
+
+        The structure of the span is described by the CnnInputTensorInfo
+        control.
+
+        \sa CnnInputTensorInfo
+
+  - CnnInputTensorInfo:
+      type: uint8_t
+      size: [n]
+      description: |
+        This control returns the structure of the CnnInputTensor. This structure
+        takes the following form:
+
+        constexpr unsigned int NetworkNameLen = 64;
+
+        struct CnnInputTensorInfo {
+          char networkName[NetworkNameLen];
+          uint32_t width;
+          uint32_t height;
+          uint32_t numChannels;
+        };
+
+        where
+
+        networkName is the name of the CNN used,
+        width and height are the input tensor image width and height in pixels,
+        numChannels is the number of channels in the input tensor image.
+
+        \sa CnnInputTensor
+
+  - CnnKpiInfo:
+      type: int32_t
+      size: [2]
+      description: |
+        This control returns performance metrics for the CNN processing stage.
+        Two values are returned in this span: the runtime of the CNN/DNN stage
+        followed by the runtime of the DSP stage, both in milliseconds.
 ...
-- 
2.43.0