[PATCH 4/6] controls: ipa: rpi: Add CNN controls

Fri Dec 13 11:24:05 CET 2024

Hi Naush

Thanks for the patch.

On Fri, 13 Dec 2024 at 09:46, Naushir Patuck <naush at raspberrypi.com> wrote:
>
> Add the follwing RPi vendor controls to handle Convolutional Neural
> Network processing:
>
> CnnOutputTensor
> CnnOutputTensorInfo
> CnnEnableInputTensor
> CnnInputTensor
> CnnInputTensorInfo
> CnnKpiInfo
>
> These controls will be used to support the new Raspberry Pi AI Camera,
> using an IMX500 sensor with on-board neural network processing.
>
> Signed-off-by: Naushir Patuck <naush at raspberrypi.com>
> ---
>  src/ipa/rpi/controller/controller.h |  33 +++++++++
>  src/libcamera/control_ids_rpi.yaml  | 108 ++++++++++++++++++++++++++++
>  2 files changed, 141 insertions(+)
>
> diff --git a/src/ipa/rpi/controller/controller.h b/src/ipa/rpi/controller/controller.h
> index 64f93f414524..489188b44d9b 100644
> --- a/src/ipa/rpi/controller/controller.h
> +++ b/src/ipa/rpi/controller/controller.h
> @@ -25,6 +25,39 @@
>
>  namespace RPiController {
>
> +/*
> + * The following structures are used to export the CNN input/output tensor information
> + * through the rpi::CnnOutputTensorInfo and rpi::CnnInputTensorInfo controls.
> + * Applications must cast the span to these structures exactly.
> + */
> +static constexpr unsigned int NetworkNameLen = 64;
> +static constexpr unsigned int MaxNumTensors = 16;
> +static constexpr unsigned int MaxNumDimensions = 16;

Did I see elsewhere that we like to put "k" on the front of constants
these days? Not that I'm bothered either way...

> +
> +struct OutputTensorInfo {
> +       uint32_t tensorDataNum;
> +       uint32_t numDimensions;
> +       uint16_t size[MaxNumDimensions];
> +};
> +
> +struct CnnOutputTensorInfo {
> +       char networkName[NetworkNameLen];
> +       uint32_t numTensors;
> +       OutputTensorInfo info[MaxNumTensors];
> +};
> +
> +struct CnnInputTensorInfo {
> +       char networkName[NetworkNameLen];
> +       uint32_t width;
> +       uint32_t height;
> +       uint32_t numChannels;
> +};
> +
> +struct CnnKpiInfo {
> +       uint32_t dnnRuntime;
> +       uint32_t dspRuntime;
> +};
> +

I wondered momentarily whether these should be in a separate header
file, but honestly, there are so few I'm happy not to bother with it!

>  class Algorithm;
>  typedef std::unique_ptr<Algorithm> AlgorithmPtr;
>
> diff --git a/src/libcamera/control_ids_rpi.yaml b/src/libcamera/control_ids_rpi.yaml
> index 34bbdfc863c5..c0b5f63df525 100644
> --- a/src/libcamera/control_ids_rpi.yaml
> +++ b/src/libcamera/control_ids_rpi.yaml
> @@ -55,4 +55,112 @@ controls:
>          official libcamera API support for per-stream controls in the future.
>
>          \sa ScalerCrop
> +
> +  - CnnOutputTensor:
> +      type: float
> +      size: [n]
> +      description: |
> +        This control returns a span of floating point values that represent the
> +        output tensors from a Convolutional Neural Network (CNN). The size and
> +        format of this array of values is entirely dependent on the neural
> +        network used, and further post-processing may need to be performed at
> +        the application level to generate the final desired output. This control
> +        is agnostic of the hardware or software used to generate the output
> +        tensors.
> +
> +        The structure of the span is described by the CnnOutputTensorInfo
> +        control.
> +
> +        \sa CnnOutputTensorInfo
> +
> +  - CnnOutputTensorInfo:
> +      type: uint8_t
> +      size: [n]
> +      description: |
> +        This control returns the structure of the CnnOutputTensor. This structure
> +        takes the following form:
> +
> +        constexpr unsigned int NetworkNameLen = 64;
> +        constexpr unsigned int MaxNumTensors = 16;
> +        constexpr unsigned int MaxNumDimensions = 16;
> +
> +        struct CnnOutputTensorInfo {
> +          char networkName[NetworkNameLen];
> +          uint32_t numTensors;
> +          OutputTensorInfo info[MaxNumTensors];
> +        };
> +
> +        with
> +
> +        struct OutputTensorInfo {
> +          uint32_t tensorDataNum;
> +          uint32_t numDimensions;
> +          uint16_t size[MaxNumDimensions];
> +        };
> +
> +        networkName is the name of the CNN used,
> +        numTensors is the number of output tensors returned,
> +        tensorDataNum gives the number of elements in each output tensor,
> +        numDimensions gives the dimensionality of each output tensor,
> +        size gives the size of each dimension in each output tensor.
> +
> +        \sa CnnOutputTensor
> +
> +  - CnnEnableInputTensor:
> +      type: bool
> +      description: |
> +        Boolean to control if the IPA returns the input tensor used by the CNN
> +        to generate the output tensors via the CnnInputTensor control. Because
> +        the input tensor may be relatively large, for efficiency reason avoid

s/reason/reasons/

> +        enabling input tensor output unless required for debugging purposes.

Actually I found some of this just a bit tricky to parse and
understand. Is "via the CnnInputTensor control" maybe superfluous?
Also "input tensor output" took me a moment. Maybe "output of the
input tensor" is easier?

> +
> +        \sa CnnInputTensor
> +
> +  - CnnInputTensor:
> +       type: uint8_t
> +       size: [n]
> +       description: |
> +        This control returns a span of uint8_t pixel values that represent the
> +        input tensor for a Convolutional Neural Network (CNN). The size and
> +        format of this array of values is entirely dependent on the neural
> +        network used, and further post-processing (e.g. pixel normalisations) may
> +        need to be performed at the application level to generate the final input
> +        image.
> +
> +        The structure of the span is described by the CnnInputTensorInfo
> +        control.
> +
> +        \sa CnnInputTensorInfo
> +
> +  - CnnInputTensorInfo:
> +      type: uint8_t
> +      size: [n]
> +      description: |
> +        This control returns the structure of the CnnInputTensor. This structure
> +        takes the following form:
> +
> +        constexpr unsigned int NetworkNameLen = 64;
> +
> +        struct CnnInputTensorInfo {
> +          char networkName[NetworkNameLen];
> +          uint32_t width;
> +          uint32_t height;
> +          uint32_t numChannels;
> +        };
> +
> +        where
> +
> +        networkName is the name of the CNN used,
> +        width and height are the input tensor image width and height in pixels,
> +        numChannels is the number of channels in the input tensor image.
> +
> +        \sa CnnInputTensor
> +
> +  - CnnKpiInfo:
> +      type: int32_t
> +      size: [2]
> +      description: |
> +        This control returns performance metrics for the CNN processing stage.
> +        Two values are returned in this span, the runtime of the CNN/DNN stage

No particular issue, just wondering why we sometimes have CNN and
sometimes DNN. Are they the same, really (we define CNN, but did we
ever define DNN)? Should we standardise?

Minor edits aside:

Reviewed-by: David Plowman <david.plowman at raspberrypi.com>

Thanks!
David

> +        and the DSP stage in milliseconds.
>  ...
> --
> 2.43.0
>