[PATCH 4/6] controls: ipa: rpi: Add CNN controls

Wed Dec 18 00:00:05 CET 2024

Hi Naush,

On Mon, Dec 16, 2024 at 10:11:28AM +0000, Naushir Patuck wrote:
> On Sun, 15 Dec 2024 at 16:38, Laurent Pinchart wrote:
> > On Fri, Dec 13, 2024 at 09:38:27AM +0000, Naushir Patuck wrote:
> > > Add the following RPi vendor controls to handle Convolutional Neural
> > > Network processing:
> > >
> > > CnnOutputTensor
> > > CnnOutputTensorInfo
> > > CnnEnableInputTensor
> > > CnnInputTensor
> > > CnnInputTensorInfo
> > > CnnKpiInfo
> > >
> > > These controls will be used to support the new Raspberry Pi AI Camera,
> > > using an IMX500 sensor with on-board neural network processing.
> >
> > I think those controls should be reviewed in the context of the IMX500
> > kernel driver. That would also help with the libcamera policy that
> > drivers need to be on their way to mainline. When do you plan to post it
> > for review on the linux-media mailing list ?
> 
> The intention of these controls was to avoid tying them to the IMX500
> specifically and be generic.  Of course the only user of these
> currently would be the imx500, but there is no reliance on e.g. the
> IMX500 camera helper.
> 
> With regards to upstreaming, as soon as we have completed the streams
> API, I'll be posting the imx500, imx477 and imx708 drivers to
> linux-media.  However I have to be realistic with everyone, the IMX500
> driver with neural network functionality has close to zero chance of
> being accepted upstream.  We rely on closed firmware blobs to drive
> the DSP, the models are also closed blobs that are made with closed
> source (but freely available) software, and the output stream
> structure cannot be documented as it is network dependent.

Close to zero is small, but I wouldn't entirely rule it out. Maybe not
right now, but let's see how the situation will evolve.

> So as not to waste everyone's time, I'll only be posting the imaging
> part of the imx500 driver for upstream. I can understand if this means
> you don't want to merge this patch upstream.  Let me know if you want
> this patch removed, and we can get the other patches in this series
> merged.

For the time being that would be preferable. I'm sorry about that.

> > > Signed-off-by: Naushir Patuck <naush at raspberrypi.com>
> > > ---
> > >  src/ipa/rpi/controller/controller.h |  33 +++++++++
> > >  src/libcamera/control_ids_rpi.yaml  | 108 ++++++++++++++++++++++++++++
> > >  2 files changed, 141 insertions(+)
> > >
> > > diff --git a/src/ipa/rpi/controller/controller.h b/src/ipa/rpi/controller/controller.h
> > > index 64f93f414524..489188b44d9b 100644
> > > --- a/src/ipa/rpi/controller/controller.h
> > > +++ b/src/ipa/rpi/controller/controller.h
> > > @@ -25,6 +25,39 @@
> > >
> > >  namespace RPiController {
> > >
> > > +/*
> > > + * The following structures are used to export the CNN input/output tensor information
> > > + * through the rpi::CnnOutputTensorInfo and rpi::CnnInputTensorInfo controls.
> > > + * Applications must cast the span to these structures exactly.
> > > + */
> > > +static constexpr unsigned int NetworkNameLen = 64;
> > > +static constexpr unsigned int MaxNumTensors = 16;
> > > +static constexpr unsigned int MaxNumDimensions = 16;
> > > +
> > > +struct OutputTensorInfo {
> > > +     uint32_t tensorDataNum;
> > > +     uint32_t numDimensions;
> > > +     uint16_t size[MaxNumDimensions];
> > > +};
> > > +
> > > +struct CnnOutputTensorInfo {
> > > +     char networkName[NetworkNameLen];
> > > +     uint32_t numTensors;
> > > +     OutputTensorInfo info[MaxNumTensors];
> > > +};
> > > +
> > > +struct CnnInputTensorInfo {
> > > +     char networkName[NetworkNameLen];
> > > +     uint32_t width;
> > > +     uint32_t height;
> > > +     uint32_t numChannels;
> > > +};
> > > +
> > > +struct CnnKpiInfo {
> > > +     uint32_t dnnRuntime;
> > > +     uint32_t dspRuntime;
> > > +};
> > > +
> > >  class Algorithm;
> > >  typedef std::unique_ptr<Algorithm> AlgorithmPtr;
> > >
> > > diff --git a/src/libcamera/control_ids_rpi.yaml b/src/libcamera/control_ids_rpi.yaml
> > > index 34bbdfc863c5..c0b5f63df525 100644
> > > --- a/src/libcamera/control_ids_rpi.yaml
> > > +++ b/src/libcamera/control_ids_rpi.yaml
> > > @@ -55,4 +55,112 @@ controls:
> > >          official libcamera API support for per-stream controls in the future.
> > >
> > >          \sa ScalerCrop
> > > +
> > > +  - CnnOutputTensor:
> > > +      type: float
> > > +      size: [n]
> > > +      description: |
> > > +        This control returns a span of floating point values that represent the
> > > +        output tensors from a Convolutional Neural Network (CNN). The size and
> > > +        format of this array of values are entirely dependent on the neural
> > > +        network used, and further post-processing may need to be performed at
> > > +        the application level to generate the final desired output. This control
> > > +        is agnostic of the hardware or software used to generate the output
> > > +        tensors.
> > > +
> > > +        The structure of the span is described by the CnnOutputTensorInfo
> > > +        control.
> > > +
> > > +        \sa CnnOutputTensorInfo
> > > +
> > > +  - CnnOutputTensorInfo:
> > > +      type: uint8_t
> > > +      size: [n]
> > > +      description: |
> > > +        This control returns the structure of the CnnOutputTensor. This structure
> > > +        takes the following form:
> > > +
> > > +        constexpr unsigned int NetworkNameLen = 64;
> > > +        constexpr unsigned int MaxNumTensors = 16;
> > > +        constexpr unsigned int MaxNumDimensions = 16;
> > > +
> > > +        struct CnnOutputTensorInfo {
> > > +          char networkName[NetworkNameLen];
> > > +          uint32_t numTensors;
> > > +          OutputTensorInfo info[MaxNumTensors];
> > > +        };
> > > +
> > > +        with
> > > +
> > > +        struct OutputTensorInfo {
> > > +          uint32_t tensorDataNum;
> > > +          uint32_t numDimensions;
> > > +          uint16_t size[MaxNumDimensions];
> > > +        };
> > > +
> > > +        networkName is the name of the CNN used,
> > > +        numTensors is the number of output tensors returned,
> > > +        tensorDataNum gives the number of elements in each output tensor,
> > > +        numDimensions gives the dimensionality of each output tensor,
> > > +        size gives the size of each dimension in each output tensor.
> > > +
> > > +        \sa CnnOutputTensor
> > > +
> > > +  - CnnEnableInputTensor:
> > > +      type: bool
> > > +      description: |
> > > +        Boolean to control if the IPA returns the input tensor used by the CNN
> > > +        to generate the output tensors via the CnnInputTensor control. Because
> > > +        the input tensor may be relatively large, for efficiency reasons avoid
> > > +        enabling input tensor output unless required for debugging purposes.
> > > +
> > > +        \sa CnnInputTensor
> > > +
> > > +  - CnnInputTensor:
> > > +      type: uint8_t
> > > +      size: [n]
> > > +      description: |
> > > +        This control returns a span of uint8_t pixel values that represent the
> > > +        input tensor for a Convolutional Neural Network (CNN). The size and
> > > +        format of this array of values are entirely dependent on the neural
> > > +        network used, and further post-processing (e.g. pixel normalisations) may
> > > +        need to be performed at the application level to generate the final input
> > > +        image.
> > > +
> > > +        The structure of the span is described by the CnnInputTensorInfo
> > > +        control.
> > > +
> > > +        \sa CnnInputTensorInfo
> > > +
> > > +  - CnnInputTensorInfo:
> > > +      type: uint8_t
> > > +      size: [n]
> > > +      description: |
> > > +        This control returns the structure of the CnnInputTensor. This structure
> > > +        takes the following form:
> > > +
> > > +        constexpr unsigned int NetworkNameLen = 64;
> > > +
> > > +        struct CnnInputTensorInfo {
> > > +          char networkName[NetworkNameLen];
> > > +          uint32_t width;
> > > +          uint32_t height;
> > > +          uint32_t numChannels;
> > > +        };
> > > +
> > > +        where
> > > +
> > > +        networkName is the name of the CNN used,
> > > +        width and height are the input tensor image width and height in pixels,
> > > +        numChannels is the number of channels in the input tensor image.
> > > +
> > > +        \sa CnnInputTensor
> > > +
> > > +  - CnnKpiInfo:
> > > +      type: int32_t
> > > +      size: [2]
> > > +      description: |
> > > +        This control returns performance metrics for the CNN processing stage.
> > > +        Two values are returned in this span: the runtimes of the CNN/DNN stage
> > > +        and the DSP stage, in milliseconds.
> > >  ...
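
For reference, here is a minimal sketch of how an application might
interpret the CnnOutputTensorInfo bytes and locate each tensor within the
CnnOutputTensor float span. The struct layouts are copied from the patch
above. The helper names parseOutputTensorInfo() and tensorOffsets() are
hypothetical, and the sketch assumes (the patch does not state this) that
the output tensors are packed back to back in index order. It copies the
bytes with memcpy instead of casting the span, to avoid relying on the
alignment of the control's storage.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

/* Layout copied from the controller.h hunk in the quoted patch. */
constexpr unsigned int NetworkNameLen = 64;
constexpr unsigned int MaxNumTensors = 16;
constexpr unsigned int MaxNumDimensions = 16;

struct OutputTensorInfo {
	uint32_t tensorDataNum;
	uint32_t numDimensions;
	uint16_t size[MaxNumDimensions];
};

struct CnnOutputTensorInfo {
	char networkName[NetworkNameLen];
	uint32_t numTensors;
	OutputTensorInfo info[MaxNumTensors];
};

/*
 * Hypothetical helper: validate the byte count of the uint8_t control
 * value and copy it into the structure. Returns std::nullopt if the
 * length or the tensor count is out of range.
 */
std::optional<CnnOutputTensorInfo>
parseOutputTensorInfo(const uint8_t *data, std::size_t len)
{
	if (len != sizeof(CnnOutputTensorInfo))
		return std::nullopt;

	CnnOutputTensorInfo info;
	std::memcpy(&info, data, sizeof(info));

	if (info.numTensors > MaxNumTensors)
		return std::nullopt;

	return info;
}

/*
 * Hypothetical helper: compute the starting index of each tensor in the
 * float span, ASSUMING tensors are stored contiguously in index order.
 * Returns an empty vector if the element counts do not add up to the
 * length of the float span.
 */
std::vector<std::size_t>
tensorOffsets(const CnnOutputTensorInfo &info, std::size_t numFloats)
{
	std::vector<std::size_t> offsets;
	std::size_t offset = 0;

	for (uint32_t i = 0; i < info.numTensors; i++) {
		offsets.push_back(offset);
		offset += info.info[i].tensorDataNum;
	}

	if (offset != numFloats)
		offsets.clear();

	return offsets;
}
```

An application would feed the data and size of the CnnOutputTensorInfo
span to parseOutputTensorInfo(), then pass the size of the CnnOutputTensor
span to tensorOffsets() to check that the two controls agree before
indexing into the float data.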

-- 
Regards,

Laurent Pinchart