[PATCH 4/6] controls: ipa: rpi: Add CNN controls
Laurent Pinchart
laurent.pinchart at ideasonboard.com
Wed Dec 18 00:00:05 CET 2024

Hi Naush,

On Mon, Dec 16, 2024 at 10:11:28AM +0000, Naushir Patuck wrote:
> On Sun, 15 Dec 2024 at 16:38, Laurent Pinchart wrote:
> > On Fri, Dec 13, 2024 at 09:38:27AM +0000, Naushir Patuck wrote:
> > > Add the following RPi vendor controls to handle Convolutional Neural
> > > Network processing:
> > >
> > > CnnOutputTensor
> > > CnnOutputTensorInfo
> > > CnnEnableInputTensor
> > > CnnInputTensor
> > > CnnInputTensorInfo
> > > CnnKpiInfo
> > >
> > > These controls will be used to support the new Raspberry Pi AI Camera,
> > > using an IMX500 sensor with on-board neural network processing.
> >
> > I think those controls should be reviewed in the context of the IMX500
> > kernel driver. That would also help with the libcamera policy that
> > drivers need to be on their way to mainline. When do you plan to post it
> > for review on the linux-media mailing list ?
>
> The intention of these controls was to avoid tying them to the IMX500
> specifically and be generic. Of course the only user of these
> currently would be the imx500, but there is no reliance on e.g. the
> IMX500 camera helper.
>
> With regard to upstreaming, as soon as we have completed the streams
> API, I'll be posting the imx500, imx477 and imx708 drivers to
> linux-media. However, I have to be realistic with everyone: the IMX500
> driver with neural network functionality has close to zero chance of
> being accepted upstream. We rely on closed firmware blobs to drive
> the DSP, the models are also closed blobs that are made with closed
> source (but freely available) software, and the output stream
> structure cannot be documented as it is network dependent.

Close to zero is small, but I wouldn't entirely rule it out. Maybe not
right now, but let's see how the situation will evolve.

> So as not to waste everyone's time, I'll only be posting the imaging
> part of the imx500 driver for upstream. I can understand if this means
> you don't want to merge this patch upstream. Let me know if you want
> this patch removed, and we can get the other patches in this series
> merged.

For the time being that would be preferable. I'm sorry about that.

> > > Signed-off-by: Naushir Patuck <naush at raspberrypi.com>
> > > ---
> > > src/ipa/rpi/controller/controller.h | 33 +++++++++
> > > src/libcamera/control_ids_rpi.yaml | 108 ++++++++++++++++++++++++++++
> > > 2 files changed, 141 insertions(+)
> > >
> > > diff --git a/src/ipa/rpi/controller/controller.h b/src/ipa/rpi/controller/controller.h
> > > index 64f93f414524..489188b44d9b 100644
> > > --- a/src/ipa/rpi/controller/controller.h
> > > +++ b/src/ipa/rpi/controller/controller.h
> > > @@ -25,6 +25,39 @@
> > >
> > > namespace RPiController {
> > >
> > > +/*
> > > + * The following structures are used to export the CNN input/output tensor information
> > > + * through the rpi::CnnOutputTensorInfo and rpi::CnnInputTensorInfo controls.
> > > + * Applications must cast the span to these structures exactly.
> > > + */
> > > +static constexpr unsigned int NetworkNameLen = 64;
> > > +static constexpr unsigned int MaxNumTensors = 16;
> > > +static constexpr unsigned int MaxNumDimensions = 16;
> > > +
> > > +struct OutputTensorInfo {
> > > + uint32_t tensorDataNum;
> > > + uint32_t numDimensions;
> > > + uint16_t size[MaxNumDimensions];
> > > +};
> > > +
> > > +struct CnnOutputTensorInfo {
> > > + char networkName[NetworkNameLen];
> > > + uint32_t numTensors;
> > > + OutputTensorInfo info[MaxNumTensors];
> > > +};
> > > +
> > > +struct CnnInputTensorInfo {
> > > + char networkName[NetworkNameLen];
> > > + uint32_t width;
> > > + uint32_t height;
> > > + uint32_t numChannels;
> > > +};
> > > +
> > > +struct CnnKpiInfo {
> > > + uint32_t dnnRuntime;
> > > + uint32_t dspRuntime;
> > > +};
> > > +
> > > class Algorithm;
> > > typedef std::unique_ptr<Algorithm> AlgorithmPtr;
> > >
> > > diff --git a/src/libcamera/control_ids_rpi.yaml b/src/libcamera/control_ids_rpi.yaml
> > > index 34bbdfc863c5..c0b5f63df525 100644
> > > --- a/src/libcamera/control_ids_rpi.yaml
> > > +++ b/src/libcamera/control_ids_rpi.yaml
> > > @@ -55,4 +55,112 @@ controls:
> > > official libcamera API support for per-stream controls in the future.
> > >
> > > \sa ScalerCrop
> > > +
> > > + - CnnOutputTensor:
> > > + type: float
> > > + size: [n]
> > > + description: |
> > > + This control returns a span of floating point values that represent the
> > > + output tensors from a Convolutional Neural Network (CNN). The size and
> > > + format of this array of values is entirely dependent on the neural
> > > + network used, and further post-processing may need to be performed at
> > > + the application level to generate the final desired output. This control
> > > + is agnostic of the hardware or software used to generate the output
> > > + tensors.
> > > +
> > > + The structure of the span is described by the CnnOutputTensorInfo
> > > + control.
> > > +
> > > + \sa CnnOutputTensorInfo
> > > +
> > > + - CnnOutputTensorInfo:
> > > + type: uint8_t
> > > + size: [n]
> > > + description: |
> > > + This control returns the structure of the CnnOutputTensor. This structure
> > > + takes the following form:
> > > +
> > > + constexpr unsigned int NetworkNameLen = 64;
> > > + constexpr unsigned int MaxNumTensors = 16;
> > > + constexpr unsigned int MaxNumDimensions = 16;
> > > +
> > > + struct CnnOutputTensorInfo {
> > > + char networkName[NetworkNameLen];
> > > + uint32_t numTensors;
> > > + OutputTensorInfo info[MaxNumTensors];
> > > + };
> > > +
> > > + with
> > > +
> > > + struct OutputTensorInfo {
> > > + uint32_t tensorDataNum;
> > > + uint32_t numDimensions;
> > > + uint16_t size[MaxNumDimensions];
> > > + };
> > > +
> > > + networkName is the name of the CNN used,
> > > + numTensors is the number of output tensors returned,
> > > + tensorDataNum gives the number of elements in each output tensor,
> > > + numDimensions gives the dimensionality of each output tensor,
> > > + size gives the size of each dimension in each output tensor.
> > > +
> > > + \sa CnnOutputTensor
> > > +
> > > + - CnnEnableInputTensor:
> > > + type: bool
> > > + description: |
> > > + Boolean to control if the IPA returns the input tensor used by the CNN
> > > + to generate the output tensors via the CnnInputTensor control. Because
> > > + the input tensor may be relatively large, for efficiency reasons avoid
> > > + enabling input tensor output unless required for debugging purposes.
> > > +
> > > + \sa CnnInputTensor
> > > +
> > > + - CnnInputTensor:
> > > + type: uint8_t
> > > + size: [n]
> > > + description: |
> > > + This control returns a span of uint8_t pixel values that represent the
> > > + input tensor for a Convolutional Neural Network (CNN). The size and
> > > + format of this array of values is entirely dependent on the neural
> > > + network used, and further post-processing (e.g. pixel normalisations) may
> > > + need to be performed at the application level to generate the final input
> > > + image.
> > > +
> > > + The structure of the span is described by the CnnInputTensorInfo
> > > + control.
> > > +
> > > + \sa CnnInputTensorInfo
> > > +
> > > + - CnnInputTensorInfo:
> > > + type: uint8_t
> > > + size: [n]
> > > + description: |
> > > + This control returns the structure of the CnnInputTensor. This structure
> > > + takes the following form:
> > > +
> > > + constexpr unsigned int NetworkNameLen = 64;
> > > +
> > > + struct CnnInputTensorInfo {
> > > + char networkName[NetworkNameLen];
> > > + uint32_t width;
> > > + uint32_t height;
> > > + uint32_t numChannels;
> > > + };
> > > +
> > > + where
> > > +
> > > + networkName is the name of the CNN used,
> > > + width and height are the input tensor image width and height in pixels,
> > > + numChannels is the number of channels in the input tensor image.
> > > +
> > > + \sa CnnInputTensor
> > > +
> > > + - CnnKpiInfo:
> > > + type: int32_t
> > > + size: [2]
> > > + description: |
> > > + This control returns performance metrics for the CNN processing stage.
> > > + Two values are returned in this span, the runtime of the CNN/DNN stage
> > > + and the DSP stage in milliseconds.
> > > ...
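
For reference, here is a minimal sketch of how an application might consume
the CnnOutputTensorInfo and CnnOutputTensor spans described in the patch. The
struct layouts are copied from the patch. The helper names
(parseOutputTensorInfo, splitOutputTensors, TensorView) are hypothetical and
not part of libcamera. The sketch copies the bytes with std::memcpy rather
than casting the span pointer as the patch comment suggests, which avoids
alignment and aliasing concerns. It also assumes the tensors are packed back
to back, in order, in the float span, which the patch does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

/* Layouts copied from the patch (src/ipa/rpi/controller/controller.h). */
constexpr unsigned int NetworkNameLen = 64;
constexpr unsigned int MaxNumTensors = 16;
constexpr unsigned int MaxNumDimensions = 16;

struct OutputTensorInfo {
	uint32_t tensorDataNum;
	uint32_t numDimensions;
	uint16_t size[MaxNumDimensions];
};

struct CnnOutputTensorInfo {
	char networkName[NetworkNameLen];
	uint32_t numTensors;
	OutputTensorInfo info[MaxNumTensors];
};

/*
 * Hypothetical helper: decode the uint8_t span of CnnOutputTensorInfo.
 * Returns std::nullopt if the length or the counts are out of range.
 */
std::optional<CnnOutputTensorInfo>
parseOutputTensorInfo(const uint8_t *data, std::size_t len)
{
	if (!data || len != sizeof(CnnOutputTensorInfo))
		return std::nullopt;

	CnnOutputTensorInfo info;
	std::memcpy(&info, data, sizeof(info));

	if (info.numTensors > MaxNumTensors)
		return std::nullopt;
	for (uint32_t i = 0; i < info.numTensors; i++) {
		if (info.info[i].numDimensions > MaxNumDimensions)
			return std::nullopt;
	}

	return info;
}

/* Hypothetical non-owning view of one tensor inside the float span. */
struct TensorView {
	const float *data;
	std::size_t count;
};

/*
 * Hypothetical helper: split the CnnOutputTensor float span into one view
 * per tensor. ASSUMPTION: tensors are stored contiguously, in order.
 * Returns an empty vector if the span is too short for the description.
 */
std::vector<TensorView>
splitOutputTensors(const CnnOutputTensorInfo &info, const float *values,
		   std::size_t numValues)
{
	std::vector<TensorView> views;
	if (!values || info.numTensors > MaxNumTensors)
		return views;

	std::size_t offset = 0;
	for (uint32_t i = 0; i < info.numTensors; i++) {
		std::size_t n = info.info[i].tensorDataNum;
		if (n > numValues - offset)
			return {};
		views.push_back({ values + offset, n });
		offset += n;
	}

	return views;
}
```

An application would pass the data pointer and size of each control's span to
these helpers, and should treat a std::nullopt or empty result as "metadata
not usable" rather than reading the tensor data.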
--
Regards,
Laurent Pinchart