[libcamera-devel] Constraints on the sensor subdevice

Laurent Pinchart laurent.pinchart at ideasonboard.com
Mon Feb 24 06:22:16 CET 2020


Hi Dave,

A bit later than I planned, but before you get to the office after the
weekend, so hopefully still in time.

Please let me first say that I'm sorry for not chiming in earlier in
this discussion; I should have kept an eye on it.

On Fri, Feb 21, 2020 at 05:02:01PM +0000, Dave Stevenson wrote:
> On Fri, 21 Feb 2020 at 15:42, Jacopo Mondi wrote:
> > On Fri, Feb 21, 2020 at 02:48:34PM +0000, Dave Stevenson wrote:
> > > Hi Jacopo
> > >
> > > I'll preface this with please don't take it as a personal attack!
> >
> > Absolutely :) I'm sorry if anything I've said made you feel like you
> > have to specify this.
> 
> It was more in case my words could be misconstrued, or were typed in
> haste. It's either that or I end up redrafting the email a dozen
> times, and still sending out something that can be misinterpreted!
> 
> > > It's not meant as one, and most uses of the word "you" are more the
> > > V4L2 spec/behaviour than any individual.
> >
> > I don't take any blame for the V4L2 specs :D
> 
> I wish someone would :-D

I do take the blame for part of it. I have an excuse: I didn't foresee
that Nokia would pull the plug while we were in the middle of the media
controller development, before we had time to work on the userspace part
of the framework. Still, it's partly my fault, and I'm now trying to
redeem my sins by creating libcamera. Reading the backlog it seems I'm
not entirely successful yet :-)

> > > On Fri, 21 Feb 2020 at 12:04, Jacopo Mondi <jacopo at jmondi.org> wrote:
> > > > On Fri, Feb 21, 2020 at 11:25:05AM +0000, Dave Stevenson wrote:
> > > >
> > > > [snip]
> > > >
> > > > > > > If I got you right here, on your platform the image format selected on
> > > > > > > the CSI-2 receiver gets propagated to the sensor subdevice by
> > > > > > > performing the 4cc->mbus translation logic in kernel space. This
> > > > > > > implies the sensor subdevice and the CSI-2 receiver drivers have to be
> > > > > > > tightly coupled, unless complex mbus selection heuristics are
> > > > > > > implemented in the CSI-2 receiver driver to support a wide range of
> > > > > > > sensor drivers.
> > > > > >
> > > > > > Yes, this is correct.  The CSI-2 Rx and sensor subdevice are
> > > > > > inherently coupled on our platform.
> > > > > >
> > > > > > There is a distinction to make here though.  A CSI-2 Rx could either
> > > > > > be directly connected to an ISP pipeline, or dump out image data into
> > > > > > memory to be read back by an ISP.  For the latter case, it might also
> > > > > > be possible for the CSI-2 block to do some simple format conversion,
> > > > > > or even rescaling.  In our platform, we only allow our CSI-2 RX to
> > > > > > optionally do a data unpack operation (so a format conversion).  So,
> > > > > > during probe, the kernel driver enumerates the subdevice formats, and
> > > > > > generates a list of V4L formats to advertise - and these formats may
> > > > > > be duplicated to represent packed and unpacked options.

Nitpicking a little bit, but I think it's important: there isn't always
a clear-cut distinction between the two cases when it comes to
complexity. A CSI-2 receiver, as you mentioned, may have a scaler and
perform format conversion, sometimes with a colour space conversion
engine or other additional processing. The boundary is fuzzy, so I
prefer not to say that the absence of an ISP necessarily implies an
easy-to-configure device.

An additional important point is that a device consisting solely of a
CSI-2 receiver and a DMA engine should always be considered to perform
format conversion, as it converts from a bus format to a memory format.
In the simplest case it will pack the bits it receives on CSI-2 as bytes
and just dump them in memory, but the point is that the conversion
between a format on the CSI-2 bus and the format in memory is a property
of the device, regardless of whether the device supports a single or
multiple memory formats for a given CSI-2 bus format. Let's keep this in
mind for the conversation below.
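
To make that concrete, here is a minimal sketch of the kind of
bus-to-memory mapping table a receiver driver could carry. The entries
are purely illustrative, not taken from the unicam driver:

#include <linux/media-bus-format.h>
#include <linux/videodev2.h>

/*
 * Illustrative only: one media bus code received on CSI-2 can map to
 * several memory formats, e.g. a packed layout that mirrors the bus
 * packing and an unpacked layout with one sample per 16-bit word.
 */
struct BusToMemFormat {
    unsigned int mbusCode;  /* format on the CSI-2 bus */
    unsigned int packed;    /* memory format, CSI-2 packing kept */
    unsigned int unpacked;  /* memory format, unpacked */
};

static const struct BusToMemFormat formatMap[] = {
    { MEDIA_BUS_FMT_SBGGR10_1X10, V4L2_PIX_FMT_SBGGR10P, V4L2_PIX_FMT_SBGGR10 },
    { MEDIA_BUS_FMT_UYVY8_1X16,   V4L2_PIX_FMT_UYVY,     V4L2_PIX_FMT_UYVY },
};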

> > > > > > > This seems not standard behaviour to me, but I can't tell if upstream
> > > > > > > would be fine with it or not. From a libcamera perspective, you won't
> > > > > > > be able to use CameraSensor::getFormat() but you will have to select
> > > > > > > which format to set on the CSI-2 receiver taking into account the mbus
> > > > > > > selection logic performed by the same driver.
> > > > > > >
> > > > > > > Can I ask if you have any requirement that prevents you from decoupling that
> > > > > > > logic in kernel space, and leave the mbus code selection on the sensor
> > > > > > > to userspace ? It seems a lot of complication in the driver that
> > > > > > > should instead live in your pipeline handler...
> > > > > >
> > > > > > With our approach described above, there is no need to apply a MBUS
> > > > > > format on the sensor subdevice as well as a V4L format on the CSI-2 Rx
> > > > > > device as they are inherently the same operation, if that makes sense
> > > > > > :)
> > > > > >
> > > > > > For more complicated topologies (where the CSI-2 Rx could rescale for
> > > > > > example), I understand that we may want to decouple this, but for now,
> > > > > > MBUS codes are essentially hidden from our pipeline handler.
> > > > >
> > > > > I originally wrote this driver based on the TI VPFE [1]. That's a
> > > > > non-MC device, and suited our requirements nicely.
> > > > > From an application perspective we want the simple use cases, so you
> > > > > set the format and that is what you get. Beyond repacking from eg
> > > > > V4L2_PIX_FMT_SBGGR10P to V4L2_PIX_FMT_SBGGR10 there is no image
> > > > > processing going on.
> > > > >
> > > > > The change to use media controller was actually your libcamera
> > > > > requirement for enumeration of devices. We have no need for it, so
> > > > > would sooner rip it back out.
> > > > > Complicating userspace apps for such a simple hardware block is nuts.

This is our first misunderstanding.

The media controller API in itself doesn't complicate userspace, as what
it is really about is exposing the internal topology of devices to
userspace. That's mostly a read-only API (with the exception of the
MEDIA_IOC_SETUP_LINK ioctl that I will ignore for now), and completely
optional for userspace. It is, however, very often coupled with the V4L2
subdev userspace API (and the MEDIA_IOC_SETUP_LINK ioctl), which allows
full control of the subdevices in the pipeline from userspace. That API
is much more complex, and deciding to expose it in a driver has pros and
cons.

The only API that libcamera requires, as you have noted below, is the
media controller API, in its simple read-only form. This decision was
based on the fact that many V4L2 drivers already implement the MC API
(usually along with the V4L2 subdev userspace API), and adding support
for the MC API itself (the read-only part) to a driver is very simple
(see the uvcvideo driver for an example; it's about 160 lines of code in
total, including the #ifdef's to make it optional). It offers us a
single API to enumerate all devices in the system. There's no hard
requirement for any of those devices to expose the V4L2 subdev userspace
API, or even to use V4L2 at all (although without another camera API
there's not much choice upstream, but that's a different story).
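
To illustrate how light the enumeration usage is on the application
side, here is a small sketch (plain uapi code, not libcamera) that
walks the entities of a media device; the device path is of course just
an example:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

#include <linux/media.h>

int main(void)
{
    int fd = open("/dev/media0", O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct media_device_info info;
    memset(&info, 0, sizeof(info));
    if (ioctl(fd, MEDIA_IOC_DEVICE_INFO, &info) == 0)
        printf("driver %s, model %s\n", info.driver, info.model);

    /* Walk the entities: the _NEXT flag requests the next entity ID. */
    struct media_entity_desc entity;
    memset(&entity, 0, sizeof(entity));
    entity.id = MEDIA_ENT_ID_FLAG_NEXT;
    while (ioctl(fd, MEDIA_IOC_ENUM_ENTITIES, &entity) == 0) {
        printf("entity %u: %s\n", entity.id, entity.name);
        entity.id |= MEDIA_ENT_ID_FLAG_NEXT;
    }

    close(fd);
    return 0;
}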

We can thus discuss the best option for this driver when it comes to
the V4L2 subdev userspace API, but let's not ditch MC completely, as it
gives us an enumeration API at a negligible cost.

> > > > I'm sorry I really don't agree here, but I don't think this comes
> > > > unexpected :)
> > > >
> > > > Implementing heuristics for 4cc->mbus code translation in kernel space
> > > > requires the csi-2 receiver and the sensor subdevice driver to be
> > > > somehow coupled,
> > >
> > > Please define what you mean by coupling?
> > > They are coupled in that they are given handles to each other via
> > > device tree. They have to be coupled to start/stop streaming if
> > > nothing else. V4L2 subdev API calls are made between the two. The V4L2
> > > subdev API also defines pad operations for kernel drivers to set and
> > > get formats.
> >
> > Ok, let me take a step back and make this email thread a nice exchange of
> > personal views, as that's what I can provide :)
> >
> > In the ideal world where I would like to live, all new code that gets
> > upstreamed is MC centric. The number of subdev ops calls between drivers
> > is minimized and most of the configuration is performed from
> > userspace, in order to finally end the 'single video device node'
> > abstraction, which is limiting what applications can do on each single
> > piece of the pipeline, is limiting the number of feature each block
> > can expose, is blocking applications from implementing advanced use
> > cases and overall is limiting V4L2 from being suitable to support most
> > of modern and complex camera usages out there (I know, this is a
> > simplification, but please bear with me, I know you are not in love
> > with the V4L2 API too :)
> >
> > I myself have upstreamed a driver for a legacy platform which does
> > exactly what you describe, so I'm not saying "this is wrong" by
> > itself. What I'm saying is that, as you're now looking into libcamera,
> > you have a space for implementing that logic in userspace, freeing your
> > drivers from any heuristics and letting them just implement mechanisms to
> > apply what the user requires to HW, and letting drivers focus on exposing
> > 'features' to applications instead of implementing 'logic'
> >
> > I know this is theory, I know it's nice on paper, I know when you have
> > customers the world outside there looks very different, but a man can
> > dream, can't he ?
> 
> I can hear the snoring as you dream away ;-)
> 
> If MC had really gained any traction in userspace then that theory
> would be great, but it never has, so we're slightly stuck.
> 
> > > > or better, it's very hard to guarantee that your csi-2
> > > > driver format selection logic works with all sensor drivers out there.
> > > > Am I wrong ?
> > >
> > > Looking in the mainline tree I can see:
> > > - TI am437x
> > > - Atmel ISC/ISI
> > > - Exynos4-is
> > > - possibly via-camera
> > > - (TI) cal
> > > all of which have a table of V4L2_PIX_FMT_ to MEDIA_BUS_FMT_ format conversions.
> >
> > None of them is exactly a new implementation. I think they all come
> > from the 'single video device node' era, and have been ported to MC
> > later, but I get your point: you're not doing anything particularly
> > nasty or not in line with existing code in v4l2, I agree.
> 
> A quick check says none of them use MC except Exynos4-is, probably
> because of this enforced userspace behaviour change if you do enable
> it :-/
> 
> > > Yes there is the potential for missing one, but we've already worked
> > > around the slight craziness of there being both
> > > MEDIA_BUS_FMT_YUYV8_1X8 and MEDIA_BUS_FMT_YUYV8_1X16 and similar (CSI
> > > is really 16X1 as well seeing as it's serial!).
> > > Where is the *obligation* to update libcamera format selection logic
> > > when one adds a new MBUS format to the kernel?
> > >
> > > > Delegating that to userspace would allow you to de-couple the two drivers
> > > > completely, but I understand libcamera was not there when you first
> > > > designed that. Now that you have a place in userspace where to
> > > > implement those heuristics (your pipeline handler) I really would not
> > > > push them to kernel space.
> > >
> > > Where's the obligation to use libcamera in all situations?!
> >
> > Ah well, we aim to conquer the world, you should know that already :)

There's no obligation to use libcamera in all situations, let's be very
clear about this. We don't want to conquer the world by forcing anyone;
we are trying to create a userspace camera stack that will be useful
for as many use cases as possible.

You mention below the case of the HDMI or analog to CSI-2 converters,
and the existing userspace implementations that make use of them, and
there is clearly no reason to break them. The challenge is to keep
supporting them while making the driver implementation evolve to support
new use cases (both from a kernel point of view and from a userspace
point of view), but if there was no challenge, we'd be bored :-)

> > > Forget the complex camera use case, we have a trivially simple
> > > hardware pipeline. CSI2 data is sent from a source and is written to
> > > memory. Why should userspace be forced to worry about media controller
> > > and MBUS formats at all?
> >
> > Ah, this question has been asked multiple times by many vendors,
> > and I really get your point: Why should I do a complex thing and
> > complicate my users lives if I can get away with a simpler solution.
> >
> > My perspective might be mistaken because I thought we were talking
> > about a new design. Nobody is asking you to trash whatever you have
> > and I will always be on the "better to have good code upstream than
> > perfect code out-of-tree" boat.
> >
> > > We have existing pipelines working with bcm2835-unicam using the
> > > Toshiba TC358743 HDMI to CSI2 chip, and Analog Devices ADV728xM
> > > analogue video to CSI chip that should only be broken with very good
> > > reason. In both cases you have to set the timings/standard first, but
> > > then it streams data on request.
> > > GStreamer's v4l2src works very nicely with it, and nothing worries
> > > about media controller, and why should it. GStreamer can switch
> > > merrily between the UYVY or BGR3 output formats of TC358743 via
> > > V4L2_PIX_FMT_xxx as they get forwarded to the TC358743 driver (as
> > > MEDIA_BUS_FMT_xxx) to action.
> > >
> > > With the behaviour of media controller you're now proposing,
> > > GStreamer, and FFmpeg, and ALL other clients have to talk to
> > > media controller to set up that format change. I don't see code
> > > supporting that in v4l2src, so I have an issue that has to be
> > > resolved.
> >
> > v4l2src does not know anything about MC and we have a libcamera gstreamer
> > element on the list, but yes, it is still a new thing.
> 
> I know libcamera is coming, and with GStreamer support, but forcing
> changes on users gets serious pushback.
> 
> > > Paragraph 8 of [1] does state
> > > "Formats are not propagated across links, as that would involve
> > > propagating them from one sub-device file handle to another.
> > > Applications must then take care to configure both ends of every link
> > > explicitly with compatible formats. Identical formats on the two ends
> > > of a link are guaranteed to be compatible. Drivers are free to accept
> > > different formats matching device requirements as being compatible."
> > > which to my mind is barking mad in the simple case, and puts a
> > > significant burden on all apps if the pipeline is simple.
> >
> > We might get into philosophy here, but if you are considering a new
> > design, defining it with the "simple" use case only in mind is really
> > tying your hand in regards to any future-proof use case.
> >
> > If instead you have a lot of existing code, a lot of established
> > users, and you feel like you need to trash it away just to make us
> > pesky libcamera people happy, that's not what I meant to say at all.
> 
> It's probably not a huge number of existing users, but they are there,
> and people whinge like mad if you break their systems!
> 
> The unicam driver has been kicking around for a couple of years, and
> keeps getting fixed as my understanding of V4L2 is frequently
> corrected or Hans updates the compliance tests to expose yet another
> wrong assumption by me!
> 
> > > How does one support the simple applications if the underlying device
> > > insists on using media controller?
> >
> > libcamera :D ?
> 
> **rolls eyes** ;-)
> 
> > > [1] https://linuxtv.org/downloads/v4l-dvb-apis-new/uapi/v4l/dev-subdev.html#format-negotiation
> > >
> > > A question in return. Other than enumeration, what does media
> > > controller gain us in this situation?
> > > We don't need the request API.
> > > We don't need to configure/validate any links or set/get pad formats.

The V4L2 subdev userspace API brings finer-grained configuration of the
pipeline. Whether you need it or not depends on your pipeline, and quite
importantly on the part of the pipeline before the CSI-2 receiver.

To give you a bit of context, the API was developed between the Nokia
N900 and N9, which were both based on the same OMAP3 family. On the
N900, the kernel driver hardcoded use cases, and exposed three V4L2
video device nodes, one for still image capture, one for video capture,
and one for the viewfinder. On the N9, userspace required new use cases,
and we realized that this would require very intrusive changes on the
kernel side. That's what prompted us to design this API, to expose all
the features of the hardware to userspace, and let userspace decide how
to use it.

Back to our case, unicam is simple, but you don't control what a user
may connect in front. Even for camera sensors, a SMIA(++)- or
CCS-compatible sensor can support cropping at multiple stages, as well
as skipping, binning, and scaling. The smiapp driver doesn't hardcode
any mode; it exposes all the configuration parameters instead. You can
control those parameters from either kernelspace or userspace, but if
you do so in kernelspace, unicam will have to hardcode use cases, as the
V4L2 video node API can't express all the configurability of the sensor.
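
As a rough illustration (the pad number, rectangle and device path are
made up), controlling such a sensor from userspace means talking to its
subdev node directly, along these lines:

#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

#include <linux/v4l2-subdev.h>

/* Illustrative only: crop the pixel array, then set the bus format. */
int configure_sensor(const char *subdev)   /* e.g. "/dev/v4l-subdev0" */
{
    int fd = open(subdev, O_RDWR);
    if (fd < 0)
        return -1;

    /* Select a 2028x1520 crop rectangle on pad 0. */
    struct v4l2_subdev_selection sel;
    memset(&sel, 0, sizeof(sel));
    sel.which = V4L2_SUBDEV_FORMAT_ACTIVE;
    sel.pad = 0;
    sel.target = V4L2_SEL_TGT_CROP;
    sel.r.left = 1014;
    sel.r.top = 760;
    sel.r.width = 2028;
    sel.r.height = 1520;
    int ret = ioctl(fd, VIDIOC_SUBDEV_S_SELECTION, &sel);

    /* Set the matching media bus format on the same pad. */
    struct v4l2_subdev_format fmt;
    memset(&fmt, 0, sizeof(fmt));
    fmt.which = V4L2_SUBDEV_FORMAT_ACTIVE;
    fmt.pad = 0;
    fmt.format.width = 2028;
    fmt.format.height = 1520;
    fmt.format.code = MEDIA_BUS_FMT_SBGGR10_1X10;
    if (ret == 0)
        ret = ioctl(fd, VIDIOC_SUBDEV_S_FMT, &fmt);

    close(fd);
    return ret;
}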

The situation is thus not all black or white, and the V4L2 subdev
userspace API can be useful, even for unicam. It does, however, require
more complexity in userspace, which is an unsolved issue. libcamera will
solve it for some use cases, but HDMI or analog capture still needs to
be considered. One step at a time though; this is long-term development,
and while we need to make sure not to break existing use cases on the
Raspberry Pi platforms, we should think about future development too.

> > I understand, and again, I thought we were talking about a new design,
> > and I didn't get why you wanted so badly to force everything to go
> > through a single device node.
> >
> > The benefit of MC are self evident to me in the perspective of
> > designing and implementing new code that is able to support any future
> > extension to allow more complex use cases to happen on my platform.
> >
> > I understand you have a huge user base which just cares about the
> > simplest use case, and I understand that for them, and consequently
> > for you, the benefits are secondary compared to the complications.
> >
> > > > Again, my 2c only, the linux-media community will provide much more
> > > > interesting feedback than mine :)
> > >
> > > At the moment I'm seeing us pushing the driver without media
> > > controller, and we'll have to come up with an alternative mechanism
> > > for enumeration.
> > > Media controller support can come later because it is going to create
> > > regressions. Possibly make media controller usage optional? Switch
> > > based on device tree, or perhaps based on whether /dev/mediaX is
> > > opened or not. (but how do you tell if that is the same app that is
> > > asking for /dev/videoX?!)
> > > Perhaps we need to look at uvcvideo as that supports media controller
> > > and yet allows setting resolution and seemingly it propagating to the
> > > sensor device. Does it only get away with this as the sub entities are
> > > not V4L2 subdevices? It seems to be a very similar situation.
> >
> > I happily defer this question to linux-media, Laurent and Kieran.
> 
> Kieran added the MC support for us, so the format setting change
> really stems from that. I wasn't aware of the required behaviour
> change at that time :-(
> 
> It'd be nice to know Laurent and Kieran's take on this, particularly
> Laurent's with his knowledge of UVC and how that avoids this.
> 
> One thought that we're just trying. If we don't call
> v4l2_device_register_subdev_nodes to register the subdevices, then
> media controller still exists but logically it still makes sense in
> that the format is set via the video node. No subdevice nodes means
> you can't set the format any other way.
> All still a bit of a kludge, but may be a solution that is vaguely acceptable.

It's more than vaguely acceptable; it's the recommended solution, as
explained above.

There are a few caveats, though. We've discussed before the need for the
IPA to know the cropping, binning and skipping parameters associated
with a sensor mode. My preference would be for those parameters to be
configurable (as with the smiapp driver), but in practice, most sensor
drivers expose a fixed list of modes instead. In that case, I think the
parameters should be exposed by the sensor. With unicam enabling the
V4L2 subdev userspace API, we could use the get format and selection
ioctls on the sensor subdev to get those parameters (assuming that the
corresponding sensor driver would implement the get selection ioctl,
which is something we should be able to do easily). However, that would
allow setting the configuration from userspace, which isn't something
you desire. One option could be to extend the
v4l2_device_register_subdev_nodes() function with a read-only flag that
would only enable the subdev query ioctls. Do you think that would be a
good way forward ?
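
For reference, the query side that the IPA would rely on could look
roughly like the sketch below (the read-only registration flag itself
doesn't exist yet and would need kernel work; this only shows the
existing get ioctls):

#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

#include <linux/v4l2-subdev.h>

/* Read back the active format and crop rectangle from a sensor subdev. */
int query_sensor(const char *subdev, struct v4l2_subdev_format *fmt,
                 struct v4l2_subdev_selection *crop)
{
    int fd = open(subdev, O_RDONLY);
    if (fd < 0)
        return -1;

    memset(fmt, 0, sizeof(*fmt));
    fmt->which = V4L2_SUBDEV_FORMAT_ACTIVE;
    fmt->pad = 0;
    int ret = ioctl(fd, VIDIOC_SUBDEV_G_FMT, fmt);

    memset(crop, 0, sizeof(*crop));
    crop->which = V4L2_SUBDEV_FORMAT_ACTIVE;
    crop->pad = 0;
    crop->target = V4L2_SEL_TGT_CROP;
    if (ret == 0)
        ret = ioctl(fd, VIDIOC_SUBDEV_G_SELECTION, crop);

    close(fd);
    return ret;
}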

> > Hope you get that all the above is my personal interpretation and it
> > is not my intention to define any upstream acceptance criteria for
> > your driver nor for libcamera.
> 
> Fully understood. I know I'm the outsider with my own preconceived
> views too :-)

Your points are valid.

Now, to answer Naush's initial questions:

- We can relax the constraints on the CameraSensor class to support
  sensors with multiple pads (possibly temporarily until the situation
  changes on the kernel side, with proper support for multiplexed
  streams).

- The rationale behind CameraSensor::getFormat() was to provide a helper
  for pipeline handlers to pick an appropriate sensor configuration. As
  Jacopo correctly described, the pipeline handler is the piece of code
  that is supposed to have knowledge of the whole pipeline, including
  the sensor, and to make decisions on how to configure the pipeline
  appropriately. We envisioned that this would require heuristics, and
  getFormat() was developed to avoid duplicating those heuristics in
  pipeline handlers in slightly different (and in many cases probably
  incorrect) ways. However, I do agree that the current implementation
  isn't a good fit in all cases. We can possibly extend it with more
  parameters (such as scaling hints for instance), but in any case
  pipeline handlers can bypass the helper and pick an appropriate
  resolution themselves if really needed (see the short sketch after
  this list for the kind of call I mean).

- The reason why we don't expose the full map of media bus codes to
  sizes is the complexity it would incur in pipeline handlers. For the
  same reason as explained above, we want to avoid overly complex
  heuristics being implemented multiple times. I understand that
  you need more information, and we can extend the API to provide it,
  but at the same time I'd like to think about how we could create
  helpers for pipeline handlers to deal with the additional complexity.
  Do you have any opinion on this topic ?
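
To make the second point above more concrete, here is roughly the kind
of call I mean. The exact signatures are illustrative and may not match
the current tree:

#include <vector>

#include <linux/media-bus-format.h>

/*
 * Illustrative only: assumes the internal CameraSensor and
 * V4L2SubdeviceFormat types are available. Ask the helper for the
 * sensor configuration that best matches a raw Bayer 1920x1080 request
 * and apply it to the sensor subdevice. Error handling omitted.
 */
void configureSensor(CameraSensor *sensor)
{
    std::vector<unsigned int> codes{ MEDIA_BUS_FMT_SBGGR10_1X10 };
    V4L2SubdeviceFormat format = sensor->getFormat(codes, Size(1920, 1080));
    sensor->setFormat(&format);
}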

Finally, let's also note that if we disable the V4L2 subdev userspace
API, you won't get access to any format from the CameraSensor class, as
it won't be able to query the information from the kernel. I'm thus
really tempted to investigate the read-only API extension option. What's
your opinion on this idea ?

-- 
Regards,

Laurent Pinchart

