[libcamera-devel] [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification

Sat May 6 10:12:29 CEST 2023

Hello,

On Fri, May 05, 2023 at 04:55:33PM +0100, Alex Bennée via libcamera-devel wrote:
> Kieran Bingham writes:
> 
> > Hi All,
> >
> > Coming in late, thanks to lei/lore spotting the libcamera keyword.
> >
> > + Cc: libcamera-devel to raise awareness of the discussion there.
> >
> > Quoting Alexander Gordeev (2023-05-05 10:57:29)
> >> On 03.05.23 17:53, Cornelia Huck wrote:
> >> > On Wed, May 03 2023, Alex Bennée <alex.bennee at linaro.org> wrote:
> >> >> Cornelia Huck <cohuck at redhat.com> writes:
> >> >>> On Fri, Apr 28 2023, Alexander Gordeev <alexander.gordeev at opensynergy.com> wrote:
> >> >>>> On 27.04.23 15:16, Alexandre Courbot wrote:
> >> >>>>> But in any case, that's irrelevant to the guest-host interface, and I
> >> >>>>> think a big part of the disagreement stems from the misconception that
> >> >>>>> V4L2 absolutely needs to be used on either the guest or the host,
> >> >>>>> which is absolutely not the case.
> >> >>>>
> >> >>>> I understand this, of course. I'm arguing that it is harder to
> >> >>>> implement it, get it straight and then maintain it over years. Also it
> >> >>>> brings limitations that can sometimes be worked around in the virtio
> >> >>>> spec, but this always comes at a cost of decreased readability and
> >> >>>> increased complexity. Overall it looks clearly as a downgrade compared
> >> >>>> to virtio-video for our use-case. And I believe it would be the same for
> >> >>>> every developer, that has to actually implement the spec, not just do
> >> >>>> the pass through. So if we think of V4L2 UAPI pass through as a
> >> >>>> compatibility device (which I believe it is), then it is fine to have
> >> >>>> both and keep improving the virtio-video, including taking the best
> >> >>>> ideas from the V4L2 and overall using it as a reference to make writing
> >> >>>> the driver simpler.
> >> >>>
> >> >>> Let me jump in here and ask another question:
> >> >>>
> >> >>> Imagine that, some years in the future, somebody wants to add a virtio
> >> >>> device for handling video encoding/decoding to their hypervisor.
> >> >>>
> >> >>> Option 1: There are different devices to choose from. How is the person
> >> >>> implementing this supposed to pick a device? They might have a narrow
> >> >>> use case, where it is clear which of the devices is the one that needs to
> >> >>> be supported; but they also might have multiple, diverse use cases, and
> >> >>> end up needing to implement all of the devices.
> >> >>>
> >> >>> Option 2: There is one device with various optional features. The person
> >> >>> implementing this can start off with a certain subset of features
> >> >>> depending on their expected use cases, and add to it later, if needed;
> >> >>> but the upfront complexity might be too high for specialized use cases.
> >> >>>
> >> >>> Leaving concrete references to V4L2 out of the picture, we're currently
> >> >>> trying to decide whether our future will be more like Option 1 or Option
> >> >>> 2, with their respective trade-offs.
> >> >>>
> >> >>> I'm slightly biased towards Option 2; does it look feasible at all, or
> >> >>> am I missing something essential here? (I had the impression that some
> >> >>> previous confusion had been cleared up; apologies in advance if I'm
> >> >>> misrepresenting things.)
> >> >>>
> >> >>> I'd really love to see some kind of consensus for 1.3, if at all
> >> >>> possible :)
> >> >>
> >> >> I think feature discovery and extensibility is a key part of the VirtIO
> >> >> paradigm which is why I find the virtio-v4l approach limiting. By
> >> >> pegging the device to a Linux API we effectively limit the growth of the
> >> >> device specification to as fast as the Linux API changes. I'm not fully
> >> >> immersed in v4l but I don't think it is seeing any additional features
> >> >> developed for it and its limitations for camera are one of the reasons
> >> >> stuff is being pushed to userspace in solutions like libcamera:
> >> >>
> >> >>    How is libcamera different from V4L2?
> >> >>
> >> >>    We see libcamera as a continuation of V4L2. One that can more easily
> >> >>    handle the recent advances in hardware design. As embedded cameras have
> >> >>    developed, all of the complexity has been pushed on to the developers.
> >> >>    With libcamera, all of that complexity is simplified and a single model
> >> >>    is presented to application developers.
> >> >
> >> > Ok, that is interesting; thanks for the information.
> >> >
> >> >>
> >> >> That said it's not totally outside our experience to have virtio devices act as
> >> >> simple pipes for some higher level protocol. The virtio-gpu spec says
> >> >> very little about the details of how 3D devices work and simply offers
> >> >> an opaque pipe to push a (potentially proprietary) command stream to the
> >> >> back end. As far as I'm aware the proposals for Vulkan and Wayland
> >> >> device support don't even offer a feature bit but simply change the
> >> >> graphics stream type in the command packets.
> >> >>
> >> >> We could just offer a VIRTIO_VIDEO_F_V4L feature bit, document it as
> >> >> incompatible with other feature bits and make that the baseline
> >> >> implementation but it's not really in the spirit of what VirtIO is
> >> >> trying to achieve.
> >> >
> >> > I'd not be in favour of an incompatible feature flag,
> >> > either... extensions are good, but conflicting features is something
> >> > that I'd like to avoid.
> >> >
> >> > So, given that I'd still prefer to have a single device: How well does
> >> > the proposed virtio-video device map to a Linux driver implementation
> >> > that hooks into V4L2?
> >> 
> >> IMO it hooks into V4L2 pretty well. And I'm going to spend next few
> >> months making the existing driver fully V4L2 compliant. If this goal
> >> requires changing the spec, then we still have time to do that. I don't
> >> expect a lot of problems on this side. There might be problems with
> >> Android using V4L2 in weird ways. Well, let's see. Anyway, I think all
> >> of this can be accomplished over time.
> >> 
> >> > If the general process flow is compatible and it
> >> > is mostly a question of wiring the parts together, I think pushing that
> >> > part of the complexity into the Linux driver is a reasonable
> >> > trade-off. Being able to use an existing protocol is nice, but if that
> >> > protocol is not perceived as flexible enough, it is probably not worth
> >> > encoding it into a spec. (Similar considerations apply to hooking up the
> >> > device in the hypervisor.)
> >> 
> >> I very much agree with these statements. I think this is how it should
> >> be: we start with a compact but usable device, then add features and
> >> enable them using feature flags. Eventually we can cover all the
> >> use-cases of V4L2 unless we decide to have separate devices for them
> >> (virtio-camera, etc). This would be better in the long term I think.
> >
> > Cameras definitely have their quirks - mostly because many usecases are
> > hard to convey over a single Video device node (with the hardware) but I
> > think we might expect that complexity to be managed by the host, and
> > probably offer a ready made stream to the guest. Of course how to handle
> > multiple streams and configuration of the whole pipeline may get more
> > difficult and warrant a specific 'virtio-camera' ... but I would think
> > the basics could be covered generically to start with.
> >
> > It's not clear who's driving this implementation and spec, so I guess
> > there's more reading to do.
> >
> > Anyway, I've added Cc libcamera-devel to raise awareness of this topic
> > to camera list.
> >
> > I bet Laurent has some stronger opinions on how he'd see cameras exist
> > in a virtio space.

You seem to think I have strong opinions about everything. This may not
be a completely unfounded assumption ;-)

Overall I agree with you, I think cameras are too complex for a
low-level virtualization protocol. I'd rather see a high-level protocol
that exposes webcam-like devices, with the low-level complexity handled
on the host side (using libcamera of course ;-)). This would support use
cases that require sharing hardware blocks between multiple logical
cameras, including sharing the same camera streams between multiple
guests.

If a guest needs low-level access to the camera, including the ability
to control the raw camera sensor or ISP, then I'd recommend passing the
corresponding hardware blocks to the guest for exclusive access.

> Personally I would rather see a separate virtio-camera specification
> that properly encapsulates all the various use cases we have for
> cameras. In many ways just processing a stream of video is a much
> simpler use case.
> 
> During Linaro's Project Stratos we got a lot of feedback from members
> who professed interest in a virtio-camera initiative. However we were
> unable to get enough engineering resources from the various companies to
> collaborate in developing a specification that would meet everyone's
> needs. The problem space is wide from having numerous black and white
> sensor cameras on cars to the full on computational photography as
> exposed by modern camera systems on phones. If you want to read more
> words on the topic I wrote a blog post at the time:
> 
>   https://www.linaro.org/blog/the-challenges-of-abstracting-virtio/
> 
> Back to the topic of virtio-video as I understand it the principal
> features/configurations are:
> 
>   - All the various CODECs, resolutions and pixel formats
>   - Stateful vs Stateless streams
>   - If we want to support grabbing single frames from a source
> 
> My main concern about the V4L approach is that it pegs updates to the
> interface to the continuing evolution of the V4L interface in Linux. Now
> maybe video is a solved problem and there won't be (m)any new features
> we need to add after the initial revision. However I'm not a domain
> expert here so I just don't know.

I briefly discussed "virtio-v4l2" with Alex Courbot a few weeks ago
when we got a chance to meet face to face. I think the V4L2 kernel API
is quite a good fit in terms of its level of abstraction, when
applied to video codecs and "simple" cameras (defined, more or less, as
something resembling a USB webcam feature-wise). This doesn't mean that
the virtio-video or virtio-camera specifications should necessarily
reference V4L2 or use the exact same vocabulary; they could simply copy
the concepts, and stay loosely coupled with V4L2 in the sense that both
specifications should try to evolve in compatible directions.

-- 
Regards,

Laurent Pinchart