[libcamera-devel] [PATCH 2/3] pipeline: raspberrypi: Rework the internal buffer allocation scheme

Naushir Patuck naush at raspberrypi.com
Thu Nov 11 17:42:32 CET 2021


Hi Laurent,

On Thu, 11 Nov 2021 at 15:49, Laurent Pinchart
<laurent.pinchart at ideasonboard.com> wrote:

> Hi Naush,
>
> On Thu, Nov 11, 2021 at 02:58:21PM +0000, Naushir Patuck wrote:
> > On Thu, 11 Nov 2021 at 14:35, Laurent Pinchart wrote:
> > > On Wed, Nov 10, 2021 at 10:08:01AM +0000, Naushir Patuck wrote:
> > > > For simplicity, the pipeline handler would look at the maximum number
> > > > of buffers set in the StreamConfiguration and allocate the same number
> > > > of internal buffers for all device nodes. This would likely
> > > > overallocate buffers for some nodes. Rework this logic to try and
> > > > minimise overallocations without compromising performance.
> > > >
> > > > For ISP nodes, we only ever need 1 set of internal buffers, as the
> > > > hardware runs synchronously with the requests and IPA.
> > > >
> > > > For Unicam nodes, allocate a minimum of 4 buffers (exported +
> > > > internal), but also require at least 1 internal buffer.
> > > >
> > > > Signed-off-by: Naushir Patuck <naush at raspberrypi.com>
> > > > ---
> > > >  .../pipeline/raspberrypi/raspberrypi.cpp      | 44 ++++++++++++++-----
> > > >  1 file changed, 33 insertions(+), 11 deletions(-)
> > > >
> > > > diff --git a/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp b/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> > > > index 11d3c2b120dd..5f0f00aacd59 100644
> > > > --- a/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> > > > +++ b/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> > > > @@ -1211,21 +1211,43 @@ int PipelineHandlerRPi::queueAllBuffers(Camera *camera)
> > > >  int PipelineHandlerRPi::prepareBuffers(Camera *camera)
> > > >  {
> > > >       RPiCameraData *data = cameraData(camera);
> > > > +     unsigned int numBuffers;
> > > I'd name the variable numRawBuffers.
> > >
> > > > +     bool rawStream = false;
> > > >       int ret;
> > > >
> > > > -     /*
> > > > -      * Decide how many internal buffers to allocate. For now, simply look
> > > > -      * at how many external buffers will be provided. We'll need to improve
> > > > -      * this logic. However, we really must have all streams allocate the same
> > > > -      * number of buffers to simplify error handling in queueRequestDevice().
> > >
> > > Does the error logic still work after this change ?
> >
> > Yes, the comment above has been bit-rotted, and this is handled correctly
> > now.
> >
> > > > -      */
> > > > -     unsigned int maxBuffers = 0;
> > > > -     for (const Stream *s : camera->streams())
> > > > -             if (static_cast<const RPi::Stream *>(s)->isExternal())
> > > > -                     maxBuffers = std::max(maxBuffers, s->configuration().bufferCount);
> > > > +     for (Stream *s : camera->streams()) {
> > > > +             if (isRaw(s->configuration().pixelFormat)) {
> > > > +                     numBuffers = s->configuration().bufferCount;
> > > > +                     rawStream = true;
> > > > +                     break;
> > > > +             }
> > > > +     }
> > > >
> > > > +     /* Decide how many internal buffers to allocate. */
> > > >       for (auto const stream : data->streams_) {
> > > > -             ret = stream->prepareBuffers(maxBuffers);
> > >
> > > And here add a numBuffers local variable.
> > >
> > > > +             if (stream == &data->unicam_[Unicam::Image] ||
> > > > +                 stream == &data->unicam_[Unicam::Embedded]) {
> > > > +                     /*
> > > > +                      * For Unicam, allocate a minimum of 4 buffers as we want
> > > > +                      * to avoid any frame drops. If an application has configured
> > > > +                      * a RAW stream, allocate additional buffers to make up the
> > > > +                      * minimum, but ensure we have at least 1 set of internal
> > > > +                      * buffers to use to minimise frame drops.
> > > > +                      */
> > > > +                     constexpr unsigned int minBuffers = 4;
> > > > +                     numBuffers = rawStream ? std::max<int>(1, minBuffers - numBuffers)
> > > > +                                            : minBuffers;
> > >
> > > You could initialize numRawBuffers to 0 and drop rawStream, as if
> > > !rawStream, then numRawBuffers will be equal to 0, so
> > >
> > >                         numBuffers = std::max<int>(1, minBuffers - numRawBuffers);
> > >
> > > would do the right thing.
> > >
> > > > +             } else {
> > > > +                     /*
> > > > +                      * Since the ISP runs synchronously with the IPA and requests,
> > > > +                      * we only ever need one set of internal buffers. Any buffers
> > > > +                      * the application wants to hold onto will already be exported
> > > > +                      * through PipelineHandlerRPi::exportFrameBuffers().
> > > > +                      */
> > > > +                     numBuffers = 1;
> > > > +             }
> > > > +
> > > > +             ret = stream->prepareBuffers(numBuffers);
> > >
> > > I have a hard time following this. Before the patch, the pipeline
> > > handler calls prepareBuffers() with the number of buffers requested by
> > > the application. Now it calls it with a number of internal buffers
> > > only, without any change to the prepareBuffers() function itself. This
> > > would benefit from an explanation in the commit message, as it's not
> > > clear why it's correct.
> >
> > For clarification, prepareBuffers() would (and still will) allocate
> > internal use buffers, and call importBuffers() to allocate the buffer
> > cache with the internal + user requested buffer count. Buffer
> > allocations for the user requested buffer count go through
> > PipelineHandler::exportFrameBuffers().
>
> I'm looking at prepareBuffers():
>
> int Stream::prepareBuffers(unsigned int count)
> {
>         int ret;
>
>         if (!importOnly_) {
>                 if (count) {
>                         /* Export some frame buffers for internal use. */
>                         ret = dev_->exportBuffers(count, &internalBuffers_);
>                         if (ret < 0)
>                                 return ret;
>
>                         /* Add these exported buffers to the internal/external buffer list. */
>                         setExportedBuffers(&internalBuffers_);
>
>                         /* Add these buffers to the queue of internal usable buffers. */
>                         for (auto const &buffer : internalBuffers_)
>                                 availableBuffers_.push(buffer.get());
>                 }
>
>                 /* We must import all internal/external exported buffers. */
>                 count = bufferMap_.size();
>         }
>
>         return dev_->importBuffers(count);
> }
>
> In the !importOnly_ case, we add count buffers to bufferMap_ by a call
> to setExportedBuffers(). setExportedBuffers() is also called by
> PipelineHandlerRPi::exportFrameBuffers(), so we currently add
> max(s->configuration().bufferCount) buffers to any buffers already
> exported by the application. This certainly overallocates, as we
> essentially allocate bufferCount * 2 buffers.
>
> With this patch, overallocation is reduced. However, there's no
> requirement for applications to use the FrameBufferAllocator (which ends
> up calling PipelineHandlerRPi::exportFrameBuffers()) to allocate
> buffers. If an application allocates buffers through different means
> (for instance exporting them from a display device), then we will call
> dev_->importBuffers() with a very low count.
>

Yes, I see how this could go wrong.  Should I just assume the worst and
pass VIDEO_MAX_FRAME into importBuffers()?  Would that be advisable?


>
> > So this change (somewhat) decouples the user count from the internal
> > count.  I will clarify this in the commit message.
> >
> > > Furthermore, what will happen if an application requests 4 buffers for
> > > the raw stream and any number of buffers for a processed stream, and
> > > repeatedly queues requests with no buffer for the raw stream? It seems
> > > to me that you'll now allocate a single extra buffer for the raw
> > > stream, which won't be enough to keep the unicam queue running.
> >
> > This is a tricky one.  Without any hints from the application as to
> > what it intends to do, I have to balance memory usage with performance.
> > You are right that with only a single internal buffer allocated, the
> > performance may be degraded if the application does not provide buffers
> > through the Request.  However, if the application has allocated 4x
> > 18Mbyte buffers (12mpix 12-bit), I really don't want to be allocating
> > 2x more of them when they possibly may never be used (assuming the
> > application is circulating all these buffers constantly).  I would hope
> > application writers would be using those buffers sensibly :-)
> >
> > Some of our platforms have very small CMA heaps to work with, and
> > memory will get very tight for these use-cases, so I do want to limit
> > memory usage at the expense of performance.  Some hints from the
> > application on how it intends to use buffers might help with this
> > balance.
>
> I agree with you on the problem statement, but I'm wondering if we
> shouldn't start by implementing the safe option to avoid frame drops,
> and then improve performance on top (possibly with a hint API).
>

I think maybe we can start with 2 internal buffer allocations, then see if
that may ever need reducing.

Regards,
Naush


>
> > David, do you think we can bump the internal allocation up to 2 buffers?
> >
> > > >               if (ret < 0)
> > > >                       return ret;
> > > >       }
>
> --
> Regards,
>
> Laurent Pinchart
>


More information about the libcamera-devel mailing list