[libcamera-devel] [PATCH 2/3] pipeline: raspberrypi: Rework the internal buffer allocation scheme

Laurent Pinchart laurent.pinchart at ideasonboard.com
Thu Nov 11 16:48:58 CET 2021


Hi Naush,

On Thu, Nov 11, 2021 at 02:58:21PM +0000, Naushir Patuck wrote:
> On Thu, 11 Nov 2021 at 14:35, Laurent Pinchart wrote:
> > On Wed, Nov 10, 2021 at 10:08:01AM +0000, Naushir Patuck wrote:
> > > For simplicity, the pipeline handler would look at the maximum number of
> > > buffers set in the StreamConfiguration and allocate the same number of internal
> > > buffers for all device nodes. This would likely overallocate buffers for some
> > > nodes. Rework this logic to try and minimise overallcations without compromising
> > > performance.
> > >
> > > For ISP nodes, we only ever need 1 set of internal buffers, as the hardware runs
> > > synchronous with the requests and IPA.
> > >
> > > For Unicam nodes, allocate a minimum for 4 buffers (exported + internal), but
> > > also require at least 1 internal buffer.
> > >
> > > Signed-off-by: Naushir Patuck <naush at raspberrypi.com>
> > > ---
> > >  .../pipeline/raspberrypi/raspberrypi.cpp      | 44 ++++++++++++++-----
> > >  1 file changed, 33 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp b/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> > > index 11d3c2b120dd..5f0f00aacd59 100644
> > > --- a/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> > > +++ b/src/libcamera/pipeline/raspberrypi/raspberrypi.cpp
> > > @@ -1211,21 +1211,43 @@ int PipelineHandlerRPi::queueAllBuffers(Camera *camera)
> > >  int PipelineHandlerRPi::prepareBuffers(Camera *camera)
> > >  {
> > >       RPiCameraData *data = cameraData(camera);
> > > +     unsigned int numBuffers;
> > I'd name the variable numRawBuffers.
> >
> > > +     bool rawStream = false;
> > >       int ret;
> > >
> > > -     /*
> > > -      * Decide how many internal buffers to allocate. For now, simply look
> > > -      * at how many external buffers will be provided. We'll need to improve
> > > -      * this logic. However, we really must have all streams allocate the same
> > > -      * number of buffers to simplify error handling in queueRequestDevice().
> >
> > Does the error logic still work after this change ?
> 
> Yes, the comment above has been bit-rotted, and this is handled correctly
> now.
> 
> > > -      */
> > > -     unsigned int maxBuffers = 0;
> > > -     for (const Stream *s : camera->streams())
> > > -             if (static_cast<const RPi::Stream *>(s)->isExternal())
> > > -                     maxBuffers = std::max(maxBuffers, s->configuration().bufferCount);
> > > +     for (Stream *s : camera->streams()) {
> > > +             if (isRaw(s->configuration().pixelFormat)) {
> > > +                     numBuffers = s->configuration().bufferCount;
> > > +                     rawStream = true;
> > > +                     break;
> > > +             }
> > > +     }
> > >
> > > +     /* Decide how many internal buffers to allocate. */
> > >       for (auto const stream : data->streams_) {
> > > -             ret = stream->prepareBuffers(maxBuffers);
> >
> > And here add a numBuffers local variable.
> >
> > > +             if (stream == &data->unicam_[Unicam::Image] ||
> > > +                 stream == &data->unicam_[Unicam::Embedded]) {
> > > +                     /*
> > > +                      * For Unicam, allocate a minimum of 4 buffers as we want
> > > +                      * to avoid any frame drops. If an application has configured
> > > +                      * a RAW stream, allocate additional buffers to make up the
> > > +                      * minimum, but ensure we have at least 1 set of internal
> > > +                      * buffers to use to minimise frame drops.
> > > +                      */
> > > +                     constexpr unsigned int minBuffers = 4;
> > > +                     numBuffers = rawStream ? std::max<int>(1, minBuffers - numBuffers)
> > > +                                            : minBuffers;
> >
> > You could initialize numRawBuffers to 0 and drop rawStream, as if
> > !rawStream, then numRawBuffers will be equal to 0, so
> >
> >                         numBuffers = std::max<int>(1, minBuffers - numRawBuffers);
> >
> > would do the right thing.
> >
> > > +             } else {
> > > +                     /*
> > > +                      * Since the ISP runs synchronous with the IPA and requests,
> > > +                      * we only ever need one set of internal buffers. Any buffers
> > > +                      * the application wants to hold onto will already be exported
> > > +                      * through PipelineHandlerRPi::exportFrameBuffers().
> > > +                      */
> > > +                     numBuffers = 1;
> > > +             }
> > > +
> > > +             ret = stream->prepareBuffers(numBuffers);
> >
> > I have a hard time follow this. Before the patch, the pipeline handler
> > calls prepareBuffers() with the number of buffers requested by the
> > application. Now it calls it with a number of internal buffers only,
> > without any change to the prepareBuffers() function itself. This would
> > benefit from an explanation in the commit message, as it's not clear why
> > it's correct.
> 
> For clarification, prepareBuffers() would (and still will) allocate internal use
> buffers, and call importBuffers() to allocate the buffer cache with the internal
> + user requested buffer count. Buffer allocations for the user requested buffer
> count goes through PipelineHandler::exportFrameBuffers().

I'm looking at prepareBuffers():

int Stream::prepareBuffers(unsigned int count)
{
	int ret;

	if (!importOnly_) {
		if (count) {
			/* Export some frame buffers for internal use. */
			ret = dev_->exportBuffers(count, &internalBuffers_);
			if (ret < 0)
				return ret;

			/* Add these exported buffers to the internal/external buffer list. */
			setExportedBuffers(&internalBuffers_);

			/* Add these buffers to the queue of internal usable buffers. */
			for (auto const &buffer : internalBuffers_)
				availableBuffers_.push(buffer.get());
		}

		/* We must import all internal/external exported buffers. */
		count = bufferMap_.size();
	}

	return dev_->importBuffers(count);
}

In the !importOnly_ case, we add count buffers to bufferMap_ by a call
to setExportedBuffers(). setExportedBuffers() is also called by
PipelineHandlerRPi::exportFrameBuffers(), so we currently add
max(s->configuration().bufferCount) buffers to any buffers already
exported by the application. This certainly overallocates, as we
essentially allocations bufferCount * 2 buffers.

With this patch, overallocation is reduced. However, there's no
requirement for applications to use the FrameBufferAllocator (which ends
up calling PipelineHandlerRPi::exportFrameBuffers()) to allocate
buffers. If an application allocates buffers through different means
(for instance exporting them from a display device), then we will call
dev_->importBuffers() with a very low count.

> So this change (somewhat) decouples the user count from the internal count.
> I will clarify this in the commit message.
> 
> > Furthermore, what will happen if an application requests 4 buffers for
> > the raw stream and any number of buffers for a processed streams, and
> > repeatedly queues requests with no buffer for the raw stream ? It seems
> > to me that you'll now allocate a single extra buffer for the raw stream,
> > which won't be enough to keep the unicam queue running.
> 
> This is a tricky one.  Without any hints from the application as to what it intends
> to do, I have to balance memory usage with performance.  You are right that
> with only a single internal buffer allocated, the performance may be degraded
> if the application does not provide buffers through the Request.  However,
> If the application has allocated 4x 18Mbyte buffers (12mpix 12-bit), I really don't
> want to be allocating 2x more of them when they possibly may never be used
> (assuming the application is circulating all these buffers constantly). I would hope
> application writers would be using those buffers sensibly :-)
> 
> Some of our platforms have very small CMA heaps to work with, memory will
> get very tight for these use-cases, so I do want to limit memory usage at the
> expense of performance.  Some hints from the application on how it intends to
> use buffers might help here with this balance.

I agree with you on the problem statement, but I'm wondering if we
shouldn't start by implementing the safe option to avoid frame drops,
and then improve performance on top (possibly with a hint API).

> David, do you think we can bump the internal allocation up to 2 buffers?
> 
> > >               if (ret < 0)
> > >                       return ret;
> > >       }

-- 
Regards,

Laurent Pinchart


More information about the libcamera-devel mailing list