[libcamera-devel] Issue allocating many frame buffers on RasPi
Jacopo Mondi
jacopo at jmondi.org
Tue Jul 5 09:56:27 CEST 2022
Hi Alan,
On Mon, Jul 04, 2022 at 03:16:22PM -0400, Alan W Szlosek Jr wrote:
> Thanks Naush and Jacopo,
>
> I'm trying to allocate lots of buffers to avoid unnecessary data
> copying. And I'm hoping to support high frame rates of 30 fps as well
> so I'll ultimately need to allocate around 60 frame buffers. Basically
> I plan to have a circular buffer of frame buffers, spanning at least a
> second or two. On slower machines (like Pi Zeros) I'll likely run a
> motion detection algorithm every second. On RasPis with more cores
> I'll probably run it 3 times a second. When motion is detected I'll
> transcode the frames to h264 using the hardware encoder and save to a
> file. Hopefully that explains why I want to allocate so many buffers
> ahead of time.
So basically you want to buffer 1 or 2 seconds of video so that you can
periodically run an algorithm on it.
I would start by asking whether your algorithm and your use case really
require a picture every 33ms (or 16ms for the 60FPS use case), but I have
no idea what kind of motion you're tracking, so this might be perfectly
legit.
Second, at the risk of saying something imprecise as I don't know
the platform that well, you should consider whether allocating so much
memory in the video device is a good idea.
Buffers are generally allocated by V4L2 drivers using the videobuf2
contiguous allocator, which can use the CMA allocator as a backend. CMA
has a configurable number of zones which are reserved by the kernel
for the purpose of allocating chunks of contiguous memory.
The CMA area size is configurable via a Kconfig option or a kernel
parameter but it's generally limited. I have no idea what the size is on
RPi to be honest, nor how much it can be increased.
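To put rough numbers on the request, here is a back-of-the-envelope
sketch of how much memory a pool of YUV420 buffers needs at the
1640x922 size from your configuration. The helper names are mine, and it
assumes tightly packed planes with no stride or size alignment, so the
real figures from the driver will likely be somewhat higher.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>

// Bytes for one tightly packed planar YUV420 frame: a full-resolution
// Y plane plus two quarter-resolution chroma planes (1.5 bytes per pixel).
// Assumption: no stride or size alignment, which real drivers usually add.
constexpr std::uint64_t yuv420FrameBytes(std::uint64_t width, std::uint64_t height)
{
	return width * height * 3 / 2;
}

// Total size of a pool of 'count' such frames.
constexpr std::uint64_t poolBytes(std::uint64_t width, std::uint64_t height,
				  unsigned int count)
{
	return yuv420FrameBytes(width, height) * count;
}

// Print the pool size in MiB for a given buffer count.
void printPool(std::uint64_t width, std::uint64_t height, unsigned int count)
{
	// 4:2:0 subsampling halves both dimensions, so expect even sizes.
	assert(width % 2 == 0 && height % 2 == 0);
	std::cout << count << " buffers of " << width << "x" << height
		  << " YUV420: "
		  << poolBytes(width, height, count) / (1024.0 * 1024.0)
		  << " MiB" << std::endl;
}
```

Calling printPool(1640, 922, 20) and printPool(1640, 922, 60) gives
roughly 43 MiB and 130 MiB respectively, and that is before counting the
buffers the pipeline allocates for its own internal use.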
I would instead explore the idea of allocating buffers via a different
allocator and importing them into the video device, specifically by using
the dmabuf-heaps allocator which, as far as I know, allows you to reserve
a chunk of physical memory for the purpose. One could argue that CMA
does the same, but by doing this you would have full control over the
memory area you're using for your buffer pool and would not be
competing for it with other system components.
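As a minimal sketch of what allocating from a dmabuf heap looks like
from userspace, the function below uses the DMA_HEAP_IOCTL_ALLOC ioctl
from <linux/dma-heap.h>. The function name allocDmaBuf is mine, and the
heap path is an assumption: which heaps exist under /dev/dma_heap
(for example "system" or "linux,cma") depends on the kernel and the
platform, so check what your Pi actually exposes. Wrapping the returned
descriptor in a FrameBuffer and handing it to libcamera is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <linux/dma-heap.h>
#include <sys/ioctl.h>
#include <unistd.h>

// Allocate one dmabuf of 'size' bytes from the heap device at 'heapPath'.
// Returns a dmabuf file descriptor owned by the caller, or -1 on failure
// (heap missing, no permission, or allocation refused).
int allocDmaBuf(const char *heapPath, std::size_t size)
{
	assert(heapPath != nullptr);

	int heapFd = open(heapPath, O_RDWR | O_CLOEXEC);
	if (heapFd < 0)
		return -1;

	struct dma_heap_allocation_data alloc;
	std::memset(&alloc, 0, sizeof(alloc));
	alloc.len = size;
	alloc.fd_flags = O_RDWR | O_CLOEXEC;

	int ret = ioctl(heapFd, DMA_HEAP_IOCTL_ALLOC, &alloc);
	close(heapFd);
	if (ret < 0)
		return -1;

	return static_cast<int>(alloc.fd);
}
```

Each successful call yields one independent dmabuf, so a pool of N
buffers would be N calls, with every returned descriptor closed by the
caller once the buffer is no longer needed.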
>
> Jacopo, to answer your question about dmesg output .... No, there's
> nothing in dmesg, see the following:
Enabling CONFIG_CMA_DEBUG in your kernel config might help, although
I would expect relevant messages like "you're out of memory" to be
there anyway. Can you double-check with 'dmesg | grep -i cma'?
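Besides dmesg, kernels built with CMA support report the size of the CMA
area and how much of it is free as the CmaTotal and CmaFree fields of
/proc/meminfo. Below is a small sketch that reads them, so you could
print the values before and after allocating buffers. The function names
are mine, and it returns -1 for a field the kernel does not report.

```cpp
#include <cassert>
#include <fstream>
#include <initializer_list>
#include <iostream>
#include <sstream>
#include <string>

// Return the value in kB of a /proc/meminfo-style field such as "CmaFree",
// or -1 if the field is not present in the stream.
long meminfoFieldKb(std::istream &in, const std::string &field)
{
	assert(!field.empty());

	std::string line;
	while (std::getline(in, line)) {
		std::istringstream ls(line);
		std::string key;
		long value = -1;
		// Lines look like "CmaFree:   12345 kB".
		if ((ls >> key >> value) && key == field + ":")
			return value;
	}
	return -1;
}

// Print CmaTotal and CmaFree as reported by the running kernel.
void printCmaUsage()
{
	for (const char *field : { "CmaTotal", "CmaFree" }) {
		std::ifstream meminfo("/proc/meminfo");
		std::cout << field << ": " << meminfoFieldKb(meminfo, field)
			  << " kB" << std::endl;
	}
}
```

If CmaFree drops close to zero right before the "Failed to allocate
buffers" error, that would be a strong hint that the CMA area is
exhausted.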
>
> [ 17.904193] NET: Registered PF_BLUETOOTH protocol family
> [ 17.904202] Bluetooth: HCI device and connection manager initialized
> [ 17.904227] Bluetooth: HCI socket layer initialized
> [ 17.904240] Bluetooth: L2CAP socket layer initialized
> [ 17.904263] Bluetooth: SCO socket layer initialized
> [ 17.923327] Bluetooth: HCI UART driver ver 2.3
> [ 17.923355] Bluetooth: HCI UART protocol H4 registered
> [ 17.923440] Bluetooth: HCI UART protocol Three-wire (H5) registered
> [ 17.923694] Bluetooth: HCI UART protocol Broadcom registered
> [ 18.541155] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
> [ 18.541179] Bluetooth: BNEP filters: protocol multicast
> [ 18.541200] Bluetooth: BNEP socket layer initialized
> [ 18.561456] NET: Registered PF_ALG protocol family
> [ 20.449306] vc4-drm soc:gpu: [drm] Cannot find any crtc or sizes
> [ 33.761159] cam-dummy-reg: disabling
>
> And Naush, when you say CMA do you mean the Contiguous Memory
> Allocator? Does this mean that when I ask for 20 buffers it's trying
> to allocate 1 contiguous block of memory behind the scenes, resulting
> in 1 dmabuf file descriptor? If so, it sounds like I should somehow
> ask for smaller, separate dmabuf blocks to cover what I need. What do
> you think? Is that easily doable?
I presume the C in CMA means that the area from which memory pages are
allocated is contiguous, but if you ask for 20 buffers you should
get 20 dmabuf identifiers.
>
> Thanks to you both for your help.
>
> On Mon, Jul 4, 2022 at 4:40 AM Naushir Patuck <naush at raspberrypi.com> wrote:
> >
> > Hi Alan,
> >
> > On Mon, 4 Jul 2022 at 09:31, Jacopo Mondi via libcamera-devel <libcamera-devel at lists.libcamera.org> wrote:
> >>
> >> Hi Alan,
> >>
> >> On Sat, Jul 02, 2022 at 07:48:48AM -0400, Alan W Szlosek Jr via libcamera-devel wrote:
> >> > Hi libcamera, I'm creating a security camera app for RasPis and I'm
> >> > having trouble allocating 20 frame buffers (would like to alloc even
> >> > more). Do you know why? Do you have suggestions? I'm currently testing
> >> > on a Raspberry Pi 3B+.
> >> >
> >>
> >> Can I ask why you need to allocate that many buffers in the video
> >> device ?
> >
> >
> > Snap. I was going to ask the same question. All frame buffers are allocated out
> > of CMA space. 20 x 2MP YUV420 buffers is approx 60 MBytes only for a single
> > set of buffers. Typically, you ought to get away with < 10 buffers for most video
> > use cases.
> >
> > Naush
> >
> >
> >>
> >>
> >> > This is the output I'm getting. The return value from allocate() seems
> >> > to imply that everything is fine ("Allocated 20 buffers for stream")
> >> > when it's not fine behind the scenes.
> >> >
> >> > [1:23:50.594602178] [1217] INFO Camera camera_manager.cpp:293
> >> > libcamera v0.0.0+3544-22656360
> >> > [1:23:50.657034054] [1218] WARN RPI raspberrypi.cpp:1241 Mismatch
> >> > between Unicam and CamHelper for embedded data usage!
> >> > [1:23:50.659149325] [1218] INFO RPI raspberrypi.cpp:1356 Registered
> >> > camera /base/soc/i2c0mux/i2c@1/imx219@10 to Unicam device /dev/media3
> >> > and ISP device /dev/media0
> >> > [1:23:50.660510009] [1217] INFO Camera camera.cpp:1029 configuring
> >> > streams: (0) 1640x922-YUV420
> >> > [1:23:50.661246471] [1218] INFO RPI raspberrypi.cpp:760 Sensor:
> >> > /base/soc/i2c0mux/i2c@1/imx219@10 - Selected sensor format:
> >> > 1920x1080-SBGGR10_1X10 - Selected unicam format: 1920x1080-pBAA
> >> > Allocated 20 buffers for stream
> >> > [1:23:50.733980221] [1218] ERROR V4L2 v4l2_videodevice.cpp:1218
> >> > /dev/video14[14:cap]: Not enough buffers provided by V4L2VideoDevice
> >> > [1:23:50.734467203] [1218] ERROR RPI raspberrypi.cpp:1008 Failed to
> >> > allocate buffers
> >>
> >> This seems to happen when the pipeline starts and tries to allocate
> >> buffers for its internal usage. Might it be you simply run out of
> >> available memory ?
> >>
> >> Is there anything on your dmesg output that might suggest that, like a
> >> message from your CMA allocator ?
> >>
> >> Can you try with allocating an increasing number of buffers until you
> >> don't get to the failure limit ?
> >>
> >> > [1:23:50.739962387] [1217] ERROR Camera camera.cpp:528 Camera in
> >> > Configured state trying queueRequest() requiring state Running
> >> > [1:23:50.740078898] [1217] ERROR Camera camera.cpp:528 Camera in
> >> > Configured state trying queueRequest() requiring state Running
> >> >
> >> > Here's how I'm compiling:
> >> >
> >> > clang++ -g -std=c++17 -o scaffold \
> >> > -I /usr/include/libcamera \
> >> > -L /usr/lib/aarch64-linux-gnu \
> >> > -l camera -l camera-base \
> >> > scaffold.cpp
> >> >
> >> > And here's the code I'm using. Thank you!
> >> >
> >> > #include <iomanip>
> >> > #include <iostream>
> >> > #include <memory>
> >> > #include <thread>
> >> >
> >> > #include <libcamera/libcamera.h>
> >> >
> >> > using namespace libcamera;
> >> >
> >> > static std::shared_ptr<Camera> camera;
> >> >
> >> > time_t previousSeconds = 0;
> >> > int frames = 0;
> >> > static void requestComplete(Request *request)
> >> > {
> >> > std::unique_ptr<Request> request2;
> >> > if (request->status() == Request::RequestCancelled)
> >> > return;
> >> > const std::map<const Stream *, FrameBuffer *> &buffers = request->buffers();
> >> >
> >> > request->reuse(Request::ReuseBuffers);
> >> > camera->queueRequest(request);
> >> >
> >> > struct timespec delta;
> >> > clock_gettime(CLOCK_REALTIME, &delta);
> >> > if (previousSeconds == delta.tv_sec) {
> >> > frames++;
> >> > } else {
> >> > fprintf(stdout, "Frames: %d\n", frames);
> >> > frames = 1;
> >> > previousSeconds = delta.tv_sec;
> >> > }
> >> > }
> >> >
> >> > int main()
> >> > {
> >> > std::unique_ptr<CameraManager> cm = std::make_unique<CameraManager>();
> >> > cm->start();
> >> >
> >> > if (cm->cameras().empty()) {
> >> > std::cout << "No cameras were identified on the system."
> >> > << std::endl;
> >> > cm->stop();
> >> > return EXIT_FAILURE;
> >> > }
> >> >
> >> > std::string cameraId = cm->cameras()[0]->id();
> >> > camera = cm->get(cameraId);
> >> >
> >> > camera->acquire();
> >> >
> >> > // VideoRecording
> >> > std::unique_ptr<CameraConfiguration> config =
> >> > camera->generateConfiguration( { StreamRole::VideoRecording } );
> >> > StreamConfiguration &streamConfig = config->at(0);
> >> > streamConfig.size.width = 1640; //640;
> >> > streamConfig.size.height = 922; //480;
> >> > // This seems to default to 4, but we want to queue buffers for post
> >> > // processing, so we need to raise it.
> >> > // 10 works ... oddly, but 20 fails behind the scenes. doesn't appear
> >> > // to be an error we can catch
> >> > streamConfig.bufferCount = 20;
> >> >
> >> > // TODO: check return value of this
> >> > CameraConfiguration::Status status = config->validate();
> >> > if (status == CameraConfiguration::Invalid) {
> >> > fprintf(stderr, "Camera Configuration is invalid\n");
> >> > } else if (status == CameraConfiguration::Adjusted) {
> >> > fprintf(stderr, "Camera Configuration was invalid and has been adjusted\n");
> >> > }
> >> >
> >> > camera->configure(config.get());
> >> >
> >> > FrameBufferAllocator *allocator = new FrameBufferAllocator(camera);
> >> >
> >> > for (StreamConfiguration &cfg : *config) {
> >> > // TODO: it's possible we'll need our own allocator for raspi,
> >> > // so we can enqueue many frames for processing
> >> > int ret = allocator->allocate(cfg.stream());
> >> > // This error handling doesn't catch a failure to allocate 20 buffers
> >> > if (ret < 0) {
> >> > std::cerr << "Can't allocate buffers" << std::endl;
> >> > return -ENOMEM;
> >> > }
> >> >
> >> > size_t allocated = allocator->buffers(cfg.stream()).size();
> >> > std::cout << "Allocated " << allocated << " buffers for stream" << std::endl;
> >> > }
> >> >
> >> >
> >> > Stream *stream = streamConfig.stream();
> >> > const std::vector<std::unique_ptr<FrameBuffer>> &buffers =
> >> > allocator->buffers(stream);
> >> > std::vector<std::unique_ptr<Request>> requests;
> >> >
> >> > for (unsigned int i = 0; i < buffers.size(); ++i) {
> >> > std::unique_ptr<Request> request = camera->createRequest();
> >> > if (!request)
> >> > {
> >> > std::cerr << "Can't create request" << std::endl;
> >> > return -ENOMEM;
> >> > }
> >> >
> >> > const std::unique_ptr<FrameBuffer> &buffer = buffers[i];
> >> > int ret = request->addBuffer(stream, buffer.get());
> >> > if (ret < 0)
> >> > {
> >> > std::cerr << "Can't set buffer for request"
> >> > << std::endl;
> >> > return ret;
> >> > }
> >> >
> >> > requests.push_back(std::move(request));
> >> > }
> >> >
> >> > camera->requestCompleted.connect(requestComplete);
> >> >
> >> > // sets fps (via frame duration limits)
> >> > // TODO: create ControlList and move to global var
> >> > // TODO: is there a raspi-specific implementation of this?
> >> > libcamera::ControlList controls(libcamera::controls::controls);
> >> > int framerate = 30;
> >> > int64_t frame_time = 1000000 / framerate; // in microseconds
> >> > controls.set(libcamera::controls::FrameDurationLimits, {
> >> > frame_time, frame_time });
> >> >
> >> > camera->start(&controls);
> >> > for (auto &request : requests)
> >> > camera->queueRequest(request.get());
> >> >
> >> > //60 * 60 * 24 * 7; // days
> >> > int duration = 10;
> >> >
> >> > for (int i = 0; i < duration; i++) {
> >> > std::cout << "Sleeping" << std::endl;
> >> > std::this_thread::sleep_for(std::chrono::milliseconds(1000));
> >> > }
> >> >
> >> >
> >> > return 0;
> >> > }
> >> >
> >> > --
> >> > Alan Szlosek
>
>
>
> --
> Alan Szlosek