[libcamera-devel] [RFC 1/2] android: camera_worker: Introduce CameraWorker

Sat Oct 10 00:51:34 CEST 2020

Hi Jacopo,

On Fri, Oct 09, 2020 at 11:31:59AM +0200, Jacopo Mondi wrote:
> On Thu, Oct 08, 2020 at 11:33:26PM +0300, Laurent Pinchart wrote:
> > On Thu, Oct 08, 2020 at 10:36:24AM +0200, Jacopo Mondi wrote:
> > > On Thu, Oct 08, 2020 at 07:05:22AM +0300, Laurent Pinchart wrote:
> > > > On Tue, Oct 06, 2020 at 06:06:36PM +0200, Jacopo Mondi wrote:
> > > > > The CameraWork class creates a Worker instance and runs on an internal
> > > >
> > > > s/CameraWork/CameraWorker/
> > > >
> > > > > thread. The worker waits on a set of fences before queueing a capture
> > > > > requests to the the libcamera::Camera.
> > > >
> > > > s/the the/the/
> > > >
> > > > You may want to copy some of the text from the cover letter to explain
> > > > why this is needed.
> > > >
> > > > > Signed-off-by: Jacopo Mondi <jacopo at jmondi.org>
> > > > > ---
> > > > >  src/android/camera_worker.cpp | 67 +++++++++++++++++++++++++++++++
> > > > >  src/android/camera_worker.h   | 75 +++++++++++++++++++++++++++++++++++
> > > > >  src/android/meson.build       |  1 +
> > > > >  3 files changed, 143 insertions(+)
> > > > >  create mode 100644 src/android/camera_worker.cpp
> > > > >  create mode 100644 src/android/camera_worker.h
> > > > >
> > > > > diff --git a/src/android/camera_worker.cpp b/src/android/camera_worker.cpp
> > > > > new file mode 100644
> > > > > index 000000000000..2bb94775a195
> > > > > --- /dev/null
> > > > > +++ b/src/android/camera_worker.cpp
> > > > > @@ -0,0 +1,67 @@
> > > > > +/* SPDX-License-Identifier: LGPL-2.1-or-later */
> > > > > +/*
> > > > > + * Copyright (C) 2020, Google Inc.
> > > > > + *
> > > > > + * camera_worker.cpp - Process capture request on behalf of the Camera HAL
> > > > > + */
> > > > > +
> > > > > +#include "camera_worker.h"
> > > > > +
> > > > > +#include <errno.h>
> > > > > +#include <sys/poll.h>
> > > > > +
> > > > > +#include "camera_device.h"
> > > > > +
> > > > > +using namespace libcamera;
> > > > > +
> > > > > +CameraWorker::CameraWorker(const std::shared_ptr<Camera> &camera)
> > > > > +	: camera_(camera), worker_(camera_.get())
> > > > > +{
> > > > > +	worker_.moveToThread(&thread_);
> > > > > +	thread_.start();
> > > > > +}
> > > >
> > > > You need to stop the thread before deleting it. I would actually expose
> > > > start and stop through the API, as there's no need to have a thread
> > > > running when the camera is closed.
> > >
> > > Good idea!
> > > I'll add start() and stop()
> > >
> > > > > +
> > > > > +void CameraWorker::queueRequest(std::unique_ptr<CaptureRequest> request)
> > > > > +{
> > > > > +	CaptureRequest *req = request.release();
> > > > > +	worker_.invokeMethod(&Worker::processRequest, ConnectionTypeQueued, req);
> > > >
> > > > Another important reason to stop the thread manually is to make sure to
> > > > synchronize its message queue, as we want to guarantee that all requests
> > > > will be freed.
> > >
> > > mmm, so Thread exposes a start() and an exit() functions.
> > >
> > > start() runs the underlying thread and make the connected event dispatcher
> > > process events. exit() interrupts the event dispatcher only, then the
> > > thread will be actually stopped at destruction time.
> > >
> > > As we don't run events in the notifier, but rather poll manually here,
> > > is there any value in calling exit() ?
> >
> > Unless the Thread::run() method gets overridden (which isn't he case
> > here), it calls Thread::exec(), which runs an event loop. Thread::exit()
> > instructs the event loop to exit, which is required, otherwise the
> > thread will never stop and Thread::wait() will block forever.
> > Thread::wait() is also mandatory to call before destroying the Thread
> > object.
> 
> Right, we poll manually, but we dispatch messages which indeed run on
> the event loop
> 
> > We can't override Thread::run() to not call exec(), as we rely on the
> > event loop. That's how the messages used by Object::invokeMethod() are
> > dispatched.
> >
> > The conclusion is that we need to call exit() and wait() before
> > destroying the thread (this should happen in CameraWorker::stop()). We
> > also need to make sure that all messages posted to the thread's message
> > queue by worker_.invokeMethod() are handled before the thread is
> > stopped. There are two sides to it:
> >
> > - We need to ensure that no messages get posted after calling
> >   Thread::exit(). This means not calling CameraWorker::queueRequest()
> >   after CameraWorker::stop(). Given that both functions are called from
> >   the camera service thread there should be no race condition, and given
> >   that stop() will be called when the camera is closed, queueRequest()
> >   won't be called anymore. I think we're safe here.
> 
> I agree we are.
> 
> >
> > - I believe there's a race condition in the thread class. Even if
> >   Thread::exit() and Thread::postMessage() (the latter called from
> >   Object::invokeMethod) are call from a single thread, the message could
> >   be added to the queue, and exit() could instruct Thread::exect() to
> >   return before the event dispatcher gets a change to dispatch messages.
> >   Thread::exec() contains
> >
> > 	while (!data_->exit_.load(std::memory_order_acquire))
> > 		dispatcher->processEvents();
> >
> >   and EventDispatcherPoll::processEvents() calls
> >   Thread::dispatchMessages() at the very beginning, before sleeping.
> >
> >   We don't implement flush() in the HAL, which I think would be related
> >   to the solution to this issue.
> 
> flush seems to be meant to be called before a new configuration
> 
>      * on the given device. The framework will use this to dump all state as
>      * quickly as possible in order to prepare for a configure_streams() call.
> 
> Not in the teardown path.

Indeed. I however wonder if in practice there will always be a flush()
before close().

> If I got you right we might hit a small (not that small actually)
> windown where a message is added to the thread's queue but never
> processed as a Thread::exit() just after the queueing happens
> interrupts the dispatcher for good.

Correct.

> As far as the HAL is concerned that pending request should be signaled
> as ERROR before the HAL::stop() completes ? In this case, would
> instrumenting wait() with an additional message queue swip  help ? We
> can emit a signal for every non-handled message..

How does that work on the camera service side though ? Does it expect
close() to block until all requests are returned ? Can it even support
request completion callbacks while close() is in progress ? The android
camera HAL API doesn't have an explicit stop(), it's pretty annoying :-S

> >   Am I overthinking this, or is it a real issue ?
> >
> > > > > +}
> > > > > +
> > > > > +int CameraWorker::Worker::waitFence(int fence)
> > > > > +{
> > > > > +	struct pollfd fds = { fence, POLLIN, 0 };
> > > > > +	constexpr unsigned int timeoutMs = 300;
> > > >
> > > > That's rather arbitrary, I wonder how we should pick the timeout, but
> > > > for now it should be good enough.
> > >
> > > You got me... Copyed from the Rockchip HAL implementation.
> > > Quite arbitrary indeed!
> >
> > Maybe a \todo would help remembering the task ?
> >
> > > > > +	int ret;
> > > > > +
> > > > > +	do {
> > > > > +		ret = poll(&fds, 1, timeoutMs);
> > > > > +		if (ret == 0)
> > > > > +			return -ETIME;
> > > > > +
> > > > > +		if (ret > 0) {
> > > > > +			if (fds.revents & (POLLERR | POLLNVAL))
> > > > > +				return -EINVAL;
> > > > > +
> > > > > +			return 0;
> > > > > +		}
> > > > > +	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));
> > > > > +
> > > > > +	return ret;
> > > > > +}
> > > > > +
> > > > > +void CameraWorker::Worker::processRequest(CaptureRequest *request)
> > > > > +{
> > > > > +	for (int fence : request->acquireFences_) {
> > > > > +		if (fence == -1)
> > > > > +			continue;
> > > > > +
> > > > > +		int ret = waitFence(fence);
> > > >
> > > > It would be possible to optimize this by waiting on all the fences at
> > > > the same time to minimize syscalls when multiple fences are ready at the
> > > > same time, but that may not be worth it.
> > >
> > > I considered several options here, including using a custom event
> > > notifier that accepts a set of fds instead of a single one to exploit
> > > our notifier/dispatcher infrastructure. Unfortunately, that would
> > > require manualy tracking of the number of completed fences here, or
> > > the implementation of a custom dispatcher that works on a set of fds
> > > and only signal completion when all of them are done.
> > >
> > > At the same time I considered instrumenting poll to work with multiple
> > > fds instead of going through a syscall for each one of them. In this
> > > case too, manual tracking of the number of completed requests would be
> > > required, and once I considered that we deal with up to 3 buffers, and
> > > most of the time with a single one, I considered writing such an
> > > error-prone and quite-harsh to debug code not worth it.
> >
> > Ack.
> >
> > > > > +		if (ret < 0)
> > > > > +			return;
> > > > > +
> > > > > +		close(fence);
> > > > > +	}
> > > > > +
> > > > > +	camera_->queueRequest(request->request_);
> > > > > +	delete request;
> > > > > +}
> > > > > diff --git a/src/android/camera_worker.h b/src/android/camera_worker.h
> > > > > new file mode 100644
> > > > > index 000000000000..24b176d6269c
> > > > > --- /dev/null
> > > > > +++ b/src/android/camera_worker.h
> > > > > @@ -0,0 +1,75 @@
> > > > > +/* SPDX-License-Identifier: LGPL-2.1-or-later */
> > > > > +/*
> > > > > + * Copyright (C) 2020, Google Inc.
> > > > > + *
> > > > > + * camera_worker.h - Process capture request on behalf of the Camera HAL
> > > > > + */
> > > > > +#ifndef __ANDROID_CAMERA_WORKER_H__
> > > > > +#define __ANDROID_CAMERA_WORKER_H__
> > > > > +
> > > > > +#include <memory>
> > > > > +
> > > > > +#include <libcamera/buffer.h>
> > > > > +#include <libcamera/camera.h>
> > > > > +#include <libcamera/object.h>
> > > > > +#include <libcamera/request.h>
> > > > > +#include <libcamera/stream.h>
> > > > > +
> > > > > +#include "libcamera/internal/thread.h"
> > > > > +
> > > > > +class CameraDevice;
> > > > > +
> > > > > +struct CaptureRequest {
> > > > > +	CaptureRequest(libcamera::Request *request)
> > > > > +		: request_(request)
> > > > > +	{
> > > > > +	}
> > > > > +
> > > > > +	void addBuffer(libcamera::Stream *stream,
> > > > > +		       libcamera::FrameBuffer *buffer, int fence)
> > > > > +	{
> > > > > +		request_->addBuffer(stream, buffer);
> > > > > +		acquireFences_.push_back(fence);
> > > > > +	}
> > > >
> > > > The API isn't amazing, but that's because the best solution will be to
> > > > add fences support to the FrameBuffer class. Until we do so, this
> > > > workaround is fine.
> > >
> > > To be honest I don't mind the intruduction of CaptureRequest in the
> > > CameraDevice, the rest I agree, it's awful.
> >
> > It's not awful :-) It's just sub-optimal, and that's fine. I think
> > you've struct the right balance, the implementation does the job, it's
> > clean and lean, not over-designed, and probably as good as it can get
> > until this gets moved to the libcamera core.
> >
> > > A proper library-wide solution is needed, but this closes a gap and
> > > does not introduce any techincal debt
> > >
> > > So I do as well think that:
> > >
> > > > Overall I think this is totally fine for now.

-- 
Regards,

Laurent Pinchart