[libcamera-devel] [RFC 1/2] android: camera_worker: Introduce CameraWorker

Fri Oct 9 11:31:59 CEST 2020

Hi Laurent,

On Thu, Oct 08, 2020 at 11:33:26PM +0300, Laurent Pinchart wrote:
> Hi Jacopo,
>
> On Thu, Oct 08, 2020 at 10:36:24AM +0200, Jacopo Mondi wrote:
> > On Thu, Oct 08, 2020 at 07:05:22AM +0300, Laurent Pinchart wrote:
> > > On Tue, Oct 06, 2020 at 06:06:36PM +0200, Jacopo Mondi wrote:
> > > > The CameraWork class creates a Worker instance and runs on an internal
> > >
> > > s/CameraWork/CameraWorker/
> > >
> > > > thread. The worker waits on a set of fences before queueing a capture
> > > > requests to the the libcamera::Camera.
> > >
> > > s/the the/the/
> > >
> > > You may want to copy some of the text from the cover letter to explain
> > > why this is needed.
> > >
> > > > Signed-off-by: Jacopo Mondi <jacopo at jmondi.org>
> > > > ---
> > > >  src/android/camera_worker.cpp | 67 +++++++++++++++++++++++++++++++
> > > >  src/android/camera_worker.h   | 75 +++++++++++++++++++++++++++++++++++
> > > >  src/android/meson.build       |  1 +
> > > >  3 files changed, 143 insertions(+)
> > > >  create mode 100644 src/android/camera_worker.cpp
> > > >  create mode 100644 src/android/camera_worker.h
> > > >
> > > > diff --git a/src/android/camera_worker.cpp b/src/android/camera_worker.cpp
> > > > new file mode 100644
> > > > index 000000000000..2bb94775a195
> > > > --- /dev/null
> > > > +++ b/src/android/camera_worker.cpp
> > > > @@ -0,0 +1,67 @@
> > > > +/* SPDX-License-Identifier: LGPL-2.1-or-later */
> > > > +/*
> > > > + * Copyright (C) 2020, Google Inc.
> > > > + *
> > > > + * camera_worker.cpp - Process capture request on behalf of the Camera HAL
> > > > + */
> > > > +
> > > > +#include "camera_worker.h"
> > > > +
> > > > +#include <errno.h>
> > > > +#include <sys/poll.h>
> > > > +
> > > > +#include "camera_device.h"
> > > > +
> > > > +using namespace libcamera;
> > > > +
> > > > +CameraWorker::CameraWorker(const std::shared_ptr<Camera> &camera)
> > > > +	: camera_(camera), worker_(camera_.get())
> > > > +{
> > > > +	worker_.moveToThread(&thread_);
> > > > +	thread_.start();
> > > > +}
> > >
> > > You need to stop the thread before deleting it. I would actually expose
> > > start and stop through the API, as there's no need to have a thread
> > > running when the camera is closed.
> >
> > Good idea!
> > I'll add start() and stop()
> >
> > > > +
> > > > +void CameraWorker::queueRequest(std::unique_ptr<CaptureRequest> request)
> > > > +{
> > > > +	CaptureRequest *req = request.release();
> > > > +	worker_.invokeMethod(&Worker::processRequest, ConnectionTypeQueued, req);
> > >
> > > Another important reason to stop the thread manually is to make sure to
> > > synchronize its message queue, as we want to guarantee that all requests
> > > will be freed.
> >
> > mmm, so Thread exposes a start() and an exit() functions.
> >
> > start() runs the underlying thread and make the connected event dispatcher
> > process events. exit() interrupts the event dispatcher only, then the
> > thread will be actually stopped at destruction time.
> >
> > As we don't run events in the notifier, but rather poll manually here,
> > is there any value in calling exit() ?
>
> Unless the Thread::run() method gets overridden (which isn't he case
> here), it calls Thread::exec(), which runs an event loop. Thread::exit()
> instructs the event loop to exit, which is required, otherwise the
> thread will never stop and Thread::wait() will block forever.
> Thread::wait() is also mandatory to call before destroying the Thread
> object.
>

Right, we poll manually, but we dispatch messages which indeed run on
the event loop

> We can't override Thread::run() to not call exec(), as we rely on the
> event loop. That's how the messages used by Object::invokeMethod() are
> dispatched.
>
> The conclusion is that we need to call exit() and wait() before
> destroying the thread (this should happen in CameraWorker::stop()). We
> also need to make sure that all messages posted to the thread's message
> queue by worker_.invokeMethod() are handled before the thread is
> stopped. There are two sides to it:
>
> - We need to ensure that no messages get posted after calling
>   Thread::exit(). This means not calling CameraWorker::queueRequest()
>   after CameraWorker::stop(). Given that both functions are called from
>   the camera service thread there should be no race condition, and given
>   that stop() will be called when the camera is closed, queueRequest()
>   won't be called anymore. I think we're safe here.

I agree we are.

>
> - I believe there's a race condition in the thread class. Even if
>   Thread::exit() and Thread::postMessage() (the latter called from
>   Object::invokeMethod) are call from a single thread, the message could
>   be added to the queue, and exit() could instruct Thread::exect() to
>   return before the event dispatcher gets a change to dispatch messages.
>   Thread::exec() contains
>
> 	while (!data_->exit_.load(std::memory_order_acquire))
> 		dispatcher->processEvents();
>
>   and EventDispatcherPoll::processEvents() calls
>   Thread::dispatchMessages() at the very beginning, before sleeping.
>
>   We don't implement flush() in the HAL, which I think would be related
>   to the solution to this issue.

flush seems to be meant to be called before a new configuration

     * on the given device. The framework will use this to dump all state as
     * quickly as possible in order to prepare for a configure_streams() call.

Not in the teardown path.

If I got you right we might hit a small (not that small actually)
windown where a message is added to the thread's queue but never
processed as a Thread::exit() just after the queueing happens
interrupts the dispatcher for good.

As far as the HAL is concerned that pending request should be signaled
as ERROR before the HAL::stop() completes ? In this case, would
instrumenting wait() with an additional message queue swip  help ? We
can emit a signal for every non-handled message..

>
>   Am I overthinking this, or is it a real issue ?
>
> > > > +}
> > > > +
> > > > +int CameraWorker::Worker::waitFence(int fence)
> > > > +{
> > > > +	struct pollfd fds = { fence, POLLIN, 0 };
> > > > +	constexpr unsigned int timeoutMs = 300;
> > >
> > > That's rather arbitrary, I wonder how we should pick the timeout, but
> > > for now it should be good enough.
> >
> > You got me... Copyed from the Rockchip HAL implementation.
> > Quite arbitrary indeed!
>
> Maybe a \todo would help remembering the task ?
>
> > > > +	int ret;
> > > > +
> > > > +	do {
> > > > +		ret = poll(&fds, 1, timeoutMs);
> > > > +		if (ret == 0)
> > > > +			return -ETIME;
> > > > +
> > > > +		if (ret > 0) {
> > > > +			if (fds.revents & (POLLERR | POLLNVAL))
> > > > +				return -EINVAL;
> > > > +
> > > > +			return 0;
> > > > +		}
> > > > +	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));
> > > > +
> > > > +	return ret;
> > > > +}
> > > > +
> > > > +void CameraWorker::Worker::processRequest(CaptureRequest *request)
> > > > +{
> > > > +	for (int fence : request->acquireFences_) {
> > > > +		if (fence == -1)
> > > > +			continue;
> > > > +
> > > > +		int ret = waitFence(fence);
> > >
> > > It would be possible to optimize this by waiting on all the fences at
> > > the same time to minimize syscalls when multiple fences are ready at the
> > > same time, but that may not be worth it.
> >
> > I considered several options here, including using a custom event
> > notifier that accepts a set of fds instead of a single one to exploit
> > our notifier/dispatcher infrastructure. Unfortunately, that would
> > require manualy tracking of the number of completed fences here, or
> > the implementation of a custom dispatcher that works on a set of fds
> > and only signal completion when all of them are done.
> >
> > At the same time I considered instrumenting poll to work with multiple
> > fds instead of going through a syscall for each one of them. In this
> > case too, manual tracking of the number of completed requests would be
> > required, and once I considered that we deal with up to 3 buffers, and
> > most of the time with a single one, I considered writing such an
> > error-prone and quite-harsh to debug code not worth it.
>
> Ack.
>
> > > > +		if (ret < 0)
> > > > +			return;
> > > > +
> > > > +		close(fence);
> > > > +	}
> > > > +
> > > > +	camera_->queueRequest(request->request_);
> > > > +	delete request;
> > > > +}
> > > > diff --git a/src/android/camera_worker.h b/src/android/camera_worker.h
> > > > new file mode 100644
> > > > index 000000000000..24b176d6269c
> > > > --- /dev/null
> > > > +++ b/src/android/camera_worker.h
> > > > @@ -0,0 +1,75 @@
> > > > +/* SPDX-License-Identifier: LGPL-2.1-or-later */
> > > > +/*
> > > > + * Copyright (C) 2020, Google Inc.
> > > > + *
> > > > + * camera_worker.h - Process capture request on behalf of the Camera HAL
> > > > + */
> > > > +#ifndef __ANDROID_CAMERA_WORKER_H__
> > > > +#define __ANDROID_CAMERA_WORKER_H__
> > > > +
> > > > +#include <memory>
> > > > +
> > > > +#include <libcamera/buffer.h>
> > > > +#include <libcamera/camera.h>
> > > > +#include <libcamera/object.h>
> > > > +#include <libcamera/request.h>
> > > > +#include <libcamera/stream.h>
> > > > +
> > > > +#include "libcamera/internal/thread.h"
> > > > +
> > > > +class CameraDevice;
> > > > +
> > > > +struct CaptureRequest {
> > > > +	CaptureRequest(libcamera::Request *request)
> > > > +		: request_(request)
> > > > +	{
> > > > +	}
> > > > +
> > > > +	void addBuffer(libcamera::Stream *stream,
> > > > +		       libcamera::FrameBuffer *buffer, int fence)
> > > > +	{
> > > > +		request_->addBuffer(stream, buffer);
> > > > +		acquireFences_.push_back(fence);
> > > > +	}
> > >
> > > The API isn't amazing, but that's because the best solution will be to
> > > add fences support to the FrameBuffer class. Until we do so, this
> > > workaround is fine.
> >
> > To be honest I don't mind the intruduction of CaptureRequest in the
> > CameraDevice, the rest I agree, it's awful.
>
> It's not awful :-) It's just sub-optimal, and that's fine. I think
> you've struct the right balance, the implementation does the job, it's
> clean and lean, not over-designed, and probably as good as it can get
> until this gets moved to the libcamera core.
>
> > A proper library-wide solution is needed, but this closes a gap and
> > does not introduce any techincal debt
> >
> > So I do as well think that:
> >
> > > Overall I think this is totally fine for now.
>
> --
> Regards,
>
> Laurent Pinchart