[libcamera-devel] [PATCH 4/5] android: camera_worker: Notify fence wait failures

Jacopo Mondi jacopo at jmondi.org
Mon Sep 6 18:20:40 CEST 2021


Hi Hiro,

On Mon, Sep 06, 2021 at 11:55:15PM +0900, Hirokazu Honda wrote:
> Hi Jacopo, thank you for the patch.
>
> On Mon, Sep 6, 2021 at 11:01 PM Jacopo Mondi <jacopo at jmondi.org> wrote:
> >
> > The CameraDevice class stores one CaptureRequestDescriptor instance
> > for each capture request queued to the Camera. Such descriptors are
> > cleared when the Camera completes the Request in the
> > CameraDevice::requestComplete() slot.
> >
> > The CameraDevice class relies on an internal worker to off-load
> > waiting on synchronization fences, and does unconditionally store
> > the descriptor in the descriptors_ class member map to handle its
> > lifetime.
> >
> > If waiting on a fence fails, the created descriptor remains in the
> > descriptors_ map until the next call to stop() or close(), as
> > requestComplete() will never be called, and the camera
> > service is never notified about the queueing failure.
> >
> > Fix that by allowing the CameraWorker to notify to the CameraDevice if
> > waiting on a fence has failed, by installing a new requestQueueFailed
> > Signal.
> >
> > The Signal is emitted by the CameraWorker internal worker if waiting on
> > a fence has failed. The CameraDevice is augmented with a slot connected
> > to the Signal that removes the descriptor from the map and notifies the
> > camera framework about the failure.
> >
> > Signed-off-by: Jacopo Mondi <jacopo at jmondi.org>
> > ---
> >  src/android/camera_device.cpp | 22 +++++++++++++++++++++-
> >  src/android/camera_device.h   |  1 +
> >  src/android/camera_worker.cpp | 10 ++++++----
> >  src/android/camera_worker.h   |  9 +++++++--
> >  4 files changed, 35 insertions(+), 7 deletions(-)
> >
> > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > index a0ea138d9499..0ce9b699bfae 100644
> > --- a/src/android/camera_device.cpp
> > +++ b/src/android/camera_device.cpp
> > @@ -244,7 +244,7 @@ CameraDevice::Camera3RequestDescriptor::Camera3RequestDescriptor(
> >          * Create the CaptureRequest, stored as a unique_ptr<> to tie its
> >          * lifetime to the descriptor.
> >          */
> > -       request_ = std::make_unique<CaptureRequest>(camera);
> > +       request_ = std::make_unique<CaptureRequest>(camera, camera3Request);
> >  }
> >
> >  /*
> > @@ -264,6 +264,7 @@ CameraDevice::CameraDevice(unsigned int id, std::shared_ptr<Camera> camera)
> >         : id_(id), state_(State::Stopped), camera_(std::move(camera)),
> >           facing_(CAMERA_FACING_FRONT), orientation_(0)
> >  {
> > +       worker_.requestQueueFailed.connect(this, &CameraDevice::requestQueueFailed);
> >         camera_->requestCompleted.connect(this, &CameraDevice::requestComplete);
> >
> >         maker_ = "libcamera";
> > @@ -1060,6 +1061,25 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> >         return 0;
> >  }
> >
> > +void CameraDevice::requestQueueFailed(CaptureRequest *request)
> > +{
> > +       /*
> > +        * Remove the descriptor from the map if the worker has failed
> > +        * to process it and notify the camera service about the failure.
> > +        */
> > +       MutexLocker descriptorsLock(descriptorsMutex_);
> > +       auto it = descriptors_.find(request->cookie());
> > +       if (it == descriptors_.end()) {
> > +               LOG(HAL, Fatal)
>
> If I understand correctly, this must not happen because

Do you mean the it == end() with "this" ?

It should not happen as requestQueueFail is called by the worker if it
fails to wait on a fence and the descriptor is already part of
descriptors.

Actually, there's a small window for this to race now that I look at
it again

--------------------------------------------------------------------------
CameraDevice                            CameraWorker
--------------------------------------------------------------------------

worker_.queueRequest()          ->

                                        worker.processRequest
                                        fence.wait() == error
                                <-      requestQueueFailed.emit()
requestQueueFailed()
        it == descriptors_.end()
        abort();

descriptors_[req] = desc;
--------------------------------------------------------------------------

While the intended order of execution is

CameraDevice                            CameraWorker
--------------------------------------------------------------------------

worker_.queueRequest()
descriptors_[req] = desc;
                                ->
                                        worker.processRequest
                                        fence.wait() == error
                                <-      requestQueueFailed.emit()
requestQueueFailed()
        it == descriptors_.end()
        abort();

--------------------------------------------------------------------------

I'll make sure to add the descriptor to descriptors_ before calling
worker_.queuRequest().

> requestQueueFailed() is invoked by worker::stop() and it is executed

Why do you say it is invoked by stop() ? The signal is emitted if
waiting on a fence fails as shown...

> before descriptors_.clear() always.
> Correct?
>
> Reviewed-by: Hirokazu Honda <hiroh at chromium.org>

> >   */
> > -CaptureRequest::CaptureRequest(libcamera::Camera *camera)
> > -       : camera_(camera)
> > +CaptureRequest::CaptureRequest(libcamera::Camera *camera,
> > +                              const camera3_capture_request_t *camera3Request)
> > +       : camera_(camera), camera3Request_(camera3Request)
> >  {
> >         request_ = camera_->createRequest(reinterpret_cast<uint64_t>(this));
> >  }
> > @@ -77,7 +78,7 @@ void CameraWorker::queueRequest(CaptureRequest *request)
> >  {
> >         /* Async process the request on the worker which runs its own thread. */
> >         worker_.invokeMethod(&Worker::processRequest, ConnectionTypeQueued,
> > -                            request);
> > +                            this, request);
> >  }
> >
> >  /*
> > @@ -109,7 +110,7 @@ int CameraWorker::Worker::waitFence(int fence)
> >         return -errno;
> >  }
> >
> > -void CameraWorker::Worker::processRequest(CaptureRequest *request)
> > +void CameraWorker::Worker::processRequest(CameraWorker *context, CaptureRequest *request)
> >  {
> >         /* Wait on all fences before queuing the Request. */
> >         for (int fence : request->fences()) {
> > @@ -121,6 +122,7 @@ void CameraWorker::Worker::processRequest(CaptureRequest *request)
> >                 if (ret < 0) {
> >                         LOG(HAL, Error) << "Failed waiting for fence: "
> >                                         << fence << ": " << strerror(-ret);
> > +                       context->requestQueueFailed.emit(request);

.... Here

Is your question about stop() related to dispatching messages before
the thread has actually been stopped ? Isn't that been fixed some time
ago by overriding CameraWorker::exec() ?

Thanks
   j



More information about the libcamera-devel mailing list