[libcamera-devel] [PATCH 4/5] android: camera_worker: Notify fence wait failures
Hirokazu Honda
hiroh at chromium.org
Mon Sep 6 18:36:43 CEST 2021
Hi Jacopo,
On Tue, Sep 7, 2021 at 1:19 AM Jacopo Mondi <jacopo at jmondi.org> wrote:
>
> Hi Hiro,
>
> On Mon, Sep 06, 2021 at 11:55:15PM +0900, Hirokazu Honda wrote:
> > Hi Jacopo, thank you for the patch.
> >
> > On Mon, Sep 6, 2021 at 11:01 PM Jacopo Mondi <jacopo at jmondi.org> wrote:
> > >
> > > The CameraDevice class stores one CaptureRequestDescriptor instance
> > > for each capture request queued to the Camera. Such descriptors are
> > > cleared when the Camera completes the Request in the
> > > CameraDevice::requestComplete() slot.
> > >
> > > The CameraDevice class relies on an internal worker to off-load
> > > waiting on synchronization fences, and does unconditionally store
> > > the descriptor in the descriptors_ class member map to handle its
> > > lifetime.
> > >
> > > If waiting on a fence fails, the created descriptor remains in the
> > > descriptors_ map until the next call to stop() or close(), as
> > > requestComplete() will never be called, and the camera
> > > service is never notified about the queueing failure.
> > >
> > > Fix that by allowing the CameraWorker to notify to the CameraDevice if
> > > waiting on a fence has failed, by installing a new requestQueueFailed
> > > Signal.
> > >
> > > The Signal is emitted by the CameraWorker internal worker if waiting on
> > > a fence has failed. The CameraDevice is augmented with a slot connected
> > > to the Signal that removes the descriptor from the map and notifies the
> > > camera framework about the failure.
> > >
> > > Signed-off-by: Jacopo Mondi <jacopo at jmondi.org>
> > > ---
> > > src/android/camera_device.cpp | 22 +++++++++++++++++++++-
> > > src/android/camera_device.h | 1 +
> > > src/android/camera_worker.cpp | 10 ++++++----
> > > src/android/camera_worker.h | 9 +++++++--
> > > 4 files changed, 35 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > > index a0ea138d9499..0ce9b699bfae 100644
> > > --- a/src/android/camera_device.cpp
> > > +++ b/src/android/camera_device.cpp
> > > @@ -244,7 +244,7 @@ CameraDevice::Camera3RequestDescriptor::Camera3RequestDescriptor(
> > > * Create the CaptureRequest, stored as a unique_ptr<> to tie its
> > > * lifetime to the descriptor.
> > > */
> > > - request_ = std::make_unique<CaptureRequest>(camera);
> > > + request_ = std::make_unique<CaptureRequest>(camera, camera3Request);
> > > }
> > >
> > > /*
> > > @@ -264,6 +264,7 @@ CameraDevice::CameraDevice(unsigned int id, std::shared_ptr<Camera> camera)
> > > : id_(id), state_(State::Stopped), camera_(std::move(camera)),
> > > facing_(CAMERA_FACING_FRONT), orientation_(0)
> > > {
> > > + worker_.requestQueueFailed.connect(this, &CameraDevice::requestQueueFailed);
> > > camera_->requestCompleted.connect(this, &CameraDevice::requestComplete);
> > >
> > > maker_ = "libcamera";
> > > @@ -1060,6 +1061,25 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > > return 0;
> > > }
> > >
> > > +void CameraDevice::requestQueueFailed(CaptureRequest *request)
> > > +{
> > > + /*
> > > + * Remove the descriptor from the map if the worker has failed
> > > + * to process it and notify the camera service about the failure.
> > > + */
> > > + MutexLocker descriptorsLock(descriptorsMutex_);
> > > + auto it = descriptors_.find(request->cookie());
> > > + if (it == descriptors_.end()) {
> > > + LOG(HAL, Fatal)
> >
> > If I understand correctly, this must not happen because
>
> Do you mean the it == end() with "this" ?
>
> It should not happen as requestQueueFail is called by the worker if it
> fails to wait on a fence and the descriptor is already part of
> descriptors.
>
> Actually, there's a small window for this to race now that I look at
> it again
>
> --------------------------------------------------------------------------
> CameraDevice CameraWorker
> --------------------------------------------------------------------------
>
> worker_.queueRequest() ->
>
> worker.processRequest
> fence.wait() == error
> <- requestQueueFailed.emit()
> requestQueueFailed()
> it == descriptors_.end()
> abort();
>
> descriptors_[req] = desc;
> --------------------------------------------------------------------------
>
> While the intended order of execution is
>
> CameraDevice CameraWorker
> --------------------------------------------------------------------------
>
> worker_.queueRequest()
> descriptors_[req] = desc;
> ->
> worker.processRequest
> fence.wait() == error
> <- requestQueueFailed.emit()
> requestQueueFailed()
> it == descriptors_.end()
> abort();
>
> --------------------------------------------------------------------------
>
> I'll make sure to add the descriptor to descriptors_ before calling
> worker_.queuRequest().
>
> > requestQueueFailed() is invoked by worker::stop() and it is executed
>
> Why do you say it is invoked by stop() ? The signal is emitted if
> waiting on a fence fails as shown...
>
I don't probably understand when fence waiting error happens.
Could you explain when it happens?
It looks like, in the next patch, descriptors_ must be empty in some places.
How is it guaranteed that requestQueueFailed() is executed before
descriptors_ is empty?
-Hiro
> > before descriptors_.clear() always.
> > Correct?
> >
> > Reviewed-by: Hirokazu Honda <hiroh at chromium.org>
>
> > > */
> > > -CaptureRequest::CaptureRequest(libcamera::Camera *camera)
> > > - : camera_(camera)
> > > +CaptureRequest::CaptureRequest(libcamera::Camera *camera,
> > > + const camera3_capture_request_t *camera3Request)
> > > + : camera_(camera), camera3Request_(camera3Request)
> > > {
> > > request_ = camera_->createRequest(reinterpret_cast<uint64_t>(this));
> > > }
> > > @@ -77,7 +78,7 @@ void CameraWorker::queueRequest(CaptureRequest *request)
> > > {
> > > /* Async process the request on the worker which runs its own thread. */
> > > worker_.invokeMethod(&Worker::processRequest, ConnectionTypeQueued,
> > > - request);
> > > + this, request);
> > > }
> > >
> > > /*
> > > @@ -109,7 +110,7 @@ int CameraWorker::Worker::waitFence(int fence)
> > > return -errno;
> > > }
> > >
> > > -void CameraWorker::Worker::processRequest(CaptureRequest *request)
> > > +void CameraWorker::Worker::processRequest(CameraWorker *context, CaptureRequest *request)
> > > {
> > > /* Wait on all fences before queuing the Request. */
> > > for (int fence : request->fences()) {
> > > @@ -121,6 +122,7 @@ void CameraWorker::Worker::processRequest(CaptureRequest *request)
> > > if (ret < 0) {
> > > LOG(HAL, Error) << "Failed waiting for fence: "
> > > << fence << ": " << strerror(-ret);
> > > + context->requestQueueFailed.emit(request);
>
> .... Here
>
> Is your question about stop() related to dispatching messages before
> the thread has actually been stopped ? Isn't that been fixed some time
> ago by overriding CameraWorker::exec() ?
>
> Thanks
> j
>
More information about the libcamera-devel
mailing list