[libcamera-devel] [PATCH 4/5] android: camera_worker: Notify fence wait failures
Jacopo Mondi
jacopo at jmondi.org
Mon Sep 6 18:20:40 CEST 2021
Hi Hiro,
On Mon, Sep 06, 2021 at 11:55:15PM +0900, Hirokazu Honda wrote:
> Hi Jacopo, thank you for the patch.
>
> On Mon, Sep 6, 2021 at 11:01 PM Jacopo Mondi <jacopo at jmondi.org> wrote:
> >
> > The CameraDevice class stores one CaptureRequestDescriptor instance
> > for each capture request queued to the Camera. Such descriptors are
> > cleared when the Camera completes the Request in the
> > CameraDevice::requestComplete() slot.
> >
> > The CameraDevice class relies on an internal worker to off-load
> > waiting on synchronization fences, and does unconditionally store
> > the descriptor in the descriptors_ class member map to handle its
> > lifetime.
> >
> > If waiting on a fence fails, the created descriptor remains in the
> > descriptors_ map until the next call to stop() or close(), as
> > requestComplete() will never be called, and the camera
> > service is never notified about the queueing failure.
> >
> > Fix that by allowing the CameraWorker to notify to the CameraDevice if
> > waiting on a fence has failed, by installing a new requestQueueFailed
> > Signal.
> >
> > The Signal is emitted by the CameraWorker internal worker if waiting on
> > a fence has failed. The CameraDevice is augmented with a slot connected
> > to the Signal that removes the descriptor from the map and notifies the
> > camera framework about the failure.
> >
> > Signed-off-by: Jacopo Mondi <jacopo at jmondi.org>
> > ---
> > src/android/camera_device.cpp | 22 +++++++++++++++++++++-
> > src/android/camera_device.h | 1 +
> > src/android/camera_worker.cpp | 10 ++++++----
> > src/android/camera_worker.h | 9 +++++++--
> > 4 files changed, 35 insertions(+), 7 deletions(-)
> >
> > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > index a0ea138d9499..0ce9b699bfae 100644
> > --- a/src/android/camera_device.cpp
> > +++ b/src/android/camera_device.cpp
> > @@ -244,7 +244,7 @@ CameraDevice::Camera3RequestDescriptor::Camera3RequestDescriptor(
> > * Create the CaptureRequest, stored as a unique_ptr<> to tie its
> > * lifetime to the descriptor.
> > */
> > - request_ = std::make_unique<CaptureRequest>(camera);
> > + request_ = std::make_unique<CaptureRequest>(camera, camera3Request);
> > }
> >
> > /*
> > @@ -264,6 +264,7 @@ CameraDevice::CameraDevice(unsigned int id, std::shared_ptr<Camera> camera)
> > : id_(id), state_(State::Stopped), camera_(std::move(camera)),
> > facing_(CAMERA_FACING_FRONT), orientation_(0)
> > {
> > + worker_.requestQueueFailed.connect(this, &CameraDevice::requestQueueFailed);
> > camera_->requestCompleted.connect(this, &CameraDevice::requestComplete);
> >
> > maker_ = "libcamera";
> > @@ -1060,6 +1061,25 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > return 0;
> > }
> >
> > +void CameraDevice::requestQueueFailed(CaptureRequest *request)
> > +{
> > + /*
> > + * Remove the descriptor from the map if the worker has failed
> > + * to process it and notify the camera service about the failure.
> > + */
> > + MutexLocker descriptorsLock(descriptorsMutex_);
> > + auto it = descriptors_.find(request->cookie());
> > + if (it == descriptors_.end()) {
> > + LOG(HAL, Fatal)
>
> If I understand correctly, this must not happen because
Do you mean the it == end() with "this" ?
It should not happen as requestQueueFail is called by the worker if it
fails to wait on a fence and the descriptor is already part of
descriptors.
Actually, there's a small window for this to race now that I look at
it again
--------------------------------------------------------------------------
CameraDevice CameraWorker
--------------------------------------------------------------------------
worker_.queueRequest() ->
worker.processRequest
fence.wait() == error
<- requestQueueFailed.emit()
requestQueueFailed()
it == descriptors_.end()
abort();
descriptors_[req] = desc;
--------------------------------------------------------------------------
While the intended order of execution is
CameraDevice CameraWorker
--------------------------------------------------------------------------
worker_.queueRequest()
descriptors_[req] = desc;
->
worker.processRequest
fence.wait() == error
<- requestQueueFailed.emit()
requestQueueFailed()
it == descriptors_.end()
abort();
--------------------------------------------------------------------------
I'll make sure to add the descriptor to descriptors_ before calling
worker_.queuRequest().
> requestQueueFailed() is invoked by worker::stop() and it is executed
Why do you say it is invoked by stop() ? The signal is emitted if
waiting on a fence fails as shown...
> before descriptors_.clear() always.
> Correct?
>
> Reviewed-by: Hirokazu Honda <hiroh at chromium.org>
> > */
> > -CaptureRequest::CaptureRequest(libcamera::Camera *camera)
> > - : camera_(camera)
> > +CaptureRequest::CaptureRequest(libcamera::Camera *camera,
> > + const camera3_capture_request_t *camera3Request)
> > + : camera_(camera), camera3Request_(camera3Request)
> > {
> > request_ = camera_->createRequest(reinterpret_cast<uint64_t>(this));
> > }
> > @@ -77,7 +78,7 @@ void CameraWorker::queueRequest(CaptureRequest *request)
> > {
> > /* Async process the request on the worker which runs its own thread. */
> > worker_.invokeMethod(&Worker::processRequest, ConnectionTypeQueued,
> > - request);
> > + this, request);
> > }
> >
> > /*
> > @@ -109,7 +110,7 @@ int CameraWorker::Worker::waitFence(int fence)
> > return -errno;
> > }
> >
> > -void CameraWorker::Worker::processRequest(CaptureRequest *request)
> > +void CameraWorker::Worker::processRequest(CameraWorker *context, CaptureRequest *request)
> > {
> > /* Wait on all fences before queuing the Request. */
> > for (int fence : request->fences()) {
> > @@ -121,6 +122,7 @@ void CameraWorker::Worker::processRequest(CaptureRequest *request)
> > if (ret < 0) {
> > LOG(HAL, Error) << "Failed waiting for fence: "
> > << fence << ": " << strerror(-ret);
> > + context->requestQueueFailed.emit(request);
.... Here
Is your question about stop() related to dispatching messages before
the thread has actually been stopped ? Isn't that been fixed some time
ago by overriding CameraWorker::exec() ?
Thanks
j
More information about the libcamera-devel
mailing list