[libcamera-devel] [PATCH] android: camera_device: Fix race on queuing capture request

Jacopo Mondi jacopo at jmondi.org
Mon Sep 27 15:50:36 CEST 2021


Hi Umang, Hiro,

On Fri, Sep 24, 2021 at 01:49:19PM +0530, Umang Jain wrote:
> Hi Hiro,
>
> On 9/24/21 1:10 PM, Hirokazu Honda wrote:
> > Hi Umang, thank you for the patch.
> >
> > On Fri, Sep 24, 2021 at 3:44 PM Umang Jain <umang.jain at ideasonboard.com> wrote:
> > > The Camera3RequestDescriptor containing the capture request
> > > is adding to the descriptors_ map after a call to
> > > CameraWorker::queueRequest(). This is a race condition since
> > > CameraWorker::queueRequest() queues request to libcamera::Camera
> > > asynchronously and the addition of the descriptor to the map
> > > occurs with std::move(). Hence, it might happen that the async
> > > queueRequest() hasn't finished but the descriptor gets std::move()ed.
> > >
> > > Fix it by adding the descriptor to map first, before
> > > CameraWorker::queueRequest().
> >
> > I don't understand the problem well.
>
>
> It's a race between queueRequest() and adding to the descriptor_ std::map.
>
> queueRequest() is called asynchronously(separate thread)  so it might happen
> that queueRequest is still processing/queuing the request while we
> std::move()ed the descriptor (Which wraps the request queueRequest is
> accessing)
>
> If we fix the order with correct data-access it fixes the issue.
>
> > I think the pointer of descriptors.request_.get() is not invalidated
> > by std::move().

You know, I'm not really sure :)
Let's try to find it out...

This is the descriptor lifecycle inside this function

int processRequest()
{
        // Allocated on the stack, request_ is a unique_ptr<> embedded in
        // descriptor
        Camera3RequestDescriptor descriptor(camera_.get(), camera3Request);

        ....

        worker_.queueRequest(descriptor.request_.get());
        descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
}

Camera3RequestDescriptor::request_ is initialized as

	request_ = std::make_unique<CaptureRequest>(camera);

CameraDevice::descriptors_ is a map of type:
        std::map<uint64_t, Camera3RequestDescriptor> descriptors_;

And Camera3RequestDescriptor has a defaulted move assignment operator=()
       		Camera3RequestDescriptor &operator=(Camera3RequestDescriptor &&) = default;

As Camera3RequestDescriptor member are:

	struct Camera3RequestDescriptor {

		uint32_t frameNumber_ = 0;
		std::vector<camera3_stream_buffer_t> buffers_;
		std::vector<std::unique_ptr<libcamera::FrameBuffer>> frameBuffers_;
		CameraMetadata settings_;
		std::unique_ptr<CaptureRequest> request_;
	};

And most of them have a non-implicitly defined move assignment operator (vecors
and unique_ptr) I think we fall in the "Implicitly defined" section of [1]

-------------------------------------------------------------------------------
Implicitly-defined move assignment operator

If the implicitly-declared move assignment operator is neither deleted nor
trivial, it is defined (that is, a function body is generated and compiled) by
the compiler if odr-used or needed for constant evaluation (since C++14).

For union types, the implicitly-defined move assignment operator copies the
object representation (as by std::memmove).

For non-union class types (class and struct), the move assignment operator
performs full member-wise move assignment of the object's direct bases and
immediate non-static members, in their declaration order, using built-in
assignment for the scalars, memberwise move-assignment for arrays, and move
assignment operator for class types (called non-virtually).
-------------------------------------------------------------------------------

Which means we're effectively calling std::unique_ptr::operator[](&&) on
request_.

As std::unique_ptr::operator[](&&other) is equivalent to

        reset(other.release());

I guess what happens is that the location in memory of request_ doesn't change
but it's only its ownership that gets moved to the newly created entry in the
descriptors_ map.

Hence I think the code is safe the way it is, even if it might look suspicious
or fragile.

I think it could be made 'safer' if we want to by:

-       worker_.queueRequest(descriptor.request_.get());
-
+       CaptureRequest *req;
        {
                MutexLocker descriptorsLock(descriptorsMutex_);
+               unsigned long requestCookie = descriptor.request_->cookie();
-               descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
+               descriptors_[requestCookie] = std::move(descriptor);
+               req = descriptors_[requestCookie].request_.get();
        }

+       worker_.queueRequest(req);
+
        return 0;
 }

But I didn't get if Umang hit any issue that lead him to send this patch.

[1] https://en.cppreference.com/w/cpp/language/move_assignment

>
>
> Why do you think so? :)
>
>     diff --git a/src/android/camera_device.cpp
> b/src/android/camera_device.cpp
>     index a693dcbe..0de7050d 100644
>     --- a/src/android/camera_device.cpp
>     +++ b/src/android/camera_device.cpp
>     @@ -1063,13 +1063,13 @@ int
> CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
>                     state_ = State::Running;
>             }
>
>     - worker_.queueRequest(descriptor.request_.get());
>     -
>             {
>                     MutexLocker descriptorsLock(descriptorsMutex_);
> descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
>             }
>
>     + worker_.queueRequest(descriptor.request_.get());
>     +
>
> is a segfaulted here (and expected no?).

For sure it does, you're now using 'descriptor' after it has been moved and it's
been left in unspecified state. As shown above std::unique_ptr::operator=(&&)
has been called on request_ which calls request_.release(), hence accessing it
with std::unique_ptr::get() is undefined behaviour.

>
> >
> > > Signed-off-by: Umang Jain <umang.jain at ideasonboard.com>
> > > ---
> > >   src/android/camera_device.cpp | 7 ++++---
> > >   1 file changed, 4 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > > index 21844e51..c4c03d86 100644
> > > --- a/src/android/camera_device.cpp
> > > +++ b/src/android/camera_device.cpp
> > > @@ -1065,13 +1065,14 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > >                  state_ = State::Running;
> > >          }
> > >
> > > -       worker_.queueRequest(descriptor.request_.get());
> > > -
> > > +       unsigned long requestCookie = descriptor.request_->cookie();
> > >          {
> > >                  MutexLocker descriptorsLock(descriptorsMutex_);
> > > -               descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> > > +               descriptors_[requestCookie] = std::move(descriptor);
> > >          }
> > >
> > > +       worker_.queueRequest(descriptors_[requestCookie].request_.get());
> > Accessing descriptors_ must be while holding descriptorsMutex_.
>
>
> I am in two minds here,
>
> We can lock it yes, but we are just reading from the map so, should be fine?

I think accessing descriptors_ should always happen while holding the
descriptorsMutex_ lock, as we extract nodes from the map in requestComplete()
which might invalidates iterators and re-points the internal pointers in the
map. I can't tell what happens if done concurrently with accessing the map, but
doesn't seem like a good idea.

Thanks
   j

>
> >
> > -Hiro
> > > +
> > >          return 0;
> > >   }
> > >
> > > --
> > > 2.31.1
> > >


More information about the libcamera-devel mailing list