[libcamera-devel] [PATCH] android: camera_device: Fix race on queuing capture request
Jacopo Mondi
jacopo at jmondi.org
Mon Sep 27 15:50:36 CEST 2021
Hi Umang, Hiro,
On Fri, Sep 24, 2021 at 01:49:19PM +0530, Umang Jain wrote:
> Hi Hiro,
>
> On 9/24/21 1:10 PM, Hirokazu Honda wrote:
> > Hi Umang, thank you for the patch.
> >
> > On Fri, Sep 24, 2021 at 3:44 PM Umang Jain <umang.jain at ideasonboard.com> wrote:
> > > The Camera3RequestDescriptor containing the capture request
> > > is adding to the descriptors_ map after a call to
> > > CameraWorker::queueRequest(). This is a race condition since
> > > CameraWorker::queueRequest() queues request to libcamera::Camera
> > > asynchronously and the addition of the descriptor to the map
> > > occurs with std::move(). Hence, it might happen that the async
> > > queueRequest() hasn't finished but the descriptor gets std::move()ed.
> > >
> > > Fix it by adding the descriptor to map first, before
> > > CameraWorker::queueRequest().
> >
> > I don't understand the problem well.
>
>
> It's a race between queueRequest() and adding to the descriptor_ std::map.
>
> queueRequest() is called asynchronously(separate thread) so it might happen
> that queueRequest is still processing/queuing the request while we
> std::move()ed the descriptor (Which wraps the request queueRequest is
> accessing)
>
> If we fix the order with correct data-access it fixes the issue.
>
> > I think the pointer of descriptors.request_.get() is not invalidated
> > by std::move().
You know, I'm not really sure :)
Let's try to find it out...
This is the descriptor lifecycle inside this function
int processRequest()
{
// Allocated on the stack, request_ is a unique_ptr<> embedded in
// descriptor
Camera3RequestDescriptor descriptor(camera_.get(), camera3Request);
....
worker_.queueRequest(descriptor.request_.get());
descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
}
Camera3RequestDescriptor::request_ is initialized as
request_ = std::make_unique<CaptureRequest>(camera);
CameraDevice::descriptors_ is a map of type:
std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
And Camera3RequestDescriptor has a defaulted move assignment operator=()
Camera3RequestDescriptor &operator=(Camera3RequestDescriptor &&) = default;
As Camera3RequestDescriptor member are:
struct Camera3RequestDescriptor {
uint32_t frameNumber_ = 0;
std::vector<camera3_stream_buffer_t> buffers_;
std::vector<std::unique_ptr<libcamera::FrameBuffer>> frameBuffers_;
CameraMetadata settings_;
std::unique_ptr<CaptureRequest> request_;
};
And most of them have a non-implicitly defined move assignment operator (vecors
and unique_ptr) I think we fall in the "Implicitly defined" section of [1]
-------------------------------------------------------------------------------
Implicitly-defined move assignment operator
If the implicitly-declared move assignment operator is neither deleted nor
trivial, it is defined (that is, a function body is generated and compiled) by
the compiler if odr-used or needed for constant evaluation (since C++14).
For union types, the implicitly-defined move assignment operator copies the
object representation (as by std::memmove).
For non-union class types (class and struct), the move assignment operator
performs full member-wise move assignment of the object's direct bases and
immediate non-static members, in their declaration order, using built-in
assignment for the scalars, memberwise move-assignment for arrays, and move
assignment operator for class types (called non-virtually).
-------------------------------------------------------------------------------
Which means we're effectively calling std::unique_ptr::operator[](&&) on
request_.
As std::unique_ptr::operator[](&&other) is equivalent to
reset(other.release());
I guess what happens is that the location in memory of request_ doesn't change
but it's only its ownership that gets moved to the newly created entry in the
descriptors_ map.
Hence I think the code is safe the way it is, even if it might look suspicious
or fragile.
I think it could be made 'safer' if we want to by:
- worker_.queueRequest(descriptor.request_.get());
-
+ CaptureRequest *req;
{
MutexLocker descriptorsLock(descriptorsMutex_);
+ unsigned long requestCookie = descriptor.request_->cookie();
- descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
+ descriptors_[requestCookie] = std::move(descriptor);
+ req = descriptors_[requestCookie].request_.get();
}
+ worker_.queueRequest(req);
+
return 0;
}
But I didn't get if Umang hit any issue that lead him to send this patch.
[1] https://en.cppreference.com/w/cpp/language/move_assignment
>
>
> Why do you think so? :)
>
> diff --git a/src/android/camera_device.cpp
> b/src/android/camera_device.cpp
> index a693dcbe..0de7050d 100644
> --- a/src/android/camera_device.cpp
> +++ b/src/android/camera_device.cpp
> @@ -1063,13 +1063,13 @@ int
> CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> state_ = State::Running;
> }
>
> - worker_.queueRequest(descriptor.request_.get());
> -
> {
> MutexLocker descriptorsLock(descriptorsMutex_);
> descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> }
>
> + worker_.queueRequest(descriptor.request_.get());
> +
>
> is a segfaulted here (and expected no?).
For sure it does, you're now using 'descriptor' after it has been moved and it's
been left in unspecified state. As shown above std::unique_ptr::operator=(&&)
has been called on request_ which calls request_.release(), hence accessing it
with std::unique_ptr::get() is undefined behaviour.
>
> >
> > > Signed-off-by: Umang Jain <umang.jain at ideasonboard.com>
> > > ---
> > > src/android/camera_device.cpp | 7 ++++---
> > > 1 file changed, 4 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > > index 21844e51..c4c03d86 100644
> > > --- a/src/android/camera_device.cpp
> > > +++ b/src/android/camera_device.cpp
> > > @@ -1065,13 +1065,14 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > > state_ = State::Running;
> > > }
> > >
> > > - worker_.queueRequest(descriptor.request_.get());
> > > -
> > > + unsigned long requestCookie = descriptor.request_->cookie();
> > > {
> > > MutexLocker descriptorsLock(descriptorsMutex_);
> > > - descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> > > + descriptors_[requestCookie] = std::move(descriptor);
> > > }
> > >
> > > + worker_.queueRequest(descriptors_[requestCookie].request_.get());
> > Accessing descriptors_ must be while holding descriptorsMutex_.
>
>
> I am in two minds here,
>
> We can lock it yes, but we are just reading from the map so, should be fine?
I think accessing descriptors_ should always happen while holding the
descriptorsMutex_ lock, as we extract nodes from the map in requestComplete()
which might invalidates iterators and re-points the internal pointers in the
map. I can't tell what happens if done concurrently with accessing the map, but
doesn't seem like a good idea.
Thanks
j
>
> >
> > -Hiro
> > > +
> > > return 0;
> > > }
> > >
> > > --
> > > 2.31.1
> > >
More information about the libcamera-devel
mailing list