[libcamera-devel] [PATCH] android: camera_device: Fix race on queuing capture request

Umang Jain umang.jain at ideasonboard.com
Mon Sep 27 17:13:48 CEST 2021


Hi Jacopo, Hiro,

On 9/27/21 7:20 PM, Jacopo Mondi wrote:
> Hi Umang, Hiro,
>
> On Fri, Sep 24, 2021 at 01:49:19PM +0530, Umang Jain wrote:
>> Hi Hiro,
>>
>> On 9/24/21 1:10 PM, Hirokazu Honda wrote:
>>> Hi Umang, thank you for the patch.
>>>
>>> On Fri, Sep 24, 2021 at 3:44 PM Umang Jain <umang.jain at ideasonboard.com> wrote:
>>>> The Camera3RequestDescriptor containing the capture request
>>>> is adding to the descriptors_ map after a call to
>>>> CameraWorker::queueRequest(). This is a race condition since
>>>> CameraWorker::queueRequest() queues request to libcamera::Camera
>>>> asynchronously and the addition of the descriptor to the map
>>>> occurs with std::move(). Hence, it might happen that the async
>>>> queueRequest() hasn't finished but the descriptor gets std::move()ed.
>>>>
>>>> Fix it by adding the descriptor to map first, before
>>>> CameraWorker::queueRequest().
>>> I don't understand the problem well.
>>
>> It's a race between queueRequest() and adding to the descriptor_ std::map.
>>
>> queueRequest() is called asynchronously(separate thread)  so it might happen
>> that queueRequest is still processing/queuing the request while we
>> std::move()ed the descriptor (Which wraps the request queueRequest is
>> accessing)
>>
>> If we fix the order with correct data-access it fixes the issue.
>>
>>> I think the pointer of descriptors.request_.get() is not invalidated
>>> by std::move().
> You know, I'm not really sure :)
> Let's try to find it out...
>
> This is the descriptor lifecycle inside this function
>
> int processRequest()
> {
>          // Allocated on the stack, request_ is a unique_ptr<> embedded in
>          // descriptor
>          Camera3RequestDescriptor descriptor(camera_.get(), camera3Request);
>
>          ....
>
>          worker_.queueRequest(descriptor.request_.get());
>          descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> }
>
> Camera3RequestDescriptor::request_ is initialized as
>
> 	request_ = std::make_unique<CaptureRequest>(camera);
>
> CameraDevice::descriptors_ is a map of type:
>          std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
>
> And Camera3RequestDescriptor has a defaulted move assignment operator=()
>         		Camera3RequestDescriptor &operator=(Camera3RequestDescriptor &&) = default;
>
> As Camera3RequestDescriptor member are:
>
> 	struct Camera3RequestDescriptor {
>
> 		uint32_t frameNumber_ = 0;
> 		std::vector<camera3_stream_buffer_t> buffers_;
> 		std::vector<std::unique_ptr<libcamera::FrameBuffer>> frameBuffers_;
> 		CameraMetadata settings_;
> 		std::unique_ptr<CaptureRequest> request_;
> 	};
>
> And most of them have a non-implicitly defined move assignment operator (vecors
> and unique_ptr) I think we fall in the "Implicitly defined" section of [1]
>
> -------------------------------------------------------------------------------
> Implicitly-defined move assignment operator
>
> If the implicitly-declared move assignment operator is neither deleted nor
> trivial, it is defined (that is, a function body is generated and compiled) by
> the compiler if odr-used or needed for constant evaluation (since C++14).
>
> For union types, the implicitly-defined move assignment operator copies the
> object representation (as by std::memmove).
>
> For non-union class types (class and struct), the move assignment operator
> performs full member-wise move assignment of the object's direct bases and
> immediate non-static members, in their declaration order, using built-in
> assignment for the scalars, memberwise move-assignment for arrays, and move
> assignment operator for class types (called non-virtually).
> -------------------------------------------------------------------------------
>
> Which means we're effectively calling std::unique_ptr::operator[](&&) on
> request_.
>
> As std::unique_ptr::operator[](&&other) is equivalent to
>
>          reset(other.release());
>
> I guess what happens is that the location in memory of request_ doesn't change
> but it's only its ownership that gets moved to the newly created entry in the
> descriptors_ map.
>
> Hence I think the code is safe the way it is, even if it might look suspicious
> or fragile.


On a lighter note, looks don't matter ? :P

I think I'll slightly dis-agree that it is safe..

I understand that the ::move is not really copying to a different 
location of memory, rather transfer of ownership happens, so technically 
it might not cause any problems, since pointers are still pointing to 
memory location they are supposed to point to (possibly v2 version of 
this patch is based on this fact). But, ::move()ing  while an async 
operation is in play on a memory location does concern me if we look at it.

If we want to queueRequest() /before/ the descriptor is moved, I think 
that should also happen with a lock, since, descriptor.request_.get() is 
a descriptor access, no? It might happen that the queueRequest() might 
also set something on the CaptureRequest pointer, so essentially, we 
would still to lock the entire async operation before firing it off, 
until it returns.


>
> I think it could be made 'safer' if we want to by:
>
> -       worker_.queueRequest(descriptor.request_.get());
> -
> +       CaptureRequest *req;
>          {
>                  MutexLocker descriptorsLock(descriptorsMutex_);
> +               unsigned long requestCookie = descriptor.request_->cookie();
> -               descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> +               descriptors_[requestCookie] = std::move(descriptor);
> +               req = descriptors_[requestCookie].request_.get();
>          }
>
> +       worker_.queueRequest(req);
> +
>          return 0;
>   }


Precisely, that intended goal of the patch is to queueRequest() after 
the descriptor is moved to the map.

>
> But I didn't get if Umang hit any issue that lead him to send this patch.


Please mind that v2 of this patch is already on the list last week, 
which might be of interest and further discussion. Also, I didn't get 
any issues with it, it's something Laurent initially spotted while code 
review [1]

Excerpt below :

>/> We have a race condition here, worker_.queueRequest() should go after />/> adding the request to the queue. Could you fix it in a patch on top ? />//>/Do you mean the race condition is existing already, with the />/descriptors_ map (that has been removed from this patch)? /
Correct, it's already here.

>/Yes, I can introduce a patch before this one, that fixes the race first />/in the map itself. Is my understanding correct? /
Sounds good to me. It should be a small patch.

[1]: 
https://lists.libcamera.org/pipermail/libcamera-devel/2021-September/025004.html

>
> [1] https://en.cppreference.com/w/cpp/language/move_assignment
>
>>
>> Why do you think so? :)
>>
>>      diff --git a/src/android/camera_device.cpp
>> b/src/android/camera_device.cpp
>>      index a693dcbe..0de7050d 100644
>>      --- a/src/android/camera_device.cpp
>>      +++ b/src/android/camera_device.cpp
>>      @@ -1063,13 +1063,13 @@ int
>> CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
>>                      state_ = State::Running;
>>              }
>>
>>      - worker_.queueRequest(descriptor.request_.get());
>>      -
>>              {
>>                      MutexLocker descriptorsLock(descriptorsMutex_);
>> descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
>>              }
>>
>>      + worker_.queueRequest(descriptor.request_.get());
>>      +
>>
>> is a segfaulted here (and expected no?).
> For sure it does, you're now using 'descriptor' after it has been moved and it's
> been left in unspecified state. As shown above std::unique_ptr::operator=(&&)
> has been called on request_ which calls request_.release(), hence accessing it
> with std::unique_ptr::get() is undefined behaviour.
>
>>>> Signed-off-by: Umang Jain <umang.jain at ideasonboard.com>
>>>> ---
>>>>    src/android/camera_device.cpp | 7 ++++---
>>>>    1 file changed, 4 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
>>>> index 21844e51..c4c03d86 100644
>>>> --- a/src/android/camera_device.cpp
>>>> +++ b/src/android/camera_device.cpp
>>>> @@ -1065,13 +1065,14 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
>>>>                   state_ = State::Running;
>>>>           }
>>>>
>>>> -       worker_.queueRequest(descriptor.request_.get());
>>>> -
>>>> +       unsigned long requestCookie = descriptor.request_->cookie();
>>>>           {
>>>>                   MutexLocker descriptorsLock(descriptorsMutex_);
>>>> -               descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
>>>> +               descriptors_[requestCookie] = std::move(descriptor);
>>>>           }
>>>>
>>>> +       worker_.queueRequest(descriptors_[requestCookie].request_.get());
>>> Accessing descriptors_ must be while holding descriptorsMutex_.
>>
>> I am in two minds here,
>>
>> We can lock it yes, but we are just reading from the map so, should be fine?
> I think accessing descriptors_ should always happen while holding the
> descriptorsMutex_ lock, as we extract nodes from the map in requestComplete()
> which might invalidates iterators and re-points the internal pointers in the
> map. I can't tell what happens if done concurrently with accessing the map, but
> doesn't seem like a good idea.
>
> Thanks
>     j
>
>>> -Hiro
>>>> +
>>>>           return 0;
>>>>    }
>>>>
>>>> --
>>>> 2.31.1
>>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.libcamera.org/pipermail/libcamera-devel/attachments/20210927/e268dddb/attachment-0001.htm>


More information about the libcamera-devel mailing list