[libcamera-devel] [PATCH v2 1/2] android: CameraDevice: Prevent race due to concurrent HAL calls

Laurent Pinchart laurent.pinchart at ideasonboard.com
Mon Apr 26 08:47:57 CEST 2021


Hi Hiro,

On Mon, Apr 26, 2021 at 11:48:02AM +0900, Hirokazu Honda wrote:
> On Mon, Apr 26, 2021 at 9:30 AM Laurent Pinchart wrote:
> > On Fri, Apr 23, 2021 at 01:07:37PM +0900, Hirokazu Honda wrote:
> > > HAL API functions might be called by different threads in Android
> > > Camera HAL3 API, while process_capture_request() must be called
> > > by a single thread.
> >
> > I'm not sure to follow you here. Do you mean that the HAL API requires
> > the camera service to call process_capture_request() from a single
> > thread ? Or do you mean that the libcamera HAL implementation assumes
> > that all calls come from the same thread ?
> 
> I mean process_capture_request() only. The document says.
>       * ... Only
>       * one call to process_capture_request() will be made at a time by the
>       * framework, and the calls will all be from the same thread.
> 
> 
> > > Furthermore, close() and flush() can be
> > > called any time. Therefore, races to variables can occur among
> > > the HAL API functions.
> >
> > I'd like to have a bit more details about calling contexts, and record
> > it in the commit message, or, actually better, in a comment at the top
> > of the file, as this is very important. There's unfortunately little
> > information about call contexts in the HAL documentation (camera3.h).
> > Here's what I've found:
> >
> >      * flush() may be called concurrently to process_capture_request(), with the expectation that
> >      * process_capture_request will return quickly and the request submitted in that
> >      * process_capture_request call is treated like all other in-flight requests.  Due to
> >      * concurrency issues, it is possible that from the HAL's point of view, a
> >      * process_capture_request() call may be started after flush has been invoked but has not
> >      * returned yet. If such a call happens before flush() returns, the HAL should treat the new
> >      * capture request like other in-flight pending requests (see #4 below).
> >
> > This means that flush() can be called from a different thread than
> > process_capture_request(). I think this makes sense, and we need to
> > handle that.
> >
> >      * process_capture_request:
> >      *
> >      * Send a new capture request to the HAL. The HAL should not return from
> >      * this call until it is ready to accept the next request to process. Only
> >      * one call to process_capture_request() will be made at a time by the
> >      * framework, and the calls will all be from the same thread. The next call
> >      * to process_capture_request() will be made as soon as a new request and
> >      * its associated buffers are available. In a normal preview scenario, this
> >      * means the function will be called again by the framework almost
> >      * instantly.
> >
> > There's only one call to process_capture_request() at a time, so we
> > don't need to serialize process_capture_request() calls.
> >
> > That's pretty much all I've found. Is there any other documentation
> > you're aware of, information coming from discussions with the Android
> > camera team, or from reading the Android camera service implementation ?
> 
> From Android camera team, flush() and close() can be called at any time.
> Yes, there is no race among process_capture_request() calls. However,
> the HAL functions must be serialized due to the any time of flush()
> and close() call.

flush() and close() can race with process_capture_request(). But can
they race each other ? Can they race with configure_streams() ?

> > > This prevents the races by enclosing HAL calls by mutex. It
> > > might not be the best in terms of the performance, but the
> > > simplicity is good as a first step and also further development.
> > >
> > > Signed-off-by: Hirokazu Honda <hiroh at chromium.org>
> > > ---
> > >  src/android/camera_device.cpp | 70 +++++++++++++++++++++--------------
> > >  src/android/camera_device.h   |  9 ++++-
> > >  2 files changed, 50 insertions(+), 29 deletions(-)
> > >
> > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > > index a71aee2f..c74dc0e7 100644
> > > --- a/src/android/camera_device.cpp
> > > +++ b/src/android/camera_device.cpp
> > > @@ -748,27 +748,40 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
> > >
> > >  void CameraDevice::close()
> > >  {
> > > -     streams_.clear();
> > > -
> > > -     stop();
> > > +     std::scoped_lock<std::mutex> lock(mutex_);
> > >
> > > -     camera_->release();
> > > +     stop(/*releaseCamera=*/true);
> >
> > The camera is only released here. Can't we keep the camera_->release()
> > call here, to avoid the bool parameter to stop() ?
> 
> Thanks, it should be better. I will move in the next patch.
> 
> > >  }
> > >
> > > -void CameraDevice::stop()
> > > +void CameraDevice::stop(bool releaseCamera)
> > >  {
> > > -     if (!running_)
> > > +     streams_.clear();
> > > +
> > > +     if (!running_) {
> > > +             if (releaseCamera) {
> > > +                     descriptors_.clear();
> > > +                     camera_->release();
> > > +             }
> > >               return;
> > > +     }
> > > +
> > > +     stopping_ = true;
> > >
> > >       worker_.stop();
> > >       camera_->stop();
> > >
> > >       descriptors_.clear();
> > > +
> > >       running_ = false;
> > > +     stopping_ = false;
> > > +
> > > +     if (releaseCamera)
> > > +             camera_->release();
> > >  }
> > >
> > >  void CameraDevice::setCallbacks(const camera3_callback_ops_t *callbacks)
> > >  {
> > > +     std::scoped_lock<std::mutex> lock(mutex_);
> > >       callbacks_ = callbacks;
> >
> > This function is called by the HAL .initialize() operation. I really
> > don't think it could race with anything else. While it's not explicitly
> > documented, I don't see how it would be reasonable for the camera
> > service to call .initialize() from one thread, and other HAL functions
> > from different threads before .initialize() returns.
> 
> Thanks. Yeah, I am sure this function will not be concurrently called
> with other functions.
> However, callbacks_ will be called in stop(), I wonder it is worth
> having a seliralization here also thinking of consistency.
> How do you think?

As commented below, even if we don't need to serialize these calls with
a lock, doing so shouldn't hurt performances, and should make locking
verification easier, so I'm not opposed to it.

> > >  }
> > >
> > > @@ -1581,6 +1594,8 @@ std::unique_ptr<CameraMetadata> CameraDevice::requestTemplateVideo()
> > >   */
> > >  const camera_metadata_t *CameraDevice::constructDefaultRequestSettings(int type)
> > >  {
> > > +     std::scoped_lock<std::mutex> lock(mutex_);
> > > +
> >
> > This function shouldn't affect the state of the camera, does it need to
> > be serialized ?
> 
> Thanks. Even thinking of the consistency, having a serialization in
> the function is not reasonable.
> I will remove mutex_ in this function.
> 
> > >       auto it = requestTemplates_.find(type);
> > >       if (it != requestTemplates_.end())
> > >               return it->second->get();
> > > @@ -1648,8 +1663,7 @@ PixelFormat CameraDevice::toPixelFormat(int format) const
> > >   */
> > >  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > >  {
> > > -     /* Before any configuration attempt, stop the camera. */
> > > -     stop();
> > > +     std::scoped_lock<std::mutex> lock(mutex_);
> >
> > Same comment here, is this something that can race with other calls ?
> >
> > This being said, I'm not opposed to locking the mutex here and in
> > CameraDevice::setCallbacks(). Locks are best documented as what data
> > they protect, and using them consistently with their documentation makes
> > it easier to check correct locking (this includes making it easier for
> > static code checkers to catch issues). As taking the lock here should
> > not cause performance issues (especially given that there should be no
> > concurrent call to this function), we can keep it for consistency.
> 
> I think flush() and close() can be called during configureStreams() call.

That's interesting. I would have expected those to be serialized. Is
there any documentation that explains this ?

> I think it is mandatory to have a lock at least somewhere in
> process_capture_request(). If we don't enclose
> process_capture_request() by lock, handling flush() and concurrent
> process_capture_request() might become a bit complicated (though I can
> still do it, I actually implemented so initially but discarded because
> of the complexity).
> 
> > >
> > >       if (stream_list->num_streams == 0) {
> > >               LOG(HAL, Error) << "No streams in configuration";
> > > @@ -1661,6 +1675,9 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > >               return -EINVAL;
> > >  #endif
> > >
> > > +     /* Before any configuration attempt, stop the camera. */
> > > +     stop();
> > > +
> > >       /*
> > >        * Generate an empty configuration, and construct a StreamConfiguration
> > >        * for each camera3_stream to add to it.
> > > @@ -1672,11 +1689,10 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > >       }
> > >
> > >       /*
> > > -      * Clear and remove any existing configuration from previous calls, and
> > > -      * ensure the required entries are available without further
> > > +      * Ensure the required entries are available without further
> > >        * reallocation.
> > >        */
> > > -     streams_.clear();
> > > +     ASSERT(streams_.empty());
> > >       streams_.reserve(stream_list->num_streams);
> > >
> > >       std::vector<Camera3StreamConfig> streamConfigs;
> > > @@ -1912,6 +1928,8 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > >       if (!isValidRequest(camera3Request))
> > >               return -EINVAL;
> > >
> > > +     std::scoped_lock<std::mutex> lock(mutex_);
> > > +
> > >       /* Start the camera if that's the first request we handle. */
> > >       if (!running_) {
> > >               worker_.start();
> > > @@ -2015,33 +2033,31 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > >
> > >       worker_.queueRequest(descriptor.request_.get());
> > >
> > > -     {
> > > -             std::scoped_lock<std::mutex> lock(mutex_);
> > > -             descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> > > -     }
> > > +     descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> > >
> > >       return 0;
> > >  }
> > >
> > >  void CameraDevice::requestComplete(Request *request)
> > >  {
> > > +     if (stopping_)
> > > +             return;
> > > +
> > > +     std::scoped_lock<std::mutex> lock(mutex_);
> >
> > The requestComplete signal is emitted from the CameraManager thread, and
> > must not block for a large amount of time. We already lock the mutex_ in
> > this function, but we currently use the mutex_ only to protect
> > descriptors_, which should never block for long. Extending mutex_ to
> > serialize the Camera calls is dangerous. Could we implement better
> > locking ? I think the first step would be to clearly explain what races
> > we need to handle, as mentioned above in the review of the commit
> > message. Race conditions are tricky to handle right, proper
> > documentation is a key.
> >
> > I know we already have issues with JPEG compression that is performed
> > synchronously in this function. This needs to be fixed, but let's not
> > make it worse and introduce another issue that will then need to be
> > fixed too.
> 
> I agree.
> Since requestComplete() doesn't have to be serialized with HAL
> functions, we don't need to have a lock from begin to end.
> My first thought is coarse lock is a good first step as fine lock
> might make the code complicated.
> I will next try a fine lock in this function.

Thank you. If it starts causing issues, please feel free to reach out to
discuss them before spending too much time on the implementation.

> > > +
> > >       const Request::BufferMap &buffers = request->buffers();
> > >       camera3_buffer_status status = CAMERA3_BUFFER_STATUS_OK;
> > >       std::unique_ptr<CameraMetadata> resultMetadata;
> > >
> > > -     decltype(descriptors_)::node_type node;
> > > -     {
> > > -             std::scoped_lock<std::mutex> lock(mutex_);
> > > -             auto it = descriptors_.find(request->cookie());
> > > -             if (it == descriptors_.end()) {
> > > -                     LOG(HAL, Fatal)
> > > -                             << "Unknown request: " << request->cookie();
> > > -                     status = CAMERA3_BUFFER_STATUS_ERROR;
> > > -                     return;
> > > -             }
> > > -
> > > -             node = descriptors_.extract(it);
> > > +     auto it = descriptors_.find(request->cookie());
> > > +     if (it == descriptors_.end()) {
> > > +             LOG(HAL, Debug)
> > > +                     << "Unknown request: " << request->cookie();
> > > +             status = CAMERA3_BUFFER_STATUS_ERROR;
> > > +             return;
> > >       }
> > > +
> > > +     auto node = descriptors_.extract(it);
> > >       Camera3RequestDescriptor &descriptor = node.mapped();
> > >
> > >       if (request->status() != Request::RequestComplete) {
> > > diff --git a/src/android/camera_device.h b/src/android/camera_device.h
> > > index c63e8e21..08553584 100644
> > > --- a/src/android/camera_device.h
> > > +++ b/src/android/camera_device.h
> > > @@ -88,7 +88,7 @@ private:
> > >               int androidFormat;
> > >       };
> > >
> > > -     void stop();
> > > +     void stop(bool releaseCamera = false);
> > >
> > >       int initializeStreamConfigurations();
> > >       std::vector<libcamera::Size>
> > > @@ -127,7 +127,12 @@ private:
> > >       std::map<int, libcamera::PixelFormat> formatsMap_;
> > >       std::vector<CameraStream> streams_;
> > >
> > > -     std::mutex mutex_; /* Protect descriptors_ */
> > > +     /*
> > > +      * Protect descriptors_ and camera_, and also avoid races by concurrent
> > > +      * Camera HAL API calls.
> > > +      */
> > > +     std::mutex mutex_;
> > > +     std::atomic_bool stopping_;
> > >       std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
> > >
> > >       std::string maker_;

-- 
Regards,

Laurent Pinchart


More information about the libcamera-devel mailing list