[libcamera-devel] [PATCH v3 2/3] android: CameraDevice: Prevent race due to concurrent HAL calls
Laurent Pinchart
laurent.pinchart at ideasonboard.com
Mon May 3 16:55:38 CEST 2021
Hello,
On Mon, May 03, 2021 at 04:41:04PM +0200, Jacopo Mondi wrote:
> On Mon, Apr 26, 2021 at 05:38:29PM +0900, Hirokazu Honda wrote:
> > HAL API functions might be called by different threads in Android
> > Camera HAL3 API, while process_capture_request() must be called
> > by a single thread. Furthermore, close() and flush() can be
> > called any time. Therefore, races to variables can occur among
> > the HAL API functions.
> > This prevents the races by enclosing HAL calls by mutex. It
> > might not be the best in terms of the performance, but the
> > simplicity is good as a first step and also further development.
> >
> > Signed-off-by: Hirokazu Honda <hiroh at chromium.org>
> > ---
> > src/android/camera_device.cpp | 59 ++++++++++++++++++++---------------
> > src/android/camera_device.h | 7 +++--
> > 2 files changed, 39 insertions(+), 27 deletions(-)
> >
> > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > index b3def04b..96c8d4a9 100644
> > --- a/src/android/camera_device.cpp
> > +++ b/src/android/camera_device.cpp
> > @@ -743,7 +743,7 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
> >
> > void CameraDevice::close()
> > {
> > - streams_.clear();
> > + std::scoped_lock<std::mutex> halLock(halMutex_);
>
> Have you considered locking the wrappers functions in camera_ops.cpp ?
> Wouldn't it be simpler ? However locking with such a coarse-grained
> granularity I feel contradicts a bit the Camera3 API flush()
> documentation, which implies that flush() and
> process_capture_request() can race
>
> * Due to
> * concurrency issues, it is possible that from the HAL's point of view, a
> * process_capture_request() call may be started after flush has been invoked but has not
> * returned yet. If such a call happens before flush() returns, the HAL should treat the new
> * capture request like other in-flight pending requests (see #4 below).
>
> Locking at function level rules out this use case: if we lock each
> function it's not possible for process_capture_request() to be started
> during flush() execution as the lock hold by flush will be released
> only at exit time.
>
> If serializing operations with this granularity is fine (and I really
> with it is, as it would really simplify life for use)
I think it is as an initial step, but likely not in the longer term. It
defeats the point of having the camera service issue
process_capture_request() and stop()/flush() calls concurrently. If
serializing operations was good enough in every case, it would be done
in the camera service itself, and HAL implementations wouldn't need to
deal with this.
> the only racing
> condition we have to protect from is between the request completion
> signal emitted by libcamera, and the camera framework calling into the
> Camera HAL using the camera3 API as flush(), close() and
> process_capture_request() will always be serialized.
>
> Assuming the above is ok with the Camera3 API, isn't it enough to
>
> - flush() to call stop() and stop the Camera:
> - All queued requests at the time Camera::stop() is called
> will be returned one by one with an error state. There's no
> reason to cancel the descriptors at stop() time, those
> requests will be completed
>
> - A process_capture_request() called after flush() has stopped
> the camera will return immeditely as the camera is stopped
> and the capture request is returned with an error
> (we might want a flag to keep track of this, otherwise we'll
> hit a Camera state machine error, much like we do with
> running_)
>
> - A close() called after flush() will be a no op
So far it seems fine to me.
> - Only open() will allow the CameraDevice to move back from
> stopped_ state
That's correct for close(). For flush(), we can to make sure that a
configureStreams() call will reopen the camera.
> - There's a tiny tiny window where a request can complete after flush
> has been called but the camera has not been stopped yet. In
It's not that tiny, it can certainly be hit in practice.
> requestComplete() we'll have to inspect the stopped_ flag state and
> force the request to return with error even if it did not.
Why can't we complete the request without any error in that case ?
> - Once we move JPEG encoding to a worker thread, stop() will have to
> forcefully stop it. Requests interrupted during encoding will be
> returned with error.
>
> A simple lock around stopped_ should protect it from concurrent
> accesses between stop() and requestComplete().
The issue is that the flush() function must wait for all requests to
complete. This could be implemented with a condition variable, allowing
to avoid serializing CameraDevice::requestComplete() with the API calls
from the camera service. I think that would be better indeed.
> Have you considered this option and ruled it out for reasons I'm
> failing to consider ? It feels simpler than decoupling requests and
> descriptors, maybe you tried and found out it's not :)
>
> > stop();
> >
> > @@ -758,12 +758,17 @@ void CameraDevice::stop()
> > worker_.stop();
> > camera_->stop();
> >
> > - descriptors_.clear();
> > + {
> > + std::scoped_lock<std::mutex> lock(mutex_);
> > + descriptors_.clear();
> > + }
> > +
> > running_ = false;
> > }
> >
> > void CameraDevice::setCallbacks(const camera3_callback_ops_t *callbacks)
> > {
> > + std::scoped_lock<std::mutex> halLock(halMutex_);
> > callbacks_ = callbacks;
> > }
> >
> > @@ -1643,8 +1648,7 @@ PixelFormat CameraDevice::toPixelFormat(int format) const
> > */
> > int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > {
> > - /* Before any configuration attempt, stop the camera. */
> > - stop();
> > + std::scoped_lock<std::mutex> halLock(halMutex_);
> >
> > if (stream_list->num_streams == 0) {
> > LOG(HAL, Error) << "No streams in configuration";
> > @@ -1656,6 +1660,9 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > return -EINVAL;
> > #endif
> >
> > + /* Before any configuration attempt, stop the camera. */
> > + stop();
> > +
> > /*
> > * Generate an empty configuration, and construct a StreamConfiguration
> > * for each camera3_stream to add to it.
> > @@ -1671,6 +1678,8 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > * ensure the required entries are available without further
> > * reallocation.
> > */
> > +
> > + std::scoped_lock<std::mutex> lock(mutex_);
> > streams_.clear();
> > streams_.reserve(stream_list->num_streams);
> >
> > @@ -1907,6 +1916,8 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > if (!isValidRequest(camera3Request))
> > return -EINVAL;
> >
> > + std::scoped_lock<std::mutex> halLock(halMutex_);
> > +
> > /* Start the camera if that's the first request we handle. */
> > if (!running_) {
> > worker_.start();
> > @@ -2022,29 +2033,26 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> >
> > void CameraDevice::requestComplete(Request *request)
> > {
> > + std::scoped_lock<std::mutex> lock(mutex_);
> > +
> > const Request::BufferMap &buffers = request->buffers();
> > camera3_buffer_status status = CAMERA3_BUFFER_STATUS_OK;
> > std::unique_ptr<CameraMetadata> resultMetadata;
> >
> > - decltype(descriptors_)::node_type node;
> > - std::unique_ptr<CaptureRequest> captureRequest;
> > - {
> > - std::scoped_lock<std::mutex> lock(mutex_);
> > - auto requestIt = requests_.find(request->cookie());
> > - if (requestIt == requests_.end()) {
> > - LOG(HAL, Fatal)
> > - << "Unknown request: " << request->cookie();
> > - return;
> > - }
> > - captureRequest = std::move(requestIt->second);
> > - requests_.erase(requestIt);
> > + auto requestIt = requests_.find(request->cookie());
> > + if (requestIt == requests_.end()) {
> > + LOG(HAL, Fatal)
> > + << "Unknown request: " << request->cookie();
> > + return;
> > + }
> > + auto captureRequest = std::move(requestIt->second);
> > + requests_.erase(requestIt);
> >
> > - auto it = descriptors_.find(request->cookie());
> > - if (it == descriptors_.end())
> > - return;
> > + auto it = descriptors_.find(request->cookie());
> > + if (it == descriptors_.end())
> > + return;
> >
> > - node = descriptors_.extract(it);
> > - }
> > + auto node = descriptors_.extract(it);
> > Camera3RequestDescriptor &descriptor = node.mapped();
> >
> > if (request->status() != Request::RequestComplete) {
> > @@ -2149,11 +2157,12 @@ void CameraDevice::notifyError(uint32_t frameNumber, camera3_stream_t *stream)
> > camera3_notify_msg_t notify = {};
> >
> > /*
> > - * \todo Report and identify the stream number or configuration to
> > - * clarify the stream that failed.
> > + * \todo Report and identify the stream number or configuration
> > + * to clarify the stream that failed.
> > */
> > - LOG(HAL, Error) << "Error occurred on frame " << frameNumber << " ("
> > - << toPixelFormat(stream->format).toString() << ")";
> > + LOG(HAL, Error)
> > + << "Error occurred on frame " << descriptor.frameNumber_ << " ("
> > + << toPixelFormat(descriptor.buffers_[0].stream->format).toString() << ")";
> >
> > notify.type = CAMERA3_MSG_ERROR;
> > notify.message.error.error_stream = stream;
> > diff --git a/src/android/camera_device.h b/src/android/camera_device.h
> > index 95d77890..325fb967 100644
> > --- a/src/android/camera_device.h
> > +++ b/src/android/camera_device.h
> > @@ -125,9 +125,12 @@ private:
> >
> > std::vector<Camera3StreamConfiguration> streamConfigurations_;
> > std::map<int, libcamera::PixelFormat> formatsMap_;
> > - std::vector<CameraStream> streams_;
> >
> > - std::mutex mutex_; /* Protect descriptors_ and requests_. */
> > + /* Avoid races by concurrent Camera HAL API calls. */
> > + std::mutex halMutex_;
> > +
> > + std::mutex mutex_; /* Protect streams_, descriptors_ and requests_. */
> > + std::vector<CameraStream> streams_;
> > std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
> > std::map<uint64_t, std::unique_ptr<CaptureRequest>> requests_;
> >
--
Regards,
Laurent Pinchart
More information about the libcamera-devel
mailing list