[libcamera-devel] [PATCH v3 2/3] android: CameraDevice: Prevent race due to concurrent HAL calls

Jacopo Mondi jacopo at jmondi.org
Mon May 3 16:41:04 CEST 2021


Hi Hiro,

On Mon, Apr 26, 2021 at 05:38:29PM +0900, Hirokazu Honda wrote:
> HAL API functions might be called by different threads in Android
> Camera HAL3 API, while process_capture_request() must be called
> by a single thread. Furthermore, close() and flush() can be
> called any time. Therefore, races to variables can occur among
> the HAL API functions.
> This prevents the races by enclosing HAL calls by mutex. It
> might not be the best in terms of the performance, but the
> simplicity is good as a first step and also further development.
>
> Signed-off-by: Hirokazu Honda <hiroh at chromium.org>
> ---
>  src/android/camera_device.cpp | 59 ++++++++++++++++++++---------------
>  src/android/camera_device.h   |  7 +++--
>  2 files changed, 39 insertions(+), 27 deletions(-)
>
> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> index b3def04b..96c8d4a9 100644
> --- a/src/android/camera_device.cpp
> +++ b/src/android/camera_device.cpp
> @@ -743,7 +743,7 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
>
>  void CameraDevice::close()
>  {
> -	streams_.clear();
> +	std::scoped_lock<std::mutex> halLock(halMutex_);

Have you considered locking the wrappers functions in camera_ops.cpp ?
Wouldn't it be simpler ? However locking with such a coarse-grained
granularity I feel contradicts a bit the Camera3 API flush()
documentation, which implies that flush() and
process_capture_request() can race

     * Due to
     * concurrency issues, it is possible that from the HAL's point of view, a
     * process_capture_request() call may be started after flush has been invoked but has not
     * returned yet. If such a call happens before flush() returns, the HAL should treat the new
     * capture request like other in-flight pending requests (see #4 below).

Locking at function level rules out this use case: if we lock each
function it's not possible for process_capture_request() to be started
during flush() execution as the lock hold by flush will be released
only at exit time.

If serializing operations with this granularity is fine (and I really
with it is, as it would really simplify life for use) the only racing
condition we have to protect from is between the request completion
signal emitted by libcamera, and the camera framework calling into the
Camera HAL using the camera3 API as flush(), close() and
process_capture_request() will always be serialized.

Assuming the above is ok with the Camera3 API, isn't it enough to

- flush() to call stop() and stop the Camera:
        - All queued requests at the time Camera::stop() is called
          will be returned one by one with an error state. There's no
          reason to cancel the descriptors at stop() time, those
          requests will be completed

        - A process_capture_request() called after flush() has stopped
          the camera will return immeditely as the camera is stopped
          and the capture request is returned with an error
          (we might want a flag to keep track of this, otherwise we'll
          hit a Camera state machine error, much like we do with
          running_)

        - A close() called after flush() will be a no op

        - Only open() will allow the CameraDevice to move back from
          stopped_ state

- There's a tiny tiny window where a request can complete after flush
  has been called but the camera has not been stopped yet. In
  requestComplete() we'll have to inspect the stopped_ flag state and
  force the request to return with error even if it did not.

- Once we move JPEG encoding to a worker thread, stop() will have to
  forcefully stop it. Requests interrupted during encoding will be
  returned with error.

A simple lock around stopped_ should protect it from concurrent
accesses between stop() and requestComplete().

Have you considered this option and ruled it out for reasons I'm
failing to consider ? It feels simpler than decoupling requests and
descriptors, maybe you tried and found out it's not :)

Thanks
  j


>
>  	stop();
>
> @@ -758,12 +758,17 @@ void CameraDevice::stop()
>  	worker_.stop();
>  	camera_->stop();
>
> -	descriptors_.clear();
> +	{
> +		std::scoped_lock<std::mutex> lock(mutex_);
> +		descriptors_.clear();
> +	}
> +
>  	running_ = false;
>  }
>
>  void CameraDevice::setCallbacks(const camera3_callback_ops_t *callbacks)
>  {
> +	std::scoped_lock<std::mutex> halLock(halMutex_);
>  	callbacks_ = callbacks;
>  }
>
> @@ -1643,8 +1648,7 @@ PixelFormat CameraDevice::toPixelFormat(int format) const
>   */
>  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
>  {
> -	/* Before any configuration attempt, stop the camera. */
> -	stop();
> +	std::scoped_lock<std::mutex> halLock(halMutex_);
>
>  	if (stream_list->num_streams == 0) {
>  		LOG(HAL, Error) << "No streams in configuration";
> @@ -1656,6 +1660,9 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
>  		return -EINVAL;
>  #endif
>
> +	/* Before any configuration attempt, stop the camera. */
> +	stop();
> +
>  	/*
>  	 * Generate an empty configuration, and construct a StreamConfiguration
>  	 * for each camera3_stream to add to it.
> @@ -1671,6 +1678,8 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
>  	 * ensure the required entries are available without further
>  	 * reallocation.
>  	 */
> +
> +	std::scoped_lock<std::mutex> lock(mutex_);
>  	streams_.clear();
>  	streams_.reserve(stream_list->num_streams);
>
> @@ -1907,6 +1916,8 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
>  	if (!isValidRequest(camera3Request))
>  		return -EINVAL;
>
> +	std::scoped_lock<std::mutex> halLock(halMutex_);
> +
>  	/* Start the camera if that's the first request we handle. */
>  	if (!running_) {
>  		worker_.start();
> @@ -2022,29 +2033,26 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
>
>  void CameraDevice::requestComplete(Request *request)
>  {
> +	std::scoped_lock<std::mutex> lock(mutex_);
> +
>  	const Request::BufferMap &buffers = request->buffers();
>  	camera3_buffer_status status = CAMERA3_BUFFER_STATUS_OK;
>  	std::unique_ptr<CameraMetadata> resultMetadata;
>
> -	decltype(descriptors_)::node_type node;
> -	std::unique_ptr<CaptureRequest> captureRequest;
> -	{
> -		std::scoped_lock<std::mutex> lock(mutex_);
> -		auto requestIt = requests_.find(request->cookie());
> -		if (requestIt == requests_.end()) {
> -			LOG(HAL, Fatal)
> -				<< "Unknown request: " << request->cookie();
> -			return;
> -		}
> -		captureRequest = std::move(requestIt->second);
> -		requests_.erase(requestIt);
> +	auto requestIt = requests_.find(request->cookie());
> +	if (requestIt == requests_.end()) {
> +		LOG(HAL, Fatal)
> +			<< "Unknown request: " << request->cookie();
> +		return;
> +	}
> +	auto captureRequest = std::move(requestIt->second);
> +	requests_.erase(requestIt);
>
> -		auto it = descriptors_.find(request->cookie());
> -		if (it == descriptors_.end())
> -			return;
> +	auto it = descriptors_.find(request->cookie());
> +	if (it == descriptors_.end())
> +		return;
>
> -		node = descriptors_.extract(it);
> -	}
> +	auto node = descriptors_.extract(it);
>  	Camera3RequestDescriptor &descriptor = node.mapped();
>
>  	if (request->status() != Request::RequestComplete) {
> @@ -2149,11 +2157,12 @@ void CameraDevice::notifyError(uint32_t frameNumber, camera3_stream_t *stream)
>  	camera3_notify_msg_t notify = {};
>
>  	/*
> -	 * \todo Report and identify the stream number or configuration to
> -	 * clarify the stream that failed.
> +	 * \todo Report and identify the stream number or configuration
> +	 * to clarify the stream that failed.
>  	 */
> -	LOG(HAL, Error) << "Error occurred on frame " << frameNumber << " ("
> -			<< toPixelFormat(stream->format).toString() << ")";
> +	LOG(HAL, Error)
> +		<< "Error occurred on frame " << descriptor.frameNumber_ << " ("
> +		<< toPixelFormat(descriptor.buffers_[0].stream->format).toString() << ")";
>
>  	notify.type = CAMERA3_MSG_ERROR;
>  	notify.message.error.error_stream = stream;
> diff --git a/src/android/camera_device.h b/src/android/camera_device.h
> index 95d77890..325fb967 100644
> --- a/src/android/camera_device.h
> +++ b/src/android/camera_device.h
> @@ -125,9 +125,12 @@ private:
>
>  	std::vector<Camera3StreamConfiguration> streamConfigurations_;
>  	std::map<int, libcamera::PixelFormat> formatsMap_;
> -	std::vector<CameraStream> streams_;
>
> -	std::mutex mutex_; /* Protect descriptors_ and requests_. */
> +	/* Avoid races by concurrent Camera HAL API calls. */
> +	std::mutex halMutex_;
> +
> +	std::mutex mutex_; /* Protect streams_, descriptors_ and requests_. */
> +	std::vector<CameraStream> streams_;
>  	std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
>  	std::map<uint64_t, std::unique_ptr<CaptureRequest>> requests_;
>
> --
> 2.31.1.498.g6c1eba8ee3d-goog
>
> _______________________________________________
> libcamera-devel mailing list
> libcamera-devel at lists.libcamera.org
> https://lists.libcamera.org/listinfo/libcamera-devel


More information about the libcamera-devel mailing list