[libcamera-devel] [PATCH v2 1/2] android: CameraDevice: Prevent race due to concurrent HAL calls
Hirokazu Honda
hiroh at chromium.org
Mon Apr 26 04:48:02 CEST 2021
Hi Laurent, thanks for the comments.
On Mon, Apr 26, 2021 at 9:30 AM Laurent Pinchart
<laurent.pinchart at ideasonboard.com> wrote:
>
> Hi Hiro,
>
> Thank you for the patch.
>
> On Fri, Apr 23, 2021 at 01:07:37PM +0900, Hirokazu Honda wrote:
> > HAL API functions might be called by different threads in Android
> > Camera HAL3 API, while process_capture_request() must be called
> > by a single thread.
>
> I'm not sure to follow you here. Do you mean that the HAL API requires
> the camera service to call process_capture_request() from a single
> thread ? Or do you mean that the libcamera HAL implementation assumes
> that all calls come from the same thread ?
>
I mean process_capture_request() only. The document says.
* ... Only
* one call to process_capture_request() will be made at a time by the
* framework, and the calls will all be from the same thread.
> > Furthermore, close() and flush() can be
> > called any time. Therefore, races to variables can occur among
> > the HAL API functions.
>
> I'd like to have a bit more details about calling contexts, and record
> it in the commit message, or, actually better, in a comment at the top
> of the file, as this is very important. There's unfortunately little
> information about call contexts in the HAL documentation (camera3.h).
> Here's what I've found:
>
> * flush() may be called concurrently to process_capture_request(), with the expectation that
> * process_capture_request will return quickly and the request submitted in that
> * process_capture_request call is treated like all other in-flight requests. Due to
> * concurrency issues, it is possible that from the HAL's point of view, a
> * process_capture_request() call may be started after flush has been invoked but has not
> * returned yet. If such a call happens before flush() returns, the HAL should treat the new
> * capture request like other in-flight pending requests (see #4 below).
>
> This means that flush() can be called from a different thread than
> process_capture_request(). I think this makes sense, and we need to
> handle that.
>
> * process_capture_request:
> *
> * Send a new capture request to the HAL. The HAL should not return from
> * this call until it is ready to accept the next request to process. Only
> * one call to process_capture_request() will be made at a time by the
> * framework, and the calls will all be from the same thread. The next call
> * to process_capture_request() will be made as soon as a new request and
> * its associated buffers are available. In a normal preview scenario, this
> * means the function will be called again by the framework almost
> * instantly.
>
> There's only one call to process_capture_request() at a time, so we
> don't need to serialize process_capture_request() calls.
>
> That's pretty much all I've found. Is there any other documentation
> you're aware of, information coming from discussions with the Android
> camera team, or from reading the Android camera service implementation ?
>
>From Android camera team, flush() and close() can be called at any time.
Yes, there is no race among process_capture_request() calls. However,
the HAL functions must be serialized due to the any time of flush()
and close() call.
> > This prevents the races by enclosing HAL calls by mutex. It
> > might not be the best in terms of the performance, but the
> > simplicity is good as a first step and also further development.
> >
> > Signed-off-by: Hirokazu Honda <hiroh at chromium.org>
> > ---
> > src/android/camera_device.cpp | 70 +++++++++++++++++++++--------------
> > src/android/camera_device.h | 9 ++++-
> > 2 files changed, 50 insertions(+), 29 deletions(-)
> >
> > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > index a71aee2f..c74dc0e7 100644
> > --- a/src/android/camera_device.cpp
> > +++ b/src/android/camera_device.cpp
> > @@ -748,27 +748,40 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
> >
> > void CameraDevice::close()
> > {
> > - streams_.clear();
> > -
> > - stop();
> > + std::scoped_lock<std::mutex> lock(mutex_);
> >
> > - camera_->release();
> > + stop(/*releaseCamera=*/true);
>
> The camera is only released here. Can't we keep the camera_->release()
> call here, to avoid the bool parameter to stop() ?
>
Thanks, it should be better. I will move in the next patch.
> > }
> >
> > -void CameraDevice::stop()
> > +void CameraDevice::stop(bool releaseCamera)
> > {
> > - if (!running_)
> > + streams_.clear();
> > +
> > + if (!running_) {
> > + if (releaseCamera) {
> > + descriptors_.clear();
> > + camera_->release();
> > + }
> > return;
> > + }
> > +
> > + stopping_ = true;
> >
> > worker_.stop();
> > camera_->stop();
> >
> > descriptors_.clear();
> > +
> > running_ = false;
> > + stopping_ = false;
> > +
> > + if (releaseCamera)
> > + camera_->release();
> > }
> >
> > void CameraDevice::setCallbacks(const camera3_callback_ops_t *callbacks)
> > {
> > + std::scoped_lock<std::mutex> lock(mutex_);
> > callbacks_ = callbacks;
>
> This function is called by the HAL .initialize() operation. I really
> don't think it could race with anything else. While it's not explicitly
> documented, I don't see how it would be reasonable for the camera
> service to call .initialize() from one thread, and other HAL functions
> from different threads before .initialize() returns.
>
Thanks. Yeah, I am sure this function will not be concurrently called
with other functions.
However, callbacks_ will be called in stop(), I wonder it is worth
having a seliralization here also thinking of consistency.
How do you think?
> > }
> >
> > @@ -1581,6 +1594,8 @@ std::unique_ptr<CameraMetadata> CameraDevice::requestTemplateVideo()
> > */
> > const camera_metadata_t *CameraDevice::constructDefaultRequestSettings(int type)
> > {
> > + std::scoped_lock<std::mutex> lock(mutex_);
> > +
>
> This function shouldn't affect the state of the camera, does it need to
> be serialized ?
>
Thanks. Even thinking of the consistency, having a serialization in
the function is not reasonable.
I will remove mutex_ in this function.
> > auto it = requestTemplates_.find(type);
> > if (it != requestTemplates_.end())
> > return it->second->get();
> > @@ -1648,8 +1663,7 @@ PixelFormat CameraDevice::toPixelFormat(int format) const
> > */
> > int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > {
> > - /* Before any configuration attempt, stop the camera. */
> > - stop();
> > + std::scoped_lock<std::mutex> lock(mutex_);
>
> Same comment here, is this something that can race with other calls ?
>
> This being said, I'm not opposed to locking the mutex here and in
> CameraDevice::setCallbacks(). Locks are best documented as what data
> they protect, and using them consistently with their documentation makes
> it easier to check correct locking (this includes making it easier for
> static code checkers to catch issues). As taking the lock here should
> not cause performance issues (especially given that there should be no
> concurrent call to this function), we can keep it for consistency.
>
I think flush() and close() can be called during configureStreams() call.
I think it is mandatory to have a lock at least somewhere in
process_capture_request(). If we don't enclose
process_capture_request() by lock, handling flush() and concurrent
process_capture_request() might become a bit complicated (though I can
still do it, I actually implemented so initially but discarded because
of the complexity).
> >
> > if (stream_list->num_streams == 0) {
> > LOG(HAL, Error) << "No streams in configuration";
> > @@ -1661,6 +1675,9 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > return -EINVAL;
> > #endif
> >
> > + /* Before any configuration attempt, stop the camera. */
> > + stop();
> > +
> > /*
> > * Generate an empty configuration, and construct a StreamConfiguration
> > * for each camera3_stream to add to it.
> > @@ -1672,11 +1689,10 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > }
> >
> > /*
> > - * Clear and remove any existing configuration from previous calls, and
> > - * ensure the required entries are available without further
> > + * Ensure the required entries are available without further
> > * reallocation.
> > */
> > - streams_.clear();
> > + ASSERT(streams_.empty());
> > streams_.reserve(stream_list->num_streams);
> >
> > std::vector<Camera3StreamConfig> streamConfigs;
> > @@ -1912,6 +1928,8 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > if (!isValidRequest(camera3Request))
> > return -EINVAL;
> >
> > + std::scoped_lock<std::mutex> lock(mutex_);
> > +
> > /* Start the camera if that's the first request we handle. */
> > if (!running_) {
> > worker_.start();
> > @@ -2015,33 +2033,31 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> >
> > worker_.queueRequest(descriptor.request_.get());
> >
> > - {
> > - std::scoped_lock<std::mutex> lock(mutex_);
> > - descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> > - }
> > + descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> >
> > return 0;
> > }
> >
> > void CameraDevice::requestComplete(Request *request)
> > {
> > + if (stopping_)
> > + return;
> > +
> > + std::scoped_lock<std::mutex> lock(mutex_);
>
> The requestComplete signal is emitted from the CameraManager thread, and
> must not block for a large amount of time. We already lock the mutex_ in
> this function, but we currently use the mutex_ only to protect
> descriptors_, which should never block for long. Extending mutex_ to
> serialize the Camera calls is dangerous. Could we implement better
> locking ? I think the first step would be to clearly explain what races
> we need to handle, as mentioned above in the review of the commit
> message. Race conditions are tricky to handle right, proper
> documentation is a key.
>
> I know we already have issues with JPEG compression that is performed
> synchronously in this function. This needs to be fixed, but let's not
> make it worse and introduce another issue that will then need to be
> fixed too.
>
I agree.
Since requestComplete() doesn't have to be serialized with HAL
functions, we don't need to have a lock from begin to end.
My first thought is coarse lock is a good first step as fine lock
might make the code complicated.
I will next try a fine lock in this function.
-Hiro
> > +
> > const Request::BufferMap &buffers = request->buffers();
> > camera3_buffer_status status = CAMERA3_BUFFER_STATUS_OK;
> > std::unique_ptr<CameraMetadata> resultMetadata;
> >
> > - decltype(descriptors_)::node_type node;
> > - {
> > - std::scoped_lock<std::mutex> lock(mutex_);
> > - auto it = descriptors_.find(request->cookie());
> > - if (it == descriptors_.end()) {
> > - LOG(HAL, Fatal)
> > - << "Unknown request: " << request->cookie();
> > - status = CAMERA3_BUFFER_STATUS_ERROR;
> > - return;
> > - }
> > -
> > - node = descriptors_.extract(it);
> > + auto it = descriptors_.find(request->cookie());
> > + if (it == descriptors_.end()) {
> > + LOG(HAL, Debug)
> > + << "Unknown request: " << request->cookie();
> > + status = CAMERA3_BUFFER_STATUS_ERROR;
> > + return;
> > }
> > +
> > + auto node = descriptors_.extract(it);
> > Camera3RequestDescriptor &descriptor = node.mapped();
> >
> > if (request->status() != Request::RequestComplete) {
> > diff --git a/src/android/camera_device.h b/src/android/camera_device.h
> > index c63e8e21..08553584 100644
> > --- a/src/android/camera_device.h
> > +++ b/src/android/camera_device.h
> > @@ -88,7 +88,7 @@ private:
> > int androidFormat;
> > };
> >
> > - void stop();
> > + void stop(bool releaseCamera = false);
> >
> > int initializeStreamConfigurations();
> > std::vector<libcamera::Size>
> > @@ -127,7 +127,12 @@ private:
> > std::map<int, libcamera::PixelFormat> formatsMap_;
> > std::vector<CameraStream> streams_;
> >
> > - std::mutex mutex_; /* Protect descriptors_ */
> > + /*
> > + * Protect descriptors_ and camera_, and also avoid races by concurrent
> > + * Camera HAL API calls.
> > + */
> > + std::mutex mutex_;
> > + std::atomic_bool stopping_;
> > std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
> >
> > std::string maker_;
>
> --
> Regards,
>
> Laurent Pinchart
More information about the libcamera-devel
mailing list