[libcamera-devel] [PATCH v2 1/2] android: CameraDevice: Prevent race due to concurrent HAL calls
Laurent Pinchart
laurent.pinchart at ideasonboard.com
Mon Apr 26 08:47:57 CEST 2021
Hi Hiro,
On Mon, Apr 26, 2021 at 11:48:02AM +0900, Hirokazu Honda wrote:
> On Mon, Apr 26, 2021 at 9:30 AM Laurent Pinchart wrote:
> > On Fri, Apr 23, 2021 at 01:07:37PM +0900, Hirokazu Honda wrote:
> > > HAL API functions might be called by different threads in Android
> > > Camera HAL3 API, while process_capture_request() must be called
> > > by a single thread.
> >
> > I'm not sure to follow you here. Do you mean that the HAL API requires
> > the camera service to call process_capture_request() from a single
> > thread ? Or do you mean that the libcamera HAL implementation assumes
> > that all calls come from the same thread ?
>
> I mean process_capture_request() only. The document says.
> * ... Only
> * one call to process_capture_request() will be made at a time by the
> * framework, and the calls will all be from the same thread.
>
>
> > > Furthermore, close() and flush() can be
> > > called any time. Therefore, races to variables can occur among
> > > the HAL API functions.
> >
> > I'd like to have a bit more details about calling contexts, and record
> > it in the commit message, or, actually better, in a comment at the top
> > of the file, as this is very important. There's unfortunately little
> > information about call contexts in the HAL documentation (camera3.h).
> > Here's what I've found:
> >
> > * flush() may be called concurrently to process_capture_request(), with the expectation that
> > * process_capture_request will return quickly and the request submitted in that
> > * process_capture_request call is treated like all other in-flight requests. Due to
> > * concurrency issues, it is possible that from the HAL's point of view, a
> > * process_capture_request() call may be started after flush has been invoked but has not
> > * returned yet. If such a call happens before flush() returns, the HAL should treat the new
> > * capture request like other in-flight pending requests (see #4 below).
> >
> > This means that flush() can be called from a different thread than
> > process_capture_request(). I think this makes sense, and we need to
> > handle that.
> >
> > * process_capture_request:
> > *
> > * Send a new capture request to the HAL. The HAL should not return from
> > * this call until it is ready to accept the next request to process. Only
> > * one call to process_capture_request() will be made at a time by the
> > * framework, and the calls will all be from the same thread. The next call
> > * to process_capture_request() will be made as soon as a new request and
> > * its associated buffers are available. In a normal preview scenario, this
> > * means the function will be called again by the framework almost
> > * instantly.
> >
> > There's only one call to process_capture_request() at a time, so we
> > don't need to serialize process_capture_request() calls.
> >
> > That's pretty much all I've found. Is there any other documentation
> > you're aware of, information coming from discussions with the Android
> > camera team, or from reading the Android camera service implementation ?
>
> From Android camera team, flush() and close() can be called at any time.
> Yes, there is no race among process_capture_request() calls. However,
> the HAL functions must be serialized due to the any time of flush()
> and close() call.
flush() and close() can race with process_capture_request(). But can
they race each other ? Can they race with configure_streams() ?
> > > This prevents the races by enclosing HAL calls by mutex. It
> > > might not be the best in terms of the performance, but the
> > > simplicity is good as a first step and also further development.
> > >
> > > Signed-off-by: Hirokazu Honda <hiroh at chromium.org>
> > > ---
> > > src/android/camera_device.cpp | 70 +++++++++++++++++++++--------------
> > > src/android/camera_device.h | 9 ++++-
> > > 2 files changed, 50 insertions(+), 29 deletions(-)
> > >
> > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > > index a71aee2f..c74dc0e7 100644
> > > --- a/src/android/camera_device.cpp
> > > +++ b/src/android/camera_device.cpp
> > > @@ -748,27 +748,40 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
> > >
> > > void CameraDevice::close()
> > > {
> > > - streams_.clear();
> > > -
> > > - stop();
> > > + std::scoped_lock<std::mutex> lock(mutex_);
> > >
> > > - camera_->release();
> > > + stop(/*releaseCamera=*/true);
> >
> > The camera is only released here. Can't we keep the camera_->release()
> > call here, to avoid the bool parameter to stop() ?
>
> Thanks, it should be better. I will move in the next patch.
>
> > > }
> > >
> > > -void CameraDevice::stop()
> > > +void CameraDevice::stop(bool releaseCamera)
> > > {
> > > - if (!running_)
> > > + streams_.clear();
> > > +
> > > + if (!running_) {
> > > + if (releaseCamera) {
> > > + descriptors_.clear();
> > > + camera_->release();
> > > + }
> > > return;
> > > + }
> > > +
> > > + stopping_ = true;
> > >
> > > worker_.stop();
> > > camera_->stop();
> > >
> > > descriptors_.clear();
> > > +
> > > running_ = false;
> > > + stopping_ = false;
> > > +
> > > + if (releaseCamera)
> > > + camera_->release();
> > > }
> > >
> > > void CameraDevice::setCallbacks(const camera3_callback_ops_t *callbacks)
> > > {
> > > + std::scoped_lock<std::mutex> lock(mutex_);
> > > callbacks_ = callbacks;
> >
> > This function is called by the HAL .initialize() operation. I really
> > don't think it could race with anything else. While it's not explicitly
> > documented, I don't see how it would be reasonable for the camera
> > service to call .initialize() from one thread, and other HAL functions
> > from different threads before .initialize() returns.
>
> Thanks. Yeah, I am sure this function will not be concurrently called
> with other functions.
> However, callbacks_ will be called in stop(), I wonder it is worth
> having a seliralization here also thinking of consistency.
> How do you think?
As commented below, even if we don't need to serialize these calls with
a lock, doing so shouldn't hurt performances, and should make locking
verification easier, so I'm not opposed to it.
> > > }
> > >
> > > @@ -1581,6 +1594,8 @@ std::unique_ptr<CameraMetadata> CameraDevice::requestTemplateVideo()
> > > */
> > > const camera_metadata_t *CameraDevice::constructDefaultRequestSettings(int type)
> > > {
> > > + std::scoped_lock<std::mutex> lock(mutex_);
> > > +
> >
> > This function shouldn't affect the state of the camera, does it need to
> > be serialized ?
>
> Thanks. Even thinking of the consistency, having a serialization in
> the function is not reasonable.
> I will remove mutex_ in this function.
>
> > > auto it = requestTemplates_.find(type);
> > > if (it != requestTemplates_.end())
> > > return it->second->get();
> > > @@ -1648,8 +1663,7 @@ PixelFormat CameraDevice::toPixelFormat(int format) const
> > > */
> > > int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > > {
> > > - /* Before any configuration attempt, stop the camera. */
> > > - stop();
> > > + std::scoped_lock<std::mutex> lock(mutex_);
> >
> > Same comment here, is this something that can race with other calls ?
> >
> > This being said, I'm not opposed to locking the mutex here and in
> > CameraDevice::setCallbacks(). Locks are best documented as what data
> > they protect, and using them consistently with their documentation makes
> > it easier to check correct locking (this includes making it easier for
> > static code checkers to catch issues). As taking the lock here should
> > not cause performance issues (especially given that there should be no
> > concurrent call to this function), we can keep it for consistency.
>
> I think flush() and close() can be called during configureStreams() call.
That's interesting. I would have expected those to be serialized. Is
there any documentation that explains this ?
> I think it is mandatory to have a lock at least somewhere in
> process_capture_request(). If we don't enclose
> process_capture_request() by lock, handling flush() and concurrent
> process_capture_request() might become a bit complicated (though I can
> still do it, I actually implemented so initially but discarded because
> of the complexity).
>
> > >
> > > if (stream_list->num_streams == 0) {
> > > LOG(HAL, Error) << "No streams in configuration";
> > > @@ -1661,6 +1675,9 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > > return -EINVAL;
> > > #endif
> > >
> > > + /* Before any configuration attempt, stop the camera. */
> > > + stop();
> > > +
> > > /*
> > > * Generate an empty configuration, and construct a StreamConfiguration
> > > * for each camera3_stream to add to it.
> > > @@ -1672,11 +1689,10 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > > }
> > >
> > > /*
> > > - * Clear and remove any existing configuration from previous calls, and
> > > - * ensure the required entries are available without further
> > > + * Ensure the required entries are available without further
> > > * reallocation.
> > > */
> > > - streams_.clear();
> > > + ASSERT(streams_.empty());
> > > streams_.reserve(stream_list->num_streams);
> > >
> > > std::vector<Camera3StreamConfig> streamConfigs;
> > > @@ -1912,6 +1928,8 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > > if (!isValidRequest(camera3Request))
> > > return -EINVAL;
> > >
> > > + std::scoped_lock<std::mutex> lock(mutex_);
> > > +
> > > /* Start the camera if that's the first request we handle. */
> > > if (!running_) {
> > > worker_.start();
> > > @@ -2015,33 +2033,31 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > >
> > > worker_.queueRequest(descriptor.request_.get());
> > >
> > > - {
> > > - std::scoped_lock<std::mutex> lock(mutex_);
> > > - descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> > > - }
> > > + descriptors_[descriptor.request_->cookie()] = std::move(descriptor);
> > >
> > > return 0;
> > > }
> > >
> > > void CameraDevice::requestComplete(Request *request)
> > > {
> > > + if (stopping_)
> > > + return;
> > > +
> > > + std::scoped_lock<std::mutex> lock(mutex_);
> >
> > The requestComplete signal is emitted from the CameraManager thread, and
> > must not block for a large amount of time. We already lock the mutex_ in
> > this function, but we currently use the mutex_ only to protect
> > descriptors_, which should never block for long. Extending mutex_ to
> > serialize the Camera calls is dangerous. Could we implement better
> > locking ? I think the first step would be to clearly explain what races
> > we need to handle, as mentioned above in the review of the commit
> > message. Race conditions are tricky to handle right, proper
> > documentation is a key.
> >
> > I know we already have issues with JPEG compression that is performed
> > synchronously in this function. This needs to be fixed, but let's not
> > make it worse and introduce another issue that will then need to be
> > fixed too.
>
> I agree.
> Since requestComplete() doesn't have to be serialized with HAL
> functions, we don't need to have a lock from begin to end.
> My first thought is coarse lock is a good first step as fine lock
> might make the code complicated.
> I will next try a fine lock in this function.
Thank you. If it starts causing issues, please feel free to reach out to
discuss them before spending too much time on the implementation.
> > > +
> > > const Request::BufferMap &buffers = request->buffers();
> > > camera3_buffer_status status = CAMERA3_BUFFER_STATUS_OK;
> > > std::unique_ptr<CameraMetadata> resultMetadata;
> > >
> > > - decltype(descriptors_)::node_type node;
> > > - {
> > > - std::scoped_lock<std::mutex> lock(mutex_);
> > > - auto it = descriptors_.find(request->cookie());
> > > - if (it == descriptors_.end()) {
> > > - LOG(HAL, Fatal)
> > > - << "Unknown request: " << request->cookie();
> > > - status = CAMERA3_BUFFER_STATUS_ERROR;
> > > - return;
> > > - }
> > > -
> > > - node = descriptors_.extract(it);
> > > + auto it = descriptors_.find(request->cookie());
> > > + if (it == descriptors_.end()) {
> > > + LOG(HAL, Debug)
> > > + << "Unknown request: " << request->cookie();
> > > + status = CAMERA3_BUFFER_STATUS_ERROR;
> > > + return;
> > > }
> > > +
> > > + auto node = descriptors_.extract(it);
> > > Camera3RequestDescriptor &descriptor = node.mapped();
> > >
> > > if (request->status() != Request::RequestComplete) {
> > > diff --git a/src/android/camera_device.h b/src/android/camera_device.h
> > > index c63e8e21..08553584 100644
> > > --- a/src/android/camera_device.h
> > > +++ b/src/android/camera_device.h
> > > @@ -88,7 +88,7 @@ private:
> > > int androidFormat;
> > > };
> > >
> > > - void stop();
> > > + void stop(bool releaseCamera = false);
> > >
> > > int initializeStreamConfigurations();
> > > std::vector<libcamera::Size>
> > > @@ -127,7 +127,12 @@ private:
> > > std::map<int, libcamera::PixelFormat> formatsMap_;
> > > std::vector<CameraStream> streams_;
> > >
> > > - std::mutex mutex_; /* Protect descriptors_ */
> > > + /*
> > > + * Protect descriptors_ and camera_, and also avoid races by concurrent
> > > + * Camera HAL API calls.
> > > + */
> > > + std::mutex mutex_;
> > > + std::atomic_bool stopping_;
> > > std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
> > >
> > > std::string maker_;
--
Regards,
Laurent Pinchart
More information about the libcamera-devel
mailing list