[libcamera-devel] [PATCH v3 8/8] android: Implement flush() camera operation

Tomasz Figa tfiga at chromium.org
Thu May 27 04:49:35 CEST 2021


Hi Laurent,

On Thu, May 27, 2021 at 11:27 AM Laurent Pinchart
<laurent.pinchart at ideasonboard.com> wrote:
>
> Hi Jacopo,
>
> (expanding the CC list to finalize the race conditions discussion)
>
> On Mon, May 24, 2021 at 09:47:55AM +0200, Jacopo Mondi wrote:
> > On Sun, May 23, 2021 at 09:50:46PM +0300, Laurent Pinchart wrote:
> > > On Sun, May 23, 2021 at 04:22:51PM +0200, Jacopo Mondi wrote:
> > > > On Sat, May 22, 2021 at 11:55:36AM +0200, Niklas Söderlund wrote:
> > > > > On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote:
> > > > > > Implement the flush() camera operation in the CameraDevice class
> > > > > > and make it available to the camera framework by implementing the
> > > > > > operation wrapper in camera_ops.cpp.
> > > > > >
> > > > > > The flush() implementation stops the Camera and the worker thread and
> > > > > > waits for all in-flight requests to be returned. Stopping the Camera
> > > > > > forces all Requests already queued to be returned immediately in error
> > > > > > state. As flush() has to wait until all of them have been returned, make it
> > > > > > wait on a newly introduced condition variable which is notified by the
> > > > > > request completion handler when the queue of pending requests has been
> > > > > > exhausted.
> > > > > >
> > > > > > As flush() can race with processCaptureRequest() protect the requests
> > > > > > queueing by introducing a new CameraState::CameraFlushing state that
> > > > > > processCaptureRequest() inspects before queuing the Request to the
> > > > > > Camera. If flush() has been called while processCaptureRequest() was
> > > > > > executing, return the current Request immediately in error state.
> > > > > >
> > > > > > Protect potentially concurrent calls to close() and configureStreams()
> > >
> > > Can this happen ? Quoting camera3.h,
> > >
> > >  * 12. Alternatively, the framework may call camera3_device_t->common->close()
> > >  *    to end the camera session. This may be called at any time when no other
> > >  *    calls from the framework are active, although the call may block until all
> > >  *    in-flight captures have completed (all results returned, all buffers
> > >  *    filled). After the close call returns, no more calls to the
> > >  *    camera3_callback_ops_t functions are allowed from the HAL. Once the
> > >  *    close() call is underway, the framework may not call any other HAL device
> > >  *    functions.
> > >
> > > The important part is "when no other calss from the framework are
> > > active". I don't think we need to handle close() racing with anything
> > > else than process_capture_request().
> >
> > I've been discussing this with Hiro during v1, as initially I didn't
> > consider close() and configureStreams().
> >
> > https://patchwork.libcamera.org/patch/12248/#16884
> >
> > I initially only considered processCaptureRequest() as a potential
> > race, but got suggested differently by the cros camera team.
>
> Let's try to get to the bottom of this.
>
> Section S2 ("Startup and general expected operation sequence") states:
>
>  * 12. Alternatively, the framework may call camera3_device_t->common->close()
>  *    to end the camera session. This may be called at any time when no other
>  *    calls from the framework are active, although the call may block until all
>  *    in-flight captures have completed (all results returned, all buffers
>  *    filled). After the close call returns, no more calls to the
>  *    camera3_callback_ops_t functions are allowed from the HAL. Once the
>  *    close() call is underway, the framework may not call any other HAL device
>  *    functions.
>
> There can be in-flight requests when .close() is called, but it can't be
> called concurrently with any other call. There's thus no race condition
> to protect against.

Note that camera3.h is considered outdated, as the interface updates
have been reflected only in the HIDL version [1]. However, I don't see
a HIDL counterpart of the section you mentioned.

[1] https://cs.android.com/android/platform/superproject/+/master:hardware/interfaces/camera/device/

>
> The .configure_streams() documentation states:
>
>      * Preconditions:
>      *
>      * The framework will only call this method when no captures are being
>      * processed. That is, all results have been returned to the framework, and
>      * all in-flight input and output buffers have been returned and their
>      * release sync fences have been signaled by the HAL. The framework will not
>      * submit new requests for capture while the configure_streams() call is
>      * underway.
>
> This clearly forbids calling .configure_streams() and
> .process_capture_request() concurrently.
>
> The .flush() documentation states:
>
>      * Flush all currently in-process captures and all buffers in the pipeline
>      * on the given device. The framework will use this to dump all state as
>      * quickly as possible in order to prepare for a configure_streams() call.
>
> I interpret this as at least a very strong hint that .flush() and
> .configure_streams() can't be called concurrently :-)
>
> If anyone disagrees, I'd like compelling evidence that those races can
> occur.

Indeed, the newest documentation [2] seems to be stating the same.

[2] https://cs.android.com/android/platform/superproject/+/master:hardware/interfaces/camera/device/3.2/ICameraDeviceSession.hal;drc=48f3952ffc9bd6f4c610933d757a76020643aa52;l=107

My reading of the documentation is the same as Laurent's. Hiro, did
you hear something contradictory from the Android framework team?

Best regards,
Tomasz

>
> > > > > > by inspecting the CameraState, and force a wait for any flush() call
> > > > > > to complete before proceeding.
> > > > > >
> > > > > > Signed-off-by: Jacopo Mondi <jacopo at jmondi.org>
> > > > > > ---
> > > > > >  src/android/camera_device.cpp | 90 +++++++++++++++++++++++++++++++++--
> > > > > >  src/android/camera_device.h   |  9 +++-
> > > > > >  src/android/camera_ops.cpp    |  8 +++-
> > > > > >  3 files changed, 100 insertions(+), 7 deletions(-)
> > > > > >
> > > > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > > > > > index 3fce14035718..899afaa49439 100644
> > > > > > --- a/src/android/camera_device.cpp
> > > > > > +++ b/src/android/camera_device.cpp
> > > > > > @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
> > > > > >
> > > > > >  void CameraDevice::close()
> > > > > >  {
> > > > > > -     streams_.clear();
> > > > > > +     MutexLocker cameraLock(cameraMutex_);
> > >
> > > I'd add a blank line here.
> > >
> > > > > > +     if (state_ == CameraFlushing) {
> > >
> > > As mentioned above, I don't think you need to protect against close()
> > > and flush() racing each other.
> > >
> > > > > > +             flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });
> > > > > > +             camera_->release();
> > > > > >
> > > > > > +             return;
> > > > > > +     }
> > > > > > +
> > > > > > +     streams_.clear();
> > > > > >       stop();
> > > > > >
> > > > > >       camera_->release();
> > > > > >  }
> > > > > >
> > > > > > -void CameraDevice::stop()
> > > > > > +/*
> > > > > > + * Flush is similar to stop() but sets the camera state to 'flushing' and wait
> > >
> > > s/wait/waits/
> > >
> > > > > > + * until all the in-flight requests have been returned before setting the
> > > > > > + * camera state to stopped.
> > > > > > + *
> > > > > > + * Once flushing is done it unlocks concurrent calls to camera close() and
> > > > > > + * configureStreams().
> > > > > > + */
> > > > > > +void CameraDevice::flush()
> > > > > >  {
> > > > > > +     {
> > > > > > +             MutexLocker cameraLock(cameraMutex_);
> > > > > > +
> > > > > > +             if (state_ != CameraRunning)
> > > > > > +                     return;
> > > > > > +
> > > > > > +             worker_.stop();
> > > > > > +             camera_->stop();
> > > > > > +             state_ = CameraFlushing;
> > > > > > +     }
> > > > > > +
> > > > > > +     /*
> > > > > > +      * Now wait for all the in-flight requests to be completed before
> > > > > > +      * continuing. Stopping the Camera guarantees that all in-flight
> > > > > > +      * requests are completed in error state.
> > >
> > > Do we need to wait ? Camera::stop() guarantees that all requests
> > > complete synchronously with the stop() call.
> >
> > I didn't get the API that way... I thought after stop we would receive
> > a sequence of failed requests... Actually I don't see anything that
> > suggests that in camera.cpp or pipeline_handler.cpp apart from an assertion
> > in Camera::stop()
>
> The camera::stop() documentation states
>
>  * This method stops capturing and processing requests immediately. All pending
>  * requests are cancelled and complete synchronously in an error state.
>
> Is this ambiguous ?
>
> > > Partly answering myself here, we'll have to wait for post-processing
> > > tasks to complete once we'll process them in a separate thread, but that
> > > will likely be handled by Thread::wait(). I don't think you need a
> > > condition variable here. I'm I'm not mistaken, this should simplify the
> > > implementation.
> >
> > If Camera::stop() is synchronous we don't need to wait indeed
> >
> > > > > > +      */
> > > > > > +     {
> > > > > > +             MutexLocker requestsLock(requestsMutex_);
> > > > > > +             flushing_.wait(requestsLock, [&] { return descriptors_.empty(); });
> > > > > > +     }
> > > > >
> > > > > I'm still uneasy about releasing the cameraMutex_ for this section. In
> > > > > patch 6/8 you add it to protect the state_ variable but here it's
> > > >
> > > > I'm not changing state_ without the mutex acquired, am I ?
> > > >
> > > > > ignored. I see the ASSERT() added to stop() but the patter of taking the
> > > > > lock checking state_, releasing the lock and do some work, retake the
> > > > > lock and update state_ feels like a bad idea. Maybe I'm missing
> > > >
> > > > How so, apart from the fact it feels a bit unusual, I concur ?
> > > >
> > > > If I keep the held the mutex for the whole duration of flush no other
> > > > concurrent method can proceed until all the queued requests have not
> > > > been completed. While flush waits for the flushing_ condition to be
> > > > signaled, processCaptureRequest() can proceed and immediately return
> > > > the newly queued requests in error state by detecting state_ ==
> > > > CameraFlushing which signals that flush in is progress.
> > > > Otherwise it would have had to wait for flush to end. But then we're back
> > > > to a situation where we could serialize all calls and that's it, we
> > > > would be done with a single mutex to be held for the whole duration of
> > > > all operations.
> > > >
> > > > If it only was for close() or configureStreams() we could have locked
> > > > for the whole duration of flush(), as they anyway wait for flush to
> > > > complete before proceeding (by waiting on the flushed_ condition here
> > > > below signaled).
> > > >
> > > > > something and this is not a real problem, if so maybe we can capture
> > > > > that in the comment here?
> > > > >
> > > > > > +
> > > > > > +     /*
> > > > > > +      * Set state to stopped and unlock close() or configureStreams() that
> > > > > > +      * might be waiting for flush to be completed.
> > > > > > +      */
> > > > > >       MutexLocker cameraLock(cameraMutex_);
> > > > > > +     state_ = CameraStopped;
> > > > > > +     flushed_.notify_one();
> > >
> > > You should drop the lock before calling notify_one(). Otherwise you'll
> > > wake up the task waiting on flushed_, which will try to lock
> > > cameraMutex_, which will block immediately. The scheduler will have to
> > > reschedule this task for the function to return and the lock to be
> > > released before the waiter can proceed. That works, but isn't very
> > > efficient.
> >
> > Weird, the cpp reference shows example about notify_one where the
> > caller always has the mutex held locked, but I see your point and
> > seems correct..
>
> I'm looking at
> https://en.cppreference.com/w/cpp/thread/condition_variable and
> https://en.cppreference.com/w/cpp/thread/condition_variable/notify_one
> and both calls to notify_one() in the example are made without the lock
> held, aren't they ?
>
> > >
> > >     {
> > >             MutexLocker cameraLock(cameraMutex_);
> > >             state_ = CameraStopped;
> > >     }
> > >
> > >     flushed_.notify_one();
> > >
> >
> > So I could change to this one, if I don't have to drop this part
> > completely if we consider close() and configureStreams() not as
> > possible races...
> >
> > > > > > +}
> > > > > > +
> > > > > > +/* Calls to stop() must be protected by cameraMutex_ being held by the caller. */
> > > > > > +void CameraDevice::stop()
> > > > > > +{
> > > > > > +     ASSERT(state_ != CameraFlushing);
> > > > > > +
> > > > > >       if (state_ == CameraStopped)
> > > > > >               return;
> > > > > >
> > > > > > @@ -1581,8 +1630,18 @@ PixelFormat CameraDevice::toPixelFormat(int format) const
> > > > > >   */
> > > > > >  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > > > > >  {
> > > > > > -     /* Before any configuration attempt, stop the camera. */
> > > > > > -     stop();
> > > > > > +     {
> > > > > > +             /*
> > > > > > +              * If a flush is in progress, wait for it to complete and to
> > > > > > +              * stop the camera, otherwise before any new configuration
> > > > > > +              * attempt we have to stop the camera explictely.
> > > > > > +              */
> > >
> > > Same here, I don't think flush() and configure_streams() can race each
> > > other. I believe the only possible race to be between flush() and
> > > process_capture_request().
> >
> > Ditto.
> >
> > > > > > +             MutexLocker cameraLock(cameraMutex_);
> > > > > > +             if (state_ == CameraFlushing)
> > > > > > +                     flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });
> > > > > > +             else
> > > > > > +                     stop();
> > > > > > +     }
> > > > > >
> > > > > >       if (stream_list->num_streams == 0) {
> > > > > >               LOG(HAL, Error) << "No streams in configuration";
> > > > > > @@ -1950,6 +2009,25 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > > > > >       if (ret)
> > > > > >               return ret;
> > > > > >
> > > > > > +     /*
> > > > > > +      * Just before queuing the request, make sure flush() has not
> > > > > > +      * been called after this function has been executed. In that
> > > > > > +      * case, immediately return the request with errors.
> > > > > > +      */
> > > > > > +     MutexLocker cameraLock(cameraMutex_);
> > > > > > +     if (state_ == CameraFlushing || state_ == CameraStopped) {
> > > > > > +             for (camera3_stream_buffer_t &buffer : descriptor.buffers_) {
> > > > > > +                     buffer.status = CAMERA3_BUFFER_STATUS_ERROR;
> > > > > > +                     buffer.release_fence = buffer.acquire_fence;
> > > > > > +             }
> > > > > > +
> > > > > > +             notifyError(descriptor.frameNumber_,
> > > > > > +                         descriptor.buffers_[0].stream,
> > >
> > > As commented on a previous patch, I think you should pass nullptr for
> > > the stream here.
> >
> > The "S6. Error management:" section of the camera3.h header does not
> > mention that, not the ?
>
> Indeed, that section doesn't mention the camera3_error_msg::error_stream
> field at all. The field is documented in the structure as
>
>     /**
>      * Pointer to the stream that had a failure. NULL if the stream isn't
>      * applicable to the error.
>      */
>
> The question is thus when the stream is applicable to the error. The
> documentation of enum camera3_error_msg_code mentions error_stream in
> the CAMERA3_MSG_ERROR_BUFFER case only. The other errors are related to
> the device, the request or the result metadata, which are not specific
> to a stream.
>
> > where does you suggestion come from ? I don't find any reference in
> > the review of [1/8]
>
> ([PATCH v3 1/8] android: Rework request completion notification'
> (YKqV6Iik2sN3XUEf at pendragon.ideasonboard.com)
>
> > > > > > +                         CAMERA3_MSG_ERROR_REQUEST);
> > > > > > +
> > > > > > +             return 0;
> > > > > > +     }
> > > > > > +
> > > > > >       worker_.queueRequest(descriptor.request_.get());
> > > > > >
> > > > > >       {
> > > > > > @@ -1979,6 +2057,10 @@ void CameraDevice::requestComplete(Request *request)
> > > > > >                       return;
> > > > > >               }
> > > > > >
> > > > > > +             /* Release flush if all the pending requests have been completed. */
> > > > > > +             if (descriptors_.empty())
> > > > > > +                     flushing_.notify_one();
> > >
> > > This will never happen, as you can only get here if descriptors_.find()
> > > has found the descriptor. Did you mean to do this after the extract()
> > > call below ?
> >
> > Ugh. This works only because Camera::stop() is synchronous then ?
>
> I believe so.
>
> > > > > > +
> > > > > >               node = descriptors_.extract(it);
> > > > > >       }
> > > > > >       Camera3RequestDescriptor &descriptor = node.mapped();
> > > > > > diff --git a/src/android/camera_device.h b/src/android/camera_device.h
> > > > > > index 7cf8e8370387..e1b3bf7d30f2 100644
> > > > > > --- a/src/android/camera_device.h
> > > > > > +++ b/src/android/camera_device.h
> > > > > > @@ -7,6 +7,7 @@
> > > > > >  #ifndef __ANDROID_CAMERA_DEVICE_H__
> > > > > >  #define __ANDROID_CAMERA_DEVICE_H__
> > > > > >
> > > > > > +#include <condition_variable>
> > > > > >  #include <map>
> > > > > >  #include <memory>
> > > > > >  #include <mutex>
> > > > > > @@ -42,6 +43,7 @@ public:
> > > > > >
> > > > > >       int open(const hw_module_t *hardwareModule);
> > > > > >       void close();
> > > > > > +     void flush();
> > > > > >
> > > > > >       unsigned int id() const { return id_; }
> > > > > >       camera3_device_t *camera3Device() { return &camera3Device_; }
> > > > > > @@ -92,6 +94,7 @@ private:
> > > > > >       enum State {
> > > > > >               CameraStopped,
> > > > > >               CameraRunning,
> > > > > > +             CameraFlushing,
> > > > > >       };
> > > > > >
> > > > > >       void stop();
> > > > > > @@ -120,8 +123,9 @@ private:
> > > > > >
> > > > > >       CameraWorker worker_;
> > > > > >
> > > > > > -     libcamera::Mutex cameraMutex_; /* Protects access to the camera state. */
> > > > > > +     libcamera::Mutex cameraMutex_; /* Protects the camera state and flushed_. */
> > > > > >       State state_;
> > > > > > +     std::condition_variable flushed_;
> > > > > >
> > > > > >       std::shared_ptr<libcamera::Camera> camera_;
> > > > > >       std::unique_ptr<libcamera::CameraConfiguration> config_;
> > > > > > @@ -134,8 +138,9 @@ private:
> > > > > >       std::map<int, libcamera::PixelFormat> formatsMap_;
> > > > > >       std::vector<CameraStream> streams_;
> > > > > >
> > > > > > -     libcamera::Mutex requestsMutex_; /* Protects descriptors_. */
> > > > > > +     libcamera::Mutex requestsMutex_; /* Protects descriptors_ and flushing_. */
> > > > > >       std::map<uint64_t, Camera3RequestDescriptor> descriptors_;
> > > > > > +     std::condition_variable flushing_;
> > > > > >
> > > > > >       std::string maker_;
> > > > > >       std::string model_;
> > > > > > diff --git a/src/android/camera_ops.cpp b/src/android/camera_ops.cpp
> > > > > > index 696e80436821..8a3cfa175ff5 100644
> > > > > > --- a/src/android/camera_ops.cpp
> > > > > > +++ b/src/android/camera_ops.cpp
> > > > > > @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] const struct camera3_device *dev,
> > > > > >  {
> > > > > >  }
> > > > > >
> > > > > > -static int hal_dev_flush([[maybe_unused]] const struct camera3_device *dev)
> > > > > > +static int hal_dev_flush(const struct camera3_device *dev)
> > > > > >  {
> > > > > > +     if (!dev)
> > > > > > +             return -EINVAL;
> > > > > > +
> > > > > > +     CameraDevice *camera = reinterpret_cast<CameraDevice *>(dev->priv);
> > > > > > +     camera->flush();
> > > > > > +
> > > > > >       return 0;
> > > > > >  }
> > > > > >
>
> --
> Regards,
>
> Laurent Pinchart


More information about the libcamera-devel mailing list