[libcamera-devel] [PATCH v3 8/8] android: Implement flush() camera operation

Hirokazu Honda hiroh at chromium.org
Thu May 27 05:59:42 CEST 2021


Hi Laurent and Tomasz.

On Thu, May 27, 2021 at 11:49 AM Tomasz Figa <tfiga at chromium.org> wrote:

> Hi Laurent,
>
> On Thu, May 27, 2021 at 11:27 AM Laurent Pinchart
> <laurent.pinchart at ideasonboard.com> wrote:
> >
> > Hi Jacopo,
> >
> > (expanding the CC list to finalize the race conditions discussion)
> >
> > On Mon, May 24, 2021 at 09:47:55AM +0200, Jacopo Mondi wrote:
> > > On Sun, May 23, 2021 at 09:50:46PM +0300, Laurent Pinchart wrote:
> > > > On Sun, May 23, 2021 at 04:22:51PM +0200, Jacopo Mondi wrote:
> > > > > On Sat, May 22, 2021 at 11:55:36AM +0200, Niklas Söderlund wrote:
> > > > > > On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote:
> > > > > > > Implement the flush() camera operation in the CameraDevice
> class
> > > > > > > and make it available to the camera framework by implementing
> the
> > > > > > > operation wrapper in camera_ops.cpp.
> > > > > > >
> > > > > > > The flush() implementation stops the Camera and the worker
> thread and
> > > > > > > waits for all in-flight requests to be returned. Stopping the
> Camera
> > > > > > > forces all Requests already queued to be returned immediately
> in error
> > > > > > > state. As flush() has to wait until all of them have been
> returned, make it
> > > > > > > wait on a newly introduced condition variable which is
> notified by the
> > > > > > > request completion handler when the queue of pending requests
> has been
> > > > > > > exhausted.
> > > > > > >
> > > > > > > As flush() can race with processCaptureRequest() protect the
> requests
> > > > > > > queueing by introducing a new CameraState::CameraFlushing
> state that
> > > > > > > processCaptureRequest() inspects before queuing the Request to
> the
> > > > > > > Camera. If flush() has been called while
> processCaptureRequest() was
> > > > > > > executing, return the current Request immediately in error
> state.
> > > > > > >
> > > > > > > Protect potentially concurrent calls to close() and
> configureStreams()
> > > >
> > > > Can this happen ? Quoting camera3.h,
> > > >
> > > >  * 12. Alternatively, the framework may call
> camera3_device_t->common->close()
> > > >  *    to end the camera session. This may be called at any time when
> no other
> > > >  *    calls from the framework are active, although the call may
> block until all
> > > >  *    in-flight captures have completed (all results returned, all
> buffers
> > > >  *    filled). After the close call returns, no more calls to the
> > > >  *    camera3_callback_ops_t functions are allowed from the HAL.
> Once the
> > > >  *    close() call is underway, the framework may not call any other
> HAL device
> > > >  *    functions.
> > > >
> > > > The important part is "when no other calss from the framework are
> > > > active". I don't think we need to handle close() racing with anything
> > > > else than process_capture_request().
> > >
> > > I've been discussing this with Hiro during v1, as initially I didn't
> > > consider close() and configureStreams().
> > >
> > > https://patchwork.libcamera.org/patch/12248/#16884
> > >
> > > I initially only considered processCaptureRequest() as a potential
> > > race, but got suggested differently by the cros camera team.
> >
> > Let's try to get to the bottom of this.
> >
> > Section S2 ("Startup and general expected operation sequence") states:
> >
> >  * 12. Alternatively, the framework may call
> camera3_device_t->common->close()
> >  *    to end the camera session. This may be called at any time when no
> other
> >  *    calls from the framework are active, although the call may block
> until all
> >  *    in-flight captures have completed (all results returned, all
> buffers
> >  *    filled). After the close call returns, no more calls to the
> >  *    camera3_callback_ops_t functions are allowed from the HAL. Once the
> >  *    close() call is underway, the framework may not call any other HAL
> device
> >  *    functions.
> >
> > There can be in-flight requests when .close() is called, but it can't be
> > called concurrently with any other call. There's thus no race condition
> > to protect against.
>
> Note that camera3.h is considered outdated, as the interface updates
> have been reflected only in the HIDL version [1]. However, I don't see
> a HIDL counterpart of the section you mentioned.
>
> [1]
> https://cs.android.com/android/platform/superproject/+/master:hardware/interfaces/camera/device/
>
> >
> > The .configure_streams() documentation states:
> >
> >      * Preconditions:
> >      *
> >      * The framework will only call this method when no captures are
> being
> >      * processed. That is, all results have been returned to the
> framework, and
> >      * all in-flight input and output buffers have been returned and
> their
> >      * release sync fences have been signaled by the HAL. The framework
> will not
> >      * submit new requests for capture while the configure_streams()
> call is
> >      * underway.
> >
> > This clearly forbids calling .configure_streams() and
> > .process_capture_request() concurrently.
> >
> > The .flush() documentation states:
> >
> >      * Flush all currently in-process captures and all buffers in the
> pipeline
> >      * on the given device. The framework will use this to dump all
> state as
> >      * quickly as possible in order to prepare for a configure_streams()
> call.
> >
> > I interpret this as at least a very strong hint that .flush() and
> > .configure_streams() can't be called concurrently :-)
> >
> > If anyone disagrees, I'd like compelling evidence that those races can
> > occur.
>
> Indeed, the newest documentation [2] seems to be stating the same.
>
> [2]
> https://cs.android.com/android/platform/superproject/+/master:hardware/interfaces/camera/device/3.2/ICameraDeviceSession.hal;drc=48f3952ffc9bd6f4c610933d757a76020643aa52;l=107
>
> My reading of the documentation is the same as Laurent's. Hiro, did
> you hear something contradictory from the Android framework team?
>
>
I got from Ricky that flush() and close() can be called anytime.
Ricky, have we requested the Android framework team to specify this in the
document?

Additionally, in the discussion mail with Android framework team, I found
the Android team told
> We also guarantee we won't call configureStreams while
processCaptureRequest is being called (or if there are any capture requests
that haven't yet been completed, for that matter).
So probably some protection in Jacopo's code can be saved. Sorry Jacopo for
reworking.

-Hiro


> Best regards,
> Tomasz
>
> >
> > > > > > > by inspecting the CameraState, and force a wait for any
> flush() call
> > > > > > > to complete before proceeding.
> > > > > > >
> > > > > > > Signed-off-by: Jacopo Mondi <jacopo at jmondi.org>
> > > > > > > ---
> > > > > > >  src/android/camera_device.cpp | 90
> +++++++++++++++++++++++++++++++++--
> > > > > > >  src/android/camera_device.h   |  9 +++-
> > > > > > >  src/android/camera_ops.cpp    |  8 +++-
> > > > > > >  3 files changed, 100 insertions(+), 7 deletions(-)
> > > > > > >
> > > > > > > diff --git a/src/android/camera_device.cpp
> b/src/android/camera_device.cpp
> > > > > > > index 3fce14035718..899afaa49439 100644
> > > > > > > --- a/src/android/camera_device.cpp
> > > > > > > +++ b/src/android/camera_device.cpp
> > > > > > > @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t
> *hardwareModule)
> > > > > > >
> > > > > > >  void CameraDevice::close()
> > > > > > >  {
> > > > > > > -     streams_.clear();
> > > > > > > +     MutexLocker cameraLock(cameraMutex_);
> > > >
> > > > I'd add a blank line here.
> > > >
> > > > > > > +     if (state_ == CameraFlushing) {
> > > >
> > > > As mentioned above, I don't think you need to protect against close()
> > > > and flush() racing each other.
> > > >
> > > > > > > +             flushed_.wait(cameraLock, [&] { return state_ !=
> CameraStopped; });
> > > > > > > +             camera_->release();
> > > > > > >
> > > > > > > +             return;
> > > > > > > +     }
> > > > > > > +
> > > > > > > +     streams_.clear();
> > > > > > >       stop();
> > > > > > >
> > > > > > >       camera_->release();
> > > > > > >  }
> > > > > > >
> > > > > > > -void CameraDevice::stop()
> > > > > > > +/*
> > > > > > > + * Flush is similar to stop() but sets the camera state to
> 'flushing' and wait
> > > >
> > > > s/wait/waits/
> > > >
> > > > > > > + * until all the in-flight requests have been returned before
> setting the
> > > > > > > + * camera state to stopped.
> > > > > > > + *
> > > > > > > + * Once flushing is done it unlocks concurrent calls to
> camera close() and
> > > > > > > + * configureStreams().
> > > > > > > + */
> > > > > > > +void CameraDevice::flush()
> > > > > > >  {
> > > > > > > +     {
> > > > > > > +             MutexLocker cameraLock(cameraMutex_);
> > > > > > > +
> > > > > > > +             if (state_ != CameraRunning)
> > > > > > > +                     return;
> > > > > > > +
> > > > > > > +             worker_.stop();
> > > > > > > +             camera_->stop();
> > > > > > > +             state_ = CameraFlushing;
> > > > > > > +     }
> > > > > > > +
> > > > > > > +     /*
> > > > > > > +      * Now wait for all the in-flight requests to be
> completed before
> > > > > > > +      * continuing. Stopping the Camera guarantees that all
> in-flight
> > > > > > > +      * requests are completed in error state.
> > > >
> > > > Do we need to wait ? Camera::stop() guarantees that all requests
> > > > complete synchronously with the stop() call.
> > >
> > > I didn't get the API that way... I thought after stop we would receive
> > > a sequence of failed requests... Actually I don't see anything that
> > > suggests that in camera.cpp or pipeline_handler.cpp apart from an
> assertion
> > > in Camera::stop()
> >
> > The camera::stop() documentation states
> >
> >  * This method stops capturing and processing requests immediately. All
> pending
> >  * requests are cancelled and complete synchronously in an error state.
> >
> > Is this ambiguous ?
>

Perhaps, should "and complete synchronously" be rephrased "and
Camera::requestComplete() is executed for all of them before this returns"?

>
> > > > Partly answering myself here, we'll have to wait for post-processing
> > > > tasks to complete once we'll process them in a separate thread, but
> that
> > > > will likely be handled by Thread::wait(). I don't think you need a
> > > > condition variable here. I'm I'm not mistaken, this should simplify
> the
> > > > implementation.
> > >
> > > If Camera::stop() is synchronous we don't need to wait indeed
> > >
> > > > > > > +      */
> > > > > > > +     {
> > > > > > > +             MutexLocker requestsLock(requestsMutex_);
> > > > > > > +             flushing_.wait(requestsLock, [&] { return
> descriptors_.empty(); });
> > > > > > > +     }
> > > > > >
> > > > > > I'm still uneasy about releasing the cameraMutex_ for this
> section. In
> > > > > > patch 6/8 you add it to protect the state_ variable but here it's
> > > > >
> > > > > I'm not changing state_ without the mutex acquired, am I ?
> > > > >
> > > > > > ignored. I see the ASSERT() added to stop() but the patter of
> taking the
> > > > > > lock checking state_, releasing the lock and do some work,
> retake the
> > > > > > lock and update state_ feels like a bad idea. Maybe I'm missing
> > > > >
> > > > > How so, apart from the fact it feels a bit unusual, I concur ?
> > > > >
> > > > > If I keep the held the mutex for the whole duration of flush no
> other
> > > > > concurrent method can proceed until all the queued requests have
> not
> > > > > been completed. While flush waits for the flushing_ condition to be
> > > > > signaled, processCaptureRequest() can proceed and immediately
> return
> > > > > the newly queued requests in error state by detecting state_ ==
> > > > > CameraFlushing which signals that flush in is progress.
> > > > > Otherwise it would have had to wait for flush to end. But then
> we're back
> > > > > to a situation where we could serialize all calls and that's it, we
> > > > > would be done with a single mutex to be held for the whole
> duration of
> > > > > all operations.
> > > > >
> > > > > If it only was for close() or configureStreams() we could have
> locked
> > > > > for the whole duration of flush(), as they anyway wait for flush to
> > > > > complete before proceeding (by waiting on the flushed_ condition
> here
> > > > > below signaled).
> > > > >
> > > > > > something and this is not a real problem, if so maybe we can
> capture
> > > > > > that in the comment here?
> > > > > >
> > > > > > > +
> > > > > > > +     /*
> > > > > > > +      * Set state to stopped and unlock close() or
> configureStreams() that
> > > > > > > +      * might be waiting for flush to be completed.
> > > > > > > +      */
> > > > > > >       MutexLocker cameraLock(cameraMutex_);
> > > > > > > +     state_ = CameraStopped;
> > > > > > > +     flushed_.notify_one();
> > > >
> > > > You should drop the lock before calling notify_one(). Otherwise
> you'll
> > > > wake up the task waiting on flushed_, which will try to lock
> > > > cameraMutex_, which will block immediately. The scheduler will have
> to
> > > > reschedule this task for the function to return and the lock to be
> > > > released before the waiter can proceed. That works, but isn't very
> > > > efficient.
> > >
> > > Weird, the cpp reference shows example about notify_one where the
> > > caller always has the mutex held locked, but I see your point and
> > > seems correct..
> >
> > I'm looking at
> > https://en.cppreference.com/w/cpp/thread/condition_variable and
> > https://en.cppreference.com/w/cpp/thread/condition_variable/notify_one
> > and both calls to notify_one() in the example are made without the lock
> > held, aren't they ?
> >
> > > >
> > > >     {
> > > >             MutexLocker cameraLock(cameraMutex_);
> > > >             state_ = CameraStopped;
> > > >     }
> > > >
> > > >     flushed_.notify_one();
> > > >
> > >
> > > So I could change to this one, if I don't have to drop this part
> > > completely if we consider close() and configureStreams() not as
> > > possible races...
> > >
> > > > > > > +}
> > > > > > > +
> > > > > > > +/* Calls to stop() must be protected by cameraMutex_ being
> held by the caller. */
> > > > > > > +void CameraDevice::stop()
> > > > > > > +{
> > > > > > > +     ASSERT(state_ != CameraFlushing);
> > > > > > > +
> > > > > > >       if (state_ == CameraStopped)
> > > > > > >               return;
> > > > > > >
> > > > > > > @@ -1581,8 +1630,18 @@ PixelFormat
> CameraDevice::toPixelFormat(int format) const
> > > > > > >   */
> > > > > > >  int
> CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > > > > > >  {
> > > > > > > -     /* Before any configuration attempt, stop the camera. */
> > > > > > > -     stop();
> > > > > > > +     {
> > > > > > > +             /*
> > > > > > > +              * If a flush is in progress, wait for it to
> complete and to
> > > > > > > +              * stop the camera, otherwise before any new
> configuration
> > > > > > > +              * attempt we have to stop the camera explictely.
> > > > > > > +              */
> > > >
> > > > Same here, I don't think flush() and configure_streams() can race
> each
> > > > other. I believe the only possible race to be between flush() and
> > > > process_capture_request().
> > >
> > > Ditto.
> > >
> > > > > > > +             MutexLocker cameraLock(cameraMutex_);
> > > > > > > +             if (state_ == CameraFlushing)
> > > > > > > +                     flushed_.wait(cameraLock, [&] { return
> state_ != CameraStopped; });
> > > > > > > +             else
> > > > > > > +                     stop();
> > > > > > > +     }
> > > > > > >
> > > > > > >       if (stream_list->num_streams == 0) {
> > > > > > >               LOG(HAL, Error) << "No streams in configuration";
> > > > > > > @@ -1950,6 +2009,25 @@ int
> CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
> > > > > > >       if (ret)
> > > > > > >               return ret;
> > > > > > >
> > > > > > > +     /*
> > > > > > > +      * Just before queuing the request, make sure flush()
> has not
> > > > > > > +      * been called after this function has been executed. In
> that
> > > > > > > +      * case, immediately return the request with errors.
> > > > > > > +      */
> > > > > > > +     MutexLocker cameraLock(cameraMutex_);
> > > > > > > +     if (state_ == CameraFlushing || state_ == CameraStopped)
> {
> > > > > > > +             for (camera3_stream_buffer_t &buffer :
> descriptor.buffers_) {
> > > > > > > +                     buffer.status =
> CAMERA3_BUFFER_STATUS_ERROR;
> > > > > > > +                     buffer.release_fence =
> buffer.acquire_fence;
> > > > > > > +             }
> > > > > > > +
> > > > > > > +             notifyError(descriptor.frameNumber_,
> > > > > > > +                         descriptor.buffers_[0].stream,
> > > >
> > > > As commented on a previous patch, I think you should pass nullptr for
> > > > the stream here.
> > >
> > > The "S6. Error management:" section of the camera3.h header does not
> > > mention that, not the ?
> >
> > Indeed, that section doesn't mention the camera3_error_msg::error_stream
> > field at all. The field is documented in the structure as
> >
> >     /**
> >      * Pointer to the stream that had a failure. NULL if the stream isn't
> >      * applicable to the error.
> >      */
> >
> > The question is thus when the stream is applicable to the error. The
> > documentation of enum camera3_error_msg_code mentions error_stream in
> > the CAMERA3_MSG_ERROR_BUFFER case only. The other errors are related to
> > the device, the request or the result metadata, which are not specific
> > to a stream.
> >
> > > where does you suggestion come from ? I don't find any reference in
> > > the review of [1/8]
> >
> > ([PATCH v3 1/8] android: Rework request completion notification'
> > (YKqV6Iik2sN3XUEf at pendragon.ideasonboard.com)
> >
> > > > > > > +                         CAMERA3_MSG_ERROR_REQUEST);
> > > > > > > +
> > > > > > > +             return 0;
> > > > > > > +     }
> > > > > > > +
> > > > > > >       worker_.queueRequest(descriptor.request_.get());
> > > > > > >
> > > > > > >       {
> > > > > > > @@ -1979,6 +2057,10 @@ void
> CameraDevice::requestComplete(Request *request)
> > > > > > >                       return;
> > > > > > >               }
> > > > > > >
> > > > > > > +             /* Release flush if all the pending requests
> have been completed. */
> > > > > > > +             if (descriptors_.empty())
> > > > > > > +                     flushing_.notify_one();
> > > >
> > > > This will never happen, as you can only get here if
> descriptors_.find()
> > > > has found the descriptor. Did you mean to do this after the extract()
> > > > call below ?
> > >
> > > Ugh. This works only because Camera::stop() is synchronous then ?
> >
> > I believe so.
> >
> > > > > > > +
> > > > > > >               node = descriptors_.extract(it);
> > > > > > >       }
> > > > > > >       Camera3RequestDescriptor &descriptor = node.mapped();
> > > > > > > diff --git a/src/android/camera_device.h
> b/src/android/camera_device.h
> > > > > > > index 7cf8e8370387..e1b3bf7d30f2 100644
> > > > > > > --- a/src/android/camera_device.h
> > > > > > > +++ b/src/android/camera_device.h
> > > > > > > @@ -7,6 +7,7 @@
> > > > > > >  #ifndef __ANDROID_CAMERA_DEVICE_H__
> > > > > > >  #define __ANDROID_CAMERA_DEVICE_H__
> > > > > > >
> > > > > > > +#include <condition_variable>
> > > > > > >  #include <map>
> > > > > > >  #include <memory>
> > > > > > >  #include <mutex>
> > > > > > > @@ -42,6 +43,7 @@ public:
> > > > > > >
> > > > > > >       int open(const hw_module_t *hardwareModule);
> > > > > > >       void close();
> > > > > > > +     void flush();
> > > > > > >
> > > > > > >       unsigned int id() const { return id_; }
> > > > > > >       camera3_device_t *camera3Device() { return
> &camera3Device_; }
> > > > > > > @@ -92,6 +94,7 @@ private:
> > > > > > >       enum State {
> > > > > > >               CameraStopped,
> > > > > > >               CameraRunning,
> > > > > > > +             CameraFlushing,
> > > > > > >       };
> > > > > > >
> > > > > > >       void stop();
> > > > > > > @@ -120,8 +123,9 @@ private:
> > > > > > >
> > > > > > >       CameraWorker worker_;
> > > > > > >
> > > > > > > -     libcamera::Mutex cameraMutex_; /* Protects access to the
> camera state. */
> > > > > > > +     libcamera::Mutex cameraMutex_; /* Protects the camera
> state and flushed_. */
> > > > > > >       State state_;
> > > > > > > +     std::condition_variable flushed_;
> > > > > > >
> > > > > > >       std::shared_ptr<libcamera::Camera> camera_;
> > > > > > >       std::unique_ptr<libcamera::CameraConfiguration> config_;
> > > > > > > @@ -134,8 +138,9 @@ private:
> > > > > > >       std::map<int, libcamera::PixelFormat> formatsMap_;
> > > > > > >       std::vector<CameraStream> streams_;
> > > > > > >
> > > > > > > -     libcamera::Mutex requestsMutex_; /* Protects
> descriptors_. */
> > > > > > > +     libcamera::Mutex requestsMutex_; /* Protects
> descriptors_ and flushing_. */
> > > > > > >       std::map<uint64_t, Camera3RequestDescriptor>
> descriptors_;
> > > > > > > +     std::condition_variable flushing_;
> > > > > > >
> > > > > > >       std::string maker_;
> > > > > > >       std::string model_;
> > > > > > > diff --git a/src/android/camera_ops.cpp
> b/src/android/camera_ops.cpp
> > > > > > > index 696e80436821..8a3cfa175ff5 100644
> > > > > > > --- a/src/android/camera_ops.cpp
> > > > > > > +++ b/src/android/camera_ops.cpp
> > > > > > > @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]]
> const struct camera3_device *dev,
> > > > > > >  {
> > > > > > >  }
> > > > > > >
> > > > > > > -static int hal_dev_flush([[maybe_unused]] const struct
> camera3_device *dev)
> > > > > > > +static int hal_dev_flush(const struct camera3_device *dev)
> > > > > > >  {
> > > > > > > +     if (!dev)
> > > > > > > +             return -EINVAL;
> > > > > > > +
> > > > > > > +     CameraDevice *camera = reinterpret_cast<CameraDevice
> *>(dev->priv);
> > > > > > > +     camera->flush();
> > > > > > > +
> > > > > > >       return 0;
> > > > > > >  }
> > > > > > >
> >
> > --
> > Regards,
> >
> > Laurent Pinchart
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.libcamera.org/pipermail/libcamera-devel/attachments/20210527/8b8e0514/attachment-0001.htm>


More information about the libcamera-devel mailing list