<div dir="ltr"><div dir="ltr">Hi Jacopo, thank you for the patch.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, May 24, 2021 at 4:47 PM Jacopo Mondi <<a href="mailto:jacopo@jmondi.org">jacopo@jmondi.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Laurent,<br>

<br>

On Sun, May 23, 2021 at 09:50:46PM +0300, Laurent Pinchart wrote:<br>

> Hi Jacopo,<br>

><br>

> Thank you for the patch.<br>

><br>

> On Sun, May 23, 2021 at 04:22:51PM +0200, Jacopo Mondi wrote:<br>

> > On Sat, May 22, 2021 at 11:55:36AM +0200, Niklas Söderlund wrote:<br>

> > > On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote:<br>

> > > > Implement the flush() camera operation in the CameraDevice class<br>

> > > > and make it available to the camera framework by implementing the<br>

> > > > operation wrapper in camera_ops.cpp.<br>

> > > ><br>

> > > > The flush() implementation stops the Camera and the worker thread and<br>

> > > > waits for all in-flight requests to be returned. Stopping the Camera<br>

> > > > forces all Requests already queued to be returned immediately in error<br>

> > > > state. As flush() has to wait until all of them have been returned, make it<br>

> > > > wait on a newly introduced condition variable which is notified by the<br>

> > > > request completion handler when the queue of pending requests has been<br>

> > > > exhausted.<br>

> > > ><br>

> > > > As flush() can race with processCaptureRequest() protect the requests<br>

> > > > queueing by introducing a new CameraState::CameraFlushing state that<br>

> > > > processCaptureRequest() inspects before queuing the Request to the<br>

> > > > Camera. If flush() has been called while processCaptureRequest() was<br>

> > > > executing, return the current Request immediately in error state.<br>

> > > ><br>

> > > > Protect potentially concurrent calls to close() and configureStreams()<br>

><br>

> Can this happen ? Quoting camera3.h,<br>

><br>

>  * 12. Alternatively, the framework may call camera3_device_t->common->close()<br>

>  *    to end the camera session. This may be called at any time when no other<br>

>  *    calls from the framework are active, although the call may block until all<br>

>  *    in-flight captures have completed (all results returned, all buffers<br>

>  *    filled). After the close call returns, no more calls to the<br>

>  *    camera3_callback_ops_t functions are allowed from the HAL. Once the<br>

>  *    close() call is underway, the framework may not call any other HAL device<br>

>  *    functions.<br>

><br>

> The important part is "when no other calss from the framework are<br>

> active". I don't think we need to handle close() racing with anything<br>

> else than process_capture_request().<br>

<br>

I've been discussing this with Hiro during v1, as initially I didn't<br>

consider close() and configureStreams().<br>

<br>

<a href="https://patchwork.libcamera.org/patch/12248/#16884" rel="noreferrer" target="_blank">https://patchwork.libcamera.org/patch/12248/#16884</a><br>

<br>

I initially only considered processCaptureRequest() as a potential<br>

race, but got suggested differently by the cros camera team.<br>

<br>

<br>

><br>

> > > > by inspecting the CameraState, and force a wait for any flush() call<br>

> > > > to complete before proceeding.<br>

> > > ><br>

> > > > Signed-off-by: Jacopo Mondi <<a href="mailto:jacopo@jmondi.org" target="_blank">jacopo@jmondi.org</a>><br>

> > > > ---<br>

> > > >  src/android/camera_device.cpp | 90 +++++++++++++++++++++++++++++++++--<br>

> > > >  src/android/camera_device.h   |  9 +++-<br>

> > > >  src/android/camera_ops.cpp    |  8 +++-<br>

> > > >  3 files changed, 100 insertions(+), 7 deletions(-)<br>

> > > ><br>

> > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp<br>

> > > > index 3fce14035718..899afaa49439 100644<br>

> > > > --- a/src/android/camera_device.cpp<br>

> > > > +++ b/src/android/camera_device.cpp<br>

> > > > @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t *hardwareModule)<br>

> > > ><br>

> > > >  void CameraDevice::close()<br>

> > > >  {<br>

> > > > -       streams_.clear();<br>

> > > > +       MutexLocker cameraLock(cameraMutex_);<br>

><br>

> I'd add a blank line here.<br>

><br>

> > > > +       if (state_ == CameraFlushing) {<br>

><br>

> As mentioned above, I don't think you need to protect against close()<br>

> and flush() racing each other.<br>

><br>

> > > > +               flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });<br>

> > > > +               camera_->release();<br>

> > > ><br>

> > > > +               return;<br>

> > > > +       }<br>

> > > > +<br>

> > > > +       streams_.clear();<br>

> > > >         stop();<br>

> > > ><br>

> > > >         camera_->release();<br>

> > > >  }<br>

> > > ><br>

> > > > -void CameraDevice::stop()<br>

> > > > +/*<br>

> > > > + * Flush is similar to stop() but sets the camera state to 'flushing' and wait<br>

><br>

> s/wait/waits/<br>

><br>

> > > > + * until all the in-flight requests have been returned before setting the<br>

> > > > + * camera state to stopped.<br>

> > > > + *<br>

> > > > + * Once flushing is done it unlocks concurrent calls to camera close() and<br>

> > > > + * configureStreams().<br>

> > > > + */<br>

> > > > +void CameraDevice::flush()<br>

> > > >  {<br>

> > > > +       {<br>

> > > > +               MutexLocker cameraLock(cameraMutex_);<br>

> > > > +<br>

> > > > +               if (state_ != CameraRunning)<br>

> > > > +                       return;<br>

> > > > +<br>

> > > > +               worker_.stop();<br>

> > > > +               camera_->stop();<br>

> > > > +               state_ = CameraFlushing;<br>

> > > > +       }<br>

> > > > +<br>

> > > > +       /*<br>

> > > > +        * Now wait for all the in-flight requests to be completed before<br>

> > > > +        * continuing. Stopping the Camera guarantees that all in-flight<br>

> > > > +        * requests are completed in error state.<br>

><br>

> Do we need to wait ? Camera::stop() guarantees that all requests<br>

> complete synchronously with the stop() call.<br>

<br>

I didn't get the API that way... I thought after stop we would receive<br>

a sequence of failed requests... Actually I don't see anything that<br>

suggests that in camera.cpp or pipeline_handler.cpp apart from an assertion<br>

in Camera::stop()<br>

<br>

><br>

> Partly answering myself here, we'll have to wait for post-processing<br>

> tasks to complete once we'll process them in a separate thread, but that<br>

> will likely be handled by Thread::wait(). I don't think you need a<br>

> condition variable here. I'm I'm not mistaken, this should simplify the<br>

> implementation.<br>

<br>

If Camera::stop() is synchronous we don't need to wait indeed<br>

<br>

><br>

> > > > +        */<br>

> > > > +       {<br>

> > > > +               MutexLocker requestsLock(requestsMutex_);<br>

> > > > +               flushing_.wait(requestsLock, [&] { return descriptors_.empty(); });<br>

> > > > +       }<br>

> > ><br>

> > > I'm still uneasy about releasing the cameraMutex_ for this section. In<br>

> > > patch 6/8 you add it to protect the state_ variable but here it's<br>

> ><br>

> > I'm not changing state_ without the mutex acquired, am I ?<br>

> ><br>

> > > ignored. I see the ASSERT() added to stop() but the patter of taking the<br>

> > > lock checking state_, releasing the lock and do some work, retake the<br>

> > > lock and update state_ feels like a bad idea. Maybe I'm missing<br>

> ><br>

> > How so, apart from the fact it feels a bit unusual, I concur ?<br>

> ><br>

> > If I keep the held the mutex for the whole duration of flush no other<br>

> > concurrent method can proceed until all the queued requests have not<br>

> > been completed. While flush waits for the flushing_ condition to be<br>

> > signaled, processCaptureRequest() can proceed and immediately return<br>

> > the newly queued requests in error state by detecting state_ ==<br>

> > CameraFlushing which signals that flush in is progress.<br>

> > Otherwise it would have had to wait for flush to end. But then we're back<br>

> > to a situation where we could serialize all calls and that's it, we<br>

> > would be done with a single mutex to be held for the whole duration of<br>

> > all operations.<br>

> ><br>

> > If it only was for close() or configureStreams() we could have locked<br>

> > for the whole duration of flush(), as they anyway wait for flush to<br>

> > complete before proceeding (by waiting on the flushed_ condition here<br>

> > below signaled).<br>

> ><br>

> > > something and this is not a real problem, if so maybe we can capture<br>

> > > that in the comment here?<br>

> > ><br>

> > > > +<br>

> > > > +       /*<br>

> > > > +        * Set state to stopped and unlock close() or configureStreams() that<br>

> > > > +        * might be waiting for flush to be completed.<br>

> > > > +        */<br>

> > > >         MutexLocker cameraLock(cameraMutex_);<br>

> > > > +       state_ = CameraStopped;<br>

> > > > +       flushed_.notify_one();<br>

><br>

> You should drop the lock before calling notify_one(). Otherwise you'll<br>

> wake up the task waiting on flushed_, which will try to lock<br>

> cameraMutex_, which will block immediately. The scheduler will have to<br>

> reschedule this task for the function to return and the lock to be<br>

> released before the waiter can proceed. That works, but isn't very<br>

> efficient.<br>

<br>

Weird, the cpp reference shows example about notify_one where the<br>

caller always has the mutex held locked, but I see your point and<br>

seems correct..<br>

<br></blockquote><div><br></div><div>This is correct.</div><div><a href="https://en.cppreference.com/w/cpp/thread/condition_variable/notify_one">https://en.cppreference.com/w/cpp/thread/condition_variable/notify_one</a><br></div><div><span style="color:rgb(0,0,0);font-family:DejaVuSans,"DejaVu Sans",arial,sans-serif;font-size:12.8px;font-variant-ligatures:no-common-ligatures">> The notifying thread does not need to hold the lock on the same mutex as the one held by the waiting thread(s);</span></div><div><br></div><div>I know that, but I haven't pointed it out because it is not false.</div><div>From the Laurent explanation, yeah, we should definitely avoid that. TIL.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

><br>

>       {<br>

>               MutexLocker cameraLock(cameraMutex_);<br>

>               state_ = CameraStopped;<br>

>       }<br>

><br>

>       flushed_.notify_one();<br>

><br>

<br>

So I could change to this one, if I don't have to drop this part<br>

completely if we consider close() and configureStreams() not as<br>

possible races...<br>

<br>

> > > > +}<br>

> > > > +<br>

> > > > +/* Calls to stop() must be protected by cameraMutex_ being held by the caller. */<br>

> > > > +void CameraDevice::stop()<br>

> > > > +{<br>

> > > > +       ASSERT(state_ != CameraFlushing);<br>

> > > > +<br>

> > > >         if (state_ == CameraStopped)<br>

> > > >                 return;<br>

> > > ><br>

> > > > @@ -1581,8 +1630,18 @@ PixelFormat CameraDevice::toPixelFormat(int format) const<br>

> > > >   */<br>

> > > >  int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)<br>

> > > >  {<br>

> > > > -       /* Before any configuration attempt, stop the camera. */<br>

> > > > -       stop();<br>

> > > > +       {<br>

> > > > +               /*<br>

> > > > +                * If a flush is in progress, wait for it to complete and to<br>

> > > > +                * stop the camera, otherwise before any new configuration<br>

> > > > +                * attempt we have to stop the camera explictely.<br>

> > > > +                */<br>

><br>

> Same here, I don't think flush() and configure_streams() can race each<br>

> other. I believe the only possible race to be between flush() and<br>

> process_capture_request().<br>

><br>

<br>

Ditto.<br>

<br>

> > > > +               MutexLocker cameraLock(cameraMutex_);<br>

> > > > +               if (state_ == CameraFlushing)<br>

> > > > +                       flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });<br>

> > > > +               else<br>

> > > > +                       stop();<br>

> > > > +       }<br>

> > > ><br>

> > > >         if (stream_list->num_streams == 0) {<br>

> > > >                 LOG(HAL, Error) << "No streams in configuration";<br>

> > > > @@ -1950,6 +2009,25 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques<br>

> > > >         if (ret)<br>

> > > >                 return ret;<br>

> > > ><br>

> > > > +       /*<br>

> > > > +        * Just before queuing the request, make sure flush() has not<br>

> > > > +        * been called after this function has been executed. In that<br>

> > > > +        * case, immediately return the request with errors.<br>

> > > > +        */<br>

> > > > +       MutexLocker cameraLock(cameraMutex_);<br>

> > > > +       if (state_ == CameraFlushing || state_ == CameraStopped) {<br>

> > > > +               for (camera3_stream_buffer_t &buffer : descriptor.buffers_) {<br>

> > > > +                       buffer.status = CAMERA3_BUFFER_STATUS_ERROR;<br>

> > > > +                       buffer.release_fence = buffer.acquire_fence;<br>

> > > > +               }<br>

> > > > +<br>

> > > > +               notifyError(descriptor.frameNumber_,<br>

> > > > +                           descriptor.buffers_[0].stream,<br>

><br>

> As commented on a previous patch, I think you should pass nullptr for<br>

> the stream here.<br>

><br>

<br>

The "S6. Error management:" section of the camera3.h header does not<br>

mention that, not the ? where does you suggestion come from ? I don't find<br>

any reference in the review of [1/8]<br>

<br>

<br>

> > > > +                           CAMERA3_MSG_ERROR_REQUEST);<br>

> > > > +<br>

> > > > +               return 0;<br>

> > > > +       }<br>

> > > > +<br>

> > > >         worker_.queueRequest(descriptor.request_.get());<br>

> > > ><br>

> > > >         {<br>

> > > > @@ -1979,6 +2057,10 @@ void CameraDevice::requestComplete(Request *request)<br>

> > > >                         return;<br>

> > > >                 }<br>

> > > ><br>

> > > > +               /* Release flush if all the pending requests have been completed. */<br>

> > > > +               if (descriptors_.empty())<br>

> > > > +                       flushing_.notify_one();<br>

><br>

> This will never happen, as you can only get here if descriptors_.find()<br>

> has found the descriptor. Did you mean to do this after the extract()<br>

> call below ?<br>

<br>

Ugh. This works only because Camera::stop() is synchronous then ?<br>

<br></blockquote><div><br></div><div>Ah, good catch!</div><div><br></div><div>With this fix, the code looks good to me.</div><div>I am happy to ongoingly join the discussion.</div><div><br></div><div>Reviewed-by: Hirokazu Honda <<a href="mailto:hiroh@chromium.org">hiroh@chromium.org</a>></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

><br>

> > > > +<br>

> > > >                 node = descriptors_.extract(it);<br>

> > > >         }<br>

> > > >         Camera3RequestDescriptor &descriptor = node.mapped();<br>

> > > > diff --git a/src/android/camera_device.h b/src/android/camera_device.h<br>

> > > > index 7cf8e8370387..e1b3bf7d30f2 100644<br>

> > > > --- a/src/android/camera_device.h<br>

> > > > +++ b/src/android/camera_device.h<br>

> > > > @@ -7,6 +7,7 @@<br>

> > > >  #ifndef __ANDROID_CAMERA_DEVICE_H__<br>

> > > >  #define __ANDROID_CAMERA_DEVICE_H__<br>

> > > ><br>

> > > > +#include <condition_variable><br>

> > > >  #include <map><br>

> > > >  #include <memory><br>

> > > >  #include <mutex><br>

> > > > @@ -42,6 +43,7 @@ public:<br>

> > > ><br>

> > > >         int open(const hw_module_t *hardwareModule);<br>

> > > >         void close();<br>

> > > > +       void flush();<br>

> > > ><br>

> > > >         unsigned int id() const { return id_; }<br>

> > > >         camera3_device_t *camera3Device() { return &camera3Device_; }<br>

> > > > @@ -92,6 +94,7 @@ private:<br>

> > > >         enum State {<br>

> > > >                 CameraStopped,<br>

> > > >                 CameraRunning,<br>

> > > > +               CameraFlushing,<br>

> > > >         };<br>

> > > ><br>

> > > >         void stop();<br>

> > > > @@ -120,8 +123,9 @@ private:<br>

> > > ><br>

> > > >         CameraWorker worker_;<br>

> > > ><br>

> > > > -       libcamera::Mutex cameraMutex_; /* Protects access to the camera state. */<br>

> > > > +       libcamera::Mutex cameraMutex_; /* Protects the camera state and flushed_. */<br>

> > > >         State state_;<br>

> > > > +       std::condition_variable flushed_;<br>

> > > ><br>

> > > >         std::shared_ptr<libcamera::Camera> camera_;<br>

> > > >         std::unique_ptr<libcamera::CameraConfiguration> config_;<br>

> > > > @@ -134,8 +138,9 @@ private:<br>

> > > >         std::map<int, libcamera::PixelFormat> formatsMap_;<br>

> > > >         std::vector<CameraStream> streams_;<br>

> > > ><br>

> > > > -       libcamera::Mutex requestsMutex_; /* Protects descriptors_. */<br>

> > > > +       libcamera::Mutex requestsMutex_; /* Protects descriptors_ and flushing_. */<br>

> > > >         std::map<uint64_t, Camera3RequestDescriptor> descriptors_;<br>

> > > > +       std::condition_variable flushing_;<br>

> > > ><br>

> > > >         std::string maker_;<br>

> > > >         std::string model_;<br>

> > > > diff --git a/src/android/camera_ops.cpp b/src/android/camera_ops.cpp<br>

> > > > index 696e80436821..8a3cfa175ff5 100644<br>

> > > > --- a/src/android/camera_ops.cpp<br>

> > > > +++ b/src/android/camera_ops.cpp<br>

> > > > @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] const struct camera3_device *dev,<br>

> > > >  {<br>

> > > >  }<br>

> > > ><br>

> > > > -static int hal_dev_flush([[maybe_unused]] const struct camera3_device *dev)<br>

> > > > +static int hal_dev_flush(const struct camera3_device *dev)<br>

> > > >  {<br>

> > > > +       if (!dev)<br>

> > > > +               return -EINVAL;<br>

> > > > +<br>

> > > > +       CameraDevice *camera = reinterpret_cast<CameraDevice *>(dev->priv);<br>

> > > > +       camera->flush();<br>

> > > > +<br>

> > > >         return 0;<br>

> > > >  }<br>

> > > ><br>

><br>

> --<br>

> Regards,<br>

><br>

> Laurent Pinchart<br>

</blockquote></div></div>