<div dir="ltr"><div dir="ltr">Hi Jacopo, thank you for the patch.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, May 24, 2021 at 4:47 PM Jacopo Mondi <<a href="mailto:jacopo@jmondi.org">jacopo@jmondi.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Laurent,<br>
<br>
On Sun, May 23, 2021 at 09:50:46PM +0300, Laurent Pinchart wrote:<br>
> Hi Jacopo,<br>
><br>
> Thank you for the patch.<br>
><br>
> On Sun, May 23, 2021 at 04:22:51PM +0200, Jacopo Mondi wrote:<br>
> > On Sat, May 22, 2021 at 11:55:36AM +0200, Niklas Söderlund wrote:<br>
> > > On 2021-05-21 17:42:27 +0200, Jacopo Mondi wrote:<br>
> > > > Implement the flush() camera operation in the CameraDevice class<br>
> > > > and make it available to the camera framework by implementing the<br>
> > > > operation wrapper in camera_ops.cpp.<br>
> > > ><br>
> > > > The flush() implementation stops the Camera and the worker thread and<br>
> > > > waits for all in-flight requests to be returned. Stopping the Camera<br>
> > > > forces all Requests already queued to be returned immediately in error<br>
> > > > state. As flush() has to wait until all of them have been returned, make it<br>
> > > > wait on a newly introduced condition variable which is notified by the<br>
> > > > request completion handler when the queue of pending requests has been<br>
> > > > exhausted.<br>
> > > ><br>
> > > > As flush() can race with processCaptureRequest() protect the requests<br>
> > > > queueing by introducing a new CameraState::CameraFlushing state that<br>
> > > > processCaptureRequest() inspects before queuing the Request to the<br>
> > > > Camera. If flush() has been called while processCaptureRequest() was<br>
> > > > executing, return the current Request immediately in error state.<br>
> > > ><br>
> > > > Protect potentially concurrent calls to close() and configureStreams()<br>
><br>
> Can this happen ? Quoting camera3.h,<br>
><br>
> * 12. Alternatively, the framework may call camera3_device_t->common->close()<br>
> * to end the camera session. This may be called at any time when no other<br>
> * calls from the framework are active, although the call may block until all<br>
> * in-flight captures have completed (all results returned, all buffers<br>
> * filled). After the close call returns, no more calls to the<br>
> * camera3_callback_ops_t functions are allowed from the HAL. Once the<br>
> * close() call is underway, the framework may not call any other HAL device<br>
> * functions.<br>
><br>
> The important part is "when no other calss from the framework are<br>
> active". I don't think we need to handle close() racing with anything<br>
> else than process_capture_request().<br>
<br>
I've been discussing this with Hiro during v1, as initially I didn't<br>
consider close() and configureStreams().<br>
<br>
<a href="https://patchwork.libcamera.org/patch/12248/#16884" rel="noreferrer" target="_blank">https://patchwork.libcamera.org/patch/12248/#16884</a><br>
<br>
I initially only considered processCaptureRequest() as a potential<br>
race, but got suggested differently by the cros camera team.<br>
<br>
<br>
><br>
> > > > by inspecting the CameraState, and force a wait for any flush() call<br>
> > > > to complete before proceeding.<br>
> > > ><br>
> > > > Signed-off-by: Jacopo Mondi <<a href="mailto:jacopo@jmondi.org" target="_blank">jacopo@jmondi.org</a>><br>
> > > > ---<br>
> > > > src/android/camera_device.cpp | 90 +++++++++++++++++++++++++++++++++--<br>
> > > > src/android/camera_device.h | 9 +++-<br>
> > > > src/android/camera_ops.cpp | 8 +++-<br>
> > > > 3 files changed, 100 insertions(+), 7 deletions(-)<br>
> > > ><br>
> > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp<br>
> > > > index 3fce14035718..899afaa49439 100644<br>
> > > > --- a/src/android/camera_device.cpp<br>
> > > > +++ b/src/android/camera_device.cpp<br>
> > > > @@ -750,16 +750,65 @@ int CameraDevice::open(const hw_module_t *hardwareModule)<br>
> > > ><br>
> > > > void CameraDevice::close()<br>
> > > > {<br>
> > > > - streams_.clear();<br>
> > > > + MutexLocker cameraLock(cameraMutex_);<br>
><br>
> I'd add a blank line here.<br>
><br>
> > > > + if (state_ == CameraFlushing) {<br>
><br>
> As mentioned above, I don't think you need to protect against close()<br>
> and flush() racing each other.<br>
><br>
> > > > + flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });<br>
> > > > + camera_->release();<br>
> > > ><br>
> > > > + return;<br>
> > > > + }<br>
> > > > +<br>
> > > > + streams_.clear();<br>
> > > > stop();<br>
> > > ><br>
> > > > camera_->release();<br>
> > > > }<br>
> > > ><br>
> > > > -void CameraDevice::stop()<br>
> > > > +/*<br>
> > > > + * Flush is similar to stop() but sets the camera state to 'flushing' and wait<br>
><br>
> s/wait/waits/<br>
><br>
> > > > + * until all the in-flight requests have been returned before setting the<br>
> > > > + * camera state to stopped.<br>
> > > > + *<br>
> > > > + * Once flushing is done it unlocks concurrent calls to camera close() and<br>
> > > > + * configureStreams().<br>
> > > > + */<br>
> > > > +void CameraDevice::flush()<br>
> > > > {<br>
> > > > + {<br>
> > > > + MutexLocker cameraLock(cameraMutex_);<br>
> > > > +<br>
> > > > + if (state_ != CameraRunning)<br>
> > > > + return;<br>
> > > > +<br>
> > > > + worker_.stop();<br>
> > > > + camera_->stop();<br>
> > > > + state_ = CameraFlushing;<br>
> > > > + }<br>
> > > > +<br>
> > > > + /*<br>
> > > > + * Now wait for all the in-flight requests to be completed before<br>
> > > > + * continuing. Stopping the Camera guarantees that all in-flight<br>
> > > > + * requests are completed in error state.<br>
><br>
> Do we need to wait ? Camera::stop() guarantees that all requests<br>
> complete synchronously with the stop() call.<br>
<br>
I didn't get the API that way... I thought after stop we would receive<br>
a sequence of failed requests... Actually I don't see anything that<br>
suggests that in camera.cpp or pipeline_handler.cpp apart from an assertion<br>
in Camera::stop()<br>
<br>
><br>
> Partly answering myself here, we'll have to wait for post-processing<br>
> tasks to complete once we'll process them in a separate thread, but that<br>
> will likely be handled by Thread::wait(). I don't think you need a<br>
> condition variable here. I'm I'm not mistaken, this should simplify the<br>
> implementation.<br>
<br>
If Camera::stop() is synchronous we don't need to wait indeed<br>
<br>
><br>
> > > > + */<br>
> > > > + {<br>
> > > > + MutexLocker requestsLock(requestsMutex_);<br>
> > > > + flushing_.wait(requestsLock, [&] { return descriptors_.empty(); });<br>
> > > > + }<br>
> > ><br>
> > > I'm still uneasy about releasing the cameraMutex_ for this section. In<br>
> > > patch 6/8 you add it to protect the state_ variable but here it's<br>
> ><br>
> > I'm not changing state_ without the mutex acquired, am I ?<br>
> ><br>
> > > ignored. I see the ASSERT() added to stop() but the patter of taking the<br>
> > > lock checking state_, releasing the lock and do some work, retake the<br>
> > > lock and update state_ feels like a bad idea. Maybe I'm missing<br>
> ><br>
> > How so, apart from the fact it feels a bit unusual, I concur ?<br>
> ><br>
> > If I keep the held the mutex for the whole duration of flush no other<br>
> > concurrent method can proceed until all the queued requests have not<br>
> > been completed. While flush waits for the flushing_ condition to be<br>
> > signaled, processCaptureRequest() can proceed and immediately return<br>
> > the newly queued requests in error state by detecting state_ ==<br>
> > CameraFlushing which signals that flush in is progress.<br>
> > Otherwise it would have had to wait for flush to end. But then we're back<br>
> > to a situation where we could serialize all calls and that's it, we<br>
> > would be done with a single mutex to be held for the whole duration of<br>
> > all operations.<br>
> ><br>
> > If it only was for close() or configureStreams() we could have locked<br>
> > for the whole duration of flush(), as they anyway wait for flush to<br>
> > complete before proceeding (by waiting on the flushed_ condition here<br>
> > below signaled).<br>
> ><br>
> > > something and this is not a real problem, if so maybe we can capture<br>
> > > that in the comment here?<br>
> > ><br>
> > > > +<br>
> > > > + /*<br>
> > > > + * Set state to stopped and unlock close() or configureStreams() that<br>
> > > > + * might be waiting for flush to be completed.<br>
> > > > + */<br>
> > > > MutexLocker cameraLock(cameraMutex_);<br>
> > > > + state_ = CameraStopped;<br>
> > > > + flushed_.notify_one();<br>
><br>
> You should drop the lock before calling notify_one(). Otherwise you'll<br>
> wake up the task waiting on flushed_, which will try to lock<br>
> cameraMutex_, which will block immediately. The scheduler will have to<br>
> reschedule this task for the function to return and the lock to be<br>
> released before the waiter can proceed. That works, but isn't very<br>
> efficient.<br>
<br>
Weird, the cpp reference shows example about notify_one where the<br>
caller always has the mutex held locked, but I see your point and<br>
seems correct..<br>
<br></blockquote><div><br></div><div>This is correct.</div><div><a href="https://en.cppreference.com/w/cpp/thread/condition_variable/notify_one">https://en.cppreference.com/w/cpp/thread/condition_variable/notify_one</a><br></div><div><span style="color:rgb(0,0,0);font-family:DejaVuSans,"DejaVu Sans",arial,sans-serif;font-size:12.8px;font-variant-ligatures:no-common-ligatures">> The notifying thread does not need to hold the lock on the same mutex as the one held by the waiting thread(s);</span></div><div><br></div><div>I know that, but I haven't pointed it out because it is not false.</div><div>From the Laurent explanation, yeah, we should definitely avoid that. TIL.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
><br>
> {<br>
> MutexLocker cameraLock(cameraMutex_);<br>
> state_ = CameraStopped;<br>
> }<br>
><br>
> flushed_.notify_one();<br>
><br>
<br>
So I could change to this one, if I don't have to drop this part<br>
completely if we consider close() and configureStreams() not as<br>
possible races...<br>
<br>
> > > > +}<br>
> > > > +<br>
> > > > +/* Calls to stop() must be protected by cameraMutex_ being held by the caller. */<br>
> > > > +void CameraDevice::stop()<br>
> > > > +{<br>
> > > > + ASSERT(state_ != CameraFlushing);<br>
> > > > +<br>
> > > > if (state_ == CameraStopped)<br>
> > > > return;<br>
> > > ><br>
> > > > @@ -1581,8 +1630,18 @@ PixelFormat CameraDevice::toPixelFormat(int format) const<br>
> > > > */<br>
> > > > int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)<br>
> > > > {<br>
> > > > - /* Before any configuration attempt, stop the camera. */<br>
> > > > - stop();<br>
> > > > + {<br>
> > > > + /*<br>
> > > > + * If a flush is in progress, wait for it to complete and to<br>
> > > > + * stop the camera, otherwise before any new configuration<br>
> > > > + * attempt we have to stop the camera explictely.<br>
> > > > + */<br>
><br>
> Same here, I don't think flush() and configure_streams() can race each<br>
> other. I believe the only possible race to be between flush() and<br>
> process_capture_request().<br>
><br>
<br>
Ditto.<br>
<br>
> > > > + MutexLocker cameraLock(cameraMutex_);<br>
> > > > + if (state_ == CameraFlushing)<br>
> > > > + flushed_.wait(cameraLock, [&] { return state_ != CameraStopped; });<br>
> > > > + else<br>
> > > > + stop();<br>
> > > > + }<br>
> > > ><br>
> > > > if (stream_list->num_streams == 0) {<br>
> > > > LOG(HAL, Error) << "No streams in configuration";<br>
> > > > @@ -1950,6 +2009,25 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques<br>
> > > > if (ret)<br>
> > > > return ret;<br>
> > > ><br>
> > > > + /*<br>
> > > > + * Just before queuing the request, make sure flush() has not<br>
> > > > + * been called after this function has been executed. In that<br>
> > > > + * case, immediately return the request with errors.<br>
> > > > + */<br>
> > > > + MutexLocker cameraLock(cameraMutex_);<br>
> > > > + if (state_ == CameraFlushing || state_ == CameraStopped) {<br>
> > > > + for (camera3_stream_buffer_t &buffer : descriptor.buffers_) {<br>
> > > > + buffer.status = CAMERA3_BUFFER_STATUS_ERROR;<br>
> > > > + buffer.release_fence = buffer.acquire_fence;<br>
> > > > + }<br>
> > > > +<br>
> > > > + notifyError(descriptor.frameNumber_,<br>
> > > > + descriptor.buffers_[0].stream,<br>
><br>
> As commented on a previous patch, I think you should pass nullptr for<br>
> the stream here.<br>
><br>
<br>
The "S6. Error management:" section of the camera3.h header does not<br>
mention that, not the ? where does you suggestion come from ? I don't find<br>
any reference in the review of [1/8]<br>
<br>
<br>
> > > > + CAMERA3_MSG_ERROR_REQUEST);<br>
> > > > +<br>
> > > > + return 0;<br>
> > > > + }<br>
> > > > +<br>
> > > > worker_.queueRequest(descriptor.request_.get());<br>
> > > ><br>
> > > > {<br>
> > > > @@ -1979,6 +2057,10 @@ void CameraDevice::requestComplete(Request *request)<br>
> > > > return;<br>
> > > > }<br>
> > > ><br>
> > > > + /* Release flush if all the pending requests have been completed. */<br>
> > > > + if (descriptors_.empty())<br>
> > > > + flushing_.notify_one();<br>
><br>
> This will never happen, as you can only get here if descriptors_.find()<br>
> has found the descriptor. Did you mean to do this after the extract()<br>
> call below ?<br>
<br>
Ugh. This works only because Camera::stop() is synchronous then ?<br>
<br></blockquote><div><br></div><div>Ah, good catch!</div><div><br></div><div>With this fix, the code looks good to me.</div><div>I am happy to ongoingly join the discussion.</div><div><br></div><div>Reviewed-by: Hirokazu Honda <<a href="mailto:hiroh@chromium.org">hiroh@chromium.org</a>></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
><br>
> > > > +<br>
> > > > node = descriptors_.extract(it);<br>
> > > > }<br>
> > > > Camera3RequestDescriptor &descriptor = node.mapped();<br>
> > > > diff --git a/src/android/camera_device.h b/src/android/camera_device.h<br>
> > > > index 7cf8e8370387..e1b3bf7d30f2 100644<br>
> > > > --- a/src/android/camera_device.h<br>
> > > > +++ b/src/android/camera_device.h<br>
> > > > @@ -7,6 +7,7 @@<br>
> > > > #ifndef __ANDROID_CAMERA_DEVICE_H__<br>
> > > > #define __ANDROID_CAMERA_DEVICE_H__<br>
> > > ><br>
> > > > +#include <condition_variable><br>
> > > > #include <map><br>
> > > > #include <memory><br>
> > > > #include <mutex><br>
> > > > @@ -42,6 +43,7 @@ public:<br>
> > > ><br>
> > > > int open(const hw_module_t *hardwareModule);<br>
> > > > void close();<br>
> > > > + void flush();<br>
> > > ><br>
> > > > unsigned int id() const { return id_; }<br>
> > > > camera3_device_t *camera3Device() { return &camera3Device_; }<br>
> > > > @@ -92,6 +94,7 @@ private:<br>
> > > > enum State {<br>
> > > > CameraStopped,<br>
> > > > CameraRunning,<br>
> > > > + CameraFlushing,<br>
> > > > };<br>
> > > ><br>
> > > > void stop();<br>
> > > > @@ -120,8 +123,9 @@ private:<br>
> > > ><br>
> > > > CameraWorker worker_;<br>
> > > ><br>
> > > > - libcamera::Mutex cameraMutex_; /* Protects access to the camera state. */<br>
> > > > + libcamera::Mutex cameraMutex_; /* Protects the camera state and flushed_. */<br>
> > > > State state_;<br>
> > > > + std::condition_variable flushed_;<br>
> > > ><br>
> > > > std::shared_ptr<libcamera::Camera> camera_;<br>
> > > > std::unique_ptr<libcamera::CameraConfiguration> config_;<br>
> > > > @@ -134,8 +138,9 @@ private:<br>
> > > > std::map<int, libcamera::PixelFormat> formatsMap_;<br>
> > > > std::vector<CameraStream> streams_;<br>
> > > ><br>
> > > > - libcamera::Mutex requestsMutex_; /* Protects descriptors_. */<br>
> > > > + libcamera::Mutex requestsMutex_; /* Protects descriptors_ and flushing_. */<br>
> > > > std::map<uint64_t, Camera3RequestDescriptor> descriptors_;<br>
> > > > + std::condition_variable flushing_;<br>
> > > ><br>
> > > > std::string maker_;<br>
> > > > std::string model_;<br>
> > > > diff --git a/src/android/camera_ops.cpp b/src/android/camera_ops.cpp<br>
> > > > index 696e80436821..8a3cfa175ff5 100644<br>
> > > > --- a/src/android/camera_ops.cpp<br>
> > > > +++ b/src/android/camera_ops.cpp<br>
> > > > @@ -66,8 +66,14 @@ static void hal_dev_dump([[maybe_unused]] const struct camera3_device *dev,<br>
> > > > {<br>
> > > > }<br>
> > > ><br>
> > > > -static int hal_dev_flush([[maybe_unused]] const struct camera3_device *dev)<br>
> > > > +static int hal_dev_flush(const struct camera3_device *dev)<br>
> > > > {<br>
> > > > + if (!dev)<br>
> > > > + return -EINVAL;<br>
> > > > +<br>
> > > > + CameraDevice *camera = reinterpret_cast<CameraDevice *>(dev->priv);<br>
> > > > + camera->flush();<br>
> > > > +<br>
> > > > return 0;<br>
> > > > }<br>
> > > ><br>
><br>
> --<br>
> Regards,<br>
><br>
> Laurent Pinchart<br>
</blockquote></div></div>