[EXT] Re: [PATCH] android: camera_device: Fix camera blocked issue when stopping capture

Anle Pan anle.pan at nxp.com
Mon Mar 25 10:20:43 CET 2024


Hi Umang, Jacopo.

Yes, when issue happened, stuck in sendCaptureResults. The reproduce chance on my side was also not high for the cts (1/10)

		void CameraDevice::sendCaptureResults()
		{
        	while (!descriptors_.empty() && !descriptors_.front()->isPending()) {

And Jacopo, your description is accurate, the descriptors_ has a chance to be not cleared due to the State was already 'Stopped' after flush() .
will ask hui to help update the commit message since the "git send" configuration has issue on my side.

Best Regards
Anle Pan

-----Original Message-----
From: Umang Jain <umang.jain at ideasonboard.com> 
Sent: 2024年3月25日 15:44
To: Jacopo Mondi <jacopo.mondi at ideasonboard.com>; Anle Pan <anle.pan at nxp.com>
Cc: libcamera-devel at lists.libcamera.org; Hui Fang <hui.fang at nxp.com>
Subject: Re: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked issue when stopping capture

Caution: This is an external email. Please take care when clicking links or opening attachments. When in doubt, report the message using the 'Report this email' button


Hi Jacopo,

Thanks for testing,

On 22/03/24 4:06 pm, Jacopo Mondi wrote:
> Hello
>
> On Tue, Mar 05, 2024 at 09:28:00AM +0000, Anle Pan wrote:
>> Hi Umang Jain,
>>
>> this issue was random(around 1/10), and can be reproduced by single run below cts test:
>>      ./cts-tradefed run commandAndExit cts --abi arm64-v8a -m 
>> CtsMediaPlayerTestCases -t 
>> android.media.player.cts.MediaPlayerTest#testRecordedVideoPlayback90
>>
>> •    3 times configuring streams in this Test:
>> 1)  2 streams:   (0) 320x240-NV12 (1) 1280x720-YUYV:   capture several frames on stream1
>> 2)  3 streams:   (0) 320x240-NV12 (1) 320x240-NV12 (2) 1280x720-YUYV: will capture on stream0 and stream2 at the same time
>> 3)  2 streams:   (0) 320x240-NV12 (1) 1280x720-YUYV:   capture several frames on stream1
>>
>> •    when single run the test several times and met fail. When digging the log, can find that the current failed is caused by the abnormal complete of the previous PASS test in step3, which is the situation described in my patch.
>>
> The only code path I see that could lead to descriptor_ not being 
> cleared is flush() being called and then stop() called just after 
> possibily as the result of a new configuration
>
> void CameraDevice::flush()
> {
>       {
>               MutexLocker stateLock(stateMutex_);
>               if (state_ != State::Running)
>                       return;
>
>               state_ = State::Flushing;
>       }
>
>       camera_->stop();
>
>       MutexLocker stateLock(stateMutex_);
>       state_ = State::Stopped;
> }
>
> void CameraDevice::stop()
> {
>       MutexLocker stateLock(stateMutex_);
>       if (state_ == State::Stopped)
>               return;
>
>       camera_->stop();
>
>       {
>               MutexLocker descriptorsLock(descriptorsMutex_);
>               descriptors_ = {};
>       }
>
>       streams_.clear();
>
>       state_ = State::Stopped;
> }
>
> int CameraDevice::configureStreams(camera3_stream_configuration_t 
> *stream_list) {
>       /* Before any configuration attempt, stop the camera. */
>       stop();
>
>          ...
>
> }
>
> Looking at the flush() implementation right now I wonder why we stop 
> the camera without the stateMutex_ held (the only reason I see is 
> because it can be concurrent with processCaptureRequest() which keeps 
> the mutex held), but I certainly don't want to touch that code right 
> now without a good reason.
>
> All in all, I think there's indeed a path that could lead to the 
> descriptor not being cleaned up, and we can't do it in flush() because 
> a concurrent call to processCaptureRequest() might be using the 
> descriptors to complete the in-flight requests with error state.
>
> Anle, does this match your understanding ? If so, this is what should 
> be recorded in the commit message (the cause of an issue, not the
> symptoms)
>
>
>> Best Regards
>> Anle Pan
>>
>> -----Original Message-----
>> From: Umang Jain <umang.jain at ideasonboard.com>
>> Sent: 2024年3月5日 16:27
>> To: Anle Pan <anle.pan at nxp.com>; libcamera-devel at lists.libcamera.org
>> Cc: Hui Fang <hui.fang at nxp.com>
>> Subject: [EXT] Re: [PATCH] android: camera_device: Fix camera blocked 
>> issue when stopping capture
>>
>> Caution: This is an external email. Please take care when clicking 
>> links or opening attachments. When in doubt, report the message using 
>> the 'Report this email' button
>>
>>
>> Hi Anle Pan, Hui Fang
>>
>> On 04/03/24 3:05 pm, Anle Pan wrote:
>>> The issue occurs when stopping capture soon after starting capture.
>> Can you describe the issue/use case in more detail? It's really hard to infer from one line.
>>> In this case, no frame get from the device, the related capture 
>>> request has been pushed to the queue descriptors_, but the 
>>> queuedRequests_ was still empty due to no requests will be queue to 
>>> the device since the stream will be stopped soon,
>> I don't get it how is this possible? Since processCaptureRequest() will push the request in the descriptors_ queue along with queuing the request to the camera.
>>
>> How will the call path be interleaved /before/ processCaptureRequest() returns? Possibly we need more details on what you are trying to do, before reviewing the code here. Is this related to CTS ?
>>> so there will be no camera->requestComplete called later, then the 
>>> descriptors_ can not pop normally, this will cause the pending if we 
>>> want to start capture next time.
>>>
>>> To fix the issue, ensure the descriptors_ is empty after the camera 
>>> device is stopped.
> Do you know precisely why a non-empty descriptors_ stalls the camera ?
> Does it get stuck on
>
> void CameraDevice::sendCaptureResults()
> {
>       while (!descriptors_.empty() && 
> !descriptors_.front()->isPending()) {
>
> ?

We should to formalise this as an proper issue, after getting more information.
>>> Signed-off-by: Anle Pan <anle.pan at nxp.com>
> Tested with CTS, no regressions detected
>
> Reviewed-by: Jacopo Mondi <jacopo.mondi at ideasonboard.com>
> Tested-by: Jacopo Mondi <jacopo.mondi at ideasonboard.com>
>
>>> ---
>>>    src/android/camera_device.cpp | 5 ++++-
>>>    1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/android/camera_device.cpp 
>>> b/src/android/camera_device.cpp index 25cedd44..d452992d 100644
>>> --- a/src/android/camera_device.cpp
>>> +++ b/src/android/camera_device.cpp
>>> @@ -433,8 +433,11 @@ void CameraDevice::flush()
>>>    void CameraDevice::stop()
>>>    {
>>>        MutexLocker stateLock(stateMutex_);
>>> -     if (state_ == State::Stopped)
>>> +     if (state_ == State::Stopped) {
>>> +             MutexLocker descriptorsLock(descriptorsMutex_);
>>> +             descriptors_ = {};
>>>                return;
>>> +     }

I am not comfortable with this patch. Consider my points below

- First we don't have a exact description of the issue it is fixing

- Second, if the camera state is ::Stopped already. Clears descriptors_ doesn't look a good idea to me. The function which has set the camera state::Stopped already, should be ideally be emptying the descriptors_...

- If we are accepting this, why not just let the Camera::stop() fall through and remove

             if (state_ == State::Stopped)
                 return;

as well. Then, CameraDevice::stop() will behave same as Camera::stop() - a no-op when called even when State::Stopped and responsible to clean up queues(and stuff) for the HAL layer.

I should ideally back up my thoughts with some testing, but I don't have any kind of CTS testing available to me right now :(
>>>        camera_->stop();
>>>



More information about the libcamera-devel mailing list