[libcamera-devel] [PATCH v3] android: camera_device: Fix crash in calling CameraDevice::close()

Hirokazu Honda hiroh at chromium.org
Fri Sep 3 10:26:41 CEST 2021


Hi Jacopo,

On Thu, Sep 2, 2021 at 6:55 PM Jacopo Mondi <jacopo at jmondi.org> wrote:
>
> Hello,
>
> On Thu, Sep 02, 2021 at 11:51:02AM +0300, Laurent Pinchart wrote:
> > Hi Jacopo,
> >
> > On Thu, Sep 02, 2021 at 10:25:59AM +0200, Jacopo Mondi wrote:
> > > On Thu, Sep 02, 2021 at 10:40:39AM +0300, Laurent Pinchart wrote:
> > > > On Thu, Sep 02, 2021 at 08:59:34AM +0200, Jacopo Mondi wrote:
> > > > > On Thu, Sep 02, 2021 at 06:11:40AM +0300, Laurent Pinchart wrote:
> > > > > > On Wed, Sep 01, 2021 at 05:47:08PM +0200, Jacopo Mondi wrote:
> > > > > > > On Wed, Sep 01, 2021 at 03:37:39AM +0900, Hirokazu Honda wrote:
> > > > > > > > The problem is happening because we seem to add a CameraStream
> > > > > > > > associated buffer(depending on the CameraStream::Type) to the Request,
> > > > > > > > in CameraDevice::processCaptureRequest().
> > > > > > > >
> > > > > > > > However, when the camera stops, all the current buffers are marked with
> > > > > > > > FrameMetadata::FrameCancelled and proceed to completion. But the buffer
> > > > > > > > associated with the CameraStream (that was previously added to the
> > > > > > > > request) has now been cleared out with a part of streams_.clear(), even
> > > > > > > > before the camera stop() has been invoked. Any access to those request
> > > > > > > > buffers after they have been cleared, shall result in a crash.
> > > > > > >
> > > > > > > I recall it was painful at the time when we had to deal with flush()
> > > > > > > to get the right ordering, and I had done the same. However, since
> > > > > > > stop() is only called by configureStreams() I removed the
> > > > > > > streams_.clear() there and had issues, so I gave up.
> > > > > >
> > > > > > Do you mean you have tried removing streams_.clear() from
> > > > > > configureStreams() while testing this patch and ran into issues, or that
> > > > > > you tried that when implementing support for flush() ?
> > > > >
> > > > > I meant when developing flush().
> > > > > I haven't tried to do the same on top of this patch, but it might be
> > > > > good to do so ?
> > > >
> > > > I think so, because if it works, it's a good cleanup (possibly as part
> > > > of this patch actually, not on top), and if it doesn't, there's
> > > > something fishy :-) Would you be able to test it ?
> > >
> > > With this applied
> > >
> > > --- a/src/android/camera_device.cpp
> > > +++ b/src/android/camera_device.cpp
> > > @@ -544,6 +544,7 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > >                 LOG(HAL, Error) << "No streams in configuration";
> > >                 return -EINVAL;
> > >         }
> > > +       streams_.reserve(stream_list->num_streams);
> > >
> > >  #if defined(OS_CHROMEOS)
> > >         if (!validateCropRotate(*stream_list))
> > > @@ -560,14 +561,6 @@ int CameraDevice::configureStreams(camera3_stream_configuration_t *stream_list)
> > >                 return -EINVAL;
> > >         }
> > >
> > > -       /*
> > > -        * Clear and remove any existing configuration from previous calls, and
> > > -        * ensure the required entries are available without further
> > > -        * reallocation.
> > > -        */
> > > -       streams_.clear();
> > > -       streams_.reserve(stream_list->num_streams);
> > > -
> > >         std::vector<Camera3StreamConfig> streamConfigs;
> > >         streamConfigs.reserve(stream_list->num_streams);
> > >
> > >  I can easily get a segfault running CTS.
> > >  I analyzed the coredump and the stack trace doesn't help much
> > >
> > > (soraka-libcamera-gdb) bt
> > > #0  0x000000043983e000 in ?? ()
> > > #1  0x00007eee396ee5b5 in base::OnceCallback<void ()>::Run() && (this=0x7eee1c009310) at ../../../libchrome-0.0.1/platform2/libchrome/base/callback.h:102
> > > #2  base::TaskAnnotator::RunTask (this=<optimized out>, trace_event_name=<optimized out>, pending_task=0x7eee1c009310) at ../../../libchrome-0.0.1/platform2/libchrome/base/task/common/task_annotator.cc:163
> > > #3  0x00007eee396ff465 in base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWorkImpl (this=this at entry=0x7eee2c035ed0, continuation_lazy_now=continuation_lazy_now at entry=0x7eee34bd0910)
> > >     at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/thread_controller_with_message_pump_impl.cc:332
> > > #4  0x00007eee396ff27c in base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork (this=0x7eee2c035ed0)
> > >     at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/thread_controller_with_message_pump_impl.cc:252
> > > #5  0x00007eee396a322d in base::MessagePumpDefault::Run (this=0x7eee1c00cc70, delegate=<optimized out>) at ../../../libchrome-0.0.1/platform2/libchrome/base/message_loop/message_pump_default.cc:39
> > > #6  0x00007eee396ff730 in base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::Run (this=<optimized out>, application_tasks_allowed=<optimized out>, timeout=...)
> > >     at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/thread_controller_with_message_pump_impl.cc:446
> > > #7  0x00007eee396d0325 in base::RunLoop::Run (this=0x7eee34bd0aa0) at ../../../libchrome-0.0.1/platform2/libchrome/base/run_loop.cc:124
> > > #8  0x00007eee3971d04e in base::Thread::ThreadMain (this=0x7eee2c034090) at ../../../libchrome-0.0.1/platform2/libchrome/base/threading/thread.cc:382
> > > #9  0x00007eee39718f3d in base::(anonymous namespace)::ThreadFunc (params=0x7eee2c033c50) at ../../../libchrome-0.0.1/platform2/libchrome/base/threading/platform_thread_posix.cc:87
> > > #10 0x00007eee395adfbf in start_thread (arg=0x7eee34bd1640) at pthread_create.c:463
> > > #11 0x00007eee393f735f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> > > (soraka-libcamera-gdb) quit
> > >
> > > The last ?? in #0 makes me think I might not have symbols for the
> > > camera service, which I cannot easily recompile due to the cros-debug
> > > USE flag issue. I need to rebuild all the packages and I've now kicked
> > > it.
> > >
> > > One day I'll learn to shut up and ack patches without asking for
> > > reasons to fall into rabbit holes like this one.
> >
> > Thanks for the quick test.
>
> I can confirm removing streams_.clear() from configureStreams() easily
> triggers a segfault when running CTS. That's indeed the same issue I
> was facing when developing flush().
>
> I really cannot figure out why it happens looking at the code. I got
> a few more stack traces with debuga symbols in the camera service from the
> coredumps but they do not seem that helpful at a first look.
>
> [Current thread is 1 (Thread 0x7c372ffff640 (LWP 8265))]
> (soraka-libcamera-gdb) bt
> #0  std::__1::__cxx_atomic_fetch_sub<int> (__a=0x400000000, __delta=1, __order=std::__1::memory_order_acq_rel) at /usr/bin/../include/c++/v1/atomic:1072
> #1  std::__1::__atomic_base<int, true>::fetch_sub (this=0x400000000, __op=1, __m=std::__1::memory_order_acq_rel) at /usr/bin/../include/c++/v1/atomic:1719
> #2  base::AtomicRefCount::Decrement (this=0x400000000) at ../../../libchrome-0.0.1/platform2/libchrome/base/atomic_ref_count.h:39
> #3  base::subtle::RefCountedThreadSafeBase::ReleaseImpl (this=0x400000000) at ../../../libchrome-0.0.1/platform2/libchrome/base/memory/ref_counted.h:218
> #4  base::subtle::RefCountedThreadSafeBase::Release (this=0x400000000) at ../../../libchrome-0.0.1/platform2/libchrome/base/memory/ref_counted.h:170
> #5  base::RefCountedThreadSafe<base::internal::BindStateBase, base::internal::BindStateBaseRefCountTraits>::Release (this=0x400000000) at ../../../libchrome-0.0.1/platform2/libchrome/base/memory/ref_counted.h:398
> #6  scoped_refptr<base::internal::BindStateBase>::Release (ptr=0x400000000) at ../../../libchrome-0.0.1/platform2/libchrome/base/memory/scoped_refptr.h:322
> #7  scoped_refptr<base::internal::BindStateBase>::~scoped_refptr (this=<optimized out>) at ../../../libchrome-0.0.1/platform2/libchrome/base/memory/scoped_refptr.h:224
> #8  base::internal::CallbackBase::~CallbackBase (this=<optimized out>) at ../../../libchrome-0.0.1/platform2/libchrome/base/callback_internal.cc:85
> #9  0x00007c37561eb82b in base::sequence_manager::internal::AtomicFlagSet::Group::~Group (this=0x7c373000ef80) at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/atomic_flag_set.cc:132
> #10 std::__1::default_delete<base::sequence_manager::internal::AtomicFlagSet::Group>::operator() (this=<optimized out>, __ptr=0x7c373000ef80) at /usr/bin/../include/c++/v1/memory:1335
> #11 0x00007c37561eaede in base::sequence_manager::internal::AtomicFlagSet::RemoveFromAllocList (group=0x7c373000ef80, this=<optimized out>) at /usr/bin/../include/c++/v1/memory:1595
> #12 base::sequence_manager::internal::AtomicFlagSet::AtomicFlag::ReleaseAtomicFlag (this=0x7c3730013b90) at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/atomic_flag_set.cc:77
> #13 0x00007c37561f3cec in base::sequence_manager::internal::TaskQueueImpl::UnregisterTaskQueue (this=0x7c37300139e0) at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/task_queue_impl.cc:201
> #14 0x00007c37561ec1df in base::sequence_manager::internal::SequenceManagerImpl::~SequenceManagerImpl (this=0x7c37300008d0) at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/sequence_manager_impl.cc:234
> #15 0x00007c37561ec52e in base::sequence_manager::internal::SequenceManagerImpl::~SequenceManagerImpl (this=0x7c37300008d0) at ../../../libchrome-0.0.1/platform2/libchrome/base/task/sequence_manager/sequence_manager_impl.cc:212
> #16 0x00007c3756218317 in std::__1::default_delete<base::sequence_manager::internal::SequenceManagerImpl>::operator() (this=0x7c3730014848, __ptr=0x400000000) at /usr/bin/../include/c++/v1/memory:1335
> #17 std::__1::unique_ptr<base::sequence_manager::internal::SequenceManagerImpl, std::__1::default_delete<base::sequence_manager::internal::SequenceManagerImpl> >::reset (this=0x7c3730014848, __p=0x0)
>     at /usr/bin/../include/c++/v1/memory:1596
> #18 std::__1::unique_ptr<base::sequence_manager::internal::SequenceManagerImpl, std::__1::default_delete<base::sequence_manager::internal::SequenceManagerImpl> >::~unique_ptr (this=0x7c3730014848)
>     at /usr/bin/../include/c++/v1/memory:1550
> #19 base::(anonymous namespace)::SequenceManagerThreadDelegate::~SequenceManagerThreadDelegate (this=0x7c3730014840) at ../../../libchrome-0.0.1/platform2/libchrome/base/threading/thread.cc:67
> #20 base::(anonymous namespace)::SequenceManagerThreadDelegate::~SequenceManagerThreadDelegate (this=0x7c3730014840) at ../../../libchrome-0.0.1/platform2/libchrome/base/threading/thread.cc:67
> #21 0x00007c3756218098 in std::__1::default_delete<base::Thread::Delegate>::operator() (this=0x7c3744039bc0, __ptr=0x400000000) at /usr/bin/../include/c++/v1/memory:1335
> #22 std::__1::unique_ptr<base::Thread::Delegate, std::__1::default_delete<base::Thread::Delegate> >::reset (this=0x7c3744039bc0, __p=0x0) at /usr/bin/../include/c++/v1/memory:1596
> #23 base::Thread::ThreadMain (this=0x7c3744039b40) at ../../../libchrome-0.0.1/platform2/libchrome/base/threading/thread.cc:400
> #24 0x00007c3756213f3d in base::(anonymous namespace)::ThreadFunc (params=0x7c3730011840) at ../../../libchrome-0.0.1/platform2/libchrome/base/threading/platform_thread_posix.cc:87
> #25 0x00007c37560a8fbf in start_thread (arg=0x7c372ffff640) at pthread_create.c:463
> #26 0x00007c3755ef235f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>
> >
> > Hiro, is this something you can look at ?

I guess this is because streams_ is not cleared when state_ is Stopped.
Could you try again by modifying as follows?

--- a/src/android/camera_device.cpp
+++ b/src/android/camera_device.cpp
@@ -448,8 +448,10 @@ void CameraDevice::flush()
 void CameraDevice::stop()
 {
        MutexLocker stateLock(stateMutex_);
-       if (state_ == State::Stopped)
+       if (state_ == State::Stopped) {
+               streams_.clear();
                return;
+       }

Thanks,
-Hiro
> >
> > > > > > > But I have this patch gone through cros_camera_test and also CTS seems
> > > > > > > fine. Also looking at the code it seems an harmless change so I wonder
> > > > > > > what was wrong at the time.
> > > > > > >
> > > > > > > As far as I can tell:
> > > > > > >
> > > > > > > Tested-by: Jacopo Mondi <jacopo at jmondi.org>
> > > > > > > Acked-by: Jacopo Mondi <jacopo at jmondi.org>
> > > > > > >
> > > > > > > > Signed-off-by: Hirokazu Honda <hiroh at chromium.org>
> > > > > > > > ---
> > > > > > > >  src/android/camera_device.cpp | 4 ++--
> > > > > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> > > > > > > > index 8ca76719..fda77db4 100644
> > > > > > > > --- a/src/android/camera_device.cpp
> > > > > > > > +++ b/src/android/camera_device.cpp
> > > > > > > > @@ -423,8 +423,6 @@ int CameraDevice::open(const hw_module_t *hardwareModule)
> > > > > > > >
> > > > > > > >  void CameraDevice::close()
> > > > > > > >  {
> > > > > > > > - streams_.clear();
> > > > > > > > -
> > > > > > > >   stop();
> > > > > > > >
> > > > > > > >   camera_->release();
> > > > > > > > @@ -457,6 +455,8 @@ void CameraDevice::stop()
> > > > > > > >   camera_->stop();
> > > > > > > >
> > > > > > > >   descriptors_.clear();
> > > > > > > > + streams_.clear();
> > > > > > > > +
> > > > > > > >   state_ = State::Stopped;
> > > > > > > >  }
> > > > > > > >
> >
> > --
> > Regards,
> >
> > Laurent Pinchart


More information about the libcamera-devel mailing list