[libcamera-devel] Potential issue with EventDispatcher

Laurent Pinchart laurent.pinchart at ideasonboard.com
Fri Jul 24 06:05:25 CEST 2020


Hi Umang,

On Thu, Jul 23, 2020 at 10:59:06AM +0000, Umang Jain wrote:
> Hi Laurent, Sorry for late reply on this.
> I took this up again only today.

No worries.

> On 7/17/20 7:33 AM, Laurent Pinchart wrote:
> > On Wed, Jul 15, 2020 at 08:29:54AM +0000, Umang Jain wrote:
> >> Hi,
> >>
> >> I noticed a potential issue where EventDispatcher::processEvents() is
> >> running in a `while` loop, which seems to starve the execution of any
> >> other call to EventDispatcher. Those calls seems to be queued for
> >> execution until processEvents() seems to be terminated in some way or
> >> the other.
> >
> > I assume you mean the while loop in Thread::exec():
> >
> > int Thread::exec()
> > {
> > 	MutexLocker locker(data_->mutex_);
> >
> > 	EventDispatcher *dispatcher = eventDispatcher();
> >
> > 	locker.unlock();
> >
> > 	while (!data_->exit_.load(std::memory_order_acquire))
> > 		dispatcher->processEvents();
> >
> > 	locker.lock();
> >
> > 	return data_->exitCode_;
> > }
> >
> >> I noticed this while studying the guts of Event Loop on Hot-_un_plug
> >> of a currently streaming camera. (Context: see "cam: Add --monitor
> >> option" patch/reviews). On Hot-unplug of a UVC camera, The
> >> V4L2VideoDevice tries to release the buffers and also delete event
> >> notifiers. Essentially, deleting event notifiers primarily means to
> >> unregister them from Event Dispatcher, so that the subsequent
> >> processEvents call do not process events from the unregistered event
> >> notifiers. However, as I mentioned above, the while loop in which
> >> processEvents is running, can starve the unregisterEventNotifier from
> >> executing (I tried with gdb to hit a breakpoint, but it didn't) .
> >> It's only when I SIGINT cam application, those queued calls on
> >> unregisterNotifier are executed.
> >>
> >> Is this expected? Or am I missing something.
> >
> > This is clearly not expected.
> >
> > libcamera is event-based. There's a main thread created by the camera
> > manager, one thread per IPA, and IPAs can create additional threads. All
> > the internal code in the camera manager thread and IPA threads runs from
> > an event loop, with an event dispatcher at its core (the custom threads
> > created by IPAs may or may not use this mechanism, but that's irrelevant
> > for this discussion).
> >
> > Being event-based means that the event loop can starve execution of code
> > from the same thread, as that code runs from the event loop in the first
> > place. However, it can cause multi-thread issues. libcamera has a
> > threading model documented in src/libcamera/thread.cpp. This governs
> > what classes and functions can be called concurrently from different
> > threads, with the default case when not explicitly stated otheriwse
> > being that functions from an object may not be called from different
> > threads.
> >
> > I assume your issue is related to the V4L2VideoDevice deletion. This
> > calls V4L2VideoDevice::close(), which deletes the EventNotifier
> > instances. Deleting an EventNotifier calls setEnabled(false), which
> > calls dispatcher->unregisterEventNotifier(this). Note that
> > EventNotifier::setEnabled() is documented as being thread-bound, which
> > means that it can only be called from the thread that the event notifier
> > belongs to (in this case it's the camera manager thread, as that's the
> > thread to which all pipeline handlers belong). This is why
> > unregisterEventNotifier() doesn't use any lock to guard against races
> > with processEvents(), it simply can't race because it's guaranteed to be
> > executed from the event loop (provided it's used correctly of course).
> > The same holds true through the whole code base, we have plenty of
> > operations that we don't lock because the thread model guarantees they
> > won't be executed concurrently.
> >
> > There's nothing special about the
> > EventDispatcher::unregisterEventNotifier() function. If you don't see it
> > getting called, it's because it doesn't get called :-) I would recommend
> > moving your breakpoint up the stack, to EventNotifier::~EventNotifier(),
> > then V4L2VideoDevice::~V4L2VideoDevice(), to confirm those don't get
> > called. I suspect a reference count issue.
>
> Indeed, I cannot see ~V4L2VideoDevice or  ~EventNotifier being executed,
> on camera hot-unplug *until* I terminate the app. After termination (Ctrl-C
> on cam and/or closing qcam), the destructors gets executed from the camera
> manager only (checked using std::this_thread::get_id())
> 
> You mentioned of refcount issue and I looked into that too. The signal
> handler of "cameraRemoved" receives a refcount of '8' in cam and
> '7' in qcam. This looks notoriously high! (I am under the impression,
> this is the last place from where camera destructors could have been
> executed - so it probably equates to '1'). This needs looking too.

Yes, the value is too high. Do you need help tracing this ?

> > Note that due to the threading model there can be issues if the last
> > reference to the shared pointer storing the camera object is released
> > from the application thread, as the destructor is then called from that
> > thread, which isn't allowed by the threading model. If the hot-unplug
> > issues eventually gets traced to that, we'll have to discuss how to
> > solve the problem. It may be as easy as creating the shared pointer with
> > a custom deleter that would push the task of deleting the object to the
> > camera manager's event loop. That could be implemented by adding
> > Object::deleteLater() (with similar semantics to the Qt's equivalent,
> > see https://doc.qt.io/qt-5/qobject.html#deleteLater), and making Camera
> > inherit from the Object class. It could also be implemented by creating
> > a custom shared pointer class for the camera, but I'm not sure that's
> > worth it.
> >
> > If you're stuck on this, can you share the procedure to reproduce the
> > problem ?
>
> I have no particular instructions other than what Niklas has:
> https://lists.libcamera.org/pipermail/libcamera-devel/2020-July/010824.html
> 
> I have been able to trigger the errors ~50% of the times. So I think 
> there indeed,
> is some kind of race happening which didn't get caught while I integrated
>   the hotplug feature earlier.

-- 
Regards,

Laurent Pinchart


More information about the libcamera-devel mailing list