[libcamera-devel] [PATCH 0/3] IPU3 Stability

Laurent Pinchart laurent.pinchart at ideasonboard.com
Mon Mar 8 16:00:06 CET 2021


Hi Kieran,

On Mon, Mar 08, 2021 at 12:06:20PM +0000, Kieran Bingham wrote:
> On 04/03/2021 14:12, Laurent Pinchart wrote:
> > On Thu, Mar 04, 2021 at 10:14:40AM +0000, Kieran Bingham wrote:
> >> On 03/03/2021 18:45, Laurent Pinchart wrote:
> >>> On Wed, Mar 03, 2021 at 05:04:23PM +0000, Kieran Bingham wrote:
> >>>> While working on the IPU3 IPA, some crashes have been occuring on
> >>>> shutdown races.
> >>>>
> >>>> Patches 2/3 and 3/3 at least presently work around those issues, and
> >>>> while they may be suitable for integration now - both suggest that more
> >>>> work is needed to ensure that the IPU3 pipeline handler is stable and
> >>>> implemented correctly.
> >>>>
> >>>> Patch 2/3 highlights that we may need to take more concern over the
> >>>> lifetime management of a Request and the associated FrameInfo that is
> >>>> supported with that.
> >>>
> >>> I think 2/3 is suitable for integration, even if API improvements are
> >>> possible. This isn't a completely unexpected situation, now that we have
> >>> an implementation we find the related issues, that's a way to improve
> >>> APIs.
> >>
> >> Yes, well we certainly need to fix it to unblock other developments.
> >>
> >>>> Patch 3/3 shows that the IPU3 Pipeline handler is queueing buffers after
> >>>> it has released the buffers on the video nodes, and needs further
> >>>> consideration as well (though I beleive that patch is still worthy of
> >>>> integration on it's own).
> >>>
> >>> I have more doubts about this one, as there's really something that
> >>> should be fixed on the pipeline handler side. I was considering
> >>> proposing the Fatal log level, but I suppose it will not make a big
> >>> difference compared to a segfault :-) Or maybe it would, the log message
> >>> can then clearly state that the problem is in the caller.
> >>
> >> You know, locally it's a fatal on my tree right now, so it's harder to
> >> disagree, but in this patch I set it to Error because I believe it's
> >> better not to crash right now.
> > 
> > :-)
> > 
> >> It's not a Fatally unrecoverable error. It's an issue in the the
> >> pipeline handler, doing something it shouldn't, and it is recoverable
> >> (Just don't queue).
> > 
> > The part on the V4L2VideoDevice side may be, but it's a symptom of
> > something really wrong happening, which will likely cause other
> > undesired behaviour in the pipeline handler. What those behaviours will
> > be depend on the pipeline handler, but the worst case would be some
> > silent memory corruption that will cause a crash later. A fatal error
> > here would help pin-pointing the root cause, making debugging of the
> > pipeline handler easier.
> > 
> > Granted, the error message will help there too (returning an error
> > silently would be the worst option). I suppose our ways of screaming
> > loudly differ, mine goes through an assertion failure :-)
> > 
> >> But it should still be fixed by the pipeline handler, hence loud error
> >> message.
> >>
> >> I would love to be able to make this an Error with a backtrace ... to be
> >> /really/ loud, but I'm not sure yet about having an extra log level
> >> between Error and Fatal.
> > 
> > It's an interesting thought, I'll sleep on it.
> 
> Did you get any sleep?

Yes :-)

We already have an error level with a backtrace, that's Fatal. Currently
it terminates program execution with std::abort(), and I'd be fine only
doing so for debug builds. It doesn't seem we need an additional level.

> I'd like to get these patches in at least so that we can ensure IPA
> development can continue.

1/3 and 2/3 can be merged right away.

For 3/3, I'll reply to the patch now.

> We can always make queuing a buffer to V4L2VideoDevice after it's been
> stopped a fatal error (Or at least a backtracable error) after the
> pipeline handler has been fixed...
> 
> >>>> The issue leading to what would be a crash without patch 3/3 is when
> >>>> shutting down, the IPA running in parallel can emit an event from:
> >>>>  void IPU3CameraData::queueFrameAction()
> >>>> as 
> >>>>
> >>>>   case ipa::ipu3::ActionParamFilled:
> >>>>
> >>>> which then goes on to queue a buffer to the three relevant V4L2 devices:
> >>>>
> >>>> 	imgu_->param_->queueBuffer(info->paramBuffer);
> >>>> 	imgu_->stat_->queueBuffer(info->statBuffer);
> >>>> 	imgu_->input_->queueBuffer(info->rawBuffer);
> >>>>
> >>>>
> >>>> This is occuring after those devices have released their buffers, and
> >>>> thus the v4l2 buffer cache (cache_) is deleted and set to nullptr,
> >>>> resulting in a segmentation fault within the V4L2VideoDevice otherwise.
> >>>
> >>> Looking at PipelineHandlerIPU3::stop(), we have
> >>>
> >>> 	data->ipa_->stop();
> >>>
> >>> 	freeBuffers(camera);
> >>>
> >>> The IPA stop() call should be synchronous, which means that any pending
> >>> asynchronous call from the IPA module to the pipeline handler should be
> >>> processed before stop() returns.
> >>>
> >>> On the IPA side, the stop() function should stop any internal thread,
> >>> which will make sure that the API won't emit any event on its own after
> >>> stop() returns. The stop() implementation is currently empty, but as
> >>> we're not creating any custom thread in the IPU3 IPA at this time, it
> >>> shouldn't be a problem.
> >>>
> >>> It could also be that the pipeline handler sends more frame events to
> >>> the IPA module after stop(), but given the cross-thread nature of the
> >>> calls, I wouldn't expect that to go through if the IPA thread created by
> >>> the proxy is stopped. Still, it may be worth double-checking this.
> >>>
> >>> Tracing the IPA calls could be useful here. Please let me know if I can
> >>> help.
> >>
> >> I enabled tracing with:
> >>  lttng enable-event -u libcamera:\*
> >>
> >> but got not IPA traces.
> >>
> >> Ok - so I've discussed this with Paul, and it's because IPA traces
> >> aren't enabled ;-)
> >>
> >> Adding a patch he has adds more context there, and it seems like we
> >> might be able to add more tracepoints automatically to see when messages
> >> are sent back from the IPA - that will make it clear if the IPA is
> >> sending a fillParams message back /after/ having already stopped.
> >>
> >> I'll defer further investigation on this until after I have that
> >> visibility for now.
> >>
> >>>> Kieran Bingham (3):
> >>>>   libcamera: pipeline: ipu3: Fix spelling error
> >>>>   libcamera: pipeline: ipu3: Ensure that IPU3Frames::info is not used
> >>>>     after delete
> >>>>   libcamera: v4l2_videodevice: Prevent queueing buffers without a cache
> >>>>
> >>>>  src/libcamera/pipeline/ipu3/frames.cpp |  2 +-
> >>>>  src/libcamera/pipeline/ipu3/ipu3.cpp   | 20 ++++++++++++++++++--
> >>>>  src/libcamera/v4l2_videodevice.cpp     | 10 ++++++++++
> >>>>  3 files changed, 29 insertions(+), 3 deletions(-)
> >>>>

-- 
Regards,

Laurent Pinchart


More information about the libcamera-devel mailing list