[libcamera-devel] [PATCH 0/3] IPU3 Stability
Laurent Pinchart
laurent.pinchart at ideasonboard.com
Thu Mar 4 15:12:18 CET 2021
Hi Kieran,
On Thu, Mar 04, 2021 at 10:14:40AM +0000, Kieran Bingham wrote:
> On 03/03/2021 18:45, Laurent Pinchart wrote:
> > On Wed, Mar 03, 2021 at 05:04:23PM +0000, Kieran Bingham wrote:
> >> While working on the IPU3 IPA, some crashes have been occuring on
> >> shutdown races.
> >>
> >> Patches 2/3 and 3/3 at least presently work around those issues, and
> >> while they may be suitable for integration now - both suggest that more
> >> work is needed to ensure that the IPU3 pipeline handler is stable and
> >> implemented correctly.
> >>
> >> Patch 2/3 highlights that we may need to take more concern over the
> >> lifetime management of a Request and the associated FrameInfo that is
> >> supported with that.
> >
> > I think 2/3 is suitable for integration, even if API improvements are
> > possible. This isn't a completely unexpected situation, now that we have
> > an implementation we find the related issues, that's a way to improve
> > APIs.
>
> Yes, well we certainly need to fix it to unblock other developments.
>
> >> Patch 3/3 shows that the IPU3 Pipeline handler is queueing buffers after
> >> it has released the buffers on the video nodes, and needs further
> >> consideration as well (though I beleive that patch is still worthy of
> >> integration on it's own).
> >
> > I have more doubts about this one, as there's really something that
> > should be fixed on the pipeline handler side. I was considering
> > proposing the Fatal log level, but I suppose it will not make a big
> > difference compared to a segfault :-) Or maybe it would, the log message
> > can then clearly state that the problem is in the caller.
>
> You know, locally it's a fatal on my tree right now, so it's harder to
> disagree, but in this patch I set it to Error because I believe it's
> better not to crash right now.
:-)
> It's not a Fatally unrecoverable error. It's an issue in the the
> pipeline handler, doing something it shouldn't, and it is recoverable
> (Just don't queue).
The part on the V4L2VideoDevice side may be, but it's a symptom of
something really wrong happening, which will likely cause other
undesired behaviour in the pipeline handler. What those behaviours will
be depend on the pipeline handler, but the worst case would be some
silent memory corruption that will cause a crash later. A fatal error
here would help pin-pointing the root cause, making debugging of the
pipeline handler easier.
Granted, the error message will help there too (returning an error
silently would be the worst option). I suppose our ways of screaming
loudly differ, mine goes through an assertion failure :-)
> But it should still be fixed by the pipeline handler, hence loud error
> message.
>
> I would love to be able to make this an Error with a backtrace ... to be
> /really/ loud, but I'm not sure yet about having an extra log level
> between Error and Fatal.
It's an interesting thought, I'll sleep on it.
> >> The issue leading to what would be a crash without patch 3/3 is when
> >> shutting down, the IPA running in parallel can emit an event from:
> >> void IPU3CameraData::queueFrameAction()
> >> as
> >>
> >> case ipa::ipu3::ActionParamFilled:
> >>
> >> which then goes on to queue a buffer to the three relevant V4L2 devices:
> >>
> >> imgu_->param_->queueBuffer(info->paramBuffer);
> >> imgu_->stat_->queueBuffer(info->statBuffer);
> >> imgu_->input_->queueBuffer(info->rawBuffer);
> >>
> >>
> >> This is occuring after those devices have released their buffers, and
> >> thus the v4l2 buffer cache (cache_) is deleted and set to nullptr,
> >> resulting in a segmentation fault within the V4L2VideoDevice otherwise.
> >
> > Looking at PipelineHandlerIPU3::stop(), we have
> >
> > data->ipa_->stop();
> >
> > freeBuffers(camera);
> >
> > The IPA stop() call should be synchronous, which means that any pending
> > asynchronous call from the IPA module to the pipeline handler should be
> > processed before stop() returns.
> >
> > On the IPA side, the stop() function should stop any internal thread,
> > which will make sure that the API won't emit any event on its own after
> > stop() returns. The stop() implementation is currently empty, but as
> > we're not creating any custom thread in the IPU3 IPA at this time, it
> > shouldn't be a problem.
> >
> > It could also be that the pipeline handler sends more frame events to
> > the IPA module after stop(), but given the cross-thread nature of the
> > calls, I wouldn't expect that to go through if the IPA thread created by
> > the proxy is stopped. Still, it may be worth double-checking this.
> >
> > Tracing the IPA calls could be useful here. Please let me know if I can
> > help.
>
> I enabled tracing with:
> lttng enable-event -u libcamera:\*
>
> but got not IPA traces.
>
> Ok - so I've discussed this with Paul, and it's because IPA traces
> aren't enabled ;-)
>
> Adding a patch he has adds more context there, and it seems like we
> might be able to add more tracepoints automatically to see when messages
> are sent back from the IPA - that will make it clear if the IPA is
> sending a fillParams message back /after/ having already stopped.
>
> I'll defer further investigation on this until after I have that
> visibility for now.
>
> >> Kieran Bingham (3):
> >> libcamera: pipeline: ipu3: Fix spelling error
> >> libcamera: pipeline: ipu3: Ensure that IPU3Frames::info is not used
> >> after delete
> >> libcamera: v4l2_videodevice: Prevent queueing buffers without a cache
> >>
> >> src/libcamera/pipeline/ipu3/frames.cpp | 2 +-
> >> src/libcamera/pipeline/ipu3/ipu3.cpp | 20 ++++++++++++++++++--
> >> src/libcamera/v4l2_videodevice.cpp | 10 ++++++++++
> >> 3 files changed, 29 insertions(+), 3 deletions(-)
> >>
--
Regards,
Laurent Pinchart
More information about the libcamera-devel
mailing list