[libcamera-devel] [RFC PATCH 1/1] test: cam tool testing with valgrind

Kieran Bingham kieran.bingham at ideasonboard.com
Wed Oct 7 14:26:40 CEST 2020


Hi Niklas,

On 07/10/2020 13:03, Niklas Söderlund wrote:
> Hi Kieran,
> 
> On 2020-10-07 12:25:44 +0100, Kieran Bingham wrote:
>> Implement a test which runs cam to capture 10 frames,
>> and validates there are no memory leaks with valgrind.
>>
>> Signed-off-by: Kieran Bingham <kieran.bingham at ideasonboard.com>
>> ---
>>
>> A few points to consider here:
>>
>> Duration
>> --------
>> Unfortunately this test adds an extra 13 seconds to the test run on my
>> system. We could reduce the duration of course, but it will still be a
>> reasonably substantial addition.
> 
> I think this is indeed a good tool to have in our arsenal. I'm starting 
> to wonder if we need to split our test directory into a 'release' and a 
> 'normal' mode. In 'release' we would have tests that take longer and are 
> more focused on catching deeper errors such as memory leaks or other 
> issues that require a longer runtime, while in 'normal' we would have 
> tests similar to those we have today that focus on incorrect behaviour 
> of the API. What do you think?
> 
> I think I would be unhappy for this test to run (without any way to 
> disable it) for every ninja test run, as I already think our tests take 
> too much time ;-)

I'm fine with that: a 'smoke-test' suite and a full suite. I'll leave
the definition of what goes in the smoke-test to later, after we
determine if it's possible. But I certainly don't object ;-)
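
Meson can already filter tests by suite, so the split might be as
simple as the following (a sketch; the 'smoke' and 'full' suite names
here are hypothetical):

$ meson test --suite smoke
$ meson test --suite full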


>> TAP
>> ---
>> This runs a test for multiple cameras, and meson now supports 'tap'
>> format tests (multiple tests in one execution). In this patch, the
>> "protocol : 'tap'" option is disabled, because meson doesn't like the
>> output of the rest of the test being mixed with the tap output. I'm not
>> sure of a good way to handle that. The output is useful, so we want it
>> somewhere, and ideally in the test log - but we also want the tap
>> parsing to work in that case too.
>> Anyway, with the tap protocol disabled, it just falls back to the
>> return status, which is also handled here. Still, this is a good
>> opportunity to see how 'tap' could help our tests, and might revive my
>> old patches to use common test helpers across the rest of test/
> 
> I'm a huge fan of TAP. Do you think it would require a lot of work to 
> change our current tests' output to be valid TAP? If I recall correctly, 
> prepending a # to all non-TAP lines is enough to make any text stream 
> into a valid (yet not so useful) TAP stream.

Ok - prepending sounds like a good route then. Maybe we can throw
something together that automatically redirects stdout/stderr to two
different prefixes.

1..3
#out: a standard out message
#err: a standard error message/warning
not ok 1 - There was an error in this test.
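
Something like this could be a starting point - an untested POSIX sh
sketch, with prefix_streams as a hypothetical helper name:

prefix_streams() {
	# fd 3 carries the command's stdout past the inner pipeline, so
	# only stderr lines reach the '#err: ' sed; the '#err: ' output
	# lands on the script's stderr via the 1>&2 dup. Caveats: the
	# command's exit status is lost to the pipelines, and the
	# relative ordering of out/err lines is not preserved.
	{ "$@" 2>&1 1>&3 | sed 's/^/#err: /'; } 3>&1 1>&2 | sed 's/^/#out: /'
}

prefix_streams "$cam" -c "$i" -C10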

> One thing about TAP I learnt the hard way is that the number of planned 
> tests (keyword "1..5") does not need to go first in the TAP stream; it's 
> perfectly valid to provide this last. The results should only be 

Yes, I discovered this recently too. That helps a lot when you don't
know how many tests you are going to run until you run them ;-)
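
So a stream like this one, with the plan emitted last, is still valid:

ok 1 - camera 1 capture
not ok 2 - camera 2 capture
ok 3 - camera 3 capture
1..3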

> considered incomplete if there is no plan provided anywhere. This has 
> helped me in the past when turning things TAP compatible.
> 
> Lastly, a word of caution about the TAP13 specification: please don't :-)

Ack. I haven't looked at TAP13. I'll avoid it ;-)


>> Valgrind
>> --------
>> Parsing valgrind is a real pain. This test *only* checks to see if there
>> are any memory leaks, and does not report on any of the other topics
>> valgrind might highlight. (They could be added later of course).
>> Ideally we would want machine-parsable output from valgrind. It can
>> output an XML report ... but ... that's a lot more parsing too ;-)
>>
>> Thoughts/Suggestions?
> 
> Is it really useful information to parse this out? As long as the 
> valgrind output is kept in a log file, I think a PASS/FAIL of the run is 
> enough, as someone who debugs it would likely find more information in 
> the log than from parsed values presented in the test run.


The issue is that it's hard to (easily/simply) decide whether a
valgrind run should be considered a pass or a fail.

Grepping the logs seems fragile and awkward. It would be nice if
valgrind had a better summary that said:
  - Leaks: 5
  - Uninitialised data: 2
  - Total errors: 7

Or something really easy to consider a pass/fail.

For example, without Tomi's current fixup patches, this test 'passes'
even though valgrind has (correctly) identified and reported the use of
uninitialised variables and an invalid close(-1) syscall.

It would be nice if we could (easily) report all valgrind errors here.
It feels too obvious, so I must have missed something staring me in the
face :-(
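
One angle that might be good enough: memcheck always finishes with an
"ERROR SUMMARY" line, so we could extract the error count from that
instead of matching the prose about leaks. An untested sketch:

# Memcheck ends its report with a line of the form:
#   ==1234== ERROR SUMMARY: 7 errors from 3 contexts (suppressed: 0 from 0)
# Note that leaks are only included in that count when --leak-check=full
# is enabled.
errors=$(sed -n 's/.*ERROR SUMMARY: \([0-9][0-9]*\) errors.*/\1/p' "$log_file")

And if we really want structure, valgrind can emit XML directly
(--xml=yes --xml-file=<file>), though then the parsing is on us.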



> 
>>
>> https://github.com/adobe/chromium/blob/master/tools/valgrind/memcheck_analyze.py
>> seems to show an existing tool that parses the valgrind output as one
>> possible path - but I'm happy to hear of any better ideas too.
>>
>>
> 
> As a last point, I wonder if we should create a new binary in tests that 
> iterates over all cameras while using the same CameraManager instance 
> and loops over them 2-3 times. I think that could exercise more code 
> paths than creating a new CameraManager for each run (as we do with 
> cam). But this can of course be done on top, or maybe even in addition.

Yes, that sounds like another internal test to add.

Meson does have the facility to wrap everything with valgrind too ...
so perhaps we should also consider that - but I thought this route might
be interesting too.

$ meson test --wrap='valgrind --tool=helgrind -v --error-exitcode=1'

Ok - so I grabbed that from a search - and look at that:
--error-exitcode=1! That looks like what I might have been missing ;-)
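
If that does what it says, the script below could drop the grep
entirely and let valgrind set the exit status itself. An untested
sketch using standard memcheck options:

"$valgrind" --error-exitcode=1 --leak-check=full \
	--errors-for-leak-kinds=all \
	"$cam" -c "$i" -C10 > "$log_file" 2>&1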


Well well well ... so I've run meson test --wrap=valgrind and I get:

$ meson test --wrap=valgrind
14/57 libcamera:ipc / unixsocket                       FAIL
16/57 libcamera:log / log_process                      FAIL
22/57 libcamera:process / process_test                 FAIL
55/57 libcamera / timer                                FAIL

(trimmed of course)

and worse:

$ meson test --wrap='valgrind --tool=helgrind -v --error-exitcode=1'
<trim all the failures>
 1/57 libcamera:integration / cam-capture-test         OK
38/57 libcamera / span                                 OK

Ok:                 2
Expected Fail:      0
Fail:               55
Unexpected Pass:    0
Skipped:            0
Timeout:            0

It seems that might be due to a common failure in the
CameraTest::CameraTest constructor though.
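
If it really is one common site, a valgrind suppressions file might be
the pragmatic fix - --suppressions= is a standard valgrind option; the
file name here is hypothetical:

$ meson test --wrap='valgrind --suppressions=libcamera.supp --error-exitcode=1'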

Time for some digging I guess...

--
Kieran



> 
>>
>>  test/cam/capture-test.sh | 71 ++++++++++++++++++++++++++++++++++++++++
>>  test/cam/meson.build     | 10 ++++++
>>  test/meson.build         |  1 +
>>  3 files changed, 82 insertions(+)
>>  create mode 100755 test/cam/capture-test.sh
>>  create mode 100644 test/cam/meson.build
>>
>> diff --git a/test/cam/capture-test.sh b/test/cam/capture-test.sh
>> new file mode 100755
>> index 000000000000..ab808976be72
>> --- /dev/null
>> +++ b/test/cam/capture-test.sh
>> @@ -0,0 +1,71 @@
>> +#!/bin/sh
>> +
>> +# SPDX-License-Identifier: GPL-2.0-or-later
>> +
>> +TestPass=0
>> +TestFail=255
>> +TestSkip=77
>> +
>> +# Initialised to success; set to failure when any test fails.
>> +ret=$TestPass
>> +
>> +ok() {
>> +	echo "ok $*"
>> +}
>> +
>> +nok() {
>> +	echo "not ok $*"
>> +	ret=$TestFail
>> +}
>> +
>> +valgrind=$(command -v valgrind)
>> +
>> +if [ x"" = x"$valgrind" ] ; then
>> +	echo "skip 1 - Valgrind unavailable ..."
>> +	exit $TestSkip
>> +fi
>> +
>> +# Tests expect to be run from the meson.project_build_root()
>> +cam=${1:-src/cam/cam}
>> +
>> +if [ ! -e "$cam" ] ; then
>> +	nok "1 - failed to find cam utility."
>> +	exit $TestFail
>> +fi
>> +
>> +# Unfortunately, we don't have a 'machine interface', so we rely on parsing the
>> +# output of cam...
>> +num_cameras=$("$cam" -l | grep -v "Available" | wc -l)
>> +
>> +# Emit the TAP plan
>> +echo "1..$num_cameras"
>> +
>> +for i in $(seq 1 1 "$num_cameras");
>> +do
>> +	"$cam" -c "$i" -C10
>> +	status=$?
>> +	if [ "$status" != 0 ] ; then
>> +		nok "$i - $cam returned $status"
>> +		continue
>> +	fi
>> +
>> +	log_file="valgrind-cam-$i.log"
>> +	"$valgrind" "$cam" -c "$i" -C10 > "$log_file" 2>&1
>> +	status=$?
>> +	if [ "$status" != 0 ] ; then
>> +		nok "$i - $valgrind returned $status"
>> +		continue
>> +	fi
>> +
>> +	# I'd prefer a better way of checking there are no leaks, as well as reporting
>> +	# the different categories from valgrind as distinct tests.
>> +	if ! grep "no leaks are possible" "$log_file" > /dev/null; then
>> +		nok "$i - Valgrind Errors detected"
>> +		cat "$log_file" >&2
>> +		continue
>> +	fi
>> +
>> +	ok "$i - Camera $i reports no leaks"
>> +done;
>> +
>> +exit $ret
>> diff --git a/test/cam/meson.build b/test/cam/meson.build
>> new file mode 100644
>> index 000000000000..834c9bcf6b86
>> --- /dev/null
>> +++ b/test/cam/meson.build
>> @@ -0,0 +1,10 @@
>> +# SPDX-License-Identifier: CC0-1.0
>> +
>> +cam_capture = files('capture-test.sh')
>> +
>> +test('cam-capture-test', cam_capture,
>> +    args : cam,
>> +    suite : 'integration',
>> +    is_parallel : false,
>> +    #protocol : 'tap',
>> +    timeout : 60)
>> diff --git a/test/meson.build b/test/meson.build
>> index 0a1d434e3996..d1b24220dc7c 100644
>> --- a/test/meson.build
>> +++ b/test/meson.build
>> @@ -2,6 +2,7 @@
>>  
>>  subdir('libtest')
>>  
>> +subdir('cam')
>>  subdir('camera')
>>  subdir('controls')
>>  subdir('ipa')
>> -- 
>> 2.25.1
>>
>> _______________________________________________
>> libcamera-devel mailing list
>> libcamera-devel at lists.libcamera.org
>> https://lists.libcamera.org/listinfo/libcamera-devel
> 

-- 
Regards
--
Kieran

