[libcamera-devel] [PATCH 2/2] checkstyle: Add support for pre-commit hook

Fri Jan 17 02:23:36 CET 2020

Le vendredi 17 janvier 2020 à 00:31 +0200, Laurent Pinchart a écrit :
> Hi Kieran,
> 
> On Thu, Jan 16, 2020 at 10:25:31PM +0000, Kieran Bingham wrote:
> > On 16/01/2020 18:26, nicolas at ndufresne.ca wrote:
> > > From: Nicolas Dufresne <nicolas.dufresne at collabora.com>
> > > 
> > > This adds support for pre-commit hook workflow. In pre-commit hook we
> > > check the style on the changes currently staged. Note that this patch
> > > also includes a bit of sugar to support --amend.
> > > 
> > > Signed-off-by: Nicolas Dufresne <nicolas.dufresne at collabora.com>
> > > ---
> > >  utils/checkstyle.py    | 55 +++++++++++++++++++++++++++++-------------
> > >  utils/hooks/pre-commit | 16 ++++++++++++
> > >  2 files changed, 54 insertions(+), 17 deletions(-)
> > >  mode change 100644 => 100755 utils/checkstyle.py
> > >  create mode 100755 utils/hooks/pre-commit
> > > 
> > > diff --git a/utils/checkstyle.py b/utils/checkstyle.py
> > > old mode 100644
> > > new mode 100755
> > > index 7edea25..23eb3f6
> > > --- a/utils/checkstyle.py
> > > +++ b/utils/checkstyle.py
> > > @@ -458,11 +458,16 @@ class StripTrailingSpaceFormatter(Formatter):
> > >  # Style checking
> > >  #
> > >  
> > > -def check_file(top_level, commit, filename):
> > > +def check_file(top_level, commit, filename, staged):
> > >      # Extract the line numbers touched by the commit.
> > > -    diff = subprocess.run(['git', 'diff', '%s~..%s' % (commit, commit), '--',
> > > -                           '%s/%s' % (top_level, filename)],
> > > -                          stdout=subprocess.PIPE).stdout
> > 
> > I wonder if we could/should automate this by detecting staged changes,
> > and then setting staged based on that?
> > 
> > 	"git status --porcelain | grep ^M"
> > 
> > Could detect if there are staged changes.
> > 
> > But equally it might be better to allow the caller to be explicit.
> 
> I've been wondering about this too, I have a slight preference for
> auto-detection, but that's not a hard requirement if too difficult to
> implement or impractical to use.

I don't think it hard, but I wasn't sure it was a good idea. One thing
that I've been a little slowpy is with the meaning of commit here (of
the rev list in fact). Before I added amend support, I was simply
ignoring that (no meaning). But then I recycled it to support the
commit --amend case, where you want to included the staged changes and
the previous commit.

What I found worked to get that information was to only pass one hash,
which is the commit before the last commit (HEAD~). Anyway, I think I
should fix this, and  so '%s~' % commit instead if I want to maintained
the behaviour. But by doing that, there is no longer anything to
describe that when I commit --amend, I want the combined report, not
just what was staged. And that I also don't want the previous commit to
be checked independently from the staged, because that would produce
that warning I'm trying to fix.

So my conclusion, is that to disambiguate all this, I should in fact be
even more explicit, and also introduce --amend. A better implementation
would be to find a way to integrated these special revisions into the
revlist, this way (not sure if there is a use case) but you could
request a report for specific commits, the staging changes and the
amended changes in one call.

> 
> > > +    if staged:
> > > +        diff = subprocess.run(['git', 'diff', '--staged', commit, '--',
> > > +                               '%s/%s' % (top_level, filename)],
> > > +                              stdout=subprocess.PIPE).stdout

> > > +    else:
> > > +        diff = subprocess.run(['git', 'diff', '%s~..%s' % (commit, commit), '--',
> > > +                               '%s/%s' % (top_level, filename)],
> > > +                              stdout=subprocess.PIPE).stdout
> > >      diff = diff.decode('utf-8').splitlines(True)
> > >      commit_diff = parse_diff(diff)
> > >  
> > > @@ -476,8 +481,12 @@ def check_file(top_level, commit, filename):
> > >  
> > >      # Format the file after the commit with all formatters and compute the diff
> > >      # between the unformatted and formatted contents.
> > > -    after = subprocess.run(['git', 'show', '%s:%s' % (commit, filename)],
> > > -                           stdout=subprocess.PIPE).stdout
> > > +    if staged:
> > > +        after = subprocess.run(['git', 'show', ':%s' % (filename)],
> > > +                               stdout=subprocess.PIPE).stdout
> > > +    else:
> > > +        after = subprocess.run(['git', 'show', '%s:%s' % (commit, filename)],
> > > +                               stdout=subprocess.PIPE).stdout
> > >      after = after.decode('utf-8')
> > >  
> > >      formatted = after
> > > @@ -521,13 +530,20 @@ def check_file(top_level, commit, filename):
> > >      return len(formatted_diff) + len(issues)
> > >  
> > >  
> > > -def check_style(top_level, commit):
> > > -    # Get the commit title and list of files.
> > > -    ret = subprocess.run(['git', 'show', '--pretty=oneline','--name-only', commit],
> > > -                         stdout=subprocess.PIPE)
> > > -    files = ret.stdout.decode('utf-8').splitlines()
> > > -    title = files[0]
> > > -    files = files[1:]
> > > +def check_style(top_level, commit, staged):
> > > +    if staged:
> > > +        # Get list of staged changed files
> > > +        ret = subprocess.run(['git', 'diff', '--staged', '--name-only', commit],
> > > +                             stdout=subprocess.PIPE)
> > > +        files = ret.stdout.decode('utf-8').splitlines()
> > > +        title = "Pre-commit"
> > 
> > This is not necessarily a "Pre-commit" (hook). It's just staged.
> > A user could run the tool directly to validate some staged changes
> > before committing...
> > 
> > So I'd perhaps make this
> > 	title = "Staged"
> 
> Or maybe "Staged changes" to be slightly more explicit ?

It started "Staged", then I splitted it between amend and staged, and
for some reason ended up like this. I think I'll split it again, and
use "Staged changes" or "Amended changes".

> 
> > > +    else:
> > > +        # Get the commit title and list of files.
> > > +        ret = subprocess.run(['git', 'show', '--pretty=oneline', '--name-only', commit],
> > > +                             stdout=subprocess.PIPE)
> > > +        files = ret.stdout.decode('utf-8').splitlines()
> > > +        title = files[0]
> > > +        files = files[1:]
> > >  
> > >      separator = '-' * len(title)
> > >      print(separator)
> > > @@ -541,11 +557,11 @@ def check_style(top_level, commit):
> > >      files = [f for f in files if len([p for p in patterns if fnmatch.fnmatch(os.path.basename(f), p)])]
> > >      if len(files) == 0:
> > >          print("Commit doesn't touch source files, skipping")
> > > -        return
> > > +        return 0
> > >  
> > >      issues = 0
> > >      for f in files:
> > > -        issues += check_file(top_level, commit, f)
> > > +        issues += check_file(top_level, commit, f, staged)
> > >  
> > >      if issues == 0:
> > >          print("No style issue detected")
> > > @@ -554,6 +570,8 @@ def check_style(top_level, commit):
> > >          print("%u potential style %s detected, please review" % \
> > >                  (issues, 'issue' if issues == 1 else 'issues'))
> > >  
> > > +    return issues
> > > +
> > >  
> > >  def extract_revlist(revs):
> > >      """Extract a list of commits on which to operate from a revision or revision
> > > @@ -595,6 +613,8 @@ def main(argv):
> > >      parser = argparse.ArgumentParser()
> > >      parser.add_argument('--formatter', '-f', type=str, choices=['astyle', 'clang-format'],
> > >                          help='Code formatter. Default to clang-format if not specified.')
> > > +    parser.add_argument('--staged', '-s', action='store_true',
> > > +                        help='Looks at the staged changes, used for pre-commit, defaults to False')
> > >      parser.add_argument('revision_range', type=str, default='HEAD', nargs='?',
> > >                          help='Revision range (as defined by git rev-parse). Defaults to HEAD if not specified.')
> > >      args = parser.parse_args(argv[1:])
> > > @@ -632,11 +652,12 @@ def main(argv):
> > >  
> > >      revlist = extract_revlist(args.revision_range)
> > >  
> > > +    issues = 0
> > >      for commit in revlist:
> > > -        check_style(top_level, commit)
> > > +        issues += check_style(top_level, commit, args.staged)
> > >          print('')
> > >  
> > > -    return 0
> > > +    return issues
> > 
> > I'd be tempted to split the
> >   "checkstyle: Report issue count as a failure"
> > and
> >   "checkstyle: Support validating staged git state"
> > 
> > to two patches, but perhaps that's not necessary unless you see benefit
> > in splitting.
> 
> I don't think there's a need to resubmit just for this, but if a v2 is
> needed, it may be nice to split the changes indeed, to explain why
> changing the return value is needed.

I think a v2 is needed, so I'll do.

> 
> According to the exit man page,
> 
> RATIONALE
>        As explained in other sections, certain exit status values have been reserved for special uses and should be used by applications only for those purposes:
> 
>         126    A file to be executed was found, but it was not an executable utility.
> 
>         127    A utility to be executed was not found.
> 
>        >128    A command was interrupted by a signal.
> 
> And according to exit(),
> 
> The value status & 0377 is returned to the parent process as the process's exit status, and can be collected using one of the wait(2) family of calls.
> 
> In the unfortunate event that a multiple of 256 issues would be found,
> the caller would think everything went fine. Should we thus return 0 on
> success and 1 if any issue was found ?

Agreed, will turn the number of issues into 0 and 1.

> 
> > > 
> > >  if __name__ == '__main__':
> > > diff --git a/utils/hooks/pre-commit b/utils/hooks/pre-commit
> > > new file mode 100755
> > > index 0000000..57d23ef
> > > --- /dev/null
> > > +++ b/utils/hooks/pre-commit
> > 
> > Though I really think this deserves it's own commit.
> > 
> >  "utils/hooks: Provide a pre-commit checkstyle hook"
> > 
> > > @@ -0,0 +1,16 @@
> > > +#!/bin/sh
> > > +
> > > +# Execute the checkstyle script before committing any code, failing the commit
> > > +# if needed. With this workflow, false positive can be ignored using:
> > > +#   git commit -n
> > > +#
> > > +# To utilise this hook, install this file with:
> > > +#   cp utils/hooks/pre-commit .git/hooks/pre-commit
> > 
> > One of the things I had toyed with is installing the hook by setting the
> > git hook directory to utils/hooks/ :
> > 
> >  git config core.hooksPath utils/hooks
> > 
> > But doing so now would invoke both the pre-commit and the post-commit hook.
> > 
> > I think it's good to provide the option of how someone might install
> > their hook
> > 
> > > +
> > > +commit=
> > > +if ps -ocommand= -p $PPID | grep -- "--amend"
> > > +then
> > > +   commit="HEAD~"
> > 
> > Is this really needed?
> > 
> > Edit: Yes, I've just played around with it - and I see what's happening.
> > Good catch to find that something extra was required here.
> > 
> > > +fi
> > > +
> > > +./utils/checkstyle.py --staged $commit
> > 
> > I wonder if the checkstyle.py could detect if we had staged changes if
> > we could fall back to needing only a single variant of the hook which
> > the user could choose to install as either a pre-commit or a post-commit
> > hook.
> > 
> > But perhaps it's more straight-forward to have two hooks to keep it
> > easier to support any future differences too.
> 
> I'm fine either way.

I think having two files in hook/ directory will be less confusing 
even if they are identical in the end. But I don't think the auto
detection is a good idea, it's makes the CLI of the tool ambiguous at
least.