[libcamera-devel] [PATCH 2/2] checkstyle: Add support for pre-commit hook

Fri Jan 17 14:50:56 CET 2020

Le vendredi 17 janvier 2020 à 09:46 +0200, Laurent Pinchart a écrit :
> Hi Nicolas,
> 
> On Thu, Jan 16, 2020 at 08:23:36PM -0500, Nicolas Dufresne wrote:
> > Le vendredi 17 janvier 2020 à 00:31 +0200, Laurent Pinchart a écrit :
> > > On Thu, Jan 16, 2020 at 10:25:31PM +0000, Kieran Bingham wrote:
> > > > On 16/01/2020 18:26, nicolas at ndufresne.ca wrote:
> > > > > From: Nicolas Dufresne <nicolas.dufresne at collabora.com>
> > > > > 
> > > > > This adds support for pre-commit hook workflow. In pre-commit hook we
> > > > > check the style on the changes currently staged. Note that this patch
> > > > > also includes a bit of sugar to support --amend.
> > > > > 
> > > > > Signed-off-by: Nicolas Dufresne <nicolas.dufresne at collabora.com>
> > > > > ---
> > > > >  utils/checkstyle.py    | 55 +++++++++++++++++++++++++++++-------------
> > > > >  utils/hooks/pre-commit | 16 ++++++++++++
> > > > >  2 files changed, 54 insertions(+), 17 deletions(-)
> > > > >  mode change 100644 => 100755 utils/checkstyle.py
> > > > >  create mode 100755 utils/hooks/pre-commit
> > > > > 
> > > > > diff --git a/utils/checkstyle.py b/utils/checkstyle.py
> > > > > old mode 100644
> > > > > new mode 100755
> > > > > index 7edea25..23eb3f6
> > > > > --- a/utils/checkstyle.py
> > > > > +++ b/utils/checkstyle.py
> > > > > @@ -458,11 +458,16 @@ class StripTrailingSpaceFormatter(Formatter):
> > > > >  # Style checking
> > > > >  #
> > > > >  
> > > > > -def check_file(top_level, commit, filename):
> > > > > +def check_file(top_level, commit, filename, staged):
> > > > >      # Extract the line numbers touched by the commit.
> > > > > -    diff = subprocess.run(['git', 'diff', '%s~..%s' % (commit, commit), '--',
> > > > > -                           '%s/%s' % (top_level, filename)],
> > > > > -                          stdout=subprocess.PIPE).stdout
> > > > 
> > > > I wonder if we could/should automate this by detecting staged changes,
> > > > and then setting staged based on that?
> > > > 
> > > > 	"git status --porcelain | grep ^M"
> > > > 
> > > > Could detect if there are staged changes.
> > > > 
> > > > But equally it might be better to allow the caller to be explicit.
> > > 
> > > I've been wondering about this too, I have a slight preference for
> > > auto-detection, but that's not a hard requirement if too difficult to
> > > implement or impractical to use.
> > 
> > I don't think it hard, but I wasn't sure it was a good idea. One thing
> > that I've been a little slowpy is with the meaning of commit here (of
> > the rev list in fact). Before I added amend support, I was simply
> > ignoring that (no meaning). But then I recycled it to support the
> > commit --amend case, where you want to included the staged changes and
> > the previous commit.
> > 
> > What I found worked to get that information was to only pass one hash,
> > which is the commit before the last commit (HEAD~). Anyway, I think I
> > should fix this, and  so '%s~' % commit instead if I want to maintained
> > the behaviour. But by doing that, there is no longer anything to
> > describe that when I commit --amend, I want the combined report, not
> > just what was staged. And that I also don't want the previous commit to
> > be checked independently from the staged, because that would produce
> > that warning I'm trying to fix.
> > 
> > So my conclusion, is that to disambiguate all this, I should in fact be
> > even more explicit, and also introduce --amend. A better implementation
> > would be to find a way to integrated these special revisions into the
> > revlist, this way (not sure if there is a use case) but you could
> > request a report for specific commits, the staging changes and the
> > amended changes in one call.
> 
> That would be neat indeed. From a quick look at git-rev-parse there's no
> revision syntax to express this that I could find. We could extend the
> revision syntax, but that may be dangerous.

I was thinking to keep the --staged/amend options, but to use mixed
type in the python list, and do type checks when iterating. That would
avoid the binary parameter, which is just bad API really.

> 
> If we instead go for explicit arguments, --staged or --cached would be
> fine with me, but I'd rather use --worktree (or something similar) than
> --amend to reflect that the revision corresponds to the working tree.

I believe worktree is a much worst choice as it matches a completely
unrelated concept in git (see git worktree --help). The concept of an
amendment isn't defined, but can logically match the combination of the
previous commit and the staged changes, since this is what git commit 
--amend will combine to replace the last commit.

> 
> > > > > +    if staged:
> > > > > +        diff = subprocess.run(['git', 'diff', '--staged', commit, '--',
> > > > > +                               '%s/%s' % (top_level, filename)],
> > > > > +                              stdout=subprocess.PIPE).stdout
> > > > > +    else:
> > > > > +        diff = subprocess.run(['git', 'diff', '%s~..%s' % (commit, commit), '--',
> > > > > +                               '%s/%s' % (top_level, filename)],
> > > > > +                              stdout=subprocess.PIPE).stdout
> > > > >      diff = diff.decode('utf-8').splitlines(True)
> > > > >      commit_diff = parse_diff(diff)
> > > > >  
> > > > > @@ -476,8 +481,12 @@ def check_file(top_level, commit, filename):
> > > > >  
> > > > >      # Format the file after the commit with all formatters and compute the diff
> > > > >      # between the unformatted and formatted contents.
> > > > > -    after = subprocess.run(['git', 'show', '%s:%s' % (commit, filename)],
> > > > > -                           stdout=subprocess.PIPE).stdout
> > > > > +    if staged:
> > > > > +        after = subprocess.run(['git', 'show', ':%s' % (filename)],
> > > > > +                               stdout=subprocess.PIPE).stdout
> > > > > +    else:
> > > > > +        after = subprocess.run(['git', 'show', '%s:%s' % (commit, filename)],
> > > > > +                               stdout=subprocess.PIPE).stdout
> > > > >      after = after.decode('utf-8')
> > > > >  
> > > > >      formatted = after
> > > > > @@ -521,13 +530,20 @@ def check_file(top_level, commit, filename):
> > > > >      return len(formatted_diff) + len(issues)
> > > > >  
> > > > >  
> > > > > -def check_style(top_level, commit):
> > > > > -    # Get the commit title and list of files.
> > > > > -    ret = subprocess.run(['git', 'show', '--pretty=oneline','--name-only', commit],
> > > > > -                         stdout=subprocess.PIPE)
> > > > > -    files = ret.stdout.decode('utf-8').splitlines()
> > > > > -    title = files[0]
> > > > > -    files = files[1:]
> > > > > +def check_style(top_level, commit, staged):
> > > > > +    if staged:
> > > > > +        # Get list of staged changed files
> > > > > +        ret = subprocess.run(['git', 'diff', '--staged', '--name-only', commit],
> > > > > +                             stdout=subprocess.PIPE)
> > > > > +        files = ret.stdout.decode('utf-8').splitlines()
> > > > > +        title = "Pre-commit"
> > > > 
> > > > This is not necessarily a "Pre-commit" (hook). It's just staged.
> > > > A user could run the tool directly to validate some staged changes
> > > > before committing...
> > > > 
> > > > So I'd perhaps make this
> > > > 	title = "Staged"
> > > 
> > > Or maybe "Staged changes" to be slightly more explicit ?
> > 
> > It started "Staged", then I splitted it between amend and staged, and
> > for some reason ended up like this. I think I'll split it again, and
> > use "Staged changes" or "Amended changes".
> 
> Maybe "Work tree changes" instead of "Amended changes" ?

See my comment above on why Worktree isn't a great choice. What I maybe
suggest, is to title this one after the previous commit. That would
look like "Amend: <subject line>". The truth though is that nobody
using pre-commit will ever even look at the subject, so we are putting
a little too much effort in naming here.

> 
> > > > > +    else:
> > > > > +        # Get the commit title and list of files.
> > > > > +        ret = subprocess.run(['git', 'show', '--pretty=oneline', '--name-only', commit],
> > > > > +                             stdout=subprocess.PIPE)
> > > > > +        files = ret.stdout.decode('utf-8').splitlines()
> > > > > +        title = files[0]
> > > > > +        files = files[1:]
> > > > >  
> > > > >      separator = '-' * len(title)
> > > > >      print(separator)
> > > > > @@ -541,11 +557,11 @@ def check_style(top_level, commit):
> > > > >      files = [f for f in files if len([p for p in patterns if fnmatch.fnmatch(os.path.basename(f), p)])]
> > > > >      if len(files) == 0:
> > > > >          print("Commit doesn't touch source files, skipping")
> > > > > -        return
> > > > > +        return 0
> > > > >  
> > > > >      issues = 0
> > > > >      for f in files:
> > > > > -        issues += check_file(top_level, commit, f)
> > > > > +        issues += check_file(top_level, commit, f, staged)
> > > > >  
> > > > >      if issues == 0:
> > > > >          print("No style issue detected")
> > > > > @@ -554,6 +570,8 @@ def check_style(top_level, commit):
> > > > >          print("%u potential style %s detected, please review" % \
> > > > >                  (issues, 'issue' if issues == 1 else 'issues'))
> > > > >  
> > > > > +    return issues
> > > > > +
> > > > >  
> > > > >  def extract_revlist(revs):
> > > > >      """Extract a list of commits on which to operate from a revision or revision
> > > > > @@ -595,6 +613,8 @@ def main(argv):
> > > > >      parser = argparse.ArgumentParser()
> > > > >      parser.add_argument('--formatter', '-f', type=str, choices=['astyle', 'clang-format'],
> > > > >                          help='Code formatter. Default to clang-format if not specified.')
> > > > > +    parser.add_argument('--staged', '-s', action='store_true',
> > > > > +                        help='Looks at the staged changes, used for pre-commit, defaults to False')
> > > > >      parser.add_argument('revision_range', type=str, default='HEAD', nargs='?',
> > > > >                          help='Revision range (as defined by git rev-parse). Defaults to HEAD if not specified.')
> > > > >      args = parser.parse_args(argv[1:])
> > > > > @@ -632,11 +652,12 @@ def main(argv):
> > > > >  
> > > > >      revlist = extract_revlist(args.revision_range)
> > > > >  
> > > > > +    issues = 0
> > > > >      for commit in revlist:
> > > > > -        check_style(top_level, commit)
> > > > > +        issues += check_style(top_level, commit, args.staged)
> > > > >          print('')
> > > > >  
> > > > > -    return 0
> > > > > +    return issues
> > > > 
> > > > I'd be tempted to split the
> > > >   "checkstyle: Report issue count as a failure"
> > > > and
> > > >   "checkstyle: Support validating staged git state"
> > > > 
> > > > to two patches, but perhaps that's not necessary unless you see benefit
> > > > in splitting.
> > > 
> > > I don't think there's a need to resubmit just for this, but if a v2 is
> > > needed, it may be nice to split the changes indeed, to explain why
> > > changing the return value is needed.
> > 
> > I think a v2 is needed, so I'll do.
> > 
> > > According to the exit man page,
> > > 
> > > RATIONALE
> > >        As explained in other sections, certain exit status values have been reserved for special uses and should be used by applications only for those purposes:
> > > 
> > >         126    A file to be executed was found, but it was not an executable utility.
> > > 
> > >         127    A utility to be executed was not found.
> > > 
> > >        >128    A command was interrupted by a signal.
> > > 
> > > And according to exit(),
> > > 
> > > The value status & 0377 is returned to the parent process as the process's exit status, and can be collected using one of the wait(2) family of calls.
> > > 
> > > In the unfortunate event that a multiple of 256 issues would be found,
> > > the caller would think everything went fine. Should we thus return 0 on
> > > success and 1 if any issue was found ?
> > 
> > Agreed, will turn the number of issues into 0 and 1.
> > 
> > > > >  if __name__ == '__main__':
> > > > > diff --git a/utils/hooks/pre-commit b/utils/hooks/pre-commit
> > > > > new file mode 100755
> > > > > index 0000000..57d23ef
> > > > > --- /dev/null
> > > > > +++ b/utils/hooks/pre-commit
> > > > 
> > > > Though I really think this deserves it's own commit.
> > > > 
> > > >  "utils/hooks: Provide a pre-commit checkstyle hook"
> > > > 
> > > > > @@ -0,0 +1,16 @@
> > > > > +#!/bin/sh
> > > > > +
> > > > > +# Execute the checkstyle script before committing any code, failing the commit
> > > > > +# if needed. With this workflow, false positive can be ignored using:
> > > > > +#   git commit -n
> > > > > +#
> > > > > +# To utilise this hook, install this file with:
> > > > > +#   cp utils/hooks/pre-commit .git/hooks/pre-commit
> > > > 
> > > > One of the things I had toyed with is installing the hook by setting the
> > > > git hook directory to utils/hooks/ :
> > > > 
> > > >  git config core.hooksPath utils/hooks
> > > > 
> > > > But doing so now would invoke both the pre-commit and the post-commit hook.
> > > > 
> > > > I think it's good to provide the option of how someone might install
> > > > their hook
> > > > 
> > > > > +
> > > > > +commit=
> > > > > +if ps -ocommand= -p $PPID | grep -- "--amend"
> > > > > +then
> > > > > +   commit="HEAD~"
> > > > 
> > > > Is this really needed?
> > > > 
> > > > Edit: Yes, I've just played around with it - and I see what's happening.
> > > > Good catch to find that something extra was required here.
> > > > 
> > > > > +fi
> > > > > +
> > > > > +./utils/checkstyle.py --staged $commit
> > > > 
> > > > I wonder if the checkstyle.py could detect if we had staged changes if
> > > > we could fall back to needing only a single variant of the hook which
> > > > the user could choose to install as either a pre-commit or a post-commit
> > > > hook.
> > > > 
> > > > But perhaps it's more straight-forward to have two hooks to keep it
> > > > easier to support any future differences too.
> > > 
> > > I'm fine either way.
> > 
> > I think having two files in hook/ directory will be less confusing 
> > even if they are identical in the end. But I don't think the auto
> > detection is a good idea, it's makes the CLI of the tool ambiguous at
> > least.