[libcamera-devel] [PATCH 2/2] checkstyle: Add support for pre-commit hook

Fri Jan 17 08:46:52 CET 2020

Hi Nicolas,

On Thu, Jan 16, 2020 at 08:23:36PM -0500, Nicolas Dufresne wrote:
> Le vendredi 17 janvier 2020 à 00:31 +0200, Laurent Pinchart a écrit :
> > On Thu, Jan 16, 2020 at 10:25:31PM +0000, Kieran Bingham wrote:
> >> On 16/01/2020 18:26, nicolas at ndufresne.ca wrote:
> >>> From: Nicolas Dufresne <nicolas.dufresne at collabora.com>
> >>> 
> >>> This adds support for pre-commit hook workflow. In pre-commit hook we
> >>> check the style on the changes currently staged. Note that this patch
> >>> also includes a bit of sugar to support --amend.
> >>> 
> >>> Signed-off-by: Nicolas Dufresne <nicolas.dufresne at collabora.com>
> >>> ---
> >>>  utils/checkstyle.py    | 55 +++++++++++++++++++++++++++++-------------
> >>>  utils/hooks/pre-commit | 16 ++++++++++++
> >>>  2 files changed, 54 insertions(+), 17 deletions(-)
> >>>  mode change 100644 => 100755 utils/checkstyle.py
> >>>  create mode 100755 utils/hooks/pre-commit
> >>> 
> >>> diff --git a/utils/checkstyle.py b/utils/checkstyle.py
> >>> old mode 100644
> >>> new mode 100755
> >>> index 7edea25..23eb3f6
> >>> --- a/utils/checkstyle.py
> >>> +++ b/utils/checkstyle.py
> >>> @@ -458,11 +458,16 @@ class StripTrailingSpaceFormatter(Formatter):
> >>>  # Style checking
> >>>  #
> >>>  
> >>> -def check_file(top_level, commit, filename):
> >>> +def check_file(top_level, commit, filename, staged):
> >>>      # Extract the line numbers touched by the commit.
> >>> -    diff = subprocess.run(['git', 'diff', '%s~..%s' % (commit, commit), '--',
> >>> -                           '%s/%s' % (top_level, filename)],
> >>> -                          stdout=subprocess.PIPE).stdout
> >> 
> >> I wonder if we could/should automate this by detecting staged changes,
> >> and then setting staged based on that?
> >> 
> >> 	"git status --porcelain | grep ^M"
> >> 
> >> Could detect if there are staged changes.
> >> 
> >> But equally it might be better to allow the caller to be explicit.
> > 
> > I've been wondering about this too, I have a slight preference for
> > auto-detection, but that's not a hard requirement if too difficult to
> > implement or impractical to use.
> 
> I don't think it hard, but I wasn't sure it was a good idea. One thing
> that I've been a little slowpy is with the meaning of commit here (of
> the rev list in fact). Before I added amend support, I was simply
> ignoring that (no meaning). But then I recycled it to support the
> commit --amend case, where you want to included the staged changes and
> the previous commit.
> 
> What I found worked to get that information was to only pass one hash,
> which is the commit before the last commit (HEAD~). Anyway, I think I
> should fix this, and  so '%s~' % commit instead if I want to maintained
> the behaviour. But by doing that, there is no longer anything to
> describe that when I commit --amend, I want the combined report, not
> just what was staged. And that I also don't want the previous commit to
> be checked independently from the staged, because that would produce
> that warning I'm trying to fix.
> 
> So my conclusion, is that to disambiguate all this, I should in fact be
> even more explicit, and also introduce --amend. A better implementation
> would be to find a way to integrated these special revisions into the
> revlist, this way (not sure if there is a use case) but you could
> request a report for specific commits, the staging changes and the
> amended changes in one call.

That would be neat indeed. From a quick look at git-rev-parse there's no
revision syntax to express this that I could find. We could extend the
revision syntax, but that may be dangerous.

If we instead go for explicit arguments, --staged or --cached would be
fine with me, but I'd rather use --worktree (or something similar) than
--amend to reflect that the revision corresponds to the working tree.

> >>> +    if staged:
> >>> +        diff = subprocess.run(['git', 'diff', '--staged', commit, '--',
> >>> +                               '%s/%s' % (top_level, filename)],
> >>> +                              stdout=subprocess.PIPE).stdout
> >>> +    else:
> >>> +        diff = subprocess.run(['git', 'diff', '%s~..%s' % (commit, commit), '--',
> >>> +                               '%s/%s' % (top_level, filename)],
> >>> +                              stdout=subprocess.PIPE).stdout
> >>>      diff = diff.decode('utf-8').splitlines(True)
> >>>      commit_diff = parse_diff(diff)
> >>>  
> >>> @@ -476,8 +481,12 @@ def check_file(top_level, commit, filename):
> >>>  
> >>>      # Format the file after the commit with all formatters and compute the diff
> >>>      # between the unformatted and formatted contents.
> >>> -    after = subprocess.run(['git', 'show', '%s:%s' % (commit, filename)],
> >>> -                           stdout=subprocess.PIPE).stdout
> >>> +    if staged:
> >>> +        after = subprocess.run(['git', 'show', ':%s' % (filename)],
> >>> +                               stdout=subprocess.PIPE).stdout
> >>> +    else:
> >>> +        after = subprocess.run(['git', 'show', '%s:%s' % (commit, filename)],
> >>> +                               stdout=subprocess.PIPE).stdout
> >>>      after = after.decode('utf-8')
> >>>  
> >>>      formatted = after
> >>> @@ -521,13 +530,20 @@ def check_file(top_level, commit, filename):
> >>>      return len(formatted_diff) + len(issues)
> >>>  
> >>>  
> >>> -def check_style(top_level, commit):
> >>> -    # Get the commit title and list of files.
> >>> -    ret = subprocess.run(['git', 'show', '--pretty=oneline','--name-only', commit],
> >>> -                         stdout=subprocess.PIPE)
> >>> -    files = ret.stdout.decode('utf-8').splitlines()
> >>> -    title = files[0]
> >>> -    files = files[1:]
> >>> +def check_style(top_level, commit, staged):
> >>> +    if staged:
> >>> +        # Get list of staged changed files
> >>> +        ret = subprocess.run(['git', 'diff', '--staged', '--name-only', commit],
> >>> +                             stdout=subprocess.PIPE)
> >>> +        files = ret.stdout.decode('utf-8').splitlines()
> >>> +        title = "Pre-commit"
> >> 
> >> This is not necessarily a "Pre-commit" (hook). It's just staged.
> >> A user could run the tool directly to validate some staged changes
> >> before committing...
> >> 
> >> So I'd perhaps make this
> >> 	title = "Staged"
> > 
> > Or maybe "Staged changes" to be slightly more explicit ?
> 
> It started "Staged", then I splitted it between amend and staged, and
> for some reason ended up like this. I think I'll split it again, and
> use "Staged changes" or "Amended changes".

Maybe "Work tree changes" instead of "Amended changes" ?

> >>> +    else:
> >>> +        # Get the commit title and list of files.
> >>> +        ret = subprocess.run(['git', 'show', '--pretty=oneline', '--name-only', commit],
> >>> +                             stdout=subprocess.PIPE)
> >>> +        files = ret.stdout.decode('utf-8').splitlines()
> >>> +        title = files[0]
> >>> +        files = files[1:]
> >>>  
> >>>      separator = '-' * len(title)
> >>>      print(separator)
> >>> @@ -541,11 +557,11 @@ def check_style(top_level, commit):
> >>>      files = [f for f in files if len([p for p in patterns if fnmatch.fnmatch(os.path.basename(f), p)])]
> >>>      if len(files) == 0:
> >>>          print("Commit doesn't touch source files, skipping")
> >>> -        return
> >>> +        return 0
> >>>  
> >>>      issues = 0
> >>>      for f in files:
> >>> -        issues += check_file(top_level, commit, f)
> >>> +        issues += check_file(top_level, commit, f, staged)
> >>>  
> >>>      if issues == 0:
> >>>          print("No style issue detected")
> >>> @@ -554,6 +570,8 @@ def check_style(top_level, commit):
> >>>          print("%u potential style %s detected, please review" % \
> >>>                  (issues, 'issue' if issues == 1 else 'issues'))
> >>>  
> >>> +    return issues
> >>> +
> >>>  
> >>>  def extract_revlist(revs):
> >>>      """Extract a list of commits on which to operate from a revision or revision
> >>> @@ -595,6 +613,8 @@ def main(argv):
> >>>      parser = argparse.ArgumentParser()
> >>>      parser.add_argument('--formatter', '-f', type=str, choices=['astyle', 'clang-format'],
> >>>                          help='Code formatter. Default to clang-format if not specified.')
> >>> +    parser.add_argument('--staged', '-s', action='store_true',
> >>> +                        help='Looks at the staged changes, used for pre-commit, defaults to False')
> >>>      parser.add_argument('revision_range', type=str, default='HEAD', nargs='?',
> >>>                          help='Revision range (as defined by git rev-parse). Defaults to HEAD if not specified.')
> >>>      args = parser.parse_args(argv[1:])
> >>> @@ -632,11 +652,12 @@ def main(argv):
> >>>  
> >>>      revlist = extract_revlist(args.revision_range)
> >>>  
> >>> +    issues = 0
> >>>      for commit in revlist:
> >>> -        check_style(top_level, commit)
> >>> +        issues += check_style(top_level, commit, args.staged)
> >>>          print('')
> >>>  
> >>> -    return 0
> >>> +    return issues
> >> 
> >> I'd be tempted to split the
> >>   "checkstyle: Report issue count as a failure"
> >> and
> >>   "checkstyle: Support validating staged git state"
> >> 
> >> to two patches, but perhaps that's not necessary unless you see benefit
> >> in splitting.
> > 
> > I don't think there's a need to resubmit just for this, but if a v2 is
> > needed, it may be nice to split the changes indeed, to explain why
> > changing the return value is needed.
> 
> I think a v2 is needed, so I'll do.
> 
> > According to the exit man page,
> > 
> > RATIONALE
> >        As explained in other sections, certain exit status values have been reserved for special uses and should be used by applications only for those purposes:
> > 
> >         126    A file to be executed was found, but it was not an executable utility.
> > 
> >         127    A utility to be executed was not found.
> > 
> >        >128    A command was interrupted by a signal.
> > 
> > And according to exit(),
> > 
> > The value status & 0377 is returned to the parent process as the process's exit status, and can be collected using one of the wait(2) family of calls.
> > 
> > In the unfortunate event that a multiple of 256 issues would be found,
> > the caller would think everything went fine. Should we thus return 0 on
> > success and 1 if any issue was found ?
> 
> Agreed, will turn the number of issues into 0 and 1.
> 
> >>> 
> >>>  if __name__ == '__main__':
> >>> diff --git a/utils/hooks/pre-commit b/utils/hooks/pre-commit
> >>> new file mode 100755
> >>> index 0000000..57d23ef
> >>> --- /dev/null
> >>> +++ b/utils/hooks/pre-commit
> >> 
> >> Though I really think this deserves it's own commit.
> >> 
> >>  "utils/hooks: Provide a pre-commit checkstyle hook"
> >> 
> >>> @@ -0,0 +1,16 @@
> >>> +#!/bin/sh
> >>> +
> >>> +# Execute the checkstyle script before committing any code, failing the commit
> >>> +# if needed. With this workflow, false positive can be ignored using:
> >>> +#   git commit -n
> >>> +#
> >>> +# To utilise this hook, install this file with:
> >>> +#   cp utils/hooks/pre-commit .git/hooks/pre-commit
> >> 
> >> One of the things I had toyed with is installing the hook by setting the
> >> git hook directory to utils/hooks/ :
> >> 
> >>  git config core.hooksPath utils/hooks
> >> 
> >> But doing so now would invoke both the pre-commit and the post-commit hook.
> >> 
> >> I think it's good to provide the option of how someone might install
> >> their hook
> >> 
> >>> +
> >>> +commit=
> >>> +if ps -ocommand= -p $PPID | grep -- "--amend"
> >>> +then
> >>> +   commit="HEAD~"
> >> 
> >> Is this really needed?
> >> 
> >> Edit: Yes, I've just played around with it - and I see what's happening.
> >> Good catch to find that something extra was required here.
> >> 
> >>> +fi
> >>> +
> >>> +./utils/checkstyle.py --staged $commit
> >> 
> >> I wonder if the checkstyle.py could detect if we had staged changes if
> >> we could fall back to needing only a single variant of the hook which
> >> the user could choose to install as either a pre-commit or a post-commit
> >> hook.
> >> 
> >> But perhaps it's more straight-forward to have two hooks to keep it
> >> easier to support any future differences too.
> > 
> > I'm fine either way.
> 
> I think having two files in hook/ directory will be less confusing 
> even if they are identical in the end. But I don't think the auto
> detection is a good idea, it's makes the CLI of the tool ambiguous at
> least.

-- 
Regards,

Laurent Pinchart