[libcamera-devel] [RFC PATCH v2 05/14] test: yaml-parser: Test dictionary items ordering

Thu Jun 16 11:51:31 CEST 2022

On Thu, Jun 16, 2022 at 09:59:44AM +0100, David Plowman wrote:
> Hi Laurent
> 
> Not sure I quite answered all the right questions!

Let's see :-)

> On Thu, 16 Jun 2022 at 09:48, Laurent Pinchart wrote:
> > On Thu, Jun 16, 2022 at 09:32:21AM +0100, David Plowman wrote:
> > > On Thu, 16 Jun 2022 at 09:10, Laurent Pinchart wrote:
> > > > On Mon, Jun 13, 2022 at 12:48:55PM +0300, Laurent Pinchart via libcamera-devel wrote:
> > > > > On Mon, Jun 13, 2022 at 10:25:15AM +0100, David Plowman wrote:
> > > > > > Hi Laurent, everyone
> > > > > >
> > > > > > Thanks for the suggestions. I'm fine to change the syntax to make
> > > > > > things clearer, but I wonder if we could avoid breaking existing JSON
> > > > > > files? There are probably low numbers of them out there beyond the
> > > > > > ones that we've supplied, but you never quite know and backwards
> > > > > > compatibility is still a nice thing. Do you think that's something we
> > > > > > can arrange?
> > > > >
> > > > > This patch series guarantees ordering of entries in mapping (at the
> > > > > expense of duplicated key storage in memory, but that could probably be
> > > > > fixed), so there should be no breakage. The main drawback is that
> > > > > YamlParser and YamlObject will then expose an order that is
> > > > > implementation-specific and shouldn't be relied on by applications
> > > > > according to the JSON and YAML specifications. There's thus a risk of
> > > > > introducing new code that relies on mappings being ordered, which
> > > > > wouldn't be the case if the implementation didn't guarantee ordering
> > > > > (although one may argue that in the case the users of YamlObject may
> > > > > still rely on a different implementation-specific order without
> > > > > realizing it, we would need to randomize the order to avoid that, which
> > > > > I don't think is a good idea).
> > > > >
> > > > > With versioned tuning files the IPA could fairly easily support both the
> > > > > current format and the new format, so that shouldn't be a problem. I
> > > > > think we should then provide a Python script to convert the old format
> > > > > to the new one, and print a warning to the log. Would you expect the
> > > > > Raspberry Pi IPA module to support the current format forever, or only
> > > > > for a fixed duration to help users transition ?
> > > >
> > > > I'll post a v3 series of the YamlObject changes, and I'd like to decide
> > > > on which direction to take (if possible :-)). Could you share your
> > > > thoughts on this ?
> > >
> > > I wasn't totally sure what the precise question is, but here are a few
> > > random answers:
> > >
> > > - I'd like existing turning files to continue working, could they
> > > implicitly be regarded as "version 0" or something if they don't say
> > > otherwise? It also means not having to touch the tuning tool just yet.
> >
> > We can, the v2 I've posted can handle this. The only drawback is that it
> > risks introducing another dependency on YAML mapping ordering, but I
> > suppose that's worth it to avoid breaking all existing tuning files.
> >
> > Would you consider dropping support for v0 at some point ? If so, any
> > idea when ?

This one went unanswered. I'd like to know if you foresee v0 being
supported forever, for a long time, or for a short transition period.
I'm not pushing for one particular option, I'd just like to know in
order to structure the YamlObject rework correctly.

> > > - I don't like having different priorities stored with different
> > > algorithms. You'll forever be hunting through the file reverse
> > > engineering the actual order. I'd rather have an "order" field (or
> > > something) that simply lists the correct order.
> > >
> > > - The "order" field would default to the "standard order" if not
> > > present. The order can list algorithms that aren't present and that
> > > would just be ignored. It would be common simply not to have all
> > > algorithms, or to comment some out for debugging.
> > >
> > > I think there is some complexity in the order matching. For example,
> > > I'd want "rpi.awb" to match just "awb". But if "rpi.awb" is listed in
> > > the order, that would take precedence.
> > >
> > > The idea is that we can list the standard order like "black_level",
> > > "dpc", "lux", "noise", "geq", "sdn", "awb", and so on. But if someone
> > > writes an algorithm "foo.awb" they can force it to go somewhere else
> > > if they have too, but by default it would take up the standard "awb"
> > > position.
> > >
> > > Hmm, that's starting to get a bit annoying, isn't it? Storing
> > > priorities in the algorithm would be less of a problem in this
> > > respect, but I really don't like it...
> >
> > It's getting complex indeed. I have no issue keeping the existing
> > mechanism working for as long as we support v0. In v1 it would work with
> > the modification to the JSON file explained below. The only cost will be
> > one more level in the hierachy. I've also shown how it looks like in
> > YAML as it's less verbose and thus easier to discuss by e-mail, but that
> > doesn't mean I want to convert existing files to YAML.
> 
> Trying to be less complicated...
> 
> Maybe we leave the existing files as "version 0", and can we keep the
> order-in-the-file for "version 0". That would save any complications.

Yes, that's the idea, as you're concerned about breakages, the current
format will be considered as v0 and will continue operating exactly as
it does today.

For v1, we can change the grammar of the JSON files to use lists instead
of dictionnaries, thus giving us an order without infringing on the JSON
specification. We will thus still have the ordering controlled from the
tuning file, same feature as today, just a different grammar.

> I'd be happy for "version 0" to be json only if that helps.

As YamlParser supports JSON (with the caveat I've already mentioned),
both JSON and YAML are fine. It may be nice to standardize on one format
throughout libcamera if it doesn't cause any particular problem, but I'm
fine with the Raspberry Pi tuning files continuing to use JSON.

> > > - Don't want to lose json, but happy to have both with the appropriate
> > > file extension.
> >
> > The YamlParser supports JSON just fine :-) The only caveat is that
> > spaces have to be used for indentation, not tabs.
> >
> > > Did I answer everything or was there anything I overlooked?
> >
> > There's the question of how the v1 format should look like. I'd propose
> > the following.
> >
> > {
> >     "version": 1
> >     "algorithms": [
> >         {
> >             "rpi.black_level":
> >             {
> >                 "black_level": 4096
> >             },
> >         },
> >         {
> >             "rpi.noise":
> >             {
> >                 "reference_constant": 0,
> >                 "reference_slope": 3.67
> >             },
> >         }
> >     ]
> > }
> >
> > As the contents of the "algorithms" key is a list, it is ordered, so we
> > keep the current feature of algorithm ordering without abusing a mapping
> > order.
> >
> > Another option would be
> >
> > {
> >     "version": 1
> >     "algorithms": [
> >         {
> >             "name": "rpi.black_level",
> >             "data": {
> >                 "black_level": 4096
> >             },
> >         },
> >         {
> >             "name": "rpi.noise",
> >             "data": {
> >                 "reference_constant": 0,
> >                 "reference_slope": 3.67
> >             },
> >         }
> >     ]
> > }
> >
> > I don't know what's better, if it's a good practice to make keys well
> > known (as in the second option, where an algorithm entry has two keys
> > that are known at compile time, "name" and "data") or if it's fine (or
> > even better) to make keys more dynamic (which then requires iterating
> > over the parsed object, as we don't know the key name in advance).
> 
> Hmm, both seem to have (a little bit of) extra "levels" of syntax in
> them.

That's right. Two new levels are introduced, one for versioning, and one
to store algorithms in a list instead of a dictionnary (this is what
will give us ordering).

> Would this work:
> 
> ...
>      "algorithms": [
>          {
>              "name": "rpi.black_level",
>              "black_level": 4096
>         },
> ...
> 
> If the json-reader just gives us a big blob maps/array then I'm
> guessing we can read the name first, make the correct C++ Algorithm
> object and then pass this same blob to the read function?

It would work, but I don't like it much, it mixes two different levels
into one, and could result in name clashes (for instance an algorithm
couldn't use "name" as a key in its data, it may not be a big deal
as-is, but it may be annoying if/when we introduce more common keys).

I've been trying to find literature about best practices for JSON or
YAML grammar design, but without luck :-(

> > > > > > On Mon, 13 Jun 2022 at 09:46, Laurent Pinchart wrote:
> > > > > > > On Mon, Jun 13, 2022 at 09:05:21AM +0100, Naushir Patuck wrote:
> > > > > > > > On Mon, 13 Jun 2022 at 07:18, Laurent Pinchart wrote:
> > > > > > > > > On Mon, Jun 13, 2022 at 01:22:19PM +0900, paul.elder at ideasonboard.com wrote:
> > > > > > > > > > On Sat, Jun 04, 2022 at 09:59:30PM +0300, Laurent Pinchart wrote:
> > > > > > > > > > > The order of items in a YAML dictionary may matter. Update the test to
> > > > > > > > > > > ensure that it is preserved. The test currently fails at the YamlParser
> > > > > > > > > >
> > > > > > > > > > My understanding is that YAML mappings are unordered [1] [2], and if
> > > > > > > > > > order in the mapping is significant, then either a sequence of mappings
> > > > > > > > > > [3] or flow mapping adjacent values [4] should be used.
> > > > > > > > >
> > > > > > > > > That's a very good point. [5] even mentions
> > > > > > > > >
> > > > > > > > > "while imposing a order on mapping keys is necessary for flattening YAML
> > > > > > > > > representations to a sequential access medium, this serialization detail
> > > > > > > > > must not be used to convey application level information."
> > > > > > > > >
> > > > > > > > > The same applies to JSON ([6]).
> > > > > > > > >
> > > > > > > > > Fixing this would require changing the syntax of the tuning files. It's
> > > > > > > > > inconvenient, but not doing so opens the door to more issues in the
> > > > > > > > > future :-S
> > > > > > > >
> > > > > > > > Given that the IPA does require ordering and we have been lucky with
> > > > > > > > Boost preserving order in the JSON parser, I think we probably ought
> > > > > > > > to specify ordering in the config file with a specific key.
> > > > > > >
> > > > > > > Sounds good to me.
> > > > > > >
> > > > > > > It could be done through a priority key in each algorithm, or by
> > > > > > > converting the mapping to a list. In YAML format, that would be moving
> > > > > > > from
> > > > > > >
> > > > > > > {
> > > > > > >     "rpi.black_level":
> > > > > > >     {
> > > > > > >         "black_level": 4096
> > > > > > >     },
> > > > > > >     "rpi.noise":
> > > > > > >     {
> > > > > > >         "reference_constant": 0,
> > > > > > >         "reference_slope": 3.67
> > > > > > >     },
> > > > > > > }
> > > > > > >
> > > > > > > to
> > > > > > >
> > > > > > > [
> > > > > > >     {
> > > > > > >         "rpi.black_level":
> > > > > > >         {
> > > > > > >             "black_level": 4096
> > > > > > >         },
> > > > > > >     },
> > > > > > >     {
> > > > > > >         "rpi.noise":
> > > > > > >         {
> > > > > > >             "reference_constant": 0,
> > > > > > >             "reference_slope": 3.67
> > > > > > >         },
> > > > > > >     }
> > > > > > > ]
> > > > > > >
> > > > > > > In YAML format, it would translate as a move from
> > > > > > >
> > > > > > > rpi.black_level:
> > > > > > >   black_level: 4096
> > > > > > > rpi.noise:
> > > > > > >   reference_constant: 0
> > > > > > >   reference_slope: 3.67
> > > > > > >
> > > > > > > to
> > > > > > >
> > > > > > > - rpi.black_level:
> > > > > > >     black_level: 4096
> > > > > > > - rpi.noise:
> > > > > > >     reference_constant: 0
> > > > > > >     reference_slope: 3.67
> > > > > > >
> > > > > > > As we have to change the tuning files, I'd like to take this as an
> > > > > > > opportunity to add a format version. Something along those lines maybe ?
> > > > > > >
> > > > > > > version: 1.0
> > > > > > > algorithms:
> > > > > > >   - rpi.black_level:
> > > > > > >       black_level: 4096
> > > > > > >   - rpi.noise:
> > > > > > >       reference_constant: 0
> > > > > > >       reference_slope: 3.67
> > > > > > >
> > > > > > > And while we're discussing this, does someone know about best practice
> > > > > > > rules to design JSON/YAML grammars ? I've been wondering for a long time
> > > > > > > if the following grammar would have any advantage:
> > > > > > >
> > > > > > > version: 1.0
> > > > > > > algorithms:
> > > > > > >   - name: rpi.black_level
> > > > > > >     data:
> > > > > > >       black_level: 4096
> > > > > > >   - name: rpi.noise:
> > > > > > >     data:
> > > > > > >       reference_constant: 0
> > > > > > >       reference_slope: 3.67
> > > > > > >
> > > > > > > > When I get a
> > > > > > > > change, I'll look to add a patch to allow this on the existing codebase,
> > > > > > > > and this series ought to "just work" after that.
> > > > > > > >
> > > > > > > > > > [1] https://yaml.org/spec/1.2.2/#mapping
> > > > > > > > > > [2] https://yaml.org/spec/1.2.2/#3221-mapping-key-order
> > > > > > > > > > [3] https://yaml.org/spec/1.2.2/#example-ordered-mappings
> > > > > > > > > > [4] https://yaml.org/spec/1.2.2/#example-flow-mapping-adjacent-values
> > > > > > > > >
> > > > > > > > > [5] https://yaml.org/spec/1.2.2/#32-information-models
> > > > > > > > > [6] https://www.json.org/json-en.html
> > > > > > > > >
> > > > > > > > > > > doesn't correctly preserve the order, this will be fixed by the next
> > > > > > > > > > > commit.
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Laurent Pinchart <laurent.pinchart at ideasonboard.com>
> > > > > > > > > > > ---
> > > > > > > > > > >  test/yaml-parser.cpp | 7 +++----
> > > > > > > > > > >  1 file changed, 3 insertions(+), 4 deletions(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/test/yaml-parser.cpp b/test/yaml-parser.cpp
> > > > > > > > > > > index 5ff4c3236dbf..582c9caed836 100644
> > > > > > > > > > > --- a/test/yaml-parser.cpp
> > > > > > > > > > > +++ b/test/yaml-parser.cpp
> > > > > > > > > > > @@ -29,8 +29,8 @@ static const string testYaml =
> > > > > > > > > > >     "  - Mary\n"
> > > > > > > > > > >     "dictionary:\n"
> > > > > > > > > > >     "  a: 1\n"
> > > > > > > > > > > -   "  b: 2\n"
> > > > > > > > > > >     "  c: 3\n"
> > > > > > > > > > > +   "  b: 2\n"
> > > > > > > > > > >     "level1:\n"
> > > > > > > > > > >     "  level2:\n"
> > > > > > > > > > >     "    - [1, 2]\n"
> > > > > > > > > > > @@ -428,7 +428,6 @@ protected:
> > > > > > > > > > >             }
> > > > > > > > > > >
> > > > > > > > > > >             auto memeberNames = dictObj.memberNames();
> > > > > > > > > > > -           sort(memeberNames.begin(), memeberNames.end());
> > > > > > > > > > >
> > > > > > > > > > >             if (memeberNames.size() != 3) {
> > > > > > > > > > >                     cerr << "Dictionary object fail to extra member names" << std::endl;
> > > > > > > > > > > @@ -436,8 +435,8 @@ protected:
> > > > > > > > > > >             }
> > > > > > > > > > >
> > > > > > > > > > >             if (memeberNames[0] != "a" ||
> > > > > > > > > > > -               memeberNames[1] != "b" ||
> > > > > > > > > > > -               memeberNames[2] != "c") {
> > > > > > > > > > > +               memeberNames[1] != "c" ||
> > > > > > > > > > > +               memeberNames[2] != "b") {
> > > > > > > > > > >                     cerr << "Dictionary object fail to parse member names" << std::endl;
> > > > > > > > > > >                     return TestFail;
> > > > > > > > > > >             }

-- 
Regards,

Laurent Pinchart