[libcamera-devel] [PATCH v4 2/7] libcamera: utils: Add method to strip Unicode characters
Niklas Söderlund
niklas.soderlund at ragnatech.se
Sun Aug 16 14:29:50 CEST 2020
Hi Laurent,
Thanks for your feedback.
On 2020-08-16 14:58:13 +0300, Laurent Pinchart wrote:
> On Fri, Aug 14, 2020 at 12:37:17AM +0200, Niklas Söderlund wrote:
> > Add method that strips non-ASCII characters from a string.
> >
> > Signed-off-by: Niklas Söderlund <niklas.soderlund at ragnatech.se>
> > ---
> > * Changes since v3
> > - Fix spelling in comment.
> > - Rename to toAscii()
> > ---
> > include/libcamera/internal/utils.h | 2 ++
> > src/libcamera/utils.cpp | 21 +++++++++++++++++++++
> > 2 files changed, 23 insertions(+)
> >
> > diff --git a/include/libcamera/internal/utils.h b/include/libcamera/internal/utils.h
> > index 45cd6f120c51586b..b27f5a2323552058 100644
> > --- a/include/libcamera/internal/utils.h
> > +++ b/include/libcamera/internal/utils.h
> > @@ -197,6 +197,8 @@ private:
> >
> > details::StringSplitter split(const std::string &str, const std::string &delim);
> >
> > +std::string toAscii(const std::string &str);
> > +
> > std::string libcameraBuildPath();
> > std::string libcameraSourcePath();
> >
> > diff --git a/src/libcamera/utils.cpp b/src/libcamera/utils.cpp
> > index 615df46ac142a2a9..726b84bfbae53ff2 100644
> > --- a/src/libcamera/utils.cpp
> > +++ b/src/libcamera/utils.cpp
> > @@ -342,6 +342,27 @@ details::StringSplitter split(const std::string &str, const std::string &delim)
> > 	return details::StringSplitter(str, delim);
> > }
> >
> > +/**
> > + * \brief Strip all Unicode characters from a string
> > + * \param[in] str The string to strip
> > + *
> > + * Strip all non-ASCII characters from a string. A Unicode character that spans
> > + * multiple bytes (and therefore is not also an ASCII character) may be
>
> "Unicode character" refers to the code points, while "spans multiple
> bytes" refers to encodings. As std::string has no notion of encoding,
> this documentation should tell what the expected input encoding is.
>
> > + * identified by the fact that its most significant bit is always set.
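To make the encoding point concrete, here is a minimal sketch that assumes the input is UTF-8 (the patch does not state the encoding, so that is an assumption). In UTF-8 every byte of a multi-byte sequence has its most significant bit set, which is the property the loop relies on. The helper names isAsciiByte() and countNonAsciiBytes() are hypothetical and not part of the patch.

```cpp
#include <cstddef>
#include <string>

/* Hypothetical helper: true if the byte is in the 7-bit ASCII range. */
static bool isAsciiByte(char c)
{
	return (static_cast<unsigned char>(c) & 0x80) == 0;
}

/*
 * Hypothetical helper: count the bytes the MSB test would drop.
 * Assumes UTF-8 input, where lead and continuation bytes of a
 * multi-byte sequence all have bit 7 set.
 */
static std::size_t countNonAsciiBytes(const std::string &str)
{
	std::size_t count = 0;

	for (char c : str)
		if (!isAsciiByte(c))
			count++;

	return count;
}
```

With this, "ö" (U+00F6, encoded as the two bytes 0xC3 0xB6) contributes two bytes to the count. The same test would not be meaningful for an encoding such as UTF-16, where a byte of a non-ASCII character can have the bit clear.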
>
> Open question, do we want to remove non-ASCII characters, or replace
> them (with a ".", "?" or something else) ?
I have no strong opinion. Dropping them feels more natural to me, but
I'm open to substitution. What do others think?
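To compare the two options side by side, here is a rough sketch. The names stripNonAscii() and replaceNonAscii() and the default '?' substitute are hypothetical and only for illustration. The first function mirrors the loop in this patch, the second shows one possible way substitution could look.

```cpp
#include <string>

/* Drop every byte with the most significant bit set, as the patch does. */
std::string stripNonAscii(const std::string &str)
{
	std::string ret;
	ret.reserve(str.size());

	for (char c : str)
		if (!(c & 0x80))
			ret += c;

	return ret;
}

/*
 * Hypothetical alternative: keep the length and substitute each
 * non-ASCII byte. The substitute character is an assumption.
 */
std::string replaceNonAscii(const std::string &str, char substitute = '?')
{
	std::string ret = str;

	for (char &c : ret)
		if (c & 0x80)
			c = substitute;

	return ret;
}
```

One thing to keep in mind with substitution is that it works per byte, not per character. With UTF-8 input a single "ö" would turn into two substitute characters, unless multi-byte sequences are decoded first.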
>
> > + *
> > + * \todo When switching to C++ 20 use std::remove_if.
>
> That would be less efficient :-)
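For reference, a sketch of what I understand the std::remove_if variant would look like. The function name is hypothetical. Since the input is a const reference, the whole string has to be copied first and then compacted in place, whereas the loop in the patch only appends the bytes it keeps.

```cpp
#include <algorithm>
#include <string>

/* Hypothetical name, shown only to compare against the loop in the patch. */
std::string stripNonAsciiRemoveIf(const std::string &str)
{
	/* Copies every byte, including the ones that are dropped below. */
	std::string ret = str;

	/* Erase-remove idiom: shift kept bytes forward, then trim the tail. */
	ret.erase(std::remove_if(ret.begin(), ret.end(),
				 [](char c) { return (c & 0x80) != 0; }),
		  ret.end());

	return ret;
}
```

Note that std::remove_if itself is available in <algorithm> before C++20. What C++20 adds is std::erase_if() for std::string, which wraps the same erase-remove pattern.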
>
> > + *
> > + * \return An ASCII string
> > + */
> > +std::string toAscii(const std::string &str)
>
> toAscii() makes it sound the function converts the string, while it
> really removes characters.
I originally called this stripUnicode(). Would that name work better for
you?
>
> > +{
> > +	std::string ret;
>
> ret.reserve(str.size());
>
> > +	for (const char &c : str)
> > +		if (!(c & 0x80))
> > +			ret += c;
> > +	return ret;
> > +}
> > +
> > /**
> > * \brief Check if libcamera is installed or not
> > *
>
> --
> Regards,
>
> Laurent Pinchart
--
Regards,
Niklas Söderlund