[libcamera-devel] [PATCH v4 2/7] libcamera: utils: Add method to strip Unicode characters

Niklas Söderlund niklas.soderlund at ragnatech.se
Sun Aug 16 14:29:50 CEST 2020


Hi Laurent,

Thanks for your feedback.

On 2020-08-16 14:58:13 +0300, Laurent Pinchart wrote:
> On Fri, Aug 14, 2020 at 12:37:17AM +0200, Niklas Söderlund wrote:
> > Add a method that strips non-ASCII characters from a string.
> > 
> > Signed-off-by: Niklas Söderlund <niklas.soderlund at ragnatech.se>
> > ---
> > * Changes since v3
> > - Fix spelling in comment.
> > - Rename to toAscii()
> > ---
> >  include/libcamera/internal/utils.h |  2 ++
> >  src/libcamera/utils.cpp            | 21 +++++++++++++++++++++
> >  2 files changed, 23 insertions(+)
> > 
> > diff --git a/include/libcamera/internal/utils.h b/include/libcamera/internal/utils.h
> > index 45cd6f120c51586b..b27f5a2323552058 100644
> > --- a/include/libcamera/internal/utils.h
> > +++ b/include/libcamera/internal/utils.h
> > @@ -197,6 +197,8 @@ private:
> >  
> >  details::StringSplitter split(const std::string &str, const std::string &delim);
> >  
> > +std::string toAscii(const std::string &str);
> > +
> >  std::string libcameraBuildPath();
> >  std::string libcameraSourcePath();
> >  
> > diff --git a/src/libcamera/utils.cpp b/src/libcamera/utils.cpp
> > index 615df46ac142a2a9..726b84bfbae53ff2 100644
> > --- a/src/libcamera/utils.cpp
> > +++ b/src/libcamera/utils.cpp
> > @@ -342,6 +342,27 @@ details::StringSplitter split(const std::string &str, const std::string &delim)
> >  	return details::StringSplitter(str, delim);
> >  }
> >  
> > +/**
> > + * \brief Strip all Unicode characters from a string
> > + * \param[in] str The string to strip
> > + *
> > + * Strip all non-ASCII characters from a string. A Unicode character that spans
> > + * multiple bytes (and therefore is not also an ASCII character) may be
> 
> "Unicode character" refers to the code points, while "spans multiple
> bytes" refers to encodings. As std::string has no notion of encoding,
> this documentation should tell what the expected input encoding is.
> 
> > + * identified by the fact that its most significant bit is always set.
> 
> Open question: do we want to remove non-ASCII characters, or replace
> them (with a ".", "?" or something else)?

I have no strong opinion. Dropping them feels more natural to me, but
I'm open to substitution. What do others think?
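
To make the substitution option concrete, here is a minimal sketch,
assuming the input is UTF-8 (which the documentation does not state yet,
as you point out). The name replaceNonAscii() and its default '?' are
hypothetical. It emits one replacement per code point rather than one per
byte, by replacing UTF-8 lead bytes (11xxxxxx) and skipping continuation
bytes (10xxxxxx):

```cpp
#include <cassert>
#include <string>

// Hypothetical variant for discussion: assumes UTF-8 input and replaces
// each non-ASCII code point (not each byte) with `replacement`.
std::string replaceNonAscii(const std::string &str, char replacement = '?')
{
	std::string ret;
	ret.reserve(str.size());

	for (char c : str) {
		unsigned char byte = static_cast<unsigned char>(c);

		if (byte < 0x80) {
			// ASCII byte: copy as-is.
			ret += c;
		} else if ((byte & 0xc0) != 0x80) {
			// UTF-8 lead byte (11xxxxxx): one replacement per code point.
			ret += replacement;
		}
		// Continuation bytes (10xxxxxx) are skipped.
	}

	return ret;
}
```

With this, "caf\xc3\xa9" ("café" in UTF-8) becomes "caf?" rather than
the "caf??" a per-byte replacement would give. Stray continuation bytes
in invalid input are dropped rather than replaced, which may or may not
be what we want.
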

> 
> > + *
> > + * \todo When switching to C++ 20 use std::remove_if.
> 
> That would be less efficient :-)
> 
> > + *
> > + * \return An ASCII string
> > + */
> > +std::string toAscii(const std::string &str)
> 
> toAscii() makes it sound like the function converts the string, while it
> really removes characters.

I originally called this stripUnicode(). Would that name work better for
you?
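
For reference, a sketch of the same function under the stripUnicode()
name, with the reserve() call suggested further down folded in; the body
otherwise follows the patch. It illustrates the behaviour the name should
convey: bytes are removed, not transliterated.

```cpp
#include <cassert>
#include <string>

// Sketch of the patch's function under the stripUnicode() name, with the
// reserve() call suggested in review. Every byte with the most significant
// bit set is dropped, so for UTF-8 input every byte of a multi-byte
// sequence is removed.
std::string stripUnicode(const std::string &str)
{
	std::string ret;
	// The result is never longer than the input.
	ret.reserve(str.size());

	for (char c : str)
		if (!(c & 0x80))
			ret += c;

	return ret;
}
```

For example, stripUnicode("Niklas S\xc3\xb6" "derlund") returns
"Niklas Sderlund": both bytes of the UTF-8 "ö" are dropped rather than
converted to "o".
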

> 
> > +{
> > +	std::string ret;
> 
> 	ret.reserve(str.size());
> 
> > +	for (const char &c : str)
> > +		if (!(c & 0x80))
> > +			ret += c;
> > +	return ret;
> > +}
> > +
> >  /**
> >   * \brief Check if libcamera is installed or not
> >   *
> 
> -- 
> Regards,
> 
> Laurent Pinchart

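On the \todo: std::remove_if itself predates C++20 (C++20 adds
std::erase_if for strings), but it rearranges a mutable sequence in
place, so with a const input it would need a copy followed by an erase.
For comparison, a sketch of a single-pass alternative using std::copy_if
(C++11). The name stripNonAsciiCopyIf() is hypothetical, and the
behaviour is meant to match the loop in the patch:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>

// Hypothetical alternative to the hand-written loop: std::copy_if appends
// only the bytes whose most significant bit is clear.
std::string stripNonAsciiCopyIf(const std::string &str)
{
	std::string ret;
	ret.reserve(str.size());

	std::copy_if(str.begin(), str.end(), std::back_inserter(ret),
		     [](char c) { return !(c & 0x80); });

	return ret;
}
```
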
-- 
Regards,
Niklas Söderlund

