[libcamera-devel] [PATCH 1/6] utils: Add function to convert string to UCS-2
Jacopo Mondi
jacopo at jmondi.org
Fri Jan 15 15:19:53 CET 2021
Hi Paul,
I read up on this a bit, but character encoding seems to be a very
complex subject, so I mostly have minor comments here.
On Thu, Jan 14, 2021 at 07:40:30PM +0900, Paul Elder wrote:
> GPSProcessingMethod and UserComment in EXIF tags can be in Unicode, but
From what I've read, even referring to Unicode might be misleading, as
it covers a number of different encodings. Does the EXIF specification
mention Unicode or any other more specific standard?
> are recommended to be in UCS-2. Add a function in utils to help with
> this.
>
> Signed-off-by: Paul Elder <paul.elder at ideasonboard.com>
> ---
> include/libcamera/internal/utils.h | 2 ++
> src/libcamera/utils.cpp | 30 ++++++++++++++++++++++++++++++
> 2 files changed, 32 insertions(+)
>
> diff --git a/include/libcamera/internal/utils.h b/include/libcamera/internal/utils.h
> index f08134af..aa9cc236 100644
> --- a/include/libcamera/internal/utils.h
> +++ b/include/libcamera/internal/utils.h
> @@ -35,6 +35,8 @@ const char *basename(const char *path);
> char *secure_getenv(const char *name);
> std::string dirname(const std::string &path);
>
> +std::vector<uint8_t> string_to_c16(const std::string &str, bool le);
> +
> template<typename T>
> std::vector<typename T::key_type> map_keys(const T &map)
> {
> diff --git a/src/libcamera/utils.cpp b/src/libcamera/utils.cpp
> index e90375ae..89cb0f73 100644
> --- a/src/libcamera/utils.cpp
> +++ b/src/libcamera/utils.cpp
> @@ -17,6 +17,7 @@
> #include <string.h>
> #include <sys/stat.h>
> #include <sys/types.h>
> +#include <uchar.h>
> #include <unistd.h>
>
> /**
> @@ -122,6 +123,35 @@ std::string dirname(const std::string &path)
> return path.substr(0, pos + 1);
> }
>
> +/**
> + * \brief Convert string to byte array of UCS-2
Convert a string to a byte array of UCS-2 encoded code points
But I wonder: the encoding used to represent the characters in the
string depends on the locale, I assume. Does it?
> + * \param[in] str String to convert
The string to convert
> + * \param[in] le Little-endian (false for Big-endian)
The desired byte-endianness of the converted byte array.
An enum would not hurt, but it's not strictly required.
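Something along these lines is what I have in mind. It is only a
sketch: ByteOrder and appendUnit are made-up names for illustration,
not existing libcamera API.

```cpp
#include <stdint.h>
#include <vector>

/* Hypothetical enum replacing the 'bool le' parameter. */
enum class ByteOrder {
	LittleEndian,
	BigEndian,
};

/* Hypothetical helper: append one 16-bit unit in the requested order. */
void appendUnit(std::vector<uint8_t> &out, char16_t unit, ByteOrder order)
{
	const uint8_t lo = static_cast<uint8_t>(unit & 0xff);
	const uint8_t hi = static_cast<uint8_t>(unit >> 8);

	if (order == ByteOrder::LittleEndian) {
		out.push_back(lo);
		out.push_back(hi);
	} else {
		out.push_back(hi);
		out.push_back(lo);
	}
}
```

A call site would then read appendUnit(ret, c16, ByteOrder::BigEndian)
instead of passing a bare true/false.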
> + *
> + * \return Byte array of UCS-2 representation of \a str, without null-terminator
While the distinction between UTF-16 and UCS-2 is still not clear to
me, and I get that the two are actually converging over time, the
documentation of std::mbrtoc16 explicitly mentions UTF-16.
I guess it again depends on the encoding of \a str (which in turn
depends on the selected locale?)
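For reference, this is how I understand the difference, as a rough
sketch. toUtf16 is a hypothetical helper written for this mail, not
part of the patch and not a claim about how mbrtoc16 is implemented.
For code points up to U+FFFF both encodings give the same single
16-bit unit; anything above needs a UTF-16 surrogate pair, which UCS-2
cannot express.

```cpp
#include <cassert>
#include <vector>

/* Hypothetical helper: encode one Unicode code point as UTF-16 units. */
std::vector<char16_t> toUtf16(char32_t cp)
{
	/* Valid code points only: in range and not a lone surrogate. */
	assert(cp <= 0x10FFFF);
	assert(cp < 0xD800 || cp > 0xDFFF);

	/* Basic Multilingual Plane: one unit, same value in UCS-2 and UTF-16. */
	if (cp < 0x10000)
		return { static_cast<char16_t>(cp) };

	/* Above the BMP: UTF-16 surrogate pair, no UCS-2 equivalent. */
	cp -= 0x10000;
	return {
		static_cast<char16_t>(0xD800 + (cp >> 10)),
		static_cast<char16_t>(0xDC00 + (cp & 0x3FF)),
	};
}
```

So if the function is documented as producing UCS-2, the open question
is what it should do when the conversion yields a surrogate pair.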
> + */
> +std::vector<uint8_t> string_to_c16(const std::string &str, bool le)
I wonder why we use snake_case in utils? Maybe to mimic the STL?
> +{
> + std::mbstate_t state{};
> + char16_t c16;
> + const char *ptr = &str[0], *end = &str[0] + str.size();
One variable per line and maybe
const char *end = &str.back()
> +
> + std::vector<uint8_t> ret;
I would reserve str.size() * 2,
even if I get that not every char in str necessarily gets expanded
to two bytes.
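A minimal sketch of what I mean, assuming the input is UTF-8
(makeOutputBuffer is a made-up name, only here to keep the example
self-contained). With UTF-8 input, 1 to 3 input bytes become one
16-bit unit and 4 input bytes become two units, so twice the input
size should be an upper bound on the output size.

```cpp
#include <stdint.h>
#include <string>
#include <vector>

/* Hypothetical helper: output vector pre-sized for the worst case. */
std::vector<uint8_t> makeOutputBuffer(const std::string &str)
{
	std::vector<uint8_t> ret;

	/* Avoid repeated reallocations while pushing bytes one by one. */
	ret.reserve(str.size() * 2);

	return ret;
}
```

In the patch this would simply be a ret.reserve() call right after ret
is declared.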
> + while (size_t rc = mbrtoc16(&c16, ptr, end - ptr + 1, &state)) {
std::mbrtoc16 ?
How come the compiler does not complain ?
> + if (rc == static_cast<size_t>(-2) ||
> + rc == static_cast<size_t>(-1))
> + break;
> +
> + ret.push_back(le ? (c16 & 0xff) : ((c16 >> 8) & 0xff));
> + ret.push_back(le ? ((c16 >> 8) & 0xff) : (c16 & 0xff));
I think you can drop the & 0xff: since ret is a vector of uint8_t, the
value gets converted (truncated to 8 bits) automatically, doesn't it?
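To illustrate what I mean, a small sketch (splitBigEndian is a
hypothetical name, not something from the patch). The conversion from
the promoted int to uint8_t is well defined and keeps the low 8 bits,
although the explicit mask might still be preferable if we build with
conversion warnings enabled.

```cpp
#include <stdint.h>
#include <vector>

/* Hypothetical helper: split one 16-bit unit into two bytes, MSB first. */
std::vector<uint8_t> splitBigEndian(char16_t c16)
{
	std::vector<uint8_t> ret;

	/*
	 * c16 is promoted to int for the shift. push_back() takes a
	 * uint8_t, so the argument is implicitly converted modulo 256.
	 */
	ret.push_back(c16 >> 8);
	ret.push_back(c16);

	return ret;
}
```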
> +
> + if (rc > 0)
> + ptr += rc;
> + }
> +
> + return ret;
> +}
> +
> /**
> * \fn std::vector<typename T::key_type> map_keys(const T &map)
> * \brief Retrieve the keys of a std::map<>
> --
> 2.27.0
>
> _______________________________________________
> libcamera-devel mailing list
> libcamera-devel at lists.libcamera.org
> https://lists.libcamera.org/listinfo/libcamera-devel