[RFC PATCH 1/8] Documentation: coding-style: Document usage of classes for strings

Laurent Pinchart laurent.pinchart at ideasonboard.com
Mon Dec 16 00:01:59 CET 2024


C++ has three main ways to handle strings: std::string, std::string_view
and C-style strings. Document their pros and cons and the rules
governing their usage in libcamera.

Signed-off-by: Laurent Pinchart <laurent.pinchart at ideasonboard.com>
---
 Documentation/coding-style.rst | 98 ++++++++++++++++++++++++++++++++++
 1 file changed, 98 insertions(+)

diff --git a/Documentation/coding-style.rst b/Documentation/coding-style.rst
index 6ac3a4a0d5175118..dc4b093b58a99006 100644
--- a/Documentation/coding-style.rst
+++ b/Documentation/coding-style.rst
@@ -269,6 +269,104 @@ the compiler select the right function. This avoids potential errors such as
 calling abs(int) with a float argument, performing an unwanted implicit integer
 conversion. For this reason, cmath is preferred over math.h.
 
+Strings
+~~~~~~~
+
+This section focusses on strings as a sequence of characters represented by the
+`char` type, as those are the only strings used in libcamera. The principles
+are however equally applicable to other types of strings.
+
+C++ includes multiple standard ways to represent and handle strings. Each of
+them have pros and cons and different usage patterns.
+
+1. C-style strings
+
+   C-style strings are null-terminated arrays of characters of type `char`.
+   They are represented by a `char *` pointing to the first character. C string
+   literals (`"Hello world!"`) produce read-only global null-terminated arrays
+   of type `const char`.
+
+   Handling strings through `char *` relies on assumptions that are not
+   enforced at compile time: the memory must not be freed as long as the
+   pointer remains valid, and the string must be null-terminated. This causes a
+   risk of use-after-free or buffer out-of-bounds reads. Furthermore, as the
+   size of the underlying memory is not bundled with the pointer, handling
+   writable strings as a `char *` risks buffer out-of-bounds writes.
+
+2. `std::string` class
+
+   The `std::string` class is the main C++ data type to represent strings. The
+   class holds the string data as a null-terminated array of characters, as
+   well as the length of the array. `std::string` literals (`"Hello world!"s`)
+   converts the character array literal into a temporary `std::string` instance
+   whose lifetime ends with the statement.
+
+   Usage of `std::string` makes string handling safer and convenient thanks to
+   the integration with the rest of the C++ standard library. This comes at a
+   cost, as making copies of the class copies the underlying data, requiring
+   dynamic allocation of memory. This is partially offset by usage of move
+   constructors or assignment operators.
+
+3. `std::string_view` class
+
+   This newer addition to the C++ standard library is a non-owning reference to
+   a characters array. Like the `std::string` class, the `std::string_view`
+   stores a pointer to the array and a length, but unlike C-style strings and
+   the `std::string` class, the array is not guaranteed to be null-terminated.
+   `std::string_view` literals (`"Hello world!"sv`) create a temporary
+   `std::string_view` instance whose lifetime ends with the statement, but the
+   underlying character array has global static storage and lifetime.
+
+   String views are useful to represent in a single way strings with
+   heterogenous underlying storage, as they can be constructed from a `char *`
+   or a `std::string`. They reduce the risk of buffer out-of-bounds accesses as
+   they carry the array length, and they speed up handling of substrings as
+   they don't cause copies, unlike `std::string`. As the string views store a
+   borrowed reference to the underlying storage, they are prone to
+   use-after-free issues. The lack of a null terminator furthermore makes them
+   unsafe to handle strings that need to be passed to C functions that assume
+   null-terminated strings (such as most of the C standard library functions).
+
+   C++17 lacks many functions that make `std::string_view` usage convenient.
+   This includes `operator+()` to concatenate a `std::string` and a
+   `std::string_view`, as well as heterogenous lookup functions for containers
+   (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2363r5.html).
+   libcamera works around the former by defining the missing operators, and the
+   latter by using the `find()` function instead of `at()` or `operator[]()`.
+   Those issues are properly addressed in C++26.
+
+With those pros, cons are caveats in mind, libcamera follows a set of rules
+that governs selection of appropriate string types for function arguments:
+
+* If the function is internal to a compilation unit, and all callers use the
+  same string type for the function argument, use that type (by reference or
+  value, as appropriate).
+* If the function needs to modify the string, make a copy, or convert it to a
+  `std::string` for any other reason, pass a `std::string` by value. Use
+  `std::move()` in the caller if the string is not needed after the function
+  call, as well as inside the function to move the string to a local copy if
+  needed. This minimizes the number of copies in all use cases.
+* If the function needs to pass the string to a C function that takes a
+  `const char *`, and the caller may reasonably use a C string literal, pass a
+  `const char *`. This avoids copies when the caller uses a `char *` pointer or
+  a C string literal, and avoids buffer out-of-bound reads that could occur with
+  a non null-terminated `std::string_view`.
+* If the function only needs to pass the string to other functions that take a
+  `const std::string &` reference, pass a `const std::string &`. This minimizes
+  the construction of `std::string` instances when the caller already holds an
+  instance.
+* If the function only reads from the string, doesn't rely on it being
+  null-terminated, and can reasonably be called with either C strings, C string
+  literals, or `std::string` instances, pass a `std::string_view` instance. This
+  improves safety of the code without impacting performance.
+* Do not use `std::string_view` in public API function parameters. Previous
+  rules showed that selection of the optimal string type for function
+  parameters depends on the internal implementation of the function, which
+  could change over time. This conflicts with the stability requirements of the
+  libcamera public API. Furthermore, as C++17 makes `std::string_view` usage
+  inconvenient in many cases, we don't want to force that inconvenience upon
+  libcamera users.
+
 
 Documentation
 -------------
-- 
Regards,

Laurent Pinchart



More information about the libcamera-devel mailing list