[PATCH v5 1/1] libcamera: add method to set thread affinity
Cheng-Hao Yang
chenghaoyang at chromium.org
Thu Oct 24 10:16:03 CEST 2024
Hi Laurent,
On Thu, Oct 24, 2024 at 7:43 AM Laurent Pinchart
<laurent.pinchart at ideasonboard.com> wrote:
>
> On Wed, Oct 23, 2024 at 02:46:38PM +0800, Cheng-Hao Yang wrote:
> > On Wed, Oct 23, 2024 at 6:49 AM Laurent Pinchart wrote:
> > > On Thu, Oct 17, 2024 at 01:59:58PM +0800, Cheng-Hao Yang wrote:
> > > > Hi Laurent,
> > > >
> > > > I'll update a new version when all changes are applied.
> > >
> > > Yes, let's finalize the discussion first :-)
> > >
> > > > On Thu, Oct 17, 2024 at 6:08 AM Laurent Pinchart wrote:
> > > > > On Wed, Oct 16, 2024 at 08:43:44AM +0000, Harvey Yang wrote:
> > > > > > From: Han-Lin Chen <hanlinchen at chromium.org>
> > > > > >
> > > > > > Add method to set thread affinity to Thread class.
> > > > > >
> > > > > > Signed-off-by: Han-Lin Chen <hanlinchen at chromium.org>
> > > > > > Co-developed-by: Harvey Yang <chenghaoyang at chromium.org>
> > > > > > Signed-off-by: Harvey Yang <chenghaoyang at chromium.org>
> > > > > > ---
> > > > > > include/libcamera/base/thread.h | 5 ++++
> > > > > > src/libcamera/base/thread.cpp | 46 +++++++++++++++++++++++++++++++++
> > > > > > 2 files changed, 51 insertions(+)
> > > > > >
> > > > > > diff --git a/include/libcamera/base/thread.h b/include/libcamera/base/thread.h
> > > > > > index 4f33de63d..3209d4f7c 100644
> > > > > > --- a/include/libcamera/base/thread.h
> > > > > > +++ b/include/libcamera/base/thread.h
> > > > > > @@ -15,6 +15,7 @@
> > > > > >
> > > > > > #include <libcamera/base/message.h>
> > > > > > #include <libcamera/base/signal.h>
> > > > > > +#include <libcamera/base/span.h>
> > > > > > #include <libcamera/base/utils.h>
> > > > > >
> > > > > > namespace libcamera {
> > > > > > @@ -35,6 +36,8 @@ public:
> > > > > > void exit(int code = 0);
> > > > > > bool wait(utils::duration duration = utils::duration::max());
> > > > > >
> > > > > > + int setThreadAffinity(const Span<const unsigned int> &cpus);
> > > > > > +
> > > > > > bool isRunning();
> > > > > >
> > > > > > Signal<> finished;
> > > > > > @@ -54,6 +57,8 @@ private:
> > > > > > void startThread();
> > > > > > void finishThread();
> > > > > >
> > > > > > + void setThreadAffinityInternal();
> > > > > > +
> > > > > > void postMessage(std::unique_ptr<Message> msg, Object *receiver);
> > > > > > void removeMessages(Object *receiver);
> > > > > >
> > > > > > diff --git a/src/libcamera/base/thread.cpp b/src/libcamera/base/thread.cpp
> > > > > > index 8735670b8..5dc787dba 100644
> > > > > > --- a/src/libcamera/base/thread.cpp
> > > > > > +++ b/src/libcamera/base/thread.cpp
> > > > > > @@ -9,6 +9,7 @@
> > > > > >
> > > > > > #include <atomic>
> > > > > > #include <list>
> > > > > > +#include <optional>
> > > > > > #include <sys/syscall.h>
> > > > > > #include <sys/types.h>
> > > > > > #include <unistd.h>
> > > > > > @@ -128,6 +129,8 @@ private:
> > > > > > int exitCode_;
> > > > > >
> > > > > > MessageQueue messages_;
> > > > > > +
> > > > > > + std::optional<cpu_set_t> cpuset_ = std::nullopt;
> > > > >
> > > > > Is there a need to initialize this to std::nullopt, as that's what the
> > > > > default constructor does ?
> > > >
> > > > Removed.
> > > >
> > > > > > };
> > > > > >
> > > > > > /**
> > > > > > @@ -254,6 +257,8 @@ void Thread::start()
> > > > > > data_->exit_.store(false, std::memory_order_relaxed);
> > > > > >
> > > > > > thread_ = std::thread(&Thread::startThread, this);
> > > > > > +
> > > > > > + setThreadAffinityInternal();
> > > > > > }
> > > > > >
> > > > > > void Thread::startThread()
> > > > > > @@ -410,6 +415,47 @@ bool Thread::wait(utils::duration duration)
> > > > > > return hasFinished;
> > > > > > }
> > > > > >
> > > > > > +/**
> > > > > > + * \brief Set the CPU affinity mask of the thread
> > > > > > + * \param[in] cpus The list of CPU indices that the thread is set affinity to
> > > > > > + *
> > > > > > + * The CPU indices should be within [0, std::thread::hardware_concurrency()).
> > > > > > + * If any index is invalid, no cpu indices will be set.
> > > > >
> > > > > * If any index is invalid, this function won't modify the thread affinity and
> > > > > * will return an error.
> > > >
> > > > Done.
> > > >
> > > > > > + *
> > > > > > + * \return 0 if all indices are valid, -EINVAL otherwise
> > > > > > + */
> > > > > > +int Thread::setThreadAffinity(const Span<const unsigned int> &cpus)
> > > > > > +{
> > > > > > + const unsigned int numCpus = std::thread::hardware_concurrency();
> > > > >
> > > > > https://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency
> > > > > says:
> > > > >
> > > > > Returns the number of concurrent threads supported by the
> > > > > implementation. The value should be considered only a hint.
> > > > >
> > > > > [snip]
> > > > >
> > > > > Return value
> > > > >
> > > > > Number of concurrent threads supported. If the value is not well
> > > > > defined or not computable, returns 0.
> > > > >
> > > > > libcxx seems to have the following implementation:
> > > > >
> > > > > unsigned thread::hardware_concurrency() noexcept {
> > > > > #if defined(_SC_NPROCESSORS_ONLN)
> > > > > long result = sysconf(_SC_NPROCESSORS_ONLN);
> > > > > // sysconf returns -1 if the name is invalid, the option does not exist or
> > > > > // does not have a definite limit.
> > > > > // if sysconf returns some other negative number, we have no idea
> > > > > // what is going on. Default to something safe.
> > > > > if (result < 0)
> > > > > return 0;
> > > > > return static_cast<unsigned>(result);
> > > > > #elif defined(_LIBCPP_WIN32API)
> > > > > SYSTEM_INFO info;
> > > > > GetSystemInfo(&info);
> > > > > return info.dwNumberOfProcessors;
> > > > > #else // defined(CTL_HW) && defined(HW_NCPU)
> > > > > // TODO: grovel through /proc or check cpuid on x86 and similar
> > > > > // instructions on other architectures.
> > > > > # if defined(_LIBCPP_WARNING)
> > > > > _LIBCPP_WARNING("hardware_concurrency not yet implemented")
> > > > > # else
> > > > > # warning hardware_concurrency not yet implemented
> > > > > # endif
> > > > > return 0; // Means not computable [thread.thread.static]
> > > > > #endif // defined(CTL_HW) && defined(HW_NCPU)
> > > > > }
> > > > >
> > > > > while libstdc++ uses
> > > > >
> > > > > #if defined(_GLIBCXX_USE_GET_NPROCS)
> > > > > # include <sys/sysinfo.h>
> > > > > # define _GLIBCXX_NPROCS get_nprocs()
> > > > > #elif defined(_GLIBCXX_USE_PTHREADS_NUM_PROCESSORS_NP)
> > > > > # define _GLIBCXX_NPROCS pthread_num_processors_np()
> > > > > #elif defined(_GLIBCXX_USE_SYSCTL_HW_NCPU)
> > > > > # include <stddef.h>
> > > > > # include <sys/sysctl.h>
> > > > > static inline int get_nprocs()
> > > > > {
> > > > > int count;
> > > > > size_t size = sizeof(count);
> > > > > int mib[] = { CTL_HW, HW_NCPU };
> > > > > if (!sysctl(mib, 2, &count, &size, NULL, 0))
> > > > > return count;
> > > > > return 0;
> > > > > }
> > > > > # define _GLIBCXX_NPROCS get_nprocs()
> > > > > #elif defined(_GLIBCXX_USE_SC_NPROCESSORS_ONLN)
> > > > > # include <unistd.h>
> > > > > # define _GLIBCXX_NPROCS sysconf(_SC_NPROCESSORS_ONLN)
> > > > > #elif defined(_GLIBCXX_USE_SC_NPROC_ONLN)
> > > > > # include <unistd.h>
> > > > > # define _GLIBCXX_NPROCS sysconf(_SC_NPROC_ONLN)
> > > > > #else
> > > > > # define _GLIBCXX_NPROCS 0
> > > > > #endif
> > > > >
> > > > > unsigned int
> > > > > thread::hardware_concurrency() noexcept
> > > > > {
> > > > > int __n = _GLIBCXX_NPROCS;
> > > > > if (__n < 0)
> > > > > __n = 0;
> > > > > return __n;
> > > > > }
> > > > >
> > > > > get_nprocs() and sysconf(_SC_NPROCESSORS_ONLN) are identical (the latter
> > > > > is a wrapper around the former). So it seems that, while the C++ API
> > > > > doesn't guarantee that thread::hardware_concurrency() will provide an
> > > > > accurate value, the implementations we have will do the right thing.
> > > > > They're however a bit expensive, as they open a file in /sys, but that's
> > > > > probably the best we can do.
> > > > >
> > > > > > +
> > > > > > + MutexLocker locker(data_->mutex_);
> > > > > > + data_->cpuset_ = cpu_set_t();
> > > > > > + CPU_ZERO(&data_->cpuset_.value());
> > > > > > +
> > > > > > + for (const unsigned int &cpu : cpus) {
> > > > > > + if (cpu >= numCpus) {
> > > > > > + LOG(Thread, Error) << "Ignore an invalid cpu index: " << cpu;
> > > > >
> > > > > It's not ignored anymore
> > > > >
> > > > > LOG(Thread, Error)
> > > > > << "Invalid CPU " << cpu << " for thread affinity";
> > > > >
> > > >
> > > > Right, updated.
> > > >
> > > > > > + return -EINVAL;
> > > > > > + }
> > > > > > +
> > > > > > + CPU_SET(cpu, &data_->cpuset_.value());
> > > > > > + }
> > > > > > +
> > > > > > + if (data_->running_)
> > > > > > + setThreadAffinityInternal();
> > > > > > +
> > > > > > + return 0;
> > > > > > +}
> > > > > > +
> > > > > > +void Thread::setThreadAffinityInternal()
> > > > > > +{
> > > > > > + if (!data_->cpuset_)
> > > > > > + return;
> > > > > > +
> > > > > > + const cpu_set_t &cpuset = data_->cpuset_.value();
> > > > > > + pthread_setaffinity_np(thread_.native_handle(), sizeof(cpuset), &cpuset);
> > > > >
> > > > > With this implementation, it's not possible to set the thread's CPU
> > > > > affinity before starting the thread. That's a limitation of the C++
> > > > > std::thread API, it could be done if we used the native pthread API
> > > > > instead. Is that a problem ? Could you actually point me to code that
> > > > > uses Thread::setThreadAffinity() ?
> > > >
> > > > There are two main usages in mtkisp7:
> > > > 1. Setting pipeline handler's thread [1].
> > >
> > > Heads up on that one, we probably want to handle this in the libcamera
> > > core. Multiple pipeline handlers can run on the same platform, currently
> > > sharing a single camera manager thread. If they all tried to set thread
> > > affinity, that would be messy. I'd like to explore how we could
> > > coordinate that in a cleaner way.
> >
> > Yes, your concern makes a lot of sense. If we want to process requests
> > on a different CPU core in pipeline handlers, would it be better to create
> > another thread in each pipeline handler? That involves an extra context
> > switch per frame though...
>
> One thread per pipeline handler is an option, yes. I don't think it
> requires a context switch though, we don't have to go from the
> application thread to the camera manager thread to the pipeline handler
> thread, we could bypass the camera manager thread.
Ah right, libcamera::Camera APIs are mostly thread-safe, I forgot.
>
> Another option that was floated around was to use a thread pool in case
> one thread per pipeline handler would create too many threads. It's very
> theoretical at this point whether or not this would help improve
> performance.
Yeah true. We have much work to do in this approach though.
>
> I'm fine keeping a single internal thread in libcamera, but I'd like its
> affinity to be managed in a way that doesn't let different pipeline
> handlers compete to set different affinities.
Definitely :)
Let me know if I should prevent setting thread affinity in a thread
that's already started.
>
> > > > 2. Setting IPA's two new threads [2].
> > > >
> > > > The first one sets thread affinity to a thread that has already been
> > > > started, while the second one set new threads.
> > >
> > > Out of curiosity, have you measured the difference in performance ? Is
> > > it a matter of pinning the threads to big cores to ensure they have
> > > enough CPU time ? Or a matter of avoiding latency caused by migration ?
> > > Or something else ?
> >
> > We run CTS, and the current config passes the performance tests with
> > sufficient fps and low enough latencies.
> >
> > Without this functionality though, I think we can still somehow pass CTS
> > with more retries.
>
> It would be interesting to see what kind of time targets are missed
> without this change.
For example:
```
android.camera.cts
android.hardware.camera2.cts.RecordingTest#testRecordingFromPersistentSurface[1]
FAILURE: junit.framewo
rk.AssertionFailedError: Camera 1: Video frame drop rate too high:
6.976744%, tolerance 5.000000%. Video size: 1280x720, expectedDuration
[2866.666504,2866.666504], expectedFrameDur
ation 33.333332, frameDropCnt 6, frameCount 86
```
>
> > If it's controversial, we can skip this for now.
>
> No, it's not controversial, I'd just like to design it right :-)
>
> > We can also allow setting thread affinity when the thread hasn't been
> > started yet, which at least fits the second use case in mtkisp7.
> >
> > > > By "the native pthread API", you mean `pthread_attr_setaffinity_np`?
> > > > This means that we need to implement libcamera::Thread with pthread,
> > > > instead of std::thread?
> > >
> > > Yes. It doesn't mean we have to do that, I'm only asking if you think
> > > that would be a good idea, or overkill.
> >
> > I find it a bit an overkill :)
> > The approach below should be much easier.
>
> Fine with me.
>
> > > > [1]: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/third_party/libcamera/mtkisp7/src/libcamera/pipeline/mtkisp7/mtkisp7.cpp;l=511
> > > > [2]: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/third_party/libcamera/mtkisp7/src/ipa/mtkisp7/mtkisp7.cpp;l=119
> > > >
> > > > > Even without the ability to set the affinity before starting the thread,
> > > > > we could set it from within the thread before calling the run()
> > > > > function. This would avoid the main workload being run on incorrect
> > > > > CPUs. Is that something that would be useful ?
> > > >
> > > > You mean calling `setThreadAffinityInterna()l` in `startThread()`
> > > > (before `run()`), instead of at the end of `start()`?
> > >
> > > Yes. Again, I'm not saing it has to be done, just exploring options.
> >
> > That makes a lot of sense. It works on ciri as well in my trial.
>
> Will you submit a new patch with this ?
Sure. Uploaded.
BR,
Harvey
>
> > > > > Regardless of which option we implement, Thread::setThreadAffinity()
> > > > > should document what happens if it's called before calling
> > > > > Thread::start().
> > > > >
> > > > > > +}
> > > > > > +
> > > > > > /**
> > > > > > * \brief Check if the thread is running
> > > > > > *
>
> --
> Regards,
>
> Laurent Pinchart
More information about the libcamera-devel
mailing list