[PATCH v5 1/1] libcamera: add method to set thread affinity
Cheng-Hao Yang
chenghaoyang at chromium.org
Wed Oct 23 08:46:38 CEST 2024
Hi Laurent,
On Wed, Oct 23, 2024 at 6:49 AM Laurent Pinchart
<laurent.pinchart at ideasonboard.com> wrote:
>
> On Thu, Oct 17, 2024 at 01:59:58PM +0800, Cheng-Hao Yang wrote:
> > Hi Laurent,
> >
> > I'll update a new version when all changes are applied.
>
> Yes, let's finalize the discussion first :-)
>
> > On Thu, Oct 17, 2024 at 6:08 AM Laurent Pinchart wrote:
> > > On Wed, Oct 16, 2024 at 08:43:44AM +0000, Harvey Yang wrote:
> > > > From: Han-Lin Chen <hanlinchen at chromium.org>
> > > >
> > > > Add method to set thread affinity to Thread class.
> > > >
> > > > Signed-off-by: Han-Lin Chen <hanlinchen at chromium.org>
> > > > Co-developed-by: Harvey Yang <chenghaoyang at chromium.org>
> > > > Signed-off-by: Harvey Yang <chenghaoyang at chromium.org>
> > > > ---
> > > > include/libcamera/base/thread.h | 5 ++++
> > > > src/libcamera/base/thread.cpp | 46 +++++++++++++++++++++++++++++++++
> > > > 2 files changed, 51 insertions(+)
> > > >
> > > > diff --git a/include/libcamera/base/thread.h b/include/libcamera/base/thread.h
> > > > index 4f33de63d..3209d4f7c 100644
> > > > --- a/include/libcamera/base/thread.h
> > > > +++ b/include/libcamera/base/thread.h
> > > > @@ -15,6 +15,7 @@
> > > >
> > > > #include <libcamera/base/message.h>
> > > > #include <libcamera/base/signal.h>
> > > > +#include <libcamera/base/span.h>
> > > > #include <libcamera/base/utils.h>
> > > >
> > > > namespace libcamera {
> > > > @@ -35,6 +36,8 @@ public:
> > > > void exit(int code = 0);
> > > > bool wait(utils::duration duration = utils::duration::max());
> > > >
> > > > + int setThreadAffinity(const Span<const unsigned int> &cpus);
> > > > +
> > > > bool isRunning();
> > > >
> > > > Signal<> finished;
> > > > @@ -54,6 +57,8 @@ private:
> > > > void startThread();
> > > > void finishThread();
> > > >
> > > > + void setThreadAffinityInternal();
> > > > +
> > > > void postMessage(std::unique_ptr<Message> msg, Object *receiver);
> > > > void removeMessages(Object *receiver);
> > > >
> > > > diff --git a/src/libcamera/base/thread.cpp b/src/libcamera/base/thread.cpp
> > > > index 8735670b8..5dc787dba 100644
> > > > --- a/src/libcamera/base/thread.cpp
> > > > +++ b/src/libcamera/base/thread.cpp
> > > > @@ -9,6 +9,7 @@
> > > >
> > > > #include <atomic>
> > > > #include <list>
> > > > +#include <optional>
> > > > #include <sys/syscall.h>
> > > > #include <sys/types.h>
> > > > #include <unistd.h>
> > > > @@ -128,6 +129,8 @@ private:
> > > > int exitCode_;
> > > >
> > > > MessageQueue messages_;
> > > > +
> > > > + std::optional<cpu_set_t> cpuset_ = std::nullopt;
> > >
> > > Is there a need to initialize this to std::nullopt, as that's what the
> > > default constructor does ?
> >
> > Removed.
> >
> > > > };
> > > >
> > > > /**
> > > > @@ -254,6 +257,8 @@ void Thread::start()
> > > > data_->exit_.store(false, std::memory_order_relaxed);
> > > >
> > > > thread_ = std::thread(&Thread::startThread, this);
> > > > +
> > > > + setThreadAffinityInternal();
> > > > }
> > > >
> > > > void Thread::startThread()
> > > > @@ -410,6 +415,47 @@ bool Thread::wait(utils::duration duration)
> > > > return hasFinished;
> > > > }
> > > >
> > > > +/**
> > > > + * \brief Set the CPU affinity mask of the thread
> > > > + * \param[in] cpus The list of CPU indices that the thread is set affinity to
> > > > + *
> > > > + * The CPU indices should be within [0, std::thread::hardware_concurrency()).
> > > > + * If any index is invalid, no cpu indices will be set.
> > >
> > > * If any index is invalid, this function won't modify the thread affinity and
> > > * will return an error.
> >
> > Done.
> >
> > > > + *
> > > > + * \return 0 if all indices are valid, -EINVAL otherwise
> > > > + */
> > > > +int Thread::setThreadAffinity(const Span<const unsigned int> &cpus)
> > > > +{
> > > > + const unsigned int numCpus = std::thread::hardware_concurrency();
> > >
> > > https://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency
> > > says:
> > >
> > > Returns the number of concurrent threads supported by the
> > > implementation. The value should be considered only a hint.
> > >
> > > [snip]
> > >
> > > Return value
> > >
> > > Number of concurrent threads supported. If the value is not well
> > > defined or not computable, returns 0.
> > >
> > > libcxx seems to have the following implementation:
> > >
> > > unsigned thread::hardware_concurrency() noexcept {
> > > #if defined(_SC_NPROCESSORS_ONLN)
> > > long result = sysconf(_SC_NPROCESSORS_ONLN);
> > > // sysconf returns -1 if the name is invalid, the option does not exist or
> > > // does not have a definite limit.
> > > // if sysconf returns some other negative number, we have no idea
> > > // what is going on. Default to something safe.
> > > if (result < 0)
> > > return 0;
> > > return static_cast<unsigned>(result);
> > > #elif defined(_LIBCPP_WIN32API)
> > > SYSTEM_INFO info;
> > > GetSystemInfo(&info);
> > > return info.dwNumberOfProcessors;
> > > #else // defined(CTL_HW) && defined(HW_NCPU)
> > > // TODO: grovel through /proc or check cpuid on x86 and similar
> > > // instructions on other architectures.
> > > # if defined(_LIBCPP_WARNING)
> > > _LIBCPP_WARNING("hardware_concurrency not yet implemented")
> > > # else
> > > # warning hardware_concurrency not yet implemented
> > > # endif
> > > return 0; // Means not computable [thread.thread.static]
> > > #endif // defined(CTL_HW) && defined(HW_NCPU)
> > > }
> > >
> > > while libstdc++ uses
> > >
> > > #if defined(_GLIBCXX_USE_GET_NPROCS)
> > > # include <sys/sysinfo.h>
> > > # define _GLIBCXX_NPROCS get_nprocs()
> > > #elif defined(_GLIBCXX_USE_PTHREADS_NUM_PROCESSORS_NP)
> > > # define _GLIBCXX_NPROCS pthread_num_processors_np()
> > > #elif defined(_GLIBCXX_USE_SYSCTL_HW_NCPU)
> > > # include <stddef.h>
> > > # include <sys/sysctl.h>
> > > static inline int get_nprocs()
> > > {
> > > int count;
> > > size_t size = sizeof(count);
> > > int mib[] = { CTL_HW, HW_NCPU };
> > > if (!sysctl(mib, 2, &count, &size, NULL, 0))
> > > return count;
> > > return 0;
> > > }
> > > # define _GLIBCXX_NPROCS get_nprocs()
> > > #elif defined(_GLIBCXX_USE_SC_NPROCESSORS_ONLN)
> > > # include <unistd.h>
> > > # define _GLIBCXX_NPROCS sysconf(_SC_NPROCESSORS_ONLN)
> > > #elif defined(_GLIBCXX_USE_SC_NPROC_ONLN)
> > > # include <unistd.h>
> > > # define _GLIBCXX_NPROCS sysconf(_SC_NPROC_ONLN)
> > > #else
> > > # define _GLIBCXX_NPROCS 0
> > > #endif
> > >
> > > unsigned int
> > > thread::hardware_concurrency() noexcept
> > > {
> > > int __n = _GLIBCXX_NPROCS;
> > > if (__n < 0)
> > > __n = 0;
> > > return __n;
> > > }
> > >
> > > get_nprocs() and sysconf(_SC_NPROCESSORS_ONLN) are identical (the latter
> > > is a wrapper around the former). So it seems that, while the C++ API
> > > doesn't guarantee that thread::hardware_concurrency() will provide an
> > > accurate value, the implementations we have will do the right thing.
> > > They're however a bit expensive, as they open a file in /sys, but that's
> > > probably the best we can do.
> > >
> > > > +
> > > > + MutexLocker locker(data_->mutex_);
> > > > + data_->cpuset_ = cpu_set_t();
> > > > + CPU_ZERO(&data_->cpuset_.value());
> > > > +
> > > > + for (const unsigned int &cpu : cpus) {
> > > > + if (cpu >= numCpus) {
> > > > + LOG(Thread, Error) << "Ignore an invalid cpu index: " << cpu;
> > >
> > > It's not ignored anymore
> > >
> > > LOG(Thread, Error)
> > > << "Invalid CPU " << cpu << " for thread affinity";
> > >
> >
> > Right, updated.
> >
> > > > + return -EINVAL;
> > > > + }
> > > > +
> > > > + CPU_SET(cpu, &data_->cpuset_.value());
> > > > + }
> > > > +
> > > > + if (data_->running_)
> > > > + setThreadAffinityInternal();
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > +void Thread::setThreadAffinityInternal()
> > > > +{
> > > > + if (!data_->cpuset_)
> > > > + return;
> > > > +
> > > > + const cpu_set_t &cpuset = data_->cpuset_.value();
> > > > + pthread_setaffinity_np(thread_.native_handle(), sizeof(cpuset), &cpuset);
> > >
> > > With this implementation, it's not possible to set the thread's CPU
> > > affinity before starting the thread. That's a limitation of the C++
> > > std::thread API, it could be done if we used the native pthread API
> > > instead. Is that a problem ? Could you actually point me to code that
> > > uses Thread::setThreadAffinity() ?
> >
> > There are two main usages in mtkisp7:
> > 1. Setting pipeline handler's thread [1].
>
> Heads up on that one, we probably want to handle this in the libcamera
> core. Multiple pipeline handlers can run on the same platform, currently
> sharing a single camera manager thread. If they all tried to set thread
> affinity, that would be messy. I'd like to explore how we could
> coordinate that in a cleaner way.
Yes, your concern makes a lot of sense. If we want to process requests
on a different CPU core in pipeline handlers, would it be better to create
another thread in each pipeline handler? That involves an extra context
switch per frame though...
>
> > 2. Setting IPA's two new threads [2].
> >
> > The first one sets thread affinity to a thread that has already been
> > started, while the second one set new threads.
>
> Out of curiosity, have you measured the difference in performance ? Is
> it a matter of pinning the threads to big cores to ensure they have
> enough CPU time ? Or a matter of avoiding latency caused by migration ?
> Or something else ?
We run CTS, and the current config passes the performance tests with
sufficient fps and low enough latencies.
Without this functionality though, I think we can still somehow pass CTS
with more retries. If it's controversial, we can skip this for now.
We can also allow setting thread affinity when the thread hasn't been
started yet, which at least fits the second use case in mtkisp7.
>
> > By "the native pthread API", you mean `pthread_attr_setaffinity_np`?
> > This means that we need to implement libcamera::Thread with pthread,
> > instead of std::thread?
>
> Yes. It doesn't mean we have to do that, I'm only asking if you think
> that would be a good idea, or overkill.
I find it a bit an overkill :)
The approach below should be much easier.
>
> > [1]: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/third_party/libcamera/mtkisp7/src/libcamera/pipeline/mtkisp7/mtkisp7.cpp;l=511
> > [2]: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/third_party/libcamera/mtkisp7/src/ipa/mtkisp7/mtkisp7.cpp;l=119
> >
> > > Even without the ability to set the affinity before starting the thread,
> > > we could set it from within the thread before calling the run()
> > > function. This would avoid the main workload being run on incorrect
> > > CPUs. Is that something that would be useful ?
> >
> > You mean calling `setThreadAffinityInterna()l` in `startThread()`
> > (before `run()`), instead of at the end of `start()`?
>
> Yes. Again, I'm not saing it has to be done, just exploring options.
That makes a lot of sense. It works on ciri as well in my trial.
BR,
Harvey
>
> > > Regardless of which option we implement, Thread::setThreadAffinity()
> > > should document what happens if it's called before calling
> > > Thread::start().
> > >
> > > > +}
> > > > +
> > > > /**
> > > > * \brief Check if the thread is running
> > > > *
>
> --
> Regards,
>
> Laurent Pinchart
More information about the libcamera-devel
mailing list