[PATCH v5 1/1] libcamera: add method to set thread affinity

Laurent Pinchart laurent.pinchart at ideasonboard.com
Thu Oct 17 00:08:27 CEST 2024


Hi Harvey, Han-Lin,

Thank you for the patch. This is taking shape.

On Wed, Oct 16, 2024 at 08:43:44AM +0000, Harvey Yang wrote:
> From: Han-Lin Chen <hanlinchen at chromium.org>
> 
> Add method to set thread affinity to Thread class.
> 
> Signed-off-by: Han-Lin Chen <hanlinchen at chromium.org>
> Co-developed-by: Harvey Yang <chenghaoyang at chromium.org>
> Signed-off-by: Harvey Yang <chenghaoyang at chromium.org>
> ---
>  include/libcamera/base/thread.h |  5 ++++
>  src/libcamera/base/thread.cpp   | 46 +++++++++++++++++++++++++++++++++
>  2 files changed, 51 insertions(+)
> 
> diff --git a/include/libcamera/base/thread.h b/include/libcamera/base/thread.h
> index 4f33de63d..3209d4f7c 100644
> --- a/include/libcamera/base/thread.h
> +++ b/include/libcamera/base/thread.h
> @@ -15,6 +15,7 @@
>  
>  #include <libcamera/base/message.h>
>  #include <libcamera/base/signal.h>
> +#include <libcamera/base/span.h>
>  #include <libcamera/base/utils.h>
>  
>  namespace libcamera {
> @@ -35,6 +36,8 @@ public:
>  	void exit(int code = 0);
>  	bool wait(utils::duration duration = utils::duration::max());
>  
> +	int setThreadAffinity(const Span<const unsigned int> &cpus);
> +
>  	bool isRunning();
>  
>  	Signal<> finished;
> @@ -54,6 +57,8 @@ private:
>  	void startThread();
>  	void finishThread();
>  
> +	void setThreadAffinityInternal();
> +
>  	void postMessage(std::unique_ptr<Message> msg, Object *receiver);
>  	void removeMessages(Object *receiver);
>  
> diff --git a/src/libcamera/base/thread.cpp b/src/libcamera/base/thread.cpp
> index 8735670b8..5dc787dba 100644
> --- a/src/libcamera/base/thread.cpp
> +++ b/src/libcamera/base/thread.cpp
> @@ -9,6 +9,7 @@
>  
>  #include <atomic>
>  #include <list>
> +#include <optional>
>  #include <sys/syscall.h>
>  #include <sys/types.h>
>  #include <unistd.h>
> @@ -128,6 +129,8 @@ private:
>  	int exitCode_;
>  
>  	MessageQueue messages_;
> +
> +	std::optional<cpu_set_t> cpuset_ = std::nullopt;

Is there a need to initialize this to std::nullopt, as that's what the
default constructor does ?

>  };
>  
>  /**
> @@ -254,6 +257,8 @@ void Thread::start()
>  	data_->exit_.store(false, std::memory_order_relaxed);
>  
>  	thread_ = std::thread(&Thread::startThread, this);
> +
> +	setThreadAffinityInternal();
>  }
>  
>  void Thread::startThread()
> @@ -410,6 +415,47 @@ bool Thread::wait(utils::duration duration)
>  	return hasFinished;
>  }
>  
> +/**
> + * \brief Set the CPU affinity mask of the thread
> + * \param[in] cpus The list of CPU indices that the thread is set affinity to
> + *
> + * The CPU indices should be within [0, std::thread::hardware_concurrency()).
> + * If any index is invalid, no cpu indices will be set.

 * If any index is invalid, this function won't modify the thread affinity and
 * will return an error.

> + *
> + * \return 0 if all indices are valid, -EINVAL otherwise
> + */
> +int Thread::setThreadAffinity(const Span<const unsigned int> &cpus)
> +{
> +	const unsigned int numCpus = std::thread::hardware_concurrency();

https://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency
says:

  Returns the number of concurrent threads supported by the
  implementation. The value should be considered only a hint.

  [snip]

  Return value

  Number of concurrent threads supported. If the value is not well
  defined or not computable, returns 0.

libcxx seems to have the following implementation:

unsigned thread::hardware_concurrency() noexcept {
#if defined(_SC_NPROCESSORS_ONLN)
  long result = sysconf(_SC_NPROCESSORS_ONLN);
  // sysconf returns -1 if the name is invalid, the option does not exist or
  // does not have a definite limit.
  // if sysconf returns some other negative number, we have no idea
  // what is going on. Default to something safe.
  if (result < 0)
    return 0;
  return static_cast<unsigned>(result);
#elif defined(_LIBCPP_WIN32API)
  SYSTEM_INFO info;
  GetSystemInfo(&info);
  return info.dwNumberOfProcessors;
#else // defined(CTL_HW) && defined(HW_NCPU)
  // TODO: grovel through /proc or check cpuid on x86 and similar
  // instructions on other architectures.
#  if defined(_LIBCPP_WARNING)
  _LIBCPP_WARNING("hardware_concurrency not yet implemented")
#  else
#    warning hardware_concurrency not yet implemented
#  endif
  return 0; // Means not computable [thread.thread.static]
#endif // defined(CTL_HW) && defined(HW_NCPU)
}

while libstdc++ uses

#if defined(_GLIBCXX_USE_GET_NPROCS)
# include <sys/sysinfo.h>
# define _GLIBCXX_NPROCS get_nprocs()
#elif defined(_GLIBCXX_USE_PTHREADS_NUM_PROCESSORS_NP)
# define _GLIBCXX_NPROCS pthread_num_processors_np()
#elif defined(_GLIBCXX_USE_SYSCTL_HW_NCPU)
# include <stddef.h>
# include <sys/sysctl.h>
static inline int get_nprocs()
{
 int count;
 size_t size = sizeof(count);
 int mib[] = { CTL_HW, HW_NCPU };
 if (!sysctl(mib, 2, &count, &size, NULL, 0))
   return count;
 return 0;
}
# define _GLIBCXX_NPROCS get_nprocs()
#elif defined(_GLIBCXX_USE_SC_NPROCESSORS_ONLN)
# include <unistd.h>
# define _GLIBCXX_NPROCS sysconf(_SC_NPROCESSORS_ONLN)
#elif defined(_GLIBCXX_USE_SC_NPROC_ONLN)
# include <unistd.h>
# define _GLIBCXX_NPROCS sysconf(_SC_NPROC_ONLN)
#else
# define _GLIBCXX_NPROCS 0
#endif

  unsigned int
  thread::hardware_concurrency() noexcept
  {
    int __n = _GLIBCXX_NPROCS;
    if (__n < 0)
      __n = 0;
    return __n;
  }

get_nprocs() and sysconf(_SC_NPROCESSORS_ONLN) are identical (the latter
is a wrapper around the former). So it seems that, while the C++ API
doesn't guarantee that thread::hardware_concurrency() will provide an
accurate value, the implementations we have will do the right thing.
They're however a bit expensive, as they open a file in /sys, but that's
probably the best we can do.

> +
> +	MutexLocker locker(data_->mutex_);
> +	data_->cpuset_ = cpu_set_t();
> +	CPU_ZERO(&data_->cpuset_.value());
> +
> +	for (const unsigned int &cpu : cpus) {
> +		if (cpu >= numCpus) {
> +			LOG(Thread, Error) << "Ignore an invalid cpu index: " << cpu;

It's not ignored anymore

			LOG(Thread, Error)
				<< "Invalid CPU " << cpu << " for thread affinity";

> +			return -EINVAL;
> +		}
> +
> +		CPU_SET(cpu, &data_->cpuset_.value());
> +	}
> +
> +	if (data_->running_)
> +		setThreadAffinityInternal();
> +
> +	return 0;
> +}
> +
> +void Thread::setThreadAffinityInternal()
> +{
> +	if (!data_->cpuset_)
> +		return;
> +
> +	const cpu_set_t &cpuset = data_->cpuset_.value();
> +	pthread_setaffinity_np(thread_.native_handle(), sizeof(cpuset), &cpuset);

With this implementation, it's not possible to set the thread's CPU
affinity before starting the thread. That's a limitation of the C++
std::thread API, it could be done if we used the native pthread API
instead. Is that a problem ? Could you actually point me to code that
uses Thread::setThreadAffinity() ?

Even without the ability to set the affinity before starting the thread,
we could set it from within the thread before calling the run()
function. This would avoid the main workload being run on incorrect
CPUs. Is that something that would be useful ?

Regardless of which option we implement, Thread::setThreadAffinity()
should document what happens if it's called before calling
Thread::start().

> +}
> +
>  /**
>   * \brief Check if the thread is running
>   *

-- 
Regards,

Laurent Pinchart


More information about the libcamera-devel mailing list