[libcamera-devel] [RFC PATCH 5/5] android: Run post processing in a separate thread

Laurent Pinchart laurent.pinchart at ideasonboard.com
Fri Aug 27 05:04:55 CEST 2021


Hi Umang,

Thank you for the patch.

I'll start with a few high-level comments.

On Fri, Aug 27, 2021 at 03:00:16AM +0530, Umang Jain wrote:
> In CameraStream, introduce a new private PostProcessorWorker
> class derived from Thread. The post-processor shall run in
> this derived thread instance asynchronously.
> 
> Running the PostProcessor asynchronously entails that all the data
> needed by the PostProcessor remains valid for the entire duration of
> its run. In this instance, we currently need to ensure that:
> 
> - the 'source' FrameBuffer for the post-processor
> - the camera result metadata
> 
> remain valid, with the entire context preserved. The 'source'
> framebuffer is copied internally for streams other than internal ones
> (internal buffers are allocated by the CameraStream class itself) and
> the copy is passed along to the post-processor.
> 
> As a quick reference on how captured results are sent to the Android
> framework: completed capture results should be sent using the
> process_capture_result() callback, in the same order as the capture
> requests (camera3_capture_request_t) that were submitted to the HAL in
> the first place.
> 
> In order to enforce the ordering, we need to maintain a queue. When a
> stream requires post-processing, we save the associated capture
> results (via Camera3RequestDescriptor) and add them to the queue. All
> capture results that complete in the meantime are queued as well. When
> the post-processing results are available, we simply start de-queuing
> the queued completed capture results and invoke the
> process_capture_result() callback on each of them.
> 
> The context saving part is done by Camera3RequestDescriptor. It is
> further extended with the data structures needed for asynchronous
> post-processing.

To ease review, do you think it would be possible to split this patch in
two, with a first part that starts making use of the signals but doesn't
move the post-processor to a separate thread ?

> Signed-off-by: Umang Jain <umang.jain at ideasonboard.com>
> ---
>  src/android/camera_device.cpp | 92 +++++++++++++++++++++++++++++++----
>  src/android/camera_device.h   | 21 +++++++-
>  src/android/camera_stream.cpp | 35 ++++++++++---
>  src/android/camera_stream.h   | 40 +++++++++++++++
>  4 files changed, 170 insertions(+), 18 deletions(-)
> 
> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
> index fea59ce6..08237187 100644
> --- a/src/android/camera_device.cpp
> +++ b/src/android/camera_device.cpp
> @@ -960,6 +960,7 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
>  		 * and add them to the Request if required.
>  		 */
>  		FrameBuffer *buffer = nullptr;
> +		descriptor.srcInternalBuffer_ = nullptr;
>  		switch (cameraStream->type()) {
>  		case CameraStream::Type::Mapped:
>  			/*
> @@ -990,6 +991,7 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
>  			 * once it has been processed.
>  			 */
>  			buffer = cameraStream->getBuffer();
> +			descriptor.srcInternalBuffer_ = buffer;
>  			LOG(HAL, Debug) << ss.str() << " (internal)";
>  			break;
>  		}
> @@ -1149,25 +1151,95 @@ void CameraDevice::requestComplete(Request *request)
>  			continue;
>  		}
>  
> +		/*
> +		 * Save the current context of capture result and queue the
> +		 * descriptor. cameraStream will process asynchronously, so
> +		 * we can only send the results back after the processing has
> +		 * been completed.
> +		 */
> +		descriptor.status_ = Camera3RequestDescriptor::NOT_READY;
> +		descriptor.resultMetadata_ = std::move(resultMetadata);
> +		descriptor.captureResult_ = captureResult;
> +
> +		cameraStream->processComplete.connect(this, &CameraDevice::streamProcessingComplete);

While signals can be connected and disconnected on demand, doing so
often indicates a design issue. Is there a reason why you can't connect
the signal when the cameraStream instance is created, and keep it
connected for the whole lifetime of the stream ?

>  		int ret = cameraStream->process(src, *buffer.buffer,
>  						descriptor.settings_,
> -						resultMetadata.get());
> +						descriptor.resultMetadata_.get());

You have a race condition here, if cameraStream->process() runs in a
separate thread, it could complete and emit the processComplete signal
before the code below gets executed.
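To illustrate the fix, here's a minimal standalone sketch (not libcamera
code — the Descriptor/OrderedQueue names are made up for illustration):
the descriptor has to be queued *before* the asynchronous job is
dispatched, so the completion handler always finds it, even if the job
finishes immediately.

```cpp
#include <deque>
#include <future>
#include <mutex>

/* Stand-in for Camera3RequestDescriptor. */
struct Descriptor {
	unsigned int frame;
	bool ready = false;
};

class OrderedQueue
{
public:
	void queueDescriptor(unsigned int frame)
	{
		std::lock_guard<std::mutex> lock(mutex_);
		descriptors_.push_back({ frame, false });
	}

	/* Called from the post-processing thread on completion. */
	void processComplete(unsigned int frame)
	{
		std::lock_guard<std::mutex> lock(mutex_);
		for (Descriptor &d : descriptors_) {
			if (d.frame == frame) {
				d.ready = true;
				return;
			}
		}
		/* Racy ordering would end up here: descriptor not found. */
	}

	bool isReady(unsigned int frame)
	{
		std::lock_guard<std::mutex> lock(mutex_);
		for (Descriptor &d : descriptors_)
			if (d.frame == frame)
				return d.ready;
		return false;
	}

private:
	std::mutex mutex_;
	std::deque<Descriptor> descriptors_;
};

/* Queue first, then dispatch: completion can't race with queueing. */
bool queueThenProcess(OrderedQueue &queue, unsigned int frame)
{
	queue.queueDescriptor(frame);
	auto job = std::async(std::launch::async,
			      [&queue, frame]() { queue.processComplete(frame); });
	job.wait();
	return queue.isReady(frame);
}
```

With the ordering reversed (dispatch, then queue), processComplete()
could run against an empty queue and the result would be lost.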

> +		std::unique_ptr<Camera3RequestDescriptor> reqDescriptor =
> +			std::make_unique<Camera3RequestDescriptor>();
> +		*reqDescriptor = std::move(descriptor);
> +		queuedDescriptor_.push_back(std::move(reqDescriptor));
> +
> +		return;
> +	}
> +
> +	if (queuedDescriptor_.empty()) {
> +		captureResult.result = resultMetadata->get();
> +		callbacks_->process_capture_result(callbacks_, &captureResult);
> +	} else {
> +		/*
> +		 * Previous capture results are waiting to be sent to the
> +		 * framework, so queue the current capture results as well.
> +		 * After that, check if any results are ready to be sent
> +		 * back to the framework.
> +		 */
> +		descriptor.status_ = Camera3RequestDescriptor::READY_SUCCESS;
> +		descriptor.resultMetadata_ = std::move(resultMetadata);
> +		descriptor.captureResult_ = captureResult;
> +		std::unique_ptr<Camera3RequestDescriptor> reqDescriptor =
> +			std::make_unique<Camera3RequestDescriptor>();
> +		*reqDescriptor = std::move(descriptor);
> +		queuedDescriptor_.push_back(std::move(reqDescriptor));
> +
> +		while (!queuedDescriptor_.empty()) {
> +			std::unique_ptr<Camera3RequestDescriptor> &d = queuedDescriptor_.front();
> +			if (d->status_ != Camera3RequestDescriptor::NOT_READY) {
> +				d->captureResult_.result = d->resultMetadata_->get();
> +				callbacks_->process_capture_result(callbacks_, &(d->captureResult_));
> +				queuedDescriptor_.pop_front();
> +			} else {
> +				break;
> +			}
> +		}
> +	}
> +}
> +
> +void CameraDevice::streamProcessingComplete(CameraStream *cameraStream,
> +					    CameraStream::ProcessStatus status,
> +					    CameraMetadata *resultMetadata)
> +{
> +	cameraStream->processComplete.disconnect(this, &CameraDevice::streamProcessingComplete);
> +
> +	for (auto &d : queuedDescriptor_) {
> +		if (d->resultMetadata_.get() != resultMetadata)
> +			continue;
> +
>  		/*
>  		 * Return the FrameBuffer to the CameraStream now that we're
>  		 * done processing it.
>  		 */
>  		if (cameraStream->type() == CameraStream::Type::Internal)
> -			cameraStream->putBuffer(src);
> -
> -		if (ret) {
> -			buffer.status = CAMERA3_BUFFER_STATUS_ERROR;
> -			notifyError(descriptor.frameNumber_, buffer.stream,
> -				    CAMERA3_MSG_ERROR_BUFFER);
> +			cameraStream->putBuffer(d->srcInternalBuffer_);
> +
> +		if (status == CameraStream::ProcessStatus::Success) {
> +			d->status_ = Camera3RequestDescriptor::READY_SUCCESS;
> +		} else {
> +			/* \todo this is clumsy error handling, be better. */
> +			d->status_ = Camera3RequestDescriptor::READY_FAILED;
> +			for (camera3_stream_buffer_t &buffer : d->buffers_) {
> +				CameraStream *cs = static_cast<CameraStream *>(buffer.stream->priv);
> +
> +				if (cs->camera3Stream().format != HAL_PIXEL_FORMAT_BLOB)
> +					continue;
> +
> +				buffer.status = CAMERA3_BUFFER_STATUS_ERROR;
> +				notifyError(d->frameNumber_, buffer.stream,
> +					    CAMERA3_MSG_ERROR_BUFFER);
> +			}
>  		}
> -	}
>  
> -	captureResult.result = resultMetadata->get();
> -	callbacks_->process_capture_result(callbacks_, &captureResult);
> +		return;
> +	}
>  }
>  
>  std::string CameraDevice::logPrefix() const
> diff --git a/src/android/camera_device.h b/src/android/camera_device.h
> index dd9aebba..4e4bb970 100644
> --- a/src/android/camera_device.h
> +++ b/src/android/camera_device.h
> @@ -7,6 +7,7 @@
>  #ifndef __ANDROID_CAMERA_DEVICE_H__
>  #define __ANDROID_CAMERA_DEVICE_H__
>  
> +#include <deque>
>  #include <map>
>  #include <memory>
>  #include <mutex>
> @@ -81,6 +82,20 @@ private:
>  		std::vector<std::unique_ptr<libcamera::FrameBuffer>> frameBuffers_;
>  		CameraMetadata settings_;
>  		std::unique_ptr<CaptureRequest> request_;
> +
> +		/*
> +		 * Only set when a capture result needs to be queued before
> +		 * completion via process_capture_result() callback.
> +		 */
> +		enum processingStatus {
> +			NOT_READY,
> +			READY_SUCCESS,
> +			READY_FAILED,
> +		};
> +		std::unique_ptr<CameraMetadata> resultMetadata_;
> +		camera3_capture_result_t captureResult_;
> +		libcamera::FrameBuffer *srcInternalBuffer_;
> +		processingStatus status_;
>  	};
>  
>  	enum class State {
> @@ -100,7 +115,9 @@ private:
>  	int processControls(Camera3RequestDescriptor *descriptor);
>  	std::unique_ptr<CameraMetadata> getResultMetadata(
>  		const Camera3RequestDescriptor &descriptor) const;
> -
> +	void streamProcessingComplete(CameraStream *cameraStream,
> +				      CameraStream::ProcessStatus status,
> +				      CameraMetadata *resultMetadata);
>  	unsigned int id_;
>  	camera3_device_t camera3Device_;
>  
> @@ -128,6 +145,8 @@ private:
>  	int orientation_;
>  
>  	CameraMetadata lastSettings_;
> +
> +	std::deque<std::unique_ptr<Camera3RequestDescriptor>> queuedDescriptor_;
>  };
>  
>  #endif /* __ANDROID_CAMERA_DEVICE_H__ */
> diff --git a/src/android/camera_stream.cpp b/src/android/camera_stream.cpp
> index bdcc7cf9..d5981c0e 100644
> --- a/src/android/camera_stream.cpp
> +++ b/src/android/camera_stream.cpp
> @@ -55,6 +55,7 @@ CameraStream::CameraStream(CameraDevice *const cameraDevice,
>  		 * is what we instantiate here.
>  		 */
>  		postProcessor_ = std::make_unique<PostProcessorJpeg>(cameraDevice_);
> +		ppWorker_ = std::make_unique<PostProcessorWorker>(postProcessor_.get());
>  	}
>  
>  	if (type == Type::Internal) {
> @@ -108,22 +109,41 @@ int CameraStream::process(const libcamera::FrameBuffer *source,
>  	if (!postProcessor_)
>  		return 0;
>  
> -	/*
> -	 * \todo Buffer mapping and processing should be moved to a
> -	 * separate thread.
> -	 */
> -	CameraBuffer dest(camera3Dest, PROT_READ | PROT_WRITE);
> -	if (!dest.isValid()) {
> +	/* \todo Should buffer mapping be moved to post-processor's thread? */

Good point :-)

> +	dest_ = std::make_unique<CameraBuffer>(camera3Dest, PROT_READ | PROT_WRITE);
> +	if (!dest_->isValid()) {
>  		LOG(HAL, Error) << "Failed to map android blob buffer";
>  		return -EINVAL;
>  	}
>  
> -	return postProcessor_->process(source, &dest, requestMetadata, resultMetadata);
> +	if (type() != CameraStream::Type::Internal) {
> +		/*
> +		 * The source buffer is owned by Request object which can be
> +		 * reused for future capture request. Since post-processor will
> +		 * run asynchronously, we need to copy the source buffer and use
> +		 * it as source data for post-processor.
> +		 */
> +		srcPlanes_.clear();
> +		for (const auto &plane : source->planes())
> +			srcPlanes_.push_back(plane);
> +
> +		srcFramebuffer_ = std::make_unique<libcamera::FrameBuffer>(srcPlanes_);
> +		source = srcFramebuffer_.get();
> +	}

This will break if the next frame needs to be post-processed before the
current frame completes. You need a queue of frames to post-process, not
just one. One option would be to store the data in
Camera3RequestDescriptor instead, as that is the full context for a
request, including all its frames. Some refactoring would be required,
as the structure is currently private to CameraDevice.
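As a standalone sketch of what I mean (names are hypothetical, and a
plain vector<int> stands in for the copied FrameBuffer planes): keep one
context object alive per in-flight request, instead of a single member
that the next frame would overwrite.

```cpp
#include <deque>
#include <memory>
#include <vector>

/*
 * Hypothetical per-request post-processing context; in the HAL this
 * data would live in (or hang off) Camera3RequestDescriptor.
 */
struct PostProcessContext {
	unsigned int frame;
	std::vector<int> srcPlanes; /* stand-in for FrameBuffer::Plane */
};

class ContextQueue
{
public:
	/* Allocate a context when post-processing of a frame starts. */
	PostProcessContext *start(unsigned int frame, std::vector<int> planes)
	{
		contexts_.push_back(std::make_unique<PostProcessContext>(
			PostProcessContext{ frame, std::move(planes) }));
		return contexts_.back().get();
	}

	/* Release the context only when that frame completes. */
	void complete(unsigned int frame)
	{
		for (auto it = contexts_.begin(); it != contexts_.end(); ++it) {
			if ((*it)->frame == frame) {
				contexts_.erase(it);
				return;
			}
		}
	}

	size_t pending() const { return contexts_.size(); }

private:
	std::deque<std::unique_ptr<PostProcessContext>> contexts_;
};
```

A second frame can then be queued for post-processing while the first
is still in flight, without clobbering its source data.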

> +
> +	ppWorker_->start();
> +
> +	return postProcessor_->invokeMethod(&PostProcessor::process,
> +					    ConnectionTypeQueued,
> +					    source, dest_.get(),
> +					    requestMetadata, resultMetadata);

Is the CameraStream actually the right place to spawn a thread and
dispatch processing jobs ? Brainstorming a bit, I wonder if this
shouldn't be moved to the PostProcessor class instead, as the need for a
thread depends on the type of post-processor. The current software-based
post-processors definitely need a thread, but a post-processor that
would use a hardware-accelerated JPEG encoder would be asynchronous by
nature (posting jobs to the device and being notified of their
completion through callbacks) and may not need to be wrapped in a
thread. It would actually also depend on how the hardware JPEG encoder
would be interfaced. If the post-processor were to deal with the kernel
device directly, it would likely need an event loop, which we're missing
in the HAL implementation. A thread would provide that event loop, but
it could be best to use a single thread for all event-driven
post-processors instead of one per post-processor. This is largely
theoretical though, we don't have hardware-backed post-processors today.

>  }
>  
>  void CameraStream::handlePostProcessing(PostProcessor::Status status,
>  					CameraMetadata *resultMetadata)
>  {
> +	ppWorker_->exit();

Threads won't consume many resources when they're idle, but starting and
stopping a thread is a costly operation. It would be better to start the
thread when starting the camera capture session.

> +
>  	switch (status) {
>  	case PostProcessor::Status::Success:
>  		processComplete.emit(this, ProcessStatus::Success, resultMetadata);
> @@ -134,6 +154,7 @@ void CameraStream::handlePostProcessing(PostProcessor::Status status,
>  	default:
>  		LOG(HAL, Error) << "PostProcessor status invalid";
>  	}
> +	srcFramebuffer_.reset();
>  }
>  
>  FrameBuffer *CameraStream::getBuffer()
> diff --git a/src/android/camera_stream.h b/src/android/camera_stream.h
> index d54d3f58..567d896f 100644
> --- a/src/android/camera_stream.h
> +++ b/src/android/camera_stream.h
> @@ -13,7 +13,9 @@
>  
>  #include <hardware/camera3.h>
>  
> +#include <libcamera/base/object.h>
>  #include <libcamera/base/signal.h>
> +#include <libcamera/base/thread.h>
>  
>  #include <libcamera/camera.h>
>  #include <libcamera/framebuffer.h>
> @@ -23,6 +25,7 @@
>  
>  #include "post_processor.h"
>  
> +class CameraBuffer;
>  class CameraDevice;
>  class CameraMetadata;
>  
> @@ -135,6 +138,38 @@ public:
>  	libcamera::Signal<CameraStream *, ProcessStatus, CameraMetadata *> processComplete;
>  
>  private:
> +	class PostProcessorWorker : public libcamera::Thread
> +	{
> +	public:
> +		PostProcessorWorker(PostProcessor *postProcessor)
> +		{
> +			postProcessor->moveToThread(this);
> +		}

Assuming we want to keep the thread inside the CameraStream, I think
this needs to be reworked a little bit. The post-processor, in this
case, isn't thread-aware: it gets run in a thread, but doesn't really
know much about that fact. However, in patch 1/5, you make PostProcessor
inherit from Object to enable the invokeMethod() call. There's a design
conflict between these two facts.

I would propose instead keeping the PostProcessor unaware of threads,
without inheriting from Object, and making PostProcessorWorker the class
that inherits from Object and is moved to the thread. The thread could
stay in PostProcessorWorker by inheriting from both Object and Thread,
but that's a bit confusing, so I'd recommend adding a libcamera::Thread
thread_ member variable to CameraStream for the thread. This is the
design used by the IPA proxies; you can look at the generated code to
see how it's implemented (it's more readable than the templates). The
worker would have a process() function, called from
CameraStream::process() using invokeMethod(), and that worker process()
function would simply call PostProcessor::process() directly.
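Something along these lines — a minimal standalone approximation of that
design, with std::thread and a hand-rolled job queue standing in for
libcamera::Thread and Object::invokeMethod(ConnectionTypeQueued), and a
trivial PostProcessor for illustration:

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

/* Thread-unaware post-processor: it just processes. */
class PostProcessor
{
public:
	int process(int src) { return src * 2; }
};

/*
 * The worker owns the dispatch: jobs posted from the caller's thread
 * run on the worker thread, and completion is reported via a callback.
 */
class PostProcessorWorker
{
public:
	PostProcessorWorker(PostProcessor *postProcessor)
		: postProcessor_(postProcessor), exit_(false),
		  thread_([this]() { run(); })
	{
	}

	~PostProcessorWorker()
	{
		{
			std::lock_guard<std::mutex> lock(mutex_);
			exit_ = true;
		}
		cv_.notify_one();
		thread_.join();
	}

	/* Analogous to invokeMethod(&PostProcessor::process, Queued, ...). */
	void process(int src, std::function<void(int)> done)
	{
		{
			std::lock_guard<std::mutex> lock(mutex_);
			jobs_.push([this, src, done]() {
				done(postProcessor_->process(src));
			});
		}
		cv_.notify_one();
	}

private:
	/* Event loop: drain queued jobs, then exit when asked to. */
	void run()
	{
		for (;;) {
			std::function<void()> job;
			{
				std::unique_lock<std::mutex> lock(mutex_);
				cv_.wait(lock, [this]() {
					return exit_ || !jobs_.empty();
				});
				if (jobs_.empty() && exit_)
					return;
				job = std::move(jobs_.front());
				jobs_.pop();
			}
			job();
		}
	}

	PostProcessor *postProcessor_;
	std::mutex mutex_;
	std::condition_variable cv_;
	bool exit_;
	std::queue<std::function<void()>> jobs_;
	std::thread thread_;
};
```

The PostProcessor stays oblivious to threading, and the thread lifetime
is tied to the worker (here, to its constructor/destructor; in the HAL,
to the capture session as noted above).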

> +
> +		void start()
> +		{
> +			/*
> +			 * \todo [BLOCKER] !!!
> +			 * On second shutter capture, this fails to start the
> +			 * thread(again). No errors for first shutter capture.
> +			 */
> +			Thread::start();
> +		}

Why do you reimplement start(), if you only call the same function in
the parent class ?

> +
> +		void stop()
> +		{
> +			exit();
> +			wait();
> +		}

This is never called.

> +
> +	protected:
> +		void run() override
> +		{
> +			exec();
> +			dispatchMessages();
> +		}
> +	};
> +
>  	void handlePostProcessing(PostProcessor::Status status,
>  				  CameraMetadata *resultMetadata);
>  
> @@ -152,6 +187,11 @@ private:
>  	 */
>  	std::unique_ptr<std::mutex> mutex_;
>  	std::unique_ptr<PostProcessor> postProcessor_;
> +	std::unique_ptr<PostProcessorWorker> ppWorker_;
> +
> +	std::unique_ptr<CameraBuffer> dest_;
> +	std::unique_ptr<libcamera::FrameBuffer> srcFramebuffer_;
> +	std::vector<libcamera::FrameBuffer::Plane> srcPlanes_;
>  };
>  
>  #endif /* __ANDROID_CAMERA_STREAM__ */

-- 
Regards,

Laurent Pinchart

