[libcamera-devel] [RFC PATCH 5/5] android: Run post processing in a separate thread

Fri Aug 27 16:00:56 CEST 2021

Hi Laurent

On 8/27/21 8:34 AM, Laurent Pinchart wrote:
> Hi Umang,
>
> Thank you for the patch.
>
> I'll start with a few high-level comments.
Great, I was expecting exactly this.
>
> On Fri, Aug 27, 2021 at 03:00:16AM +0530, Umang Jain wrote:
>> In CameraStream, introduce a new private PostProcessorWorker
>> class derived from Thread. The post-processor shall run in
>> this derived thread instance asynchronously.
>>
>> Running PostProcessor asynchronously should entail that all the
>> data needed by the PostProcessor should remain valid for the entire
>> duration of its run. In this instance, we currently need to ensure:
>>
>> - 'source' FrameBuffer for the post processor
>> - Camera result metadata
>>
>> Should be valid and saved with preserving the entire context. The
>> 'source' framebuffer is copied internally for streams other than
>> internal (since internal ones are allocated by CameraStream class
>> itself) and the copy is passed along to post-processor.
>>
>> Coming to preserving the context, a quick reference on how captured
>> results are sent to android framework. Completed captured results
>> should be sent using process_capture_result() callback. The order
>> of sending them should match the order the capture request
>> (camera3_capture_request_t), that was submitted to the HAL in the first
>> place.
>>
>> In order to enforce the ordering, we need to maintain a queue. When a
>> stream requires post-processing, we save the associated capture results
>> (via Camera3RequestDescriptor) and add it to the queue. All transient
>> completed captured results are queued as well. When the post-processing
>> results are available, we can simply start de-queuing all the queued
>> completed captured results and invoke process_capture_result() callback
>> on each of them.
>>
>> The context saving part is done by Camera3RequestDescriptor. It is
>> further extended to include data-structs to fulfill the demands of
>> async post-processing.
> To ease review, do you think it would be possible to split this patch in
> two, with a first part that starts making use of the signals but doesn't
> move the post-processor to a separate threads

Ah, indeed, we can break it with use of signal emitting in the signals 
in same thread and then introducing the threading mechanism in a 
separate patch (Why didn't I think of this)

>
>> Signed-off-by: Umang Jain <umang.jain at ideasonboard.com>
>> ---
>>   src/android/camera_device.cpp | 92 +++++++++++++++++++++++++++++++----
>>   src/android/camera_device.h   | 21 +++++++-
>>   src/android/camera_stream.cpp | 35 ++++++++++---
>>   src/android/camera_stream.h   | 40 +++++++++++++++
>>   4 files changed, 170 insertions(+), 18 deletions(-)
>>
>> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
>> index fea59ce6..08237187 100644
>> --- a/src/android/camera_device.cpp
>> +++ b/src/android/camera_device.cpp
>> @@ -960,6 +960,7 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
>>   		 * and add them to the Request if required.
>>   		 */
>>   		FrameBuffer *buffer = nullptr;
>> +		descriptor.srcInternalBuffer_ = nullptr;
>>   		switch (cameraStream->type()) {
>>   		case CameraStream::Type::Mapped:
>>   			/*
>> @@ -990,6 +991,7 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
>>   			 * once it has been processed.
>>   			 */
>>   			buffer = cameraStream->getBuffer();
>> +			descriptor.srcInternalBuffer_ = buffer;
>>   			LOG(HAL, Debug) << ss.str() << " (internal)";
>>   			break;
>>   		}
>> @@ -1149,25 +1151,95 @@ void CameraDevice::requestComplete(Request *request)
>>   			continue;
>>   		}
>>   
>> +		/*
>> +		 * Save the current context of capture result and queue the
>> +		 * descriptor. cameraStream will process asynchronously, so
>> +		 * we can only send the results back after the processing has
>> +		 * been completed.
>> +		 */
>> +		descriptor.status_ = Camera3RequestDescriptor::NOT_READY;
>> +		descriptor.resultMetadata_ = std::move(resultMetadata);
>> +		descriptor.captureResult_ = captureResult;
>> +
>> +		cameraStream->processComplete.connect(this, &CameraDevice::streamProcessingComplete);
> While signals can be connected and disconnected on demand, it often
> indicates a design issue. Is there a reason why you can't connect the
> signal when the cameraStream instance is created, and keep it connected
> for the whole lifetime of the stream ?

yes, I can keep it connected for the while lifetime of the stream too...

>
>>   		int ret = cameraStream->process(src, *buffer.buffer,
>>   						descriptor.settings_,
>> -						resultMetadata.get());
>> +						descriptor.resultMetadata_.get());
> You have a race condition here, if cameraStream->process() runs in a
> separate thread, it could complete and emit the processComplete signal
> before the code below gets executed.

ouch, that's not right order.  You are correct. I should queue the 
descriptor first before process().. Good spot!

>
>> +		std::unique_ptr<Camera3RequestDescriptor> reqDescriptor =
>> +			std::make_unique<Camera3RequestDescriptor>();
>> +		*reqDescriptor = std::move(descriptor);
>> +		queuedDescriptor_.push_back(std::move(reqDescriptor));
>> +
>> +		return;
>> +	}
>> +
>> +	if (queuedDescriptor_.empty()) {
>> +		captureResult.result = resultMetadata->get();
>> +		callbacks_->process_capture_result(callbacks_, &captureResult);
>> +	} else {
>> +		/*
>> +		 * Previous capture results waiting to be sent to framework
>> +		 * hence, queue the current capture results as well. After that,
>> +		 * check if any results are ready to be sent back to the
>> +		 * framework.
>> +		 */
>> +		descriptor.status_ = Camera3RequestDescriptor::READY_SUCCESS;
>> +		descriptor.resultMetadata_ = std::move(resultMetadata);
>> +		descriptor.captureResult_ = captureResult;
>> +		std::unique_ptr<Camera3RequestDescriptor> reqDescriptor =
>> +			std::make_unique<Camera3RequestDescriptor>();
>> +		*reqDescriptor = std::move(descriptor);
>> +		queuedDescriptor_.push_back(std::move(reqDescriptor));
>> +
>> +		while (!queuedDescriptor_.empty()) {
>> +			std::unique_ptr<Camera3RequestDescriptor> &d = queuedDescriptor_.front();
>> +			if (d->status_ != Camera3RequestDescriptor::NOT_READY) {
>> +				d->captureResult_.result = d->resultMetadata_->get();
>> +				callbacks_->process_capture_result(callbacks_, &(d->captureResult_));
>> +				queuedDescriptor_.pop_front();
>> +			} else {
>> +				break;
>> +			}
>> +		}
>> +	}
>> +}
>> +
>> +void CameraDevice::streamProcessingComplete(CameraStream *cameraStream,
>> +					    CameraStream::ProcessStatus status,
>> +					    CameraMetadata *resultMetadata)
>> +{
>> +	cameraStream->processComplete.disconnect(this, &CameraDevice::streamProcessingComplete);
>> +
>> +	for (auto &d : queuedDescriptor_) {
>> +		if (d->resultMetadata_.get() != resultMetadata)
>> +			continue;
>> +
>>   		/*
>>   		 * Return the FrameBuffer to the CameraStream now that we're
>>   		 * done processing it.
>>   		 */
>>   		if (cameraStream->type() == CameraStream::Type::Internal)
>> -			cameraStream->putBuffer(src);
>> -
>> -		if (ret) {
>> -			buffer.status = CAMERA3_BUFFER_STATUS_ERROR;
>> -			notifyError(descriptor.frameNumber_, buffer.stream,
>> -				    CAMERA3_MSG_ERROR_BUFFER);
>> +			cameraStream->putBuffer(d->srcInternalBuffer_);
>> +
>> +		if (status == CameraStream::ProcessStatus::Success) {
>> +			d->status_ = Camera3RequestDescriptor::READY_SUCCESS;
>> +		} else {
>> +			/* \todo this is clumsy error handling, be better. */
>> +			d->status_ = Camera3RequestDescriptor::READY_FAILED;
>> +			for (camera3_stream_buffer_t &buffer : d->buffers_) {
>> +				CameraStream *cs = static_cast<CameraStream *>(buffer.stream->priv);
>> +
>> +				if (cs->camera3Stream().format != HAL_PIXEL_FORMAT_BLOB)
>> +					continue;
>> +
>> +				buffer.status = CAMERA3_BUFFER_STATUS_ERROR;
>> +				notifyError(d->frameNumber_, buffer.stream,
>> +					    CAMERA3_MSG_ERROR_BUFFER);
>> +			}
>>   		}
>> -	}
>>   
>> -	captureResult.result = resultMetadata->get();
>> -	callbacks_->process_capture_result(callbacks_, &captureResult);
>> +		return;
>> +	}
>>   }
>>   
>>   std::string CameraDevice::logPrefix() const
>> diff --git a/src/android/camera_device.h b/src/android/camera_device.h
>> index dd9aebba..4e4bb970 100644
>> --- a/src/android/camera_device.h
>> +++ b/src/android/camera_device.h
>> @@ -7,6 +7,7 @@
>>   #ifndef __ANDROID_CAMERA_DEVICE_H__
>>   #define __ANDROID_CAMERA_DEVICE_H__
>>   
>> +#include <deque>
>>   #include <map>
>>   #include <memory>
>>   #include <mutex>
>> @@ -81,6 +82,20 @@ private:
>>   		std::vector<std::unique_ptr<libcamera::FrameBuffer>> frameBuffers_;
>>   		CameraMetadata settings_;
>>   		std::unique_ptr<CaptureRequest> request_;
>> +
>> +		/*
>> +		 * Only set when a capture result needs to be queued before
>> +		 * completion via process_capture_result() callback.
>> +		 */
>> +		enum processingStatus {
>> +			NOT_READY,
>> +			READY_SUCCESS,
>> +			READY_FAILED,
>> +		};
>> +		std::unique_ptr<CameraMetadata> resultMetadata_;
>> +		camera3_capture_result_t captureResult_;
>> +		libcamera::FrameBuffer *srcInternalBuffer_;
>> +		processingStatus status_;
>>   	};
>>   
>>   	enum class State {
>> @@ -100,7 +115,9 @@ private:
>>   	int processControls(Camera3RequestDescriptor *descriptor);
>>   	std::unique_ptr<CameraMetadata> getResultMetadata(
>>   		const Camera3RequestDescriptor &descriptor) const;
>> -
>> +	void streamProcessingComplete(CameraStream *cameraStream,
>> +				      CameraStream::ProcessStatus status,
>> +				      CameraMetadata *resultMetadata);
>>   	unsigned int id_;
>>   	camera3_device_t camera3Device_;
>>   
>> @@ -128,6 +145,8 @@ private:
>>   	int orientation_;
>>   
>>   	CameraMetadata lastSettings_;
>> +
>> +	std::deque<std::unique_ptr<Camera3RequestDescriptor>> queuedDescriptor_;
>>   };
>>   
>>   #endif /* __ANDROID_CAMERA_DEVICE_H__ */
>> diff --git a/src/android/camera_stream.cpp b/src/android/camera_stream.cpp
>> index bdcc7cf9..d5981c0e 100644
>> --- a/src/android/camera_stream.cpp
>> +++ b/src/android/camera_stream.cpp
>> @@ -55,6 +55,7 @@ CameraStream::CameraStream(CameraDevice *const cameraDevice,
>>   		 * is what we instantiate here.
>>   		 */
>>   		postProcessor_ = std::make_unique<PostProcessorJpeg>(cameraDevice_);
>> +		ppWorker_ = std::make_unique<PostProcessorWorker>(postProcessor_.get());
>>   	}
>>   
>>   	if (type == Type::Internal) {
>> @@ -108,22 +109,41 @@ int CameraStream::process(const libcamera::FrameBuffer *source,
>>   	if (!postProcessor_)
>>   		return 0;
>>   
>> -	/*
>> -	 * \todo Buffer mapping and processing should be moved to a
>> -	 * separate thread.
>> -	 */
>> -	CameraBuffer dest(camera3Dest, PROT_READ | PROT_WRITE);
>> -	if (!dest.isValid()) {
>> +	/* \todo Should buffer mapping be moved to post-processor's thread? */
> Good point :-)

umm, what does this indicate?

I prefer to leave the mapping as it is, for now, with \todo '?' 
converted to a statement. We can decide on it later and can be done on top?

>
>> +	dest_ = std::make_unique<CameraBuffer>(camera3Dest, PROT_READ | PROT_WRITE);
>> +	if (!dest_->isValid()) {
>>   		LOG(HAL, Error) << "Failed to map android blob buffer";
>>   		return -EINVAL;
>>   	}
>>   
>> -	return postProcessor_->process(source, &dest, requestMetadata, resultMetadata);
>> +	if (type() != CameraStream::Type::Internal) {
>> +		/*
>> +		 * The source buffer is owned by Request object which can be
>> +		 * reused for future capture request. Since post-processor will
>> +		 * run asynchrnously, we need to copy the source buffer and use
>> +		 * it as source data for post-processor.
>> +		 */
>> +		srcPlanes_.clear();
>> +		for (const auto &plane : source->planes())
>> +			srcPlanes_.push_back(plane);
>> +
>> +		srcFramebuffer_ = std::make_unique<libcamera::FrameBuffer>(srcPlanes_);
>> +		source = srcFramebuffer_.get();
>> +	}
> This will break if the next frame needs to be post-processed before the
> current frame completes. You need a queue of frames to post-process, not
> just one. One option would be to store the data in
> Camera3RequestDescriptor instead, as that the full context for a

Ah, yes, this is thin ice.

But exposing Camera3RequestDescriptor data, I am not sure... maybe we 
can just expose the frame data /only/. Something I'll meditate upon... 
but I agree with the data storage location in Camera3RequestDescriptor

Did you notice, I am /copying/ the frame buffer 'source' ? :-) Does the 
associated comment makes sense to you? Asking because I and Kieran had a 
quite a bit of discussion on this right now.

... In my head right now, I see that 'source' comes from 
libcamera::Request in CameraDevice::requestComplete. So after signal 
handler, the libcamera::Request is disposed or reused (I don't know, I 
haven't seen the 'higher' context of what's happening with request). In 
both cases, the 'source' framebuffer (used for post-processing) might 
get invalidated/segfault, hence the reason of copy the framebuffer for 
post-processing purposes.

This is the context in my head right now, I might be wrong. Does it make 
sense? I'll try to look around for the 'higher' context of the request, 
whether or not it's getting reused or something else

> request, including all its frames. Some refactoring would be required,
> as the structure is current private to CameraDevice.
>
>> +
>> +	ppWorker_->start();
>> +
>> +	return postProcessor_->invokeMethod(&PostProcessor::process,
>> +					    ConnectionTypeQueued,
>> +					    source, dest_.get(),
>> +					    requestMetadata, resultMetadata);
> Is the CameraStream actually the right place to spawn a thread and
> dispatch processing jobs ? Brainstorming a bit, I wonder if this
> shouldn't be moved to the PostProcessor class instead, as the need for a
> thread depends on the type of post-processor. The current software-based
> post-processors definitely need a thread, but a post-processor that
> would use a hardware-accelerated JPEG encoder would be asynchronous by
> nature (posting jobs to the device and being notified of their
> completion through callbacks) and may not need to be wrapped in a
> thread. It would actually also depend on how the hardware JPEG encoder

Yes, this bit needs discussion but certainly you have more look-forward 
ideas than me. The way I envisioned is that, we have 'n' post-processors 
in future. And CameraStream is the one that creates the post-processor 
instances.

For eg. , currently we have in CameraStream constructor:

             postProcessor_ = 
std::make_unique<PostProcessorJpeg>(cameraDevice_);

In a future of multiple post-processors, I see that CameraStream decides 
which postProcessor_ it needs, depending on the context. Speaking  from 
high-level, it feels CameraStream is the one that creates and manages 
the post-processor objects. Hence, it should be the one to decide 
whether or not, to post-processor asynchronously or synchronously; And 
the post-processor implementation should just care about the 
post-processing specific bits.

So the hardware-accelerator JPEG encoder you mentioned, kind of fits 
into this design i.e. letting CameraStream decide if a post-processor 
needs to run in separate threa or not. As the h/w accerlerated encoder 
is async by nature, the CameraStream shouldn't use a thread for this 
post-processor.

There can also be multiple post-processors right, for a single 
CameraStream instance? One chaining into the another... would need to 
think about it too, I guess.

Anyway, this is my vision while writing the series. As you can already 
see, it's limited and I am open to suggestions / experimenting new 
design ideas.

> would be interfaced. If the post-processor were to deal with the kernel
> device directly, it would likely need an event loop, which we're missing
> in the HAL implementation. A thread would provide that event loop, but
> it could be best to use a single thread for all event-driven
> post-processors instead of one per post-processor. This is largely
> theoretical though, we don't have hardware-backed post-processors today.
>
>>   }
>>   
>>   void CameraStream::handlePostProcessing(PostProcessor::Status status,
>>   					CameraMetadata *resultMetadata)
>>   {
>> +	ppWorker_->exit();
> Threads won't consume much resources when they're idle, but starting and
> stopping a thread is a costly operation. I would be better to start the
> thread when starting the camera capture session.

Ok, noted.

Also I am sure I had stop() here, but seems a bad rebase/cherry-pick 
fiasco that the exit() creeped in (from earlier implementations).

>
>> +
>>   	switch (status) {
>>   	case PostProcessor::Status::Success:
>>   		processComplete.emit(this, ProcessStatus::Success, resultMetadata);
>> @@ -134,6 +154,7 @@ void CameraStream::handlePostProcessing(PostProcessor::Status status,
>>   	default:
>>   		LOG(HAL, Error) << "PostProcessor status invalid";
>>   	}
>> +	srcFramebuffer_.reset();
>>   }
>>   
>>   FrameBuffer *CameraStream::getBuffer()
>> diff --git a/src/android/camera_stream.h b/src/android/camera_stream.h
>> index d54d3f58..567d896f 100644
>> --- a/src/android/camera_stream.h
>> +++ b/src/android/camera_stream.h
>> @@ -13,7 +13,9 @@
>>   
>>   #include <hardware/camera3.h>
>>   
>> +#include <libcamera/base/object.h>
>>   #include <libcamera/base/signal.h>
>> +#include <libcamera/base/thread.h>
>>   
>>   #include <libcamera/camera.h>
>>   #include <libcamera/framebuffer.h>
>> @@ -23,6 +25,7 @@
>>   
>>   #include "post_processor.h"
>>   
>> +class CameraBuffer;
>>   class CameraDevice;
>>   class CameraMetadata;
>>   
>> @@ -135,6 +138,38 @@ public:
>>   	libcamera::Signal<CameraStream *, ProcessStatus, CameraMetadata *> processComplete;
>>   
>>   private:
>> +	class PostProcessorWorker : public libcamera::Thread
>> +	{
>> +	public:
>> +		PostProcessorWorker(PostProcessor *postProcessor)
>> +		{
>> +			postProcessor->moveToThread(this);
>> +		}
> Assuming we want to keep the thread inside the CameraStream, I think
> this needs to be reworked a little bit. The post-processor, in this
> case, isn't thread-aware, it gets run in a thread, but doesn't really
> know much about that fact. However, in patch 1/5, you make PostProcessor
> inherit from Object to enabled the invokeMethod() call. There's a design
> conflict between these two facts.
>
> I would propose instead keeping the PostProcessor unaware of threads,
> without inheriting from Object, and making PostProcessorWorker the class
> that inherits from Object and is moved to the thread. The thread could
> stay in PostProcessorWorker by inheriting from both Object and Thread,
> but that's a bit confusing, so I'd recommend adding a libcamera::Thread
> thread_ member variable to CameraStream for the thread. This is the
> design used by the IPA proxies, you can look at the generated code to

Ok, I have noted this down but not commenting right now. I think I need 
to read the code bits again and visualize the flow first

> see how it's implemented (it's more readable than the templates). The
> worker would have a process() function, called from
> CameraStream::process() using invokeMethod(), and that worker process()
> functions would simply call PostProcessor::process() directly.
>
>> +
>> +		void start()
>> +		{
>> +			/*
>> +			 * \todo [BLOCKER] !!!
>> +			 * On second shutter capture, this fails to start the
>> +			 * thread(again). No errors for first shutter capture.
>> +			 */
>> +			Thread::start();
>> +		}
> Why do you reimplement start(), if you only call the same function in
> the parent class ?

Oh, hmm :-S

I think I had moveToThread() here for post-processor but it changed over 
iterations. And I failed to cleanup the ruins..

>
>> +
>> +		void stop()
>> +		{
>> +			exit();
>> +			wait();
>> +		}
> This is never called.
>
>> +
>> +	protected:
>> +		void run() override
>> +		{
>> +			exec();
>> +			dispatchMessages();
>> +		}
>> +	};
>> +
>>   	void handlePostProcessing(PostProcessor::Status status,
>>   				  CameraMetadata *resultMetadata);
>>   
>> @@ -152,6 +187,11 @@ private:
>>   	 */
>>   	std::unique_ptr<std::mutex> mutex_;
>>   	std::unique_ptr<PostProcessor> postProcessor_;
>> +	std::unique_ptr<PostProcessorWorker> ppWorker_;
>> +
>> +	std::unique_ptr<CameraBuffer> dest_;
>> +	std::unique_ptr<libcamera::FrameBuffer> srcFramebuffer_;
>> +	std::vector<libcamera::FrameBuffer::Plane> srcPlanes_;
>>   };
>>   
>>   #endif /* __ANDROID_CAMERA_STREAM__ */