[libcamera-devel] [RFC PATCH 5/5] android: Run post processing in a separate thread
Umang Jain
umang.jain at ideasonboard.com
Fri Aug 27 16:00:56 CEST 2021
Hi Laurent,
On 8/27/21 8:34 AM, Laurent Pinchart wrote:
> Hi Umang,
>
> Thank you for the patch.
>
> I'll start with a few high-level comments.
Great, I was expecting exactly this.
>
> On Fri, Aug 27, 2021 at 03:00:16AM +0530, Umang Jain wrote:
>> In CameraStream, introduce a new private PostProcessorWorker
>> class derived from Thread. The post-processor shall run in
>> this derived thread instance asynchronously.
>>
>> Running the PostProcessor asynchronously entails that all the data
>> needed by the PostProcessor remains valid for the entire duration of
>> its run. In this instance, we currently need to ensure that:
>>
>> - the 'source' FrameBuffer for the post-processor
>> - the camera result metadata
>>
>> remain valid, with the entire context preserved. The 'source'
>> framebuffer is copied internally for streams other than internal ones
>> (since internal ones are allocated by the CameraStream class itself)
>> and the copy is passed along to the post-processor.
>>
>> Coming to preserving the context, a quick reference on how capture
>> results are sent to the Android framework: completed capture results
>> should be sent using the process_capture_result() callback, and the
>> order of sending them must match the order in which the capture
>> requests (camera3_capture_request_t) were submitted to the HAL in the
>> first place.
>>
>> In order to enforce the ordering, we need to maintain a queue. When a
>> stream requires post-processing, we save the associated capture
>> results (via Camera3RequestDescriptor) and add them to the queue. All
>> transient completed capture results are queued as well. When the
>> post-processing results are available, we can simply start dequeuing
>> the queued completed capture results and invoke the
>> process_capture_result() callback on each of them.
>>
>> The context saving part is done by Camera3RequestDescriptor. It is
>> further extended with data structures to fulfill the demands of async
>> post-processing.
> To ease review, do you think it would be possible to split this patch
> in two, with a first part that starts making use of the signals but
> doesn't move the post-processor to a separate thread ?
Ah, indeed, we can break it up: a first patch that emits and handles the
signals in the same thread, and then a separate patch introducing the
threading mechanism (why didn't I think of this?).
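The first patch would then look roughly like this (just a sketch,
assuming a ProcessStatus::Error counterpart to Success; everything stays
in the caller's thread):

    int CameraStream::process(const libcamera::FrameBuffer *source,
                              buffer_handle_t camera3Dest,
                              const CameraMetadata &requestMetadata,
                              CameraMetadata *resultMetadata)
    {
        ...
        CameraBuffer dest(camera3Dest, PROT_READ | PROT_WRITE);
        if (!dest.isValid()) {
            LOG(HAL, Error) << "Failed to map android blob buffer";
            return -EINVAL;
        }

        int ret = postProcessor_->process(source, &dest, requestMetadata,
                                          resultMetadata);
        /* Same thread for now; the worker thread comes in patch two. */
        processComplete.emit(this,
                             ret ? ProcessStatus::Error
                                 : ProcessStatus::Success,
                             resultMetadata);
        return ret;
    }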
>
>> Signed-off-by: Umang Jain <umang.jain at ideasonboard.com>
>> ---
>> src/android/camera_device.cpp | 92 +++++++++++++++++++++++++++++++----
>> src/android/camera_device.h | 21 +++++++-
>> src/android/camera_stream.cpp | 35 ++++++++++---
>> src/android/camera_stream.h | 40 +++++++++++++++
>> 4 files changed, 170 insertions(+), 18 deletions(-)
>>
>> diff --git a/src/android/camera_device.cpp b/src/android/camera_device.cpp
>> index fea59ce6..08237187 100644
>> --- a/src/android/camera_device.cpp
>> +++ b/src/android/camera_device.cpp
>> @@ -960,6 +960,7 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
>> * and add them to the Request if required.
>> */
>> FrameBuffer *buffer = nullptr;
>> + descriptor.srcInternalBuffer_ = nullptr;
>> switch (cameraStream->type()) {
>> case CameraStream::Type::Mapped:
>> /*
>> @@ -990,6 +991,7 @@ int CameraDevice::processCaptureRequest(camera3_capture_request_t *camera3Reques
>> * once it has been processed.
>> */
>> buffer = cameraStream->getBuffer();
>> + descriptor.srcInternalBuffer_ = buffer;
>> LOG(HAL, Debug) << ss.str() << " (internal)";
>> break;
>> }
>> @@ -1149,25 +1151,95 @@ void CameraDevice::requestComplete(Request *request)
>> continue;
>> }
>>
>> + /*
>> + * Save the current context of capture result and queue the
>> + * descriptor. cameraStream will process asynchronously, so
>> + * we can only send the results back after the processing has
>> + * been completed.
>> + */
>> + descriptor.status_ = Camera3RequestDescriptor::NOT_READY;
>> + descriptor.resultMetadata_ = std::move(resultMetadata);
>> + descriptor.captureResult_ = captureResult;
>> +
>> + cameraStream->processComplete.connect(this, &CameraDevice::streamProcessingComplete);
> While signals can be connected and disconnected on demand, it often
> indicates a design issue. Is there a reason why you can't connect the
> signal when the cameraStream instance is created, and keep it connected
> for the whole lifetime of the stream ?
Yes, I can keep it connected for the whole lifetime of the stream too...
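Something like this at stream creation time, I suppose (a sketch,
assuming streams_ is where CameraDevice keeps its CameraStream
instances):

    /* In CameraDevice::configureStreams(), once the streams exist. */
    for (CameraStream &cameraStream : streams_)
        cameraStream.processComplete.connect(
            this, &CameraDevice::streamProcessingComplete);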
>
>> int ret = cameraStream->process(src, *buffer.buffer,
>> descriptor.settings_,
>> - resultMetadata.get());
>> + descriptor.resultMetadata_.get());
> You have a race condition here, if cameraStream->process() runs in a
> separate thread, it could complete and emit the processComplete signal
> before the code below gets executed.
Ouch, that's not the right order, you are correct. I should queue the
descriptor before calling process(). Good spot!
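Something along these lines, based on the hunk above (a sketch; the
synchronous-failure path would still need the notifyError() handling):

    /*
     * Queue the descriptor first, so that processComplete cannot fire
     * before the descriptor is in queuedDescriptor_.
     */
    std::unique_ptr<Camera3RequestDescriptor> reqDescriptor =
        std::make_unique<Camera3RequestDescriptor>();
    *reqDescriptor = std::move(descriptor);
    Camera3RequestDescriptor *desc = reqDescriptor.get();
    queuedDescriptor_.push_back(std::move(reqDescriptor));

    int ret = cameraStream->process(src, *buffer.buffer,
                                    desc->settings_,
                                    desc->resultMetadata_.get());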
>
>> + std::unique_ptr<Camera3RequestDescriptor> reqDescriptor =
>> + std::make_unique<Camera3RequestDescriptor>();
>> + *reqDescriptor = std::move(descriptor);
>> + queuedDescriptor_.push_back(std::move(reqDescriptor));
>> +
>> + return;
>> + }
>> +
>> + if (queuedDescriptor_.empty()) {
>> + captureResult.result = resultMetadata->get();
>> + callbacks_->process_capture_result(callbacks_, &captureResult);
>> + } else {
>> + /*
>> +	 * Previous capture results are waiting to be sent to the
>> +	 * framework, hence queue the current capture results as well.
>> +	 * After that, check if any results are ready to be sent back
>> +	 * to the framework.
>> + */
>> + descriptor.status_ = Camera3RequestDescriptor::READY_SUCCESS;
>> + descriptor.resultMetadata_ = std::move(resultMetadata);
>> + descriptor.captureResult_ = captureResult;
>> + std::unique_ptr<Camera3RequestDescriptor> reqDescriptor =
>> + std::make_unique<Camera3RequestDescriptor>();
>> + *reqDescriptor = std::move(descriptor);
>> + queuedDescriptor_.push_back(std::move(reqDescriptor));
>> +
>> + while (!queuedDescriptor_.empty()) {
>> + std::unique_ptr<Camera3RequestDescriptor> &d = queuedDescriptor_.front();
>> + if (d->status_ != Camera3RequestDescriptor::NOT_READY) {
>> + d->captureResult_.result = d->resultMetadata_->get();
>> + callbacks_->process_capture_result(callbacks_, &(d->captureResult_));
>> + queuedDescriptor_.pop_front();
>> + } else {
>> + break;
>> + }
>> + }
>> + }
>> +}
>> +
>> +void CameraDevice::streamProcessingComplete(CameraStream *cameraStream,
>> + CameraStream::ProcessStatus status,
>> + CameraMetadata *resultMetadata)
>> +{
>> + cameraStream->processComplete.disconnect(this, &CameraDevice::streamProcessingComplete);
>> +
>> + for (auto &d : queuedDescriptor_) {
>> + if (d->resultMetadata_.get() != resultMetadata)
>> + continue;
>> +
>> /*
>> * Return the FrameBuffer to the CameraStream now that we're
>> * done processing it.
>> */
>> if (cameraStream->type() == CameraStream::Type::Internal)
>> - cameraStream->putBuffer(src);
>> -
>> - if (ret) {
>> - buffer.status = CAMERA3_BUFFER_STATUS_ERROR;
>> - notifyError(descriptor.frameNumber_, buffer.stream,
>> - CAMERA3_MSG_ERROR_BUFFER);
>> + cameraStream->putBuffer(d->srcInternalBuffer_);
>> +
>> + if (status == CameraStream::ProcessStatus::Success) {
>> + d->status_ = Camera3RequestDescriptor::READY_SUCCESS;
>> + } else {
>> + /* \todo this is clumsy error handling, be better. */
>> + d->status_ = Camera3RequestDescriptor::READY_FAILED;
>> + for (camera3_stream_buffer_t &buffer : d->buffers_) {
>> + CameraStream *cs = static_cast<CameraStream *>(buffer.stream->priv);
>> +
>> + if (cs->camera3Stream().format != HAL_PIXEL_FORMAT_BLOB)
>> + continue;
>> +
>> + buffer.status = CAMERA3_BUFFER_STATUS_ERROR;
>> + notifyError(d->frameNumber_, buffer.stream,
>> + CAMERA3_MSG_ERROR_BUFFER);
>> + }
>> }
>> - }
>>
>> - captureResult.result = resultMetadata->get();
>> - callbacks_->process_capture_result(callbacks_, &captureResult);
>> + return;
>> + }
>> }
>>
>> std::string CameraDevice::logPrefix() const
>> diff --git a/src/android/camera_device.h b/src/android/camera_device.h
>> index dd9aebba..4e4bb970 100644
>> --- a/src/android/camera_device.h
>> +++ b/src/android/camera_device.h
>> @@ -7,6 +7,7 @@
>> #ifndef __ANDROID_CAMERA_DEVICE_H__
>> #define __ANDROID_CAMERA_DEVICE_H__
>>
>> +#include <deque>
>> #include <map>
>> #include <memory>
>> #include <mutex>
>> @@ -81,6 +82,20 @@ private:
>> std::vector<std::unique_ptr<libcamera::FrameBuffer>> frameBuffers_;
>> CameraMetadata settings_;
>> std::unique_ptr<CaptureRequest> request_;
>> +
>> + /*
>> + * Only set when a capture result needs to be queued before
>> + * completion via process_capture_result() callback.
>> + */
>> + enum processingStatus {
>> + NOT_READY,
>> + READY_SUCCESS,
>> + READY_FAILED,
>> + };
>> + std::unique_ptr<CameraMetadata> resultMetadata_;
>> + camera3_capture_result_t captureResult_;
>> + libcamera::FrameBuffer *srcInternalBuffer_;
>> + processingStatus status_;
>> };
>>
>> enum class State {
>> @@ -100,7 +115,9 @@ private:
>> int processControls(Camera3RequestDescriptor *descriptor);
>> std::unique_ptr<CameraMetadata> getResultMetadata(
>> const Camera3RequestDescriptor &descriptor) const;
>> -
>> + void streamProcessingComplete(CameraStream *cameraStream,
>> + CameraStream::ProcessStatus status,
>> + CameraMetadata *resultMetadata);
>> unsigned int id_;
>> camera3_device_t camera3Device_;
>>
>> @@ -128,6 +145,8 @@ private:
>> int orientation_;
>>
>> CameraMetadata lastSettings_;
>> +
>> + std::deque<std::unique_ptr<Camera3RequestDescriptor>> queuedDescriptor_;
>> };
>>
>> #endif /* __ANDROID_CAMERA_DEVICE_H__ */
>> diff --git a/src/android/camera_stream.cpp b/src/android/camera_stream.cpp
>> index bdcc7cf9..d5981c0e 100644
>> --- a/src/android/camera_stream.cpp
>> +++ b/src/android/camera_stream.cpp
>> @@ -55,6 +55,7 @@ CameraStream::CameraStream(CameraDevice *const cameraDevice,
>> * is what we instantiate here.
>> */
>> postProcessor_ = std::make_unique<PostProcessorJpeg>(cameraDevice_);
>> + ppWorker_ = std::make_unique<PostProcessorWorker>(postProcessor_.get());
>> }
>>
>> if (type == Type::Internal) {
>> @@ -108,22 +109,41 @@ int CameraStream::process(const libcamera::FrameBuffer *source,
>> if (!postProcessor_)
>> return 0;
>>
>> - /*
>> - * \todo Buffer mapping and processing should be moved to a
>> - * separate thread.
>> - */
>> - CameraBuffer dest(camera3Dest, PROT_READ | PROT_WRITE);
>> - if (!dest.isValid()) {
>> + /* \todo Should buffer mapping be moved to post-processor's thread? */
> Good point :-)
Umm, what does this indicate?
I prefer to leave the mapping as it is for now, with the \todo's '?'
converted to a statement. We can decide on it later, and it can be done
on top?
>
>> + dest_ = std::make_unique<CameraBuffer>(camera3Dest, PROT_READ | PROT_WRITE);
>> + if (!dest_->isValid()) {
>> LOG(HAL, Error) << "Failed to map android blob buffer";
>> return -EINVAL;
>> }
>>
>> - return postProcessor_->process(source, &dest, requestMetadata, resultMetadata);
>> + if (type() != CameraStream::Type::Internal) {
>> + /*
>> +	 * The source buffer is owned by the Request object, which can
>> +	 * be reused for future capture requests. As the post-processor
>> +	 * runs asynchronously, we need to copy the source buffer and
>> +	 * use it as the source data for the post-processor.
>> + */
>> + srcPlanes_.clear();
>> + for (const auto &plane : source->planes())
>> + srcPlanes_.push_back(plane);
>> +
>> + srcFramebuffer_ = std::make_unique<libcamera::FrameBuffer>(srcPlanes_);
>> + source = srcFramebuffer_.get();
>> + }
> This will break if the next frame needs to be post-processed before the
> current frame completes. You need a queue of frames to post-process, not
> just one. One option would be to store the data in
> Camera3RequestDescriptor instead, as that's the full context for a
Ah, yes, this is thin ice.
But exposing Camera3RequestDescriptor data, I am not sure... maybe we
can just expose the frame data /only/. Something I'll meditate upon...
but I agree with storing the data in Camera3RequestDescriptor.
Did you notice I am /copying/ the frame buffer 'source'? :-) Does the
associated comment make sense to you? Asking because Kieran and I just
had quite a bit of discussion on this.
... In my head right now, I see that 'source' comes from
libcamera::Request in CameraDevice::requestComplete. So after the signal
handler returns, the libcamera::Request is disposed of or reused (I
don't know, I haven't seen the 'higher' context of what happens with the
request). In both cases, the 'source' framebuffer (used for
post-processing) might get invalidated and cause a segfault, hence the
reason for copying the framebuffer for post-processing purposes.
This is the context in my head right now, I might be wrong. Does it make
sense? I'll try to look around for the 'higher' context of the request,
to see whether or not it's getting reused or something else.
> request, including all its frames. Some refactoring would be required,
> as the structure is currently private to CameraDevice.
>
>> +
>> + ppWorker_->start();
>> +
>> + return postProcessor_->invokeMethod(&PostProcessor::process,
>> + ConnectionTypeQueued,
>> + source, dest_.get(),
>> + requestMetadata, resultMetadata);
> Is the CameraStream actually the right place to spawn a thread and
> dispatch processing jobs ? Brainstorming a bit, I wonder if this
> shouldn't be moved to the PostProcessor class instead, as the need for a
> thread depends on the type of post-processor. The current software-based
> post-processors definitely need a thread, but a post-processor that
> would use a hardware-accelerated JPEG encoder would be asynchronous by
> nature (posting jobs to the device and being notified of their
> completion through callbacks) and may not need to be wrapped in a
> thread. It would actually also depend on how the hardware JPEG encoder
Yes, this bit needs discussion, but certainly you have more
forward-looking ideas than me. The way I envisioned it is that we have
'n' post-processors in the future, and CameraStream is the one that
creates the post-processor instances.
For example, currently we have in the CameraStream constructor:

    postProcessor_ = std::make_unique<PostProcessorJpeg>(cameraDevice_);

In a future with multiple post-processors, I see CameraStream deciding
which postProcessor_ it needs, depending on the context. Speaking from a
high level, it feels like CameraStream is the one that creates and
manages the post-processor objects. Hence, it should be the one to
decide whether to run a post-processor asynchronously or synchronously,
and the post-processor implementation should just care about the
post-processing specific bits.
So the hardware-accelerated JPEG encoder you mentioned kind of fits into
this design, i.e. letting CameraStream decide if a post-processor needs
to run in a separate thread or not. As the hardware-accelerated encoder
is async by nature, CameraStream shouldn't use a thread for that
post-processor.
There can also be multiple post-processors for a single CameraStream
instance, right? One chaining into another... I would need to think
about that too, I guess.
Anyway, this is my vision while writing the series. As you can already
see, it's limited, and I am open to suggestions / experimenting with new
design ideas.
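To make that concrete, what I have in mind is roughly this (purely
hypothetical, needsThread() is a made-up hook on PostProcessor):

    /* In the CameraStream constructor. */
    postProcessor_ = std::make_unique<PostProcessorJpeg>(cameraDevice_);
    if (postProcessor_->needsThread())
        ppWorker_ = std::make_unique<PostProcessorWorker>(
            postProcessor_.get());

A hardware-backed post-processor would simply report that it doesn't
need a thread, and CameraStream would skip the worker.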
> would be interfaced. If the post-processor were to deal with the kernel
> device directly, it would likely need an event loop, which we're missing
> in the HAL implementation. A thread would provide that event loop, but
> it could be best to use a single thread for all event-driven
> post-processors instead of one per post-processor. This is largely
> theoretical though, we don't have hardware-backed post-processors today.
>
>> }
>>
>> void CameraStream::handlePostProcessing(PostProcessor::Status status,
>> CameraMetadata *resultMetadata)
>> {
>> + ppWorker_->exit();
> Threads won't consume much resources when they're idle, but starting and
> stopping a thread is a costly operation. It would be better to start the
> thread when starting the camera capture session.
Ok, noted.
Also, I am sure I had stop() here, but it seems a bad rebase/cherry-pick
fiasco let the exit() creep in (from earlier implementations).
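For the session-long thread, I guess something like this (a sketch,
assuming CameraStream::configure() is the right hook for the start of a
capture session; the matching stop() would go in the teardown path):

    int CameraStream::configure()
    {
        ...
        /* Start the worker once per capture session, not per request. */
        if (ppWorker_)
            ppWorker_->start();

        return 0;
    }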
>
>> +
>> switch (status) {
>> case PostProcessor::Status::Success:
>> processComplete.emit(this, ProcessStatus::Success, resultMetadata);
>> @@ -134,6 +154,7 @@ void CameraStream::handlePostProcessing(PostProcessor::Status status,
>> default:
>> LOG(HAL, Error) << "PostProcessor status invalid";
>> }
>> + srcFramebuffer_.reset();
>> }
>>
>> FrameBuffer *CameraStream::getBuffer()
>> diff --git a/src/android/camera_stream.h b/src/android/camera_stream.h
>> index d54d3f58..567d896f 100644
>> --- a/src/android/camera_stream.h
>> +++ b/src/android/camera_stream.h
>> @@ -13,7 +13,9 @@
>>
>> #include <hardware/camera3.h>
>>
>> +#include <libcamera/base/object.h>
>> #include <libcamera/base/signal.h>
>> +#include <libcamera/base/thread.h>
>>
>> #include <libcamera/camera.h>
>> #include <libcamera/framebuffer.h>
>> @@ -23,6 +25,7 @@
>>
>> #include "post_processor.h"
>>
>> +class CameraBuffer;
>> class CameraDevice;
>> class CameraMetadata;
>>
>> @@ -135,6 +138,38 @@ public:
>> libcamera::Signal<CameraStream *, ProcessStatus, CameraMetadata *> processComplete;
>>
>> private:
>> + class PostProcessorWorker : public libcamera::Thread
>> + {
>> + public:
>> + PostProcessorWorker(PostProcessor *postProcessor)
>> + {
>> + postProcessor->moveToThread(this);
>> + }
> Assuming we want to keep the thread inside the CameraStream, I think
> this needs to be reworked a little bit. The post-processor, in this
> case, isn't thread-aware, it gets run in a thread, but doesn't really
> know much about that fact. However, in patch 1/5, you make PostProcessor
> inherit from Object to enable the invokeMethod() call. There's a design
> conflict between these two facts.
>
> I would propose instead keeping the PostProcessor unaware of threads,
> without inheriting from Object, and making PostProcessorWorker the class
> that inherits from Object and is moved to the thread. The thread could
> stay in PostProcessorWorker by inheriting from both Object and Thread,
> but that's a bit confusing, so I'd recommend adding a libcamera::Thread
> thread_ member variable to CameraStream for the thread. This is the
> design used by the IPA proxies, you can look at the generated code to
Ok, I have noted this down, but I won't comment in detail right now. I
think I need to read the code bits again and visualize the flow first.
> see how it's implemented (it's more readable than the templates). The
> worker would have a process() function, called from
> CameraStream::process() using invokeMethod(), and that worker process()
> function would simply call PostProcessor::process() directly.
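That said, to check that I've understood the proposal, a quick sketch of
what I think you mean (rough, signatures from memory):

    class PostProcessorWorker : public libcamera::Object
    {
    public:
        PostProcessorWorker(PostProcessor *postProcessor)
            : postProcessor_(postProcessor)
        {
        }

        void process(const libcamera::FrameBuffer *source,
                     CameraBuffer *dest,
                     const CameraMetadata &requestMetadata,
                     CameraMetadata *resultMetadata)
        {
            /* Runs in the thread's context; the PostProcessor itself
             * stays thread-unaware. */
            postProcessor_->process(source, dest, requestMetadata,
                                    resultMetadata);
        }

    private:
        PostProcessor *postProcessor_;
    };

    /* In CameraStream: a plain Thread member, the worker moved to it. */
    worker_->moveToThread(&thread_);

    /* And CameraStream::process() would dispatch with: */
    worker_->invokeMethod(&PostProcessorWorker::process,
                          ConnectionTypeQueued, source, dest_.get(),
                          requestMetadata, resultMetadata);

Is that what you had in mind?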
>
>> +
>> + void start()
>> + {
>> + /*
>> + * \todo [BLOCKER] !!!
>> + * On second shutter capture, this fails to start the
>> + * thread(again). No errors for first shutter capture.
>> + */
>> + Thread::start();
>> + }
> Why do you reimplement start(), if you only call the same function in
> the parent class ?
Oh, hmm :-S
I think I had moveToThread() here for the post-processor, but it changed
over iterations and I failed to clean up the ruins...
>
>> +
>> + void stop()
>> + {
>> + exit();
>> + wait();
>> + }
> This is never called.
>
>> +
>> + protected:
>> + void run() override
>> + {
>> + exec();
>> + dispatchMessages();
>> + }
>> + };
>> +
>> void handlePostProcessing(PostProcessor::Status status,
>> CameraMetadata *resultMetadata);
>>
>> @@ -152,6 +187,11 @@ private:
>> */
>> std::unique_ptr<std::mutex> mutex_;
>> std::unique_ptr<PostProcessor> postProcessor_;
>> + std::unique_ptr<PostProcessorWorker> ppWorker_;
>> +
>> + std::unique_ptr<CameraBuffer> dest_;
>> + std::unique_ptr<libcamera::FrameBuffer> srcFramebuffer_;
>> + std::vector<libcamera::FrameBuffer::Plane> srcPlanes_;
>> };
>>
>> #endif /* __ANDROID_CAMERA_STREAM__ */