[PATCH v2 00/37] Add GLES 2.0 GPUISP to libcamera
Bryan O'Donoghue
bryan.odonoghue at linaro.org
Sun Aug 24 02:48:12 CEST 2025
v2:
This version 2 is an incomplete update with respect to previous review
feedback, which ordinarily I would not publish. However, given that OSSEU
starts on Monday and we have a talk about this topic, in addition to some
pretty good progress in the interregnum, I thought a v2 would be
appropriate.
- V2 drops use of GBM surface in favour of generating a framebuffer from
the dma-buf handle, called render-to-texture.
The conversion away from GBM surface + memcpy(), including the associated
cache invalidate, has a dramatic effect on GPUISP performance.
Some rough stats for a Qualcomm sm8250 "kona" device with an imx517
sensor @ 4048 x 3040 ABGR8888 - debug builds
CPUISP + CCM:
2 FPS - CPU usage > 100% of a single core - pulling about 9 Watts
GPUISP v1 + CCM:
14 FPS - power not measured
GPUISP v2 + CCM:
30 FPS - sensor linerate - CPU usage ~ 70 % pulling 8 Watts.
Milan Zamazal has reported a TI AM69 + imx219 - unknown resolution
CPUISP 4 FPS
GPUISP v1 - 2 or 3 FPS
GPUISP v2 - 15 FPS == sensor linerate
In other words for these boards we can hit linerate with GPUISP + 3A +
CCM.
- Drop GBM surface rendering
- Drop swapbuffers
- Use eglCreateImageKHR to directly render into the output dma-buf buffer
eglCreateImageKHR lets you specify the FOURCC of the texture which means
we can create the texture in the uncompressed target output pixel format
we want.
- Fix stride calculation to 256 bytes
Laurent and Maxime explained to me about GPU stride alignments being
tribal wisdom and that 256 bytes is a good cross-platform value.
This helped to get the render-to-texture command right.
- A synchronous blocking wait is used to ensure GPU operations have
completed. Laurent wants this to be made async.
At the moment it's not clear to me that the eglWaitSyncKHR is really
required, and in any case it doesn't seem to have any performance impact.
But this part is still TBD - I've included the sync wait for simplicity
and safety.
- A Debayer::stop() method has been introduced to ensure we call
eglDestroySyncKHR when the eGL context is valid, as opposed to in the
callchain of destructors triggering eGL::~eGL();
- stats move constructor call chain dropped - Barnabás
- Incorporates Milan's area-of-interest constraint for Bayer stats
i.e. squashes his v3 update into debayer_egl.cpp directly
- Moves ALIGN_TO into a common area to facilitate its reuse in
egl.cpp
- Rebases on 0.5.2
- There are a number of known checks failing on the CI loop right now
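For reference, the 256 byte stride rounding from the list above can be
sketched roughly as follows. This is only an illustration: alignTo() and
strideFor() are made-up names, not the ALIGN_TO macro moved by this series,
and it assumes a power-of-two alignment.

```cpp
#include <cassert>
#include <cstdint>

/* Round value up to the next multiple of alignment (a power of two). */
static inline uint32_t alignTo(uint32_t value, uint32_t alignment)
{
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return (value + alignment - 1) & ~(alignment - 1);
}

/* Bytes per line for a frame, padded to a 256 byte boundary. */
static inline uint32_t strideFor(uint32_t width, uint32_t bytesPerPixel)
{
	return alignTo(width * bytesPerPixel, 256);
}
```

For the 4048 pixel wide, 32 bit frames quoted above this pads each line
from 16192 to 16384 bytes.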
Link to v1: https://lists.libcamera.org/pipermail/libcamera-devel/2025-June/050692.html
v1:
This series introduces a GLES 2.0 GPU ISP to libcamera.
We have had extensive discussions and meetings about this topic over the
last year or so.
As an overview we want to start to move as much processing of software_isp
into the GPU as possible. This is especially advantageous when we are
talking about processing a framebuffer's worth of pixels as quickly as
possible.
The decision to use GLES 2.0 instead of say Vulkan stems from a desire to
support as much in the way of older hardware as possible and the fact we
already have upstream GLES 2.0 fragment shaders to do debayer.
Generally the approach is
- Move the fragment shaders out of qcam and into a common location
- Update the existing SoftwareISP Debayer/DebayerCPU pair to facilitate
addition of a new class DebayerEGL.
- Introduce that class
- Then do progressive change of the shaders and DebayerEGL class to make
the modifications as transparent as possible in the git log.
- Reuse as much of the SoftIPA data-structures and logic as possible.
- Consume the data from SoftIPA in the Debayer Shaders so that CPUISP and
GPUISP give similar - hopefully the same - results, but with GPUISP going
faster.
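As a rough illustration of what consuming the CCM means for a single pixel,
here is a CPU-side reference sketch. The Ccm and Rgb types and applyCcm()
are hypothetical and exist only for this example; they are not the
DebayerParams layout from the series. It assumes a row-major 3x3 matrix and
normalised float RGB, with the shader expected to do an equivalent multiply
per fragment.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

/* Hypothetical types, for illustration only. */
using Ccm = std::array<std::array<float, 3>, 3>; /* row-major 3x3 matrix */
using Rgb = std::array<float, 3>;                /* normalised 0.0 to 1.0 */

/* Compute out = ccm * in, clamping each channel to the 0.0 to 1.0 range. */
static inline Rgb applyCcm(const Ccm &ccm, const Rgb &in)
{
	Rgb out{};

	for (std::size_t row = 0; row < 3; row++) {
		float acc = 0.0f;
		for (std::size_t col = 0; col < 3; col++)
			acc += ccm[row][col] * in[col];
		out[row] = std::min(1.0f, std::max(0.0f, acc));
	}

	return out;
}
```

Running the same input through a reference like this and through the
shader is one way to check that CPUISP and GPUISP agree.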
In order to get untiled and uncompressed pixel data out of the GPU
framebuffer we need to tell the GPU how to store the data it is writing to
that framebuffer. GPUs can store their framebuffer data in tiled or even
compressed formats which is why the naive approach of running your fragment
shader and then using glReadPixels(GL_RGBA); will be horrendously slow as
glReadPixels must convert from the internal GPU format to the requested
output format - an operation that for me takes ~ 10 milliseconds per frame.
Instead we get the GPU to store its data as ARGB8888 swap buffers and
memcpy() from the swapped buffer to our output frame. Right now this series
supports 32 bit output formats only.
The memcpy() also entails flushing the cache of the target buffer as per
the terms of the dma-buf software contract.
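The cache maintenance mentioned above is normally expressed with the
DMA_BUF_IOCTL_SYNC ioctl from <linux/dma-buf.h>. A minimal sketch, assuming
the buffer has already been mmap()ed for CPU access; dmaBufSync() and
copyToDmaBuf() are illustrative helpers, not functions from this series.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <linux/dma-buf.h>
#include <sys/ioctl.h>

/* Issue one sync ioctl, restarting on EINTR/EAGAIN. Returns 0 or -errno. */
static int dmaBufSync(int fd, uint64_t flags)
{
	struct dma_buf_sync sync = {};
	sync.flags = flags;

	int ret;
	do {
		ret = ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);
	} while (ret < 0 && (errno == EINTR || errno == EAGAIN));

	return ret < 0 ? -errno : 0;
}

/* Bracket a CPU write into a mapped dma-buf with SYNC_START / SYNC_END. */
static int copyToDmaBuf(int fd, void *dst, const void *src, std::size_t len)
{
	int ret = dmaBufSync(fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE);
	if (ret < 0)
		return ret;

	memcpy(dst, src, len);

	return dmaBufSync(fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE);
}
```

Both the copy and the two ioctl calls are per-frame CPU work on the output
buffer.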
This leads us onto the main outstanding TODOs
- 24 bit GBM buffer support leading
- 24 bit output framebuffer support
- Surfaceless GBM and eGL context with no swapbuffer
- Render to texture
If we render directly into an output buffer provided to the GPU, we
will not need to memcpy() to the output buffer,
nor will we need to invalidate the output buffer cache.
- eglCreateImageKHR for the texture upload.
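For reference, importing a dma-buf with eglCreateImageKHR() relies on an
attribute list from the EGL_EXT_image_dma_buf_import extension. The sketch
below only builds that list, so it can be read and compiled without an EGL
context. The numeric constants are local stand-ins mirroring <EGL/egl.h>
and <EGL/eglext.h> (real code should include those headers instead), and
fourcc() and dmaBufImageAttribs() are illustrative names, not code from
this series. A single-plane buffer at offset 0 is assumed.

```cpp
#include <cstdint>
#include <vector>

/* Local stand-ins for the EGL tokens; use the EGL headers in real code. */
constexpr int32_t kEglNone = 0x3038;
constexpr int32_t kEglHeight = 0x3056;
constexpr int32_t kEglWidth = 0x3057;
constexpr int32_t kEglLinuxDrmFourccExt = 0x3271;
constexpr int32_t kEglDmaBufPlane0FdExt = 0x3272;
constexpr int32_t kEglDmaBufPlane0OffsetExt = 0x3273;
constexpr int32_t kEglDmaBufPlane0PitchExt = 0x3274;

/* Pack four characters into a little-endian DRM FOURCC code. */
constexpr uint32_t fourcc(char a, char b, char c, char d)
{
	return static_cast<uint32_t>(a) | (static_cast<uint32_t>(b) << 8) |
	       (static_cast<uint32_t>(c) << 16) | (static_cast<uint32_t>(d) << 24);
}

/* Attribute list describing a single-plane dma-buf to import. */
static std::vector<int32_t> dmaBufImageAttribs(int fd, int32_t width,
					       int32_t height, int32_t stride,
					       uint32_t format)
{
	return {
		kEglWidth, width,
		kEglHeight, height,
		kEglLinuxDrmFourccExt, static_cast<int32_t>(format),
		kEglDmaBufPlane0FdExt, fd,
		kEglDmaBufPlane0OffsetExt, 0,
		kEglDmaBufPlane0PitchExt, stride,
		kEglNone,
	};
}
```

The resulting list would be handed to eglCreateImageKHR() with the
EGL_LINUX_DMA_BUF_EXT target and a NULL client buffer, for example with
fourcc('A', 'R', '2', '4') for ARGB8888.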
This list is of the colour "make it go faster", not "make it work", which
is why we are starting to submit a v1 for discussion, in the full
realisation that it will have to go through several cycles of review, giving
us the opportunity to fix:
- Doxygen is missing for new classes and methods
- Some of the pipelines don't complete in gitlab
- 24 bit output seems doable before merge
- Render to texture perhaps even too
For me on my Qualcomm hardware GPUISP works very well: I get 30fps in qcam
with about 75% CPU usage versus > 100%. cam goes faster, which to me
implies a good bit of time is being consumed in qcam itself.
The series starts out with fixes and updates from Hans and finishes with
shader modifications from Milan, both of whom, along with Kieran,
Laurent and Maxime, I'd like to thank for being so helpful and patient.
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue at linaro.org>
---
Bryan O'Donoghue (28):
libcamera: MappedFrameBuffer: Add MappedFrameBuffer::getPlaneFD()
libcamera: software_isp: Move useful items from DebayerCpu to Debayer base class
libcamera: software_isp: Move Bayer params init from DebayerCpu to Debayer
libcamera: software_isp: Move param select code to Debayer base class
libcamera: software_isp: Move DMA Sync code to Debayer base class
libcamera: software_isp: Move isStandardBayerOrder to base class
libcamera: software_isp: Start the ISP thread in configure
libcamera: software_isp: Move configure to worker thread
libcamera: software_isp: debayer: Make the debayer_ object of type class Debayer not DebayerCpu
libcamera: software_isp: debayer: Extend DebayerParams struct to hold a copy of per-frame CCM values
libcamera: software_isp: debayer: Introduce a stop() callback to the debayer object
libcamera: shaders: Move GL shader programs to src/libcamera/assets/shader
utils: gen-shader-headers: Add a utility to generate headers from shaders
meson: Automatically generate glsl_shaders.h from specified shader programs
libcamera: software_isp: ccm: Populate CCM table to Debayer params structure
libcamera: software_isp: lut: Make gain corrected CCM in lut.cpp available in debayer params
libcamera: software_isp: gbm: Add in a GBM helper class for GPU surface access
libcamera: utils: Move ALIGN_TO from camera_metadata.c to utils.h
libcamera: software_isp: egl: Introduce an eGL base helper class
libcamera: software_isp: debayer_egl: Add an eGL debayer class
libcamera: software_isp: debayer_egl: Make DebayerEGL an environment option
libcamera: shaders: Use highp not mediump for float precision
libcamera: shaders: Extend debayer shaders to apply RGB gain values on output
libcamera: software_isp: Switch on uncalibrated CCM to validate eGLDebayer
libcamera: software_isp: Make isStandardBayerOrder static
libcamera: software_isp: debayer_cpu: Make getInputConfig and getOutputConfig static
libcamera: shaders: Extend bayer shaders to support swapping R and B on output
libcamera: software_isp: Add a gpuisp todo list
Hans de Goede (5):
libcamera: swstats_cpu: Update statsProcessFn() / processLine0() documentation
libcamera: swstats_cpu: Drop patternSize_ documentation
libcamera: swstats_cpu: Move header to libcamera/internal/software_isp
libcamera: software_isp: Move benchmark code to its own class
libcamera: swstats_cpu: Add processFrame() method
Milan Zamazal (4):
libcamera: shaders: Fix neighbouring positions in 8-bit debayering
libcamera: software_isp: GPU support for unpacked 10/12-bit formats
libcamera: shaders: Rename bayer_8 to bayer_unpacked
libcamera: software_isp: Reduce statistics image area
include/libcamera/base/utils.h | 3 +
include/libcamera/internal/egl.h | 133 +++++
include/libcamera/internal/gbm.h | 39 ++
include/libcamera/internal/mapped_framebuffer.h | 4 +
include/libcamera/internal/meson.build | 11 +
.../libcamera/internal/shaders}/RGB.frag | 2 +-
.../libcamera/internal/shaders}/YUV_2_planes.frag | 2 +-
.../libcamera/internal/shaders}/YUV_3_planes.frag | 2 +-
.../libcamera/internal/shaders}/YUV_packed.frag | 2 +-
.../internal/shaders}/bayer_1x_packed.frag | 62 +-
.../libcamera/internal/shaders/bayer_unpacked.frag | 78 ++-
.../libcamera/internal/shaders/bayer_unpacked.vert | 8 +-
.../libcamera/internal/shaders}/identity.vert | 0
include/libcamera/internal/shaders/meson.build | 10 +
.../libcamera/internal/software_isp/benchmark.h | 36 ++
.../internal/software_isp/debayer_params.h | 7 +
.../libcamera/internal/software_isp/meson.build | 2 +
.../libcamera/internal/software_isp/software_isp.h | 5 +-
.../libcamera/internal}/software_isp/swstats_cpu.h | 12 +
src/android/metadata/camera_metadata.c | 4 +-
src/apps/qcam/assets/shader/shaders.qrc | 16 +-
src/apps/qcam/viewfinder_gl.cpp | 70 +--
src/ipa/simple/algorithms/ccm.cpp | 4 +-
src/ipa/simple/algorithms/lut.cpp | 1 +
src/ipa/simple/data/uncalibrated.yaml | 12 +-
src/libcamera/egl.cpp | 408 +++++++++++++
src/libcamera/gbm.cpp | 61 ++
src/libcamera/mapped_framebuffer.cpp | 10 +
src/libcamera/meson.build | 34 ++
src/libcamera/software_isp/benchmark.cpp | 93 +++
src/libcamera/software_isp/debayer.cpp | 61 ++
src/libcamera/software_isp/debayer.h | 42 +-
src/libcamera/software_isp/debayer_cpu.cpp | 88 +--
src/libcamera/software_isp/debayer_cpu.h | 44 +-
src/libcamera/software_isp/debayer_egl.cpp | 628 +++++++++++++++++++++
src/libcamera/software_isp/debayer_egl.h | 171 ++++++
src/libcamera/software_isp/gpuisp-todo.txt | 61 ++
src/libcamera/software_isp/meson.build | 9 +
src/libcamera/software_isp/software_isp.cpp | 40 +-
src/libcamera/software_isp/swstats_cpu.cpp | 89 ++-
utils/gen-shader-header.py | 38 ++
utils/gen-shader-headers.sh | 44 ++
utils/meson.build | 2 +
43 files changed, 2236 insertions(+), 212 deletions(-)
---
base-commit: 1bd66f54a6bc928f99e321630f43d200df4d3579
change-id: 20250823-b4-v0-5-2-gpuisp-v2-a-d40b3b78d741
Best regards,
--
Bryan O'Donoghue <bryan.odonoghue at linaro.org>