SIMD black level correction -- Re: Soft ISP TODO list
Pavel Machek
pavel at ucw.cz
Tue Apr 23 08:59:53 CEST 2024
Hi!
> > > > > - If we split the processing into pre-bayer and post-bayer parts, we should
> > > > > probably work with uint16_t or floats, which may have an impact on performance.
> > > > >
> > > > > - Pavel couldn't get better performance by using SIMD CPU instructions for
> > > > > debayering. Applying a CCM matrix may be a different matter. Anyway, SIMD on
> > > > > CPU is hard to use and may differ between architectures, so the question is
> > > > > whether it's worth investing in it.
> > >
> > > Good question :-)
> >
> > Oh, so the good news is that you write SIMD code once with gcc intrinsics, and
> > gcc does its magic. You don't have to know assembly for that, but it
> > certainly helps to look at the assembly to check that it looks reasonable.
>
> There are also potentially interesting helper libraries such as
> https://github.com/vectorclass/version2 (I haven't checked the license
> compatibility).
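The "write it once" approach discussed above can also be sketched with GCC's generic vector extensions (accepted by Clang too) instead of x86-specific intrinsics, so the same source builds for SSE2, NEON or targets without SIMD, where GCC splits the vector operations into scalar code. This is a sketch of black level subtraction (clamp at zero) under that approach; `u8x16`, `BLACK_LEVEL` and `black_level_vec` are hypothetical names, and a black level of 16 is assumed to match the scalar example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* GCC/Clang vector extension (not ISO C): 16 lanes of uint8_t. */
typedef uint8_t u8x16 __attribute__((vector_size(16)));

#define BLACK_LEVEL 16 /* hypothetical; same constant as the scalar example */

/* Subtract BLACK_LEVEL from n pixels, clamping at zero. */
void black_level_vec(uint8_t *dst, const uint8_t *src, size_t n)
{
	u8x16 blk;
	memset(&blk, BLACK_LEVEL, sizeof blk); /* broadcast to all 16 lanes */

	size_t x = 0;
	for (; x + 16 <= n; x += 16) {
		u8x16 v;
		memcpy(&v, src + x, sizeof v);     /* unaligned 16-byte load */
		/* Vector comparison yields all ones where true, 0 where false. */
		u8x16 keep = (u8x16)(v >= blk);
		u8x16 out = (v - blk) & keep;      /* wrapped difference, zeroed below black level */
		memcpy(dst + x, &out, sizeof out); /* unaligned 16-byte store */
	}
	/* Scalar tail for the last n % 16 pixels. */
	for (; x < n; x++)
		dst[x] = src[x] >= BLACK_LEVEL ? src[x] - BLACK_LEVEL : 0;
}
```

The mask-and-subtract form is used because generic vector extensions have no saturating-subtract operator; whether the compiler turns it back into a single saturating instruction depends on the target and GCC version.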
OK, so I played with black level correction a bit and got a pleasant
surprise (ignore the wrong function name):
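#include <stdint.h>

#define WIDTH 1920	/* assumed: 0x780, the loop bound in the disassembly */

/* Black level correction: subtract 16, clamping at 0. */
void debayer8(uint8_t *dst, const uint8_t *src)
{
	for (int x = 0; x < (int)WIDTH; x++) {
		uint8_t v = src[x];
		if (v < 16)
			dst[x] = 0;
		else
			dst[x] = v - 16;
	}
}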
GCC translates it to vector code automatically, and the result is only
10% slower than a plain memcpy. The test was done on a ThinkPad X60. If I
disable vector instructions, it takes 4x as long as a plain memcpy. I'm
quite impressed both by the vector unit and by GCC :-).
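One way to reproduce this kind of comparison, as a sketch: time a plain memcpy of one line against the correction loop, then build once with `gcc -O3` and once with `gcc -O3 -fno-tree-vectorize` (or without SSE2) to see what the vectorizer contributes. The original test harness was not posted, so `black_level8`, `copy_line` and `ns_per_line` are hypothetical names, and WIDTH = 1920 is taken from the loop bound (0x780) in the disassembly.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime */
#include <stdint.h>
#include <string.h>
#include <time.h>

#define WIDTH 1920 /* 0x780, the line length in the posted disassembly */

/* Same clamp-at-zero subtraction as the example above. */
void black_level8(uint8_t *dst, const uint8_t *src)
{
	for (int x = 0; x < WIDTH; x++) {
		uint8_t v = src[x];
		dst[x] = v < 16 ? 0 : v - 16;
	}
}

/* Baseline: plain memcpy of one line. */
void copy_line(uint8_t *dst, const uint8_t *src)
{
	memcpy(dst, src, WIDTH);
}

typedef void (*line_fn)(uint8_t *, const uint8_t *);

/* Average nanoseconds per call of fn over iters calls on one line. */
double ns_per_line(line_fn fn, uint8_t *dst, const uint8_t *src, long iters)
{
	volatile line_fn call = fn; /* volatile: calls cannot be inlined or elided */
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (long i = 0; i < iters; i++)
		call(dst, src);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	return ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec)) / iters;
}
```

The ratio `ns_per_line(black_level8, ...) / ns_per_line(copy_line, ...)` is the number to compare; on Pavel's X60 it was about 1.1 with vectorization and about 4 without, and other machines will differ.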
Best regards,
Pavel
00001340 <debayer8>:
1340: e8 f0 ff ff ff call 1335 <__x86.get_pc_thunk.dx>
1345: 81 c2 af 2c 00 00 add $0x2caf,%edx
134b: 56 push %esi
134c: 53 push %ebx
134d: 8b 4c 24 0c mov 0xc(%esp),%ecx
1351: 8b 5c 24 10 mov 0x10(%esp),%ebx
1355: 89 c8 mov %ecx,%eax
1357: 8d 73 01 lea 0x1(%ebx),%esi
135a: 29 f0 sub %esi,%eax
135c: 83 f8 0e cmp $0xe,%eax
135f: b8 00 00 00 00 mov $0x0,%eax
1364: 76 58 jbe 13be <debayer8+0x7e>
1366: 66 0f 6f a2 4c e0 ff movdqa -0x1fb4(%edx),%xmm4
136d: ff
136e: 66 0f 6f 9a 5c e0 ff movdqa -0x1fa4(%edx),%xmm3
1375: ff
1376: 66 0f ef d2 pxor %xmm2,%xmm2
137a: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
1380: f3 0f 6f 04 03 movdqu (%ebx,%eax,1),%xmm0
1385: f3 0f 6f 0c 03 movdqu (%ebx,%eax,1),%xmm1
138a: 66 0f d8 c4 psubusb %xmm4,%xmm0
138e: 66 0f fc cb paddb %xmm3,%xmm1
1392: 66 0f 74 c2 pcmpeqb %xmm2,%xmm0
1396: 66 0f df c1 pandn %xmm1,%xmm0
139a: 0f 11 04 01 movups %xmm0,(%ecx,%eax,1)
139e: 83 c0 10 add $0x10,%eax
13a1: 3d 80 07 00 00 cmp $0x780,%eax
13a6: 75 d8 jne 1380 <debayer8+0x40>
13a8: 5b pop %ebx
13a9: 5e pop %esi
13aa: c3 ret
13ab: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
13af: 90 nop
13b0: c6 04 01 00 movb $0x0,(%ecx,%eax,1)
13b4: 83 c0 01 add $0x1,%eax
13b7: 3d 80 07 00 00 cmp $0x780,%eax
13bc: 74 ea je 13a8 <debayer8+0x68>
13be: 0f b6 14 03 movzbl (%ebx,%eax,1),%edx
13c2: 80 fa 0f cmp $0xf,%dl
13c5: 76 e9 jbe 13b0 <debayer8+0x70>
13c7: 83 ea 10 sub $0x10,%edx
13ca: 88 14 01 mov %dl,(%ecx,%eax,1)
13cd: 83 c0 01 add $0x1,%eax
13d0: 3d 80 07 00 00 cmp $0x780,%eax
13d5: 75 e7 jne 13be <debayer8+0x7e>
13d7: eb cf jmp 13a8 <debayer8+0x68>
13d9: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
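Reading the dump: the prologue (0x1355 to 0x1364) computes dst - src - 1 and falls back to the byte loop at 0x13be when dst lies 1 to 15 bytes after src, where a 16-byte store could clobber input not yet read. Otherwise the loop at 0x1380 handles 16 pixels per iteration, and since 1920 is a multiple of 16 there is no tail. The two constants loaded into %xmm4 and %xmm3 are not visible in the dump; assuming they are 15 and -16 (0xf0), which is what makes the vector path agree with the scalar one (`cmp $0xf`, `sub $0x10`), the loop transcribes to SSE2 intrinsics roughly one per instruction. `black_level_sse2` is a hypothetical name, and this is a sketch, not what GCC literally emits.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Black level correction mirroring the vector loop at 0x1380.
 * The constants 15 and -16 are inferred from the scalar path, not read from the binary. */
void black_level_sse2(uint8_t *dst, const uint8_t *src, size_t n)
{
	size_t x = 0;
#ifdef __SSE2__
	const __m128i c15 = _mm_set1_epi8(15);    /* assumed contents of %xmm4 */
	const __m128i cm16 = _mm_set1_epi8(-16);  /* assumed contents of %xmm3 (0xf0) */
	const __m128i zero = _mm_setzero_si128(); /* pxor %xmm2,%xmm2 */

	for (; x + 16 <= n; x += 16) {
		__m128i v = _mm_loadu_si128((const __m128i *)(src + x)); /* movdqu */
		/* psubusb + pcmpeqb: saturating v - 15 is 0 exactly when v <= 15 */
		__m128i dark = _mm_cmpeq_epi8(_mm_subs_epu8(v, c15), zero);
		__m128i shifted = _mm_add_epi8(v, cm16);       /* paddb: v - 16 mod 256 */
		__m128i out = _mm_andnot_si128(dark, shifted); /* pandn: ~dark & shifted */
		_mm_storeu_si128((__m128i *)(dst + x), out);   /* store (movups in the dump) */
	}
#endif
	/* Scalar path, like the loop at 0x13be: the tail, or everything without SSE2. */
	for (; x < n; x++)
		dst[x] = src[x] <= 15 ? 0 : src[x] - 16;
}
```

Note that a single `_mm_subs_epu8(v, _mm_set1_epi8(16))` (psubusb) already computes max(v - 16, 0), so a hand-written version could drop the compare and mask; in this build GCC did not find that form.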
--
People of Russia, stop Putin before his war on Ukraine escalates.
More information about the libcamera-devel mailing list