SIMD black level correction -- Re: Soft ISP TODO list

Pavel Machek pavel at ucw.cz
Tue Apr 23 08:59:53 CEST 2024


Hi!

> > > > > - If we split the processing to pre-bayer and post-bayer parts, we should
> > > > >   probably work with uint16_t or float's, which may have impact on performance.
> > > > > 
> > > > > - Pavel couldn't get a better performance by using SIMD CPU instructions for
> > > > >   debayering.  Applying a CCM matrix may be a different matter.  Anyway, SIMD on
> > > > >   CPU is hard to use and may differ on architectures, so the question is whether
> > > > >   it's worth to invest into it.
> > > 
> > > Good question :-)
> > 
> > Oh, so the good news is you write SIMD code once with gcc intrinsics, and
> > gcc does its magic. You don't have to know assembly for that, but it
> > certainly helps to look at the assembly if it looks reasonable.
> 
> There are also potentially interesting helper libraries such as
> https://github.com/vectorclass/version2 (I haven't checked the license
> compatibility).

Ok, so I played with black level correction a bit and got a pleasant
surprise. [Ignore the wrong function name; this is black level
correction, not debayering.]

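/* Black level correction: subtract 16 from each byte, clamping at zero.
 * WIDTH is a compile-time constant (0x780 = 1920 in the disassembly below). */
void debayer8(uint8_t *dst, const uint8_t *src)
{
        for (int x = 0; x < (int)WIDTH; x++) {
                uint8_t v = src[x];
                if (v < 16)
                        dst[x] = 0;
                else
                        dst[x] = v - 16;
        }
}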

gcc translates it to vector code automatically, and the result is only
10% slower than plain memcpy. The test was done on a ThinkPad X60. If I
disable vector instructions, the result takes 4x as long as plain
memcpy. I'm quite impressed both by the vector unit and by gcc :-).
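
For comparison, the same correction can also be written by hand with
SSE2 intrinsics, where the clamped subtraction maps to a single unsigned
saturating subtract (_mm_subs_epu8, i.e. psubusb). This is only a
sketch, not what gcc generated: the name blc8_sse2, the explicit length
parameter and the scalar fallback for builds without SSE2 are
assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: subtract a black level of 16 from n bytes,
 * clamping at zero. */
static void blc8_sse2(uint8_t *dst, const uint8_t *src, size_t n)
{
        size_t x = 0;
#if defined(__SSE2__)
        const __m128i black = _mm_set1_epi8(16);

        /* 16 pixels per iteration; unaligned loads and stores keep it general. */
        for (; x + 16 <= n; x += 16) {
                __m128i v = _mm_loadu_si128((const __m128i *)(src + x));
                /* Unsigned saturating subtract: anything below zero becomes 0. */
                v = _mm_subs_epu8(v, black);
                _mm_storeu_si128((__m128i *)(dst + x), v);
        }
#endif
        /* Scalar tail, and the whole loop when SSE2 is not available. */
        for (; x < n; x++)
                dst[x] = src[x] < 16 ? 0 : (uint8_t)(src[x] - 16);
}
```

Whether this beats the auto-vectorized loop would need measuring; it
mainly shows that the intrinsics version needs no assembly knowledge.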

Best regards,
								Pavel

00001340 <debayer8>:
    1340:       e8 f0 ff ff ff          call   1335 <__x86.get_pc_thunk.dx>
    1345:       81 c2 af 2c 00 00       add    $0x2caf,%edx
    134b:       56                      push   %esi
    134c:       53                      push   %ebx
    134d:       8b 4c 24 0c             mov    0xc(%esp),%ecx
    1351:       8b 5c 24 10             mov    0x10(%esp),%ebx
    1355:       89 c8                   mov    %ecx,%eax
    1357:       8d 73 01                lea    0x1(%ebx),%esi
    135a:       29 f0                   sub    %esi,%eax
    135c:       83 f8 0e                cmp    $0xe,%eax
    135f:       b8 00 00 00 00          mov    $0x0,%eax
    1364:       76 58                   jbe    13be <debayer8+0x7e>
    1366:       66 0f 6f a2 4c e0 ff    movdqa -0x1fb4(%edx),%xmm4
    136d:       ff 
    136e:       66 0f 6f 9a 5c e0 ff    movdqa -0x1fa4(%edx),%xmm3
    1375:       ff 
    1376:       66 0f ef d2             pxor   %xmm2,%xmm2
    137a:       8d b6 00 00 00 00       lea    0x0(%esi),%esi
    1380:       f3 0f 6f 04 03          movdqu (%ebx,%eax,1),%xmm0
    1385:       f3 0f 6f 0c 03          movdqu (%ebx,%eax,1),%xmm1
    138a:       66 0f d8 c4             psubusb %xmm4,%xmm0
    138e:       66 0f fc cb             paddb  %xmm3,%xmm1
    1392:       66 0f 74 c2             pcmpeqb %xmm2,%xmm0
    1396:       66 0f df c1             pandn  %xmm1,%xmm0
    139a:       0f 11 04 01             movups %xmm0,(%ecx,%eax,1)
    139e:       83 c0 10                add    $0x10,%eax
    13a1:       3d 80 07 00 00          cmp    $0x780,%eax
    13a6:       75 d8                   jne    1380 <debayer8+0x40>
    13a8:       5b                      pop    %ebx
    13a9:       5e                      pop    %esi
    13aa:       c3                      ret
    13ab:       8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi
    13af:       90                      nop
    13b0:       c6 04 01 00             movb   $0x0,(%ecx,%eax,1)
    13b4:       83 c0 01                add    $0x1,%eax
    13b7:       3d 80 07 00 00          cmp    $0x780,%eax
    13bc:       74 ea                   je     13a8 <debayer8+0x68>
    13be:       0f b6 14 03             movzbl (%ebx,%eax,1),%edx
    13c2:       80 fa 0f                cmp    $0xf,%dl
    13c5:       76 e9                   jbe    13b0 <debayer8+0x70>
    13c7:       83 ea 10                sub    $0x10,%edx
    13ca:       88 14 01                mov    %dl,(%ecx,%eax,1)
    13cd:       83 c0 01                add    $0x1,%eax
    13d0:       3d 80 07 00 00          cmp    $0x780,%eax
    13d5:       75 e7                   jne    13be <debayer8+0x7e>
    13d7:       eb cf                   jmp    13a8 <debayer8+0x68>
    13d9:       8d b4 26 00 00 00 00    lea    0x0(%esi,%eiz,1),%esi
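
The vector loop at 1380..13a6 can be read one byte lane at a time. The
sketch below models a single lane in plain C to show why it matches the
source loop. The constants loaded into xmm4 and xmm3 are not visible in
the dump; 15 and 0xf0 (that is, -16 modulo 256) are assumptions that are
consistent with the "cmp $0xf" in the scalar path. The function name is
made up.

```c
#include <stdint.h>

/* Model of one byte lane of the vector loop (hypothetical name). */
static uint8_t blc_lane_model(uint8_t v)
{
        /* psubusb: unsigned saturating v - 15, zero exactly when v < 16
         * (assumed constant 15 in xmm4) */
        uint8_t sat = v > 15 ? (uint8_t)(v - 15) : 0;
        /* paddb: wrapping v + 0xf0, the same as v - 16 modulo 256
         * (assumed constant 0xf0 in xmm3) */
        uint8_t sub = (uint8_t)(v + 0xf0);
        /* pcmpeqb against zero: 0xff where v < 16, 0x00 elsewhere */
        uint8_t mask = sat == 0 ? 0xff : 0x00;
        /* pandn: (~mask) & sub, so 0 for dark pixels and v - 16 otherwise */
        return (uint8_t)(~mask & sub);
}
```

Under these assumptions the lane computes v < 16 ? 0 : v - 16, the same
as the C loop. A single saturating subtract of 16 would give the same
result, so the four-instruction sequence is correct but not minimal.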



-- 
People of Russia, stop Putin before his war on Ukraine escalates.