
List:       linaro-multimedia
Subject:    libpng profiling
From:       mans.rullgard () linaro ! org (Mans Rullgard)
Date:       2011-09-23 17:21:56
Message-ID: CAG5Tg6XFGd_9oDwz3iQ4kQEVpwfLhYqGUzbeij66Zndh1s4sMw () mail ! gmail ! com

I did some quick and dirty profiling of libpng decoding on a Beagle-xm.

This is the result with one image:

    46.18%  pngbench  pngbench           [.] inflate_fast
    26.12%  pngbench  pngbench           [.] png_read_filter_row
     7.81%  pngbench  pngbench           [.] inflate
     5.65%  pngbench  pngbench           [.] memcpy
     4.26%  pngbench  pngbench           [.] adler32
     2.39%  pngbench  pngbench           [.] crc32
     1.78%  pngbench  [kernel.kallsyms]  [k] __copy_to_user
     1.76%  pngbench  [kernel.kallsyms]  [k] __do_softirq
     1.40%  pngbench  pngbench           [.] inflate_table
     1.02%  pngbench  [kernel.kallsyms]  [k] __memzero


And another:

    64.79%  pngbench  pngbench           [.] inflate_fast
     8.61%  pngbench  pngbench           [.] memcpy
     7.46%  pngbench  pngbench           [.] adler32
     5.10%  pngbench  pngbench           [.] crc32
     3.49%  pngbench  pngbench           [.] inflate
     3.16%  pngbench  [kernel.kallsyms]  [k] __copy_to_user
     1.33%  pngbench  [kernel.kallsyms]  [k] __memzero

And a third:

    47.00%  pngbench  pngbench           [.] png_read_filter_row
    28.52%  pngbench  pngbench           [.] inflate_fast
     5.12%  pngbench  pngbench           [.] memcpy
     4.23%  pngbench  pngbench           [.] crc32
     3.85%  pngbench  pngbench           [.] adler32
     1.60%  pngbench  [kernel.kallsyms]  [k] __memzero
     1.56%  pngbench  pngbench           [.] inflate_table
     1.50%  pngbench  [kernel.kallsyms]  [k] __copy_to_user
     1.38%  pngbench  [kernel.kallsyms]  [k] __do_softirq
     0.78%  pngbench  pngbench           [.] inflate


Two of these images (the first and third) are coded using a predictive
filter, so png_read_filter_row() takes a substantial share of the decoding
time.  Multiple filters are available, hence the different amounts of time
seen in that function above.  When no such filter is used, decoding time is
dominated by zlib decompression.
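For reference, the work done per row when undoing a filter can be sketched
as below.  This is a minimal illustration of the five filter types from the
PNG specification, not libpng's actual code; the names unfilter_row() and
paeth() are hypothetical.  It shows why the cost differs per filter: Sub and
Up need one add per byte, while Paeth needs several comparisons per byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Paeth predictor as defined in the PNG specification:
 * a = left, b = above, c = upper-left. */
static uint8_t paeth(uint8_t a, uint8_t b, uint8_t c)
{
    int p  = (int)a + (int)b - (int)c;
    int pa = abs(p - (int)a);
    int pb = abs(p - (int)b);
    int pc = abs(p - (int)c);
    if (pa <= pb && pa <= pc) return a;
    if (pb <= pc) return b;
    return c;
}

/* Hypothetical helper (not a libpng function): undo one filter in place.
 *   row    - filtered bytes of the current scanline, reconstructed in place
 *   prev   - reconstructed previous scanline, or NULL for the first row
 *   len    - number of bytes in the scanline
 *   bpp    - bytes per complete pixel (at least 1)
 *   filter - 0 None, 1 Sub, 2 Up, 3 Average, 4 Paeth
 * Returns 0 on success, -1 for an unknown filter type. */
static int unfilter_row(uint8_t *row, const uint8_t *prev,
                        size_t len, size_t bpp, int filter)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t a = (i >= bpp) ? row[i - bpp] : 0;          /* left       */
        uint8_t b = prev ? prev[i] : 0;                      /* above      */
        uint8_t c = (prev && i >= bpp) ? prev[i - bpp] : 0;  /* upper-left */
        uint8_t pred;

        switch (filter) {
        case 0: pred = 0; break;
        case 1: pred = a; break;
        case 2: pred = b; break;
        case 3: pred = (uint8_t)((a + b) / 2); break;
        case 4: pred = paeth(a, b, c); break;
        default: return -1;
        }
        row[i] = (uint8_t)(row[i] + pred);   /* addition is modulo 256 */
    }
    return 0;
}
```

The Sub, Average and Paeth cases depend on the byte just reconstructed to
the left, which is what makes vectorising them less straightforward than
the Up case.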

Two checksum functions feature in these profiles.  Adler32 is the checksum
used by zlib to verify data integrity, and crc32 is used by PNG.
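
To make the two checksums concrete, here is a minimal sketch of both
algorithms.  These are straightforward reference versions written for
illustration, not the optimised code in zlib; the names adler32_simple()
and crc32_simple() are made up to avoid clashing with the zlib functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Adler-32: two running sums modulo 65521, the largest prime below 2^16. */
uint32_t adler32_simple(const uint8_t *buf, size_t len)
{
    uint32_t a = 1, b = 0;

    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}

/* CRC-32 with the reflected polynomial 0xEDB88320, processed one bit at a
 * time.  Real implementations use lookup tables instead. */
uint32_t crc32_simple(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}
```

Both touch every byte of their input once.  As far as I understand the
formats, zlib's Adler-32 covers the decompressed data while PNG's CRC
covers the chunk data as stored, so the two run over different byte
streams.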

Optimising the png_read_filter_row function in NEON is possible in
principle; the effort of hooking this up in libpng might, however, be
non-trivial.  Assuming a speedup of 4x for this function, the overall
decoding performance improvement would be up to ~1.6x depending on the
image.  This should definitely be investigated further.
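
The estimate follows from Amdahl's law, sketched below.  The function name
overall_speedup() is made up for illustration, and the inputs are the
png_read_filter_row percentages from the profiles above together with the
assumed 4x factor.

```c
/* Amdahl's law: overall speedup when a fraction of the run time
 * (0.0 to 1.0) is made faster by the given factor. */
double overall_speedup(double fraction, double factor)
{
    return 1.0 / ((1.0 - fraction) + fraction / factor);
}
```

With the third profile, overall_speedup(0.47, 4.0) comes out at roughly
1.5x, and with the first, overall_speedup(0.2612, 4.0) at roughly 1.2x.
The second image spends no measurable time in the filter, so it would see
no gain.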

A worryingly large amount of time is also spent in memcpy().  If some of
these calls could be eliminated, a further 10% speedup might be gained.
This is likely to be quite difficult.

Optimising zlib is of course also possible in theory, but is probably even more
difficult.

-- 
Mans Rullgard / mru

