
List:       pypy-svn
Subject:    [pypy-commit] extradoc extradoc: refactor, add links from the original blog post
From:       mattip <pypy.commits () gmail ! com>
Date:       2016-10-31 17:01:20
Message-ID: 581778e0.cb091c0a.8841a.1027 () mx ! google ! com

Author: Matti Picus <matti.picus@gmail.com>
Branch: extradoc
Changeset: r5744:31fc5ffbd610
Date: 2016-10-31 18:56 +0200
http://bitbucket.org/pypy/extradoc/changeset/31fc5ffbd610/

Log:	refactor, add links from the original blog post

diff --git a/blog/draft/vectorization_extended.rst b/blog/draft/vectorization_extended.rst
--- a/blog/draft/vectorization_extended.rst
+++ b/blog/draft/vectorization_extended.rst
@@ -1,32 +1,38 @@
-We are happy to announce that both the PowerPC backend and the s390x backend
-have been enhanced. Both are now capable to emit SIMD instructions vectorized
-loops. Special thanks to IBM for funding this work.
+We are happy to announce that JIT support in both the PowerPC backend and the
+s390x backend has been enhanced. Both can now vectorize loops via SIMD
+instructions. Special thanks to IBM for funding this work.
 
 
-If you are not familiar with this topic you can more details here.
+If you are not familiar with this topic you can find more details here_.
 
 
 There are many more enhancements under the hood. Most notably, all pure
-operations are now delayed to the latest possible point. In some cases indices
-has been calculated more than once or they needed an additional register,
+operations are now delayed until the latest possible point. In some cases indices
+are calculated more than once or need an additional register,
 because the old value is still used. Additionally it is now possible to load
-quadword aligned memory in both ppc and s390x (x86 currently cannot do that).
+quadword-aligned memory in both PPC and s390x (x86 currently cannot do that).
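+
+As a toy illustration of what quadword (16-byte) alignment means (plain NumPy
+here; this only inspects a buffer address and is not how PyPy decides
+internally)::
+
+    import numpy as np
+
+    a = np.zeros(1024, dtype=np.float64)
+    # an aligned SIMD load is only legal when the buffer address
+    # is a multiple of 16 bytes
+    print(a.ctypes.data % 16 == 0)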
+
+
+.. _here: http://pypyvecopt.blogspot.co.at
 
 NumPy & CPyExt
 --------------
 
-The community and core development effort pushes CPyExt towards a complete, but
-emulated layer for CPython C extensions. This is great, because the one
-restriction preventing the deployment of PyPy in several scenarios is soon going
-to be removed. We advocate not to use the CPyExt, but rather to not write C code
-at all (let PyPy speed up your Python code) or use cffi.
+The community and core developers have been moving CPyExt towards a complete,
+but emulated, layer for CPython C extensions. This is great, because the one
+restriction preventing the wider deployment of PyPy in several scenarios will
+hopefully soon be removed. However, we advocate not using CPyExt; rather,
+write no C code at all (let PyPy speed up your Python code) or use cffi_.
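+
+For the curious, a minimal cffi_ call into C looks roughly like this (adapted
+from the cffi overview; ``printf`` is just the standard C library function)::
+
+    from cffi import FFI
+
+    ffi = FFI()
+    ffi.cdef("int printf(const char *format, ...);")  # declare the C signature
+    C = ffi.dlopen(None)                               # load the standard C library
+    arg = ffi.new("char[]", b"world")                  # allocate a C string
+    C.printf(b"hi there, %s.\n", arg)                  # call C from Python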
 
 
-The work done in this project helps micro numpy (NumPyPy) to speed up the
-operations for ppc and s390x. But, NumPyPy and NumPy ... do we need both? There
-are several cases where one of them is not the best performing solution. Our
-plans are to integrate both, use one of the solutions where we know the other
-one will not perform well.
+The work done here to support vectorization helps ``micronumpy`` (NumPyPy) speed
+up operations on PPC and s390x. So why does PyPy support both NumPyPy and NumPy;
+do we actually need both? Yes: there are places where gcc can beat the JIT, and
+places where the tight integration between NumPyPy and PyPy is more performant.
+We do have plans to integrate both, hijacking the C-extension method calls to
+use NumPyPy where we know NumPyPy can be faster.
 
 
 Just to give you an idea why this is a benefit:
@@ -34,27 +40,28 @@
 
 NumPy arrays can carry custom dtypes and apply user-defined Python functions on
 the arrays. How could one optimize this kind of scenario? In a traditional setup,
-you cannot. But as soon as Micro NumPy is turned on, you can suddenly JIT
+you cannot. But as soon as NumPyPy is turned on, you can suddenly JIT
 compile this code and vectorize it.
 
 Another example is element access that occurs frequently, or any other calls
-that cross to the C level more frequently.
+that cross between Python and the C level.
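+
+To make that concrete, here is a sketch of the kind of loop we mean (the
+function is made up for illustration). On CPython every element access crosses
+into C; PyPy can JIT the whole loop and, with the vectorizer on, emit SIMD
+instructions for it::
+
+    import numpy as np  # resolves to micronumpy (NumPyPy) under PyPy
+
+    def clipped_scale(arr, factor, limit):
+        # element-by-element work written as an ordinary Python loop
+        out = np.empty_like(arr)
+        for i in range(len(arr)):
+            v = arr[i] * factor
+            out[i] = v if v < limit else limit
+        return out
+
+    a = np.arange(100000, dtype=np.float64)
+    b = clipped_scale(a, 0.5, 1000.0)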
 
+.. _cffi: http://cffi.readthedocs.io/en/latest
 
 Benchmarks
 ----------
 
-Let's have a look at some benchmarks reusing mikefc's numpy benchmark suite. The
-suite only runs a subset of all commands showing that the core functionality is
-properly working. Additionally it has been rewritten to use perf instead of the
-timeit stdlib module.
+Let's have a look at some benchmarks reusing `mikefc's numpy benchmark suite`_.
+I only ran a subset of the microbenchmarks, showing that the core functionality
+works properly. Additionally, I used ``perf`` instead of the ``timeit`` stdlib module.
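+
+For reference, a driver script with ``perf`` looks roughly like this (the
+module has since been renamed ``pyperf``; ``workload`` is a stand-in for the
+real kernels, and the exact API may differ between versions)::
+
+    import perf  # pip install perf
+
+    def workload():
+        # toy numeric kernel standing in for the real benchmark bodies
+        total = 0.0
+        for i in range(10000):
+            total += i * 0.5
+        return total
+
+    runner = perf.Runner()  # spawns worker processes and collects samples
+    runner.bench_func('toy-kernel', workload)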
 
+.. _`mikefc's numpy benchmark suite`: https://bitbucket.org/mikefc/numpy-benchmark
 
 Setup
 -----
 x86 runs on an Intel i7-2600 clocked at 3.40GHz using 4 cores. PowerPC runs on
 the Power 8 clocked at 3.425GHz providing 160 cores. Last but not least the
-mainframe machine clocked up to 4 GHz, but fully virtualized (as it is common
+mainframe machine ran at up to 4 GHz, but fully virtualized (as is common
 for such machines).
 
 
@@ -69,7 +76,7 @@
 
 
 Blue shows CPython 2.7.10+ available on that platform using the latest NumPy
-(1.11). Micro NumPy is used for PyPy. PyPy+ indicates that the vectorization
+(1.11). NumPyPy is used for PyPy. PyPy+ indicates that the vectorization
 optimization is turned on.
 
 All bar charts show the median value of all runs (5 samples, 100 loops, 10 inner
@@ -78,13 +85,13 @@
 
 
 The comparison is really comparing the speed of machine code. It compares PyPy's
-JIT output vs GCC's output. It has little to do with the speed of the
+JIT output vs GCC's output. These microbenchmarks have little to do with the speed of the
 interpreter.
 
 
-Both new SIMD backends speedup the numeric kernels. Some times it is near to the
+Both new SIMD backends speed up the numeric kernels. Sometimes it is near the
 speed of CPython (note that PyPy will execute the machine code kernel after a
-interpreting it at least 1000 times), some times it is faster. The maximum
+warm-up of interpreting it at least 1000 times), sometimes it is faster. The maximum
 parallelism very much depends on the extension emitted by the compiler. All
 three SIMD backends have the same core register size (which is 128 bit). This
 means that all three behave similarly but ppc and s390x gain more because they can
@@ -94,16 +101,15 @@
 Future directions
 -----------------
 
-Python seems to be in an ongoing transition from a language used mostly for web
-development to also be used in data science. This is currently starting to
-emerge in Europe and Python is already heavily used for data science in the
-United States of America and many other places around the world.
+Python is achieving rapid adoption in data science. This trend is currently
+emerging in Europe, and Python is already heavily used for data science in the
+USA and many other places around the world.
 
 
-I believe that PyPy has a valuable contribution for data scientists, helping
+I believe that PyPy can make a valuable contribution to data scientists, helping
 them to rapidly write scientific programs in Python and run them at near native
 speed. If you happen to be in that situation, we are eager to hear your feedback
-or resolve your issues and also work together to improve your simulations,
-calculations, .... Just get in touch!
+or resolve your issues and also work together to improve the performance of your
+code. Just get in touch!
 
 Richard Plangger (plan_rich) and the PyPy team
_______________________________________________
pypy-commit mailing list
pypy-commit@python.org
https://mail.python.org/mailman/listinfo/pypy-commit