Python:Speed
Latest revision as of 12:19, 8 September 2019
Python is a powerful high-level programming language with a clean syntax. However, its flexibility and generality (e.g. dynamic typing) do have an associated performance cost.
Make it faster
There are various strategies to improve the speed of execution of Python code:
- Make it Pythonic: Often code runs slowly simply because it does not take full advantage of Python idioms. Raymond Hettinger has good notes about "being Pythonic" (e.g. notes or video).
- Libraries: Exploiting Python libraries (which are highly optimized and often written in lower-level languages) can greatly improve performance. In particular, using numpy for matrix-style numerical computations (rather than expensive for-loops or other explicit iteration) can speed up computations massively. SciPy can be used for optimization, image processing can be done with SciPy or the Python Imaging Library (PIL), image analysis with scikit-image, simple machine learning with scikit-learn, etc. Consider also libraries such as NumExpr, which speeds up the evaluation of numerical expressions on NumPy arrays.
- Re-code: If code is running slowly, identify the bottleneck (e.g. by profiling) and rework it. Choosing a more appropriate algorithm can often improve execution speed by orders of magnitude.
- Don't worry about it: Before spending serious effort optimizing code for speed, decide whether it is even necessary. Does it matter if your code takes a minute or an hour to run? Often it simply isn't worth optimizing.
- JIT: Just-in-time (JIT) compilation compiles Python code as it is needed. Compilation adds overhead the first time the code runs, but improves overall execution speed when the code iterates over a large dataset. JIT is usually easy to add to existing Python code, so it is nearly a "speedup for free".
  - Psyco (official site, Wikipedia) provides a 2-4× speedup (100× in some cases). It runs only on 32-bit systems and is no longer maintained.
  - PyPy (official site, Wikipedia) is an alternative Python interpreter with a built-in JIT. It can provide a 2-25× speedup. Unfortunately, modules/libraries have to be re-installed/re-compiled into the PyPy environment (separate from the usual Python environment).
  - Numba (official site) uses decorators to make speedups very easy. (A CUDA target is also available.)
  - Pythran (official site) is an ahead-of-time (rather than JIT) compiler for a subset of Python.
- Parallel: Python has several mechanisms for adding simple parallelism to your code. For instance, if you are processing hundreds of images in sequence on a computer with 16 cores, there is no reason you can't load and process 10 images at a time in parallel.
  - joblib (official site) allows simple distribution of tasks.
  - Parallel Python, pp (official site) allows distributed computing.
  - threading (official docs) allows one to manage threads. In the standard CPython interpreter, the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so threads mainly help with I/O-bound work.
  - multiprocessing (official docs) allows one to manage processes. Separate processes are not limited by the GIL, so they can use multiple cores for CPU-bound work.
  - In some cases you may want a more complex workflow system that allows for distributed computation:
    - dask (and Streamz) allow one to construct workflows that use multiple workers.
    - Celery distributed task queue.
    - lightflow distributed workflow.
    - gRPC (via grpcio) allows services on remote machines to be invoked as a simple method call from the client's perspective.
- Externals: Critical code can be written in C/C++ and called as a function from Python. This allows the computational bottleneck to be written in a more specialized and efficient manner.
  - SWIG (official site, Wikipedia) generates Python wrappers for C/C++ code, which can provide a 50-200× speedup.
  - Cython (official site, Wikipedia) is a superset of the Python language that is compiled to C and can invoke C/C++ routines directly.
  - ctypes (documentation) is a foreign function library that provides C-compatible data types, allowing functions in external shared libraries to be called from Python.
  - Python/C API (documentation, tutorial) lets you write C extension modules that are called directly from Python with little overhead. This "manual" method lacks the clean wrapping provided by the tools above, but it is the most direct method and works well for calling small bits of code.
- Translation: There are some attempts to automatically translate Python code into optimized lower-level code.
  - shedskin (official site 1, official site 2, Wikipedia) translates Python into C++, providing a 2-200× speedup. Most extensions/libraries are not currently supported. However, one can isolate some critical code and convert it into an optimized extension module that is called from conventional Python code.
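As a small illustration of the "Make it Pythonic" item, the sketch below compares a hand-written append loop with a list comprehension, and a manual accumulator with the built-in sum(). The function names are hypothetical and chosen for illustration; absolute timings depend on the machine and Python version, so the script only prints them.

```python
import timeit


def squares_loop(n):
    """Build a list of squares with an explicit loop and repeated .append() calls."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_comprehension(n):
    """Build the same list with a list comprehension (idiomatic and usually faster)."""
    return [i * i for i in range(n)]


def total_loop(values):
    """Sum numbers with a hand-written accumulator loop."""
    total = 0
    for v in values:
        total += v
    return total


def total_builtin(values):
    """Sum numbers with the built-in sum(), whose loop runs in C."""
    return sum(values)


if __name__ == "__main__":
    n = 100_000
    data = list(range(n))
    print("append loop:  ", timeit.timeit(lambda: squares_loop(n), number=20))
    print("comprehension:", timeit.timeit(lambda: squares_comprehension(n), number=20))
    print("manual sum:   ", timeit.timeit(lambda: total_loop(data), number=20))
    print("built-in sum: ", timeit.timeit(lambda: total_builtin(data), number=20))
```

Both versions of each pair return identical results; the idiomatic versions simply move the loop out of interpreted Python code.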
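The "Libraries" item recommends numpy instead of explicit for-loops. A minimal sketch, assuming numpy is installed: both functions compute the sum of squares of an array, but the second does it with a single vectorized np.dot call so the loop runs in NumPy's compiled code. The function names are illustrative only.

```python
import timeit

import numpy as np


def sum_of_squares_loop(values):
    """Accumulate x*x one element at a time in the Python interpreter."""
    total = 0.0
    for x in values:
        total += x * x
    return total


def sum_of_squares_numpy(values):
    """Same quantity, computed by one vectorized NumPy call."""
    arr = np.asarray(values, dtype=np.float64)
    return float(np.dot(arr, arr))


if __name__ == "__main__":
    data = np.random.default_rng(0).random(1_000_000)
    print("Python loop:", timeit.timeit(lambda: sum_of_squares_loop(data), number=1))
    print("NumPy dot:  ", timeit.timeit(lambda: sum_of_squares_numpy(data), number=1))
```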
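The "Re-code" item can be sketched with the standard library profiler: cProfile shows where time is spent, and here the bottleneck is an O(n·m) list-membership test that a set turns into roughly O(n + m). The two functions are hypothetical examples; they return the same items in the same order, so the faster one is a drop-in replacement for hashable items.

```python
import cProfile
import pstats


def common_items_slow(a, b):
    """O(len(a) * len(b)): 'x in b' scans the whole list b for every x."""
    return [x for x in a if x in b]


def common_items_fast(a, b):
    """O(len(a) + len(b)): build a set once; each lookup is then O(1) on average."""
    b_set = set(b)
    return [x for x in a if x in b_set]


if __name__ == "__main__":
    a = list(range(0, 10_000, 2))
    b = list(range(0, 10_000, 3))
    profiler = cProfile.Profile()
    profiler.enable()
    slow = common_items_slow(a, b)
    fast = common_items_fast(a, b)
    profiler.disable()
    assert slow == fast
    # Show the 5 entries with the largest cumulative time; the slow function dominates.
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```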
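For the "Parallel" item, here is a sketch using the standard library's multiprocessing.Pool (rather than joblib or pp). process_image is a hypothetical placeholder that does CPU-bound arithmetic instead of loading a real image, so the example is self-contained. The `if __name__ == "__main__":` guard matters on platforms where worker processes start a fresh interpreter (e.g. Windows and macOS), and the sketch assumes it is run as a script file.

```python
import multiprocessing


def process_image(index):
    """Hypothetical stand-in for loading and processing one image (CPU-bound work)."""
    total = 0
    for k in range(200_000):
        total += (index * k) % 7
    return index, total


def process_all(indices, workers=None):
    """Distribute the tasks over a pool of worker processes (default: one per CPU core)."""
    with multiprocessing.Pool(processes=workers) as pool:
        # map() preserves the input order in the returned list
        return pool.map(process_image, indices)


if __name__ == "__main__":
    results = process_all(range(32))
    print(results[:3])
```

Because each task runs in its own process, this sidesteps the GIL limitation that affects the threading module for CPU-bound code.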
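The ctypes entry under "Externals" can be illustrated by calling sqrt() from the C math library. This is a sketch assuming a POSIX-like system (Linux or macOS); it shows the mechanism (loading a shared library and declaring C types), not a speedup. A single cheap C call through ctypes is slower than math.sqrt, and real gains come when the external function does substantial work per call.

```python
import ctypes
import ctypes.util

# Locate the C math library by name; assumes a POSIX-like system.
_libm_path = ctypes.util.find_library("m")
# If it cannot be found, fall back to the symbols already loaded into this process
# (the Python interpreter is normally linked against libm).
libm = ctypes.CDLL(_libm_path) if _libm_path else ctypes.CDLL(None)

# Declare the C signature double sqrt(double). Without this, ctypes treats the
# return value as a C int and refuses to convert Python floats as arguments.
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double


def c_sqrt(x):
    """Call the C library's sqrt() through ctypes and return a Python float."""
    return libm.sqrt(x)


if __name__ == "__main__":
    print(c_sqrt(2.0))
```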
 
Matplotlib
The Matplotlib plotting package is popular and powerful, but plotting can sometimes be too slow for rapid, animation-style updates. Refer to Speeding up Matplotlib for some hints about improving plotting performance. Here is one example that compares two methods for refreshing a plot:
import time

import matplotlib.pyplot as plt
import numpy as np

plt.ion()  # interactive mode: drawing calls return without blocking

imgs = np.random.random((100, 100, 100))  # 100 random 100x100 frames
fig = plt.figure(0)
ax = fig.add_subplot()
plt.show()

# Slow method: clear the axes and build a new image artist for every frame
tstart = time.time()
for i in range(len(imgs)):
    print("iteration {}".format(i))
    ax.cla()
    ax.imshow(imgs[i])
    plt.draw()
    plt.pause(0.0001)  # lets the GUI event loop process the full redraw
tend = time.time()
totslow = tend - tstart

# Fast method: create the image artist once, then only swap its pixel data
ax.cla()
im = ax.imshow(imgs[0])  # AxesImage object that stores the image
fig.canvas.draw()  # one full draw, so draw_artist has a renderer to use
tstart = time.time()
for i in range(len(imgs)):
    print("iteration {}".format(i))
    im.set_data(imgs[i])
    ax.draw_artist(im)  # redraw only the image artist
    fig.canvas.blit(ax.bbox)  # copy just the axes region to the screen
    fig.canvas.flush_events()
tend = time.time()
totfast = tend - tstart

print("Slow method timing: {} seconds".format(totslow))
print("Fast method timing: {} seconds".format(totfast))
Numba
Using numba is extremely easy:
from numba import jit
# jit decorator tells Numba to compile this function.
# The argument types will be inferred by Numba when function is called.
@jit
def func(x):
    y = x*x
    return y
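The func example above is a minimal illustration; Numba pays off mainly on explicit loops over NumPy arrays. The sketch below uses numba.njit (nopython mode) on a hypothetical O(n²) pairwise-distance sum and times the first call (which includes compilation) against the second (which reuses the compiled code). The try/except fallback is an addition for runnability, not part of Numba: without Numba installed the function runs as plain, much slower Python.

```python
import time

import numpy as np

try:
    from numba import njit
except ImportError:  # Numba not installed: run the same function as plain Python
    def njit(func):
        return func


@njit
def pairwise_distance_sum(points):
    """Sum of Euclidean distances over all pairs of rows in a 2D float array."""
    n = points.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = 0.0
            for k in range(points.shape[1]):
                diff = points[i, k] - points[j, k]
                d += diff * diff
            total += np.sqrt(d)
    return total


if __name__ == "__main__":
    pts = np.random.default_rng(0).random((500, 3))
    t0 = time.perf_counter()
    pairwise_distance_sum(pts)  # first call: includes JIT compilation time
    t1 = time.perf_counter()
    pairwise_distance_sum(pts)  # second call: reuses the compiled machine code
    t2 = time.perf_counter()
    print("first call:  {:.3f} s".format(t1 - t0))
    print("second call: {:.3f} s".format(t2 - t1))
```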
See Also
- High-performance Python for crystallographic computing
  - A. Boulle and J. Kieffer, High-performance Python for crystallographic computing, J. Appl. Cryst. 2019, 52, doi: 10.1107/S1600576719008471
 

