Commit b90e1cc

Merge pull request numpy#280 from rgommers/unify-install-guides
Move the separate Python and NumPy install guide into main install page
2 parents b45ea4a + 8bb441f commit b90e1cc

2 files changed: +196 -197 lines changed

content/en/install.md

+196-1
@@ -5,7 +5,7 @@ sidebar: false
55

66
The only prerequisite for NumPy is Python itself. If you don't have Python yet and want the simplest way to get started, we recommend you use the [Anaconda Distribution](https://www.anaconda.com/distribution) - it includes Python, NumPy, and other commonly used packages for scientific computing and data science.
77

8-
NumPy can be installed with `conda`, with `pip`, or with a package manager on macOS and Linux. For more detailed instructions, consult our [Python and NumPy installation guide](/installing-python-and-numpy-guide).
8+
NumPy can be installed with `conda`, with `pip`, or with a package manager on macOS and Linux. For more detailed instructions, consult our [Python and NumPy installation guide](#python-numpy-install-guide) below.
99

1010
## conda
1111

@@ -22,3 +22,198 @@ If you use `pip`, you can install it with:
2222
```bash
2323
pip install numpy
2424
```
25+
26+
<a name="python-numpy-install-guide"></a>
27+
# Python and NumPy installation guide
28+
29+
Installing and managing packages in Python is complicated; there are a
30+
number of alternative solutions for most tasks. This guide tries to give the
31+
reader a sense of the best (or most popular) solutions, and give clear
32+
recommendations. It focuses on users of Python, NumPy, and the PyData (or
33+
numerical computing) stack on common operating systems and hardware.
34+
35+
## Recommendations
36+
37+
We'll start with recommendations based on the user's experience level and
38+
operating system of interest. If you're in between "beginning" and "advanced",
39+
please go with "beginning" if you want to keep things simple, and with
40+
"advanced" if you want to work according to best practices that go a longer way
41+
in the future.
42+
43+
### Beginning users
44+
45+
On all of Windows, macOS, and Linux:
46+
47+
- Install [Anaconda](https://www.anaconda.com/distribution/) (it installs all
48+
packages you need and all other tools mentioned below).
49+
- For writing and executing code, use notebooks in
50+
[JupyterLab](https://jupyterlab.readthedocs.io/en/stable/index.html) for
51+
exploratory and interactive computing, and
52+
[Spyder](https://www.spyder-ide.org/) or [Visual Studio Code](https://code.visualstudio.com/)
53+
for writing scripts and packages.
54+
- Use [Anaconda Navigator](https://docs.anaconda.com/anaconda/navigator/) to
55+
manage your packages and start JupyterLab, Spyder, or Visual Studio Code.
56+
57+
58+
### Advanced users
59+
60+
#### Windows or macOS
61+
62+
- Install [Miniconda](https://docs.conda.io/en/latest/miniconda.html).
63+
- Keep the `base` conda environment minimal, and use one or more
64+
[conda environments](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#)
65+
to install the packages you need for the task or project you're working on.
66+
- Unless you're fine with only the packages in the `defaults` channel, make `conda-forge`
67+
your default channel via [setting the channel priority](https://conda-forge.org/docs/user/introduction.html#how-can-i-install-packages-from-conda-forge).
68+
69+
70+
#### Linux
71+
72+
If you're fine with slightly outdated packages and prefer stability over being
73+
able to use the latest versions of libraries:
74+
- Use your OS package manager for as much as possible (Python itself, NumPy, and
75+
other libraries).
76+
- Install packages not provided by your package manager with `pip install somepackage --user`.
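
To see where a `--user` install ends up for a given interpreter, the standard-library `site` module can be queried. This is only a small sketch added for illustration; the helper name `user_install_location` is our own and not part of any tool mentioned here.

```python
import site
import sys


def user_install_location():
    """Return the per-user site-packages directory for this interpreter."""
    return site.getusersitepackages()


print("Interpreter:       ", sys.executable)
print("User site-packages:", user_install_location())
# ENABLE_USER_SITE is False or None when the user site directory is disabled.
print("User site enabled: ", site.ENABLE_USER_SITE)
```

Packages installed with `--user` are kept out of the directories your OS package manager controls, which is why the two approaches can coexist.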
77+
78+
If you use a GPU:
79+
- Install [Miniconda](https://docs.conda.io/en/latest/miniconda.html).
80+
- Keep the `base` conda environment minimal, and use one or more
81+
[conda environments](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#)
82+
to install the packages you need for the task or project you're working on.
83+
- Use the `defaults` conda channel (`conda-forge` doesn't have good support for
84+
GPU packages yet).
85+
86+
Otherwise:
87+
- Install [Miniforge](https://github.com/conda-forge/miniforge).
88+
- Keep the `base` conda environment minimal, and use one or more
89+
[conda environments](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#)
90+
to install the packages you need for the task or project you're working on.
91+
92+
93+
#### Alternative if you prefer pip/PyPI
94+
95+
For users who know, from personal preference or reading about the main
96+
differences between conda and pip below, that they prefer a pip/PyPI-based solution,
97+
we recommend:
98+
- Install Python from, for example, [python.org](https://www.python.org/downloads/),
99+
[Homebrew](https://brew.sh/), or your Linux package manager.
100+
- Use [Poetry](https://python-poetry.org/) as the most well-maintained tool
101+
that provides a dependency resolver and environment management capabilities
102+
in a similar fashion as conda does.
103+
104+
105+
## Python package management
106+
107+
Managing packages is a challenging problem, and, as a result, there are lots of
108+
tools. For web and general purpose Python development there's a whole
109+
[host of tools](https://packaging.python.org/guides/tool-recommendations/)
110+
complementary to pip. For high-performance computing (HPC),
111+
[Spack](https://github.com/spack/spack) is worth considering. For most NumPy
112+
users though, [conda](https://conda.io/en/latest/) and
113+
[pip](https://pip.pypa.io/en/stable/) are the two most popular tools.
114+
115+
116+
### Pip & conda
117+
118+
The two main tools that install Python packages are `pip` and `conda`. Their
119+
functionality partially overlaps (e.g. both can install `numpy`); however, they
120+
can also work together. We'll discuss the major differences between pip and
121+
conda here - this is important to understand if you want to manage packages
122+
effectively.
123+
124+
The first difference is that conda is cross-language and it can install Python,
125+
while pip is installed for a particular Python on your system and installs other
126+
packages to that same Python install only. This also means conda can install
127+
non-Python libraries and tools you may need (e.g. compilers, CUDA, HDF5), while
128+
pip can't.
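
Because pip is tied to one Python install, a common way to avoid installing into the wrong interpreter is to invoke pip as a module of the interpreter you intend to use. The following is a minimal sketch; `pip_command` is a hypothetical helper introduced only for this illustration.

```python
import sys


def pip_command(*args):
    """Build a pip command line bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", *args]


# The command that would install NumPy into this exact Python install.
print(" ".join(pip_command("install", "numpy")))
```

The resulting list can be passed to `subprocess.run(..., check=True)` if you want to execute it from a script rather than a shell.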
129+
130+
The second difference is that pip installs from the Python Package Index
131+
(PyPI), while conda installs from its own channels (typically "defaults" or
132+
"conda-forge"). PyPI is the largest collection of packages by far; however, all
133+
popular packages are available for conda as well.
134+
135+
The third difference is that pip does not have a _dependency resolver_ (this is
136+
expected to change in the near future), while conda does. For simple cases (e.g.
137+
you just want NumPy, SciPy, Matplotlib, Pandas, Scikit-learn, and a few other
138+
packages) that doesn't matter; however, for complicated cases conda can be
139+
expected to do a better job keeping everything working well together. The flip
140+
side of that coin is that installing with pip is typically a _lot_ faster than
141+
installing with conda.
142+
143+
The fourth difference is that conda is an integrated solution for managing
144+
packages, dependencies and environments, while with pip you may need another
145+
tool (there are many!) for dealing with environments or complex dependencies.
146+
147+
148+
### Reproducible installs
149+
150+
Making the installation of all the packages your analysis, library or
151+
application depends on reproducible is important. Sounds obvious, yet most
152+
users don't think about doing this (at least until it's too late).
153+
154+
The problem with Python packaging is that sooner or later, something will
155+
break. It's not often this bad,
156+
157+
{{< figure src="/images/content_images/python_environment_xkcd.png"
158+
alt="Python Environment XKCD image"
159+
link="https://xkcd.com/1987/"
160+
width="400"
161+
attr="_XKCD illustration - Python environment degradation_">}}
162+
163+
but it does degrade over time. Hence, it's important to be able to delete and
164+
reconstruct the set of packages you have installed.
165+
166+
Best practice is to use a different environment per project you're working on,
167+
and record at least the names (and preferably versions) of the packages you
168+
directly depend on in a static metadata file. Each packaging tool has its own
169+
metadata format for this:
170+
- Conda: [conda environments and environment.yml](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#)
171+
- Pip: [virtual environments](https://docs.python.org/3/tutorial/venv.html) and
172+
[requirements.txt](https://pip.readthedocs.io/en/latest/user_guide/#requirements-files)
173+
- Poetry: [virtual environments and pyproject.toml](https://python-poetry.org/docs/basic-usage/)
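
As an illustration of the one-environment-per-project practice for the pip case, the standard-library `venv` module can create an environment programmatically. This is a sketch under our own assumptions: the `.venv` directory name and the `create_project_env` helper are conventions chosen for the example, not requirements of pip or NumPy.

```python
import tempfile
import venv
from pathlib import Path


def create_project_env(project_dir):
    """Create a virtual environment in <project_dir>/.venv and return its path."""
    env_dir = Path(project_dir) / ".venv"
    # with_pip=False keeps the sketch fast and avoids relying on ensurepip;
    # pass with_pip=True when you want pip bootstrapped into the environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


# Demonstrate in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    env = create_project_env(tmp)
    print("created:", (env / "pyvenv.cfg").exists())
```

On the command line, `python -m venv .venv` does the same thing; the directly needed packages would then go in a `requirements.txt` next to it.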
174+
175+
Sometimes it's too much overhead to create and switch between new environments
176+
for small tasks. In that case we encourage you to not install too many packages
177+
into your base environment, and keep track of versions of packages some other
178+
way (e.g. comments inside files, or printing `numpy.__version__` after
179+
importing it in notebooks).
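
One lightweight way to keep track of versions is to print them at the top of a script or notebook. The sketch below uses the standard-library `importlib.metadata` (Python 3.8+); the helper name `installed_versions` and the list of packages are assumptions made for the example.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(["numpy", "scipy", "pandas"]).items():
    print(f"{name}=={version}" if version else f"# {name} is not installed")
```

The `name==version` lines it prints use the same pinning syntax as a `requirements.txt` file, so they can be pasted into one later.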
180+
181+
182+
## NumPy packages & accelerated linear algebra libraries
183+
184+
NumPy doesn't depend on any other Python packages; however, it does depend on an
185+
accelerated linear algebra library - typically
186+
[Intel MKL](https://software.intel.com/en-us/mkl) or
187+
[OpenBLAS](https://www.openblas.net/). Users don't have to worry about
188+
installing those, but it may still be important to understand how the packaging
189+
is done and how it affects the performance and behavior users see.
190+
191+
The NumPy wheels on PyPI, which is what pip installs, are built with OpenBLAS.
192+
The OpenBLAS libraries are shipped within the wheels themselves. This makes those
193+
wheels larger, and if a user installs (for example) SciPy as well, they will
194+
now have two copies of OpenBLAS on disk.
195+
196+
In the conda defaults channel, NumPy is built against Intel MKL. MKL is a
197+
separate package that will be installed in the user's environment when they
198+
install NumPy. That MKL package is a lot larger than OpenBLAS, at several hundred
199+
MB. MKL is typically a little faster and more robust than OpenBLAS.
200+
201+
In the conda-forge channel, NumPy is built against a dummy "BLAS" package. When
202+
a user installs NumPy from conda-forge, that BLAS package then gets installed
203+
together with the actual library - this defaults to OpenBLAS, but it can also
204+
be MKL (from the defaults channel), or even
205+
[BLIS](https://github.com/flame/blis) or reference BLAS.
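
To check which of these libraries a given NumPy install was built against, `numpy.show_config()` prints the build configuration. The wrapper below is a minimal sketch; `report_blas_config` is a name we introduce here, and the exact output format varies between NumPy versions.

```python
def report_blas_config():
    """Print NumPy's build configuration; return False if NumPy is missing."""
    try:
        import numpy as np
    except ImportError:
        print("NumPy is not installed in this environment")
        return False
    print("NumPy", np.__version__)
    np.show_config()  # lists the BLAS/LAPACK libraries found at build time
    return True


report_blas_config()
```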
206+
207+
Besides install sizes, performance and robustness, there are two more things to
208+
consider:
209+
- Intel MKL is not open source. For normal use this is not a problem, but if
210+
a user needs to redistribute an application built with NumPy, this could be
211+
an issue.
212+
- Both MKL and OpenBLAS will use multi-threading for function calls like
213+
`np.dot`, with the number of threads being determined by both a build-time
214+
option and an environment variable. Often all CPU cores will be used. This is
215+
sometimes unexpected for users; NumPy itself doesn't auto-parallelize any
216+
function calls. It can also be harmful for performance, for example when
217+
using another level of parallelization manually or with, e.g. Dask or
218+
scikit-learn functionality.
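
As a sketch of the environment-variable route mentioned above, the thread count can be capped from Python by setting the variables before NumPy is imported. `OPENBLAS_NUM_THREADS`, `MKL_NUM_THREADS`, and `OMP_NUM_THREADS` are the variables commonly read by OpenBLAS, MKL, and OpenMP runtimes; which one takes effect depends on how your NumPy was built, and the helper `limit_blas_threads` is our own illustration.

```python
import os

# Variables read by OpenBLAS, MKL, and OpenMP runtimes respectively.
THREAD_ENV_VARS = ("OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS", "OMP_NUM_THREADS")


def limit_blas_threads(n):
    """Request at most n BLAS threads; call this before importing numpy."""
    for var in THREAD_ENV_VARS:
        os.environ[var] = str(n)


limit_blas_threads(1)

try:
    import numpy as np  # imported only after the limits are set

    a = np.ones((200, 200))
    print(np.dot(a, a).shape)
except ImportError:
    print("NumPy is not installed; only the environment variables were set")
```

Setting the variables after the library has already been loaded typically has no effect, so in a notebook this belongs in the first cell. Exporting the same variables in the shell before starting Python works as well.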
219+
