# Python Package Structure for Scientific Python Projects
2
2
3
-
We strongly suggest, but do not require, that you use the **src/** layout (discussed below)
4
-
for creating your new Python package.
5
-
6
-
We will review packages that use a flat layout as well. Learn more about both approaches below.
7
-
8
-
## Directories that should be in your starting Python package repository
9
-
10
-
There are several core directories that should be included in all Python source distributions / package structures:
11
-
12
-
-**docs/:** discussed in our docs chapter, this directory contains your user-facing documentation website
13
-
-**tests/** this directory contains the tests for your project code
14
-
-**src/package-name/**: this is the directory that contains the code for your Python project. It is normally named using your project's name.
15
-
16
-
```{admonition} Multiple packages in a src/ folder
17
-
:class: tip
18
-
19
-
In some more advanced cases you may have more than one package in your src/ directory. See [black's GitHub repo](https://github.com/psf/black/tree/main/src) for an example of this. However for most beginners you will likely only have one sub-directory in your src/ folder.
20
-
```
21
-
22
-
## Src vs flat layouts
23
-
24
3
There are two different layouts that you will commonly see
25
4
within the Python packaging ecosystem:
26
5
[src and flat layouts.](https://packaging.python.org/en/latest/discussions/src-layout-vs-flat-layout/)
27
6
Both layouts have advantages for different groups of maintainers.
28
7
29
-
The **src/package-name** approach nests your **package-name** directory, mentioned above as the directory where your code lives, into a **src/** directory like this:
30
-
31
-
**src/package-name**
32
-
33
-
In a flat layout approach, your package's code lives in a **package-name** directory
34
-
at the root of your package's repository.
35
-
36
-
On this page we:
37
-
38
-
1. Suggest the **src/package-name** layout structure for new packages. This layout prevents some commonly found issues with the flat layout (discussed below).
39
-
2. Introduce the flat layout as it is used in the scientific ecosystem. Currently this layout is the most common. As such it's good to be familiar with it in case you contribute to a package using a flat layout in the future!
8
+
We strongly suggest, but do not require, that you use the **src/** layout (discussed below)
9
+
for creating your Python package. This layout is also recommended in the
```{admonition} pyOpenSci will never require a specific package structure for peer review
42
13
:class: important
@@ -49,21 +20,54 @@ something getting started with Python packaging or someone who's package is
49
20
has a simple build and might be open to moving to a more fail-proof approach.
50
21
```
51
22
52
-
## The src/ layout for Python packages
23
+
An example of the **src/package** layout structure can be seen below.
24
+
25
+
```
myPackageRepoName
├── CHANGELOG.md               ┐
├── CODE_OF_CONDUCT.md         │
├── CONTRIBUTING.md            │
├── docs                       │ Package documentation
│   ├── index.md               │
│   └── ...                    │
├── LICENSE                    │
├── README.md                  ┘
├── pyproject.toml             ┐
├── src                        │
│   └── myPackage              │ Package source code, metadata,
│       ├── __init__.py        │ and build instructions
│       ├── moduleA.py         │
│       └── moduleB.py         ┘
└── tests                      ┐
    └── ...                    ┘ Package tests
```
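
As a rough illustration, the layout above could be scaffolded with a short script. This is only a sketch: `scaffold_src_layout` is a hypothetical helper written for this page, not part of any packaging tool, and it creates empty placeholder files rather than real content.

```python
from pathlib import Path

# Root-level files shown in the example tree above.
ROOT_FILES = [
    "CHANGELOG.md",
    "CODE_OF_CONDUCT.md",
    "CONTRIBUTING.md",
    "LICENSE",
    "README.md",
    "pyproject.toml",
]


def scaffold_src_layout(root: Path, package: str) -> Path:
    """Create an empty src/ layout under `root` and return the package directory.

    Hypothetical helper for illustration only; all files are created empty.
    """
    package_dir = root / "src" / package
    # The package code is nested inside src/, while docs/ and tests/
    # sit next to src/ at the root of the repository.
    for directory in (package_dir, root / "docs", root / "tests"):
        directory.mkdir(parents=True, exist_ok=True)

    for name in ROOT_FILES:
        (root / name).touch()
    (root / "docs" / "index.md").touch()
    (package_dir / "__init__.py").touch()
    return package_dir
```

For example, `scaffold_src_layout(Path("myPackageRepoName"), "myPackage")` would create `myPackageRepoName/src/myPackage/__init__.py` alongside the root-level files, with no `myPackage` directory at the repository root.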
44
+
45
+
Note the location of the following directories in the example above:
53
46
54
-
The **src/package-name** layout is the approach that we suggest
55
-
for new maintainers. It is also recommended in the
56
-
[PyPA packaging guide](https://packaging.python.org/en/latest/tutorials/packaging-projects/). We suggest the **src/package-name** layout because it
57
-
makes it easier for you to create a package build workflow that tests your
58
-
package as it will be installed on a users computer.
47
+
- **docs/:** discussed in our docs chapter, this directory contains your user-facing documentation website. In a **src/** layout, docs/ is normally placed at the same directory level as the **src/** folder.
48
+
- **tests/:** this directory contains the tests for your project code. In a **src/** layout, tests/ is normally placed at the same directory level as the **src/** folder.
49
+
- **src/package/:** this is the directory that contains the code for your Python project. "package" is normally your project's name.
59
50
60
-
The key characteristics of this layout include:
51
+
Also in the above example, notice that all of the core documentation files that
52
+
pyOpenSci requires live in the root of your project directory. These files
53
+
include:
61
54
62
-
- Your package uses a **src/package-name** directory structure,
63
-
- You include a `tests/` directory outside of the package
64
-
directory.
55
+
- CHANGELOG.md
56
+
- CODE_OF_CONDUCT.md
57
+
- CONTRIBUTING.md
58
+
- LICENSE.txt
59
+
- README.md
65
60
66
-
```{admonition} Example scientific packages that use **src/package-name** layout
@@ -73,30 +77,33 @@ The key characteristics of this layout include:
73
77
74
78
```
75
79
76
-
#### Pros of the src/ layout
80
+
### The src/ layout and testing
81
+
82
+
The benefit of using the **src/package** layout, particularly if you
83
+
are creating a new package, is that it ensures tests are run against the
84
+
installed version of your package rather than the files in your package
85
+
working directory. If you run your tests on your files rather than the
86
+
installed version, you may be missing issues that users encounter when
87
+
your package is installed.
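
One way to see which copy of a package Python would pick up is to ask the import system where it resolves the name. Below is a minimal sketch, assuming a hypothetical helper named `resolves_inside` (not part of any library) built on the standard library's `importlib.util.find_spec`:

```python
import importlib.util
from pathlib import Path


def resolves_inside(package_name: str, directory: Path) -> bool:
    """Return True if `package_name` would be imported from inside `directory`.

    Hypothetical helper for illustration. It only looks up where the
    top-level package would be found; it does not import the package.
    """
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"Cannot locate {package_name!r}")
    # Resolve both paths so that symlinks do not affect the comparison.
    origin = Path(spec.origin).resolve()
    return directory.resolve() in origin.parents
```

If the working directory is on `sys.path`, calling `resolves_inside("package", Path.cwd())` from the root of a flat-layout repository would be expected to return `True`, meaning the local files shadow the installed copy. With a src/ layout and a regular (non-editable) install, the same call would be expected to return `False`.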
88
+
89
+
If `tests/` is outside of the **src/package** directory, it isn't included in the package wheel. This makes your package slightly smaller, which in turn places a smaller storage burden on PyPI, which has over 400,000 packages to support.
90
+
91
+
- [Read more about reasons to use the **src/package** layout](https://hynek.me/articles/testing-packaging/)
77
92
78
93
```{admonition} How Python discovers and prioritizes importing modules
79
-
One of the main technical advantages of using the src/ layout, if you are just getting started with a new package,relates to how Python discovers packages. By default, Python adds a module in your current working directory to the front of the Python module search path.
80
94
81
-
This means that if you currently in your packages working directory, and your module code lives in the root e.g.: /package-name/module.py, python will discovers package-name/module.py before it the package as installed by pip or conda in a virtual environment.
95
+
By default, Python adds your current working directory (or, when you run a script, the directory containing that script) to the front of the Python module search path (`sys.path`).
82
96
83
-
However, if your package lives in a directory structure that is **src/package-name** then it won't be, by default, added to the Python path. This means that when you run import package, python will be forced to first search the active environment (which has your package installed).
97
+
This means that if you run your tests from your package's working directory and use a flat layout (`package/module.py`), Python will discover the local `package/module.py` file before it discovers the installed package.
84
98
85
-
Note that modern Python versions (3.11 and above) do have an option to adjust how the Python path finds modules (`PYTHONSAFEPATH`) however this is still a setting that a user would need to adjust in order to avoid the behavior of Python importing a module from your current working directory first.
86
-
```
99
+
However, if your package lives in a **src/package** directory structure, it won't be added to the Python path by default. This means that when you import your package, Python will be forced to search the active environment (which has your package installed).
87
100
88
-
The benefits of the **src/package-name** layout include:
101
+
Note: Python 3.11 and above provide a setting, the `PYTHONSAFEPATH` environment variable (or the `-P` command line option), that stops Python from adding the working directory to the front of the search path so that installed packages are found first.
102
+
```
89
103
90
-
- It ensures that tests always run on the installed version of your
91
-
package rather than on the flat files imported directly from your package. If you run your tests on your flat files, you may be missing issues that users encounter when your package is installed.
92
-
- If `tests/` are outside of the **src/package-name** directory, they aren't by default
93
-
delivered to a user
94
-
installing your package. When test files (.py files only) are not included in the package wheel your package size will be slightly smaller. This places a smaller storage burden on PyPI which has over 400,000 packages to support.
104
+
#### Sometimes tests are needed in a distribution
95
105
96
-
````{admonition} A note about including tests and data in your package distribution
97
-
If you decide to include tests in your package, your directory structure
98
-
will look like the example below. Notice that below the tests directory
99
-
is contained within the src/package-name directory ensuring the tests will be included in your package's wheel.
106
+
We do not recommend including tests as part of your package wheel by default. However, not including tests in your package distribution will make it harder for people other than yourself to test whether your package is functioning correctly on their system. If you have a small test suite (Python files + data), and think your users may want to run tests locally on their systems, you can include tests by moving the `tests/` directory into the **src/package** directory (see example below).
100
107
101
108
```bash
102
109
src/
@@ -105,73 +112,33 @@ src/
105
112
docs/
106
113
```
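
Because a wheel is a zip archive, one way to check whether tests ended up in a built wheel is to list its contents. The sketch below assumes the tests live in a `tests/` directory inside the package directory; `wheel_test_files` is a hypothetical helper name, not an existing tool.

```python
import zipfile
from pathlib import Path


def wheel_test_files(wheel_path: Path, package: str) -> list[str]:
    """List the files stored under `<package>/tests/` inside a wheel.

    Hypothetical helper for illustration. An empty list means no files
    were found under that prefix.
    """
    prefix = f"{package}/tests/"
    with zipfile.ZipFile(wheel_path) as wheel:
        return sorted(
            name
            for name in wheel.namelist()
            # Skip directory entries, which end with a slash.
            if name.startswith(prefix) and not name.endswith("/")
        )
```

Running this against a wheel in your `dist/` directory gives a quick sanity check that your build includes, or leaves out, the test files you expect.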
107
114
108
-
Be sure to read the [pytest documentation](https://docs.pytest.org/en/7.2.x/explanation/goodpractices.html#choosing-a-test-layout-import-rules).
109
-
110
-
### **Don't include test suite datasets in your package**
111
-
112
-
Large datasets associated with tests will slow down your package's install which can be frustrating to users. It also will consume more storage space on PyPI which is largely supported by volunteer maintainers and has storage costs to consider for it's 400,000+ packages.
113
-
114
-
As such you
115
-
should never include datasets needed for your tests in your package
116
-
distribution. Rather consider hosting them on a data repository such as figshare or zenodo and using a tool such as [Pooch](https://www.fatiando.org/pooch/latest/) to access them when you run tests.
117
-
Check out the testing section of our guide for more information about tests.
118
-
119
-
````
115
+
Including the **tests/** directory in your **src/package** directory ensures that tests will be included in your package's wheel.
120
116
121
-
- The **src/package-name** layout is semantically more clear. Code is always found in the
122
-
**src/package-name** directory, `tests/` and `docs/`are in the root directory.
117
+
Be sure to read the [pytest documentation for more about including tests in your package distribution](https://docs.pytest.org/en/7.2.x/explanation/goodpractices.html#choosing-a-test-layout-import-rules).
123
118
124
-
```{admonition}A few notes about the src/ layout
119
+
```{admonition} Challenges with including tests and data in a package wheel
125
120
:class: tip
126
121
127
-
It is important to note here that sometimes when using the src/package-name structure the directory name (e.g. package name) is different from the actual project or package name. What is important to take away here is that you should store your code within a sub directory within **src/**.
122
+
Tests, especially when accompanied by test data, can create a few small challenges:
128
123
129
-
* [Read more about reasons to use the **src/package-name** layout](https://hynek.me/articles/testing-packaging/)
130
-
```
124
+
- They take up space in your distribution, which adds up over time as storage space on PyPI.
125
+
- Large files can also slow down package installation.
131
126
132
-
An example of the **src/package-name** layout structure can be seen below.
133
-
134
-
```
135
-
myPackage
136
-
├── CHANGELOG.md ┐
137
-
├── CODE_OF_CONDUCT.md │
138
-
├── CONTRIBUTING.md │
139
-
├── docs │ Package documentation
140
-
│ └── index.md
141
-
│ └── ... │
142
-
├── LICENSE │
143
-
├── README.md ┘
144
-
├── pyproject.toml ┐
145
-
├── src │
146
-
│ └── myPackage │ Package source code, metadata,
147
-
│ ├── __init__.py │ and build instructions
148
-
│ ├── moduleA.py │
149
-
│ └── moduleB.py ┘
150
-
└── tests ┐
151
-
└── ... ┘ Package tests
127
+
However, in some cases, particularly in the scientific Python ecosystem, you may need to include tests.
152
128
```
153
129
154
-
## Core file requirements for a Python package
155
-
156
-
In the above example, notice that all of the core documentation files that
157
-
pyOpenSci requires live in the root of your project directory. These files
158
-
include:
159
-
160
-
- CHANGELOG.md
161
-
- CODE_OF_CONDUCT.md
162
-
- CONTRIBUTING.md
163
-
- LICENSE.txt
164
-
- README.md
130
+
### **Don't include test suite datasets in your package**
165
131
166
-
Also note that there is a **docs/** directory at the root where your user-facing
167
-
documentation website lives.
132
+
If you do include your tests in your package distribution, we strongly
133
+
discourage you from including data in your test suite directory. Rather,
134
+
host your test data in a repository such as Figshare or Zenodo. Use a
135
+
tool such as [Pooch](https://www.fatiando.org/pooch/latest/) to access
```{admonition} Why most scientific Python packages do not use source
201
168
:class: tip
202
169
203
-
In most cases the advantages of using the **src/package-name** layout for
170
+
In most cases the advantages of using the **src/package** layout for
204
171
larger scientific packages that already use a flat layout do not outweigh the costs.
205
-
Moving from a flat layout to a **src/package-name** layout would come at a significant cost to
172
+
Moving from a flat layout to a **src/package** layout would come at a significant cost to
206
173
maintainers.
207
174
208
-
However, the advantages of using the **src/package-name** layout for a beginner are significant.
175
+
However, the advantages of using the **src/package** layout for a beginner are significant.
209
176
As such, we recommend that if you are getting started with creating a package,
210
-
that you consider using a **src/package-name** layout.
177
+
you consider using a **src/package** layout.
211
178
```
212
179
213
180
## What does the flat layout structure look like?
@@ -267,3 +234,19 @@ It would be a significant maintenance cost and burden to move all of these
267
234
packages to a different layout. The potential benefits of the source layout
268
235
for these tools are not worth the maintenance investment.
269
236
```
237
+
238
+
<!--
239
+
Not sure where to put this now ... most new users won't have multiple packages. Maybe this goes into the complex packaging page as we build that out?
240
+
241
+
```{admonition} Multiple packages in a src/ folder
242
+
:class: tip
243
+
244
+
In some more advanced cases you may have more than one package in your src/ directory. See [black's GitHub repo](https://github.com/psf/black/tree/main/src) for an example of this. However, for most beginners you will likely only have one sub-directory in your **src/** folder.
245
+
``` -->
246
+
247
+
<!--
248
+
```{admonition} A few notes about the src/ layout
249
+
:class: tip
250
+
251
+
It is important to note here that sometimes when using the src/package structure the directory name (e.g. package name) is different from the actual project or package name. What is important to take away here is that you should store your code within a sub directory within **src/**.