PDEP-1: Purpose and guidelines for pandas enhancement proposals #47444

Merged
merged 18 commits into from
Aug 3, 2022
Changes from 13 commits
Commits
8bde84f
PDEP-1: Purpose and guidelines for pandas enhancement proposals
datapythonista Jun 21, 2022
0b43492
black
datapythonista Jun 21, 2022
3d9a75b
Update PR number
datapythonista Jun 21, 2022
6d9d34b
Update web/pandas/pdeps/accepted/0001-purpose-and-guidelines.md
datapythonista Jun 21, 2022
a0e6cda
Update web/pandas/pdeps/accepted/0001-purpose-and-guidelines.md
datapythonista Jun 21, 2022
a0d7276
Update web/pandas/pdeps/accepted/0001-purpose-and-guidelines.md
datapythonista Jun 21, 2022
a8295b8
Implemented PDEPs including pandas version
datapythonista Jun 21, 2022
1e408dd
Merge branch 'pdep' of github.com:datapythonista/pandas into pdep
datapythonista Jun 21, 2022
291de8d
Merge remote-tracking branch 'upstream/main' into pdep
datapythonista Jun 25, 2022
2ce2164
Merge remote-tracking branch 'upstream/main' into pdep
datapythonista Jun 27, 2022
ebf1687
Addressed feedback from reviews, couple of visualization fixes, and s…
datapythonista Jun 27, 2022
05d43a5
Merge main
datapythonista Jul 30, 2022
d20de1e
Addressing comments from reviews
datapythonista Jul 30, 2022
9b37d11
Update web/pandas/pdeps/0001-purpose-and-guidelines.md
datapythonista Aug 3, 2022
55b3887
Update web/pandas/pdeps/0001-purpose-and-guidelines.md
datapythonista Aug 3, 2022
8c34db0
Update web/pandas/pdeps/0001-purpose-and-guidelines.md
datapythonista Aug 3, 2022
4f3343b
Update web/pandas/pdeps/0001-purpose-and-guidelines.md
datapythonista Aug 3, 2022
7c1a725
Last review comments and dates updated
datapythonista Aug 3, 2022
71 changes: 35 additions & 36 deletions web/pandas/about/roadmap.md
@@ -15,10 +15,35 @@ fundamental changes to the project that are likely to take months or
years of developer time. Smaller-scoped items will continue to be
tracked on our [issue tracker](https://github.com/pandas-dev/pandas/issues).

See [Roadmap evolution](#roadmap-evolution) for proposing
changes to this document.
The roadmap is defined as a set of major enhancement proposals named PDEPs.
For more information about PDEPs, and how to submit one, please refer to
[PDEP-1](/pdeps/0001-purpose-and-guidelines.html).

## Extensibility
## PDEPs

{% for pdep_type in ["Under discussion", "Accepted", "Implemented", "Rejected"] %}

<h3 id="pdeps-{{pdep_type}}">{{ pdep_type.replace("_", " ").capitalize() }}</h3>

<ul>
{% for pdep in pdeps[pdep_type] %}
<li><a href="{{ pdep.url }}">{{ pdep.title }}</a></li>
{% else %}
<li>There are currently no PDEPs with this status</li>
{% endfor %}
</ul>

{% endfor %}

## Roadmap points pending a PDEP

<div class="alert alert-warning" role="alert">
pandas is in the process of moving roadmap points to PDEPs (implemented in
June 2022). During the transition, some roadmap points will exist as PDEPs,
Member


Suggested change
June 2022). During the transition, some roadmap points will exist as PDEPs,
July 2022). During the transition, some roadmap points will exist as PDEPs,

to match the PDEP created date? but probably August.

Contributor


yeah switch to August :->

while others will exist as sections below.
</div>

### Extensibility

Pandas `extending.extension-types` allow
for extending NumPy types with custom data types and array storage.
@@ -33,7 +58,7 @@ library, making their behavior more consistent with the handling of
NumPy arrays. We'll do this by cleaning up pandas' internals and
adding new methods to the extension array interface.

## String data type
### String data type

Currently, pandas stores text data in an `object` -dtype NumPy array.
The current implementation has two primary drawbacks: First, `object`
@@ -54,7 +79,7 @@ work, we may need to implement certain operations expected by pandas
users (for example the algorithm used in, `Series.str.upper`). That work
may be done outside of pandas.

## Apache Arrow interoperability
### Apache Arrow interoperability

[Apache Arrow](https://arrow.apache.org) is a cross-language development
platform for in-memory data. The Arrow logical types are closely aligned
@@ -65,7 +90,7 @@ data types within pandas. This will let us take advantage of its I/O
capabilities and provide for better interoperability with other
languages and libraries using Arrow.

## Block manager rewrite
### Block manager rewrite

We'd like to replace pandas current internal data structures (a
collection of 1 or 2-D arrays) with a simpler collection of 1-D arrays.
@@ -92,7 +117,7 @@ See [these design
documents](https://dev.pandas.io/pandas2/internal-architecture.html#removal-of-blockmanager-new-dataframe-internals)
for more.

## Decoupling of indexing and internals
### Decoupling of indexing and internals

The code for getting and setting values in pandas' data structures
needs refactoring. In particular, we must clearly separate code that
@@ -150,7 +175,7 @@ which are actually expected (typically `KeyError`).
and when small differences in behavior are expected (e.g. getting with `.loc` raises for
missing labels, setting still doesn't), they can be managed with a specific parameter.

## Numba-accelerated operations
### Numba-accelerated operations

[Numba](https://numba.pydata.org) is a JIT compiler for Python code.
We'd like to provide ways for users to apply their own Numba-jitted
@@ -162,7 +187,7 @@ window contexts). This will improve the performance of
user-defined-functions in these operations by staying within compiled
code.

## Documentation improvements
### Documentation improvements

We'd like to improve the content, structure, and presentation of the
pandas documentation. Some specific goals include
@@ -177,7 +202,7 @@ pandas documentation. Some specific goals include
subsections of the documentation to make navigation and finding
content easier.

## Performance monitoring
### Performance monitoring

Pandas uses [airspeed velocity](https://asv.readthedocs.io/en/stable/)
to monitor for performance regressions. ASV itself is a fabulous tool,
@@ -197,29 +222,3 @@ We'd like to fund improvements and maintenance of these tools to
<https://pyperf.readthedocs.io/en/latest/system.html>
- Build a GitHub bot to request ASV runs *before* a PR is merged.
Currently, the benchmarks are only run nightly.

## Roadmap Evolution

Pandas continues to evolve. The direction is primarily determined by
community interest. Everyone is welcome to review existing items on the
roadmap and to propose a new item.

Each item on the roadmap should be a short summary of a larger design
proposal. The proposal should include

1. Short summary of the changes, which would be appropriate for
inclusion in the roadmap if accepted.
2. Motivation for the changes.
3. An explanation of why the change is in scope for pandas.
4. Detailed design: Preferably with example-usage (even if not
implemented yet) and API documentation
5. API Change: Any API changes that may result from the proposal.

That proposal may then be submitted as a GitHub issue, where the pandas
maintainers can review and comment on the design. The [pandas mailing
list](https://mail.python.org/mailman/listinfo/pandas-dev) should be
notified of the proposal.

When there's agreement that an implementation would be welcome, the
roadmap should be updated to include the summary and a link to the
discussion issue.
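
The Jinja loop added to the roadmap page in this diff relies on the `{% for %}` / `{% else %}` construct, whose `else` branch renders only when the list is empty. A minimal plain-Python sketch of the same grouping and fallback logic is given here. The function name `render_pdep_lists` and the sample data are hypothetical and are not part of the PR; the real page is rendered by Jinja, not by this code.

```python
from collections import defaultdict

# Same status names the template iterates over
STATUSES = ["Under discussion", "Accepted", "Implemented", "Rejected"]


def render_pdep_lists(pdeps):
    """Return HTML that mirrors the roadmap template's per-status lists."""
    parts = []
    for status in STATUSES:
        parts.append(f"<h3>{status}</h3>")
        parts.append("<ul>")
        items = pdeps[status]
        if items:
            for pdep in items:
                parts.append(
                    f'<li><a href="{pdep["url"]}">{pdep["title"]}</a></li>'
                )
        else:
            # Plays the role of the {% else %} branch of the Jinja for loop
            parts.append("<li>There are currently no PDEPs with this status</li>")
        parts.append("</ul>")
    return "\n".join(parts)


# Hypothetical sample context with a single PDEP under discussion
pdeps = defaultdict(list)
pdeps["Under discussion"].append(
    {
        "title": "PDEP-1: Purpose and guidelines",
        "url": "/pdeps/0001-purpose-and-guidelines.html",
    }
)
html = render_pdep_lists(pdeps)
```

Note that the lookup key must match the status string exactly, which is why the preprocessor and the template need to agree on `"Under discussion"` rather than a variant such as `"under_discussion"`.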
3 changes: 3 additions & 0 deletions web/pandas/config.yml
@@ -11,6 +11,7 @@ main:
- pandas_web.Preprocessors.blog_add_posts
- pandas_web.Preprocessors.maintainers_add_info
- pandas_web.Preprocessors.home_add_releases
- pandas_web.Preprocessors.roadmap_pdeps
markdown_extensions:
- toc
- tables
@@ -177,3 +178,5 @@ sponsors:
- name: "Gousto"
url: https://www.gousto.co.uk/
kind: partner
roadmap:
pdeps_path: pdeps
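
As a rough illustration of how the new `roadmap.pdeps_path` setting is consumed, the sketch below joins it with the site's source path and lists the markdown files found there, as the `roadmap_pdeps` preprocessor in this PR does. The temporary directory, the `notes.txt` file, and the hand-built `context` dictionary are assumptions made for the example; in the real build the context is assembled by `pandas_web.py`.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical source tree standing in for web/pandas/
    source = pathlib.Path(tmp)
    (source / "pdeps").mkdir()
    (source / "pdeps" / "0001-purpose-and-guidelines.md").write_text("# PDEP-1\n")
    (source / "pdeps" / "notes.txt").write_text("ignored\n")

    # Hand-built context; "roadmap" mirrors the keys added to config.yml
    context = {"source_path": str(source), "roadmap": {"pdeps_path": "pdeps"}}

    pdeps_path = (
        pathlib.Path(context["source_path"]) / context["roadmap"]["pdeps_path"]
    )
    # Only markdown files are treated as PDEPs
    md_files = [p.name for p in sorted(pdeps_path.iterdir()) if p.suffix == ".md"]
```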
127 changes: 127 additions & 0 deletions web/pandas/pdeps/0001-purpose-and-guidelines.md
@@ -0,0 +1,127 @@
# PDEP-1: Purpose and guidelines

- Created: 30 July 2022
- Status: Under discussion
- Discussion: [#47444](https://github.com/pandas-dev/pandas/pull/47444)
- Author: [Marc Garcia](https://github.com/datapythonista)
- Revision: 1

## PDEP definition, purpose and scope

A PDEP (pandas enhancement proposal) is a proposal for a **major** change in
pandas, in a similar way as a Python [PEP](https://peps.python.org/pep-0001/)
or a NumPy [NEP](https://numpy.org/neps/nep-0000.html).

Bug fixes and conceptually minor changes (e.g. adding a parameter to a function)
are out of the scope of PDEPs. A PDEP should be used for changes that are not
immediate and not obvious, and are expected to require a significant amount of
discussion and require detailed documentation before being implemented.

PDEPs are appropriate for user-facing changes, internal changes and organizational
discussions. Examples of topics worth a PDEP include moving a module from
pandas to a separate repository, refactoring the pandas block manager, or
proposing a new code of conduct.

## PDEP guidelines

### Target audience

A PDEP is a public document available to anyone, but the main stakeholders to
consider when writing a PDEP are:

- The core development team, who will have the final decision on whether a PDEP
is approved or not
- Contributors to pandas and other related projects, and experienced users. Their
feedback is highly encouraged and appreciated, to make sure all points of view
are taken into consideration
- The wider pandas community, in particular users, who may or may not have feedback
on the proposal, but should know and be able to understand the future direction of
the project

### PDEP authors

Anyone can propose a PDEP, but in most cases developers of pandas itself and related
projects are expected to author PDEPs. If you are unsure whether to open
an issue or create a PDEP, it's probably safe to start by
[opening an issue](https://github.com/pandas-dev/pandas/issues/new/choose), which can
later be turned into a PDEP.

### Workflow

The possible states of a PDEP are:

- Under discussion
- Accepted
- Implemented
- Rejected

The workflow that PDEPs can follow is described next.

#### Submitting a PDEP

Proposing a PDEP is done by creating a PR that adds a new file to `web/pandas/pdeps/`.
The file is a markdown file; you can use `web/pandas/pdeps/0001-purpose-and-guidelines.md`
as a reference for the expected format.

The initial status of a PDEP will be `Status: Under discussion`. This will be changed
to `Status: Accepted` when the PDEP is ready and has the approval of the core team.

#### Accepted PDEP

A PDEP can only be accepted by the core development team, and only if the proposal is
considered worth implementing. Decisions will be made based on the process detailed in the
[pandas governance document](https://github.com/pandas-dev/pandas-governance/blob/master/governance.md).
In general, more than one approval will be needed before the PR is merged, and
there should not be any `Request changes` review at the time of merging.

Once a PDEP is accepted, anyone can contribute toward its implementation,
with an open-ended completion timeline. Development of pandas is difficult to understand and
forecast, since the contributors are a mix of volunteers and developers paid from different sources,
with different priorities. Companies, institutions or individuals interested in seeing a
PDEP implemented, or in seeing progress on the pandas roadmap in general, can check how
to help in the [contributing page](/contribute.html).

#### Implemented PDEP

Once a PDEP is implemented and available in the main branch of pandas, its
status will be changed to `Status: Implemented`, so there is visibility that the PDEP
is no longer part of the roadmap and future plans, but a change that has already
happened. The first pandas version in which the PDEP implementation is
available will also be included in the PDEP.
Member


not yet relevant, but would this be just a note in the revision history or something else?


#### Rejected PDEP

A PDEP can be rejected when the final decision is that its implementation is
not in the best interest of the project. Rejected PDEPs are as useful as accepted
PDEPs, since there are discussions that are worth having, and decisions about
changes to pandas being made. They will be merged with `Status: Rejected`, so
there is visibility on what was discussed and what the outcome of the
discussion was. A PDEP can be rejected for different reasons, for example a good idea
that isn't backward-compatible, where the breaking changes aren't considered worth
implementing.

#### Invalid PDEP

For submitted PDEPs that do not contain proper documentation, are out of scope, or
are not useful to the community for any other reason, the PR will be closed after
discussion with the author, instead of merging them as rejected. This avoids
adding noise to the list of rejected PDEPs, which should contain documentation as
good as an accepted PDEP, but where the final decision was to not implement the changes.

## Evolution of PDEPs

Most PDEPs aren't expected to change after being accepted. Once there is agreement on the changes,
and they are implemented, the PDEP will only be useful to understand why the development happened,
and the details of the discussion.

But in some cases, a PDEP can be updated. One example is a PDEP defining a procedure or
a policy, like this one (PDEP-1). Another is when attempting the implementation yields
new knowledge that makes the original PDEP obsolete, and changes are
required. When there are specific changes to be made to the original PDEP, it will
be edited, its `Revision: X` label will be increased by one, and a note will be added
to the `PDEP-N history` section. This will let readers understand that the PDEP has
changed and avoid confusion.

### PDEP-1 History

- 30 July 2022: Initial version
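
The workflow section of PDEP-1 names four states and describes in prose how a proposal moves between them. One way to picture this is as a small transition table, sketched below. The `ALLOWED` mapping is an assumption inferred from the text (PDEP-1 does not define a formal state machine), and `Status` and `can_move` are hypothetical names introduced only for this illustration.

```python
from enum import Enum


class Status(Enum):
    """The four PDEP states listed in the workflow section."""

    UNDER_DISCUSSION = "Under discussion"
    ACCEPTED = "Accepted"
    IMPLEMENTED = "Implemented"
    REJECTED = "Rejected"


# Assumed transitions, inferred from the prose; not specified by PDEP-1 itself.
ALLOWED = {
    Status.UNDER_DISCUSSION: {Status.ACCEPTED, Status.REJECTED},
    Status.ACCEPTED: {Status.IMPLEMENTED},
    Status.IMPLEMENTED: set(),
    Status.REJECTED: set(),
}


def can_move(current, new):
    """Return True if the assumed workflow permits moving from current to new."""
    return new in ALLOWED[current]


accepted_ok = can_move(Status.UNDER_DISCUSSION, Status.ACCEPTED)
reopen_ok = can_move(Status.REJECTED, Status.IMPLEMENTED)
```

Invalid PDEPs do not appear in this table because, per the text, their PRs are closed without ever being merged under any status.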
8 changes: 6 additions & 2 deletions web/pandas/static/css/pandas.css
@@ -8,15 +8,19 @@ h1 {
color: #130654;
}
h2 {
font-size: 1.45rem;
font-size: 1.8rem;
font-weight: 700;
color: black;
color: #130654;
}
h3 {
font-size: 1.3rem;
font-weight: 600;
color: black;
}
h3 a {
color: black;
text-decoration: underline dotted !important;
}
a {
color: #130654;
}
57 changes: 57 additions & 0 deletions web/pandas_web.py
@@ -24,10 +24,12 @@
The rest of the items in the file will be added directly to the context.
"""
import argparse
import collections
import datetime
import importlib
import operator
import os
import pathlib
import re
import shutil
import sys
@@ -185,6 +187,61 @@ def home_add_releases(context):
)
return context

    @staticmethod
    def roadmap_pdeps(context):
        """
        PDEPs (pandas enhancement proposals) are not part of the bar
        navigation. They are included as lists in the "Roadmap" page
        and linked from there. This preprocessor obtains the list of
        PDEPs in their different statuses from the directory tree and GitHub.
        """
        KNOWN_STATUS = {"Under discussion", "Accepted", "Implemented", "Rejected"}
        context["pdeps"] = collections.defaultdict(list)

        # accepted, rejected and implemented
        pdeps_path = (
            pathlib.Path(context["source_path"]) / context["roadmap"]["pdeps_path"]
        )
        for pdep in sorted(pdeps_path.iterdir()):
            if pdep.suffix != ".md":
                continue
            with pdep.open() as f:
                title = f.readline()[2:]  # removing markdown title "# "
                status = None
                for line in f:
                    if line.startswith("- Status: "):
                        status = line.strip().split(": ", 1)[1]
                        break
                if status not in KNOWN_STATUS:
                    raise RuntimeError(
                        f'PDEP "{pdep}" status "{status}" is unknown. '
                        f"Should be one of: {KNOWN_STATUS}"
                    )
            html_file = pdep.with_suffix(".html").name
            context["pdeps"][status].append(
                {
                    "title": title,
                    "url": f"/pdeps/{html_file}",
                }
            )

        # under discussion
        github_repo_url = context["main"]["github_repo_url"]
        resp = requests.get(
            "https://api.github.com/search/issues?"
            f"q=is:pr is:open label:PDEP repo:{github_repo_url}"
        )
        if context["ignore_io_errors"] and resp.status_code == 403:
            return context
        resp.raise_for_status()

        for pdep in resp.json()["items"]:
            # The key must match the status name used by the roadmap template.
            # "html_url" is the browser-facing link; "url" is the API endpoint.
            context["pdeps"]["Under discussion"].append(
                {"title": pdep["title"], "url": pdep["html_url"]}
            )

        return context


def get_callable(obj_as_str: str) -> object:
"""
Expand Down
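
To make the header parsing in `roadmap_pdeps` easier to follow, here is a standalone sketch of the same idea: take the title from the first line and stop at the first `- Status: ` entry. The helper name `read_title_and_status` and the in-memory sample are hypothetical; the PR reads from files on disk and, unlike this sketch, does not strip the trailing newline from the title.

```python
import io


def read_title_and_status(lines):
    """Parse the title and the `- Status:` value from an iterable of lines."""
    it = iter(lines)
    # Drop the leading markdown "# " and the trailing newline
    title = next(it)[2:].strip()
    status = None
    for line in it:
        if line.startswith("- Status: "):
            # Split only once so that a value containing ": " is kept whole
            status = line.strip().split(": ", 1)[1]
            break
    return title, status


# Hypothetical sample that follows the PDEP-1 header layout
sample = io.StringIO(
    "# PDEP-1: Purpose and guidelines\n"
    "\n"
    "- Created: 30 July 2022\n"
    "- Status: Under discussion\n"
    "- Revision: 1\n"
)
title, status = read_title_and_status(sample)
```

A file without a `- Status: ` line yields `status = None`, which is the case the preprocessor turns into a `RuntimeError`.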