From c73892d86511154483583cbc44948d6ea460dbd2 Mon Sep 17 00:00:00 2001
From: MarcoGorelli <>
Date: Mon, 14 Nov 2022 12:10:07 +0000
Subject: [PATCH 01/13] [skip ci] pdep-5 initial draft

---
 .../pdeps/0005-no-default-index-mode.md       | 189 ++++++++++++++++++
 1 file changed, 189 insertions(+)
 create mode 100644 web/pandas/pdeps/0005-no-default-index-mode.md

diff --git a/web/pandas/pdeps/0005-no-default-index-mode.md b/web/pandas/pdeps/0005-no-default-index-mode.md
new file mode 100644
index 0000000000000..330dd5c1d4967
--- /dev/null
+++ b/web/pandas/pdeps/0005-no-default-index-mode.md
@@ -0,0 +1,189 @@
+# PDEP-5: No-default-index mode
+
+- Created: 14 November 2022
+- Status: Draft
+- Discussion: [#49693](https://github.com/pandas-dev/pandas/pull/49693)
+- Author: [Marco Gorelli](https://github.com/MarcoGorelli)
+- Revision: 1
+
+## Abstract
+
+The suggestion is to add a `mode.no_default_index` option which, if enabled,
+would ensure:
+- if a ``DataFrame`` / ``Series`` is created, then by default it won't have an ``Index``;
+- nobody will get an ``Index`` unless they ask for one - this would affect the default behaviour of ``groupby``, ``value_counts``, ``pivot_table``, and more.
+
+This option would not be the default. Users would need to explicitly opt-in to it, via ``pd.set_option('mode.no_default_index', True)``, via ``pd.option_context``, or via the ``PANDAS_NO_DEFAULT_INDEX`` environment variable.
+
+## Motivation and Scope
+
+The Index can be a source of confusion and frustration for pandas users. For example, let's consider the inputs
+
+```python
+In [37]: ser1 = df.groupby('sender')['amount'].sum()
+
+In [38]: ser2 = df.groupby('receiver')['amount'].sum()
+
+In [39]: ser1
+Out[39]:
+sender
+1    10
+2    15
+3    20
+5    25
+Name: amount, dtype: int64
+
+In [40]: ser2
+Out[40]:
+receiver
+1    10
+2    15
+3    20
+4    25
+Name: amount, dtype: int64
+```
+. Then:
+
+- it can be unexpected that summing `Series` with the same length (but different indices) produces `NaN`s in the result (https://stackoverflow.com/q/66094702/4451315):
+
+  ```python
+  In [41]: ser1 + ser2
+  Out[41]:
+  1    20.0
+  2    30.0
+  3    40.0
+  4     NaN
+  5     NaN
+  Name: amount, dtype: float64
+  ```
+
+- concatenation, even with `ignore_index=True`, still aligns on the index (https://github.com/pandas-dev/pandas/issues/25349):
+
+  ```python
+  In [42]: pd.concat([ser1, ser2], axis=1, ignore_index=True)
+  Out[42]:
+        0     1
+  1  10.0  10.0
+  2  15.0  15.0
+  3  20.0  20.0
+  5  25.0   NaN
+  4   NaN  25.0
+  ```
+
+- it can be frustrating to have to repeatedly call `.reset_index()` (https://twitter.com/chowthedog/status/1559946277315641345):
+
+    ```python
+    In [45]: df.value_counts(['sender', 'receiver']).reset_index().rename(columns={0: 'count'})
+    Out[45]:
+       sender  receiver  count
+    0       1         1      1
+    1       2         2      1
+    2       3         3      1
+    3       5         4      1
+    ```
+
+With this option enabled, users who don't want to worry about indices wouldn't need to.
+
+## Detailed Description
+
+This would require 3 steps:
+1. creation of a ``NoIndex`` object, which would be a subclass of ``RangeIndex`` on which
+   some operations such as ``append`` would behave differently.
+   The ``default_index`` function would then return ``NoIndex`` (rather than ``RangeIndex``) if this mode is enabled;
+2. adjusting ``DataFrameFormatter`` and ``SeriesFormatter`` to not print row labels for objects with a ``NoIndex``;
+3. adjusting methods which currently return an index to just insert a new column instead.
+
+Let's expand on all three below.
+
+### 1. NoIndex object
+
+Most of the logic could be handled within the ``NoIndex`` object.
+It would be like a ``RangeIndex``, but with the following differences:
+- `name` could only be `None`;
+- `start` could only be `0`, `step` `1`;
+- when appending an extra element, the new `Index` would still be `NoIndex`;
+- when slicing, one would still get a `NoIndex`;
+- two ``NoIndex`` objects can't be aligned. Either they're the same length, or pandas raises;
+- aligning a ``NoIndex`` object with one which has an index will raise, always;
+- ``DataFrame`` columns can't be `NoIndex` (so ``transpose`` would need some adjustments when called on a ``NoIndex`` ``DataFrame``);
+- `insert` and `delete` should raise. As a consequence, `.drop` with `axis=0` would always raise;
+- arithmetic operations (e.g. `NoIndex(3) + 2`) would all raise.
+
+### 2. DataFrameFormatter and SeriesFormatter changes
+
+When printing an object with a ``NoIndex``, then the row labels wouldn't be shown:
+
+```python
+In [14]: pd.set_option('mode.no_default_index', True)
+
+In [15]: df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})
+
+In [16]: df
+Out[16]:
+ a  b  c
+ 1  4  7
+ 2  5  8
+ 3  6  9
+```
+
+### 3. Nobody should get an index unless they ask for one
+
+The following would work in the same way:
+```python
+pivot = (
+    pd.pivot_table(df, values="D", index=["A", "B"], columns=["C"], aggfunc=np.sum)
+).reset_index()
+
+with pd.option_context('mode.no_default_index', True):
+    pivot = (
+        pd.pivot_table(df, values="D", index=["A", "B"], columns=["C"], aggfunc=np.sum)
+    )
+```
+
+Likewise for ``value_counts``. In ``groupby``, the default would be ``as_index=False``.
+
+## Usage and Impact
+
+Users who like the power of the ``Index`` could continue using pandas exactly as it is,
+without changing anything.
+
+The addition of this mode would enable users who don't want to think about indices to
+not have to.
+
+The implementation would be quite simple: most of the logic would be handled within the
+``NoIndex`` class, and only some minor adjustments (e.g. to the ``default_index`` function)
+would be needed in core pandas.
+
+## Implementation
+
+Draft pull request showing proof of concept: https://github.com/pandas-dev/pandas/pull/49693.
+
+## Likely FAQ
+
+**Q: Aren't indices really powerful?**
+
+**A:** Yes! And they're also confusing to many users, even experienced developers.
+  It's fairly common to see pandas code with ``.reset_index`` scattered around every
+  other line. Such users would benefit from a mode in which they wouldn't need to think
+  about indices and alignment.
+
+**Q: In this mode, could users still get an ``Index`` if they really wanted to?**
+
+**A:** Yes! For example with
+  ```python
+  df.set_index(Index(range(len(df))))
+  ```
+  or, if they don't have a column named ``'index'``:
+  ```python
+  df.reset_index().set_index('index')
+  ```
+
+**Q: Why is it necessary to change the behaviour of ``value_counts``? Isn't the introduction of a ``NoIndex`` object enough?**
+
+**A:** The objective of this mode is to enable users to not have to think about indices if they don't want to. If they have to call
+  ``.reset_index`` after each ``value_counts`` / ``pivot_table`` call, or remember to pass ``as_index=False`` to each ``groupby``
+  call, then this objective has arguably not quite been reached.
+
+## PDEP History
+
+- 14 November: Initial draft

From 573bf75e0956ac1d0c27ae2d82e6222eccb22923 Mon Sep 17 00:00:00 2001
From: MarcoGorelli <>
Date: Wed, 16 Nov 2022 11:56:55 +0000
Subject: [PATCH 02/13] [skip ci] first revision

---
 .../pdeps/0005-no-default-index-mode.md       | 331 ++++++++++++------
 1 file changed, 230 insertions(+), 101 deletions(-)

diff --git a/web/pandas/pdeps/0005-no-default-index-mode.md b/web/pandas/pdeps/0005-no-default-index-mode.md
index 330dd5c1d4967..35905a7abd355 100644
--- a/web/pandas/pdeps/0005-no-default-index-mode.md
+++ b/web/pandas/pdeps/0005-no-default-index-mode.md
@@ -1,48 +1,28 @@
-# PDEP-5: No-default-index mode
+# PDEP-5: NoRowIndex
 
 - Created: 14 November 2022
 - Status: Draft
 - Discussion: [#49693](https://github.com/pandas-dev/pandas/pull/49693)
 - Author: [Marco Gorelli](https://github.com/MarcoGorelli)
-- Revision: 1
+- Revision: 2
 
 ## Abstract
 
-The suggestion is to add a `mode.no_default_index` option which, if enabled,
-would ensure:
-- if a ``DataFrame`` / ``Series`` is created, then by default it won't have an ``Index``;
-- nobody will get an ``Index`` unless they ask for one - this would affect the default behaviour of ``groupby``, ``value_counts``, ``pivot_table``, and more.
+The suggestion is to add a ``NoRowIndex`` class. Internally, it would act a bit like
+a ``RangeIndex``, but some methods would be stricter. This would be one
+step towards enabling users who don't want to think about indices to not have to.
 
-This option would not be the default. Users would need to explicitly opt-in to it, via ``pd.set_option('mode.no_default_index', True)``, via ``pd.option_context``, or via the ``PANDAS_NO_DEFAULT_INDEX`` environment variable.
-
-## Motivation and Scope
+## Motivation
 
 The Index can be a source of confusion and frustration for pandas users. For example, let's consider the inputs
 
 ```python
-In [37]: ser1 = df.groupby('sender')['amount'].sum()
-
-In [38]: ser2 = df.groupby('receiver')['amount'].sum()
-
-In [39]: ser1
-Out[39]:
-sender
-1    10
-2    15
-3    20
-5    25
-Name: amount, dtype: int64
-
-In [40]: ser2
-Out[40]:
-receiver
-1    10
-2    15
-3    20
-4    25
-Name: amount, dtype: int64
+In [37]: ser1 = pd.Series([10, 15, 20, 25], index=[1, 2, 3, 5])
+
+In [38]: ser2 = pd.Series([10, 15, 20, 25], index=[1, 2, 3, 4])
 ```
-. Then:
+
+Then:
 
 - it can be unexpected that summing `Series` with the same length (but different indices) produces `NaN`s in the result (https://stackoverflow.com/q/66094702/4451315):
 
@@ -54,7 +34,7 @@ Name: amount, dtype: int64
   3    40.0
   4     NaN
   5     NaN
-  Name: amount, dtype: float64
+  dtype: float64
   ```
 
 - concatenation, even with `ignore_index=True`, still aligns on the index (https://github.com/pandas-dev/pandas/issues/25349):
@@ -72,118 +52,267 @@ Name: amount, dtype: int64
 
 - it can be frustrating to have to repeatedly call `.reset_index()` (https://twitter.com/chowthedog/status/1559946277315641345):
 
-    ```python
-    In [45]: df.value_counts(['sender', 'receiver']).reset_index().rename(columns={0: 'count'})
-    Out[45]:
-       sender  receiver  count
-    0       1         1      1
-    1       2         2      1
-    2       3         3      1
-    3       5         4      1
-    ```
+  ```python
+  In [3]: ser1.reset_index(drop=True) + ser2.reset_index(drop=True)
+  Out[3]:
+  0    20
+  1    30
+  2    40
+  3    50
+  dtype: int64
+  ```
 
-With this option enabled, users who don't want to worry about indices wouldn't need to.
+If a user didn't want to think about row labels (which they may have ended up after slicing / concatenating operations),
+then ``NoRowIndex`` would enable the above to work in a more intuitive
+manner (details and examples to follow below).
 
-## Detailed Description
+## Scope
 
-This would require 3 steps:
-1. creation of a ``NoIndex`` object, which would be a subclass of ``RangeIndex`` on which
-   some operations such as ``append`` would behave differently.
-   The ``default_index`` function would then return ``NoIndex`` (rather than ``RangeIndex``) if this mode is enabled;
-2. adjusting ``DataFrameFormatter`` and ``SeriesFormatter`` to not print row labels for objects with a ``NoIndex``;
-3. adjusting methods which currently return an index to just insert a new column instead.
+This proposal deals exclusively with the ``NoRowIndex`` class. To allow users to fully "opt-out" of having to think
+about row labels, the following could also be useful:
+- a ``pd.set_option('mode.no_default_index')`` mode which would default to creating new ``DataFrame``s and
+  ``Series`` with ``NoRowIndex`` instead of ``RangeIndex``;
+- giving ``as_index`` options to methods which currently create an index
+  (e.g. ``value_counts``, ``.sum()``, ``.pivot_table``) to just insert a new column instead of creating an
+  ``Index``.
 
-Let's expand on all three below.
+However, neither of the above will be discussed here.
 
-### 1. NoIndex object
+## Detailed Description
 
-Most of the logic could be handled within the ``NoIndex`` object.
-It would be like a ``RangeIndex``, but with the following differences:
+The core pandas code would change as little as possible. The additional complexity should be handled
+within the ``NoRowIndex`` object. It would act just like ``RangeIndex``, but would be a bit stricter
+in some cases:
 - `name` could only be `None`;
 - `start` could only be `0`, `step` `1`;
-- when appending an extra element, the new `Index` would still be `NoIndex`;
-- when slicing, one would still get a `NoIndex`;
-- two ``NoIndex`` objects can't be aligned. Either they're the same length, or pandas raises;
-- aligning a ``NoIndex`` object with one which has an index will raise, always;
-- ``DataFrame`` columns can't be `NoIndex` (so ``transpose`` would need some adjustments when called on a ``NoIndex`` ``DataFrame``);
-- `insert` and `delete` should raise. As a consequence, `.drop` with `axis=0` would always raise;
-- arithmetic operations (e.g. `NoIndex(3) + 2`) would all raise.
+- when appending a ``NoRowIndex``, the result would still be ``NoRowIndex``;
+- the ``NoRowIndex`` class would be preserved under slicing;
+- it could only be aligned with another ``Index`` if it's also ``NoRowIndex`` and if it's of the same length;
+- ``DataFrame`` columns can't be `NoRowIndex` (so ``transpose`` would need some adjustments when called on a ``NoRowIndex`` ``DataFrame``);
+- `insert` and `delete` should raise. As a consequence, if ``df`` is a ``DataFrame`` with a
+  ``NoRowIndex``, then `df.drop` with `axis=0` would always raise;
+- arithmetic operations (e.g. `NoRowIndex(3) + 2`) would always raise.
+- when printing a ``DataFrame``/``Series`` with a ``NoRowIndex``, then the row labels wouldn't be printed.
 
-### 2. DataFrameFormatter and SeriesFormatter changes
+Let's go into more detail for some of these.
 
-When printing an object with a ``NoIndex``, then the row labels wouldn't be shown:
+### NoRowIndex.append
+
+If one has two ``DataFrame``s with ``NoRowIndex``, then one would expect that concatenating them would
+result in a ``DataFrame`` which still has ``NoRowIndex``. To do this, the following rule could be introduced:
+
+> If appending a ``NoRowIndex`` of length ``y`` to a ``NoRowIndex`` of length ``x``, the result will be a
+  ``NoRowIndex`` of length ``x + y``.
+
+Example:
 
 ```python
-In [14]: pd.set_option('mode.no_default_index', True)
+In [7]: df1 = pd.DataFrame({'a': [1, 2], 'b': [4, 5]}, index=NoRowIndex(2))
 
-In [15]: df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})
+In [8]: df2 = pd.DataFrame({'a': [4], 'b': [0]}, index=NoRowIndex(1))
 
-In [16]: df
-Out[16]:
- a  b  c
- 1  4  7
- 2  5  8
- 3  6  9
+In [9]: df1
+Out[9]:
+ a  b
+ 1  4
+ 2  5
+
+In [10]: pd.concat([df1, df2])
+Out[10]:
+ a  b
+ 1  4
+ 2  5
+ 4  0
+
+In [11]: pd.concat([df1, df2]).index
+Out[11]: NoRowIndex(len=3)
+```
+
+Appending anything other than another ``NoRowIndex`` would raise.
+
+### Slicing a ``NoRowIndex``
+
+If one has a ``DataFrame`` with ``NoRowIndex``, then one would expect that a slice of it would still have
+a ``NoRowIndex``. This could be accomplished with:
+
+> If a slice of length ``x`` is taken from a ``NoRowIndex`` of length ``y``, then one gets a
+  ``NoRowIndex`` of length ``x``. Label-based slicing would not be allowed.
+
+Example:
+
+```python
+In [12]: df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, index=NoRowIndex(3))
+
+In [13]: df.loc[df['a']>1, 'b']
+Out[13]:
+5
+6
+Name: b, dtype: int64
+
+In [14]: df.loc[df['a']>1, 'b'].index
+Out[14]: NoRowIndex(len=2)
+```
+
+Slicing by label, however, would be disallowed:
+```python
+In [15]: df.loc[0, 'b']
+---------------------------------------------------------------------------
+IndexError: Cannot use label-based indexing on NoRowIndex!
+```
+Note that other uses of ``.loc``, such as boolean masks, would still be allowed (see F.A.Q).
+
+### Aligning ``NoRowIndex``s
+
+To minimise surprises, the rule would be:
+
+> A ``NoRowIndex`` can only be aligned with another ``NoRowIndex`` of the same length.
+> Attempting to align it with anything else would raise.
+
+Example:
+```python
+In [1]: ser1 = pd.Series([1, 2, 3], index=NoRowIndex(3))
+
+In [2]: ser2 = pd.Series([4, 5, 6], index=NoRowIndex(3))
+
+In [3]: ser1 + ser2 # works!
+Out[3]:
+5
+7
+9
+dtype: int64
+
+In [4]: ser1 + ser2.iloc[1:] # errors!
+---------------------------------------------------------------------------
+TypeError: Can't join NoRowIndex of different lengths
 ```
 
-### 3. Nobody should get an index unless they ask for one
+### Columns can't be NoRowIndex
 
-The following would work in the same way:
+This proposal deals exclusively with letting users not have to think about
+row labels. There's no suggestion to remove the column labels.
+
+In particular, calling ``transpose`` on a ``NoRowIndex`` ``DataFrame``
+would error. The error would come with a helpful error message, informing
+users that they should first set an index. E.g.:
 ```python
-pivot = (
-    pd.pivot_table(df, values="D", index=["A", "B"], columns=["C"], aggfunc=np.sum)
-).reset_index()
-
-with pd.option_context('mode.no_default_index', True):
-    pivot = (
-        pd.pivot_table(df, values="D", index=["A", "B"], columns=["C"], aggfunc=np.sum)
-    )
+In [4]: df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, index=NoRowIndex(3))
+
+In [5]: df.transpose()
+---------------------------------------------------------------------------
+ValueError: Columns cannot be NoRowIndex.
+If you got here via `transpose` or an `axis=1` operation, then you should first set an index, e.g.: `df.pipe(lambda _df: _df.set_axis(pd.RangeIndex(len(_df))))`
 ```
 
-Likewise for ``value_counts``. In ``groupby``, the default would be ``as_index=False``.
+### DataFrameFormatter and SeriesFormatter changes
+
+When printing an object with a ``NoRowIndex``, then the row labels wouldn't be shown:
+
+```python
+In [15]: df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, index=NoRowIndex(3))
+
+In [16]: df
+Out[16]:
+ a  b
+ 1  4
+ 2  5
+ 3  6
+```
+
+Of the above changes, this may be the only one that would need implementing within
+``DataFrameFormatter`` / ``SerieFormatter``, as opposed to within ``NoRowIndex``.
 
 ## Usage and Impact
 
-Users who like the power of the ``Index`` could continue using pandas exactly as it is,
-without changing anything.
+By itself, ``NoRowIndex`` would be of limited use. To become useful and user-friendly,
+a ``no_default_index`` mode could be introduced which, if enabled, would change
+the ``default_index`` function to return a ``NoRowIndex`` of the appropriate length.
+In particular, ``.reset_index()`` would result in a ``DataFrame`` with a ``NoRowIndex``.
+Likewise, a ``DataFrame`` constructed without explicitly specifying ``index=``.
 
-The addition of this mode would enable users who don't want to think about indices to
-not have to.
+Furthermore, it could be useful to add ``as_index`` options to methods which currently
+set an index, and then allow for that mode to control the ``as_index`` default.
 
-The implementation would be quite simple: most of the logic would be handled within the
-``NoIndex`` class, and only some minor adjustments (e.g. to the ``default_index`` function)
-would be needed in core pandas.
+Discussion of such a mode is out-of-scope for this proposal. A ``NoRowIndex`` would
+just be a first step towards getting there.
 
 ## Implementation
 
 Draft pull request showing proof of concept: https://github.com/pandas-dev/pandas/pull/49693.
 
+Note that implementation details could well change even if this PDEP were
+accepted. For example, ``NoRowIndex`` wouldn't necessarily need to subclass
+``RangeIndex``, and it wouldn't necessarily need to be accessible to the user
+(``df.index`` could well return ``None``)
+
 ## Likely FAQ
 
+**Q: Couldn't users just use ``RangeIndex``? Why do we need a new class?**
+
+**A**: ``RangeIndex`` isn't preserved under slicing and appending, e.g.:
+  ```python
+  In [1]: ser = pd.Series([1,2,3])
+
+  In [2]: ser[ser!=2].index
+  Out[2]: Int64Index([0, 2], dtype='int64')
+  ```
+  If someone doesn't want to think about row labels and starts off
+  with a ``RangeIndex``, they'll very quickly lose it.
+
 **Q: Aren't indices really powerful?**
 
 **A:** Yes! And they're also confusing to many users, even experienced developers.
   It's fairly common to see pandas code with ``.reset_index`` scattered around every
-  other line. Such users would benefit from a mode in which they wouldn't need to think
-  about indices and alignment.
+  other line. Such users would benefit from being able to not think about indices
+  and alignment. Indices would be here to stay, and ``NoRowIndex`` would not be the
+  default.
-**Q: In this mode, could users still get an ``Index`` if they really wanted to?**
+**Q: How could one switch a ``NoRowIndex`` ``DataFrame`` back to one with an index?**
 
-**A:** Yes! For example with
+**A:** The simplest way would probably be:
   ```python
-  df.set_index(Index(range(len(df))))
-  ```
-  or, if they don't have a column named ``'index'``:
-  ```python
-  df.reset_index().set_index('index')
+  df.set_axis(pd.RangeIndex(len(df)))
   ```
+  There's probably no need to introduce a new method for this.
+
+**Q: Why not let transpose switch ``NoRowIndex`` to ``RangeIndex`` under the hood before swapping index and columns?**
+
+**A:** This is the kind of magic that can lead to surprising behaviour that's
+  difficult to debug. For example, ``df.transpose().transpose()`` wouldn't
+  round-trip. It's easy enough to set an index after all, better to "force" users
+  to be intentional about what they want and end up with fewer surprises later
+  on.
 
-**Q: Why is it necessary to change the behaviour of ``value_counts``? Isn't the introduction of a ``NoIndex`` object enough?**
+**Q: What would df.sum(), and other methods which introduce an index, return?**
 
-**A:** The objective of this mode is to enable users to not have to think about indices if they don't want to. If they have to call
-  ``.reset_index`` after each ``value_counts`` / ``pivot_table`` call, or remember to pass ``as_index=False`` to each ``groupby``
-  call, then this objective has arguably not quite been reached.
+**A:** Such methods would still set an index and would work the same way they
+  do now. There may be some way to change that (e.g. introducing ``as_index``
+  arguments and introducing a mode to set its default) but that's out of scope
+  for this particular PDEP.
+
+**Q: How would a user opt-in to a ``NoRowIndex`` DataFrame?**
+
+**A:** This PDEP would only allow it via the constructor, passing
+  ``index=NoRowIndex(len(df))``. A mode could be introduced to toggle
+  making that the default, but would be out-of-scope for the current PDEP.
+
+**Q: Would ``.loc`` stop working?**
+
+**A:** No. It would only raise if used for label-based selection. Other uses
+  of ``.loc``, such as ``df.loc[:, col_1]`` or ``df.loc[mask, col_1]``, would
+  continue working.
+
+**Q: What's unintuitive about ``Series`` aligning indices when summing?**
+
+**A:** Not sure, but I once asked a group of experienced developers what the
+  output of
+  ```python
+  ser1 = pd.Series([1,1,1], index=[1,2,3])
+  ser2 = pd.Series([1,1,1], index=[3,4,5])
+  print(ser1 + ser2)
+  ```
+  would be, and _nobody_ got it right.
 
 ## PDEP History
 
 - 14 November: Initial draft
+- 18 November: First revision

From 2284d2a552347342bafcc52c430ec09115458a13 Mon Sep 17 00:00:00 2001
From: MarcoGorelli <>
Date: Fri, 18 Nov 2022 15:25:22 +0000
Subject: [PATCH 03/13] [skip ci] note about multiindex

---
 web/pandas/pdeps/0005-no-default-index-mode.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/web/pandas/pdeps/0005-no-default-index-mode.md b/web/pandas/pdeps/0005-no-default-index-mode.md
index 35905a7abd355..5f9b387476186 100644
--- a/web/pandas/pdeps/0005-no-default-index-mode.md
+++ b/web/pandas/pdeps/0005-no-default-index-mode.md
@@ -91,8 +91,9 @@ in some cases:
 - ``DataFrame`` columns can't be `NoRowIndex` (so ``transpose`` would need some adjustments when called on a ``NoRowIndex`` ``DataFrame``);
 - `insert` and `delete` should raise. As a consequence, if ``df`` is a ``DataFrame`` with a
   ``NoRowIndex``, then `df.drop` with `axis=0` would always raise;
-- arithmetic operations (e.g. `NoRowIndex(3) + 2`) would always raise.
-- when printing a ``DataFrame``/``Series`` with a ``NoRowIndex``, then the row labels wouldn't be printed.
+- arithmetic operations (e.g. `NoRowIndex(3) + 2`) would always raise;
+- when printing a ``DataFrame``/``Series`` with a ``NoRowIndex``, then the row labels wouldn't be printed;
+- a ``MultiIndex`` could not be created with a ``NoRowIndex`` as one of its levels.
 
 Let's go into more detail for some of these.
 

From a2dfde5ebfe349069fa607b3153d2b777ad0e230 Mon Sep 17 00:00:00 2001
From: MarcoGorelli <>
Date: Sat, 19 Nov 2022 07:34:52 +0000
Subject: [PATCH 04/13] [skip ci] clarify some points as per reviews

---
 .../pdeps/0005-no-default-index-mode.md       | 96 ++++++++++++-------
 1 file changed, 64 insertions(+), 32 deletions(-)

diff --git a/web/pandas/pdeps/0005-no-default-index-mode.md b/web/pandas/pdeps/0005-no-default-index-mode.md
index 5f9b387476186..221e0db861f4b 100644
--- a/web/pandas/pdeps/0005-no-default-index-mode.md
+++ b/web/pandas/pdeps/0005-no-default-index-mode.md
@@ -10,7 +10,7 @@
 
 The suggestion is to add a ``NoRowIndex`` class. Internally, it would act a bit like
 a ``RangeIndex``, but some methods would be stricter. This would be one
-step towards enabling users who don't want to think about indices to not have to.
+step towards enabling users who do not want to think about indices to not need to.
 
 ## Motivation
 
@@ -24,7 +24,7 @@ In [38]: ser2 = pd.Series([10, 15, 20, 25], index=[1, 2, 3, 4])
 
 Then:
 
-- it can be unexpected that summing `Series` with the same length (but different indices) produces `NaN`s in the result (https://stackoverflow.com/q/66094702/4451315):
+- it can be unexpected that adding `Series` with the same length (but different indices) produces `NaN`s in the result (https://stackoverflow.com/q/66094702/4451315):
 
   ```python
   In [41]: ser1 + ser2
@@ -62,7 +62,7 @@ Then:
   dtype: int64
   ```
 
-If a user didn't want to think about row labels (which they may have ended up after slicing / concatenating operations),
+If a user did not want to think about row labels (which they may have ended up after slicing / concatenating operations),
 then ``NoRowIndex`` would enable the above to work in a more intuitive
 manner (details and examples to follow below).
 
@@ -70,7 +70,7 @@ manner (details and examples to follow below).
 
 This proposal deals exclusively with the ``NoRowIndex`` class. To allow users to fully "opt-out" of having to think
 about row labels, the following could also be useful:
-- a ``pd.set_option('mode.no_default_index')`` mode which would default to creating new ``DataFrame``s and
+- a ``pd.set_option('mode.no_row_index', True)`` mode which would default to creating new ``DataFrame``s and
   ``Series`` with ``NoRowIndex`` instead of ``RangeIndex``;
 - giving ``as_index`` options to methods which currently create an index
   (e.g. ``value_counts``, ``.sum()``, ``.pivot_table``) to just insert a new column instead of creating an
@@ -85,17 +85,19 @@ within the ``NoRowIndex`` object. It would act just like ``RangeIndex``, but wou
 in some cases:
 - `name` could only be `None`;
 - `start` could only be `0`, `step` `1`;
-- when appending a ``NoRowIndex``, the result would still be ``NoRowIndex``;
+- when appending one ``NoRowIndex`` to another ``NoRowIndex``, the result would still be ``NoRowIndex``.
+  Appending a ``NoRowIndex`` to any other index (or vice-versa) would raise;
 - the ``NoRowIndex`` class would be preserved under slicing;
-- it could only be aligned with another ``Index`` if it's also ``NoRowIndex`` and if it's of the same length;
-- ``DataFrame`` columns can't be `NoRowIndex` (so ``transpose`` would need some adjustments when called on a ``NoRowIndex`` ``DataFrame``);
+- a ``NoRowIndex`` could only be aligned with another ``Index`` if it's also ``NoRowIndex`` and if it's of the same length;
+- ``DataFrame`` columns cannot be `NoRowIndex` (so ``transpose`` would need some adjustments when called on a ``NoRowIndex`` ``DataFrame``);
 - `insert` and `delete` should raise. As a consequence, if ``df`` is a ``DataFrame`` with a
   ``NoRowIndex``, then `df.drop` with `axis=0` would always raise;
 - arithmetic operations (e.g. `NoRowIndex(3) + 2`) would always raise;
-- when printing a ``DataFrame``/``Series`` with a ``NoRowIndex``, then the row labels wouldn't be printed;
+- when printing a ``DataFrame``/``Series`` with a ``NoRowIndex``, then the row labels would not be printed;
 - a ``MultiIndex`` could not be created with a ``NoRowIndex`` as one of its levels.
 
-Let's go into more detail for some of these.
+Let's go into more detail for some of these. In the examples that follow, the ``NoRowIndex`` will be passed explicitly,
+but this is not how users would be expected to use it (see "Usage and Impact" section for details).
 
 ### NoRowIndex.append
 
@@ -108,16 +110,21 @@ result in a ``DataFrame`` which still has ``NoRowIndex``. To do this, the follow
 Example:
 
 ```python
-In [7]: df1 = pd.DataFrame({'a': [1, 2], 'b': [4, 5]}, index=NoRowIndex(2))
+In [6]: df1 = pd.DataFrame({'a': [1, 2], 'b': [4, 5]}, index=NoRowIndex(2))
 
-In [8]: df2 = pd.DataFrame({'a': [4], 'b': [0]}, index=NoRowIndex(1))
+In [7]: df2 = pd.DataFrame({'a': [4], 'b': [0]}, index=NoRowIndex(1))
 
-In [9]: df1
-Out[9]:
+In [8]: df1
+Out[8]:
   a  b
   1  4
   2  5
 
+In [9]: df2
+Out[9]:
+ a  b
+ 4  0
+
 In [10]: pd.concat([df1, df2])
 Out[10]:
   a  b
@@ -160,7 +167,11 @@ In [15]: df.loc[0, 'b']
 ---------------------------------------------------------------------------
 IndexError: Cannot use label-based indexing on NoRowIndex!
 ```
-Note that other uses of ``.loc``, such as boolean masks, would still be allowed (see F.A.Q).
+
+Note too that:
+- other uses of ``.loc``, such as boolean masks, would still be allowed (see F.A.Q);
+- ``.iloc`` and ``.iat`` would keep working as before;
+- ``.at`` would raise.
 
 ### Aligning ``NoRowIndex``s
 
@@ -184,10 +195,10 @@ dtype: int64
 
 In [4]: ser1 + ser2.iloc[1:] # errors!
 ---------------------------------------------------------------------------
-TypeError: Can't join NoRowIndex of different lengths
+TypeError: Cannot join NoRowIndex of different lengths
 ```
 
-### Columns can't be NoRowIndex
+### Columns cannot be NoRowIndex
 
 This proposal deals exclusively with letting users not have to think about
 row labels. There's no suggestion to remove the column labels.
@@ -206,7 +217,7 @@ If you got here via `transpose` or an `axis=1` operation, then you should first
 
 ### DataFrameFormatter and SeriesFormatter changes
 
-When printing an object with a ``NoRowIndex``, then the row labels wouldn't be shown:
+When printing an object with a ``NoRowIndex``, then the row labels would not be shown:
 
 ```python
 In [15]: df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, index=NoRowIndex(3))
@@ -224,16 +235,24 @@ Of the above changes, this may be the only one that would need implementing with
 
 ## Usage and Impact
 
-By itself, ``NoRowIndex`` would be of limited use. To become useful and user-friendly,
-a ``no_default_index`` mode could be introduced which, if enabled, would change
-the ``default_index`` function to return a ``NoRowIndex`` of the appropriate length.
-In particular, ``.reset_index()`` would result in a ``DataFrame`` with a ``NoRowIndex``.
-Likewise, a ``DataFrame`` constructed without explicitly specifying ``index=``.
+Users would not be expected to work with the ``NoRowIndex`` class itself directly.
+Usage would probably involve a mode which would change how the ``default_index``
+function to return a ``NoRowIndex`` rather than a ``RangeIndex``.
+Then, if a user opted in to this mode with
+
+```python
+pd.set_option('mode.no_row_index', True)
+```
+
+then the following would all create a ``DataFrame`` with a ``NoRowIndex`` (as they
+all call ``default_index``):
 
-Furthermore, it could be useful to add ``as_index`` options to methods which currently
-set an index, and then allow for that mode to control the ``as_index`` default.
+- ``df.reset_index(drop=True)``;
+- ``pd.concat([df1, df2], ignore_index=True)``
+- ``df1.merge(df2, on=col)``;
+- ``df = pd.DataFrame({'col_1': [1, 2, 3]})``
 
-Discussion of such a mode is out-of-scope for this proposal. A ``NoRowIndex`` would
+Further discussion of such a mode is out-of-scope for this proposal. A ``NoRowIndex`` would
 just be a first step towards getting there.
## Implementation @@ -241,25 +260,25 @@ just be a first step towards getting there. Draft pull request showing proof of concept: https://github.com/pandas-dev/pandas/pull/49693. Note that implementation details could well change even if this PDEP were -accepted. For example, ``NoRowIndex`` wouldn't necessarily need to subclass -``RangeIndex``, and it wouldn't necessarily need to be accessible to the user +accepted. For example, ``NoRowIndex`` would not necessarily need to subclass +``RangeIndex``, and it would not necessarily need to be accessible to the user (``df.index`` could well return ``None``) ## Likely FAQ -**Q: Couldn't users just use ``RangeIndex``? Why do we need a new class?** +**Q: Could users not just use ``RangeIndex``? Why do we need a new class?** -**A**: ``RangeIndex`` isn't preserved under slicing and appending, e.g.: +**A**: ``RangeIndex`` is not preserved under slicing and appending, e.g.: ```python In [1]: ser = pd.Series([1,2,3]) In [2]: ser[ser!=2].index Out[2]: Int64Index([0, 2], dtype='int64') ``` - If someone doesn't want to think about row labels and starts off + If someone does not want to think about row labels and starts off with a ``RangeIndex``, they'll very quickly lose it. -**Q: Aren't indices really powerful?** +**Q: Are not indices really powerful?** **A:** Yes! And they're also confusing to many users, even experienced developers. It's fairly common to see pandas code with ``.reset_index`` scattered around every @@ -275,10 +294,23 @@ accepted. For example, ``NoRowIndex`` wouldn't necessarily need to subclass ``` There's probably no need to introduce a new method for this. + Conversely, to get rid of the index, then (so long as one has enabled the ``mode.no_row_index`` option) + one could simply do ``df.reset_index(drop=True)``.
+ +**Q: How would ``tz_localize`` and other methods which operate on the index work on a ``NoRowIndex`` ``DataFrame``?** + +**A:** Same way they work on other ``NumericIndex``s, which would typically be to raise: + + ```python + In [2]: ser.tz_localize('UTC') + --------------------------------------------------------------------------- + TypeError: index is not a valid DatetimeIndex or PeriodIndex + ``` + **Q: Why not let transpose switch ``NoRowIndex`` to ``RangeIndex`` under the hood before swapping index and columns?** **A:** This is the kind of magic that can lead to surprising behaviour that's - difficult to debug. For example, ``df.transpose().transpose()`` wouldn't + difficult to debug. For example, ``df.transpose().transpose()`` would not round-trip. It's easy enough to set an index after all, better to "force" users to be intentional about what they want and end up with fewer surprises later on. From de7dbce1513da1d7cefa59fc6524a13c364986c2 Mon Sep 17 00:00:00 2001 From: MarcoGorelli <> Date: Sat, 19 Nov 2022 08:28:35 +0000 Subject: [PATCH 05/13] [skip ci] fix typo --- web/pandas/pdeps/0005-no-default-index-mode.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/web/pandas/pdeps/0005-no-default-index-mode.md b/web/pandas/pdeps/0005-no-default-index-mode.md index 221e0db861f4b..b9cf5be8ad782 100644 --- a/web/pandas/pdeps/0005-no-default-index-mode.md +++ b/web/pandas/pdeps/0005-no-default-index-mode.md @@ -278,7 +278,7 @@ accepted. For example, ``NoRowIndex`` would not necessarily need to subclass If someone does not want to think about row labels and starts off with a ``RangeIndex``, they'll very quickly lose it. -**Q: Are not indices really powerful?** +**Q: Are indices not really powerful?** **A:** Yes! And they're also confusing to many users, even experienced developers. 
It's fairly common to see pandas code with ``.reset_index`` scattered around every From 69680e7344d920af07a0e83963adcddbc6dd90e1 Mon Sep 17 00:00:00 2001 From: MarcoGorelli <> Date: Sun, 4 Dec 2022 15:19:28 +0000 Subject: [PATCH 06/13] [skip ci] withdraw --- .../pdeps/0005-no-default-index-mode.md | 32 +++++++++++++++++-- 1 file changed, 29 insertions(+), 3 deletions(-) diff --git a/web/pandas/pdeps/0005-no-default-index-mode.md b/web/pandas/pdeps/0005-no-default-index-mode.md index b9cf5be8ad782..27e020f41db30 100644 --- a/web/pandas/pdeps/0005-no-default-index-mode.md +++ b/web/pandas/pdeps/0005-no-default-index-mode.md @@ -1,7 +1,7 @@ # PDEP-5: NoRowIndex - Created: 14 November 2022 -- Status: Draft +- Status: Withdrawn - Discussion: [#49693](https://github.com/pandas-dev/pandas/pull/49693) - Author: [Marco Gorelli](https://github.com/MarcoGorelli) - Revision: 2 @@ -345,7 +345,33 @@ accepted. For example, ``NoRowIndex`` would not necessarily need to subclass ``` would be, and _nobody_ got it right. +## Reasons for withdrawal + +After some discussions, it has become clear there is not enough for support for the proposal in its current state. +In short, it would add too much complexity to justify the potential benefits. It would unacceptably increase +the maintenance burden, the testing requirements, and the benefits would be minimal. + +Concretely: +- maintenace burden: it would not be possible to handle all the complexity within the ``NoRowIndex`` class itself, some + extra logic would need to go into the pandas core codebase, which is already very complex and hard to maintain; +- the testing burden would be too high. Propertly testing this would mean almost doubling the size of the test suite. 
+ Coverage for options already is not great: for example [this issue](https://github.com/pandas-dev/pandas/issues/49732) + was caused by a PR which passed CI, but CI did not (and still does not) cover that option (plotting backends); +- it will not benefit users, as users do not tend to use nor discover options which are not the default. + +In order to make no-index the pandas default and have a chance of benefiting users, a more comprehensive set of changes +would need to made at the same time. This would require a proposal much larger in scope, and would be a much more radical change. +It may be that this proposal will be revisited in the future, but in its current state (as an option) it cannot be accepted. + +This has still been a useful exercise, though, as it has resulted in two related proposals (see below). + +## Related proposals + +- Deprecate automatic alignment, at least in some cases: https://github.com/pandas-dev/pandas/issues/49939; +- ``.value_counts`` behaviour change: https://github.com/pandas-dev/pandas/issues/49497 + ## PDEP History -- 14 November: Initial draft -- 18 November: First revision +- 14 November 2022: Initial draft +- 18 November 2022: First revision +- 14 December 2022: Withdrawal From 8155793f404aa62ec1e6d4f7b9155163238081f5 Mon Sep 17 00:00:00 2001 From: MarcoGorelli <> Date: Sun, 4 Dec 2022 15:20:04 +0000 Subject: [PATCH 07/13] [skip ci] typos --- web/pandas/pdeps/0005-no-default-index-mode.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/web/pandas/pdeps/0005-no-default-index-mode.md b/web/pandas/pdeps/0005-no-default-index-mode.md index 27e020f41db30..30f6aed1e52d8 100644 --- a/web/pandas/pdeps/0005-no-default-index-mode.md +++ b/web/pandas/pdeps/0005-no-default-index-mode.md @@ -347,14 +347,14 @@ accepted. 
For example, ``NoRowIndex`` would not necessarily need to subclass ## Reasons for withdrawal -After some discussions, it has become clear there is not enough for support for the proposal in its current state. +After some discussions, it has become clear there is not enough for support for the proposal in its current state. In short, it would add too much complexity to justify the potential benefits. It would unacceptably increase the maintenance burden, the testing requirements, and the benefits would be minimal. Concretely: -- maintenace burden: it would not be possible to handle all the complexity within the ``NoRowIndex`` class itself, some +- maintenance burden: it would not be possible to handle all the complexity within the ``NoRowIndex`` class itself, some extra logic would need to go into the pandas core codebase, which is already very complex and hard to maintain; -- the testing burden would be too high. Propertly testing this would mean almost doubling the size of the test suite. +- the testing burden would be too high. Properly testing this would mean almost doubling the size of the test suite. Coverage for options already is not great: for example [this issue](https://github.com/pandas-dev/pandas/issues/49732) was caused by a PR which passed CI, but CI did not (and still does not) cover that option (plotting backends); - it will not benefit users, as users do not tend to use nor discover options which are not the default. 
From 997d601d500cfce2f9c09ae713d9252d21cc301e Mon Sep 17 00:00:00 2001 From: MarcoGorelli <> Date: Mon, 12 Dec 2022 21:25:07 +0000 Subject: [PATCH 08/13] [skip ci] clarify benefit to users part --- web/pandas/pdeps/0005-no-default-index-mode.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/web/pandas/pdeps/0005-no-default-index-mode.md b/web/pandas/pdeps/0005-no-default-index-mode.md index 30f6aed1e52d8..dddf285270d02 100644 --- a/web/pandas/pdeps/0005-no-default-index-mode.md +++ b/web/pandas/pdeps/0005-no-default-index-mode.md @@ -357,7 +357,7 @@ Concretely: - the testing burden would be too high. Properly testing this would mean almost doubling the size of the test suite. Coverage for options already is not great: for example [this issue](https://github.com/pandas-dev/pandas/issues/49732) was caused by a PR which passed CI, but CI did not (and still does not) cover that option (plotting backends); -- it will not benefit users, as users do not tend to use nor discover options which are not the default. +- it will not benefit most users, as users do not tend to use nor discover options which are not the default. In order to make no-index the pandas default and have a chance of benefiting users, a more comprehensive set of changes would need to made at the same time. This would require a proposal much larger in scope, and would be a much more radical change. 
From 75056a28b27308ed5e4b8bb0b09ae9c7ffa67451 Mon Sep 17 00:00:00 2001 From: MarcoGorelli <> Date: Tue, 3 Jan 2023 16:07:25 +0000 Subject: [PATCH 09/13] [skip ci] reword, reformat --- .../pdeps/0005-no-default-index-mode.md | 26 +++++++++---------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/web/pandas/pdeps/0005-no-default-index-mode.md b/web/pandas/pdeps/0005-no-default-index-mode.md index dddf285270d02..6e1024ef9e80d 100644 --- a/web/pandas/pdeps/0005-no-default-index-mode.md +++ b/web/pandas/pdeps/0005-no-default-index-mode.md @@ -17,9 +17,9 @@ step towards enabling users who do not want to think about indices to not need t The Index can be a source of confusion and frustration for pandas users. For example, let's consider the inputs ```python -In [37]: ser1 = pd.Series([10, 15, 20, 25], index=[1, 2, 3, 5]) +In[37]: ser1 = pd.Series([10, 15, 20, 25], index=[1, 2, 3, 5]) -In [38]: ser2 = pd.Series([10, 15, 20, 25], index=[1, 2, 3, 4]) +In[38]: ser2 = pd.Series([10, 15, 20, 25], index=[1, 2, 3, 4]) ``` Then: @@ -200,7 +200,7 @@ TypeError: Cannot join NoRowIndex of different lengths ### Columns cannot be NoRowIndex -This proposal deals exclusively with letting users not have to think about +This proposal deals exclusively with allowing users to not have to think about row labels. There's no suggestion to remove the column labels. In particular, calling ``transpose`` on a ``NoRowIndex`` ``DataFrame`` @@ -241,13 +241,13 @@ function to return a ``NoRowIndex`` rather than a ``RangeIndex``. Then, if a user opted in to this mode with ```python -pd.set_option('mode.no_row_index', True) +pd.set_option("mode.no_row_index", True) ``` then the following would all create a ``DataFrame`` with a ``NoRowIndex`` (as they all call ``default_index``): -- ``df.reset_index(drop=True)``; +- ``df.reset_index()``; - ``pd.concat([df1, df2], ignore_index=True)`` - ``df1.merge(df2, on=col)``; - ``df = pd.DataFrame({'col_1': [1, 2, 3]})`` @@ -270,10 +270,10 @@ accepted. 
For example, ``NoRowIndex`` would not necessarily need to subclass **A**: ``RangeIndex`` is not preserved under slicing and appending, e.g.: ```python - In [1]: ser = pd.Series([1,2,3]) + In[1]: ser = pd.Series([1, 2, 3]) - In [2]: ser[ser!=2].index - Out[2]: Int64Index([0, 2], dtype='int64') + In[2]: ser[ser != 2].index + Out[2]: Int64Index([0, 2], dtype="int64") ``` If someone does not want to think about row labels and starts off with a ``RangeIndex``, they'll very quickly lose it. @@ -281,8 +281,8 @@ accepted. For example, ``NoRowIndex`` would not necessarily need to subclass **Q: Are indices not really powerful?** **A:** Yes! And they're also confusing to many users, even experienced developers. - It's fairly common to see pandas code with ``.reset_index`` scattered around every - other line. Such users would benefit from being able to not think about indices + Often users are using ``.reset_index`` to avoid issues with indices and alignment. + Such users would benefit from being able to not think about indices and alignment. Indices would be here to stay, and ``NoRowIndex`` would not be the default. @@ -331,7 +331,7 @@ accepted. For example, ``NoRowIndex`` would not necessarily need to subclass **Q: Would ``.loc`` stop working?** **A:** No. It would only raise if used for label-based selection. Other uses - of ``.loc``, such as ``df.loc[:, col_1]`` or ``df.loc[mask, col_1]``, would + of ``.loc``, such as ``df.loc[:, col_1]`` or ``df.loc[boolean_mask, col_1]``, would continue working. **Q: What's unintuitive about ``Series`` aligning indices when summing?** @@ -339,8 +339,8 @@ accepted. 
For example, ``NoRowIndex`` would not necessarily need to subclass **A:** Not sure, but I once asked a group of experienced developers what the output of ```python - ser1 = pd.Series([1,1,1], index=[1,2,3]) - ser2 = pd.Series([1,1,1], index=[3,4,5]) + ser1 = pd.Series([1, 1, 1], index=[1, 2, 3]) + ser2 = pd.Series([1, 1, 1], index=[3, 4, 5]) print(ser1 + ser2) ``` would be, and _nobody_ got it right. From 443da8710eabb3c5fd4f3d37e2e8d10d891d3643 Mon Sep 17 00:00:00 2001 From: MarcoGorelli <> Date: Tue, 3 Jan 2023 16:11:08 +0000 Subject: [PATCH 10/13] [skip ci] further reword --- web/pandas/pdeps/0005-no-default-index-mode.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/web/pandas/pdeps/0005-no-default-index-mode.md b/web/pandas/pdeps/0005-no-default-index-mode.md index 6e1024ef9e80d..f14da75b45869 100644 --- a/web/pandas/pdeps/0005-no-default-index-mode.md +++ b/web/pandas/pdeps/0005-no-default-index-mode.md @@ -200,7 +200,7 @@ TypeError: Cannot join NoRowIndex of different lengths ### Columns cannot be NoRowIndex -This proposal deals exclusively with allowing users to not have to think about +This proposal deals exclusively with allowing users to not need to think about row labels. There's no suggestion to remove the column labels. In particular, calling ``transpose`` on a ``NoRowIndex`` ``DataFrame`` From dfbd494cddcfe294ad415706ade4ea7556764003 Mon Sep 17 00:00:00 2001 From: MarcoGorelli <> Date: Sat, 7 Jan 2023 12:52:55 +0000 Subject: [PATCH 11/13] status withdrawn --- web/pandas_web.py | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/web/pandas_web.py b/web/pandas_web.py index 4c30e1959fdff..bc12cc4776af4 100755 --- a/web/pandas_web.py +++ b/web/pandas_web.py @@ -239,7 +239,13 @@ def roadmap_pdeps(context): and linked from there. This preprocessor obtains the list of PDEP's in different status from the directory tree and GitHub. 
""" - KNOWN_STATUS = {"Under discussion", "Accepted", "Implemented", "Rejected"} + KNOWN_STATUS = { + "Under discussion", + "Accepted", + "Implemented", + "Rejected", + "Withdrawn", + } context["pdeps"] = collections.defaultdict(list) # accepted, rejected and implemented From 143472602ffd220d17e471f066ddc8864c5af81e Mon Sep 17 00:00:00 2001 From: MarcoGorelli <> Date: Tue, 28 Feb 2023 11:50:59 +0000 Subject: [PATCH 12/13] clarify that mode.no_row_index would have been separate --- web/pandas/pdeps/0005-no-default-index-mode.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/web/pandas/pdeps/0005-no-default-index-mode.md b/web/pandas/pdeps/0005-no-default-index-mode.md index f14da75b45869..cde4c8e8bec78 100644 --- a/web/pandas/pdeps/0005-no-default-index-mode.md +++ b/web/pandas/pdeps/0005-no-default-index-mode.md @@ -238,7 +238,7 @@ Of the above changes, this may be the only one that would need implementing with Users would not be expected to work with the ``NoRowIndex`` class itself directly. Usage would probably involve a mode which would change how the ``default_index`` function to return a ``NoRowIndex`` rather than a ``RangeIndex``. -Then, if a user opted in to this mode with +Then, if a ``mode.no_row_index`` option was introduced and a user opted in to it with ```python pd.set_option("mode.no_row_index", True) @@ -294,7 +294,7 @@ accepted. For example, ``NoRowIndex`` would not necessarily need to subclass ``` There's probably no need to introduce a new method for this. - Conversely, to get rid of the index, then (so long as one has enabled the ``mode.no_row_index`` option) + Conversely, if the ``mode.no_row_index`` option was introduced, then to get rid of the index one could simply do ``df.reset_index(drop=True)``.
**Q: How would ``tz_localize`` and other methods which operate on the index work on a ``NoRowIndex`` ``DataFrame``?** From ef0a5b2d8481c1ea790462159d29964ac8332fb6 Mon Sep 17 00:00:00 2001 From: MarcoGorelli <> Date: Tue, 28 Feb 2023 11:57:46 +0000 Subject: [PATCH 13/13] summarise revisions / withdrawal reasons --- web/pandas/pdeps/0005-no-default-index-mode.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/web/pandas/pdeps/0005-no-default-index-mode.md b/web/pandas/pdeps/0005-no-default-index-mode.md index cde4c8e8bec78..d543a4718e896 100644 --- a/web/pandas/pdeps/0005-no-default-index-mode.md +++ b/web/pandas/pdeps/0005-no-default-index-mode.md @@ -357,7 +357,9 @@ Concretely: - the testing burden would be too high. Properly testing this would mean almost doubling the size of the test suite. Coverage for options already is not great: for example [this issue](https://github.com/pandas-dev/pandas/issues/49732) was caused by a PR which passed CI, but CI did not (and still does not) cover that option (plotting backends); -- it will not benefit most users, as users do not tend to use nor discover options which are not the default. +- it will not benefit most users, as users do not tend to use nor discover options which are not the default; +- it would be difficult to reconcile with some existing behaviours: for example, ``df.sum()`` returns a Series with the + column names in the index. In order to make no-index the pandas default and have a chance of benefiting users, a more comprehensive set of changes would need to made at the same time. This would require a proposal much larger in scope, and would be a much more radical change. 
@@ -373,5 +375,6 @@ This has still been a useful exercise, though, as it has resulted in two related ## PDEP History - 14 November 2022: Initial draft -- 18 November 2022: First revision -- 14 December 2022: Withdrawal +- 18 November 2022: First revision (limited the proposal to a new class, leaving a ``mode`` to a separate proposal) +- 14 December 2022: Withdrawal (difficulty reconciling with some existing methods, lack of strong support, + maintenance burden increasing unjustifiably)
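The ``NoRowIndex`` behaviour described in this patch series can be summarised in a minimal plain-Python sketch. The proposed behaviour is:

- preserved under boolean masking, positional slicing and appending;
- aligning only with another ``NoRowIndex`` of the same length;
- refusing label-based lookup.

``ToyNoRowIndex`` and its methods (``filter``, ``slice``, ``append``, ``join``, ``get_loc``) are hypothetical names chosen for illustration, with simplified signatures. This is not pandas' API and not the proof of concept in pull request 49693. Since the PDEP was withdrawn, no such class exists in pandas.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import NoReturn, Sequence


@dataclass(frozen=True)
class ToyNoRowIndex:
    """Hypothetical model of the proposed NoRowIndex (not a pandas class).

    Conceptually RangeIndex(0, length) with name=None, start=0 and step=1,
    except that every operation returns another ToyNoRowIndex, so the
    "no row labels" property is never lost.
    """

    length: int

    def __len__(self) -> int:
        return self.length

    def filter(self, mask: Sequence[bool]) -> ToyNoRowIndex:
        # Boolean masking (still allowed via .loc) keeps only the count of selected rows.
        if len(mask) != self.length:
            raise ValueError("mask length does not match index length")
        return ToyNoRowIndex(sum(1 for keep in mask if keep))

    def slice(self, start: int | None, stop: int | None) -> ToyNoRowIndex:
        # Positional slicing (as with .iloc) yields a fresh 0..n-1 index.
        return ToyNoRowIndex(len(range(self.length)[start:stop]))

    def append(self, other: ToyNoRowIndex) -> ToyNoRowIndex:
        # Concatenation just adds lengths; there are no labels to reconcile.
        return ToyNoRowIndex(self.length + other.length)

    def join(self, other: ToyNoRowIndex) -> ToyNoRowIndex:
        # Alignment pairs rows by position, so the lengths must match.
        if self.length != other.length:
            raise TypeError("Cannot join NoRowIndex of different lengths")
        return self

    def get_loc(self, label: object) -> NoReturn:
        # Label-based lookup (e.g. df.loc[0, 'b']) is refused outright.
        raise IndexError("Cannot use label-based indexing on NoRowIndex!")


idx = ToyNoRowIndex(3)
print(idx.filter([True, False, True]))  # ToyNoRowIndex(length=2)
print(idx.append(ToyNoRowIndex(1)))  # ToyNoRowIndex(length=4)

# By contrast, masking an explicit range keeps the surviving labels, gaps and
# all, mirroring how ser[ser != 2].index became Int64Index([0, 2]).
kept = [pos for pos, keep in zip(range(3), [True, False, True]) if keep]
print(kept)  # [0, 2]
```

Storing only a length, rather than start, stop, step and name, is the point of the design. Every operation can hand back a fresh object with nothing to reconcile, which is exactly what ``RangeIndex`` cannot guarantee once rows are filtered.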