Commit d950516

author
MarcoGorelli
committed
[skip ci] columns
1 parent 95a8356 commit d950516

1 file changed

+50
-33
lines changed

web/pandas/pdeps/0005-no-default-index-mode.md

Lines changed: 50 additions & 33 deletions
@@ -153,50 +153,70 @@ In [15]: df.loc[0, 'b']
153153
IndexError: Cannot use label-based indexing on NoRowIndex!
154154
```
155155

156-
### 2. DataFrameFormatter and SeriesFormatter changes
156+
### Aligning ``NoRowIndex``s
157157

158-
When printing an object with a ``NoIndex``, then the row labels wouldn't be shown:
158+
To minimise surprises, the rule would be:
159+
160+
A ``NoRowIndex`` can only be aligned with another ``NoRowIndex`` of the same length.
161+
Attempting to align it with anything else would raise.
159162

163+
Example:
160164
```python
161-
In [14]: pd.set_option('mode.no_default_index', True)
165+
In [1]: ser1 = pd.Series([1,2,3], index=NoRowIndex(3))
162166

163-
In [15]: df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})
167+
In [2]: ser2 = pd.Series([4, 5, 6], index=NoRowIndex(3))
164168

165-
In [16]: df
166-
Out[16]:
167-
a b c
168-
1 4 7
169-
2 5 8
170-
3 6 9
169+
In [3]: ser1 + ser2
170+
Out[3]:
171+
5
172+
7
173+
9
174+
dtype: int64
175+
176+
In [4]: ser1 + ser2.iloc[1:]
177+
---------------------------------------------------------------------------
178+
TypeError: Can't join NoRowIndex of different lengths
171179
```
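The alignment rule above can be sketched with a toy stand-in. This is a minimal illustration, not the draft implementation: the `NoRowIndex` dataclass and the `align_no_row_index` helper are hypothetical, and the error raised for mismatched index types is an assumption (the proposal only says that it "would raise").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoRowIndex:
    """Toy stand-in for the proposed index: it stores a length and nothing else."""

    length: int


def align_no_row_index(left, right):
    """Return the shared index if the proposed rule allows alignment, else raise."""
    # Rule, part 1: a NoRowIndex only aligns with another NoRowIndex.
    # (The exception type and message here are assumptions.)
    if not (isinstance(left, NoRowIndex) and isinstance(right, NoRowIndex)):
        raise TypeError("Can't join NoRowIndex with another index type")
    # Rule, part 2: both sides must have the same length.
    if left.length != right.length:
        raise TypeError("Can't join NoRowIndex of different lengths")
    return left


# Same type and same length: alignment succeeds.
shared = align_no_row_index(NoRowIndex(3), NoRowIndex(3))
```

Under this sketch, `ser1 + ser2.iloc[1:]` corresponds to aligning `NoRowIndex(3)` with `NoRowIndex(2)`, which raises.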
172180

173-
### 3. Nobody should get an index unless they ask for one
181+
### Columns can't be NoRowIndex
182+
183+
This proposal deals exclusively with letting users not have to think about
184+
row labels. There's no suggestion to remove the column labels.
185+
One issue that arises, then, is what to do about ``transpose``, which would swap
186+
index and columns. Rather than making ``transpose`` break, it could be more
187+
user-friendly to change, within ``transpose``, a ``NoRowIndex`` index to a
188+
``RangeIndex`` of the same length before swapping index and columns.
189+
Note that calling ``transpose`` twice would no longer round-trip.
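Why the round trip breaks can be seen in a small sketch. It assumes a toy `Frame` container and uses a tuple of integers in place of ``RangeIndex``; `NoRowIndex`, `Frame` and this `transpose` are hypothetical names for illustration, not pandas API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoRowIndex:
    """Toy stand-in for the proposed index: it stores a length and nothing else."""

    length: int


@dataclass(frozen=True)
class Frame:
    """Toy stand-in for a DataFrame, holding row-major data."""

    index: object   # NoRowIndex, or a tuple of row labels
    columns: tuple  # column labels
    rows: tuple     # tuple of row tuples


def transpose(frame):
    """Swap index and columns, first replacing a NoRowIndex with 0..n-1."""
    index = frame.index
    if isinstance(index, NoRowIndex):
        # Stand-in for converting to a RangeIndex of the same length.
        index = tuple(range(index.length))
    return Frame(
        index=tuple(frame.columns),
        columns=tuple(index),
        rows=tuple(zip(*frame.rows)),
    )


df = Frame(index=NoRowIndex(3), columns=("a", "b"), rows=((1, 4), (2, 5), (3, 6)))
twice = transpose(transpose(df))
```

In this sketch the values survive two transposes, but `twice.index` holds the labels `(0, 1, 2)` rather than the original `NoRowIndex(3)`, so `twice` differs from `df`.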
190+
191+
### DataFrameFormatter and SeriesFormatter changes
192+
193+
When printing an object with a ``NoRowIndex``, the row labels wouldn't be shown:
174194

175-
The following would work in the same way:
176195
```python
177-
pivot = (
178-
pd.pivot_table(df, values="D", index=["A", "B"], columns=["C"], aggfunc=np.sum)
179-
).reset_index()
180-
181-
with pd.option_context('mode.no_default_index', True):
182-
pivot = (
183-
pd.pivot_table(df, values="D", index=["A", "B"], columns=["C"], aggfunc=np.sum)
184-
)
196+
In [15]: df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, index=NoRowIndex(3))
197+
198+
In [16]: df
199+
Out[16]:
200+
a b
201+
1 4
202+
2 5
203+
3 6
185204
```
186205

187-
Likewise for ``value_counts``. In ``groupby``, the default would be ``as_index=False``.
206+
Of the above changes, this may be the only one that would need implementing within
207+
``DataFrameFormatter`` / ``SeriesFormatter``, as opposed to within ``NoRowIndex``.
188208

189209
## Usage and Impact
190210

191-
Users who like the power of the ``Index`` could continue using pandas exactly as it is,
192-
without changing anything.
193-
194-
The addition of this mode would enable users who don't want to think about indices to
195-
not have to.
211+
By itself, ``NoRowIndex`` would be of limited use. To become useful and user-friendly,
212+
a mode ``no_default_index`` could be introduced, which would change
213+
the ``default_index`` function to return a ``NoRowIndex`` of the appropriate length.
214+
In particular, ``.reset_index()`` would result in a ``DataFrame`` with a ``NoRowIndex``.
215+
Likewise for a ``DataFrame`` constructed without explicitly specifying ``index=``.
196216

197-
The implementation would be quite simple: most of the logic would be handled within the
198-
``NoIndex`` class, and only some minor adjustments (e.g. to the ``default_index`` function)
199-
would be needed in core pandas.
217+
Then, if a user doesn't want to think about row labels, with ``pd.set_option('mode.no_default_index', True)``
218+
set, they wouldn't need to (barring methods such as ``.pivot_table`` which introduce an index).
219+
Discussion of such a mode is out-of-scope for this proposal.
200220

201221
## Implementation
202222

@@ -222,11 +242,8 @@ Draft pull request showing proof of concept: https://github.com/pandas-dev/panda
222242
df.reset_index().set_index('index')
223243
```
224244

225-
**Q: Why is it necessary to change the behaviour of ``value_counts``? Isn't the introduction of a ``NoIndex`` object enough?**
245+
**Q: Why can't a DataFrame's columns be ``NoRowIndex``?**
226246

227-
**A:** The objective of this mode is to enable users to not have to think about indices if they don't want to. If they have to call
228-
``.reset_index`` after each ``value_counts`` / ``pivot_table`` call, or remember to pass ``as_index=False`` to each ``groupby``
229-
call, then this objective has arguably not quite been reached.
230247

231248
## PDEP History
232249
