
Commit 84a60db

katrinleinweber authored and proost committed
DOC: Harmonize column selection to bracket notation (pandas-dev#27562)
* Harmonize column selection to bracket notation

As suggested by https://medium.com/dunder-data/minimally-sufficient-pandas-a8e67f2a2428#46f9
1 parent 892233e commit 84a60db
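
The motivation, in a minimal standalone sketch (the frame below is illustrative, not taken from the changed docs): bracket notation works for any column label, while attribute access silently breaks when a label collides with an existing method or is not a valid identifier.

    import pandas as pd

    df = pd.DataFrame({"A": [1, -2, 3], "min": [4, 5, 6]})

    df["A"]      # bracket notation works for any column label
    df.A         # attribute access works only for valid, non-conflicting names

    df["min"]    # the column named "min"
    df.min       # resolves to the DataFrame.min method, not the column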

File tree

9 files changed: +54 −51


doc/source/getting_started/10min.rst

+1 −1

@@ -278,7 +278,7 @@ Using a single column's values to select data.

 .. ipython:: python

-   df[df.A > 0]
+   df[df['A'] > 0]

 Selecting values from a DataFrame where a boolean condition is met.

doc/source/getting_started/basics.rst

+6 −6

@@ -926,7 +926,7 @@ Single aggregations on a ``Series`` will return a scalar value:

 .. ipython:: python

-   tsdf.A.agg('sum')
+   tsdf['A'].agg('sum')


 Aggregating with multiple functions

@@ -950,13 +950,13 @@ On a ``Series``, multiple functions return a ``Series``, indexed by the function

 .. ipython:: python

-   tsdf.A.agg(['sum', 'mean'])
+   tsdf['A'].agg(['sum', 'mean'])

 Passing a ``lambda`` function will yield a ``<lambda>`` named row:

 .. ipython:: python

-   tsdf.A.agg(['sum', lambda x: x.mean()])
+   tsdf['A'].agg(['sum', lambda x: x.mean()])

 Passing a named function will yield that name for the row:

@@ -965,7 +965,7 @@ Passing a named function will yield that name for the row:
    def mymean(x):
        return x.mean()

-   tsdf.A.agg(['sum', mymean])
+   tsdf['A'].agg(['sum', mymean])

 Aggregating with a dict
 +++++++++++++++++++++++

@@ -1065,7 +1065,7 @@ Passing a single function to ``.transform()`` with a ``Series`` will yield a sin

 .. ipython:: python

-   tsdf.A.transform(np.abs)
+   tsdf['A'].transform(np.abs)


 Transform with multiple functions

@@ -1084,7 +1084,7 @@ resulting column names will be the transforming functions.

 .. ipython:: python

-   tsdf.A.transform([np.abs, lambda x: x + 1])
+   tsdf['A'].transform([np.abs, lambda x: x + 1])


 Transforming with a dict
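
A standalone sketch of the ``agg``/``transform`` calls these hunks rewrite (the ``tsdf`` data here is an assumption, not the docs' actual fixture):

    import numpy as np
    import pandas as pd

    tsdf = pd.DataFrame({"A": [1.0, -2.0, 3.0]})

    tsdf["A"].agg("sum")                             # scalar: 2.0
    tsdf["A"].agg(["sum", "mean"])                   # Series indexed by function name
    tsdf["A"].transform([np.abs, lambda x: x + 1])   # DataFrame, one column per function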

doc/source/getting_started/comparison/comparison_with_r.rst

+4 −4

@@ -81,7 +81,7 @@ R                                           pandas
 =========================================== ===========================================
 ``select(df, col_one = col1)``              ``df.rename(columns={'col1': 'col_one'})['col_one']``
 ``rename(df, col_one = col1)``              ``df.rename(columns={'col1': 'col_one'})``
-``mutate(df, c=a-b)``                       ``df.assign(c=df.a-df.b)``
+``mutate(df, c=a-b)``                       ``df.assign(c=df['a']-df['b'])``
 =========================================== ===========================================


@@ -258,8 +258,8 @@ index/slice as well as standard boolean indexing:

    df = pd.DataFrame({'a': np.random.randn(10), 'b': np.random.randn(10)})
    df.query('a <= b')
-   df[df.a <= df.b]
-   df.loc[df.a <= df.b]
+   df[df['a'] <= df['b']]
+   df.loc[df['a'] <= df['b']]

 For more details and examples see :ref:`the query documentation
 <indexing.query>`.

@@ -284,7 +284,7 @@ In ``pandas`` the equivalent expression, using the

    df = pd.DataFrame({'a': np.random.randn(10), 'b': np.random.randn(10)})
    df.eval('a + b')
-   df.a + df.b  # same as the previous expression
+   df['a'] + df['b']  # same as the previous expression

 In certain cases :meth:`~pandas.DataFrame.eval` will be much faster than
 evaluation in pure Python. For more details and examples see :ref:`the eval
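
The three selection idioms touched here are equivalent; a quick self-contained check (with made-up data):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"a": np.random.randn(10), "b": np.random.randn(10)})

    # query strings, boolean masks, and .loc with a mask select the same rows
    assert df.query("a <= b").equals(df[df["a"] <= df["b"]])
    assert df.query("a <= b").equals(df.loc[df["a"] <= df["b"]])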

doc/source/user_guide/advanced.rst

+1 −1

@@ -738,7 +738,7 @@ and allows efficient indexing and storage of an index with a large number of dup
    df['B'] = df['B'].astype(CategoricalDtype(list('cab')))
    df
    df.dtypes
-   df.B.cat.categories
+   df['B'].cat.categories

 Setting the index will create a ``CategoricalIndex``.

doc/source/user_guide/cookbook.rst

+3 −3

@@ -592,8 +592,8 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to
 .. ipython:: python

    df = pd.DataFrame([0, 1, 0, 1, 1, 1, 0, 1, 1], columns=['A'])
-   df.A.groupby((df.A != df.A.shift()).cumsum()).groups
-   df.A.groupby((df.A != df.A.shift()).cumsum()).cumsum()
+   df['A'].groupby((df['A'] != df['A'].shift()).cumsum()).groups
+   df['A'].groupby((df['A'] != df['A'].shift()).cumsum()).cumsum()

 Expanding data
 **************

@@ -719,7 +719,7 @@ Rolling Apply to multiple columns where function calculates a Series before a Sc
    df

    def gm(df, const):
-       v = ((((df.A + df.B) + 1).cumprod()) - 1) * const
+       v = ((((df['A'] + df['B']) + 1).cumprod()) - 1) * const
        return v.iloc[-1]

    s = pd.Series({df.index[i]: gm(df.iloc[i:min(i + 51, len(df) - 1)], 5)
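
The first hunk's run-grouping idiom, unpacked as a sketch:

    import pandas as pd

    df = pd.DataFrame([0, 1, 0, 1, 1, 1, 0, 1, 1], columns=["A"])

    # A new run starts wherever a value differs from its predecessor;
    # cumsum() turns those change points into consecutive run ids.
    runs = (df["A"] != df["A"].shift()).cumsum()
    df["A"].groupby(runs).cumsum()   # cumulative sum restarting at each run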

doc/source/user_guide/enhancingperf.rst

+6 −6

@@ -393,15 +393,15 @@ Consider the following toy example of doubling each observation:
 .. code-block:: ipython

    # Custom function without numba
-   In [5]: %timeit df['col1_doubled'] = df.a.apply(double_every_value_nonumba)  # noqa E501
+   In [5]: %timeit df['col1_doubled'] = df['a'].apply(double_every_value_nonumba)  # noqa E501
    1000 loops, best of 3: 797 us per loop

    # Standard implementation (faster than a custom function)
-   In [6]: %timeit df['col1_doubled'] = df.a * 2
+   In [6]: %timeit df['col1_doubled'] = df['a'] * 2
    1000 loops, best of 3: 233 us per loop

    # Custom function with numba
-   In [7]: %timeit df['col1_doubled'] = double_every_value_withnumba(df.a.to_numpy())
+   In [7]: %timeit df['col1_doubled'] = double_every_value_withnumba(df['a'].to_numpy())
    1000 loops, best of 3: 145 us per loop

 Caveats

@@ -643,8 +643,8 @@ The equivalent in standard Python would be
 .. ipython:: python

    df = pd.DataFrame(dict(a=range(5), b=range(5, 10)))
-   df['c'] = df.a + df.b
-   df['d'] = df.a + df.b + df.c
+   df['c'] = df['a'] + df['b']
+   df['d'] = df['a'] + df['b'] + df['c']
    df['a'] = 1
    df

@@ -688,7 +688,7 @@ name in an expression.

    a = np.random.randn()
    df.query('@a < a')
-   df.loc[a < df.a]  # same as the previous expression
+   df.loc[a < df['a']]  # same as the previous expression

 With :func:`pandas.eval` you cannot use the ``@`` prefix *at all*, because it
 isn't defined in that context. ``pandas`` will let you know this if you try to
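
The ``@`` prefix contrast in the last hunk, as a self-contained check (the data is made up):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"a": np.random.randn(10)})
    a = np.random.randn()

    # '@a' names the local Python variable; the bare 'a' names the column
    assert df.query("@a < a").equals(df.loc[a < df["a"]])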

doc/source/user_guide/indexing.rst

+21 −18

@@ -210,7 +210,7 @@ as an attribute:
 See `here for an explanation of valid identifiers
 <https://docs.python.org/3/reference/lexical_analysis.html#identifiers>`__.

-- The attribute will not be available if it conflicts with an existing method name, e.g. ``s.min`` is not allowed.
+- The attribute will not be available if it conflicts with an existing method name, e.g. ``s.min`` is not allowed, but ``s['min']`` is possible.

 - Similarly, the attribute will not be available if it conflicts with any of the following list: ``index``,
   ``major_axis``, ``minor_axis``, ``items``.

@@ -540,7 +540,7 @@ The ``callable`` must be a function with one argument (the calling Series or Dat
                       columns=list('ABCD'))
    df1

-   df1.loc[lambda df: df.A > 0, :]
+   df1.loc[lambda df: df['A'] > 0, :]
    df1.loc[:, lambda df: ['A', 'B']]

    df1.iloc[:, lambda df: [0, 1]]

@@ -552,7 +552,7 @@ You can use callable indexing in ``Series``.

 .. ipython:: python

-   df1.A.loc[lambda s: s > 0]
+   df1['A'].loc[lambda s: s > 0]

 Using these methods / indexers, you can chain data selection operations
 without using a temporary variable.

@@ -561,7 +561,7 @@ without using a temporary variable.

    bb = pd.read_csv('data/baseball.csv', index_col='id')
    (bb.groupby(['year', 'team']).sum()
-      .loc[lambda df: df.r > 100])
+      .loc[lambda df: df['r'] > 100])

 .. _indexing.deprecate_ix:

@@ -871,9 +871,9 @@ Boolean indexing
 Another common operation is the use of boolean vectors to filter the data.
 The operators are: ``|`` for ``or``, ``&`` for ``and``, and ``~`` for ``not``.
 These **must** be grouped by using parentheses, since by default Python will
-evaluate an expression such as ``df.A > 2 & df.B < 3`` as
-``df.A > (2 & df.B) < 3``, while the desired evaluation order is
-``(df.A > 2) & (df.B < 3)``.
+evaluate an expression such as ``df['A'] > 2 & df['B'] < 3`` as
+``df['A'] > (2 & df['B']) < 3``, while the desired evaluation order is
+``(df['A'] > 2) & (df['B'] < 3)``.

 Using a boolean vector to index a Series works exactly as in a NumPy ndarray:

@@ -1134,7 +1134,7 @@ between the values of columns ``a`` and ``c``. For example:
    df

    # pure python
-   df[(df.a < df.b) & (df.b < df.c)]
+   df[(df['a'] < df['b']) & (df['b'] < df['c'])]

    # query
    df.query('(a < b) & (b < c)')

@@ -1241,7 +1241,7 @@ Full numpy-like syntax:
    df = pd.DataFrame(np.random.randint(n, size=(n, 3)), columns=list('abc'))
    df
    df.query('(a < b) & (b < c)')
-   df[(df.a < df.b) & (df.b < df.c)]
+   df[(df['a'] < df['b']) & (df['b'] < df['c'])]

 Slightly nicer by removing the parentheses (by making comparison
 operators bind tighter than ``&`` and ``|``).

@@ -1279,12 +1279,12 @@ The ``in`` and ``not in`` operators
    df.query('a in b')

    # How you'd do it in pure Python
-   df[df.a.isin(df.b)]
+   df[df['a'].isin(df['b'])]

    df.query('a not in b')

    # pure Python
-   df[~df.a.isin(df.b)]
+   df[~df['a'].isin(df['b'])]


 You can combine this with other expressions for very succinct queries:

@@ -1297,7 +1297,7 @@ You can combine this with other expressions for very succinct queries:
    df.query('a in b and c < d')

    # pure Python
-   df[df.b.isin(df.a) & (df.c < df.d)]
+   df[df['b'].isin(df['a']) & (df['c'] < df['d'])]


 .. note::

@@ -1326,7 +1326,7 @@ to ``in``/``not in``.
    df.query('b == ["a", "b", "c"]')

    # pure Python
-   df[df.b.isin(["a", "b", "c"])]
+   df[df['b'].isin(["a", "b", "c"])]

    df.query('c == [1, 2]')

@@ -1338,7 +1338,7 @@ to ``in``/``not in``.
    df.query('[1, 2] not in c')

    # pure Python
-   df[df.c.isin([1, 2])]
+   df[df['c'].isin([1, 2])]


 Boolean operators

@@ -1352,7 +1352,7 @@ You can negate boolean expressions with the word ``not`` or the ``~`` operator.
    df['bools'] = np.random.rand(len(df)) > 0.5
    df.query('~bools')
    df.query('not bools')
-   df.query('not bools') == df[~df.bools]
+   df.query('not bools') == df[~df['bools']]

 Of course, expressions can be arbitrarily complex too:

@@ -1362,7 +1362,10 @@ Of course, expressions can be arbitrarily complex too:
    shorter = df.query('a < b < c and (not bools) or bools > 2')

    # equivalent in pure Python
-   longer = df[(df.a < df.b) & (df.b < df.c) & (~df.bools) | (df.bools > 2)]
+   longer = df[(df['a'] < df['b'])
+               & (df['b'] < df['c'])
+               & (~df['bools'])
+               | (df['bools'] > 2)]

    shorter
    longer

@@ -1835,14 +1838,14 @@ chained indexing expression, you can set the :ref:`option <options>`

    # This will show the SettingWithCopyWarning
    # but the frame values will be set
-   dfb['c'][dfb.a.str.startswith('o')] = 42
+   dfb['c'][dfb['a'].str.startswith('o')] = 42

 This however is operating on a copy and will not work.

 ::

    >>> pd.set_option('mode.chained_assignment','warn')
-   >>> dfb[dfb.a.str.startswith('o')]['c'] = 42
+   >>> dfb[dfb['a'].str.startswith('o')]['c'] = 42
    Traceback (most recent call last)
    ...
    SettingWithCopyWarning:
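
``query``'s ``in``/``not in`` and their ``isin`` equivalents from these hunks, in a runnable sketch (data loosely modeled on the docs' example):

    import pandas as pd

    df = pd.DataFrame({"a": list("aabbccddeeff"),
                       "b": list("aaaabbbbcccc")})

    # query's 'in' / 'not in' are spelled isin() in plain indexing
    assert df.query("a in b").equals(df[df["a"].isin(df["b"])])
    assert df.query("a not in b").equals(df[~df["a"].isin(df["b"])])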

doc/source/user_guide/reshaping.rst

+5 −5

@@ -469,7 +469,7 @@ If ``crosstab`` receives only two Series, it will provide a frequency table.
                        'C': [1, 1, np.nan, 1, 1]})
    df

-   pd.crosstab(df.A, df.B)
+   pd.crosstab(df['A'], df['B'])

 Any input passed containing ``Categorical`` data will have **all** of its
 categories included in the cross-tabulation, even if the actual data does

@@ -489,21 +489,21 @@ using the ``normalize`` argument:

 .. ipython:: python

-   pd.crosstab(df.A, df.B, normalize=True)
+   pd.crosstab(df['A'], df['B'], normalize=True)

 ``normalize`` can also normalize values within each row or within each column:

 .. ipython:: python

-   pd.crosstab(df.A, df.B, normalize='columns')
+   pd.crosstab(df['A'], df['B'], normalize='columns')

 ``crosstab`` can also be passed a third ``Series`` and an aggregation function
 (``aggfunc``) that will be applied to the values of the third ``Series`` within
 each group defined by the first two ``Series``:

 .. ipython:: python

-   pd.crosstab(df.A, df.B, values=df.C, aggfunc=np.sum)
+   pd.crosstab(df['A'], df['B'], values=df['C'], aggfunc=np.sum)

 Adding margins
 ~~~~~~~~~~~~~~

@@ -512,7 +512,7 @@ Finally, one can also add margins or normalize this output.

 .. ipython:: python

-   pd.crosstab(df.A, df.B, values=df.C, aggfunc=np.sum, normalize=True,
+   pd.crosstab(df['A'], df['B'], values=df['C'], aggfunc=np.sum, normalize=True,
                margins=True)

 .. _reshaping.tile:
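
The ``crosstab`` calls above, on a tiny illustrative frame (not the docs' fixture):

    import pandas as pd

    df = pd.DataFrame({"A": ["x", "x", "y", "y"],
                       "B": ["p", "q", "p", "p"]})

    pd.crosstab(df["A"], df["B"])                       # raw frequency counts
    pd.crosstab(df["A"], df["B"], normalize=True)       # fractions of the grand total
    pd.crosstab(df["A"], df["B"], normalize="columns")  # each column sums to 1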

doc/source/user_guide/visualization.rst

+7 −7

@@ -1148,10 +1148,10 @@ To plot data on a secondary y-axis, use the ``secondary_y`` keyword:

 .. ipython:: python

-   df.A.plot()
+   df['A'].plot()

    @savefig series_plot_secondary_y.png
-   df.B.plot(secondary_y=True, style='g')
+   df['B'].plot(secondary_y=True, style='g')

 .. ipython:: python
    :suppress:

@@ -1205,7 +1205,7 @@ Here is the default behavior; notice how the x-axis tick labeling is performed:
    plt.figure()

    @savefig ser_plot_suppress.png
-   df.A.plot()
+   df['A'].plot()

 .. ipython:: python
    :suppress:

@@ -1219,7 +1219,7 @@ Using the ``x_compat`` parameter, you can suppress this behavior:
    plt.figure()

    @savefig ser_plot_suppress_parm.png
-   df.A.plot(x_compat=True)
+   df['A'].plot(x_compat=True)

 .. ipython:: python
    :suppress:

@@ -1235,9 +1235,9 @@ in ``pandas.plotting.plot_params`` can be used in a `with statement`:

    @savefig ser_plot_suppress_context.png
    with pd.plotting.plot_params.use('x_compat', True):
-       df.A.plot(color='r')
-       df.B.plot(color='g')
-       df.C.plot(color='b')
+       df['A'].plot(color='r')
+       df['B'].plot(color='g')
+       df['C'].plot(color='b')

 .. ipython:: python
    :suppress:
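
The ``secondary_y`` pattern from the first hunk, as a self-contained sketch (the random-walk data is made up):

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"A": np.random.randn(100).cumsum(),
                       "B": np.random.randn(100).cumsum()},
                      index=pd.date_range("2000-01-01", periods=100))

    df["A"].plot()                             # left-hand y-axis
    df["B"].plot(secondary_y=True, style="g")  # right-hand y-axis
    plt.show()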
