From db69ce0fd6367fea5cf637ad7c2be2d8343287a3 Mon Sep 17 00:00:00 2001 From: Katrin Leinweber <9948149+katrinleinweber@users.noreply.github.com> Date: Wed, 24 Jul 2019 14:17:17 +0200 Subject: [PATCH 1/8] Harmonize column selection to bracket notation As suggested by https://medium.com/dunder-data/minimally-sufficient-pandas-a8e67f2a2428#46f9 --- doc/source/user_guide/indexing.rst | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/doc/source/user_guide/indexing.rst b/doc/source/user_guide/indexing.rst index e3b75afcf945e..e8f24b3734757 100644 --- a/doc/source/user_guide/indexing.rst +++ b/doc/source/user_guide/indexing.rst @@ -236,7 +236,7 @@ new column. In 0.21.0 and later, this will raise a ``UserWarning``: .. code-block:: ipython In [1]: df = pd.DataFrame({'one': [1., 2., 3.]}) - In [2]: df.two = [4, 5, 6] + In [2]: df['two'] = [4, 5, 6] UserWarning: Pandas doesn't allow Series to be assigned into nonexistent columns - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute_access In [3]: df Out[3]: @@ -540,7 +540,7 @@ The ``callable`` must be a function with one argument (the calling Series or Dat columns=list('ABCD')) df1 - df1.loc[lambda df: df.A > 0, :] + df1.loc[lambda df: df['A'] > 0, :] df1.loc[:, lambda df: ['A', 'B']] df1.iloc[:, lambda df: [0, 1]] @@ -561,7 +561,7 @@ without using a temporary variable. bb = pd.read_csv('data/baseball.csv', index_col='id') (bb.groupby(['year', 'team']).sum() - .loc[lambda df: df.r > 100]) + .loc[lambda df: df['r'] > 100]) .. _indexing.deprecate_ix: @@ -871,9 +871,9 @@ Boolean indexing Another common operation is the use of boolean vectors to filter the data. The operators are: ``|`` for ``or``, ``&`` for ``and``, and ``~`` for ``not``. 
These **must** be grouped by using parentheses, since by default Python will -evaluate an expression such as ``df.A > 2 & df.B < 3`` as -``df.A > (2 & df.B) < 3``, while the desired evaluation order is -``(df.A > 2) & (df.B < 3)``. +evaluate an expression such as ``df['A'] > 2 & df['B'] < 3`` as +``df['A'] > (2 & df['B']) < 3``, while the desired evaluation order is +``(df['A'] > 2) & (df['B'] < 3)``. Using a boolean vector to index a Series works exactly as in a NumPy ndarray: @@ -1134,7 +1134,7 @@ between the values of columns ``a`` and ``c``. For example: df # pure python - df[(df.a < df.b) & (df.b < df.c)] + df[(df['a'] < df['b']) & (df['b'] < df['c'])] # query df.query('(a < b) & (b < c)') @@ -1241,7 +1241,7 @@ Full numpy-like syntax: df = pd.DataFrame(np.random.randint(n, size=(n, 3)), columns=list('abc')) df df.query('(a < b) & (b < c)') - df[(df.a < df.b) & (df.b < df.c)] + df[(df['a'] < df['b']) & (df['b'] < df['c'])] Slightly nicer by removing the parentheses (by binding making comparison operators bind tighter than ``&`` and ``|``). @@ -1279,12 +1279,12 @@ The ``in`` and ``not in`` operators df.query('a in b') # How you'd do it in pure Python - df[df.a.isin(df.b)] + df[df['a'].isin(df['b'])] df.query('a not in b') # pure Python - df[~df.a.isin(df.b)] + df[~df['a'].isin(df['b'])] You can combine this with other expressions for very succinct queries: @@ -1297,7 +1297,7 @@ You can combine this with other expressions for very succinct queries: df.query('a in b and c < d') # pure Python - df[df.b.isin(df.a) & (df.c < df.d)] + df[df['b'].isin(df['a']) & (df['c'] < df['d'])] .. note:: @@ -1326,7 +1326,7 @@ to ``in``/``not in``. df.query('b == ["a", "b", "c"]') # pure Python - df[df.b.isin(["a", "b", "c"])] + df[df['b'].isin(["a", "b", "c"])] df.query('c == [1, 2]') @@ -1338,7 +1338,7 @@ to ``in``/``not in``. 
df.query('[1, 2] not in c') # pure Python - df[df.c.isin([1, 2])] + df[df['c'].isin([1, 2])] Boolean operators @@ -1362,7 +1362,7 @@ Of course, expressions can be arbitrarily complex too: shorter = df.query('a < b < c and (not bools) or bools > 2') # equivalent in pure Python - longer = df[(df.a < df.b) & (df.b < df.c) & (~df.bools) | (df.bools > 2)] + longer = df[(df['a'] < df['b']) & (df['b'] < df['c']) & (~df.bools) | (df.bools > 2)] shorter longer From 8e9a82dc5c9629cb376018719ef8ec5f461661e4 Mon Sep 17 00:00:00 2001 From: Katrin Leinweber Date: Wed, 24 Jul 2019 19:18:47 +0200 Subject: [PATCH 2/8] Harmonize column selection to bracket notation --- doc/source/getting_started/10min.rst | 4 ++-- doc/source/getting_started/basics.rst | 12 ++++++------ .../comparison/comparison_with_r.rst | 8 ++++---- doc/source/user_guide/advanced.rst | 2 +- doc/source/user_guide/cookbook.rst | 6 +++--- doc/source/user_guide/enhancingperf.rst | 14 +++++++------- doc/source/user_guide/indexing.rst | 16 ++++++++-------- doc/source/user_guide/reshaping.rst | 10 +++++----- doc/source/user_guide/visualization.rst | 14 +++++++------- 9 files changed, 43 insertions(+), 43 deletions(-) diff --git a/doc/source/getting_started/10min.rst b/doc/source/getting_started/10min.rst index 510c7ef97aa98..6ef5b868b3dce 100644 --- a/doc/source/getting_started/10min.rst +++ b/doc/source/getting_started/10min.rst @@ -170,7 +170,7 @@ Getting ~~~~~~~ Selecting a single column, which yields a ``Series``, -equivalent to ``df.A``: +equivalent to ``df['A']``: .. ipython:: python @@ -278,7 +278,7 @@ Using a single column's values to select data. .. ipython:: python - df[df.A > 0] + df[df['A'] > 0] Selecting values from a DataFrame where a boolean condition is met. 
diff --git a/doc/source/getting_started/basics.rst b/doc/source/getting_started/basics.rst index 3f6f56376861f..802ffadf2a81e 100644 --- a/doc/source/getting_started/basics.rst +++ b/doc/source/getting_started/basics.rst @@ -926,7 +926,7 @@ Single aggregations on a ``Series`` this will return a scalar value: .. ipython:: python - tsdf.A.agg('sum') + tsdf['A'].agg('sum') Aggregating with multiple functions @@ -950,13 +950,13 @@ On a ``Series``, multiple functions return a ``Series``, indexed by the function .. ipython:: python - tsdf.A.agg(['sum', 'mean']) + tsdf['A'].agg(['sum', 'mean']) Passing a ``lambda`` function will yield a ``<lambda>`` named row: .. ipython:: python - tsdf.A.agg(['sum', lambda x: x.mean()]) + tsdf['A'].agg(['sum', lambda x: x.mean()]) Passing a named function will yield that name for the row: @@ -965,7 +965,7 @@ Passing a named function will yield that name for the row: def mymean(x): return x.mean() - tsdf.A.agg(['sum', mymean]) + tsdf['A'].agg(['sum', mymean]) Aggregating with a dict +++++++++++++++++++++++ @@ -1065,7 +1065,7 @@ Passing a single function to ``.transform()`` with a ``Series`` will yield a sin .. ipython:: python - tsdf.A.transform(np.abs) + tsdf['A'].transform(np.abs) Transform with multiple functions @@ -1084,7 +1084,7 @@ resulting column names will be the transforming functions. .. 
ipython:: python - tsdf.A.transform([np.abs, lambda x: x + 1]) + tsdf['A'].transform([np.abs, lambda x: x + 1]) Transforming with a dict diff --git a/doc/source/getting_started/comparison/comparison_with_r.rst b/doc/source/getting_started/comparison/comparison_with_r.rst index 444e886bc951d..f67f46fc2b29b 100644 --- a/doc/source/getting_started/comparison/comparison_with_r.rst +++ b/doc/source/getting_started/comparison/comparison_with_r.rst @@ -81,7 +81,7 @@ R pandas =========================================== =========================================== ``select(df, col_one = col1)`` ``df.rename(columns={'col1': 'col_one'})['col_one']`` ``rename(df, col_one = col1)`` ``df.rename(columns={'col1': 'col_one'})`` -``mutate(df, c=a-b)`` ``df.assign(c=df.a-df.b)`` +``mutate(df, c=a-b)`` ``df.assign(c=df['a']-df['b'])`` =========================================== =========================================== @@ -258,8 +258,8 @@ index/slice as well as standard boolean indexing: df = pd.DataFrame({'a': np.random.randn(10), 'b': np.random.randn(10)}) df.query('a <= b') - df[df.a <= df.b] - df.loc[df.a <= df.b] + df[df['a'] <= df['b']] + df.loc[df['a'] <= df['b']] For more details and examples see :ref:`the query documentation `. @@ -284,7 +284,7 @@ In ``pandas`` the equivalent expression, using the df = pd.DataFrame({'a': np.random.randn(10), 'b': np.random.randn(10)}) df.eval('a + b') - df.a + df.b # same as the previous expression + df['a'] + df['b'] # same as the previous expression In certain cases :meth:`~pandas.DataFrame.eval` will be much faster than evaluation in pure Python. 
For more details and examples see :ref:`the eval diff --git a/doc/source/user_guide/advanced.rst b/doc/source/user_guide/advanced.rst index 22a9791ffde30..62a9b6396404a 100644 --- a/doc/source/user_guide/advanced.rst +++ b/doc/source/user_guide/advanced.rst @@ -738,7 +738,7 @@ and allows efficient indexing and storage of an index with a large number of dup df['B'] = df['B'].astype(CategoricalDtype(list('cab'))) df df.dtypes - df.B.cat.categories + df['B'].cat.categories Setting the index will create a ``CategoricalIndex``. diff --git a/doc/source/user_guide/cookbook.rst b/doc/source/user_guide/cookbook.rst index 15af5208a4f1f..c9d3bc3a28c70 100644 --- a/doc/source/user_guide/cookbook.rst +++ b/doc/source/user_guide/cookbook.rst @@ -592,8 +592,8 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to .. ipython:: python df = pd.DataFrame([0, 1, 0, 1, 1, 1, 0, 1, 1], columns=['A']) - df.A.groupby((df.A != df.A.shift()).cumsum()).groups - df.A.groupby((df.A != df.A.shift()).cumsum()).cumsum() + df['A'].groupby((df['A'] != df['A'].shift()).cumsum()).groups + df['A'].groupby((df['A'] != df['A'].shift()).cumsum()).cumsum() Expanding data ************** @@ -719,7 +719,7 @@ Rolling Apply to multiple columns where function calculates a Series before a Sc df def gm(df, const): - v = ((((df.A + df.B) + 1).cumprod()) - 1) * const + v = ((((df['A'] + df['B']) + 1).cumprod()) - 1) * const return v.iloc[-1] s = pd.Series({df.index[i]: gm(df.iloc[i:min(i + 51, len(df) - 1)], 5) diff --git a/doc/source/user_guide/enhancingperf.rst b/doc/source/user_guide/enhancingperf.rst index b77bfb9778837..4af05a3f119b0 100644 --- a/doc/source/user_guide/enhancingperf.rst +++ b/doc/source/user_guide/enhancingperf.rst @@ -393,15 +393,15 @@ Consider the following toy example of doubling each observation: .. 
code-block:: ipython # Custom function without numba - In [5]: %timeit df['col1_doubled'] = df.a.apply(double_every_value_nonumba) # noqa E501 + In [5]: %timeit df['col1_doubled'] = df['a'].apply(double_every_value_nonumba) # noqa E501 1000 loops, best of 3: 797 us per loop # Standard implementation (faster than a custom function) - In [6]: %timeit df['col1_doubled'] = df.a * 2 + In [6]: %timeit df['col1_doubled'] = df['a'] * 2 1000 loops, best of 3: 233 us per loop # Custom function with numba - In [7]: %timeit (df['col1_doubled'] = double_every_value_withnumba(df.a.to_numpy()) + In [7]: %timeit (df['col1_doubled'] = double_every_value_withnumba(df['a'].to_numpy()) 1000 loops, best of 3: 145 us per loop Caveats @@ -475,7 +475,7 @@ These operations are supported by :func:`pandas.eval`: * Comparison operations, including chained comparisons, e.g., ``2 < df < df2`` * Boolean operations, e.g., ``df < df2 and df3 < df4 or not df_bool`` * ``list`` and ``tuple`` literals, e.g., ``[1, 2]`` or ``(1, 2)`` -* Attribute access, e.g., ``df.a`` +* Attribute access, e.g., ``df['a']`` * Subscript expressions, e.g., ``df[0]`` * Simple variable evaluation, e.g., ``pd.eval('df')`` (this is not very useful) * Math functions: `sin`, `cos`, `exp`, `log`, `expm1`, `log1p`, @@ -643,8 +643,8 @@ The equivalent in standard Python would be .. ipython:: python df = pd.DataFrame(dict(a=range(5), b=range(5, 10))) - df['c'] = df.a + df.b - df['d'] = df.a + df.b + df.c + df['c'] = df['a'] + df['b'] + df['d'] = df['a'] + df['b'] + df['c'] df['a'] = 1 df @@ -688,7 +688,7 @@ name in an expression. a = np.random.randn() df.query('@a < a') - df.loc[a < df.a] # same as the previous expression + df.loc[a < df['a']] # same as the previous expression With :func:`pandas.eval` you cannot use the ``@`` prefix *at all*, because it isn't defined in that context. 
``pandas`` will let you know this if you try to diff --git a/doc/source/user_guide/indexing.rst b/doc/source/user_guide/indexing.rst index e8f24b3734757..fc61f90f21adf 100644 --- a/doc/source/user_guide/indexing.rst +++ b/doc/source/user_guide/indexing.rst @@ -206,11 +206,11 @@ as an attribute: .. warning:: - - You can use this access only if the index element is a valid Python identifier, e.g. ``s.1`` is not allowed. + - You can use this access only if the index element is a valid Python identifier, e.g. ``s.1`` is not allowed, but neither is ``s['1']``. See `here for an explanation of valid identifiers `__. - - The attribute will not be available if it conflicts with an existing method name, e.g. ``s.min`` is not allowed. + - The attribute will not be available if it conflicts with an existing method name, e.g. ``s.min`` is not allowed, but ``s['min']``is possible. - Similarly, the attribute will not be available if it conflicts with any of the following list: ``index``, ``major_axis``, ``minor_axis``, ``items``. @@ -236,7 +236,7 @@ new column. In 0.21.0 and later, this will raise a ``UserWarning``: .. code-block:: ipython In [1]: df = pd.DataFrame({'one': [1., 2., 3.]}) - In [2]: df['two'] = [4, 5, 6] + In [2]: df.two = [4, 5, 6] UserWarning: Pandas doesn't allow Series to be assigned into nonexistent columns - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute_access In [3]: df Out[3]: @@ -552,7 +552,7 @@ You can use callable indexing in ``Series``. .. ipython:: python - df1.A.loc[lambda s: s > 0] + df1['A']loc[lambda s: s > 0] Using these methods / indexers, you can chain data selection operations without using a temporary variable. @@ -1352,7 +1352,7 @@ You can negate boolean expressions with the word ``not`` or the ``~`` operator. 
df['bools'] = np.random.rand(len(df)) > 0.5 df.query('~bools') df.query('not bools') - df.query('not bools') == df[~df.bools] + df.query('not bools') == df[~df['bools']] Of course, expressions can be arbitrarily complex too: @@ -1362,7 +1362,7 @@ Of course, expressions can be arbitrarily complex too: shorter = df.query('a < b < c and (not bools) or bools > 2') # equivalent in pure Python - longer = df[(df['a'] < df['b']) & (df['b'] < df['c']) & (~df.bools) | (df.bools > 2)] + longer = df[(df['a'] < df['b']) & (df['b'] < df['c']) & (~df['bools']) | (df['bools'] > 2)] shorter longer @@ -1835,14 +1835,14 @@ chained indexing expression, you can set the :ref:`option ` # This will show the SettingWithCopyWarning # but the frame values will be set - dfb['c'][dfb.a.str.startswith('o')] = 42 + dfb['c'][dfb['a'].str.startswith('o')] = 42 This however is operating on a copy and will not work. :: >>> pd.set_option('mode.chained_assignment','warn') - >>> dfb[dfb.a.str.startswith('o')]['c'] = 42 + >>> dfb[dfb['a'].str.startswith('o')]['c'] = 42 Traceback (most recent call last) ... SettingWithCopyWarning: diff --git a/doc/source/user_guide/reshaping.rst b/doc/source/user_guide/reshaping.rst index f118fe84d523a..dd6d3062a8f0a 100644 --- a/doc/source/user_guide/reshaping.rst +++ b/doc/source/user_guide/reshaping.rst @@ -469,7 +469,7 @@ If ``crosstab`` receives only two Series, it will provide a frequency table. 'C': [1, 1, np.nan, 1, 1]}) df - pd.crosstab(df.A, df.B) + pd.crosstab(df['A'], df['B']) Any input passed containing ``Categorical`` data will have **all** of its categories included in the cross-tabulation, even if the actual data does @@ -489,13 +489,13 @@ using the ``normalize`` argument: .. ipython:: python - pd.crosstab(df.A, df.B, normalize=True) + pd.crosstab(df['A'], df['B'], normalize=True) ``normalize`` can also normalize values within each row or within each column: .. 
ipython:: python - pd.crosstab(df.A, df.B, normalize='columns') + pd.crosstab(df['A'], df['B'], normalize='columns') ``crosstab`` can also be passed a third ``Series`` and an aggregation function (``aggfunc``) that will be applied to the values of the third ``Series`` within @@ -503,7 +503,7 @@ each group defined by the first two ``Series``: .. ipython:: python - pd.crosstab(df.A, df.B, values=df.C, aggfunc=np.sum) + pd.crosstab(df['A'], df['B'], values=df['C'], aggfunc=np.sum) Adding margins ~~~~~~~~~~~~~~ @@ -512,7 +512,7 @@ Finally, one can also add margins or normalize this output. .. ipython:: python - pd.crosstab(df.A, df.B, values=df.C, aggfunc=np.sum, normalize=True, + pd.crosstab(df['A'], df['B'], values=df['C'], aggfunc=np.sum, normalize=True, margins=True) .. _reshaping.tile: diff --git a/doc/source/user_guide/visualization.rst b/doc/source/user_guide/visualization.rst index fdceaa5868cec..fa16b2f216610 100644 --- a/doc/source/user_guide/visualization.rst +++ b/doc/source/user_guide/visualization.rst @@ -1148,10 +1148,10 @@ To plot data on a secondary y-axis, use the ``secondary_y`` keyword: .. ipython:: python - df.A.plot() + df['A'].plot() @savefig series_plot_secondary_y.png - df.B.plot(secondary_y=True, style='g') + df['B'].plot(secondary_y=True, style='g') .. ipython:: python :suppress: @@ -1205,7 +1205,7 @@ Here is the default behavior, notice how the x-axis tick labeling is performed: plt.figure() @savefig ser_plot_suppress.png - df.A.plot() + df['A'].plot() .. ipython:: python :suppress: @@ -1219,7 +1219,7 @@ Using the ``x_compat`` parameter, you can suppress this behavior: plt.figure() @savefig ser_plot_suppress_parm.png - df.A.plot(x_compat=True) + df['A'].plot(x_compat=True) .. 
ipython:: python :suppress: @@ -1235,9 +1235,9 @@ in ``pandas.plotting.plot_params`` can be used in a `with statement`: @savefig ser_plot_suppress_context.png with pd.plotting.plot_params.use('x_compat', True): - df.A.plot(color='r') - df.B.plot(color='g') - df.C.plot(color='b') + df['A'].plot(color='r') + df['B'].plot(color='g') + df['C'].plot(color='b') .. ipython:: python :suppress: From 0d0c75a21cab7916bfb8dcd99491d85e976fa9a5 Mon Sep 17 00:00:00 2001 From: Katrin Leinweber Date: Fri, 26 Jul 2019 20:42:24 +0200 Subject: [PATCH 3/8] Fix CI error / typo --- doc/source/user_guide/indexing.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/source/user_guide/indexing.rst b/doc/source/user_guide/indexing.rst index fc61f90f21adf..2cd4f436e5ee3 100644 --- a/doc/source/user_guide/indexing.rst +++ b/doc/source/user_guide/indexing.rst @@ -552,7 +552,7 @@ You can use callable indexing in ``Series``. .. ipython:: python - df1['A']loc[lambda s: s > 0] + df1['A'].loc[lambda s: s > 0] Using these methods / indexers, you can chain data selection operations without using a temporary variable. 
From c8c992d539ec45395e70cc7b1b408def31be5b77 Mon Sep 17 00:00:00 2001 From: Katrin Leinweber Date: Fri, 26 Jul 2019 21:44:20 +0200 Subject: [PATCH 4/8] Fix CI error / typo --- doc/source/user_guide/indexing.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/doc/source/user_guide/indexing.rst b/doc/source/user_guide/indexing.rst index 2cd4f436e5ee3..12e86e9397b53 100644 --- a/doc/source/user_guide/indexing.rst +++ b/doc/source/user_guide/indexing.rst @@ -1362,7 +1362,8 @@ Of course, expressions can be arbitrarily complex too: shorter = df.query('a < b < c and (not bools) or bools > 2') # equivalent in pure Python - longer = df[(df['a'] < df['b']) & (df['b'] < df['c']) & (~df['bools']) | (df['bools'] > 2)] + longer = df[(df['a'] < df['b']) & (df['b'] < df['c']) & \ + (~df['bools']) | (df['bools'] > 2)] shorter longer From 66a64e64f52a99a67fb9549e9f479be797a205ae Mon Sep 17 00:00:00 2001 From: Katrin Leinweber Date: Sat, 27 Jul 2019 16:19:20 +0200 Subject: [PATCH 5/8] Fix CI error / typo --- doc/source/user_guide/indexing.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/source/user_guide/indexing.rst b/doc/source/user_guide/indexing.rst index 12e86e9397b53..add776632e552 100644 --- a/doc/source/user_guide/indexing.rst +++ b/doc/source/user_guide/indexing.rst @@ -1362,8 +1362,8 @@ Of course, expressions can be arbitrarily complex too: shorter = df.query('a < b < c and (not bools) or bools > 2') # equivalent in pure Python - longer = df[(df['a'] < df['b']) & (df['b'] < df['c']) & \ - (~df['bools']) | (df['bools'] > 2)] + longer = df[(df['a'] < df['b']) & (df['b'] < df['c']) + & (~df['bools']) | (df['bools'] > 2)] shorter longer From 1fe41f53fb4dad067c6f5f357b4a60b828af3832 Mon Sep 17 00:00:00 2001 From: Katrin Leinweber Date: Sat, 27 Jul 2019 16:41:05 +0200 Subject: [PATCH 6/8] Revert or adjust some misfitting bracket notations as reviewed --- doc/source/getting_started/10min.rst | 2 +- 
doc/source/user_guide/enhancingperf.rst | 2 +- doc/source/user_guide/indexing.rst | 8 +++++--- 3 files changed, 7 insertions(+), 5 deletions(-) diff --git a/doc/source/getting_started/10min.rst b/doc/source/getting_started/10min.rst index 6ef5b868b3dce..b4c65d1e173b3 100644 --- a/doc/source/getting_started/10min.rst +++ b/doc/source/getting_started/10min.rst @@ -170,7 +170,7 @@ Getting ~~~~~~~ Selecting a single column, which yields a ``Series``, -equivalent to ``df['A']``: +equivalent to ``d.A``: .. ipython:: python diff --git a/doc/source/user_guide/enhancingperf.rst b/doc/source/user_guide/enhancingperf.rst index 4af05a3f119b0..40c8c207c847c 100644 --- a/doc/source/user_guide/enhancingperf.rst +++ b/doc/source/user_guide/enhancingperf.rst @@ -475,7 +475,7 @@ These operations are supported by :func:`pandas.eval`: * Comparison operations, including chained comparisons, e.g., ``2 < df < df2`` * Boolean operations, e.g., ``df < df2 and df3 < df4 or not df_bool`` * ``list`` and ``tuple`` literals, e.g., ``[1, 2]`` or ``(1, 2)`` -* Attribute access, e.g., ``df['a']`` +* Attribute access, e.g., ``df.a`` * Subscript expressions, e.g., ``df[0]`` * Simple variable evaluation, e.g., ``pd.eval('df')`` (this is not very useful) * Math functions: `sin`, `cos`, `exp`, `log`, `expm1`, `log1p`, diff --git a/doc/source/user_guide/indexing.rst b/doc/source/user_guide/indexing.rst index add776632e552..77f0cfaf21082 100644 --- a/doc/source/user_guide/indexing.rst +++ b/doc/source/user_guide/indexing.rst @@ -206,7 +206,7 @@ as an attribute: .. warning:: - - You can use this access only if the index element is a valid Python identifier, e.g. ``s.1`` is not allowed, but neither is ``s['1']``. + - You can use this access only if the index element is a valid Python identifier, e.g. ``s.1`` is not allowed. See `here for an explanation of valid identifiers `__. 
@@ -1362,8 +1362,10 @@ Of course, expressions can be arbitrarily complex too: shorter = df.query('a < b < c and (not bools) or bools > 2') # equivalent in pure Python - longer = df[(df['a'] < df['b']) & (df['b'] < df['c']) - & (~df['bools']) | (df['bools'] > 2)] + longer = df[(df['a'] < df['b']) + & (df['b'] < df['c']) + & (~df['bools']) + | (df['bools'] > 2)] shorter longer From ef6fb4152dcb297a81492f4d17c30aecc8eb83f1 Mon Sep 17 00:00:00 2001 From: Katrin Leinweber <9948149+katrinleinweber@users.noreply.github.com> Date: Mon, 29 Jul 2019 18:26:30 +0200 Subject: [PATCH 7/8] Fix typo Co-Authored-By: Joris Van den Bossche --- doc/source/getting_started/10min.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/source/getting_started/10min.rst b/doc/source/getting_started/10min.rst index b4c65d1e173b3..d3ad6f99d5ecd 100644 --- a/doc/source/getting_started/10min.rst +++ b/doc/source/getting_started/10min.rst @@ -170,7 +170,7 @@ Getting ~~~~~~~ Selecting a single column, which yields a ``Series``, -equivalent to ``d.A``: +equivalent to ``df.A``: .. ipython:: python From 098531ca1996905d462c682810b46cda614752a5 Mon Sep 17 00:00:00 2001 From: Katrin Leinweber <9948149+katrinleinweber@users.noreply.github.com> Date: Mon, 29 Jul 2019 18:26:41 +0200 Subject: [PATCH 8/8] Fix typo Co-Authored-By: Joris Van den Bossche --- doc/source/user_guide/indexing.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/source/user_guide/indexing.rst b/doc/source/user_guide/indexing.rst index 77f0cfaf21082..cf55ce0c9a6d4 100644 --- a/doc/source/user_guide/indexing.rst +++ b/doc/source/user_guide/indexing.rst @@ -210,7 +210,7 @@ as an attribute: See `here for an explanation of valid identifiers `__. - - The attribute will not be available if it conflicts with an existing method name, e.g. ``s.min`` is not allowed, but ``s['min']``is possible. + - The attribute will not be available if it conflicts with an existing method name, e.g. 
``s.min`` is not allowed, but ``s['min']`` is possible. - Similarly, the attribute will not be available if it conflicts with any of the following list: ``index``, ``major_axis``, ``minor_axis``, ``items``.