Skip to content

DOC: Fix format of basics.rst, following PEP-8 standard #23802

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Nov 21, 2018
121 changes: 65 additions & 56 deletions doc/source/basics.rst
Original file line number Diff line number Diff line change
Expand Up @@ -149,9 +149,10 @@ either match on the *index* or *columns* via the **axis** keyword:

.. ipython:: python

df = pd.DataFrame({'one' : pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
'two' : pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
'three' : pd.Series(np.random.randn(3), index=['b', 'c', 'd'])})
df = pd.DataFrame({
'one': pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
'two': pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
'three': pd.Series(np.random.randn(3), index=['b', 'c', 'd'])})
df
row = df.iloc[1]
column = df['two']
Expand All @@ -172,8 +173,8 @@ Furthermore you can align a level of a MultiIndexed DataFrame with a Series.
.. ipython:: python

dfmi = df.copy()
dfmi.index = pd.MultiIndex.from_tuples([(1,'a'),(1,'b'),(1,'c'),(2,'a')],
names=['first','second'])
dfmi.index = pd.MultiIndex.from_tuples([
(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a')], names=['first', 'second'])
dfmi.sub(column, axis=0, level='second')

With Panel, describing the matching behavior is a bit more difficult, so
Expand Down Expand Up @@ -336,8 +337,8 @@ fact, this expression is False:

.. ipython:: python

df+df == df*2
(df+df == df*2).all()
df + df == df * 2
(df + df == df * 2).all()

Notice that the boolean DataFrame ``df+df == df*2`` contains some False values!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you fix the missing spaces here and in the paragraph above. They are not detected as they are not in a code block, but would be nice that they follow pep8 too.

This is because NaNs do not compare as equals:
Expand All @@ -352,15 +353,15 @@ corresponding locations treated as equal.

.. ipython:: python

(df+df).equals(df*2)
(df + df).equals(df * 2)

Note that the Series or DataFrame index needs to be in the same order for
equality to be True:

.. ipython:: python

df1 = pd.DataFrame({'col':['foo', 0, np.nan]})
df2 = pd.DataFrame({'col':[np.nan, 0, 'foo']}, index=[2,1,0])
df1 = pd.DataFrame({'col': ['foo', 0, np.nan]})
df2 = pd.DataFrame({'col': [np.nan, 0, 'foo']}, index=[2, 1, 0])
df1.equals(df2)
df1.equals(df2.sort_index())

Expand Down Expand Up @@ -423,10 +424,10 @@ which we illustrate:

.. ipython:: python

df1 = pd.DataFrame({'A' : [1., np.nan, 3., 5., np.nan],
'B' : [np.nan, 2., 3., np.nan, 6.]})
df2 = pd.DataFrame({'A' : [5., 2., 4., np.nan, 3., 7.],
'B' : [np.nan, np.nan, 3., 4., 6., 8.]})
df1 = pd.DataFrame({'A': [1., np.nan, 3., 5., np.nan],
'B': [np.nan, 2., 3., np.nan, 6.]})
df2 = pd.DataFrame({'A': [5., 2., 4., np.nan, 3., 7.],
'B': [np.nan, np.nan, 3., 4., 6., 8.]})
df1
df2
df1.combine_first(df2)
Expand All @@ -443,7 +444,8 @@ So, for instance, to reproduce :meth:`~DataFrame.combine_first` as above:

.. ipython:: python

combiner = lambda x, y: np.where(pd.isna(x), y, x)
def combiner(x, y):
np.where(pd.isna(x), y, x)
df1.combine(df2, combiner)

.. _basics.stats:
Expand Down Expand Up @@ -546,7 +548,7 @@ Series:

series = pd.Series(np.random.randn(500))
series[20:500] = np.nan
series[10:20] = 5
series[10:20] = 5
series.nunique()

.. _basics.describe:
Expand All @@ -563,7 +565,8 @@ course):
series = pd.Series(np.random.randn(1000))
series[::2] = np.nan
series.describe()
frame = pd.DataFrame(np.random.randn(1000, 5), columns=['a', 'b', 'c', 'd', 'e'])
frame = pd.DataFrame(
np.random.randn(1000, 5), columns=['a', 'b', 'c', 'd', 'e'])
frame.iloc[::2] = np.nan
frame.describe()

Expand Down Expand Up @@ -619,7 +622,7 @@ corresponding values:
s1
s1.idxmin(), s1.idxmax()

df1 = pd.DataFrame(np.random.randn(5,3), columns=['A','B','C'])
df1 = pd.DataFrame(np.random.randn(5, 3), columns=['A', 'B', 'C'])
df1
df1.idxmin(axis=0)
df1.idxmax(axis=1)
Expand Down Expand Up @@ -753,11 +756,11 @@ For example, we can fit a regression using statsmodels. Their API expects a form
bb = pd.read_csv('data/baseball.csv', index_col='id')

(bb.query('h > 0')
.assign(ln_h = lambda df: np.log(df.h))
.assign(ln_h=lambda df: np.log(df.h))
.pipe((sm.ols, 'data'), 'hr ~ ln_h + year + g + C(lg)')
.fit()
.summary()
)
)

The pipe method is inspired by unix pipes and more recently dplyr_ and magrittr_, which
have introduced the popular ``(%>%)`` (read pipe) operator for R_.
Expand Down Expand Up @@ -880,7 +883,8 @@ output:

tsdf.agg('sum')

# these are equivalent to a ``.sum()`` because we are aggregating on a single function
# these are equivalent to a ``.sum()`` because we are aggregating
# on a single function
tsdf.sum()

Single aggregations on a ``Series`` this will return a scalar value:
Expand Down Expand Up @@ -924,7 +928,7 @@ Passing a named function will yield that name for the row:
.. ipython:: python

def mymean(x):
return x.mean()
return x.mean()

tsdf.A.agg(['sum', mymean])

Expand Down Expand Up @@ -1038,14 +1042,14 @@ will be the names of the transforming functions.

.. ipython:: python

tsdf.transform([np.abs, lambda x: x+1])
tsdf.transform([np.abs, lambda x: x + 1])

Passing multiple functions to a Series will yield a DataFrame. The
resulting column names will be the transforming functions.

.. ipython:: python

tsdf.A.transform([np.abs, lambda x: x+1])
tsdf.A.transform([np.abs, lambda x: x + 1])


Transforming with a dict
Expand All @@ -1056,15 +1060,15 @@ Passing a dict of functions will allow selective transforming per column.

.. ipython:: python

tsdf.transform({'A': np.abs, 'B': lambda x: x+1})
tsdf.transform({'A': np.abs, 'B': lambda x: x + 1})

Passing a dict of lists will generate a MultiIndexed DataFrame with these
selective transforms.

.. ipython:: python
:okwarning:

tsdf.transform({'A': np.abs, 'B': [lambda x: x+1, 'sqrt']})
tsdf.transform({'A': np.abs, 'B': [lambda x: x + 1, 'sqrt']})

.. _basics.elementwise:

Expand All @@ -1084,7 +1088,8 @@ a single value and returning a single value. For example:
.. ipython:: python

df4
f = lambda x: len(str(x))
def f(x):
len(str(x))
df4['one'].map(f)
df4.applymap(f)

Expand All @@ -1096,7 +1101,7 @@ to :ref:`merging/joining functionality <merging>`:

s = pd.Series(['six', 'seven', 'six', 'seven', 'six'],
index=['a', 'b', 'c', 'd', 'e'])
t = pd.Series({'six' : 6., 'seven' : 7.})
t = pd.Series({'six': 6., 'seven': 7.})
s
s.map(t)

Expand All @@ -1121,7 +1126,7 @@ A transformational apply.

.. ipython:: python

result = panel.apply(lambda x: x*2, axis='items')
result = panel.apply(lambda x: x * 2, axis='items')
result
result['ItemA']

Expand Down Expand Up @@ -1149,8 +1154,8 @@ the z-score across the ``major_axis``.
.. ipython:: python

result = panel.apply(
lambda x: (x-x.mean())/x.std(),
axis='major_axis')
lambda x: (x - x.mean()) / x.std(),
axis='major_axis')
result
result['ItemA']

Expand All @@ -1159,20 +1164,21 @@ Apply can also accept multiple axes in the ``axis`` argument. This will pass a

.. ipython:: python

f = lambda x: ((x.T-x.mean(1))/x.std(1)).T
def f(x):
((x.T - x.mean(1)) / x.std(1)).T

result = panel.apply(f, axis = ['items','major_axis'])
result = panel.apply(f, axis=['items', 'major_axis'])
result
result.loc[:,:,'ItemA']
result.loc[:, :, 'ItemA']

This is equivalent to the following:

.. ipython:: python

result = pd.Panel(dict([ (ax, f(panel.loc[:,:,ax]))
for ax in panel.minor_axis ]))
result = pd.Panel(dict([
(ax, f(panel.loc[:, :, ax])) for ax in panel.minor_axis]))
result
result.loc[:,:,'ItemA']
result.loc[:, :, 'ItemA']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't directly see how the above changes cause this, but, this example is now failing in the doc build with:

>>>-------------------------------------------------------------------------
Exception in /home/travis/build/pandas-dev/pandas/doc/source/basics.rst at block ending on line 1183
Specify :okexcept: as an option in the ipython:: block to suppress this message

KeyErrorTraceback (most recent call last)

~/build/pandas-dev/pandas/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3146             try:
-> 3147                 return self._engine.get_loc(key)
   3148             except KeyError:
~/build/pandas-dev/pandas/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
~/build/pandas-dev/pandas/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
~/build/pandas-dev/pandas/pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
~/build/pandas-dev/pandas/pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'ItemA'

During handling of the above exception, another exception occurred:

KeyErrorTraceback (most recent call last)
<ipython-input-209-ce1c0f9e4a44> in <module>
----> 1 result.loc[:, :, 'ItemA']
~/build/pandas-dev/pandas/pandas/core/indexing.py in __getitem__(self, key)
   1495             except (KeyError, IndexError, AttributeError):
   1496                 pass
-> 1497             return self._getitem_tuple(key)
   1498         else:
   1499             # we by definition only have the 0th axis
~/build/pandas-dev/pandas/pandas/core/indexing.py in _getitem_tuple(self, tup)
    868     def _getitem_tuple(self, tup):
    869         try:
--> 870             return self._getitem_lowerdim(tup)
    871         except IndexingError:
    872             pass
~/build/pandas-dev/pandas/pandas/core/indexing.py in _getitem_lowerdim(self, tup)
    988         for i, key in enumerate(tup):
    989             if is_label_like(key) or isinstance(key, tuple):
--> 990                 section = self._getitem_axis(key, axis=i)
    991 
    992                 # we have yielded a scalar ?
~/build/pandas-dev/pandas/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1914         # fall thru to straight lookup
   1915         self._validate_key(key, axis)
-> 1916         return self._get_label(key, axis=axis)
   1917 
   1918 
~/build/pandas-dev/pandas/pandas/core/indexing.py in _get_label(self, label, axis)
    146             raise IndexingError('no slices here, handle elsewhere')
    147 
--> 148         return self.obj._xs(label, axis=axis)
    149 
    150     def _get_loc(self, key, axis=None):
~/build/pandas-dev/pandas/pandas/core/panel.py in xs(self, key, axis)
    864         self._consolidate_inplace()
    865         axis_number = self._get_axis_number(axis)
--> 866         new_data = self._data.xs(key, axis=axis_number, copy=False)
    867         result = self._construct_return_type(new_data)
    868         copy = new_data.is_mixed_type
~/build/pandas-dev/pandas/pandas/core/internals/managers.py in xs(self, key, axis, copy, takeable)
    835             loc = key
    836         else:
--> 837             loc = self.axes[axis].get_loc(key)
    838 
    839         slicer = [slice(None, None) for _ in range(self.ndim)]
~/build/pandas-dev/pandas/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3147                 return self._engine.get_loc(key)
   3148             except KeyError:
-> 3149                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   3150         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   3151         if indexer.ndim > 1 or indexer.size > 1:
~/build/pandas-dev/pandas/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
~/build/pandas-dev/pandas/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
~/build/pandas-dev/pandas/pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
~/build/pandas-dev/pandas/pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'ItemA'



.. _basics.reindexing:
Expand Down Expand Up @@ -1502,7 +1508,7 @@ Thus, for example, iterating over a DataFrame gives you the column names:

.. ipython::
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless you see a reason for it, I'd make the .. ipython:: a .. ipython:: python as the rest, and would remove the In [0]...


In [0]: df = pd.DataFrame({'col1' : np.random.randn(3), 'col2' : np.random.randn(3)},
In [0]: df = pd.DataFrame({'col1': np.random.randn(3), 'col2': np.random.randn(3)},
...: index=['a', 'b', 'c'])

In [0]: for col in df:
Expand Down Expand Up @@ -1625,7 +1631,7 @@ For instance, a contrived way to transpose the DataFrame would be:
print(df2)
print(df2.T)

df2_t = pd.DataFrame(dict((idx,values) for idx, values in df2.iterrows()))
df2_t = pd.DataFrame(dict((idx, values) for idx, values in df2.iterrows()))
print(df2_t)

itertuples
Expand Down Expand Up @@ -1676,7 +1682,7 @@ This enables nice expressions like this:

.. ipython:: python

s[s.dt.day==2]
s[s.dt.day == 2]

You can easily produces tz aware transformations:

Expand Down Expand Up @@ -1772,9 +1778,10 @@ used to sort a pandas object by its index levels.

.. ipython:: python

df = pd.DataFrame({'one' : pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
'two' : pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
'three' : pd.Series(np.random.randn(3), index=['b', 'c', 'd'])})
df = pd.DataFrame({
'one': pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
'two': pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
'three': pd.Series(np.random.randn(3), index=['b', 'c', 'd'])})

unsorted_df = df.reindex(index=['a', 'd', 'c', 'b'],
columns=['three', 'two', 'one'])
Expand All @@ -1800,14 +1807,15 @@ to use to determine the sorted order.

.. ipython:: python

df1 = pd.DataFrame({'one':[2,1,1,1],'two':[1,3,2,4],'three':[5,4,3,2]})
df1 = pd.DataFrame({
'one': [2, 1, 1, 1], 'two': [1, 3, 2, 4], 'three': [5, 4, 3, 2]})
df1.sort_values(by='two')

The ``by`` parameter can take a list of column names, e.g.:

.. ipython:: python

df1[['one', 'two', 'three']].sort_values(by=['one','two'])
df1[['one', 'two', 'three']].sort_values(by=['one', 'two'])

These methods have special treatment of NA values via the ``na_position``
argument:
Expand Down Expand Up @@ -1910,8 +1918,9 @@ all levels to ``by``.

.. ipython:: python

df1.columns = pd.MultiIndex.from_tuples([('a','one'),('a','two'),('b','three')])
df1.sort_values(by=('a','two'))
df1.columns = pd.MultiIndex.from_tuples([
('a', 'one'), ('a', 'two'), ('b', 'three')])
df1.sort_values(by=('a', 'two'))


Copying
Expand Down Expand Up @@ -1960,13 +1969,10 @@ with the data type of each column.

.. ipython:: python

dft = pd.DataFrame(dict(A = np.random.rand(3),
B = 1,
C = 'foo',
dft = pd.DataFrame(dict(A = np.random.rand(3), B = 1, C = 'foo',
D = pd.Timestamp('20010102'),
E = pd.Series([1.0]*3).astype('float32'),
F = False,
G = pd.Series([1]*3,dtype='int8')))
F = False, G = pd.Series([1]*3,dtype='int8')))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you take a look at this example? The indentation doesn't seem right, as well as the spaces around =, and the missing ones around *, and it'd probably make things easier to read having B, C and G in their own line.

dft
dft.dtypes

Expand Down Expand Up @@ -2005,9 +2011,10 @@ different numeric dtypes will **NOT** be combined. The following example will gi
df1 = pd.DataFrame(np.random.randn(8, 1), columns=['A'], dtype='float32')
df1
df1.dtypes
df2 = pd.DataFrame(dict( A = pd.Series(np.random.randn(8), dtype='float16'),
df2 = pd.DataFrame(dict(A = pd.Series(np.random.randn(8), dtype='float16'),
B = pd.Series(np.random.randn(8)),
C = pd.Series(np.array(np.random.randn(8), dtype='uint8')) ))
C = pd.Series(np.array(
np.random.randn(8), dtype='uint8')) ))
df2
df2.dtypes

Expand Down Expand Up @@ -2121,7 +2128,8 @@ to the correct type.
import datetime
df = pd.DataFrame([[1, 2],
['a', 'b'],
[datetime.datetime(2016, 3, 2), datetime.datetime(2016, 3, 2)]])
[datetime.datetime(2016, 3, 2),
datetime.datetime(2016, 3, 2)]])
df = df.T
df
df.dtypes
Expand Down Expand Up @@ -2209,7 +2217,8 @@ as DataFrames. However, with :meth:`~pandas.DataFrame.apply`, we can "apply" the
.. ipython:: python

import datetime
df = pd.DataFrame([['2016-07-09', datetime.datetime(2016, 3, 2)]] * 2, dtype='O')
df = pd.DataFrame([
['2016-07-09', datetime.datetime(2016, 3, 2)]] * 2, dtype='O')
df
df.apply(pd.to_datetime)

Expand Down