DOC: use black to fix code style in doc pandas-dev#36777 #36813

Merged · 3 commits · Oct 2, 2020
153 changes: 98 additions & 55 deletions doc/source/getting_started/comparison/comparison_with_r.rst
@@ -122,16 +122,16 @@ Selecting multiple columns by name in ``pandas`` is straightforward

.. ipython:: python

df = pd.DataFrame(np.random.randn(10, 3), columns=list('abc'))
df[['a', 'c']]
df.loc[:, ['a', 'c']]
df = pd.DataFrame(np.random.randn(10, 3), columns=list("abc"))
df[["a", "c"]]
df.loc[:, ["a", "c"]]

Selecting multiple noncontiguous columns by integer location can be achieved
with a combination of the ``iloc`` indexer attribute and ``numpy.r_``.

.. ipython:: python

named = list('abcdefg')
named = list("abcdefg")
n = 30
columns = named + np.arange(len(named), n).tolist()
df = pd.DataFrame(np.random.randn(n, n), columns=columns)
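The hunk above ends before showing the selection itself; below is a minimal sketch of the ``iloc``/``numpy.r_`` combination the surrounding text describes (the column layout is taken from the snippet, the selected ranges are illustrative):

```python
import numpy as np
import pandas as pd

# same column layout as the snippet: 7 named columns, then integers up to n
named = list("abcdefg")
n = 30
columns = named + np.arange(len(named), n).tolist()
df = pd.DataFrame(np.random.randn(n, n), columns=columns)

# numpy.r_ concatenates slices into one integer array, so several
# noncontiguous column ranges can be handed to iloc in a single call
subset = df.iloc[:, np.r_[:3, 10:12, 24:30]]
subset.shape  # (30, 11): 3 + 2 + 6 selected columns
```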
@@ -160,14 +160,29 @@ function.
.. ipython:: python

df = pd.DataFrame(
{'v1': [1, 3, 5, 7, 8, 3, 5, np.nan, 4, 5, 7, 9],
'v2': [11, 33, 55, 77, 88, 33, 55, np.nan, 44, 55, 77, 99],
'by1': ["red", "blue", 1, 2, np.nan, "big", 1, 2, "red", 1, np.nan, 12],
'by2': ["wet", "dry", 99, 95, np.nan, "damp", 95, 99, "red", 99, np.nan,
np.nan]})

g = df.groupby(['by1', 'by2'])
g[['v1', 'v2']].mean()
{
"v1": [1, 3, 5, 7, 8, 3, 5, np.nan, 4, 5, 7, 9],
"v2": [11, 33, 55, 77, 88, 33, 55, np.nan, 44, 55, 77, 99],
"by1": ["red", "blue", 1, 2, np.nan, "big", 1, 2, "red", 1, np.nan, 12],
"by2": [
"wet",
"dry",
99,
95,
np.nan,
"damp",
95,
99,
"red",
99,
np.nan,
np.nan,
],
}
)

g = df.groupby(["by1", "by2"])
g[["v1", "v2"]].mean()
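One behavior the example above relies on but does not spell out: ``by1`` and ``by2`` contain ``np.nan``, and ``groupby`` drops rows whose key is missing by default, so those rows never enter the means. A reduced sketch (not part of the diff):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"v1": [1.0, 3.0, 5.0, 4.0], "by1": ["red", "blue", np.nan, "red"]}
)

# the row with a NaN key is silently excluded from the groups;
# pandas >= 1.1 accepts dropna=False to keep a NaN group instead
means = df.groupby("by1")["v1"].mean()
means  # blue -> 3.0, red -> 2.5; no NaN group
```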

For more details and examples see :ref:`the groupby documentation
<groupby.split>`.
@@ -228,11 +243,14 @@ In ``pandas`` we may use the :meth:`~pandas.pivot_table` method to handle this:
import string

baseball = pd.DataFrame(
{'team': ["team %d" % (x + 1) for x in range(5)] * 5,
'player': random.sample(list(string.ascii_lowercase), 25),
'batting avg': np.random.uniform(.200, .400, 25)})
{
"team": ["team %d" % (x + 1) for x in range(5)] * 5,
"player": random.sample(list(string.ascii_lowercase), 25),
"batting avg": np.random.uniform(0.200, 0.400, 25),
}
)

baseball.pivot_table(values='batting avg', columns='team', aggfunc=np.max)
baseball.pivot_table(values="batting avg", columns="team", aggfunc=np.max)

For more details and examples see :ref:`the reshaping documentation
<reshaping.pivot>`.
@@ -256,10 +274,10 @@ index/slice as well as standard boolean indexing:

.. ipython:: python

df = pd.DataFrame({'a': np.random.randn(10), 'b': np.random.randn(10)})
df.query('a <= b')
df[df['a'] <= df['b']]
df.loc[df['a'] <= df['b']]
df = pd.DataFrame({"a": np.random.randn(10), "b": np.random.randn(10)})
df.query("a <= b")
df[df["a"] <= df["b"]]
df.loc[df["a"] <= df["b"]]
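The three spellings above select exactly the same rows; a quick equivalence check (a sketch, not part of the diff):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.random.randn(10), "b": np.random.randn(10)})

# query() compiles the string into the same boolean mask that the
# explicit comparison builds, so all three results are identical
assert df.query("a <= b").equals(df[df["a"] <= df["b"]])
assert df.query("a <= b").equals(df.loc[df["a"] <= df["b"]])
```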

For more details and examples see :ref:`the query documentation
<indexing.query>`.
@@ -282,9 +300,9 @@ In ``pandas`` the equivalent expression, using the

.. ipython:: python

df = pd.DataFrame({'a': np.random.randn(10), 'b': np.random.randn(10)})
df.eval('a + b')
df['a'] + df['b'] # same as the previous expression
df = pd.DataFrame({"a": np.random.randn(10), "b": np.random.randn(10)})
df.eval("a + b")
df["a"] + df["b"] # same as the previous expression

In certain cases :meth:`~pandas.DataFrame.eval` will be much faster than
evaluation in pure Python. For more details and examples see :ref:`the eval
@@ -334,14 +352,18 @@ In ``pandas`` the equivalent expression, using the

.. ipython:: python

df = pd.DataFrame({'x': np.random.uniform(1., 168., 120),
'y': np.random.uniform(7., 334., 120),
'z': np.random.uniform(1.7, 20.7, 120),
'month': [5, 6, 7, 8] * 30,
'week': np.random.randint(1, 4, 120)})
df = pd.DataFrame(
{
"x": np.random.uniform(1.0, 168.0, 120),
"y": np.random.uniform(7.0, 334.0, 120),
"z": np.random.uniform(1.7, 20.7, 120),
"month": [5, 6, 7, 8] * 30,
"week": np.random.randint(1, 4, 120),
}
)

grouped = df.groupby(['month', 'week'])
grouped['x'].agg([np.mean, np.std])
grouped = df.groupby(["month", "week"])
grouped["x"].agg([np.mean, np.std])


For more details and examples see :ref:`the groupby documentation
@@ -410,13 +432,17 @@ In Python, the :meth:`~pandas.melt` method is the R equivalent:

.. ipython:: python

cheese = pd.DataFrame({'first': ['John', 'Mary'],
'last': ['Doe', 'Bo'],
'height': [5.5, 6.0],
'weight': [130, 150]})
cheese = pd.DataFrame(
{
"first": ["John", "Mary"],
"last": ["Doe", "Bo"],
"height": [5.5, 6.0],
"weight": [130, 150],
}
)

pd.melt(cheese, id_vars=['first', 'last'])
cheese.set_index(['first', 'last']).stack() # alternative way
pd.melt(cheese, id_vars=["first", "last"])
cheese.set_index(["first", "last"]).stack() # alternative way
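The ``melt`` call and the ``set_index(...).stack()`` alternative above carry the same information in different shapes: a flat frame versus a MultiIndexed Series. A sketch of the correspondence (not part of the diff):

```python
import pandas as pd

cheese = pd.DataFrame(
    {
        "first": ["John", "Mary"],
        "last": ["Doe", "Bo"],
        "height": [5.5, 6.0],
        "weight": [130, 150],
    }
)

melted = pd.melt(cheese, id_vars=["first", "last"])    # flat DataFrame
stacked = cheese.set_index(["first", "last"]).stack()  # MultiIndexed Series

# both contain the same four (person, variable, value) records
assert len(melted) == len(stacked) == 4
assert sorted(melted["value"]) == sorted(stacked)
```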

For more details and examples see :ref:`the reshaping documentation
<reshaping.melt>`.
@@ -444,15 +470,24 @@ In Python the best way is to make use of :meth:`~pandas.pivot_table`:

.. ipython:: python

df = pd.DataFrame({'x': np.random.uniform(1., 168., 12),
'y': np.random.uniform(7., 334., 12),
'z': np.random.uniform(1.7, 20.7, 12),
'month': [5, 6, 7] * 4,
'week': [1, 2] * 6})

mdf = pd.melt(df, id_vars=['month', 'week'])
pd.pivot_table(mdf, values='value', index=['variable', 'week'],
columns=['month'], aggfunc=np.mean)
df = pd.DataFrame(
{
"x": np.random.uniform(1.0, 168.0, 12),
"y": np.random.uniform(7.0, 334.0, 12),
"z": np.random.uniform(1.7, 20.7, 12),
"month": [5, 6, 7] * 4,
"week": [1, 2] * 6,
}
)

mdf = pd.melt(df, id_vars=["month", "week"])
pd.pivot_table(
mdf,
values="value",
index=["variable", "week"],
columns=["month"],
aggfunc=np.mean,
)

Similarly for ``dcast``, which uses a data.frame called ``df`` in R to
aggregate information based on ``Animal`` and ``FeedType``:
@@ -475,21 +510,29 @@ using :meth:`~pandas.pivot_table`:

.. ipython:: python

df = pd.DataFrame({
'Animal': ['Animal1', 'Animal2', 'Animal3', 'Animal2', 'Animal1',
'Animal2', 'Animal3'],
'FeedType': ['A', 'B', 'A', 'A', 'B', 'B', 'A'],
'Amount': [10, 7, 4, 2, 5, 6, 2],
})

df.pivot_table(values='Amount', index='Animal', columns='FeedType',
aggfunc='sum')
df = pd.DataFrame(
{
"Animal": [
"Animal1",
"Animal2",
"Animal3",
"Animal2",
"Animal1",
"Animal2",
"Animal3",
],
"FeedType": ["A", "B", "A", "A", "B", "B", "A"],
"Amount": [10, 7, 4, 2, 5, 6, 2],
}
)

df.pivot_table(values="Amount", index="Animal", columns="FeedType", aggfunc="sum")

The second approach is to use the :meth:`~pandas.DataFrame.groupby` method:

.. ipython:: python

df.groupby(['Animal', 'FeedType'])['Amount'].sum()
df.groupby(["Animal", "FeedType"])["Amount"].sum()
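Both approaches above aggregate the same totals; ``pivot_table`` lays ``FeedType`` out across columns while ``groupby`` keeps it in a MultiIndex. A quick cross-check (a sketch, not part of the diff):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Animal": ["Animal1", "Animal2", "Animal3", "Animal2",
                   "Animal1", "Animal2", "Animal3"],
        "FeedType": ["A", "B", "A", "A", "B", "B", "A"],
        "Amount": [10, 7, 4, 2, 5, 6, 2],
    }
)

pivoted = df.pivot_table(values="Amount", index="Animal",
                         columns="FeedType", aggfunc="sum")
grouped = df.groupby(["Animal", "FeedType"])["Amount"].sum()

# the same totals, arranged differently: grouped.unstack() would
# recover the pivoted column layout
assert pivoted.loc["Animal2", "B"] == grouped.loc[("Animal2", "B")] == 13
```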

For more details and examples see :ref:`the reshaping documentation
<reshaping.pivot>` or :ref:`the groupby documentation<groupby.split>`.