DOC: Clarify output of diff (returned type and possible overflow) #32699

Merged
merged 36 commits into from
Jun 1, 2020
Changes from 35 commits
3d63374
DOC: Note about dtypes in diff in Dataframe
mproszewska Mar 14, 2020
a53df3a
DOC: Note about dtypes in diff in Series
mproszewska Mar 14, 2020
c1d875a
DOC: Change comment
mproszewska Mar 21, 2020
2e8ccd0
DOC: Fix spaces
mproszewska Mar 21, 2020
50d55ee
DOC: Add doc decorator and overflow examples
mproszewska Mar 27, 2020
3efba72
DOC: Remove appender
mproszewska Apr 3, 2020
b7dd328
DOC: fix
mproszewska Apr 3, 2020
7fc66b9
DOC: fix
mproszewska Apr 3, 2020
8d53336
DOC: Fix
mproszewska Apr 3, 2020
ecf74e5
DOC: Fix strings
mproszewska Apr 9, 2020
74fe0e4
DOC: Fix
mproszewska Apr 9, 2020
3accc51
DOC: Fix
mproszewska Apr 9, 2020
45558c0
DOC: Fix
mproszewska Apr 9, 2020
df4e7d1
DOC: Add newlines
mproszewska Apr 9, 2020
16451c0
Merge branch 'master' into doc
mproszewska Apr 9, 2020
c9cd6c7
DOC: Fix newline
mproszewska Apr 9, 2020
ee062bc
DOC: Add dedent
mproszewska Apr 9, 2020
0f18905
DOC: Lint
mproszewska Apr 16, 2020
e799453
DOC: Lint
mproszewska Apr 16, 2020
b36b310
Run tests
mproszewska Apr 24, 2020
b92f42b
Change test_diff
mproszewska Apr 24, 2020
f03d4e9
Change stacklevel
mproszewska May 4, 2020
4a5b36f
Fix lint
mproszewska May 5, 2020
08fe128
Update algorithms.py
mproszewska May 5, 2020
5859ea6
Merge branch 'doc' of https://github.com/mproszewska/pandas into doc
mproszewska May 5, 2020
c94b45e
PERF: Remove unnecessary copies in sorting functions
mproszewska May 15, 2020
0ab450b
Run tests
mproszewska May 16, 2020
54c7304
Run tests
mproszewska May 16, 2020
6d72a34
Add asv
mproszewska May 22, 2020
5ba54a6
Run black
mproszewska May 22, 2020
2766270
Remove asv
mproszewska May 22, 2020
91176ca
Merge branch 'perf'
mproszewska May 24, 2020
a53d937
Add requested change
mproszewska May 28, 2020
b46a77d
Merge branch 'master' into doc
mproszewska May 28, 2020
949bcc0
Fix stacklevel
mproszewska May 28, 2020
234689d
Revert change
mproszewska May 31, 2020
55 changes: 20 additions & 35 deletions pandas/core/frame.py
@@ -7019,40 +7019,14 @@ def melt(
     # ----------------------------------------------------------------------
     # Time series-related

-    def diff(self, periods: int = 1, axis: Axis = 0) -> "DataFrame":
-        """
-        First discrete difference of element.
-
-        Calculates the difference of a DataFrame element compared with another
-        element in the DataFrame (default is the element in the same column
-        of the previous row).
-
-        Parameters
-        ----------
-        periods : int, default 1
-            Periods to shift for calculating difference, accepts negative
-            values.
-        axis : {0 or 'index', 1 or 'columns'}, default 0
-            Take difference over rows (0) or columns (1).
-
-        Returns
-        -------
-        DataFrame
-
-        See Also
-        --------
-        Series.diff: First discrete difference for a Series.
-        DataFrame.pct_change: Percent change over given number of periods.
-        DataFrame.shift: Shift index by desired number of periods with an
-            optional time freq.
-
-        Notes
-        -----
-        For boolean dtypes, this uses :meth:`operator.xor` rather than
-        :meth:`operator.sub`.
-
-        Examples
-        --------
+    @doc(
+        Series.diff,
+        klass="DataFrame",
+        extra_params="axis : {0 or 'index', 1 or 'columns'}, default 0\n    "
+        "Take difference over rows (0) or columns (1).\n",
+        other_klass="Series",
+        examples=dedent(
+            """
         Difference with previous row

         >>> df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
@@ -7108,7 +7082,18 @@ def diff(self, periods: int = 1, axis: Axis = 0) -> "DataFrame":
         3 -1.0 -2.0  -9.0
         4 -1.0 -3.0 -11.0
         5  NaN  NaN   NaN
-        """
+
+        Overflow in input dtype
+
+        >>> df = pd.DataFrame({'a': [1, 0]}, dtype=np.uint8)
+        >>> df.diff()
+               a
+        0    NaN
+        1  255.0"""
+        ),
+    )
+    def diff(self, periods: int = 1, axis: Axis = 0) -> "DataFrame":
+
         bm_axis = self._get_block_manager_axis(axis)
         self._consolidate_inplace()
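
The change above replaces a hand-written docstring with a shared template whose `{klass}`, `{other_klass}`, `{extra_params}` and `{examples}` placeholders are filled in per class. The following is a minimal sketch of that idea only; `doc_sketch`, `series_diff` and `frame_diff` are hypothetical names invented for illustration, and this is not the actual pandas `doc` decorator.

```python
from textwrap import dedent


def doc_sketch(template_func=None, **params):
    """Hypothetical, simplified docstring-templating decorator."""

    def decorator(func):
        # Take the template from another function if one is given,
        # otherwise from the decorated function's own docstring.
        source = template_func if template_func is not None else func
        template = getattr(source, "_doc_template", None)
        if template is None:
            template = dedent(source.__doc__ or "")
        # Keep the unformatted template so other functions can reuse it.
        func._doc_template = template
        func.__doc__ = template.format(**params)
        return func

    return decorator


@doc_sketch(klass="Series", other_klass="DataFrame")
def series_diff():
    """
    First discrete difference of a {klass} element.

    See also {other_klass}.diff.
    """


@doc_sketch(series_diff, klass="DataFrame", other_klass="Series")
def frame_diff():
    pass


print(series_diff.__doc__)
print(frame_diff.__doc__)
```

Storing the unformatted template on the function is one possible design choice here: it lets the second function reuse the placeholders even though the first function's visible docstring has already been filled in.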

83 changes: 51 additions & 32 deletions pandas/core/series.py
@@ -2293,38 +2293,12 @@ def cov(self, other, min_periods=None) -> float:
             return np.nan
         return nanops.nancov(this.values, other.values, min_periods=min_periods)

-    def diff(self, periods: int = 1) -> "Series":
-        """
-        First discrete difference of element.
-
-        Calculates the difference of a Series element compared with another
-        element in the Series (default is element in previous row).
-
-        Parameters
-        ----------
-        periods : int, default 1
-            Periods to shift for calculating difference, accepts negative
-            values.
-
-        Returns
-        -------
-        Series
-            First differences of the Series.
-
-        See Also
-        --------
-        Series.pct_change: Percent change over given number of periods.
-        Series.shift: Shift index by desired number of periods with an
-            optional time freq.
-        DataFrame.diff: First discrete difference of object.
-
-        Notes
-        -----
-        For boolean dtypes, this uses :meth:`operator.xor` rather than
-        :meth:`operator.sub`.
-
-        Examples
-        --------
+    @doc(
+        klass="Series",
+        extra_params="",
+        other_klass="DataFrame",
+        examples=dedent(
+            """
         Difference with previous row

         >>> s = pd.Series([1, 1, 2, 3, 5, 8])
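
The Notes section of the docstring says that boolean data is differenced with `operator.xor` rather than `operator.sub`. A small standard-library sketch of what that means is given below; `diff_bool_sketch` is a hypothetical helper, and using `None` for positions with no earlier element is an assumption made for illustration, not pandas' missing-value handling.

```python
import operator


def diff_bool_sketch(values, periods=1):
    """Hypothetical helper: element-wise xor with the shifted element."""
    out = []
    for i in range(len(values)):
        j = i - periods
        if 0 <= j < len(values):
            # xor is True exactly when the two booleans differ
            out.append(operator.xor(values[i], values[j]))
        else:
            out.append(None)  # placeholder: nothing to compare with
    return out


changes = diff_bool_sketch([True, False, False, True])
print(changes)  # [None, True, False, True]
```

Read this way, a boolean difference marks the positions where the value changed relative to the shifted element.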
Expand Down Expand Up @@ -2358,6 +2332,51 @@ def diff(self, periods: int = 1) -> "Series":
4 -3.0
5 NaN
dtype: float64

Overflow in input dtype

>>> s = pd.Series([1, 0], dtype=np.uint8)
>>> s.diff()
0 NaN
1 255.0
dtype: float64"""
),
)
def diff(self, periods: int = 1) -> "Series":
"""
First discrete difference of element.

Calculates the difference of a {klass} element compared with another
element in the {klass} (default is element in previous row).

Parameters
----------
periods : int, default 1
Periods to shift for calculating difference, accepts negative
values.
{extra_params}
Returns
-------
{klass}
First differences of the Series.

See Also
--------
{klass}.pct_change: Percent change over given number of periods.
{klass}.shift: Shift index by desired number of periods with an
optional time freq.
{other_klass}.diff: First discrete difference of object.

Notes
-----
For boolean dtypes, this uses :meth:`operator.xor` rather than
:meth:`operator.sub`.
The result is calculated according to current dtype in {klass},
however dtype of the result is always float64.

Examples
--------
{examples}
"""
result = algorithms.diff(self.array, periods)
return self._constructor(result, index=self.index).__finalize__(
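
The "Overflow in input dtype" example shows `1, 0` in `uint8` producing `255.0` rather than `-1.0`. The standard-library sketch below emulates that behaviour under a stated assumption: the subtraction wraps modulo 256, as 8-bit unsigned arithmetic does, and only afterwards is the value stored as a float. `diff_uint8_sketch` and its `widen` flag are hypothetical and are not part of pandas.

```python
import math


def diff_uint8_sketch(values, periods=1, widen=False):
    """Emulate first differences of 8-bit unsigned integers.

    Assumption for illustration: with widen=False the subtraction wraps
    modulo 256 before the result is converted to float; with widen=True
    the values are treated as floats first, so no wraparound occurs.
    """
    out = []
    for i in range(len(values)):
        j = i - periods
        if 0 <= j < len(values):
            if widen:
                out.append(float(values[i]) - float(values[j]))
            else:
                wrapped = (values[i] - values[j]) % 256  # wrap like uint8
                out.append(float(wrapped))
        else:
            out.append(math.nan)  # no element to compare with
    return out


wrapped_result = diff_uint8_sketch([1, 0])
widened_result = diff_uint8_sketch([1, 0], widen=True)
print(wrapped_result)  # [nan, 255.0]
print(widened_result)  # [nan, -1.0]
```

In pandas itself, the analogous precaution would be to cast to a wider type before differencing, for example `s.astype("int64").diff()`.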
2 changes: 1 addition & 1 deletion pandas/core/sorting.py
@@ -385,7 +385,7 @@ def ensure_key_mapped(values, key: Optional[Callable], levels=None):
     from pandas.core.indexes.api import Index

     if not key:
-        return values.copy()
+        return values

     if isinstance(values, ABCMultiIndex):
         return ensure_key_mapped_multiindex(values, key, level=levels)
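
The one-line change above returns the input itself instead of a copy when no `key` is given. The sketch below illustrates the trade-off with plain lists; `ensure_key_mapped_sketch` and its `copy` flag are hypothetical simplifications, not the pandas function.

```python
def ensure_key_mapped_sketch(values, key=None, copy=False):
    """Hypothetical simplification: apply `key` if given, else pass through."""
    if not key:
        # With no key there is nothing to transform. Returning the input
        # avoids a copy, but the caller then shares the same object.
        return list(values) if copy else values
    return [key(v) for v in values]


data = [3, 1, 2]
same = ensure_key_mapped_sketch(data)
copied = ensure_key_mapped_sketch(data, copy=True)

print(same is data)    # True: no copy was made
print(copied is data)  # False: an independent list
```

Skipping the copy is only safe when callers do not mutate the returned object, which is the assumption such a change relies on.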