Skip to content

API: Make .shift always copy (Fixes #22397) #22517

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Sep 15, 2018

Conversation

AaronCritchley
Copy link
Contributor

@codecov
Copy link

codecov bot commented Aug 27, 2018

Codecov Report

Merging #22517 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22517      +/-   ##
==========================================
- Coverage   92.04%   92.03%   -0.01%     
==========================================
  Files         169      169              
  Lines       50787    50780       -7     
==========================================
- Hits        46746    46737       -9     
- Misses       4041     4043       +2
Flag Coverage Δ
#multiple 90.44% <100%> (-0.01%) ⬇️
#single 42.22% <0%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/arrays/datetimelike.py 94.02% <100%> (ø) ⬆️
pandas/core/generic.py 96.44% <100%> (ø) ⬆️
pandas/util/_depr_module.py 65.11% <0%> (-2.33%) ⬇️
pandas/io/formats/style.py 96.16% <0%> (-0.27%) ⬇️
pandas/core/strings.py 98.63% <0%> (-0.01%) ⬇️
pandas/util/testing.py 85.75% <0%> (ø) ⬆️
pandas/core/indexes/category.py 97.28% <0%> (ø) ⬆️
pandas/io/parsers.py 95.48% <0%> (ø) ⬆️
pandas/core/resample.py 96.13% <0%> (ø) ⬆️
pandas/core/sparse/array.py 91.38% <0%> (ø) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fa47b8d...8d2fc41. Read the comment docs.

# GH22397
s = Series([1, 2, 3])
assert s.shift(0) is not s
assert s.shift(1) is not s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we parametrize this? (I know it's only two examples, but why not three for example?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine to be as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@gfyoung gfyoung requested a review from TomAugspurger August 27, 2018 09:48
Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

We should confirm that we're OK with outright changing the behavior, rather than deprecating it first. Deprecation would require adding a keyword that would then become obsolete, which I would rather avoid.

# GH22397
s = Series([1, 2, 3])
assert s.shift(0) is not s
assert s.shift(1) is not s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine to be as is.


def test_shift_always_copy(self):
# GH22397
s = Series([1, 2, 3])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u also test for datetime types

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (assuming DatetimeIndex was meant, which required another fix) if you just meant a series of datetimes, will change - let me know

@@ -227,3 +227,12 @@ def test_valid_deprecated(self):
# GH18800
with tm.assert_produces_warning(FutureWarning):
pd.Series([]).valid()

@pytest.mark.parametrize("s", [
Series([np.arange(5)]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add period & timedelta here as well :<

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we dont really have a great place for these series / index tests functionailty, except for test_base which is already too big.

cc @jbrockmendel @mroeschke

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ofc, will do asap, thanks for the input

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay on this, but can you please clarify? I wasn't sure if it was expected that we can shift Period / Timedelta's or if we're supposed to shift by them, things I've tested below:

Shifting on them:

>>> pd.Timedelta('1 days 2 hours').shift
Traceback (most recent call last):
  File "/anaconda3/envs/pandas-dev/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-17-fc984c065985>", line 1, in <module>
    pd.Timedelta('1 days 2 hours').shift
AttributeError: 'Timedelta' object has no attribute 'shift'
>>> pd.Period('2011-01', 'M').shift
Traceback (most recent call last):
  File "/anaconda3/envs/pandas-dev/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-18-d3c2e47fd2e3>", line 1, in <module>
    pd.Period('2011-01', 'M').shift
AttributeError: 'Period' object has no attribute 'shift'

Shifting with them:

>>> pd.date_range('1/1/2011', periods=24, freq='H').shift(pd.Timedelta('1 days 2 hours'))
Traceback (most recent call last):
  File "/anaconda3/envs/pandas-dev/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-19-f8499141727f>", line 1, in <module>
    pd.date_range('1/1/2011', periods=24, freq='H').shift(pd.Timedelta('1 days 2 hours'))
  File "/anaconda3/envs/pandas-dev/lib/python3.6/site-packages/pandas/core/indexes/datetimelike.py", line 784, in shift
    start = self[0] + n * self.freq
  File "/anaconda3/envs/pandas-dev/lib/python3.6/site-packages/pandas/tseries/offsets.py", line 425, in __rmul__
    return self.__mul__(someInt)
  File "/anaconda3/envs/pandas-dev/lib/python3.6/site-packages/pandas/tseries/offsets.py", line 422, in __mul__
    **self.kwds)
  File "/anaconda3/envs/pandas-dev/lib/python3.6/site-packages/pandas/tseries/offsets.py", line 195, in __init__
    self.n = int(n)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'Timedelta'
>>> pd.date_range('1/1/2011', periods=24, freq='H').shift(pd.Period('2011-02', 'M'))
Traceback (most recent call last):
  File "/anaconda3/envs/pandas-dev/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-24-b87cfa9cba95>", line 1, in <module>
    pd.date_range('1/1/2011', periods=24, freq='H').shift(pd.Period('2011-02', 'M'))
  File "/anaconda3/envs/pandas-dev/lib/python3.6/site-packages/pandas/core/indexes/datetimelike.py", line 784, in shift
    start = self[0] + n * self.freq
  File "/anaconda3/envs/pandas-dev/lib/python3.6/site-packages/pandas/tseries/offsets.py", line 425, in __rmul__
    return self.__mul__(someInt)
  File "/anaconda3/envs/pandas-dev/lib/python3.6/site-packages/pandas/tseries/offsets.py", line 421, in __mul__
    return self.__class__(n=someInt * self.n, normalize=self.normalize,
TypeError: unsupported operand type(s) for *: 'Period' and 'int'

Genuine apologies if I'm being stupid here!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea is to make a series with a datetimeindex and shift using the freq argument.

In [16]: s = pd.Series(range(5), index=pd.date_range("2017", periods=5))

In [17]: s.shift(freq=pd.Timedelta('1D'))
Out[17]:
2017-01-02    0
2017-01-03    1
2017-01-04    2
2017-01-05    3
2017-01-06    4
Freq: D, dtype: int64

it'll complicate the test a bit. let us know if you need assistance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the test for timedelta, I couldn't find any docs on the intended usage of shifting by pd.Period? I tried a few approaches but couldn't get any to work. Happy to add the test if you could provide a small example!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if period is valid there. I only tried timedelta.

@jreback jreback added this to the 0.24.0 milestone Aug 29, 2018
@pep8speaks
Copy link

Hello @AaronCritchley! Thanks for updating the PR.

@TomAugspurger
Copy link
Contributor

You'll need to fetch and merge master for the CI failures to be fixed.

@jreback jreback merged commit d765dfb into pandas-dev:master Sep 15, 2018
@jreback
Copy link
Contributor

jreback commented Sep 15, 2018

thanks @AaronCritchley

@AaronCritchley AaronCritchley deleted the API-22397-shift-always-copy branch September 15, 2018 17:53
aeltanawy pushed a commit to aeltanawy/pandas that referenced this pull request Sep 20, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make Series.shift always a copy?
5 participants