Skip to content

Fixing shift() for ExtensionArray #23947

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Dec 9, 2018
Merged

Fixing shift() for ExtensionArray #23947

merged 12 commits into from
Dec 9, 2018

Conversation

varadgunjal
Copy link
Contributor

@varadgunjal varadgunjal commented Nov 27, 2018

@pep8speaks
Copy link

pep8speaks commented Nov 27, 2018

Hello @varadgunjal! Thanks for updating the PR.

Comment last updated on November 27, 2018 at 20:04 Hours UTC

@TomAugspurger
Copy link
Contributor

Looks like some of the other shift implementations were buggy: https://travis-ci.org/pandas-dev/pandas/jobs/460168964

https://travis-ci.org/pandas-dev/pandas/jobs/46016896.

Could you also update the docstring for ExtensionArray.shift adding a note about the expected behavior when periods is larger than self?

@TomAugspurger TomAugspurger added this to the 0.24.0 milestone Nov 27, 2018
@TomAugspurger TomAugspurger added Reshaping Concat, Merge/Join, Stack/Unstack, Explode ExtensionArray Extending pandas with custom dtypes or arrays. labels Nov 27, 2018
@varadgunjal
Copy link
Contributor Author

So, this is the 2nd build I've seen fail. From what I understand from the logs, it looks like the previous implementation of shift is being used despite having pushed the update in the file.

@varadgunjal
Copy link
Contributor Author

varadgunjal commented Nov 27, 2018

Oops sorry, didn't read your comment before I posted the previous one.

Could you also update the docstring for ExtensionArray.shift adding a note about the expected behavior when periods is larger than self?

Completely slipped my mind. Will do so.

Looks like some of the other shift implementations were buggy.

So, what is the correct course of action? Should I dig in further and try to fix those or leave them for a separate PR?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a whatsnew note, put in the extension array section

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Nov 27, 2018 via email

@varadgunjal
Copy link
Contributor Author

varadgunjal commented Nov 27, 2018

@TomAugspurger so I don't make an edit in whatsnew, correct? Also, how do I address the build failing on the newly added test?

@TomAugspurger
Copy link
Contributor

@varadgunjal it seems like we don't have a dedicated release note for ExtensionArray.shift. Can you add one on ~938 and use 22387 for the issue?

@varadgunjal
Copy link
Contributor Author

Just noticed 22387 refers to a PR opened by you for issue #22386. Should I ignore that and keep it as the issue?

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Nov 27, 2018 via email

@varadgunjal
Copy link
Contributor Author

Cool. I still don't get what we're supposed to do about the failing build.though.

@TomAugspurger
Copy link
Contributor

SparseArray defines SparseArray.shift. It seems it contains the same bug as the base ExtensionArray.shift, so you'll need to apply a similar fix there as well.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Nov 27, 2018 via email

@varadgunjal
Copy link
Contributor Author

varadgunjal commented Nov 28, 2018

The newest casualty is ArrowBoolArray. But I've run up against a problem when I tried to fix that, since the assert values.type == pa.bool_() constrains the values in it to be boolean. Thus, adding in an na_value violates this. So, what is the expected behavior here? Should I modify the _from_sequence implementation inside ArrowBoolArray to treat NaNs differently?

@jreback jreback removed this from the 0.24.0 milestone Nov 29, 2018
@TomAugspurger
Copy link
Contributor

Feel free to add skips for the arrow tests. That implementation is pretty buggy right now.

@codecov
Copy link

codecov bot commented Dec 1, 2018

Codecov Report

Merging #23947 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23947      +/-   ##
==========================================
+ Coverage   92.31%   92.31%   +<.01%     
==========================================
  Files         161      161              
  Lines       51483    51471      -12     
==========================================
- Hits        47525    47517       -8     
+ Misses       3958     3954       -4
Flag Coverage Δ
#multiple 90.71% <100%> (ø) ⬆️
#single 42.43% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/arrays/base.py 98.01% <100%> (+0.66%) ⬆️
pandas/core/arrays/sparse.py 92.06% <100%> (+0.12%) ⬆️
pandas/core/dtypes/cast.py 89.38% <0%> (-0.25%) ⬇️
pandas/io/json/normalize.py 96.87% <0%> (ø) ⬆️
pandas/tseries/offsets.py 96.98% <0%> (ø) ⬆️
pandas/core/indexes/datetimes.py 96.47% <0%> (ø) ⬆️
pandas/core/groupby/base.py 91.83% <0%> (ø) ⬆️
pandas/core/frame.py 97.02% <0%> (ø) ⬆️
pandas/core/nanops.py 95.05% <0%> (ø) ⬆️
pandas/io/json/json.py 93.09% <0%> (ø) ⬆️
... and 14 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 90961f2...67c268d. Read the comment docs.

@codecov
Copy link

codecov bot commented Dec 1, 2018

Codecov Report

Merging #23947 into master will decrease coverage by 0.1%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23947      +/-   ##
==========================================
- Coverage   92.31%    92.2%   -0.11%     
==========================================
  Files         161      162       +1     
  Lines       51483    51700     +217     
==========================================
+ Hits        47525    47671     +146     
- Misses       3958     4029      +71
Flag Coverage Δ
#multiple 90.6% <100%> (-0.1%) ⬇️
#single 43.02% <0%> (+0.58%) ⬆️
Impacted Files Coverage Δ
pandas/core/arrays/base.py 97.41% <100%> (+0.06%) ⬆️
pandas/core/arrays/sparse.py 92.08% <100%> (+0.14%) ⬆️
pandas/io/s3.py 0% <0%> (-86.37%) ⬇️
pandas/core/arrays/timedeltas.py 87.1% <0%> (-9.34%) ⬇️
pandas/io/parquet.py 76.92% <0%> (-7.7%) ⬇️
pandas/io/formats/printing.py 85.41% <0%> (-7.67%) ⬇️
pandas/io/formats/console.py 74.24% <0%> (-1.52%) ⬇️
pandas/io/common.py 72.09% <0%> (-0.78%) ⬇️
pandas/io/json/json.py 92.61% <0%> (-0.48%) ⬇️
... and 69 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 90961f2...31a9cc8. Read the comment docs.

@varadgunjal
Copy link
Contributor Author

@TomAugspurger thanks! Pushed the required changes.

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor doc things. LGTM otherwise.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping on green.

@@ -39,6 +39,14 @@ class TestInterface(BaseArrowTests, base.BaseInterfaceTests):
def test_repr(self, data):
raise pytest.skip("TODO")

def test_shift_non_empty_array(self, data):
raise pytest.skip("Skipping because the implementation of"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there an issue for this?

Copy link
Contributor Author

@varadgunjal varadgunjal Dec 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check through the issues list and update in a while. But the reason I added this is @ TomAugspurger's recommendation above.

Feel free to add skips for the arrow tests. That implementation is pretty buggy right now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add an issue for this

@@ -86,3 +87,25 @@ def test_isna_extension_array(self, data_missing):
assert not na.all()

assert na.dtype._is_boolean

@pytest.mark.parametrize('periods, indices', [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this to methods.py I think (and obviously move the overrides in arrow as well)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since moving this to methods takes it out of the BaseInterfaceTests class, I can discard the overrides in arrow, since, currently, there is no test class within it deriving from BaseMethodsTests (or BaseExtensionTests for that matter). Is that acceptable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comment below

@jreback jreback added this to the 0.24.0 milestone Dec 2, 2018
@jreback
Copy link
Contributor

jreback commented Dec 5, 2018

@varadgunjal can you merge master and update to comments

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@varadgunjal you don't need to skip on the arrow tests any longer? or are these just not running the methods tests? ok with running those tests and then skipping for this one (if needed).

@varadgunjal
Copy link
Contributor Author

or are these just not running the methods tests?

This is the current behavior. The implementation of ArrowBoolArray does not run the methods tests at all.

@TomAugspurger TomAugspurger merged commit 823adcf into pandas-dev:master Dec 9, 2018
@TomAugspurger
Copy link
Contributor

Thanks @varadgunjal!

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ExtensionArray.shift incorrect for periods larger than length of array
4 participants