Skip to content

TST: ExtensionArrays disallow .values attribute #20794

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

jorisvandenbossche
Copy link
Member

xref #20735.
Adds a test to ensure EA authors don't use the .values or ._values attribute, which can give problems with pandas internals depending on how the data is stored.

@jorisvandenbossche jorisvandenbossche added the ExtensionArray Extending pandas with custom dtypes or arrays. label Apr 23, 2018
@jorisvandenbossche
Copy link
Member Author

This is a temporary "fix" to the interface.
Long term solution would be to make sure we don't make such assumptions internally, but that would need to fix all places where we currently check for a _values or values attribute, which is much more work (as often it is actually a way to have code be generic for eg both Series and EA/ndarray, as _values on the Series then returns the underlying EA/ndarray).

@jorisvandenbossche jorisvandenbossche added this to the 0.23.0 milestone Apr 23, 2018
@jorisvandenbossche
Copy link
Member Author

Ah, see you just did the same #20793 :-)

@TomAugspurger
Copy link
Contributor

Do you want to copy-paste my test here? https://github.com/pandas-dev/pandas/pull/20793/files#diff-fac89eff36e78c735bd1046e704780c9

I prefer your check. I was duck-typing on .values without a .dtype, but just checking for .values more generically like you do is best.

# Some aliases for common attribute names to ensure pandas supports
# these
self._items = self._data = self.data = self.values
self._items = self.data = self._data
Copy link
Contributor

@jreback jreback Apr 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you have .data here? (no a big deal, but this is private yes?) and I think you removed in liue of ._data)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are just in the tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you exactly mean? It's not really about private/public, that's up to the EA author. The purpose of this it to ensure pandas has no problems if an EA author would use .data to store the underlying data (in geopandas we are actually using a data attribute)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand you can have a .data items, it just doesn't seem a good pattern to advertise

The fact that these are in the tests is very odd itself.

And the comment makes these doubly confusing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche ok I re-read what you are testing, this is ok

@@ -33,6 +33,13 @@ def __init__(self, values):
raise TypeError
self.data = values

# Some aliases for common attribute names to ensure pandas supports
# these
self._items = self._data = self.data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same?

@jorisvandenbossche
Copy link
Member Author

Do you want to copy-paste my test here? https://github.com/pandas-dev/pandas/pull/20793/files#diff-fac89eff36e78c735bd1046e704780c9

which test do you mean?

@codecov
Copy link

codecov bot commented Apr 23, 2018

Codecov Report

Merging #20794 into master will decrease coverage by 0.02%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20794      +/-   ##
==========================================
- Coverage   91.85%   91.82%   -0.03%     
==========================================
  Files         153      153              
  Lines       49308    49310       +2     
==========================================
- Hits        45290    45280      -10     
- Misses       4018     4030      +12
Flag Coverage Δ
#multiple 90.21% <ø> (-0.03%) ⬇️
#single 41.89% <ø> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/arrays/base.py 84.14% <ø> (ø) ⬆️
pandas/plotting/_converter.py 65.07% <0%> (-1.74%) ⬇️
pandas/core/indexes/base.py 96.68% <0%> (ø) ⬆️
pandas/core/frame.py 97.17% <0%> (ø) ⬆️
pandas/core/accessor.py 98.71% <0%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0ae7e90...a57f8e0. Read the comment docs.

@jreback jreback merged commit 545d2de into pandas-dev:master Apr 24, 2018
@jreback
Copy link
Contributor

jreback commented Apr 24, 2018

thanks!

@jorisvandenbossche jorisvandenbossche deleted the extension-array-test-no-values-attribute branch April 24, 2018 11:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants