-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Bugfix for constructing empty Series, with index, using ExtensionDtyp… #44615
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix for constructing empty Series, with index, using ExtensionDtyp… #44615
Conversation
…e with na_value of None
Hello @venaturum! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2022-03-01 00:18:02 UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add tests and whatsnew
@phofl a test for this would require defining an ExtensionDtype and ExtensionArray which is quite a lot of code to be adding. Are you aware of any existing "dummy" classes in the test suites that could be used? |
Just use the example from the issue? |
Although the example works for demonstrating the recursion error it doesn't end up giving the expected result due to implementations in the IntervalArray class. I'll have a play and see how minimal I can make a test. |
This is as small as I can make it
I would need some guidance on where it in the test suite it belongs. |
Int64Dtype works for example |
Unfortunately it won't pass the test, for the same reasons as IntervalArray
This results in
when the desired result is
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not the right place for this. tests of course are the first thing to do here.
@jreback would it be better handled within the following function? pandas/pandas/core/construction.py Line 807 in fd95026
|
Yah it'd be pretty convenient if subclasses could change Int64Dtype.na_value to, say, np.nan, which would be way less fragile. |
closing as this needs tests and new implementation. ping when ready / open a new PR |
Hey Jeff, the test here does the job, I just wish I could make it slimmer but I think that would require a different approach As for a new implementation I'm happy to try something else but need guidance on where in the code it belongs. Cheers. |
@venaturum not averse to a test like that but should be included as a first step |
Would the test be best placed in |
try tests.extension.base.constructors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@venaturum pls add a whatsnew note in 1.4 EA section
cc @jbrockmendel @phofl thoughts here
Note there is a test here
|
return self.data | ||
|
||
@property | ||
def dtype(self): | ||
return DummyDtype() | ||
return DummyArray._dtype() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need the inderection of _dtype? what is wrong with what was here before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's because I'm subclassing DummyDtype and DummyArray in order to test several possible na_values
@pytest.mark.parametrize("na_value", [np.nan, pd.NA, None])
def test_empty_series_construction(na_value):
class TempDType(DummyDtype):
@classmethod
def construct_array_type(cls):
return TempArray
TempDType.na_value = na_value
class TempArray(DummyArray):
_dtype = TempDType
result = pd.Series(index=[1, 2, 3], dtype=TempDType())
expected = pd.Series([na_value] * 3, index=[1, 2, 3], dtype=TempDType())
tm.assert_series_equal(result, expected)
….com/venaturum/pandas into GH44602_ExtensionDtype_None_na_value
A different approach to the tests can be found here What is different:
If this is a better approach I will abandon this PR and start again, or if there is some better way of integrating this approach into the existing tests then please let me know. Alternatively, I'm happy for someone else to take this and run with it. |
scalars = [scalars] | ||
return DummyArray(scalars) | ||
|
||
def take(self, indices, allow_fill=False, fill_value=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need all of the new methods this implements?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jbrockmendel , apologies for the slow reply. These methods, and those in the alternative solution are the minimum required for the tests to pass.
Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, ping the current reviewers, and we can reopen. |
…e with na_value of None