-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TST: Ensure dtypes are set correctly for empty integer columns #24386 #34886
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TST: Ensure dtypes are set correctly for empty integer columns #24386 #34886
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! one nitpick
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution!
I would maybe move the test to tests/frame/test_constructors.py
pandas/tests/dtypes/test_dtypes.py
Outdated
|
||
|
||
def test_check_dtype_empty_column(): | ||
# GH24386: Ensure dtypes are set correctly for empty integer columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add to the comment "dictionary data with non-overlapping columns, resulting in an empty DataFrame", as it is specifically this case (not just an empty dataframe)
Thanks. Can I add the test at the bottom of the |
pandas/tests/dtypes/test_dtypes.py
Outdated
|
||
def test_check_dtype_empty_column(): | ||
# GH24386: Ensure dtypes are set correctly for empty integer columns | ||
data = pd.DataFrame({"a": [1, 2]}, columns=["b"], dtype=int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also pls parameterize over the dtypes mentioned in the OP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in test_constructors.py there are already similar tests, please move near them
("timedelta64[ns]", np.dtype("<m8[ns]")), | ||
("bool", np.dtype("bool")), | ||
("object", np.dtype("O")), | ||
("category", CategoricalDtype(categories=[], ordered=False)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great as long as you are adding a list, can you add the rest of the types. you ca easily do this by using the types defined in pandas._testing (we also have fixtures for this, but maybe not one that covers all types); would be ok with one for that as well. Note that I don't think we have Boolean, String defined pandas._testing (could add them there)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added more dtypes check, but now the list is very long. Any ideas on how to makes more pretty? I did not want to add it to the lists in pandas._testing, since I was not sure it would be compatible with other tests that rely on them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need to do it like this, rather use the already defined ones that are ther
e.g. ALL_EA_INT_DTYPES
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any tips on how to refactor the parametrization if I use ALL_EA_INT_DTYPES? The parametrization now has the form [(input_value, expected_value)], but if I loop over ALL_EA_INT_DTYPES I think I still need to manually provide the expected value every time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how's that? the expected IS the same as the input
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
of course, you can easily tranform these:
In [1]: pd.Int64Dtype().name
Out[1]: 'Int64'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so my points is that you can simply pass in the ALL_* (just concatenate them all), and check the dtypes of the result columns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, that already helps me a lot. But I for int
this would not work since the expected is 'Int64'. Similarly for float
, str
.
Sorry for the large number of questions, this is my first PR :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure so what you can do is break this into 2 tests, one for the easy cases where this is true and one for the special cases (where you can input 2 args for input and expected)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback, I think I came to a much better test now.
thanks @avinashpancham |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff