Skip to content

TST: Ensure dtypes are set correctly for empty integer columns #24386 #34886

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 20, 2020
26 changes: 26 additions & 0 deletions pandas/tests/frame/test_constructors.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@
import pandas as pd
from pandas import (
Categorical,
CategoricalDtype,
DataFrame,
Index,
MultiIndex,
Expand Down Expand Up @@ -2245,6 +2246,31 @@ def test_from_records_empty_with_nonempty_fields_gh3682(self):
tm.assert_index_equal(df.index, Index([], name="id"))
assert df.index.name == "id"

@pytest.mark.parametrize(
"input_value, expected_value",
[
("int16", np.dtype("int16")),
("int32", np.dtype("int32")),
("int64", np.dtype("int64")),
("float16", np.dtype("float16")),
("float32", np.dtype("float32")),
("float64", np.dtype("float64")),
("uint16", np.dtype("uint16")),
("uint32", np.dtype("uint32")),
("uint64", np.dtype("uint64")),
("datetime64[ns]", np.dtype("<M8[ns]")),
("timedelta64[ns]", np.dtype("<m8[ns]")),
("bool", np.dtype("bool")),
("object", np.dtype("O")),
("category", CategoricalDtype(categories=[], ordered=False)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great as long as you are adding a list, can you add the rest of the types. you ca easily do this by using the types defined in pandas._testing (we also have fixtures for this, but maybe not one that covers all types); would be ok with one for that as well. Note that I don't think we have Boolean, String defined pandas._testing (could add them there)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more dtypes check, but now the list is very long. Any ideas on how to makes more pretty? I did not want to add it to the lists in pandas._testing, since I was not sure it would be compatible with other tests that rely on them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need to do it like this, rather use the already defined ones that are ther
e.g. ALL_EA_INT_DTYPES

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any tips on how to refactor the parametrization if I use ALL_EA_INT_DTYPES? The parametrization now has the form [(input_value, expected_value)], but if I loop over ALL_EA_INT_DTYPES I think I still need to manually provide the expected value every time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how's that? the expected IS the same as the input

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of course, you can easily tranform these:

In [1]: pd.Int64Dtype().name                                                                                                                                                                                        
Out[1]: 'Int64'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so my points is that you can simply pass in the ALL_* (just concatenate them all), and check the dtypes of the result columns

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, that already helps me a lot. But I for int this would not work since the expected is 'Int64'. Similarly for float, str.

Sorry for the large number of questions, this is my first PR :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure so what you can do is break this into 2 tests, one for the easy cases where this is true and one for the special cases (where you can input 2 args for input and expected)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback, I think I came to a much better test now.

],
)
def test_check_dtype_empty_column(self, input_value, expected_value):
# GH24386: Ensure dtypes are set correctly for an empty DataFrame.
# Empty DataFrame is generated via dictionary data with non-overlapping columns.
data = pd.DataFrame({"a": [1, 2]}, columns=["b"], dtype=input_value)
assert data.b.dtype == expected_value

def test_from_records_with_datetimes(self):

# this may fail on certain platforms because of a numpy issue
Expand Down