BUG: Create empty dataframe with string dtype fails #33651

Merged
6 changes: 5 additions & 1 deletion pandas/core/internals/construction.py
@@ -242,7 +242,11 @@ def init_dict(data, index, columns, dtype=None):

 # no obvious "empty" int column
 if missing.any() and not is_integer_dtype(dtype):
-    if dtype is None or np.issubdtype(dtype, np.flexible):
+    if (
+        dtype is None
+        or is_extension_array_dtype(dtype)
+        or np.issubdtype(dtype, np.flexible)
+    ):
Member

I think changing this fixes the interval case

Suggested change
-if (
-    dtype is None
-    or is_extension_array_dtype(dtype)
-    or np.issubdtype(dtype, np.flexible)
-):
+if dtype is None or (
+    not is_extension_array_dtype(dtype)
+    and np.issubdtype(dtype, np.flexible)
+):

Contributor Author

This suggestion doesn't work...

Member

I've pushed this change, and it works on my machine. Can you elaborate?

Contributor Author

Thanks @simonjayhawkins. It works in my environment too.

         # GH#1783
         nan_dtype = object
     else:
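The branch above decides which dtype the all-missing column is first built with. A minimal pure-Python sketch of the three variants discussed (the original condition, the first version of this PR, and the suggested change) follows. Assumptions: dtypes are plain strings; `is_extension` and `is_flexible` are hypothetical stand-ins for `is_extension_array_dtype` and `np.issubdtype(dtype, np.flexible)`; NumPy's failure on an extension dtype is modelled as a `TypeError`; and the elided `else:` branch is assumed to set `nan_dtype = dtype`.

```python
# Hypothetical stand-ins: dtypes are plain strings, is_extension() models
# is_extension_array_dtype() and is_flexible() models
# np.issubdtype(dtype, np.flexible).
EXTENSION_DTYPES = {"string", "Int64", "interval"}
FLEXIBLE_DTYPES = {"str", "bytes", "void"}  # NumPy's np.flexible family


def is_extension(dtype):
    return dtype in EXTENSION_DTYPES


def is_flexible(dtype):
    # Model NumPy being unable to interpret a pandas extension dtype.
    if is_extension(dtype):
        raise TypeError(f"data type {dtype!r} not understood")
    return dtype in FLEXIBLE_DTYPES


def nan_dtype_original(dtype):
    # Before the PR: is_flexible() is reached for extension dtypes.
    if dtype is None or is_flexible(dtype):
        return object
    return dtype  # assumed body of the elided else: branch


def nan_dtype_pr(dtype):
    # First version of the PR: extension dtypes short-circuit to object.
    if dtype is None or is_extension(dtype) or is_flexible(dtype):
        return object
    return dtype


def nan_dtype_suggested(dtype):
    # Suggested change: extension dtypes skip the NumPy check and fall
    # through to the else: branch, keeping their own dtype.
    if dtype is None or (not is_extension(dtype) and is_flexible(dtype)):
        return object
    return dtype


for d in (None, "str", "float64", "string", "interval"):
    try:
        before = nan_dtype_original(d)
    except TypeError as exc:
        before = f"TypeError: {exc}"
    print(d, before, nan_dtype_pr(d), nan_dtype_suggested(d))
```

Both fixed variants use short-circuit evaluation to keep extension dtypes away from the NumPy check; they differ only in the fallback. The PR's first version builds the column as `object`, while the suggested change keeps the extension dtype, which is consistent with the remark that it fixes the interval case.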
4 changes: 4 additions & 0 deletions pandas/tests/extension/arrow/test_bool.py
@@ -55,6 +55,10 @@ def test_from_dtype(self, data):
     def test_from_sequence_from_cls(self, data):
         super().test_from_sequence_from_cls(data)

+    @pytest.mark.xfail(reason="bad is-na for empty data")
+    def test_construct_empty_dataframe(self, dtype):
+        super().test_construct_empty_dataframe(dtype)
+

 class TestReduce(base.BaseNoReduceTests):
     def test_reduce_series_boolean(self):
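The override above re-declares an inherited base-class test only to attach an expected-failure marker, then delegates to the base implementation with `super()`. The sketch below reproduces that pattern using the standard library's `unittest.expectedFailure` as a stand-in for `pytest.mark.xfail`; the class names and the `"broken"` dtype are hypothetical.

```python
import unittest


class BaseConstructorsTests:
    """Shared tests; each concrete suite supplies its own dtype (hypothetical)."""

    dtype = None

    def test_construct_empty_dataframe(self):
        # Stand-in for the real check: pretend this dtype cannot yet
        # build an empty column.
        if self.dtype == "broken":
            raise AssertionError(f"cannot build empty column for {self.dtype!r}")


class TestWorkingDtype(BaseConstructorsTests, unittest.TestCase):
    dtype = "string"


class TestBrokenDtype(BaseConstructorsTests, unittest.TestCase):
    dtype = "broken"

    # Re-declare the inherited test only to attach the marker, delegating
    # to the base implementation as the diff does.
    @unittest.expectedFailure
    def test_construct_empty_dataframe(self):
        super().test_construct_empty_dataframe()


def run_suites():
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for case in (TestWorkingDtype, TestBrokenDtype):
        suite.addTests(loader.loadTestsFromTestCase(case))
    result = unittest.TestResult()
    suite.run(result)
    return result


result = run_suites()
print(result.testsRun, len(result.expectedFailures), result.wasSuccessful())
```

One difference to keep in mind: pytest's default (non-strict) xfail reports an unexpected pass as XPASS without failing the run, whereas `unittest` counts an unexpected success against `wasSuccessful()` (Python 3.4 and later).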
6 changes: 6 additions & 0 deletions pandas/tests/extension/base/constructors.py
@@ -83,3 +83,9 @@ def test_pandas_array_dtype(self, data):
         result = pd.array(data, dtype=np.dtype(object))
         expected = pd.arrays.PandasArray(np.asarray(data, dtype=object))
         self.assert_equal(result, expected)
+
+    def test_construct_empty_dataframe(self, dtype):
+        # GH 33623
+        result = pd.DataFrame(columns=["a"], dtype=dtype)
+        expected = pd.DataFrame(data=[], columns=["a"], dtype=dtype)
Contributor

Suggested change
-expected = pd.DataFrame(data=[], columns=["a"], dtype=dtype)
+expected = pd.DataFrame({"a": pd.array([], dtype=dtype)})

This seems a slightly safer way to build the expected result.

+        self.assert_frame_equal(result, expected)
4 changes: 3 additions & 1 deletion pandas/tests/extension/test_integer.py
@@ -186,7 +186,9 @@ class TestInterface(base.BaseInterfaceTests):


 class TestConstructors(base.BaseConstructorsTests):
-    pass
+    @pytest.mark.xfail(reason="bad is-na for empty data")
Contributor

Why is this xfailed?

Contributor Author

  • coerce_to_array() in core/arrays/integer.py doesn't accept array(nan).

Contributor

Ideally we would fix this here. What needs to change?

Contributor Author

It would need to allow values.ndim to be 0 in coerce_to_array().

Contributor

Can we instead not pass a 0-dim array to coerce_to_array? It's not clear to me why we need a 0-d array in the first place.

Contributor Author

I added a step that converts np.nan to []:

values = [] if values is np.nan else values
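A minimal sketch of that normalization, assuming a hypothetical `coerce_values()` that, like `coerce_to_array()` as described in this thread, only accepts 1-D list-likes, and using `math.nan` in place of `np.nan`. Where the real line sits in the PR is not shown here.

```python
import math

# Hypothetical stand-ins: NA plays the role of np.nan, and coerce_values()
# plays the role of coerce_to_array(), which rejects 0-dim input.
NA = math.nan


def coerce_values(values):
    # Accept only 1-D list-likes; a bare scalar is rejected.
    if not isinstance(values, (list, tuple)):
        raise TypeError("values must be a 1D list-like")
    return list(values)


def from_sequence(values):
    # The normalization described above: the NaN sentinel becomes [].
    values = [] if values is NA else values
    return coerce_values(values)


print(from_sequence(NA))
print(from_sequence([1, 2]))
```

Because the check uses identity (`is`), it matches only the sentinel object itself; a NaN produced elsewhere, such as `float("nan")`, is a different object and still reaches the 1-D check.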

+    def test_construct_empty_dataframe(self, dtype):
+        super().test_construct_empty_dataframe(dtype)


 class TestReshaping(base.BaseReshapingTests):
4 changes: 3 additions & 1 deletion pandas/tests/extension/test_interval.py
@@ -83,7 +83,9 @@ class TestCasting(BaseInterval, base.BaseCastingTests):


 class TestConstructors(BaseInterval, base.BaseConstructorsTests):
-    pass
+    @pytest.mark.xfail(reason="bad is-na for empty data")
Contributor

Why is this xfailed?

Contributor Author

  • object dtype is not supported for IntervalArray.
  • The na_value of IntervalArray is a float, so construct_1d_arraylike_from_scalar() raises AttributeError: 'float' object has no attribute 'dtype'.

Contributor

I think this would ideally be fixed here.

Contributor Author

If nan_dtype is set to dtype (an IntervalDtype), the DataFrame can be created:

if is_interval_dtype(dtype):
    nan_dtype = dtype
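Under simplifying assumptions (dtypes as plain strings, hypothetical `is_extension` and `is_interval` helpers standing in for `is_extension_array_dtype` and `is_interval_dtype`, and the `np.flexible` check left out), the interval-specific override could be modelled as below. Placing the override after the existing `if/else` is itself an assumption.

```python
# Hypothetical stand-ins for pandas' dtype checks; dtypes are plain strings.
EXTENSION_DTYPES = {"string", "Int64", "interval"}


def is_extension(dtype):
    return dtype in EXTENSION_DTYPES


def is_interval(dtype):
    return dtype == "interval"


def choose_nan_dtype(dtype):
    # PR condition, simplified: the np.flexible check is left out.
    if dtype is None or is_extension(dtype):
        nan_dtype = object
    else:
        nan_dtype = dtype
    # Proposed override: build interval columns with their own dtype,
    # since IntervalArray does not support object values.
    if is_interval(dtype):
        nan_dtype = dtype
    return nan_dtype


for d in (None, "float64", "string", "interval"):
    print(d, choose_nan_dtype(d))
```

In this model the override only rescues the interval dtype; other extension dtypes still start from `object`, whereas changing the condition itself (as in the earlier suggested change) treats all extension dtypes the same way.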

+    def test_construct_empty_dataframe(self, dtype):
+        super().test_construct_empty_dataframe(dtype)


 class TestGetitem(BaseInterval, base.BaseGetitemTests):