BUG: Create empty dataframe with string dtype fails #33651

Merged
5 changes: 4 additions & 1 deletion pandas/core/internals/construction.py
@@ -242,7 +242,10 @@ def init_dict(data, index, columns, dtype=None):

         # no obvious "empty" int column
         if missing.any() and not is_integer_dtype(dtype):
-            if dtype is None or np.issubdtype(dtype, np.flexible):
+            if is_dtype_equal(dtype, "string"):
+                # GH 33623
+                nan_dtype = dtype

Contributor

this will be dtype.na_value

Contributor

Can you update this?

Contributor Author

Sorry, I can't figure out how to fix this from "this will be dtype.na_value".

Contributor

In [15]: pd.Int32Dtype.na_value
Out[15]: <NA>

nan_dtype = dtype.na_value
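`na_value` is the scalar an extension dtype uses to mark missing entries; for the nullable integer and string dtypes it is the `pd.NA` singleton. A quick check, assuming pandas 1.0 or later:

```python
import numpy as np
import pandas as pd

# Each extension dtype exposes the scalar it uses for missing entries.
string_na = pd.StringDtype().na_value
int_na = pd.Int32Dtype().na_value

# Both are the pd.NA singleton: a missing-value marker, not a dtype object.
print(repr(string_na), repr(int_na))  # <NA> <NA>
print(isinstance(string_na, np.dtype))  # False
```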

Contributor Author

kotamatsuoka, Apr 26, 2020

When I change it to nan_dtype = dtype.na_value, an error occurs:

            if not isinstance(dtype, (np.dtype, type(np.dtype))):
>               dtype = dtype.dtype
E               AttributeError: 'NAType' object has no attribute 'dtype'

pandas/core/dtypes/cast.py:1545: AttributeError
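The traceback follows from `na_value` being a missing-value scalar rather than a dtype: the helper in `cast.py` expects a dtype and, for anything that is not a NumPy dtype, reaches for `.dtype`, which `pd.NA` does not have. A minimal sketch of the two roles kept apart, assuming pandas 1.0 or later and using only the public `pd.array` constructor: the dtype object describes the column, and `na_value` is only the fill value.

```python
import pandas as pd

dtype = pd.StringDtype()

# pd.NA is a scalar sentinel with no ``.dtype`` attribute, which is why
# passing it where a dtype is expected raised AttributeError.
print(hasattr(dtype.na_value, "dtype"))  # False

# Correct roles: pass the dtype as the dtype, and na_value as the values.
n = 3
all_missing = pd.array([dtype.na_value] * n, dtype=dtype)
print(all_missing.isna().tolist())  # [True, True, True]
```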

So I updated it like this.

if (
    dtype is None
    or is_extension_array_dtype(dtype)
    or np.issubdtype(dtype, np.flexible)
):
    nan_dtype = object
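The updated condition relies on `or` short-circuiting: `is_extension_array_dtype` is checked first, so `np.issubdtype` never receives a pandas extension dtype, which NumPy cannot interpret. A standalone sketch of just that predicate; `uses_object_nan_dtype` is a hypothetical name, not a pandas function, and the surrounding `is_integer_dtype` guard is left out.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_extension_array_dtype


def uses_object_nan_dtype(dtype):
    """Hypothetical helper mirroring the author's updated condition.

    Returns True when the all-missing placeholder column would be built
    with ``object`` dtype.
    """
    return (
        dtype is None
        # Evaluated before np.issubdtype, so extension dtypes never reach NumPy.
        or is_extension_array_dtype(dtype)
        or np.issubdtype(dtype, np.flexible)
    )


for d in [None, pd.StringDtype(), np.dtype("U5"), np.dtype("float64")]:
    print(d, uses_object_nan_dtype(d))
```

Unlike the `is_dtype_equal(dtype, "string")` branch in the diff, this predicate sends every non-integer extension dtype down the `object` path, not only `"string"`.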

+            elif dtype is None or np.issubdtype(dtype, np.flexible):
                 # GH#1783
                 nan_dtype = object
             else:
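The patched branch can be exercised in isolation. This sketch is not the pandas source: `choose_nan_dtype` is a hypothetical helper, and `isinstance(dtype, pd.StringDtype)` stands in for the internal `is_dtype_equal(dtype, "string")` check. It also shows why the old single condition failed: NumPy cannot interpret a pandas extension dtype, so calling `np.issubdtype` on one raises `TypeError`.

```python
import numpy as np
import pandas as pd


def choose_nan_dtype(dtype):
    """Hypothetical mirror of the patched ``nan_dtype`` branch (GH 33623).

    ``isinstance(..., pd.StringDtype)`` stands in for pandas' internal
    ``is_dtype_equal(dtype, "string")`` check.
    """
    if isinstance(dtype, pd.StringDtype):
        # GH 33623: keep the string extension dtype for the empty column
        return dtype
    elif dtype is None or np.issubdtype(dtype, np.flexible):
        # GH#1783: no dtype, or a NumPy str/bytes/void dtype, falls back to object
        return np.dtype(object)
    else:
        return dtype


string_dtype = pd.StringDtype()

# The pre-patch condition handed the extension dtype straight to NumPy.
try:
    np.issubdtype(string_dtype, np.flexible)
except TypeError as exc:
    print("old branch fails:", exc)

print(choose_nan_dtype(string_dtype), choose_nan_dtype(None))
```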
4 changes: 4 additions & 0 deletions pandas/tests/frame/test_constructors.py
@@ -2679,3 +2679,7 @@ def test_construction_from_set_raises(self):
         msg = "Set type is unordered"
         with pytest.raises(TypeError, match=msg):
             pd.DataFrame({"a": {1, 2, 3}})
+
+    def test_construct_empty_dataframe_with_string_dtype(self):
+        # GH 33623
+        pd.DataFrame(columns=["a"], dtype="string")
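The added test only checks that construction no longer raises. A slightly stricter variant could also inspect the result. This is a sketch, not part of the PR; the function name is hypothetical, and it assumes a pandas release that includes this fix.

```python
import pandas as pd


def check_empty_string_frame():
    """Hypothetical stricter version of the GH 33623 regression test."""
    df = pd.DataFrame(columns=["a"], dtype="string")
    # One column, zero rows.
    assert list(df.columns) == ["a"]
    assert len(df) == 0
    # The requested extension dtype is kept for the empty column.
    assert isinstance(df["a"].dtype, pd.StringDtype)
    return df


check_empty_string_frame()
```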