BUG: Series construction with EA dtype and index but no data fails #33846

Merged
merged 29 commits into from
May 2, 2020
Changes from 8 commits
Commits
a06e1a4
BUG: Series construction with EA dtype and index but no data fails
simonjayhawkins Apr 28, 2020
6ae3342
redo tests
simonjayhawkins Apr 29, 2020
72f8ec3
Merge remote-tracking branch 'upstream/master' into broadcast-ea-bug
simonjayhawkins Apr 29, 2020
7a17b33
Merge remote-tracking branch 'upstream/master' into broadcast-ea-bug
simonjayhawkins Apr 30, 2020
1881a03
add test_series_constructor_scalar_with_one_element_index
simonjayhawkins Apr 30, 2020
45ef9a5
move dtype to test function parameters
simonjayhawkins Apr 30, 2020
a339f05
comment - whatsnew
simonjayhawkins Apr 30, 2020
6bfbd1a
comment - merge tests
simonjayhawkins Apr 30, 2020
1c8bd8c
special case to avoid _try_cast call
simonjayhawkins Apr 30, 2020
840df49
troubleshoot
simonjayhawkins Apr 30, 2020
9cf81ee
less failures
simonjayhawkins Apr 30, 2020
d427714
maybe_cast_to_datetime
simonjayhawkins Apr 30, 2020
d47cba4
add failure reason for pyarrow
simonjayhawkins Apr 30, 2020
c5cc30d
update issue ref for ArrowBoolDtype
simonjayhawkins Apr 30, 2020
421aa7c
remove sparse test overrides
simonjayhawkins Apr 30, 2020
4c51356
ref to new issue for JSONDtype RecursionError
simonjayhawkins Apr 30, 2020
ff4ff63
collection as scalar msg and gh ref
simonjayhawkins Apr 30, 2020
aa11bb6
Merge remote-tracking branch 'upstream/master' into broadcast-ea-bug
simonjayhawkins May 1, 2020
268f3a5
fix corner case
simonjayhawkins May 1, 2020
2df2bf1
comment - maybe_cast_to_datetime
simonjayhawkins May 1, 2020
211328c
Merge remote-tracking branch 'upstream/master' into broadcast-ea-bug
simonjayhawkins May 1, 2020
e598f4c
add test for gh-33559
simonjayhawkins May 1, 2020
8c44e23
troubleshoot timeout
simonjayhawkins May 1, 2020
dac66d0
troubleshoot timeout
simonjayhawkins May 1, 2020
b363fb2
Merge remote-tracking branch 'upstream/master' into broadcast-ea-bug
simonjayhawkins May 1, 2020
f2026d3
troubleshoot timeout
simonjayhawkins May 1, 2020
52fcd7f
skip on py3.6
simonjayhawkins May 1, 2020
4907f34
Merge remote-tracking branch 'upstream/master' into broadcast-ea-bug
simonjayhawkins May 1, 2020
663c863
Merge branch 'master' into broadcast-ea-bug
jreback May 2, 2020
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.1.0.rst
@@ -732,7 +732,7 @@ ExtensionArray
 ^^^^^^^^^^^^^^

 - Fixed bug where :meth:`Series.value_counts` would raise on empty input of ``Int64`` dtype (:issue:`33317`)
--
+- Fixed bug in :class:`Series` construction with EA dtype and index but no data or scalar data fails (:issue:`26469`)


 Other

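A short illustration of the behavior this whatsnew entry describes. It assumes a pandas version that includes the fix (1.1.0 or later) and uses the nullable `Int64` dtype as the example EA dtype; the variable names are only illustrative.

```python
import pandas as pd

# No data, only an index and an extension dtype:
# every position is filled with the dtype's missing value.
no_data = pd.Series(index=[1, 2, 3], dtype="Int64")

# Scalar data with an index: the scalar is repeated for every label.
scalar = pd.Series(1, index=[1, 2, 3], dtype="Int64")

print(no_data)
print(scalar)
```

Both calls are the cases exercised by the new base tests in this PR (`test_series_constructor_no_data_with_index` and `test_series_constructor_scalar_with_index`).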
6 changes: 5 additions & 1 deletion pandas/core/construction.py
@@ -516,7 +516,7 @@ def _try_cast(

     Parameters
     ----------
-    arr : ndarray, list, tuple, iterator (catchall)
+    arr : ndarray, scalar, list, tuple, iterator (catchall)
         Excludes: ExtensionArray, Series, Index.
     dtype : np.dtype, ExtensionDtype or None
     copy : bool
@@ -533,6 +533,10 @@ def _try_cast(
     if isinstance(dtype, ExtensionDtype) and dtype.kind != "M":
         # create an extension array from its dtype
         # DatetimeTZ case needs to go through maybe_cast_to_datetime
+
+        if lib.is_scalar(arr):
Member Author

We could look into how to identify a collection that could be considered a 'scalar' for some EAs, e.g. JSONDtype, although I think that is out of scope for the issue this PR attempts to fix (i.e. IntegerArray, where the scalars are scalars).

Contributor

rather than this I would call: construct_1d_arraylike_from_scalar

but I wouldn't do this right here, rather on L453, e.g. add an elif is_scalar(data)

Member Author

that's option 3 in #33846 (comment)

do this just for EA types and keep the code path the same for non-EA types?

Contributor

no this will work generically

Member Author

I'm getting a few failures in pandas/tests/series/test_constructors.py. I'll push the change anyway and use the CI to see what else fails while I investigate.

Contributor

kk

+            arr = [arr]
+
         array_type = dtype.construct_array_type()._from_sequence
         subarr = array_type(arr, dtype=dtype, copy=copy)
         return subarr

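The two approaches discussed in the review thread above can be contrasted with a minimal pure-Python sketch. `is_scalar_like`, `wrap_scalar` and `broadcast_scalar` are hypothetical stand-ins, not pandas functions; they only loosely mirror `lib.is_scalar`, the `arr = [arr]` special case in this diff, and the idea behind `construct_1d_arraylike_from_scalar`.

```python
from numbers import Number


def is_scalar_like(value):
    """Hypothetical scalar check: numbers, strings, bytes and None."""
    return value is None or isinstance(value, (Number, str, bytes))


def wrap_scalar(value):
    """Approach in this diff: wrap a scalar in a one-element list."""
    return [value] if is_scalar_like(value) else list(value)


def broadcast_scalar(value, length):
    """Reviewer's suggestion, sketched: repeat the scalar to the target length."""
    if not is_scalar_like(value):
        raise TypeError("expected a scalar")
    return [value] * length


index = [1, 2, 3]
wrapped = wrap_scalar(7)                     # one element only
broadcast = broadcast_scalar(7, len(index))  # already matches the index length
```

With wrapping, the one-element result still has to be expanded to the index length by later code; with broadcasting, the result is full-length from the start. Neither sketch handles collections that a particular EA treats as a single value.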
10 changes: 10 additions & 0 deletions pandas/tests/extension/arrow/test_bool.py
@@ -55,6 +55,16 @@ def test_from_dtype(self, data):
     def test_from_sequence_from_cls(self, data):
         super().test_from_sequence_from_cls(data)

+    @pytest.mark.xfail(reason="GH-26469")
+    def test_series_constructor_no_data_with_index(self, dtype, na_value):
+        # pyarrow.lib.ArrowInvalid: only handle 1-dimensional arrays
+        super().test_series_constructor_no_data_with_index(dtype, na_value)
+
+    @pytest.mark.xfail(reason="GH-26469")
+    def test_series_constructor_scalar_na_with_index(self, dtype, na_value):
+        # pyarrow.lib.ArrowInvalid: only handle 1-dimensional arrays
+        super().test_series_constructor_scalar_na_with_index(dtype, na_value)
+

 class TestReduce(base.BaseNoReduceTests):
     def test_reduce_series_boolean(self):

20 changes: 20 additions & 0 deletions pandas/tests/extension/base/constructors.py
@@ -33,6 +33,26 @@ def test_series_constructor(self, data):
         assert result2.dtype == data.dtype
         assert isinstance(result2._mgr.blocks[0], ExtensionBlock)

+    def test_series_constructor_no_data_with_index(self, dtype, na_value):
+        result = pd.Series(index=[1, 2, 3], dtype=dtype)
+        expected = pd.Series([na_value] * 3, index=[1, 2, 3], dtype=dtype)
+        self.assert_series_equal(result, expected)
+
+    def test_series_constructor_scalar_na_with_index(self, dtype, na_value):
+        result = pd.Series(na_value, index=[1, 2, 3], dtype=dtype)
+        expected = pd.Series([na_value] * 3, index=[1, 2, 3], dtype=dtype)
+        self.assert_series_equal(result, expected)
+
+    def test_series_constructor_scalar_with_index(self, data, dtype):
+        scalar = data[0]
+        result = pd.Series(scalar, index=[1, 2, 3], dtype=dtype)
+        expected = pd.Series([scalar] * 3, index=[1, 2, 3], dtype=dtype)
+        self.assert_series_equal(result, expected)
+
+        result = pd.Series(scalar, index=["foo"], dtype=dtype)
+        expected = pd.Series([scalar], index=["foo"], dtype=dtype)
+        self.assert_series_equal(result, expected)
+
     @pytest.mark.parametrize("from_series", [True, False])
     def test_dataframe_constructor_from_dict(self, data, from_series):
         if from_series:

15 changes: 15 additions & 0 deletions pandas/tests/extension/json/test_json.py
@@ -150,6 +150,21 @@ def test_from_dtype(self, data):
         # construct from our dtype & string dtype
         pass

+    @pytest.mark.xfail(reason="GH-26469")
Contributor

should these be a new issue?

Member Author

a few checks to go but I think we need a discussion on when to allow a collection to be treated as scalar. so yes, will probably raise an issue for this.

Contributor

kk, and just flip the references to that, otherwise lgtm. ping on green.

Member Author

xref #33900 and #33901

+    def test_series_constructor_no_data_with_index(self, dtype, na_value):
+        # RecursionError: maximum recursion depth exceeded in comparison
+        super().test_series_constructor_no_data_with_index(dtype, na_value)
+
+    @pytest.mark.xfail(reason="GH-26469")
+    def test_series_constructor_scalar_na_with_index(self, dtype, na_value):
+        # RecursionError: maximum recursion depth exceeded in comparison
+        super().test_series_constructor_scalar_na_with_index(dtype, na_value)
+
+    @pytest.mark.xfail(reason="GH-26469")
Contributor

can you add a more informative message

+    def test_series_constructor_scalar_with_index(self, data, dtype):
+        # TypeError: All values must be of type <class 'collections.abc.Mapping'>
+        super().test_series_constructor_scalar_with_index(data, dtype)
+

 class TestReshaping(BaseJSON, base.BaseReshapingTests):
     @pytest.mark.skip(reason="Different definitions of NA")

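The open question raised in the comments above, when a collection should be treated as a scalar, can be sketched in plain Python. Both helper functions are hypothetical and are not part of pandas or of the JSON test extension; they only illustrate why a generic scalar check and a per-dtype notion of "one element" can disagree.

```python
from collections.abc import Mapping
from numbers import Number


def is_plain_scalar(value):
    """Hypothetical generic check: every collection counts as non-scalar."""
    return value is None or isinstance(value, (Number, str, bytes))


def is_json_element(value):
    """Hypothetical per-dtype check: one JSON-like element is a mapping."""
    return isinstance(value, Mapping)


element = {"a": 1}

generic = is_plain_scalar(element)    # the generic check sees a collection
per_dtype = is_json_element(element)  # for this dtype it is a single value
```

A constructor that relies only on the generic check would not broadcast `element` across an index, even though the dtype regards it as one value. The tuple-valued object case noted for `test_numpy.py` hits the same ambiguity.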
5 changes: 4 additions & 1 deletion pandas/tests/extension/test_datetime.py
@@ -82,7 +82,10 @@ class TestDatetimeDtype(BaseDatetimeTests, base.BaseDtypeTests):


 class TestConstructors(BaseDatetimeTests, base.BaseConstructorsTests):
-    pass
+    @pytest.mark.xfail(reason="GH-26469")
+    def test_series_constructor_scalar_with_index(self, data, dtype):
+        # TypeError: data type not understood
+        super().test_series_constructor_scalar_with_index(data, dtype)


 class TestGetitem(BaseDatetimeTests, base.BaseGetitemTests):

5 changes: 5 additions & 0 deletions pandas/tests/extension/test_numpy.py
@@ -151,6 +151,11 @@ def test_array_from_scalars(self, data):
         # ValueError: PandasArray must be 1-dimensional.
         super().test_array_from_scalars(data)

+    @skip_nested
+    def test_series_constructor_scalar_with_index(self, data, dtype):
+        # ValueError: Length of passed values is 1, index implies 3.
Member Author

for the object dtype, the scalar is a tuple, so this failure is related to #33846 (comment)

+        super().test_series_constructor_scalar_with_index(data, dtype)
+

 class TestDtype(BaseNumPyTests, base.BaseDtypeTests):
     @pytest.mark.skip(reason="Incorrect expected.")

10 changes: 9 additions & 1 deletion pandas/tests/extension/test_sparse.py
@@ -112,7 +112,15 @@ def test_view(self, data):


 class TestConstructors(BaseSparseTests, base.BaseConstructorsTests):
-    pass
+    @pytest.mark.xfail(reason="GH-26469", strict=False)
+    def test_series_constructor_no_data_with_index(self, dtype, na_value):
+        # ValueError: Cannot convert non-finite values (NA or inf) to integer
+        super().test_series_constructor_no_data_with_index(dtype, na_value)
+
+    @pytest.mark.xfail(reason="GH-26469", strict=False)
+    def test_series_constructor_scalar_na_with_index(self, dtype, na_value):
+        # ValueError: Cannot convert non-finite values (NA or inf) to integer
+        super().test_series_constructor_scalar_na_with_index(dtype, na_value)


 class TestReshaping(BaseSparseTests, base.BaseReshapingTests):