Series construction with dtype=JSONDtype() and index with collection as 'scalar' #33901

Open
3 tasks done
simonjayhawkins opened this issue Apr 30, 2020 · 1 comment
Labels
API - Consistency (Internal Consistency of API/Behavior); Bug; ExtensionArray (Extending pandas with custom dtypes or arrays); Nested Data (Data where the values are collections: lists, sets, dicts, objects, etc.)

Comments

@simonjayhawkins
Member

simonjayhawkins commented Apr 30, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

>>> import pandas as pd
>>>
>>> pd.__version__
'1.1.0.dev0+1436.g6016b9841'
>>>
>>> from pandas.tests.extension.json.array import JSONDtype
>>>
>>> pd.Series({"g": 63}, index=[1, 2, 3], dtype=JSONDtype())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\simon\pandas\pandas\core\series.py", line 283, in __init__
    data, index = self._init_dict(data, index, dtype)
  File "C:\Users\simon\pandas\pandas\core\series.py", line 372, in _init_dict
    s = create_series_with_explicit_dtype(
  File "C:\Users\simon\pandas\pandas\core\construction.py", line 624, in create_series_with_explicit_dtype
    return Series(
  File "C:\Users\simon\pandas\pandas\core\series.py", line 329, in __init__
    data = sanitize_array(data, index, dtype, copy, raise_cast_failure=True)
  File "C:\Users\simon\pandas\pandas\core\construction.py", line 441, in sanitize_array
    subarr = _try_cast(data, dtype, copy, raise_cast_failure)
  File "C:\Users\simon\pandas\pandas\core\construction.py", line 537, in _try_cast
    subarr = array_type(arr, dtype=dtype, copy=copy)
  File "C:\Users\simon\pandas\pandas\tests\extension\json\array.py", line 64, in _from_sequence
    return cls(scalars)
  File "C:\Users\simon\pandas\pandas\tests\extension\json\array.py", line 52, in __init__
    raise TypeError("All values must be of type " + str(self.dtype.type))
TypeError: All values must be of type <class 'collections.abc.Mapping'>

Problem description

Extension arrays may be able to hold a collection as a scalar value. Should the Series constructor accept such a collection as a scalar and repeat it for every index label?

Expected Output

Same as:

>>> pd.Series([{"g": 63}, {"g": 63}, {"g": 63}], index=[1, 2, 3], dtype=JSONDtype())
1    {'g': 63}
2    {'g': 63}
3    {'g': 63}
dtype: json
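
Until the question is settled, one workaround is to repeat the mapping explicitly before building the Series. The sketch below is plain Python and does not touch pandas internals; `repeat_for_index` is a hypothetical helper name, not a pandas function, and the deep copy is an assumption made so that the rows do not share one mutable dict.

```python
import copy
from collections.abc import Hashable, Mapping, Sequence


def repeat_for_index(value: Mapping, index: Sequence[Hashable]) -> list:
    """Hypothetical helper: return one independent copy of `value` per index label."""
    return [copy.deepcopy(value) for _ in index]


index = [1, 2, 3]
values = repeat_for_index({"g": 63}, index)
print(values)  # [{'g': 63}, {'g': 63}, {'g': 63}]
```

The resulting list can then be passed as the data argument of the list-based call above, i.e. `pd.Series(values, index=index, dtype=JSONDtype())`.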

Output of pd.show_versions()

[paste the output of pd.show_versions() here leaving a blank line after the details tag]

@simonjayhawkins added the Bug, Needs Triage, ExtensionArray, and API - Consistency labels and removed the Needs Triage and Bug labels on Apr 30, 2020
@simonjayhawkins changed the title from "BUG: Series construction with dtype=JSONDtype() and index with collection as 'scalar'" to "Series construction with dtype=JSONDtype() and index with collection as 'scalar'" on Apr 30, 2020
@simonjayhawkins added the Needs Discussion label on Apr 30, 2020
@TomAugspurger
Contributor

I don't think we can support this in general. In this case the "scalar" type already has a meaning in the Series constructor: the keys are the index values.

It would be up to the caller to call this as:

In [4]: pd.Series({1: {"g": 63}}, index=[1, 2, 3], dtype=JSONDtype())
Out[4]:
1    {'g': 63}
2           {}
3           {}
dtype: json
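
The label-matching behavior described here can be modeled in a few lines of plain Python. This is only a rough sketch of the idea, not the pandas implementation: `align_mapping_to_index` and its `fill_value` parameter are hypothetical names, and the empty dict used as the fill is an assumption chosen to mirror the `{}` rows in the output above.

```python
from collections.abc import Hashable, Mapping, Sequence
from typing import Any


def align_mapping_to_index(
    data: Mapping[Hashable, Any],
    index: Sequence[Hashable],
    fill_value: Any = None,
) -> list:
    """Hypothetical helper: treat dict keys as labels and look up each index label."""
    return [data.get(label, fill_value) for label in index]


# The dict from the report: its only key, "g", matches none of the labels.
print(align_mapping_to_index({"g": 63}, [1, 2, 3]))
# The call suggested above: label 1 is found, labels 2 and 3 are missing.
print(align_mapping_to_index({1: {"g": 63}}, [1, 2, 3], fill_value={}))
```

Under this model the original dict contributes no values at all, because `"g"` is read as a label rather than as part of a scalar.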

@mroeschke added the Bug and Nested Data labels and removed the Needs Discussion label on Aug 7, 2021