
BUG: Series(list_of_intervals) results in object dtype #23563


Closed
jschendel opened this issue Nov 8, 2018 · 6 comments · Fixed by #28399
Labels
Constructors Series/DataFrame/Index/pd.array Constructors Dtype Conversions Unexpected or buggy dtype conversions Interval Interval data type

Comments

@jschendel
Member

Code Sample, a copy-pastable example if possible

Constructing a Series from a list of Interval objects results in an object dtype, and is not backed by an IntervalArray:

In [2]: s = pd.Series([pd.Interval(0, 1), pd.Interval(1, 2), pd.Interval(2, 3)])

In [3]: s.dtype
Out[3]: dtype('O')

In [4]: s.values
Out[4]:
array([Interval(0, 1, closed='right'), Interval(1, 2, closed='right'),
       Interval(2, 3, closed='right')], dtype=object)

Note that constructing a Series from an IntervalArray results in the correct dtype and is backed by an IntervalArray:

In [5]: s2 = pd.Series(pd.core.arrays.IntervalArray.from_breaks(range(4)))

In [6]: s2.dtype
Out[6]: interval[int64]

In [7]: s2.values
Out[7]:
IntervalArray([(0, 1], (1, 2], (2, 3]],
              closed='right',
              dtype='interval[int64]')
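`IntervalArray.from_breaks` builds consecutive intervals from a sequence of break points. A minimal pure-Python sketch of that idea, assuming only the documented behavior that breaks `b0, b1, ..., bn` produce the intervals `(b0, b1], (b1, b2], ...`. The function name `from_breaks_sketch` is hypothetical, and it returns plain lists instead of a pandas object:

```python
from typing import List, Sequence, Tuple


def from_breaks_sketch(
    breaks: Sequence[int], closed: str = "right"
) -> Tuple[List[int], List[int], str]:
    """Hypothetical stand-in for IntervalArray.from_breaks.

    Interval i spans breaks[i] to breaks[i + 1]; `closed` says which
    endpoints are included and is shared by every interval.
    """
    if closed not in ("left", "right", "both", "neither"):
        raise ValueError(f"invalid closed: {closed!r}")
    points = list(breaks)
    left = points[:-1]   # every break except the last
    right = points[1:]   # every break except the first
    return left, right, closed


left, right, closed = from_breaks_sketch(range(4))
print(left, right, closed)  # [0, 1, 2] [1, 2, 3] right
```

Pairing the two lists element by element gives the `(0, 1], (1, 2], (2, 3]` intervals shown in `Out[7]`.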

Problem description

The input data is not being inferred as interval dtype, but rather as object dtype, and is not being backed by an IntervalArray.
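The missing step is type inference over the input list. pandas' own helper, `pandas.api.types.infer_dtype`, can report `'interval'` for a list of `Interval` objects; the sketch below only models the idea. The `Interval` dataclass is a hypothetical stand-in for `pd.Interval`, and requiring a single shared `closed` side (with `None` treated as missing) is an assumption of this sketch:

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass(frozen=True)
class Interval:
    """Hypothetical stand-in for pd.Interval (not the pandas class)."""
    left: float
    right: float
    closed: str = "right"


def infer_kind(values: Sequence[Any]) -> str:
    """Return "interval" if every non-missing value is an Interval sharing one
    closed side; otherwise fall back to "object"."""
    closed: Optional[str] = None
    for v in values:
        if v is None:  # treat None as missing and skip it
            continue
        if not isinstance(v, Interval):
            return "object"
        if closed is None:
            closed = v.closed
        elif v.closed != closed:
            return "object"  # mixed closed sides cannot share one dtype
    # Only missing values (or an empty list): nothing to infer from.
    return "interval" if closed is not None else "object"


data = [Interval(0, 1), Interval(1, 2), Interval(2, 3)]
print(infer_kind(data))  # interval
```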

Expected Output

I'd expect the Series to have an interval dtype and be backed by an IntervalArray.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: 8212001
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.0.dev0+948.g82120016e
pytest: 3.8.2
pip: 9.0.1
setuptools: 39.0.1
Cython: 0.28.2
numpy: 1.13.3
scipy: 1.0.0
pyarrow: 0.6.0
xarray: 0.9.6
IPython: 6.1.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.0.2
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 0.9.8
lxml: 3.8.0
bs4: None
html5lib: 0.999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@jschendel jschendel added Dtype Conversions Unexpected or buggy dtype conversions Interval Interval data type labels Nov 8, 2018
@jschendel jschendel added this to the Contributions Welcome milestone Nov 8, 2018
@TomAugspurger
Contributor

For the record, Index(List[Interval]) does result in an IntervalIndex.

We really need to rewrite series._sanitize_array and consolidate with Index.__new__. I'll try to make time for that today or tomorrow.
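The consolidation idea can be sketched as one inference helper that both constructor paths call, so a list of intervals gets the same treatment whether it becomes a Series or an Index. Everything here (`Interval`, `Backing`, `sanitize`, `make_series`, `make_index`) is hypothetical and is not pandas' `_sanitize_array` or `Index.__new__`; keeping separate left/right endpoint lists mirrors the public `IntervalArray.left`, `.right` and `.closed` attributes, not pandas internals:

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Sequence


@dataclass(frozen=True)
class Interval:
    """Hypothetical stand-in for pd.Interval (not the pandas class)."""
    left: float
    right: float
    closed: str = "right"


@dataclass
class Backing:
    """Hypothetical container for what a constructor ends up storing."""
    kind: str  # "interval" or "object"
    left: List[float] = field(default_factory=list)
    right: List[float] = field(default_factory=list)
    closed: Optional[str] = None
    objects: List[Any] = field(default_factory=list)


def sanitize(values: Sequence[Any]) -> Backing:
    """Single inference path shared by every constructor (hypothetical)."""
    values = list(values)
    sides = {v.closed for v in values if isinstance(v, Interval)}
    all_intervals = bool(values) and all(isinstance(v, Interval) for v in values)
    if all_intervals and len(sides) == 1:
        # Store endpoints in two lists plus one shared closed side.
        return Backing(
            kind="interval",
            left=[v.left for v in values],
            right=[v.right for v in values],
            closed=sides.pop(),
        )
    # Mixed or non-interval input stays as a list of Python objects.
    return Backing(kind="object", objects=values)


def make_series(values: Sequence[Any]) -> Backing:
    return sanitize(values)  # Series path


def make_index(values: Sequence[Any]) -> Backing:
    return sanitize(values)  # Index path reuses the same helper


data = [Interval(0, 1), Interval(1, 2), Interval(2, 3)]
s, idx = make_series(data), make_index(data)
print(s.kind, idx.kind)           # interval interval
print(s.left, s.right, s.closed)  # [0, 1, 2] [1, 2, 3] right
```

Because both wrappers delegate to one function, the Series and Index paths cannot drift apart the way the report describes.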

@jorisvandenbossche jorisvandenbossche modified the milestones: Contributions Welcome, 0.24.0 Nov 8, 2018
@jreback jreback modified the milestones: 0.24.0, Contributions Welcome Dec 2, 2018
@jbrockmendel jbrockmendel added the Constructors Series/DataFrame/Index/pd.array Constructors label Jul 23, 2019
@jschendel jschendel modified the milestones: Contributions Welcome, 0.25.2 Sep 12, 2019
@jschendel jschendel modified the milestones: 0.25.2, 1.0 Oct 1, 2019
@TomAugspurger
Contributor

Not sure why this is showing up now, but there seems to be an issue with IntervalDtype.kind. According to the base class, that's supposed to be a str, but IntervalDtype declares it to be kind: Optional[str_type] = None.

I think it should just be "O" for object dtype. @jschendel do you have thoughts?
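The typing mismatch can be shown with a small sketch: a base class that promises `kind` is always a one-character `str`, a subclass that breaks the promise with `Optional[str] = None`, and one that keeps it by using `"O"`, NumPy's kind code for object. The class names and `kind_is_valid` are hypothetical; pandas' real `ExtensionDtype.kind` is a property that also defaults to `"O"`, but this is not its implementation:

```python
from typing import Optional


class ExtensionDtypeSketch:
    """Hypothetical base class: `kind` must always be a one-character str."""
    kind: str = "O"


class BrokenIntervalDtype(ExtensionDtypeSketch):
    # Mirrors the problem above: the subclass widens the type to
    # Optional[str] and defaults to None, breaking the base contract.
    kind: Optional[str] = None  # type: ignore[assignment]


class FixedIntervalDtype(ExtensionDtypeSketch):
    # Proposed fix: report the object kind code, matching the base default.
    kind: str = "O"


def kind_is_valid(dtype: ExtensionDtypeSketch) -> bool:
    """Check the base-class contract at runtime."""
    return isinstance(dtype.kind, str) and len(dtype.kind) == 1


print(kind_is_valid(BrokenIntervalDtype()))  # False
print(kind_is_valid(FixedIntervalDtype()))   # True
```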

@jschendel
Member Author

jschendel commented Dec 30, 2019

> I think it should just be "O" for object dtype. @jschendel do you have thoughts?

Agreed. I made the change locally, but it still doesn't quite fix the failing test: it now fails due to #24112, because fixing this issue causes the Series in the test to have interval dtype instead of object dtype.

@TomAugspurger
Contributor

@jschendel are you able to update this today now that #24112 is fixed?

@jschendel
Member Author

The IntervalDtype.kind issue is fixed, but #24112 is a different issue that pops up after applying the IntervalDtype.kind fix. Due to the updated inference, the failing test now does equality comparisons against Series[Interval] instead of Series[object], which is broken as described in #24112.
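Once inference changes, comparisons that used to run on an object Series run on an interval-backed one, so `==` against a scalar interval has to work elementwise over endpoint arrays. A minimal sketch of such a comparison, assuming the left/right/closed layout exposed by `IntervalArray`; `eq_scalar` is hypothetical and is not the fix proposed for #24112:

```python
from typing import List, Sequence


def eq_scalar(
    left: Sequence[float],
    right: Sequence[float],
    closed: str,
    other_left: float,
    other_right: float,
    other_closed: str,
) -> List[bool]:
    """Elementwise `array == interval` for an endpoint-backed array.

    Two intervals are equal only if both endpoints and the closed side match.
    """
    if closed != other_closed:
        # A different closed side can never match any element.
        return [False] * len(left)
    return [lo == other_left and hi == other_right for lo, hi in zip(left, right)]


print(eq_scalar([0, 1, 2], [1, 2, 3], "right", 1, 2, "right"))
# [False, True, False]
```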

I have a POC fix for #24112, and applying it here fixes the broken test and should clear the way. I still need to fully test the POC fix, but I should be able to get that done tonight and have a PR open if I don't run into any unexpected complications. I'm guessing there will be some back and forth during PR review, though, since the fix is non-trivial.

In the meantime, I could also update this and strictly xfail the broken test if we want to keep moving forward here, or we could wait for #24112 to be fixed. I'm fine with either option.
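A strict xfail keeps the suite green while the test fails for the known reason, but turns an unexpected pass into a failure, which forces the marker to be removed once #24112 lands. In real code the marker is applied as `@pytest.mark.xfail(strict=True, reason=...)`; the function below is only a simplified, hypothetical model of the two outcomes (the returned strings are illustrative labels, not pytest's exact output):

```python
from typing import Callable


def run_with_strict_xfail(test: Callable[[], None]) -> str:
    """Simplified model of how a test marked xfail(strict=True) is reported."""
    try:
        test()
    except Exception:
        return "xfailed"  # expected failure: suite stays green
    return "FAILED (XPASS strict)"  # unexpected pass: suite goes red


def broken_comparison() -> None:
    # Hypothetical stand-in for the equality test that breaks until #24112 is fixed.
    assert [1, 2] == [1, 3]


def fixed_comparison() -> None:
    # Hypothetical stand-in for the same test after the fix.
    assert [1, 2] == [1, 2]


print(run_with_strict_xfail(broken_comparison))  # xfailed
print(run_with_strict_xfail(fixed_comparison))   # FAILED (XPASS strict)
```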

@TomAugspurger
Contributor

(Sorry that I commented on the issue instead of the PR.)

Probably want to wait for the equality thing to be fixed first. I'll prioritize reviewing that so we can get both in 1.0.
