
Empty series dtype discarded by reset_index with MultiIndex #27913

Closed
aquasync opened this issue Aug 14, 2019 · 2 comments
Labels
Duplicate Report (Duplicate issue or pull request), Needs Info (Clarification about behavior needed to assess issue)

Comments

@aquasync

Code Sample, a copy-pastable example if possible

import pandas as pd

# Single-level index: 'x' keeps datetime64[ns] (this is fine)
df = pd.DataFrame({
    'x': pd.Series([], dtype='datetime64[ns]'),
    'y': pd.Series([], dtype='float64'),
    'z': pd.Series([], dtype='float64'),
})
print(df.set_index(['x']).reset_index().dtypes)

# Two-level MultiIndex: 'x' comes back as float64
print(df.set_index(['x', 'y']).reset_index().dtypes)

# Same behaviour when the level is non-empty but all missing (NaT)
df2 = pd.DataFrame({
    'x': pd.Series(['NaT'], dtype='datetime64[ns]'),
    'y': pd.Series([2.]),
    'z': pd.Series([3.]),
})
print(df2.set_index(['x', 'y']).reset_index().dtypes)

Problem description

DataFrame.reset_index shouldn't discard the dtype of the index or index levels when they are empty or all missing. It handles this correctly for a single-level index (e.g. a DatetimeIndex), but not for a MultiIndex.

In my case, the symptom was a confusing TypeError on a join, which I traced to the column's dtype being lost after reset_index.
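Until the underlying behaviour is fixed, one possible workaround is to record each index level's dtype before calling reset_index and cast the resulting columns back afterwards. The sketch below assumes every index level is named; reset_index_keep_dtypes is a hypothetical helper, not part of pandas. It uses only Index.get_level_values, DataFrame.reset_index and DataFrame.astype with a column-to-dtype dict.

```python
import pandas as pd


def reset_index_keep_dtypes(df):
    """Hypothetical workaround helper, not a pandas API.

    Calls reset_index(), then casts each former index level back to the
    dtype it had while it was in the index. Assumes every level is named.
    """
    level_dtypes = {
        name: df.index.get_level_values(i).dtype
        for i, name in enumerate(df.index.names)
    }
    # If reset_index already preserved a dtype, this astype is a no-op
    return df.reset_index().astype(level_dtypes)


empty = pd.DataFrame({
    'x': pd.Series([], dtype='datetime64[ns]'),
    'y': pd.Series([], dtype='float64'),
    'z': pd.Series([], dtype='float64'),
})
print(reset_index_keep_dtypes(empty.set_index(['x', 'y'])).dtypes)
```

Because the cast is applied unconditionally, the helper behaves the same on versions that lose the dtype and on versions that keep it.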

The problem may be the branch in _maybe_casted_values where values is overwritten with np.empty() if all values are missing. In my local copy I changed the condition to if len(mask) and mask.all(), which avoids the problem for empty arrays (mask.all() is vacuously True when mask has no elements), though not for all-missing ones.
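The effect of the proposed guard can be checked in isolation with NumPy. The function below is a hypothetical model of the condition only, not the pandas source; mask is assumed to be a boolean array that is True where a row's level value is missing. It also shows that np.empty without a dtype argument allocates float64, which is consistent with 'x' coming back as float64.

```python
import numpy as np


def replaces_with_empty(mask, guard_empty=False):
    """Hypothetical model of the all-missing check, not the pandas source.

    mask is True where a row's level value is missing (assumption).
    """
    if guard_empty:
        # Proposed condition: len(mask) and mask.all()
        return bool(len(mask) and mask.all())
    # Original condition: mask.all()
    return bool(mask.all())


empty_mask = np.array([], dtype=bool)
all_missing = np.array([True, True])
some_present = np.array([True, False])

print(replaces_with_empty(empty_mask))                     # True: all() over zero elements
print(replaces_with_empty(empty_mask, guard_empty=True))   # False: guard skips the branch
print(replaces_with_empty(all_missing, guard_empty=True))  # True: guard cannot help here
print(replaces_with_empty(some_present))                   # False

# np.empty with no dtype argument allocates float64
print(np.empty(len(empty_mask)).dtype)  # float64
```

This matches the report: the guard fixes the empty case, but an all-missing level still takes the np.empty branch, so keeping the dtype there would need the replacement array to be created with the level's own dtype.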

Output of pd.show_versions()

I've only been able to run this with 0.24.2 so far, as 0.25 fails to import on Python 3.5.2 (due to this, I think), but the relevant code in reset_index looks the same.

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-142-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: en_AU.UTF-8

pandas: 0.24.2
pytest: None
pip: 8.1.1
setuptools: 20.7.0
Cython: None
numpy: 1.17.0
scipy: None
pyarrow: None
xarray: None
IPython: 7.3.0
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2019.2
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: None
feather: None
matplotlib: 3.0.3
openpyxl: 2.6.2
xlrd: None
xlwt: None
xlsxwriter: 1.1.8
lxml.etree: 3.5.0
bs4: 4.4.1
html5lib: 0.999
sqlalchemy: 1.3.5
pymysql: 0.9.3
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@TomAugspurger
Contributor

Does this look to be the same as #19602?

TomAugspurger added the Needs Info (Clarification about behavior needed to assess issue) and Duplicate Report (Duplicate issue or pull request) labels on Aug 14, 2019
@aquasync
Author

Ahh indeed it is - not sure why it didn't come up in the related issues. Sorry for the noise.
