reset_index() on MultiIndexed empty dataframe does not preserve dtypes #19602
Comments
Yeah, I could see these preserving the index dtypes. Interested in making a PR? |
@TomAugspurger : unfortunately I am not skilled enough for a PR yet, sorry :( |
I just came across this code for an unrelated PR. The issue may be at Lines 3387 to 3389 in a214915
If we check for |
Would it be resolved with something like this?
|
Not sure. Could you try it out and see?
On Mon, Feb 26, 2018 at 4:16 PM, Stuart Reynolds ***@***.***> wrote:
Would it be resolved with something like this?
if mask.all():
    values = np.empty(len(mask), dtype=index.dtype)
    values.fill(np.nan)
|
Did anything happen in this direction? |
Still open. @vaibhawc can you investigate the issue and make a PR? |
My issue was resolved by what @stuz5000 suggested. Though I didn't quite understand how it happened. |
Sure, give that a shot. |
OK, could you please help me choose the base and compare branches? |
Contributing guidelines are at http://pandas-docs.github.io/pandas-docs-travis/contributing.html. Post if you have any issues. |
btw, as a quick workaround, you can copy the previous dtypes and assign them back after grouping:
|
Code Sample, a copy-pastable example if possible
Problem description
By contrast, the dtypes are preserved if either
a) the index is not a MultiIndex
b) the dataframe is not empty
(b) is a big issue for programs that compute subsets of dataframes that can sometimes
be empty, since downstream code might expect a certain dtype and fail when it finds
a float64 instead.
Real-world scenario: sampling a system (a collection of processes or threads) at regular intervals
and collecting some measures (CPU used, or other resources or figures); a very common strategy
in performance investigation software (e.g. check Oracle's v$active_session_history).
Here, the natural index is (sample_time, process_id), sample_time being datetime64 (a Time Series).
Even more naturally, we want to compute differences of sample_time, yielding a timedelta64,
and divide them by np.timedelta64(1,'s') to get the elapsed time in seconds; but when the
initial dataframe is empty, we try to divide float64 / np.timedelta64(1,'s') and get an exception.
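To make the scenario concrete, here is a minimal sketch of the non-empty case, where the computation works as intended. The column name `cpu`, the timestamps, and the process id are made-up illustration data, not taken from the original report.

```python
import numpy as np
import pandas as pd

# Hypothetical samples: one process observed every 10 seconds.
index = pd.MultiIndex.from_arrays(
    [
        pd.to_datetime(
            ["2018-02-08 10:00:00", "2018-02-08 10:00:10", "2018-02-08 10:00:20"]
        ),
        np.array([42, 42, 42], dtype="int64"),
    ],
    names=["sample_time", "process_id"],
)
samples = pd.DataFrame({"cpu": [0.1, 0.3, 0.2]}, index=index)

# Move the index levels into ordinary columns.
flat = samples.reset_index()

# Differences of datetime64 values are timedelta64 values;
# dividing by one second converts them to float seconds.
elapsed_seconds = flat["sample_time"].diff() / np.timedelta64(1, "s")
```

The first element of `elapsed_seconds` is NaN because the first sample has no predecessor. The failure described above only appears when `samples` is empty and `sample_time` comes back as float64.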
An obvious workaround is to check for empty dataframes after EVERY reset_index()
and coerce the float64 columns back to their correct dtype - but that easily becomes a maintenance/coverage nightmare :O
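One way to keep that check in a single place is a small wrapper around reset_index(). This is only a sketch: `reset_index_keep_dtypes` is a hypothetical helper, not a pandas API, and it assumes every index level has a name.

```python
import numpy as np
import pandas as pd


def reset_index_keep_dtypes(df):
    """Hypothetical helper: reset_index() that restores index dtypes on empty frames.

    Assumes all index levels are named, so the new column names are known.
    """
    index = df.index
    if isinstance(index, pd.MultiIndex):
        # Each level keeps its own dtype even when the index has no rows.
        level_dtypes = {
            name: level.dtype for name, level in zip(index.names, index.levels)
        }
    else:
        level_dtypes = {index.name: index.dtype}

    out = df.reset_index()
    if df.empty:
        # Only the empty case is affected, so only coerce there.
        out = out.astype(level_dtypes)
    return out


# Usage on an empty frame indexed by (sample_time, process_id).
idx = pd.MultiIndex.from_arrays(
    [pd.to_datetime([]), np.array([], dtype="int64")],
    names=["sample_time", "process_id"],
)
empty = pd.DataFrame({"cpu": np.array([], dtype="float64")}, index=idx)
out = reset_index_keep_dtypes(empty)
```

On pandas versions where reset_index() already preserves the dtypes, the cast is a no-op, so the wrapper should be harmless there.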
Expected Output
The reset columns keep their initial dtypes.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: None
numpy: 1.13.3
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.1
openpyxl: 2.4.9
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None