BUG/PERF: DataFrame.to_records inconsistent dtypes for MultiIndex #47279


Merged: 3 commits, merged Jun 8, 2022


lukemanley
Member

Fixes a bug in DataFrame.to_records where the NumPy dtypes of the index fields differed depending on whether the DataFrame had an Index or a MultiIndex.
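A minimal sketch of the inconsistency, assuming pandas 1.5 or later (the milestone this PR was added to). It uses the same timestamps once as a flat DatetimeIndex and once as the first level of a MultiIndex, then compares the dtypes of the resulting record-array fields. The column name x, the level names ts and n, and the sample values are illustrative assumptions. With the fix in place, both index fields should share one datetime64 dtype.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("1970-01-01", periods=3, freq="ms")
values = [1.0, 2.0, 3.0]

# Flat (single-level) DatetimeIndex
df_flat = pd.DataFrame({"x": values}, index=dates)

# The same timestamps as the first level of a two-level MultiIndex
df_multi = pd.DataFrame(
    {"x": values},
    index=pd.MultiIndex.from_arrays([dates, [10, 20, 30]], names=["ts", "n"]),
)

rec_flat = df_flat.to_records(index=True)
rec_multi = df_multi.to_records(index=True)

# An unnamed flat index becomes the "index" field; MultiIndex levels use their names
print(rec_flat.dtype["index"], rec_multi.dtype["ts"])
```

On a version that includes the fix, the two printed dtypes are expected to match.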

NOTE: one existing test, test_to_records_index_name, was updated to reflect the change.
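Judging by its name, test_to_records_index_name covers how to_records names the index fields. The following is a rough sketch in the same spirit, not the test itself, and it assumes pandas 1.5 or later. An unnamed MultiIndex level is expected to become level_1, and string index fields should get the same dtype whether they come from an Index or a MultiIndex. The data, index labels and names are illustrative.

```python
import numpy as np
import pandas as pd

data = np.arange(9, dtype="float64").reshape(3, 3)

# Flat string index with a name: the record field takes the index name
df_flat = pd.DataFrame(data, index=pd.Index(["a", "a", "b"], name="X"))

# Two-level string MultiIndex; the unnamed second level should become "level_1"
df_multi = pd.DataFrame(
    data,
    index=pd.MultiIndex.from_tuples(
        [("a", "x"), ("a", "y"), ("b", "z")], names=["A", None]
    ),
)

rec_flat = df_flat.to_records()
rec_multi = df_multi.to_records()

# Integer column labels 0, 1, 2 are turned into the string field names "0", "1", "2"
print(rec_flat.dtype.names)
print(rec_multi.dtype.names)
print(rec_flat.dtype["X"], rec_multi.dtype["A"], rec_multi.dtype["level_1"])
```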

The fix also improves performance considerably for a MultiIndex (roughly 75x faster in this benchmark):

```python
# Run in IPython/Jupyter: %timeit is an IPython magic, not plain Python
import pandas as pd
import numpy as np

n = 5_000_000

mi = pd.MultiIndex.from_arrays(
    [
        np.arange(n),
        pd.date_range('1970-01-01', periods=n, freq='ms'),
    ]
)

df = pd.DataFrame(np.random.randn(n), index=mi)

%timeit df.to_records(index=True)
# 6.78 s ± 264 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)      <- main
# 90.9 ms ± 3.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)  <- PR
```
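Because %timeit only works in IPython, here is a plain-Python sketch using the standard timeit module. It assumes a smaller n so it runs quickly, so absolute timings will differ from the numbers above. It also compares two hypothetical helpers, via_tuples and via_levels, to illustrate one plausible source of the cost: building one Python tuple per row and splitting it back into columns, versus reading each level's values directly. That is an assumption about where the time goes, not a description of the pandas source. The tuple route also turns the datetime level into an object array, which ties back to the dtype inconsistency.

```python
import timeit

import numpy as np
import pandas as pd

# Assumption: much smaller than the PR's n = 5_000_000 so the script finishes quickly
n = 100_000

mi = pd.MultiIndex.from_arrays(
    [
        np.arange(n),
        pd.date_range("1970-01-01", periods=n, freq="ms"),
    ]
)
df = pd.DataFrame(np.random.randn(n), index=mi)


def via_tuples(index: pd.MultiIndex) -> list:
    """Hypothetical helper: iterate row tuples, then rebuild one array per level."""
    # Iterating a MultiIndex yields tuples; datetime values come back as Timestamp objects
    return [np.array(level) for level in zip(*index)]


def via_levels(index: pd.MultiIndex) -> list:
    """Hypothetical helper: take each level's values directly, without row tuples."""
    return [np.asarray(index.get_level_values(i)) for i in range(index.nlevels)]


for fn in (via_tuples, via_levels):
    arrays = fn(mi)
    # Best of 3 single runs; bind fn as a default so each lambda times the right helper
    secs = min(timeit.repeat(lambda fn=fn: fn(mi), number=1, repeat=3))
    print(f"{fn.__name__}: {secs:.3f} s, dtypes={[a.dtype for a in arrays]}")

secs = min(timeit.repeat(lambda: df.to_records(index=True), number=1, repeat=3))
print(f"to_records: {secs:.3f} s")
```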

@lukemanley added the Bug, Performance, and Dtype Conversions labels on Jun 7, 2022
@mroeschke added this to the 1.5 milestone on Jun 8, 2022
@mroeschke merged commit 11881ea into pandas-dev:main on Jun 8, 2022
@mroeschke
Member

Double awesome! Thanks @lukemanley

yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request on Jul 13, 2022
…ndas-dev#47279)

DataFrame.to_records fix inconsistent dtypes with MultiIndex
@lukemanley deleted the df-to-records-multiindex branch on September 10, 2022 00:49
Labels
Bug, Dtype Conversions (unexpected or buggy dtype conversions), Performance (memory or execution speed performance)
Development

Successfully merging this pull request may close these issues.

BUG: DataFrame.to_records inconsistent data typing for Index vs MultiIndex
2 participants