BUG/PERF: DataFrame.isin lossy data conversion #53514

Merged: 2 commits, merged Jun 5, 2023

@lukemanley (Member) commented Jun 3, 2023

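Fixes an edge-case bug caused by a lossy data conversion. `val` is larger than 2**53, so it cannot be stored exactly as a float64. On a frame that mixes an int64 column with a float64 column, `DataFrame.isin` failed to find it, while checking each column separately with `Series.isin` (via `apply`) does: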

import pandas as pd

val = 1666880195890293744

df = pd.DataFrame({"a": [val], "b": [1.0]})

df.apply(lambda x: x.isin([val]))  # a: True,  b: False (correct)
df.isin([val])                     # a: False, b: False on main (wrong); a: True, b: False with this PR
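The precision loss itself can be reproduced directly. This is a minimal sketch, assuming (this page does not show the old code path) that the pre-fix `DataFrame.isin` compared against the frame's values consolidated into a single 2-D array, whose common dtype for an int64 + float64 frame is float64:

```python
import numpy as np
import pandas as pd

val = 1666880195890293744

# float64 has a 53-bit significand, so not every integer above 2**53 is
# representable; this val gets rounded to a nearby multiple of 256.
print(val > 2**53)                  # True
print(int(np.float64(val)) == val)  # False

df = pd.DataFrame({"a": [val], "b": [1.0]})

# Assumption: the old path worked on one consolidated array. Its common
# dtype here is float64, so column "a" no longer holds the exact value.
arr = df.to_numpy()
print(arr.dtype)                    # float64
print(int(arr[0, 0]) == val)        # False

# Checking column "a" on its own keeps it as int64, so the value is found.
print(df["a"].dtype)                # int64
print(df["a"].isin([val]).tolist()) # [True]
```

Checking column `a` on its own never leaves int64, which is consistent with the `df.apply` result in the report.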

Performance improvement for extension-array (EA) dtypes, here `int64[pyarrow]`:

import pandas as pd
import numpy as np

data = np.random.randint(0, 1000, (100_000, 2))
df = pd.DataFrame(data, dtype="int64[pyarrow]")

%timeit df.isin([1, 50])

# 25.6 ms ± 1.49 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)  <- main
# 2.1 ms ± 265 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)   <- PR
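The timing above uses IPython's `%timeit` and the `int64[pyarrow]` dtype, which needs pyarrow installed. The sketch below reruns a similar comparison as a plain script with the standard-library `timeit`, swapping in pandas' nullable `Int64` extension dtype so it runs without pyarrow. It also checks that a column-wise evaluation (the same idea as the `df.apply` line in the bug example) flags the same cells as `DataFrame.isin`. `isin_columnwise` is a hypothetical helper, not necessarily how this PR implements the fix, and the numbers will vary by pandas version and machine.

```python
import timeit

import numpy as np
import pandas as pd


def isin_columnwise(df: pd.DataFrame, values) -> pd.DataFrame:
    """Hypothetical helper (not a pandas API): Series.isin per column.

    Each column is checked with its own dtype, so the frame is never
    converted to a single common dtype. Same idea as
    df.apply(lambda x: x.isin(values)).
    """
    if df.shape[1] == 0:
        # pd.concat raises on an empty list, so handle zero columns here.
        return pd.DataFrame(index=df.index, columns=df.columns, dtype=bool)
    # Select by position so duplicate column labels are handled correctly.
    parts = [df.iloc[:, i].isin(values) for i in range(df.shape[1])]
    result = pd.concat(parts, axis=1)
    result.columns = df.columns
    return result


# Same shape as the benchmark above, but with the nullable "Int64"
# extension dtype instead of "int64[pyarrow]" (assumption: avoids
# needing pyarrow; timings will differ from the ones quoted).
rng = np.random.default_rng(0)
data = rng.integers(0, 1000, size=(100_000, 2))
df = pd.DataFrame(data, dtype="Int64")
values = [1, 50]

# Both approaches should flag exactly the same cells.
expected = df.isin(values).to_numpy(dtype=bool)
got = isin_columnwise(df, values).to_numpy(dtype=bool)
assert (expected == got).all()

for label, fn in [
    ("DataFrame.isin", lambda: df.isin(values)),
    ("column-wise", lambda: isin_columnwise(df, values)),
]:
    # Best of 3 repeats of 10 calls each, reported per call.
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{label}: {best * 1e3:.2f} ms per call")
```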

@lukemanley lukemanley added the Bug and Performance labels Jun 3, 2023
@lukemanley lukemanley added this to the 2.1 milestone Jun 3, 2023
@mroeschke mroeschke merged commit 0dcd155 into pandas-dev:main Jun 5, 2023
@mroeschke (Member) commented:

Thanks @lukemanley

topper-123 pushed a commit to topper-123/pandas that referenced this pull request Jun 5, 2023
* BUG/PERF: DataFrame.isin lossy data conversion

* gh refs
@lukemanley lukemanley deleted the dataframe-isin branch June 8, 2023 02:31
Daquisu pushed a commit to Daquisu/pandas that referenced this pull request Jul 8, 2023
* BUG/PERF: DataFrame.isin lossy data conversion

* gh refs