Skip to content

Lose the index before 'na_value' assignment in 'to_numpy' #45775

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Mar 5, 2022

Conversation

DamianBarabonkovQC
Copy link
Contributor

@DamianBarabonkovQC DamianBarabonkovQC commented Feb 2, 2022

This Pull Request addresses the issue where casting a multi-indexed Series to_numpy fails if an na_value is supplied. See the issue #45774 for more information.

The choice to use asanyarray over other solutions to lose the index is one based on performance and simplicity. The following simple benchmark was used to make this determination:

import time
import pandas as pd
import numpy as np

t0 = time.time()

series = pd.Series(range(100000000))

#res = series.isna() # Orignal buggy: Elapsed time:  0.19312691688537598 sec
#res = tuple(series.isna()) # Elapsed time:  5.303609132766724 sec
#res = series.isna().reset_index(drop=True) # Elapsed time:  0.2203369140625 sec
#res = series.isna().to_numpy()  # Elapsed time:  0.19327998161315918 sec
#res = np.array(series.isna())  # Elapsed time:  0.25583529472351074 sec
res = np.asanyarray(series.isna())  # Elapsed time:  0.19619202613830566 sec

t1 = time.time()

#print(res)
print("Elapsed time: ", t1 - t0)

This demonstrates that there is no significant overhead to using the to_numpy operation.

@DamianBarabonkovQC DamianBarabonkovQC changed the title Reset the index before 'na_value' assignment in 'to_numpy' Lose the index before 'na_value' assignment in 'to_numpy' Feb 2, 2022
Copy link
Member

@phofl phofl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add tests. Also this broke existing tests

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests pls

@@ -286,7 +286,7 @@ Missing

MultiIndex
^^^^^^^^^^
-
- Bug in :meth:`IndexOpsMixin.to_numpy` where multiindexed Series could not be converted to numpy arrays when an ``na_value`` was supplied (:issue:`45774`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the note would need to be user facing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by this comment? I do not understand completely what "needs to be user facing" means. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @jreback . May you please elaborate on your comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IndexOpsMixin is private, this is not exposed through the api

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment should be addressed.

@jreback jreback added the Indexing Related to indexing on series/frames, not to indexes themselves label Feb 3, 2022
@jreback jreback added this to the 1.5 milestone Feb 26, 2022
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls merge master as well

@jreback jreback merged commit 3e718e3 into pandas-dev:main Mar 5, 2022
@jreback
Copy link
Contributor

jreback commented Mar 5, 2022

thanks @DamianBarabonkovQC

yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Multi-indexed Series to Numpy fails when na_value supplied
3 participants