ENH: Allow s.map(d, na_action='raise')
#60482
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.
Still interested!
Sorry for pinging @mroeschke - could you please take a quick look at this and say whether the design and the goal are worth pursuing? Thank you in advance!
Sorry @kopytjuk nobody checked this earlier. While I'm ok with finding a solution to the use case being addressed here (raise when the dict in map doesn't contain all the values in the series/dataframe), I'm -1 on the approach here. The main reason is that I think we should get rid of the
Second, I'm personally not a big fan of adding a new parameter to
I think the best solution here is that you have a validation after the map to ensure no missing values are there. I don't think there is a pure pandas one-liner to do it, and maybe adding it is not a bad idea. But I think there are third party libraries to ensure data consistency in pandas, maybe that's the way to go. What do you think?
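The "validation after the map" idea from the comment above could be sketched roughly as follows. The helper name `map_strict` is hypothetical and not part of pandas; the sketch assumes the mapping is a plain dict and checks the keys directly, so values that are deliberately mapped to a missing value are not flagged.

```python
import pandas as pd


def map_strict(s: pd.Series, mapping: dict) -> pd.Series:
    """Hypothetical helper (not a pandas API): map, but fail on uncovered values."""
    # Non-missing input values that have no key in the mapping
    uncovered = s[s.notna() & ~s.isin(list(mapping))]
    if not uncovered.empty:
        raise KeyError(
            f"values not covered by mapping: {uncovered.unique().tolist()}"
        )
    return s.map(mapping)


# Usage: every value is covered, so this behaves like a plain s.map(d)
result = map_strict(pd.Series(["a", "b"]), {"a": 1, "b": 2})
```

Keeping the check outside `Series.map` means no new parameter is needed, which is the concern raised in the comment.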
Thanks for your reply @datapythonista!
I agree on that, the "na_action" is not a fitting argument name for what I implemented, since the argument relates to the original (series) data and not the keys of the mapping.
I agree on that as well, libraries like
In addition to the points you made: a novice user won't activate the proposed
So in summary, after weeks away from my proposal, I agree - I am also "-1" on the
Do you think raising a warning would be a good alternative approach? Especially in exploratory contexts (in which a developer looks at the data in a notebook) such a warning would be helpful, to make sure that the user does not miss a value in the mapping. I think warnings help to develop more stable code, because people start looking into their data more deeply and take a look at their code, especially if parts of it are LLM generated. What do you guys think about the "warning-only-without-any-additional-arguments" approach in case the mapping does not cover the original data?
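For comparison, the warning-only behaviour asked about above might look something like this from the user's side. This is only a sketch: `map_warn` is a hypothetical wrapper, the choice of `UserWarning` is an assumption, and nothing here reflects how pandas itself would implement it.

```python
import warnings

import pandas as pd


def map_warn(s: pd.Series, mapping: dict) -> pd.Series:
    """Hypothetical wrapper (not a pandas API): map, warning on uncovered values."""
    # Non-missing input values that have no key in the mapping
    uncovered = s[s.notna() & ~s.isin(list(mapping))].unique().tolist()
    if uncovered:
        warnings.warn(
            f"mapping does not cover these values, they become NaN: {uncovered}",
            UserWarning,  # assumed category, chosen for illustration only
            stacklevel=2,
        )
    return s.map(mapping)
```

Unlike an exception, the warning leaves the result unchanged, so existing code keeps running while the user is pointed at the uncovered values.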
Addresses #14210
Users often call s.map() blindly (without checking the original series beforehand, because they trust their expectations about its contents) and are surprised by downstream errors caused by the implicit replacement of values with np.nan when they are not part of the mapping dictionary. A typo in the input data is enough to cause this kind of unwanted trouble.
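A minimal illustration of the behaviour described above, using made-up data (the series values and the dictionary are assumptions for this sketch, not the example from the original description):

```python
import pandas as pd

# "dgo" is a typo in the input data and has no key in the mapping
s = pd.Series(["cat", "dog", "dgo"])
d = {"cat": "feline", "dog": "canine"}

mapped = s.map(d)

# The typo is silently turned into a missing value instead of raising
print(mapped.isna().tolist())  # [False, False, True]
```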
This PR adds the possibility to raise an error via s.map(d, na_action='raise') if the mapping does not cover all values in the array.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.