ENH: Allow `s.map(d, na_action='raise')` #60482

kopytjuk · 2024-12-03T18:30:07Z

Addresses #14210

Often users use s.map() blindly (without checking the original series beforehand due to their expectations) and are surprised by downstream errors which are caused by the implicit replacement of values to np.nan if they are not part of the mapping dictionary:

import pandas as pd

s = pd.Series(["cat", "dog", "mouse"])
d = {"cat": "kitten", "dog": "puppy"}

res = s.map(d)

print(res)
# 0    kitten
# 1     puppy
# 2       NaN
# dtype: object

This example here also shows how a typo in the input data can cause unwanted trouble.

This PR adds a possibility to raise an error via s.map(d, na_option='raise'), if the mapping does not cover all values in the array.

Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

github-actions · 2025-01-03T00:07:16Z

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

kopytjuk · 2025-03-07T08:26:53Z

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

Still interested!

kopytjuk · 2025-03-08T10:46:38Z

Sorry for pinging @mroeschke - can you please spend a quick glimpse on that, if the design and the goal is worth following ... thank you in advance!

datapythonista · 2025-04-20T11:40:23Z

Sorry @kopytjuk nobody checked this earlier. While I'm ok to find a solution to the use case being addressed here (raise when the dict in map doesn't contain all the values in the series/dataframe), I'm -1 in the approach here.

The main reason is that I think we should get rid of the na_action parameter. Which I find cumbersome, and reading the issue descriptions confirms that not even very experienced pandas core devs understand what the parameter does without proper research.

Second, na_action refers to the action to so when missing values are found in the data. So, na_action="ignore" means that missing values should be ignored and not passed to the function. If things are logical and consistent, to me na_action="raise" would mean that if missing values are found in the original data, we would raise. Which is different to what's being implemented here. Adding even more confusion to an already confusing API.

I'm personally not a big fan of adding a new parameter to .map. In particular because .map is already two functions in one, mapping with a dictionary, and applying a function elementwise, and the new parameter would be tricky when considered for both cases. But still this would be a better option than using na_action.

I think the best solution here is that you have a validation after the map to ensure not missing values are there. I don't think there is a pure pandas one-liner to do it, and maybe adding it is not a bad idea. But I think there are third party libraries to ensure data consistency in pandas, maybe that's the way to go.

What do you think?

kopytjuk · 2025-04-20T17:17:35Z

Thanks for your reply @datapythonista!

If things are logical and consistent, to me na_action="raise" would mean that if missing values are found in the original data, we would raise. Which is different to what's being implemented here.

I agree on that, the "na_action" is not a fitting argument name for what I implemented, since the argument relates to the original (series) data and not the keys of the mapping.

I think the best solution here is that you have a validation after the map to ensure not missing values are there. I don't think there is a pure pandas one-liner to do it, and maybe adding it is not a bad idea. But I think there are third party libraries to ensure data consistency in pandas, maybe that's the way to go.

I agree on that as well, libraries like pandera are often used in data engineering use-cases to ensure that the resulting dataframe (after a chain of operations) follows a certain schema (column names, datatypes and existence of NaNs).

In addition to your points you made: a novice user won't activate the proposed na_action="raise" argument anyway, which he does not understand (as you stated, the understanding of the mechanics na_action is not straighforward) ...

So in summary, after weeks of distance to my proposal, I agree - I am also "-1" on the na_action="raise" approach.

Do you think, raising a warning would be a good alternative approach? Especially in explorative contexts (in which a developer looks at the data in a notebook) such a warning would be helpful, to make sure that the user does not miss a value in the mapping. I think warnings help to develop a more stable code, because people start looking into their data more deeply and take a look at their code, especially if parts of it are LLM generated.

What do you guys think about the "warning-only-without-any-additional-arguments" approach in case the mapping does not cover the original data?

github-actions bot added the Stale label Jan 3, 2025

kopytjuk added 5 commits March 7, 2025 08:15

implement na_action='raise' to map_array()

27a3188

Remove whitespace

0931d08

Update quotes

bc31e19

Apply ruff

071a242

Add tests and better docstring

5efce7b

kopytjuk force-pushed the add-raise-to-series-map branch from 756b49f to 5efce7b Compare March 7, 2025 08:05

kopytjuk added 5 commits March 7, 2025 09:08

Revert whitespace changes

ad9aab4

Revert other changes

aff9b7b

Adapt typing

4dc160b

Update error message

75b2b9c

fix typo

a2b8cff

kopytjuk changed the title ~~ENH: Implement map_array(na_action='raise')~~ ENH: Implement s.map(d, na_action='raise') Mar 7, 2025

kopytjuk changed the title ~~ENH: Implement s.map(d, na_action='raise')~~ ENH: Allow s.map(d, na_action='raise') Mar 7, 2025

Run ruff

0903d15

kopytjuk mentioned this pull request Mar 7, 2025

Option to Raise Exception on Unmapped Value in Series.map #14210

Open

Update na_action for dataframe map

16545e3

datapythonista added Apply Apply, Aggregate, Transform, Map and removed Stale labels Apr 20, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: Allow `s.map(d, na_action='raise')` #60482

ENH: Allow `s.map(d, na_action='raise')` #60482

kopytjuk commented Dec 3, 2024 •

edited

Loading

github-actions bot commented Jan 3, 2025

kopytjuk commented Mar 7, 2025

kopytjuk commented Mar 8, 2025

datapythonista commented Apr 20, 2025

kopytjuk commented Apr 20, 2025 •

edited

Loading

ENH: Allow s.map(d, na_action='raise') #60482

Are you sure you want to change the base?

ENH: Allow s.map(d, na_action='raise') #60482

Conversation

kopytjuk commented Dec 3, 2024 • edited Loading

github-actions bot commented Jan 3, 2025

kopytjuk commented Mar 7, 2025

kopytjuk commented Mar 8, 2025

datapythonista commented Apr 20, 2025

kopytjuk commented Apr 20, 2025 • edited Loading

ENH: Allow `s.map(d, na_action='raise')` #60482

ENH: Allow `s.map(d, na_action='raise')` #60482

kopytjuk commented Dec 3, 2024 •

edited

Loading

kopytjuk commented Apr 20, 2025 •

edited

Loading