QST: How to solve pandas (2.2.0) "FutureWarning: Downcasting behavior in replace
is deprecated" on a Series?
#57734
Comments
Just do
as suggested on the Stack Overflow question. The Series will retain object dtype in pandas 3.0 instead of being cast to int64. |
But doesn't this just silence the message without changing the behavior? To my understanding, the behavior is the problem and needs to be solved. Or not? |
I'm having this problem as well. I have the feeling it's related to
s = Series(['foo', 'bar'])
replace_dict = {'foo': '1', 'bar': '2'}  # replacements maintain original types
s = s.replace(replace_dict)
makes the warning go away. I agree with @buhtz that setting the "future" option isn't really getting at the root of understanding how to make this right. I think the hard part for most of us who have relied on |
This is exactly the thing we are trying to solve. replace was previously casting your dtypes and will stop doing so in pandas 3 |
But it is unclear how to replace and cast. E.g. when I have
Is that the solution you have in mind? From a user's perspective it is a smelly workaround. The other way around is nearly impossible because I cannot cast a str word to an integer.
What is wrong with casting in replace()? |
One alternative (although I realise a non
import pandas as pd
df = pd.DataFrame(['male', 'male', 'female'], columns=['gender'])  # from the original example
genders = pd.Categorical(df['gender'])
df = df.assign(gender=genders.codes)
If semantically similar data is spread across multiple columns, it gets a little more involved:
import random
import numpy as np
import pandas as pd

def create_data(columns):
    genders = ['male', 'male', 'female']
    for i in columns:
        yield (i, genders.copy())
        random.shuffle(genders)

# Create the dataframe
columns = [f'gender_{x}' for x in range(3)]
df = pd.DataFrame(dict(create_data(columns)))

# Incorporate all relevant data into the categorical
view = (df
        .filter(items=columns)
        .unstack())
categories = pd.Categorical(view)
values = np.hsplit(categories.codes, len(columns))
to_replace = dict(zip(columns, values))
df = df.assign(**to_replace)
which I think is what the Categorical documentation is trying to imply. |
I got here, trying to understand what The message I get is from
Maybe the confusion arises from the way the message is phrased. I find it confusing, and it creates more questions than answers:
From what I understand, |
So... I did some digging and I think I have a better grasp of what's going on with this FutureWarning. So I wrote an article on Medium to explain what's happening. If you want to give it a read, here it is: Deciphering the cryptic FutureWarning for .fillna in Pandas 2
Long story short, do:
with pd.option_context('future.no_silent_downcasting', True):
    # Do your thing with fillna, ffill, bfill, replace... and possibly use infer_objects if needed
|
- In function `_cols_operation_balance_by_instrument_for_group` changed `prev_operation_balance[<colname>]` for `df.loc[prev_idx, <colname>]` as this is easier to understand, it shows that we are accessing the previous index value.
- Implemented the usage of `with pd.option_context('future.no_silent_downcasting', True):` for `.fillna()` to avoid unexpected downcasting. See pandas-dev/pandas#57734 (comment). Used throughout `cols_operation*` functions.
- Removed usage of `DataFrame.convert_dtypes()` as it doesn't simplify dtypes, it only passes to a dtype that supports pd.NA. See pandas-dev/pandas#58543.
- Added `DataFrame.infer_objects()` when returning the ledger or `cols_operation*` functions to try to avoid objects if possible.
- Changed the structure for `cols_operation*` functions:
  - Added a verification of `self._ledger_df`, if empty the function returns an empty DataFrame with the structure needed. Allows for less computing if empty.
  - The way the parameter `show_instr_accnt` creates a return with columns ['instrument', 'account'] is structured the same way on all functions.
- Simplified how the empty ledger is created in `_create_empty_ledger_df`.
- Changes column name 'balance sell profit loss' to 'accumulated sell profit loss'.
- Minor code fixes.
- Minor formatting fixes.
I feel like this thread is starting to become a resource. In that spirit: I just experienced another case where
records = [
    {'a': ''},
    {'a': 12.3},
]
df = pd.DataFrame.from_records(records)
I would have first reached for
In [1]: df.dtypes
Out[1]:
a    object
dtype: object
In [2]: x = df.assign(a=lambda x: pd.to_numeric(x['a']))
In [3]: x
Out[3]:
      a
0   NaN
1  12.3
In [4]: x.dtypes
Out[4]:
a    float64
dtype: object
|
From your code:
I would do it like this, it feels a little cleaner and easier to read:
df['a'] = pd.to_numeric(df['a'])
You said you wanted to use
with pd.option_context('future.no_silent_downcasting', True):
    df2 = (df
           .replace('', float('nan'))  # Replace empty strings with NaN
           .infer_objects()            # Allow pandas to try to "infer better dtypes"
           )
df2.dtypes
# a    float64
# dtype: object
A note about
That would not work because
|
Explicitly do the conversion in two steps and the FutureWarning will go away. In the first step, do the replace with the numbers as strings, to match the original dtype. In the second step, convert the dtype to int. This will run without the warning even when you have not suppressed warnings. |
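The snippets for the two steps are missing from this comment. A minimal sketch of what they might look like, using the gender data quoted elsewhere in the thread and a hypothetical mapping ('male' to '0', 'female' to '1'); the warning capture is only there to show that no downcasting warning is raised:

```python
import warnings
import pandas as pd

df = pd.DataFrame(['male', 'male', 'female'], columns=['gender'])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    # Step 1: replace strings with strings, so the dtype does not change.
    as_text = df['gender'].replace({'male': '0', 'female': '1'})
    # Step 2: convert the dtype explicitly.
    codes = as_text.astype('int64')

downcast_warnings = [w for w in caught if 'Downcasting' in str(w.message)]
print(codes.tolist(), len(downcast_warnings))
```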
I got this because I was trying to filter a dataframe using the output from
What I would normally do:
df = pd.DataFrame({'A': ['1', '2', 'test', pd.NA]})
mask = df['A'].str.isnumeric().fillna(False)
What I need to do now:
df = pd.DataFrame({'A': ['1', '2', 'test', pd.NA]})
with pd.option_context('future.no_silent_downcasting', True):
    mask = df['A'].str.isnumeric().fillna(False)
The mask still seems to work without casting it to boolean. See the official deprecation notice in the release notes. Note that if you don't mind either way, the original code still works: it will silently downcast the dtype (with a warning) until pandas 3.0, and will preserve the dtype from pandas 3.0 onwards. |
It would be great if we could stop breaking changes. |
Hello folks,
This works well when going from string to int, but I struggle to go from string to bool:
I want to replace '' with False and 'X' with True. Trying to go from string to bool directly
going from string to int to bool works, but isn't there a better solution? I must be missing something obvious, right?
|
Arnaudno, you're doing more work than you need to. The boolean value of every string except the empty string is True (the empty string has a boolean value of False). So in your case, you don't need the replace at all. All you need is to convert the string to boolean and you will get the result you want.
df = pd.DataFrame({'a': ['', 'X', '', 'X', 'X']})
and you will get
0    False |
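The conversion line itself is missing from this comment. A minimal sketch of the idea, assuming it was an `astype(bool)` call; `dtype=object` is spelled out here as an assumption of the sketch so that the cast relies on plain Python truthiness:

```python
import pandas as pd

# dtype=object is an assumption of this sketch, so that the cast below
# uses the truthiness of each Python string.
df = pd.DataFrame({'a': ['', 'X', '', 'X', 'X']}, dtype=object)

flags = df['a'].astype(bool)
print(flags.tolist())  # [False, True, False, True, True]
```

Keep in mind that every non-empty string is truthy, so a value such as 'False' or a single space would also become True.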
Thank you very much @Data-Salad . |
Instead of .replace you can also use .map |
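A short sketch of the `.map` route, with a hypothetical mapping and an extra 'other' value added to show the main difference from `.replace`: values that are not in the mapping become NaN instead of being left unchanged.

```python
import pandas as pd

s = pd.Series(['male', 'male', 'female', 'other'])

# map() builds a new Series from the lookup; values missing from the
# mapping become NaN instead of being left as they are.
codes = s.map({'male': 0, 'female': 1})
print(codes.tolist())
```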
What is all this about? You say you offer support and help, but it is all documents and formulas,
with no introduction or explanation. Then in the end the information that had just
come to light gets changed yet again. My three phones now work no better than
something from the 1980s. My personal information can no longer be verified. Even my phone number:
right now I cannot prove that it is mine. I say so and I am not believed. There is no
cooperation or coordination. So does anyone know what I have been through over the past month???
|
To sum up, to resolve this warning, values of the same type should be used and then they can be cast to e.g.
replace_dict = {'foo': '2', 'bar': '4'}  # keys are str, so values must be str too
s = s.replace(replace_dict).astype(int)  # cast values to int |
This solution is not perfect for those of us who have empty rows in the column they are replacing (I have a column of occasional text labels which I was converting to ints for plotting purposes). In this case |
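The rest of this comment is cut off, so the author's own fix is unknown. One possible way to handle missing rows, sketched with made-up labels, is to keep the string-to-string replace and then parse with `pd.to_numeric`, which accepts missing values where a plain `astype(int)` would fail:

```python
import pandas as pd

# Hypothetical column of occasional text labels with empty rows.
s = pd.Series(['foo', None, 'bar', None], dtype=object)

# Replace strings with strings (no dtype change), then parse to numbers.
# Missing rows stay missing, so the result is float64 rather than int64.
as_float = pd.to_numeric(s.replace({'foo': '2', 'bar': '4'}))

# Optional: the nullable Int64 dtype keeps integers alongside <NA>.
as_int = as_float.astype('Int64')
print(as_float.tolist(), as_int.tolist())
```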
In my case, I had to convert strings to either bools or nan depending on the value of the string. Instead of suppressing the warning resulting from using replace(), I think it is generally better to use map() when you need to both change dtypes and deal with null values. Given:
I do:
Which, without any warnings, results in:
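The data, code and output of this comment are missing from the page. A minimal sketch of the `map()` approach it describes, with hypothetical labels ('yes', 'no', 'n/a') standing in for the real ones:

```python
import numpy as np
import pandas as pd

# Hypothetical labels; the comment's own data is not shown.
s = pd.Series(['yes', 'no', 'n/a', 'yes'])

# Spell out every outcome, including the one that should become missing.
flags = s.map({'yes': True, 'no': False, 'n/a': np.nan})
print(flags.tolist())
```

Any label that is left out of the mapping also ends up as NaN, so the explicit 'n/a' entry is there for readability rather than necessity.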
|
Research
I have searched the [pandas] tag on StackOverflow for similar questions.
I have asked my usage related question on StackOverflow.
Link to question on StackOverflow
https://stackoverflow.com/q/77995105/4865723
Question about pandas
Hello,
and please accept my apologies for asking this way. My Stack Overflow
question [1] was closed for, IMHO, no good reason. The linked duplicates
do not help me [2]. I also asked on the pydata mailing list [3], without response.
The example code below gives me this error using Pandas 2.2.0
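The example code itself is missing from this page. Based on the data that a later comment quotes "from the original example", a reproduction might look like the sketch below; the replacement mapping is an assumption. On pandas 2.2 this is the pattern that triggers the FutureWarning, while other versions may print no warning at all.

```python
import warnings
import pandas as pd

df = pd.DataFrame(['male', 'male', 'female'], columns=['gender'])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    # str -> int replacement: the pattern the warning is about.
    result = df['gender'].replace({'male': 0, 'female': 1})

for w in caught:
    print(w.category.__name__, w.message)
print(result.tolist(), result.dtype)
```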
I found several postings about this future warning. But my problem is I
don't understand why it happens and I also don't know how to solve it.
I am aware of other questions and answers [2] but I don't know how to
apply them to my own code. The reason might be that I do not understand
the cause of the error.
The linked answers use astype() before replacement. But again: I don't know how this could solve my problem.
Thanks in advance
Christian
[1] -- https://stackoverflow.com/q/77995105/4865723
[2] -- https://stackoverflow.com/q/77900971/4865723
[3] -- https://groups.google.com/g/pydata/c/yWbl4zKEqSE