Inconsistency on dtype and fillna #49251
Comments
Thanks for the report! If you do
you get
The call to For your other source of null values, this is occurring because there is e.g. no column q3 in the 2nd DataFrame. This happens after the call to fillna, and should be expected.
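A minimal sketch of the ordering described here, using two small hypothetical frames rather than the reporter's data: `fillna` runs on each frame first, and the later `pd.concat` creates a new null where the second frame has no `q3` column.

```python
import pandas as pd

# Hypothetical frames (not the data from the report); the second has no "q3".
a = pd.DataFrame({"q1": ["1"], "q3": ["0"]}).fillna("EMPTY")
b = pd.DataFrame({"q1": ["2"]}).fillna("EMPTY")

# fillna has already run on each piece, so the null produced by
# aligning the columns during concat is left untouched.
combined = pd.concat({"a": a, "b": b}, axis=0, sort=False)
print(combined)

# Filling after the concat reaches the nulls that concat introduced.
filled = combined.fillna("EMPTY")
print(filled)
```

Moving `fillna` after the concat is one possible way to get a uniform placeholder; whether that is the desired result depends on the use case.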
Assuming current behavior is to remain as is, perhaps this can be better documented as
I tried it with no missing value in the first row (added 'status':'bad') and got the same result.
If a value isn't missing in any row, then there is no coercion to float - they remain integers. If nulls appear because of the concat, they are not coerced to floats because at this point they are strings and not integers. You're combining three operations here - (a) DataFrame construction via
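The column-wise coercion can be seen in a small sketch. The dictionary below is hypothetical (the reporter's dictionary is not reproduced on this page), and the string conversion is done with `astype(str)` as a separate step to make the intermediate dtypes visible.

```python
import pandas as pd

# Hypothetical nested dict: row "r1" has no "q2" entry.
rows = {
    "r0": {"q1": 995, "q2": 7633},
    "r1": {"q1": 10},
}

df = pd.DataFrame.from_dict(rows, orient="index")

# "q1" has a value in every row and stays integer;
# "q2" has a missing value, so the whole column becomes float.
print(df.dtypes)

# Converting to string afterwards keeps the float formatting.
as_text = df.astype(str)
print(as_text.loc["r0", "q1"], as_text.loc["r0", "q2"])
```

This matches the report's observation that one value prints as an integer while another in the same row prints with a trailing `.0`.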
I see, the "issue" is column-wise not row-wise, now I understand what is going on.
Yes - that's correct. Each column in a pandas DataFrame has a single dtype - you can see what the dtype is for each column with If there is no remaining issue for you here, please feel free to close.
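The inline code in the comment above did not survive on this page. As an illustration only, `DataFrame.dtypes` is one standard way to inspect the per-column dtype; the frame below is hypothetical.

```python
import pandas as pd

# Hypothetical frame with one integer, one string, and one boolean column.
df = pd.DataFrame(
    {
        "q1": [995, 10],
        "status": ["good", "bad"],
        "drop": [True, False],
    }
)

# One dtype per column, returned as a Series indexed by column name.
print(df.dtypes)
```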
On second thought, I think the docs could be improved a little here as noted in #49251 (comment). Leaving this open.
Hi, how can I work on the documentation related to this issue?
For general information on contributing, I would recommend: https://pandas.pydata.org/pandas-docs/dev/development/index.html For this specific issue, #49251 (comment) describes how I think the docs could be improved.
@rhshadrach Thank you so much. I will explore more from the link. :-)
take
Python 3.9.7
Pandas: 1.3.5
Same result in
Python 3.9.12
Pandas 1.4.2
I use the following dictionary to create a df; all numbers are integers, 'drop' is boolean, and 'status' is a string:
using the following code:
df = pd.concat({k: pd.DataFrame.from_dict(v, 'index', dtype=str).fillna('EMPTY') for k, v in dct.items()}, axis=0, sort=False).reset_index()
I get the following DataFrame:
In line 0, q3 and e2 were both 0; now one is 0 and the other is 0.0.
In line 0, q1 = 995 is an int, but q2 = 7633.0 is a float, before being set as a string.
In lines 2 and 3, some NaN values were filled with "EMPTY" while others were not.