documentation of read_csv producing NaN floats in string column #20875
Comments
Most of the linked issues have been closed and the read_csv documentation touches on this information in the Maybe I'm just misunderstanding what you are asking for - do you have a code example of the behavior you feel isn't documented?
Closed by #20895.
FWIW I'm a complete Python/pandas noob and I just hit this. Although the docs do mention It seems counter-intuitive when setting the column type to string that the column ends up with floats (they're NaNs, but that's still breaking for something expecting strings), but if it's intended, I don't think the docs make it very clear (at least for the inexperienced like me). Edit: Passing
@DanTup Here's how to make it easy for the developers to accept this change:
There are quite a few issue reports about the surprising behaviour of read_csv with dtype=str when it reads empty cells and/or cells containing one of the special NaN strings, such as n/a, null and NA. This behaviour should be documented in the description of the "dtype" parameter, since that is what most people who run into a type error will read. Ideally, there should also be pointers to workarounds; for example, setting na_values = [] and keep_default_na = False seems to fix the problem with non-empty strings.
Related (incomplete list): issue #4849, issue #10205, issue #10647, issue #16569, issue #15669
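A minimal sketch of the behaviour and the workaround described above. The CSV content and the variable names are made up for illustration; only `read_csv`, `dtype`, `na_values` and `keep_default_na` come from the report itself.

```python
import io

import pandas as pd

# Hypothetical sample data: one cell holds the special string "NA",
# one cell is empty, and one holds an ordinary string.
csv_text = "name,code\nalice,NA\nbob,\ncarol,x1\n"

# Default NA handling: even with dtype=str, the "NA" cell and the
# empty cell are read as missing values (NaN), not as strings.
default = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(default["code"].isna().tolist())

# Workaround from the report: turn off the default NA strings and
# pass no custom ones, so every cell is kept as the text it contains.
raw = pd.read_csv(
    io.StringIO(csv_text),
    dtype=str,
    na_values=[],
    keep_default_na=False,
)
print(raw["code"].tolist())
```

With the second call, code that expects every value in the column to be a `str` should no longer trip over float NaNs. The trade-off is that genuinely missing cells then have to be detected by comparing against the empty string rather than with `isna()`.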