Mention NaN handling in dtype description #20895
To achieve preservation and avoid interpretation of string or object dtypes, NaN value interpretation must be switched off.
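The behaviour described above can be reproduced with a small sketch. The CSV content and column names are made up for illustration; only `pd.read_csv`, `dtype` and `keep_default_na` come from pandas itself.

```python
import io

import pandas as pd

# Hypothetical sample data: "NA" and "null" are among pandas' default NA markers.
csv_text = "id,code\n1,NA\n2,null\n3,abc\n"

# dtype=str alone: fields matching a default NA marker are still read as missing.
default_df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Switching off the default NA markers keeps every field as the literal string.
raw_df = pd.read_csv(io.StringIO(csv_text), dtype=str, keep_default_na=False)

print(default_df["code"].isna().tolist())  # which rows were read as missing
print(raw_df["code"].tolist())             # literal strings preserved
```

With the first call, the `NA` and `null` fields come back as missing values even though `dtype=str` was requested; with the second, they stay as the strings `"NA"` and `"null"`.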
Codecov Report
@@ Coverage Diff @@
## master #20895 +/- ##
==========================================
+ Coverage 91.78% 91.82% +0.04%
==========================================
Files 153 153
Lines 49341 49491 +150
==========================================
+ Hits 45287 45446 +159
+ Misses 4054 4045 -9
pandas/io/parsers.py
Outdated
@@ -125,7 +125,8 @@
  are duplicate names in the columns.
  dtype : Type name or dict of column -> type, default None
  Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}
- Use `str` or `object` to preserve and not interpret dtype.
+ Use `str` or `object` together with passing `keep_default_na=False` and
This is too prescriptive here; rather, you can put a pointer to the NA Values section.
Thanks for feedback. I made a new commit to my branch trying to address the issue.
Be less prescriptive about how to use dtype=str, as suggested by @jreback
@@ -125,7 +125,8 @@
  are duplicate names in the columns.
  dtype : Type name or dict of column -> type, default None
  Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}
- Use `str` or `object` to preserve and not interpret dtype.
+ Use `str` or `object` together with suitable `na_values` settings
+ to preserve and not interpret dtype.
Can we add a newline below this?
I agree that a newline would be good here but this is orthogonal to the issue addressed in this PR. I'd think the newline should be added in a separate PR. Can anybody please provide a second opinion?
It's minor enough that I wouldn't consider that an issue.
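As a rough illustration of what "suitable `na_values` settings" in the proposed docstring wording might look like, the sketch below disables the built-in markers and names a single custom one. The data, the column names and the `"-"` marker are assumptions chosen for this example, not something taken from the PR.

```python
import io

import pandas as pd

# Hypothetical sample data: "NA" and "null" are real values here, "-" means missing.
csv_text = "name,grade\nNA,-\nnull,B\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    dtype=str,
    keep_default_na=False,  # ignore the built-in markers such as "NA" and "null"
    na_values=["-"],        # hypothetical marker: only "-" is treated as missing
)

print(df["name"].tolist())          # strings kept as written in the file
print(df["grade"].isna().tolist())  # only the "-" field is read as missing
```

Combining `keep_default_na=False` with an explicit `na_values` list means only the listed markers are parsed as missing, so string columns keep values that would otherwise collide with the defaults.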
Can you add the same text in io.rst as well?
Of course. Done. (I previously assumed the doc is generated from .py.)
Apply the changes to the dtype description in pandas/io/parsers.py (mentioning NaN handling) to io.rst as well.
Not sure what role to use with a parameter reference. Using a literal for the moment.
CircleCI failures are fixed on master. Thanks @jowagner.
And thanks for your patience while I learned how to do this.
To achieve preservation and avoid interpretation of string or object dtypes, NaN value interpretation must be switched off.
git diff upstream/master -u -- "*.py" | flake8 --diff
pandas.read_csv.html
renders text as intended