Mention NaN handling in dtype description #20895

jowagner · 2018-05-01T08:06:29Z

To achieve preservation and avoid interpretation of string or object dtypes, NaN value interpretation must be switched off.

closes issue #20875, "documentation of read_csv producing NaN floats in string column"
tests passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
pandas.read_csv.html renders text as intended
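
For readers skimming this PR, here is a minimal sketch of the behavior the description refers to. The CSV text and column names are made up for illustration and are not taken from issue #20875. It compares `dtype=str` alone against `dtype=str` combined with `keep_default_na=False`.

```python
import io

import pandas as pd

# Hypothetical sample data: the second column contains the literal
# string "NA" and an empty field.
CSV_TEXT = "id,code\n1,NA\n2,abc\n3,\n"

# dtype=str alone: default NA markers ("NA", empty field, ...) are still
# interpreted as missing values.
default = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str)

# dtype=str with keep_default_na=False and no na_values: no field is
# interpreted as missing, so the raw strings are preserved.
preserved = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str, keep_default_na=False)

print(default["code"].isna().tolist())
print(preserved["code"].tolist())
```

The first call marks the "NA" and empty entries as missing, while the second keeps them as the strings `"NA"` and `""`.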

codecov · 2018-05-01T09:54:23Z

Codecov Report

Merging #20895 into master will increase coverage by 0.04%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #20895      +/-   ##
==========================================
+ Coverage   91.78%   91.82%   +0.04%     
==========================================
  Files         153      153              
  Lines       49341    49491     +150     
==========================================
+ Hits        45287    45446     +159     
+ Misses       4054     4045       -9

Flag	Coverage Δ
#multiple	`90.22% <ø> (+0.04%)`	⬆️
#single	`41.85% <ø> (-0.1%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/parsers.py	`95.46% <ø> (ø)`	⬆️
pandas/core/dtypes/missing.py	`91.95% <0%> (-0.99%)`	⬇️
pandas/core/window.py	`96.27% <0%> (-0.02%)`	⬇️
pandas/core/series.py	`94.02% <0%> (-0.01%)`	⬇️
pandas/core/reshape/pivot.py	`96.97% <0%> (ø)`	⬆️
pandas/core/indexes/interval.py	`93.08% <0%> (ø)`	⬆️
pandas/core/reshape/reshape.py	`100% <0%> (ø)`	⬆️
pandas/core/panel.py	`97.29% <0%> (ø)`	⬆️
pandas/core/reshape/merge.py	`94.25% <0%> (ø)`	⬆️
pandas/core/frame.py	`97.22% <0%> (ø)`	⬆️
... and 14 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 28edd06...53dd911.

jreback · 2018-05-01T10:27:14Z

pandas/io/parsers.py

@@ -125,7 +125,8 @@
    are duplicate names in the columns.
 dtype : Type name or dict of column -> type, default None
    Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}
-    Use `str` or `object` to preserve and not interpret dtype.
+    Use `str` or `object` together with passing `keep_default_na=False` and


This is too prescriptive here; rather, you can put a pointer to the NA Values section.

Thanks for feedback. I made a new commit to my branch trying to address the issue.

@jreback

Being less prescriptive about how to use dtype=str, as per suggestion from @jreback

gfyoung · 2018-05-08T14:43:38Z

pandas/io/parsers.py

@@ -125,7 +125,8 @@
    are duplicate names in the columns.
 dtype : Type name or dict of column -> type, default None
    Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}
-    Use `str` or `object` to preserve and not interpret dtype.
+    Use `str` or `object` together with suitable `na_values` settings
+    to preserve and not interpret dtype.


Can we add a newline below this?

I agree that a newline would be good here but this is orthogonal to the issue addressed in this PR. I'd think the newline should be added in a separate PR. Can anybody please provide a second opinion?

It's minor enough that I wouldn't consider that an issue.
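
As a rough illustration of what "suitable `na_values` settings" in the revised wording could mean in practice, the sketch below turns off the default NA markers and names a single custom marker instead. The sample data and the choice of `"-"` as the missing-value marker are assumptions made for this example only.

```python
import io

import pandas as pd

# Hypothetical sample data: "NA" is a legitimate code here, while "-"
# is the only marker meant to denote a missing value.
CSV_TEXT = "id,code\n1,NA\n2,abc\n3,-\n"

# Disable the built-in NA markers and list only the custom one, so that
# "NA" survives as a string and "-" becomes a missing value.
custom = pd.read_csv(
    io.StringIO(CSV_TEXT),
    dtype=str,
    keep_default_na=False,
    na_values=["-"],
)

print(custom["code"].isna().tolist())
print(custom["code"][0])
```

With these settings only the `"-"` entry is treated as missing, and the string `"NA"` in the first row is kept as-is.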

jreback · 2018-05-10T10:16:05Z

pandas/io/parsers.py

@@ -125,7 +125,8 @@
    are duplicate names in the columns.
 dtype : Type name or dict of column -> type, default None
    Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}
-    Use `str` or `object` to preserve and not interpret dtype.
+    Use `str` or `object` together with suitable `na_values` settings
+    to preserve and not interpret dtype.


can you add the same text in io.rst as well

Of course. Done. (I previously assumed the doc is generated from .py.)

Apply changes to the dtype description in io/parsers.py (mentioning NaN handling) also to io.rst.

Not sure what role to use with a parameter reference. Using a literal for the moment.

TomAugspurger · 2018-05-14T19:16:06Z

The CircleCI failure is fixed on master.

Thanks @jowagner.

jowagner · 2018-05-15T08:44:56Z

And thanks for your patience while I learned how to do this.

Mention NaN handling in dtype description

e89f065

To achieve preservation and avoid interpretation of string or object dtypes, NaN value interpretation must be switched off.

jreback added the Docs and IO CSV (read_csv, to_csv) labels May 1, 2018

jreback requested changes May 1, 2018

reduce detail in reference to NaN settings

cdcb54b

Being less prescriptive about how to use dtype=str, as per suggestion from @jreback

gfyoung reviewed May 8, 2018

jreback added this to the 0.23.0 milestone May 10, 2018

jreback requested changes May 10, 2018

jowagner added 2 commits May 10, 2018 15:16

mention NaN handling in dtype description

e51aef7

Apply changes to the dtype description in io/parsers.py (mentioning NaN handling) also to io.rst.

Fix rst usage error in io.rst

53dd911

Not sure what role to use with a parameter reference. Using a literal for the moment.

TomAugspurger merged commit 83a46ca into pandas-dev:master May 14, 2018

TomAugspurger mentioned this pull request May 14, 2018

documentation of read_csv producing NaN floats in string column #20875

Closed
