Mention NaN handling in dtype description #20895

Merged: 4 commits, May 14, 2018

Conversation

jowagner
Contributor

@jowagner jowagner commented May 1, 2018

To achieve preservation and avoid interpretation of string or object dtypes, NaN value interpretation must be switched off.
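
As an illustration of the behaviour described above, here is a minimal sketch, assuming a reasonably recent pandas install. The column names and sample data are made up for the example; only `read_csv`, `dtype` and `keep_default_na` come from the discussion.

```python
import io

import pandas as pd

# Hypothetical sample data: the "code" column contains the literal
# string "NA" and an empty field.
csv_text = "id,code\n1,NA\n2,abc\n3,\n"

# dtype=str on its own: "NA" and the empty field are still treated
# as missing values.
default = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Switching off the default NaN interpretation keeps every field as
# the string that appears in the file.
preserved = pd.read_csv(
    io.StringIO(csv_text), dtype=str, keep_default_na=False
)

print(default["code"].isna().tolist())
print(preserved["code"].tolist())
```

The first call shows why `dtype=str` alone does not preserve the input: the missing-value detection runs regardless of the requested dtype.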

@codecov
codecov bot commented May 1, 2018

Codecov Report

Merging #20895 into master will increase coverage by 0.04%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #20895      +/-   ##
==========================================
+ Coverage   91.78%   91.82%   +0.04%     
==========================================
  Files         153      153              
  Lines       49341    49491     +150     
==========================================
+ Hits        45287    45446     +159     
+ Misses       4054     4045       -9
Flag         Coverage Δ
#multiple    90.22% <ø> (+0.04%) ⬆️
#single      41.85% <ø> (-0.1%) ⬇️

Impacted Files                     Coverage Δ
pandas/io/parsers.py               95.46% <ø> (ø) ⬆️
pandas/core/dtypes/missing.py      91.95% <0%> (-0.99%) ⬇️
pandas/core/window.py              96.27% <0%> (-0.02%) ⬇️
pandas/core/series.py              94.02% <0%> (-0.01%) ⬇️
pandas/core/reshape/pivot.py       96.97% <0%> (ø) ⬆️
pandas/core/indexes/interval.py    93.08% <0%> (ø) ⬆️
pandas/core/reshape/reshape.py     100% <0%> (ø) ⬆️
pandas/core/panel.py               97.29% <0%> (ø) ⬆️
pandas/core/reshape/merge.py       94.25% <0%> (ø) ⬆️
pandas/core/frame.py               97.22% <0%> (ø) ⬆️
... and 14 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 28edd06...53dd911.

@jreback jreback added Docs IO CSV read_csv, to_csv labels May 1, 2018
@@ -125,7 +125,8 @@
 are duplicate names in the columns.
 dtype : Type name or dict of column -> type, default None
 Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}
-Use `str` or `object` to preserve and not interpret dtype.
+Use `str` or `object` together with passing `keep_default_na=False` and
Contributor

this is too prescriptive here, rather you can put a pointer to the NA Values section

Contributor Author

@jowagner jowagner May 1, 2018

Thanks for feedback. I made a new commit to my branch trying to address the issue.

Being less prescriptive about how to use dtype=str, as per suggestion from @jreback
@@ -125,7 +125,8 @@
 are duplicate names in the columns.
 dtype : Type name or dict of column -> type, default None
 Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}
-Use `str` or `object` to preserve and not interpret dtype.
+Use `str` or `object` together with suitable `na_values` settings
+to preserve and not interpret dtype.
Member

Can we add a newline below this?

Contributor Author

I agree that a newline would be good here but this is orthogonal to the issue addressed in this PR. I'd think the newline should be added in a separate PR. Can anybody please provide a second opinion?

Member

It's minor enough that I wouldn't consider that an issue.

@jreback jreback added this to the 0.23.0 milestone May 10, 2018
@@ -125,7 +125,8 @@
 are duplicate names in the columns.
 dtype : Type name or dict of column -> type, default None
 Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}
-Use `str` or `object` to preserve and not interpret dtype.
+Use `str` or `object` together with suitable `na_values` settings
+to preserve and not interpret dtype.
Contributor

can you add the same text in io.rst as well

Contributor Author

Of course. Done. (I previously assumed the doc is generated from .py.)
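
For readers of the revised wording, "suitable `na_values` settings" could look like the sketch below. This is only one possible reading, not the text added to the docs: the column names, the sample rows and the `"missing"` marker are hypothetical, and it assumes a reasonably recent pandas install.

```python
import io

import pandas as pd

# Hypothetical data: "NA" is a real value here (a country code),
# while "missing" is the file's own marker for absent data.
csv_text = "name,country\nAlice,NA\nBob,missing\nCarol,DE\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"name": str, "country": str},
    keep_default_na=False,  # do not apply the built-in NaN strings
    na_values=["missing"],  # treat only this marker as missing
)

print(df["country"].isna().tolist())
print(df.loc[0, "country"])
```

With these settings the `"NA"` entry survives as a string and only the explicitly listed marker is converted to a missing value.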

jowagner added 2 commits May 10, 2018 15:16
Apply changes to the dtype description in io/parsers.py (mentioning NaN handling) also to io.rst.
Not sure what role to use with a parameter reference. Using a literal for the moment.
@TomAugspurger
Contributor

The CircleCI failures are fixed on master.

Thanks @jowagner.

@jowagner
Contributor Author

And thanks for your patience while I learned how to do this.

Labels
Docs IO CSV read_csv, to_csv
4 participants