Skip to content

BUG: Patch handling no NA values in TextFileReader #15881

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

gfyoung
Copy link
Member

@gfyoung gfyoung commented Apr 3, 2017

When cleaning na_values during initialization of TextFileReader, we return a list whenever we specify that na_values should be empty. However, the rest of the code expects a set.

Closes #15835.

@gfyoung
Copy link
Member Author

gfyoung commented Apr 3, 2017

@jreback : I checked on my machine that the issue originally presented in #15835 has been resolved, but any good way to test this without incorporating the Excel file into the code base (or should we leave alone and just use the test I wrote)?

@jreback jreback added Bug IO CSV read_csv, to_csv labels Apr 3, 2017
@@ -1011,6 +1011,7 @@ I/O
- Bug in ``DataFrame.to_stata()`` and ``StataWriter`` which produces incorrectly formatted files to be produced for some locales (:issue:`13856`)
- Bug in ``StataReader`` and ``StataWriter`` which allows invalid encodings (:issue:`15723`)
- Bug in ``pd.to_json()`` for the C engine where rollover was not correctly handled for case where frac is odd and diff is exactly 0.5 (:issue:`15716`, :issue:`15864`)
- Bug in ``pd.read_csv()`` when an index was specified and no values were specified as null values (:issue:`15835`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put with the other pd.read_csv bugs (at the top of this section, IIRC)

Copy link
Member Author

@gfyoung gfyoung Apr 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on your comment below, I'll leave this alone, but yes, you are correct that the rest are at the top of this section.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, basically just trying to organize these for easy reading. IOW, trying to put similar reports next to each other if possible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having had to patch my test (comment below), I fixed this too.

@jreback jreback added this to the 0.20.0 milestone Apr 3, 2017
@jreback
Copy link
Contributor

jreback commented Apr 3, 2017

lgtm. I can fix the whatsnew on merge.

@jreback
Copy link
Contributor

jreback commented Apr 3, 2017

@jreback : I checked on my machine that the issue originally presented in #15835 has been resolved, but any good way to test this without incorporating the Excel file into the code base (or should we leave alone and just use the test I wrote)?

no this is fine.

@jreback
Copy link
Contributor

jreback commented Apr 3, 2017

interesting this breaks some tests.

@gfyoung
Copy link
Member Author

gfyoung commented Apr 3, 2017

interesting this breaks some tests.

Yeah, that's the test I just wrote :/ Hmm...need to figure this out.

@gfyoung gfyoung force-pushed the keep-default-na-excel branch from ff7eb61 to 0bb6f64 Compare April 3, 2017 18:41
@@ -995,6 +995,7 @@ I/O
- Bug in ``pd.read_csv()`` for the C engine where ``usecols`` were being indexed incorrectly with ``parse_dates`` (:issue:`14792`)
- Bug in ``pd.read_csv()`` with ``parse_dates`` when multiline headers are specified (:issue:`15376`)
- Bug in ``pd.read_csv()`` with ``float_precision='round_trip'`` which caused a segfault when a text entry is parsed (:issue:`15140`)
- Bug in ``pd.read_csv()`` when an index was specified and no values were specified as null values (:issue:`15835`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thxs

@jreback
Copy link
Contributor

jreback commented Apr 3, 2017

random failure on osx build so I restarted. ping on green.

@codecov
Copy link

codecov bot commented Apr 3, 2017

Codecov Report

Merging #15881 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #15881      +/-   ##
==========================================
- Coverage   90.99%   90.97%   -0.02%     
==========================================
  Files         143      143              
  Lines       49477    49477              
==========================================
- Hits        45020    45011       -9     
- Misses       4457     4466       +9
Flag Coverage Δ
#multiple 88.73% <100%> (ø) ⬆️
#single 40.63% <0%> (-0.11%) ⬇️
Impacted Files Coverage Δ
pandas/io/parsers.py 95.52% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.56% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ca7207f...0bb6f64. Read the comment docs.

@gfyoung
Copy link
Member Author

gfyoung commented Apr 3, 2017

@jreback : All green and ready to merge.

@jreback jreback closed this in ff652a5 Apr 3, 2017
@jreback
Copy link
Contributor

jreback commented Apr 3, 2017

thanks!

@gfyoung gfyoung deleted the keep-default-na-excel branch April 3, 2017 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO CSV read_csv, to_csv
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants