BUG: Respect usecols even with empty data #12506

gfyoung · 2016-03-02T01:45:47Z

Fixes a bug in which the usecols argument was not being respected for empty data. This is because no filtering was applied when the first (and only) chunk was being read.
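A minimal sketch of the behavior this fix targets, assuming a recent pandas release in which the fix is present. The column names are illustrative and not taken from the PR.

```python
from io import StringIO

import pandas as pd

# Illustrative column names (assumption, not from the PR).
names = ["a", "b", "c"]
usecols = ["a", "c"]

# The input is empty, but `names` supplies the header, so the parser
# can still apply the usecols filter to the resulting empty frame.
df = pd.read_csv(StringIO(""), names=names, usecols=usecols)

print(list(df.columns))
print(len(df))
```

Before the fix, the empty result reportedly kept all of `names` because the filter was skipped for the first (and only) chunk.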

gfyoung · 2016-03-02T04:00:36Z

Tests are passing. Should be good to merge.

jreback · 2016-03-02T12:54:20Z

pandas/io/tests/test_parsers.py

+
+        def check_usecols(stringIO):
+            df = read_csv(stringIO, names=names, usecols=usecols)
+            assert_array_equal(df.columns, usecols)


NEVER use numpy.testing imports!!!!!

ALWAYS use pandas testing functions. These are much more comprehensive and fully test all behavior. Mixing in numpy.testing is also confusing to a reader.

Fully test vs an expected frame and use assert_frame_equal

I think I got it with the first exclamation point. ;) - fixed.
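A hedged sketch of what comparing against an expected frame can look like. It uses the public `pandas.testing` module from recent pandas releases (at the time of this PR the helpers lived in `pandas.util.testing`), and the data and column names are made up for illustration.

```python
from io import StringIO

import pandas as pd
import pandas.testing as tm

# Illustrative data and names (assumptions, not from the PR).
names = ["a", "b", "c"]
usecols = ["a", "c"]
data = "1,2,3\n4,5,6\n"

result = pd.read_csv(StringIO(data), names=names, usecols=usecols)

# Build the full expected frame rather than checking columns alone, so that
# values, dtypes, and the index are all compared.
expected = pd.DataFrame({"a": [1, 4], "c": [3, 6]})

# Raises AssertionError with a detailed diff if the frames differ.
tm.assert_frame_equal(result, expected)
```

Checking only `df.columns` would pass even if the values or dtypes were wrong, which is the gap the reviewer is pointing at.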

gfyoung · 2016-03-03T22:18:27Z

If someone could cancel build #18023 (it's an old build), that would be great. Thanks!

gfyoung · 2016-03-06T10:42:46Z

@jreback : Tests are passing once more. Should be good to merge AFAICT.

jreback · 2016-03-06T15:20:42Z

pandas/io/parsers.py

+                    while self.line_pos <= hr:
+                        line = self._next_line()
+
+                except StopIteration:


why are you adding all of this code and making this very complicated?

the idea is to make things simpler. pls see if you can find a general soln.

How complicated is this code? It's essentially boilerplate and logic from the C engine code. I'm not writing anything new here. I don't understand why people would have a hard time understanding this code but yet the C engine code is considered comprehensible.

jreback · 2016-03-06T15:41:21Z

@gfyoung you had to change a whole bunch of things to get your code to work (e.g. changing long established behavior in excel/html parsing), the StopIteration can now also be a ValueError. That is just crazy. My point is to make minimal changes that get the job done, use general methods where possible, and don't duplicate code.

gfyoung · 2016-03-08T09:57:16Z

@jreback : Simplified things a little by changing the error from ValueError to StopIteration. I still kept some of the error messages from the C engine because I find them to be quite useful.

jreback · 2016-03-08T13:15:49Z

at a glance looks ok. still working on 0.18.0, so will get to next week.

gfyoung · 2016-03-08T13:43:19Z

👍 will ping on all of them next week then.

Closes pandas-devgh-12493.

gfyoung · 2016-04-12T21:38:15Z

@jorisvandenbossche : Any update on this PR? I think @jreback is waiting for your feedback before he merges this in.

jorisvandenbossche · 2016-04-12T21:55:50Z

Sorry, slowly getting back on track :-) I will take a look now.

Just a quick question: my previous comment was about the HTML and Excel parsers. The whatsnew note no longer mentions those. Are there no longer changes to read_html and read_excel?

gfyoung · 2016-04-12T22:03:17Z

@jorisvandenbossche : No worries. 😄 There are in fact changes to read_html and read_excel. If you look at the changed files, both changed the exception they are catching from StopIteration to EmptyDataError when there are empty tables. However, those changes are a result of the changes made to the parsers and are not direct changes to read_html and read_excel.
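The pattern described here can be sketched with plain Python. This is a toy illustration, not the pandas internals: `EmptyDataError`, `read_header`, and `read_table_or_empty` below are hypothetical stand-ins defined only for this example.

```python
class EmptyDataError(ValueError):
    """Hypothetical stand-in for the parser's empty-input error."""


def read_header(lines):
    """Return the first line split on commas, or raise EmptyDataError."""
    iterator = iter(lines)
    try:
        first = next(iterator)
    except StopIteration:
        # Translate the bare iterator signal into a descriptive error.
        raise EmptyDataError("No columns to parse from file") from None
    return first.rstrip("\n").split(",")


def read_table_or_empty(lines):
    """Caller that handles empty tables internally.

    It catches the dedicated error instead of StopIteration, so the
    exception never reaches the user.
    """
    try:
        return read_header(lines)
    except EmptyDataError:
        return []


print(read_header(["x,y\n"]))
print(read_table_or_empty([]))
```

Because the caller wraps the parser in an explicit try-except, switching the exception type changes what is caught but not what the user sees.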

jorisvandenbossche · 2016-04-12T22:06:45Z

pandas/io/tests/test_parsers.py

+        df = self.read_csv(StringIO(''), names=names, usecols=usecols)
+
+    def test_read_with_bad_header(self):
+        name = self.__class__.__name__


is this used?

Wait...why did flake8 not catch that? No, I don't think so AFAICT.

jorisvandenbossche · 2016-04-12T22:09:22Z

@gfyoung But they only catch it under the hood? (I mean, it doesn't bubble up to the user?)

jorisvandenbossche · 2016-04-12T22:10:29Z

doc/source/whatsnew/v0.18.1.txt

+
+   In [1]: df = pd.read_csv(StringIO(''), engine='c')
+   ...
+   pandas.io.common.EmptyDataError: No columns to parse from file


Is it actually shown with the full name? (just a question, didn't test it, didn't fetch the PR, but I would just show the same as in an actual console)

Which is what I did. 😄 - FYI you can observe this full out name thing if you trigger any current CParserError.

OK, perfect! (I just wondered if it was the case)
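For readers on a recent pandas release, a minimal sketch of triggering and catching this error. It assumes the exception is importable from `pandas.errors`, its current public location; the whatsnew snippet above shows the `pandas.io.common` path used when this PR was written.

```python
from io import StringIO

import pandas as pd
from pandas.errors import EmptyDataError

try:
    # No header and no names: there are no columns to infer.
    pd.read_csv(StringIO(""))
except EmptyDataError as err:
    message = str(err)
    print(type(err).__name__, message)
```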

gfyoung · 2016-04-12T22:11:18Z

@jorisvandenbossche : Not AFAICT (it's an explicit try-except block for read_html and read_excel)

jorisvandenbossche · 2016-04-12T22:25:29Z

@jreback @gfyoung I didn't look at the code changes in detail, but the whatsnew is in any case very clear and sounds logical! So good to go for me

jreback · 2016-04-13T00:45:33Z

doc/source/whatsnew/v0.18.1.txt

@@ -179,6 +179,43 @@ New Behavior:
    # Output is a DataFrame
    df.groupby(pd.TimeGrouper(key='date', freq='M')).apply(lambda x: x[['value']].sum())

+Changes in ``read_csv`` errors


this sub-section should be in API changes.

add a directive above for referencing this: .. _whatsnew.......

say Change in read_csv exceptions

jreback · 2016-04-13T00:50:29Z

@gfyoung ok just a couple of minor doc comments. Pls also have a look thru io.rst and see if any exceptions are mentioned (and if so fix them), as well as doc-strings.

With the Python engine, reading an empty file used to raise a StopIteration error with no error message. This PR helps to differentiate the case when no columns are inferable, which now leads to an EmptyDataError for both the C and Python engines. [ci skip]

jreback · 2016-04-13T01:25:42Z

doc/source/whatsnew/v0.18.1.txt

+
+In addition to this error change, several others have been made as well:
+
+- ``CParserError`` is now a ``ValueError`` instead of just an ``Exception`` (:issue:`12551`)


was this whatsnew just not put in before? (the PR was already merged)

Yes, but I moved it into this section because it's related
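A small sketch of what the `ValueError` base class means in practice. It assumes a recent pandas release, where `CParserError` has since been renamed `ParserError` and both errors are exposed in `pandas.errors`. The malformed input is made up for illustration.

```python
from io import StringIO

import pandas as pd
from pandas.errors import EmptyDataError, ParserError

# Both parser errors derive from ValueError, so a generic handler catches them.
print(issubclass(ParserError, ValueError))
print(issubclass(EmptyDataError, ValueError))

# Illustrative malformed CSV: the third line has one field too many.
bad_csv = "a,b\n1,2\n3,4,5\n"

try:
    pd.read_csv(StringIO(bad_csv))
except ValueError as err:
    caught = type(err)
    print(caught.__name__)
```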

jreback · 2016-04-13T01:30:42Z

@gfyoung thanks!

gfyoung · 2016-04-13T01:32:13Z

@jreback : Sure thing! FYI, you can also cancel my build here.

jreback reviewed Mar 2, 2016

jreback added Bug IO CSV read_csv, to_csv labels Mar 2, 2016

gfyoung force-pushed the empty_usecols branch 6 times, most recently from e58ddd0 to f25f3c6 Compare March 3, 2016 22:17

gfyoung force-pushed the empty_usecols branch 2 times, most recently from 4fb3b0e to 84d7cb3 Compare March 6, 2016 08:15

jreback reviewed Mar 6, 2016

gfyoung force-pushed the empty_usecols branch 4 times, most recently from 3304d2e to 5addc69 Compare March 8, 2016 06:27

jreback added this to the 0.18.1 milestone Mar 8, 2016

gfyoung force-pushed the empty_usecols branch 3 times, most recently from efc8523 to c5bdde3 Compare March 9, 2016 02:57

gfyoung force-pushed the empty_usecols branch 7 times, most recently from 45d5876 to e4bc4ec Compare April 12, 2016 01:48

BUG: Respect usecols even with empty data

1b576ca

Closes pandas-devgh-12493.

gfyoung force-pushed the empty_usecols branch from e4bc4ec to 681f47c Compare April 12, 2016 17:08

jorisvandenbossche reviewed Apr 12, 2016

gfyoung force-pushed the empty_usecols branch from 681f47c to 46c4e62 Compare April 12, 2016 22:30

jreback reviewed Apr 13, 2016

gfyoung force-pushed the empty_usecols branch from 46c4e62 to 3589267 Compare April 13, 2016 01:19

jreback reviewed Apr 13, 2016

jreback closed this in 827745d Apr 13, 2016

gfyoung deleted the empty_usecols branch April 13, 2016 01:32
