-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Respect usecols even with empty data #12506
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Tests are passing. Should be good to merge. |
|
||
def check_usecols(stringIO): | ||
df = read_csv(stringIO, names=names, usecols=usecols) | ||
assert_array_equal(df.columns, usecols) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NEVER use numpy.testing
imports!!!!!
ALWAYS use pandas testing functions. These are much more comprehensive and fully test all behavior. It is also confusing to a reader.
Fully test vs an expected frame and use assert_frame_equal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I got it with the first exclamation point. ;) - fixed.
e58ddd0
to
f25f3c6
Compare
If some could cancel this build #18023 (it's an old build), that would be great. Thanks! |
4fb3b0e
to
84d7cb3
Compare
@jreback : Tests are passing once more. Should be good to merge AFAICT. |
while self.line_pos <= hr: | ||
line = self._next_line() | ||
|
||
except StopIteration: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you adding all of this code and making this very complicated?
the idea is to make things simpler. pls see if you can find a general soln.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How complicated is this code? It's essentially boilerplate and logic from the C engine code. I'm not writing anything new here. I don't understand why people would have a hard time understanding this code but yet the C engine code is considered comprehensible.
@gfyoung you had to change a whole bunch of things to get your code to work (e.g. changing long established behavior in excel/html parsing), the |
3304d2e
to
5addc69
Compare
@jreback : Simplified things a little by changing the error from |
at a glance looks ok. still working on 0.18.0, so will get to next week. |
👍 will ping on all of them next week then. |
efc8523
to
c5bdde3
Compare
45d5876
to
e4bc4ec
Compare
@jorisvandenbossche : Any update on this PR? I think @jreback is waiting for your feedback before he merges this in. |
Sorry, slowly getting back on track :-) I will take a look now. Just a quick question, my previous comment was about HTML and Excel parser. The whatsnew note now does not mention those anymore. There are no longer changes to to |
@jorisvandenbossche : No worries. 😄 There are in fact changes to |
df = self.read_csv(StringIO(''), names=names, usecols=usecols) | ||
|
||
def test_read_with_bad_header(self): | ||
name = self.__class__.__name__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait...why did flake8
not catch that? No, I don't think so AFAICT.
@gfyoung But they only catch it under the hood? (I mean, it doesn't bubble up to the user?) |
|
||
In [1]: df = pd.read_csv(StringIO(''), engine='c') | ||
... | ||
pandas.io.common.EmptyDataError: No columns to parse from file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it actually shown with the full name? (just a question, didn't test it, didn't fetch the PR, but I would just show the same as in an actual console)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which is what I did. 😄 - FYI you can observe this full out name thing if you trigger any current CParserError
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, perfect! (I just wondered if it was the case)
@jorisvandenbossche : Not AFAICT (it's an explicit |
@@ -179,6 +179,43 @@ New Behavior: | |||
# Output is a DataFrame | |||
df.groupby(pd.TimeGrouper(key='date', freq='M')).apply(lambda x: x[['value']].sum()) | |||
|
|||
Changes in ``read_csv`` errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this sub-section should be in API changes.
add a direction above for referncding this: .._whatsnew.......
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
say Change in
read_csvexceptions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@gfyoung ok just a couple of minor doc comments. Pls also have a look thru |
In Python, when reading an empty file, it used to throw a StopIteration error with no error message. This PR helps to differentiate the case when no columns are inferable, which now leads to an EmptyDataError for both the C and Python engines. [ci skip]
|
||
In addition to this error change, several others have been made as well: | ||
|
||
- ``CParserError`` is now a ``ValueError`` instead of just an ``Exception`` (:issue:`12551`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was this whatsnew just not put in before? (the PR was already merged)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but I moved it into this section because it's related
@gfyoung thanks! |
closes #12493
in which the
usecols
argument was not being respected for empty data. This is because no filtering was applied when the first (and only) chunk was being read.