-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Raise error in read_csv when arguments header and prefix both are not None #31383
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rushabh-v. can you add a what's new note. What's the current behaviour? Does this need a deprecation cycle?
All checks successful. And I have made all the requested changes @simonjayhawkins |
IIUC then currently >>> from io import StringIO
>>> s = StringIO("0,1\n2,3")
>>>
>>> import pandas as pd
>>> pd.__version__
'1.0.0rc0+228.g4edcc5541'
>>>
>>> pd.read_csv(s, header=0, prefix="_X")
0 1
0 2 3
>>> This would now raise a ValueError. IMO this is a breaking change. see what others think. |
oh, I think you are right. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. The intention here is to raise a ValueError (its a bug and confusing to silently ignore)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm @TomAugspurger
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look OK, though the test is likely in the wrong file. Should be in pandas/tests/io/parser/test_common.py
probably.
pandas/tests/frame/test_to_csv.py
Outdated
@@ -575,6 +575,13 @@ def test_to_csv_headers(self): | |||
recons.reset_index(inplace=True) | |||
tm.assert_frame_equal(to_df, recons) | |||
|
|||
def test_to_csv_raises_on_header_prefix(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this testing to_csv or read_csv? Looks like it's in the to_csv file, but is testing read_csv?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's testing read_csv that's a mistake in naming it.
But I actually tried putting this test in pandas/tests/io/parser/test_common.py
. But the problem there is that all the tests use io.parsers.read_csv
there. And io.parsers.read_csv
doesn't seem to be going into CParserWrapper
. So I tried putting it here. But now I am putting it back in test_common.py
and will import the pandas.read_csv
there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you take a look, please?
@@ -2040,6 +2040,14 @@ def test_read_csv_memory_growth_chunksize(all_parsers): | |||
pass | |||
|
|||
|
|||
def test_read_csv_raises_on_header_prefix(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use the all_parsers fixture here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I first tried to do that in c053a8f but looks likey parsers.read_csv
is not going to CParserWrapper
. Which is what we want to test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure it is, what exactly was failing? this is the standard pattern for parser tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test is to check whether the error is being raised or not, and the error is raised from CParserWrapper
So, the whole test fails here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to apply your fix to all engines. You just applied to CParserWrapper
it seems.
I think the fix should actually be in ParserBase
so that all engines get the fix.
c053a8f is the correct test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rushabh-v : You're on the right track, but I think you tried too hard to adjust the tests to your fixes, when it really should be the other way around. See my earlier comments.
Hello @rushabh-v! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-02-03 09:25:48 UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes look good. Just want to make sure that tests pass.
If they fail, we can update those accordingly.
@gfyoung 2 tests are failing, I don't see any obvious connection of them with this PR. |
Indeed, those test failures look unrelated. I just restarted the build. |
same test failure again. |
You may need to rebase or merge. Unclear why these warnings are popping up. cc @pandas-dev/pandas-core : is there something with |
Fails after merge too |
All greens now @gfyoung |
Thanks @rushabh-v ! |
Pandas 1.1 has broken Record Mover's usage of the read_csv() function by adding error checking in cases where a certain argument would be unused. Details of the Pandas change: * pandas-dev/pandas#27394 * pandas-dev/pandas#31383 See Records Mover test failures here: * https://app.circleci.com/pipelines/github/bluelabsio/records-mover/1089/workflows/e62f1cf0-f8d0-4e22-9652-112df72b02b8/jobs/9439
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff