BUG: Add warning if rows have more columns than expected #33782
Multiple tests are not passing due to the new warning.
pandas/io/parsers.py
@@ -2508,6 +2512,13 @@ def read(self, rows=None):
content = content[1:]

alldata = self._rows_to_cols(content)
if len(columns) != len(alldata) and notna(alldata[len(columns) :]).any():
Is there a reason we need the notna check?
In the example mentioned in the linked issue, an additional comma was added in one row. I assumed that additional commas are common, so we can ignore them and not raise a warning.
I'm using notna to check whether the data that won't be included contains only NaN values.
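The check described above can be sketched in plain Python. This is only an illustration of the idea, not the code in this PR: the helper names `is_missing` and `dropped_columns_have_data` are hypothetical, and the sketch assumes `alldata` is a list of columns in which `None` or NaN marks a missing value.

```python
import warnings


def is_missing(value):
    # Hypothetical stand-in for a missing-value test: None or float NaN.
    return value is None or (isinstance(value, float) and value != value)


def dropped_columns_have_data(columns, alldata):
    """Warn and return True if columns beyond len(columns) hold real values.

    `columns` is the list of expected column names and `alldata` is a
    list of columns (one list of values per parsed column).
    """
    extra = alldata[len(columns):]
    has_data = any(not is_missing(v) for col in extra for v in col)
    if len(columns) != len(alldata) and has_data:
        warnings.warn(
            "Some rows have more columns than expected; extra data is dropped.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False


columns = ["a", "b", "c"]

# A trailing comma produces an extra column holding only missing values.
trailing_comma = [[1, 4], [2, 5], [3, 6], [None, None]]

# A real extra value in the second row would be silently discarded.
extra_value = [[1, 4], [2, 5], [3, 6], [None, 7]]
```

With these inputs, `trailing_comma` passes silently while `extra_value` triggers the warning, which is the distinction the `notna` condition is meant to draw.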
@mproszewska: Sorry for the long wait here! Overall, the solution is on the right track.
Hello @mproszewska! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-10-08 19:27:23 UTC
I have no idea how to find out where this ipython directive error comes from.
can you rebase and I'll take a look at the ipython thing
I rebased it
on the docbuild, it looks like the following is issuing a warning
under this PR, is issuing a warning here the correct thing to do? If so, then an
I think so. The first row has 3 values and the rest have 4. Where in in.rst should :okwarning: be added? Maybe there's another way to do that. It shouldn't be a common warning.
conceptually this is ok. pls merge master and I will re-look (and yes, we would have to either fix the warnings or use assert_produces_warning, though we should probably fix the incorrect usages).
hmm, this looks like it overlaps with #38587
closing in favor of #38587
read_csv when given an additional value on the first row of CSV file #33037
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff