Bug in read_csv and read_excel not applying dtype to second col with dup cols #41411
phofl commented May 10, 2021
- closes BUG: in read_excel for mangled columns only the original/first column dtype is correct, col.N is not parsed correctly #35211
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry
thanks @phofl
counts[name] = count + 1
name = f'{name}.{count}'
count = counts.get(name, 0)
if count > 0:
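For readers following the fragment above, here is a minimal standalone sketch of the renaming idea it appears to belong to: each repeated column name gets a numeric suffix, and the suffixed name is checked again in case it also collides. The function name `mangle_dupe_names` and the use of a `while` loop are assumptions made for illustration; this is not the code from the PR.

```python
def mangle_dupe_names(names):
    """Return column names with duplicates renamed to name.1, name.2, ...

    Hypothetical helper for illustration only, not the parser's actual code.
    """
    counts = {}  # how many times each (possibly suffixed) name has been seen
    result = []
    for name in names:
        count = counts.get(name, 0)
        # Keep appending a suffix until the candidate name is unused.
        while count > 0:
            counts[name] = count + 1
            name = f"{name}.{count}"
            count = counts.get(name, 0)
        counts[name] = count + 1
        result.append(name)
    return result


print(mangle_dupe_names(["a", "a", "b", "a"]))
```

Re-checking the suffixed name matters when a file already contains a column literally named `a.1` next to duplicated `a` columns.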
It would be nice to unify this code between here and the Python parser (as a follow-on).
Yes, definitely, but I will have to refactor the PythonParser quite a bit and split it into two classes to be able to inherit from TextReader, or rather from a generic Cython class that both TextReader and something like PythonTextReader can inherit from.
I am planning to do this in the (probably medium-term) future
sounds great!
feel free to open an issue for tracking
Thought about using #39345 for this
FYI this breaks parsing with a non-dict dtype.
Could you provide an example?
Documentation for read_csv says dtype may be "a type name or dict".
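To make the two accepted forms of dtype concrete, here is a rough sketch of how a per-column dtype could be resolved once duplicate names have been renamed. It handles both a single type name and a dict, and lets a renamed column such as `a.1` fall back to the dtype given for its original name `a`. The helper `resolve_dtypes` and its arguments are hypothetical and only illustrate the behaviour under discussion; they are not part of pandas.

```python
def resolve_dtypes(original_names, mangled_names, dtype):
    """Map each mangled column name to the dtype it should be parsed with.

    Hypothetical helper for illustration only, not pandas' implementation.
    """
    if dtype is None:
        return {}
    if not isinstance(dtype, dict):
        # A single type name applies to every column.
        return {name: dtype for name in mangled_names}
    resolved = {}
    for original, mangled in zip(original_names, mangled_names):
        if mangled in dtype:
            # An explicit entry for the renamed column takes priority.
            resolved[mangled] = dtype[mangled]
        elif original in dtype:
            # Otherwise fall back to the dtype given for the original name.
            resolved[mangled] = dtype[original]
    return resolved


print(resolve_dtypes(["a", "a"], ["a", "a.1"], {"a": "str"}))
print(resolve_dtypes(["a", "a"], ["a", "a.1"], "str"))
```

Checking for the non-dict case first keeps the scalar form from ever reaching a dict lookup.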
Your test file has duplicate columns too? Will look into this later
Indeed it does, good eye. That's the unfortunate reality when non-programmer people make documents. Here's my test file: But of course
shows it too.
I should have filed an issue first. I've done that now: #42022