Bug in read_csv and read_excel not applying dtype to second col with dup cols #41411

phofl · 2021-05-10T22:03:52Z

closes BUG: in read_excel for mangled columns only the original/first column dtype is correct, col.N is not parsed correctly #35211
tests added / passed
Ensure all linting tests pass, see here for how to run them
whatsnew entry

jreback · 2021-05-12T13:55:28Z

pandas/_libs/parsers.pyx

-                            counts[name] = count + 1
-                            name = f'{name}.{count}'
-                            count = counts.get(name, 0)
+                        if count > 0:


Would be nice to unify this code between here and the Python parser (as a follow-on).

Yes, definitely, but I will have to refactor the PythonParser quite a bit and split it into two classes, so that it can inherit either from TextReader or from a generic Cython class that both TextReader and something like a PythonTextReader inherit from.

I am planning to do this in the (probably medium-term) future

sounds great!

feel free to open an issue for tracking

Thought about using #39345 for this
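
For readers following the thread, the de-duplication being discussed can be sketched in plain Python. This is only an illustration and not the code in parsers.pyx or in the Python parser: the helper name `mangle_dup_names` is hypothetical, and the choice to copy the dtype dict and to leave a non-dict dtype untouched is an assumption made for the sketch.

```python
def mangle_dup_names(names, dtype=None):
    """Hypothetical helper: de-duplicate column names and carry dtypes over.

    Returns the mangled names and a dtype spec that also covers the mangled
    names when ``dtype`` is a dict. A non-dict dtype is returned unchanged.
    """
    counts = {}
    out_names = []
    # Work on a copy so the caller's dict is never modified.
    out_dtype = dict(dtype) if isinstance(dtype, dict) else dtype

    for old_name in names:
        name = old_name
        count = counts.get(name, 0)
        # Keep appending a numeric suffix until the name is unused.
        while count > 0:
            counts[old_name] = count + 1
            name = f"{old_name}.{count}"
            count = counts.get(name, 0)
        # Give the mangled column the dtype of the original column,
        # unless the caller specified one for the mangled name explicitly.
        if isinstance(out_dtype, dict) and name != old_name and old_name in out_dtype:
            out_dtype.setdefault(name, out_dtype[old_name])
        counts[name] = count + 1
        out_names.append(name)

    return out_names, out_dtype


names, dtypes = mangle_dup_names(["a", "a"], {"a": str})
print(names)
print(dtypes)
```

An explicit entry such as `{"a": str, "a.1": "int64"}` is kept as given, because `setdefault` only fills in names that are missing.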

fiendish · 2021-06-15T17:57:03Z

FYI this breaks parsing with a non-dict dtype.

phofl · 2021-06-15T18:01:04Z

Could you provide an example?

fiendish · 2021-06-15T18:09:39Z

Documentation for read_csv says dtype may be "a type name or dict".
pandas.read_csv("test.csv", engine="python", dtype="str") and pandas.read_csv("test.csv", engine="python", dtype=str) now error because this code's new calls to .get assume a dict.
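
The failure mode can be shown without pandas. The sketch below assumes the lookup is roughly `dtype.get(name)`; both function names are hypothetical and are not taken from the pandas source.

```python
def unguarded_lookup(dtype, name):
    # Assumes ``dtype`` is a dict; fails for ``str`` or ``"str"``.
    return dtype.get(name)


def lookup_dtype(dtype, name):
    """Hypothetical guarded lookup accepting a dict or a single type name."""
    if isinstance(dtype, dict):
        return dtype.get(name)
    # A single dtype applies to every column.
    return dtype


for spec in (str, "str", {"B": str}):
    try:
        unguarded_lookup(spec, "B")
        outcome = "ok"
    except AttributeError:
        outcome = "AttributeError"
    print(repr(spec), outcome, lookup_dtype(spec, "B"))
```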

phofl · 2021-06-15T18:14:16Z

Your test file has duplicate columns too? Will look into this later

fiendish · 2021-06-15T18:16:40Z

> Your test file has duplicate columns too? Will look into this later

Indeed it does, good eye. That's the unfortunate reality when non-programmer people make documents.

Here's my test file:
test.csv

But of course

A,B,B
1,1,1

shows it too.
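
A self-contained version of that reproduction, using an in-memory buffer instead of a file on disk. This is a sketch: on the affected development build the call was reported to raise, while on a pandas release that includes the fix for #42022 it should read all three columns as strings.

```python
from io import StringIO

import pandas as pd

# Same content as the three-column example above, with a duplicated "B".
data = "A,B,B\n1,1,1\n"

# A single type instead of a dict, with the Python engine.
df = pd.read_csv(StringIO(data), engine="python", dtype=str)

print(df.columns.tolist())
print(df.iloc[0].tolist())
```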

fiendish · 2021-06-15T18:23:24Z

I should have filed an issue first. I've done that now: #42022

phofl added 2 commits May 11, 2021 00:02

Bug in read_csv and read_excel not applying dtype to second col with dup cols

fbafc11

Merge branch 'master' of https://github.com/pandas-dev/pandas into 35211

0a4cb25

phofl added the "IO CSV" (read_csv, to_csv) and "IO Excel" (read_excel, to_excel) labels May 10, 2021

Change dtypes

bb62c19

jreback added this to the 1.3 milestone May 12, 2021

jreback approved these changes May 12, 2021

jreback merged commit 76792f1 into pandas-dev:master May 12, 2021

phofl deleted the 35211 branch May 12, 2021 13:59

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021

Bug in read_csv and read_excel not applying dtype to second col with dup cols (pandas-dev#41411)

4937594

phofl mentioned this pull request Jul 15, 2021

read_excel() modifies provided types dict when accessing file with duplicate column #42508 (merged)
