column name dedupe when already deduped returns "column_name.1.1" #27021

boldloop · 2019-06-24T16:41:59Z

This isn't necessarily a bug, just something I think could be handled slightly cleaner. The code:

```python
from io import StringIO

import pandas as pd

csv = StringIO('a,a.1,a')  # header-only CSV; "a" appears twice
pd.read_csv(csv)
```

returns

a	a.1	a.1.1

where I think it could return

a	a.1	a.2
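To make the difference concrete, here is a minimal pure-Python sketch of two dedupe strategies. `mangle_dupes_current` is a simplified re-implementation that reproduces the `a.1.1` result above. It is an assumption about the observable behaviour, not pandas' actual source. `dedupe_columns` is a hypothetical helper for the proposed behaviour: it numbers suffixes from the base name and skips any candidate that has already been emitted or that appears elsewhere in the original header.

```python
from collections import defaultdict


def mangle_dupes_current(names):
    """Suffix dedupe that reproduces the 'a.1.1' result (simplified sketch)."""
    counts = defaultdict(int)  # next suffix to try for each emitted name
    result = []
    for name in names:
        cur = counts[name]
        # Keep appending '.N' to the *current* candidate until it is unused,
        # which is how 'a' -> 'a.1' -> 'a.1.1' happens when 'a.1' exists.
        while cur > 0:
            counts[name] = cur + 1
            name = f"{name}.{cur}"
            cur = counts[name]
        result.append(name)
        counts[name] = cur + 1
    return result


def dedupe_columns(names):
    """Hypothetical proposed dedupe: 'a', 'a.1', 'a' -> 'a', 'a.1', 'a.2'."""
    reserved = set(names)  # every name in the original header
    seen = set()           # names emitted so far
    result = []
    for name in names:
        candidate = name
        if name in seen:
            # Number from the base name, skipping suffixes that are already
            # taken or that appear verbatim elsewhere in the header.
            i = 1
            while f"{name}.{i}" in seen or f"{name}.{i}" in reserved:
                i += 1
            candidate = f"{name}.{i}"
        seen.add(candidate)
        result.append(candidate)
    return result


print(mangle_dupes_current(["a", "a.1", "a"]))  # ['a', 'a.1', 'a.1.1']
print(dedupe_columns(["a", "a.1", "a"]))        # ['a', 'a.1', 'a.2']
```

Reserving every header name up front is a design choice: for `a,a,a.1` the sketch yields `a, a.2, a.1`, so a name that was already deduped in an earlier step keeps its exact spelling.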

I encountered this when working with data that included the columns ["Column name", "COLUMN NAME", "COLUMN NAME"]. I work in multiple steps (each task is a separate script that reads the output of the previous one, which helps with auditability), so after the first step the columns became ["Column name", "COLUMN NAME", "COLUMN NAME.1"]. Further down the pipeline I renamed all columns to upper case for consistency and ended up with ["COLUMN NAME.1.1", "COLUMN NAME", "COLUMN NAME.1"]. Not a huge problem, but a little annoying nonetheless.
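The extra suffix in the last step starts with the rename itself: upper-casing turns "Column name" into a second "COLUMN NAME", which then has to be deduped again on the next read. A minimal guard, assuming column names are plain strings, is to check for clashes before writing the file. `upper_columns_checked` is a hypothetical helper, not a pandas API.

```python
from collections import Counter


def upper_columns_checked(names):
    """Upper-case column names, raising instead of producing duplicates."""
    upper = [str(n).upper() for n in names]
    # Any upper-cased name that occurs more than once is a clash.
    clashes = sorted(n for n, c in Counter(upper).items() if c > 1)
    if clashes:
        raise ValueError(f"upper-casing would duplicate columns: {clashes}")
    return upper


try:
    upper_columns_checked(["Column name", "COLUMN NAME", "COLUMN NAME.1"])
except ValueError as err:
    print(err)  # upper-casing would duplicate columns: ['COLUMN NAME']
```

With a DataFrame `df`, the result can be assigned back with `df.columns = upper_columns_checked(df.columns)`, since iterating a pandas `Index` yields its labels.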

I'm going to dig in and see whether I can implement this in a pull request later this week; I'm filing it now so that I don't forget.

phofl · 2021-11-27T12:46:11Z

duplicate of #14704

mroeschke added the IO CSV (read_csv, to_csv) label Nov 2, 2019

mroeschke added the Bug and Needs Discussion (Requires discussion from core team before further action) labels Jul 10, 2021

phofl closed this as completed Nov 27, 2021