column name dedupe when already deduped returns "column_name.1.1" #27021
Labels: Bug; IO CSV (read_csv, to_csv); Needs Discussion (requires discussion from core team before further action)
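This isn't necessarily a bug, just something I think could be handled slightly more cleanly. When the column names being deduplicated already include a name that carries a dedupe suffix, the result contains a doubled suffix such as "column_name.1.1", where I think it could return a name with a single suffix instead.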
I encountered this when working with data that included the columns ["Column name", "COLUMN NAME", "COLUMN NAME"]. I work in multiple steps (i.e., separating each task into a separate script that takes the output from the previous task—works for auditability), so after the first step the columns became ["Column name", "COLUMN NAME", "COLUMN NAME.1"]. Down the pipeline, I renamed all columns to use all caps for consistency, but I ended up with ["COLUMN NAME.1.1", "COLUMN NAME", "COLUMN NAME.1"]. Not a huge problem, but a little bit annoying nonetheless.
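To make the suggestion concrete, here is a minimal sketch of a dedupe pass that skips suffixes already present in the header. The helper `dedupe_columns` is hypothetical and written in plain Python for illustration; it is not pandas' actual implementation, and the choice to pick the next free `.N` suffix is an assumption about what the cleaner behavior could look like.

```python
def dedupe_columns(names):
    """Rename repeated names to 'name.N', skipping any 'name.N' already in use.

    Hypothetical helper for illustration only; not part of pandas.
    """
    taken = set(names)   # every name in the original header is reserved
    seen = set()         # names already emitted once
    counters = {}        # last suffix number handed out per base name
    result = []
    for name in names:
        if name not in seen:
            # First occurrence keeps its name unchanged.
            seen.add(name)
            result.append(name)
            continue
        # Repeat occurrence: find the next suffix that collides with nothing.
        n = counters.get(name, 0) + 1
        candidate = f"{name}.{n}"
        while candidate in taken:
            n += 1
            candidate = f"{name}.{n}"
        counters[name] = n
        taken.add(candidate)
        result.append(candidate)
    return result


columns = ["COLUMN NAME", "COLUMN NAME", "COLUMN NAME.1"]
print(dedupe_columns(columns))
```

With this approach the second "COLUMN NAME" becomes "COLUMN NAME.2" rather than colliding with the existing "COLUMN NAME.1" and pushing it to "COLUMN NAME.1.1".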
I'm going to dig in and see if I can implement this later this week in a pull request—just filing it now so that I don't forget.