
BUG: escapechar=',' Causes Double Commas in Output in Pandas 2.2.2 #59454


Closed · 3 tasks done

SamuelManaay opened this issue Aug 9, 2024 · 5 comments

Labels: Bug, IO CSV (read_csv, to_csv), Upstream issue (issue related to pandas dependency)

Comments

@SamuelManaay

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

# Example DataFrame
df = pd.DataFrame({'column': ['value1, with comma', 'value2, with another comma']})

# Exporting DataFrame to CSV with escapechar=','
df.to_csv('output.csv', escapechar=',', index=False)

Issue Description

When using escapechar=',' in Pandas version 2.2.2, the resulting output includes double commas ,, in places where the data originally contained a comma. This behavior is inconsistent with Pandas version 1.4.3, where the escapechar did not cause this issue.
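
With pandas 2.2.2, the resulting output.csv looks roughly like this (matching the CPython csv reproduction shown later in this thread), with each embedded comma doubled:

column
value1,, with comma
value2,, with another comma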

Steps to Reproduce:

  1. Create a DataFrame with a column containing commas in its values.
  2. Export the DataFrame to a CSV file using escapechar=','.
  3. Observe the output file and note the double commas ,, where there was originally a single comma.

Expected Behavior

The output CSV should properly escape commas without doubling them, consistent with earlier pandas versions where the escape character was handled correctly.
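
For illustration (a sketch, not from the original report): to get a genuinely escaped comma, you would normally pair an escape character that differs from the delimiter with quoting=csv.QUOTE_NONE; the filename output-escaped.csv below is hypothetical.

import csv
import pandas as pd

df = pd.DataFrame({'column': ['value1, with comma', 'value2, with another comma']})

# With QUOTE_NONE, the writer escapes embedded delimiters instead of quoting fields;
# the escapechar must be different from the delimiter for this to work.
df.to_csv('output-escaped.csv', index=False, escapechar='\\', quoting=csv.QUOTE_NONE)

This should produce:

column
value1\, with comma
value2\, with another comma

Alternatively, omitting escapechar entirely lets the default QUOTE_MINIMAL quoting wrap the affected fields in quotes, as shown in the comments below.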

Installed Versions

2.2.2

@SamuelManaay SamuelManaay added the Bug and Needs Triage labels on Aug 9, 2024
@rhshadrach
Member

Thanks for the report, confirmed on main. Further investigations and PRs to fix are welcome.

@rhshadrach rhshadrach added the IO CSV label and removed the Needs Triage label on Aug 11, 2024
@wooseogchoi
Contributor

take

@wooseogchoi
Contributor

Pandas relies on the writer in CPython's csv module to save the DataFrame to the given CSV file. The escapechar argument is passed through to the writer as escapechar, and quoting is set to the csv module's default, QUOTE_MINIMAL, if it is not given. Therefore the bug, if it is a bug, is not in pandas but in CPython's csv module.
I confirmed that CPython's csv module has the same issue:

import csv

data = [['column'],
        ['value1, with comma'],
        ['value2, with another comma']]

with open('output-escape.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, escapechar=',')
    writer.writerows(data)

Contents of output-escape.csv:

column
value1,, with comma
value2,, with another comma

So I reported it to the CPython community:

python/cpython#123109

After investigating a little bit more, I found that the usage in this case seems to be incorrect as well.
Since the delimiter and the character we want to escape are the same, we can get the desired result by omitting the escapechar from the to_csv call.

import pandas as pd

df = pd.DataFrame({'column': ['value1, with comma', 'value2, with another comma']})
df.to_csv('pandas-no-escapedchar.csv', index=False)

(base) PS C:\Users\User> cat pandas-no-escapedchar.csv
column
"value1, with comma"
"value2, with another comma"

@wooseogchoi
Contributor

I will keep an eye on the issue in CPython and post updates.

@rhshadrach
Member

Thanks for investigating! Closing this as an upstream issue.

@rhshadrach rhshadrach added the Upstream issue label on Aug 18, 2024