
DF to CSV splits row if one of the data points ends in \r (Windows) #22678

Closed
asishm opened this issue Sep 12, 2018 · 1 comment

Comments

@asishm
Contributor

asishm commented Sep 12, 2018

Code Sample, a copy-pastable example if possible

```python
import pandas as pd

# Every value in the "error" column ends with a carriage return (\r)
test_df = pd.DataFrame({
    "before": [1, 2, 3, 4, 6],
    "error": ["Water damage to 3 floor apartment building.  \r"] * 5,
    "after": [8, 9, 10, 11, 12],
})
test_df[["before", "error", "after"]].to_csv("test_error.csv", index=False)
read_test_df = pd.read_csv("test_error.csv")
read_test_df
```

```
   before                                         error  after
0     1.0  Water damage to 3 floor apartment building.     NaN
1     NaN                                             8    NaN
2     2.0  Water damage to 3 floor apartment building.     NaN
3     NaN                                             9    NaN
4     3.0  Water damage to 3 floor apartment building.     NaN
5     NaN                                            10    NaN
6     4.0  Water damage to 3 floor apartment building.     NaN
7     NaN                                            11    NaN
8     6.0  Water damage to 3 floor apartment building.     NaN
9     NaN                                            12    NaN
```

Problem description

Every value in the error column ends with \r. When the DataFrame is written to CSV, that \r is not escaped, so reading the file back splits each row into two lines.
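To see the splitting in isolation, here is a minimal sketch that skips the writer and feeds read_csv a hand-written CSV string in which the \r sits unquoted inside a field. It assumes the default C parser, which by default accepts \n, \r\n and a bare \r as line breaks, so the single data row comes back as two. The string and variable names are illustrative only.

```python
import io

import pandas as pd

# One data row whose "error" field ends in an unquoted carriage return
raw = "before,error,after\n1,Water damage to 3 floor apartment building.  \r,8\n"

df = pd.read_csv(io.StringIO(raw))
print(df)
# The bare \r ends the record early, so "8" spills into a second row
# and the "after" column is never filled
print(len(df))
```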

Expected Output

```
   before                                            error  after
0       1  Water damage to 3 floor apartment building.  \r      8
1       2  Water damage to 3 floor apartment building.  \r      9
2       3  Water damage to 3 floor apartment building.  \r     10
3       4  Water damage to 3 floor apartment building.  \r     11
4       6  Water damage to 3 floor apartment building.  \r     12
```

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@TomAugspurger
Contributor

Dupe of #10018. Pandas isn't escaping \r properly inside strings.

For now you can specify line_terminator when writing, or lineterminator when reading these files.
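Both workarounds can be sketched as follows, under stated assumptions. On the reading side, the sketch assumes the file uses \n line endings: passing lineterminator="\n" to read_csv (C parser only, a single character) makes only \n end a record, so a bare \r stays inside the field. On the writing side, it assumes pandas 1.5 or later, where the to_csv keyword is spelled lineterminator (0.23, as in this report, spells it line_terminator). Choosing "\r\n" works because the standard csv writer, under the default QUOTE_MINIMAL rule, quotes any field containing a line-terminator character. The CSV strings and variable names are illustrative.

```python
import io

import pandas as pd

# Reading side: a hand-written CSV with "\n" line endings whose "error"
# fields end in an unquoted "\r"
raw = (
    "before,error,after\n"
    "1,Water damage to 3 floor apartment building.  \r,8\n"
    "2,Water damage to 3 floor apartment building.  \r,9\n"
)
# Only "\n" ends a record; the bare "\r" stays part of the field
df = pd.read_csv(io.StringIO(raw), lineterminator="\n")
print(df)

# Writing side (pandas >= 1.5 keyword spelling): "\r" is one of the
# line-terminator characters, so the csv writer quotes fields containing it
test_df = pd.DataFrame({
    "before": [1, 2, 3, 4, 6],
    "error": ["Water damage to 3 floor apartment building.  \r"] * 5,
    "after": [8, 9, 10, 11, 12],
})
text = test_df.to_csv(index=False, lineterminator="\r\n")
# The quoted "\r" survives a default read_csv round trip
roundtrip = pd.read_csv(io.StringIO(text))
print(roundtrip)
```

Of the two, the writing-side option fixes the file itself, so any CSV reader that honours quoting sees one row per record.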

Let us know if you want to work on #10018.

@TomAugspurger TomAugspurger added this to the No action milestone Sep 12, 2018