same .tsv file, get different data-frame structure using engine 'python' and 'c' #26545

Closed
Jane-Eyre opened this issue May 28, 2019 · 6 comments · Fixed by #26634
Labels
Bug IO CSV read_csv, to_csv

@Jane-Eyre

On my Mac, I have a tab-separated values file encoded in UTF-8, and the pandas version is 0.24.2.
74872_zh_CN_UI.txt

When I call read_csv with engine 'python' like this:
import pandas as pd
b = pd.read_csv("/Users/GHIBLI/Documents/vmware-L10n/bert/zh_CN/74872_zh_CN_UI.tsv", engine="python", delimiter="\t")
print(b.shape)
I get (8, 1).

With the default engine:
b = pd.read_csv("/Users/GHIBLI/Documents/vmware-L10n/bert/zh_CN/74872_zh_CN_UI.tsv", delimiter="\t")
print(b.shape)
I get (8, 22).

So I think the difference between the 'c' engine and the 'python' engine is more than the 'python' engine simply being more 'feature-complete'.
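A minimal sketch of how the two engines could be compared side by side. The attached file is not reproduced here, so the sample text and the helper name `compare_engine_shapes` are assumptions made for illustration only. On well-formed input like this sample, both engines are expected to agree; the point is only to show the comparison, not to reproduce the reported mismatch.

```python
import io

import pandas as pd


def compare_engine_shapes(source_text, sep="\t"):
    """Parse the same text with both parser engines and return their shapes.

    Hypothetical helper for illustration; not part of pandas.
    """
    shapes = {}
    for engine in ("c", "python"):
        # A fresh StringIO per engine so each parser reads from the start.
        df = pd.read_csv(io.StringIO(source_text), sep=sep, engine=engine)
        shapes[engine] = df.shape
    return shapes


# Small, made-up tab-separated sample (an assumption, not the attached file).
sample = "a\tb\tc\n1\t2\t3\n4\t5\t6\n"
shapes = compare_engine_shapes(sample)
print(shapes)
```

Running the same helper on the text of the problematic file would show directly whether the two engines disagree in a given environment.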

@Liam3851
Contributor

I cannot reproduce using pandas 0.24.2 on either Ubuntu or Windows -- both show the shape as (8, 22) using both engines. It could be a Mac-specific issue, though it would be odd that the Python-engine implementation would behave differently based on the OS.

@WillAyd
Member

WillAyd commented May 28, 2019

I can reproduce this on master. If you would like to take a deeper look and see what's going on, I would certainly appreciate it!

@WillAyd WillAyd added Bug IO CSV read_csv, to_csv labels May 28, 2019
@WillAyd WillAyd added this to the Contributions Welcome milestone May 28, 2019
@Liam3851
Contributor

@WillAyd Are you using a Mac? I can't reproduce on master using Windows/Python 3.7.3 (I suppose this could also be a Python version difference rather than an OS difference).

In [1]: import pandas as pd

In [2]: pd.__version__
Out[2]: '0.25.0.dev0+615.g998a0deea'

In [3]: pd.read_csv('74872_zh_CN_UI.txt', delimiter='\t', engine='python').shape
Out[3]: (8, 22)
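Since the open question is whether the OS or the Python version matters, a small sketch like the following could be used to report the environment alongside the result. The helper name `environment_summary` is hypothetical; it relies only on the standard `platform` module and `pd.__version__`.

```python
import platform

import pandas as pd


def environment_summary():
    """Collect the details that seem relevant to this report.

    Hypothetical helper for illustration; not part of pandas.
    """
    return {
        "pandas": pd.__version__,
        "python": platform.python_version(),
        "os": platform.system(),
        "os_release": platform.release(),
    }


env = environment_summary()
for key, value in env.items():
    print(f"{key}: {value}")
```

pandas also ships `pd.show_versions()`, which prints a fuller list of installed dependency versions.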

@WillAyd
Member

WillAyd commented May 28, 2019 via email

@luckydenis
Contributor

I was able to reproduce this on Linux 19.04 with Python 3.7 and pandas 0.24.2.

@luckydenis
Contributor

Good afternoon. I found where the mistake is, but I have not yet figured out how best to fix it. Do you mind if I take this task?
