read_csv C-engine CParserError: Error tokenizing data #11166
Your second-to-last line includes an
I'm encountering this error as well. Using the method suggested by @chris-b1 causes the following error:
+1
I have also found this issue when reading a large CSV file with the default engine. If I use engine='python' then it works fine.
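The workaround above can be wrapped in a small helper: try the fast C engine first and fall back to the slower but more forgiving Python engine only if tokenization fails. This is a sketch, not pandas API — `robust_read_csv` is a hypothetical name, and the inline CSV stands in for the files discussed in this thread:

```python
import io

import pandas as pd
from pandas.errors import ParserError


def robust_read_csv(path_or_buf, **kwargs):
    """Read a CSV with the C engine, falling back to the Python engine."""
    try:
        return pd.read_csv(path_or_buf, engine="c", **kwargs)
    except ParserError:
        # A buffer may already be partially consumed; rewind it if possible.
        if hasattr(path_or_buf, "seek"):
            path_or_buf.seek(0)
        return pd.read_csv(path_or_buf, engine="python", **kwargs)


df = robust_read_csv(io.StringIO("a,b\n1,2\n3,4\n"))
print(df.shape)  # (2, 2)
```

Note the trade-off: the Python engine is noticeably slower on large files, which matters in the performance-sensitive cases mentioned later in this thread.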
I missed @alfonsomhc's answer because it just looked like a comment. You need
Had the same issue trying to read a folder, not a CSV file.
Has anyone investigated this issue? It's killing performance when using read_csv in a Keras generator.
The original data provided is no longer available, so the issue is not reproducible. Closing as it's not clear what the issue is, but @dgrahn or anyone else, if you can provide a reproducible example we can reopen.
@WillAyd Let me know if you need additional info. Since GitHub doesn't accept CSVs, I changed the extension to .txt.

```python
for chunk in pandas.read_csv('debug.csv', chunksize=1000, names=range(2504)):
    pass
```

Here's the file: debug.txt. Here's the exception from Windows 10, using Anaconda.
And the same on RedHat.
@dgrahn I have downloaded debug.txt and I get the following when I run
Which is different from the error above. I have inspected the debug.txt file: the first two lines have 204 columns, but the 3rd line has 2504 columns. This would make the file unparsable and explains why an error is thrown. Is this expected? GitHub could be doing some implicit conversion in the background between newline types ("\r\n" and "\n") that is messing up the uploaded example.
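The ragged-row situation described above can be reproduced with a tiny in-memory CSV (the inline data below is an assumption standing in for debug.txt, with 2 columns playing the role of 204 and 4 playing the role of 2504). The C engine infers the field count from the first line and raises when a later line is wider; supplying explicit `names` wide enough for the longest row lets the parse succeed:

```python
import io

import pandas as pd
from pandas.errors import ParserError

# First two lines have 2 fields, the third has 4 -- mirroring the ragged
# debug.txt described above (204 columns, then 2504 on line 3).
ragged = "a,b\n1,2\n1,2,3,4\n"

err = None
try:
    pd.read_csv(io.StringIO(ragged))  # field count inferred from line 1
except ParserError as e:
    err = e
print(err)  # e.g. "Error tokenizing data. C error: Expected 2 fields in line 3, saw 4"

# Explicit names wide enough for the longest row parse cleanly;
# shorter rows are padded with NaN.
df = pd.read_csv(io.StringIO(ragged), names=range(4))
print(df.shape)  # (3, 4)
```

This also matches why the `names=range(2504)` call earlier in the thread was needed at all: it tells the parser the maximum width up front instead of letting it infer a too-narrow width from line 1.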
@joshlk Did you use the
@dgrahn Good point. OK, I can now reproduce the error with. It's good to note that
@joshlk I could open a separate issue if that would be preferred. |
Solved my problem. |
I tried this approach and was able to load large data files. But when I checked the dimensions of the dataframe, I saw that the number of rows had increased. What could be the logical reasons for that?
@dheeman00: I am facing the same problem as you with changing sizes. I have a dataframe of shape (100K, 21), and after using engine='python' it gives me a dataframe of shape (100034, 21) (without engine='python', I get the same error as the OP). After comparing them, I figured the problem is with one of my columns that contains a text field, some with unknown chars; some rows are broken into two different rows (the second row, with the continuation of the text, has all other columns set to "nan").
@Pegayus: Yes, you are right. Some of the "nan"-value columns break down into multiple columns. I performed the following to resolve the issue: pd.read_csv(file_name, sep=',', usecols=columns_name, engine='python'). I called the columns individually and it worked for me.
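For readers following along, the `usecols` fix above can be sketched as follows. The column names and inline data here are hypothetical stand-ins for the commenter's file; the point is that restricting the parse to known-good columns with the Python engine sidesteps the malformed text column:

```python
import io

import pandas as pd

# Hypothetical data standing in for the commenter's file.
data = "id,text,score\n1,hello,0.5\n2,world,0.9\n"

# Select only the columns you need, as in the comment above.
df = pd.read_csv(
    io.StringIO(data),
    sep=",",
    usecols=["id", "score"],
    engine="python",
)
print(list(df.columns))  # ['id', 'score']
```

Note this avoids loading the problematic column entirely rather than repairing it, so the row-count discrepancy can still exist if broken rows occur outside the selected columns.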
I am unable to load my data file. I have tried the following: netflix_df = pd.read_csv('/Users/pavnigairola/Desktop/netflix_titles.csv', and I get an error. Please suggest.
Hi,
I have encountered a dataset where the C-engine read_csv has problems. I am unsure of the exact issue, but I have narrowed it down to a single row, which I have pickled and uploaded to Dropbox. If you obtain the pickle, try the following:
I get the following exception:
If you try to read the CSV using the Python engine, then no exception is thrown:
This suggests that the issue is with read_csv and not to_csv. The versions I am using are:
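A minimal round-trip check of the kind the report describes can be sketched like this (the single-cell dataframe with an embedded newline is an assumption; the original pickled row is no longer available). On a version where the bug is fixed, both engines read back what to_csv wrote:

```python
import io

import pandas as pd

# Round-trip sketch: a cell with an embedded newline is quoted by to_csv,
# and both parser engines should read it back identically.
df = pd.DataFrame({"a": ["line1\nline2"], "b": [1]})
buf = io.StringIO()
df.to_csv(buf, index=False)

buf.seek(0)
c_df = pd.read_csv(buf, engine="c")
buf.seek(0)
py_df = pd.read_csv(buf, engine="python")

print(c_df.equals(py_df))  # True when both engines agree
```

If the C engine raised here while the Python engine did not, that would reproduce the asymmetry the report describes.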