read_csv parse issue with newline in quoted items combined with skiprows #10911
Comments
This looks like what you want.
That would work for this example, but what if the quoted texts in the CSV, coming from many different sources, use not only \n but sometimes also \r as a newline?
Well, if you show an example, that would help.
I added an example to the original post.
A combination of universal newline mode and the Python parsing engine seems to work.
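For reference, a minimal sketch of that combination on made-up data. The sample string and variable names are hypothetical, not the poster's file. It assumes Python 3, where `newline=None` on a text stream turns on universal-newline translation, and it passes `engine="python"` and `quotechar="~"` to `read_csv`:

```python
import io

import pandas as pd

# Hypothetical sample: "~" is the quote character, and the quoted text
# contains both "\n" and "\r" line breaks.
raw = (
    "id,text\n"
    "1,~first\nrow~\n"
    "2,~second\rrow~\n"
    "3,~third row~\n"
)

# newline=None enables universal-newline translation, so every "\r"
# (and "\r\n") in the stream is normalised to "\n".
buf = io.StringIO(raw, newline=None)

# The Python engine parses quoted fields before rows are skipped,
# so skiprows=[1] drops the whole first data record.
df = pd.read_csv(buf, quotechar="~", skiprows=[1], engine="python")
print(df)
```

One side effect of this approach: a `\r` inside a quoted field comes back as `\n`, so the text is kept but its original line-break style is not.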
Patches bug in C engine CSV parser in which quotation marks were not being respected in skipped rows. Closes gh-10911. Closes gh-12775.
I don't know whether this is known or desired behaviour, but when I read certain rows from a large file that uses "~" (tilde) as the quotechar and pass skiprows at the same time, the parser screws up as follows:
Note: I use "" in the output even though it isn't shown; otherwise the markup would get messed up, sorry...
whereas the output I want in this artificial case would be:
It seems that when skipping rows, the parser ignores the custom quotechar, which in this case is undesired from my point of view.
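To illustrate the suspected cause, here is a small sketch using only the standard library `csv` module. It does not reproduce pandas' C parser; the sample data and variable names are made up. It only contrasts skipping a physical line with skipping a parsed record when a quoted field contains a line break:

```python
import csv
import io

# Hypothetical sample: "~" is the quote character and the first data
# record has a line break inside its quoted field.
raw = "id,text\n1,~first\nrow~\n2,~second row~\n"

# Quote-unaware skipping: drop physical line 1, then parse the rest.
# The tail of the quoted field ("row~") is left behind as a stray row.
lines = raw.splitlines(keepends=True)
naive = list(csv.reader(lines[:1] + lines[2:], quotechar="~"))

# Quote-aware skipping: parse into records first, then drop record 1.
records = list(csv.reader(io.StringIO(raw, newline=""), quotechar="~"))
aware = records[:1] + records[2:]

print(naive)  # [['id', 'text'], ['row~'], ['2', 'second row']]
print(aware)  # [['id', 'text'], ['2', 'second row']]
```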
EDIT: It might well be that in the quoted texts newlines are not always \n but sometimes also \r.
EDIT2 (31.8.):
The lineterminator fix fails as far as I can see with the following example:
The problem is that the CSV has a "text" column whose content is HTML-formatted text blocks. There is no telling what kind of newline the creators of the HTML originally used, and the text blocks stem from different sources.
I might also add that it respects the quoting perfectly if one does not use "skiprows".
versioninfo: