REGR: errors='replace' when encoding/errors are not specified #38997

twoertwein · 2021-01-06T06:02:23Z

closes BUG: read_csv raising when null bytes are in skipped rows #38989
tests added / passed
Ensure all linting tests pass, see here for how to run them
whatsnew entry

Should 1.3 use errors='replace' when no encoding/errors are specified or use errors=None (strict)?

phofl · 2021-01-06T12:34:48Z

I would say use 'replace' only when encoding is not set and is inferred to be utf-8.

This should go back to 1.2.x

pandas/io/common.py

jreback

tests!

twoertwein · 2021-01-07T02:33:31Z

Should a follow-up PR intentionally revert this PR for 1.3? Personally, I would want to know when read_csv fails to read certain characters (errors = None by default).

phofl · 2021-01-07T09:06:03Z

I don't think that content from skipped rows should matter, and it should raise no error if the rest of the file is ok. Maybe you could add a test without skiprows; this raises on 1.1.5, which is good, and it should keep raising.

jreback · 2021-01-07T13:55:48Z

pandas/io/common.py

@@ -553,8 +553,10 @@ def get_handle(
    Returns the dataclass IOHandles
    """
    # Windows does not default to utf-8. Set to utf-8 for a consistent behavior
+    encoding_not_specified = False


instead why don't you

encoding_passed, encoding = encoding, encoding or 'utf-8'

and then you can test encoding_passed is not None; it's the same as you have, but I think a bit more natural
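The suggestion above, combined with the rule in the PR title, can be sketched roughly as follows. This is only an illustration: `resolve_encoding` is a hypothetical helper name, not a function in pandas, and the real logic lives inside `get_handle`.

```python
from typing import Optional, Tuple


def resolve_encoding(
    encoding: Optional[str], errors: Optional[str]
) -> Tuple[str, Optional[str]]:
    """Hypothetical helper illustrating the rule; not the actual pandas code."""
    # Remember what the caller passed before applying the utf-8 default.
    encoding_passed, encoding = encoding, encoding or "utf-8"
    # Fall back to lenient decoding only when neither argument was given.
    if encoding_passed is None and errors is None:
        errors = "replace"
    return encoding, errors
```

With this shape, an explicit `encoding` or an explicit `errors` is always left untouched, and only the fully unspecified case becomes lenient.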

twoertwein · 2021-01-07T17:33:29Z

I thought errors is exposed in read_csv. It isn't. In that case it makes sense to keep 'replace' as the default.

I don't think it is feasible to ignore encoding errors only for skipped rows. (We could read everything in binary mode but we would need to decode it to determine line endings.)
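A small standard-library sketch of why the error mode applies to the whole file rather than to individual rows. It assumes a text-mode wrapper similar in spirit to what a file handle provides; the sample bytes and the `read_lines` helper are made up for illustration and do not come from the linked issue.

```python
import io

# The second line holds a byte (0xff) that is not valid UTF-8.
raw = b"a,b\n\xff,skip me\n1,2\n"


def read_lines(data: bytes, errors: str) -> list:
    # The wrapper decodes the entire stream before any row is identified,
    # so it has no notion of which rows a parser will later skip.
    with io.TextIOWrapper(
        io.BytesIO(data), encoding="utf-8", errors=errors
    ) as handle:
        return handle.read().splitlines()


# Lenient mode substitutes U+FFFD for the undecodable byte.
lenient = read_lines(raw, "replace")

# Strict mode fails even though the bad byte sits in a row we might skip.
try:
    read_lines(raw, "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

Because decoding happens before line splitting, a strict handle raises on the bad byte regardless of whether that row would later be skipped.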

phofl · 2021-01-07T17:34:37Z

This errors parameter is not exposed

twoertwein · 2021-01-07T17:36:16Z

yes, it isn't exposed. Sorry, that is what I meant to say :)

jreback · 2021-01-07T18:48:23Z

thanks @twoertwein

jreback · 2021-01-07T18:48:41Z

@meeseeksdev backport 1.2.x

lumberbot-app · 2021-01-07T18:49:19Z

Something went wrong ... Please have a look at my logs.

phofl · 2021-01-07T19:07:01Z

Thanks

phofl reviewed Jan 6, 2021

simonjayhawkins added the IO CSV read_csv, to_csv label Jan 6, 2021

simonjayhawkins added this to the 1.2.1 milestone Jan 6, 2021

jreback requested changes Jan 6, 2021

jreback requested changes Jan 7, 2021

REGR: errors='replace' when encoding/errors are not specified

4129e53

jreback approved these changes Jan 7, 2021

jreback merged commit 89ddd8a into pandas-dev:master Jan 7, 2021

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Jan 7, 2021

Backport PR pandas-dev#38997: REGR: errors='replace' when encoding/er…

4c47f13

…rors are not specified

meeseeksmachine mentioned this pull request Jan 7, 2021

Backport PR #38997 on branch 1.2.x (REGR: errors='replace' when encoding/errors are not specified) #39021

Merged

jreback pushed a commit that referenced this pull request Jan 7, 2021

Backport PR #38997: REGR: errors='replace' when encoding/errors are n…

2a42c1c

…ot specified (#39021) Co-authored-by: Torsten Wörtwein <[email protected]>

luckyvs1 pushed a commit to luckyvs1/pandas that referenced this pull request Jan 20, 2021

REGR: errors='replace' when encoding/errors are not specified (pandas…

cd7ad7f

…-dev#38997)

twoertwein mentioned this pull request Jan 28, 2021

BUG: read_csv does not raise UnicodeDecodeError on non utf-8 characters #39450

Closed
