
read_csv cannot use dtype and true_values/false_values #34655


Closed
JonnyWaffles opened this issue Jun 8, 2020 · 2 comments · Fixed by #39012
Labels
Bug, ExtensionArray, IO CSV
Milestone
1.3

Comments

@JonnyWaffles

Hi friends,

Not sure if this is working as intended, but it appears you cannot combine the `dtype` kwarg with the `true_values` or `false_values` kwargs when reading a CSV.

from io import StringIO

import pandas as pd
import pytest


def test_pandas_read_write():
    df = pd.DataFrame({'A': ['yes', 'no'], 'B': ['yes', 'no']})
    out_io = StringIO(newline='')
    df.to_csv(out_io, index=False)

    # Plain read with no dtype works
    out_io.seek(0)
    pd.read_csv(out_io)

    # 'yes'/'no' are not recognised boolean strings, so dtype='boolean' alone raises
    out_io.seek(0)
    kwargs = dict(dtype={'A': 'boolean', 'B': 'boolean'})

    with pytest.raises(ValueError):
        pd.read_csv(out_io, **kwargs)

    # At the time of this report, adding true_values/false_values still raised
    kwargs.update({'true_values': ['yes'], 'false_values': ['no']})
    out_io.seek(0)

    with pytest.raises(ValueError):
        pd.read_csv(out_io, **kwargs)

    out_io.seek(0)
    # pop dtype so true/false values work
    kwargs.pop('dtype')
    pd.read_csv(out_io, **kwargs)

Using the converters kwarg to get the data into the shape I want solves my problem, but we may want to update the docs to let users know that user-defined true/false values do not work in conjunction with the boolean dtype.
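A minimal sketch of the converters workaround mentioned above, assuming every value is exactly `yes` or `no`. `yes_no_to_bool` is a hypothetical helper, not part of pandas. Note that the result is a plain NumPy `bool` column, not the nullable `boolean` dtype the report was asking for.

```python
from io import StringIO

import pandas as pd

# Same CSV that df.to_csv(..., index=False) produces in the report above
data = "A,B\nyes,yes\nno,no\n"


def yes_no_to_bool(value: str) -> bool:
    """Hypothetical helper: map 'yes'/'no' to True/False.

    Raises KeyError for any other string, so unexpected values are not
    silently turned into False.
    """
    return {"yes": True, "no": False}[value]


df = pd.read_csv(
    StringIO(data),
    converters={"A": yes_no_to_bool, "B": yes_no_to_bool},
)
print(df.dtypes)
print(df)
```

A dict lookup is used instead of `value == "yes"` so that a typo in the data fails loudly rather than becoming False.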

JonnyWaffles added the Bug and Needs Triage labels Jun 8, 2020
@gwvr

gwvr commented Jul 8, 2020

Confirmed. I'd like to read a column containing `t`, `f` and empty strings as a nullable boolean.
`dtype={'column_name': 'boolean'}` does not respect `true_values=['t']`.

I've also tried using a converter to map `t` to True and `f` to False, but read_csv ignores the dtype of any column that also has a converter, so the column is read as an object series rather than a nullable boolean.
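The converter-plus-dtype behaviour described above can be observed directly. A minimal sketch, assuming the default parser engine; `tf_to_bool` is a hypothetical helper. pandas emits a `ParserWarning` stating that only the converter will be used, so the `boolean` dtype request is dropped.

```python
import warnings
from io import StringIO

import pandas as pd

# Two columns so the empty flag in the last row is not skipped as a blank line
data = "id,flag\n1,t\n2,f\n3,\n"


def tf_to_bool(value: str):
    """Hypothetical helper: 't' -> True, 'f' -> False, anything else -> None."""
    return {"t": True, "f": False}.get(value)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df = pd.read_csv(
        StringIO(data),
        converters={"flag": tf_to_bool},
        dtype={"flag": "boolean"},  # ignored, because a converter is also given
    )

for w in caught:
    print(f"{w.category.__name__}: {w.message}")
print(df["flag"].dtype)  # not the nullable 'boolean' dtype
```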

For now, my workaround is to pass `true_values=['t']` and `false_values=['f']` to read_csv, then chain `.astype({'column_name': 'boolean'})`.
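A minimal sketch of that workaround, assuming a two-column CSV where `flag` holds `t`, `f` or nothing. Without a `dtype`, read_csv applies `true_values`/`false_values` during type inference; because one value is missing, the column comes back as object (True, False, NaN), and `astype` then converts it to the nullable `boolean` dtype with `<NA>` for the gap.

```python
from io import StringIO

import pandas as pd

# 't' / 'f' flags with one missing value, as described in the comment above
data = "id,flag\n1,t\n2,f\n3,\n"

df = pd.read_csv(
    StringIO(data),
    true_values=["t"],
    false_values=["f"],
).astype({"flag": "boolean"})  # object (True/False/NaN) -> nullable boolean

print(df["flag"])
```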

jbrockmendel added the IO CSV label and removed the Needs Triage label Sep 3, 2020
@mitar
Contributor

mitar commented Dec 12, 2020

I would say this should be fixed, not documented as a limitation.

jreback added this to the 1.3 milestone Jan 8, 2021
jreback added the ExtensionArray label Jan 8, 2021
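The issue was closed as fixed by #39012 under the 1.3 milestone. Assuming a pandas version that includes that fix, a minimal sketch of the reporter's original combination, which is expected to parse into nullable booleans instead of raising:

```python
from io import StringIO

import pandas as pd

# Same CSV that df.to_csv(..., index=False) produces in the original report
data = "A,B\nyes,yes\nno,no\n"

df = pd.read_csv(
    StringIO(data),
    dtype={"A": "boolean", "B": "boolean"},
    true_values=["yes"],   # extra strings accepted as True
    false_values=["no"],   # extra strings accepted as False
)
print(df.dtypes)
print(df)
```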