ENH: support EA types in read_csv #23228

Closed
jreback opened this issue Oct 18, 2018 · 1 comment · Fixed by #23255
Labels
Dtype Conversions Unexpected or buggy dtype conversions Enhancement ExtensionArray Extending pandas with custom dtypes or arrays. IO CSV read_csv, to_csv
Comments

@jreback
Contributor

jreback commented Oct 18, 2018

In [3]: df = pd.DataFrame({'Int': pd.Series([1, 2, 3], dtype='Int64'), 'A': [1, 2, 1]})
   ...: df
   ...: 
Out[3]: 
  Int  A
0   1  1
1   2  2
2   3  1

In [4]: data = df.to_csv(index=False)

In [5]: data
Out[5]: 'Int,A\n1,1\n2,2\n3,1\n'

In [6]: from io import StringIO

In [7]: pd.read_csv(StringIO(data), dtype={'Int': 'Int64'})

~/pandas/pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()
    968 
    969         self._start_clock()
--> 970         columns = self._convert_column_data(rows=rows,
    971                                             footer=footer,
    972                                             upcast_na=True)

~/pandas/pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_column_data()
   1096 
   1097             # Should return as the desired dtype (inferred or specified)
-> 1098             col_res, na_count = self._convert_tokens(
   1099                 i, start, end, name, na_filter, na_hashset,
   1100                 na_flist, col_dtype)

~/pandas/pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()
   1121 
   1122         if col_dtype is not None:
-> 1123             col_res, na_count = self._convert_with_dtype(
   1124                 col_dtype, i, start, end, na_filter,
   1125                 1, na_hashset, na_flist)

~/pandas/pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()
   1249                             "using parse_dates instead".format(dtype=dtype))
   1250         else:
-> 1251             raise TypeError("the dtype {dtype} is not "
   1252                             "supported for parsing".format(dtype=dtype))
   1253 

TypeError: the dtype Int64 is not supported for parsing

We already support Categorical; it would be nice to have a general interface for extension array dtypes.
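
For comparison, here is a minimal sketch of the Categorical case that already works. It reuses the CSV text from the session above; treating column A as the categorical one is only an illustrative choice, not something from the original report.

```python
from io import StringIO

import pandas as pd

# Same CSV text as produced by df.to_csv(index=False) above.
data = "Int,A\n1,1\n2,2\n3,1\n"

# 'category' is already accepted as a per-column dtype by read_csv.
df = pd.read_csv(StringIO(data), dtype={"A": "category"})

print(df.dtypes)
```

The request is for other extension dtypes such as Int64 to be accepted through the same dtype argument.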

Parsing with the actual dtype is also broken: no error is raised, but the column is parsed to object.

In [8]: from pandas.core.arrays.integer import Int64Dtype

In [9]: pd.read_csv(StringIO(data), dtype={'Int': Int64Dtype})
Out[9]: 
  Int  A
0   1  1
1   2  2
2   3  1

In [10]: pd.read_csv(StringIO(data), dtype={'Int': Int64Dtype}).dtypes
Out[10]: 
Int    object
A       int64
dtype: object
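
Until read_csv handles extension dtypes directly, one possible workaround is to parse with the default inference and convert afterwards. This is a sketch only and was not proposed in this thread; it assumes a pandas version in which astype accepts the 'Int64' string alias.

```python
from io import StringIO

import pandas as pd

# Same CSV text as in the session above.
data = "Int,A\n1,1\n2,2\n3,1\n"

# Let read_csv infer int64, then convert the column to the nullable Int64 dtype.
df = pd.read_csv(StringIO(data)).astype({"Int": "Int64"})

print(df.dtypes)
```

Note that a column containing missing values would be inferred as float64 before the conversion, so very large integers could lose precision on the way. That is one reason parsing directly to the extension dtype is preferable.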
@jreback jreback added Enhancement Dtype Conversions Unexpected or buggy dtype conversions IO CSV read_csv, to_csv Difficulty Intermediate ExtensionArray Extending pandas with custom dtypes or arrays. labels Oct 18, 2018
@jreback jreback added this to the Contributions Welcome milestone Oct 18, 2018
@kprestel
Contributor

I think I have a fix for the first part of this issue.

I will create a PR and continue working on the second part of the issue.
