ENH: support EA types in read_csv #23228

jreback · 2018-10-18T17:40:10Z

In [3]: df = pd.DataFrame({'Int': pd.Series([1, 2, 3], dtype='Int64'), 'A': [1, 2, 1]})
   ...: df
   ...: 
Out[3]: 
  Int  A
0   1  1
1   2  2
2   3  1

In [4]: data = df.to_csv(index=False)

In [5]: data
Out[5]: 'Int,A\n1,1\n2,2\n3,1\n'

In [6]: from io import StringIO

In [7]: pd.read_csv(StringIO(data), dtype={'Int': 'Int64'})

~/pandas/pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()
    968 
    969         self._start_clock()
--> 970         columns = self._convert_column_data(rows=rows,
    971                                             footer=footer,
    972                                             upcast_na=True)

~/pandas/pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_column_data()
   1096 
   1097             # Should return as the desired dtype (inferred or specified)
-> 1098             col_res, na_count = self._convert_tokens(
   1099                 i, start, end, name, na_filter, na_hashset,
   1100                 na_flist, col_dtype)

~/pandas/pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()
   1121 
   1122         if col_dtype is not None:
-> 1123             col_res, na_count = self._convert_with_dtype(
   1124                 col_dtype, i, start, end, na_filter,
   1125                 1, na_hashset, na_flist)

~/pandas/pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()
   1249                             "using parse_dates instead".format(dtype=dtype))
   1250         else:
-> 1251             raise TypeError("the dtype {dtype} is not "
   1252                             "supported for parsing".format(dtype=dtype))
   1253 

TypeError: the dtype Int64 is not supported for parsing

We already support Categorical here, so it would be nice to have a general interface for extension array (EA) dtypes.
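
To make the idea of a general interface concrete, here is a rough pure-Python sketch: the parser hands a column's raw string tokens to a constructor registered for the requested dtype. Every name in it (`register_converter`, `convert_column`, `NA_TOKENS`) is hypothetical and is not pandas API; it only illustrates the dispatch, not how the C parser would implement it.

```python
# Hypothetical sketch: none of these names are pandas API.
from typing import Callable, Dict, List, Optional

# Registry mapping a dtype name to a function that builds a column
# from the raw string tokens produced by the CSV tokenizer.
_STRING_CONVERTERS: Dict[str, Callable[[List[str]], list]] = {}

# Tokens treated as missing values (an assumption for this sketch).
NA_TOKENS = {"", "NA", "NaN"}


def register_converter(dtype_name: str):
    """Register a from-strings constructor for ``dtype_name``."""
    def decorator(func):
        _STRING_CONVERTERS[dtype_name] = func
        return func
    return decorator


@register_converter("Int64")
def nullable_int_from_strings(tokens: List[str]) -> List[Optional[int]]:
    # Missing tokens become None instead of forcing a float upcast.
    return [None if tok in NA_TOKENS else int(tok) for tok in tokens]


def convert_column(tokens: List[str], dtype_name: str) -> list:
    """Dispatch to the registered constructor, or raise for unknown dtypes."""
    try:
        converter = _STRING_CONVERTERS[dtype_name]
    except KeyError:
        raise TypeError(
            "the dtype {dtype} is not supported for parsing".format(dtype=dtype_name)
        )
    return converter(tokens)


print(convert_column(["1", "", "3"], "Int64"))
```

With a hook like this, supporting a new extension type would only require registering one more constructor rather than adding another branch to the parser.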

Parsing with the actual dtype instead of its string alias is also broken: no error is raised, but the column is parsed as object.

In [8]: from pandas.core.arrays.integer import Int64Dtype

In [9]: pd.read_csv(StringIO(data), dtype={'Int': Int64Dtype})
Out[9]: 
  Int  A
0   1  1
1   2  2
2   3  1

In [10]: pd.read_csv(StringIO(data), dtype={'Int': Int64Dtype}).dtypes
Out[10]: 
Int    object
A       int64
dtype: object
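
On a pandas build where `read_csv` rejects `dtype={'Int': 'Int64'}` as shown above, one possible workaround is to let the parser infer a NumPy dtype and cast afterwards. This is a minimal sketch, assuming a pandas version that ships the `Int64` extension dtype; it reuses the CSV text from `Out[5]`.

```python
from io import StringIO

import pandas as pd

# Same CSV text as Out[5] above.
data = 'Int,A\n1,1\n2,2\n3,1\n'

# Let read_csv infer int64, then cast the column to the nullable Int64 dtype.
df = pd.read_csv(StringIO(data)).astype({'Int': 'Int64'})
print(df.dtypes)
```

Because the cast happens after parsing, a column with missing values is first read as float64, so very large integers could lose precision on the way. That is one reason parser-level support would still be preferable.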


kprestel · 2018-10-20T19:10:24Z

I think I have a fix for the first part of this issue.

I will create a PR and continue working on the second part of the issue.

jreback added the Enhancement, Dtype Conversions, IO CSV, Difficulty Intermediate, and ExtensionArray labels Oct 18, 2018

jreback added this to the Contributions Welcome milestone Oct 18, 2018

kprestel mentioned this issue Oct 20, 2018

ENH:Add EA types to read CSV #23255

Merged

jreback mentioned this issue Nov 9, 2018

ENH: Method to recover IntervalIndex when reloading plain-text files #23595

Open

jreback modified the milestones: Contributions Welcome, 0.24.0 Dec 30, 2018

jreback closed this as completed in #23255 Jan 2, 2019

jreback mentioned this issue Jan 2, 2019

ENH: read_csv dtype with datetime w/tz & integrations with category #24542

Open
