Issue with parsing PeriodDtype columns in read_csv() #26934

mc4229 · 2019-06-19T01:25:47Z

Code Sample

import numpy as np
import pandas as pd

df = pd.DataFrame({'Int': [1, 2, 3], 'Period': pd.period_range(start="2019-01", end="2019-03", freq="M")})
df.to_csv("PeriodDtype.csv")
pd.read_csv("PeriodDtype.csv", dtype={"Int": np.int64, "Period": pd.PeriodDtype("M")})

Problem description

Using pandas 0.24.2, I wrote a simple DataFrame with the following dtypes to a CSV file:

Int           int64
Period    period[M]
dtype: object

When I tried to read it back in, I found that read_csv() could not parse PeriodDtype("M"). I got the following error message:

NotImplementedError: Extension Array: <class 'pandas.core.arrays.period.PeriodArray'> 
must implement _from_sequence_of_strings in order to be used in parser methods

I saw a similar issue #24542 raised for Datetime dtype. It seems that _from_sequence_of_strings() is also not defined for PeriodArray, which prevents parsing columns with PeriodDtype.

I think adding _from_sequence_of_strings() for PeriodArray would be a good enhancement. If that is the case I would be interested in making that change.
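
On a pandas version where read_csv() raises the error above, one possible workaround is to read the column as strings and convert it afterwards. The sketch below is not part of the original report. It uses an in-memory buffer instead of PeriodDtype.csv and assumes the column holds strings in the form that to_csv() writes for monthly periods (for example 2019-01).

```python
import io

import pandas as pd

# Build the same frame as in the report and write it to an in-memory CSV.
df = pd.DataFrame({
    "Int": [1, 2, 3],
    "Period": pd.period_range(start="2019-01", end="2019-03", freq="M"),
})
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)

# Read the Period column as plain strings, then convert it explicitly.
out = pd.read_csv(buf, dtype={"Int": "int64", "Period": str})
out["Period"] = pd.PeriodIndex(out["Period"], freq="M")
print(out.dtypes)
```

This sidesteps the parser hook entirely, at the cost of one extra conversion step after loading.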

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Darwin
OS-release: 18.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: 3.8.0
pip: 10.0.1
setuptools: 40.2.0
Cython: 0.28.5
numpy: 1.16.4
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.7.9
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.6
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.0
lxml.etree: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

TomAugspurger · 2019-06-19T02:46:25Z

> I think adding _from_sequence_of_strings() for PeriodArray would be a good enhancement. If that is the case I would be interested in making that change.

Yep. You may be able to reuse _from_sequence.

mc4229 · 2019-06-19T03:02:12Z

@TomAugspurger Thanks! I will take a look at this.

PaulCherian · 2019-06-19T21:32:22Z

> > I think adding _from_sequence_of_strings() for PeriodArray would be a good enhancement. If that is the case I would be interested in making that change.
>
> Yep. You may be able to reuse _from_sequence.

@mc4229 I tried calling _from_sequence directly from _from_sequence_of_strings() and it works.
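
A rough illustration of the delegation discussed above: when the general constructor already accepts strings, the string-parsing hook can forward to it. ToyPeriodArray below is a hypothetical stand-in written for this sketch, not pandas' PeriodArray, and its method signatures are simplified assumptions rather than the library's actual ones.

```python
import pandas as pd


class ToyPeriodArray:
    """Hypothetical stand-in for PeriodArray, used only for illustration."""

    def __init__(self, periods):
        self.periods = list(periods)

    @classmethod
    def _from_sequence(cls, scalars, dtype=None, copy=False):
        # pd.Period accepts Period objects as well as parseable strings,
        # so one constructor can handle both kinds of input.
        # `copy` is kept only to mirror the hook's shape; it is unused here.
        freq = dtype.freq if dtype is not None else None
        return cls(pd.Period(s, freq=freq) for s in scalars)

    @classmethod
    def _from_sequence_of_strings(cls, strings, dtype=None, copy=False):
        # The parser-facing hook simply forwards to the general constructor.
        return cls._from_sequence(strings, dtype=dtype, copy=copy)


arr = ToyPeriodArray._from_sequence_of_strings(
    ["2019-01", "2019-02", "2019-03"], dtype=pd.PeriodDtype("M")
)
print(arr.periods)
```

The point of the sketch is only the forwarding call; the real change would live on PeriodArray itself.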

chibby0ne · 2019-07-13T14:47:58Z

Hi all, I created a PR for this issue using the suggested approach.

TomAugspurger added the ExtensionArray (Extending pandas with custom dtypes or arrays) and IO CSV (read_csv, to_csv) labels Jun 19, 2019

TomAugspurger added this to the Contributions Welcome milestone Jun 19, 2019

TomAugspurger added the good first issue label Jun 19, 2019

chibby0ne added a commit to chibby0ne/pandas that referenced this issue Jul 13, 2019

arrays/period: allow parsing of PeriodDtype columns from read_csv

91b4d0e

Fixes: pandas-dev#26934 Signed-off-by: Antonio Gutierrez <[email protected]>

chibby0ne mentioned this issue Jul 13, 2019

arrays/period: allow parsing of PeriodDtype columns from read_csv #27380

Merged

chibby0ne added a commit to chibby0ne/pandas that referenced this issue Jul 13, 2019

arrays/period: allow parsing of PeriodDtype columns from read_csv

bcfaeed

Fixes: pandas-dev#26934 Signed-off-by: Antonio Gutierrez <[email protected]>

chibby0ne added a commit to chibby0ne/pandas that referenced this issue Jul 13, 2019

arrays/period: allow parsing of PeriodDtype columns from read_csv

457288d

Fixes: pandas-dev#26934 Signed-off-by: Antonio Gutierrez <[email protected]>

chibby0ne added a commit to chibby0ne/pandas that referenced this issue Jul 14, 2019

arrays/period: allow parsing of PeriodDtype columns from read_csv

ba5f438

Fixes: pandas-dev#26934 Signed-off-by: Antonio Gutierrez <[email protected]>

jreback modified the milestones: Contributions Welcome, 0.25.0 Jul 17, 2019

jreback closed this as completed in #27380 Jul 17, 2019
