Period converted to overflowing Timestamp in DataFrame.to_csv #15982

Closed
snorfalorpagus opened this issue Apr 12, 2017 · 4 comments
Labels: Dtype Conversions, IO CSV, IO Data, Period

snorfalorpagus commented Apr 12, 2017

Code Sample

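import pandas

# The last date lies beyond the datetime64[ns] upper bound (2262-04-11).
dates = ["1990-01-01", "2000-01-01", "3005-01-01"]
index = pandas.PeriodIndex(dates, freq="D")
df = pandas.DataFrame([4, 5, 6], index=index)
print(df)
# Writing the frame with the PeriodIndex as its index triggers the bug.
df.to_csv("bug.csv")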

Problem description

I have some data with dates in the far future (e.g. 4000-01-01). Following the documentation I am using pandas.Period: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#representing-out-of-bounds-spans

This works, until I come to export the dataframe to a CSV (demonstrated in the code above). Printing the dataframe works as expected:

            0
1990-01-01  4
2000-01-01  5
3005-01-01  6

But the CSV file looks like this:

,0
1990-01-01 00:00:00.000000000,4
2000-01-01 00:00:00.000000000,5
1835-11-23 00:50:52.580896768,6

It looks like the Period has been converted to a Timestamp and silently overflowed.
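
The timestamp in the third CSV row can be reproduced with plain integer arithmetic. The sketch below is not pandas code; it uses only the standard library and assumes the overflow is an ordinary two's-complement wraparound of a 64-bit count of nanoseconds since 1970-01-01. The helper name wrap_int64 is made up for illustration.

```python
from datetime import date, datetime, timedelta

NS_PER_DAY = 86_400 * 10**9


def wrap_int64(value):
    """Wrap an arbitrary integer into the signed 64-bit range (hypothetical helper)."""
    return (value + 2**63) % 2**64 - 2**63


# Nanoseconds from the epoch to 3005-01-01; this exceeds the int64 maximum.
days = (date(3005, 1, 1) - date(1970, 1, 1)).days
true_ns = days * NS_PER_DAY
wrapped_ns = wrap_int64(true_ns)

# datetime only has microsecond resolution, so keep the leftover nanoseconds separately.
micros, nanos_rem = divmod(wrapped_ns, 1000)
wrapped = datetime(1970, 1, 1) + timedelta(microseconds=micros)

print(wrapped, nanos_rem)  # 1835-11-23 00:50:52.580896 768
```

Appending the 768 leftover nanoseconds to the microsecond field gives 00:50:52.580896768, which matches the value written to the CSV and supports the silent-overflow explanation.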

Related to #13346, but I think worth a distinct issue.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 33.1.0.post20170122
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: 0.8.0
xarray: None
IPython: 5.3.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: 3.3.0
numexpr: 2.6.2
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.5
boto: 2.46.1
pandas_datareader: None

TomAugspurger added the IO Data, Dtype Conversions, IO CSV, and Period labels Apr 12, 2017
TomAugspurger commented Apr 12, 2017

This is only an issue when the period is in the index:

In [5]: df.reset_index().to_csv("bug.csv", index=False)

In [6]: !cat bug.csv
index,0
1990-01-01,4
2000-01-01,5
3005-01-01,6

cc @gfyoung if you're interested in taking a look.

TomAugspurger added this to the Next Major Release milestone Apr 12, 2017

jreback commented Apr 12, 2017

This is what ultimately gets called. It should be straightforward to fix.

In [6]: index
Out[6]: PeriodIndex(['1990-01-01', '2000-01-01', '3005-01-01'], dtype='period[D]', freq='D')

In [7]: index.to_native_types()
Out[7]: 
array(['1990-01-01', '2000-01-01', '3005-01-01'], 
      dtype='<U10')


gfyoung commented Apr 12, 2017

@jreback : Right, but WAY before that, we call index.to_timestamp() when we initialize CSVFormatter. That ultimately is why the bug (i.e. overflow) occurs.


jreback commented Apr 12, 2017

Ahh ok, then that has to fall back to handle out-of-bounds timestamps.
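
One way to picture such a fallback is a bounds check before any timestamp conversion. The following is only a sketch under stated assumptions, not the change that was made in pandas: it works on standard-library date objects, and the names fits_int64_ns and index_labels are hypothetical.

```python
from datetime import date

EPOCH = date(1970, 1, 1)
NS_PER_DAY = 86_400 * 10**9
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def fits_int64_ns(day):
    """Return True if midnight of `day` fits in int64 nanoseconds since the epoch."""
    ns = (day - EPOCH).days * NS_PER_DAY
    return INT64_MIN <= ns <= INT64_MAX


def index_labels(days):
    """Hypothetical formatter: use timestamp-style labels only when every
    day is in bounds; otherwise fall back to plain ISO date strings."""
    if all(fits_int64_ns(d) for d in days):
        return [d.isoformat() + " 00:00:00" for d in days]
    return [d.isoformat() for d in days]


labels = index_labels([date(1990, 1, 1), date(2000, 1, 1), date(3005, 1, 1)])
print(labels)  # ['1990-01-01', '2000-01-01', '3005-01-01']
```

Because 3005-01-01 is out of bounds, the whole index is formatted from the dates directly and no value is ever pushed through the overflowing representation.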

gfyoung added a commit to forking-repos/pandas that referenced this issue Apr 12, 2017
gfyoung added a commit to forking-repos/pandas that referenced this issue Apr 13, 2017
jreback modified the milestones: 0.20.0, Next Major Release Apr 13, 2017
jreback pushed a commit that referenced this issue Apr 13, 2017
* BUG: Don't overflow PeriodIndex in to_csv

Closes gh-15982.

* TST: Test to_native_types for Period/DatetimeIndex