BUG: Don't overflow PeriodIndex in to_csv #15984

gfyoung · 2017-04-12T17:29:24Z

Don't overflow time-stamp objects if being used as index in to_csv.

TomAugspurger · 2017-04-12T17:48:46Z

doc/source/whatsnew/v0.20.0.txt

@@ -1260,6 +1260,7 @@ I/O
 - Bug in ``pd.read_csv()`` in which invalid values for ``nrows`` and ``chunksize`` were allowed (:issue:`15767`)
 - Bug in ``pd.read_csv()`` for the Python engine in which unhelpful error messages were being raised when parsing errors occurred (:issue:`15910`)
 - Bug in ``pd.read_csv()`` in which the ``skipfooter`` parameter was not being properly validated (:issue:`15925`)
+- Bug in ``pd.to_csv()`` in which there was numeric overflow when a timestamp index was being written (:issue:`15982`)


timestamp -> Period I think.

I used timestamp because that seemed more clear (to the everyday user) than PeriodIndex. What do you think?

codecov · 2017-04-12T18:22:29Z

Codecov Report

Merging #15984 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #15984      +/-   ##
==========================================
+ Coverage      91%      91%   +<.01%     
==========================================
  Files         145      145              
  Lines       49641    49639       -2     
==========================================
  Hits        45175    45175              
+ Misses       4466     4464       -2

Flag	Coverage Δ
#multiple	`88.77% <100%> (ø)`	⬆️
#single	`40.53% <100%> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/indexes/base.py	`96.09% <ø> (ø)`	⬆️
pandas/formats/format.py	`95.4% <100%> (+0.05%)`	⬆️
pandas/core/common.py	`91.03% <0%> (+0.34%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f2ed595...5939a24. Read the comment docs.

jreback · 2017-04-12T19:31:56Z

pandas/tests/frame/test_to_csv.py

@@ -1143,3 +1143,23 @@ def test_to_csv_quoting(self):
        df = df.set_index(['a', 'b'])
        expected = '"a","b","c"\n"1","3","5"\n"2","4","6"\n'
        self.assertEqual(df.to_csv(quoting=csv.QUOTE_ALL), expected)
+
+    def test_period_index_date_overflow(self):


can you also test NaT. I think we have very few tests for outputing a PI.

jreback · 2017-04-12T19:32:54Z

can you also tests .to_native_types() for a PI (and DTI if not already tested).

gfyoung · 2017-04-12T19:34:51Z

can you also tests .to_native_types() for a PI (and DTI if not already tested).

Slightly outside of the scope of this PR, but I can squeeze that in with another commit.

jreback · 2017-04-12T19:36:18Z

Slightly outside of the scope of this PR, but I can squeeze that in with another commit.

Ha, tests are never out of scope!

gfyoung · 2017-04-12T19:37:21Z

Ha, tests are never out of scope!

Well, I was focusing to_csv, not to_native_types 😄

gfyoung · 2017-04-13T01:57:56Z

@jreback , @TomAugspurger : All comments have been addressed, and everything is green.

jreback · 2017-04-13T10:05:00Z

pandas/indexes/base.py

+            An indexer into `self` that specifies which values
+            are used in the formatting process.
+        kwargs : dict
+            Options for specifying how the values should be formatted.


I think would be ok to actually list these options (with there defautls) in the signature itself. I don't recall why I didn't do this originally. followup PR for this though.

Derived classes change the signature with different defaults. That's why **kwargs is provided in the base implementation.

jreback · 2017-04-13T10:07:29Z

pandas/tests/frame/test_to_csv.py

+        df = pd.DataFrame([4, 5, 6], index=index)
+
+        buf = StringIO()
+        df.to_csv(buf)


if you are looking for something to clean :>

there is a bit of a dichotomy between this test file and pandas/formats/test_formats.py where this tests 'higher' level functionaility and test_formats somewhat lower, but this is probably not consistent, and certainly not documented (maybe just a comment at the top of the file would be enough).

Hmm...I'm not sure I really see any issue with this right now. What sort of comment (and which file) were you thinking exactly?

jreback · 2017-04-13T10:09:58Z

pandas/tests/indexes/datetimes/test_datetime.py

+        expected = np.array(['01-2017-01', '01-2017-02',
+                             '01-2017-03'], dtype=object)
+
+        result = index.to_native_types(date_format='%m-%Y-%d')


hmm, can you move these to test_formats.py (new file), in this directory, same with the period ones. I don't these are actually tested anywhere :>

No, they aren't, but they are now with this PR 😄

jreback · 2017-04-13T10:10:04Z

pandas/tests/indexes/period/test_period.py

+        index = PeriodIndex(['2017-01-01', '2017-01-02',
+                             '2017-01-03'], freq='D')
+
+        # First, with no arguments.


same as above

jreback · 2017-04-13T10:10:36Z

mostly about putting tests in better places, otherwise lgtm.

jreback · 2017-04-13T10:14:15Z

pandas/tests/frame/test_to_csv.py

+        index = pd.PeriodIndex(dates, freq="D")
+        df = pd.DataFrame([4, 5, 6], index=index)
+
+        buf = StringIO()


wondering outloud whether this particular testing idiom could be improved somehow. a possibilitiy

with tm.assert_format_equal(expected) as buf: df.to_csv(buf)

we do this a LOT in testing output formatting (suggestions on name welcome), certainly a separate PR (to do changes in the rest of the codebase), though could be introduced here if you'd like

Actually, now that I look at this, since df.to_csv DOES return if buf is None, complicating the testing with writing to a buffer seems unnecessary. In the future, we should encourage any tests of to_csv to do the following:

expected = "foo" result = df.to_csv() assert result == expected

That makes your point somewhat moot, as I think this way is much cleaner.

Closes pandas-devgh-15982.

gfyoung · 2017-04-13T15:41:32Z

@jreback : Build machines running into SSL errors when cloning repository. [ci skip] for now.

gfyoung · 2017-04-13T18:11:36Z

Addressed all comments, and everything is green.

jreback · 2017-04-13T22:11:39Z

thanks @gfyoung !

TomAugspurger reviewed Apr 12, 2017

View reviewed changes

jreback reviewed Apr 12, 2017

View reviewed changes

jreback added IO CSV read_csv, to_csv Period Period data type labels Apr 12, 2017

gfyoung force-pushed the period-datetime-overflow branch 3 times, most recently from 20cdf1e to 06cf781 Compare April 12, 2017 22:00

jreback reviewed Apr 13, 2017

View reviewed changes

BUG: Don't overflow PeriodIndex in to_csv

2126601

Closes pandas-devgh-15982.

gfyoung force-pushed the period-datetime-overflow branch 3 times, most recently from ac7e911 to 7165fdb Compare April 13, 2017 15:41

TST: Test to_native_types for Period/DatetimeIndex

5939a24

gfyoung force-pushed the period-datetime-overflow branch from 7165fdb to 5939a24 Compare April 13, 2017 16:36

jreback added this to the 0.20.0 milestone Apr 13, 2017

jreback merged commit 7ee73ff into pandas-dev:master Apr 13, 2017

gfyoung deleted the period-datetime-overflow branch April 13, 2017 22:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: Don't overflow PeriodIndex in to_csv #15984

BUG: Don't overflow PeriodIndex in to_csv #15984

gfyoung commented Apr 12, 2017

TomAugspurger Apr 12, 2017

gfyoung Apr 12, 2017 •

edited

Loading

codecov bot commented Apr 12, 2017 •

edited

Loading

jreback Apr 12, 2017

gfyoung Apr 12, 2017

jreback commented Apr 12, 2017

gfyoung commented Apr 12, 2017

jreback commented Apr 12, 2017

gfyoung commented Apr 12, 2017

gfyoung commented Apr 13, 2017

jreback Apr 13, 2017

gfyoung Apr 13, 2017 •

edited

Loading

jreback Apr 13, 2017

gfyoung Apr 13, 2017

jreback Apr 13, 2017

gfyoung Apr 13, 2017

jreback Apr 13, 2017

gfyoung Apr 13, 2017

jreback commented Apr 13, 2017

jreback Apr 13, 2017

gfyoung Apr 13, 2017 •

edited

Loading

gfyoung commented Apr 13, 2017

gfyoung commented Apr 13, 2017

jreback commented Apr 13, 2017

BUG: Don't overflow PeriodIndex in to_csv #15984

BUG: Don't overflow PeriodIndex in to_csv #15984

Conversation

gfyoung commented Apr 12, 2017

Choose a reason for hiding this comment

gfyoung Apr 12, 2017 • edited Loading

Choose a reason for hiding this comment

codecov bot commented Apr 12, 2017 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented Apr 12, 2017

gfyoung commented Apr 12, 2017

jreback commented Apr 12, 2017

gfyoung commented Apr 12, 2017

gfyoung commented Apr 13, 2017

Choose a reason for hiding this comment

gfyoung Apr 13, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented Apr 13, 2017

Choose a reason for hiding this comment

gfyoung Apr 13, 2017 • edited Loading

Choose a reason for hiding this comment

gfyoung commented Apr 13, 2017

gfyoung commented Apr 13, 2017

jreback commented Apr 13, 2017

gfyoung Apr 12, 2017 •

edited

Loading

codecov bot commented Apr 12, 2017 •

edited

Loading

gfyoung Apr 13, 2017 •

edited

Loading

gfyoung Apr 13, 2017 •

edited

Loading