ENH: to_csv() date formatting #4313

Merged
merged 1 commit into from
Oct 12, 2013

qwhelan
Contributor

@qwhelan qwhelan commented Jul 22, 2013

This commit adds support for formatting datetime object output from to_csv()
closes #2583

In [3]: spx = DataReader('^GSPC', data_source='yahoo')

In [4]: spx.head()
Out[4]: 
               Open     High      Low    Close      Volume  Adj Close
Date                                                                 
2010-01-04  1116.56  1133.87  1116.56  1132.99  3991400000    1132.99
2010-01-05  1132.66  1136.63  1129.66  1136.52  2491020000    1136.52
2010-01-06  1135.71  1139.19  1133.95  1137.14  4972660000    1137.14
2010-01-07  1136.27  1142.46  1131.32  1141.69  5270680000    1141.69
2010-01-08  1140.52  1145.39  1136.22  1144.98  4389590000    1144.98

In [5]: spx.to_csv('spx_temp.csv', date_format='%Y%m%d')

In [6]: !head spx_temp.csv
Date,Open,High,Low,Close,Volume,Adj Close
20100104,1116.56,1133.87,1116.56,1132.99,3991400000,1132.99
20100105,1132.66,1136.63,1129.66,1136.52,2491020000,1136.52
20100106,1135.71,1139.19,1133.95,1137.14,4972660000,1137.14
20100107,1136.27,1142.46,1131.32,1141.69,5270680000,1141.69
20100108,1140.52,1145.39,1136.22,1144.98,4389590000,1144.98
20100111,1145.96,1149.74,1142.02,1146.98,4255780000,1146.98
20100112,1143.81,1143.81,1131.77,1136.22,4716160000,1136.22
20100113,1137.31,1148.4,1133.18,1145.68,4170360000,1145.68
20100114,1145.68,1150.41,1143.8,1148.46,3915200000,1148.46

The date_format= keyword will be applied to every element of a DatetimeIndex (index or columns) and DatetimeBlock (values). It works for both the Python engine and the new Cython engine:

In [7]: datetimes = DataFrame({spx.index[0]: spx.index}, index=spx.index).head()

In [8]: datetimes
Out[8]: 
                    2010-01-04
Date                          
2010-01-04 2010-01-04 00:00:00
2010-01-05 2010-01-05 00:00:00
2010-01-06 2010-01-06 00:00:00
2010-01-07 2010-01-07 00:00:00
2010-01-08 2010-01-08 00:00:00

In [9]: datetimes.to_csv('datetimes_temp.csv', date_format='%m/%d/%Y')

In [10]: !head datetimes_temp.csv
Date,01/04/2010
01/04/2010,01/04/2010
01/05/2010,01/05/2010
01/06/2010,01/06/2010
01/07/2010,01/07/2010
01/08/2010,01/08/2010

In [11]: datetimes.to_csv('datetimes_temp.csv', date_format='%m/%d/%Y', engine='python')

In [12]: !head datetimes_temp.csv
Date,01/04/2010
01/04/2010,01/04/2010
01/05/2010,01/05/2010
01/06/2010,01/06/2010
01/07/2010,01/07/2010
01/08/2010,01/08/2010
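
For readers who want to try the keyword without the Yahoo download, here is a minimal sketch. The frame and its column names are made up for illustration, and it assumes a pandas version in which to_csv() returns the CSV text when no path is given. It shows date_format= being applied to both a DatetimeIndex and a datetime values column:

```python
import pandas as pd

# Hypothetical stand-in for the spx frame above (two rows only).
idx = pd.DatetimeIndex(["2010-01-04", "2010-01-05"], name="Date")
frame = pd.DataFrame({"Close": [1132.99, 1136.52], "Stamp": idx}, index=idx)

# With no path argument, to_csv() returns the CSV text as a string.
csv_text = frame.to_csv(date_format="%Y%m%d")
lines = csv_text.splitlines()

for line in lines:
    print(line)
```

Both the index and the Stamp column should come out in the requested format, while the float column is left untouched.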

Let me know if there are any questions or issues.

@qwhelan
Contributor Author

qwhelan commented Jul 22, 2013

@jreback @cpcloud Any chance this could make it in before the v0.12 release?

Also, I forgot to mention above that this doesn't handle MultiIndexes. I don't see a clean way to do it, but I can revisit it if someone requests it.

@cpcloud
Member

cpcloud commented Jul 22, 2013

I think only bug fixes for now. Really trying to get v0.12 out ASAP!

@qwhelan
Contributor Author

qwhelan commented Jul 22, 2013

Alright, just wanted to check.

Thanks.

@cpcloud
Member

cpcloud commented Jul 22, 2013

Btw, thanks for the PR!

@jreback
Contributor

jreback commented Jul 22, 2013

@qwhelan I think you need to either raise or warn if the stringified datetimes are not == to the current values

otherwise it's pretty easy to chop, say, datetimes to dates (which is fine, except it should be done explicitly by resetting the index, rather than through a typo/incorrect format)

# should be in core/index.py
from datetime import time

def hastimes(self):
    # True if any value has a time of day other than midnight
    return not (set(self.times) == set([time(0, 0)]))

# in core/internals/DatetimeBlock
if date_format is not None:

    # values might be a series here (e.g. a column from a frame)
    # need to convert to an index to test this
    if values.hastimes():

        # test converted == values, maybe by sampling
        # or can do:   (values == pd.to_datetime(converted)).all()

        # if not == raise/warn
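
The check suggested above can be tried out with plain datetime objects. This is only a sketch of the idea, not the code that was merged: has_times and format_drops_information are hypothetical helper names, and the round trip through strptime stands in for the pd.to_datetime(converted) comparison.

```python
from datetime import datetime, time


def has_times(values):
    """Return True if any value has a time of day other than midnight."""
    return any(v.time() != time(0, 0) for v in values)


def format_drops_information(values, date_format):
    """Return True if formatting and re-parsing changes any value."""
    return any(
        datetime.strptime(v.strftime(date_format), date_format) != v
        for v in values
    )


daily = [datetime(2010, 1, 4), datetime(2010, 1, 5)]
intraday = [datetime(2010, 1, 4, 9, 30), datetime(2010, 1, 4, 16, 0)]

for name, values in [("daily", daily), ("intraday", intraday)]:
    # Only bother with the round trip when there are times to lose.
    lossy = has_times(values) and format_drops_information(values, "%Y%m%d")
    print(name, "would lose information:", lossy)
```

A caller could raise or warn when the round trip reports a loss, which is the behaviour requested in the comment.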

@qwhelan
Contributor Author

qwhelan commented Jul 24, 2013

@jreback I don't see hastimes() or anything with the same functionality. I'll add unless there's another function I should be using (and probably put it in tseries/tindex.py).

@jreback
Contributor

jreback commented Jul 24, 2013

yep have to add it (that's why I put it out there!)

@jreback
Contributor

jreback commented Aug 23, 2013

@qwhelan can you rebase to current?

this looks pretty good otherwise

@cpcloud cpcloud mentioned this pull request Aug 26, 2013
5 tasks
@jreback
Contributor

jreback commented Sep 24, 2013

@qwhelan this somehow got lost....can you rebase to current master.....can get this in for 0.13...thxs

@jreback
Contributor

jreback commented Sep 24, 2013

also...need to make sure this handles NaT as well (e.g. use in your sample tests)
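
A small way to exercise the NaT case, assuming a reasonably recent pandas (the frame and labels below are invented for illustration): a datetime column that mixes real dates with a missing value, written once with the default na_rep and once with an explicit marker.

```python
import pandas as pd

# Hypothetical frame: one datetime column containing a NaT.
events = pd.DataFrame(
    {"when": pd.to_datetime(["2010-01-04", None, "2010-01-06"])},
    index=["a", "b", "c"],
)

# Missing dates are written as na_rep (an empty string by default).
default_rows = events.to_csv(date_format="%m/%d/%Y").splitlines()
marked_rows = events.to_csv(date_format="%m/%d/%Y", na_rep="NA").splitlines()

print(default_rows)
print(marked_rows)
```

The row labelled b should have an empty date field in the first output and NA in the second, with the other rows formatted as month/day/year.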

@jtratner
Contributor

perf test?

@qwhelan
Contributor Author

qwhelan commented Sep 24, 2013

Sorry for neglecting this. I'll make the changes later this week.

@qwhelan
Contributor Author

qwhelan commented Oct 4, 2013

Added a perf test and tested/handled NaTs.

I was caught in rebase hell for the last week, but it should be ready to go unless there are additional requests.

@jreback
Contributor

jreback commented Oct 4, 2013

move example to 0.13.0 (from 0.12)

rvalues.flat[imask] = np.array(
    [Timestamp(val)._repr_base for val in values.ravel()[imask]], dtype=object)

if self.dtype == 'datetime64[ns]':
Contributor

this changed in master, (as the timedelta stuff was moved); pls start with the current and don't add (which prob happened in a rebase)

@jreback
Contributor

jreback commented Oct 7, 2013

@qwhelan almost there....just need those 2 changes!

@qwhelan
Contributor Author

qwhelan commented Oct 8, 2013

@jreback Just pushed the changes. Let me know if you're looking for something different.

@jreback
Contributor

jreback commented Oct 8, 2013

@qwhelan can you move the v0.12.0 announcement to v0.13.0?
change the test that uses NaT to include some dates as well as NaT's

@qwhelan
Contributor Author

qwhelan commented Oct 9, 2013

@jreback Made those changes.

@@ -1001,7 +1017,15 @@ def _helper_csv(self, writer, na_rep=None, cols=None,
    if float_format is not None and com.is_float(val):
        val = float_format % val
    elif isinstance(val, np.datetime64):
        val = lib.Timestamp(val)._repr_base
Contributor

this can all be collapsed down, something like:

if date_format is None:
    date_formatter = lambda x: x._repr_base
else:
    # the else-branch (None for nulls) is assumed; the suggestion as posted left it out
    date_formatter = lambda x: x.strftime(date_format) if notnull(x) else None

if isinstance(val, (np.datetime64, datetime.datetime)):
    val = date_formatter(lib.Timestamp(val))
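
The same "pick the formatter once, then apply it per value" pattern can be sketched with the standard library alone. The names make_date_formatter and format_cell are hypothetical, None stands in for NaT, and isoformat(sep=" ") is only a stand-in for the private _repr_base attribute, so this is an illustration of the shape rather than the pandas code:

```python
from datetime import datetime


def make_date_formatter(date_format=None):
    """Build the formatter once, outside the per-value loop."""
    if date_format is None:
        # Stand-in for Timestamp._repr_base: "YYYY-MM-DD HH:MM:SS".
        return lambda x: None if x is None else x.isoformat(sep=" ")
    return lambda x: None if x is None else x.strftime(date_format)


def format_cell(val, date_formatter, na_rep=""):
    """Format one cell; missing values become na_rep."""
    if val is None:
        return na_rep
    if isinstance(val, datetime):
        return date_formatter(val)
    return str(val)


formatter = make_date_formatter("%m/%d/%Y")
row = [datetime(2010, 1, 4), None, 1132.99]
formatted = [format_cell(v, formatter) for v in row]
print(formatted)
```

Choosing the formatter up front keeps the branch on date_format out of the inner loop, which is the point of the suggested collapse.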

@jreback
Contributor

jreback commented Oct 11, 2013

@qwhelan can you update to my comments....?

DOC: add date_format to release notes
@qwhelan
Contributor Author

qwhelan commented Oct 12, 2013

@jreback All done. Thanks for the suggestions.

@jreback jreback merged commit ce669d6 into pandas-dev:master Oct 12, 2013
@jreback
Contributor

jreback commented Oct 12, 2013

thanks...merged

Successfully merging this pull request may close these issues.

Date formatting option in to_csv?
4 participants