Skip to content

ENH: convert datetime components (year, month, day, ...) in different columns to datetime #8158

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jorisvandenbossche opened this issue Sep 2, 2014 · 4 comments
Labels
Milestone

Comments

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Sep 2, 2014

from SO

I didn't find an issue about this, but it has come up some times at stackoverflow: having columns with integers for year, month, day, hour, ..., how do you convert this to a datetime column/index ?

http://stackoverflow.com/questions/19350806/how-to-convert-columns-into-one-datetime-column-in-pandas

You have the typical solution of adding the columns: pd.to_datetime((df['Y']*10000 + df['M']*100 + df['D']).astype('int'), format='%Y%m%d'), and @unutbu added now a faster solution using numpy's different datetime64 resolutions to that question on SO.

I personally think this would be a nice addition to pandas to have a more native solution for this. But then we need to figure out a nice API. Or we keep it as is, but try to document it more (add as example to docs?)

@jreback
Copy link
Contributor

jreback commented Sep 2, 2014

I think we tried to doc this some where (or cookbook - don't remember offhand)

how about

pd.to_datetime(DataFrame,
format={'Y' : column_year,
'm' : column_month,
'd' : column_day})

format could also be list of these columns I guess as well (this is more powerful though; a list would have to be unambiguously Ymd)

@unutbu
Copy link
Contributor

unutbu commented Sep 2, 2014

How about adding a helper function like

def combine64(years, months=1, days=1, weeks=None, hours=None, minutes=None,
              seconds=None, milliseconds=None, microseconds=None, nanoseconds=None):
    years = np.asarray(years) - 1970
    months = np.asarray(months) - 1
    days = np.asarray(days) - 1
    types = ('<M8[Y]', '<m8[M]', '<m8[D]', '<m8[W]', '<m8[h]',
             '<m8[m]', '<m8[s]', '<m8[ms]', '<m8[us]', '<m8[ns]')
    vals = (years, months, days, weeks, hours, minutes, seconds,
            milliseconds, microseconds, nanoseconds)
    return sum(np.asarray(v, dtype=t) for t, v in zip(types, vals)
               if v is not None)

It would be easier than adding the functionality to pd.to_datetime, and people could apply it wherever they wish. Note that combine64 returns a NumPy array of dtype datetime64[*]. It can be passed to pd.to_datetime to create a DatetimeIndex:

In [196]: combine64(df['Y'], df['M'])
Out[196]: array(['1990-01-01', '1991-02-01', '1992-03-01', '1993-04-01'], dtype='datetime64[D]')

In [197]: pd.to_datetime(combine64(df['Y'], df['M']))
Out[197]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[1990-01-01, ..., 1993-04-01]
Length: 4, Freq: None, Timezone: None

@shoyer
Copy link
Member

shoyer commented Sep 4, 2014

I like @unutbu's combine64 helper function. It seems cleaner than shoving support for format dictionaries into to_datetime.

A couple of thoughts:

  1. The name combine64 is too ambiguous. How about combine_datetime instead?
  2. I think it should return a DatetimeIndex directly, since this is the only fully functional datetime array we have access to in pandas.

@jreback jreback added this to the Next Major Release milestone Oct 20, 2015
@jreback
Copy link
Contributor

jreback commented Oct 20, 2015

@unutbu want to do a PR for this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants