Incorrect handling of datetime64 values in structured arrays #2095

Closed
wesm opened this issue Oct 20, 2012 · 8 comments

Member

wesm commented Oct 20, 2012

see:

import datetime
import numpy as np
from pandas import DataFrame

arr = np.array([(datetime.datetime(2012, 9, 9, 0, 0), datetime.datetime(2012, 9, 8, 15, 10))],
               dtype=[('Date', '<M8[us]'), ('Forecasting', '<M8[us]')])
DataFrame(arr)

cf http://stackoverflow.com/questions/12369546/pandas-parsing-datetime-column-from-sqlite-database

pag commented Oct 30, 2012

Similarly with timedelta64:

>>> np.array([1100, 20], dtype='timedelta64[s]')
array([0:18:20, 0:00:20], dtype=timedelta64[s])
>>> print(pd.DataFrame({'x': np.array([1100, 20], dtype='timedelta64[s]')}).to_string())
      x
0  1100
1    20

pag commented Oct 30, 2012

It's even worse with timedelta64 because I can find no way to convince DataFrame to just let it through unmolested. Is there any way to do this? I've tried:

>>> orig
array([0:18:20, 0:00:20], dtype=timedelta64[s])
>>> df['x']
0    1100
1      20
Name: x
>>> df['x'] = orig
>>> df['x']
0    1100
1      20
Name: x
>>> df['x'] = df['x'].astype('timedelta64[s]')
>>> df['x']
0    1100
1      20
Name: x
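
For comparison, here is a minimal NumPy-only sketch of what a lossless conversion of these values to nanosecond resolution looks like. It does not touch DataFrame and makes no claim about what pandas does internally; the variable names are illustrative only.

```python
import numpy as np

# Same values as in the session above: 1100 s and 20 s.
orig = np.array([1100, 20], dtype="timedelta64[s]")

# Casting to a finer unit keeps the durations and rescales the stored integers.
as_ns = orig.astype("timedelta64[ns]")

# Dividing by a one-second timedelta recovers the durations as plain floats.
seconds = as_ns / np.timedelta64(1, "s")

# Negative offsets survive the cast as well.
negative = np.array([-90], dtype="timedelta64[s]").astype("timedelta64[ns]")
```

The point of the sketch is that the unit change itself does not discard information, so a container that stores nanoseconds could in principle still display these values as durations rather than bare integers.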

Member Author

wesm commented Oct 30, 2012

DataFrame's preference is to coerce datetime64 values to nanoseconds. I have no test coverage at all for timedelta64. Where is this data originating for you? I will do what I can.
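
As an illustration of what coercing to nanoseconds means at the NumPy level, the sketch below casts one field of the structured array from the original report by hand. This is an assumption-labeled example, not the code from the eventual fix; `date_ns`, `us_int`, and `ns_int` are names introduced here.

```python
import datetime
import numpy as np

# Structured array with microsecond-resolution fields, as in the report.
arr = np.array(
    [(datetime.datetime(2012, 9, 9, 0, 0), datetime.datetime(2012, 9, 8, 15, 10))],
    dtype=[("Date", "<M8[us]"), ("Forecasting", "<M8[us]")],
)

# Manual cast of one field to nanosecond resolution.
date_ns = arr["Date"].astype("datetime64[ns]")

# The instant is unchanged; only the stored integer is scaled by 1000.
us_int = arr["Date"].astype("int64")
ns_int = date_ns.astype("int64")
```

If the integers are reinterpreted as nanoseconds without the rescaling step, the resulting timestamps are wrong by a factor of 1000, which is one plausible way a mismatch between units could surface.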

wesm closed this as completed in 6f02df9 on Nov 1, 2012

pag commented Nov 1, 2012

The timedelta64s are coming from a database of offsets (KDB times; they can be negative, so they're best represented as time deltas in some cases).

Thanks for the quick patch. Is it harder to leave the timedelta64 type information as it was passed in than to convert to nanoseconds? If I pass in datetime64[D], I want to maintain the distinction between points in time and symbolic dates. Pandas should respect that if possible.
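
To make the distinction concrete, here is a small NumPy-only sketch (names are illustrative) showing that the unit is part of the dtype, and that a cast to nanoseconds keeps the values equal while dropping the "day" unit:

```python
import numpy as np

# Calendar dates stored with day resolution.
dates = np.array(["2012-09-09", "2012-09-08"], dtype="datetime64[D]")

# The same values after a cast to nanosecond resolution.
as_ns = dates.astype("datetime64[ns]")

# np.datetime_data reports the (unit, count) pair carried by the dtype.
unit_before = np.datetime_data(dates.dtype)
unit_after = np.datetime_data(as_ns.dtype)

# The values still compare equal, but the unit metadata differs.
values_equal = bool((dates == as_ns).all())
```

After the cast, nothing in the array records that the values were originally whole days, so that information has to be tracked separately if it matters.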

Member Author

wesm commented Nov 1, 2012

Handling multiple datetime64/timedelta64 units is pretty difficult right now. I do plan to change that soon (there will be more "type management work" in 0.10, though not in the soon-to-be-released 0.9.1). I'm going to look at the timedelta64 issue separately.

pag commented Nov 1, 2012

OK, thanks very much

pag commented Nov 30, 2012

Is there a ticket I can watch for the improved datetime handling? I've searched through but can't find anything similar.

Member Author

wesm commented Dec 6, 2012

You should open one (or more) for whatever features you're looking for exactly.
