Constructing a DatetimeArray with pd.array? #24656

Closed
jorisvandenbossche opened this issue Jan 7, 2019 · 4 comments

@jorisvandenbossche
Member

I wanted to create a naive DatetimeArray with pd.array, but that doesn't seem very easy or even impossible:

Specifying dtype='datetime64[ns]' gives a NumPy-backed PandasArray:

In [7]: pd.array(['2012-01-01', '2012-01-02'], dtype='datetime64[ns]')
Out[7]: 
<PandasArray>
[numpy.datetime64('2012-01-01T00:00:00.000000000'), numpy.datetime64('2012-01-02T00:00:00.000000000')]
Length: 2, dtype: datetime64[ns]

We also can't specify a custom dtype, as it needs a tz:

In [8]: pd.array(['2012-01-01', '2012-01-02'], dtype=pd.DatetimeTZDtype(tz=None))  
...
TypeError: A 'tz' is required.

Passing a DatetimeIndex also gives a PandasArray:

In [9]: pd.array(pd.date_range('2012-01-01', periods=3)) 
Out[9]: 
<PandasArray>
[numpy.datetime64('2012-01-01T00:00:00.000000000'),
 numpy.datetime64('2012-01-02T00:00:00.000000000'),
 numpy.datetime64('2012-01-03T00:00:00.000000000')]
Length: 3, dtype: datetime64[ns]

Should one of those give a DatetimeArray instead of PandasArray?
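
For anyone wanting to check how their own pandas version behaves, the three cases above can be condensed into a short script. This is only a sketch: `array_type` is a hypothetical helper introduced here for illustration, and the class names it reports depend on the installed pandas version, so no particular output is claimed.

```python
import pandas as pd


def array_type(values, dtype=None):
    """Hypothetical helper: class name of whatever pd.array returns."""
    return type(pd.array(values, dtype=dtype)).__name__


strings = ["2012-01-01", "2012-01-02"]

# Case 1: a string alias that pandas shares with NumPy.
from_alias = array_type(strings, dtype="datetime64[ns]")

# Case 2: DatetimeTZDtype cannot describe naive data; tz=None is rejected.
try:
    pd.DatetimeTZDtype(tz=None)
    tz_error = None
except TypeError as exc:
    tz_error = str(exc)

# Case 3: an existing DatetimeIndex.
from_index = array_type(pd.date_range("2012-01-01", periods=3))

print(pd.__version__, from_alias, tz_error, from_index)
```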

@jorisvandenbossche
Member Author

Hmm, I see that when passing datetime / Timestamp objects, the result is inferred to be a DatetimeArray:

In [10]: pd.array([pd.Timestamp('2012-01-01'), pd.Timestamp('2012-01-02')])  
Out[10]: 
<DatetimeArray>
['2012-01-01 00:00:00', '2012-01-02 00:00:00']
Length: 2, dtype: datetime64[ns]

(but that's not the most convenient way to quickly create such an array)

Also passing a datetime64[ns] numpy array gives a DatetimeArray:

In [13]: pd.array(np.array(['2012-01-01', '2012-01-02'], dtype='datetime64[ns]'))        
Out[13]: 
<DatetimeArray>
['2012-01-01 00:00:00', '2012-01-02 00:00:00']
Length: 2, dtype: datetime64[ns]

but when specifying that dtype inside pd.array it returns a PandasArray.

Reading the docstring of pd.array again this might be all according to the rules, but still, it feels a bit strange (or maybe at least the unpacking of the DatetimeIndex case)

cc @TomAugspurger
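
As a possible workaround in the meantime, the conversion can be done before the data reaches pd.array. The sketch below assumes a released pandas (0.24 or later) where `DatetimeIndex.array` exposes the backing DatetimeArray; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

strings = ["2012-01-01", "2012-01-02"]

# Option 1: parse with to_datetime, then take the index's backing array.
via_to_datetime = pd.to_datetime(strings).array

# Option 2: let NumPy do the conversion and pass the datetime64[ns]
# ndarray to pd.array without a dtype argument (the In [13] case above).
via_numpy = pd.array(np.array(strings, dtype="datetime64[ns]"))

print(type(via_to_datetime).__name__, type(via_numpy).__name__)
```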

@jorisvandenbossche jorisvandenbossche added the ExtensionArray Extending pandas with custom dtypes or arrays. label Jan 7, 2019
@TomAugspurger
Contributor

Agreed, both of those feel strange. The DTI one is probably an outright bug.

The pd.array(['2012-01-01', '2012-01-02'], dtype='datetime64[ns]') one is as you say maybe following the "rules", but we might want to consider overriding NumPy when we have arrays that share string aliases with NumPy (I think just datetime64[ns] and timedelta64[ns]).
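
To make the proposed rule concrete, here is a rough sketch of what "overriding NumPy" for the shared aliases could look like from the outside. It is not how pandas implements pd.array; `array_preferring_pandas` is a hypothetical wrapper, and it ignores the resolution in the requested dtype (it relies on whatever `to_datetime` / `to_timedelta` produce).

```python
import numpy as np
import pandas as pd


def array_preferring_pandas(data, dtype=None):
    """Hypothetical wrapper: route datetime64/timedelta64 dtypes to the
    pandas array types, and defer to pd.array for everything else."""
    try:
        kind = np.dtype(dtype).kind if dtype is not None else None
    except TypeError:
        # Not a NumPy dtype (e.g. "category" or an extension dtype).
        kind = None

    if kind == "M":  # NumPy datetime64
        return pd.to_datetime(data).array
    if kind == "m":  # NumPy timedelta64
        return pd.to_timedelta(data).array
    return pd.array(data, dtype=dtype)


dates = array_preferring_pandas(["2012-01-01", "2012-01-02"], dtype="datetime64[ns]")
deltas = array_preferring_pandas(["1 day", "2 days"], dtype="timedelta64[ns]")
print(type(dates).__name__, type(deltas).__name__)
```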

@jorisvandenbossche
Member Author

> but we might want to consider overriding NumPy when we have arrays that share string aliases with NumPy (I think just datetime64[ns] and timedelta64[ns]).

Yes, I would be +1 on that.
It complicates the "rules" somewhat, but it is also very strange that pd.array does not give the array type that pandas actually uses under the hood for those dtypes.
(This hints at the problem of clashing dtypes, i.e. the fact that DatetimeArray still has a NumPy dtype.)

@TomAugspurger
Contributor

TomAugspurger commented Jan 7, 2019 via email

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jan 7, 2019
@jreback jreback added this to the 0.24.0 milestone Jan 8, 2019
TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jan 8, 2019
TomAugspurger added a commit that referenced this issue Jan 8, 2019
* API: Datetime/TimedeltaArray from to_datetime

Closes #24656
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this issue Feb 28, 2019
* API: Datetime/TimedeltaArray from to_datetime

Closes pandas-dev#24656