DataFrame.values not a 2D-array when constructed from timezone-aware datetimes #13407

Closed
aburgm opened this issue Jun 9, 2016 · 4 comments · Fixed by #20422
Labels
Bug, Dtype Conversions, Reshaping, Timezones

Comments

aburgm commented Jun 9, 2016

When a DataFrame column is constructed from timezone-aware datetime objects, its values attribute returns a pandas.DatetimeIndex instead of a 2D numpy array. This is problematic because the datetime index does not support all operations that a numpy array does.

Code Sample, a copy-pastable example if possible

import datetime
import dateutil.tz
import pandas

df = pandas.DataFrame()
df['Time'] = [datetime.datetime(2015, 1, 1, tzinfo=dateutil.tz.tzutc())]
df.dropna(axis=0)  # raises ValueError: 'axis' entry is out of bounds

Also, df.values returns DatetimeIndex(['2015-01-01'], dtype='datetime64[ns, UTC]', freq=None) instead of a 2D numpy array.

Expected Output

The df.dropna call should be a no-op.

Compare this to the case when constructed using df['Time'] = [datetime.datetime(2015,1,1)]. In that case, df.dropna works as expected, and df.values is array([['2014-12-31T16:00:00.000000000-0800']], dtype='datetime64[ns]').
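
For illustration, a minimal side-by-side sketch of the two constructions (the variable names are ad hoc; the behaviour is as observed on pandas 0.18.x, and the exact reprs depend on the local timezone and installed versions):

import datetime
import dateutil.tz
import pandas

# tz-naive column: .values is a 2-D numpy array, dropna works
naive = pandas.DataFrame()
naive['Time'] = [datetime.datetime(2015, 1, 1)]
print(type(naive.values))   # numpy.ndarray, shape (1, 1)
naive.dropna(axis=0)        # no-op, as expected

# tz-aware column: on pandas 0.18.x .values comes back 1-D
aware = pandas.DataFrame()
aware['Time'] = [datetime.datetime(2015, 1, 1, tzinfo=dateutil.tz.tzutc())]
print(type(aware.values))   # DatetimeIndex on 0.18.x, not a 2-D ndarray
aware.dropna(axis=0)        # raises ValueError: 'axis' entry is out of bounds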

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: None
pip: 8.0.2
setuptools: 20.1.1
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.1.1
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.6
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: None
lxml: None
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None
jreback commented Jun 9, 2016

This must be short-circuiting on the 1-d case in lcd_types. If you have an additional column it works:

In [10]: df['foo'] = 'bar'

In [11]: df.values
Out[11]: array([[Timestamp('2015-01-01 00:00:00+0000', tz='UTC'), 'bar']], dtype=object)

Note that using .values is pretty inefficient, and numpy does not support these extended dtypes anyway.

jreback commented Jun 9, 2016

> This is problematic because the datetime index does not support all operations that a numpy array does.

This statement is nonsensical, as numpy barely understands timezones. And the Index API is a superset of 1-d numpy operations.

jreback added the Bug, Reshaping, Dtype Conversions, and Timezones labels Jun 9, 2016
jreback added this to the Next Major Release milestone Jun 9, 2016
aburgm commented Jun 9, 2016

Agreed; I did not phrase that properly. What I meant is that some operations on a pandas DataFrame, such as dropna along axis 0, fail, apparently because the returned values are a 1-D structure where a 2-D structure was expected.
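
To make the dimensionality point concrete, a small check (a sketch based on the 0.18.x behaviour reported above, using the df from the original example):

vals = df.values
# On 0.18.x, a frame whose only column is tz-aware hands back a 1-D
# DatetimeIndex here, so shape-dependent paths such as dropna(axis=0)
# see ndim == 1 where they expect ndim == 2.
print(vals.ndim)    # 1 on 0.18.x for the tz-aware frame; 2 otherwise
print(vals.shape)   # (1,) instead of (1, 1)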

npdoty commented Feb 11, 2018

Spent a long time debugging this today. It is very surprising to have dropna throw an AxisError exception in numpy when the column happens to be datetime64[ns, UTC] but not when it's datetime64[ns].

As a workaround, I believe users can use df[df.column_name.notnull()] (as described on StackOverflow) instead of dropna(subset=['column_name']).
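
A sketch of that workaround, assuming 'Time' is the affected tz-aware column from the example above:

# Keep rows where the tz-aware column is not NaT, sidestepping dropna's
# shape-sensitive code path on 0.18.x.
clean = df[df['Time'].notnull()]

# Equivalent intent, but trips the bug on affected versions:
# clean = df.dropna(subset=['Time'])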

jreback modified the milestones: Next Major Release → 0.23.0 on Mar 25, 2018