import pandas as pd
import datetime

# setup a dataframe with some dates in
df_dates = pd.DataFrame({
    'a': [datetime.datetime(2010, 1, i) for i in range(2, 7)],
    'b': [datetime.datetime(2010, 1, i) for i in range(1, 6)]
})

# set first date in first column to NaT
df_dates.a[0] = pd.np.nan

In [13]: df_dates.dtypes
Out[13]:
a    datetime64[ns]
b    datetime64[ns]
dtype: object

# pandas is happy to take the minimum as I'd like, even dealing with NaT
In [14]: df_dates.min(axis=1)
Out[14]:
0   2010-01-01
1   2010-01-02
2   2010-01-03
3   2010-01-04
4   2010-01-05
dtype: datetime64[ns]

In [15]: df_dates.min(axis=0)
Out[15]:
a   2010-01-03
b   2010-01-01
dtype: datetime64[ns]

# now set the datetimes to UTC
for col in df_dates.columns:
    df_dates[col] = df_dates[col].dt.tz_localize('utc')

# now pandas doesn't seem to be able to deal with NaT
# No minimum for first axis
In [19]: df_dates.min(axis=0)
Out[19]: Series([], dtype: float64)

# NaNs everywhere for second axis
In [20]: df_dates.min(axis=1)
Out[20]:
0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
dtype: float64

# skipna kwarg doesn't seem to do anything
In [21]: df_dates.min(axis=1, skipna=True)
Out[21]:
0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
dtype: float64
Problem description
I'd like to be able to take the minimum of a DataFrame of timezone-aware datetimes. I found this issue when a NaT was hidden in the data. It works fine when there are no NaTs, but the behaviour changes as shown above once the datetimes are timezone-aware. I would expect min() to give the same result for timezone-aware and naive data, so this looks like a bug to me. I haven't tried .max(), .mean(), etc.
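For anyone hitting the same thing, here is a rough sketch of a possible workaround rather than a fix. It relies on the observation above that the naive reduction handles NaT correctly: convert to naive UTC, take the minimum, then re-attach the timezone. The helper name `tz_aware_min` and its `tz` parameter are made up for illustration and are not part of pandas. I use `df_dates.loc[0, "a"] = pd.NaT` to set the missing value instead of `pd.np.nan`, which is an assumption about what works across pandas versions.

```python
import datetime

import pandas as pd


def tz_aware_min(df, axis=0, tz="UTC"):
    """Hypothetical helper: min over tz-aware datetime columns via a naive round trip."""
    # Convert every column to `tz`, then drop the timezone so the frame is plain datetime64.
    naive = df.apply(lambda col: col.dt.tz_convert(tz).dt.tz_localize(None))
    # The naive reduction skips NaT by default (skipna=True).
    result = naive.min(axis=axis)
    # Re-attach the timezone to the reduced values.
    return result.dt.tz_localize(tz)


# Same data as in the report, with the first value of column "a" set to NaT.
df_dates = pd.DataFrame({
    "a": [datetime.datetime(2010, 1, i) for i in range(2, 7)],
    "b": [datetime.datetime(2010, 1, i) for i in range(1, 6)],
})
df_dates.loc[0, "a"] = pd.NaT
for col in df_dates.columns:
    df_dates[col] = df_dates[col].dt.tz_localize("utc")

row_min = tz_aware_min(df_dates, axis=1)  # one minimum per row
col_min = tz_aware_min(df_dates, axis=0)  # one minimum per column
```

This only makes sense when all columns can be expressed in a single timezone, and it assumes every column is already timezone-aware (`tz_convert` raises on naive data).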
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Windows
OS-release : 7
machine : AMD64
processor : Intel64 Family 6 Model 85 Stepping 4, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 0.25.1
numpy : 1.16.5
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.3
setuptools : 41.2.0
Cython : 0.29.13
pytest : 5.2.0
hypothesis : None
sphinx : 2.2.0
blosc : None
feather : None
xlsxwriter : 1.2.1
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.8.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : 0.14.0
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.8
tables : 3.5.2
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.1