BUG: df.apply handles np.timedelta64 as timestamp, should be timedelta #7778


Closed
stharrold opened this issue Jul 17, 2014 · 7 comments · Fixed by #7779
Labels: Bug; Datetime (Datetime data dtype); Dtype Conversions (Unexpected or buggy dtype conversions); Timedelta (Timedelta data type)
Milestone: 0.15.0

Comments

@stharrold

I think there may be a bug with the row-wise handling of numpy.timedelta64 data types when using DataFrame.apply. As a check, the problem does not appear when using DataFrame.applymap. The problem may be related to #4532, but I'm unsure. I've included an example below.

This is only a minor problem for my use-case, which is cross-checking timestamps from a counter/timer card. I can easily work around the issue with DataFrame.itertuples etc.
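The `DataFrame.itertuples` workaround mentioned above might look roughly like the sketch below. The frame is hypothetical stand-in data (three timestamps about 10 s apart), not the original counter/timer readings, and the sketch assumes a reasonably recent pandas in which values come back as `pandas.Timestamp`, `pandas.Timedelta`, or `NaT`.

```python
import pandas as pd

# Hypothetical stand-in data: three timestamps 10 s apart.
df = pd.DataFrame(
    {
        "datetimes": pd.to_datetime(
            ["2014-05-31 01:23:50", "2014-05-31 01:24:00", "2014-05-31 01:24:10"]
        )
    }
)
df["differential_timedeltas"] = df["datetimes"].diff()

# itertuples iterates over the columns in parallel, so each value keeps
# its own column's type rather than being cast to one row-wide dtype.
row_types = [
    (type(row.datetimes).__name__, type(row.differential_timedeltas).__name__)
    for row in df.itertuples(index=False)
]
print(row_types)
```

Because no row-wide array is built, this route should sidestep the common-dtype question entirely, at the cost of a plain Python loop.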

Thank you for your time and for making such a useful package!

Example

Version

Import and check versions.

$ date
Thu Jul 17 16:28:38 CDT 2014
$ conda update pandas
Fetching package metadata: ..
# All requested packages already installed.
# packages in environment at /Users/harrold/anaconda:
#
pandas                    0.14.1               np18py27_0  
$ ipython
Python 2.7.8 |Anaconda 2.0.1 (x86_64)| (default, Jul  2 2014, 15:36:00) 
Type "copyright", "credits" or "license" for more information.

IPython 2.1.0 -- An enhanced Interactive Python.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from __future__ import print_function

In [2]: import numpy as np

In [3]: import pandas as pd

In [4]: pd.util.print_versions.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.8.final.0
python-bits: 64
OS: Darwin
OS-release: 11.4.2
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.14.1
nose: 1.3.3
Cython: 0.20.1
numpy: 1.8.1
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.1.0
sphinx: 1.2.2
patsy: 0.2.1
scikits.timeseries: None
dateutil: 1.5
pytz: 2014.4
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.3.1
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.5
lxml: 3.3.5
bs4: 4.3.1
html5lib: 0.999
httplib2: 0.8
apiclient: 1.2
rpy2: None
sqlalchemy: 0.9.4
pymysql: None
psycopg2: None

Create test data

Using a subset of the original raw data as an example.

In [5]: datetime_start = np.datetime64(u'2014-05-31T01:23:19.9600345Z')

In [6]: timedeltas_elapsed = [30053400, 40053249, 50053098]

Compute datetimes from elapsed timedeltas, then create differential timedeltas from datetimes. All elements are either type numpy.datetime64 or numpy.timedelta64.

In [7]: df = pd.DataFrame(dict(datetimes = timedeltas_elapsed))

In [8]: df = df.applymap(lambda elt: np.timedelta64(elt, 'us'))

In [9]: df = df.applymap(lambda elt: np.datetime64(datetime_start + elt))

In [10]: df['differential_timedeltas'] = df['datetimes'] - df['datetimes'].shift()

In [11]: print(df)
                      datetimes  differential_timedeltas
0 2014-05-31 01:23:50.013434500                      NaT
1 2014-05-31 01:24:00.013283500          00:00:09.999849
2 2014-05-31 01:24:10.013132500          00:00:09.999849
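On a newer pandas, where `DataFrame.applymap` has since been deprecated, roughly the same frame can probably be built without element-wise lambdas. This is only a sketch: `pd.to_timedelta` and `Series.diff` are substituted for the calls in the transcript, and the trailing `Z` is dropped from the start time (an assumption) so that the values stay timezone-naive.

```python
import pandas as pd

# Same raw values as in the report; the trailing "Z" is dropped here
# (an assumption) so the timestamps stay timezone-naive.
datetime_start = pd.Timestamp("2014-05-31T01:23:19.9600345")
timedeltas_elapsed = [30053400, 40053249, 50053098]  # microseconds

# Vectorised conversion instead of element-wise applymap.
elapsed = pd.to_timedelta(timedeltas_elapsed, unit="us")
df = pd.DataFrame({"datetimes": datetime_start + elapsed})

# diff() is equivalent to subtracting the shifted column.
df["differential_timedeltas"] = df["datetimes"].diff()
print(df)
```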

Expected behavior

With element-wise handling using DataFrame.applymap, all elements are correctly identified as datetimes (timestamps) or timedeltas.

In [12]: print(df.applymap(lambda elt: type(elt)))
                          datetimes     differential_timedeltas
0  <class 'pandas.tslib.Timestamp'>  <type 'numpy.timedelta64'>
1  <class 'pandas.tslib.Timestamp'>  <type 'numpy.timedelta64'>
2  <class 'pandas.tslib.Timestamp'>  <type 'numpy.timedelta64'>

Bug

With row-wise handling using DataFrame.apply, all elements are type pandas.tslib.Timestamp. I expected 'differential_timedeltas' to be type numpy.timedelta64 or another type of timedelta, not a type of datetime (timestamp).

In [13]: # For 'datetimes':

In [14]: print(df.apply(lambda row: type(row['datetimes']), axis=1))
0    <class 'pandas.tslib.Timestamp'>
1    <class 'pandas.tslib.Timestamp'>
2    <class 'pandas.tslib.Timestamp'>
dtype: object

In [15]: # For 'differential_timedeltas':

In [16]: print(df.apply(lambda row: type(row['differential_timedeltas']), axis=1))
0      <class 'pandas.tslib.NaTType'>
1    <class 'pandas.tslib.Timestamp'>
2    <class 'pandas.tslib.Timestamp'>
dtype: object
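
A compact version of the row-wise check can serve as a regression test. The sketch below uses hypothetical stand-in data and assumes a pandas release that already includes the fix; in such a release the timedelta column should come back as `pandas.Timedelta` (or `NaT`) rather than `Timestamp`. Type names are compared as strings only to keep the output easy to read.

```python
import pandas as pd

# Hypothetical stand-in data with one datetime and one timedelta column.
df = pd.DataFrame(
    {
        "datetimes": pd.to_datetime(
            ["2014-05-31 01:23:50", "2014-05-31 01:24:00", "2014-05-31 01:24:10"]
        )
    }
)
df["differential_timedeltas"] = df["datetimes"].diff()

# Row-wise apply: each row is passed to the function as a Series.
types_by_row = df.apply(
    lambda row: type(row["differential_timedeltas"]).__name__, axis=1
).tolist()
print(types_by_row)

# The behaviour reported in this issue would put "Timestamp" in this list.
if "Timestamp" in types_by_row:
    print("timedeltas were coerced to timestamps")
```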

@jreback
Contributor

jreback commented Jul 17, 2014

Actually, this is a symptom of a more insidious issue.

Try df.values. This is the 'common dtype' for the frame. Unfortunately it's wrong: it should be object, and NOT datetime64[ns]. So some logic is messed up here: https://github.com/pydata/pandas/blob/master/pandas/core/internals.py#L3673

I think it needs a tiny tweak (and test cases, of course!).

Interested in submitting a pull request?
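
The `df.values` check suggested above can be sketched as follows. The data is a hypothetical stand-in, and the assertions encode the expectation stated in this comment (an object array for a frame mixing datetime and timedelta columns), which should hold on a pandas release that includes the fix.

```python
import pandas as pd

# Hypothetical stand-in data with one datetime and one timedelta column.
df = pd.DataFrame(
    {
        "datetimes": pd.to_datetime(
            ["2014-05-31 01:23:50", "2014-05-31 01:24:00", "2014-05-31 01:24:10"]
        )
    }
)
df["differential_timedeltas"] = df["datetimes"].diff()

# Per-column dtypes stay specific ...
print(df.dtypes)

# ... but the 2-D array needs a single dtype for both columns.
values = df.values
print(values.dtype)

# Expected after the fix: object, with each element keeping its own type.
assert values.dtype == object
assert isinstance(values[1, 0], pd.Timestamp)
assert isinstance(values[1, 1], pd.Timedelta)
```

Row-wise `apply` works from this combined array, which is presumably why a wrong common dtype showed up as timedeltas turning into timestamps.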

@jreback added this to the 0.15.0 milestone Jul 17, 2014

@stharrold
Author

Thanks for the fast response!

I'll give it a go, but as a newbie this may be over my head to do in a time-efficient manner. Guess I'll start reading through http://pandas.pydata.org/developers.html ...

I'll reply to this thread when I have some progress on the patch. Thanks again.

@cpcloud
Member

cpcloud commented Jul 17, 2014

If you have questions along the way, don't hesitate to ask. We don't bite.

@jreback
Contributor

jreback commented Jul 17, 2014

@stharrold It turns out this was a bit non-trivial (and needed extra testing). Fixed in #7779.

timedelta == need_a_scalar_type! (Though it's just using timedelta itself, so it's OK.)

@stharrold
Author

@jreback Wow, that's quite a patch! Thank you! I'm glad I could help with the bug report.

@jreback
Contributor

jreback commented Jul 18, 2014

@stharrold Thanks. I found a couple of other odd conversions at the same time.

@jreback
Contributor

jreback commented Jul 18, 2014

@stharrold thanks for the report!

3 participants