Skip to content

BUG: nested dict construction timedelta #11129

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
chris-b1 opened this issue Sep 16, 2015 · 8 comments
Closed

BUG: nested dict construction timedelta #11129

chris-b1 opened this issue Sep 16, 2015 · 8 comments
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode Timedelta Timedelta data type
Milestone

Comments

@chris-b1
Copy link
Contributor

Related to #10160 / PR #10269

In [2]: from datetime import timedelta
...: from pandas import * 
...: td_as_int = [1, 2, 3, 4]
...: 
...: def create_data(constructor):
...:     return dict((i, {constructor(s): 2*i}) for i, s in enumerate(td_as_int))
...: 
...: data_timedelta64 = create_data(lambda x: np.timedelta64(x, 'D'))
...: data_timedelta = create_data(lambda x: timedelta(days=x))
...: 
...: result_timedelta64 = DataFrame(data_timedelta64)
...: result_timedelta = DataFrame(data_timedelta)
...: 

In [3]: result_timedelta64
Out[3]: 
        0   1   2   3
1 days   0 NaN NaN NaN
2 days NaN   2 NaN NaN
3 days NaN NaN   4 NaN
4 days NaN NaN NaN   6

In [4]: result_timedelta
Out[4]: 
        0   1   2   3
1 days NaN NaN NaN NaN
2 days NaN NaN NaN NaN
3 days NaN NaN NaN NaN
4 days NaN NaN NaN NaN

This seems to happen because Timedelta doesn't have hash equality with datetime.timedelta It could easily be worked around here, but I was wondering if there's any reason not to make Timedelta work like Timestamp and match hashes if above a certain resolution?

@lmjohns3
Copy link

Are you sure you meant to reference #11115 (Interpolate erroneously fills NaNs at beginning of sequence when limit_direction == 'both') on this issue?

@chris-b1
Copy link
Contributor Author

Sorry, wrong issue. Edited above, thanks!

@jreback
Copy link
Contributor

jreback commented Sep 16, 2015

@chris-b1 yep this prob need just a change to core.common._maybe_box_datetimelike for datetime.datetime and datetime.timedelta in the isinstance

@jreback jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode Dtype Conversions Unexpected or buggy dtype conversions Timedelta Timedelta data type labels Sep 16, 2015
@jreback jreback added this to the Next Major Release milestone Sep 16, 2015
@chris-b1
Copy link
Contributor Author

FYI, this actually works for datetime.datetime because the hash matches pd.Timestamp (covered by tests in that PR), but I'll go ahead and change it as suggested.

@jreback
Copy link
Contributor

jreback commented Sep 17, 2015

what exactly is not matching for the hash of a Timedelta (prob don't have any tests for this).

@chris-b1
Copy link
Contributor Author

right now the Timedelta hash is just based on the nanosecond value, so it won't ever match a datetime.timedelta

In [42]: from datetime import timedelta, datetime

In [43]: d = {timedelta(days=1): 1, datetime(2013,1,1): 2}

In [44]: d[pd.Timestamp('2013-1-1')]
Out[44]: 2

In [45]: d[pd.Timedelta(1, 'D')]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-45-927af64de84f> in <module>()
----> 1 d[pd.Timedelta(1, 'D')]

KeyError: Timedelta('1 days 00:00:00')

I tried changing the hash definition as below, patterned after _Timestamp, which works, but wasn't sure about the perf impact of _ensure_components()

def __hash__(_Timedelta self):
    self._ensure_components()
    if self._ns > 0:
        return hash(self.value)
    else:
        return timedelta.__hash__(self)

@jreback
Copy link
Contributor

jreback commented Sep 17, 2015

hmm, _ensure_components can be cdef inline anyhow. its not called by python I don't think.

try that. (hash change looks ok). But I think we also need the change to _maybe_box_datetimelike

@jreback jreback modified the milestones: 0.17.0, Next Major Release Sep 17, 2015
@shoyer
Copy link
Member

shoyer commented Sep 17, 2015

We should definitely match the timedelta hash on our subclass. I would consider that also a bug.

chris-b1 added a commit to chris-b1/pandas that referenced this issue Sep 17, 2015
jreback added a commit that referenced this issue Sep 17, 2015
BUG: nested construction with timedelta #11129
yarikoptic added a commit to neurodebian/pandas that referenced this issue Oct 11, 2015
* commit 'v0.17.0rc1-92-gc6bcc99': (29 commits)
  CI: tests latest versions of openpyxl
  COMPAT: openpyxl >= 2.2 support, pandas-dev#10125
  Tests demonstrating how to use sqlalchemy.text() objects in read_sql()
  TST: Capture warnings in _check_plot_works
  COMPAT/BUG: color handling in scatter
  COMPAT: Support for matplotlib 1.5
  ERR/API: Raise NotImplementedError when Panel operator function is not implemented, pandas-dev#7692
  DOC: minor doc formatting fixes
  PERF: nested dict DataFrame construction
  DEPR: deprecate SparsePanel
  BLD: dateutil->python-dateutil in conda recipe
  BUG/API: GH11086 where freq is not inferred if both freq is None
  ENH: add merge indicator to DataFrame.merge
  PERF: improves performance in groupby.size
  BUG: DatetimeTZBlock.fillna raises TypeError
  PERF: infer_datetime_format without padding pandas-dev#11142
  PERF: improves performance in SeriesGroupBy.transform
  TST: Verify fix for buffer overflow in read_csv with engine='c' (GH pandas-dev#9735)
  DEPR: Series.is_timeseries
  BUG: nested construction with timedelta pandas-dev#11129
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode Timedelta Timedelta data type
Projects
None yet
Development

No branches or pull requests

4 participants