Bug in merging datetime64[ns, tz] dtypes #11405 #11410

Merged (1 commit) on Oct 23, 2015

jreback

@jreback jreback commented Oct 22, 2015

closes #11405

@jreback added the Bug, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), and Timezones (Timezone data dtype) labels Oct 22, 2015
@jreback added this to the 0.17.1 milestone Oct 22, 2015
# dtypes = set()
upcast_classes = set()
null_upcast_classes = set()
upcast_classes = defaultdict(lambda: [])

defaultdict(list) avoids the need for a lambda.

yep
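
For readers unfamiliar with the suggestion, here is a minimal sketch of why `defaultdict(list)` can stand in for `defaultdict(lambda: [])`: both call a zero-argument factory that returns a fresh empty list for each missing key. The keys and values below are made up for illustration and are not taken from the pandas code.

```python
from collections import defaultdict

# Two equivalent ways to get "a new empty list per missing key".
by_lambda = defaultdict(lambda: [])
by_list = defaultdict(list)

# Illustrative data only: group values under string keys.
pairs = [("datetime", 1), ("object", 2), ("datetime", 3)]
for name, value in pairs:
    by_lambda[name].append(value)
    by_list[name].append(value)

# Both containers end up with the same contents.
same = dict(by_lambda) == dict(by_list)
print(same)
```

Passing `list` directly skips the extra function call per missing key and reads a little more plainly.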

jreback added a commit that referenced this pull request Oct 23, 2015
Bug in merging datetime64[ns, tz] dtypes #11405
@jreback jreback merged commit 1b2e45a into pandas-dev:master Oct 23, 2015

dflatow commented Mar 24, 2017

@jreback Is this still an issue merging on multiple tz-aware columns?

pandas version:
0.18.1
A:

            timestamp_start             timestamp_end  data
0 2016-12-08 17:08:49+00:00 2016-12-08 17:08:55+00:00     1
1 2016-12-08 17:08:55+00:00 2016-12-08 17:09:04+00:00     1
2 2016-12-08 17:09:04+00:00 2016-12-08 17:09:06+00:00     1
3 2016-12-08 17:09:06+00:00 2016-12-08 17:09:11+00:00     1
4 2016-12-08 17:09:11+00:00 2016-12-08 17:09:13+00:00     1

A dtypes:

timestamp_start    datetime64[ns, UTC]
timestamp_end      datetime64[ns, UTC]
data                             int64
dtype: object

B:

        timestamp_end     timestamp_start
0 2016-12-08 17:18:30 2016-12-08 17:18:21
1 2016-12-08 18:50:39 2016-12-08 18:50:13
2 2016-12-08 19:20:58 2016-12-08 19:20:50
3 2016-12-08 22:24:38 2016-12-08 22:24:24
4 2016-12-08 22:26:22 2016-12-08 22:24:41
5 2016-12-08 22:28:49 2016-12-08 22:28:41
6 2016-12-08 22:53:01 2016-12-08 22:52:22
7 2016-12-08 22:57:14 2016-12-08 22:57:00

B dtypes:

timestamp_end      datetime64[ns]
timestamp_start    datetime64[ns]
dtype: object

A.merge(B, on=["timestamp_start", "timestamp_end"])

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-119-18bced4bc0e5> in <module>()
      4 print "B:\n",B
      5 print "\nB dtypes:\n", B.dtypes
----> 6 A.merge(B, on=["timestamp_start", "timestamp_end"])

/venv/lib/python2.7/site-packages/pandas/core/frame.pyc in merge(self, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator)
   4435                      right_on=right_on, left_index=left_index,
   4436                      right_index=right_index, sort=sort, suffixes=suffixes,
-> 4437                      copy=copy, indicator=indicator)
   4438 
   4439     def round(self, decimals=0, *args, **kwargs):

/venv/lib/python2.7/site-packages/pandas/tools/merge.pyc in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator)
     37                          right_index=right_index, sort=sort, suffixes=suffixes,
     38                          copy=copy, indicator=indicator)
---> 39     return op.get_result()
     40 if __debug__:
     41     merge.__doc__ = _merge_doc % '\nleft : DataFrame'

/venv/lib/python2.7/site-packages/pandas/tools/merge.pyc in get_result(self)
    215                 self.left, self.right)
    216 
--> 217         join_index, left_indexer, right_indexer = self._get_join_info()
    218 
    219         ldata, rdata = self.left._data, self.right._data

/venv/lib/python2.7/site-packages/pandas/tools/merge.pyc in _get_join_info(self)
    351              right_indexer) = _get_join_indexers(self.left_join_keys,
    352                                                  self.right_join_keys,
--> 353                                                  sort=self.sort, how=self.how)
    354             if self.right_index:
    355                 if len(self.left) > 0:

/venv/lib/python2.7/site-packages/pandas/tools/merge.pyc in _get_join_indexers(left_keys, right_keys, sort, how)
    544 
    545     # get left & right join labels and num. of levels at each location
--> 546     llab, rlab, shape = map(list, zip(* map(fkeys, left_keys, right_keys)))
    547 
    548     # get flat i8 keys from label lists

TypeError: type object argument after * must be a sequence, not itertools.imap

jreback commented Mar 24, 2017

that needs a nicer error message, but why would you think that should work at all? you cannot compare a tz-aware type with a tz-naive type
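
A minimal sketch of one way to make a merge like the one reported above work, assuming the tz-naive timestamps in `B` are really UTC wall-clock times (that assumption is mine, not something stated in the report). The frames here are small made-up stand-ins for `A` and `B`, and `B_utc` is a name introduced only for this example.

```python
import pandas as pd

keys = ["timestamp_start", "timestamp_end"]

# Stand-in for A: tz-aware (UTC) key columns.
A = pd.DataFrame({
    "timestamp_start": pd.to_datetime(
        ["2016-12-08 17:08:49", "2016-12-08 17:08:55"], utc=True),
    "timestamp_end": pd.to_datetime(
        ["2016-12-08 17:08:55", "2016-12-08 17:09:04"], utc=True),
    "data": [1, 1],
})

# Stand-in for B: tz-naive key columns.
B = pd.DataFrame({
    "timestamp_start": pd.to_datetime(
        ["2016-12-08 17:08:49", "2016-12-08 17:18:21"]),
    "timestamp_end": pd.to_datetime(
        ["2016-12-08 17:08:55", "2016-12-08 17:18:30"]),
})

# Assumption: the naive values represent UTC, so attach that timezone
# to a copy of B before merging. No clock values are shifted.
B_utc = B.copy()
for col in keys:
    B_utc[col] = B_utc[col].dt.tz_localize("UTC")

# Both sides now share the same tz-aware dtype on the key columns.
merged = A.merge(B_utc, on=keys)
print(merged)
```

If the naive values were recorded in some other zone, localizing to that zone and then converting with `dt.tz_convert("UTC")` would be the analogous step.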

dflatow commented Mar 24, 2017 via email

jreback commented Mar 24, 2017

@dflatow it's actually a good catch on the error message :> see #15800 if you want to submit a fix!
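
Until the error message improves, a user-side check before merging can surface the mismatch more clearly. The sketch below is only an illustration: `check_merge_key_tz` is a hypothetical helper, not part of pandas and not the fix tracked in #15800.

```python
import pandas as pd


def check_merge_key_tz(left, right, keys):
    """Hypothetical pre-merge check (not a pandas function).

    Raises ValueError if a key column is tz-aware on one side
    and tz-naive on the other.
    """
    for key in keys:
        # Only tz-aware datetime dtypes carry a non-None `tz`.
        left_tz = getattr(left[key].dtype, "tz", None)
        right_tz = getattr(right[key].dtype, "tz", None)
        if (left_tz is None) != (right_tz is None):
            raise ValueError(
                "cannot merge on %r: left dtype is %s, right dtype is %s"
                % (key, left[key].dtype, right[key].dtype)
            )


# Made-up frames: one tz-aware key column, one tz-naive.
aware = pd.DataFrame(
    {"t": pd.to_datetime(["2016-12-08 17:08:49"], utc=True)})
naive = pd.DataFrame(
    {"t": pd.to_datetime(["2016-12-08 17:08:49"])})

try:
    check_merge_key_tz(aware, naive, ["t"])
except ValueError as exc:
    message = str(exc)
    print(message)
```

Calling the helper right before `A.merge(B, on=...)` would report the offending column and both dtypes instead of the `itertools.imap` message in the traceback above.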

Successfully merging this pull request may close these issues.

pd.merge fails on datetime columns with tzinfo