-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: need better inference for path in Series construction (GH9456) #9924
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- dict with datetime64 keys now working - when isinstance(index, DatetimeIndex), use lib.fast_multiget correctly to avoid raising a TypeError exception
@@ -166,9 +166,9 @@ def __init__(self, data=None, index=None, dtype=None, name=None, | |||
else: | |||
index = Index(_try_sort(data)) | |||
try: | |||
if isinstance(index, DatetimeIndex): | |||
if isinstance(index, DatetimeIndex) and lib.infer_dtype(data) != 'datetime64': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see why this path is hit in the first place and can go from there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before I added .value
at line 171, lib.fast_multiget
was throwing a TypeError
. Now, the else
clause must handles the conversion of datetime64 dicts because we cannot compare a Timestamp
with a datetime64
:
>>> d = {datetime.datetime(2015, 1, 7, 2, 0): 42544017.198965244}
>>> i = pandas.core.index.Index(d)
>>> i.astype('O').values
array([Timestamp('2015-01-07 02:00:00')], dtype=object)
>>> datetime.datetime(2015, 1, 7, 2, 0) == pandas.lib.Timestamp('2015-01-07 02:00:00')
True
>>> d = {numpy.datetime64('2015-01-06T19:00:00.000000000-0500'): 42544017.198965244}
>>> i = pandas.core.index.Index(d)
>>> i.astype('O').values
array([Timestamp('2015-01-07 00:00:00')], dtype=object)
>>> numpy.datetime64('2015-01-06T19:00:00.000000000-0500') == pandas.lib.Timestamp('2015-01-07 00:00:00')
False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's not what I mean. put halt in there and see what test actually hits this. This is handling an edge case. Your fix, just confuses things. I think there is a more general soln.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran the tests and reported the results in the issue: #9456.
Maybe a better fix would be to correctly handle the comparison between Timestamp and datetime64?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is to have the least special cases as possible. You rarely actually want to convert a DatetimeIndex
using .astype('O')
this is the least desirable result (though it may have been done for some reason). But it is tricky because a user can pass Timestamp/datetime.datetime/datetime.date/np.datetime64
as elements and a non-index like (but that is index-like). Best best is to do _ensure_index
which will convert list-like things to an Index
(and possibly to a DatetimeIndex
if its compat; if its not then all bets are off).
- dict with datetime64 keys now working - use lib.fast_multiget correctly to avoid unnecessary exceptions
I rewrote the fix to be more generic. If the type of Index elements is the same as the type of data elements, we use the Index to access and copy the dict value. If not, we use the original index. See comments in diff. |
data = [data.get(i, nan) for i in index] | ||
# lib.fast_multiget raises TypeError if type(data) != dict | ||
|
||
if lib.infer_dtype(data) == lib.infer_dtype(index.values): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I try to avoid a call to np.array()
by checking if we can access data
values using index.values
. However I checked the code for infer_dtype
and it may be quite complex. Maybe we should just go straight to the else
?
this overlaps with #10269 (your PR attempts to fix Series the other DataFrame) |
superseded by #10269 thanks! |
closes #9456.