Skip to content

BUG: need better inference for path in Series construction (GH9456) #9924

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from
Closed

BUG: need better inference for path in Series construction (GH9456) #9924

wants to merge 4 commits into from

Conversation

patrickfournier
Copy link

closes #9456.

- dict with datetime64 keys now working
- when isinstance(index, DatetimeIndex), use lib.fast_multiget correctly
  to avoid raising a TypeError exception
@jreback jreback changed the title Issue 9456 BUG: need better inference for path in Series construction (GH94560 Apr 17, 2015
@sinhrks sinhrks added Bug Datetime Datetime data dtype Dtype Conversions Unexpected or buggy dtype conversions labels May 1, 2015
@@ -166,9 +166,9 @@ def __init__(self, data=None, index=None, dtype=None, name=None,
else:
index = Index(_try_sort(data))
try:
if isinstance(index, DatetimeIndex):
if isinstance(index, DatetimeIndex) and lib.infer_dtype(data) != 'datetime64':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see why this path is hit in the first place and can go from there.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before I added .value at line 171, lib.fast_multiget was throwing a TypeError. Now, the else clause must handles the conversion of datetime64 dicts because we cannot compare a Timestamp with a datetime64:

>>> d = {datetime.datetime(2015, 1, 7, 2, 0): 42544017.198965244}
>>> i = pandas.core.index.Index(d)
>>> i.astype('O').values
array([Timestamp('2015-01-07 02:00:00')], dtype=object)
>>> datetime.datetime(2015, 1, 7, 2, 0) == pandas.lib.Timestamp('2015-01-07 02:00:00')
True

>>> d = {numpy.datetime64('2015-01-06T19:00:00.000000000-0500'): 42544017.198965244}
>>> i = pandas.core.index.Index(d)
>>> i.astype('O').values
array([Timestamp('2015-01-07 00:00:00')], dtype=object)
>>> numpy.datetime64('2015-01-06T19:00:00.000000000-0500') == pandas.lib.Timestamp('2015-01-07 00:00:00')
False

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's not what I mean. put halt in there and see what test actually hits this. This is handling an edge case. Your fix, just confuses things. I think there is a more general soln.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the tests and reported the results in the issue: #9456.

Maybe a better fix would be to correctly handle the comparison between Timestamp and datetime64?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to have the least special cases as possible. You rarely actually want to convert a DatetimeIndex using .astype('O') this is the least desirable result (though it may have been done for some reason). But it is tricky because a user can pass Timestamp/datetime.datetime/datetime.date/np.datetime64 as elements and a non-index like (but that is index-like). Best best is to do _ensure_index which will convert list-like things to an Index (and possibly to a DatetimeIndex if its compat; if its not then all bets are off).

@jreback jreback added this to the Next Major Release milestone May 29, 2015
- dict with datetime64 keys now working
- use lib.fast_multiget correctly to avoid unnecessary exceptions
@patrickfournier
Copy link
Author

I rewrote the fix to be more generic. If the type of Index elements is the same as the type of data elements, we use the Index to access and copy the dict value. If not, we use the original index. See comments in diff.

data = [data.get(i, nan) for i in index]
# lib.fast_multiget raises TypeError if type(data) != dict

if lib.infer_dtype(data) == lib.infer_dtype(index.values):
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I try to avoid a call to np.array() by checking if we can access data values using index.values. However I checked the code for infer_dtype and it may be quite complex. Maybe we should just go straight to the else?

@patrickfournier patrickfournier changed the title BUG: need better inference for path in Series construction (GH94560 BUG: need better inference for path in Series construction (GH9456) Jun 3, 2015
@jreback
Copy link
Contributor

jreback commented Jun 4, 2015

this overlaps with #10269 (your PR attempts to fix Series the other DataFrame)

@jreback
Copy link
Contributor

jreback commented Jun 10, 2015

superseded by #10269

thanks!

@jreback jreback closed this Jun 10, 2015
@jreback jreback modified the milestones: 0.16.2, Next Major Release Jun 10, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Datetime Datetime data dtype Dtype Conversions Unexpected or buggy dtype conversions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: need better inference for path in Series construction
3 participants