BUG: need better inference for path in Series construction (GH9456) #9924

Closed
wants to merge 4 commits into from
Changes from 2 commits
4 changes: 2 additions & 2 deletions pandas/core/series.py
@@ -166,9 +166,9 @@ def __init__(self, data=None, index=None, dtype=None, name=None,
else:
index = Index(_try_sort(data))
try:
- if isinstance(index, DatetimeIndex):
+ if isinstance(index, DatetimeIndex) and lib.infer_dtype(data) != 'datetime64':
Contributor

First, find out why this path is hit in the first place; we can go from there.

Author

Before I added .values at line 171, lib.fast_multiget was throwing a TypeError. Now the else clause must handle the conversion of dicts with datetime64 keys, because a Timestamp cannot be compared with a datetime64:

>>> d = {datetime.datetime(2015, 1, 7, 2, 0): 42544017.198965244}
>>> i = pandas.core.index.Index(d)
>>> i.astype('O').values
array([Timestamp('2015-01-07 02:00:00')], dtype=object)
>>> datetime.datetime(2015, 1, 7, 2, 0) == pandas.lib.Timestamp('2015-01-07 02:00:00')
True

>>> d = {numpy.datetime64('2015-01-06T19:00:00.000000000-0500'): 42544017.198965244}
>>> i = pandas.core.index.Index(d)
>>> i.astype('O').values
array([Timestamp('2015-01-07 00:00:00')], dtype=object)
>>> numpy.datetime64('2015-01-06T19:00:00.000000000-0500') == pandas.lib.Timestamp('2015-01-07 00:00:00')
False
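
The effect of such a mismatch on a bulk dict lookup can be illustrated without pandas. This is only an analogy under stated assumptions, not the code of lib.fast_multiget: multiget is a hypothetical stand-in for a "get many keys with a default" helper, and an ISO string plays the role of the stored key whose type differs from the lookup key.

```python
from datetime import datetime


def multiget(mapping, keys, default=None):
    """Hypothetical stand-in for a bulk dict lookup with a default value."""
    return [mapping.get(k, default) for k in keys]


# The stored key is an ISO string; the lookup key is a datetime for the same instant.
data = {"2015-01-07T02:00:00": 42544017.198965244}
lookup_keys = [datetime(2015, 1, 7, 2, 0)]

# The key types differ, so the lookup silently falls back to the default.
missed = multiget(data, lookup_keys, default=float("nan"))

# Converting the lookup keys to the stored key type makes the lookup succeed.
hit = multiget(data, [k.isoformat() for k in lookup_keys], default=float("nan"))
```

Here missed holds only the NaN default while hit holds the stored value, which mirrors why keys that do not compare equal produce an all-NaN result instead of an error.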

Contributor

That's not what I mean. Put a halt in there and see which test actually hits this. This is handling an edge case. Your fix just confuses things. I think there is a more general solution.

Author

I ran the tests and reported the results in the issue: #9456.

Maybe a better fix would be to correctly handle the comparison between Timestamp and datetime64?

Contributor

The idea is to have as few special cases as possible. You rarely actually want to convert a DatetimeIndex using .astype('O'); this is the least desirable result (though it may have been done for some reason). But it is tricky, because a user can pass Timestamp/datetime.datetime/datetime.date/np.datetime64 as elements, and can pass something that is not an Index but is index-like. The best bet is to use _ensure_index, which will convert list-like things to an Index (and possibly to a DatetimeIndex if it's compatible; if it's not, then all bets are off).
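
The "normalize once, then look up" idea can be sketched in plain Python. This is not pandas' _ensure_index and makes no claim about how it works internally; to_datetime_key and normalize_keys are hypothetical helpers, and the two accepted string formats are an assumption chosen for the illustration.

```python
from datetime import date, datetime


def to_datetime_key(key):
    """Hypothetical helper: coerce one datetime-like key to a naive datetime."""
    if isinstance(key, datetime):  # test datetime first; it subclasses date
        return key
    if isinstance(key, date):
        return datetime(key.year, key.month, key.day)
    if isinstance(key, str):
        # Assumed formats for this sketch only.
        for fmt in ("%Y%m%d", "%Y-%m-%dT%H:%M:%S"):
            try:
                return datetime.strptime(key, fmt)
            except ValueError:
                continue
    raise TypeError("not datetime-like: %r" % (key,))


def normalize_keys(mapping):
    """Return a copy of mapping whose keys all share one canonical type."""
    return {to_datetime_key(k): v for k, v in mapping.items()}


mixed = {datetime(2015, 1, 7): 4.2, date(2015, 1, 8): 4.0, "20150109": 3.9}
normalized = normalize_keys(mixed)
ordered = [normalized[k] for k in sorted(normalized)]
```

After normalization every key has the same type, so sorting and lookup need no per-type branches, and keys that cannot be coerced fail loudly in one place.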

# coerce back to datetime objects for lookup
- data = lib.fast_multiget(data, index.astype('O'),
+ data = lib.fast_multiget(data, index.astype('O').values,
default=np.nan)
elif isinstance(index, PeriodIndex):
data = [data.get(i, nan) for i in index]
29 changes: 29 additions & 0 deletions pandas/tests/test_series.py
@@ -894,6 +894,35 @@ def test_constructor_dtype_datetime64(self):
dr = date_range('20130101',periods=3,tz='US/Eastern')
self.assertTrue(str(Series(dr).iloc[0].tz) == 'US/Eastern')

# GH 9456
d = {np.datetime64('2015-01-07T02:00:00.000000000+0200'): 4.2,
np.datetime64('2015-01-08T02:00:00.000000000+0200'): 4.0,
np.datetime64('2015-01-09T02:00:00.000000000+0200'): 3.9,
np.datetime64('2015-01-12T02:00:00.000000000+0200'): 3.5}
keys = list()
vals = list()
for k in sorted(d.keys()):
vals.append(d[k])
keys.append(k)
expected = Series(vals, keys)

s = Series(d)
assert_series_equal(s, expected)

d = {datetime(2015,1,7): 4.2,
datetime(2015,1,8): 4.0,
datetime(2015,1,9): 3.9,
datetime(2015,1,12): 3.5}
s = Series(d)
assert_series_equal(s, expected)

d = {datetime(2015,1,7): 4.2,
np.datetime64('2015-01-08T02:00:00.000000000+0200'): 4.0,
'20150109': 3.9,
np.datetime64('2015-01-12T02:00:00.000000000+0200'): 3.5}
s = Series(d)
assert_series_equal(s, expected)

def test_constructor_periodindex(self):
# GH7932
# converting a PeriodIndex when put in a Series