PERF: nested dict DataFrame construction #11158
if oindex is None:
    oindex = index.astype('O')

if isinstance(index, (DatetimeIndex, TimedeltaIndex)):
I don't remember why we do this, but is PeriodIndex needed as well? (And do we have tests for the same?)
This is to handle the case where the dict had datetime/timedelta64s as the keys - they're boxed in creating the index, so they need to be changed in the dict too so the lookups work. From this PR #10269.
AFAIK not needed for PeriodIndex since there aren't any types that get coerced to it? This test covers nested construction with a PeriodIndex.
https://github.com/pydata/pandas/blob/master/pandas/tseries/tests/test_period.py#L2078
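For illustration, a minimal sketch of the key boxing described above. It uses only public pandas/NumPy calls, not the internal code path this PR touches, and the dates and variable names (`column`, `rekeyed`) are made up for the example.

```python
import numpy as np
import pandas as pd

# Inner dict keyed by raw numpy datetimes, as a caller might supply it.
raw_keys = [np.datetime64("2015-01-01"), np.datetime64("2015-01-02")]
column = {key: float(i) for i, key in enumerate(raw_keys)}

# Building a DatetimeIndex from the same keys boxes them:
# iterating the index yields pd.Timestamp objects, not np.datetime64.
index = pd.DatetimeIndex(raw_keys)
boxed_types = {type(label).__name__ for label in index}

# Re-key the dict with the same boxed type so every index label finds its value.
rekeyed = {pd.Timestamp(key): value for key, value in column.items()}
values = [rekeyed[label] for label in index]
```

The re-keying step costs one pass over each inner dict, which is why it is only worth doing when the index actually holds boxed datetime-like labels.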
OK, can you move all of these types of tests to the same location (e.g. out of test_period; not sure where the DatetimeIndex/TimedeltaIndex ones are), probably into test_frame (though I hate adding to it, this really IS part of it).
@@ -1032,6 +1032,7 @@ Performance Improvements
- Improved performance of ``to_datetime`` when specified format string is ISO8601 (:issue:`10178`)
- 2x improvement of ``Series.value_counts`` for float dtype (:issue:`10821`)
- Enable ``infer_datetime_format`` in ``to_datetime`` when date components do not have 0 padding (:issue:`11142`)
- Regression from 0.16.1 in constructing ``DataFrame`` from nested dictionary (:issue:`11084`)
add "with datetimelikes"?
Actually, this affected any type
ping when green (and tests are moved)
48c5884 to fcb78dc
@jreback - moved the
PERF: nested dict DataFrame construction
thanks@
Part of #11084
http://pydata.github.io/pandas/#frame_ctor.frame_ctor_nested_dict_int64.time_frame_ctor_nested_dict_int64
Two changes to get back to the old performance:
Timedeltas/Timestamps if not a DatetimeIndex/TimedeltaIndex
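
A rough pure-Python model of the approach visible in the diff, under stated assumptions: the lookup keys are converted once and reused for every column, and dict keys are boxed only when the index is datetime-like. `homogenize` and `box_keys` are hypothetical stand-ins, not pandas functions, and parsing ISO strings stands in for Timestamp/Timedelta boxing.

```python
from datetime import datetime


def box_keys(column):
    # Hypothetical stand-in for boxing: parse ISO-format string keys into datetime objects.
    return {datetime.fromisoformat(key): value for key, value in column.items()}


def homogenize(columns, index, datetimelike):
    lookup_keys = None
    result = {}
    for name, column in columns.items():
        if lookup_keys is None:
            # Materialize the lookup keys once, not once per column.
            lookup_keys = list(index)
        if datetimelike:
            # Pay for key boxing only when the index actually needs it.
            column = box_keys(column)
        # Missing keys become None, mimicking a default fill value.
        result[name] = [column.get(key) for key in lookup_keys]
    return result


ints = homogenize({"a": {1: 10, 2: 20}, "b": {2: 5}}, [1, 2], datetimelike=False)
dates = homogenize({"a": {"2015-01-01": 1.0}}, [datetime(2015, 1, 1)], datetimelike=True)
```

With plain integer keys the inner dicts are used as given, which matches the remark above that the regression affected any key type, not only datetime-likes.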