PERF: nested dict DataFrame construction #11158

Merged
merged 1 commit into from
Sep 21, 2015

chris-b1
Contributor

Part of #11084

http://pydata.github.io/pandas/#frame_ctor.frame_ctor_nested_dict_int64.time_frame_ctor_nested_dict_int64

    before     after       ratio
   [d28fd70 ] [48c588  ]
-  325.06ms    72.25ms      0.22  frame_ctor.frame_ctor_nested_dict.time_frame_ctor_nested_dict
-  364.35ms   106.67ms      0.29  frame_ctor.frame_ctor_nested_dict_int64.time_frame_ctor_nested_dict_int64

Two changes to get back to the old performance:

  • skip attempting to box Timedeltas/Timestamps if not a DatetimeIndex/TimedeltaIndex
  • only convert index to object once
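The two changes above can be sketched roughly as follows. This is an illustrative stand-in, not the pandas internals: `align_nested` is a hypothetical helper, and plain `dict.get` lookups are assumed in place of whatever pandas does internally.

```python
import pandas as pd


def align_nested(data, index):
    """Hypothetical helper: look up every index label in each inner dict."""
    oindex = None  # object-dtype copy of the index, created lazily
    is_datetimelike = isinstance(index, (pd.DatetimeIndex, pd.TimedeltaIndex))
    box = pd.Timestamp if isinstance(index, pd.DatetimeIndex) else pd.Timedelta

    columns = {}
    for name, inner in data.items():
        if oindex is None:
            # Converted once and reused for every column.
            oindex = index.astype(object)
        if is_datetimelike:
            # Re-key only when the index holds boxed Timestamps/Timedeltas.
            inner = {box(key): value for key, value in inner.items()}
        columns[name] = [inner.get(label) for label in oindex]
    return columns


ints = align_nested({"a": {0: 1.0, 2: 3.0}, "b": {1: 5.0}}, pd.Index([0, 1, 2]))
```

With a plain integer index the re-keying branch is skipped entirely, which is the case the benchmarks above exercise.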

    if oindex is None:
        oindex = index.astype('O')

    if isinstance(index, (DatetimeIndex, TimedeltaIndex)):

Contributor

I don't remember why we do this, but is PeriodIndex needed as well? (and do we have tests for same)?

Contributor Author

This handles the case where the dict has datetime64/timedelta64 values as its keys. They are boxed when the index is created, so the dict keys need to be boxed too for the lookups to work. It comes from PR #10269.

AFAIK not needed for PeriodIndex since there aren't any types that get coerced to it? This test covers nested construction with a PeriodIndex.
https://github.com/pydata/pandas/blob/master/pandas/tseries/tests/test_period.py#L2078
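A small example of the case described above, assuming a reasonably recent pandas: the inner dict is keyed by raw `np.datetime64` values, while the resulting index holds boxed `Timestamp` labels. The column name and dates are made up for illustration.

```python
import numpy as np
import pandas as pd

# Inner dict keyed by raw numpy datetime64 values.
data = {"a": {np.datetime64("2015-09-21"): 1, np.datetime64("2015-09-22"): 2}}
df = pd.DataFrame(data)

index_type = type(df.index).__name__  # the keys are coerced to a datetime index
first_label = df.index[0]             # labels come back boxed
found = list(df["a"])                 # values located despite the differing key types
```

If the dict keys were not boxed to match the index labels, the per-label lookups would miss and the column would come back empty.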

Contributor

ok, can you move all of these types of tests to the same location (e.g. out of test_period; not sure where the datetimeindex/timedeltaindex ones are), prob into test_frame (though I hate adding to this, it really IS part of it).

@jreback jreback mentioned this pull request Sep 21, 2015
13 tasks
@jreback jreback added the Performance, Reshaping, and Regression labels Sep 21, 2015
@jreback jreback added this to the 0.17.0 milestone Sep 21, 2015
@@ -1032,6 +1032,7 @@ Performance Improvements
- Improved performance of ``to_datetime`` when specified format string is ISO8601 (:issue:`10178`)
- 2x improvement of ``Series.value_counts`` for float dtype (:issue:`10821`)
- Enable ``infer_datetime_format`` in ``to_datetime`` when date components do not have 0 padding (:issue:`11142`)
- Regression from 0.16.1 in constructing ``DataFrame`` from nested dictionary (:issue:`11084`)
Contributor

add "with datetimelikes"?

Contributor Author

Actually, this affected any type

@jreback
Contributor

jreback commented Sep 21, 2015

ping when green (and tests are moved)

@chris-b1
Contributor Author

@jreback - moved the Period test as suggested, travis is green. Thanks.

jreback added a commit that referenced this pull request Sep 21, 2015
PERF: nested dict DataFrame construction
@jreback jreback merged commit 581d5be into pandas-dev:master Sep 21, 2015
@jreback
Contributor

jreback commented Sep 21, 2015

thanks!

@chris-b1 chris-b1 deleted the frame-nested-dict branch September 24, 2015 02:16