PERF: Cythonize from_nested_dict
#33485
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,4 @@ | ||
import collections | ||
from collections import abc | ||
from decimal import Decimal | ||
from fractions import Fraction | ||
|
@@ -2526,3 +2527,21 @@ def fast_multiget(dict mapping, ndarray keys, default=np.nan): | |
output[i] = default | ||
|
||
return maybe_convert_objects(output) | ||
|
||
|
||
@cython.wraparound(False) | ||
@cython.boundscheck(False) | ||
def from_nested_dict(object data) -> dict: | ||
cdef: | ||
object new_data = collections.defaultdict(dict) | ||
object index, column, value, dict_iterator | ||
dict data_dct, nested_dict | ||
|
||
data_dct = dict(data) | ||
|
||
for index, dict_iterator in data_dct.items(): | ||
nested_dict = dict(dict_iterator) | ||
Review comment: If `dict_iterator` is a Series, this conversion will be slow. Probably best to just accept the user's choice IMO and accept a slowdown if `dict_iterator` is not a dict... |
||
for column, value in nested_dict.items(): | ||
new_data[column][index] = value | ||
|
||
return dict(new_data) |
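For readers who want to see what the Cython function above computes, here is a minimal pure-Python sketch of the same transposition, turning `{index: {column: value}}` into `{column: {index: value}}`. The name `from_nested_dict_py` and the sample data are hypothetical and only for illustration; this is not the code merged in the PR.

```python
from collections import defaultdict


def from_nested_dict_py(data):
    """Hypothetical pure-Python sketch: transpose a nested mapping.

    Input:  {index: {column: value}}
    Output: {column: {index: value}}
    """
    new_data = defaultdict(dict)
    for index, row in data.items():
        # `row` only needs an .items() method, so no dict() copy is made here.
        for column, value in row.items():
            new_data[column][index] = value
    # Return a plain dict so callers do not get defaultdict behaviour.
    return dict(new_data)


nested = {"r1": {"a": 1, "b": 2}, "r2": {"a": 3}}
result = from_nested_dict_py(nested)
print(result)
```

Because the inner loop only relies on `.items()`, this version avoids the extra `dict(...)` copy that the review comment flags as slow for Series inputs.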
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1266,7 +1266,7 @@ def from_dict(cls, data, orient="columns", dtype=None, columns=None) -> "DataFra | |
if len(data) > 0: | ||
# TODO speed up Series case | ||
if isinstance(list(data.values())[0], (Series, dict)): | ||
Review comment: Not your PR, but this line above is potentially very expensive. Can you change it to `first_val = next(iter(data.values()), None)` followed by `if isinstance(first_val, (Series, dict)):` to avoid creating that list? Does this make a difference in your ASVs?
Reply: Nice! The ASV results are even better:
|
||
data = _from_nested_dict(data) | ||
data = lib.from_nested_dict(data) | ||
Review comment: If you change the above to `data = dict(data) if not type(data) is dict else data  # convert OrderedDict` followed by `data = lib.from_nested_dict(data)`, you can change the function interface in |
||
else: | ||
data, index = list(data.values()), list(data.keys()) | ||
elif orient == "columns": | ||
|
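The reviewer's suggestion about peeking at the first value can be sketched in isolation. The helper name `first_value` and the sample data are hypothetical. The point is that `next(iter(...))` reads one element, whereas `list(data.values())[0]` materializes every value first.

```python
def first_value(mapping, default=None):
    """Return the first value of a mapping without building a list.

    Hypothetical helper for illustration; falls back to `default`
    when the mapping is empty instead of raising.
    """
    return next(iter(mapping.values()), default)


data = {i: {"a": i} for i in range(1000)}

first = first_value(data)             # touches only one element
same = list(data.values())[0]         # builds a 1000-element list first
empty = first_value({})               # no IndexError on empty input

print(first == same, empty)
```

The `None` default also makes the check safe on an empty mapping, although the diff above already guards that case with `len(data) > 0`.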
@@ -8817,12 +8817,3 @@ def isin(self, values) -> "DataFrame": | |
|
||
ops.add_flex_arithmetic_methods(DataFrame) | ||
ops.add_special_arithmetic_methods(DataFrame) | ||
|
||
|
||
def _from_nested_dict(data): | ||
# TODO: this should be seriously cythonized | ||
new_data = collections.defaultdict(dict) | ||
for index, s in data.items(): | ||
for col, v in s.items(): | ||
new_data[col][index] = v | ||
return new_data |
Review comment: Can you set `new_data` to type `dict new_data` and make related changes below (use `dict.setdefault` etc.)? I think that should make fewer calls into Python, making this faster.
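A rough sketch of what the `dict.setdefault` variant suggested above might look like, written in plain Python so it runs without a Cython build. The function name is hypothetical, and whether this is faster depends on the Cython typing (`cdef dict new_data`), which this sketch does not reproduce.

```python
def from_nested_dict_setdefault(data):
    """Hypothetical sketch: same transposition using a plain dict.

    A plain dict with setdefault replaces collections.defaultdict, so
    in Cython the variable could be declared as `dict new_data`.
    """
    new_data = {}
    for index, row in data.items():
        for column, value in row.items():
            # Create the inner dict on first sight of `column`, then assign.
            new_data.setdefault(column, {})[index] = value
    return new_data


nested = {"r1": {"a": 1, "b": 2}, "r2": {"a": 3}}
transposed = from_nested_dict_setdefault(nested)
print(transposed)
```

With a plain dict there is also no need for the final `dict(new_data)` conversion, since the result is never a `defaultdict`.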