-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Provide dict object for to_dict() #16122 #16220
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 4 commits
38fa22b
3070fa3
ccc33dd
73acea0
67c57e8
546816b
086c598
d6c0deb
f297ee8
8469977
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,6 +6,8 @@ | |
import warnings | ||
from datetime import datetime, timedelta | ||
from functools import partial | ||
import inspect | ||
import collections | ||
|
||
import numpy as np | ||
from pandas._libs import lib, tslib | ||
|
@@ -479,6 +481,43 @@ def _dict_compat(d): | |
for key, value in iteritems(d)) | ||
|
||
|
||
def _standardize_mapping(into): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. doesn't need to be leading underscore, this is a private module. do you have a more informative name? (longer is ok). |
||
""" | ||
Helper function to standardize the supplied mapping so it can | ||
be passed to the ``Series.to_dict()`` and ``DataFrame.to_dict()`` | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. add a versionadded |
||
Parameters | ||
---------- | ||
into : instance or subclass of collections.Mapping | ||
The argument supplied to ``to_dict``. Must be a class, an | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. remove references to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed, though you can add a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Removed the references and added a |
||
initialized collections.defaultdict, or an empty instance | ||
of a collections.Mapping subclass. | ||
|
||
Returns | ||
------- | ||
mapping : a collections.Mapping subclass or other constructor | ||
a callable object that can accept an iterator to create | ||
the desired Mapping. | ||
|
||
""" | ||
if not inspect.isclass(into): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if its not a class, then you just look at the class itself, it doesn't matter if its empty. this should return a class only. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The reason I wrote it this way is because I imagined someone might pass a pre-populated mapping to this function and expect the series or frame to be appended to what was already in the passed mapping. The error was to make it explicit that this would not happen. Also happy to simplify this function a little - I'll remove this logic if you don't think its worth checking an empty mapping. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you are defining the interface here; this is an internal function, so it should do one thing and ideally have a limited input set. This is just determining if its a valid mapping, no need to get complicated. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, I would remove this check. This function should just be to ensure that we have an instance that's satisfies the Mapping interface. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This all makes sense - I've reworked the function. It now only returns a class or a partial (if a |
||
if len(into) > 0: | ||
raise ValueError( | ||
"to_dict() only accepts empty mappings.") | ||
elif type(into) == collections.defaultdict: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. use isinstance |
||
return partial( | ||
collections.defaultdict, into.default_factory) | ||
else: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. huh? what is this trying to do There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For the For the last If you think a different structure works better, or it should be broken up into multiple functions, let me know. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok I see. the problem is you need to accept instances to deal with defaultdict, so you can now be passed an instance of defaultdict OR a class of something else (or even a class of defaultdict which is an error). hmm. I would right off the bat detect an instance of a default dict, else only allow a mapping ABC (ex-defaultdict). Then I think you are good, no? (IOW else you would raise). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. don't need the else (just return the partial) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done |
||
return _standardize_mapping(type(into)) | ||
elif not issubclass(into, collections.Mapping): | ||
raise TypeError('unsupported type: {}'.format(into)) | ||
elif into == collections.defaultdict: | ||
raise TypeError( | ||
'to_dict() only accepts initialized defaultdicts') | ||
else: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you can just There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done |
||
return into | ||
|
||
|
||
def sentinel_factory(): | ||
class Sentinel(object): | ||
pass | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -63,7 +63,8 @@ | |
_default_index, | ||
_values_from_object, | ||
_maybe_box_datetimelike, | ||
_dict_compat) | ||
_dict_compat, | ||
_standardize_mapping) | ||
from pandas.core.generic import NDFrame, _shared_docs | ||
from pandas.core.index import Index, MultiIndex, _ensure_index | ||
from pandas.core.indexing import (maybe_droplevels, convert_to_index_sliceable, | ||
|
@@ -860,7 +861,7 @@ def from_dict(cls, data, orient='columns', dtype=None): | |
|
||
return cls(data, index=index, columns=columns, dtype=dtype) | ||
|
||
def to_dict(self, orient='dict'): | ||
def to_dict(self, orient='dict', into=dict): | ||
"""Convert DataFrame to dictionary. | ||
|
||
Parameters | ||
|
@@ -882,32 +883,45 @@ def to_dict(self, orient='dict'): | |
Abbreviations are allowed. `s` indicates `series` and `sp` | ||
indicates `split`. | ||
|
||
into : class, default dict | ||
The collections.Mapping subclass used for all Mappings | ||
in the return value. Can be the actual class or an empty | ||
instance of the mapping type you want. If you want a | ||
collections.defaultdict, you must pass an initialized | ||
instance. | ||
.. versionadded:: 0.21.0 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. blank line |
||
|
||
Returns | ||
------- | ||
result : dict like {column -> {index -> value}} | ||
result : collections.Mapping like {column -> {index -> value}} | ||
If ``into`` is collections.defaultdict, the return | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you add some examples of using this. |
||
value's default_factory will be None. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't really understand this one. It seems to contrast with what is said above about a defaultdict needing to be initialized. So in that case, the default_factory will just be what is provided in the initialized defaultdict? (and which can be None, if you didn't provide a default_factory, but that seems a bit besides the point to mention here) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is from an earlier version, and I forgot to update the docs. This sentence is wrong, and its gone now. |
||
""" | ||
if not self.columns.is_unique: | ||
warnings.warn("DataFrame columns are not unique, some " | ||
"columns will be omitted.", UserWarning) | ||
# GH16122 | ||
into_c = _standardize_mapping(into) | ||
if orient.lower().startswith('d'): | ||
return dict((k, v.to_dict()) for k, v in compat.iteritems(self)) | ||
return into_c( | ||
(k, v.to_dict(into)) for k, v in compat.iteritems(self)) | ||
elif orient.lower().startswith('l'): | ||
return dict((k, v.tolist()) for k, v in compat.iteritems(self)) | ||
return into_c((k, v.tolist()) for k, v in compat.iteritems(self)) | ||
elif orient.lower().startswith('sp'): | ||
return {'index': self.index.tolist(), | ||
'columns': self.columns.tolist(), | ||
'data': lib.map_infer(self.values.ravel(), | ||
_maybe_box_datetimelike) | ||
.reshape(self.values.shape).tolist()} | ||
return into_c((('index', self.index.tolist()), | ||
('columns', self.columns.tolist()), | ||
('data', lib.map_infer(self.values.ravel(), | ||
_maybe_box_datetimelike) | ||
.reshape(self.values.shape).tolist()))) | ||
elif orient.lower().startswith('s'): | ||
return dict((k, _maybe_box_datetimelike(v)) | ||
for k, v in compat.iteritems(self)) | ||
return into_c((k, _maybe_box_datetimelike(v)) | ||
for k, v in compat.iteritems(self)) | ||
elif orient.lower().startswith('r'): | ||
return [dict((k, _maybe_box_datetimelike(v)) | ||
for k, v in zip(self.columns, row)) | ||
return [into_c((k, _maybe_box_datetimelike(v)) | ||
for k, v in zip(self.columns, row)) | ||
for row in self.values] | ||
elif orient.lower().startswith('i'): | ||
return dict((k, v.to_dict()) for k, v in self.iterrows()) | ||
return into_c((k, v.to_dict(into)) for k, v in self.iterrows()) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it correct to use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this is correct. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Gotcha.nI forgot that |
||
else: | ||
raise ValueError("orient '%s' not understood" % orient) | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -46,7 +46,8 @@ | |
_maybe_match_name, | ||
SettingWithCopyError, | ||
_maybe_box_datetimelike, | ||
_dict_compat) | ||
_dict_compat, | ||
_standardize_mapping) | ||
from pandas.core.index import (Index, MultiIndex, InvalidIndexError, | ||
Float64Index, _ensure_index) | ||
from pandas.core.indexing import check_bool_indexer, maybe_convert_indices | ||
|
@@ -1074,15 +1075,27 @@ def tolist(self): | |
""" Convert Series to a nested list """ | ||
return list(self.asobject) | ||
|
||
def to_dict(self): | ||
def to_dict(self, into=dict): | ||
""" | ||
Convert Series to {label -> value} dict | ||
Convert Series to {label -> value} dict or dict-like object | ||
Parameters | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. blank line between the first line and Parameters There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done |
||
---------- | ||
into : class, default dict | ||
The collections.Mapping subclass to use as the return | ||
object. Can be the actual class or an empty | ||
instance of the mapping type you want. If you want a | ||
collections.defaultdict, you must pass an initialized | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you must pass it initialized. (maybe elsewhere as well) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fixed here and in |
||
.. versionadded:: 0.21.0 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same |
||
|
||
Returns | ||
------- | ||
value_dict : dict | ||
value_dict : collections.Mapping | ||
If ``into`` is collections.defaultdict, the return | ||
value's default_factory will be None. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This should be removed as well I think (you updated it for the DataFrame docstring) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done |
||
""" | ||
return dict(compat.iteritems(self)) | ||
# GH16122 | ||
into_c = _standardize_mapping(into) | ||
return into_c(compat.iteritems(self)) | ||
|
||
def to_frame(self, name=None): | ||
""" | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,7 @@ | ||
# -*- coding: utf-8 -*- | ||
|
||
import pytest | ||
import collections | ||
import numpy as np | ||
|
||
from pandas import compat | ||
|
@@ -13,50 +14,6 @@ | |
|
||
class TestDataFrameConvertTo(TestData): | ||
|
||
def test_to_dict(self): | ||
test_data = { | ||
'A': {'1': 1, '2': 2}, | ||
'B': {'1': '1', '2': '2', '3': '3'}, | ||
} | ||
recons_data = DataFrame(test_data).to_dict() | ||
|
||
for k, v in compat.iteritems(test_data): | ||
for k2, v2 in compat.iteritems(v): | ||
assert v2 == recons_data[k][k2] | ||
|
||
recons_data = DataFrame(test_data).to_dict("l") | ||
|
||
for k, v in compat.iteritems(test_data): | ||
for k2, v2 in compat.iteritems(v): | ||
assert v2 == recons_data[k][int(k2) - 1] | ||
|
||
recons_data = DataFrame(test_data).to_dict("s") | ||
|
||
for k, v in compat.iteritems(test_data): | ||
for k2, v2 in compat.iteritems(v): | ||
assert v2 == recons_data[k][k2] | ||
|
||
recons_data = DataFrame(test_data).to_dict("sp") | ||
expected_split = {'columns': ['A', 'B'], 'index': ['1', '2', '3'], | ||
'data': [[1.0, '1'], [2.0, '2'], [np.nan, '3']]} | ||
tm.assert_dict_equal(recons_data, expected_split) | ||
|
||
recons_data = DataFrame(test_data).to_dict("r") | ||
expected_records = [{'A': 1.0, 'B': '1'}, | ||
{'A': 2.0, 'B': '2'}, | ||
{'A': np.nan, 'B': '3'}] | ||
assert isinstance(recons_data, list) | ||
assert len(recons_data) == 3 | ||
for l, r in zip(recons_data, expected_records): | ||
tm.assert_dict_equal(l, r) | ||
|
||
# GH10844 | ||
recons_data = DataFrame(test_data).to_dict("i") | ||
|
||
for k, v in compat.iteritems(test_data): | ||
for k2, v2 in compat.iteritems(v): | ||
assert v2 == recons_data[k2][k] | ||
|
||
def test_to_dict_timestamp(self): | ||
|
||
# GH11247 | ||
|
@@ -190,17 +147,65 @@ def test_to_records_with_unicode_column_names(self): | |
) | ||
tm.assert_almost_equal(result, expected) | ||
|
||
@pytest.mark.parametrize('mapping', [ | ||
dict, | ||
collections.defaultdict(list), | ||
collections.OrderedDict]) | ||
def test_to_dict(self, mapping): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This may be tested elsewhere, but can you add a test with a dataframe that has duplicate columns? Make sure to catch the warning. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done - I added one at the end. Let me know if that is what you were getting at. |
||
test_data = { | ||
'A': {'1': 1, '2': 2}, | ||
'B': {'1': '1', '2': '2', '3': '3'}, | ||
} | ||
# GH16122 | ||
recons_data = DataFrame(test_data).to_dict(into=mapping) | ||
|
||
for k, v in compat.iteritems(test_data): | ||
for k2, v2 in compat.iteritems(v): | ||
assert (v2 == recons_data[k][k2]) | ||
|
||
recons_data = DataFrame(test_data).to_dict("l", mapping) | ||
|
||
for k, v in compat.iteritems(test_data): | ||
for k2, v2 in compat.iteritems(v): | ||
assert (v2 == recons_data[k][int(k2) - 1]) | ||
|
||
recons_data = DataFrame(test_data).to_dict("s", mapping) | ||
|
||
for k, v in compat.iteritems(test_data): | ||
for k2, v2 in compat.iteritems(v): | ||
assert (v2 == recons_data[k][k2]) | ||
|
||
recons_data = DataFrame(test_data).to_dict("sp", mapping) | ||
expected_split = {'columns': ['A', 'B'], 'index': ['1', '2', '3'], | ||
'data': [[1.0, '1'], [2.0, '2'], [np.nan, '3']]} | ||
tm.assert_dict_equal(recons_data, expected_split) | ||
|
||
recons_data = DataFrame(test_data).to_dict("r", mapping) | ||
expected_records = [{'A': 1.0, 'B': '1'}, | ||
{'A': 2.0, 'B': '2'}, | ||
{'A': np.nan, 'B': '3'}] | ||
assert isinstance(recons_data, list) | ||
assert (len(recons_data) == 3) | ||
for l, r in zip(recons_data, expected_records): | ||
tm.assert_dict_equal(l, r) | ||
|
||
# GH10844 | ||
recons_data = DataFrame(test_data).to_dict("i") | ||
|
||
for k, v in compat.iteritems(test_data): | ||
for k2, v2 in compat.iteritems(v): | ||
assert (v2 == recons_data[k2][k]) | ||
|
||
@pytest.mark.parametrize('tz', ['UTC', 'GMT', 'US/Eastern']) | ||
def test_to_records_datetimeindex_with_tz(tz): | ||
# GH13937 | ||
dr = date_range('2016-01-01', periods=10, | ||
freq='S', tz=tz) | ||
@pytest.mark.parametrize('tz', ['UTC', 'GMT', 'US/Eastern']) | ||
def test_to_records_datetimeindex_with_tz(self, tz): | ||
# GH13937 | ||
dr = date_range('2016-01-01', periods=10, | ||
freq='S', tz=tz) | ||
|
||
df = DataFrame({'datetime': dr}, index=dr) | ||
df = DataFrame({'datetime': dr}, index=dr) | ||
|
||
expected = df.to_records() | ||
result = df.tz_convert("UTC").to_records() | ||
expected = df.to_records() | ||
result = df.tz_convert("UTC").to_records() | ||
|
||
# both converted to UTC, so they are equal | ||
tm.assert_numpy_array_equal(result, expected) | ||
# both converted to UTC, so they are equal | ||
tm.assert_numpy_array_equal(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reset this file