
ENH: Add columns parameter to from_dict #19802


Merged
merged 2 commits into from
Feb 22, 2018
doc/source/whatsnew/v0.23.0.txt (1 addition, 0 deletions)
@@ -295,6 +295,7 @@ Other Enhancements
 - ``IntervalIndex.astype`` now supports conversions between subtypes when passed an ``IntervalDtype`` (:issue:`19197`)
 - :class:`IntervalIndex` and its associated constructor methods (``from_arrays``, ``from_breaks``, ``from_tuples``) have gained a ``dtype`` parameter (:issue:`19262`)
 - Added :func:`SeriesGroupBy.is_monotonic_increasing` and :func:`SeriesGroupBy.is_monotonic_decreasing` (:issue:`17015`)
+- :func:`DataFrame.from_dict` now accepts a ``columns`` argument that can be used to specify the column names when ``orient='index'`` is used (:issue:`18529`)
 
 .. _whatsnew_0230.api_breaking:
 
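A minimal usage sketch of the new keyword, assuming pandas 0.23.0 or later; the `data` dict and the variable names are illustrative and not taken from the PR. With `orient='index'`, each dict key becomes a row label and `columns` names the positions within each row. With the default `orient='columns'`, the keys already are the column labels, so combining them with `columns` is rejected.

```python
import pandas as pd

# Illustrative input: each key maps to one row of values.
data = {'A': [1, 2], 'B': [4, 5]}

# orient='index': keys 'A' and 'B' become the row index,
# and `columns` supplies labels for the two positions in each row.
df = pd.DataFrame.from_dict(data, orient='index', columns=['one', 'two'])

# orient='columns' (the default) already uses the keys as column labels,
# so also passing `columns` raises a ValueError.
error_message = None
try:
    pd.DataFrame.from_dict(data, columns=['one', 'two'])
except ValueError as exc:
    error_message = str(exc)  # "cannot use columns parameter with orient='columns'"
```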
pandas/core/frame.py (12 additions, 3 deletions)
@@ -876,7 +876,7 @@ def dot(self, other):
     # IO methods (to / from other formats)
 
     @classmethod
-    def from_dict(cls, data, orient='columns', dtype=None):
+    def from_dict(cls, data, orient='columns', dtype=None, columns=None):
         """
         Construct DataFrame from dict of array-like or dicts
 
@@ -890,12 +890,17 @@ def from_dict(cls, data, orient='columns', dtype=None):
             (default). Otherwise if the keys should be rows, pass 'index'.
         dtype : dtype, default None
             Data type to force, otherwise infer
+        columns: list, default None
+            Column labels to use when orient='index'. Raises a ValueError
+            if used with orient='columns'
+
+            .. versionadded:: 0.23.0
 
         Returns
         -------
         DataFrame
         """
-        index, columns = None, None
+        index = None
         orient = orient.lower()
         if orient == 'index':
             if len(data) > 0:
@@ -904,7 +909,11 @@ def from_dict(cls, data, orient='columns', dtype=None):
                     data = _from_nested_dict(data)
                 else:
                     data, index = list(data.values()), list(data.keys())
-        elif orient != 'columns':  # pragma: no cover
+        elif orient == 'columns':
+            if columns is not None:
+                raise ValueError("cannot use columns parameter with "
+                                 "orient='columns'")
Member:

In principle, we could even allow it for orient='columns', as DataFrame(dict, columns=..) will just handle this fine. But I'm not really sure of the value here.

Contributor Author:

Could you expand on what you mean? If I pass a dict with columns to DataFrame() I get:

In [1]: pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, columns = ['one', 'two'])
Out[1]:
Empty DataFrame
Columns: [one, two]
Index: []

Member:

Yes, that is exactly what I meant. The columns keyword does a 'reindexing' operation, and does not 'overwrite' the labels if the data has keys as well. That's the current behaviour of DataFrame(..). So we could follow the same pattern here, but since this behaviour is sometimes somewhat surprising, I'm not sure it is needed here as well. Hence my doubt :-)

Contributor Author:

Ah yes, I understand now. It makes more sense to me to raise a ValueError for from_dict(..., orient='columns', columns=[...]), but I can change it to not raise if we want to be consistent with DataFrame(dict, columns=[..]).

+        else:  # pragma: no cover
             raise ValueError('only recognize index or columns for orient')
 
         return cls(data, index=index, columns=columns, dtype=dtype)
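The reindexing behaviour discussed in the review can be checked directly. A minimal sketch, assuming a recent pandas release; the variable names are illustrative. With a dict input, `DataFrame(..., columns=...)` selects and orders existing keys rather than relabelling them, which is why a list with no matching keys yields an empty frame; relabelling needs an explicit step such as `rename`.

```python
import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 5, 6]}

# No key matches 'one' or 'two', so nothing is selected: the result has
# the requested column labels but no rows (as in the session above).
no_match = pd.DataFrame(data, columns=['one', 'two'])

# Labels that do match existing keys select (and order) those columns.
only_b = pd.DataFrame(data, columns=['B'])

# To relabel rather than select, build the frame and rename afterwards.
renamed = pd.DataFrame(data).rename(columns={'A': 'one', 'B': 'two'})
```

This is the behaviour the PR chose not to mirror: raising a ValueError for `orient='columns'` avoids silently producing an empty or partial frame when the caller meant to rename.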
pandas/tests/frame/test_constructors.py (19 additions, 0 deletions)
@@ -1091,6 +1091,25 @@ def test_constructor_orient(self):
         xp = DataFrame.from_dict(a).T.reindex(list(a.keys()))
         tm.assert_frame_equal(rs, xp)
 
+    def test_from_dict_columns_parameter(self):
+        # GH 18529
+        # Test new columns parameter for from_dict that was added to make
+        # from_items(..., orient='index', columns=[...]) easier to replicate
+        result = DataFrame.from_dict(OrderedDict([('A', [1, 2]),
+                                                  ('B', [4, 5])]),
+                                     orient='index', columns=['one', 'two'])
+        expected = DataFrame([[1, 2], [4, 5]], index=['A', 'B'],
+                             columns=['one', 'two'])
+        tm.assert_frame_equal(result, expected)
+
+        msg = "cannot use columns parameter with orient='columns'"
+        with tm.assert_raises_regex(ValueError, msg):
+            DataFrame.from_dict(dict([('A', [1, 2]), ('B', [4, 5])]),
+                                orient='columns', columns=['one', 'two'])
+        with tm.assert_raises_regex(ValueError, msg):
+            DataFrame.from_dict(dict([('A', [1, 2]), ('B', [4, 5])]),
+                                columns=['one', 'two'])
+
     def test_constructor_Series_named(self):
         a = Series([1, 2, 3], index=['a', 'b', 'c'], name='x')
         df = DataFrame(a)
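The same checks can also be written as a standalone test module outside the pandas test suite. This is a sketch, not the code merged in this PR: it assumes `pytest` is installed and uses `pytest.raises(..., match=...)` and the public `pandas.testing.assert_frame_equal` in place of the internal `tm` helpers.

```python
import pandas as pd
import pytest
from pandas.testing import assert_frame_equal


def test_from_dict_columns_parameter():
    # GH 18529: orient='index' combined with columns=[...]
    # Plain dicts preserve insertion order on Python 3.7+,
    # so OrderedDict is not needed to fix the row order here.
    result = pd.DataFrame.from_dict({'A': [1, 2], 'B': [4, 5]},
                                    orient='index', columns=['one', 'two'])
    expected = pd.DataFrame([[1, 2], [4, 5]], index=['A', 'B'],
                            columns=['one', 'two'])
    assert_frame_equal(result, expected)

    msg = "cannot use columns parameter with orient='columns'"
    # Both an explicit orient='columns' and the default orient must raise.
    with pytest.raises(ValueError, match=msg):
        pd.DataFrame.from_dict({'A': [1, 2], 'B': [4, 5]},
                               orient='columns', columns=['one', 'two'])
    with pytest.raises(ValueError, match=msg):
        pd.DataFrame.from_dict({'A': [1, 2], 'B': [4, 5]},
                               columns=['one', 'two'])
```

Running `pytest` on the file containing this function collects and executes it like any other test.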