Commit e6c7dea

topper-123 authored and jreback committed
ENH: Let initialisation from dicts use insertion order for python >= 3.6 (part III) (pandas-dev#19884)
1 parent d615f86 commit e6c7dea

15 files changed

+193
-32
lines changed

doc/source/dsintro.rst

+33-4
@@ -81,9 +81,28 @@ index is passed, one will be created having values ``[0, ..., len(data) - 1]``.
8181

8282
**From dict**
8383

84-
If ``data`` is a dict, if **index** is passed the values in data corresponding
85-
to the labels in the index will be pulled out. Otherwise, an index will be
86-
constructed from the sorted keys of the dict, if possible.
84+
Series can be instantiated from dicts:
85+
86+
.. ipython:: python
87+
88+
d = {'b' : 1, 'a' : 0, 'c' : 2}
89+
pd.Series(d)
90+
91+
.. note::
92+
93+
When the data is a dict, and an index is not passed, the ``Series`` index
94+
will be ordered by the dict's insertion order, if you're using Python
95+
version >= 3.6 and Pandas version >= 0.23.
96+
97+
If you're using Python < 3.6 or Pandas < 0.23, and an index is not passed,
98+
the ``Series`` index will be the lexically ordered list of dict keys.
99+
100+
In the example above, if you were on a Python version lower than 3.6 or a
101+
Pandas version lower than 0.23, the ``Series`` would be ordered by the lexical
102+
order of the dict keys (i.e. ``['a', 'b', 'c']`` rather than ``['b', 'a', 'c']``).
103+
104+
If an index is passed, the values in data corresponding to the labels in the
105+
index will be pulled out.
87106

88107
.. ipython:: python
89108
@@ -243,12 +262,22 @@ not matching up to the passed index.
243262
If axis labels are not passed, they will be constructed from the input data
244263
based on common sense rules.
245264

265+
.. note::
266+
267+
When the data is a dict, and ``columns`` is not specified, the ``DataFrame``
268+
columns will be ordered by the dict's insertion order, if you are using
269+
Python version >= 3.6 and Pandas >= 0.23.
270+
271+
If you are using Python < 3.6 or Pandas < 0.23, and ``columns`` is not
272+
specified, the ``DataFrame`` columns will be the lexically ordered list of dict
273+
keys.
274+
246275
From dict of Series or dicts
247276
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
248277

249278
The resulting **index** will be the **union** of the indexes of the various
250279
Series. If there are any nested dicts, these will first be converted to
251-
Series. If no columns are passed, the columns will be the sorted list of dict
280+
Series. If no columns are passed, the columns will be the ordered list of dict
252281
keys.
253282

254283
.. ipython:: python

doc/source/whatsnew/v0.23.0.txt

+54-3
@@ -3,7 +3,7 @@
33
v0.23.0
44
-------
55

6-
This is a major release from 0.21.1 and includes a number of API changes,
6+
This is a major release from 0.22.0 and includes a number of API changes,
77
deprecations, new features, enhancements, and performance improvements along
88
with a large number of bug fixes. We recommend that all users upgrade to this
99
version.
@@ -249,7 +249,7 @@ The :func:`DataFrame.assign` now accepts dependent keyword arguments for python
249249
using ``.assign()`` to update an existing column. Previously, callables
250250
referring to other variables being updated would get the "old" values
251251

252-
Previous Behaviour:
252+
Previous Behavior:
253253

254254
.. code-block:: ipython
255255

@@ -262,7 +262,7 @@ The :func:`DataFrame.assign` now accepts dependent keyword arguments for python
262262
1 3 -2
263263
2 4 -3
264264

265-
New Behaviour:
265+
New Behavior:
266266

267267
.. ipython:: python
268268

@@ -361,6 +361,57 @@ If installed, we now require:
361361
| openpyxl | 2.4.0 | |
362362
+-----------------+-----------------+----------+
363363

364+
.. _whatsnew_0230.api_breaking.dict_insertion_order:
365+
366+
Instantiation from dicts preserves dict insertion order for Python 3.6+
367+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
368+
369+
Until Python 3.6, dicts in Python had no formally defined ordering. From Python
370+
3.6 onward, dicts keep insertion order (a CPython implementation detail in 3.6,
371+
guaranteed by the language from 3.7); see also
372+
`PEP 468 <https://www.python.org/dev/peps/pep-0468/>`_. Pandas uses the dict's
373+
insertion order when creating a ``Series`` or ``DataFrame`` from a dict on
374+
Python 3.6 or higher. (:issue:`19884`)
375+
376+
Previous Behavior (and current behavior if on Python < 3.6):
377+
378+
.. code-block:: ipython
379+
380+
In [1]: pd.Series({'Income': 2000,
381+
... 'Expenses': -1500,
382+
... 'Taxes': -200,
383+
... 'Net result': 300})
384+
Expenses -1500
385+
Income 2000
386+
Net result 300
387+
Taxes -200
388+
dtype: int64
389+
390+
Note the Series above is ordered alphabetically by the index values.
391+
392+
New Behavior (for Python >= 3.6):
393+
394+
.. ipython:: python
395+
396+
pd.Series({'Income': 2000,
397+
'Expenses': -1500,
398+
'Taxes': -200,
399+
'Net result': 300})
400+
401+
Notice that the Series is now ordered by insertion order. This new behavior is
402+
used for all relevant pandas types (``Series``, ``DataFrame``, ``SparseSeries``
403+
and ``SparseDataFrame``).
404+
405+
If you wish to retain the old behavior while using Python >= 3.6, you can use
406+
``.sort_index()``:
407+
408+
.. ipython:: python
409+
410+
pd.Series({'Income': 2000,
411+
'Expenses': -1500,
412+
'Taxes': -200,
413+
'Net result': 300}).sort_index()
414+
364415
.. _whatsnew_0230.api_breaking.deprecate_panel:
365416

366417
Deprecate Panel
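
The two orderings described in the release note above can be illustrated without pandas. This is only a sketch using a plain dict: `insertion_order` and `sorted_order` are hypothetical names, and the lists stand in for the index labels a ``Series`` built from the same dict would carry.

```python
# Plain-Python illustration of the two label orderings in the release note.
data = {'Income': 2000, 'Expenses': -1500, 'Taxes': -200, 'Net result': 300}

# New behavior on Python >= 3.6: labels follow the dict's insertion order.
insertion_order = list(data)

# Previous behavior: labels sorted lexically, which is also the order
# that calling .sort_index() on the resulting Series restores.
sorted_order = sorted(data)

print(insertion_order)
print(sorted_order)
```

Sorting after construction, rather than reordering the dict beforehand, keeps the call site the same on every Python version.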

pandas/core/common.py

+11-1
@@ -11,7 +11,7 @@
1111
from pandas._libs import lib, tslib
1212

1313
from pandas import compat
14-
from pandas.compat import long, zip, iteritems
14+
from pandas.compat import long, zip, iteritems, PY36, OrderedDict
1515
from pandas.core.config import get_option
1616
from pandas.core.dtypes.generic import ABCSeries, ABCIndex
1717
from pandas.core.dtypes.common import _NS_DTYPE
@@ -186,6 +186,16 @@ def _try_sort(iterable):
186186
return listed
187187

188188

189+
def _dict_keys_to_ordered_list(mapping):
190+
# when pandas drops support for Python < 3.6, this function
191+
# can be replaced by a simple list(mapping.keys())
192+
if PY36 or isinstance(mapping, OrderedDict):
193+
keys = list(mapping.keys())
194+
else:
195+
keys = _try_sort(mapping)
196+
return keys
197+
198+
189199
def iterpairs(seq):
190200
"""
191201
Parameters
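
To see the helper added in this file in isolation, here is a minimal standalone sketch. It assumes `PY36` is equivalent to a `sys.version_info` check and that `_try_sort` falls back to the unsorted list when the keys are not comparable (only its final `return listed` line is visible in the hunk); the un-prefixed names `try_sort` and `dict_keys_to_ordered_list` are stand-ins, not the pandas functions themselves.

```python
import sys
from collections import OrderedDict

# Assumed stand-in for pandas.compat.PY36.
PY36 = sys.version_info >= (3, 6)


def try_sort(iterable):
    # Assumed behavior of pandas' _try_sort: sort when the items are
    # mutually comparable, otherwise return them in their original order.
    listed = list(iterable)
    try:
        return sorted(listed)
    except TypeError:
        return listed


def dict_keys_to_ordered_list(mapping):
    # Insertion order is trusted on Python >= 3.6 or for an OrderedDict;
    # otherwise the keys are sorted when possible.
    if PY36 or isinstance(mapping, OrderedDict):
        return list(mapping.keys())
    return try_sort(mapping)


keys_plain = dict_keys_to_ordered_list({'b': 1, 'a': 0, 'c': 2})
keys_ordered = dict_keys_to_ordered_list(OrderedDict([('b', 1), ('a', 0)]))
print(keys_plain, keys_ordered)
```

Centralizing the version check in one helper is what lets `DataFrame`, `Panel` and `SparseDataFrame` share the same ordering rule in the hunks that follow.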

pandas/core/frame.py

+6-3
@@ -252,6 +252,11 @@ class DataFrame(NDFrame):
252252
----------
253253
data : numpy ndarray (structured or homogeneous), dict, or DataFrame
254254
Dict can contain Series, arrays, constants, or list-like objects
255+
256+
.. versionchanged :: 0.23.0
257+
If data is a dict, argument order is maintained for Python 3.6
258+
and later.
259+
255260
index : Index or array-like
256261
Index to use for resulting frame. Will default to RangeIndex if
257262
no indexing information part of input data and no index provided
@@ -460,9 +465,7 @@ def _init_dict(self, data, index, columns, dtype=None):
460465
arrays.append(v)
461466

462467
else:
463-
keys = list(data.keys())
464-
if not isinstance(data, OrderedDict):
465-
keys = com._try_sort(keys)
468+
keys = com._dict_keys_to_ordered_list(data)
466469
columns = data_names = Index(keys)
467470
arrays = [data[k] for k in keys]
468471

pandas/core/panel.py

+2-4
@@ -204,10 +204,8 @@ def _init_dict(self, data, axes, dtype=None):
204204
for k, v in compat.iteritems(data)
205205
if k in haxis)
206206
else:
207-
ks = list(data.keys())
208-
if not isinstance(data, OrderedDict):
209-
ks = com._try_sort(ks)
210-
haxis = Index(ks)
207+
keys = com._dict_keys_to_ordered_list(data)
208+
haxis = Index(keys)
211209

212210
for k, v in compat.iteritems(data):
213211
if isinstance(v, dict):

pandas/core/series.py

+7-2
@@ -54,7 +54,7 @@
5454
from pandas import compat
5555
from pandas.io.formats.terminal import get_terminal_size
5656
from pandas.compat import (
57-
zip, u, OrderedDict, StringIO, range, get_range_parameters)
57+
zip, u, OrderedDict, StringIO, range, get_range_parameters, PY36)
5858
from pandas.compat.numpy import function as nv
5959

6060
import pandas.core.ops as ops
@@ -130,6 +130,11 @@ class Series(base.IndexOpsMixin, generic.NDFrame):
130130
----------
131131
data : array-like, dict, or scalar value
132132
Contains data stored in Series
133+
134+
.. versionchanged :: 0.23.0
135+
If data is a dict, argument order is maintained for Python 3.6
136+
and later.
137+
133138
index : array-like or Index (1d)
134139
Values must be hashable and have the same length as `data`.
135140
Non-unique index values are allowed. Will default to
@@ -297,7 +302,7 @@ def _init_dict(self, data, index=None, dtype=None):
297302
# Now we just make sure the order is respected, if any
298303
if index is not None:
299304
s = s.reindex(index, copy=False)
300-
elif not isinstance(data, OrderedDict):
305+
elif not PY36 and not isinstance(data, OrderedDict):
301306
try:
302307
s = s.sort_index()
303308
except TypeError:

pandas/core/sparse/frame.py

+6-1
@@ -39,6 +39,10 @@ class SparseDataFrame(DataFrame):
3939
Parameters
4040
----------
4141
data : same types as can be passed to DataFrame or scipy.sparse.spmatrix
42+
.. versionchanged :: 0.23.0
43+
If data is a dict, argument order is maintained for Python 3.6
44+
and later.
45+
4246
index : array-like, optional
4347
column : array-like, optional
4448
default_kind : {'block', 'integer'}, default 'block'
@@ -138,7 +142,8 @@ def _init_dict(self, data, index, columns, dtype=None):
138142
columns = _ensure_index(columns)
139143
data = {k: v for k, v in compat.iteritems(data) if k in columns}
140144
else:
141-
columns = Index(com._try_sort(list(data.keys())))
145+
keys = com._dict_keys_to_ordered_list(data)
146+
columns = Index(keys)
142147

143148
if index is None:
144149
index = extract_index(list(data.values()))

pandas/core/sparse/series.py

+4
@@ -42,6 +42,10 @@ class SparseSeries(Series):
4242
Parameters
4343
----------
4444
data : {array-like, Series, SparseSeries, dict}
45+
.. versionchanged :: 0.23.0
46+
If data is a dict, argument order is maintained for Python 3.6
47+
and later.
48+
4549
kind : {'block', 'integer'}
4650
fill_value : float
4751
Code for missing value. Defaults depends on dtype.

pandas/tests/frame/test_constructors.py

+19-1
@@ -15,7 +15,7 @@
1515

1616
from pandas.core.dtypes.common import is_integer_dtype
1717
from pandas.compat import (lmap, long, zip, range, lrange, lzip,
18-
OrderedDict, is_platform_little_endian)
18+
OrderedDict, is_platform_little_endian, PY36)
1919
from pandas import compat
2020
from pandas import (DataFrame, Index, Series, isna,
2121
MultiIndex, Timedelta, Timestamp,
@@ -290,6 +290,24 @@ def test_constructor_dict(self):
290290
with tm.assert_raises_regex(ValueError, msg):
291291
DataFrame({'a': 0.7}, columns=['b'])
292292

293+
@pytest.mark.skipif(not PY36, reason='Insertion order for Python>=3.6')
294+
def test_constructor_dict_order_insertion(self):
295+
# GH19018
296+
# initialization ordering: by insertion order if python>= 3.6
297+
d = {'b': self.ts2, 'a': self.ts1}
298+
frame = DataFrame(data=d)
299+
expected = DataFrame(data=d, columns=list('ba'))
300+
tm.assert_frame_equal(frame, expected)
301+
302+
@pytest.mark.skipif(PY36, reason='sorted by key for Python<3.6')
303+
def test_constructor_dict_order_by_values(self):
304+
# GH19018
305+
# initialization ordering: sorted by key if python<3.6
306+
d = {'b': self.ts2, 'a': self.ts1}
307+
frame = DataFrame(data=d)
308+
expected = DataFrame(data=d, columns=list('ab'))
309+
tm.assert_frame_equal(frame, expected)
310+
293311
def test_constructor_multi_index(self):
294312
# GH 4078
295313
# construction error with mi and all-nan frame

pandas/tests/io/test_excel.py

+4-4
@@ -762,17 +762,17 @@ def test_read_excel_multiindex_empty_level(self, ext):
762762
# GH 12453
763763
with ensure_clean('.xlsx') as path:
764764
df = DataFrame({
765-
('Zero', ''): {0: 0},
766765
('One', 'x'): {0: 1},
767766
('Two', 'X'): {0: 3},
768-
('Two', 'Y'): {0: 7}
767+
('Two', 'Y'): {0: 7},
768+
('Zero', ''): {0: 0}
769769
})
770770

771771
expected = DataFrame({
772-
('Zero', 'Unnamed: 3_level_1'): {0: 0},
773772
('One', u'x'): {0: 1},
774773
('Two', u'X'): {0: 3},
775-
('Two', u'Y'): {0: 7}
774+
('Two', u'Y'): {0: 7},
775+
('Zero', 'Unnamed: 3_level_1'): {0: 0}
776776
})
777777

778778
df.to_excel(path)

pandas/tests/io/test_pytables.py

+1-1
@@ -2034,7 +2034,7 @@ def test_table_values_dtypes_roundtrip(self):
20342034
'bool': 1, 'int16': 1, 'int8': 1,
20352035
'int64': 1, 'object': 1, 'datetime64[ns]': 2})
20362036
result = result.sort_index()
2037-
result = expected.sort_index()
2037+
expected = expected.sort_index()
20382038
tm.assert_series_equal(result, expected)
20392039

20402040
def test_table_mixed_dtypes(self):

pandas/tests/series/test_constructors.py

+13-1
@@ -22,7 +22,7 @@
2222
from pandas._libs import lib
2323
from pandas._libs.tslib import iNaT
2424

25-
from pandas.compat import lrange, range, zip, long
25+
from pandas.compat import lrange, range, zip, long, PY36
2626
from pandas.util.testing import assert_series_equal
2727
import pandas.util.testing as tm
2828

@@ -811,6 +811,18 @@ def test_constructor_dict(self):
811811
expected.iloc[1] = 1
812812
assert_series_equal(result, expected)
813813

814+
def test_constructor_dict_order(self):
815+
# GH19018
816+
# initialization ordering: by insertion order if python>= 3.6, else
817+
# sorted by key
818+
d = {'b': 1, 'a': 0, 'c': 2}
819+
result = Series(d)
820+
if PY36:
821+
expected = Series([1, 0, 2], index=list('bac'))
822+
else:
823+
expected = Series([0, 1, 2], index=list('abc'))
824+
tm.assert_series_equal(result, expected)
825+
814826
@pytest.mark.parametrize("value", [2, np.nan, None, float('nan')])
815827
def test_constructor_dict_nan_key(self, value):
816828
# GH 18480

pandas/tests/sparse/frame/test_frame.py

+12
@@ -139,6 +139,18 @@ def test_constructor(self):
139139

140140
repr(self.frame)
141141

142+
def test_constructor_dict_order(self):
143+
# GH19018
144+
# initialization ordering: by insertion order if python>= 3.6, else
145+
# sorted by key
146+
d = {'b': [2, 3], 'a': [0, 1]}
147+
frame = SparseDataFrame(data=d)
148+
if compat.PY36:
149+
expected = SparseDataFrame(data=d, columns=list('ba'))
150+
else:
151+
expected = SparseDataFrame(data=d, columns=list('ab'))
152+
tm.assert_sp_frame_equal(frame, expected)
153+
142154
def test_constructor_ndarray(self):
143155
# no index or columns
144156
sp = SparseDataFrame(self.frame.values)
