Commit 964fb29

Author: tp

initialization from dicts for py>=3.6 maintains insertion order

1 parent: feedf66

12 files changed, +166 -20 lines changed

doc/source/dsintro.rst

+30-4
@@ -81,9 +81,26 @@ index is passed, one will be created having values ``[0, ..., len(data) - 1]``.
 
 **From dict**
 
-If ``data`` is a dict, if **index** is passed the values in data corresponding
-to the labels in the index will be pulled out. Otherwise, an index will be
-constructed from the sorted keys of the dict, if possible.
+.. note::
+
+   When the data is a dict and ``index`` is not passed, the Series index
+   will follow the dict's insertion order if you're using Python
+   version >= 3.6 and pandas version >= 0.23.
+
+   If you're using Python < 3.6 or pandas < 0.23 and ``index`` is not passed,
+   the Series index will be the lexically ordered list of dict keys.
+
+.. ipython:: python
+
+   d = {'b': 1, 'a': 0, 'c': 2}
+   pd.Series(d)
+
+In the example above, if you were on a Python version lower than 3.6 or a
+pandas version lower than 0.23, the Series would be ordered by the lexical
+order of the dict keys (i.e. ``['a', 'b', 'c']`` rather than ``['b', 'a', 'c']``).
+
+If an index is passed, the values in data corresponding to the labels in the
+index will be pulled out.
 
 .. ipython:: python
 
@@ -243,12 +260,21 @@ not matching up to the passed index.
 If axis labels are not passed, they will be constructed from the input data
 based on common sense rules.
 
+.. note::
+
+   When the data is a dict and ``columns`` is not passed, the DataFrame columns
+   will follow the dict's insertion order if you're using Python
+   version >= 3.6 and pandas >= 0.23.
+
+   If you're using Python < 3.6 or pandas < 0.23 and ``columns`` is not passed,
+   the DataFrame columns will be the lexically ordered list of dict keys.
+
 From dict of Series or dicts
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 The resulting **index** will be the **union** of the indexes of the various
 Series. If there are any nested dicts, these will first be converted to
-Series. If no columns are passed, the columns will be the sorted list of dict
+Series. If no columns are passed, the columns will be the ordered list of dict
 keys.
 
 .. ipython:: python

doc/source/whatsnew/v0.23.0.txt

+54-3
@@ -3,7 +3,7 @@
 v0.23.0
 -------
 
-This is a major release from 0.21.1 and includes a number of API changes,
+This is a major release from 0.22.0 and includes a number of API changes,
 deprecations, new features, enhancements, and performance improvements along
 with a large number of bug fixes. We recommend that all users upgrade to this
 version.
@@ -240,7 +240,7 @@ The :func:`DataFrame.assign` now accepts dependent keyword arguments for python
 using ``.assign()`` to update an existing column. Previously, callables
 referring to other variables being updated would get the "old" values
 
-Previous Behaviour:
+Previous Behavior:
 
 .. code-block:: ipython
 
@@ -253,7 +253,7 @@ The :func:`DataFrame.assign` now accepts dependent keyword arguments for python
    1 3 -2
    2 4 -3
 
-New Behaviour:
+New Behavior:
 
 .. ipython:: python
 
@@ -320,6 +320,57 @@ If installed, we now require:
 | openpyxl        | 2.4.0           |          |
 +-----------------+-----------------+----------+
 
+.. _whatsnew_0230.api_breaking.dict_insertion_order:
+
+Instantiation from dicts preserves dict insertion order for Python 3.6+
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Before Python 3.6, dicts in Python had no defined ordering. Starting with
+Python 3.6, dicts preserve insertion order (an implementation detail of
+CPython 3.6 that became a language guarantee in Python 3.7); see also
+`PEP 468 <https://www.python.org/dev/peps/pep-0468/>`_.
+pandas now uses the dict's insertion order when creating a Series or
+DataFrame from a dict (:issue:`19018`).
+
+Previous Behavior (and current behavior if on Python < 3.6):
+
+.. code-block:: ipython
+
+    In [1]: pd.Series({'Income': 2000,
+       ...:            'Expenses': -1500,
+       ...:            'Taxes': -200,
+       ...:            'Net result': 300})
+    Expenses     -1500
+    Income        2000
+    Net result     300
+    Taxes         -200
+    dtype: int64
+
+Note the Series above is ordered alphabetically by the index values.
+
+New Behavior (for Python >= 3.6):
+
+.. ipython:: python
+
+    pd.Series({'Income': 2000,
+               'Expenses': -1500,
+               'Taxes': -200,
+               'Net result': 300})
+
+Notice that the Series is now ordered by insertion order. This new behavior is
+used for all relevant pandas types (``Series``, ``DataFrame``, ``SparseSeries``
+and ``SparseDataFrame``).
+
+If you wish to retain the old behavior while using Python >= 3.6, you can use
+``sort_index``:
+
+.. ipython:: python
+
+    pd.Series({'Income': 2000,
+               'Expenses': -1500,
+               'Taxes': -200,
+               'Net result': 300}).sort_index()
+
 .. _whatsnew_0230.api_breaking.deprecate_panel:
 
 Deprecate Panel

pandas/core/frame.py

+6-1
@@ -251,6 +251,11 @@ class DataFrame(NDFrame):
     ----------
     data : numpy ndarray (structured or homogeneous), dict, or DataFrame
         Dict can contain Series, arrays, constants, or list-like objects
+
+        .. versionchanged :: 0.23.0
+           If data is a dict, insertion order is maintained for Python 3.6
+           and later.
+
     index : Index or array-like
         Index to use for resulting frame. Will default to RangeIndex if
         no indexing information part of input data and no index provided
@@ -460,7 +465,7 @@ def _init_dict(self, data, index, columns, dtype=None):
 
         else:
             keys = list(data.keys())
-            if not isinstance(data, OrderedDict):
+            if not PY36 and not isinstance(data, OrderedDict):
                 keys = com._try_sort(keys)
             columns = data_names = Index(keys)
             arrays = [data[k] for k in keys]
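
Read on its own, the `_init_dict` change boils down to one rule for picking the column order when no `columns` are passed. Below is a plain-Python sketch of that rule, not pandas code: `resolve_column_order` is a hypothetical helper, and it assumes that `pandas.compat.PY36` means `sys.version_info >= (3, 6)` and that `com._try_sort` returns the sorted keys when they are comparable and the keys unchanged otherwise.

```python
import sys
from collections import OrderedDict

# Assumption: equivalent to pandas.compat.PY36
PY36 = sys.version_info >= (3, 6)


def _try_sort(keys):
    """Assumed behaviour of com._try_sort: sort if possible, else keep order."""
    keys = list(keys)
    try:
        return sorted(keys)
    except TypeError:  # e.g. mixing int and str keys on Python 3
        return keys


def resolve_column_order(data, py36=PY36):
    """Hypothetical helper: column order of DataFrame(data) with no columns passed."""
    keys = list(data.keys())
    # Sort only on Python < 3.6, and never for an explicit OrderedDict
    if not py36 and not isinstance(data, OrderedDict):
        keys = _try_sort(keys)
    return keys


d = {'b': 1, 'a': 0}
print(resolve_column_order(d, py36=True))                # ['b', 'a']
print(resolve_column_order(d, py36=False))               # ['a', 'b']
print(resolve_column_order(OrderedDict(d), py36=False))  # ['b', 'a']
```

The `py36` parameter exists only so both branches can be exercised on a single, modern interpreter.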

pandas/core/series.py

+7-2
@@ -54,7 +54,7 @@
 from pandas import compat
 from pandas.io.formats.terminal import get_terminal_size
 from pandas.compat import (
-    zip, u, OrderedDict, StringIO, range, get_range_parameters)
+    zip, u, OrderedDict, StringIO, range, get_range_parameters, PY36)
 from pandas.compat.numpy import function as nv
 
 import pandas.core.ops as ops
@@ -130,6 +130,11 @@ class Series(base.IndexOpsMixin, generic.NDFrame):
     ----------
     data : array-like, dict, or scalar value
         Contains data stored in Series
+
+        .. versionchanged :: 0.23.0
+           If data is a dict, insertion order is maintained for Python 3.6
+           and later.
+
     index : array-like or Index (1d)
         Values must be hashable and have the same length as `data`.
         Non-unique index values are allowed. Will default to
@@ -286,7 +291,7 @@ def _init_dict(self, data, index=None, dtype=None):
         # Now we just make sure the order is respected, if any
         if index is not None:
             s = s.reindex(index, copy=False)
-        elif not isinstance(data, OrderedDict):
+        elif not PY36 and not isinstance(data, OrderedDict):
             try:
                 s = s.sort_index()
             except TypeError:
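
On the Series side, the ordering decision comes after the values are collected: an explicit `index` wins through `reindex`, and only otherwise are the labels sorted (Python < 3.6, plain dict), with a `TypeError` meaning "leave them as they are". The sketch below models this with a list of `(label, value)` pairs instead of a Series. `series_from_dict` is a hypothetical name, and filling missing labels with NaN is an assumption that mirrors `reindex`'s default fill value.

```python
import math
import sys
from collections import OrderedDict

PY36 = sys.version_info >= (3, 6)  # assumed equivalent of pandas.compat.PY36


def series_from_dict(data, index=None, py36=PY36):
    """Hypothetical sketch: ordered (label, value) pairs for Series(data, index)."""
    # Values are first taken in the dict's own iteration order
    pairs = list(data.items())
    if index is not None:
        # Like s.reindex(index): follow the passed labels, NaN for missing ones.
        # (A dict lookup is a simplification; pandas reindexes a full Series.)
        pairs = [(label, data.get(label, math.nan)) for label in index]
    elif not py36 and not isinstance(data, OrderedDict):
        try:
            # Like s.sort_index(): order by label
            pairs = sorted(pairs, key=lambda pair: pair[0])
        except TypeError:
            pass  # labels cannot be compared: keep the dict's order
    return pairs


d = {'b': 1, 'a': 0, 'c': 2}
print(series_from_dict(d, py36=True))         # [('b', 1), ('a', 0), ('c', 2)]
print(series_from_dict(d, py36=False))        # [('a', 0), ('b', 1), ('c', 2)]
print(series_from_dict(d, index=['c', 'z']))  # [('c', 2), ('z', nan)]
```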

pandas/core/sparse/frame.py

+9-2
@@ -6,7 +6,7 @@
 # pylint: disable=E1101,E1103,W0231,E0202
 
 import warnings
-from pandas.compat import lmap
+from pandas.compat import lmap, OrderedDict, PY36
 from pandas import compat
 import numpy as np
 
@@ -39,6 +39,10 @@ class SparseDataFrame(DataFrame):
     Parameters
     ----------
     data : same types as can be passed to DataFrame or scipy.sparse.spmatrix
+        .. versionchanged :: 0.23.0
+           If data is a dict, insertion order is maintained for Python 3.6
+           and later.
+
     index : array-like, optional
     column : array-like, optional
     default_kind : {'block', 'integer'}, default 'block'
@@ -138,7 +142,10 @@ def _init_dict(self, data, index, columns, dtype=None):
             columns = _ensure_index(columns)
             data = {k: v for k, v in compat.iteritems(data) if k in columns}
         else:
-            columns = Index(com._try_sort(list(data.keys())))
+            keys = list(data.keys())
+            if not PY36 and not isinstance(data, OrderedDict):
+                keys = com._try_sort(keys)
+            columns = Index(keys)
 
         if index is None:
             index = extract_index(list(data.values()))

pandas/core/sparse/series.py

+4
@@ -42,6 +42,10 @@ class SparseSeries(Series):
     Parameters
     ----------
     data : {array-like, Series, SparseSeries, dict}
+        .. versionchanged :: 0.23.0
+           If data is a dict, insertion order is maintained for Python 3.6
+           and later.
+
     kind : {'block', 'integer'}
     fill_value : float
         Code for missing value. Defaults depends on dtype.

pandas/tests/frame/test_constructors.py

+13-1
@@ -15,7 +15,7 @@
 
 from pandas.core.dtypes.common import is_integer_dtype
 from pandas.compat import (lmap, long, zip, range, lrange, lzip,
-                           OrderedDict, is_platform_little_endian)
+                           OrderedDict, is_platform_little_endian, PY36)
 from pandas import compat
 from pandas import (DataFrame, Index, Series, isna,
                     MultiIndex, Timedelta, Timestamp,
@@ -290,6 +290,18 @@ def test_constructor_dict(self):
         with tm.assert_raises_regex(ValueError, msg):
             DataFrame({'a': 0.7}, columns=['b'])
 
+    def test_constructor_dict_order(self):
+        # GH19018
+        # initialization ordering: by insertion order if Python >= 3.6, else
+        # sorted by key
+        d = {'b': self.ts2, 'a': self.ts1}
+        frame = DataFrame(data=d)
+        if PY36:
+            expected = DataFrame(data=d, columns=list('ba'))
+        else:
+            expected = DataFrame(data=d, columns=list('ab'))
+        tm.assert_frame_equal(frame, expected)
+
     def test_constructor_multi_index(self):
         # GH 4078
         # construction error with mi and all-nan frame

pandas/tests/io/test_excel.py

+4-4
@@ -811,17 +811,17 @@ def test_read_excel_multiindex_empty_level(self):
         _skip_if_no_xlsxwriter()
         with ensure_clean('.xlsx') as path:
             df = DataFrame({
-                ('Zero', ''): {0: 0},
                 ('One', 'x'): {0: 1},
                 ('Two', 'X'): {0: 3},
-                ('Two', 'Y'): {0: 7}
+                ('Two', 'Y'): {0: 7},
+                ('Zero', ''): {0: 0}
             })
 
             expected = DataFrame({
-                ('Zero', 'Unnamed: 3_level_1'): {0: 0},
                 ('One', u'x'): {0: 1},
                 ('Two', u'X'): {0: 3},
-                ('Two', u'Y'): {0: 7}
+                ('Two', u'Y'): {0: 7},
+                ('Zero', 'Unnamed: 3_level_1'): {0: 0}
             })
 
             df.to_excel(path)
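
The test edit reorders the dict literals instead of changing the expected output. Once insertion order is kept, listing the keys already in sorted order gives the frame the same column order on every Python version. A quick check of that property in plain Python, reusing only the column keys of `df` from the test (values dropped); `column_orders` is a hypothetical helper.

```python
# Column keys of `df` in the reordered literal (values dropped)
new_keys = [('One', 'x'), ('Two', 'X'), ('Two', 'Y'), ('Zero', '')]
# Column keys of `df` in the original literal
old_keys = [('Zero', ''), ('One', 'x'), ('Two', 'X'), ('Two', 'Y')]


def column_orders(keys):
    """Return (insertion order, sorted order) for a dict built from `keys`."""
    insertion = list(dict.fromkeys(keys))  # what Python >= 3.6 keeps
    lexical = sorted(keys)                 # what older versions produced
    return insertion, lexical


new_insertion, new_sorted = column_orders(new_keys)
old_insertion, old_sorted = column_orders(old_keys)
print(new_insertion == new_sorted)  # True: same columns on every Python version
print(old_insertion == old_sorted)  # False: ('Zero', '') would move to the front
```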

pandas/tests/io/test_pytables.py

+1-1
@@ -2034,7 +2034,7 @@ def test_table_values_dtypes_roundtrip(self):
                                'bool': 1, 'int16': 1, 'int8': 1,
                                'int64': 1, 'object': 1, 'datetime64[ns]': 2})
             result = result.sort_index()
-            result = expected.sort_index()
+            expected = expected.sort_index()
             tm.assert_series_equal(result, expected)
 
     def test_table_mixed_dtypes(self):
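
The one-line change in test_pytables.py fixes a test that never inspected its result. The old second line overwrote `result` with the sorted `expected`, so `assert_series_equal` only checked that `expected` was already sorted. That held while Series built from dicts were sorted, and stops holding once insertion order is kept. The sketch models this with `OrderedDict` as an order-sensitive stand-in for a Series; `sort_index`, `buggy_check` and `fixed_check` are hypothetical names, not pandas functions.

```python
from collections import OrderedDict


def sort_index(d):
    """Hypothetical stand-in for Series.sort_index(): new OrderedDict sorted by key."""
    return OrderedDict(sorted(d.items()))


def buggy_check(result, expected):
    result = sort_index(result)
    result = sort_index(expected)  # bug: `result` is replaced by sorted `expected`
    return result == expected      # OrderedDict vs OrderedDict: order-sensitive


def fixed_check(result, expected):
    result = sort_index(result)
    expected = sort_index(expected)  # the fix: sort `expected`, keep `result`
    return result == expected


# A wrong result slips through as long as `expected` happens to be sorted ...
wrong = {'bool': 1, 'int64': 5}
print(buggy_check(wrong, OrderedDict([('bool', 1), ('int64', 1)])))  # True
# ... and a correct result fails once `expected` keeps insertion order.
right = {'bool': 1, 'int64': 1}
print(buggy_check(right, OrderedDict([('int64', 1), ('bool', 1)])))  # False
print(fixed_check(right, OrderedDict([('int64', 1), ('bool', 1)])))  # True
```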

pandas/tests/series/test_constructors.py

+13-1
@@ -22,7 +22,7 @@
 from pandas._libs import lib
 from pandas._libs.tslib import iNaT
 
-from pandas.compat import lrange, range, zip, long
+from pandas.compat import lrange, range, zip, long, PY36
 from pandas.util.testing import assert_series_equal
 import pandas.util.testing as tm
 
@@ -783,6 +783,18 @@ def test_constructor_dict(self):
         expected.iloc[1] = 1
         assert_series_equal(result, expected)
 
+    def test_constructor_dict_order(self):
+        # GH19018
+        # initialization ordering: by insertion order if Python >= 3.6, else
+        # sorted by key
+        d = {'b': 1, 'a': 0, 'c': 2}
+        result = Series(d)
+        if PY36:
+            expected = Series([1, 0, 2], index=list('bac'))
+        else:
+            expected = Series([0, 1, 2], index=list('abc'))
+        tm.assert_series_equal(result, expected)
+
     @pytest.mark.parametrize("value", [2, np.nan, None, float('nan')])
     def test_constructor_dict_nan_key(self, value):
         # GH 18480

pandas/tests/sparse/frame/test_frame.py

+12
@@ -139,6 +139,18 @@ def test_constructor(self):
 
         repr(self.frame)
 
+    def test_constructor_dict_order(self):
+        # GH19018
+        # initialization ordering: by insertion order if Python >= 3.6, else
+        # sorted by key
+        d = {'b': [2, 3], 'a': [0, 1]}
+        frame = SparseDataFrame(data=d)
+        if compat.PY36:
+            expected = SparseDataFrame(data=d, columns=list('ba'))
+        else:
+            expected = SparseDataFrame(data=d, columns=list('ab'))
+        tm.assert_sp_frame_equal(frame, expected)
+
     def test_constructor_ndarray(self):
         # no index or columns
         sp = SparseDataFrame(self.frame.values)

pandas/tests/sparse/series/test_series.py

+13-1
@@ -14,7 +14,7 @@
 from pandas.tseries.offsets import BDay
 import pandas.util.testing as tm
 import pandas.util._test_decorators as td
-from pandas.compat import range
+from pandas.compat import range, PY36
 from pandas.core.reshape.util import cartesian_product
 
 import pandas.core.sparse.frame as spf
@@ -114,6 +114,18 @@ def test_constructor_dict_input(self):
         result = SparseSeries(constructor_dict)
         tm.assert_sp_series_equal(result, expected)
 
+    def test_constructor_dict_order(self):
+        # GH19018
+        # initialization ordering: by insertion order if Python >= 3.6, else
+        # sorted by key
+        d = {'b': 1, 'a': 0, 'c': 2}
+        result = SparseSeries(d)
+        if PY36:
+            expected = SparseSeries([1, 0, 2], index=list('bac'))
+        else:
+            expected = SparseSeries([0, 1, 2], index=list('abc'))
+        tm.assert_sp_series_equal(result, expected)
+
     def test_constructor_dtype(self):
         arr = SparseSeries([np.nan, 1, 2, np.nan])
         assert arr.dtype == np.float64
