Commit 222e16c

author
tp
committed
initialization from dicts for py>=3.6 maintains insertion order
1 parent feedf66 commit 222e16c

File tree

12 files changed

+153
-19
lines changed

doc/source/dsintro.rst

+17-3
@@ -81,9 +81,21 @@ index is passed, one will be created having values ``[0, ..., len(data) - 1]``.
8181

8282
**From dict**
8383

84-
If ``data`` is a dict, if **index** is passed the values in data corresponding
85-
to the labels in the index will be pulled out. Otherwise, an index will be
86-
constructed from the sorted keys of the dict, if possible.
84+
When creating a pandas Series from a dict, the Series will be ordered by the
85+
dict's insertion order, if you are using Python 3.6+ and no index has been
86+
supplied.
87+
88+
.. ipython:: python
89+
90+
d = {'b': 1, 'a': 0, 'c': 2}
91+
pd.Series(d)
92+
93+
If you are using a Python version lower than 3.6, and no index is passed, the
94+
series will be sorted by the lexical order of the keys of the dict
95+
(i.e. ['a', 'b', 'c'] in the example above).
96+
97+
If an index is passed, the values in data corresponding to the labels in the
98+
index will be pulled out.
8799

88100
.. ipython:: python
89101
@@ -277,6 +289,8 @@ The row and column labels can be accessed respectively by accessing the
277289
From dict of ndarrays / lists
278290
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
279291

292+
The columns will be ordered by the dict insertion order. If you're using
293+
Python version < 3.6, the columns will instead be ordered lexically (alphabetically).
280294
The ndarrays must all be the same length. If an index is passed, it must
281295
clearly also be the same length as the arrays. If no index is passed, the
282296
result will be ``range(n)``, where ``n`` is the array length.

doc/source/whatsnew/v0.23.0.txt

+54-3
@@ -3,7 +3,7 @@
33
v0.23.0
44
-------
55

6-
This is a major release from 0.21.1 and includes a number of API changes,
6+
This is a major release from 0.22.0 and includes a number of API changes,
77
deprecations, new features, enhancements, and performance improvements along
88
with a large number of bug fixes. We recommend that all users upgrade to this
99
version.
@@ -240,7 +240,7 @@ The :func:`DataFrame.assign` now accepts dependent keyword arguments for python
240240
using ``.assign()`` to update an existing column. Previously, callables
241241
referring to other variables being updated would get the "old" values
242242

243-
Previous Behaviour:
243+
Previous behavior:
244244

245245
.. code-block:: ipython
246246

@@ -253,7 +253,7 @@ The :func:`DataFrame.assign` now accepts dependent keyword arguments for python
253253
1 3 -2
254254
2 4 -3
255255

256-
New Behaviour:
256+
New behavior:
257257

258258
.. ipython:: python
259259

@@ -320,6 +320,57 @@ If installed, we now require:
320320
| openpyxl | 2.4.0 | |
321321
+-----------------+-----------------+----------+
322322

323+
.. _whatsnew_0230.api_breaking.dict_insertion_order:
324+
325+
Creating DataFrames and Series from dicts preserves dict insertion order for Python 3.6+
326+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
327+
328+
Until Python 3.6, dicts in Python had no formally defined ordering. Python
329+
versions 3.6 and later have changed the ordering definition of dicts, so dicts
330+
in these newer versions are ordered by insertion order
331+
(see also `PEP 468 <https://www.python.org/dev/peps/pep-0468/>`_).
332+
From version 0.23, pandas uses insertion order when creating Series or
333+
DataFrames from dicts (:issue:`19018`).
334+
335+
Previous behavior (and current behavior if on Python < 3.6):
336+
337+
.. code-block:: ipython
338+
339+
In [1]: pd.Series({'Income': 2000,
340+
... 'Expenses': -1500,
341+
... 'Taxes': -200,
342+
... 'Net result': 300})
343+
Expenses -1500
344+
Income 2000
345+
Net result 300
346+
Taxes -200
347+
dtype: int64
348+
349+
Note the Series above is ordered alphabetically by the index values.
350+
351+
New behavior (for Python >= 3.6):
352+
353+
.. ipython:: python
354+
355+
pd.Series({'Income': 2000,
356+
'Expenses': -1500,
357+
'Taxes': -200,
358+
'Net result': 300})
359+
360+
Notice that the Series is now ordered by insertion order. This new behavior is
361+
used for all relevant pandas types (``Series``, ``DataFrame``, ``SparseSeries``
362+
and ``SparseDataFrame``).
363+
364+
If you wish to retain the old behavior while using Python >= 3.6, you can use
365+
``sort_index``:
366+
367+
.. ipython:: python
368+
369+
pd.Series({'Income': 2000,
370+
'Expenses': -1500,
371+
'Taxes': -200,
372+
'Net result': 300}).sort_index()
373+
323374
.. _whatsnew_0230.api_breaking.deprecate_panel:
324375

325376
Deprecate Panel
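
The ordering difference described in the whatsnew entry above can be illustrated with a plain dict, without pandas. This is a minimal sketch: it assumes Python 3.7+, where insertion order is a language guarantee (CPython 3.6 behaves the same way as an implementation detail). The variable names are illustrative only and do not appear in the commit.

```python
# Plain-Python illustration of the two orderings; no pandas required.
data = {'Income': 2000,
        'Expenses': -1500,
        'Taxes': -200,
        'Net result': 300}

# Order a dict-constructed Series index would follow on Python >= 3.6
# with pandas >= 0.23: the dict's own insertion order.
insertion_order = list(data)

# Order used previously (and still on Python < 3.6): keys sorted lexically,
# which is also what .sort_index() restores.
legacy_order = sorted(data)

print(insertion_order)
print(legacy_order)
```

The second list matches the index of the "Previous behavior" output shown in the whatsnew entry.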

pandas/core/frame.py

+6-1
@@ -251,6 +251,11 @@ class DataFrame(NDFrame):
251251
----------
252252
data : numpy ndarray (structured or homogeneous), dict, or DataFrame
253253
Dict can contain Series, arrays, constants, or list-like objects
254+
255+
.. versionchanged :: 0.23.0
256+
If data is a dict, argument order is maintained for Python 3.6
257+
and later.
258+
254259
index : Index or array-like
255260
Index to use for resulting frame. Will default to RangeIndex if
256261
no indexing information part of input data and no index provided
@@ -460,7 +465,7 @@ def _init_dict(self, data, index, columns, dtype=None):
460465

461466
else:
462467
keys = list(data.keys())
463-
if not isinstance(data, OrderedDict):
468+
if not PY36 and not isinstance(data, OrderedDict):
464469
keys = com._try_sort(keys)
465470
columns = data_names = Index(keys)
466471
arrays = [data[k] for k in keys]
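
The patched condition in `_init_dict` can be sketched in isolation as follows. This is a hedged approximation, not the pandas source: `try_sort` and `column_order` are hypothetical names, `try_sort` is assumed to behave like `com._try_sort` (sort if possible, otherwise keep the given order), and `py36` is passed as an argument instead of being read from `pandas.compat`.

```python
from collections import OrderedDict


def try_sort(keys):
    """Return the keys sorted, or unchanged if they cannot be compared.

    Assumed stand-in for com._try_sort; not the actual pandas helper.
    """
    try:
        return sorted(keys)
    except TypeError:
        return list(keys)


def column_order(data, py36):
    """Hypothetical helper mirroring the patched key-ordering condition."""
    keys = list(data.keys())
    # Only sort on older Pythons, and never for an explicit OrderedDict.
    if not py36 and not isinstance(data, OrderedDict):
        keys = try_sort(keys)
    return keys


plain = {'b': [2, 3], 'a': [0, 1]}
ordered = OrderedDict([('b', [2, 3]), ('a', [0, 1])])

print(column_order(plain, py36=True))     # insertion order kept
print(column_order(plain, py36=False))    # legacy: sorted keys
print(column_order(ordered, py36=False))  # OrderedDict is never re-sorted
```

Passing the version flag explicitly keeps the sketch testable on any interpreter; the commit itself relies on the module-level `PY36` constant.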

pandas/core/series.py

+7-2
@@ -54,7 +54,7 @@
5454
from pandas import compat
5555
from pandas.io.formats.terminal import get_terminal_size
5656
from pandas.compat import (
57-
zip, u, OrderedDict, StringIO, range, get_range_parameters)
57+
zip, u, OrderedDict, StringIO, range, get_range_parameters, PY36)
5858
from pandas.compat.numpy import function as nv
5959

6060
import pandas.core.ops as ops
@@ -130,6 +130,11 @@ class Series(base.IndexOpsMixin, generic.NDFrame):
130130
----------
131131
data : array-like, dict, or scalar value
132132
Contains data stored in Series
133+
134+
.. versionchanged :: 0.23.0
135+
If data is a dict, argument order is maintained for Python 3.6
136+
and later.
137+
133138
index : array-like or Index (1d)
134139
Values must be hashable and have the same length as `data`.
135140
Non-unique index values are allowed. Will default to
@@ -286,7 +291,7 @@ def _init_dict(self, data, index=None, dtype=None):
286291
# Now we just make sure the order is respected, if any
287292
if index is not None:
288293
s = s.reindex(index, copy=False)
289-
elif not isinstance(data, OrderedDict):
294+
elif not PY36 and not isinstance(data, OrderedDict):
290295
try:
291296
s = s.sort_index()
292297
except TypeError:
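
The three ordering paths in the Series hunk above (explicit index, legacy sort, insertion order) can be approximated with plain Python. This is a sketch under stated assumptions: `series_order` is a hypothetical function, `None` stands in for the missing-value marker that a real reindex would produce, and the fallback after `TypeError` is assumed to keep the unsorted order because the hunk is cut off at that point.

```python
from collections import OrderedDict


def series_order(data, index=None, py36=True):
    """Hypothetical sketch of how a dict's items end up ordered."""
    if index is not None:
        # An explicit index wins: pull out values label by label.
        # None is a placeholder for labels missing from the dict.
        return [(label, data.get(label)) for label in index]
    items = list(data.items())
    if not py36 and not isinstance(data, OrderedDict):
        try:
            items = sorted(items, key=lambda kv: kv[0])
        except TypeError:
            # Assumption: keys that cannot be compared are left as given.
            pass
    return items


d = {'b': 1, 'a': 0, 'c': 2}
print(series_order(d))                    # insertion order
print(series_order(d, py36=False))        # sorted by key
print(series_order(d, index=['c', 'a']))  # order taken from the index
```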

pandas/core/sparse/frame.py

+9-2
@@ -6,7 +6,7 @@
66
# pylint: disable=E1101,E1103,W0231,E0202
77

88
import warnings
9-
from pandas.compat import lmap
9+
from pandas.compat import lmap, OrderedDict, PY36
1010
from pandas import compat
1111
import numpy as np
1212

@@ -39,6 +39,10 @@ class SparseDataFrame(DataFrame):
3939
Parameters
4040
----------
4141
data : same types as can be passed to DataFrame or scipy.sparse.spmatrix
42+
.. versionchanged :: 0.23.0
43+
If data is a dict, argument order is maintained for Python 3.6
44+
and later.
45+
4246
index : array-like, optional
4347
column : array-like, optional
4448
default_kind : {'block', 'integer'}, default 'block'
@@ -138,7 +142,10 @@ def _init_dict(self, data, index, columns, dtype=None):
138142
columns = _ensure_index(columns)
139143
data = {k: v for k, v in compat.iteritems(data) if k in columns}
140144
else:
141-
columns = Index(com._try_sort(list(data.keys())))
145+
keys = list(data.keys())
146+
if not PY36 and not isinstance(data, OrderedDict):
147+
keys = com._try_sort(keys)
148+
columns = Index(keys)
142149

143150
if index is None:
144151
index = extract_index(list(data.values()))

pandas/core/sparse/series.py

+4
@@ -42,6 +42,10 @@ class SparseSeries(Series):
4242
Parameters
4343
----------
4444
data : {array-like, Series, SparseSeries, dict}
45+
.. versionchanged :: 0.23.0
46+
If data is a dict, argument order is maintained for Python 3.6
47+
and later.
48+
4549
kind : {'block', 'integer'}
4650
fill_value : float
4751
Code for missing value. Defaults depends on dtype.

pandas/tests/frame/test_constructors.py

+13-1
@@ -15,7 +15,7 @@
1515

1616
from pandas.core.dtypes.common import is_integer_dtype
1717
from pandas.compat import (lmap, long, zip, range, lrange, lzip,
18-
OrderedDict, is_platform_little_endian)
18+
OrderedDict, is_platform_little_endian, PY36)
1919
from pandas import compat
2020
from pandas import (DataFrame, Index, Series, isna,
2121
MultiIndex, Timedelta, Timestamp,
@@ -290,6 +290,18 @@ def test_constructor_dict(self):
290290
with tm.assert_raises_regex(ValueError, msg):
291291
DataFrame({'a': 0.7}, columns=['b'])
292292

293+
def test_constructor_dict_order(self):
294+
# GH19018
295+
# initialization ordering: by insertion order if python>= 3.6, else
296+
# order by key
297+
d = {'b': self.ts2, 'a': self.ts1}
298+
frame = DataFrame(data=d)
299+
if PY36:
300+
expected = DataFrame(data=d, columns=list('ba'))
301+
else:
302+
expected = DataFrame(data=d, columns=list('ab'))
303+
tm.assert_frame_equal(frame, expected)
304+
293305
def test_constructor_multi_index(self):
294306
# GH 4078
295307
# construction error with mi and all-nan frame

pandas/tests/io/test_excel.py

+4-4
@@ -811,17 +811,17 @@ def test_read_excel_multiindex_empty_level(self):
811811
_skip_if_no_xlsxwriter()
812812
with ensure_clean('.xlsx') as path:
813813
df = DataFrame({
814-
('Zero', ''): {0: 0},
815814
('One', 'x'): {0: 1},
816815
('Two', 'X'): {0: 3},
817-
('Two', 'Y'): {0: 7}
816+
('Two', 'Y'): {0: 7},
817+
('Zero', ''): {0: 0}
818818
})
819819

820820
expected = DataFrame({
821-
('Zero', 'Unnamed: 3_level_1'): {0: 0},
822821
('One', u'x'): {0: 1},
823822
('Two', u'X'): {0: 3},
824-
('Two', u'Y'): {0: 7}
823+
('Two', u'Y'): {0: 7},
824+
('Zero', 'Unnamed: 3_level_1'): {0: 0}
825825
})
826826

827827
df.to_excel(path)

pandas/tests/io/test_pytables.py

+1-1
@@ -2034,7 +2034,7 @@ def test_table_values_dtypes_roundtrip(self):
20342034
'bool': 1, 'int16': 1, 'int8': 1,
20352035
'int64': 1, 'object': 1, 'datetime64[ns]': 2})
20362036
result = result.sort_index()
2037-
result = expected.sort_index()
2037+
expected = expected.sort_index()
20382038
tm.assert_series_equal(result, expected)
20392039

20402040
def test_table_mixed_dtypes(self):

pandas/tests/series/test_constructors.py

+13-1
@@ -22,7 +22,7 @@
2222
from pandas._libs import lib
2323
from pandas._libs.tslib import iNaT
2424

25-
from pandas.compat import lrange, range, zip, long
25+
from pandas.compat import lrange, range, zip, long, PY36
2626
from pandas.util.testing import assert_series_equal
2727
import pandas.util.testing as tm
2828

@@ -783,6 +783,18 @@ def test_constructor_dict(self):
783783
expected.iloc[1] = 1
784784
assert_series_equal(result, expected)
785785

786+
def test_constructor_dict_order(self):
787+
# GH19018
788+
# initialization ordering: by insertion order if python>= 3.6, else
789+
# order by key
790+
d = {'b': 1, 'a': 0, 'c': 2}
791+
result = Series(d)
792+
if PY36:
793+
expected = Series([1, 0, 2], index=list('bac'))
794+
else:
795+
expected = Series([0, 1, 2], index=list('abc'))
796+
tm.assert_series_equal(result, expected)
797+
786798
@pytest.mark.parametrize("value", [2, np.nan, None, float('nan')])
787799
def test_constructor_dict_nan_key(self, value):
788800
# GH 18480

pandas/tests/sparse/frame/test_frame.py

+12
@@ -139,6 +139,18 @@ def test_constructor(self):
139139

140140
repr(self.frame)
141141

142+
def test_constructor_dict_order(self):
143+
# GH19018
144+
# initialization ordering: by insertion order if python>= 3.6, else
145+
# order by key
146+
d = {'b': [2, 3], 'a': [0, 1]}
147+
frame = SparseDataFrame(data=d)
148+
if compat.PY36:
149+
expected = SparseDataFrame(data=d, columns=list('ba'))
150+
else:
151+
expected = SparseDataFrame(data=d, columns=list('ab'))
152+
tm.assert_sp_frame_equal(frame, expected)
153+
142154
def test_constructor_ndarray(self):
143155
# no index or columns
144156
sp = SparseDataFrame(self.frame.values)

pandas/tests/sparse/series/test_series.py

+13-1
@@ -14,7 +14,7 @@
1414
from pandas.tseries.offsets import BDay
1515
import pandas.util.testing as tm
1616
import pandas.util._test_decorators as td
17-
from pandas.compat import range
17+
from pandas.compat import range, PY36
1818
from pandas.core.reshape.util import cartesian_product
1919

2020
import pandas.core.sparse.frame as spf
@@ -114,6 +114,18 @@ def test_constructor_dict_input(self):
114114
result = SparseSeries(constructor_dict)
115115
tm.assert_sp_series_equal(result, expected)
116116

117+
def test_constructor_dict_order(self):
118+
# GH19018
119+
# initialization ordering: by insertion order if python>= 3.6, else
120+
# order by key
121+
d = {'b': 1, 'a': 0, 'c': 2}
122+
result = SparseSeries(d)
123+
if PY36:
124+
expected = SparseSeries([1, 0, 2], index=list('bac'))
125+
else:
126+
expected = SparseSeries([0, 1, 2], index=list('abc'))
127+
tm.assert_sp_series_equal(result, expected)
128+
117129
def test_constructor_dtype(self):
118130
arr = SparseSeries([np.nan, 1, 2, np.nan])
119131
assert arr.dtype == np.float64
