Commit 4715ef6

Merge remote-tracking branch 'upstream/master' into ea-where
2 parents edff47e + b78aa8d commit 4715ef6

106 files changed, with 2623 additions and 2267 deletions

LICENSES/MUSL_LICENSE

+132
@@ -0,0 +1,132 @@
+musl as a whole is licensed under the following standard MIT license:
+
+----------------------------------------------------------------------
+Copyright © 2005-2014 Rich Felker, et al.
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+----------------------------------------------------------------------
+
+Authors/contributors include:
+
+Anthony G. Basile
+Arvid Picciani
+Bobby Bingham
+Boris Brezillon
+Brent Cook
+Chris Spiegel
+Clément Vasseur
+Emil Renner Berthing
+Hiltjo Posthuma
+Isaac Dunham
+Jens Gustedt
+Jeremy Huntwork
+John Spencer
+Justin Cormack
+Luca Barbato
+Luka Perkov
+M Farkas-Dyck (Strake)
+Michael Forney
+Nicholas J. Kain
+orc
+Pascal Cuoq
+Pierre Carrier
+Rich Felker
+Richard Pennington
+sin
+Solar Designer
+Stefan Kristiansson
+Szabolcs Nagy
+Timo Teräs
+Valentin Ochs
+William Haddon
+
+Portions of this software are derived from third-party works licensed
+under terms compatible with the above MIT license:
+
+The TRE regular expression implementation (src/regex/reg* and
+src/regex/tre*) is Copyright © 2001-2008 Ville Laurikari and licensed
+under a 2-clause BSD license (license text in the source files). The
+included version has been heavily modified by Rich Felker in 2012, in
+the interests of size, simplicity, and namespace cleanliness.
+
+Much of the math library code (src/math/* and src/complex/*) is
+Copyright © 1993,2004 Sun Microsystems or
+Copyright © 2003-2011 David Schultz or
+Copyright © 2003-2009 Steven G. Kargl or
+Copyright © 2003-2009 Bruce D. Evans or
+Copyright © 2008 Stephen L. Moshier
+and labelled as such in comments in the individual source files. All
+have been licensed under extremely permissive terms.
+
+The ARM memcpy code (src/string/armel/memcpy.s) is Copyright © 2008
+The Android Open Source Project and is licensed under a two-clause BSD
+license. It was taken from Bionic libc, used on Android.
+
+The implementation of DES for crypt (src/misc/crypt_des.c) is
+Copyright © 1994 David Burren. It is licensed under a BSD license.
+
+The implementation of blowfish crypt (src/misc/crypt_blowfish.c) was
+originally written by Solar Designer and placed into the public
+domain. The code also comes with a fallback permissive license for use
+in jurisdictions that may not recognize the public domain.
+
+The smoothsort implementation (src/stdlib/qsort.c) is Copyright © 2011
+Valentin Ochs and is licensed under an MIT-style license.
+
+The BSD PRNG implementation (src/prng/random.c) and XSI search API
+(src/search/*.c) functions are Copyright © 2011 Szabolcs Nagy and
+licensed under following terms: "Permission to use, copy, modify,
+and/or distribute this code for any purpose with or without fee is
+hereby granted. There is no warranty."
+
+The x86_64 port was written by Nicholas J. Kain. Several files (crt)
+were released into the public domain; others are licensed under the
+standard MIT license terms at the top of this file. See individual
+files for their copyright status.
+
+The mips and microblaze ports were originally written by Richard
+Pennington for use in the ellcc project. The original code was adapted
+by Rich Felker for build system and code conventions during upstream
+integration. It is licensed under the standard MIT terms.
+
+The powerpc port was also originally written by Richard Pennington,
+and later supplemented and integrated by John Spencer. It is licensed
+under the standard MIT terms.
+
+All other files which have no copyright comments are original works
+produced specifically for use as part of this library, written either
+by Rich Felker, the main author of the library, or by one or more
+contibutors listed above. Details on authorship of individual files
+can be found in the git version control history of the project. The
+omission of copyright and license comments in each file is in the
+interest of source tree size.
+
+All public header files (include/* and arch/*/bits/*) should be
+treated as Public Domain as they intentionally contain no content
+which can be covered by copyright. Some source modules may fall in
+this category as well. If you believe that a file is so trivial that
+it should be in the Public Domain, please contact the authors and
+request an explicit statement releasing it from copyright.
+
+The following files are trivial, believed not to be copyrightable in
+the first place, and hereby explicitly released to the Public Domain:
+
+All public headers: include/*, arch/*/bits/*
+Startup files: crt/*

asv_bench/benchmarks/groupby.py

+2 -2
@@ -473,8 +473,8 @@ def setup(self):
         n1 = 400
         n2 = 250
         index = MultiIndex(levels=[np.arange(n1), tm.makeStringIndex(n2)],
-                           labels=[np.repeat(range(n1), n2).tolist(),
-                                   list(range(n2)) * n1],
+                           codes=[np.repeat(range(n1), n2).tolist(),
+                                  list(range(n2)) * n1],
                            names=['lev1', 'lev2'])
         arr = np.random.randn(n1 * n2, 3)
         arr[::10000, 0] = np.nan

asv_bench/benchmarks/join_merge.py

+5 -5
@@ -115,16 +115,16 @@ class Join(object):
     def setup(self, sort):
         level1 = tm.makeStringIndex(10).values
         level2 = tm.makeStringIndex(1000).values
-        label1 = np.arange(10).repeat(1000)
-        label2 = np.tile(np.arange(1000), 10)
+        codes1 = np.arange(10).repeat(1000)
+        codes2 = np.tile(np.arange(1000), 10)
         index2 = MultiIndex(levels=[level1, level2],
-                            labels=[label1, label2])
+                            codes=[codes1, codes2])
         self.df_multi = DataFrame(np.random.randn(len(index2), 4),
                                   index=index2,
                                   columns=['A', 'B', 'C', 'D'])

-        self.key1 = np.tile(level1.take(label1), 10)
-        self.key2 = np.tile(level2.take(label2), 10)
+        self.key1 = np.tile(level1.take(codes1), 10)
+        self.key2 = np.tile(level2.take(codes2), 10)
         self.df = DataFrame({'data1': np.random.randn(100000),
                              'data2': np.random.randn(100000),
                              'key1': self.key1,

asv_bench/benchmarks/multiindex_object.py

+2 -2
@@ -79,8 +79,8 @@ def setup(self):
         levels = [np.arange(n),
                   tm.makeStringIndex(n).values,
                   1000 + np.arange(n)]
-        labels = [np.random.choice(n, (k * n)) for lev in levels]
-        self.mi = MultiIndex(levels=levels, labels=labels)
+        codes = [np.random.choice(n, (k * n)) for lev in levels]
+        self.mi = MultiIndex(levels=levels, codes=codes)

     def time_duplicated(self):
         self.mi.duplicated()

asv_bench/benchmarks/reindex.py

+3 -3
@@ -71,9 +71,9 @@ class LevelAlign(object):
     def setup(self):
         self.index = MultiIndex(
             levels=[np.arange(10), np.arange(100), np.arange(100)],
-            labels=[np.arange(10).repeat(10000),
-                    np.tile(np.arange(100).repeat(100), 10),
-                    np.tile(np.tile(np.arange(100), 100), 10)])
+            codes=[np.arange(10).repeat(10000),
+                   np.tile(np.arange(100).repeat(100), 10),
+                   np.tile(np.tile(np.arange(100), 100), 10)])
         self.df = DataFrame(np.random.randn(len(self.index), 4),
                             index=self.index)
         self.df_level = DataFrame(np.random.randn(100, 4),

asv_bench/benchmarks/stat_ops.py

+8 -8
@@ -31,10 +31,10 @@ class FrameMultiIndexOps(object):

     def setup(self, level, op):
         levels = [np.arange(10), np.arange(100), np.arange(100)]
-        labels = [np.arange(10).repeat(10000),
-                  np.tile(np.arange(100).repeat(100), 10),
-                  np.tile(np.tile(np.arange(100), 100), 10)]
-        index = pd.MultiIndex(levels=levels, labels=labels)
+        codes = [np.arange(10).repeat(10000),
+                 np.tile(np.arange(100).repeat(100), 10),
+                 np.tile(np.tile(np.arange(100), 100), 10)]
+        index = pd.MultiIndex(levels=levels, codes=codes)
         df = pd.DataFrame(np.random.randn(len(index), 4), index=index)
         self.df_func = getattr(df, op)

@@ -67,10 +67,10 @@ class SeriesMultiIndexOps(object):

     def setup(self, level, op):
         levels = [np.arange(10), np.arange(100), np.arange(100)]
-        labels = [np.arange(10).repeat(10000),
-                  np.tile(np.arange(100).repeat(100), 10),
-                  np.tile(np.tile(np.arange(100), 100), 10)]
-        index = pd.MultiIndex(levels=levels, labels=labels)
+        codes = [np.arange(10).repeat(10000),
+                 np.tile(np.arange(100).repeat(100), 10),
+                 np.tile(np.tile(np.arange(100), 100), 10)]
+        index = pd.MultiIndex(levels=levels, codes=codes)
         s = pd.Series(np.random.randn(len(index)), index=index)
         self.s_func = getattr(s, op)


doc/source/advanced.rst

+6 -1
@@ -49,6 +49,11 @@ analysis.

 See the :ref:`cookbook<cookbook.multi_index>` for some advanced strategies.

+.. versionchanged:: 0.24.0
+
+   :attr:`MultiIndex.labels` has been renamed to :attr:`MultiIndex.codes`
+   and :attr:`MultiIndex.set_labels` to :attr:`MultiIndex.set_codes`.
+
 Creating a MultiIndex (hierarchical index) object
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -469,7 +474,7 @@ values across a level. For instance:
 .. ipython:: python

    midx = pd.MultiIndex(levels=[['zero', 'one'], ['x', 'y']],
-                        labels=[[1, 1, 0, 0], [1, 0, 1, 0]])
+                        codes=[[1, 1, 0, 0], [1, 0, 1, 0]])
    df = pd.DataFrame(np.random.randn(4, 2), index=midx)
    df
    df2 = df.mean(level=0)

doc/source/api.rst

+2 -2
@@ -1712,7 +1712,7 @@ MultiIndex Attributes

    MultiIndex.names
    MultiIndex.levels
-   MultiIndex.labels
+   MultiIndex.codes
    MultiIndex.nlevels
    MultiIndex.levshape

@@ -1723,7 +1723,7 @@ MultiIndex Components
    :toctree: generated/

    MultiIndex.set_levels
-   MultiIndex.set_labels
+   MultiIndex.set_codes
    MultiIndex.to_hierarchical
    MultiIndex.to_flat_index
    MultiIndex.to_frame

doc/source/dsintro.rst

+1 -1
@@ -961,7 +961,7 @@ From DataFrame using ``to_panel`` method
 .. ipython:: python
    :okwarning:

-   midx = pd.MultiIndex(levels=[['one', 'two'], ['x','y']], labels=[[1,1,0,0],[1,0,1,0]])
+   midx = pd.MultiIndex(levels=[['one', 'two'], ['x','y']], codes=[[1,1,0,0],[1,0,1,0]])
    df = pd.DataFrame({'A' : [1, 2, 3, 4], 'B': [5, 6, 7, 8]}, index=midx)
    df.to_panel()

doc/source/indexing.rst

+3 -3
@@ -1571,9 +1571,9 @@ Setting metadata

 Indexes are "mostly immutable", but it is possible to set and change their
 metadata, like the index ``name`` (or, for ``MultiIndex``, ``levels`` and
-``labels``).
+``codes``).

-You can use the ``rename``, ``set_names``, ``set_levels``, and ``set_labels``
+You can use the ``rename``, ``set_names``, ``set_levels``, and ``set_codes``
 to set these attributes directly. They default to returning a copy; however,
 you can specify ``inplace=True`` to have the data change in place.

@@ -1588,7 +1588,7 @@ See :ref:`Advanced Indexing <advanced>` for usage of MultiIndexes.
    ind.name = "bob"
    ind

-``set_names``, ``set_levels``, and ``set_labels`` also take an optional
+``set_names``, ``set_levels``, and ``set_codes`` also take an optional
 `level`` argument

 .. ipython:: python
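
As a hedged illustration of the renamed setter in the hunk above, the sketch below assumes pandas 0.24 or later, where ``set_codes`` replaces ``set_labels``. The index contents and variable names are made up for the example and do not come from the commit.

```python
import pandas as pd

# Illustrative index; the level values and names are assumptions for this sketch.
mi = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["first", "second"])

# set_codes returns a new MultiIndex by default; `level` restricts the change
# to a single level, addressed here by its name.
swapped = mi.set_codes([1, 1, 0, 0], level="first")

print(list(mi))
print(list(swapped))
```

Only the integer codes of the ``first`` level are replaced here; the level values themselves and the ``second`` level are left as they were, and the original ``mi`` is not modified.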

doc/source/internals.rst

+5 -5
@@ -74,23 +74,23 @@ MultiIndex
 ~~~~~~~~~~

 Internally, the ``MultiIndex`` consists of a few things: the **levels**, the
-integer **labels**, and the level **names**:
+integer **codes** (until version 0.24 named *labels*), and the level **names**:

 .. ipython:: python

    index = pd.MultiIndex.from_product([range(3), ['one', 'two']],
                                       names=['first', 'second'])
    index
    index.levels
-   index.labels
+   index.codes
    index.names

-You can probably guess that the labels determine which unique element is
+You can probably guess that the codes determine which unique element is
 identified with that location at each layer of the index. It's important to
-note that sortedness is determined **solely** from the integer labels and does
+note that sortedness is determined **solely** from the integer codes and does
 not check (or care) whether the levels themselves are sorted. Fortunately, the
 constructors ``from_tuples`` and ``from_arrays`` ensure that this is true, but
-if you compute the levels and labels yourself, please be careful.
+if you compute the levels and codes yourself, please be careful.

 Values
 ~~~~~~
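
To make the levels/codes relationship described in the hunk above concrete, here is a pure-Python model rather than pandas itself. The helper names ``decode`` and ``codes_are_lexsorted`` are hypothetical, and the real ``MultiIndex`` implementation differs; the sketch only shows that a sortedness check of this kind looks at the integer codes and never at the level values.

```python
def decode(levels, codes):
    """Rebuild the index tuples: position i at layer k is levels[k][codes[k][i]]."""
    return [tuple(lev[c] for lev, c in zip(levels, row)) for row in zip(*codes)]


def codes_are_lexsorted(codes):
    """Check lexicographic order using only the integer codes."""
    rows = list(zip(*codes))
    return all(a <= b for a, b in zip(rows, rows[1:]))


# The first level is deliberately not in alphabetical order.
levels = [["zero", "one"], ["x", "y"]]
sorted_codes = [[0, 0, 1, 1], [0, 1, 0, 1]]
unsorted_codes = [[1, 1, 0, 0], [1, 0, 1, 0]]

print(decode(levels, sorted_codes))
print(codes_are_lexsorted(sorted_codes), codes_are_lexsorted(unsorted_codes))
```

With ``sorted_codes`` the decoded values start with ``'zero'`` and end with ``'one'``, which is not alphabetical, yet the code-based check still reports the index as sorted. That is the situation the documentation warns about when levels and codes are computed by hand.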

doc/source/io.rst

+2 -2
@@ -3728,8 +3728,8 @@ storing/selecting from homogeneous index ``DataFrames``.

    index = pd.MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
                                  ['one', 'two', 'three']],
-                         labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
-                                 [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
+                         codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
+                                [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
                          names=['foo', 'bar'])
    df_mi = pd.DataFrame(np.random.randn(10, 3), index=index,
                         columns=['A', 'B', 'C'])

doc/source/whatsnew/v0.24.0.rst

+9
@@ -1101,6 +1101,13 @@ Other API Changes
 Deprecations
 ~~~~~~~~~~~~

+- :attr:`MultiIndex.labels` has been deprecated and replaced by :attr:`MultiIndex.codes`.
+  The functionality is unchanged. The new name better reflects the natures of
+  these codes and makes the ``MultiIndex`` API more similar to the API for :class:`CategoricalIndex`(:issue:`13443`).
+  As a consequence, other uses of the name ``labels`` in ``MultiIndex`` have also been deprecated and replaced with ``codes``:
+    - You should initialize a ``MultiIndex`` instance using a parameter named ``codes`` rather than ``labels``.
+    - ``MultiIndex.set_labels`` has been deprecated in favor of :meth:`MultiIndex.set_codes`.
+    - For method :meth:`MultiIndex.copy`, the ``labels`` parameter has been deprecated and replaced by a ``codes`` parameter.
 - :meth:`DataFrame.to_stata`, :meth:`read_stata`, :class:`StataReader` and :class:`StataWriter` have deprecated the ``encoding`` argument. The encoding of a Stata dta file is determined by the file type and cannot be changed (:issue:`21244`)
 - :meth:`MultiIndex.to_hierarchical` is deprecated and will be removed in a future version (:issue:`21613`)
 - :meth:`Series.ptp` is deprecated. Use ``numpy.ptp`` instead (:issue:`21614`)
@@ -1236,6 +1243,7 @@ Performance Improvements
 - Improved performance of :func:`pd.concat` for `Series` objects (:issue:`23404`)
 - Improved performance of :meth:`DatetimeIndex.normalize` and :meth:`Timestamp.normalize` for timezone naive or UTC datetimes (:issue:`23634`)
 - Improved performance of :meth:`DatetimeIndex.tz_localize` and various ``DatetimeIndex`` attributes with dateutil UTC timezone (:issue:`23772`)
+- Fixed a performance regression on Windows with Python 3.7 of :func:`pd.read_csv` (:issue:`23516`)
 - Improved performance of :class:`Categorical` constructor for `Series` objects (:issue:`23814`)
 - Improved performance of :meth:`~DataFrame.where` for Categorical data (:issue:`24077`)

@@ -1549,6 +1557,7 @@ Reshaping
 - Bug in :meth:`DataFrame.append` with a :class:`Series` with a dateutil timezone would raise a ``TypeError`` (:issue:`23682`)
 - Bug in ``Series`` construction when passing no data and ``dtype=str`` (:issue:`22477`)
 - Bug in :func:`cut` with ``bins`` as an overlapping ``IntervalIndex`` where multiple bins were returned per item instead of raising a ``ValueError`` (:issue:`23980`)
+- Bug in :func:`pandas.concat` when joining ``Series`` datetimetz with ``Series`` category would lose timezone (:issue:`23816`)
 - Bug in :meth:`DataFrame.join` when joining on partial MultiIndex would drop names (:issue:`20452`).

 .. _whatsnew_0240.bug_fixes.sparse:
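
For code that has to run on both sides of the ``labels`` to ``codes`` deprecation noted above, a small shim along these lines may help. ``get_codes`` is a hypothetical helper, not part of pandas; it relies only on what the release note states, namely that newer versions expose ``codes`` where older ones exposed ``labels``. Stand-in objects are used instead of real indexes so the sketch runs without pandas installed.

```python
from types import SimpleNamespace


def get_codes(index):
    """Return the integer codes of a MultiIndex-like object.

    Prefers the new ``codes`` attribute and falls back to the pre-0.24
    ``labels`` attribute. Hypothetical helper, not a pandas API.
    """
    try:
        return index.codes
    except AttributeError:
        return index.labels


# Stand-ins for an index from a newer and an older pandas release.
new_style = SimpleNamespace(codes=[[1, 1, 0, 0], [1, 0, 1, 0]])
old_style = SimpleNamespace(labels=[[1, 1, 0, 0], [1, 0, 1, 0]])

print(get_codes(new_style) == get_codes(old_style))
```

Trying ``codes`` first means the deprecated attribute is only touched on versions that have nothing else to offer.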

pandas/_libs/src/headers/portable.h

+6
@@ -5,4 +5,10 @@
 #define strcasecmp( s1, s2 ) _stricmp( s1, s2 )
 #endif

+// GH-23516 - works around locale perf issues
+// from MUSL libc, MIT Licensed - see LICENSES
+#define isdigit_ascii(c) ((unsigned)c - '0' < 10)
+#define isspace_ascii(c) (c == ' ' || (unsigned)c-'\t' < 5)
+#define toupper_ascii(c) (((unsigned)c-'a' < 26) ? (c & 0x5f) : c)
+
 #endif
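
The three macros above rely on unsigned wrap-around: subtracting the lower bound turns any character below the range into a very large unsigned value, so a single comparison covers both bounds. The Python sketch below models that on integer code points, emulating ``unsigned`` arithmetic with a 32-bit mask (the width is an assumption). The function names mirror the macros but are illustrative only and are not part of pandas.

```python
MASK = 0xFFFFFFFF  # emulate 32-bit unsigned arithmetic (assumed width)


def isdigit_ascii(c):
    # (unsigned)c - '0' < 10: values below '0' wrap to huge numbers and fail.
    return ((c - ord("0")) & MASK) < 10


def isspace_ascii(c):
    # Space, or one of the five consecutive controls \t \n \v \f \r (9..13).
    return c == ord(" ") or ((c - ord("\t")) & MASK) < 5


def toupper_ascii(c):
    # Lowercase and uppercase ASCII letters differ only in bit 0x20;
    # masking a lowercase letter with 0x5f clears that bit.
    return c & 0x5F if ((c - ord("a")) & MASK) < 26 else c


print(isdigit_ascii(ord("7")), isdigit_ascii(ord("/")))
print(isspace_ascii(ord("\n")), isspace_ascii(ord("x")))
print(chr(toupper_ascii(ord("q"))), chr(toupper_ascii(ord("Q"))))
```

Because these checks never consult the current locale, they avoid the locale lookups that the comment in the header cites as the performance problem.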
