Commit a41b502

Merge remote-tracking branch 'upstream/master' into bisect
2 parents 82f557e + 5abc06f commit a41b502

187 files changed: +2105 -1303 lines changed

asv_bench/benchmarks/groupby.py

+1 -1

@@ -486,7 +486,7 @@ def setup(self):
         tmp2 = (np.random.random(10000) * 10.0).astype(np.float32)
         tmp = np.concatenate((tmp1, tmp2))
         arr = np.repeat(tmp, 10)
-        self.df = DataFrame(dict(a=arr, b=arr))
+        self.df = DataFrame({"a": arr, "b": arr})
 
     def time_sum(self):
         self.df.groupby(["a"])["b"].sum()

asv_bench/benchmarks/join_merge.py

+6
@@ -132,6 +132,9 @@ def time_join_dataframe_index_single_key_small(self, sort):
     def time_join_dataframe_index_shuffle_key_bigger_sort(self, sort):
         self.df_shuf.join(self.df_key2, on="key2", sort=sort)
 
+    def time_join_dataframes_cross(self, sort):
+        self.df.loc[:2000].join(self.df_key1, how="cross", sort=sort)
+
 
 class JoinIndex:
     def setup(self):
@@ -205,6 +208,9 @@ def time_merge_dataframe_integer_2key(self, sort):
     def time_merge_dataframe_integer_key(self, sort):
         merge(self.df, self.df2, on="key1", sort=sort)
 
+    def time_merge_dataframes_cross(self, sort):
+        merge(self.left.loc[:2000], self.right.loc[:2000], how="cross", sort=sort)
+
 
 class I8Merge:
 
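
Note on the two added benchmarks: both pass `how="cross"`, which pairs every row of the left frame with every row of the right frame. The following is a minimal sketch of that behavior, assuming a pandas version that accepts `how="cross"`; the frames and column names are illustrative and are not taken from the benchmark's `setup`.

```python
import pandas as pd

# Illustrative frames (hypothetical; not the benchmark's self.df / self.df_key1).
left = pd.DataFrame({"letter": ["a", "b", "c"]})
right = pd.DataFrame({"number": [1, 2]})

# A cross join takes no key: the result is the Cartesian product of the rows.
crossed = pd.merge(left, right, how="cross")

# DataFrame.join accepts the same option.
joined = left.join(right, how="cross")

print(crossed.shape)
```

The result has `len(left) * len(right)` rows, which is presumably why the benchmarks restrict their inputs with `.loc[:2000]`.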

doc/source/development/contributing.rst

+34 -7

@@ -146,7 +146,7 @@ Creating a development environment
 ----------------------------------
 
 To test out code changes, you'll need to build pandas from source, which
-requires a C compiler and Python environment. If you're making documentation
+requires a C/C++ compiler and Python environment. If you're making documentation
 changes, you can skip to :ref:`contributing.documentation` but you won't be able
 to build the documentation locally before pushing your changes.
 
@@ -195,6 +195,13 @@ operations. To install pandas from source, you need to compile these C
 extensions, which means you need a C compiler. This process depends on which
 platform you're using.
 
+If you have setup your environment using ``conda``, the packages ``c-compiler``
+and ``cxx-compiler`` will install a fitting compiler for your platform that is
+compatible with the remaining conda packages. On Windows and macOS, you will
+also need to install the SDKs as they have to be distributed separately.
+These packages will be automatically installed by using ``pandas``'s
+``environment.yml``.
+
 **Windows**
 
 You will need `Build Tools for Visual Studio 2017
@@ -206,12 +213,33 @@ You will need `Build Tools for Visual Studio 2017
 scrolling down to "All downloads" -> "Tools for Visual Studio 2019".
 In the installer, select the "C++ build tools" workload.
 
+You can install the necessary components on the commandline using
+`vs_buildtools.exe <https://aka.ms/vs/16/release/vs_buildtools.exe>`_:
+
+.. code::
+
+    vs_buildtools.exe --quiet --wait --norestart --nocache ^
+        --installPath C:\BuildTools ^
+        --add "Microsoft.VisualStudio.Workload.VCTools;includeRecommended" ^
+        --add Microsoft.VisualStudio.Component.VC.v141 ^
+        --add Microsoft.VisualStudio.Component.VC.v141.x86.x64 ^
+        --add Microsoft.VisualStudio.Component.Windows10SDK.17763
+
+To setup the right paths on the commandline, call
+``"C:\BuildTools\VC\Auxiliary\Build\vcvars64.bat" -vcvars_ver=14.16 10.0.17763.0``.
+
 **macOS**
 
-Information about compiler installation can be found here:
+To use the ``conda``-based compilers, you will need to install the
+Developer Tools using ``xcode-select --install``. Otherwise
+information about compiler installation can be found here:
 https://devguide.python.org/setup/#macos
 
-**Unix**
+**Linux**
+
+For Linux-based ``conda`` installations, you won't have to install any
+additional components outside of the conda environment. The instructions
+below are only needed if your setup isn't based on conda environments.
 
 Some Linux distributions will come with a pre-installed C compiler. To find out
 which compilers (and versions) are installed on your system::
@@ -243,11 +271,10 @@ Let us know if you have any difficulties by opening an issue or reaching out on
 Creating a Python environment
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Now that you have a C compiler, create an isolated pandas development
-environment:
+Now create an isolated pandas development environment:
 
-* Install either `Anaconda <https://www.anaconda.com/download/>`_ or `miniconda
-  <https://conda.io/miniconda.html>`_
+* Install either `Anaconda <https://www.anaconda.com/download/>`_, `miniconda
+  <https://conda.io/miniconda.html>`_, or `miniforge <https://github.com/conda-forge/miniforge>`_
 * Make sure your conda is up to date (``conda update conda``)
 * Make sure that you have :ref:`cloned the repository <contributing.forking>`
 * ``cd`` to the pandas source directory

doc/source/development/policies.rst

+1 -1

@@ -35,7 +35,7 @@ We will not introduce new deprecations in patch releases.
 Deprecations will only be enforced in **major** releases. For example, if a
 behavior is deprecated in pandas 1.2.0, it will continue to work, with a
 warning, for all releases in the 1.x series. The behavior will change and the
-deprecation removed in the next next major release (2.0.0).
+deprecation removed in the next major release (2.0.0).
 
 .. note::
 

doc/source/user_guide/dsintro.rst

+1 -1

@@ -439,7 +439,7 @@ Data Classes as introduced in `PEP557 <https://www.python.org/dev/peps/pep-0557>
 can be passed into the DataFrame constructor.
 Passing a list of dataclasses is equivalent to passing a list of dictionaries.
 
-Please be aware, that that all values in the list should be dataclasses, mixing
+Please be aware, that all values in the list should be dataclasses, mixing
 types in the list would result in a TypeError.
 
 .. ipython:: python

doc/source/user_guide/indexing.rst

+4 -32

@@ -584,48 +584,20 @@ without using a temporary variable.
    (bb.groupby(['year', 'team']).sum()
       .loc[lambda df: df['r'] > 100])
 
-.. _indexing.deprecate_ix:
 
-IX indexer is deprecated
-------------------------
-
-.. warning::
-
-   .. versionchanged:: 1.0.0
-
-   The ``.ix`` indexer was removed, in favor of the more strict ``.iloc`` and ``.loc`` indexers.
+.. _combining_positional_and_label_based_indexing:
 
-``.ix`` offers a lot of magic on the inference of what the user wants to do. To wit, ``.ix`` can decide
-to index *positionally* OR via *labels* depending on the data type of the index. This has caused quite a
-bit of user confusion over the years.
+Combining positional and label-based indexing
+---------------------------------------------
 
-The recommended methods of indexing are:
-
-* ``.loc`` if you want to *label* index.
-* ``.iloc`` if you want to *positionally* index.
+If you wish to get the 0th and the 2nd elements from the index in the 'A' column, you can do:
 
 .. ipython:: python
 
    dfd = pd.DataFrame({'A': [1, 2, 3],
                        'B': [4, 5, 6]},
                       index=list('abc'))
-
    dfd
-
-Previous behavior, where you wish to get the 0th and the 2nd elements from the index in the 'A' column.
-
-.. code-block:: ipython
-
-   In [3]: dfd.ix[[0, 2], 'A']
-   Out[3]:
-   a    1
-   c    3
-   Name: A, dtype: int64
-
-Using ``.loc``. Here we will select the appropriate indexes from the index, then use *label* indexing.
-
-.. ipython:: python
-
    dfd.loc[dfd.index[[0, 2]], 'A']
 
 This can also be expressed using ``.iloc``, by explicitly getting locations on the indexers, and using
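
Note on the rewritten section: the label-based and position-based spellings can be sketched side by side. This illustration reuses the `dfd` frame from the hunk. The `.iloc` form uses `Index.get_loc` to turn the column label into a position, which is an assumption about how the truncated sentence above continues, not text taken from the diff.

```python
import pandas as pd

dfd = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=list("abc"))

# Label-based: translate positions 0 and 2 into row labels, then use .loc.
by_label = dfd.loc[dfd.index[[0, 2]], "A"]

# Position-based: translate the column label into a position, then use .iloc.
by_position = dfd.iloc[[0, 2], dfd.columns.get_loc("A")]

print(by_label)
print(by_position)
```

Both selections pick rows `a` and `c` of column `A`, so either spelling replaces the mixed positional and label lookup that `.ix` used to allow.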

doc/source/user_guide/integer_na.rst

+1 -1

@@ -117,7 +117,7 @@ dtype if needed.
    # coerce when needed
    s + 0.01
 
-These dtypes can operate as part of of ``DataFrame``.
+These dtypes can operate as part of ``DataFrame``.
 
 .. ipython:: python
 

doc/source/user_guide/timeseries.rst

+8 -13

@@ -588,45 +588,43 @@ would include matching times on an included date:
 
 .. warning::
 
-   Indexing ``DataFrame`` rows with strings is deprecated in pandas 1.2.0 and will be removed in a future version. Use ``frame.loc[dtstring]`` instead.
+   Indexing ``DataFrame`` rows with a *single* string with getitem (e.g. ``frame[dtstring]``)
+   is deprecated starting with pandas 1.2.0 (given the ambiguity whether it is indexing
+   the rows or selecting a column) and will be removed in a future version. The equivalent
+   with ``.loc`` (e.g. ``frame.loc[dtstring]``) is still supported.
 
 .. ipython:: python
-   :okwarning:
 
    dft = pd.DataFrame(
        np.random.randn(100000, 1),
        columns=["A"],
        index=pd.date_range("20130101", periods=100000, freq="T"),
    )
    dft
-   dft["2013"]
+   dft.loc["2013"]
 
 This starts on the very first time in the month, and includes the last date and
 time for the month:
 
 .. ipython:: python
-   :okwarning:
 
    dft["2013-1":"2013-2"]
 
 This specifies a stop time **that includes all of the times on the last day**:
 
 .. ipython:: python
-   :okwarning:
 
    dft["2013-1":"2013-2-28"]
 
 This specifies an **exact** stop time (and is not the same as the above):
 
 .. ipython:: python
-   :okwarning:
 
    dft["2013-1":"2013-2-28 00:00:00"]
 
 We are stopping on the included end-point as it is part of the index:
 
 .. ipython:: python
-   :okwarning:
 
    dft["2013-1-15":"2013-1-15 12:30:00"]
 
@@ -652,7 +650,6 @@ We are stopping on the included end-point as it is part of the index:
 Slicing with string indexing also honors UTC offset.
 
 .. ipython:: python
-   :okwarning:
 
    df = pd.DataFrame([0], index=pd.DatetimeIndex(["2019-01-01"], tz="US/Pacific"))
    df
@@ -704,15 +701,14 @@ If index resolution is second, then the minute-accurate timestamp gives a
    series_second.index.resolution
    series_second["2011-12-31 23:59"]
 
-If the timestamp string is treated as a slice, it can be used to index ``DataFrame`` with ``[]`` as well.
+If the timestamp string is treated as a slice, it can be used to index ``DataFrame`` with ``.loc[]`` as well.
 
 .. ipython:: python
-   :okwarning:
 
    dft_minute = pd.DataFrame(
        {"a": [1, 2, 3], "b": [4, 5, 6]}, index=series_minute.index
    )
-   dft_minute["2011-12-31 23"]
+   dft_minute.loc["2011-12-31 23"]
 
 
 .. warning::
@@ -2080,7 +2076,6 @@ You can pass in dates and strings to ``Series`` and ``DataFrame`` with ``PeriodI
 Passing a string representing a lower frequency than ``PeriodIndex`` returns partial sliced data.
 
 .. ipython:: python
-   :okwarning:
 
    ps["2011"]
 
@@ -2090,7 +2085,7 @@ Passing a string representing a lower frequency than ``PeriodIndex`` returns par
        index=pd.period_range("2013-01-01 9:00", periods=600, freq="T"),
    )
    dfp
-   dfp["2013-01-01 10H"]
+   dfp.loc["2013-01-01 10H"]
 
 As with ``DatetimeIndex``, the endpoints will be included in the result. The example below slices data starting from 10:00 to 11:59.
 
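
Note on the timeseries changes: the reworded warning separates a single date string, which now goes through `.loc`, from a string slice, which the unchanged examples still pass to `[]`. A minimal sketch of the two forms follows, using a small hypothetical frame rather than the 100000-row `dft` from the docs, and `freq="min"` in place of the `"T"` alias used in the diff.

```python
import numpy as np
import pandas as pd

# Six one-minute timestamps straddling the January/February boundary
# (an illustrative frame, not the one built in the documentation).
dft = pd.DataFrame(
    np.arange(6),
    columns=["A"],
    index=pd.date_range("2013-01-31 23:58", periods=6, freq="min"),
)

# Single partial date string: select rows through .loc.
january = dft.loc["2013-01"]

# String slice: row slicing with plain [] is unambiguous.
both_months = dft["2013-01":"2013-02"]

print(len(january), len(both_months))
```

The single-string form is the ambiguous one, since `frame["2013"]` could equally name a column, which is the reason the warning gives for deprecating it.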

doc/source/whatsnew/v0.12.0.rst

+3 -3

@@ -419,7 +419,7 @@ Bug fixes
 ~~~~~~~~~
 
 - Plotting functions now raise a ``TypeError`` before trying to plot anything
-  if the associated objects have have a dtype of ``object`` (:issue:`1818`,
+  if the associated objects have a dtype of ``object`` (:issue:`1818`,
   :issue:`3572`, :issue:`3911`, :issue:`3912`), but they will try to convert object arrays to
   numeric arrays if possible so that you can still plot, for example, an
   object array with floats. This happens before any drawing takes place which
@@ -430,8 +430,8 @@ Bug fixes
 
 - ``Series.str`` now supports iteration (:issue:`3638`). You can iterate over the
   individual elements of each string in the ``Series``. Each iteration yields
-  yields a ``Series`` with either a single character at each index of the
-  original ``Series`` or ``NaN``. For example,
+  a ``Series`` with either a single character at each index of the original
+  ``Series`` or ``NaN``. For example,
 
   .. ipython:: python
      :okwarning:

doc/source/whatsnew/v0.14.0.rst

+1 -1

@@ -923,7 +923,7 @@ Bug fixes
 - ``HDFStore.select_as_multiple`` handles start and stop the same way as ``select`` (:issue:`6177`)
 - ``HDFStore.select_as_coordinates`` and ``select_column`` works with a ``where`` clause that results in filters (:issue:`6177`)
 - Regression in join of non_unique_indexes (:issue:`6329`)
-- Issue with groupby ``agg`` with a single function and a a mixed-type frame (:issue:`6337`)
+- Issue with groupby ``agg`` with a single function and a mixed-type frame (:issue:`6337`)
 - Bug in ``DataFrame.replace()`` when passing a non- ``bool``
   ``to_replace`` argument (:issue:`6332`)
 - Raise when trying to align on different levels of a MultiIndex assignment (:issue:`3738`)

doc/source/whatsnew/v0.15.2.rst

+1 -1

@@ -136,7 +136,7 @@ Enhancements
 
 - Added ability to export Categorical data to Stata (:issue:`8633`). See :ref:`here <io.stata-categorical>` for limitations of categorical variables exported to Stata data files.
 - Added flag ``order_categoricals`` to ``StataReader`` and ``read_stata`` to select whether to order imported categorical data (:issue:`8836`). See :ref:`here <io.stata-categorical>` for more information on importing categorical variables from Stata data files.
-- Added ability to export Categorical data to to/from HDF5 (:issue:`7621`). Queries work the same as if it was an object array. However, the ``category`` dtyped data is stored in a more efficient manner. See :ref:`here <io.hdf5-categorical>` for an example and caveats w.r.t. prior versions of pandas.
+- Added ability to export Categorical data to/from HDF5 (:issue:`7621`). Queries work the same as if it was an object array. However, the ``category`` dtyped data is stored in a more efficient manner. See :ref:`here <io.hdf5-categorical>` for an example and caveats w.r.t. prior versions of pandas.
 - Added support for ``searchsorted()`` on ``Categorical`` class (:issue:`8420`).
 
 Other enhancements:

doc/source/whatsnew/v0.16.1.rst

+2 -2

@@ -6,7 +6,7 @@ Version 0.16.1 (May 11, 2015)
 {{ header }}
 
 
-This is a minor bug-fix release from 0.16.0 and includes a a large number of
+This is a minor bug-fix release from 0.16.0 and includes a large number of
 bug fixes along several new features, enhancements, and performance improvements.
 We recommend that all users upgrade to this version.
 
@@ -72,7 +72,7 @@ setting the index of a ``DataFrame/Series`` with a ``category`` dtype would conv
    Out[4]: Index(['c', 'a', 'b'], dtype='object')
 
 
-setting the index, will create create a ``CategoricalIndex``
+setting the index, will create a ``CategoricalIndex``
 
 .. code-block:: ipython
 

doc/source/whatsnew/v0.16.2.rst

+1 -1

@@ -6,7 +6,7 @@ Version 0.16.2 (June 12, 2015)
 {{ header }}
 
 
-This is a minor bug-fix release from 0.16.1 and includes a a large number of
+This is a minor bug-fix release from 0.16.1 and includes a large number of
 bug fixes along some new features (:meth:`~DataFrame.pipe` method), enhancements, and performance improvements.
 
 We recommend that all users upgrade to this version.

doc/source/whatsnew/v0.18.0.rst

+1 -1

@@ -610,7 +610,7 @@ Subtraction by ``Timedelta`` in a ``Series`` by a ``Timestamp`` works (:issue:`1
    pd.Timestamp('2012-01-01') - ser
 
 
-``NaT.isoformat()`` now returns ``'NaT'``. This change allows allows
+``NaT.isoformat()`` now returns ``'NaT'``. This change allows
 ``pd.Timestamp`` to rehydrate any timestamp like object from its isoformat
 (:issue:`12300`).
 

doc/source/whatsnew/v0.20.0.rst

+4 -4

@@ -1167,7 +1167,7 @@ Other API changes
 - ``.loc`` has compat with ``.ix`` for accepting iterators, and NamedTuples (:issue:`15120`)
 - ``interpolate()`` and ``fillna()`` will raise a ``ValueError`` if the ``limit`` keyword argument is not greater than 0. (:issue:`9217`)
 - ``pd.read_csv()`` will now issue a ``ParserWarning`` whenever there are conflicting values provided by the ``dialect`` parameter and the user (:issue:`14898`)
-- ``pd.read_csv()`` will now raise a ``ValueError`` for the C engine if the quote character is larger than than one byte (:issue:`11592`)
+- ``pd.read_csv()`` will now raise a ``ValueError`` for the C engine if the quote character is larger than one byte (:issue:`11592`)
 - ``inplace`` arguments now require a boolean value, else a ``ValueError`` is thrown (:issue:`14189`)
 - ``pandas.api.types.is_datetime64_ns_dtype`` will now report ``True`` on a tz-aware dtype, similar to ``pandas.api.types.is_datetime64_any_dtype``
 - ``DataFrame.asof()`` will return a null filled ``Series`` instead the scalar ``NaN`` if a match is not found (:issue:`15118`)
@@ -1315,7 +1315,7 @@ The recommended methods of indexing are:
 - ``.loc`` if you want to *label* index
 - ``.iloc`` if you want to *positionally* index.
 
-Using ``.ix`` will now show a ``DeprecationWarning`` with a link to some examples of how to convert code :ref:`here <indexing.deprecate_ix>`.
+Using ``.ix`` will now show a ``DeprecationWarning`` with a link to some examples of how to convert code `here <https://pandas.pydata.org/pandas-docs/version/1.0/user_guide/indexing.html#ix-indexer-is-deprecated>`__.
 
 
 .. ipython:: python
@@ -1663,11 +1663,11 @@ Indexing
 - Bug in ``.reset_index()`` when an all ``NaN`` level of a ``MultiIndex`` would fail (:issue:`6322`)
 - Bug in ``.reset_index()`` when raising error for index name already present in ``MultiIndex`` columns (:issue:`16120`)
 - Bug in creating a ``MultiIndex`` with tuples and not passing a list of names; this will now raise ``ValueError`` (:issue:`15110`)
-- Bug in the HTML display with with a ``MultiIndex`` and truncation (:issue:`14882`)
+- Bug in the HTML display with a ``MultiIndex`` and truncation (:issue:`14882`)
 - Bug in the display of ``.info()`` where a qualifier (+) would always be displayed with a ``MultiIndex`` that contains only non-strings (:issue:`15245`)
 - Bug in ``pd.concat()`` where the names of ``MultiIndex`` of resulting ``DataFrame`` are not handled correctly when ``None`` is presented in the names of ``MultiIndex`` of input ``DataFrame`` (:issue:`15787`)
 - Bug in ``DataFrame.sort_index()`` and ``Series.sort_index()`` where ``na_position`` doesn't work with a ``MultiIndex`` (:issue:`14784`, :issue:`16604`)
-- Bug in in ``pd.concat()`` when combining objects with a ``CategoricalIndex`` (:issue:`16111`)
+- Bug in ``pd.concat()`` when combining objects with a ``CategoricalIndex`` (:issue:`16111`)
 - Bug in indexing with a scalar and a ``CategoricalIndex`` (:issue:`16123`)
 
 IO

doc/source/whatsnew/v0.21.0.rst

+1 -1

@@ -50,7 +50,7 @@ Parquet is designed to faithfully serialize and de-serialize ``DataFrame`` s, su
 dtypes, including extension dtypes such as datetime with timezones.
 
 This functionality depends on either the `pyarrow <http://arrow.apache.org/docs/python/>`__ or `fastparquet <https://fastparquet.readthedocs.io/en/latest/>`__ library.
-For more details, see see :ref:`the IO docs on Parquet <io.parquet>`.
+For more details, see :ref:`the IO docs on Parquet <io.parquet>`.
 
 
 .. _whatsnew_0210.enhancements.infer_objects:
