
Commit 199cc43

Merge branch '33141-pandas-cut' of github.com:mabelvj/pandas into 33141-pandas-cut
2 parents: 777e13e + 4599d46


145 files changed, +2692 −1128 lines changed


README.md

+1

@@ -7,6 +7,7 @@
 # pandas: powerful Python data analysis toolkit
 [![PyPI Latest Release](https://img.shields.io/pypi/v/pandas.svg)](https://pypi.org/project/pandas/)
 [![Conda Latest Release](https://anaconda.org/conda-forge/pandas/badges/version.svg)](https://anaconda.org/anaconda/pandas/)
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3509134.svg)](https://doi.org/10.5281/zenodo.3509134)
 [![Package Status](https://img.shields.io/pypi/status/pandas.svg)](https://pypi.org/project/pandas/)
 [![License](https://img.shields.io/pypi/l/pandas.svg)](https://github.com/pandas-dev/pandas/blob/master/LICENSE)
 [![Travis Build Status](https://travis-ci.org/pandas-dev/pandas.svg?branch=master)](https://travis-ci.org/pandas-dev/pandas)

asv_bench/asv.conf.json

+1 −1

@@ -39,7 +39,7 @@
 // followed by the pip installed packages).
 "matrix": {
     "numpy": [],
-    "Cython": [],
+    "Cython": ["0.29.16"],
     "matplotlib": [],
     "sqlalchemy": [],
     "scipy": [],

asv_bench/benchmarks/array.py

+18

@@ -9,6 +9,11 @@ def setup(self):
         self.values_float = np.array([1.0, 0.0, 1.0, 0.0])
         self.values_integer = np.array([1, 0, 1, 0])
         self.values_integer_like = [1, 0, 1, 0]
+        self.data = np.array([True, False, True, False])
+        self.mask = np.array([False, False, True, False])
+
+    def time_constructor(self):
+        pd.arrays.BooleanArray(self.data, self.mask)

     def time_from_bool_array(self):
         pd.array(self.values_bool, dtype="boolean")

@@ -21,3 +26,16 @@ def time_from_integer_like(self):

     def time_from_float_array(self):
         pd.array(self.values_float, dtype="boolean")
+
+
+class IntegerArray:
+    def setup(self):
+        self.values_integer = np.array([1, 0, 1, 0])
+        self.data = np.array([1, 2, 3, 4], dtype="int64")
+        self.mask = np.array([False, False, True, False])
+
+    def time_constructor(self):
+        pd.arrays.IntegerArray(self.data, self.mask)
+
+    def time_from_integer_array(self):
+        pd.array(self.values_integer, dtype="Int64")
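
For orientation, here is a minimal sketch (not part of the commit) of the masked-array constructors these new benchmarks exercise; it assumes pandas >= 1.0 and NumPy are available, and the mask marks positions that become pd.NA:

    import numpy as np
    import pandas as pd

    # BooleanArray / IntegerArray take a values array plus a boolean mask;
    # True in the mask flags a missing entry.
    ba = pd.arrays.BooleanArray(
        np.array([True, False, True, False]),
        np.array([False, False, True, False]),
    )
    ia = pd.arrays.IntegerArray(
        np.array([1, 2, 3, 4], dtype="int64"),
        np.array([False, False, True, False]),
    )

    # The higher-level pd.array() path timed above builds the same kind of array.
    same = pd.array([1, 2, None, 4], dtype="Int64")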

ci/code_checks.sh

+4 −4

@@ -292,10 +292,6 @@ if [[ -z "$CHECK" || "$CHECK" == "doctests" ]]; then
     pytest -q --doctest-modules pandas/core/generic.py
     RET=$(($RET + $?)) ; echo $MSG "DONE"

-    MSG='Doctests groupby.py' ; echo $MSG
-    pytest -q --doctest-modules pandas/core/groupby/groupby.py -k"-cumcount -describe -pipe"
-    RET=$(($RET + $?)) ; echo $MSG "DONE"
-
     MSG='Doctests series.py' ; echo $MSG
     pytest -q --doctest-modules pandas/core/series.py
     RET=$(($RET + $?)) ; echo $MSG "DONE"

@@ -318,6 +314,10 @@ if [[ -z "$CHECK" || "$CHECK" == "doctests" ]]; then
     pytest -q --doctest-modules pandas/core/dtypes/
     RET=$(($RET + $?)) ; echo $MSG "DONE"

+    MSG='Doctests groupby' ; echo $MSG
+    pytest -q --doctest-modules pandas/core/groupby/
+    RET=$(($RET + $?)) ; echo $MSG "DONE"
+
     MSG='Doctests indexes' ; echo $MSG
     pytest -q --doctest-modules pandas/core/indexes/
     RET=$(($RET + $?)) ; echo $MSG "DONE"

ci/deps/azure-36-minimum_versions.yaml

+1 −1

@@ -22,7 +22,7 @@ dependencies:
   - numpy=1.13.3
   - openpyxl=2.5.7
   - pytables=3.4.2
-  - python-dateutil=2.6.1
+  - python-dateutil=2.7.3
   - pytz=2017.2
   - scipy=0.19.0
   - xlrd=1.1.0

ci/deps/azure-37-numpydev.yaml

+2 −1

@@ -14,7 +14,8 @@ dependencies:
   - pytz
   - pip
   - pip:
-    - cython>=0.29.16
+    - cython==0.29.16
+    # GH#33507 cython 3.0a1 is causing TypeErrors 2020-04-13
     - "git+git://github.com/dateutil/dateutil.git"
     - "-f https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com"
     - "--pre"

ci/deps/azure-macos-36.yaml

+1 −1

@@ -23,7 +23,7 @@ dependencies:
   - openpyxl
   - pyarrow>=0.13.0
   - pytables
-  - python-dateutil==2.6.1
+  - python-dateutil==2.7.3
   - pytz
   - xarray
   - xlrd

doc/source/getting_started/install.rst

+1 −1

@@ -221,7 +221,7 @@ Package Minimum support
 ================================================================ ==========================
 `setuptools <https://setuptools.readthedocs.io/en/latest/>`__ 24.2.0
 `NumPy <https://www.numpy.org>`__ 1.13.3
-`python-dateutil <https://dateutil.readthedocs.io/en/stable/>`__ 2.6.1
+`python-dateutil <https://dateutil.readthedocs.io/en/stable/>`__ 2.7.3
 `pytz <https://pypi.org/project/pytz/>`__ 2017.2
 ================================================================ ==========================

doc/source/getting_started/intro_tutorials/03_subset_data.rst

+5 −5

@@ -23,7 +23,7 @@
 <div class="card-body">
 <p class="card-text">

-This tutorial uses the titanic data set, stored as CSV. The data
+This tutorial uses the Titanic data set, stored as CSV. The data
 consists of the following data columns:

 - PassengerId: Id of every passenger.

@@ -72,7 +72,7 @@ How do I select specific columns from a ``DataFrame``?
 <ul class="task-bullet">
 <li>

-I’m interested in the age of the titanic passengers.
+I’m interested in the age of the Titanic passengers.

 .. ipython:: python

@@ -111,7 +111,7 @@ the number of rows is returned.
 <ul class="task-bullet">
 <li>

-I’m interested in the age and sex of the titanic passengers.
+I’m interested in the age and sex of the Titanic passengers.

 .. ipython:: python

@@ -198,7 +198,7 @@ can be used to filter the ``DataFrame`` by putting it in between the
 selection brackets ``[]``. Only rows for which the value is ``True``
 will be selected.

-We now from before that the original titanic ``DataFrame`` consists of
+We know from before that the original Titanic ``DataFrame`` consists of
 891 rows. Let’s have a look at the amount of rows which satisfy the
 condition by checking the ``shape`` attribute of the resulting
 ``DataFrame`` ``above_35``:

@@ -212,7 +212,7 @@ condition by checking the ``shape`` attribute of the resulting
 <ul class="task-bullet">
 <li>

-I’m interested in the titanic passengers from cabin class 2 and 3.
+I’m interested in the Titanic passengers from cabin class 2 and 3.

 .. ipython:: python
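
As an aside, here is a minimal sketch of the boolean filtering this tutorial page describes; the file path and the "Age" column follow the tutorial's Titanic example and are assumed here rather than taken from the diff:

    import pandas as pd

    # The tutorial loads the Titanic data set from a CSV file (path assumed).
    titanic = pd.read_csv("data/titanic.csv")

    # A boolean Series placed inside the selection brackets keeps only True rows.
    above_35 = titanic[titanic["Age"] > 35]
    print(above_35.shape)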
doc/source/user_guide/io.rst

+21-2
Original file line numberDiff line numberDiff line change
@@ -285,14 +285,18 @@ chunksize : int, default ``None``
285285
Quoting, compression, and file format
286286
+++++++++++++++++++++++++++++++++++++
287287

288-
compression : {``'infer'``, ``'gzip'``, ``'bz2'``, ``'zip'``, ``'xz'``, ``None``}, default ``'infer'``
288+
compression : {``'infer'``, ``'gzip'``, ``'bz2'``, ``'zip'``, ``'xz'``, ``None``, ``dict``}, default ``'infer'``
289289
For on-the-fly decompression of on-disk data. If 'infer', then use gzip,
290290
bz2, zip, or xz if filepath_or_buffer is a string ending in '.gz', '.bz2',
291291
'.zip', or '.xz', respectively, and no decompression otherwise. If using 'zip',
292292
the ZIP file must contain only one data file to be read in.
293-
Set to ``None`` for no decompression.
293+
Set to ``None`` for no decompression. Can also be a dict with key ``'method'``
294+
set to one of {``'zip'``, ``'gzip'``, ``'bz2'``}, and other keys set to
295+
compression settings. As an example, the following could be passed for
296+
faster compression: ``compression={'method': 'gzip', 'compresslevel': 1}``.
294297

295298
.. versionchanged:: 0.24.0 'infer' option added and set to default.
299+
.. versionchanged:: 1.1.0 dict option extended to support ``gzip`` and ``bz2``.
296300
thousands : str, default ``None``
297301
Thousands separator.
298302
decimal : str, default ``'.'``
@@ -3347,6 +3351,12 @@ The compression type can be an explicit parameter or be inferred from the file e
33473351
If 'infer', then use ``gzip``, ``bz2``, ``zip``, or ``xz`` if filename ends in ``'.gz'``, ``'.bz2'``, ``'.zip'``, or
33483352
``'.xz'``, respectively.
33493353

3354+
The compression parameter can also be a ``dict`` in order to pass options to the
3355+
compression protocol. It must have a ``'method'`` key set to the name
3356+
of the compression protocol, which must be one of
3357+
{``'zip'``, ``'gzip'``, ``'bz2'``}. All other key-value pairs are passed to
3358+
the underlying compression library.
3359+
33503360
.. ipython:: python
33513361
33523362
df = pd.DataFrame({
@@ -3383,6 +3393,15 @@ The default is to 'infer':
33833393
rt = pd.read_pickle("s1.pkl.bz2")
33843394
rt
33853395
3396+
Passing options to the compression protocol in order to speed up compression:
3397+
3398+
.. ipython:: python
3399+
3400+
df.to_pickle(
3401+
"data.pkl.gz",
3402+
compression={"method": "gzip", 'compresslevel': 1}
3403+
)
3404+
33863405
.. ipython:: python
33873406
:suppress:
33883407
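
As a quick, self-contained illustration of the dict form documented above (the file name is a placeholder; assumes pandas >= 1.1):

    import pandas as pd

    df = pd.DataFrame({"A": range(1000)})

    # A low gzip compresslevel trades file size for write speed.
    df.to_pickle("data.pkl.gz", compression={"method": "gzip", "compresslevel": 1})

    # Reading back needs no special options: 'infer' picks gzip from the '.gz' suffix.
    rt = pd.read_pickle("data.pkl.gz")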

doc/source/user_guide/timeseries.rst

+9

@@ -786,6 +786,15 @@ Furthermore, if you have a ``Series`` with datetimelike values, then you can
 access these properties via the ``.dt`` accessor, as detailed in the section
 on :ref:`.dt accessors<basics.dt_accessors>`.

+.. versionadded:: 1.1.0
+
+You may obtain the year, week and day components of the ISO year from the ISO 8601 standard:
+
+.. ipython:: python
+
+   idx = pd.date_range(start='2019-12-29', freq='D', periods=4)
+   idx.to_series().dt.isocalendar()
+
 .. _timeseries.offsets:

 DateOffset objects
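
For reference, a minimal sketch (assuming pandas >= 1.1) of what the new isocalendar() accessor returns: a DataFrame with year, week and day columns computed from the ISO 8601 calendar:

    import pandas as pd

    s = pd.Series(pd.date_range("2019-12-29", periods=4, freq="D"))
    iso = s.dt.isocalendar()

    # 2019-12-29 (a Sunday) is day 7 of ISO week 52 of 2019;
    # 2019-12-30 (a Monday) starts ISO week 1 of ISO year 2020.
    print(iso)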

doc/source/whatsnew/index.rst

+1 −1

@@ -3,7 +3,7 @@
 {{ header }}

 *************
-Release Notes
+Release notes
 *************

 This is the list of changes to pandas between each release. For full details,

doc/source/whatsnew/v0.10.0.rst

+3 −3

@@ -1,7 +1,7 @@
 .. _whatsnew_0100:

-v0.10.0 (December 17, 2012)
----------------------------
+Version 0.10.0 (December 17, 2012)
+----------------------------------

 {{ header }}

@@ -490,7 +490,7 @@ Updated PyTables support
 however, query terms using the prior (undocumented) methodology are unsupported. You must read in the entire
 file and write it out using the new format to take advantage of the updates.

-N dimensional Panels (experimental)
+N dimensional panels (experimental)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Adding experimental support for Panel4D and factory functions to create n-dimensional named panels.

doc/source/whatsnew/v0.10.1.rst

+2 −2

@@ -1,7 +1,7 @@
 .. _whatsnew_0101:

-v0.10.1 (January 22, 2013)
---------------------------
+Version 0.10.1 (January 22, 2013)
+---------------------------------

 {{ header }}

doc/source/whatsnew/v0.11.0.rst

+2 −2

@@ -1,7 +1,7 @@
 .. _whatsnew_0110:

-v0.11.0 (April 22, 2013)
-------------------------
+Version 0.11.0 (April 22, 2013)
+-------------------------------

 {{ header }}

doc/source/whatsnew/v0.12.0.rst

+4 −4

@@ -1,7 +1,7 @@
 .. _whatsnew_0120:

-v0.12.0 (July 24, 2013)
------------------------
+Version 0.12.0 (July 24, 2013)
+------------------------------

 {{ header }}

@@ -177,8 +177,8 @@ API changes
 ``__repr__``). Plus string safety throughout. Now employed in many places
 throughout the pandas library. (:issue:`4090`, :issue:`4092`)

-I/O enhancements
-~~~~~~~~~~~~~~~~
+IO enhancements
+~~~~~~~~~~~~~~~

 - ``pd.read_html()`` can now parse HTML strings, files or urls and return
   DataFrames, courtesy of @cpcloud. (:issue:`3477`, :issue:`3605`, :issue:`3606`, :issue:`3616`).

doc/source/whatsnew/v0.13.0.rst

+2 −2

@@ -1,7 +1,7 @@
 .. _whatsnew_0130:

-v0.13.0 (January 3, 2014)
---------------------------
+Version 0.13.0 (January 3, 2014)
+--------------------------------

 {{ header }}

doc/source/whatsnew/v0.13.1.rst

+2 −2

@@ -1,7 +1,7 @@
 .. _whatsnew_0131:

-v0.13.1 (February 3, 2014)
---------------------------
+Version 0.13.1 (February 3, 2014)
+---------------------------------

 {{ header }}

doc/source/whatsnew/v0.14.0.rst

+2 −2

@@ -473,7 +473,7 @@ Some other enhancements to the sql functions include:

 .. _whatsnew_0140.slicers:

-MultiIndexing using slicers
+Multiindexing using slicers
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~

 In 0.14.0 we added a new way to slice MultiIndexed objects.

@@ -904,7 +904,7 @@ There are no experimental changes in 0.14.0

 .. _whatsnew_0140.bug_fixes:

-Bug Fixes
+Bug fixes
 ~~~~~~~~~

 - Bug in Series ValueError when index doesn't match data (:issue:`6532`)

doc/source/whatsnew/v0.15.0.rst

+1 −1

@@ -600,7 +600,7 @@ Rolling/expanding moments improvements

 .. _whatsnew_0150.sql:

-Improvements in the sql io module
+Improvements in the SQL io module
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 - Added support for a ``chunksize`` parameter to ``to_sql`` function. This allows DataFrame to be written in chunks and avoid packet-size overflow errors (:issue:`8062`).

doc/source/whatsnew/v0.18.0.rst

+1 −1

@@ -1197,7 +1197,7 @@ Performance improvements

 .. _whatsnew_0180.bug_fixes:

-Bug Fixes
+Bug fixes
 ~~~~~~~~~

 - Bug in ``GroupBy.size`` when data-frame is empty. (:issue:`11699`)

doc/source/whatsnew/v0.18.1.rst

+1 −1

@@ -380,7 +380,7 @@ New behavior:

 .. _whatsnew_0181.numpy_compatibility:

-numpy function compatibility
+NumPy function compatibility
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Compatibility between pandas array-like methods (e.g. ``sum`` and ``take``) and their ``numpy``

doc/source/whatsnew/v0.19.0.rst

+3 −3

@@ -377,15 +377,15 @@ For ``MultiIndex``, values are dropped if any level is missing by default. Speci

 .. _whatsnew_0190.gbq:

-Google BigQuery Enhancements
+Google BigQuery enhancements
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 - The :func:`read_gbq` method has gained the ``dialect`` argument to allow users to specify whether to use BigQuery's legacy SQL or BigQuery's standard SQL. See the `docs <https://pandas-gbq.readthedocs.io/en/latest/reading.html>`__ for more details (:issue:`13615`).
 - The :func:`~DataFrame.to_gbq` method now allows the DataFrame column order to differ from the destination table schema (:issue:`11359`).

 .. _whatsnew_0190.errstate:

-Fine-grained numpy errstate
+Fine-grained NumPy errstate
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Previous versions of pandas would permanently silence numpy's ufunc error handling when ``pandas`` was imported. Pandas did this in order to silence the warnings that would arise from using numpy ufuncs on missing data, which are usually represented as ``NaN`` s. Unfortunately, this silenced legitimate warnings arising in non-pandas code in the application. Starting with 0.19.0, pandas will use the ``numpy.errstate`` context manager to silence these warnings in a more fine-grained manner, only around where these operations are actually used in the pandas code base. (:issue:`13109`, :issue:`13145`)

@@ -1185,7 +1185,7 @@ the result of calling :func:`read_csv` without the ``chunksize=`` argument

 .. _whatsnew_0190.sparse:

-Sparse Changes
+Sparse changes
 ^^^^^^^^^^^^^^

 These changes allow pandas to handle sparse data with more dtypes, and for work to make a smoother experience with data handling.
