Commit 756084a

Merge branch 'master' of https://github.com/pandas-dev/pandas into git_version
2 parents 0406aa7 + 7a2fbce commit 756084a

259 files changed: +16536 -12723 lines changed

.coveragerc

+2
@@ -2,6 +2,7 @@
 [run]
 branch = False
 omit = */tests/*
+plugins = Cython.Coverage
 
 [report]
 # Regexes for lines to exclude from consideration
@@ -22,6 +23,7 @@ exclude_lines =
     if __name__ == .__main__.:
 
 ignore_errors = False
+show_missing = True
 
 [html]
 directory = coverage_html_report

appveyor.yml

+2
@@ -20,12 +20,14 @@ environment:
   matrix:
 
     - CONDA_ROOT: "C:\\Miniconda3_64"
+      APPVEYOR_BUILD_WORKER_IMAGE: Visual Studio 2017
       PYTHON_VERSION: "3.6"
       PYTHON_ARCH: "64"
       CONDA_PY: "36"
       CONDA_NPY: "113"
 
     - CONDA_ROOT: "C:\\Miniconda3_64"
+      APPVEYOR_BUILD_WORKER_IMAGE: Visual Studio 2015
       PYTHON_VERSION: "2.7"
       PYTHON_ARCH: "64"
       CONDA_PY: "27"

asv_bench/benchmarks/groupby.py

+1-1
@@ -142,7 +142,7 @@ def time_frame_nth(self, dtype):
     def time_series_nth_any(self, dtype):
         self.df['values'].groupby(self.df['key']).nth(0, dropna='any')
 
-    def time_groupby_nth_all(self, dtype):
+    def time_series_nth_all(self, dtype):
         self.df['values'].groupby(self.df['key']).nth(0, dropna='all')
 
     def time_series_nth(self, dtype):

asv_bench/benchmarks/reshape.py

+18
@@ -1,7 +1,9 @@
+import string
 from itertools import product
 
 import numpy as np
 from pandas import DataFrame, MultiIndex, date_range, melt, wide_to_long
+import pandas as pd
 
 from .pandas_vb_common import setup # noqa
 
@@ -132,3 +134,19 @@ def setup(self):
 
     def time_pivot_table(self):
         self.df.pivot_table(index='key1', columns=['key2', 'key3'])
+
+
+class GetDummies(object):
+    goal_time = 0.2
+
+    def setup(self):
+        categories = list(string.ascii_letters[:12])
+        s = pd.Series(np.random.choice(categories, size=1_000_000),
+                      dtype=pd.api.types.CategoricalDtype(categories))
+        self.s = s
+
+    def time_get_dummies_1d(self):
+        pd.get_dummies(self.s, sparse=False)
+
+    def time_get_dummies_1d_sparse(self):
+        pd.get_dummies(self.s, sparse=True)
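
For context on what the new `GetDummies` benchmark times, here is a minimal sketch on a much smaller Series. The row count and the seeded random generator are illustrative assumptions, not taken from the commit.

```python
import string

import numpy as np
import pandas as pd

# Same 12 categories as the benchmark, but only 1,000 rows (assumed size).
categories = list(string.ascii_letters[:12])
rng = np.random.default_rng(0)  # seed chosen only for reproducibility
s = pd.Series(rng.choice(categories, size=1_000),
              dtype=pd.api.types.CategoricalDtype(categories))

# Dense and sparse indicator frames built from the same categorical Series.
dense = pd.get_dummies(s, sparse=False)
sparse = pd.get_dummies(s, sparse=True)

# One indicator column per category in both layouts.
print(dense.shape, sparse.shape)
```

Both calls produce one column per declared category. The sparse variant differs only in how the columns are stored, which is what the second benchmark method isolates.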

ci/requirements-optional-conda.txt

+1
@@ -22,6 +22,7 @@ s3fs
 scipy
 seaborn
 sqlalchemy
+statsmodels
 xarray
 xlrd
 xlsxwriter

ci/requirements-optional-pip.txt

+1
@@ -24,6 +24,7 @@ s3fs
 scipy
 seaborn
 sqlalchemy
+statsmodels
 xarray
 xlrd
 xlsxwriter

doc/make.py

+4
@@ -363,6 +363,10 @@ def main():
     sys.path.append(args.python_path)
     globals()['pandas'] = importlib.import_module('pandas')
 
+    # Set the matplotlib backend to the non-interactive Agg backend for all
+    # child processes.
+    os.environ['MPLBACKEND'] = 'module://matplotlib.backends.backend_agg'
+
     builder = DocBuilder(args.num_jobs, not args.no_api, args.single,
                          args.verbosity)
     getattr(builder, args.command)()
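
The change above relies on child processes inheriting the parent's environment. A minimal sketch of that mechanism, using only the standard library; the one-line child script is a hypothetical stand-in for the real doc-build subprocesses, and matplotlib itself is not imported.

```python
import os
import subprocess
import sys

# Set the variable in the parent process, as doc/make.py does.
os.environ["MPLBACKEND"] = "module://matplotlib.backends.backend_agg"

# A child interpreter started afterwards inherits the parent's environment.
child_value = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['MPLBACKEND'])"],
    capture_output=True,
    text=True,
    check=True,
).stdout.strip()

print(child_value)
```

Setting the variable once in `main()` avoids having to select the backend separately in each spawned process.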

doc/source/api.rst

+2-2
@@ -444,6 +444,7 @@ Reindexing / Selection / Label manipulation
 
    Series.align
    Series.drop
+   Series.droplevel
    Series.drop_duplicates
    Series.duplicated
    Series.equals
@@ -1063,6 +1064,7 @@ Reshaping, sorting, transposing
 .. autosummary::
    :toctree: generated/
 
+   DataFrame.droplevel
    DataFrame.pivot
    DataFrame.pivot_table
    DataFrame.reorder_levels
@@ -1870,8 +1872,6 @@ Methods
    PeriodIndex.asfreq
    PeriodIndex.strftime
    PeriodIndex.to_timestamp
-   PeriodIndex.tz_convert
-   PeriodIndex.tz_localize
 
 Scalars
 -------
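
The two API entries added above, `Series.droplevel` and `DataFrame.droplevel`, remove a level from a `MultiIndex`. A small sketch with made-up data; the level names `outer` and `inner` are illustrative only.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)], names=["outer", "inner"])

# Dropping "outer" from a Series leaves only the "inner" level.
s = pd.Series([10, 20, 30], index=index)
flat = s.droplevel("outer")

# The DataFrame method works the same way on the row index by default.
df = pd.DataFrame({"x": [10, 20, 30]}, index=index)
df_flat = df.droplevel("inner")

print(flat.index.tolist(), df_flat.index.tolist())
```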

doc/source/basics.rst

+18-5
@@ -1924,11 +1924,24 @@ untouched. If the data is modified, it is because you did so explicitly.
 dtypes
 ------
 
-The main types stored in pandas objects are ``float``, ``int``, ``bool``,
-``datetime64[ns]`` and ``datetime64[ns, tz]``, ``timedelta[ns]``,
-``category`` and ``object``. In addition these dtypes have item sizes, e.g.
-``int64`` and ``int32``. See :ref:`Series with TZ <timeseries.timezone_series>`
-for more detail on ``datetime64[ns, tz]`` dtypes.
+For the most part, pandas uses NumPy arrays and dtypes for Series or individual
+columns of a DataFrame. The main types allowed in pandas objects are ``float``,
+``int``, ``bool``, and ``datetime64[ns]`` (note that NumPy does not support
+timezone-aware datetimes).
+
+In addition to NumPy's types, pandas :ref:`extends <extending.extension-types>`
+NumPy's type-system for a few cases.
+
+* :ref:`Categorical <categorical>`
+* :ref:`Datetime with Timezone <timeseries.timezone_series>`
+* :ref:`Period <timeseries.periods>`
+* :ref:`Interval <advanced.indexing.intervallindex>`
+
+Pandas uses the ``object`` dtype for storing strings.
+
+Finally, arbitrary objects may be stored using the ``object`` dtype, but should
+be avoided to the extent possible (for performance and interoperability with
+other libraries and methods. See :ref:`basics.object_conversion`).
 
 A convenient :attr:`~DataFrame.dtypes` attribute for DataFrame returns a Series
 with the data type of each column.
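
To make the rewritten dtype overview concrete, here is a small sketch that builds one column per kind of dtype named in the new text and inspects `DataFrame.dtypes`. The column names and values are invented for illustration. String columns are left out because the dtype pandas infers for them may differ between versions.

```python
import pandas as pd

df = pd.DataFrame({
    # NumPy-backed dtypes
    "floats": [1.0, 2.0, 3.0],
    "ints": [1, 2, 3],
    "bools": [True, False, True],
    # pandas extension dtypes listed in the new documentation text
    "category": pd.Categorical(["a", "b", "a"]),
    "tz_aware": pd.date_range("2018-01-01", periods=3, tz="UTC"),
    "period": pd.period_range("2018-01", periods=3, freq="M"),
    "interval": pd.interval_range(start=0, end=3),
})

# One dtype per column, returned as a Series indexed by column name.
print(df.dtypes)
```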

doc/source/merging.rst

+20-20
@@ -506,8 +506,8 @@ You can also pass a list of dicts or Series:
 
 .. _merging.join:
 
-Database-style DataFrame joining/merging
-----------------------------------------
+Database-style DataFrame or named Series joining/merging
+--------------------------------------------------------
 
 pandas has full-featured, **high performance** in-memory join operations
 idiomatically very similar to relational databases like SQL. These methods
@@ -522,7 +522,7 @@ Users who are familiar with SQL but new to pandas might be interested in a
 :ref:`comparison with SQL<compare_with_sql.join>`.
 
 pandas provides a single function, :func:`~pandas.merge`, as the entry point for
-all standard database join operations between ``DataFrame`` objects:
+all standard database join operations between ``DataFrame`` or named ``Series`` objects:
 
 ::
 
@@ -531,40 +531,40 @@ all standard database join operations between ``DataFrame`` objects:
              suffixes=('_x', '_y'), copy=True, indicator=False,
              validate=None)
 
-* ``left``: A DataFrame object.
-* ``right``: Another DataFrame object.
+* ``left``: A DataFrame or named Series object.
+* ``right``: Another DataFrame or named Series object.
 * ``on``: Column or index level names to join on. Must be found in both the left
-  and right DataFrame objects. If not passed and ``left_index`` and
+  and right DataFrame and/or Series objects. If not passed and ``left_index`` and
   ``right_index`` are ``False``, the intersection of the columns in the
-  DataFrames will be inferred to be the join keys.
-* ``left_on``: Columns or index levels from the left DataFrame to use as
+  DataFrames and/or Series will be inferred to be the join keys.
+* ``left_on``: Columns or index levels from the left DataFrame or Series to use as
   keys. Can either be column names, index level names, or arrays with length
-  equal to the length of the DataFrame.
-* ``right_on``: Columns or index levels from the right DataFrame to use as
+  equal to the length of the DataFrame or Series.
+* ``right_on``: Columns or index levels from the right DataFrame or Series to use as
   keys. Can either be column names, index level names, or arrays with length
-  equal to the length of the DataFrame.
+  equal to the length of the DataFrame or Series.
 * ``left_index``: If ``True``, use the index (row labels) from the left
-  DataFrame as its join key(s). In the case of a DataFrame with a MultiIndex
+  DataFrame or Series as its join key(s). In the case of a DataFrame or Series with a MultiIndex
   (hierarchical), the number of levels must match the number of join keys
-  from the right DataFrame.
-* ``right_index``: Same usage as ``left_index`` for the right DataFrame
+  from the right DataFrame or Series.
+* ``right_index``: Same usage as ``left_index`` for the right DataFrame or Series
 * ``how``: One of ``'left'``, ``'right'``, ``'outer'``, ``'inner'``. Defaults
   to ``inner``. See below for more detailed description of each method.
 * ``sort``: Sort the result DataFrame by the join keys in lexicographical
   order. Defaults to ``True``, setting to ``False`` will improve performance
   substantially in many cases.
 * ``suffixes``: A tuple of string suffixes to apply to overlapping
   columns. Defaults to ``('_x', '_y')``.
-* ``copy``: Always copy data (default ``True``) from the passed DataFrame
+* ``copy``: Always copy data (default ``True``) from the passed DataFrame or named Series
   objects, even when reindexing is not necessary. Cannot be avoided in many
   cases but may improve performance / memory usage. The cases where copying
   can be avoided are somewhat pathological but this option is provided
   nonetheless.
 * ``indicator``: Add a column to the output DataFrame called ``_merge``
   with information on the source of each row. ``_merge`` is Categorical-type
   and takes on a value of ``left_only`` for observations whose merge key
-  only appears in ``'left'`` DataFrame, ``right_only`` for observations whose
-  merge key only appears in ``'right'`` DataFrame, and ``both`` if the
+  only appears in ``'left'`` DataFrame or Series, ``right_only`` for observations whose
+  merge key only appears in ``'right'`` DataFrame or Series, and ``both`` if the
   observation's merge key is found in both.
 
 * ``validate`` : string, default None.
@@ -584,10 +584,10 @@ all standard database join operations between ``DataFrame`` objects:
 
    Support for specifying index levels as the ``on``, ``left_on``, and
    ``right_on`` parameters was added in version 0.23.0.
+   Support for merging named ``Series`` objects was added in version 0.24.0.
 
-The return type will be the same as ``left``. If ``left`` is a ``DataFrame``
-and ``right`` is a subclass of DataFrame, the return type will still be
-``DataFrame``.
+The return type will be the same as ``left``. If ``left`` is a ``DataFrame`` or named ``Series``
+and ``right`` is a subclass of ``DataFrame``, the return type will still be ``DataFrame``.
 
 ``merge`` is a function in the pandas namespace, and it is also available as a
 ``DataFrame`` instance method :meth:`~DataFrame.merge`, with the calling
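
The documentation change above describes merging a `DataFrame` with a named `Series`. A minimal sketch of that usage, assuming pandas 0.24.0 or later; the keys and values are invented for illustration.

```python
import pandas as pd

left = pd.DataFrame({"key": ["K0", "K1", "K2"], "a": [1, 2, 3]})

# The Series must carry a name; that name labels its column in the result.
right = pd.Series([10, 30, 40], index=["K0", "K2", "K3"], name="b")

# Join the "key" column of the frame against the index of the Series.
# how="inner" is the default, so only keys present on both sides survive.
merged = pd.merge(left, right, left_on="key", right_index=True)

print(merged)
```

Only `K0` and `K2` appear on both sides, so the result has two rows and the columns `key`, `a` and `b`.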

doc/source/style.ipynb

+4-1
@@ -985,7 +985,10 @@
     "- `vertical-align`\n",
     "- `white-space: nowrap`\n",
     "\n",
-    "Only CSS2 named colors and hex colors of the form `#rgb` or `#rrggbb` are currently supported."
+    "Only CSS2 named colors and hex colors of the form `#rgb` or `#rrggbb` are currently supported.\n",
+    "\n",
+    "The following pseudo CSS properties are also available to set excel specific style properties:\n",
+    "- `number-format`\n"
    ]
   },
   {

doc/source/whatsnew.rst

+2
@@ -20,6 +20,8 @@ These are new features and improvements of note in each release.
 
 .. include:: whatsnew/v0.24.0.txt
 
+.. include:: whatsnew/v0.23.3.txt
+
 .. include:: whatsnew/v0.23.2.txt
 
 .. include:: whatsnew/v0.23.1.txt

doc/source/whatsnew/v0.23.0.txt

-1
@@ -1245,7 +1245,6 @@ Offsets
 - Bug in :class:`FY5253` where ``datetime`` addition and subtraction incremented incorrectly for dates on the year-end but not normalized to midnight (:issue:`18854`)
 - Bug in :class:`FY5253` where date offsets could incorrectly raise an ``AssertionError`` in arithmetic operations (:issue:`14774`)
 
-
 Numeric
 ^^^^^^^
 - Bug in :class:`Series` constructor with an int or float list where specifying ``dtype=str``, ``dtype='str'`` or ``dtype='U'`` failed to convert the data elements to strings (:issue:`16605`)

doc/source/whatsnew/v0.23.1.txt

+5
@@ -6,6 +6,11 @@ v0.23.1 (June 12, 2018)
 This is a minor bug-fix release in the 0.23.x series and includes some small regression fixes
 and bug fixes. We recommend that all users upgrade to this version.
 
+.. warning::
+
+   Starting January 1, 2019, pandas feature releases will support Python 3 only.
+   See :ref:`install.dropping-27` for more.
+
 .. contents:: What's new in v0.23.1
     :local:
     :backlinks: none
:backlinks: none

doc/source/whatsnew/v0.23.2.txt

+4
@@ -11,6 +11,10 @@ and bug fixes. We recommend that all users upgrade to this version.
 Pandas 0.23.2 is first pandas release that's compatible with
 Python 3.7 (:issue:`20552`)
 
+.. warning::
+
+   Starting January 1, 2019, pandas feature releases will support Python 3 only.
+   See :ref:`install.dropping-27` for more.
 
 .. contents:: What's new in v0.23.2
     :local:

doc/source/whatsnew/v0.23.4.txt

+10-1
@@ -6,6 +6,10 @@ v0.23.4
 This is a minor bug-fix release in the 0.23.x series and includes some small regression fixes
 and bug fixes. We recommend that all users upgrade to this version.
 
+.. warning::
+
+   Starting January 1, 2019, pandas feature releases will support Python 3 only.
+   See :ref:`install.dropping-27` for more.
 
 .. contents:: What's new in v0.23.4
     :local:
@@ -16,7 +20,7 @@ and bug fixes. We recommend that all users upgrade to this version.
 Fixed Regressions
 ~~~~~~~~~~~~~~~~~
 
--
+- Python 3.7 with Windows gave all missing values for rolling variance calculations (:issue:`21813`)
 -
 
 .. _whatsnew_0234.bug_fixes:
@@ -27,6 +31,7 @@ Bug Fixes
 **Groupby/Resample/Rolling**
 
 - Bug where calling :func:`DataFrameGroupBy.agg` with a list of functions including ``ohlc`` as the non-initial element would raise a ``ValueError`` (:issue:`21716`)
+- Bug in ``roll_quantile`` caused a memory leak when calling ``.rolling(...).quantile(q)`` with ``q`` in (0,1) (:issue:`21965`)
 -
 
 **Conversion**
@@ -58,3 +63,7 @@ Bug Fixes
 
 -
 -
+
+**Missing**
+
+- Bug in :func:`Series.clip` and :func:`DataFrame.clip` cannot accept list-like threshold containing ``NaN`` (:issue:`19992`)
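
The last entry concerns list-like thresholds that contain ``NaN``. The sketch below shows the kind of call involved, assuming the behavior described in the `clip` docstring, where a missing threshold leaves the corresponding value unclipped. The data is invented for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 5, 3, 8])

# Element-wise lower bounds; NaN means "no bound at this position".
lower = [2, np.nan, 4, 3]
clipped = s.clip(lower=lower)

print(clipped.tolist())
```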
