Commit 0ec38e2

Author: awu42
Merge remote-tracking branch 'upstream/master' into new-feature
2 parents 56bfc44 + e31c5ad commit 0ec38e2

119 files changed: +2152 -6840 lines changed

asv_bench/benchmarks/reshape.py

+3
Original file line number | Diff line number | Diff line change
@@ -161,6 +161,9 @@ def time_pivot_table_categorical_observed(self):
161161
observed=True,
162162
)
163163

164+
def time_pivot_table_margins_only_column(self):
165+
self.df.pivot_table(columns=["key2", "key3"], margins=True)
166+
164167

165168
class Crosstab:
166169
def setup(self):

ci/deps/azure-37-locale.yaml

+3
@@ -34,3 +34,6 @@ dependencies:
3434
- xlsxwriter
3535
- xlwt
3636
- pyarrow>=0.15
37+
- pip
38+
- pip:
39+
- pyxlsb

ci/deps/azure-macos-36.yaml

+1
@@ -33,3 +33,4 @@ dependencies:
3333
- pip
3434
- pip:
3535
- pyreadstat
36+
- pyxlsb

ci/deps/azure-windows-37.yaml

+3
@@ -35,3 +35,6 @@ dependencies:
3535
- xlsxwriter
3636
- xlwt
3737
- pyreadstat
38+
- pip
39+
- pip:
40+
- pyxlsb

ci/deps/travis-36-cov.yaml

+1
@@ -51,3 +51,4 @@ dependencies:
5151
- coverage
5252
- pandas-datareader
5353
- python-dateutil
54+
- pyxlsb

ci/print_skipped.py

+1-1
@@ -1,4 +1,4 @@
1-
#!/usr/bin/env python
1+
#!/usr/bin/env python3
22
import os
33
import xml.etree.ElementTree as et
44

doc/make.py

+1-1
@@ -1,4 +1,4 @@
1-
#!/usr/bin/env python
1+
#!/usr/bin/env python3
22
"""
33
Python script for building documentation.
44

doc/source/getting_started/install.rst

+1
@@ -264,6 +264,7 @@ pyarrow 0.12.0 Parquet, ORC (requires 0.13.0), and
264264
pymysql 0.7.11 MySQL engine for sqlalchemy
265265
pyreadstat SPSS files (.sav) reading
266266
pytables 3.4.2 HDF5 reading / writing
267+
pyxlsb 1.0.5 Reading for xlsb files
267268
qtpy Clipboard I/O
268269
s3fs 0.3.0 Amazon S3 access
269270
tabulate 0.8.3 Printing in Markdown-friendly format (see `tabulate`_)

doc/source/user_guide/io.rst

+70-42
@@ -23,7 +23,7 @@ The pandas I/O API is a set of top level ``reader`` functions accessed like
2323
text;`JSON <https://www.json.org/>`__;:ref:`read_json<io.json_reader>`;:ref:`to_json<io.json_writer>`
2424
text;`HTML <https://en.wikipedia.org/wiki/HTML>`__;:ref:`read_html<io.read_html>`;:ref:`to_html<io.html>`
2525
text; Local clipboard;:ref:`read_clipboard<io.clipboard>`;:ref:`to_clipboard<io.clipboard>`
26-
binary;`MS Excel <https://en.wikipedia.org/wiki/Microsoft_Excel>`__;:ref:`read_excel<io.excel_reader>`;:ref:`to_excel<io.excel_writer>`
26+
;`MS Excel <https://en.wikipedia.org/wiki/Microsoft_Excel>`__;:ref:`read_excel<io.excel_reader>`;:ref:`to_excel<io.excel_writer>`
2727
binary;`OpenDocument <http://www.opendocumentformat.org>`__;:ref:`read_excel<io.ods>`;
2828
binary;`HDF5 Format <https://support.hdfgroup.org/HDF5/whatishdf5.html>`__;:ref:`read_hdf<io.hdf5>`;:ref:`to_hdf<io.hdf5>`
2929
binary;`Feather Format <https://github.com/wesm/feather>`__;:ref:`read_feather<io.feather>`;:ref:`to_feather<io.feather>`
@@ -2768,7 +2768,8 @@ Excel files
27682768

27692769
The :func:`~pandas.read_excel` method can read Excel 2003 (``.xls``)
27702770
files using the ``xlrd`` Python module. Excel 2007+ (``.xlsx``) files
2771-
can be read using either ``xlrd`` or ``openpyxl``.
2771+
can be read using either ``xlrd`` or ``openpyxl``. Binary Excel (``.xlsb``)
2772+
files can be read using ``pyxlsb``.
27722773
The :meth:`~DataFrame.to_excel` instance method is used for
27732774
saving a ``DataFrame`` to Excel. Generally the semantics are
27742775
similar to working with :ref:`csv<io.read_csv_table>` data.
@@ -3229,6 +3230,30 @@ OpenDocument spreadsheets match what can be done for `Excel files`_ using
32293230
Currently pandas only supports *reading* OpenDocument spreadsheets. Writing
32303231
is not implemented.
32313232

3233+
.. _io.xlsb:
3234+
3235+
Binary Excel (.xlsb) files
3236+
--------------------------
3237+
3238+
.. versionadded:: 1.0.0
3239+
3240+
The :func:`~pandas.read_excel` method can also read binary Excel files
3241+
using the ``pyxlsb`` module. The semantics and features for reading
3242+
binary Excel files mostly match what can be done for `Excel files`_ using
3243+
``engine='pyxlsb'``. ``pyxlsb`` does not recognize datetime types
3244+
in files and will return floats instead.
3245+
3246+
.. code-block:: python
3247+
3248+
# Returns a DataFrame
3249+
pd.read_excel('path_to_file.xlsb', engine='pyxlsb')
3250+
3251+
.. note::
3252+
3253+
Currently pandas only supports *reading* binary Excel files. Writing
3254+
is not implemented.
3255+
3256+
32323257
.. _io.clipboard:
32333258

32343259
Clipboard
@@ -4220,46 +4245,49 @@ Compression
42204245
all kinds of stores, not just tables. Two parameters are used to
42214246
control compression: ``complevel`` and ``complib``.
42224247

4223-
``complevel`` specifies if and how hard data is to be compressed.
4224-
``complevel=0`` and ``complevel=None`` disables
4225-
compression and ``0<complevel<10`` enables compression.
4226-
4227-
``complib`` specifies which compression library to use. If nothing is
4228-
specified the default library ``zlib`` is used. A
4229-
compression library usually optimizes for either good
4230-
compression rates or speed and the results will depend on
4231-
the type of data. Which type of
4232-
compression to choose depends on your specific needs and
4233-
data. The list of supported compression libraries:
4234-
4235-
- `zlib <https://zlib.net/>`_: The default compression library. A classic in terms of compression, achieves good compression rates but is somewhat slow.
4236-
- `lzo <https://www.oberhumer.com/opensource/lzo/>`_: Fast compression and decompression.
4237-
- `bzip2 <http://bzip.org/>`_: Good compression rates.
4238-
- `blosc <http://www.blosc.org/>`_: Fast compression and decompression.
4239-
4240-
Support for alternative blosc compressors:
4241-
4242-
- `blosc:blosclz <http://www.blosc.org/>`_ This is the
4243-
default compressor for ``blosc``
4244-
- `blosc:lz4
4245-
<https://fastcompression.blogspot.dk/p/lz4.html>`_:
4246-
A compact, very popular and fast compressor.
4247-
- `blosc:lz4hc
4248-
<https://fastcompression.blogspot.dk/p/lz4.html>`_:
4249-
A tweaked version of LZ4, produces better
4250-
compression ratios at the expense of speed.
4251-
- `blosc:snappy <https://google.github.io/snappy/>`_:
4252-
A popular compressor used in many places.
4253-
- `blosc:zlib <https://zlib.net/>`_: A classic;
4254-
somewhat slower than the previous ones, but
4255-
achieving better compression ratios.
4256-
- `blosc:zstd <https://facebook.github.io/zstd/>`_: An
4257-
extremely well balanced codec; it provides the best
4258-
compression ratios among the others above, and at
4259-
reasonably fast speed.
4260-
4261-
If ``complib`` is defined as something other than the
4262-
listed libraries a ``ValueError`` exception is issued.
4248+
* ``complevel`` specifies if and how hard data is to be compressed.
4249+
``complevel=0`` and ``complevel=None`` disables compression and
4250+
``0<complevel<10`` enables compression.
4251+
4252+
* ``complib`` specifies which compression library to use.
4253+
If nothing is specified the default library ``zlib`` is used. A
4254+
compression library usually optimizes for either good compression rates
4255+
or speed and the results will depend on the type of data. Which type of
4256+
compression to choose depends on your specific needs and data. The list
4257+
of supported compression libraries:
4258+
4259+
- `zlib <https://zlib.net/>`_: The default compression library.
4260+
A classic in terms of compression, achieves good compression
4261+
rates but is somewhat slow.
4262+
- `lzo <https://www.oberhumer.com/opensource/lzo/>`_: Fast
4263+
compression and decompression.
4264+
- `bzip2 <http://bzip.org/>`_: Good compression rates.
4265+
- `blosc <http://www.blosc.org/>`_: Fast compression and
4266+
decompression.
4267+
4268+
Support for alternative blosc compressors:
4269+
4270+
- `blosc:blosclz <http://www.blosc.org/>`_ This is the
4271+
default compressor for ``blosc``
4272+
- `blosc:lz4
4273+
<https://fastcompression.blogspot.dk/p/lz4.html>`_:
4274+
A compact, very popular and fast compressor.
4275+
- `blosc:lz4hc
4276+
<https://fastcompression.blogspot.dk/p/lz4.html>`_:
4277+
A tweaked version of LZ4, produces better
4278+
compression ratios at the expense of speed.
4279+
- `blosc:snappy <https://google.github.io/snappy/>`_:
4280+
A popular compressor used in many places.
4281+
- `blosc:zlib <https://zlib.net/>`_: A classic;
4282+
somewhat slower than the previous ones, but
4283+
achieving better compression ratios.
4284+
- `blosc:zstd <https://facebook.github.io/zstd/>`_: An
4285+
extremely well balanced codec; it provides the best
4286+
compression ratios among the others above, and at
4287+
reasonably fast speed.
4288+
4289+
If ``complib`` is defined as something other than the listed libraries a
4290+
``ValueError`` exception is issued.
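
Reviewer note on the reflowed compression text: the rules it describes can be sketched as a small validation function. This is an illustration of the documented behaviour only, not the pandas or PyTables implementation. The function name `resolve_compression` is hypothetical, and raising ``ValueError`` for an out-of-range ``complevel`` is an assumption, since the text only specifies the error for an unknown ``complib``.

```python
from typing import Optional, Tuple

# Library names exactly as listed in the documentation text above.
SUPPORTED_COMPLIBS = (
    "zlib", "lzo", "bzip2", "blosc",
    "blosc:blosclz", "blosc:lz4", "blosc:lz4hc",
    "blosc:snappy", "blosc:zlib", "blosc:zstd",
)


def resolve_compression(
    complevel: Optional[int] = None, complib: Optional[str] = None
) -> Optional[Tuple[int, str]]:
    """Return (complevel, complib) if compression is enabled, else None.

    Hypothetical helper that mirrors the documented rules.
    """
    # An unlisted library is rejected with ValueError, as documented.
    if complib is not None and complib not in SUPPORTED_COMPLIBS:
        raise ValueError(f"unsupported complib: {complib!r}")
    # complevel=0 and complevel=None both disable compression.
    if complevel is None or complevel == 0:
        return None
    # Assumption: levels outside 0 < complevel < 10 are treated as errors.
    if not 0 < complevel < 10:
        raise ValueError(f"complevel out of range: {complevel!r}")
    # zlib is the default library when none is given.
    return complevel, complib or "zlib"


print(resolve_compression())                 # compression disabled
print(resolve_compression(9))                # default library
print(resolve_compression(5, "blosc:zstd"))  # explicit blosc compressor
```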
42634291

42644292
.. note::
42654293

doc/source/user_guide/timeseries.rst

+5
@@ -1951,6 +1951,10 @@ The ``period`` dtype can be used in ``.astype(...)``. It allows one to change th
19511951
PeriodIndex partial string indexing
19521952
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
19531953

1954+
PeriodIndex now supports partial string slicing with non-monotonic indexes.
1955+
1956+
.. versionadded:: 1.1.0
1957+
19541958
You can pass in dates and strings to ``Series`` and ``DataFrame`` with ``PeriodIndex``, in the same manner as ``DatetimeIndex``. For details, refer to :ref:`DatetimeIndex Partial String Indexing <timeseries.partialindexing>`.
19551959

19561960
.. ipython:: python
@@ -1981,6 +1985,7 @@ As with ``DatetimeIndex``, the endpoints will be included in the result. The exa
19811985
19821986
dfp['2013-01-01 10H':'2013-01-01 11H']
19831987
1988+
19841989
Frequency conversion and resampling with PeriodIndex
19851990
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
19861991
The frequency of ``Period`` and ``PeriodIndex`` can be converted via the ``asfreq``

doc/source/whatsnew/v1.0.0.rst

+2-1
@@ -215,7 +215,8 @@ Other enhancements
215215
- :meth:`Styler.format` added the ``na_rep`` parameter to help format the missing values (:issue:`21527`, :issue:`28358`)
216216
- Roundtripping DataFrames with nullable integer, string and period data types to parquet
217217
(:meth:`~DataFrame.to_parquet` / :func:`read_parquet`) using the `'pyarrow'` engine
218-
now preserve those data types with pyarrow >= 0.16.0 (:issue:`20612`, :issue:`28371`).
218+
now preserve those data types with pyarrow >= 1.0.0 (:issue:`20612`).
219+
- :func:`read_excel` now can read binary Excel (``.xlsb``) files by passing ``engine='pyxlsb'``. For more details and example usage, see the :ref:`Binary Excel files documentation <io.xlsb>`. Closes :issue:`8540`.
219220
- The ``partition_cols`` argument in :meth:`DataFrame.to_parquet` now accepts a string (:issue:`27117`)
220221
- :func:`pandas.read_json` now parses ``NaN``, ``Infinity`` and ``-Infinity`` (:issue:`12213`)
221222
- :func:`to_parquet` now appropriately handles the ``schema`` argument for user defined schemas in the pyarrow engine. (:issue:`30270`)

doc/source/whatsnew/v1.1.0.rst

+39-5
@@ -13,14 +13,47 @@ including other versions of pandas.
1313
Enhancements
1414
~~~~~~~~~~~~
1515

16+
.. _whatsnew_110.period_index_partial_string_slicing:
17+
18+
Nonmonotonic PeriodIndex Partial String Slicing
19+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20+
:class:`PeriodIndex` now supports partial string slicing for non-monotonic indexes, mirroring :class:`DatetimeIndex` behavior (:issue:`31096`)
21+
22+
For example:
23+
24+
.. ipython:: python
25+
26+
dti = pd.date_range("2014-01-01", periods=30, freq="30D")
27+
pi = dti.to_period("D")
28+
ser_monotonic = pd.Series(np.arange(30), index=pi)
29+
shuffler = list(range(0, 30, 2)) + list(range(1, 31, 2))
30+
ser = ser_monotonic[shuffler]
31+
ser
32+
33+
.. ipython:: python
34+
ser["2014"]
35+
ser.loc["May 2015"]
36+
1637
.. _whatsnew_110.enhancements.other:
1738

1839
Other enhancements
1940
^^^^^^^^^^^^^^^^^^
2041

42+
- :class:`Styler` may now render CSS more efficiently where multiple cells have the same styling (:issue:`30876`)
2143
-
2244
-
2345

46+
.. ---------------------------------------------------------------------------
47+
48+
.. _whatsnew_110.api.other:
49+
50+
Other API changes
51+
^^^^^^^^^^^^^^^^^
52+
53+
- :meth:`Series.describe` will now show distribution percentiles for ``datetime`` dtypes, statistics ``first`` and ``last``
54+
will now be ``min`` and ``max`` to match with numeric dtypes in :meth:`DataFrame.describe` (:issue:`30164`)
55+
-
56+
-
2457

2558
.. ---------------------------------------------------------------------------
2659
@@ -133,18 +166,18 @@ Plotting
133166
Groupby/resample/rolling
134167
^^^^^^^^^^^^^^^^^^^^^^^^
135168

136-
-
137-
-
138-
169+
- Bug in :meth:`GroupBy.apply` raises ``ValueError`` when the ``by`` axis is not sorted and has duplicates and the applied ``func`` does not mutate passed in objects (:issue:`30667`)
139170

140171
Reshaping
141172
^^^^^^^^^
142173

143174
-
144175
- Bug in :meth:`DataFrame.pivot_table` when only MultiIndexed columns is set (:issue:`17038`)
176+
- Bug in :meth:`DataFrame.unstack` and :meth:`Series.unstack` can take tuple names in MultiIndexed data (:issue:`19966`)
177+
- Bug in :meth:`DataFrame.pivot_table` when ``margin`` is ``True`` and only ``column`` is defined (:issue:`31016`)
145178
- Fix incorrect error message in :meth:`DataFrame.pivot` when ``columns`` is set to ``None``. (:issue:`30924`)
146179
- Bug in :func:`crosstab` when inputs are two Series and have tuple names, the output will keep dummy MultiIndex as columns. (:issue:`18321`)
147-
180+
- Bug in :func:`concat` where the resulting indices are not copied when ``copy=True`` (:issue:`29879`)
148181

149182
Sparse
150183
^^^^^^
@@ -161,7 +194,8 @@ ExtensionArray
161194

162195
Other
163196
^^^^^
164-
-
197+
- Appending a dictionary to a :class:`DataFrame` without passing ``ignore_index=True`` will raise ``TypeError: Can only append a dict if ignore_index=True``
198+
instead of ``TypeError: Can only append a Series if ignore_index=True or if the Series has a name`` (:issue:`30871`)
165199
-
166200

167201
.. ---------------------------------------------------------------------------

doc/sphinxext/announce.py

+1-1
@@ -1,4 +1,4 @@
1-
#!/usr/bin/env python
1+
#!/usr/bin/env python3
22
# -*- encoding:utf-8 -*-
33
"""
44
Script to generate contributor and pull request lists

pandas/_libs/index.pyx

+9-4
@@ -72,9 +72,10 @@ cdef class IndexEngine:
7272
self.over_size_threshold = n >= _SIZE_CUTOFF
7373
self.clear_mapping()
7474

75-
def __contains__(self, object val):
75+
def __contains__(self, val: object) -> bool:
76+
# We assume before we get here:
77+
# - val is hashable
7678
self._ensure_mapping_populated()
77-
hash(val)
7879
return val in self.mapping
7980

8081
cpdef get_value(self, ndarray arr, object key, object tz=None):
@@ -415,7 +416,9 @@ cdef class DatetimeEngine(Int64Engine):
415416
raise TypeError(scalar)
416417
return scalar.value
417418

418-
def __contains__(self, object val):
419+
def __contains__(self, val: object) -> bool:
420+
# We assume before we get here:
421+
# - val is hashable
419422
cdef:
420423
int64_t loc, conv
421424

@@ -712,7 +715,9 @@ cdef class BaseMultiIndexCodesEngine:
712715

713716
return indexer
714717

715-
def __contains__(self, object val):
718+
def __contains__(self, val: object) -> bool:
719+
# We assume before we get here:
720+
# - val is hashable
716721
# Default __contains__ looks in the underlying mapping, which in this
717722
# case only contains integer representations.
718723
try:
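
Reviewer note on the `__contains__` hunks in `pandas/_libs/index.pyx`: the engine no longer calls `hash(val)` itself and instead documents hashability as a precondition. The pure-Python sketch below shows that division of responsibility under the assumption that the calling index performs the check. `ToyEngine` and `ToyIndex` are hypothetical stand-ins, not pandas classes.

```python
class ToyEngine:
    """Hypothetical stand-in for a mapping-backed index engine."""

    def __init__(self, values):
        # Map each value to its position, like a hash-table lookup.
        self.mapping = {value: pos for pos, value in enumerate(values)}

    def __contains__(self, val: object) -> bool:
        # We assume before we get here:
        #  - val is hashable
        return val in self.mapping


class ToyIndex:
    """Hypothetical caller that owns the hashability check."""

    def __init__(self, values):
        self._engine = ToyEngine(values)

    def __contains__(self, key: object) -> bool:
        # Raises TypeError for unhashable keys, which makes the
        # engine's precondition explicit at the call site.
        hash(key)
        return key in self._engine


idx = ToyIndex(["a", "b", "c"])
print("a" in idx)
print("z" in idx)
```

In plain Python a dict lookup would raise `TypeError` for an unhashable key anyway; the explicit `hash(key)` only makes the contract visible in the caller.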
