Commit b5a4112

Merge remote-tracking branch 'upstream/master' into integer-array-pow-2
2 parents 9e5a69c + 8841969 commit b5a4112

55 files changed, with 597 additions and 355 deletions.

ci/deps/azure-36-locale_slow.yaml

+1 -1
@@ -18,7 +18,7 @@ dependencies:
   - lxml
   - matplotlib=2.2.2
   - numpy=1.14.*
-  - openpyxl=2.4.8
+  - openpyxl=2.5.7
   - python-dateutil
   - python-blosc
   - pytz=2017.2

ci/deps/azure-36-minimum_versions.yaml

+2 -1
@@ -11,14 +11,15 @@ dependencies:
   - pytest-xdist>=1.21
   - hypothesis>=3.58.0
   - pytest-azurepipelines
+  - psutil

   # pandas dependencies
   - beautifulsoup4=4.6.0
   - bottleneck=1.2.1
   - jinja2=2.8
   - numexpr=2.6.2
   - numpy=1.13.3
-  - openpyxl=2.4.8
+  - openpyxl=2.5.7
   - pytables=3.4.2
   - python-dateutil=2.6.1
   - pytz=2017.2

doc/source/getting_started/install.rst

+2 -2
@@ -255,10 +255,10 @@ gcsfs 0.2.2 Google Cloud Storage access
 html5lib HTML parser for read_html (see :ref:`note <optional_html>`)
 lxml 3.8.0 HTML parser for read_html (see :ref:`note <optional_html>`)
 matplotlib 2.2.2 Visualization
-openpyxl 2.4.8 Reading / writing for xlsx files
+openpyxl 2.5.7 Reading / writing for xlsx files
 pandas-gbq 0.8.0 Google Big Query access
 psycopg2 PostgreSQL engine for sqlalchemy
-pyarrow 0.12.0 Parquet and feather reading / writing
+pyarrow 0.12.0 Parquet, ORC (requires 0.13.0), and feather reading / writing
 pymysql 0.7.11 MySQL engine for sqlalchemy
 pyreadstat SPSS files (.sav) reading
 pytables 3.4.2 HDF5 reading / writing

doc/source/reference/io.rst

+7
@@ -98,6 +98,13 @@ Parquet

    read_parquet

+ORC
+~~~
+.. autosummary::
+   :toctree: api/
+
+   read_orc
+
 SAS
 ~~~
 .. autosummary::

doc/source/user_guide/io.rst

+12 -3
@@ -28,6 +28,7 @@ The pandas I/O API is a set of top level ``reader`` functions accessed like
     binary;`HDF5 Format <https://support.hdfgroup.org/HDF5/whatishdf5.html>`__;:ref:`read_hdf<io.hdf5>`;:ref:`to_hdf<io.hdf5>`
     binary;`Feather Format <https://github.com/wesm/feather>`__;:ref:`read_feather<io.feather>`;:ref:`to_feather<io.feather>`
     binary;`Parquet Format <https://parquet.apache.org/>`__;:ref:`read_parquet<io.parquet>`;:ref:`to_parquet<io.parquet>`
+    binary;`ORC Format <//https://orc.apache.org/>`__;:ref:`read_orc<io.orc>`;
     binary;`Msgpack <https://msgpack.org/index.html>`__;:ref:`read_msgpack<io.msgpack>`;:ref:`to_msgpack<io.msgpack>`
     binary;`Stata <https://en.wikipedia.org/wiki/Stata>`__;:ref:`read_stata<io.stata_reader>`;:ref:`to_stata<io.stata_writer>`
     binary;`SAS <https://en.wikipedia.org/wiki/SAS_(software)>`__;:ref:`read_sas<io.sas_reader>`;
@@ -4858,6 +4859,17 @@ The above example creates a partitioned dataset that may look like:
 except OSError:
     pass

+.. _io.orc:
+
+ORC
+---
+
+.. versionadded:: 1.0.0
+
+Similar to the :ref:`parquet <io.parquet>` format, the `ORC Format <//https://orc.apache.org/>`__ is a binary columnar serialization
+for data frames. It is designed to make reading data frames efficient. Pandas provides *only* a reader for the
+ORC format, :func:`~pandas.read_orc`. This requires the `pyarrow <https://arrow.apache.org/docs/python/>`__ library.
+
 .. _io.sql:

 SQL queries
@@ -5761,6 +5773,3 @@ Space on disk (in bytes)
 24009288 Oct 10 06:43 test_fixed_compress.hdf
 24458940 Oct 10 06:44 test_table.hdf
 24458940 Oct 10 06:44 test_table_compress.hdf
-
-
-

doc/source/whatsnew/v1.0.0.rst

+6 -4
@@ -424,7 +424,7 @@ Optional libraries below the lowest tested version may still work, but are not c
 +-----------------+-----------------+---------+
 | matplotlib      | 2.2.2           |         |
 +-----------------+-----------------+---------+
-| openpyxl        | 2.4.8           |         |
+| openpyxl        | 2.5.7           |    X    |
 +-----------------+-----------------+---------+
 | pyarrow         | 0.12.0          |    X    |
 +-----------------+-----------------+---------+
@@ -541,7 +541,7 @@ or ``matplotlib.Axes.plot``. See :ref:`plotting.formatters` for more.
 - Removed the previously deprecated :meth:`Series.compound` and :meth:`DataFrame.compound` (:issue:`26405`)
 - Changed the the default value of `inplace` in :meth:`DataFrame.set_index` and :meth:`Series.set_axis`. It now defaults to ``False`` (:issue:`27600`)
 - Removed the previously deprecated :attr:`Series.cat.categorical`, :attr:`Series.cat.index`, :attr:`Series.cat.name` (:issue:`24751`)
-- :func:`to_datetime` no longer accepts "box" argument, always returns :class:`DatetimeIndex` or :class:`Index`, :class:`Series`, or :class:`DataFrame` (:issue:`24486`)
+- :func:`to_datetime` and :func:`to_timedelta` no longer accept "box" argument, always returns :class:`DatetimeIndex`, :class:`TimedeltaIndex`, :class:`Index`, :class:`Series`, or :class:`DataFrame` (:issue:`24486`)
 - :func:`to_timedelta`, :class:`Timedelta`, and :class:`TimedeltaIndex` no longer allow "M", "y", or "Y" for the "unit" argument (:issue:`23264`)
 - Removed the previously deprecated ``time_rule`` keyword from (non-public) :func:`offsets.generate_range`, which has been moved to :func:`core.arrays._ranges.generate_range` (:issue:`24157`)
 - :meth:`DataFrame.loc` or :meth:`Series.loc` with listlike indexers and missing labels will no longer reindex (:issue:`17295`)
@@ -713,6 +713,7 @@ Numeric
 - Bug in :class:`NumericIndex` construction that caused indexing to fail when integers in the ``np.uint64`` range were used (:issue:`28023`)
 - Bug in :class:`NumericIndex` construction that caused :class:`UInt64Index` to be casted to :class:`Float64Index` when integers in the ``np.uint64`` range were used to index a :class:`DataFrame` (:issue:`28279`)
 - Bug in :meth:`Series.interpolate` when using method=`index` with an unsorted index, would previously return incorrect results. (:issue:`21037`)
+- Bug in :meth:`DataFrame.round` where a :class:`DataFrame` with a :class:`CategoricalIndex` of :class:`IntervalIndex` columns would incorrectly raise a ``TypeError`` (:issue:`30063`)

 Conversion
 ^^^^^^^^^^
@@ -730,7 +731,7 @@ Strings
 Interval
 ^^^^^^^^

--
+- Bug in :meth:`IntervalIndex.get_indexer` where a :class:`Categorical` or :class:`CategoricalIndex` ``target`` would incorrectly raise a ``TypeError`` (:issue:`30063`)
 -

 Indexing
@@ -742,6 +743,7 @@ Indexing
 - Fix assignment of column via `.loc` with numpy non-ns datetime type (:issue:`27395`)
 - Bug in :meth:`Float64Index.astype` where ``np.inf`` was not handled properly when casting to an integer dtype (:issue:`28475`)
 - :meth:`Index.union` could fail when the left contained duplicates (:issue:`28257`)
+- Bug when indexing with ``.loc`` where the index was a :class:`CategoricalIndex` with integer and float categories, a ValueError was raised (:issue:`17569`)
 - :meth:`Index.get_indexer_non_unique` could fail with `TypeError` in some cases, such as when searching for ints in a string index (:issue:`28257`)
 - Bug in :meth:`Float64Index.get_loc` incorrectly raising ``TypeError`` instead of ``KeyError`` (:issue:`29189`)

@@ -858,7 +860,7 @@ Other
 - Bug in :meth:`DataFrame.append` that raised ``IndexError`` when appending with empty list (:issue:`28769`)
 - Fix :class:`AbstractHolidayCalendar` to return correct results for
   years after 2030 (now goes up to 2200) (:issue:`27790`)
-- Fixed :class:`IntegerArray` returning ``NA`` rather than ``inf`` for operations dividing by 0 (:issue:`27398`)
+- Fixed :class:`IntegerArray` returning ``inf`` rather than ``NA`` for operations dividing by 0 (:issue:`27398`)
 - Fixed ``pow`` operations for :class:`IntegerArray` when the other value is ``0`` or ``1`` (:issue:`29997`)
 - Bug in :meth:`Series.count` raises if use_inf_as_na is enabled (:issue:`29478`)
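
The ``pow`` entry above concerns how missing values combine with the special exponents and bases ``0`` and ``1``. As a rough pure-Python illustration, the sketch below assumes the intended rule is that ``x ** 0`` and ``1 ** x`` evaluate to ``1`` even when the other operand is missing. ``masked_pow`` is a hypothetical helper using ``None`` for missing values; it is not the pandas implementation and handles only non-negative exponents.

```python
from typing import List, Optional


def masked_pow(
    bases: List[Optional[int]], exponents: List[Optional[int]]
) -> List[Optional[int]]:
    """Element-wise power where None marks a missing value (hypothetical)."""
    result: List[Optional[int]] = []
    for base, exp in zip(bases, exponents):
        if exp is not None and exp == 0:
            # Assumed rule: anything to the power 0 is 1, even a missing base.
            result.append(1)
        elif base is not None and base == 1:
            # Assumed rule: 1 to any power is 1, even a missing exponent.
            result.append(1)
        elif base is None or exp is None:
            # Otherwise a missing operand propagates.
            result.append(None)
        else:
            result.append(base ** exp)
    return result


values = masked_pow([None, 1, None, 2], [0, None, 2, 3])
```

Only the third element stays missing here, because neither special case applies to it.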

pandas/__init__.py

+1
@@ -168,6 +168,7 @@
     # misc
     read_clipboard,
     read_parquet,
+    read_orc,
     read_feather,
     read_gbq,
     read_html,

pandas/_libs/testing.pyx

+8 -8
@@ -66,7 +66,8 @@ cpdef assert_almost_equal(a, b,
                           check_less_precise=False,
                           bint check_dtype=True,
                           obj=None, lobj=None, robj=None):
-    """Check that left and right objects are almost equal.
+    """
+    Check that left and right objects are almost equal.

     Parameters
     ----------
@@ -89,7 +90,6 @@ cpdef assert_almost_equal(a, b,
         Specify right object name being compared, internally used to show
         appropriate assertion message
     """
-
     cdef:
         int decimal
         double diff = 0.0
@@ -127,9 +127,9 @@ cpdef assert_almost_equal(a, b,
             # classes can't be the same, to raise error
             assert_class_equal(a, b, obj=obj)

-        assert has_length(a) and has_length(b), (
-            f"Can't compare objects without length, one or both is invalid: "
-            f"({a}, {b})")
+        assert has_length(a) and has_length(b), ("Can't compare objects without "
+                                                 "length, one or both is invalid: "
+                                                 f"({a}, {b})")

         if a_is_ndarray and b_is_ndarray:
             na, nb = a.size, b.size
@@ -157,7 +157,7 @@ cpdef assert_almost_equal(a, b,
             else:
                 r = None

-            raise_assert_detail(obj, f'{obj} length are different', na, nb, r)
+            raise_assert_detail(obj, f"{obj} length are different", na, nb, r)

         for i in xrange(len(a)):
             try:
@@ -169,8 +169,8 @@ cpdef assert_almost_equal(a, b,

         if is_unequal:
             from pandas.util.testing import raise_assert_detail
-            msg = (f'{obj} values are different '
-                   f'({np.round(diff * 100.0 / na, 5)} %)')
+            msg = (f"{obj} values are different "
+                   f"({np.round(diff * 100.0 / na, 5)} %)")
             raise_assert_detail(obj, msg, lobj, robj)

         return True
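
The rewritten assertion message relies on implicit concatenation of adjacent string literals, where only the literal carrying the ``f`` prefix is interpolated. A small plain-Python sketch of that behaviour follows; the function name ``length_error_message`` is made up for illustration and does not exist in pandas.

```python
def length_error_message(a, b) -> str:
    """Build the message the same way the new hunk does (illustrative only)."""
    # Adjacent literals are joined into one string at compile time. Only the
    # last literal is an f-string, so only it substitutes {a} and {b}.
    return ("Can't compare objects without "
            "length, one or both is invalid: "
            f"({a}, {b})")


message = length_error_message(1, None)
```

Dropping the ``f`` prefix from the first literal changes nothing, since that literal has no placeholders.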

pandas/_libs/tslibs/timedeltas.pyx

+1 -1
@@ -1460,7 +1460,7 @@ class Timedelta(_Timedelta):
                 # also timedelta-like
                 return _broadcast_floordiv_td64(self.value, other, _rfloordiv)

-            # Includes integer array // Timedelta, deprecated in GH#19761
+            # Includes integer array // Timedelta, disallowed in GH#19761
             raise TypeError(f'Invalid dtype {other.dtype} for __floordiv__')

         elif is_float_object(other) and util.is_nan(other):

pandas/compat/_optional.py

+1 -1
@@ -14,7 +14,7 @@
     "matplotlib": "2.2.2",
     "numexpr": "2.6.2",
     "odfpy": "1.3.0",
-    "openpyxl": "2.4.8",
+    "openpyxl": "2.5.7",
     "pandas_gbq": "0.8.0",
     "pyarrow": "0.12.0",
     "pytables": "3.4.2",

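The mapping in ``pandas/compat/_optional.py`` records the lowest supported release of each optional dependency. As a hedged sketch of how such a table can gate a dependency, the code below compares an installed version string against a recorded minimum. ``MINIMUM_VERSIONS``, ``parse_version`` and ``meets_minimum`` are hypothetical names, the sketch assumes purely numeric dotted versions, and it does not reproduce the real logic in that module.

```python
from typing import Dict, Tuple

# Subset of the minimum versions shown in the hunk above.
MINIMUM_VERSIONS: Dict[str, str] = {
    "openpyxl": "2.5.7",
    "pyarrow": "0.12.0",
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "2.5.7" into (2, 5, 7); assumes purely numeric components."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(name: str, installed: str) -> bool:
    """Return True if `installed` is at least the minimum recorded for `name`."""
    return parse_version(installed) >= parse_version(MINIMUM_VERSIONS[name])


old_openpyxl_ok = meets_minimum("openpyxl", "2.4.8")
new_openpyxl_ok = meets_minimum("openpyxl", "2.5.7")
```

Comparing integer tuples rather than raw strings avoids ordering ``"2.10.0"`` before ``"2.9.9"``.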
pandas/compat/pickle_compat.py

+4 -3
@@ -4,7 +4,7 @@

 import copy
 import pickle as pkl
-from typing import TYPE_CHECKING
+from typing import TYPE_CHECKING, Optional
 import warnings

 from pandas import Index
@@ -219,8 +219,9 @@ def load_newobj_ex(self):
     pass


-def load(fh, encoding=None, is_verbose=False):
-    """load a pickle, with a provided encoding
+def load(fh, encoding: Optional[str] = None, is_verbose: bool = False):
+    """
+    Load a pickle, with a provided encoding,

     Parameters
     ----------
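
The new annotations state that ``encoding`` is an optional string. For orientation, the standard library's ``pickle.load`` accepts an ``encoding`` keyword of the same kind. The sketch below round-trips an object through an in-memory buffer. ``load_with_encoding`` is a hypothetical stand-in and leaves out the compatibility unpickler that ``pickle_compat.py`` provides.

```python
import io
import pickle
from typing import Any, BinaryIO, Optional


def load_with_encoding(fh: BinaryIO, encoding: Optional[str] = None) -> Any:
    """Load a pickle from `fh`, passing `encoding` through only when given."""
    fh.seek(0)
    if encoding is None:
        return pickle.load(fh)
    # `encoding` tells pickle how to decode 8-bit strings written by Python 2.
    return pickle.load(fh, encoding=encoding)


buffer = io.BytesIO(pickle.dumps({"a": [1, 2, 3]}))
restored = load_with_encoding(buffer, encoding="latin1")
```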
