
Commit 7018a8a

Merge remote-tracking branch 'upstream/master' into fix-nunique-groupby
2 parents c1b52ba + a76df79 commit 7018a8a


49 files changed, +605 -239 lines changed

ci/deps/azure-macos-35.yaml

+1
@@ -22,6 +22,7 @@ dependencies:
  - xlrd
  - xlsxwriter
  - xlwt
+ - pip
  - pip:
  - pyreadstat
  # universal

ci/run_tests.sh

+6 -5
@@ -50,9 +50,10 @@ do
  # if no tests are found (the case of "single and slow"), pytest exits with code 5, and would make the script fail, if not for the below code
  sh -c "$PYTEST_CMD; ret=\$?; [ \$ret = 5 ] && exit 0 || exit \$ret"

- if [[ "$COVERAGE" && $? == 0 ]]; then
- echo "uploading coverage for $TYPE tests"
- echo "bash <(curl -s https://codecov.io/bash) -Z -c -F $TYPE -f $COVERAGE_FNAME"
- bash <(curl -s https://codecov.io/bash) -Z -c -F $TYPE -f $COVERAGE_FNAME
- fi
+ # 2019-08-21 disabling because this is hitting HTTP 400 errors GH#27602
+ # if [[ "$COVERAGE" && $? == 0 && "$TRAVIS_BRANCH" == "master" ]]; then
+ # echo "uploading coverage for $TYPE tests"
+ # echo "bash <(curl -s https://codecov.io/bash) -Z -c -F $TYPE -f $COVERAGE_FNAME"
+ # bash <(curl -s https://codecov.io/bash) -Z -c -F $TYPE -f $COVERAGE_FNAME
+ # fi
  done
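The `sh -c` wrapper above treats pytest's exit code 5 ("no tests collected") as success and passes every other status through unchanged. A minimal Python sketch of the same mapping, assuming pytest is launched as `python -m pytest` in a subprocess (the helper names are hypothetical and not part of the pandas CI scripts):

```python
import subprocess
import sys

# pytest's documented exit code for "no tests were collected".
NO_TESTS_COLLECTED = 5


def normalize_exit_code(code: int) -> int:
    """Treat 'no tests collected' as success; pass any other code through."""
    return 0 if code == NO_TESTS_COLLECTED else code


def run_pytest(args):
    """Hypothetical helper: run pytest in a subprocess, return the normalized code."""
    proc = subprocess.run([sys.executable, "-m", "pytest", *args])
    return normalize_exit_code(proc.returncode)
```

Mapping only code 5 keeps real problems (1 for failed tests, 2 for interruption, 4 for usage errors) failing the job.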

doc/source/user_guide/enhancingperf.rst

+3 -3
@@ -243,9 +243,9 @@ We've gotten another big improvement. Let's check again where the time is spent:

  .. ipython:: python

- %prun -l 4 apply_integrate_f(df['a'].to_numpy(),
- df['b'].to_numpy(),
- df['N'].to_numpy())
+ %%prun -l 4 apply_integrate_f(df['a'].to_numpy(),
+ df['b'].to_numpy(),
+ df['N'].to_numpy())

  As one might expect, the majority of the time is now spent in ``apply_integrate_f``,
  so if we wanted to make anymore efficiencies we must continue to concentrate our
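`%prun -l 4` (and its cell form `%%prun`) profiles a statement in IPython and limits the report to four entries. Outside IPython, the standard library's `cProfile` and `pstats` produce a comparable report. In this sketch, `integrate_f` is a hypothetical pure-Python workload standing in for `apply_integrate_f`:

```python
import cProfile
import io
import pstats


def integrate_f(a, b, n):
    """Hypothetical toy workload: left Riemann sum of x**2 over [a, b]."""
    dx = (b - a) / n
    return sum((a + i * dx) ** 2 for i in range(n)) * dx


profiler = cProfile.Profile()
profiler.enable()
result = integrate_f(0.0, 1.0, 10_000)
profiler.disable()

# Collect the report in a string instead of printing straight to stdout.
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf)
# Sort by time spent inside each function and keep only 4 entries,
# roughly what `%prun -l 4` shows.
stats.sort_stats("tottime").print_stats(4)
report = buf.getvalue()
print(report)
```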

doc/source/user_guide/io.rst

+2
@@ -28,6 +28,7 @@ The pandas I/O API is a set of top level ``reader`` functions accessed like
  :delim: ;

  text;`CSV <https://en.wikipedia.org/wiki/Comma-separated_values>`__;:ref:`read_csv<io.read_csv_table>`;:ref:`to_csv<io.store_in_csv>`
+ text;Fixed-Width Text File;:ref:`read_fwf<io.fwf_reader>`
  text;`JSON <https://www.json.org/>`__;:ref:`read_json<io.json_reader>`;:ref:`to_json<io.json_writer>`
  text;`HTML <https://en.wikipedia.org/wiki/HTML>`__;:ref:`read_html<io.read_html>`;:ref:`to_html<io.html>`
  text; Local clipboard;:ref:`read_clipboard<io.clipboard>`;:ref:`to_clipboard<io.clipboard>`
@@ -1372,6 +1373,7 @@ should pass the ``escapechar`` option:
  print(data)
  pd.read_csv(StringIO(data), escapechar='\\')

+ .. _io.fwf_reader:
  .. _io.fwf:

  Files with fixed width columns
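The new table row points readers to `read_fwf` for fixed-width text. Here is a short usage sketch with made-up data. `colspecs` gives each field as a half-open `(start, end)` character interval, and filler spaces around each field are stripped:

```python
from io import StringIO

import pandas as pd

# Made-up fixed-width data: columns occupy characters [0, 4), [4, 7), [7, 8).
data = (
    "a   1  x\n"
    "bb  22 y\n"
)

df = pd.read_fwf(
    StringIO(data),
    colspecs=[(0, 4), (4, 7), (7, 8)],
    header=None,  # the data has no header row
    names=["key", "num", "flag"],
)
print(df)
```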

doc/source/whatsnew/v0.25.1.rst

+28 -78
@@ -1,127 +1,96 @@
  .. _whatsnew_0251:

- What's new in 0.25.1 (July XX, 2019)
- ------------------------------------
-
- Enhancements
- ~~~~~~~~~~~~
+ What's new in 0.25.1 (August 21, 2019)
+ --------------------------------------

+ These are the changes in pandas 0.25.1. See :ref:`release` for a full changelog
+ including other versions of pandas.

- .. _whatsnew_0251.enhancements.other:
-
- Other enhancements
- ^^^^^^^^^^^^^^^^^^
+ I/O and LZMA
+ ~~~~~~~~~~~~

- -
- -
- -
+ Some users may unknowingly have an incomplete Python installation lacking the `lzma` module from the standard library. In this case, `import pandas` failed due to an `ImportError` (:issue: `27575`).
+ Pandas will now warn, rather than raising an `ImportError` if the `lzma` module is not present. Any subsequent attempt to use `lzma` methods will raise a `RuntimeError`.
+ A possible fix for the lack of the `lzma` module is to ensure you have the necessary libraries and then re-install Python.
+ For example, on MacOS installing Python with `pyenv` may lead to an incomplete Python installation due to unmet system dependencies at compilation time (like `xz`). Compilation will succeed, but Python might fail at run time. The issue can be solved by installing the necessary dependencies and then re-installing Python.

  .. _whatsnew_0251.bug_fixes:

  Bug fixes
  ~~~~~~~~~

-
  Categorical
  ^^^^^^^^^^^

- - Bug in :meth:`Categorical.fillna` would replace all values, not just those that are ``NaN`` (:issue:`26215`)
- -
+ - Bug in :meth:`Categorical.fillna` that would replace all values, not just those that are ``NaN`` (:issue:`26215`)

  Datetimelike
  ^^^^^^^^^^^^
- - Bug in :func:`to_datetime` where passing a timezone-naive :class:`DatetimeArray` or :class:`DatetimeIndex` and ``utc=True`` would incorrectly return a timezone-naive result (:issue:`27733`)
- -
- -
- -

- Timedelta
- ^^^^^^^^^
-
- -
- -
- -
+ - Bug in :func:`to_datetime` where passing a timezone-naive :class:`DatetimeArray` or :class:`DatetimeIndex` and ``utc=True`` would incorrectly return a timezone-naive result (:issue:`27733`)
+ - Bug in :meth:`Period.to_timestamp` where a :class:`Period` outside the :class:`Timestamp` implementation bounds (roughly 1677-09-21 to 2262-04-11) would return an incorrect :class:`Timestamp` instead of raising ``OutOfBoundsDatetime`` (:issue:`19643`)
+ - Bug in iterating over :class:`DatetimeIndex` when the underlying data is read-only (:issue:`28055`)

  Timezones
  ^^^^^^^^^

  - Bug in :class:`Index` where a numpy object array with a timezone aware :class:`Timestamp` and ``np.nan`` would not return a :class:`DatetimeIndex` (:issue:`27011`)
- -
- -

  Numeric
  ^^^^^^^
+
  - Bug in :meth:`Series.interpolate` when using a timezone aware :class:`DatetimeIndex` (:issue:`27548`)
  - Bug when printing negative floating point complex numbers would raise an ``IndexError`` (:issue:`27484`)
- -
- -
+ - Bug where :class:`DataFrame` arithmetic operators such as :meth:`DataFrame.mul` with a :class:`Series` with axis=1 would raise an ``AttributeError`` on :class:`DataFrame` larger than the minimum threshold to invoke numexpr (:issue:`27636`)
+ - Bug in :class:`DataFrame` arithmetic where missing values in results were incorrectly masked with ``NaN`` instead of ``Inf`` (:issue:`27464`)

  Conversion
  ^^^^^^^^^^

  - Improved the warnings for the deprecated methods :meth:`Series.real` and :meth:`Series.imag` (:issue:`27610`)
- -
- -
-
- Strings
- ^^^^^^^
-
- -
- -
- -
-

  Interval
  ^^^^^^^^
+
  - Bug in :class:`IntervalIndex` where `dir(obj)` would raise ``ValueError`` (:issue:`27571`)
- -
- -
- -

  Indexing
  ^^^^^^^^

  - Bug in partial-string indexing returning a NumPy array rather than a ``Series`` when indexing with a scalar like ``.loc['2015']`` (:issue:`27516`)
  - Break reference cycle involving :class:`Index` and other index classes to allow garbage collection of index objects without running the GC. (:issue:`27585`, :issue:`27840`)
  - Fix regression in assigning values to a single column of a DataFrame with a ``MultiIndex`` columns (:issue:`27841`).
- -
+ - Fix regression in ``.ix`` fallback with an ``IntervalIndex`` (:issue:`27865`).

  Missing
  ^^^^^^^

- -
- -
- -
-
- MultiIndex
- ^^^^^^^^^^
-
- -
- -
- -
+ - Bug in :func:`pandas.isnull` or :func:`pandas.isna` when the input is a type e.g. ``type(pandas.Series())`` (:issue:`27482`)

  I/O
  ^^^

  - Avoid calling ``S3File.s3`` when reading parquet, as this was removed in s3fs version 0.3.0 (:issue:`27756`)
  - Better error message when a negative header is passed in :func:`pandas.read_csv` (:issue:`27779`)
- -
+ - Follow the ``min_rows`` display option (introduced in v0.25.0) correctly in the HTML repr in the notebook (:issue:`27991`).

  Plotting
  ^^^^^^^^

- - Added a pandas_plotting_backends entrypoint group for registering plot backends. See :ref:`extending.plotting-backends` for more (:issue:`26747`).
+ - Added a ``pandas_plotting_backends`` entrypoint group for registering plot backends. See :ref:`extending.plotting-backends` for more (:issue:`26747`).
+ - Fixed the re-instatement of Matplotlib datetime converters after calling
+ :meth:`pandas.plotting.deregister_matplotlib_converters` (:issue:`27481`).
  - Fix compatibility issue with matplotlib when passing a pandas ``Index`` to a plot call (:issue:`27775`).
- -

  Groupby/resample/rolling
  ^^^^^^^^^^^^^^^^^^^^^^^^

+ - Fixed regression in :meth:`pands.core.groupby.DataFrameGroupBy.quantile` raising when multiple quantiles are given (:issue:`27526`)
  - Bug in :meth:`pandas.core.groupby.DataFrameGroupBy.transform` where applying a timezone conversion lambda function would drop timezone information (:issue:`27496`)
+ - Bug in :meth:`pandas.core.groupby.GroupBy.nth` where ``observed=False`` was being ignored for Categorical groupers (:issue:`26385`)
  - Bug in windowing over read-only arrays (:issue:`27766`)
  - Fixed segfault in `pandas.core.groupby.DataFrameGroupBy.quantile` when an invalid quantile was passed (:issue:`27470`)
  - Bug in :meth:`pandas.core.groupby.SeriesGroupBy.nunique` where ``NaT`` values were interfering with the count of unique values (:issue:`27951`)
- -

  Reshaping
  ^^^^^^^^^
@@ -133,32 +102,13 @@ Reshaping

  Sparse
  ^^^^^^
- - Bug in reductions for :class:`Series` with Sparse dtypes (:issue:`27080`)
- -
- -
- -

-
- Build Changes
- ^^^^^^^^^^^^^
-
- -
- -
- -
-
- ExtensionArray
- ^^^^^^^^^^^^^^
-
- -
- -
- -
+ - Bug in reductions for :class:`Series` with Sparse dtypes (:issue:`27080`)

  Other
  ^^^^^
+
  - Bug in :meth:`Series.replace` and :meth:`DataFrame.replace` when replacing timezone-aware timestamps using a dict-like replacer (:issue:`27720`)
- -
- -
- -

  .. _whatsnew_0.251.contributors:

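The whatsnew entry for `SeriesGroupBy.nunique` (GH#27951) says `NaT` values interfered with the count of unique values. A minimal repro of the expected behavior, using made-up data. Because `nunique()` drops missing values by default, `NaT` should not affect the counts:

```python
import pandas as pd

# Toy data: group "a" has one distinct timestamp plus a missing value (NaT).
df = pd.DataFrame(
    {
        "key": ["a", "a", "a", "b", "b"],
        "ts": pd.to_datetime(
            ["2019-01-01", None, "2019-01-01", "2019-01-02", "2019-01-03"]
        ),
    }
)

# nunique() excludes missing values by default (dropna=True),
# so the NaT in group "a" must not be counted as a distinct value.
counts = df.groupby("key")["ts"].nunique()
print(counts)
```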
doc/source/whatsnew/v0.7.3.rst

-6
@@ -25,8 +25,6 @@ New features
  from pandas.tools.plotting import scatter_matrix
  scatter_matrix(df, alpha=0.2) # noqa F821

- .. image:: ../savefig/scatter_matrix_kde.png
- :width: 5in

  - Add ``stacked`` argument to Series and DataFrame's ``plot`` method for
  :ref:`stacked bar plots <visualization.barplot>`.
@@ -35,15 +33,11 @@ New features

  df.plot(kind='bar', stacked=True) # noqa F821

- .. image:: ../savefig/bar_plot_stacked_ex.png
- :width: 4in

  .. code-block:: python

  df.plot(kind='barh', stacked=True) # noqa F821

- .. image:: ../savefig/barh_plot_stacked_ex.png
- :width: 4in

  - Add log x and y :ref:`scaling options <visualization.basic>` to
  ``DataFrame.plot`` and ``Series.plot``

doc/source/whatsnew/v1.0.0.rst

+2 -1
@@ -158,7 +158,7 @@ MultiIndex
  I/O
  ^^^

- -
+ - :meth:`read_csv` now accepts binary mode file buffers when using the Python csv engine (:issue:`23779`)
  -

  Plotting
@@ -168,6 +168,7 @@ Plotting
  -
  - Bug in :meth:`DataFrame.plot` producing incorrect legend markers when plotting multiple series on the same axis (:issue:`18222`)
  - Bug in :meth:`DataFrame.plot` when ``kind='box'`` and data contains datetime or timedelta data. These types are now automatically dropped (:issue:`22799`)
+ - Bug in :meth:`DataFrame.plot.line` and :meth:`DataFrame.plot.area` produce wrong xlim in x-axis (:issue:`27686`, :issue:`25160`, :issue:`24784`)

  Groupby/resample/rolling
  ^^^^^^^^^^^^^^^^^^^^^^^^
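The v1.0.0 entry says `read_csv` accepts binary-mode buffers with the Python engine (GH#23779). Here is a small usage sketch with made-up data. `BytesIO` stands in for a file opened with `open(path, "rb")`:

```python
from io import BytesIO

import pandas as pd

# A binary-mode buffer, as returned by open(path, "rb").
raw = BytesIO(b"a,b\n1,2\n3,4\n")

# The Python parsing engine reads the bytes buffer directly.
df = pd.read_csv(raw, engine="python")
print(df)
```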

pandas/_libs/parsers.pyx

+5 -3
@@ -2,7 +2,6 @@
  # See LICENSE for the license
  import bz2
  import gzip
- import lzma
  import os
  import sys
  import time
@@ -59,9 +58,12 @@ from pandas.core.arrays import Categorical
  from pandas.core.dtypes.concat import union_categoricals
  import pandas.io.common as icom

+ from pandas.compat import _import_lzma, _get_lzma_file
  from pandas.errors import (ParserError, DtypeWarning,
  EmptyDataError, ParserWarning)

+ lzma = _import_lzma()
+
  # Import CParserError as alias of ParserError for backwards compatibility.
  # Ultimately, we want to remove this import. See gh-12665 and gh-14479.
  CParserError = ParserError
@@ -645,9 +647,9 @@ cdef class TextReader:
  'zip file %s', str(zip_names))
  elif self.compression == 'xz':
  if isinstance(source, str):
- source = lzma.LZMAFile(source, 'rb')
+ source = _get_lzma_file(lzma)(source, 'rb')
  else:
- source = lzma.LZMAFile(filename=source)
+ source = _get_lzma_file(lzma)(filename=source)
  else:
  raise ValueError('Unrecognized compression type: %s' %
  self.compression)
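After this change the parser no longer imports `lzma` unconditionally. It calls `_import_lzma()` at load time and resolves the file class through `_get_lzma_file(lzma)` only when xz compression is requested. This matches the whatsnew text: a warning at import, and a `RuntimeError` on use. The helpers' bodies are not part of this diff, so the following is a hedged sketch of the pattern with hypothetical names (`import_optional`, `get_lzma_file_class`), not the `pandas.compat` source:

```python
import importlib
import warnings
from types import ModuleType
from typing import Optional


def import_optional(name: str) -> Optional[ModuleType]:
    """Import a stdlib module that an incomplete Python build may lack.

    Warns and returns None instead of raising ImportError, so that the
    importing module itself still loads.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        warnings.warn(
            f"Could not import the {name!r} module; your Python installation "
            "may be incomplete. Features that need it will raise RuntimeError."
        )
        return None


def get_lzma_file_class(lzma_module: Optional[ModuleType]):
    """Return lzma.LZMAFile, or raise RuntimeError if lzma is unavailable."""
    if lzma_module is None:
        raise RuntimeError(
            "lzma module not available; re-installing Python with the "
            "required system libraries (e.g. xz) may fix this."
        )
    return lzma_module.LZMAFile


# Module-level import that never fails, in the spirit of `lzma = _import_lzma()`.
lzma = import_optional("lzma")
```

Deferring the failure to the point of use means users who never read xz-compressed files are not affected by a missing `lzma` module.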

pandas/_libs/tslib.pyx

+1 -1
@@ -71,7 +71,7 @@ cdef inline object create_time_from_ts(

  @cython.wraparound(False)
  @cython.boundscheck(False)
- def ints_to_pydatetime(int64_t[:] arr, object tz=None, object freq=None,
+ def ints_to_pydatetime(const int64_t[:] arr, object tz=None, object freq=None,
  str box="datetime"):
  """
  Convert an i8 repr to an ndarray of datetimes, date, time or Timestamp
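Changing `int64_t[:]` to `const int64_t[:]` lets these Cython functions accept read-only buffers. A non-const typed memoryview requests a writable buffer, so it fails on read-only input. This is the class of bug described by the whatsnew entry on iterating a `DatetimeIndex` over read-only data (GH#28055). The numpy snippet below is not pandas code. It only shows two common ways read-only int64 arrays arise, and that the buffer they expose is flagged read-only:

```python
import numpy as np

# int64 nanosecond values, the same i8 representation these functions take.
values = np.array([0, 86_400_000_000_000], dtype="i8")

# Way 1: an array explicitly frozen by its owner.
frozen = values.copy()
frozen.setflags(write=False)

# Way 2: an array backed by immutable bytes (e.g. deserialized data).
from_bytes = np.frombuffer(values.tobytes(), dtype="i8")

for arr in (frozen, from_bytes):
    # The buffer a typed memoryview would acquire is marked read-only.
    print("readonly buffer:", memoryview(arr).readonly)
    try:
        arr[0] = 1
    except ValueError as exc:
        print("write rejected:", exc)
```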

pandas/_libs/tslibs/period.pyx

+7 -6
@@ -21,7 +21,8 @@ PyDateTime_IMPORT

  from pandas._libs.tslibs.np_datetime cimport (
  npy_datetimestruct, dtstruct_to_dt64, dt64_to_dtstruct,
- pandas_datetime_to_datetimestruct, NPY_DATETIMEUNIT, NPY_FR_D)
+ pandas_datetime_to_datetimestruct, check_dts_bounds,
+ NPY_DATETIMEUNIT, NPY_FR_D)

  cdef extern from "src/datetime/np_datetime.h":
  int64_t npy_datetimestruct_to_datetime(NPY_DATETIMEUNIT fr,
@@ -1011,7 +1012,7 @@ def dt64arr_to_periodarr(int64_t[:] dtarr, int freq, tz=None):

  @cython.wraparound(False)
  @cython.boundscheck(False)
- def periodarr_to_dt64arr(int64_t[:] periodarr, int freq):
+ def periodarr_to_dt64arr(const int64_t[:] periodarr, int freq):
  """
  Convert array to datetime64 values from a set of ordinals corresponding to
  periods per period convention.
@@ -1024,9 +1025,8 @@ def periodarr_to_dt64arr(int64_t[:] periodarr, int freq):

  out = np.empty(l, dtype='i8')

- with nogil:
- for i in range(l):
- out[i] = period_ordinal_to_dt64(periodarr[i], freq)
+ for i in range(l):
+ out[i] = period_ordinal_to_dt64(periodarr[i], freq)

  return out.base # .base to access underlying np.ndarray

@@ -1179,14 +1179,15 @@ cpdef int64_t period_ordinal(int y, int m, int d, int h, int min,
  return get_period_ordinal(&dts, freq)


- cpdef int64_t period_ordinal_to_dt64(int64_t ordinal, int freq) nogil:
+ cdef int64_t period_ordinal_to_dt64(int64_t ordinal, int freq) except? -1:
  cdef:
  npy_datetimestruct dts

  if ordinal == NPY_NAT:
  return NPY_NAT

  get_date_info(ordinal, freq, &dts)
+ check_dts_bounds(&dts)
  return dtstruct_to_dt64(&dts)

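`period_ordinal_to_dt64` now calls `check_dts_bounds` before converting. An out-of-range `Period` therefore raises rather than producing a wrong `Timestamp`. Raising an exception needs the GIL and an error return value, so the function drops `nogil` and becomes `cdef ... except? -1`, and the loop in `periodarr_to_dt64arr` no longer runs under `with nogil`. The pure-Python sketch below uses hypothetical names and is not the pandas implementation. It applies the same kind of check against the signed 64-bit nanosecond range, which is why the bounds are roughly 1677-09-21 to 2262-04-11:

```python
from datetime import datetime, timedelta

# Nanoseconds since the epoch are stored in a signed 64-bit integer.
# The most negative value (-2**63) is reserved for NaT, so it is excluded.
INT64_MIN_VALID = -(2**63) + 1
INT64_MAX = 2**63 - 1
EPOCH = datetime(1970, 1, 1)


class OutOfBoundsError(ValueError):
    """Hypothetical stand-in for pandas' OutOfBoundsDatetime."""


def to_epoch_ns(dt: datetime) -> int:
    """Convert a naive datetime to epoch nanoseconds, raising if out of range."""
    # timedelta // timedelta yields an exact integer count of microseconds.
    micros = (dt - EPOCH) // timedelta(microseconds=1)
    nanos = micros * 1000
    # Python ints never overflow, so the int64 range is checked explicitly.
    if not INT64_MIN_VALID <= nanos <= INT64_MAX:
        raise OutOfBoundsError(f"{dt} is outside the int64 nanosecond range")
    return nanos
```

For example, `to_epoch_ns(datetime(2262, 4, 11))` succeeds, while `to_epoch_ns(datetime(2262, 4, 12))` raises `OutOfBoundsError`.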