Backports for 0.25.3 #29313

Merged
merged 4 commits into from
Oct 31, 2019
1 change: 1 addition & 0 deletions doc/source/whatsnew/index.rst
@@ -16,6 +16,7 @@ Version 0.25
 .. toctree::
    :maxdepth: 2

+   v0.25.3
    v0.25.2
    v0.25.1
    v0.25.0
22 changes: 22 additions & 0 deletions doc/source/whatsnew/v0.25.3.rst
@@ -0,0 +1,22 @@
+.. _whatsnew_0253:
+
+What's new in 0.25.3 (October 31, 2019)
+---------------------------------------
+
+These are the changes in pandas 0.25.3. See :ref:`release` for a full changelog
+including other versions of pandas.
+
+.. _whatsnew_0253.bug_fixes:
+
+Bug fixes
+~~~~~~~~~
+
+Groupby/resample/rolling
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+- Bug in :meth:`DataFrameGroupBy.quantile` where NA values in the grouping could cause segfaults or incorrect results (:issue:`28882`)
+
+Contributors
+~~~~~~~~~~~~
+
+.. contributors:: v0.25.2..HEAD
8 changes: 8 additions & 0 deletions pandas/_libs/groupby.pyx
@@ -719,6 +719,11 @@ def group_quantile(ndarray[float64_t] out,
         ndarray[int64_t] counts, non_na_counts, sort_arr

     assert values.shape[0] == N
+
+    if not (0 <= q <= 1):
+        raise ValueError("'q' must be between 0 and 1. Got"
+                         " '{}' instead".format(q))
+
     inter_methods = {
         'linear': INTERPOLATION_LINEAR,
         'lower': INTERPOLATION_LOWER,
@@ -736,6 +741,9 @@ def group_quantile(ndarray[float64_t] out,
     with nogil:
         for i in range(N):
             lab = labels[i]
+            if lab == -1:  # NA group label
+                continue
+
             counts[lab] += 1
             if not mask[i]:
                 non_na_counts[lab] += 1
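
The two guards added in this hunk can be pictured with a small pure-Python analogue. This is only a sketch, not the Cython implementation: `count_group_members` is a hypothetical name, the `float(q)` conversion is an assumption suggested by the tests expecting `'50.0'` in the message, and plain lists stand in for the typed arrays.

```python
def count_group_members(labels, mask, ngroups, q=0.5):
    """Hypothetical pure-Python analogue of the counting loop in group_quantile."""
    # Assumption: q is handled as a float, as the tests' "Got '50.0'" suggests.
    q = float(q)
    if not (0 <= q <= 1):
        raise ValueError(
            "'q' must be between 0 and 1. Got '{}' instead".format(q)
        )

    counts = [0] * ngroups
    non_na_counts = [0] * ngroups
    for lab, is_na in zip(labels, mask):
        if lab == -1:  # NA group label: the row belongs to no group
            continue
        counts[lab] += 1
        if not is_na:
            non_na_counts[lab] += 1
    return counts, non_na_counts


# Rows 1 and 3 carry an NA group key (label -1) and are skipped.
labels = [0, -1, 1, -1]
mask = [False, False, False, False]
print(count_group_members(labels, mask, ngroups=2))  # ([1, 1], [1, 1])
```

Without the `lab == -1` check, a plain Python list would silently count the NA rows toward the last group, which is one way to picture the "incorrect results" in the release note. How the compiled loop fails in that case depends on its compile directives, which this diff does not show.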
2 changes: 1 addition & 1 deletion pandas/io/parsers.py
@@ -2935,7 +2935,7 @@ def _next_iter_line(self, row_num):
                 if self.warn_bad_lines or self.error_bad_lines:
                     msg = str(e)

-                    if "NULL byte" in msg:
+                    if "NULL byte" in msg or "line contains NUL" in msg:
                         msg = (
                             "NULL byte detected. This byte "
                             "cannot be processed in Python's "
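
The widened condition can be exercised in isolation. The helper below is a hypothetical sketch (`is_null_byte_error` does not exist in pandas), and the sample strings are illustrative: `"line contains NUL"` is the wording that the removed `compat.PY38` test branch attributes to Python 3.8, while the other two are made-up inputs for the check.

```python
def is_null_byte_error(msg):
    """Hypothetical helper mirroring the condition added in this hunk."""
    return "NULL byte" in msg or "line contains NUL" in msg


print(is_null_byte_error("line contains NULL byte"))  # True
print(is_null_byte_error("line contains NUL"))        # True
print(is_null_byte_error("unexpected end of data"))   # False
```

Because both wordings are now rewritten to the same "NULL byte detected" message, the parser test no longer needs a version-specific expected string.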
34 changes: 34 additions & 0 deletions pandas/tests/groupby/test_function.py
@@ -1316,6 +1316,40 @@ def test_quantile_raises():
         df.groupby("key").quantile()


+def test_quantile_out_of_bounds_q_raises():
+    # https://github.com/pandas-dev/pandas/issues/27470
+    df = pd.DataFrame(dict(a=[0, 0, 0, 1, 1, 1], b=range(6)))
+    g = df.groupby([0, 0, 0, 1, 1, 1])
+    with pytest.raises(ValueError, match="Got '50.0' instead"):
+        g.quantile(50)
+
+    with pytest.raises(ValueError, match="Got '-1.0' instead"):
+        g.quantile(-1)
+
+
+def test_quantile_missing_group_values_no_segfaults():
+    # GH 28662
+    data = np.array([1.0, np.nan, 1.0])
+    df = pd.DataFrame(dict(key=data, val=range(3)))
+
+    # Random segfaults; would have been guaranteed in loop
+    grp = df.groupby("key")
+    for _ in range(100):
+        grp.quantile()
+
+
+def test_quantile_missing_group_values_correct_results():
+    # GH 28662
+    data = np.array([1.0, np.nan, 3.0, np.nan])
+    df = pd.DataFrame(dict(key=data, val=range(4)))
+
+    result = df.groupby("key").quantile()
+    expected = pd.DataFrame(
+        [1.0, 3.0], index=pd.Index([1.0, 3.0], name="key"), columns=["val"]
+    )
+    tm.assert_frame_equal(result, expected)
+
+
 # pipe
 # --------------------------------

5 changes: 1 addition & 4 deletions pandas/tests/io/parser/test_common.py
@@ -1898,10 +1898,7 @@ def test_null_byte_char(all_parsers):
         out = parser.read_csv(StringIO(data), names=names)
         tm.assert_frame_equal(out, expected)
     else:
-        if compat.PY38:
-            msg = "line contains NUL"
-        else:
-            msg = "NULL byte detected"
+        msg = "NULL byte detected"
         with pytest.raises(ParserError, match=msg):
             parser.read_csv(StringIO(data), names=names)
