Commit bb3f541

Author: Maximiliano Greco
Merge branch 'master' into feature/fix_wom
2 parents 6b4c557 + d7a4f5b commit bb3f541

9 files changed: +199 -37 lines changed

doc/source/whatsnew/v0.23.0.txt

+20
@@ -404,6 +404,8 @@ Other Enhancements
404404
- :func:`read_html` now accepts a ``displayed_only`` keyword argument to control whether or not hidden elements are parsed (``True`` by default) (:issue:`20027`)
405405
- zip compression is supported via ``compression=zip`` in :func:`DataFrame.to_pickle`, :func:`Series.to_pickle`, :func:`DataFrame.to_csv`, :func:`Series.to_csv`, :func:`DataFrame.to_json`, :func:`Series.to_json`. (:issue:`17778`)
406406
- It is now possible to create a :class:`WeekOfMonth` offset with ``n=0`` (:issue:`20517`).
407+
- :class:`DataFrame` and :class:`Series` now support the matrix multiplication (``@``) operator for Python>=3.5 (:issue:`10259`)
408+
407409

408410
.. _whatsnew_0230.api_breaking:
409411

@@ -902,6 +904,22 @@ Performance Improvements
902904
Documentation Changes
903905
~~~~~~~~~~~~~~~~~~~~~
904906

907+
Thanks to all of the contributors who participated in the Pandas Documentation
908+
Sprint, which took place on March 10th. We had about 500 participants from over
909+
30 locations across the world. You should notice that many of the
910+
:ref:`API docstrings <api>` have greatly improved.
911+
912+
There were too many simultaneous contributions to include a release note for each
913+
improvement, but this `GitHub search`_ should give you an idea of how many docstrings
914+
were improved.
915+
916+
Special thanks to Marc Garcia for organizing the sprint. For more information,
917+
read the `NumFOCUS blogpost`_ recapping the sprint.
918+
919+
.. _GitHub search: https://github.com/pandas-dev/pandas/pulls?utf8=%E2%9C%93&q=is%3Apr+label%3ADocs+created%3A2018-03-10..2018-03-15+
920+
.. _NumFOCUS blogpost: https://www.numfocus.org/blog/worldwide-pandas-sprint/
921+
922+
905923
- Changed spelling of "numpy" to "NumPy", and "python" to "Python". (:issue:`19017`)
906924
- Consistency when introducing code samples, using either colon or period.
907925
Rewrote some sentences for greater clarity, added more dynamic references
@@ -1023,6 +1041,7 @@ Numeric
10231041
- Bug in :class:`Index` constructor with ``dtype='uint64'`` where int-like floats were not coerced to :class:`UInt64Index` (:issue:`18400`)
10241042
- Bug in :class:`DataFrame` flex arithmetic (e.g. ``df.add(other, fill_value=foo)``) with a ``fill_value`` other than ``None`` failed to raise ``NotImplementedError`` in corner cases where either the frame or ``other`` has length zero (:issue:`19522`)
10251043
- Multiplication and division of numeric-dtyped :class:`Index` objects with timedelta-like scalars returns ``TimedeltaIndex`` instead of raising ``TypeError`` (:issue:`19333`)
1044+
- Bug in :meth:`Series.rank` and :meth:`DataFrame.rank` where ``ascending=False`` failed to return correct ranks for infinity if ``NaN`` values were present (:issue:`19538`)
10261045
- Bug where ``NaN`` was returned instead of 0 by :func:`Series.pct_change` and :func:`DataFrame.pct_change` when ``fill_method`` is not ``None`` (:issue:`19873`)
10271046

10281047

@@ -1134,6 +1153,7 @@ Reshaping
11341153
- Bug in :class:`Series` constructor with ``Categorical`` where a ``ValueError`` is not raised when an index of different length is given (:issue:`19342`)
11351154
- Bug in :meth:`DataFrame.astype` where column metadata is lost when converting to categorical or a dictionary of dtypes (:issue:`19920`)
11361155
- Bug in :func:`cut` and :func:`qcut` where timezone information was dropped (:issue:`19872`)
1156+
- Bug in :class:`Series` constructor with ``dtype=str``, which previously raised in some cases (:issue:`19853`)
11371157

11381158
Other
11391159
^^^^^

pandas/_libs/algos_rank_helper.pxi.in

+3-3
@@ -135,7 +135,7 @@ def rank_1d_{{dtype}}(object in_arr, ties_method='average', ascending=True,
135135

136136
sorted_data = values.take(_as)
137137
sorted_mask = mask.take(_as)
138-
_indices = order[1].take(_as).nonzero()[0]
138+
_indices = np.diff(sorted_mask).nonzero()[0]
139139
non_na_idx = _indices[0] if len(_indices) > 0 else -1
140140
argsorted = _as.astype('i8')
141141

@@ -153,7 +153,7 @@ def rank_1d_{{dtype}}(object in_arr, ties_method='average', ascending=True,
153153

154154
if (i == n - 1 or
155155
are_diff(util.get_value_at(sorted_data, i + 1), val) or
156-
i == non_na_idx - 1):
156+
i == non_na_idx):
157157
if tiebreak == TIEBREAK_AVERAGE:
158158
for j in range(i - dups + 1, i + 1):
159159
ranks[argsorted[j]] = sum_ranks / dups
@@ -190,7 +190,7 @@ def rank_1d_{{dtype}}(object in_arr, ties_method='average', ascending=True,
190190
count += 1.0
191191

192192
if (i == n - 1 or sorted_data[i + 1] != val or
193-
i == non_na_idx - 1):
193+
i == non_na_idx):
194194
if tiebreak == TIEBREAK_AVERAGE:
195195
for j in range(i - dups + 1, i + 1):
196196
ranks[argsorted[j]] = sum_ranks / dups
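
Note on the `_indices` change in this file: a minimal sketch, not the pandas implementation, of what `np.diff(mask).nonzero()[0]` computes on a sorted missing-value mask. The arrays and the helper name `first_boundary` are made up for illustration, and a plain boolean mask is assumed. The first nonzero position of the diff is the index just before the mask flips, whichever side the missing values were sorted to, which is consistent with the comparison changing from `non_na_idx - 1` to `non_na_idx`.

```python
import numpy as np


def first_boundary(sorted_mask):
    """Return the index just before the mask first changes value, or -1."""
    # np.diff on a boolean array marks positions where neighbours differ.
    changes = np.diff(sorted_mask).nonzero()[0]
    return changes[0] if len(changes) > 0 else -1


# Missing values sorted to the end (hypothetical data).
na_last = np.array([False, False, False, True, True])
# Missing values sorted to the front (hypothetical data).
na_first = np.array([True, True, False, False, False])

print(first_boundary(na_last))   # index of the last non-missing element
print(first_boundary(na_first))  # index of the last missing element

# For comparison, the first True position of the mask itself is 0 when the
# missing values come first, so it cannot mark that boundary.
print(na_first.nonzero()[0][0])
```

When the mask never changes (no missing values, or only missing values), the helper returns -1, mirroring the `else -1` branch in the diff.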

pandas/core/frame.py

+10-1
@@ -863,7 +863,8 @@ def __len__(self):
863863

864864
def dot(self, other):
865865
"""
866-
Matrix multiplication with DataFrame or Series objects
866+
Matrix multiplication with DataFrame or Series objects. Can also be
867+
called using `self @ other` in Python >= 3.5.
867868
868869
Parameters
869870
----------
@@ -905,6 +906,14 @@ def dot(self, other):
905906
else: # pragma: no cover
906907
raise TypeError('unsupported type: %s' % type(other))
907908

909+
def __matmul__(self, other):
910+
""" Matrix multiplication using binary `@` operator in Python>=3.5 """
911+
return self.dot(other)
912+
913+
def __rmatmul__(self, other):
914+
""" Matrix multiplication using binary `@` operator in Python>=3.5 """
915+
return self.T.dot(np.transpose(other)).T
916+
908917
# ----------------------------------------------------------------------
909918
# IO methods (to / from other formats)
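
Usage sketch for the two new `DataFrame` methods, with made-up data and variable names. `a @ b` dispatches to `DataFrame.__matmul__`. A nested list on the left has no `__matmul__` of its own, so Python falls back to `DataFrame.__rmatmul__`. The reflected form relies on the identity (AB)^T = B^T A^T, which is why it transposes both operands and then transposes the result.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
                 index=["x", "y"], columns=["p", "q"])
b = pd.DataFrame([[5.0, 6.0], [7.0, 8.0]],
                 index=["p", "q"], columns=["one", "two"])

# DataFrame @ DataFrame: same result as a.dot(b); columns of `a` are
# aligned with the index of `b`.
product = a @ b

# Nested list @ DataFrame: handled by DataFrame.__rmatmul__.
reflected = a.values.tolist() @ b

print(product)
print(np.allclose(reflected.values, np.dot(a.values, b.values)))
```

Only the values of the reflected result are compared here, as in the commit's own test, because a plain list carries no labels for the result to inherit.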
910919

pandas/core/series.py

+14-5
@@ -1994,7 +1994,7 @@ def autocorr(self, lag=1):
19941994
def dot(self, other):
19951995
"""
19961996
Matrix multiplication with DataFrame or inner-product with Series
1997-
objects
1997+
objects. Can also be called using `self @ other` in Python >= 3.5.
19981998
19991999
Parameters
20002000
----------
@@ -2033,6 +2033,14 @@ def dot(self, other):
20332033
else: # pragma: no cover
20342034
raise TypeError('unsupported type: %s' % type(other))
20352035

2036+
def __matmul__(self, other):
2037+
""" Matrix multiplication using binary `@` operator in Python>=3.5 """
2038+
return self.dot(other)
2039+
2040+
def __rmatmul__(self, other):
2041+
""" Matrix multiplication using binary `@` operator in Python>=3.5 """
2042+
return self.dot(other)
2043+
20362044
@Substitution(klass='Series')
20372045
@Appender(base._shared_docs['searchsorted'])
20382046
@deprecate_kwarg(old_arg_name='v', new_arg_name='value')
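
A short sketch of the `Series` side of the `@` operator, using illustrative data. With another `Series` the result is the inner product (a scalar). With a `DataFrame` the result is a `Series` labelled by the frame's columns. A plain list on the left goes through `Series.__rmatmul__`.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=["p", "q", "r"])
frame = pd.DataFrame({"one": [1.0, 0.0, 0.0], "two": [0.0, 1.0, 1.0]},
                     index=["p", "q", "r"])

inner = s @ s                    # Series @ Series: inner product
projected = s @ frame            # Series @ DataFrame: one value per column
reflected = [1.0, 2.0, 3.0] @ s  # list has no __matmul__, so __rmatmul__ runs

print(inner)
print(projected)
print(reflected)
```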
@@ -4156,9 +4164,10 @@ def _try_cast(arr, take_fast_path):
41564164
if issubclass(subarr.dtype.type, compat.string_types):
41574165
# GH 16605
41584166
# If not empty convert the data to dtype
4159-
if not isna(data).all():
4160-
data = np.array(data, dtype=dtype, copy=False)
4161-
4162-
subarr = np.array(data, dtype=object, copy=copy)
4167+
# GH 19853: If data is a scalar, subarr already holds the result
4168+
if not is_scalar(data):
4169+
if not np.all(isna(data)):
4170+
data = np.array(data, dtype=dtype, copy=False)
4171+
subarr = np.array(data, dtype=object, copy=copy)
41634172

41644173
return subarr
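
The case covered by GH 19853 can be reproduced with a scalar string, an index and `dtype=str`. This is a sketch based on the regression test added in this commit. Only the values are compared, since the resulting dtype may depend on the pandas version.

```python
import pandas as pd

# Scalar data broadcast over an index, with and without an explicit str dtype.
with_dtype = pd.Series("", dtype=str, index=range(3))
without_dtype = pd.Series("", index=range(3))

print(with_dtype.tolist())
print(without_dtype.tolist())
```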

pandas/tests/frame/test_analytics.py

+59-2
@@ -5,6 +5,7 @@
55
import warnings
66
from datetime import timedelta
77
from distutils.version import LooseVersion
8+
import operator
89
import sys
910
import pytest
1011

@@ -13,7 +14,7 @@
1314
from numpy.random import randn
1415
import numpy as np
1516

16-
from pandas.compat import lrange, product
17+
from pandas.compat import lrange, product, PY35
1718
from pandas import (compat, isna, notna, DataFrame, Series,
1819
MultiIndex, date_range, Timestamp, Categorical,
1920
_np_version_under1p15)
@@ -2091,7 +2092,6 @@ def test_clip_with_na_args(self):
20912092
self.frame)
20922093

20932094
# Matrix-like
2094-
20952095
def test_dot(self):
20962096
a = DataFrame(np.random.randn(3, 4), index=['a', 'b', 'c'],
20972097
columns=['p', 'q', 'r', 's'])
@@ -2144,6 +2144,63 @@ def test_dot(self):
21442144
with tm.assert_raises_regex(ValueError, 'aligned'):
21452145
df.dot(df2)
21462146

2147+
@pytest.mark.skipif(not PY35,
2148+
reason='matmul supported for Python>=3.5')
2149+
def test_matmul(self):
2150+
# matmul test is for GH #10259
2151+
a = DataFrame(np.random.randn(3, 4), index=['a', 'b', 'c'],
2152+
columns=['p', 'q', 'r', 's'])
2153+
b = DataFrame(np.random.randn(4, 2), index=['p', 'q', 'r', 's'],
2154+
columns=['one', 'two'])
2155+
2156+
# DataFrame @ DataFrame
2157+
result = operator.matmul(a, b)
2158+
expected = DataFrame(np.dot(a.values, b.values),
2159+
index=['a', 'b', 'c'],
2160+
columns=['one', 'two'])
2161+
tm.assert_frame_equal(result, expected)
2162+
2163+
# DataFrame @ Series
2164+
result = operator.matmul(a, b.one)
2165+
expected = Series(np.dot(a.values, b.one.values),
2166+
index=['a', 'b', 'c'])
2167+
tm.assert_series_equal(result, expected)
2168+
2169+
# np.array @ DataFrame
2170+
result = operator.matmul(a.values, b)
2171+
expected = np.dot(a.values, b.values)
2172+
tm.assert_almost_equal(result, expected)
2173+
2174+
# nested list @ DataFrame (__rmatmul__)
2175+
result = operator.matmul(a.values.tolist(), b)
2176+
expected = DataFrame(np.dot(a.values, b.values),
2177+
index=['a', 'b', 'c'],
2178+
columns=['one', 'two'])
2179+
tm.assert_almost_equal(result.values, expected.values)
2180+
2181+
# mixed dtype DataFrame @ DataFrame
2182+
a['q'] = a.q.round().astype(int)
2183+
result = operator.matmul(a, b)
2184+
expected = DataFrame(np.dot(a.values, b.values),
2185+
index=['a', 'b', 'c'],
2186+
columns=['one', 'two'])
2187+
tm.assert_frame_equal(result, expected)
2188+
2189+
# different dtypes DataFrame @ DataFrame
2190+
a = a.astype(int)
2191+
result = operator.matmul(a, b)
2192+
expected = DataFrame(np.dot(a.values, b.values),
2193+
index=['a', 'b', 'c'],
2194+
columns=['one', 'two'])
2195+
tm.assert_frame_equal(result, expected)
2196+
2197+
# unaligned
2198+
df = DataFrame(randn(3, 4), index=[1, 2, 3], columns=lrange(4))
2199+
df2 = DataFrame(randn(5, 3), index=lrange(5), columns=[1, 2, 3])
2200+
2201+
with tm.assert_raises_regex(ValueError, 'aligned'):
2202+
operator.matmul(df, df2)
2203+
21472204

21482205
@pytest.fixture
21492206
def df_duplicates():
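
The tests in this file call `operator.matmul` rather than writing `a @ b`. A plausible reason (an assumption, not stated in the commit) is that `@` is a syntax error on interpreters older than Python 3.5, which would break the import of the whole test module, whereas the function call is only evaluated when the test is not skipped. The toy class `Scale` below is hypothetical and only shows that the two spellings dispatch to the same methods.

```python
import operator


class Scale:
    """Toy operand: matmul multiplies the wrapped numbers."""

    def __init__(self, factor):
        self.factor = factor

    def __matmul__(self, other):
        # Called for `self @ other` and for operator.matmul(self, other).
        return Scale(self.factor * other.factor)

    def __rmatmul__(self, other):
        # Called when the left operand does not implement matmul itself.
        return Scale(other * self.factor)


via_operator = operator.matmul(Scale(2), Scale(3))
via_syntax = Scale(2) @ Scale(3)
reflected = operator.matmul(5, Scale(3))  # int has no __matmul__

print(via_operator.factor, via_syntax.factor, reflected.factor)
```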

pandas/tests/series/test_analytics.py

+48-2
@@ -3,7 +3,7 @@
33

44
from itertools import product
55
from distutils.version import LooseVersion
6-
6+
import operator
77
import pytest
88

99
from numpy import nan
@@ -18,7 +18,7 @@
1818
from pandas.core.indexes.timedeltas import Timedelta
1919
import pandas.core.nanops as nanops
2020

21-
from pandas.compat import lrange, range
21+
from pandas.compat import lrange, range, PY35
2222
from pandas import compat
2323
from pandas.util.testing import (assert_series_equal, assert_almost_equal,
2424
assert_frame_equal, assert_index_equal)
@@ -921,6 +921,52 @@ def test_dot(self):
921921
pytest.raises(Exception, a.dot, a.values[:3])
922922
pytest.raises(ValueError, a.dot, b.T)
923923

924+
@pytest.mark.skipif(not PY35,
925+
reason='matmul supported for Python>=3.5')
926+
def test_matmul(self):
927+
# matmul test is for GH #10259
928+
a = Series(np.random.randn(4), index=['p', 'q', 'r', 's'])
929+
b = DataFrame(np.random.randn(3, 4), index=['1', '2', '3'],
930+
columns=['p', 'q', 'r', 's']).T
931+
932+
# Series @ DataFrame
933+
result = operator.matmul(a, b)
934+
expected = Series(np.dot(a.values, b.values), index=['1', '2', '3'])
935+
assert_series_equal(result, expected)
936+
937+
# DataFrame @ Series
938+
result = operator.matmul(b.T, a)
939+
expected = Series(np.dot(b.T.values, a.T.values),
940+
index=['1', '2', '3'])
941+
assert_series_equal(result, expected)
942+
943+
# Series @ Series
944+
result = operator.matmul(a, a)
945+
expected = np.dot(a.values, a.values)
946+
assert_almost_equal(result, expected)
947+
948+
# np.array @ Series (__rmatmul__)
949+
result = operator.matmul(a.values, a)
950+
expected = np.dot(a.values, a.values)
951+
assert_almost_equal(result, expected)
952+
953+
# mixed dtype DataFrame @ Series
954+
a['p'] = int(a.p)
955+
result = operator.matmul(b.T, a)
956+
expected = Series(np.dot(b.T.values, a.T.values),
957+
index=['1', '2', '3'])
958+
assert_series_equal(result, expected)
959+
960+
# different dtypes DataFrame @ Series
961+
a = a.astype(int)
962+
result = operator.matmul(b.T, a)
963+
expected = Series(np.dot(b.T.values, a.T.values),
964+
index=['1', '2', '3'])
965+
assert_series_equal(result, expected)
966+
967+
pytest.raises(Exception, a.dot, a.values[:3])
968+
pytest.raises(ValueError, a.dot, b.T)
969+
924970
def test_value_counts_nunique(self):
925971

926972
# basics.rst doc example

pandas/tests/series/test_constructors.py

+5
@@ -110,6 +110,11 @@ def test_constructor_empty(self, input_class):
110110
empty2 = Series(input_class(), index=lrange(10), dtype='float64')
111111
assert_series_equal(empty, empty2)
112112

113+
# GH 19853 : with empty string, index and dtype str
114+
empty = Series('', dtype=str, index=range(3))
115+
empty2 = Series('', index=range(3))
116+
assert_series_equal(empty, empty2)
117+
113118
@pytest.mark.parametrize('input_arg', [np.nan, float('nan')])
114119
def test_constructor_nan(self, input_arg):
115120
empty = Series(dtype='float64', index=lrange(10))

pandas/tests/series/test_rank.py

+38-22
@@ -16,6 +16,8 @@
1616
from pandas.tests.series.common import TestData
1717
from pandas._libs.tslib import iNaT
1818
from pandas._libs.algos import Infinity, NegInfinity
19+
from itertools import chain
20+
import pandas.util._test_decorators as td
1921

2022

2123
class TestSeriesRank(TestData):
@@ -257,38 +259,52 @@ def _check(s, expected, method='average'):
257259
series = s if dtype is None else s.astype(dtype)
258260
_check(series, results[method], method=method)
259261

260-
def test_rank_tie_methods_on_infs_nans(self):
262+
@td.skip_if_no_scipy
263+
@pytest.mark.parametrize('ascending', [True, False])
264+
@pytest.mark.parametrize('method', ['average', 'min', 'max', 'first',
265+
'dense'])
266+
@pytest.mark.parametrize('na_option', ['top', 'bottom', 'keep'])
267+
def test_rank_tie_methods_on_infs_nans(self, method, na_option, ascending):
261268
dtypes = [('object', None, Infinity(), NegInfinity()),
262269
('float64', np.nan, np.inf, -np.inf)]
263270
chunk = 3
264271
disabled = set([('object', 'first')])
265272

266-
def _check(s, expected, method='average', na_option='keep'):
267-
result = s.rank(method=method, na_option=na_option)
273+
def _check(s, method, na_option, ascending):
274+
exp_ranks = {
275+
'average': ([2, 2, 2], [5, 5, 5], [8, 8, 8]),
276+
'min': ([1, 1, 1], [4, 4, 4], [7, 7, 7]),
277+
'max': ([3, 3, 3], [6, 6, 6], [9, 9, 9]),
278+
'first': ([1, 2, 3], [4, 5, 6], [7, 8, 9]),
279+
'dense': ([1, 1, 1], [2, 2, 2], [3, 3, 3])
280+
}
281+
ranks = exp_ranks[method]
282+
if na_option == 'top':
283+
order = [ranks[1], ranks[0], ranks[2]]
284+
elif na_option == 'bottom':
285+
order = [ranks[0], ranks[2], ranks[1]]
286+
else:
287+
order = [ranks[0], [np.nan] * chunk, ranks[1]]
288+
expected = order if ascending else order[::-1]
289+
expected = list(chain.from_iterable(expected))
290+
result = s.rank(method=method, na_option=na_option,
291+
ascending=ascending)
268292
tm.assert_series_equal(result, Series(expected, dtype='float64'))
269293

270-
exp_ranks = {
271-
'average': ([2, 2, 2], [5, 5, 5], [8, 8, 8]),
272-
'min': ([1, 1, 1], [4, 4, 4], [7, 7, 7]),
273-
'max': ([3, 3, 3], [6, 6, 6], [9, 9, 9]),
274-
'first': ([1, 2, 3], [4, 5, 6], [7, 8, 9]),
275-
'dense': ([1, 1, 1], [2, 2, 2], [3, 3, 3])
276-
}
277-
na_options = ('top', 'bottom', 'keep')
278294
for dtype, na_value, pos_inf, neg_inf in dtypes:
279295
in_arr = [neg_inf] * chunk + [na_value] * chunk + [pos_inf] * chunk
280296
iseries = Series(in_arr, dtype=dtype)
281-
for method, na_opt in product(exp_ranks.keys(), na_options):
282-
ranks = exp_ranks[method]
283-
if (dtype, method) in disabled:
284-
continue
285-
if na_opt == 'top':
286-
order = ranks[1] + ranks[0] + ranks[2]
287-
elif na_opt == 'bottom':
288-
order = ranks[0] + ranks[2] + ranks[1]
289-
else:
290-
order = ranks[0] + [np.nan] * chunk + ranks[1]
291-
_check(iseries, order, method, na_opt)
297+
if (dtype, method) in disabled:
298+
continue
299+
_check(iseries, method, na_option, ascending)
300+
301+
def test_rank_desc_mix_nans_infs(self):
302+
# GH 19538
303+
# check descending ranking when mix nans and infs
304+
iseries = Series([1, np.nan, np.inf, -np.inf, 25])
305+
result = iseries.rank(ascending=False)
306+
exp = Series([3, np.nan, 1, 4, 2], dtype='float64')
307+
tm.assert_series_equal(result, exp)
292308

293309
def test_rank_methods_series(self):
294310
pytest.importorskip('scipy.stats.special')
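
A sketch of how the rewritten `_check` builds its expectation, reduced to the single case `method='average'`, `na_option='keep'` and float64 data. Group ranks are kept as a list of chunks, so the descending case is just the reversed chunk order, flattened with `chain.from_iterable`. Variable names are illustrative.

```python
from itertools import chain

import numpy as np
import pandas as pd

chunk = 3
values = [-np.inf] * chunk + [np.nan] * chunk + [np.inf] * chunk
s = pd.Series(values, dtype="float64")

# Average ranks of the two non-missing groups; NaN stays NaN with 'keep'.
order = [[2, 2, 2], [np.nan] * chunk, [5, 5, 5]]

for ascending in (True, False):
    groups = order if ascending else order[::-1]
    expected = pd.Series(list(chain.from_iterable(groups)), dtype="float64")
    result = s.rank(method="average", na_option="keep", ascending=ascending)
    # Raises AssertionError if the ranks differ from the expectation.
    pd.testing.assert_series_equal(result, expected)
```

Keeping the chunks as separate lists, instead of concatenating them with `+` as the old test did, is what makes the `order[::-1]` reversal work per group rather than per element.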
