Complex Dtype Support for Hashmap Algos #36482

Merged: 44 commits, Sep 4, 2021
Changes from 40 commits

Commits
1c6b786
Merge master
alimcmaster1 Jan 3, 2020
42a46d7
Fix test failures ignore FutureWarning
alimcmaster1 Jan 4, 2020
8331d06
Filter warning correctly
alimcmaster1 Jan 4, 2020
3ba4169
Fix imports
alimcmaster1 Jan 4, 2020
8302589
Merge remote-tracking branch 'remotes/upstream/master' into lucaiones…
alimcmaster1 Jan 4, 2020
5068771
Add warning annotation
alimcmaster1 Jan 4, 2020
8d65aa7
Remove unrequired annotation
alimcmaster1 Jan 4, 2020
8b7ac7d
Merge remote-tracking branch 'remotes/upstream/master' into lucaiones…
alimcmaster1 Jan 4, 2020
45c8237
Merge remote-tracking branch 'upstream/master' into lucaionescu-mcmali
alimcmaster1 Jan 4, 2020
cb74fe3
Update docs
alimcmaster1 Jan 5, 2020
b29404e
Create deepsource.toml
alimcmaster1 Jan 16, 2020
f983f4f
Commit Complex handling
alimcmaster1 Sep 16, 2020
c2e4e82
run black
alimcmaster1 Sep 19, 2020
7c42495
Use pandas.testing
alimcmaster1 Sep 19, 2020
41b1faf
Use pandas.testing
alimcmaster1 Sep 19, 2020
da53f38
Clean ups
alimcmaster1 Sep 19, 2020
32262e7
Merge remote-tracking branch 'upstream/master' into mcmali-complex
alimcmaster1 Nov 26, 2020
f4932d9
Move test to sep files
alimcmaster1 Nov 26, 2020
328e242
Refactor Tests
alimcmaster1 Nov 28, 2020
8d5d517
Merge remote-tracking branch 'upstream/master' into mcmali-complex
alimcmaster1 Nov 28, 2020
38e0dc7
Merge remote-tracking branch 'origin/master' into mcmali-complex
alimcmaster1 Jan 12, 2021
2008239
Merge master
alimcmaster1 Jan 12, 2021
574be58
Complex 128 support
alimcmaster1 Jan 12, 2021
e0c3e44
Remove deepsource.toml
alimcmaster1 Jan 12, 2021
1b487b8
run black
alimcmaster1 Jan 12, 2021
b393a08
Fix tests
alimcmaster1 Jan 13, 2021
554090f
Merge remote-tracking branch 'upstream/master' into mcmali-complex
alimcmaster1 Jan 18, 2021
ab38ad9
Add ReadMe
alimcmaster1 Jan 18, 2021
a28c495
Add ReadMe
alimcmaster1 Aug 25, 2021
7a9f960
Merge Master
alimcmaster1 Aug 26, 2021
9e558ce
Merge remote-tracking branch 'upstream/master' into mcmali-complex
alimcmaster1 Aug 26, 2021
1d11a90
Add np complex64 and np.nan tests
alimcmaster1 Aug 26, 2021
737ea96
complex 64 and 128 testing
alimcmaster1 Aug 26, 2021
ae7674b
complex 64 and 128 testing
alimcmaster1 Aug 26, 2021
d1e00b7
Pep8
alimcmaster1 Aug 26, 2021
df28514
isort
alimcmaster1 Aug 26, 2021
6b4c10e
More tests
alimcmaster1 Aug 26, 2021
9afed5f
Add type info
alimcmaster1 Aug 27, 2021
96d5a58
Merge Master
alimcmaster1 Aug 31, 2021
e9a4ca2
Fix whatsnew
alimcmaster1 Aug 31, 2021
e882b4e
Merge remote-tracking branch 'upstream/master' into mcmali-complex
alimcmaster1 Sep 1, 2021
6bf72a0
Updates as per comments
alimcmaster1 Sep 1, 2021
e53417d
Fix tests
alimcmaster1 Sep 1, 2021
fdf45b1
Merge Master
alimcmaster1 Sep 2, 2021
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.4.0.rst
@@ -96,6 +96,7 @@ Other enhancements
- :meth:`Series.ewm`, :meth:`DataFrame.ewm`, now support a ``method`` argument with a ``'table'`` option that performs the windowing operation over an entire :class:`DataFrame`. See :ref:`Window Overview <window.overview>` for performance and functional benefits (:issue:`42273`)
- :meth:`.GroupBy.cummin` and :meth:`.GroupBy.cummax` now support the argument ``skipna`` (:issue:`34047`)
- :meth:`read_table` now supports the argument ``storage_options`` (:issue:`39167`)
- Methods that relied on hashmap based algos such as :meth:`DataFrameGroupBy.value_counts`, :meth:`DataFrameGroupBy.count` and :func:`factorize` ignored imaginary component for complex numbers (:issue:`17927`)

.. ---------------------------------------------------------------------------
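
To make the failure mode described in the whatsnew entry concrete, here is a minimal pure-Python sketch (not pandas' Cython implementation) of hash-based grouping that keys only on the real part, next to one that keys on both components. The helper names `group_by_key`, `real_only_key` and `full_key` are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


def group_by_key(values, key):
    """Bucket values in a dict keyed by key(value)."""
    buckets = defaultdict(list)
    for v in values:
        buckets[key(v)].append(v)
    return dict(buckets)


def real_only_key(z):
    # Illustrative bug: the imaginary component is dropped from the key.
    return complex(z).real


def full_key(z):
    # Both components take part in hashing and equality.
    z = complex(z)
    return (z.real, z.imag)


values = [1 + 1j, 1 + 2j, 1]

collapsed = group_by_key(values, real_only_key)
separated = group_by_key(values, full_key)

print(len(collapsed))  # 1
print(len(separated))  # 3
```

With the real-only key all three values land in one bucket, which is the kind of result the linked issue describes.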

2 changes: 2 additions & 0 deletions pandas/_libs/algos.pyi
@@ -148,6 +148,8 @@ def diff_2d(
) -> None: ...
def ensure_platform_int(arr: object) -> np.ndarray: ...
def ensure_object(arr: object) -> np.ndarray: ...
def ensure_complex64(arr: object, copy=True) -> np.ndarray: ...
def ensure_complex128(arr: object, copy=True) -> np.ndarray: ...
def ensure_float64(arr: object, copy=True) -> np.ndarray: ...
def ensure_float32(arr: object, copy=True) -> np.ndarray: ...
def ensure_int8(arr: object, copy=True) -> np.ndarray: ...
2 changes: 2 additions & 0 deletions pandas/_libs/algos.pyx
@@ -15,6 +15,8 @@ import numpy as np

cimport numpy as cnp
from numpy cimport (
NPY_COMPLEX64,
NPY_COMPLEX128,
NPY_FLOAT32,
NPY_FLOAT64,
NPY_INT8,
2 changes: 2 additions & 0 deletions pandas/_libs/algos_common_helper.pxi.in
@@ -47,6 +47,8 @@ dtypes = [('float64', 'FLOAT64', 'float64'),
('uint16', 'UINT16', 'uint16'),
('uint32', 'UINT32', 'uint32'),
('uint64', 'UINT64', 'uint64'),
('complex64', 'COMPLEX64', 'complex64'),
('complex128', 'COMPLEX128', 'complex128')
# ('platform_int', 'INT', 'int_'),
# ('object', 'OBJECT', 'object_'),
]
2 changes: 2 additions & 0 deletions pandas/core/dtypes/common.py
@@ -97,6 +97,8 @@ def ensure_float(arr):
ensure_int32 = algos.ensure_int32
ensure_int16 = algos.ensure_int16
ensure_int8 = algos.ensure_int8
ensure_complex64 = algos.ensure_complex64
ensure_complex128 = algos.ensure_complex128
ensure_platform_int = algos.ensure_platform_int
ensure_object = algos.ensure_object
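
Taken together, the four hunks above add one `(name, c_type, dtype)` entry per complex dtype and expose the resulting `ensure_complex64` / `ensure_complex128` helpers. The sketch below only illustrates how such a tuple list could be expanded into helper names and `NPY_*` constants. It assumes the template generates one `ensure_<name>` function per entry, which is inferred from the names in the diff rather than shown in it; `helper_names` is a hypothetical function.

```python
# Entries copied from the hunk in algos_common_helper.pxi.in.
dtypes = [
    ("uint64", "UINT64", "uint64"),
    ("complex64", "COMPLEX64", "complex64"),
    ("complex128", "COMPLEX128", "complex128"),
]


def helper_names(entries):
    """Return (function name, NumPy type constant, dtype) per entry."""
    # Assumed naming scheme: ensure_<name> and NPY_<c_type>.
    return [
        (f"ensure_{name}", f"NPY_{c_type}", dtype)
        for name, c_type, dtype in entries
    ]


for func, constant, dtype in helper_names(dtypes):
    print(func, constant, dtype)
```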

30 changes: 30 additions & 0 deletions pandas/tests/groupby/test_groupby.py
@@ -1009,6 +1009,36 @@ def test_groupby_complex():
tm.assert_series_equal(result, expected)


@pytest.mark.parametrize(
"frame,expected",
[
(
DataFrame(
[
{"a": 1, "b": 1 + 1j},
{"a": 1, "b": 1 + 2j},
{"a": 4, "b": 1},
]
),
DataFrame(
np.array([1, 1, 1], dtype=np.int64),
index=Index([(1 + 1j), (1 + 2j), (1 + 0j)], dtype="object", name="b"),
columns=Index(["a"], dtype="object"),
),
)
],
)
def test_groupby_complex_numbers(frame, expected):
result = frame.groupby("b", sort=False).count()
tm.assert_frame_equal(result, expected)

# Sorted lexicographically (real part, then imaginary part)
# Complex Index dtype is cast to object
expected.index = Index([(1 + 0j), (1 + 1j), (1 + 2j)], dtype="object", name="b")
result = frame.groupby("b", sort=True).count()
tm.assert_frame_equal(result, expected)
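
The two assertions in this test differ only in key order. A rough pure-Python analogue follows, assuming first-seen order for `sort=False` and lexicographic (real, then imaginary) order for `sort=True`. `count_by_key` is a hypothetical helper, not a pandas function.

```python
def count_by_key(rows, key_col, sort=False):
    """Count rows per key, in first-seen order unless sort is True."""
    counts = {}
    for row in rows:
        k = complex(row[key_col])
        counts[k] = counts.get(k, 0) + 1
    keys = list(counts)  # dicts preserve insertion order
    if sort:
        # complex has no "<", so order explicitly by (real, imag).
        keys.sort(key=lambda z: (z.real, z.imag))
    return [(k, counts[k]) for k in keys]


rows = [
    {"a": 1, "b": 1 + 1j},
    {"a": 1, "b": 1 + 2j},
    {"a": 4, "b": 1},
]

unsorted_counts = count_by_key(rows, "b")
sorted_counts = count_by_key(rows, "b", sort=True)
print(unsorted_counts)
print(sorted_counts)
```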


def test_groupby_series_indexed_differently():
s1 = Series(
[5.0, -9.0, 4.0, 100.0, -5.0, 55.0, 6.7],
49 changes: 49 additions & 0 deletions pandas/tests/indexes/multi/test_duplicates.py
@@ -8,6 +8,7 @@
from pandas import (
DatetimeIndex,
MultiIndex,
Series,
)
import pandas._testing as tm

@@ -299,6 +300,54 @@ def test_duplicated_drop_duplicates():
tm.assert_index_equal(idx.drop_duplicates(keep=False), expected)


@pytest.mark.parametrize(
"array,expected,dtype",
[
(
[
np.nan + np.nan * 1j,
0,
1j,
1j,
1,
1 + 1j,
1 + 2j,
1 + 1j,
np.nan,
np.nan + np.nan * 1j,
],
Series(
[False, False, False, True, False, False, False, True, False, True],
dtype=bool,
),
np.complex64,
),
(
[
np.nan + np.nan * 1j,
0,
1j,
1j,
1,
1 + 1j,
1 + 2j,
1 + 1j,
np.nan,
np.nan + np.nan * 1j,
],
Series(
[False, False, False, True, False, False, False, True, False, True],
dtype=bool,
),
np.complex128,
),
],
)
def test_duplicated_series_complex_numbers(array, expected, dtype):
result = Series(array, dtype=dtype).duplicated()
tm.assert_series_equal(result, expected)
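
The expected flags treat a repeated `nan + nan*1j` as a duplicate, while a bare `np.nan` (which becomes `nan + 0j`) is not a duplicate of it. One way to reproduce these flags in plain Python is to compare real and imaginary parts separately and treat NaN as equal to NaN within a component. This is an assumption about the semantics the test encodes, not a description of the hashtable code; `component_key`, `complex_key` and `duplicated` are hypothetical names.

```python
import math


def component_key(x):
    """Map NaN to a sentinel so that NaN matches NaN within a component."""
    return "nan" if math.isnan(x) else x


def complex_key(value):
    z = complex(value)
    return (component_key(z.real), component_key(z.imag))


def duplicated(values):
    """Flag every value whose key was already seen (keep-first behaviour)."""
    seen = set()
    flags = []
    for v in values:
        k = complex_key(v)
        flags.append(k in seen)
        seen.add(k)
    return flags


nan = float("nan")
values = [
    complex(nan, nan), 0, 1j, 1j, 1,
    1 + 1j, 1 + 2j, 1 + 1j, nan, complex(nan, nan),
]
flags = duplicated(values)
print(flags)
```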


def test_multi_drop_duplicates_pos_args_deprecation():
# GH#41485
idx = MultiIndex.from_arrays([[1, 2, 3, 1], [1, 2, 3, 1]])
15 changes: 14 additions & 1 deletion pandas/tests/indexes/period/methods/test_factorize.py
@@ -1,6 +1,9 @@
import numpy as np

from pandas import PeriodIndex
from pandas import (
PeriodIndex,
factorize,
)
import pandas._testing as tm


@@ -35,3 +38,13 @@ def test_factorize(self):
arr, idx = idx2.factorize()
tm.assert_numpy_array_equal(arr, exp_arr)
tm.assert_index_equal(idx, exp_idx)

def test_factorize_complex(self):
array = [1, 2, 2 + 1j]
labels, uniques = factorize(array)

expected_labels = np.array([0, 1, 2], dtype=np.intp)
tm.assert_numpy_array_equal(labels, expected_labels)

expected_uniques = np.array([(1 + 0j), (2 + 0j), (2 + 1j)], dtype=object)
tm.assert_numpy_array_equal(uniques, expected_uniques)
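
For readers unfamiliar with factorization, a minimal dictionary-based sketch of what this test checks: each distinct value gets the next integer label, and the uniques are returned in first-seen order. `factorize_sketch` is a hypothetical stand-in and does not mirror pandas' implementation or its return dtypes.

```python
def factorize_sketch(values):
    """Return (labels, uniques) with uniques in first-seen order."""
    table = {}  # value -> integer label
    labels = []
    for v in values:
        z = complex(v)
        if z not in table:
            table[z] = len(table)
        labels.append(table[z])
    return labels, list(table)


labels, uniques = factorize_sketch([1, 2, 2 + 1j])
print(labels)   # [0, 1, 2]
print(uniques)  # [(1+0j), (2+0j), (2+1j)]
```

Because the full complex value is the dictionary key, `2` and `2 + 1j` receive different labels.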
46 changes: 46 additions & 0 deletions pandas/tests/reductions/test_reductions.py
@@ -1487,3 +1487,49 @@ def test_mode_boolean_with_na(self):
result = ser.mode()
expected = Series({0: True}, dtype="boolean")
tm.assert_series_equal(result, expected)

@pytest.mark.parametrize(
"array,expected,dtype",
[
(
[0, 1j, 1, 1, 1 + 1j, 1 + 2j],
Series([1], dtype=np.complex128),
np.complex128,
),
(
[0, 1j, 1, 1, 1 + 1j, 1 + 2j],
Series([1], dtype=np.complex64),
np.complex64,
),
(
[1 + 1j, 2j, 1 + 1j],
Series([1 + 1j], dtype=np.complex128),
np.complex128,
),
],
)
def test_unimode_complex(self, array, expected, dtype):
result = Series(array, dtype=dtype).mode()
tm.assert_series_equal(result, expected)

@pytest.mark.parametrize(
"array,expected,dtype",
[
(
# no modes
[0, 1j, 1, 1 + 1j, 1 + 2j],
Series([0j, 1j, 1 + 0j, 1 + 1j, 1 + 2j], dtype=np.complex128),
np.complex128,
),
(
[1 + 1j, 2j, 1 + 1j, 2j, 3],
Series([2j, 1 + 1j], dtype=np.complex64),
np.complex64,
),
],
)
def test_multimode_complex(self, array, expected, dtype):
# mode tries to sort multimodal series.
# Complex numbers are sorted lexicographically (real part, then imaginary part)
result = Series(array, dtype=dtype).mode()
tm.assert_series_equal(result, expected)
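
The expected order `[2j, 1 + 1j]` in the second multimodal case is consistent with lexicographic ordering (real part first, then imaginary part), which is how NumPy documents its sort order for complex numbers; ordering by magnitude would put `1 + 1j` first. A small stdlib sketch of that behaviour, where `mode_sketch` is a hypothetical helper:

```python
from collections import Counter


def mode_sketch(values):
    """Return all most-frequent values, sorted by (real, imag)."""
    counts = Counter(complex(v) for v in values)
    top = max(counts.values())
    modes = [z for z, n in counts.items() if n == top]
    return sorted(modes, key=lambda z: (z.real, z.imag))


multi = mode_sketch([1 + 1j, 2j, 1 + 1j, 2j, 3])
single = mode_sketch([0, 1j, 1, 1, 1 + 1j, 1 + 2j])
print(multi)   # [2j, (1+1j)]
print(single)  # [(1+0j)]
```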
14 changes: 14 additions & 0 deletions pandas/tests/series/methods/test_isin.py
@@ -186,3 +186,17 @@ def test_isin_large_series_mixed_dtypes_and_nan():
result = ser.isin({"foo", "bar"})
expected = Series([False] * 3 * 1_000_000)
tm.assert_series_equal(result, expected)


@pytest.mark.parametrize(
"array,expected",
[
(
[0, 1j, 1j, 1, 1 + 1j, 1 + 2j, 1 + 1j],
Series([False, True, True, False, True, True, True], dtype=bool),
)
],
)
def test_isin_complex_numbers(array, expected):
result = Series(array).isin([1j, 1 + 1j, 1 + 2j])
tm.assert_series_equal(result, expected)
18 changes: 18 additions & 0 deletions pandas/tests/series/methods/test_value_counts.py
@@ -207,3 +207,21 @@ def test_value_counts_bool_with_nan(self, ser, dropna, exp):
# GH32146
out = ser.value_counts(dropna=dropna)
tm.assert_series_equal(out, exp)

@pytest.mark.parametrize(
"input_array,expected",
[
(
[1 + 1j, 1 + 1j, 1, 3j, 3j, 3j],
Series([3, 2, 1], index=pd.Index([3j, 1 + 1j, 1], dtype=np.complex128)),
),
(
[1 + 1j, 1 + 1j, 1, 3j, 3j, 3j],
Series([3, 2, 1], index=pd.Index([3j, 1 + 1j, 1], dtype=np.complex64)),
),
],
)
def test_value_counts_complex_numbers(self, input_array, expected):
# Complex Index dtype is cast to object
Contributor

Is this comment still valid? IIUC value_counts uses complex128/256 and not objects, see

cpdef value_count(ndarray[htfunc_t] values, bint dropna):

Member Author

The dtype of the index will be object, see below. Agreed, this probably needs fixing; I can create a follow-up, as this is the same issue your comment below refers to. #36482 (comment)

In [14]: pd.Series([1 + 1j, 1 + 1j, 1, 3j, 3j, 3j]).value_counts().index
Out[14]: Index([3j, (1+1j), (1+0j)], dtype='object')

Contributor

pls create a followon issue

result = Series(input_array).value_counts()
tm.assert_series_equal(result, expected)
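
The expected series orders values by descending count. A plain-Python analogue using `collections.Counter` is sketched below; it says nothing about the index dtype question raised in the review comments.

```python
from collections import Counter

values = [1 + 1j, 1 + 1j, 1, 3j, 3j, 3j]

# most_common() returns (value, count) pairs, highest count first.
counts = Counter(complex(v) for v in values).most_common()
print(counts)  # [(3j, 3), ((1+1j), 2), ((1+0j), 1)]
```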
13 changes: 13 additions & 0 deletions pandas/tests/test_algos.py
@@ -1513,6 +1513,19 @@ def test_unique_tuples(self, arr, uniques):
result = pd.unique(arr)
tm.assert_numpy_array_equal(result, expected)

@pytest.mark.parametrize(
"array,expected",
[
(
[1 + 1j, 0, 1, 1j, 1 + 2j, 1 + 2j],
np.array([(1 + 1j), 0j, (1 + 0j), 1j, (1 + 2j)], dtype=object),
Contributor

Hmm, dtype=object here... I ask myself, whether we should have a unique-function with fused types, like we already do for other functions e.g. value_counts (

cpdef value_count(ndarray[htfunc_t] values, bint dropna):
)
What do you think @jbrockmendel? Probably should not be part of this PR though. It looks like dtype=object for factorize and Index are just consequences of unique returning objects.

Member Author

@jbrockmendel - whenever you have time, think your eyes on this would be much appreciated.

Member

yah i think ideally this should return a complex dtype (same for factorize above). OK for that to be a separate PR, can leave a comment on the test to that effect

Member Author

Great have added comments to that effect - will create a follow up issue

)
],
)
def test_unique_complex_numbers(self, array, expected):
result = pd.unique(array)
tm.assert_numpy_array_equal(result, expected)


class TestHashTable:
def test_string_hashtable_set_item_signature(self):