-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: .transform(ngroup) for axis=1 grouper #50761
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -3369,7 +3369,7 @@ def ngroup(self, ascending: bool = True): | |||
dtype: int64 | |||
""" | |||
with self._group_selection_context(): | |||
index = self._selected_obj.index | |||
index = self._selected_obj._get_axis(self.axis) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get index
based on self.axis
instead of getting the default axis=0
index.
About the tests, they do pass for me when I run: pytest pandas\tests\groupby -n 4 -m "not slow and not network and not db and not single_cpu" -r sxX But get a failure for: pytest pandas\tests\groupby -n 4 It fails for FAILED pandas/tests/groupby/test_value_counts.py::test_series_groupby_value_counts[False-False-False-False-False-['1st', '2nd']-1000-203] - KeyError: <class 'numpy.intc'>
768 failed, 17098 passed, 402 skipped, 226 xfailed I am on Windows 10. |
Changes look good - can you post a stack trace for the failing test? You should just be able to run
You may need to switch the orientation of the slashes. Also - can you see if you get the same failure on main and 1.5.x? |
Sorry for the late reply. The above command needs to be enclosed in ================================================================================= test session starts =================================================================================
platform win32 -- Python 3.8.15, pytest-7.2.0, pluggy-1.0.0 -- C:\Users\Me\mambaforge\envs\pandas-dev\python.exe
cachedir: .pytest_cache
hypothesis profile 'ci' -> deadline=None, suppress_health_check=[HealthCheck.too_slow], database=DirectoryBasedExampleDatabase('C:\\Users\\Me\\Desktop\\pandas\\.hypothesis\\examples')
rootdir: C:\Users\Me\Desktop\pandas, configfile: pyproject.toml
plugins: anyio-3.6.2, hypothesis-6.59.0, asyncio-0.20.2, cov-4.0.0, cython-0.2.0, xdist-3.0.2
asyncio: mode=strict
collected 1 item
pandas/tests/groupby/test_value_counts.py::test_series_groupby_value_counts[False-False-False-False-False-['1st', '2nd']-1000-203] FAILED
====================================================================================== FAILURES =======================================================================================
_______________________________________________ test_series_groupby_value_counts[False-False-False-False-False-['1st', '2nd']-1000-203] _______________________________________________
df = 1st 2nd 3rd
0 d 2015-08-31 17
1 d 2015-08-26 18
2 c 2015-08-27 1
3 b 2015-08-29 1...96 b 2015-08-28 9
997 c 2015-08-26 15
998 a 2015-08-27 16
999 c 2015-08-29 2
[1000 rows x 3 columns]
keys = ['1st', '2nd'], bins = array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]), n = 1000, m = 20, isort = False, normalize = False, sort = False, ascending = False, dropna = False
@pytest.mark.slow
@pytest.mark.parametrize("df, keys, bins, n, m", binned, ids=ids)
@pytest.mark.parametrize("isort", [True, False])
@pytest.mark.parametrize("normalize", [True, False])
@pytest.mark.parametrize("sort", [True, False])
@pytest.mark.parametrize("ascending", [True, False])
@pytest.mark.parametrize("dropna", [True, False])
def test_series_groupby_value_counts(
df, keys, bins, n, m, isort, normalize, sort, ascending, dropna
):
def rebuild_index(df):
arr = list(map(df.index.get_level_values, range(df.index.nlevels)))
df.index = MultiIndex.from_arrays(arr, names=df.index.names)
return df
kwargs = {
"normalize": normalize,
"sort": sort,
"ascending": ascending,
"dropna": dropna,
"bins": bins,
}
gr = df.groupby(keys, sort=isort)
> left = gr["3rd"].value_counts(**kwargs)
pandas\tests\groupby\test_value_counts.py:109:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas\core\groupby\generic.py:771: in value_counts
_, idx = get_join_indexers(left, right, sort=False, how="left")
pandas\core\reshape\merge.py:1653: in get_join_indexers
zipped = zip(*mapped)
pandas\core\reshape\merge.py:1650: in <genexpr>
_factorize_keys(left_keys[n], right_keys[n], sort=sort, how=how)
pandas\core\reshape\merge.py:2373: in _factorize_keys
klass, lk, rk = _convert_arrays_and_get_rizer_klass(lk, rk)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
lk = array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1,
2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4,..., 0, 1, 2, 3,
4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5,
6, 7, 8, 9], dtype=int32)
rk = array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2,
3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 4, 5, 6,...0, 2, 4, 5, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4,
5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8], dtype=int32)
def _convert_arrays_and_get_rizer_klass(
lk: ArrayLike, rk: ArrayLike
) -> tuple[type[libhashtable.Factorizer], ArrayLike, ArrayLike]:
klass: type[libhashtable.Factorizer]
if is_numeric_dtype(lk.dtype):
if not is_dtype_equal(lk, rk):
dtype = find_common_type([lk.dtype, rk.dtype])
if isinstance(dtype, ExtensionDtype):
cls = dtype.construct_array_type()
if not isinstance(lk, ExtensionArray):
lk = cls._from_sequence(lk, dtype=dtype, copy=False)
else:
lk = lk.astype(dtype)
if not isinstance(rk, ExtensionArray):
rk = cls._from_sequence(rk, dtype=dtype, copy=False)
else:
rk = rk.astype(dtype)
else:
lk = lk.astype(dtype)
rk = rk.astype(dtype)
if isinstance(lk, BaseMaskedArray):
# Invalid index type "type" for "Dict[Type[object], Type[Factorizer]]";
# expected type "Type[object]"
klass = _factorizers[lk.dtype.type] # type: ignore[index]
else:
> klass = _factorizers[lk.dtype.type]
E KeyError: <class 'numpy.intc'>
pandas\core\reshape\merge.py:2440: KeyError
------------------------------------------------------------ generated xml file: C:\Users\Me\Desktop\pandas\test-data.xml -------------------------------------------------------------
================================================================================ slowest 30 durations =================================================================================
0.01s call pandas/tests/groupby/test_value_counts.py::test_series_groupby_value_counts[False-False-False-False-False-['1st', '2nd']-1000-203]
0.00s setup pandas/tests/groupby/test_value_counts.py::test_series_groupby_value_counts[False-False-False-False-False-['1st', '2nd']-1000-203]
0.00s teardown pandas/tests/groupby/test_value_counts.py::test_series_groupby_value_counts[False-False-False-False-False-['1st', '2nd']-1000-203]
=============================================================================== short test summary info ===============================================================================
FAILED pandas/tests/groupby/test_value_counts.py::test_series_groupby_value_counts[False-False-False-False-False-['1st', '2nd']-1000-203] - KeyError: <class 'numpy.intc'>
================================================================================== 1 failed in 0.72s ================================================================================== It does not fail on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great - just needs a line in the whatsnew for 2.0 under the Bugfix - Groupby section.
@iofall - I merged main here to make sure recent changes in groupby didn't impact this. An unrelated failure due to the latest release of fsspec popped up. Merging main should fix everything(ish).
Thanks @iofall - nice work! |
Part of #45986
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.