Skip to content

base/test_unique.py: regression test for bad unicode string #34851

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 18, 2020
Merged

base/test_unique.py: regression test for bad unicode string #34851

merged 1 commit into from
Jun 18, 2020

Conversation

suvayu
Copy link
Contributor

@suvayu suvayu commented Jun 17, 2020

I ran the tests with pandas 1.0.4, where it segfaults as expected (shown below), and master passes the test.

tests/test_mytest.py::test_unique2[idx_or_series_w_bad_unicode0] Fatal Python error: Segmentation fault

Current thread 0x00007f09d2dc4740 (most recent call first):
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/pandas/core/algorithms.py", line 382 in unique
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/pandas/core/base.py", line 1246 in unique
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 1974 in unique
  File "/home/suvali/tmp/pandas-tests/tests/test_mytest.py", line 16 in test_unique2
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/_pytest/python.py", line 182 in pytest_pyfunc_call
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/_pytest/python.py", line 1477 in runtest
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/_pytest/runner.py", line 135 in pytest_runtest_call
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/_pytest/runner.py", line 217 in <lambda>
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/_pytest/runner.py", line 244 in from_call
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/_pytest/runner.py", line 216 in call_runtest_hook
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/_pytest/runner.py", line 186 in call_and_report
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/_pytest/runner.py", line 100 in runtestprotocol
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/_pytest/runner.py", line 85 in pytest_runtest_protocol
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/_pytest/main.py", line 272 in pytest_runtestloop
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/_pytest/main.py", line 247 in _main
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/_pytest/main.py", line 191 in wrap_session
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/_pytest/main.py", line 240 in pytest_cmdline_main
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/_pytest/config/__init__.py", line 124 in main
  File "/opt/conda/envs/py38/bin/pytest", line 11 in <module>
Segmentation fault (core dumped)

Comment on lines 120 to 123
unique_values = list(dict.fromkeys(obj.values))
if isinstance(obj, pd.Index):
expected = pd.Index(unique_values, dtype=obj.dtype)
if is_datetime64tz_dtype(obj.dtype):
expected = expected.normalize()
tm.assert_index_equal(result, expected)
else:
expected = np.array(unique_values, dtype=obj.dtype)
tm.assert_numpy_array_equal(result, expected)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be simpler. Since the sole unique value is hard-coded above we can just explicitly construct the expected output using that value and object dtype (shouldn't need to check for datetime).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your comment @dsaxton. I actually wanted to replicate the basic test earlier in the same file, test_unique(..). It takes a fixture, and I wasn't sure how to pass a fixture and my examples simultaneously to pytest.mark.parametrize (my search led me to a feature request for pytest with lots of messy workarounds). So I guess the options are:

  1. I simplify this as you suggest, or
  2. we figure out a way to pass these examples along with the fixture.

This is my first PR here, so a bit of guidance would be great :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing, testing PRs are always appreciated. It's probably fine to create this as a simple one-off rather than including invalid unicode in every test that uses the fixture.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll simplify this one and push an update. Thanks a lot :)

def test_unique_bad_unicode(idx_or_series_w_bad_unicode):
# regression test for #34550
obj = idx_or_series_w_bad_unicode
obj = np.repeat(obj, range(1, len(obj) + 1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we trying to add duplicates here? If so probably makes sense to add them up top.

@dsaxton dsaxton added the Testing pandas testing functions or related to the test suite label Jun 18, 2020
result = obj.unique()

# dict.fromkeys preserves the order
unique_values = list(dict.fromkeys(obj.values))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove this and then simply pd.Index(["\ud38d"], dtype=object) below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I missed that part of your comment, fixed now :)

@suvayu
Copy link
Contributor Author

suvayu commented Jun 18, 2020

The test failures are in pandas/tests/io/json/test_pandas.py, a warning from contextlib has changed from FutureWarning to ResourceWarning

I didn't touch this test. Should I fix it here, or is that for another PR?

@dsaxton
Copy link
Member

dsaxton commented Jun 18, 2020

I didn't touch this test. Should I fix it here, or is that for another PR?

Seems unrelated (could be a flakey test), try merging master to restart CI and see if it goes away

@jreback jreback added the Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff label Jun 18, 2020
@jreback jreback added this to the 1.1 milestone Jun 18, 2020
@jreback jreback merged commit e6e0889 into pandas-dev:master Jun 18, 2020
@jreback
Copy link
Contributor

jreback commented Jun 18, 2020

thanks @suvayu

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Series.unique segfaults on invalid unicode
3 participants