-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
base/test_unique.py: regression test for bad unicode string #34851
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/tests/base/test_unique.py
Outdated
unique_values = list(dict.fromkeys(obj.values)) | ||
if isinstance(obj, pd.Index): | ||
expected = pd.Index(unique_values, dtype=obj.dtype) | ||
if is_datetime64tz_dtype(obj.dtype): | ||
expected = expected.normalize() | ||
tm.assert_index_equal(result, expected) | ||
else: | ||
expected = np.array(unique_values, dtype=obj.dtype) | ||
tm.assert_numpy_array_equal(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be simpler. Since the sole unique value is hard-coded above we can just explicitly construct the expected output using that value and object dtype (shouldn't need to check for datetime).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your comment @dsaxton. I actually wanted to replicate the basic test earlier in the same file, test_unique(..)
. It takes a fixture, and I wasn't sure how to pass a fixture and my examples simultaneously to pytest.mark.parametrize
(my search led me to a feature request for pytest
with lots of messy workarounds). So I guess the options are:
- I simplify this as you suggest, or
- we figure out a way to pass these examples along with the fixture.
This is my first PR here, so a bit of guidance would be great :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing, testing PRs are always appreciated. It's probably fine to create this as a simple one-off rather than including invalid unicode in every test that uses the fixture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I'll simplify this one and push an update. Thanks a lot :)
pandas/tests/base/test_unique.py
Outdated
def test_unique_bad_unicode(idx_or_series_w_bad_unicode): | ||
# regression test for #34550 | ||
obj = idx_or_series_w_bad_unicode | ||
obj = np.repeat(obj, range(1, len(obj) + 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we trying to add duplicates here? If so probably makes sense to add them up top.
pandas/tests/base/test_unique.py
Outdated
result = obj.unique() | ||
|
||
# dict.fromkeys preserves the order | ||
unique_values = list(dict.fromkeys(obj.values)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove this and then simply pd.Index(["\ud38d"], dtype=object) below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I missed that part of your comment, fixed now :)
The test failures are in I didn't touch this test. Should I fix it here, or is that for another PR? |
Seems unrelated (could be a flakey test), try merging master to restart CI and see if it goes away |
thanks @suvayu |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
I ran the tests with pandas 1.0.4, where it segfaults as expected (shown below), and
master
passes the test.