-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Adding test_map_missing_mixed to test_apply.py in pandas test suite series #20574
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based off of the comments in the linked issue I think there are (at least) three scenarios we wanted to test:
- Mapping the NA value to a new value
- Mapping a str value to a new value AND
- Mapping an int value to a new value
You covered the first two but left out the third so we should add that.
With that said, whenever you have tests that are repetitive in nature and varying just slightly you should be thinking about parametrizing them. In this case, I think you should parametrize the test with values of "mapping,exp" where the former is the dict you want to use in the map
function and the latter is the result you expect (should be a series in each case). Below is a relatively close example of how this works - take a look at that and see if you can apply something similar here
pandas/pandas/tests/groupby/test_groupby.py
Line 2063 in 4efb39f
def test_rank_resets_each_group(self, pct, exp): |
Alright, I think I have something here. Is there any convenient way for me to run just an individual test file rather than all tests in their entirety? |
|
Hello @readyready15728! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on April 03, 2018 at 00:03 Hours UTC |
Codecov Report
@@ Coverage Diff @@
## master #20574 +/- ##
==========================================
+ Coverage 91.82% 91.82% +<.01%
==========================================
Files 152 153 +1
Lines 49255 49256 +1
==========================================
+ Hits 45226 45227 +1
Misses 4029 4029
Continue to review full report at Codecov.
|
…py in pandas test suite series (again)
The parametrized test checks out. |
pandas/tests/series/test_apply.py
Outdated
({'string': 'another string' }, pd.Series(['another string'])), | ||
({42: 'the answer'}, pd.Series(['the answer']))]) | ||
def test_map_missing_mixed(self, mapping, exp): | ||
s = pd.Series(list(mapping.keys())[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job on the parametrization but in the process of updating this line I think we are losing sight of what we are trying to test. Originally you were constructing this series from list('abcd')
and appending an NA record. Now all you are doing is using the key of your dict as the series value, but that's not the same test.
Now that I'm thinking about it it's probably good to also add a "vals" parameter in front of the mapping. For the first two scenarios you can use list('abcd')
as you had before and for the third use range(4)
. Then just make your first line s = pd.Series(vals + [np.nan])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll give that a shot.
…n pandas test suite series
Now I have something that passes but I'm not sure it's idiomatic. |
Actually there is at least one last problem: I am calling pd.Series a bunch of times when I can just put it around exp in the assertion step. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure you get the tests to pass locally before pushing to GitHub. These have some logical errors that you need to work through
pandas/tests/series/test_apply.py
Outdated
|
||
@pytest.mark.parametrize("vals,mapping,exp", [ | ||
(list('abc'), {np.nan: 'not NaN'}, ['not NaN']), | ||
(list('abc'), {'string': 'another string'}, ['another string']), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your mapping should contain one of the values in the series, so use 'a' instead of 'string'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Riiight
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests did pass
Just not like they should have
pandas/tests/series/test_apply.py
Outdated
@pytest.mark.parametrize("vals,mapping,exp", [ | ||
(list('abc'), {np.nan: 'not NaN'}, ['not NaN']), | ||
(list('abc'), {'string': 'another string'}, ['another string']), | ||
(list(range(3)), {42: 'the answer'}, ['the answer'])]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above, use 1 instead of 42. Just for consistency make the value numeric as well instead of 'the answer'
pandas/tests/series/test_apply.py
Outdated
(list('abc'), {'string': 'another string'}, ['another string']), | ||
(list(range(3)), {42: 'the answer'}, ['the answer'])]) | ||
def test_map_missing_mixed(self, vals, mapping, exp): | ||
s = pd.Series(vals + [list(mapping.keys())[0]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use the mapping keys here. s = pd.Series(vals + [np.nan])
is all you need
pandas/tests/series/test_apply.py
Outdated
s = pd.Series(vals + [list(mapping.keys())[0]]) | ||
result = s.map(mapping) | ||
|
||
tm.assert_series_equal(result[-1:].reset_index(drop=True), pd.Series(exp)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think through the exp values you are passing in. They should obviously match the shape of your input but replace with NA values where appropriate.
For your first example, if you did list('abc')
as your val {'a': 'foo'}
as your mapping then your exp would be ['foo', np.nan, np.nan, np.nan]
.
I'm not sure what you are trying to do with result[-1:].reset_index(drop=True)
but that's getting way too complicated. If you follow all of the above steps you can just do tm.assert_series_equal(result, pd.Series(exp))
…andas test suite series
Alright, I just learned that supplying |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very minor edit but otherwise lgtm. CI should run and a core maintainer will be able to review thereafter for merge ability.
Thanks for trying your hand at your first PR!
pandas/tests/series/test_apply.py
Outdated
(list('abc'), {'a': 'a letter'}, ['a letter'] + [np.nan] * 3), | ||
(list(range(3)), {0: 42}, [42] + [np.nan] * 3)]) | ||
def test_map_missing_mixed(self, vals, mapping, exp): | ||
s = pd.Series(vals + [np.nan]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add # GH20495
as a comment on its own line right below the method definition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Between def ...
and s ...
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - you can look at test_with_nested_series
in the same module for reference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just need the comment referencing the issue, then we're good.
… test suite series
All set |
Thanks @readyready15728 ! |
Checklist for other PRs (remove this part if you are doing a PR for the pandas documentation sprint):
git diff upstream/master -u -- "*.py" | flake8 --diff