Skip to content

Adding test_map_missing_mixed to test_apply.py in pandas test suite series #20574

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Apr 3, 2018

Conversation

readyready15728
Copy link
Contributor

@readyready15728 readyready15728 commented Apr 2, 2018

Checklist for other PRs (remove this part if you are doing a PR for the pandas documentation sprint):

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based off of the comments in the linked issue I think there are (at least) three scenarios we wanted to test:

  1. Mapping the NA value to a new value
  2. Mapping a str value to a new value AND
  3. Mapping an int value to a new value

You covered the first two but left out the third so we should add that.

With that said, whenever you have tests that are repetitive in nature and varying just slightly you should be thinking about parametrizing them. In this case, I think you should parametrize the test with values of "mapping,exp" where the former is the dict you want to use in the map function and the latter is the result you expect (should be a series in each case). Below is a relatively close example of how this works - take a look at that and see if you can apply something similar here

def test_rank_resets_each_group(self, pct, exp):

@readyready15728
Copy link
Contributor Author

readyready15728 commented Apr 2, 2018

Alright, I think I have something here. Is there any convenient way for me to run just an individual test file rather than all tests in their entirety?

@WillAyd
Copy link
Member

WillAyd commented Apr 2, 2018

pytest path.to.the.module::ClassName::TestName would be the most granular, though you can strip off any of those elements to move up a level and run more tests

@pep8speaks
Copy link

pep8speaks commented Apr 2, 2018

Hello @readyready15728! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on April 03, 2018 at 00:03 Hours UTC

@codecov
Copy link

codecov bot commented Apr 2, 2018

Codecov Report

Merging #20574 into master will increase coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20574      +/-   ##
==========================================
+ Coverage   91.82%   91.82%   +<.01%     
==========================================
  Files         152      153       +1     
  Lines       49255    49256       +1     
==========================================
+ Hits        45226    45227       +1     
  Misses       4029     4029
Flag Coverage Δ
#multiple 90.2% <ø> (ø) ⬆️
#single 41.9% <ø> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/resample.py 96.43% <0%> (ø) ⬆️
pandas/core/groupby.py
pandas/core/groupby/__init__.py 100% <0%> (ø)
pandas/core/groupby/groupby.py 92.55% <0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4efb39f...e927be4. Read the comment docs.

@readyready15728
Copy link
Contributor Author

The parametrized test checks out.

({'string': 'another string' }, pd.Series(['another string'])),
({42: 'the answer'}, pd.Series(['the answer']))])
def test_map_missing_mixed(self, mapping, exp):
s = pd.Series(list(mapping.keys())[0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job on the parametrization but in the process of updating this line I think we are losing sight of what we are trying to test. Originally you were constructing this series from list('abcd') and appending an NA record. Now all you are doing is using the key of your dict as the series value, but that's not the same test.

Now that I'm thinking about it it's probably good to also add a "vals" parameter in front of the mapping. For the first two scenarios you can use list('abcd') as you had before and for the third use range(4). Then just make your first line s = pd.Series(vals + [np.nan])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll give that a shot.

@readyready15728
Copy link
Contributor Author

Now I have something that passes but I'm not sure it's idiomatic.

@readyready15728
Copy link
Contributor Author

Actually there is at least one last problem: I am calling pd.Series a bunch of times when I can just put it around exp in the assertion step.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure you get the tests to pass locally before pushing to GitHub. These have some logical errors that you need to work through


@pytest.mark.parametrize("vals,mapping,exp", [
(list('abc'), {np.nan: 'not NaN'}, ['not NaN']),
(list('abc'), {'string': 'another string'}, ['another string']),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your mapping should contain one of the values in the series, so use 'a' instead of 'string'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Riiight

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests did pass

Just not like they should have

@pytest.mark.parametrize("vals,mapping,exp", [
(list('abc'), {np.nan: 'not NaN'}, ['not NaN']),
(list('abc'), {'string': 'another string'}, ['another string']),
(list(range(3)), {42: 'the answer'}, ['the answer'])])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above, use 1 instead of 42. Just for consistency make the value numeric as well instead of 'the answer'

(list('abc'), {'string': 'another string'}, ['another string']),
(list(range(3)), {42: 'the answer'}, ['the answer'])])
def test_map_missing_mixed(self, vals, mapping, exp):
s = pd.Series(vals + [list(mapping.keys())[0]])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use the mapping keys here. s = pd.Series(vals + [np.nan]) is all you need

s = pd.Series(vals + [list(mapping.keys())[0]])
result = s.map(mapping)

tm.assert_series_equal(result[-1:].reset_index(drop=True), pd.Series(exp))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think through the exp values you are passing in. They should obviously match the shape of your input but replace with NA values where appropriate.

For your first example, if you did list('abc') as your val {'a': 'foo'} as your mapping then your exp would be ['foo', np.nan, np.nan, np.nan].

I'm not sure what you are trying to do with result[-1:].reset_index(drop=True) but that's getting way too complicated. If you follow all of the above steps you can just do tm.assert_series_equal(result, pd.Series(exp))

@readyready15728
Copy link
Contributor Author

Alright, I just learned that supplying -s to pytest allows output from print() statements to be shown. With that knowledge I was able to check whether the values being generated made sense before committing this time. Sorry for jumping the gun.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor edit but otherwise lgtm. CI should run and a core maintainer will be able to review thereafter for merge ability.

Thanks for trying your hand at your first PR!

(list('abc'), {'a': 'a letter'}, ['a letter'] + [np.nan] * 3),
(list(range(3)), {0: 42}, [42] + [np.nan] * 3)])
def test_map_missing_mixed(self, vals, mapping, exp):
s = pd.Series(vals + [np.nan])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add # GH20495 as a comment on its own line right below the method definition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Between def ... and s ...?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - you can look at test_with_nested_series in the same module for reference

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just need the comment referencing the issue, then we're good.

@readyready15728
Copy link
Contributor Author

All set

@TomAugspurger TomAugspurger merged commit 85c7900 into pandas-dev:master Apr 3, 2018
@TomAugspurger
Copy link
Contributor

Thanks @readyready15728 !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inconsistent behavior of .map()
4 participants