-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
Pd.series.map performance #34948
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pd.series.map performance #34948
Conversation
Yeah, keys = tuple(data.keys())
values = tuple(data.values()) should be even faster than your approach. Can you compare this to your PR. Can you also give perf. comparisons relative to master. |
values = tuple(data.values()) --> this needs to be returned in the order of keys. If i use in the given way the tests are failing |
what about using |
It is kind of expecting keys as tuple and values as list. I can see tests passing. And values order should be in the same form as keys. In the sense data = {1:2,3:2,:4:5}, keys = tuple(3,4,1) and values = [2,5,2]. But anyhow i see if we try keys = tuple(data.keys()) and values = list(data.values()). This should be fine i think. Let me see and validate the tests |
emm, why does keys have to be keys, values = [], [] so maybe no need for keys to be tuple? sorry for commenting this without looking at the codebase in details and apologize in advance if it has to be tuple. |
If you look at the code history.
If you look at the above piece of code. Zip returns tuples of keys and values. But in the second line values are being converted to list. But i agree even in the else he is assigning empty list for both keys and values. |
well, I think it's fine to leave both to be but just follow your own idea to make the changes, and you might also need to add a whatsnew and an example in asv for performance comp. |
Yeah. I will look into it and try to generate the perf comp accordingly. |
@charlesdong1991 Any idea how to deal with the breaking |
emm, you meant CI? seem you have list changed to tuple. can you try using list for both and see if this error is gone? |
Hello @Rohith295! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-09-13 15:32:24 UTC |
Alright. I will check it accordinly. |
I am trying to generate the performance metrics. Can you suggest me some practices for generating it properly? |
you can also add a test for this case in |
Alright. I will have a look into this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah pls provide an asv for this case and also run the construction asvs to make sure we don't have any regressions elsewhere
Alright. I will fix the review comments and do provide the asv |
@Rohith295, is the failing test The python docs says:
so |
It's not only that one particular test, there were other tests aswell. I did not make a particular note of it of tests which are failing. Also as you mentioned data.values() gives output in particular order. I still don't want rely on it because it only implemented from python 3.6, but what if someone using older versions. I believe our code must be both backward and forward compatible. |
pandas supports python 3.6.1 and higher, so that particular concern is not a problem. But ok if tuple(data.values()) doesn't work currently, I'm fine with that issue being outside th scope of this PR. |
we need an asv & a whatsnew note in order to merge this. |
I will do it by today mostly. Sorry for the delay @jreback |
Guys, i am trying to build asv. If i start doing asv run. It is taking a long time by executing all the files under benchmarks directory. Do any one of you know just make the specific file execute under benchmarks directory to capture the asv file. @jreback @topper-123 @charlesdong1991 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if u can add a whats new note and an asv that captures this case
you can run just other relevant asvs (if we have any for map)
and show the results of this one
merge master and ping on green
Alright. I am on it.. |
· Creating environments [ 75.00%] · For pandas commit 04e9e0a <pd.Series.map_performance^2> (round 2/2):
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY. |
@jreback I added the asv benchmarking output here. Is this what expected? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have sufficient asv's for this?
can you add a note in 1.2 perf section.
…d a note for the release v1.2
@jreback yes we do have sufficient asv's. Also added the comments as requested and also the release note under perf section. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. minor comment in whatsnew, ping on green.
thanks @Rohith295 |
Thanks @jreback . I have learned so much from your review comments and also got to chance to understand more about open source community. |
There are other places also to refactor to improve the performance, but this current change has greater impact aswell.
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
improved the performance of pd.series.map