DOC: pandas cheat sheet #13202
Comments
+1 I have wanted to do this already for a long time! (but not sure I will be able to :-))
+1. shall we close #1618?
+1
I don't know R, but I know that having a cheat sheet like this would be helpful. I'm going to give it a shot in PowerPoint, because I know it well and can make it pretty. I'm not sure what other tool would let other people edit it and still provide good formatting.
@Dr-Irv we will just check in the PDF along with the source, in whatever format it's in
Here is what I did today. Plagiarizing is blatant! Will be flying a lot next week, so this will be good to do on the plane.
Nice start! That original from RStudio is CC 4.0, so AFAIK it's fine to copy / transform it as long as you acknowledge RStudio for the original.
Here is the first page of the cheat sheet. I'll be working on page 2 next. In most places, I found the pandas way of doing things. I used @jreback's initial comment as a guideline, i.e., "I think we could pretty much rip this off exactly as is and just substitute pandas functions directly." There were places where the R example didn't make sense, so I made my own arbitrary choices. Comments and criticism are welcome. I think it is better to get feedback here than anywhere else.
For the subset section, where you have:
it is more idiomatic to do:
though maybe eliminate some of these column selections and show usage of .loc instead
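The snippets being compared did not survive in this copy of the thread. As a hedged sketch of the usual column-subset idioms, including the `.loc` forms suggested above, here is one way they might look. The frame `df` and its columns `a`, `b`, `c` are made up for illustration.

```python
import pandas as pd

# Hypothetical example frame; the column names are illustrative only.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

# Select several columns by name with plain [] indexing.
cols = df[["a", "c"]]

# The same selection with .loc: all rows, a list of column labels.
cols_loc = df.loc[:, ["a", "c"]]

# Label slicing with .loc includes both endpoints ("a" through "b").
a_to_b = df.loc[:, "a":"b"]

# A row condition and a column selection combined in one .loc call.
subset = df.loc[df["a"] > 1, ["b", "c"]]

# Columns whose names match a regular expression.
by_regex = df.filter(regex="^[ab]$")
```

Using `.loc` for the combined row/column case avoids chained indexing such as `df[df["a"] > 1][["b", "c"]]`.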
@jreback Thanks for the suggestions. I've made changes in my current working copy. Separate question: R has a cume_dist() function that computes the cumulative distribution of a vector. It is similar to rank(pct=True), but different. Is there a pandas equivalent?
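One possible answer, offered as a sketch rather than a confirmed equivalence: dplyr defines cume_dist() as the proportion of values less than or equal to the current value. If that definition holds, `Series.rank(method="max", pct=True)` should reproduce it, because giving every tied value the highest rank in its group makes rank / n equal to the count of values <= x divided by n. The default `rank(pct=True)` averages tied ranks instead, which is where the two differ. I have not checked this against R output directly.

```python
import pandas as pd

s = pd.Series([1, 2, 2, 3])

# Default method="average": the tied 2s share rank 2.5, so pct gives 2.5 / 4.
pct_avg = s.rank(pct=True)

# method="max": the tied 2s both get rank 3, so pct gives 3 / 4, i.e. the
# fraction of values <= 2. This matches the cume_dist() definition above.
cume = s.rank(method="max", pct=True)
```

With these inputs, `pct_avg` is 0.25, 0.625, 0.625, 1.0, while `cume` is 0.25, 0.75, 0.75, 1.0.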
Here's my latest version. The only thing left to do is to add groupby() examples in the open space on the second page. Some notes: (1) I didn't find a pandas counterpart to R's cume_dist(), so I left it out and added clip() instead. (2) There is no setdiff between DataFrames (see the discussion in #4480), so the example at the bottom right of page 2 isn't so pretty. Maybe someone knows a better way.
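For the two items in these notes, a hedged sketch follows. `clip()` bounds values to a range. For a DataFrame setdiff, one common workaround (not necessarily what the sheet uses) is an outer merge with `indicator=True`, keeping only rows tagged `left_only`. The frames reuse the `ydf`/`zdf` names from the sheet, but their contents here are made up. Note that, unlike R's `setdiff()`, this keeps duplicate rows from the left frame.

```python
import pandas as pd

# clip(): values below 0 become 0, values above 6 become 6.
clipped = pd.Series([-5, 0, 5, 10]).clip(lower=0, upper=6)

# Hypothetical frames; only the names follow the cheat sheet.
ydf = pd.DataFrame({"x1": ["A", "B", "C"], "x2": [1, 2, 3]})
zdf = pd.DataFrame({"x1": ["B", "C", "D"], "x2": [2, 3, 4]})

# Outer merge on all shared columns; the "_merge" column records whether
# each row came from the left frame only, the right frame only, or both.
merged = ydf.merge(zdf, how="outer", indicator=True)

# Rows of ydf that do not appear in zdf, roughly R's setdiff(ydf, zdf).
setdiff = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
```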
@Dr-Irv looking good! I would remove use of … I would add use of … Maybe use .plot somewhere? Of course it's ONLY 2 pages! hahah
Looks really good. A few points:
@jreback regarding your last comments: I've removed … My plan was to use the remaining space on the second page for …
@sinhrks I deleted the … I don't have the space to add … Let me get the full first draft done, and then we can discuss further enhancements. Thank you all for the comments and feedback!
Here is a proposed first draft with the 2 pages that correspond pretty well to what was in the R cheat sheet. Based on the suggestions above, I added some content that doesn't appear in the R version. Comments and criticism are welcome. Because of space constraints, if you have ideas for things to add, please also suggest something to delete. @jreback I have limited availability between 12/16 and 12/20, so I can take comments into account on 12/21 and then hopefully be done with it. I would like suggestions for where to put this in the source (i.e., which directory). Should the PowerPoint source and the PDF both be in there (so others could modify the PowerPoint and create the PDF)? Also, would I then just submit a pull request with those 2 new documents (which would cause the tests to run, which seems a bit silly since I'm just adding 2 new files that the test scripts don't touch)?
@Dr-Irv looks really good! … Yes, do a PR with the PowerPoint & PDF, as well as a small README (or script) on how to build the PDF. I would put it in pandas\docs\cheatsheet\
@jreback Here are my challenges with including … With respect to conversion, I have PowerPoint on Windows, and it has an export-to-PDF feature. Scripting would be operating-system dependent (and probably not worth the time to figure out), so I will just document how to do it in a text file. One other question: once published, where would links to the cheat sheet live? On the pandas.pydata.org website? Or within the documentation itself? If the latter, how do we refer to something outside the documentation tree?
A README is fine. You can refer to links like https://github.com/pandas-dev/pandas/blob/master/doc/README.rst (for example), which is a static link to whatever is there. We can add a link on the website as well. On …: is adding a 3rd page too much?
@jreback Adding a 3rd page is a possibility. In addition to discussing datetime-like issues, as well as … If I were to do that, should I work on it before submitting the PR? Or should I do the PR now for the current version, so it's out there, and then do a new PR once I finish a third page?
@Dr-Irv Really cool work! Thanks a lot for this! I have some comments, but will save them for later. Just a quick one: I think the … Regarding how to include this in pandas: another possibility is that you keep it in a pandas-cheatsheet repo of your own instead of putting it into the pandas code base itself. But then of course include the same clear links to it in the pandas documentation as in the other case.
@Dr-Irv BTW, would it be possible to already post a pptx version as well? I am giving a 3-day pandas course beginning next week, and was thinking of handing this out to the students. With the pptx version I can make small adaptations (like the incorrect frame) myself, since the new version you can make on 12/21 will be too late for the course.
@Dr-Irv btw, no problem with including a mention of the author as well!
@jorisvandenbossche I made an error in the definition of the frame 'zdf' in the Cheat Sheet. That is now fixed, so the examples are correct. I will include the PPTX in the Pull Request. I'd prefer it be part of the pandas main project, and then others can contribute. I have revealed my secret identity at the bottom. :-> Sorry that I couldn't get things to you prior to today as I was on vacation.
Looks good @Dr-Irv. If you want to put this in a PR, add it to the 0.19.2 whatsnew (a link to the location would be great).
closes pandas-dev#13202 closes pandas-dev#14943
https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
I think we could pretty much rip this off exactly as is and just substitute pandas functions directly.
Further, we could update the comparison with R a bit. Anyone up for this?
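As a rough illustration of what "substitute pandas functions directly" could mean for the dplyr/tidyr verbs on the RStudio sheet, here is a hedged sketch of one possible mapping. The frame and column names are made up, the R calls in the comments are paraphrased, and the named-aggregation syntax in `agg` assumes pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical frame: a group label g and a numeric column x.
df = pd.DataFrame({"g": ["a", "b", "a", "b"], "x": [4, 1, 3, 2]})

# filter(df, x > 1)
filtered = df[df["x"] > 1]  # df.query("x > 1") also works

# arrange(df, x)
arranged = df.sort_values("x")

# mutate(df, y = x * 2)
mutated = df.assign(y=df["x"] * 2)

# select(df, x)
selected = df[["x"]]

# df %>% group_by(g) %>% summarise(total = sum(x))
summary = df.groupby("g", as_index=False).agg(total=("x", "sum"))

# gather(df, key, value, -g): wide to long
long = df.melt(id_vars="g", var_name="key", value_name="value")

# distinct(df, g)
distinct = df[["g"]].drop_duplicates()
```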