DOC: Flesh out the R comparison section of docs #3980

Closed
hayd opened this issue Jun 21, 2013 · 20 comments

@hayd
Contributor

hayd commented Jun 21, 2013

I guess quite a lot of people come from an R background, and a useful resource would be a conversion table of pandas vs. R functions/idioms etc. in http://pandas.pydata.org/pandas-docs/dev/comparison_with_r.html

Perhaps this site could offer some functions to consider including:
http://www.statmethods.net/management/variables.html

@jreback
Contributor

jreback commented Jun 21, 2013

I guess R is famous for obfuscation (of syntax)?

@hayd
Contributor Author

hayd commented Jun 21, 2013

I suspect it'll be a many-to-one table :)

@cpcloud
Member

cpcloud commented Jun 21, 2013

ha! u guys are funny. what the heck is attach? is it like attach(x) == globals()['x'] = x?

@jreback
Contributor

jreback commented Jun 21, 2013

isn't R intuitive?

@cpcloud
Member

cpcloud commented Jun 21, 2013

where the heck are cyl and vs coming from? This

attach(mtcars)
aggdata <- aggregate(mtcars, by=list(cyl,vs), FUN=mean, na.rm=TRUE)
detach(mtcars)

works only if you do the attach(mtcars)? wtf are the scoping rules in R? no such thing exists in Python without a lot of magic...

@jtratner
Contributor

Attach basically is like saying 'make all of the columns of the data frame global variables'
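
For comparison, here is a minimal pandas sketch of roughly the same aggregation, with the columns named explicitly instead of attached as globals. The `cars` frame is a hypothetical stand-in for a few `mtcars` rows (the values are illustrative only), and the result is not laid out exactly like R's `aggregate` output.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of R's mtcars; values are illustrative only.
cars = pd.DataFrame({
    "mpg": [21.0, 22.8, 18.7, 24.4, 14.3, 19.2],
    "cyl": [6, 4, 8, 4, 8, 6],
    "vs": [0, 1, 0, 1, 0, 1],
})

# Group by explicitly named columns; nothing is injected into the global namespace.
# mean() skips missing values by default, much like na.rm=TRUE in R.
aggdata = cars.groupby(["cyl", "vs"]).mean()
print(aggdata)
```

Here `cyl` and `vs` are only ever column labels looked up on `cars`, so every name can be traced in the source.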

@jtratner
Contributor

It has a companion method detach. I think there's also a with - like statement that scopes just to the function call. Have you seen the model syntax yet? a ~ b I totally get that it's useful, but it's a little unsettling when you are used to being able to explicitly trace all names in the document.
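
As a loose analogue of the `with`-style scoping mentioned above, pandas offers `DataFrame.eval`, which resolves bare names in an expression string against the frame's columns for that one call only. This is a sketch under that assumption, not a claim that the two are equivalent; the frame `df` and its columns `a` and `b` are made up for illustration.

```python
import pandas as pd

# Hypothetical frame; the column names are chosen only for illustration.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [2.0, 4.0, 6.0]})

# Inside the expression string, "a" and "b" refer to columns of df.
# No variables named a or b are created in the surrounding Python scope.
ratio = df.eval("b / a")
print(ratio)
```

The column names are visible only inside the string passed to `eval`, so the surrounding namespace is left untouched.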

@cpcloud
Member

cpcloud commented Jun 21, 2013

patsy + statsmodels + pandas >>>>> R

@cpcloud
Member

cpcloud commented Jun 21, 2013

magic regarding scope and namespaces 👎

@cpcloud
Member

cpcloud commented Jun 21, 2013

anyway comparisons are useful to show people how awesome pandas is :)

@hayd
Contributor Author

hayd commented Jul 12, 2013

related http://stackoverflow.com/questions/17621325/equivalent-pandas-function-to-this-r-aggregation

Anyone fancy spamming the pandas/R/.. mailing lists to see if anyone is interested in doing this?

@hayd
Contributor Author

hayd commented Aug 1, 2013

@TomAugspurger
Contributor

@hayd
Contributor Author

hayd commented Nov 1, 2013

https://groups.google.com/forum/#!topic/pydata/1eNURQsflNw

A while back I started making some notes on how to do the various recipes in O'Reilly's R Cookbook (http://shop.oreilly.com/product/9780596809164.do) with Numpy, Pandas, Scipy.

I haven't had time to complete it so I'm sharing it in its current state, and trying to get some community help to fill in the gaps.

I think this could be an extremely useful resource to encourage and help transition lots of people from R to Pandas.

So here's the notes:

http://notes.lexual.com/tech/r_numpy_pandas_cookbook.html

And here's the github repo, patches more than welcome!

https://github.com/lexual/sphinx-notes/blob/master/source/tech/r_numpy_pandas_cookbook.rst

Cheers,

Lex.

These look useful. It's a shame that some sections are XXX-titled, as it would be nice to have a todo list of the areas to flesh out.

@HeardACat
Contributor

I know that this section is more for pandas vs R, but I'm wondering whether it would be worthwhile to include some R functions even if they aren't really related to pandas, for example: aaply, alply, or maybe dlply?

@jreback
Contributor

jreback commented Jan 3, 2014

@HeardACat
Contributor

hmm, would you want this to go under the reshape/cast section, or in the with section, since it could be done in R using dcast as well:

mydf <- data.frame(
  Animal = c('Animal1', 'Animal2', 'Animal3', 'Animal2', 'Animal1', 'Animal2', 'Animal3'),
  FeedType = c('A', 'B', 'A', 'A', 'B', 'B', 'A'),
  Amount = c(10, 7, 4, 2, 5, 6, 2)
)

# Stackoverflow example
with(mydf, tapply(Amount, list(Animal, FeedType), sum))

# Using reshape
require(reshape2)
dcast(mydf, Animal ~ FeedType, sum, fill=NaN)

In either case the solution would be what's on Stack Overflow (and very similar to the solution in the reshape/cast section of the current docs).
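
For reference, a minimal pandas sketch of that cross-tabulation, assuming `pivot_table` is the intended counterpart to `tapply`/`dcast` here. It rebuilds the same `mydf` data as the R snippet above.

```python
import pandas as pd

# Same data as the R data.frame above.
mydf = pd.DataFrame({
    "Animal": ["Animal1", "Animal2", "Animal3", "Animal2",
               "Animal1", "Animal2", "Animal3"],
    "FeedType": ["A", "B", "A", "A", "B", "B", "A"],
    "Amount": [10.0, 7.0, 4.0, 2.0, 5.0, 6.0, 2.0],
})

# Sum Amount for each Animal x FeedType pair.
# Combinations with no rows (Animal3 with feed B) come out as NaN.
table = mydf.pivot_table(values="Amount", index="Animal",
                         columns="FeedType", aggfunc="sum")
print(table)
```

Written this way, the one pandas call would serve as the target for both the `with`/`tapply` form and the `dcast` form.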

@jreback
Contributor

jreback commented Jan 4, 2014

you can put it under the more common / useful one and put a link / statement in the other (as they r on the same page)

read it as if you are an R user doing the most common operation (eg what is normally recommended to R Users) and you want to convert to pandas

@jreback
Contributor

jreback commented Jan 4, 2014

there are of course similar cases in pandas where multiple solutions exist (eg imagine a vectorized function vs using apply)

one solution may be faster or simpler, or they may both be appropriate
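
A small sketch of the vectorized-versus-apply case mentioned above. The frame and its column names are hypothetical; both forms give the same result, and the vectorized one is generally the faster of the two on large frames.

```python
import pandas as pd

# Hypothetical frame used only to contrast the two styles.
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [10, 20, 30, 40]})

# Vectorized: one operation over whole columns.
total_vectorized = df["x"] + df["y"]

# Row-wise apply: calls the Python function once per row.
total_apply = df.apply(lambda row: row["x"] + row["y"], axis=1)

print(total_vectorized.equals(total_apply))
```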

@jreback
Contributor

jreback commented Jan 7, 2014

think this is closable after the multiple PRs by @chappers

@jreback closed this as completed Jan 7, 2014