Skip to content

DOC: SQL to pandas comparison (#4524) #5615

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 7, 2013
Merged

Conversation

gjreda
Copy link
Contributor

@gjreda gjreda commented Nov 29, 2013

@jtratner I was a little slower moving on this than I'd hoped, but it's a pretty good start.


.. ipython:: python

tips[['total_bill', 'tip', 'smoker', 'time']].head()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this head(5) to be more explicit

@gjreda
Copy link
Contributor Author

gjreda commented Nov 29, 2013

Should be good now. Let me know if there's anything I missed.

@jorisvandenbossche
Copy link
Member

Looks like a nice addition to the docs!

But a question, can you limit the line lenght to say 100 characters? That's easier to review on github.

@gjreda
Copy link
Contributor Author

gjreda commented Nov 30, 2013

@jorisvandenbossche I believe that's an issue with GitHub's review system. It looks fine when you view it in my forked repo.

@jorisvandenbossche
Copy link
Member

@gjreda Yes, in the rendered view of github it looks fine (https://github.com/gjreda/pandas/blob/master/doc/source/comparison_with_sql.rst), but when looking at the raw text (eg in the changes tab, where you can add comments) it is hard to read. And it is there that I mostly review PR's.

This is maybe not real code, but in my opinion we should also in the documentation source files try to stick to the extent possible to a certain line width, just as for code (maybe not the strict 80, but ca 100). I don't know what others think about that?

@gjreda
Copy link
Contributor Author

gjreda commented Nov 30, 2013

@jorisvandenbossche I'm not entirely sure I follow. Wouldn't that change the way in which the author needs to write the documentation quite a bit? How would you write a sentence that's longer than 100 chars?

@jorisvandenbossche
Copy link
Member

Press 'enter' and resume typing on the following line? In restructured text, a new line is not interpreted like that in a block of text.

@gjreda
Copy link
Contributor Author

gjreda commented Nov 30, 2013

Thanks for the clarification, @jorisvandenbossche. First time working in RST (and committing to an OS project) for me.


.. ipython:: python

import numpy as np
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You already did import numpy. I don't think it is necessary to import it again. (maybe you can just do in the beginning of the document both import of pandas and numpy)

@jreback
Copy link
Contributor

jreback commented Nov 30, 2013

there are a bunch of imports at the top of each top level rst file , just copy paste at the top

@jorisvandenbossche
Copy link
Member

You're welcome!

A note: in most of the documentation, pandas is just mentioned as 'pandas' (pandas) instead of as 'pandas' (with double backticks: pandas). Maybe you can follow that here to be consistent?

@gjreda
Copy link
Contributor Author

gjreda commented Nov 30, 2013

@jreback I think commit b68975d takes care of that?

@jreback
Copy link
Contributor

jreback commented Nov 30, 2013

you can put a ':suppress' in that block

@gjreda
Copy link
Contributor Author

gjreda commented Nov 30, 2013

@jreback do you not want the imports to display at all? Keeping them there seems to make sense for the sake of clarity.

@jreback
Copy link
Contributor

jreback commented Nov 30, 2013

it's not consistent with the other docs, though I can see showing it as a use might read this first (if coming from sql). maybe just a sentence that these r customary imports (like. 10min.rst) says it - also you might to have a link to 10min at the top

@jtratner
Copy link
Contributor

But this is different than the rest of the docs since it's showing
translation rather than API reference (ie since it's something that someone
who doesn't know pandas might refer to). I'm okay either way, but could be
clearer and work as a one-pager.

@gjreda
Copy link
Contributor Author

gjreda commented Nov 30, 2013

I think 1052100 is a happy medium - keeps the imports while mentioning it's customary and refers new users to 10min.


Assume we have two database tables of the same name and structure as our DataFrames.

Types
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would leave out this heading (maybe instead saying in a previous sentence that now the different join types are discussed or something like that), because our Sphinx theme doesn't make a difference anymore between a heading 3 and heading 4, so it is looking a little bit bizarre.

(You can just remove it without changing the other headings, Sphinx will automatically make h3's of the others (the heading symbols are relative))

@jorisvandenbossche
Copy link
Member

I did a build of the docs with this, and it is looking nice!

Another question, what do we recommend something to our users regarding column access? So using df.col or df['col']? I think both are used in the docs but mainly the df['col'] way. Somebody a strong opinion about this?

@jreback
Copy link
Contributor

jreback commented Dec 2, 2013

both are ok, but attribute access can have some restrictions if its a name of a method, so I would always use the accessor, e.g. df['diff'] rather than df.diff (which is an example of a method)

@jorisvandenbossche
Copy link
Member

I asked it because here @gjreda always uses the attribute method (as also in the R comparison docs). Would you change it to accessor type df['col'] or is it OK for here?

@jreback
Copy link
Contributor

jreback commented Dec 2, 2013

I think should change it as new useres will be less suprised if they tried df.diff

@jreback
Copy link
Contributor

jreback commented Dec 3, 2013

can you put a link in v0,13.0.txt to this page, announcing the new docs? (also add links in 10min.rst, maybe the merge section of main docs as well).

finally, need you to squash down these commits, see https://github.com/pydata/pandas/wiki/Using-Git


tips[['total_bill', 'tip', 'smoker', 'time']].head(5)

Calling the DataFrame without the list of column names would display all columns (akin to SQL's *).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A last small comment. Can you escape the * (like \*) or backquote it (like ````*```)? Now it gives a warning when building the docs (because a * normally indicates a start of emphasis)

@jreback
Copy link
Contributor

jreback commented Dec 5, 2013

@gjreda can you rebase and put in those links?

@jreback
Copy link
Contributor

jreback commented Dec 5, 2013

see this: https://github.com/pydata/pandas/wiki/Using-Git

you need to do an interactive rebase and delete everytning except your commits

then squash those together, goal is to have 1 (orsmall number of commits)

try this: git rebase -i origin/master

@gjreda
Copy link
Contributor Author

gjreda commented Dec 5, 2013

Hopefully that solved it. Let me know if I messed anything up.

@jreback
Copy link
Contributor

jreback commented Dec 5, 2013

your rebase looks good! just put a minor comment.

Leave the refernce in v0.13 to comparing_with_sql (basically announcing the new doc section). Then put a link from the merging section in the main docs to comparing_with_sql (to your Join section), and another in 10min rst to the top of comparing_with_sql (I think you might need to create a small section for SQL under IO in 10min.rst)

@gjreda
Copy link
Contributor Author

gjreda commented Dec 6, 2013

A link under IO in 10min doesn't really seem to fit given that compare_with_sql doesn't cover anything IO related. The section in merge where I put the link to compare_with_sql is actually linked to by 10min as well.

@jtratner
Copy link
Contributor

jtratner commented Dec 6, 2013

Agreed. This is about querying on DataFrames and friends, not loading from
SQL.

jreback added a commit that referenced this pull request Dec 7, 2013
DOC: SQL to pandas comparison (#4524)
@jreback jreback merged commit 55e624d into pandas-dev:master Dec 7, 2013
@jreback
Copy link
Contributor

jreback commented Dec 7, 2013

thanks!

pls check out built docs (after 5 pm est tomorrow)

follow up PRs / corrections always welcome

@gjreda
Copy link
Contributor Author

gjreda commented Dec 7, 2013

Nice! Thanks for the help along the way!

@hayd
Copy link
Contributor

hayd commented Dec 7, 2013

Awesome!

I guess re skeleton sections:

  • update, there's an update method.

  • delete... Not sure what we should say about, imo it's usually better to do something like:

    df = df[msk]
    

?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants