Skip to content

ENH: Scatterplot Method added #3473

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 17, 2013
Merged

ENH: Scatterplot Method added #3473

merged 1 commit into from
Oct 17, 2013

Conversation

zachcp
Copy link

@zachcp zachcp commented Apr 27, 2013

closes #2215

This is a simple pull request that creates a shortcut for scatter plots from a df. This is in reference to #2215 (and #1527) and follows the same pattern as df.hist().

#where you would previously have had to do:
from pandas.tools.plotting import scatter_plot 
scatter_plot(df, x = "x", y = 'y')

#you can now do
df.scatter(x="x",y="y")

#example
import pandas as pd
import numpy as np
df = pd.DataFrame( np.random.randn(100,4))
df.scatter(x=1,y=2)

I followed the same pattern as df.hist() even though it looks like you are intended make a class object for scatterplots that can be passed to the plot_frame() function as is the case for BarPlot and LinePlot. In any case, this works and may be useful.

best,
zach cp

@ghost
Copy link

ghost commented Apr 28, 2013

Adding one method per type of plot would pollute the DataFrame namespace too much.
Can't this be rolled in to df.plot(x=,y=,kind='scatter') in some way?

@ghost ghost mentioned this pull request Apr 28, 2013
@jreback
Copy link
Contributor

jreback commented Aug 23, 2013

@cpcloud @jtratner is this how we are handling the api for this? (kind='scatter') seems good to me?

@jreback
Copy link
Contributor

jreback commented Sep 22, 2013

@cpcloud seems like could include this in 0.13?

@zachcp can you rebase this?

@cpcloud
Copy link
Member

cpcloud commented Sep 22, 2013

i'll take a look

@ghost ghost assigned cpcloud Sep 22, 2013
@cpcloud
Copy link
Member

cpcloud commented Sep 22, 2013

echoing @y-p, i'd rather have kind='scatter' than another plotting method

@jtratner
Copy link
Contributor

+1 to kind='scatter'

On Sat, Sep 21, 2013 at 8:44 PM, Phillip Cloud [email protected]:

echoing @y-p https://github.com/y-p, i'd rather have kind='scatter'than another plotting method


Reply to this email directly or view it on GitHubhttps://github.com//pull/3473#issuecomment-24873591
.

@zachcp
Copy link
Author

zachcp commented Sep 22, 2013

I'm relatively new to git and working through the rebase process. I may have just tangled up my code during the rebase process (grrrr.....) so perhaps I will just fork master and start over.

@cpcloud
Copy link
Member

cpcloud commented Sep 22, 2013

do this:

git remote add upstream git://github.com/pydata/pandas.git # if you haven't already
git checkout master
git fetch upstream
git rebase upstream/master
git checkout --branch zachcp-awesome-branch-of-greatness
git cherry-pick 485ac8b 3d2819d 7aff758

@zachcp
Copy link
Author

zachcp commented Sep 22, 2013

Thanks @cpcloud. I almost actually typed zachcp-awesome-branch-of-greatness. Its probably more like twisted-branch-of-attempting.

With the code in pull request #4930 I can do this:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(50,4), columns= ['a','b','c','d'])
df.plot(x='a',y='b', kind='scatter')

not sure how elegant the code is but it works!

@jreback
Copy link
Contributor

jreback commented Sep 26, 2013

@zachcp need to rebase to master

git rebase origin/master

then you can

git rebase -i <before your fist commit>

to squash(s) down to 1 commit

then

git push yourfork branchname -f

@zachcp
Copy link
Author

zachcp commented Sep 26, 2013

@jreback I've been working on PR #4930 with @cpcloud's help. The PR is running on Travis CI although I may need better tests.

@jreback
Copy link
Contributor

jreback commented Sep 27, 2013

fyi...is #4493 related (can you add a test for this)?

@cpcloud
Copy link
Member

cpcloud commented Sep 27, 2013

@jreback no that's a matplotlib version issue

@cpcloud
Copy link
Member

cpcloud commented Sep 27, 2013

related to matplotlib's use of the mask attribute of numpy's masked arrays. since we define a method called mask you get "correct" behavior, but not on purpose (so really it's not correct)

@jreback
Copy link
Contributor

jreback commented Sep 27, 2013

@cpcloud ok....thxs

@cpcloud
Copy link
Member

cpcloud commented Sep 27, 2013

@zachcp looks like u forgot to remove some merge conflict markers, i.e., things like <<<<< HEAD.

gotta keep rebasing ... sorry about that ... things are moving quickly around here 😄

@cpcloud
Copy link
Member

cpcloud commented Sep 28, 2013

@zachcp can you squash this down to 1 or 2 commits? thanks

@zachcp
Copy link
Author

zachcp commented Sep 28, 2013

hmm. squashing didn't work quite as planned..... master is a moving target for rebasing.

@cpcloud
Copy link
Member

cpcloud commented Sep 28, 2013

You're probably just misunderstanding how rebasing works.

Do this:

# go to master
git checkout master

# get the latest upstream
git fetch upstream
git rebase upstream/master

# new branch to put your commits
git checkout -b new-plotting-branch

# now for the fun stuff
git rev-list --reverse upstream/master..scatterplot --author="$(git config user.name)" | while read commit; do
    git cherry-pick $commit
done

@cpcloud
Copy link
Member

cpcloud commented Sep 28, 2013

Okay, you've still got some history in there. Do the following:

git checkout scatterplot
git reset HEAD~1
git stash
git fetch upstream
git rebase upstream/master
git reset --hard upstream/master
git stash pop
git add .
git commit -m'YOUR AWESOME MESSAGE HERE'

@zachcp
Copy link
Author

zachcp commented Sep 28, 2013

You're probably just misunderstanding how rebasing works. - true,true.
@cpcloud - Thank you for taking the time to help me remedy that.

@cpcloud
Copy link
Member

cpcloud commented Sep 28, 2013

It's important to learn how to use git when contributing on GitHub. I can only go so far in helping you with snippets and such. I'm not sure what the best way to learn git is. Personally, I didn't start understanding how it worked from reading a bunch of tutorials/rants about it on the web, I actually had to produce things with it and thereby f**k up many, many times and figure out why I messed up.

Good thing about git is, it seems to know that you'll misuse (abuse?) it. It remembers practically everything you do which makes learning it slightly less stressful than if you weren't able to recover almost everything. I say almost, because of course git doesn't prevent you from shooting yourself in the foot, but then, why are you aiming at your foot in the first place? 😄

@jorisvandenbossche
Copy link
Member

@zachcp I tried out your branch, and I think there are still some issues with it. When trying out some df.plot(..., kind='scatter') plots, I encountered following issues:

  • The subplots=True keyword does not seem to do something useful (although you tested for it in the test). With the example in the test I get three identical subplots. E.g. with kind='bar', this keyword has no effect when both x and y are specified.
  • When trying df.plot(kind='scatter') (so without specifying x and y, I get the error message "UnboundLocalError: local variable 'ser' referenced before assignment", because here https://github.com/zachcp/pandas/blob/30da599b27fa5d26236ccc03fce439ead87567a3/pandas/tools/plotting.py#L1676 ser is not known.
  • The same goes for when only x is specified, and only specifying y also yields an (other) error.
  • The x and y label can be added as with the original scatter_plot.

I think you should try out some examples and some corner cases to test the code you are writing.

@jtratner
Copy link
Contributor

jtratner commented Oct 1, 2013

@Zachp can you run flake8 on this file? It'll reveal many of these errors.

@cpcloud
Copy link
Member

cpcloud commented Oct 3, 2013

@zachcp would like to get this in for 0.13... think u can get @jorisvandenbossche's issues addressed in the next 2 weeks-ish?

@zachcp
Copy link
Author

zachcp commented Oct 4, 2013

@cpcloud and @jorisvandenbossche

I updated the scatter plot class and tests to address several of Joris's issues. You will raise a ValueErrorif you try to use kind=scatter without specifying both x and y. So at a minimum the scatterplot works. However there are a few things that we might want to include to get feature parity with the scatter_plot() function (as Joris has suggested):

  • axis labels
  • the by keyword for grouping sets of plots
  • others?

Looking through the code it looks like plotting was developed in parallel with functions like grouped_hist, scatter_plot, and scatter_matrix sharing code while each plotting class/klass was written independently. I am more familiar with the style used in the functions than with that used by the classes, but could take a stab at implementing some of these features.

In realistic terms I can probably get the axis labels incorporated in time. The logic for subplotting using the 'by' keyword (subplots) may take longer.

@jorisvandenbossche
Copy link
Member

@zachcp At the moment, the code in your PR is not working (even not for the most simple case). You should really try it yourself if it's working or not.

But apart from that, it are only minor bugs why it is not working, and for the rest it is almost ready I think!
See my inline comments.

@zachcp
Copy link
Author

zachcp commented Oct 14, 2013

Thanks @jorisvandenbossche . I updated the code as per your suggestions. I'm deeply sorry about the code not working. I certainly value the time you are putting in and am not intentionally making more work for you. I broke my matplotlib installation on my home computer when removing MacPorts and the fix has eluded me. Right now plotting is broken on my home computer, however as I am using Github to syn code between computers I posted untested code; I should have noted that.

However, I believe this recent PR corrects the issues you have mentioned and should work, but I cannot check until tomorrow at work. Hopefully that will be sufficient and I can do a final rebase/squash. Thanks again.

MPLPlot.__init__(self, data, **kwargs)
self.kwds.setdefault('c', self.plt.rcParams['patch.facecolor'])
self.x = x
self.y = y
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put these lines at the end of the __init__ function? Because now the following lines have no effect on self.x/y, which are used in _make_plot and _post_plot_logic (or you can also use self.x/y in the following lines)

@jorisvandenbossche
Copy link
Member

@zachcp Only a last comment now from my side!

@jorisvandenbossche
Copy link
Member

@Zachp You should also see if the tests pass. Can you set up Travis? You can find some explanation here to do that: https://github.com/pydata/pandas/blob/master/CONTRIBUTING.md#getting-travis-ci-going (really easy, you can just use your github account). The only thing you will have to do is push again to trigger Travis (eg a rebase and force push)

@zachcp
Copy link
Author

zachcp commented Oct 16, 2013

@jorisvandenbossche the last build passed Travis-CI. It looked good on the Travis-CI website although I don't know why it reflect here. I just rebased and pushed with --force so hopefully we'll be all good

@jreback
Copy link
Contributor

jreback commented Oct 16, 2013

@zachcp the travis status wil lreport here if you have followed all of the instructions; pls make sure that you enabled it on the github side as well

@zachcp
Copy link
Author

zachcp commented Oct 16, 2013

@jreback I checked my Travis status and everything seems to check out. Travis is testing commits and on the Github side the webhook Test Hook for Travis passes (https://github.com/zachcp/pandas/settings/hooks). Perhaps Travis doesn't have the required permissions? That would make sense but I did not change the default Travis permissions when authorizing it through Github.

I just did a fresh rebase/push.

@jreback
Copy link
Contributor

jreback commented Oct 16, 2013

@zachcp looks like travis is stalled.....

@jorisvandenbossche
Copy link
Member

@jreback maybe also something with pandas/travis or github itself? Because eg this PR #4689 pushed an hour ago, but no travis indication. My PR #5241 of a couple of hours ago, no travis indication, ..

@jorisvandenbossche
Copy link
Member

@jreback Ah, I hadn't yet seen your comment:-)

@jreback
Copy link
Contributor

jreback commented Oct 16, 2013

this passed (travis still hasn't updated github..but I think they are catching up)...

any more comments, @jorisvandenbossche ? @jtratner ?

going to merge

@jreback
Copy link
Contributor

jreback commented Oct 16, 2013

@zachcp

actually I realized, need to update the docstring for df.plot and pls add a release notes entry in doc/source/release.rst referencing the original issue (1-liner), in the API section

@zachcp
Copy link
Author

zachcp commented Oct 16, 2013

@jreback

which docstring are you referring to? the plot_frame() docstrings have been modified.

@jreback
Copy link
Contributor

jreback commented Oct 16, 2013

@zachcp yes...you are right...didn't see that...ok...pls add a release notes entry and you can put the same entry in v0.13.0 as well (in API changes)

@zachcp
Copy link
Author

zachcp commented Oct 16, 2013

@jreback OK. I got the v.0.13.0 (in API changes) but where else in release notes do you want it. It seems to me all of the release notes are in sub-categories and I've added a line under API changes

@jreback
Copy link
Contributor

jreback commented Oct 16, 2013

doc/source/release.rst under API changes for 0.13.0

@zachcp
Copy link
Author

zachcp commented Oct 16, 2013

done.

@@ -210,6 +210,7 @@ API Changes
- Default export for ``to_clipboard`` is now csv with a sep of `\t` for
compat (:issue:`3368`)
- ``at`` now will enlarge the object inplace (and return the same) (:issue:`2578`)
- `df.plot()` can now use kind=`scatter` (:issue:`3473`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be issue 2215

@zachcp
Copy link
Author

zachcp commented Oct 16, 2013

@cpcloud @jorisvandenbossche @jreback

Thank you for your time, suggestions, and patience during this PR; I learned quite a bit.

@@ -210,6 +210,7 @@ API Changes
- Default export for ``to_clipboard`` is now csv with a sep of `\t` for
compat (:issue:`3368`)
- ``at`` now will enlarge the object inplace (and return the same) (:issue:`2578`)
- `df.plot()` can now use kind=`scatter` (:issue:`2215`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really minor comment: this should be double backquotes for df.plot and normal quotes around scatter (to nitpick on rst things).

Double backquotes are for inline literal (so eg code), while single backquotes are for 'interpreted' text (text with a certain role associated, eg the :issue:2578``

@jorisvandenbossche
Copy link
Member

@jreback For me it is OK, but maybe someone else could also give a last review.

@zachcp You're welcome, glad you learned a lot!

@jorisvandenbossche
Copy link
Member

I just saw this: http://stackoverflow.com/questions/19415027/python-pandas-scatterplot-error-is-this-a-bug-with-pandas, saying something like this will work:

df.plot(kind='scatter', x='a', y='b', c='c', s=df.d * 100) 

But at the moment I suppose sepcifying columns as strings will only work for x and y, not for c or s. Would be an interesting idea to explore, but maybe not so simple (how do you distuinguish between a color string or a column string for c?). And maybe also for another PR, let's finish this one first.

@jtratner
Copy link
Contributor

Are all the back quote issues fixed? I thought single back quotes were a
syntax error (since they're supposed to be for arguments to things like
:issue: or links)

@jreback
Copy link
Contributor

jreback commented Oct 17, 2013

i'll fix this on merge

@jreback jreback merged commit dfd92c7 into pandas-dev:master Oct 17, 2013
@jreback
Copy link
Contributor

jreback commented Oct 17, 2013

merged, thanks @zachcp

@jorisvandenbossche
Copy link
Member

@jtratner I looked it up, and the single backquotes is defined as 'interpreted text' (http://docutils.sourceforge.net/docs/user/rst/quickref.html#inline-markup), and you can give different 'interpreted text roles' as ..`_` for internal link or `:func:`... to link a function, etc (http://sphinx-doc.org/markup/inline.html).
When you don't specify the role, it uses the 'default role', which has in Sphinx no special meaning, but this can be configured (http://sphinx-doc.org/config.html#confval-default_role). However, I tried it once in a docstring, and the default role of interpreted text appeared then just as italic.

@zachcp zachcp deleted the scatterplot branch October 17, 2013 13:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

integrate scatter plot into DataFrame/Series plot (or scatter) method
5 participants