Clustered heatmap #5646
Another issue that people may run into is that clustering a large matrix will hit recursion limit issues, so there should be a way to smartly change …
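The sketch below shows one way to "smartly change" the limit using only the standard library. It assumes the recursion comes from walking the linkage tree at roughly one stack frame per level, so the depth is at most n_leaves - 1. The `recursion_limit` helper and its `headroom` parameter are hypothetical and are not part of pandas or SciPy.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(n_leaves, headroom=1000):
    """Temporarily raise Python's recursion limit for a tree walk over n_leaves.

    Hypothetical helper: assumes about one stack frame per tree level, and a
    hierarchical-clustering tree over n_leaves is at most n_leaves - 1 deep.
    The old limit is always restored, even if the body raises.
    """
    old = sys.getrecursionlimit()
    # Never lower the limit; only raise it when the tree could be deeper.
    sys.setrecursionlimit(max(old, n_leaves + headroom))
    try:
        yield sys.getrecursionlimit()
    finally:
        sys.setrecursionlimit(old)


def depth(n):
    """Toy recursive walk standing in for a deep dendrogram traversal."""
    return 0 if n == 0 else 1 + depth(n - 1)


with recursion_limit(3000):
    print(depth(2999))  # would exceed the default limit of 1000
```

Using a context manager scopes the change to the clustering call, so the rest of the session keeps the interpreter's default limit.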
I'll take a closer look this weekend. These are just some quick comments.
Re figs vs axes: you can access all axes of a fig with …
Also, here are the R docs for … and the Matlab docs for …
What's the correct way to test if an object is None, if the other option is that it is a …? According to PEP 8, what I'm doing here is wrong:
…
but when I do this as suggested by PEP 8:
…
I get:
…

just do …
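The code snippets in the exchange above were lost. This sketch assumes the object is either None or a pandas DataFrame, which matches the `if plot_df is None:` diff later in the thread. `resolve_plot_df` is a hypothetical name. The identity test `x is None` works because it never asks the DataFrame for a truth value. Anything that does ask, such as `if df:`, `if not df:`, or `if df == None:` (which compares elementwise and returns a DataFrame), raises a ValueError.

```python
import pandas as pd


def resolve_plot_df(df, plot_df=None):
    """Return plot_df if one was passed, otherwise fall back to df.

    `is None` is an identity check, so it never calls DataFrame.__bool__,
    which pandas deliberately makes raise.
    """
    if plot_df is None:
        plot_df = df
    return plot_df


df = pd.DataFrame({"a": [1, 2, 3]})

try:
    bool(df)  # what `if df:` or `if not df:` does implicitly
except ValueError as err:
    print("ValueError:", err)
```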
…ngs and not necessarily indexable by a list of indices
That worked, thanks @jreback! I think I was getting the …
implicitly calls …
…e other colorbar locations not implemented)
This should close #3497; if not, please create a separate issue for this (just so we can put tags on it and reference it).
I'm thinking a bit more about the API. It can be a bit intimidating to see a function signature with 15-20 keyword arguments. Things like … And I guess this adds a new optional dependency with the …
```python
if plot_df is None:
    plot_df = df

if any(plot_df.index != df.index):
    ...  # rest of the diff hunk not shown
```
This is negligible, but you may want to use `if (plot_df.index != df.index).any():` here; that uses the NumPy `any` instead of the Python builtin. It's faster for big arrays.
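Here is a small illustration of that suggestion, assuming two DataFrames whose indexes have the same length. Both spellings give the same answer, but `.any()` does the reduction inside NumPy instead of looping in Python. `Index.equals` is an existing pandas method that covers the same check directly. `same_index` is a hypothetical wrapper name.

```python
import pandas as pd


def same_index(a, b):
    """True if a and b have identical index labels in the same order."""
    return a.index.equals(b.index)


df = pd.DataFrame({"x": [1, 2, 3]}, index=["r1", "r2", "r3"])
plot_df = df.loc[["r1", "r3", "r2"]]  # same labels, different order

mismatch = plot_df.index != df.index  # elementwise: a NumPy bool array
print(mismatch.any())           # NumPy's any, vectorized: True
print(any(mismatch))            # builtin any, loops in Python: True
print(same_index(df, plot_df))  # False
```

One caveat: elementwise `!=` between indexes of different lengths raises a ValueError, whereas `Index.equals` simply returns False. For a user-supplied `plot_df`, that makes `equals` the more forgiving check.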
It is quite overwhelming, but there are a lot of aspects to control at a time. R's … Another feature that both R and Matlab have is the ability to z-score-ify the matrix, i.e. normalize across either rows or columns (subtract the mean and divide by the standard deviation) so that 0 is the middle value. Should this be an option as well? One note is that the …
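Below is a minimal pandas sketch of the z-score option. It assumes that "normalize across rows" means each row ends up with mean 0 and standard deviation 1, and likewise for columns. `zscore` is a hypothetical helper name. `scipy.stats.zscore` also exists, but it defaults to ddof=0, while pandas' `std` defaults to ddof=1, so the two give slightly different scales.

```python
import numpy as np
import pandas as pd


def zscore(df, axis=0):
    """Standardize each column (axis=0) or each row (axis=1) of df.

    Hypothetical helper: subtracts the mean and divides by the sample
    standard deviation (pandas' default ddof=1). A constant row or column
    has std 0 and therefore becomes NaN.
    """
    if axis not in (0, 1):
        raise ValueError("axis must be 0 (columns) or 1 (rows)")
    mean = df.mean(axis=axis)
    std = df.std(axis=axis)
    # Broadcast the per-row/per-column statistics along the other axis.
    other = 1 - axis
    return df.sub(mean, axis=other).div(std, axis=other)


df = pd.DataFrame(np.arange(12, dtype=float).reshape(3, 4),
                  index=["g1", "g2", "g3"], columns=list("abcd"))
row_scaled = zscore(df, axis=1)  # similar in spirit to scale = "row" in R's heatmap
```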
I can only say I'm extremely interested in this. Given that … As a suggestion, I wonder if it would be possible to pass scipy dendrograms to the function, in case they are calculated from different algorithms than the base ones (e.g., when doing bootstrapped clustering).
@lbeltrame that is a great idea! The way …
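The sketch below shows how a precomputed linkage could be accepted and used, with SciPy's `linkage` and `leaves_list` from `scipy.cluster.hierarchy`. It assumes the caller passes a SciPy linkage matrix, for example one produced by bootstrapped clustering elsewhere. When no linkage is passed, it computes one with a single `method`/`metric` pair for both rows and columns. `reorder_by_linkage` is a hypothetical name, not an existing pandas API.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import leaves_list, linkage


def reorder_by_linkage(df, row_linkage=None, col_linkage=None,
                       method="average", metric="euclidean"):
    """Reorder df's rows and columns by hierarchical-clustering leaf order.

    Hypothetical helper: a precomputed linkage matrix is used as-is;
    otherwise one is computed with `method` and `metric`, applied to both
    rows and columns.
    """
    values = df.to_numpy(dtype=float)
    if row_linkage is None:
        row_linkage = linkage(values, method=method, metric=metric)
    if col_linkage is None:
        col_linkage = linkage(values.T, method=method, metric=metric)
    # leaves_list gives the left-to-right leaf order of the dendrogram.
    return df.iloc[leaves_list(row_linkage), leaves_list(col_linkage)]


rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(6, 4)),
                  index=[f"gene{i}" for i in range(6)],
                  columns=list("abcd"))
# Stand-in for a linkage computed outside the plotting function.
z_rows = linkage(df.to_numpy(), method="average")
clustered = reorder_by_linkage(df, row_linkage=z_rows)
```

Accepting the linkage matrix rather than SciPy's dendrogram dict keeps the input to a plain array, which is easy to validate.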
Type checking is evil, but IMO less evil than having more confusing options. I … I know it sounds ugly, which is why I'm open to suggestions. ;)
Likewise, I'd change AssertionError (which is more like "this should never happen") throughout the code to ValueError ("incorrect input", in this case). I also second @TomAugspurger's comments; the matplotlib stuff should go into …
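Here is a minimal sketch of both points together: one argument that is type-checked to accept either a bool or a precomputed linkage matrix, with bad input reported as ValueError rather than AssertionError. The option name `row_cluster` and the helper `normalize_cluster_option` are hypothetical, since the thread's exact proposed arguments are cut off. The 4-column check reflects SciPy's linkage format: two cluster ids, a distance, and a sample count per merge.

```python
def normalize_cluster_option(value, name="row_cluster"):
    """Interpret a clustering option that may be a bool or a linkage matrix.

    Hypothetical helper. Returns ("compute", None), ("skip", None), or
    ("precomputed", rows). Bad input raises ValueError ("incorrect input"),
    not AssertionError ("this should never happen").
    """
    if isinstance(value, bool):
        return ("compute" if value else "skip"), None
    if isinstance(value, str):
        raise ValueError(f"{name} must be a bool or a linkage matrix, not a string")
    try:
        rows = [list(row) for row in value]
    except TypeError:
        raise ValueError(
            f"{name} must be a bool or a linkage matrix, got {type(value).__name__}"
        ) from None
    # A SciPy linkage matrix has shape (n - 1, 4).
    if not rows or any(len(row) != 4 for row in rows):
        raise ValueError(f"{name}: a linkage matrix needs rows of exactly 4 values")
    return "precomputed", rows
```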
```python
# heatmap with row names

def get_width_ratios(half_width, side_colors,
# (the rest of this diff hunk is not shown)
```
Would be nice to have a docstring to explain what this does, even if it's internal.
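As an illustration of the kind of docstring being asked for, here is a hypothetical helper in the same spirit, using the numpydoc section style that pandas docstrings follow. It is not the PR's `get_width_ratios`, whose full signature is cut off above. The left-to-right layout it assumes is spelled out in the docstring.

```python
def width_ratios(dendrogram_width, n_side_color_columns=0,
                 side_color_width=0.05, heatmap_width=1.0):
    """Return GridSpec width ratios for one row of a clustered-heatmap layout.

    Hypothetical helper, not the PR's get_width_ratios. It assumes the
    columns are laid out left to right as:

        [row dendrogram] [side-color strip] * n [heatmap]

    Parameters
    ----------
    dendrogram_width : float
        Relative width of the row dendrogram axes.
    n_side_color_columns : int
        Number of side-color strips between the dendrogram and the heatmap.
    side_color_width : float
        Relative width of each side-color strip.
    heatmap_width : float
        Relative width of the heatmap axes.

    Returns
    -------
    list of float
        One ratio per GridSpec column, suitable for ``width_ratios=``.
    """
    if dendrogram_width < 0 or side_color_width < 0 or heatmap_width <= 0:
        raise ValueError("widths must be non-negative (heatmap_width positive)")
    return ([dendrogram_width]
            + [side_color_width] * n_side_color_columns
            + [heatmap_width])
```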
How should tests be written for figures? For …
@lbeltrame so both …
@olgabot why don't you have a …
@olgabot I'd be quite happy if we put in tests with …
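One low-cost approach to testing figures is sketched here under a few assumptions. It renders with matplotlib's non-interactive Agg backend and asserts on the artists matplotlib records (axes, collections, tick labels) instead of comparing images. `draw_labeled_heatmap` and `test_labeled_heatmap` are hypothetical names; this is not pandas' own plotting test machinery.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the test runs without a display

import matplotlib.pyplot as plt
import numpy as np


def draw_labeled_heatmap(data, row_labels, col_labels):
    """Hypothetical plotting function under test: one pcolormesh with labels."""
    fig, ax = plt.subplots()
    ax.pcolormesh(data)
    # Put ticks at cell centres; pcolormesh cells span [i, i + 1].
    ax.set_xticks(np.arange(data.shape[1]) + 0.5)
    ax.set_xticklabels(col_labels)
    ax.set_yticks(np.arange(data.shape[0]) + 0.5)
    ax.set_yticklabels(row_labels)
    return fig, ax


def test_labeled_heatmap():
    data = np.arange(6, dtype=float).reshape(2, 3)
    fig, ax = draw_labeled_heatmap(data, ["r0", "r1"], ["a", "b", "c"])
    try:
        fig.canvas.draw()  # force a draw so tick labels are populated
        assert len(fig.axes) == 1
        assert len(ax.collections) == 1  # the QuadMesh from pcolormesh
        assert [t.get_text() for t in ax.get_xticklabels()] == ["a", "b", "c"]
        assert [t.get_text() for t in ax.get_yticklabels()] == ["r0", "r1"]
    finally:
        plt.close(fig)  # avoid leaking figures across tests
```

Closing each figure in `finally` keeps a long test suite from accumulating open figures.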
It would need to be separate for columns and rows, so …
So to address all of the above issues, I propose these acceptable arguments to …
@olgabot - would you mind listing out the different outcome decisions …
The main outcomes are …
Does that help?
I agree with your suggestion, then, just tweaked such that anything … Also, if you end up with …
On Mon, Dec 16, 2013 at 12:40 PM, Olga Botvinnik
@olgabot +1 on your idea with the API. With regard to linkage and clustering methods, IMO it makes no sense to use two different methods for rows and columns (at least in my common usage), so I would rather avoid that.
Can you update either the PR text or add a TODO here of what's left? That might be useful if someone else wants to contribute. BTW, how do you handle row labels when there are hundreds or even thousands of rows? Even R can't get this quite right (I end up omitting row names in such cases).
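Besides omitting row names entirely, another option is to label only every k-th row. The hypothetical `thin_labels` helper below returns at most `max_labels` evenly spaced positions and labels; the default of 50 is an arbitrary assumption.

```python
def thin_labels(labels, max_labels=50):
    """Keep at most max_labels evenly spaced labels.

    Hypothetical helper. Returns (positions, kept_labels), where positions
    are row indices into the (already reordered) matrix.
    """
    if max_labels <= 0:
        return [], []
    n = len(labels)
    step = max(1, -(-n // max_labels))  # ceiling division
    positions = list(range(0, n, step))
    return positions, [labels[i] for i in positions]
```

With pcolormesh, the positions would be shifted by 0.5 to centre each label on its cell, e.g. `ax.set_yticks([p + 0.5 for p in positions])` followed by `ax.set_yticklabels(kept)`.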
@olgabot looks like this PR found a more suitable home in seaborn, as discussed. Close? |
Yes, sure.
Hello there,

I'm working on a clustered heatmap figure for pandas, like R's `heatmap.2`, which I used a ton when I was still in R-land. I'm having some trouble wrapping my mind around `pandas.tools.plotting`, but I have a draft version of this `heatmap` function with some initial parameters in my fork: olgabot@4759d31. But I'm unsure how to proceed with plumbing this function into the current `pandas` plotting tools. Here's a notebook with a simple and a complex example: http://nbviewer.ipython.org/gist/olgabot/7801024

Also, unlike many other plotting functions, `heatmap` creates a whole figure. I currently return the `fig` instance along with the dendrogram objects for both the rows and columns, in case the user wants to do some stats on them. It seems like for this function, accepting an `ax` argument (or even a `fig`) doesn't make any sense, because there are at least 4 (up to 6 if you label columns or rows with colors) `ax` instances to create.

FYI, besides adding documentation and tests and such, I'm still working on:
- `pcolormesh`

I'm also still working through all the developer FAQs and such, so I'm probably making tons of n00b mistakes. Please point them out to me, especially if there are some pandas/Python conventions I'm totally not following.
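The design described above can be sketched with matplotlib's `GridSpec`, SciPy's `dendrogram` (which accepts an `ax=` argument), and `pcolormesh`. In this design the function builds a whole figure with several axes and returns the figure plus the dendrograms instead of accepting an `ax`. This is a minimal sketch under assumptions: `clustered_heatmap_sketch` is a hypothetical name, the 2 x 2 grid and the colorbar in the spare corner are arbitrary choices, and it is not the implementation in the fork linked above.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for the example

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.gridspec import GridSpec
from scipy.cluster.hierarchy import dendrogram, linkage


def clustered_heatmap_sketch(df, method="average", metric="euclidean",
                             figsize=(6, 6)):
    """Draw a clustered heatmap on a new figure (hypothetical sketch).

    Returns the figure plus SciPy's row and column dendrogram dicts, so
    callers can inspect the clustering afterwards.
    """
    values = df.to_numpy(dtype=float)
    row_z = linkage(values, method=method, metric=metric)
    col_z = linkage(values.T, method=method, metric=metric)

    fig = plt.figure(figsize=figsize)
    gs = GridSpec(2, 2, figure=fig, width_ratios=[0.25, 1],
                  height_ratios=[0.25, 1], wspace=0.02, hspace=0.02)
    cbar_ax = fig.add_subplot(gs[0, 0])  # spare corner holds the colorbar
    col_ax = fig.add_subplot(gs[0, 1])   # column dendrogram above the heatmap
    row_ax = fig.add_subplot(gs[1, 0])   # row dendrogram left of the heatmap
    heat_ax = fig.add_subplot(gs[1, 1])

    col_dendro = dendrogram(col_z, ax=col_ax, orientation="top", no_labels=True)
    row_dendro = dendrogram(row_z, ax=row_ax, orientation="left", no_labels=True)
    for ax in (col_ax, row_ax):
        ax.set_axis_off()

    # Reorder the matrix so rows/columns follow the dendrogram leaf order.
    rows, cols = row_dendro["leaves"], col_dendro["leaves"]
    ordered = values[np.ix_(rows, cols)]
    mesh = heat_ax.pcolormesh(ordered, cmap="viridis")
    heat_ax.set_xticks(np.arange(ordered.shape[1]) + 0.5)
    heat_ax.set_xticklabels(df.columns[cols], rotation=90)
    heat_ax.yaxis.tick_right()
    heat_ax.set_yticks(np.arange(ordered.shape[0]) + 0.5)
    heat_ax.set_yticklabels(df.index[rows])
    fig.colorbar(mesh, cax=cbar_ax)
    return fig, row_dendro, col_dendro


rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(8, 5)),
                  index=[f"row{i}" for i in range(8)],
                  columns=[f"col{j}" for j in range(5)])
fig, row_dendro, col_dendro = clustered_heatmap_sketch(df)
```

Because the function creates its own figure, a caller who needs one particular axes can still reach it afterwards through `fig.axes`, which lists every axes on the figure.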