DOC: Improve melt example (#23844) #28006

Closed
wants to merge 2 commits into from

@MLopez-Ibanez MLopez-Ibanez commented Aug 18, 2019

@TomAugspurger
Contributor

What's the motivation for each of these changes? What's unclear about the original?

@MLopez-Ibanez
Author

MLopez-Ibanez commented Aug 19, 2019 via email

@datapythonista
Member

-1 on this, the example seems worse. Also, just removing the image will make it broken in the website, the call should be removed too.

@jalammar maybe you want to have a look at this and the associated issue?

@MLopez-Ibanez
Author

> -1 on this, the example seems worse. Also, just removing the image will make it broken in the website, the call should be removed too.

Removing the image? The patch does not remove the image. It replaces it with an updated image to fix the df3->cheese issue and to show an example where melt is actually useful/sensible.

Quoting from bug #23844:

> Normally, melt would be used to convert from wide to long data. However, the data on the left-hand side of the example is already in long format (each variable corresponds to a different column) and the melt command is just creating a strange "thing" where the column value contains two different variables. I'm using this figure in my teaching as an example of what NOT to do when reshaping data.

A better example would be taken from: https://www.jstatsoft.org/article/view/v059i10
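
To make the quoted objection concrete, here is a minimal sketch of the kind of example being criticized. The `cheese` name and the `height` and `weight` columns come from this thread; the `first` and `last` columns and all values are assumptions for illustration and are not copied from the docs.

```python
import pandas as pd

# Assumed stand-in for the docs example: one row per person,
# with two different measurements stored in separate columns.
cheese = pd.DataFrame(
    {
        "first": ["John", "Mary"],   # hypothetical id column
        "last": ["Doe", "Bo"],       # hypothetical id column
        "height": [5.5, 6.0],        # illustrative values
        "weight": [130, 150],        # illustrative values
    }
)

# Melt every non-id column into (variable, value) pairs.
melted = cheese.melt(id_vars=["first", "last"])
print(melted)
```

After the melt, the single `value` column holds both heights and weights, which is the mixing of two different variables that the quoted issue objects to.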

@datapythonista
Member

My bad about the image, I misunderstood.

I see your point in the ticket, and having homogeneous data in the melted columns makes sense, but I think this example is worse. Don't use A, B or foo, bar variables; use something meaningful, as in the previous example.

Something like a dataframe for GDP, having the id column as a country, and then having columns 2016, 2017 and 2018 with the values could be a good example, easy to understand. Made-up values in A, B columns are not.
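
A rough sketch of what such a GDP example might look like, assuming a `country` id column and one column per year. The names `gdp_wide`, `gdp_long`, `year` and `gdp` are hypothetical, and the numbers are placeholders rather than real GDP figures.

```python
import pandas as pd

# Wide format: one row per country, one column per year.
# Placeholder numbers only; these are NOT real GDP figures.
gdp_wide = pd.DataFrame(
    {
        "country": ["Country A", "Country B"],  # hypothetical labels
        "2016": [1.0, 2.0],
        "2017": [1.1, 2.1],
        "2018": [1.2, 2.2],
    }
)

# Long format: one row per (country, year) pair. Every melted
# column holds the same kind of quantity, so "gdp" is homogeneous.
gdp_long = gdp_wide.melt(
    id_vars=["country"], var_name="year", value_name="gdp"
)
print(gdp_long)
```

Here the two-row wide frame becomes six rows, one per country and year, so the wide-to-long reshaping is easy to see.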

Also, not sure why the name of the DataFrame is cheese, but that doesn't seem to make sense, if I'm not missing anything.

Thanks!

@MLopez-Ibanez
Author

> Something like a dataframe for GDP, having the id column as a country, and then having columns 2016, 2017 and 2018 with the values could be a good example, easy to understand. Made-up values in A, B columns are not.

Sure, but that would require much more work to update the figures as there are no sources for the figures. I can create new figures from scratch, but they won't match the style of the existing figures.

> Also, not sure why the name of the DataFrame is cheese, but that doesn't seem to make sense, if I'm not missing anything.

No idea, it was there already in the example. I have only updated the figure to match the name in the existing example.

@jalammar
Contributor

jalammar commented Aug 24, 2019

I have the original figure and I'll be glad to upload it somewhere. I still don't see the point of this PR, though. Changing 'height' and 'weight' to A and B seems objectively worse.

There could be a benefit from an extra figure that has an additional column (perhaps Age?) to show 'wide' to 'long' format in a more pronounced way -- even though the figure already shows the purpose of melt and the data frame does get longer after the melt operation.

@WillAyd
Member

WillAyd commented Aug 26, 2019

So if we just change cheese to df3 does that cover everything here? @MLopez-Ibanez want to make that change?

@jalammar
Contributor

For future reference, I have uploaded the original assets here (it's a Keynote file):
https://jalammar.github.io/assets/pandas_melt.key

In the link below, I have also created:

  • a wide-to-long version of the original graphic
  • an annotated version showing all the fonts in the graphic in case anybody wants to revise the images using something other than Keynote

https://imgur.com/a/JtuVsRX

@datapythonista
Member

Thanks for sharing those @jalammar.

I was thinking some days ago that it would be nice to have the visualizations in the carousel at https://datapythonista.github.io/pandas-web/ as SVG. I'm not sure whether they'll be tricky to create, but I think that should make the files smaller, and we also wouldn't need to keep track of the source files, since SVG files are self-contained.

If that turns out to be a good idea, we can probably do the same with the files you shared.

@jalammar
Contributor

@datapythonista We can give it a shot. What software did you use to create those versions?

Keynote does not export into SVG directly, but I'll see what options are out there to make that conversion. This is one option I've seen. Another I've heard of is to export to PDF then open in Illustrator and attempt to export to SVG from there.

@datapythonista
Member

I used Google Drive. It can export directly to SVG, even if the resulting SVG is more complex and larger than it needs to be.

You can see those slides here: https://docs.google.com/presentation/d/1Yub5E6_Pto3WJaT_vqpwp9thlbuKqxJQclVmuZXBmu8/edit?usp=sharing

I added shadows by editing the file afterwards with GIMP, though I'm not even sure that makes things look better.

@datapythonista
Member

Moved the conversation about the carousel to #28168.

Since there is no agreement on the changes in this PR, I'm closing. @MLopez-Ibanez feel free to open a new PR based on the provided feedback.


Successfully merging this pull request may close these issues.

poor melt example in documentation
5 participants