DOC: Improve melt example (#23844) #28006

MLopez-Ibanez · 2019-08-18T23:30:19Z

closes poor melt example in documentation #23844

TomAugspurger · 2019-08-19T02:51:16Z

What's the motivation for each of these changes? What's unclear about the original?

MLopez-Ibanez · 2019-08-19T09:06:11Z

The original is a poor example of melt (properly speaking, wide to long), as it serves no purpose to put two such different features under the same column. The proposed change to the PNG not only reflects the example more closely (df3->cheese) but is also a better example of wide (same feature spread over two columns) to long. See also the description in the original bug report. In the end, I decided to minimise the changes rather than copy the exact example from the JSS paper cited in the bug report.

datapythonista · 2019-08-19T13:12:54Z

-1 on this, the example seems worse. Also, just removing the image will make it broken in the website, the call should be removed too.

@jalammar maybe you want to have a look at this and the associated issue?

MLopez-Ibanez · 2019-08-19T13:46:33Z

> -1 on this, the example seems worse. Also, just removing the image will make it broken in the website, the call should be removed too.

Removing the image? The patch does not remove the image. It replaces it with an updated image to fix the df3->cheese issue and to show an example where melt is actually useful/sensible.

Quoting from bug #23844:

Normally, melt would be used to convert from wide to long data. However, the data on the left hand side of the example is already in long format (each variable corresponds to a different column) and the melt command is just creating a strange "thing" where the column value contains two different variables. I'm using this figure in my teaching as an example of what NOT to do when reshaping data.

A better example would be taken from: https://www.jstatsoft.org/article/view/v059i10
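To make the quoted objection concrete, here is a minimal sketch of the kind of melt being criticised. The frame is modeled on the `cheese` example discussed in this thread; the exact values are assumptions for illustration.

```python
import pandas as pd

# Frame modeled on the docs example under discussion; the values are
# illustrative assumptions, not copied from the figure.
cheese = pd.DataFrame(
    {
        "first": ["John", "Mary"],
        "last": ["Doe", "Bo"],
        "height": [5.5, 6.0],
        "weight": [130, 150],
    }
)

# Melt everything except the identifier columns.
melted = cheese.melt(id_vars=["first", "last"])

# The "variable" column now names two unrelated measurements, and the
# single "value" column mixes heights with weights.
print(melted)
```

After the melt, one `value` column holds both heights and weights, which is the "two different variables in one column" situation the bug report describes.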

datapythonista · 2019-08-19T13:56:39Z

My bad about the image, I misunderstood.

I see your point in the ticket, and having homogeneous data in the melted columns makes sense, but I think this example is worse. Don't use A, B or foo, bar variables; use something meaningful, as in the previous example.

Something like a dataframe for GDP, having the id column as a country, and then having columns 2016, 2017 and 2018 with the values could be a good example, easy to understand. Made-up values in A and B columns are not.
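A rough sketch of what that could look like follows. The country names and figures are made up purely for illustration (they are not real GDP data), and the `year` and `gdp` column names are assumptions.

```python
import pandas as pd

# Hypothetical data: fictional countries and invented numbers.
gdp = pd.DataFrame(
    {
        "country": ["Atlantis", "Wakanda"],
        "2016": [1.0, 2.0],
        "2017": [1.1, 2.2],
        "2018": [1.2, 2.4],
    }
)

# Wide to long: one row per (country, year) pair, with all GDP values
# gathered into a single homogeneous column.
long_gdp = gdp.melt(id_vars=["country"], var_name="year", value_name="gdp")
print(long_gdp)
```

Here every melted column measures the same thing, so the resulting `gdp` column stays homogeneous.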

Also, not sure why the name of the DataFrame is cheese, but that doesn't seem to make sense, if I'm not missing anything.

Thanks!

MLopez-Ibanez · 2019-08-24T18:42:54Z

> Something like a dataframe for GDP, having the id column as a country, and then having columns 2016, 2017 and 2018 with the values could be a good example, easy to understand. Made-up values in A and B columns are not.

Sure, but that would require much more work to update the figures as there are no sources for the figures. I can create new figures from scratch, but they won't match the style of the existing figures.

> Also, not sure why the name of the DataFrame is cheese, but that doesn't seem to make sense, if I'm not missing anything.

No idea, it was there already in the example. I have only updated the figure to match the name in the existing example.

jalammar · 2019-08-24T19:12:51Z

I have the original figure and I'll be glad to upload it somewhere. I still don't see the point of this PR, though. Changing 'height' and 'weight' to A and B seems objectively worse.

There could be a benefit from an extra figure that has an additional column (perhaps Age?) to show 'wide' to 'long' format in a more pronounced way -- even though the figure already shows the purpose of melt and the data frame does get longer after the melt operation.

WillAyd · 2019-08-26T02:18:50Z

So if we just change cheese to df3 does that cover everything here? @MLopez-Ibanez want to make that change?

jalammar · 2019-08-26T05:59:17Z

For future reference, I have uploaded the original assets (a Keynote file) here:
https://jalammar.github.io/assets/pandas_melt.key

In the link below, I have also created:

- a wide-to-long version of the original graphic
- an annotated version showing all the fonts in the graphic, in case anybody wants to revise the images using something other than Keynote

https://imgur.com/a/JtuVsRX

datapythonista · 2019-08-26T09:37:19Z

Thanks for sharing those @jalammar.

I was thinking some days ago, for the visualizations in the carousel in https://datapythonista.github.io/pandas-web/, that it would be nice to have them as SVG. I'm not sure whether they will be tricky to create, but I think that should make the files smaller, and we also wouldn't need to keep track of the source files, since SVG files are self-contained.

If that turns out to be a good idea, we can probably do the same with the files you shared.

jalammar · 2019-08-27T04:58:36Z

@datapythonista We can give it a shot. What software did you use to create those versions?

Keynote does not export into SVG directly, but I'll see what options are out there to make that conversion. This is one option I've seen. Another I've heard of is to export to PDF then open in Illustrator and attempt to export to SVG from there.

datapythonista · 2019-08-27T07:50:14Z

I used Google Drive. It can export directly to SVG, even if the SVG looks more complex and bigger than it could be.

You can see those slides here: https://docs.google.com/presentation/d/1Yub5E6_Pto3WJaT_vqpwp9thlbuKqxJQclVmuZXBmu8/edit?usp=sharing

I added shadows by editing the file afterwards with GIMP, but I'm not even sure that makes things look better.

datapythonista · 2019-08-27T08:13:19Z

Moved the conversation about the carousel to #28168.

Since there is no agreement on the changes in this PR, I'm closing. @MLopez-Ibanez feel free to open a new PR based on the provided feedback.

MLopez-Ibanez added 2 commits August 19, 2019 00:23

DOC: Improve melt example (pandas-dev#23844)

1595e4a

Merge remote-tracking branch 'upstream/master'

fcc80e6

datapythonista added the Docs label Aug 19, 2019

datapythonista closed this Aug 27, 2019

MLopez-Ibanez mentioned this pull request Sep 11, 2019

poor melt example in documentation #23844

Open
