Doc: Adds example of exploding lists into columns instead of storing in dataframe cells #23041

mgautam98 · 2018-10-08T10:30:54Z

closes DOC: section on caveats of storing lists inside DataFrame/Series #17027

This is the continuation of PR #19215

Restores: pandas-dev#17027


codecov · 2018-10-08T11:38:03Z

Codecov Report

Merging #23041 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #23041   +/-   ##
=======================================
  Coverage   92.19%   92.19%           
=======================================
  Files         169      169           
  Lines       50873    50873           
=======================================
  Hits        46904    46904           
  Misses       3969     3969

Flag	Coverage Δ
#multiple	`90.61% <ø> (ø)`	⬆️
#single	`42.32% <ø> (ø)`	⬆️

Last update ce1f81f...4952597.

WillAyd · 2018-10-08T16:10:01Z

doc/source/gotchas.rst

@@ -336,3 +336,94 @@ constructors using something similar to the following:
 See `the NumPy documentation on byte order
 <https://docs.scipy.org/doc/numpy/user/basics.byteswapping.html>`__ for more
 details.
+
+
+Alternative to storing lists in DataFrame Cells


This isn't an "alternative" to lists so much as a way of reshaping values; it would be better titled "Exploding List Items" or something to that effect.

WillAyd · 2018-10-08T16:10:27Z

doc/source/gotchas.rst

+
+Alternative to storing lists in DataFrame Cells
+-----------------------------------------------
+Storing nested lists/arrays inside a pandas object should be avoided for performance and memory use reasons. Instead they should be "exploded" into a flat ``DataFrame`` structure.


Related to the comment above: this isn't an "alternative" to using lists within a DataFrame.

WillAyd · 2018-10-08T16:12:08Z

doc/source/gotchas.rst

+
+.. ipython:: python
+
+   df = pd.DataFrame({'name': ['A.J. Price'] * 3, 


I believe this is copied from the following SO article:

https://stackoverflow.com/questions/32468402/how-to-explode-a-list-inside-a-dataframe-cell-into-separate-rows

We need to be careful about copying and pasting items from SO into the code base. We would have to get express permission from the author to use it.

TomAugspurger · 2018-10-08

I think SO code snippets are CC BY-SA, so as long as we link back to the source (which we should be doing anyway) then we're good.

https://stackoverflow.com/help/licensing

jreback · 2018-10-09T12:27:11Z

doc/source/gotchas.rst

+   df
+
+Stack the new columns as rows; this creates a new index level we'll want to drop in the next step. 
+Note that at this point we have a Series, not a Dataframe


This needs to be in the cookbook instead. Please cut this down to a much simpler set of examples.
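
For reference, the stacking step quoted in the diff above could be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the sample data and the names `tags`, `wide`, `stacked`, and `flat` are hypothetical.

```python
import pandas as pd

# Hypothetical data: the "tags" column holds one list per cell.
df = pd.DataFrame({"name": ["a", "b"], "tags": [["x", "y"], ["z"]]})

# Spread each list across columns, one column per list position.
# Shorter lists are padded with missing values.
wide = pd.DataFrame(df["tags"].tolist(), index=df.index)

# Stack the columns as rows. The result is a Series, not a DataFrame,
# and it carries a new inner index level (the list position).
# dropna() discards the padding regardless of how stack() treats it.
stacked = wide.stack().dropna()

# Drop the inner level and join the values back onto the other columns.
stacked = stacked.reset_index(level=1, drop=True).rename("tag")
flat = df.drop(columns="tags").join(stacked)

print(flat)
```

Each list element ends up on its own row, with the `name` value repeated. Later pandas releases (0.25 and newer) also provide `DataFrame.explode`, which covers this case directly.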

datapythonista · 2018-11-04T09:52:50Z

What I use to expand a Series with lists in the values to the corresponding DataFrame is:

import pandas

genres = pandas.Series([['drama', 'romance'], ['romance'], ['comedy', 'action']])
genres.str.join(',').str.get_dummies(',')
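
As a sketch of what the snippet above produces, the same data can be run with the result inspected. The variable name `dummies` is introduced here for illustration only. Note that this yields one indicator column per distinct value rather than one row per list element, and it assumes no list element contains the separator character.

```python
import pandas as pd

genres = pd.Series([["drama", "romance"], ["romance"], ["comedy", "action"]])

# Join each list into a single comma-separated string, then split it
# back out into one 0/1 indicator column per distinct value.
dummies = genres.str.join(",").str.get_dummies(",")

print(dummies)
```

The columns come out in sorted order (`action`, `comedy`, `drama`, `romance`), with a 1 wherever the original list contained that genre.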

So, I think the content of this PR shouldn't be in the documentation (I know you just used the existing PR @mgautam98, sorry I didn't see that earlier).

Maybe a short entry with that in the cookbook could be useful.

Closing, if anybody disagrees, please reopen.

pdpark and others added 6 commits January 12, 2018 15:01

DOC: Adds example of alternative to storing lists in a Dataframe

e91444e

Restores: pandas-dev#17027

Doc: Fixes issues with code examples.

11ff8a7

Merge remote-tracking branch 'upstream/master' into doc-gotchas-17027

7ca0ce8

DOC: Add example of alternative to storing lists in a Dataframe fix P…

6d379b4

…R19215

DOC: Adds example of alternative to storing lists in a Dataframe - ma…

a5a9ec2

…de more fixes

Doc: Adds example of exploding lists into columns instead of storing …

4952597

…in dataframe cells

WillAyd requested changes Oct 8, 2018


WillAyd added the Docs label Oct 8, 2018

datapythonista mentioned this pull request Oct 9, 2018

Doc: Adds example of exploding lists into columns instead of storing in dataframe cells #19215

Closed

1 task

jreback requested changes Oct 9, 2018


datapythonista closed this Nov 4, 2018
