Doc: Adds example of exploding lists into columns instead of storing in dataframe cells #23041
…in dataframe cells
Codecov Report
@@ Coverage Diff @@
## master #23041 +/- ##
=======================================
Coverage 92.19% 92.19%
=======================================
Files 169 169
Lines 50873 50873
=======================================
Hits 46904 46904
Misses 3969 3969
@@ -336,3 +336,94 @@ constructors using something similar to the following:
See `the NumPy documentation on byte order
<https://docs.scipy.org/doc/numpy/user/basics.byteswapping.html>`__ for more
details.
Alternative to storing lists in DataFrame Cells
This isn't an "alternative" to lists so much as a way of reshaping values; it would be better titled "Exploding List Items" or something to that effect.
Alternative to storing lists in DataFrame Cells
-----------------------------------------------
Storing nested lists/arrays inside a pandas object should be avoided for performance and memory use reasons. Instead they should be "exploded" into a flat ``DataFrame`` structure.
Related to the comment above: this isn't an "alternative" to using lists within a DataFrame.
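For readers who want a concrete picture of what "exploding" list cells means, here is a minimal sketch. It does not reproduce the PR's example: it swaps in `DataFrame.explode`, which is available in pandas 0.25 and later, and only the `'A.J. Price'` value comes from the diff. The `opponent` column and its values are hypothetical.

```python
import pandas as pd

# Hypothetical data: each cell of 'opponent' holds a list.
df = pd.DataFrame({
    "name": ["A.J. Price", "A.J. Price"],
    "opponent": [["76ers", "blazers"], ["bobcats"]],
})

# explode() emits one row per list element and repeats the other columns.
flat = df.explode("opponent").reset_index(drop=True)
print(flat)
```

The result has one scalar per cell, so ordinary vectorized operations such as `groupby` or `value_counts` apply directly to `opponent`.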
.. ipython:: python

   df = pd.DataFrame({'name': ['A.J. Price'] * 3,
I believe this is copied from the following SO article:
We need to be careful about copying and pasting items from SO into the code base; we would have to get express permission from the author to use this.
I think SO code snippets are CC BY-SA, so as long as we link back to the source (which we should be doing anyway) we're good.
   df

Stack the new columns as rows; this creates a new index level we'll want to drop in the next step.
Note that at this point we have a Series, not a Dataframe
This needs to be in the cookbook instead. Please cut this down to a much simpler set of examples.
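The stacking step discussed in this thread can be sketched as follows. The PR's full example is not visible here, so the data is a stand-in: only `'A.J. Price'` appears in the diff, and the `opponent` column, its values, and the intermediate names `wide` and `stacked` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data: each cell of 'opponent' holds a list.
df = pd.DataFrame({
    "name": ["A.J. Price", "A.J. Price"],
    "opponent": [["76ers", "blazers"], ["bobcats"]],
})

# One column per list position; shorter lists are padded with missing values.
wide = pd.DataFrame(df["opponent"].tolist(), index=df.index)

# Stack the columns into rows. The result is a Series whose index has
# two levels: the original row label and the list position.
# dropna() removes the padding regardless of pandas version.
stacked = wide.stack().dropna()
assert isinstance(stacked, pd.Series)

# Drop the list-position level, name the Series, and join it back.
stacked = stacked.reset_index(level=1, drop=True).rename("opponent")
flat = df.drop(columns="opponent").join(stacked).reset_index(drop=True)
print(flat)
```

Joining on the original row label is what repeats `name` once per list element; the final `reset_index(drop=True)` only tidies the duplicated labels.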
What I use to expand a

    import pandas
    genres = pandas.Series([['drama', 'romance'], ['romance'], ['comedy', 'action']])
    genres.str.join(',').str.get_dummies(',')

So, I think the content of this PR shouldn't be in the documentation (I know you just used the existing PR @mgautam98, sorry I didn't see that earlier). Maybe a short cookbook entry with that could be useful. Closing; if anybody disagrees, please reopen.
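The snippet in the closing comment can be run as a small self-contained example. The sketch below uses the same `genres` data and the same two calls; the variable name `dummies` and the choice to print the result are additions. It assumes no label contains the `','` separator.

```python
import pandas as pd

genres = pd.Series([["drama", "romance"], ["romance"], ["comedy", "action"]])

# Join each list into one delimited string, then split it back into
# one indicator column per distinct label.
dummies = genres.str.join(",").str.get_dummies(",")
print(dummies)
```

This produces one row per original element and one column per distinct label (`action`, `comedy`, `drama`, `romance`), with 1 where the label was present in that row's list.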
This is the continuation of PR #19215