-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC: added a reference to DataFrame assign in concatenate section of merging #18665
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #18665 +/- ##
==========================================
- Coverage 91.61% 91.6% -0.01%
==========================================
Files 150 150
Lines 48887 48864 -23
==========================================
- Hits 44786 44763 -23
Misses 4101 4101
Continue to review full report at Codecov.
|
doc/source/merging.rst
Outdated
@@ -310,7 +310,8 @@ Concatenating with mixed ndims | |||
|
|||
You can concatenate a mix of Series and DataFrames. The | |||
Series will be transformed to DataFrames with the column name as | |||
the name of the Series. | |||
the name of the Series. To concatenate only one Series it is preferable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you show a mini-example inline here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you show a mini-example inline here.
You mean a full example like this one (not fit for our case of course)?
.. ipython:: python
s1 = pd.Series(['X0', 'X1', 'X2', 'X3'], name='X')
result = pd.concat([df1, s1], axis=1)
.. ipython:: python
:suppress:
@savefig merging_concat_mixed_ndim.png
p.plot([df1, s1], result,
labels=['df1', 's1'], vertical=False);
plt.close('all');
If so, looking at the broader context of that area of the document: the use of pd.concat()
I feel like this would confuse the user more. We are in a pd.concat()
context for several examples in a row and this would introduce a single df.assign()
call example in between them. An in-paragraph example would be ok for me. Otherwise, I think we should have a title and cover the use case completely separately (right now it is under the Concatenating with mixed ndims title).
In any case, I have no strong opinion, just let me know how you want it done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. you can make a small sub-section for this.
doc/source/whatsnew/v0.22.0.txt
Outdated
@@ -262,7 +262,7 @@ Performance Improvements | |||
Documentation Changes | |||
~~~~~~~~~~~~~~~~~~~~~ | |||
|
|||
- | |||
- Added a reference to `DataFrame.assign` in the concatenate section of the merging documentation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use :func:`DataFrame.assign`
; add this issue number at the end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can move this to 0.21.1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
doc/source/whatsnew/v0.22.0.txt
Outdated
@@ -262,7 +262,7 @@ Performance Improvements | |||
Documentation Changes | |||
~~~~~~~~~~~~~~~~~~~~~ | |||
|
|||
- | |||
- Added a reference to `DataFrame.assign` in the concatenate section of the merging documentation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can move this to 0.21.1
92e6ac3
to
b4d1943
Compare
doc/source/merging.rst
Outdated
@@ -310,7 +310,8 @@ Concatenating with mixed ndims | |||
|
|||
You can concatenate a mix of Series and DataFrames. The | |||
Series will be transformed to DataFrames with the column name as | |||
the name of the Series. | |||
the name of the Series. To concatenate only one Series it is preferable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. you can make a small sub-section for this.
b4d1943
to
0ed539f
Compare
I made the requested changes. |
doc/source/merging.rst
Outdated
Concatenating using ``assign`` | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
You can concatenate one Series to a DataFrame using the DataFrame method |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add when you would do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand what you want, so let me tell you why I'm sending this PR.
I learned about assign
while trying to make a SettingWithCopyWarning
warning go away. The proposed solution:
df.loc[:, ('derived_column')] = value * df.loc[:,('X0')]
was still generating a warning in my case. So I've read the doc about merging
and stumbled upon a pd.concat
solution:
df = pd.concat([df, value * df['X0']], axis=1)
df.columns.values[1] = "derived_column"
Comparing notes with a colleague she told me about assign
and I found that solution more elegant:
df = df.assign(derived_column = value * df['X0'])
So that's why I want to add it to merging
. Now that I think about it, should I add a reference to assign
in the indexing-view-versus-copy
doc also? I didn't do it because I don't understand why my original solution using .loc
still triggered the warning.
Back to your comment, I guess the when is when I want to add a column to a DataFrame but I already said that, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After submitting my comment, I understood why the .loc()
solution is causing a warning. I was already working on a view due to prior processing. To make sure I tested with:
df = df.copy() # making sure it is a full df and not a view
df.loc[:, ('derived_column')] = value * df.loc[:,('X0')]
and it didn't produce a warning.
can you rebase |
0ed539f
to
742f496
Compare
Rebase done. Updated changelog in version 0.23.0 as per milestone attributed. |
doc/source/merging.rst
Outdated
Concatenating using ``assign`` | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
You can concatenate one Series to a DataFrame using the DataFrame method |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit worried about this phrasing, and the section title.
Concatenation already has a well-defined meaning in pandas. This is similar, since df[col] = series
, which is what assign
is doing, will do alignment like concat
, but it's different since you're only adding a single column.
I'd almost prefer a simple like this at the start or end of the concat
documentation that says something like "concat combins many pandas objects into a single new one. If you want to to include a Series in a DataFrame as a new column, assign directly like ..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please read my initial explanation as to why I believed this can send newcomers into the right direction quicker. My contribution was for that goal.
Now, it's been a while and I'm no longer new to this problem so I can't put myself back into that mindset anymore. All I remember is that the warning wouldn't go away and that the first solution I stumbled upon seemed too complex and assign
should have been proposed, while reading that merging documentation.
If you want to to include a Series in a DataFrame as a new column, assign directly like ...
Is exactly what the section right after my contribution proposes yet it is still in the merging
doc. I've put a reference to assign
in front of it to redirect people to a cleaner solution when they have that specific problem.
Moving towards patch acceptance: if I change Concatenating using
to Add a Series to a DataFrame using
in the title and in the text body, would that satisfy your concern?
Thanks for your feedback!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than making a separate section, I would be ok with making this a ::note.. I think. Its just a terminology issue, might be helpful to understand what concatenate is different from assign.
doc/source/merging.rst
Outdated
Concatenating using ``assign`` | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
You can concatenate one Series to a DataFrame using the DataFrame method |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than making a separate section, I would be ok with making this a ::note.. I think. Its just a terminology issue, might be helpful to understand what concatenate is different from assign.
742f496
to
d5d9c7a
Compare
Rewritten the contribution to be a simple note pointer to the |
doc/source/merging.rst
Outdated
@@ -323,6 +323,11 @@ the name of the ``Series``. | |||
labels=['df1', 's1'], vertical=False); | |||
plt.close('all'); | |||
|
|||
.. note:: | |||
|
|||
You can perform the same task using the DataFrame method |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not true. I would say if you want to add a single series to an existing dataframe, then you want to use assign, otherwise you want to use concat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's true only in this exact case of concatenating a single series to a single dataframe. @obilodeau it'd be good to point out difference. concat
is for concatenating an arbitrary number of pandas objects, while .assign
is only defined on DataFrame
, and only accepts 1-d argumnents. Something like
.. note::
Since we're concatenating a Series to a DataFrame, we could have achieved the same
output with :meth:`DataFrame.assign`. To concatenate an arbitrary number of pandas
objects (DataFrame or Series), use ``concat``.
d5d9c7a
to
42da11a
Compare
…merging Because of various stack overflow answers we are directed to the merging page where a link to DataFrame.assign would be better. Adding the reference here will make people's code better. At least it would have for me.
42da11a
to
22c1fcf
Compare
Updated text to reflect @TomAugspurger's suggestion based on @jreback's comments. |
thanks @obilodeau |
Because of various stack overflow answers we are directed to the merging
page where a link to DataFrame.assign would be better. Adding the
reference here will make people's code better. At least it would have
for me.
git diff upstream/master -u -- "*.py" | flake8 --diff
For tests, we'll see with CI.