Skip to content

DOC: added a reference to DataFrame assign in concatenate section of merging #18665

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 20, 2018

Conversation

obilodeau
Copy link
Contributor

Because of various stack overflow answers we are directed to the merging
page where a link to DataFrame.assign would be better. Adding the
reference here will make people's code better. At least it would have
for me.

  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

For tests, we'll see with CI.

@codecov
Copy link

codecov bot commented Dec 7, 2017

Codecov Report

Merging #18665 into master will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18665      +/-   ##
==========================================
- Coverage   91.61%    91.6%   -0.01%     
==========================================
  Files         150      150              
  Lines       48887    48864      -23     
==========================================
- Hits        44786    44763      -23     
  Misses       4101     4101
Flag Coverage Δ
#multiple 89.98% <ø> (-0.01%) ⬇️
#single 41.77% <ø> (-0.03%) ⬇️
Impacted Files Coverage Δ
pandas/core/ops.py 96.62% <0%> (-0.13%) ⬇️
pandas/core/panel.py 97.3% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 740ad9a...22c1fcf. Read the comment docs.

@@ -310,7 +310,8 @@ Concatenating with mixed ndims

You can concatenate a mix of Series and DataFrames. The
Series will be transformed to DataFrames with the column name as
the name of the Series.
the name of the Series. To concatenate only one Series it is preferable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you show a mini-example inline here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you show a mini-example inline here.

You mean a full example like this one (not fit for our case of course)?

.. ipython:: python

   s1 = pd.Series(['X0', 'X1', 'X2', 'X3'], name='X')
   result = pd.concat([df1, s1], axis=1)

.. ipython:: python
   :suppress:

   @savefig merging_concat_mixed_ndim.png
   p.plot([df1, s1], result,
          labels=['df1', 's1'], vertical=False);
   plt.close('all');

If so, looking at the broader context of that area of the document: the use of pd.concat() I feel like this would confuse the user more. We are in a pd.concat() context for several examples in a row and this would introduce a single df.assign() call example in between them. An in-paragraph example would be ok for me. Otherwise, I think we should have a title and cover the use case completely separately (right now it is under the Concatenating with mixed ndims title).

In any case, I have no strong opinion, just let me know how you want it done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. you can make a small sub-section for this.

@@ -262,7 +262,7 @@ Performance Improvements
Documentation Changes
~~~~~~~~~~~~~~~~~~~~~

-
- Added a reference to `DataFrame.assign` in the concatenate section of the merging documentation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use :func:`DataFrame.assign` ; add this issue number at the end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can move this to 0.21.1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@jreback jreback added Docs Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Dec 7, 2017
@@ -262,7 +262,7 @@ Performance Improvements
Documentation Changes
~~~~~~~~~~~~~~~~~~~~~

-
- Added a reference to `DataFrame.assign` in the concatenate section of the merging documentation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can move this to 0.21.1

@@ -310,7 +310,8 @@ Concatenating with mixed ndims

You can concatenate a mix of Series and DataFrames. The
Series will be transformed to DataFrames with the column name as
the name of the Series.
the name of the Series. To concatenate only one Series it is preferable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. you can make a small sub-section for this.

@obilodeau
Copy link
Contributor Author

I made the requested changes.

Concatenating using ``assign``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can concatenate one Series to a DataFrame using the DataFrame method
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add when you would do this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand what you want, so let me tell you why I'm sending this PR.

I learned about assign while trying to make a SettingWithCopyWarning warning go away. The proposed solution:

df.loc[:, ('derived_column')] = value * df.loc[:,('X0')]

was still generating a warning in my case. So I've read the doc about merging and stumbled upon a pd.concat solution:

df = pd.concat([df, value * df['X0']], axis=1)
df.columns.values[1] = "derived_column"

Comparing notes with a colleague she told me about assign and I found that solution more elegant:

df = df.assign(derived_column = value * df['X0'])

So that's why I want to add it to merging. Now that I think about it, should I add a reference to assign in the indexing-view-versus-copy doc also? I didn't do it because I don't understand why my original solution using .loc still triggered the warning.

Back to your comment, I guess the when is when I want to add a column to a DataFrame but I already said that, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After submitting my comment, I understood why the .loc() solution is causing a warning. I was already working on a view due to prior processing. To make sure I tested with:

df = df.copy() # making sure it is a full df and not a view
df.loc[:, ('derived_column')] = value * df.loc[:,('X0')]

and it didn't produce a warning.

@jreback jreback added this to the 0.21.1 milestone Dec 9, 2017
@jorisvandenbossche jorisvandenbossche modified the milestones: 0.21.1, 0.22.0 Dec 10, 2017
@jreback
Copy link
Contributor

jreback commented Feb 11, 2018

can you rebase

@obilodeau
Copy link
Contributor Author

Rebase done. Updated changelog in version 0.23.0 as per milestone attributed.

Concatenating using ``assign``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can concatenate one Series to a DataFrame using the DataFrame method
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit worried about this phrasing, and the section title.

Concatenation already has a well-defined meaning in pandas. This is similar, since df[col] = series, which is what assign is doing, will do alignment like concat, but it's different since you're only adding a single column.

I'd almost prefer a simple like this at the start or end of the concat documentation that says something like "concat combins many pandas objects into a single new one. If you want to to include a Series in a DataFrame as a new column, assign directly like ..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please read my initial explanation as to why I believed this can send newcomers into the right direction quicker. My contribution was for that goal.

Now, it's been a while and I'm no longer new to this problem so I can't put myself back into that mindset anymore. All I remember is that the warning wouldn't go away and that the first solution I stumbled upon seemed too complex and assign should have been proposed, while reading that merging documentation.

If you want to to include a Series in a DataFrame as a new column, assign directly like ...

Is exactly what the section right after my contribution proposes yet it is still in the merging doc. I've put a reference to assign in front of it to redirect people to a cleaner solution when they have that specific problem.

Moving towards patch acceptance: if I change Concatenating using to Add a Series to a DataFrame using in the title and in the text body, would that satisfy your concern?

Thanks for your feedback!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than making a separate section, I would be ok with making this a ::note.. I think. Its just a terminology issue, might be helpful to understand what concatenate is different from assign.

Concatenating using ``assign``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can concatenate one Series to a DataFrame using the DataFrame method
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than making a separate section, I would be ok with making this a ::note.. I think. Its just a terminology issue, might be helpful to understand what concatenate is different from assign.

@jreback jreback removed this from the 0.23.0 milestone Feb 18, 2018
@obilodeau
Copy link
Contributor Author

Rewritten the contribution to be a simple note pointer to the DataFrame.assign() function. I noticed that in the meantime the assign() documentation was bonified and includes a column assignment using values derived from the dataframe. This is exactly what I hoped I'd found when I was looking for it in the merging documentation.

@@ -323,6 +323,11 @@ the name of the ``Series``.
labels=['df1', 's1'], vertical=False);
plt.close('all');

.. note::

You can perform the same task using the DataFrame method
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not true. I would say if you want to add a single series to an existing dataframe, then you want to use assign, otherwise you want to use concat.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true only in this exact case of concatenating a single series to a single dataframe. @obilodeau it'd be good to point out difference. concat is for concatenating an arbitrary number of pandas objects, while .assign is only defined on DataFrame, and only accepts 1-d argumnents. Something like

..  note::

   Since we're concatenating a Series to a DataFrame, we could have achieved the same
   output with :meth:`DataFrame.assign`. To concatenate an arbitrary number of pandas
   objects (DataFrame or Series), use ``concat``.

…merging

Because of various stack overflow answers we are directed to the merging
page where a link to DataFrame.assign would be better. Adding the
reference here will make people's code better. At least it would have
for me.
@obilodeau
Copy link
Contributor Author

Updated text to reflect @TomAugspurger's suggestion based on @jreback's comments.

@jreback jreback added this to the 0.23.0 milestone Feb 20, 2018
@jreback jreback merged commit 3da7b1f into pandas-dev:master Feb 20, 2018
@jreback
Copy link
Contributor

jreback commented Feb 20, 2018

thanks @obilodeau

harisbal pushed a commit to harisbal/pandas that referenced this pull request Feb 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants