Regression in casting Series to DataFrame with .name='foo' and columns=['bar'] #7893

Closed
qwhelan opened this issue Jul 31, 2014 · 7 comments · Fixed by #41389
Labels
Constructors (Series/DataFrame/Index/pd.array constructors), good first issue, Needs Tests (unit test(s) needed to prevent regressions)

@qwhelan
Contributor

qwhelan commented Jul 31, 2014

xref #13421 with a MultiIndex as the columns

Hi,

I encountered an edge case in DataFrame initialization with something like the following:

In [1]: from pandas import *
In [2]: s = Series(1, name='foo')
In [3]: df = DataFrame(s, columns=['bar'])
In [4]: df

Empty DataFrame
Columns: [bar]
Index: []

This happens in both 0.14.1 and 0.13.1, but this isn't really a bug as the docs exclude Series as a valid type for data=. That being said, this casting appears to work whenever .name is None or when .name equals what's passed to columns=, so failure in this particular case is rather surprising.

The mechanism appears to be:

  • DataFrame.__init__ upgrades .name to the column name, if it is not None
  • Then, the data columns are sliced with the list passed to columns=, resulting in an empty data set when the two differ.
  • This seems to only occur when a Series is directly passed as data=. I can't get this to occur with [Series, ...] or a dict of Series.
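The cases described above can be reproduced with a short script. This is a minimal sketch, assuming a pandas version that still behaves as reported here; the variable names are illustrative only.

```python
import pandas as pd

s = pd.Series([1], name="foo")

# Case 1: the Series name matches the requested column, so the data is kept.
same = pd.DataFrame(s, columns=["foo"])

# Case 2: an unnamed Series takes the requested label as its column name.
unnamed = pd.DataFrame(pd.Series([1]), columns=["bar"])

# Case 3: the name differs from the requested column, so selecting "bar"
# finds nothing under that label and no values survive.
mismatch = pd.DataFrame(s, columns=["bar"])

print(same, unnamed, mismatch, sep="\n\n")
```

Only the third case loses the values, which is what makes it easy to miss.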

The options I see are (in order of my personal preference):

  • do the implicit rename (only occurs with single Series, so no ambiguity)
  • just don't allow a Series being passed as data=.
  • throw an exception due to the ambiguity
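To make the first and third options concrete, here is a rough sketch written as a standalone helper. `series_to_frame` and its `on_mismatch` parameter are hypothetical names invented for illustration; they are not part of pandas and do not describe how the constructor is implemented.

```python
import pandas as pd


def series_to_frame(s, columns=None, on_mismatch="rename"):
    """Hypothetical helper (not a pandas API): build a one-column frame."""
    if columns is None:
        # No label requested: keep whatever name the Series already has.
        return s.to_frame()
    columns = list(columns)
    if len(columns) != 1:
        raise ValueError("a single Series needs exactly one column label")
    label = columns[0]
    mismatch = s.name is not None and s.name != label
    if mismatch and on_mismatch == "raise":
        # Option 3: refuse to guess when the name and the label disagree.
        raise ValueError(
            f"Series name {s.name!r} does not match column {label!r}"
        )
    # Option 1: implicit rename, so the data always survives.
    return s.to_frame(name=label)


s = pd.Series([1], name="foo")
renamed = series_to_frame(s, columns=["bar"])
print(renamed)
```

With `on_mismatch="raise"` the same call fails loudly instead of renaming, which trades convenience for explicitness.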

I don't see just documenting this behavior as being viable, as this edge case effectively leads to data loss.

@jreback
Contributor

jreback commented Jul 31, 2014

It's not documented because this is how it used to work quite a while ago.

Use to_frame().
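A minimal sketch of that suggestion, using `Series.to_frame` and its `name` argument to set the column label explicitly:

```python
import pandas as pd

s = pd.Series([1], name="foo")

# Passing name= overrides the Series name for the resulting column.
df = s.to_frame(name="bar")

# Without name=, the existing Series name becomes the column label.
kept = s.to_frame()

print(df)
print(kept)
```

Because the label is given directly rather than used as a selector, the values are kept in both calls.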

@qwhelan
Contributor Author

qwhelan commented Jul 31, 2014

Ok, thanks. It seems like this is a regression, unless this is considered deprecated. The above does not occur on 0.8.0 (when the code that hit this was written) or 0.12.0, but it does on 0.13.0, so it probably showed up as part of the internal rewrite.

@jreback
Contributor

jreback commented Jul 31, 2014

It might have slipped through because it wasn't tested (we actually have an amazing number of constructor tests).

But it is pretty consistent, in that the columns/index that are passed in act as reindexers.

That said, the boat has sailed on this: changing it is a no-go, and to_frame() does it all.
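The "reindexer" behavior mentioned here can be seen with dict and DataFrame inputs. This is a small sketch with made-up data; the labels `a`, `b`, `c`, `x`, `y`, `z` are illustrative only.

```python
import pandas as pd

data = {"a": [1, 2], "b": [3, 4]}

# columns= selects and orders existing keys; a label with no matching
# key ("c") yields an all-missing column instead of raising.
df = pd.DataFrame(data, columns=["b", "c"])

# The same applies when the input is already a DataFrame: index= and
# columns= reindex it rather than relabel it.
base = pd.DataFrame(data, index=["x", "y"])
sub = pd.DataFrame(base, index=["y", "z"], columns=["a"])

print(df)
print(sub)
```

In `sub`, row `y` keeps its value and row `z` is missing, which mirrors what happens to a named Series whose name is not among the requested columns.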

@qwhelan
Contributor Author

qwhelan commented Jul 31, 2014

Ok. Just for reference, git bisect identified the following as suspect commits (many of these were skipped due to syntax issues):
03158a1
217bec2
4493bf3
7b09a3c
370f8c8
8ee0a89
7f31567

@qwhelan changed the title from "Surprising behavior when casting Series to DataFrame with .name='foo' and columns=['bar']" to "Regression in casting Series to DataFrame with .name='foo' and columns=['bar']" Jul 31, 2014
@jreback
Contributor

jreback commented Jul 31, 2014

Here's my answer to the same question: http://stackoverflow.com/questions/21984862/creating-pandas-dataframe-and-renaming-change-0-10-0-to-0-13-1

If you'd like to put up a couple of tests for the current behavior, that would be great.

@qwhelan
Contributor Author

qwhelan commented Jul 31, 2014

@jreback Thanks for the pointer, I'll put together some tests over the weekend.

@jreback added the Testing label Aug 1, 2014
@jreback added this to the 0.15.0 milestone Aug 1, 2014
@jreback
Contributor

jreback commented Sep 9, 2014

@qwhelan ?

@jreback modified the milestones: 0.15.1, 0.15.0 Sep 9, 2014
@jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@mroeschke added the Constructors, good first issue, and Needs Tests labels and removed the API Design, Reshaping, and Testing labels May 22, 2020
@simonjayhawkins modified the milestones: Contributions Welcome, 1.3 May 9, 2021
4 participants