Assignment via .loc #27286

Closed
cruzzoe opened this issue Jul 8, 2019 · 5 comments
Labels
Indexing Related to indexing on series/frames, not to indexes themselves

Comments

@cruzzoe
Contributor

cruzzoe commented Jul 8, 2019

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame({'Data': ['afoo abc: agsegsegs', 'b def: eafsegsg', 'c ghi:']})
df2 = df

df2.loc[:, 'foo2']  = df2.Data.fillna('').astype(str).str.extract(r'.*(\w{3}).*')
print df2['foo2']

Problem description

When printing df2['foo2'] I get:
0 NaN
1 NaN
2 NaN

If instead I change the line assigning to 'foo2' to:

df2['foo2']  = df2.Data.fillna('').astype(str).str.extract(r'.*(\w{3}).*')

When printing df2['foo2'] I get:

0 egs
1 gsg
2 ghi

I would have expected df.loc[:, 'foo2'] and df['foo2'] assignments to behave the same way.

Output of pd.show_versions()

pandas: 0.23.4
numpy: 1.15.4
python: 2.7.14.final.0

@TomAugspurger
Contributor

Simpler example

In [23]: df = pd.DataFrame({"A": [1, 2, 3]})

In [24]: df.loc[:, 'B'] = pd.DataFrame({"C": [1, 2, 3]})

In [25]: df
Out[25]:
   A   B
0  1 NaN
1  2 NaN
2  3 NaN

You're getting tripped up by automatic alignment. The names in the value dataframe ('C') don't match the target 'B'. I'm not sure whether or not this is a bug (cc @jreback @toobaz). It may have slightly different semantics from DataFrame.__setitem__[key], so the difference from df['foo2'] = ... may be correct.

However, I would think it has the same semantics as DataFrame.__setitem__[Sequence[key]]

In [30]: df = pd.DataFrame({"A": [1, 2, 3]})

In [31]: df[['B']] = pd.DataFrame({"C": [4, 5, 6]})

In [32]: df
Out[32]:
   A  B
0  1  4
1  2  5
2  3  6
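
As a rough illustration of the alignment being discussed, the sketch below uses a Series rather than a DataFrame value, since the DataFrame case is exactly what is in question here and may differ between pandas versions. It shows values being matched on row labels instead of position, and one way to sidestep alignment by handing over a plain NumPy array. Columns "C" and "D" are made up for the example, and to_numpy() assumes pandas 0.24 or later (on 0.23.4, .values would be the equivalent).

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})

# A Series value is matched on row labels, not on position:
# label 0 receives 30 and label 2 receives 10.
df["B"] = pd.Series([10, 20, 30], index=[2, 1, 0])

# Labels that do not exist in the target leave NaN behind.
df["C"] = pd.Series([10, 20, 30], index=[7, 8, 9])

# Pulling out the single column and converting it to a NumPy array
# drops the labels, so the values are assigned by position.
value = pd.DataFrame({"C": [4, 5, 6]})
df.loc[:, "D"] = value["C"].to_numpy()

print(df)
```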

TomAugspurger added the Indexing label Jul 8, 2019
@cruzzoe
Contributor Author

cruzzoe commented Jul 8, 2019

Thanks for your response. If this is not a bug, then I was not aware that .loc behaved this way.

@cruzzoe
Contributor Author

cruzzoe commented Jul 8, 2019

The names in the value dataframe ('C') don't match the target 'B'.

This doesn't appear to actually matter. For example, the value DataFrame can have different column names from the target DataFrame and still give the expected result. What appears to matter is that the column label passed to .loc already exists in the target DataFrame.

EDITED case below to values 9,9,9 in order to show df updating

See case:

df = pd.DataFrame({"A": [1, 2, 3]})
df.loc[:, 'A'] = pd.DataFrame({"ZZZ": [9, 9, 9]})
print df
   A
0  9
1  9
2  9

compared to:

df = pd.DataFrame({"A": [1, 2, 3]})
df.loc[:, 'D'] = pd.DataFrame({"ZZZ": [1, 2, 3]})
print df
   A   D
0  1 NaN
1  2 NaN
2  3 NaN

It could be my ignorance of .loc, but I was not aware there was a constraint that the column name had to exist in advance. If that's the case, then why does the below work:

df = pd.DataFrame({"A": [1, 2, 3]})
df.loc[:, 'E'] = 4
print df
   A  E
0  1  4
1  2  4
2  3  4

@toobaz
Member

toobaz commented Jul 8, 2019

While I certainly won't exclude there might be inconsistency with aligning indexes/inserting missing columns, my simple explanation of why

df2.loc[:, 'foo2']  = df2.Data.fillna('').astype(str).str.extract(r'.*(\w{3}).*')

won't work is simply that you're trying to fit a 2D object (DataFrame) into a 1D slot (df2.loc[:, 'foo2']). And indeed

df2.loc[:, 'foo2']  = df2.Data.fillna('').astype(str).str.extract(r'.*(\w{3}).*')[0]

works.
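
For reference, here is a self-contained sketch of that workaround using the data from the original report. It relies on str.extract returning a one-column DataFrame (labelled 0) when there is a single capture group and expand is left at its default. The expand=False variant and the column name foo3 are additions for illustration and are not part of the thread.

```python
import pandas as pd

df = pd.DataFrame({'Data': ['afoo abc: agsegsegs', 'b def: eafsegsg', 'c ghi:']})

# With one capture group and the default expand=True, str.extract
# returns a DataFrame whose only column is labelled 0.
extracted = df['Data'].fillna('').astype(str).str.extract(r'.*(\w{3}).*')

# Selecting column 0 gives a 1D Series, which fits the 1D slot.
df.loc[:, 'foo2'] = extracted[0]

# Alternative: ask str.extract for a Series directly.
df.loc[:, 'foo3'] = df['Data'].str.extract(r'.*(\w{3}).*', expand=False)

print(df[['foo2', 'foo3']])
```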

Sure, the slot and the value happen to have the same number of cells in your example, but I would never expect this broadcasting to lower dimensionality to be supported. And it is the df2['foo2'] behavior that I find (literally) deprecable. But maybe I can understand where it comes from: df2['foo2'] might in some cases be a 2D slot (when you have a MultiIndex on the columns).
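
To make the MultiIndex remark concrete, the following minimal sketch builds a frame with two column levels, so that indexing by a single top-level label selects a 2D block rather than a single column. The column labels and values are invented for the example.

```python
import pandas as pd

# Two-level column index: "foo2" covers two sub-columns, "bar" covers one.
cols = pd.MultiIndex.from_tuples([("foo2", "a"), ("foo2", "b"), ("bar", "a")])
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=cols)

# A single top-level key selects every column under it,
# so the result is a DataFrame, not a Series.
sub = df["foo2"]
print(type(sub).__name__, sub.shape)
```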

@toobaz
Member

toobaz commented Nov 14, 2020

I think this can be closed. While broadcasting a 1D structure to a 2D slot is something we can support, squeezing an nx1 structure into a 1D slot is not worth the effort, I think.

toobaz closed this as completed Nov 14, 2020