setting values in a dataframe with duplicated keys #34034
Comments
@c-foschi Can you provide an example that can be copy / pasted which shows the problem? It seems that this functionality does work:
[ins] In [1]: df = pd.DataFrame([1, 2, 3], index=[1, 1, 2])
[ins] In [2]: df
Out[2]:
0
1 1
1 2
2 3
[ins] In [3]: df.loc[1, 0] = [9, 9]
[ins] In [4]: df
Out[4]:
0
1 9
1 9
2 3 |
@dsaxton @MarcoGorelli I tried to reproduce the error with toy examples but I failed many times. Still, every time I run my code, the same error appears. Here it is, I hope it makes some sense to you: In:
Out:
In:
Out:
In:
Out:
|
What version of pandas are you using? |
from |
Yes, please upgrade to the latest version (1.0.3) |
Done. Same error occurs:
|
OK, once you read in |
OK, I finally managed to create a toy example with random values. I have a file like this:
And my code is:
This raises the error for me. I should probably add that without the string column everything worked. |
Great, thanks @c-foschi! This reproduces:
import pandas as pd
import numpy as np
from io import StringIO
X = pd.read_csv(
StringIO(
"""code,longitude,date
6342156,0.966747,a
6342156,0.756199,b
6342156,0.054222,c
6342156,0.743996,d
6342156,0.486753,a
6342156,0.464093,s
6342156,0.430592,d
2261019,0.827252,f
2261019,0.864456,f
2261019,0.866847,d"""
)
)
X.set_index("code", inplace=True)
X.loc[2261019, "longitude"] = np.arange(3) |
Seems to work for me on master:
import pandas as pd
import numpy as np
from io import StringIO
X = pd.read_csv(
StringIO(
"""code,longitude,date
6342156,0.966747,a
6342156,0.756199,b
6342156,0.054222,c
6342156,0.743996,d
6342156,0.486753,a
6342156,0.464093,s
6342156,0.430592,d
2261019,0.827252,f
2261019,0.864456,f
2261019,0.866847,d"""
)
)
X.set_index("code", inplace=True)
X.loc[2261019, "longitude"] = np.arange(3)
print(X)
# result
longitude date
code
6342156 0.966747 a
6342156 0.756199 b
6342156 0.054222 c
6342156 0.743996 d
6342156 0.486753 a
6342156 0.464093 s
6342156 0.430592 d
2261019 0.000000 f
2261019 1.000000 f
2261019 2.000000 d
# check version
pd.__version__
'1.1.0.dev0+1502.g3ed7dff48'
Do we need a test? |
Thanks @CloseChoice for checking, can confirm it works on master.
I would think so, yes
Thanks @c-foschi - yes, if we run this on 1.0.3 but add in X = X.drop('date', axis=1) then it works |
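A regression test along the lines discussed above could look like the sketch below. It is only an illustration built from a shortened version of the reproducer: the test name is hypothetical, and it assumes a pandas version where the assignment succeeds (as reported on master), not the actual test added to the pandas test suite.

```python
from io import StringIO

import numpy as np
import pandas as pd
import pandas.testing as tm


def test_loc_setitem_duplicated_index_mixed_dtypes():
    # Hypothetical test name; data is a shortened copy of the reproducer.
    csv = """code,longitude,date
6342156,0.966747,a
6342156,0.756199,b
2261019,0.827252,f
2261019,0.864456,f
2261019,0.866847,d"""
    df = pd.read_csv(StringIO(csv)).set_index("code")

    # Three rows share the key 2261019, so a length-3 array should fit.
    df.loc[2261019, "longitude"] = np.arange(3)

    expected = pd.DataFrame(
        {
            "longitude": [0.966747, 0.756199, 0.0, 1.0, 2.0],
            "date": ["a", "b", "f", "f", "d"],
        },
        index=pd.Index(
            [6342156, 6342156, 2261019, 2261019, 2261019], name="code"
        ),
    )
    tm.assert_frame_equal(df, expected)
```

The string `date` column is kept on purpose, since the failure on 1.0.3 only showed up when the frame had mixed dtypes.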
take |
Fixed in #31897; 3da053c is the first new commit.
|
I have a dataframe with duplicated keys, and I need to assign values to some column of that dataframe for several different keys. Dataframe indexes support duplicated keys, so I assume that this kind of work should be easy; if not, I don't see why dataframes should be allowed to have duplicated keys. Anyway, setting the values like this:
gives me the following error:
even if the number of times key appears in the index of df is equal to the length of vector. I think this should be fixed.
Thank you,
c. foschi
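For readers without the original snippet, the pattern described above can be sketched as follows. The names `df`, `key` and `vector` come from the report; the column names and values are made up for illustration, and the sketch assumes a pandas version where label-based assignment on a duplicated index works. The boolean-mask form is shown as an alternative spelling of the same assignment.

```python
import numpy as np
import pandas as pd

# Illustrative frame: key 1 appears twice in the index, plus a string column.
df = pd.DataFrame(
    {"col": [1.0, 2.0, 3.0], "tag": ["x", "y", "z"]},
    index=[1, 1, 2],
)
key = 1
vector = np.array([10.0, 20.0])

# The vector must have one value per row that carries the key.
n_rows = int((df.index == key).sum())
assert n_rows == len(vector)

# Label-based assignment, as in the report.
by_label = df.copy()
by_label.loc[key, "col"] = vector

# Equivalent assignment through an explicit boolean mask on the index.
by_mask = df.copy()
by_mask.loc[df.index == key, "col"] = vector
```

Both copies should end up with the same values in `col`; rows with other keys are left untouched.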