makeMissingDataFrame is broken #6602

Closed
jseabold opened this issue Mar 11, 2014 · 3 comments

@jseabold
Contributor

Did something change in iloc...? I could have sworn this worked and we talked about it during the PR. I was told not to use values but to use iloc instead IIRC.

[~/]
[18]: pd.version.version
[18]: '0.13.1-254-g150f323'


import numpy as np
import pandas as pd
from pandas.util.testing import _create_missing_idx
df = pd.util.testing.makeDataFrame()
density = .9
random_state = None

i, j = _create_missing_idx(*df.shape, density=density,
                           random_state=random_state)
df.iloc[i, j] = np.nan

iloc must be operating on a copy here, because the assignment has no effect.

This still works:

df.values[i, j] = np.nan

@jreback
Contributor

jreback commented Mar 11, 2014

i and j need to be integer indexers, not float indexers (which is what _create_missing_idx is returning)

In [12]: df.iloc[np.array(i).astype(int),np.array(j).astype(int)] = np.nan

In [13]: df
Out[13]: 
                   A         B         C         D
GKeS6xwGkH  0.076111 -0.192877  1.098092 -0.074225
b5WFjcgcvp -0.502561 -0.498398 -1.230549 -1.543185
vK5bWohkVI  0.780080  0.075142  0.980444  0.242915
PAtwXIifAf       NaN       NaN       NaN       NaN
Tls586DpAI  0.012728  1.426050  0.102844  0.871050
ELEPklQ7yw       NaN       NaN       NaN       NaN
EEGXwqnRlO       NaN       NaN       NaN       NaN
sSTuc11xNN -0.162915 -0.909402 -0.619644 -1.207251
evjm7MZ0qk -0.680087  0.887411  1.195531  1.237603
a5Z4It2V5H -2.073016 -0.788900 -0.083732  1.165306
FGkBqukGOb  0.165393  1.794161  1.753696 -1.385686
CRbt0EJpN2 -0.524696 -1.469448  0.134130  0.586626
iikzGFllAx  0.778601  1.434450 -1.176926 -1.753035
TZ9t34OOdY       NaN       NaN       NaN       NaN
qEOLe8ZdW2  1.117555  0.289419 -1.507414  0.632285
dlxec9h9KB       NaN       NaN       NaN       NaN
w1u7GXTZVs       NaN       NaN       NaN       NaN
MXhbZLpw2b  0.199026  0.555814  0.691348 -1.141643
NU9NOqHljV       NaN       NaN       NaN       NaN
783raWid6e -0.445552  0.109983 -0.042305  0.340432
1bcwNH4txV       NaN       NaN       NaN       NaN
04J1LdP5xW       NaN       NaN       NaN       NaN
dFwyxzhRoC -2.083956  0.105314 -1.350285  0.528947
pZPufraB0b  0.654603 -1.824764 -0.104437 -0.589233
9IutFKQyT8  0.307688  1.495336 -0.250889 -1.067093
SCWHxJuagm -0.426220  1.160561  1.126022  0.707325
1X5q9CmOvN       NaN       NaN       NaN       NaN
3a9oEAXL4i  0.623358  0.677829 -0.927471 -0.119501
RW9vXQg0Dk  0.817401  1.911899 -0.475811  1.122407
pFyaA4zfqE       NaN       NaN       NaN       NaN

[30 rows x 4 columns]
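
For reference, the cast above can be tried on a small frame. This is only a sketch: the frame and the float-valued `i` and `j` below are made-up stand-ins for what `_create_missing_idx` returned in this thread, not its actual output.

```python
import numpy as np
import pandas as pd

# Small single-dtype frame, used here only for illustration.
df = pd.DataFrame(np.arange(12, dtype=float).reshape(4, 3), columns=list("ABC"))

# Hypothetical float-valued positions (assumed, not real helper output).
i = np.array([0.0, 2.0])
j = np.array([1.0, 2.0])

# Cast to integer positions before handing them to iloc.
rows = np.asarray(i).astype(int)
cols = np.asarray(j).astype(int)

df.iloc[rows, cols] = np.nan

# Count how many cells were set to NaN.
n_missing = int(df.isna().sum().sum())
print(df)
print(n_missing)
```

Note that four cells are set here, not two: iloc takes every listed row against every listed column. That is consistent with whole rows turning into NaN in the output above, where j happened to cover all four columns.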

@jreback
Contributor

jreback commented Mar 11, 2014

The other problem is that these are not treated as a grid (which is what I think you want).

so just do df.values[i,j] = np.nan

we don't handle this type of mesh indexing

In [24]: zip(*[[ int(i_) for i_ in i ],[ int(j_) for j_ in j ]])
Out[24]: 
[(21, 0),
 (3, 1),
 (15, 1),
 (16, 1),
 (20, 1),
 (13, 2),
 (26, 2),
 (5, 3),
 (6, 3),
 (15, 3),
 (18, 3),
 (29, 3)]

i.e. basically a list of coordinates to set
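
To make the difference concrete, here is a rough sketch comparing the two behaviours on a toy frame. The frame and the `rows`/`cols` arrays are invented for the example. It writes into a copy from `to_numpy(copy=True)` and rebuilds the frame, rather than assigning through `df.values`, since whether `df.values` is a writable view depends on the dtypes and the pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((4, 3)), columns=list("ABC"))

# Hypothetical coordinates: (0, 0), (2, 1), (3, 2).
rows = np.array([0, 2, 3])
cols = np.array([0, 1, 2])

# iloc with two arrays selects every listed row against every listed column.
grid = df.copy()
grid.iloc[rows, cols] = np.nan
grid_missing = int(grid.isna().sum().sum())

# NumPy pairs the two arrays element by element, i.e. a list of coordinates.
arr = df.to_numpy(copy=True)
arr[rows, cols] = np.nan
paired = pd.DataFrame(arr, index=df.index, columns=df.columns)
paired_missing = int(paired.isna().sum().sum())

print(grid_missing, paired_missing)
```

The iloc version fills a 3 x 3 block, while the NumPy version touches only the three listed cells.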

@jreback
Contributor

jreback commented Mar 11, 2014

This happens to work because it's a single dtype


In [35]: df = pd.util.testing.makeDataFrame()

In [36]: df.unstack().iloc[[ int(i_*len(df.columns) + j_) for i_, j_ in zip(i, j) ]] = np.nan

In [37]: df
Out[37]: 
                   A         B         C         D
fhXEt3cb41 -2.542137  0.682847 -0.307920  0.267502
mT2YVWqTbS  0.920000 -0.572048       NaN  0.643715
4L9B4yUozs  0.470971 -1.750536 -0.264348  1.350301
zVXa3OswAo -0.789714 -0.509468       NaN  0.159287
IKKaXLIEwn -1.000156  1.091673  0.717248  0.433991
4K8wcFV2vx -0.714839  0.731494       NaN -1.448743
Y9OImXzIOr -1.114815  0.992466 -0.566328 -0.810867
gq5Rq6u0B0 -1.214338 -1.675467  1.714498 -1.336355
pn2G8ud7Su -1.369294 -0.115031 -0.021097  0.996644
xxARn8LbH6 -0.105542 -1.192954  0.026511 -0.108402
7dqRmDrRc7 -0.817408 -0.224906 -0.297257 -1.321547
FnOjGpwi8u  0.202575  0.060629 -1.782457  1.203774
kr9yi451RI -1.221792  0.805534 -1.607661  0.543769
zaaPJGPIPD       NaN  1.481853  0.552138  1.193685
41WfdGgzIu  0.632413  0.728245  2.119639 -0.777506
KyIx2TK1VI -1.122135  0.329781       NaN -0.205368
e3a4PS5v7I -0.871674  1.059267  2.825929       NaN
5spuEC8EyY -0.013621  1.069023  1.057529 -0.063874
9VD9a53o74 -0.619559  1.440201  0.254996  0.355991
MgYGRNRhSW -0.719056  0.161600 -2.143606 -1.898759
mfdOE8KlLt -0.309020  0.049107  0.289401  2.033655
RHO9UFLqp8  1.307798  0.193412       NaN  1.407008
qrYcSwhggM -1.538358  1.504896 -0.071199 -0.348272
vuwXj43xJZ       NaN  0.740426  0.156312  1.405110
HmCLyu0gay -1.371543       NaN       NaN -1.173484
bwWhkISrz0  1.757823 -0.751076  1.480796 -0.142391
epsoAg0kQ8 -0.020010 -1.651050 -0.951079  1.455640
UJ0QAhUHRh       NaN  1.401063  0.181608 -0.375468
iYP7NFcbGQ  0.119397 -1.528577  2.022177 -0.370158
Ux3wTsUCl5  1.051203 -0.595926  1.248444       NaN

[30 rows x 4 columns]
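
A note on the flat-position arithmetic, as a sketch with made-up coordinates: `i * ncols + j` is the row-major position (the order of `to_numpy().ravel()`), whereas `df.unstack()` on a plain frame lays the values out column by column, so the matching position there would be `j * nrows + i`. The example below also assigns into a separate Series and reshapes it back, on the assumption that writing through `df.unstack()` should not be relied on to modify `df` itself.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((4, 3)), columns=list("ABC"))
nrows, ncols = df.shape

# Hypothetical coordinates: (0, 1), (2, 1), (3, 2).
rows = np.array([0, 2, 3])
cols = np.array([1, 1, 2])

row_major = rows * ncols + cols   # positions in df.to_numpy().ravel()
col_major = cols * nrows + rows   # positions in df.unstack()

# Work on an explicit copy of the column-major Series.
flat = df.unstack().copy()
flat.iloc[col_major] = np.nan

# Move the column level back out to columns to recover the original shape.
result = flat.unstack(level=0)

# Boolean mask of the cells that were set.
mask = result.isna().to_numpy()
print(result)
```

Reading the mask back in row-major order gives the same positions as `row_major`, which is a quick way to check that the two conventions agree on which cells were hit.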
