HDF5.remove with where removing all rows #17567
Comments
So you would have to show a reproducible example; the following works.
I made a slight change to your test code, and it now reproduces the problem.
In [1]:
df = pd.DataFrame({'A': np.arange(1000), 'B': pd.date_range('20130101',periods=1000)}).set_index('B')
select = list(df.reset_index().B.sample(100))
store = pd.HDFStore('test.h5')
store.put('df', df, format='table')
assert len(store.select('df')) == 1000
print('select len %d' % len(store.select('df', where='index in %s' % select)))
store.remove('df', where='index in %s' % select)
print('after remove %d' % len(store.select('df')))
assert len(store.select('df')) == 900
select len 100
after remove 0
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-136-cf9062a4d575> in <module>()
7 store.remove('df', where='index in %s' % select)
8 print('after remove %d' % len(store.select('df')))
----> 9 assert len(store.select('df')) == 900
AssertionError:
In [2]:
store.select('df')
Out[2]:
A
B
OK, I would appreciate it if you could debug this. There are not many test cases for this, so we can certainly add this one and fix it.
Tangentially related to #15937. Swapping the commented lines here will return an empty dataframe.
In [2]:
df = pd.DataFrame({'A': np.arange(1000),
'B': pd.date_range('20130101', periods=1000)}
).set_index('B')
select = list(df.reset_index().B.sample(32))
store = pd.HDFStore('temp.h5')
store.put('df', df, format='table')
where = df.index.isin(select)
store.select('df', where=where)
#store.remove('df', where='index in %s' % select)
store.remove('df', where=where)
Out[2]:
32
In [3]:
len(store.select('df'))
Out[3]:
968
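Based on the session above, one possible workaround is to resolve the condition to a boolean mask first and pass that mask to `remove`, rather than the `'index in [...]'` string. The following is only a minimal sketch, not a fix for the underlying bug. `remove_by_index` is a hypothetical helper name, and the sketch assumes the table was written with `format='table'` and that PyTables is installed.

```python
import pandas as pd


def remove_by_index(store, key, values):
    """Remove the rows of `key` whose index is in `values`; return the count.

    Hypothetical helper: it builds a boolean mask from the stored index and
    passes it to HDFStore.remove, mirroring the mask-based call that removed
    only the selected rows in the session above.
    """
    # Read only the index column of the table, not the whole frame.
    index = store.select_column(key, 'index')
    mask = index.isin(values).to_numpy()
    if not mask.any():
        # Nothing matches, so leave the table untouched.
        return 0
    return store.remove(key, where=mask)
```

With the `store` and `select` from the session above, `remove_by_index(store, 'df', select)` would be called in place of `store.remove('df', where='index in %s' % select)`.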
When I use an 'in' condition in HDFStore.remove, it removes all data in the table instead of only the selected rows.
Problem description
The "where" condition in HDFStore.select returns the 3353 selected rows. I expected the same "where" condition in HDFStore.remove to remove those same 3353 rows, but it removes 6710 rows, which is every row in the table.
I tested with a single value in the "index in [list]" where condition. It removes one row, which is what I expected.
I also tested an equivalent condition operator such as "=", and it behaves the same.
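The expectation can be stated as an invariant: the rows matched by a condition plus the rows left after removing them should add up to the original row count. A small in-memory sketch of that invariant, using the synthetic frame from the comments rather than the original 6710-row table (the `random_state` is an assumption added only to make the sample repeatable):

```python
import numpy as np
import pandas as pd

# Synthetic frame taken from the reproduction in the comments.
df = pd.DataFrame({'A': np.arange(1000),
                   'B': pd.date_range('20130101', periods=1000)}).set_index('B')

# Pick 100 distinct index values, as the reproduction does.
select = list(df.index.to_series().sample(100, random_state=0))

matched = df[df.index.isin(select)]      # what select(where=...) should return
remaining = df[~df.index.isin(select)]   # what should be left after remove(where=...)

print(len(matched), len(remaining))
```

A correct `HDFStore.remove` with the same condition should leave a table whose length equals `len(remaining)`.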
Output of pd.show_versions()
'0.20.3'