Read hdf returns unexpected values for categorical #39420

Merged
33 commits
bb2d803
BUG: fix case of a category value which isn't exists (#39189)
nofarmish Jan 22, 2021
f9be625
BUG: add UT to conver_value for this use case (#39189)
nofarmish Jan 23, 2021
aa90441
BUG: change style with pre-commit (#39189)
nofarmish Jan 23, 2021
e8ca3fc
BUG: add a whatsnew record (#39189)
nofarmish Jan 23, 2021
b5ded49
Trigger Build
nofarmish Jan 23, 2021
0cb8ad7
BUG: check for tests (#39189)
nofarmish Jan 23, 2021
8284e0b
BUG: remove spaces (#39189)
nofarmish Jan 23, 2021
9773aaa
BUG: remove whatsnew (#39189)
nofarmish Jan 23, 2021
4281ef0
BUG: remove tests(#39189)
nofarmish Jan 23, 2021
7178757
BUG: add whats new (#39189)
nofarmish Jan 23, 2021
ca9420e
BUG: check tests (#39189)
nofarmish Jan 23, 2021
f61b7c5
BUG: update tests (#39189)
nofarmish Jan 26, 2021
8c3b3b6
BUG: update after precommit (#39189)
nofarmish Jan 26, 2021
4e3bce2
BUG: update after precommit (#39189)
nofarmish Jan 26, 2021
558a585
BUG: fix case of a category value which isn't exists (#39189)
nofarmish Jan 22, 2021
adfe600
BUG: add UT to conver_value for this use case (#39189)
nofarmish Jan 23, 2021
63815c7
BUG: change style with pre-commit (#39189)
nofarmish Jan 23, 2021
74c687a
BUG: add a whatsnew record (#39189)
nofarmish Jan 23, 2021
3023fc0
BUG: check for tests (#39189)
nofarmish Jan 23, 2021
f917ba9
BUG: remove spaces (#39189)
nofarmish Jan 23, 2021
0abe192
BUG: remove whatsnew (#39189)
nofarmish Jan 23, 2021
1b959ee
BUG: remove tests(#39189)
nofarmish Jan 23, 2021
4de349f
BUG: add whats new (#39189)
nofarmish Jan 23, 2021
d7a3ef6
BUG: check tests (#39189)
nofarmish Jan 23, 2021
eb8cd5a
BUG: update tests (#39189)
nofarmish Jan 26, 2021
235d05e
BUG: update after precommit (#39189)
nofarmish Jan 26, 2021
73541ff
BUG: update after precommit (#39189)
nofarmish Jan 26, 2021
877ae9e
Merge remote-tracking branch 'origin/read-hdf-returns-unexpected-valu…
nofarmish Jan 26, 2021
017e47a
BUG: change test location (#39189)
nofarmish Jan 30, 2021
d67ff95
BUG: remove import (#39189)
nofarmish Jan 30, 2021
37eef60
BUG: remove import (#39189)
nofarmish Jan 30, 2021
b3565af
BUG: remove list() before sorted() (#39189)
nofarmish Jan 30, 2021
5af7c04
BUG: remove list() in sorted() (#39189)
nofarmish Jan 30, 2021
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.3.0.rst
@@ -330,6 +330,7 @@ I/O
 - Bug in :func:`read_csv` not switching ``true_values`` and ``false_values`` for nullable ``boolean`` dtype (:issue:`34655`)
 - Bug in :func:`read_json` when ``orient="split"`` does not maintain numeric string index (:issue:`28556`)
 - :meth:`read_sql` returned an empty generator if ``chunksize`` was no-zero and the query returned no results. Now returns a generator with a single empty dataframe (:issue:`34411`)
+- Bug in :func:`read_hdf` returning unexpected records when filtering on categorical string columns using ``where`` parameter (:issue:`39189`)
 
 Period
 ^^^^^^
8 changes: 3 additions & 5 deletions pandas/core/computation/pytables.py
@@ -210,12 +210,10 @@ def stringify(value):
             return TermValue(int(v), v, kind)
         elif meta == "category":
             metadata = extract_array(self.metadata, extract_numpy=True)
-            result = metadata.searchsorted(v, side="left")
-
-            # result returns 0 if v is first element or if v is not in metadata
-            # check that metadata contains v
-            if not result and v not in metadata:
+            if v not in metadata:
                 result = -1
+            else:
+                result = metadata.searchsorted(v, side="left")
             return TermValue(result, result, "integer")
         elif kind == "integer":
             v = int(float(v))
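
The change above matters because `searchsorted` returns an insertion point, not a "found" index, so a missing value only lands on 0 when it sorts before every category. The following is a minimal sketch in plain NumPy, not the pandas internals: `category_code` and `categories` are hypothetical names introduced for illustration, and the category array is assumed to be sorted.

```python
import numpy as np


def category_code(categories: np.ndarray, value: str) -> int:
    """Return the position of `value` in the sorted `categories`, or -1 if absent.

    Hypothetical helper mirroring the order of checks in the patched branch:
    membership first, searchsorted only for values known to be present.
    """
    if value not in categories:
        return -1
    return int(categories.searchsorted(value, side="left"))


categories = np.array(["a", "b", "s"])

# "q" is not a category, yet searchsorted reports a non-zero insertion point,
# so a check that only looks at the zero result would miss it.
insertion_point = int(categories.searchsorted("q", side="left"))

missing = category_code(categories, "q")
first = category_code(categories, "a")
last = category_code(categories, "s")
```

Here `insertion_point` coincides with the position of `"s"`, which is how a filter on a non-existent category could end up matching rows of a different one under the old condition.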
22 changes: 22 additions & 0 deletions pandas/tests/io/pytables/test_categorical.py
@@ -184,3 +184,25 @@ def test_categorical_nan_only_columns(setup_path):
         df.to_hdf(path, "df", format="table", data_columns=True)
         result = read_hdf(path, "df")
         tm.assert_frame_equal(result, expected)
+
+
+@pytest.mark.parametrize(
+    "where, df, expected",
+    [
+        ('col=="q"', DataFrame({"col": ["a", "b", "s"]}), DataFrame({"col": []})),
+        ('col=="a"', DataFrame({"col": ["a", "b", "s"]}), DataFrame({"col": ["a"]})),
+    ],
+)
+def test_convert_value(setup_path, where: str, df: DataFrame, expected: DataFrame):
+    # GH39420
+    # Check that read_hdf with categorical columns can filter by where condition.
+    df.col = df.col.astype("category")
+    max_widths = {"col": 1}
+    categorical_values = sorted(df.col.unique())
+    expected.col = expected.col.astype("category")
+    expected.col.cat.set_categories(categorical_values, inplace=True)
+
+    with ensure_clean_path(setup_path) as path:
+        df.to_hdf(path, "df", format="table", min_itemsize=max_widths)
+        result = read_hdf(path, where=where)
+        tm.assert_frame_equal(result, expected)
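
The test builds an expected frame whose categorical column keeps the full category list even when no rows match the filter. As a rough sketch of the same construction outside the test suite, the snippet below uses `pd.Categorical` directly instead of `set_categories(..., inplace=True)`, since the `inplace` argument was later deprecated and removed from pandas. `expected_frame` is a hypothetical helper, not part of this PR.

```python
import pandas as pd


def expected_frame(values, categories):
    """Build a one-column frame whose "col" is categorical with fixed categories.

    Hypothetical helper: the categories are sorted to match the order the
    test derives with sorted(df.col.unique()).
    """
    col = pd.Categorical(values, categories=sorted(categories))
    return pd.DataFrame({"col": col})


all_categories = ["a", "b", "s"]

# No matching rows (the 'col=="q"' case): empty, but categories are preserved.
empty = expected_frame([], all_categories)

# One matching row (the 'col=="a"' case).
one = expected_frame(["a"], all_categories)
```

Keeping the categories explicit is what lets a frame comparison distinguish "no rows, same categorical dtype" from "no rows, different dtype".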