DOC: Reference level name as Term of HDFStore.select query (#28791) #28793

Merged
merged 5 commits into from
Oct 16, 2019
34 changes: 33 additions & 1 deletion doc/source/user_guide/io.rst
@@ -3811,6 +3811,8 @@ storing/selecting from homogeneous index ``DataFrames``.
   # the levels are automatically included as data columns
   store.select('df_mi', 'foo=bar')

.. note::
   The ``index`` keyword is reserved and cannot be used as a level name.

.. _io.hdf5-query:

@@ -3829,6 +3831,7 @@ A query is specified using the ``Term`` class under the hood, as a boolean expre

* ``index`` and ``columns`` are supported indexers of ``DataFrames``.
* if ``data_columns`` are specified, these can be used as additional indexers.
* level names of a ``MultiIndex``, which default to ``level_0``, ``level_1``, … if not provided.

Valid comparison operators are:

@@ -3947,7 +3950,7 @@ space. These are in terms of the total number of rows in a table.

.. _io.hdf5-timedelta:

Using timedelta64[ns]
Query timedelta64[ns]
+++++++++++++++++++++

You can store and query using the ``timedelta64[ns]`` type. Terms can be
@@ -3966,6 +3969,35 @@ specified in the format: ``<float>(<unit>)``, where float may be signed (and fra
   store.append('dftd', dftd, data_columns=True)
   store.select('dftd', "C<'-3.5D'")

Query MultiIndex
++++++++++++++++

To select from a ``MultiIndex``, reference each level by its name in the query.

.. ipython:: python

   df_mi.index.names
   store.select('df_mi', "foo=baz and bar=two")

If the ``MultiIndex`` level names are ``None``, the levels are automatically made available via
the ``level_n`` keyword, where ``n`` is the position of the ``MultiIndex`` level you want to select from.

.. ipython:: python

   index = pd.MultiIndex(
       levels=[["foo", "bar", "baz", "qux"], ["one", "two", "three"]],
       codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
   )
   df_mi_2 = pd.DataFrame(np.random.randn(10, 3),
                          index=index, columns=["A", "B", "C"])
   df_mi_2

   store.append("df_mi_2", df_mi_2)

   # the levels are automatically included as data columns with keyword level_n
   store.select("df_mi_2", "level_0=foo and level_1=two")

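The ``level_n`` names follow the same convention that ``DataFrame.reset_index`` applies to unnamed levels. The sketch below is an in-memory illustration only, not the ``HDFStore`` code path: it assumes the stored table behaves like the flattened frame, and the ``flat`` and ``selected`` names are hypothetical. It shows which columns the unnamed levels turn into and which rows the query ``"level_0=foo and level_1=two"`` corresponds to.

```python
import numpy as np
import pandas as pd

# Same unnamed two-level index as in the example above.
index = pd.MultiIndex(
    levels=[["foo", "bar", "baz", "qux"], ["one", "two", "three"]],
    codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
)
df_mi_2 = pd.DataFrame(
    np.random.randn(10, 3), index=index, columns=["A", "B", "C"]
)

# Both level names are None, so there is no user-given name to query by.
print(list(df_mi_2.index.names))

# reset_index() turns unnamed levels into columns level_0, level_1, ...
flat = df_mi_2.reset_index()
print(list(flat.columns))

# In-memory analogue (an assumption, not the PyTables query engine) of
# the query "level_0=foo and level_1=two".
selected = flat[(flat["level_0"] == "foo") & (flat["level_1"] == "two")]
print(selected)
```

Naming the levels up front, for example with ``index.set_names``, avoids relying on the positional ``level_n`` defaults; remember that ``index`` itself is reserved and cannot be used as a level name.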

Indexing
++++++++

1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.0.0.rst
@@ -160,6 +160,7 @@ Documentation Improvements
^^^^^^^^^^^^^^^^^^^^^^^^^^

- Added new section on :ref:`scale` (:issue:`28315`).
- Added sub-section Query MultiIndex to the IO tools user guide (:issue:`28791`).

.. _whatsnew_1000.deprecations:
