
API: Indexing #89


Closed
jbrockmendel opened this issue Oct 18, 2022 · 1 comment

Comments

@jbrockmendel
Contributor

Summarizing the discussion from the call 2022-10-13, the agreed-upon API looks like:

```python
def get_column_by_name(self, key: str) -> Column: ...
def get_columns_by_name(self, keys: Sequence[str]) -> DataFrame: ...
def get_rows(self, indices: Punting[int]) -> DataFrame: ...
def slice_rows(self, start: int | None, stop: int | None, step: int | None) -> DataFrame: ...
def get_rows_by_mask(self, mask: Punting[bool]) -> DataFrame: ...
```

Punting[x] indicates that we discussed several options and decided to put off making a decision. For get_rows the options were Sequence[int] and Column[int]. For get_rows_by_mask the options were Arraylike[bool] and Column[bool] (here Arraylike probably denotes an object adhering to the array API standard, but some participants wanted to be stricter than that).
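To make the agreed semantics concrete, here is a minimal pure-Python sketch. It assumes a hypothetical `ToyDataFrame` that stores each column as a plain list, and it uses plain Python sequences for both punted parameters: `Sequence[int]` was one of the options discussed for `get_rows`, while for the mask a `Sequence[bool]` is only a simple stand-in for `Arraylike[bool]` or `Column[bool]`. Nothing here is taken from an actual implementation of the standard.

```python
from __future__ import annotations

from typing import Dict, List, Sequence

# Hypothetical stand-in for a column: a plain Python list.
Column = List[object]


class ToyDataFrame:
    """Hypothetical column store, used only to illustrate the agreed methods."""

    def __init__(self, data: Dict[str, List[object]]) -> None:
        lengths = {len(values) for values in data.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self._data = {name: list(values) for name, values in data.items()}

    def __len__(self) -> int:
        # Number of rows (0 if there are no columns).
        return len(next(iter(self._data.values()), []))

    def column_names(self) -> List[str]:
        return list(self._data)

    def get_column_by_name(self, key: str) -> Column:
        # Raises KeyError if the column is missing.
        return list(self._data[key])

    def get_columns_by_name(self, keys: Sequence[str]) -> ToyDataFrame:
        return ToyDataFrame({key: self._data[key] for key in keys})

    def get_rows(self, indices: Sequence[int]) -> ToyDataFrame:
        # Purely positional selection; repeats and reordering are allowed.
        return ToyDataFrame(
            {name: [values[i] for i in indices] for name, values in self._data.items()}
        )

    def slice_rows(
        self, start: int | None, stop: int | None, step: int | None
    ) -> ToyDataFrame:
        # Same semantics as Python's built-in slice object.
        s = slice(start, stop, step)
        return ToyDataFrame({name: values[s] for name, values in self._data.items()})

    def get_rows_by_mask(self, mask: Sequence[bool]) -> ToyDataFrame:
        if len(mask) != len(self):
            raise ValueError("mask length must equal the number of rows")
        return ToyDataFrame(
            {
                name: [v for v, keep in zip(values, mask) if keep]
                for name, values in self._data.items()
            }
        )


df = ToyDataFrame({"a": [1, 2, 3, 4], "b": ["w", "x", "y", "z"]})
print(df.get_rows([3, 0]).get_column_by_name("b"))  # ['z', 'w']
print(df.slice_rows(None, None, 2).get_column_by_name("a"))  # [1, 3]
print(df.get_rows_by_mask([True, False, True, False]).get_column_by_name("b"))  # ['w', 'y']
```

Note that every row selector here is positional; none of the label-based behaviour of pandas indexing carries over.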

The following indexing methods from pandas.DataFrame are not included:

__getitem__
take
loc
iloc
at
iat
xs
@rgommers
Member

The above-listed APIs for indexing were implemented:

```python
from __future__ import annotations

from typing import Sequence


class DataFrame:
    # Indexing methods of the DataFrame class, as implemented.

    def get_column_by_name(self, name: str, /) -> Column:
        """
        Select a column by name.

        Parameters
        ----------
        name : str

        Returns
        -------
        Column

        Raises
        ------
        KeyError
            If the key is not present.
        """
        ...

    def get_columns_by_name(self, names: Sequence[str], /) -> DataFrame:
        """
        Select multiple columns by name.

        Parameters
        ----------
        names : Sequence[str]

        Returns
        -------
        DataFrame

        Raises
        ------
        KeyError
            If any requested key is not present.
        """
        ...

    def get_rows(self, indices: "Column[int]") -> DataFrame:
        """
        Select a subset of rows, similar to `ndarray.take`.

        Parameters
        ----------
        indices : Column[int]
            Positions of rows to select.

        Returns
        -------
        DataFrame
        """
        ...

    def slice_rows(
        self, start: int | None, stop: int | None, step: int | None
    ) -> DataFrame:
        """
        Select a subset of rows corresponding to a slice.

        Parameters
        ----------
        start : int or None
        stop : int or None
        step : int or None

        Returns
        -------
        DataFrame
        """
        ...

    def get_rows_by_mask(self, mask: "Column[bool]") -> DataFrame:
        """
        Select a subset of rows corresponding to a mask.

        Parameters
        ----------
        mask : Column[bool]

        Returns
        -------
        DataFrame

        Notes
        -----
        Some participants preferred a weaker type Arraylike[bool] for mask,
        where 'Arraylike' denotes an object adhering to the Array API standard.
        """
        ...
```
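One way to read the "similar to `ndarray.take`" note is that `get_rows` is the basic gather operation, and both `slice_rows` and `get_rows_by_mask` can be expressed through it. The sketch below assumes hypothetical minimal `Column` and `DataFrame` classes (not the standard's actual objects, and `Column.take`, `to_list` and `num_rows` are invented helpers). It follows the implemented signatures, including the positional-only `name`/`names` parameters and `KeyError` for missing names, and needs Python 3.8+ for the `/` syntax.

```python
from __future__ import annotations

from typing import Dict, Generic, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


class Column(Generic[T]):
    """Hypothetical minimal column: a sequence of values."""

    def __init__(self, values: Iterable[T]) -> None:
        self._values = list(values)

    def __len__(self) -> int:
        return len(self._values)

    def __iter__(self) -> Iterator[T]:
        return iter(self._values)

    def take(self, indices: Column[int]) -> Column[T]:
        # Like ndarray.take: gather the values at the given positions.
        return Column(self._values[i] for i in indices)

    def to_list(self) -> List[T]:
        return list(self._values)


class DataFrame:
    """Hypothetical DataFrame in which get_rows is the only row primitive."""

    def __init__(self, columns: Dict[str, Column]) -> None:
        self._columns = dict(columns)

    def num_rows(self) -> int:
        return len(next(iter(self._columns.values()), Column([])))

    def get_column_by_name(self, name: str, /) -> Column:
        return self._columns[name]  # raises KeyError if absent

    def get_columns_by_name(self, names: Sequence[str], /) -> DataFrame:
        return DataFrame({name: self._columns[name] for name in names})

    def get_rows(self, indices: Column[int]) -> DataFrame:
        return DataFrame({n: c.take(indices) for n, c in self._columns.items()})

    def slice_rows(
        self, start: int | None, stop: int | None, step: int | None
    ) -> DataFrame:
        # Resolve the slice to explicit positions, then reuse get_rows.
        positions = range(self.num_rows())[slice(start, stop, step)]
        return self.get_rows(Column(positions))

    def get_rows_by_mask(self, mask: Column[bool]) -> DataFrame:
        if len(mask) != self.num_rows():
            raise ValueError("mask length must equal the number of rows")
        # Convert the boolean mask into the positions where it is True.
        positions = Column(i for i, keep in enumerate(mask) if keep)
        return self.get_rows(positions)


df = DataFrame({"a": Column([10, 20, 30, 40]), "b": Column(["w", "x", "y", "z"])})
print(df.slice_rows(1, None, 2).get_column_by_name("a").to_list())  # [20, 40]
print(df.get_rows_by_mask(Column([False, True, True, False])).get_column_by_name("b").to_list())  # ['x', 'y']
```

Funnelling every row selection through one gather keeps the sketch small; a real library would likely special-case contiguous slices to avoid materializing a list of positions.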

So I'll close this issue.
