Update goals of project? #244

MarcoGorelli · 2023-08-29T10:33:52Z

In light of recent discussions, it looks like we might be on different pages about what we'd like to achieve.

I'd like to lay out what I'd like us to consider as goals, so I can more easily refer to them in later discussions.

Zero-cost abstraction

If I write something using the Standard, it should not be possible to write it more efficiently by using the underlying library directly.

Minimal

This was already stated by Areg here, but in practice I don't think it's being followed.
In particular: if some feature has been explicitly rejected by a participating library, then the onus is on the Consortium to articulate the need for such a feature, rather than on the library to defend its decision to not have it.

Some subset should be independent of execution details

The current goal seems to be that everything should be independent of execution details. To be honest, I think we need to choose between:

the Standard being useful
everything being execution-detail-independent

Not really sure we can have both. My general suggestion is that some core part of the Standard be marked as "execution-independent", but that we consider also having some more flexible methods on top of that (like GroupBy.__iter__).
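To make the split concrete, here is a rough sketch of what I have in mind. The names `CoreGroupBy` and `EagerGroupBy` are made up for illustration and are not part of the Standard; the toy implementation over a list of dicts only shows the layering, with aggregation in the execution-independent core and `__iter__` only in an eager extension.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterator


class CoreGroupBy:
    """Hypothetical execution-independent core: aggregation only."""

    def __init__(self, rows: list[dict], key: str) -> None:
        self._rows = rows
        self._key = key

    def sum(self, column: str) -> dict:
        # An aggregation can be expressed without exposing group contents,
        # so a lazy or SQL-backed implementation could support it too.
        totals: dict = defaultdict(int)
        for row in self._rows:
            totals[row[self._key]] += row[column]
        return dict(totals)


class EagerGroupBy(CoreGroupBy):
    """Hypothetical eager-only extension: adds iteration over groups."""

    def __iter__(self) -> Iterator[tuple[object, list[dict]]]:
        # Iterating requires the groups to be materialised in memory,
        # which is why it lives outside the core.
        groups: dict = defaultdict(list)
        for row in self._rows:
            groups[row[self._key]].append(row)
        return iter(groups.items())


rows = [{"k": "x", "v": 1}, {"k": "y", "v": 2}, {"k": "x", "v": 3}]
core_totals = CoreGroupBy(rows, "k").sum("v")
eager_groups = dict(EagerGroupBy(rows, "k"))
```

Code written against `CoreGroupBy` would work everywhere, while code that needs `for key, group in ...` has to ask for the eager variant explicitly.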

Self-documenting

The API itself should make it clear what's allowed and what's not. If

mask = df1.get_column_by_name('a') + df2.get_column_by_name('a')
df.get_rows_by_mask(mask)

isn't supported by some implementation (e.g. a dataframe acting as a SQL frontend), then this suggests that the API should be changed.
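As one possible illustration (a sketch only, not a proposal for the actual Standard), the API could make the restriction explicit by having each column remember the dataframe it came from. `Frame`, `Column`, `owner`, and `to_dict` below are hypothetical names; only `get_column_by_name` and `get_rows_by_mask` come from the example above.

```python
from __future__ import annotations

import operator
from typing import Callable


class Column:
    """Hypothetical column that remembers which Frame it came from."""

    def __init__(self, values: list, owner: Frame) -> None:
        self.values = values
        self.owner = owner

    def _binary(self, other: object, op: Callable) -> Column:
        if isinstance(other, Column):
            # Combining columns of different frames is rejected up front.
            if other.owner is not self.owner:
                raise ValueError("cannot combine columns from different dataframes")
            result = [op(a, b) for a, b in zip(self.values, other.values)]
        else:
            result = [op(a, other) for a in self.values]
        return Column(result, self.owner)

    def __add__(self, other: object) -> Column:
        return self._binary(other, operator.add)

    def __gt__(self, other: object) -> Column:
        return self._binary(other, operator.gt)


class Frame:
    """Hypothetical toy dataframe: a dict of equal-length lists."""

    def __init__(self, data: dict[str, list]) -> None:
        self._data = data

    def get_column_by_name(self, name: str) -> Column:
        return Column(self._data[name], self)

    def get_rows_by_mask(self, mask: Column) -> Frame:
        # The mask must be derived from this frame's own columns.
        if mask.owner is not self:
            raise ValueError("mask must be derived from this dataframe")
        return Frame(
            {
                name: [v for v, keep in zip(vals, mask.values) if keep]
                for name, vals in self._data.items()
            }
        )

    def to_dict(self) -> dict[str, list]:
        return {name: list(vals) for name, vals in self._data.items()}


df1 = Frame({"a": [1, 5, 3]})
df2 = Frame({"a": [1, 1, 1]})
filtered = df1.get_rows_by_mask(df1.get_column_by_name("a") > 2)
```

With this shape, `df1.get_column_by_name('a') + df2.get_column_by_name('a')` fails immediately with a clear error instead of depending on what a given backend happens to support. A static-typing approach might be preferable; the runtime check is just the shortest way to show the idea.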

Familiar

If we don't want to copy pandas (currently the most-used dataframe library), then we should look at what the rest of the ecosystem is doing before inventing something completely different.


MarcoGorelli added the API design label Aug 29, 2023

MarcoGorelli mentioned this issue Aug 29, 2023

Add DataFrame.insert_columns #231

Closed

MarcoGorelli mentioned this issue Sep 6, 2023

Separate eager and lazy APIs #249

Closed

rmorshea mentioned this issue Sep 21, 2023

Support for polars unionai-oss/pandera#1064

Closed

MarcoGorelli closed this as completed Mar 20, 2024
