
Update goals of project? #244


Closed
MarcoGorelli opened this issue Aug 29, 2023 · 0 comments

Contributor

MarcoGorelli commented Aug 29, 2023

In light of recent discussions, it looks like we might not be on the same page about what we'd like to achieve.

I'd like to share what I consider to be the goals, so I can refer to them more easily in later discussions.

Zero-cost abstraction

If I write something using the Standard, it should not have been possible to write it more efficiently using the underlying library directly

Minimal

This was already stated by Areg here, but in practice I don't think it's being followed.
In particular: if some feature has been explicitly rejected by a participating library, then the onus is on the Consortium to articulate the need for such a feature, rather than on the library to defend its decision to not have it.

Some subset should be independent of execution details

The current goal seems to be that everything should be independent of execution details. To be honest, I think we need to choose between:

  • the Standard being useful
  • everything being execution-detail-independent

Not really sure we can have both. My general suggestion is that some core part of the Standard be marked as "execution-independent", but that we consider also having some more flexible methods on top of that (like GroupBy.__iter__).
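To make the split concrete, here is a toy sketch of what a marked "execution-independent" core next to a more flexible method could look like. Everything in it is hypothetical and for illustration only: the `execution_independent` marker, the in-memory `GroupBy` class and its `sum` method are not taken from the Standard or from any library. Only the idea of `GroupBy.__iter__` as a flexible extra comes from the paragraph above.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, List, Tuple

Row = Dict[str, Any]


def execution_independent(method):
    """Hypothetical marker: tag a method as part of the execution-independent core."""
    method.execution_independent = True
    return method


class GroupBy:
    """Toy eager group-by over a list of dict rows (illustration only)."""

    def __init__(self, rows: List[Row], key: str) -> None:
        self._groups: Dict[Any, List[Row]] = defaultdict(list)
        for row in rows:
            self._groups[row[key]].append(row)

    @execution_independent
    def sum(self, column: str) -> Dict[Any, Any]:
        # An aggregation can be described declaratively, so a lazy or
        # SQL-backed implementation could run it without materializing rows.
        return {k: sum(r[column] for r in rows) for k, rows in self._groups.items()}

    def __iter__(self) -> Iterator[Tuple[Any, List[Row]]]:
        # Iterating over groups needs the concrete group keys, so it only
        # makes sense once the data has actually been computed.
        return iter(self._groups.items())


rows = [{"k": "x", "v": 1}, {"k": "y", "v": 2}, {"k": "x", "v": 3}]
groups = GroupBy(rows, "k")
totals = groups.sum("v")
keys = [key for key, _ in groups]
```

In this sketch a consumer that must work across all backends would restrict itself to methods carrying the marker, while code that knows it is running eagerly could also iterate over the groups.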

Self-documenting

The API itself should make it clear what's allowed and what's not. If

mask = df1.get_column_by_name('a') + df2.get_column_by_name('a')
df.get_rows_by_mask(mask)

isn't supported by some implementation (e.g. a dataframe used as a SQL frontend), then that suggests the API should be changed.
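As one possible illustration of an API shape where that situation cannot be written down at all, the sketch below uses column expressions that are not bound to any dataframe, so a filter is always evaluated against the frame it is called on. This is only an assumption about how such a design could look: `Expr`, `col`, `Frame` and `filter` are hypothetical names, not part of the Standard.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


class Expr:
    """Hypothetical column expression that is not bound to any dataframe."""

    def __init__(self, evaluate: Callable[[Row], Any]) -> None:
        self.evaluate = evaluate

    def __add__(self, other: "Expr") -> "Expr":
        # Combine two expressions; both are resolved against the same row later.
        return Expr(lambda row: self.evaluate(row) + other.evaluate(row))

    def __gt__(self, threshold: Any) -> "Expr":
        # Build a boolean predicate comparing the expression to a scalar.
        return Expr(lambda row: self.evaluate(row) > threshold)


def col(name: str) -> Expr:
    """Refer to a column by name without naming a dataframe."""
    return Expr(lambda row: row[name])


class Frame:
    """Toy in-memory frame backed by a list of dict rows (illustration only)."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self.rows: List[Row] = list(rows)

    def filter(self, predicate: Expr) -> "Frame":
        # The predicate is always evaluated against this frame's own rows,
        # so there is no way to pass in a mask built from another frame.
        return Frame(row for row in self.rows if predicate.evaluate(row))


df = Frame([{"a": 1, "b": 1}, {"a": 5, "b": 2}])
filtered = df.filter((col("a") + col("b")) > 3)
```

Because `col("a")` carries no reference to a particular frame, the unsupported cross-dataframe case is ruled out by the shape of the API rather than by a runtime error or a note in the documentation.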

Familiar

If we don't want to copy pandas (currently the most used dataframe library), then we should look at what the rest of the ecosystem is doing before inventing something completely different
