In light of recent discussions, it looks like we might be on different pages about what we'd like to achieve.
I'd like to share what I consider the goals to be, so I can refer to them more easily in later discussions.
Zero-cost abstraction
If I write something using the Standard, it should not have been possible to write it more efficiently using the underlying library directly
Minimal
This was already stated by Areg here, but in practice I don't think it's being followed.
In particular: if some feature has been explicitly rejected by a participating library, then the onus is on the Consortium to articulate the need for such a feature, rather than on the library to defend its decision to not have it.
Some subset should be independent of execution details
The current goal seems to be that everything should be independent of execution details. To be honest, I think we need to choose between:
the Standard being useful
everything being execution-detail-independent
Not really sure we can have both. My general suggestion is that some core part of the Standard be marked as "execution-independent", but that we consider also having some more flexible methods on top of that (like GroupBy.__iter__).
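To make the suggestion concrete, here is a rough sketch of how a core/extension split could look. Every name in it (`CoreGroupBy`, `IterableGroupBy`, `EagerGroupBy`, `LazyGroupBy`) is hypothetical and made up for illustration; none of this is in the Standard or in any participating library. The point is only that an implementation can satisfy the core without providing `__iter__`, and that consumers can check for the extension explicitly.

```python
from typing import Any, Dict, Iterator, List, Protocol, Tuple, runtime_checkable


class CoreGroupBy(Protocol):
    """Hypothetical execution-independent core: whole-group reductions only."""

    def sum(self) -> Dict[Any, float]: ...


@runtime_checkable
class IterableGroupBy(Protocol):
    """Hypothetical optional extension: needs materialised groups."""

    def __iter__(self) -> Iterator[Tuple[Any, List[float]]]: ...


class EagerGroupBy:
    """Toy eager implementation: groups are materialised up front."""

    def __init__(self, keys: List[Any], values: List[float]) -> None:
        self._groups: Dict[Any, List[float]] = {}
        for key, value in zip(keys, values):
            self._groups.setdefault(key, []).append(value)

    def sum(self) -> Dict[Any, float]:
        return {key: sum(vals) for key, vals in self._groups.items()}

    def __iter__(self) -> Iterator[Tuple[Any, List[float]]]:
        return iter(self._groups.items())


class LazyGroupBy:
    """Toy stand-in for a lazy/SQL-like backend: core only, no __iter__."""

    def __init__(self, keys: List[Any], values: List[float]) -> None:
        self._keys, self._values = keys, values

    def sum(self) -> Dict[Any, float]:
        out: Dict[Any, float] = {}
        for key, value in zip(self._keys, self._values):
            out[key] = out.get(key, 0.0) + value
        return out


def supports_iteration(group_by: object) -> bool:
    # Consumers opt in to the flexible layer by checking for it explicitly.
    return isinstance(group_by, IterableGroupBy)
```

With this split, code that only calls `sum()` works against both toy backends, while code that iterates over groups has to check `supports_iteration` first and so knowingly steps outside the execution-independent core.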
Self-documenting
The API itself should make it clear what's allowed and what's not. If
isn't supported by some implementation (e.g. dataframe as sql frontend), then this suggests that the API should be changed
Familiar
If we don't want to copy pandas (currently the most used dataframe library), then we should look at what the rest of the ecosystem is doing before inventing something completely different