How the API is expected to be used #18
Yes. The way I see it, in Vaex I would create a new module that exposes this standard API but calls into Vaex; the same goes for pandas and for Modin. I don't expect the current Vaex API to change to this API (although it may take some inspiration from it).
Yes, I think that's the purpose: new libraries (e.g. this, or a GraphQL API) will use the standard API, not pandas/Vaex/Modin directly.
I think it's fine for pandas to keep its own API and add a new class that exposes the new standard API, rather than putting it in the same class (although that might also be possible). I hope that makes sense.
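A minimal sketch of the separate-module idea, assuming a toy `NativeFrame` class that stands in for a library's existing dataframe, plus hypothetical standard method names (`column_names`, `get_column`, `filter`); none of these names come from Vaex, pandas, or Modin. The standard-facing class only translates calls and leaves the native class untouched:

```python
from typing import Any, Dict, List, Sequence


class NativeFrame:
    """Stand-in for a library's existing dataframe class (hypothetical API)."""

    def __init__(self, data: Dict[str, List[Any]]) -> None:
        self._data = {name: list(values) for name, values in data.items()}

    def columns_list(self) -> List[str]:
        return list(self._data)

    def col(self, name: str) -> List[Any]:
        return self._data[name]

    def take_rows(self, mask: Sequence[bool]) -> "NativeFrame":
        n_rows = len(next(iter(self._data.values()), []))
        if len(mask) != n_rows:
            raise ValueError("mask length must match the number of rows")
        return NativeFrame(
            {name: [v for v, keep in zip(values, mask) if keep]
             for name, values in self._data.items()}
        )


class StandardDataFrame:
    """Separate class exposing the (hypothetical) standard API.

    It holds a NativeFrame and translates each standard call into the
    library's own API; NativeFrame itself is not modified.
    """

    def __init__(self, native: NativeFrame) -> None:
        self._native = native

    @property
    def column_names(self) -> List[str]:
        return self._native.columns_list()

    def get_column(self, name: str) -> List[Any]:
        return list(self._native.col(name))

    def filter(self, mask: Sequence[bool]) -> "StandardDataFrame":
        return StandardDataFrame(self._native.take_rows(mask))


native = NativeFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
std = StandardDataFrame(native)
print(std.filter([True, False, True]).get_column("b"))  # ['x', 'z']
```

Keeping the adapter in its own class means the library's existing users see no change, while code that wants the standard only ever touches `StandardDataFrame`.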
Trying to summarize what was discussed in the call; I'll open a PR against the RFC once there is agreement.
Does this accurately represent what was discussed in the call? Any feedback is welcome.
We now have a standardised way of opting into the standard (I suspect that nobody will change their main API, but will instead expose a wrapper around it that complies with the standard). Closing, then, as I think this is now addressed.
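One way such an opt-in could look, sketched with a hook name, `__standard_dataframe__`, that is an assumption for illustration (modelled on protocol hooks such as the interchange protocol's `__dataframe__`). The library adds one method that returns a standard-compliant wrapper, and consumers ask for that wrapper instead of depending on the main API:

```python
from typing import Any, Dict, List


class StandardView:
    """Minimal standard-compliant wrapper (hypothetical API)."""

    def __init__(self, data: Dict[str, List[Any]]) -> None:
        self._data = data

    @property
    def column_names(self) -> List[str]:
        return list(self._data)


class LibraryFrame:
    """A library's own dataframe; its main API is left as it is."""

    def __init__(self, data: Dict[str, List[Any]]) -> None:
        self.data = data

    def __standard_dataframe__(self) -> StandardView:
        # Hypothetical opt-in hook: the only addition to the main class.
        return StandardView(self.data)


def to_standard(obj: Any) -> Any:
    """Consumer-side helper: ask any object for its standard-compliant view."""
    hook = getattr(obj, "__standard_dataframe__", None)
    if hook is None:
        raise TypeError(f"{type(obj).__name__} does not opt into the standard API")
    return hook()


frame = LibraryFrame({"a": [1, 2], "b": [3, 4]})
print(to_standard(frame).column_names)  # ['a', 'b']
```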
In today's meeting we discussed what the goal of the API is, and who its target users are.
If I understood correctly, @maartenbreddels and @devin-petersohn see the API we're defining here as something they'd like to implement internally in Vaex and Modin, without making it their public API. I'm not sure what pandas' point of view on that is.
I think that's perfectly fine, and it makes sense. But it raises the question of whether it would make sense for those public APIs to be independent wrappers, in the same way Seaborn wraps Matplotlib or HoloViews wraps Bokeh. Let me expand on what I mean.
In the discussions we had, I think people mentioned that they were interested in defining a more "pure" and less "magic" API than the existing ones. I'm not sure that sentence makes a lot of sense, but I guess some of the principles for the API could be:
Personally, I think this API should be great for software developers: developers of libraries like ours, who want to build on top of it, and developers of downstream software. I'd say it should also serve data engineers and people who want to write production code with dataframes.
Then, I understand that some users (e.g. data analysts) prefer more "magic" APIs that automatically handle problems they don't want to care about. As an example, consider the dataframe constructor.
For a data analyst, or anyone who isn't a software engineer, I think it's very reasonable and convenient for the following code to work:
But as a software engineer, I may want a more explicit and less magic syntax, for example:
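As a hypothetical illustration of the two styles, here are two made-up constructors over plain dicts of lists: `magic_frame` guesses the input layout and invents column names, while `explicit_frame` accepts a single layout, requires declared types, and raises instead of guessing. Neither reflects an existing library's constructor:

```python
from typing import Any, Dict, List, Mapping, Sequence


def magic_frame(data: Any) -> Dict[str, List[Any]]:
    """Analyst-friendly: guesses the layout and fills in column names."""
    if isinstance(data, Mapping):               # {"a": [1, 2], ...}
        return {str(k): list(v) for k, v in data.items()}
    rows = list(data)
    if rows and isinstance(rows[0], Mapping):   # [{"a": 1}, {"a": 2}]
        names = list(rows[0])
        return {n: [r.get(n) for r in rows] for n in names}
    # Anything else: treat as rows of positional values and invent names.
    width = len(rows[0]) if rows else 0
    return {f"column_{i}": [r[i] for r in rows] for i in range(width)}


def explicit_frame(columns: Mapping[str, Sequence[Any]],
                   dtypes: Mapping[str, type]) -> Dict[str, List[Any]]:
    """Developer-friendly: one input layout, declared types, no guessing."""
    if set(columns) != set(dtypes):
        raise ValueError("every column needs exactly one declared dtype")
    if len({len(v) for v in columns.values()}) > 1:
        raise ValueError("all columns must have the same length")
    for name, values in columns.items():
        bad = [v for v in values if not isinstance(v, dtypes[name])]
        if bad:
            raise TypeError(
                f"column {name!r} has values not of type "
                f"{dtypes[name].__name__}: {bad}"
            )
    return {name: list(values) for name, values in columns.items()}


print(magic_frame([(1, "x"), (2, "y")]))
# {'column_0': [1, 2], 'column_1': ['x', 'y']}
print(explicit_frame({"id": [1, 2], "name": ["x", "y"]}, {"id": int, "name": str}))
try:
    explicit_frame({"id": [1, "2"]}, {"id": int})
except TypeError as err:
    print(err)  # column 'id' has values not of type int: ['2']
```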
Correct me if I'm wrong, but I think there is broad agreement that the consortium API should focus on the latter style. If Vaex, Modin, pandas, etc. provide this API, compatibility across the ecosystem becomes easy. For example, scikit-learn or Matplotlib can accept a "dataframe" as a parameter and operate on it, because they know it follows the standard API.
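A sketch of why this helps downstream libraries, assuming the standard exposes `column_names` and `get_column` (hypothetical names). The consumer function is written only against a `typing.Protocol`, so two unrelated implementations with different internal storage both work with it unchanged:

```python
from typing import Any, Dict, List, Protocol, Sequence


class StandardFrame(Protocol):
    """The subset of a (hypothetical) standard API this consumer relies on."""

    @property
    def column_names(self) -> List[str]: ...

    def get_column(self, name: str) -> Sequence[Any]: ...


def column_means(df: StandardFrame) -> Dict[str, float]:
    """Downstream code: uses only standard methods, never library internals."""
    means: Dict[str, float] = {}
    for name in df.column_names:
        values = df.get_column(name)
        if values and all(isinstance(v, (int, float)) for v in values):
            means[name] = sum(values) / len(values)
    return means


class DictBackedFrame:
    """Implementation 1: columns stored as a dict of lists."""

    def __init__(self, data: Dict[str, List[Any]]) -> None:
        self._data = data

    @property
    def column_names(self) -> List[str]:
        return list(self._data)

    def get_column(self, name: str) -> List[Any]:
        return self._data[name]


class RowBackedFrame:
    """Implementation 2: stores rows and builds columns on demand."""

    def __init__(self, names: List[str], rows: List[tuple]) -> None:
        self._names = names
        self._rows = rows

    @property
    def column_names(self) -> List[str]:
        return list(self._names)

    def get_column(self, name: str) -> List[Any]:
        i = self._names.index(name)
        return [row[i] for row in self._rows]


print(column_means(DictBackedFrame({"x": [1, 2, 3], "label": ["a", "b", "c"]})))
print(column_means(RowBackedFrame(["x", "label"], [(1, "a"), (2, "b"), (3, "c")])))
# both print {'x': 2.0}
```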
But then, implementations like Modin, Vaex, or pandas may want to keep their existing APIs, or provide a different user API targeted at specific users (e.g. data analysts, who want the library to make guesses that make their lives easier).
Then my question is: does it make sense for this alternative API to live in the implementations? For example, suppose I see pandas as this API on top of NumPy, Vaex as this API on top of memory maps, and Modin as this API on top of Ray (excuse the simplification). Then, if Modin wants to implement an SQLite-like API, could it make sense for that to be an independent project, an SQLite-like API that wraps the standard API, instead of a Modin API? I guess that could make sense.
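The independent-project option can be sketched as a small query layer that calls only standard methods. Everything here is hypothetical: `MinimalStandardFrame` stands in for any compliant implementation, its method names are assumed rather than taken from a spec, and `query` is a SQL-flavoured `SELECT ... WHERE ...` helper, not a real SQLite interface:

```python
from typing import Any, Callable, Dict, List, Optional, Sequence


class MinimalStandardFrame:
    """Stand-in for any implementation of the (hypothetical) standard API."""

    def __init__(self, data: Dict[str, List[Any]]) -> None:
        self._data = data

    @property
    def column_names(self) -> List[str]:
        return list(self._data)

    def get_column(self, name: str) -> List[Any]:
        return list(self._data[name])

    def filter(self, mask: Sequence[bool]) -> "MinimalStandardFrame":
        return MinimalStandardFrame(
            {n: [v for v, keep in zip(vals, mask) if keep]
             for n, vals in self._data.items()}
        )

    def select(self, names: Sequence[str]) -> "MinimalStandardFrame":
        return MinimalStandardFrame({n: list(self._data[n]) for n in names})


# --- separate package: depends only on the standard methods above ---
def query(df: Any, columns: Sequence[str],
          where: Optional[Callable[[Dict[str, Any]], bool]] = None) -> Any:
    """Roughly 'SELECT columns FROM df WHERE where(row)'."""
    if where is not None:
        names = df.column_names
        cols = [df.get_column(n) for n in names]
        # Build one dict per row, evaluate the predicate, keep matching rows.
        mask = [bool(where(dict(zip(names, row)))) for row in zip(*cols)]
        df = df.filter(mask)
    return df.select(columns)


people = MinimalStandardFrame({"name": ["ann", "bob", "cy"], "age": [31, 17, 45]})
adults = query(people, ["name"], where=lambda row: row["age"] >= 18)
print(adults.get_column("name"))  # ['ann', 'cy']
```

Because `query` never imports an implementation, the same package would work with any library that provides the standard methods.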
Then, I guess there is the case of an implementation, let's say pandas, that plans to expose the API to users but adds some extra magic. For example, say the standard way to filter is `df.filter(condition)`, but pandas wants to keep supporting `df[condition]` for backward compatibility. Or Vaex could have some specific syntax for expressions on top of the standard API. I see there is a whole range between these options:
It would be great to know other people's thoughts. I think most people have an idea of how this API is expected to be used, but I'm not sure we're all on the same page.
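The pandas case, where a library exposes the standard API plus extra syntax, might be sketched like this. `StandardFrame.filter` plays the role of the standard method and `CompatFrame.__getitem__` adds the legacy `df[condition]` form as a thin alias over it; both classes are hypothetical and do not reflect pandas' internals:

```python
from typing import Any, Dict, List, Sequence, Union


class StandardFrame:
    """Only the (hypothetical) standard API: explicit method calls."""

    def __init__(self, data: Dict[str, List[Any]]) -> None:
        self._data = {name: list(values) for name, values in data.items()}

    def get_column(self, name: str) -> List[Any]:
        return list(self._data[name])

    def filter(self, condition: Sequence[bool]) -> "StandardFrame":
        n_rows = len(next(iter(self._data.values()), []))
        if len(condition) != n_rows:
            raise ValueError("condition length must match the number of rows")
        # type(self) keeps subclasses (e.g. CompatFrame) on the result.
        return type(self)(
            {name: [v for v, keep in zip(values, condition) if keep]
             for name, values in self._data.items()}
        )


class CompatFrame(StandardFrame):
    """Library-specific extras layered on top of the standard methods."""

    def __getitem__(self, key: Union[str, Sequence[bool]]) -> Any:
        if isinstance(key, str):
            return self.get_column(key)   # df["col"]
        return self.filter(key)           # df[condition] -> df.filter(condition)


df = CompatFrame({"a": [1, 2, 3]})
cond = [v > 1 for v in df["a"]]
print(df[cond].get_column("a"))  # [2, 3]
```

Since the extra syntax only delegates to `filter`, behaviour on the standard path stays defined by the standard, and the sugar could later be dropped or moved into a separate package without touching it.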