I think before integrating DuckDB into pandas, it would be wise to address #41728 and #36893 regarding pluggable SQL engines.
That would make it easier to maintain different DB engines in pandas without leaking engine-specific details into the SQL code. Would you be interested in investigating a SQL plugin interface?
Is your feature request related to a problem?
DuckDB supports the consumption and production of data frames. In this issue, I discuss how we could leverage these on two different ends.
Loading data frames as DuckDB Tables:
The to_sql() function can be extended to leverage the consumption to efficiently load data frames as tables in DuckDB (see "Adding DuckDB as a SQL Database for the to_sql() function", #45675).
Run SQL on data frames:
Another possibility I would like to propose is to use the DuckDB engine to run queries directly on a data frame. Since DuckDB can both consume and produce data frames, a user could run SQL directly on a data frame and get a data frame back.
e.g.,
This should be possible since it would run DuckDB's default connection under the hood, with the advantages of consuming and producing data frames, running SQL directly, and parallel query execution.
If this is something the core-dev team or the pandas community would be interested in, I'm happy to give a PR a go :-)
Additional context
DuckDB's blog post on SQL on Pandas, including benchmarks: https://duckdb.org/2021/05/14/sql-on-pandas.html