Skip to content

ENH: Have DataFrame.to_sql() return the created Table object #42645

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
multimeric opened this issue Jul 21, 2021 · 4 comments
Closed

ENH: Have DataFrame.to_sql() return the created Table object #42645

multimeric opened this issue Jul 21, 2021 · 4 comments
Labels
Enhancement IO SQL to_sql, read_sql, read_sql_query Needs Discussion Requires discussion from core team before further action

Comments

@multimeric
Copy link
Contributor

Is your feature request related to a problem?

I would like to be able to query a table immediately after creating it using df.to_sql(), in the SQLAlchemy interface.

Describe the solution you'd like

Currently to_sql() has no return value. I suggest simply returning the new table object, which is of type sqlalchemy.Table.

API breaking implications

None, as to_sql() currently has no return value.

Describe alternatives you've considered

  • You can query the table immediately using raw SQL as shown in the documentation for this function, but this is relatively low-level
  • You could obtain a Table object from the automap extension. However this is redundant as Pandas at some point generated the table metadata so this would involve throwing it away and then trying to guess what it used to be from the state of the database.

Additional context

This conflicts with #23998, which suggests that we return the ResultProxy instead. Perhaps we return both, as a tuple? Both of these objects would be useful.

@multimeric multimeric added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 21, 2021
@mzeitlin11
Copy link
Member

Thanks for suggesting this @multimeric! IMO the ResultProxy would be more useful (and intuitive since it matches execute) since it gives information you couldn't get easily otherwise (like how many rows were added in an append case).

What would be the use case for returning the new table object? While I agree it is more convenient than your alternatives, I'm not sure it would be worth complicating the return value with a tuple. Returning the table object would also feel redundant in the case of an append.

@mzeitlin11 mzeitlin11 added IO SQL to_sql, read_sql, read_sql_query Needs Discussion Requires discussion from core team before further action and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 31, 2021
@multimeric
Copy link
Contributor Author

Yeah honestly I see the benefit in returning the result proxy too. The reason I want the table is because SQLAlchemy needs the table metadata as a Python object in order to be able to query the table. It's possible you can query a table without metadata (?) but then it won't do any column or data type validation. You could also load the metadata from the database (using Table(autoload_with=engine)) but like I said this is wasted processing because we had the metadata and then we threw it away.

So for example my use case is loading a CSV into a table, and then joining it to several existing database tables. If to_sql returns a table I could just do:

temp = df.to_sql(...)
qry = insert(Bar).from_select("a", "b", "c", select(
    temp.a, temp.b, Foo.c
).join_from(
    temp,
    Foo
))

@VinayN03
Copy link

VinayN03 commented Dec 9, 2021

If we get last inserted id and row count from df.to_sql it would be helpful.

@mroeschke
Copy link
Member

Thanks for the request, but I think since 1.5 this returns the row count now so closing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement IO SQL to_sql, read_sql, read_sql_query Needs Discussion Requires discussion from core team before further action
Projects
None yet
Development

No branches or pull requests

4 participants