API to_sql method doesn't provide the option to specify unique indexes #7984
Comments
+1
So, I believe pandas already provides … Or perhaps there should be a more powerful mechanism (with a sensible set of defaults) to specify in a unified manner which indices to create, whether they should be unique, and if so whether they should be PRIMARY KEYs. Perhaps an argument that can specify the set of constraints/indices/primary keys/foreign keys to pass on to the SQLAlchemy layer. @jorisvandenbossche Thoughts?
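For reference, a rough sketch of the manual route available today, assuming SQLAlchemy is installed; the table and column names below are made up for illustration:

import pandas as pd
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

engine = create_engine("sqlite:///example.db")

# Define the schema yourself, including the primary key / unique constraint,
# then let to_sql append into the pre-built table instead of creating one.
metadata = MetaData()
Table(
    "records", metadata,
    Column("id", Integer, primary_key=True),
    Column("value", String, index=True, unique=True),
)
metadata.create_all(engine)

df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})
df.to_sql("records", engine, if_exists="append", index=False)

This works, but it means maintaining the table definition outside of pandas, which is exactly what a unified keyword argument would avoid.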
+1 this feature will be great
+1 |
Please add support for primary key and indexes. |
This issue just came up for a project. I thought a logical default would be to translate the DataFrame's index into the primary key of the DB table, so to_sql would always create tables with a primary key defined by the DataFrame's index. Currently the index is only written out as an ordinary, non-unique indexed column.
I don't think we can do this automatically, since a DataFrame index is not guaranteed to be unique.
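A small illustration of why an automatic index-to-PRIMARY-KEY mapping is unsafe (toy data):

import pandas as pd

# A DataFrame index is allowed to contain duplicates, which a primary key cannot.
df = pd.DataFrame({"value": [10, 20, 30]}, index=[0, 0, 1])
print(df.index.is_unique)  # False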
So my original thought on this issue was that SQLAlchemy has something like this built in, where you can basically say: I want an arbitrary column going from 0 to N in sequence as my primary key for the table.
Pandas achieves something pretty similar by just resetting the index; the result is a new index from 0 to N:
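For example, with a throwaway DataFrame:

import pandas as pd

df = pd.DataFrame({"value": ["a", "b", "c"]}, index=[10, 20, 30])
df = df.reset_index(drop=True)   # discard the old labels
print(df.index)                  # RangeIndex(start=0, stop=3, step=1)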
So my thinking is that because we'd be creating a new "index"-like object for the primary key, we could have a kwarg to turn this functionality on (resetting the index by default seems unintuitive to me), so that you could tell pandas: I want just a vanilla 0 to N sequence as a primary key. So basically what @jefferythewind suggested, but with the added requirement that before using the index for the primary key you reset the index or construct a new sequential index internally. So maybe something like primary_key='string' as a kwarg on the to_sql function, where the string can be:
As I write this I realize
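Whatever the exact spelling of such a keyword, the behaviour being proposed can be sketched by hand today, assuming SQLAlchemy (all names below are invented for the example):

import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///example.db")
df = pd.DataFrame({"value": ["a", "b", "c"]}, index=[10, 20, 30])

# Build the vanilla 0..N-1 sequence and expose it as an explicit column,
# then create the table with that column declared as the PRIMARY KEY.
df = df.reset_index(drop=True).rename_axis("pk").reset_index()

with engine.begin() as conn:
    conn.execute(text("CREATE TABLE records (pk INTEGER PRIMARY KEY, value TEXT)"))

df.to_sql("records", engine, if_exists="append", index=False)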
I like the primary_key suggestion. Add another boolean keyword argument (i.e. …); these two arguments could not be used together.
This is a very fundamental issue, since a lot of data needs a primary key.
What is the status of this "Primary key" creation? |
This could very well be something that just isn't clear to me from reading the docs, but I would like my index to be unique when I dump the dataframe to a SQLite table, so that when I append more data I don't get duplicate rows. Here's what my resulting schema looks like after using the to_sql function:
sqlite> .schema
CREATE TABLE "Sniffs" (
"ID" TEXT,
"Day" TEXT,
"ModelName" TEXT,
"PassRate" FLOAT,
"RtxName" TEXT,
"Shadow" TEXT,
"Time" TEXT,
"Unit" TEXT
);
CREATE INDEX "ix_Sniffs_ID" ON "Sniffs" ("ID");
Looking at these docs:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html#pandas.DataFrame.to_sql
My resulting call looks something like this:
sniff_df.to_sql('Sniffs', sqleng, if_exists='append', index_label='ID')
Am I reading the docs wrong, or is there currently a way to get unique indexes in the output SQL to avoid overlapping data? Could we get a kwarg like unique_index=True?
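There is no to_sql keyword for this today. One hedged workaround, reusing the sniff_df and sqleng names from the call above (and assuming sqleng is a SQLAlchemy engine), is to add the unique index with a separate DDL statement and let the database reject overlapping rows on later appends:

from sqlalchemy import text

# One-time setup after the table exists: a UNIQUE index on ID. The name
# "ux_Sniffs_ID" is made up; it must differ from the non-unique
# "ix_Sniffs_ID" that to_sql already created.
with sqleng.begin() as conn:
    conn.execute(text(
        'CREATE UNIQUE INDEX IF NOT EXISTS "ux_Sniffs_ID" ON "Sniffs" ("ID")'
    ))

# Later appends that reuse an existing ID will now fail with an
# IntegrityError instead of silently inserting duplicates.
sniff_df.to_sql('Sniffs', sqleng, if_exists='append', index_label='ID')

Note that creating the unique index only succeeds if the data already in the table has no duplicate IDs, and conflicting appends raise an error rather than being skipped.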