
API to_sql method doesn't provide the option to specify unique indexes #7984


Open
manistal opened this issue Aug 11, 2014 · 11 comments
Labels
Enhancement, IO SQL (to_sql, read_sql, read_sql_query)

Comments

@manistal

This could very well be something that just isn't clear to me from reading the docs, but I would like my index to be unique when I dump the DataFrame to a SQLite table, so that when I append more data I don't get duplicate rows. Here's what the resulting schema looks like after using the to_sql function:

sqlite> .schema
CREATE TABLE "Sniffs" (
"ID" TEXT,
"Day" TEXT,
"ModelName" TEXT,
"PassRate" FLOAT,
"RtxName" TEXT,
"Shadow" TEXT,
"Time" TEXT,
"Unit" TEXT
);
CREATE INDEX "ix_Sniffs_ID" ON "Sniffs" ("ID");

Looking at these docs:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html#pandas.DataFrame.to_sql

My resulting call looks something like this:
sniff_df.to_sql('Sniffs', sqleng, if_exists='append', index_label='ID')

Am I reading the docs wrong, or is there currently a way to get unique indexes in the output SQL to avoid overlapping data? Could we get a kwarg like unique_index=True?
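
For reference, one workaround that appears to work today is to let to_sql create the table and then add the unique index with raw DDL through the same engine. A minimal sketch, assuming SQLAlchemy 1.4+ and the Sniffs/ID names from the schema above (the one-row sniff_df here is just a stand-in for the real frame):

import pandas as pd
from sqlalchemy import create_engine, text

sqleng = create_engine("sqlite:///sniffs.db")
sniff_df = pd.DataFrame({"PassRate": [0.97]}, index=pd.Index(["unit-1"], name="ID"))  # stand-in frame

sniff_df.to_sql("Sniffs", sqleng, if_exists="append", index_label="ID")
with sqleng.begin() as conn:
    # From here on, appends that reuse an existing ID raise IntegrityError.
    conn.execute(text('CREATE UNIQUE INDEX IF NOT EXISTS "ux_Sniffs_ID" ON "Sniffs" ("ID")'))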

@ja-b

ja-b commented Dec 19, 2014

+1

@artemyk
Contributor

artemyk commented Dec 19, 2014

So, I believe pandas already provides df.index.is_unique, which could be used to automatically set UNIQUE for indices. Perhaps a kwarg to suppress such behavior would be more helpful.

Or perhaps there should be a more powerful mechanism (with a sensible set of defaults) to specify in a unified manner which indexes to create, whether they should be unique, and if so whether they should be PRIMARY KEYs. Perhaps an argument that specifies the set of constraints/indexes/primary keys/foreign keys to pass to the SQLAlchemy Table() constructor? (We could also have an argument taking a dict that, for different columns, specifies the kwargs to pass to the SQLAlchemy Column() constructor; this would generalize what we do with dtype now.)

@jorisvandenbossche Thoughts?
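
For what it's worth, something close to this can already be assembled by hand: define the table (with whatever constraints you want) through SQLAlchemy yourself, then let to_sql append into it. A rough sketch, assuming SQLAlchemy 1.4+; the column set is only illustrative:

import pandas as pd
from sqlalchemy import (Column, Float, MetaData, String, Table,
                        UniqueConstraint, create_engine)

engine = create_engine("sqlite:///example.db")
metadata = MetaData()

Table(
    "Sniffs", metadata,
    Column("ID", String, primary_key=True),
    Column("ModelName", String),
    Column("PassRate", Float),
    UniqueConstraint("ModelName", name="ux_sniffs_modelname"),
)
metadata.create_all(engine)

df = pd.DataFrame({"ID": ["a", "b"], "ModelName": ["m1", "m2"], "PassRate": [0.9, 0.8]})
df.to_sql("Sniffs", engine, if_exists="append", index=False)  # appends into the pre-built table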

@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@femtotrader

+1, this feature would be great.

@xingren23

+1

@icyblade

Please add support for primary keys and indexes.
I'm using pandas for data processing and Tableau for visualization, but Tableau handles data without an index poorly.

@jefferythewind

jefferythewind commented Nov 8, 2016

This issue just came up in a project. I thought a logical default would be to translate the DataFrame's index into the primary key of the DB table, so to_sql would always create tables with a primary key defined by the DataFrame's index. That way, to_sql could simply pass the DB error back to the client on an attempt to insert duplicate primary keys.

Currently the to_sql method always creates tables without primary keys, which could certainly be improved upon.
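
By hand, that behaviour can be approximated today by creating the table with the primary key first and only then appending; the database error then surfaces exactly as described. A sketch, assuming SQLite and SQLAlchemy 1.4+ (the events table and its columns are hypothetical):

import pandas as pd
from sqlalchemy import create_engine, text
from sqlalchemy.exc import IntegrityError

engine = create_engine("sqlite:///events.db")
with engine.begin() as conn:
    conn.execute(text('CREATE TABLE IF NOT EXISTS "events" ("id" INTEGER PRIMARY KEY, "value" TEXT)'))

df = pd.DataFrame({"value": ["a", "b"]}, index=pd.Index([1, 2], name="id"))
df.to_sql("events", engine, if_exists="append")  # the index is written as the "id" column

try:
    df.to_sql("events", engine, if_exists="append")  # same ids again
except IntegrityError:
    print("duplicate primary key rejected by the database")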

@TomAugspurger
Contributor

I don't think we can do this automatically since pd.Index doesn't have to be unique. It would need to be a keyword.
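
For illustration, the check any automatic behaviour would need first (a tiny sketch):

import pandas as pd

df = pd.DataFrame({"value": ["a", "b", "c"]}, index=[1, 2, 2])  # duplicate index label
print(df.index.is_unique)  # False -> this index could not back a UNIQUE constraint as-is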

@manistal
Author

manistal commented Nov 8, 2016

My original thought on this issue was that SQLAlchemy has something like this built in, where you can basically say: I want an arbitrary column going from 0 to N in sequence as the primary key for the table.

project_id = db.Column(
    db.Integer, db.Sequence('project_id'), primary_key=True
)

Pandas achieves something pretty similar by just resetting the index; the result is a new index from 0 to N:

table = old_table.reset_index()

So my thinking is that, because we'd be creating a new index-like object for the primary key, we could have a kwarg to turn this functionality on (resetting the index by default seems unintuitive to me), so that you could tell pandas you want just a vanilla 0-to-N sequence as the primary key.

So basically what @jefferythewind suggested, but with the added requirement that before using the index for the primary key you either reset the index or construct a new sequential index internally.

So maybe something like primary_key='string' as a kwarg on the to_sql function, where the string can be:

- a column name: use that column as the primary key; raise an error if it is not unique
- 'sequential': generate a new sequential primary key
- None: current behaviour of no primary key

As I write this I realize 'sequential' could collide with a real column name in some odd scenario, so someone more clever than me might have to work out the exact argument usage, but what do you guys think of the general functionality?
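
Until something like that exists, the 'sequential' flavour can be approximated by letting the database generate the key. A sketch, assuming SQLite and SQLAlchemy 1.4+ (the projects table is made up, and the primary_key= kwarg above is only the proposal, it does not exist in to_sql today):

import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///projects.db")
with engine.begin() as conn:
    conn.execute(text(
        'CREATE TABLE IF NOT EXISTS "projects" ('
        '"pk" INTEGER PRIMARY KEY AUTOINCREMENT, "name" TEXT)'
    ))

df = pd.DataFrame({"name": ["alpha", "beta"]})
# index=False: only "name" is inserted, SQLite fills "pk" with 1..N.
df.to_sql("projects", engine, if_exists="append", index=False)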

@jefferythewind

jefferythewind commented Nov 8, 2016

I like the primary_key="column_name" argument, but require that column_name is an actual column name, and raise an error otherwise.

Add another boolean keyword argument (e.g. auto_key=True) which, when set to True, assumes an auto-incrementing column named "index".

These two arguments could not be used together.
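
The argument checking this implies could look roughly like the sketch below (primary_key and auto_key are the proposed, not-yet-existing kwargs, and _resolve_key is just a hypothetical helper):

import pandas as pd

def _resolve_key(df: pd.DataFrame, primary_key=None, auto_key=False):
    if primary_key is not None and auto_key:
        raise ValueError("primary_key and auto_key cannot be used together")
    if auto_key:
        return "index"  # auto-incrementing column named "index"
    if primary_key is not None:
        if primary_key not in df.columns:
            raise ValueError(f"{primary_key!r} is not a column of the frame")
        if not df[primary_key].is_unique:
            raise ValueError(f"column {primary_key!r} is not unique")
        return primary_key
    return None  # current behaviour: no primary key

df = pd.DataFrame({"id": [1, 2, 3], "value": list("abc")})
print(_resolve_key(df, primary_key="id"))  # -> 'id'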

@arita37

arita37 commented Nov 19, 2016

This is a very fundamental issue, since a lot of data needs a primary key.
We need to retrieve the key from the DB and update the pandas frame accordingly before updating, either directly or through SQLAlchemy.
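
That round trip can be done by hand today with an anti-join against the keys already in the table. A sketch, assuming SQLAlchemy 1.4+ and an events table with an id primary key that already exists (names are illustrative):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///events.db")
df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})

existing = pd.read_sql('SELECT "id" FROM "events"', engine)["id"]
new_rows = df[~df["id"].isin(existing)]  # keep only keys not yet in the DB
new_rows.to_sql("events", engine, if_exists="append", index=False)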

@gbrault

gbrault commented Aug 18, 2020

What is the status of this "Primary key" creation?

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022