Skip to content

Bug in generate CREATE TABLE SQL for DataFrame with MultiIndex with different types #8021

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
artemyk opened this issue Aug 13, 2014 · 2 comments · Fixed by #8022
Closed

Bug in generate CREATE TABLE SQL for DataFrame with MultiIndex with different types #8021

artemyk opened this issue Aug 13, 2014 · 2 comments · Fixed by #8022
Labels
IO SQL to_sql, read_sql, read_sql_query
Milestone

Comments

@artemyk
Copy link
Contributor

artemyk commented Aug 13, 2014

When using SQLAlchemy to create a table from a dataframe, Pandas messes up the types of the index columns. For example:

import sqlalchemy
engine = sqlalchemy.create_engine('sqlite:///:memory:')

import pandas as pd
from pandas.io.sql import PandasSQLTable, PandasSQLAlchemy
df = pd.DataFrame.from_records([{'a':1,'b':2.1,'c':'txt1','a':2,'b':1.5,'c':'txt2'}], index=['a','b'])
o = PandasSQLTable('test_df',PandasSQLAlchemy(engine),frame=df)
print o.sql_schema()

outputs

CREATE TABLE test_df (
    a FLOAT, 
    b BIGINT, 
    c TEXT
)

when a should be BIGINT and b should be FLOAT.

This occurs because the index columns are enumerated backwards in PandasSQLTable._create_table_statement in pandas.io.sql but the types don't take into account this backward iteration:

        if self.index is not None:
            for i, idx_label in enumerate(self.index[::-1]):
                idx_type = self._sqlalchemy_type(
                    self.frame.index.get_level_values(i))
                columns.insert(0, Column(idx_label, idx_type, index=True))

I will make a pull request with a fix for this issue. Versions appended below:

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-358.6.2.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.14.1-200-g534784b
nose: 1.3.0
Cython: 0.20.1
numpy: 1.8.1
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.1.0
sphinx: 1.2.2
patsy: 0.2.1
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.4
bottleneck: None
tables: None
numexpr: 2.4
matplotlib: 1.3.1
openpyxl: None
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: 2.4.2
sqlalchemy: 0.9.7
pymysql: None
psycopg2: None
@jorisvandenbossche jorisvandenbossche added this to the 0.15.0 milestone Aug 13, 2014
@jorisvandenbossche
Copy link
Member

Thanks for the report, and great you can do a PR!

@artemyk
Copy link
Contributor Author

artemyk commented Aug 14, 2014

Pull request submitted at #8022 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO SQL to_sql, read_sql, read_sql_query
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants