Timestamp sub second resolution not supported with MySQL #7938

Closed
femtotrader opened this issue Aug 5, 2014 · 11 comments
Labels
Enhancement IO SQL to_sql, read_sql, read_sql_query

Comments

@femtotrader

femtotrader commented Aug 5, 2014

Hello,

Timestamp sub second resolution doesn't seem to be supported with mysqldb.

See:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import pandas as pd
import numpy as np
import sqlalchemy

def main():
    N = 5
    df = pd.DataFrame({'Date':pd.date_range('2000-12-29', freq='D', periods=N), 'Open': range(N)})
    scheme = 'mysql+mysqldb'

    config = {
        'host'    : 'localhost',
        'database': 'test',
        'user'    : 'root',
        'password': '123456',
        'port': 3306
    }

    db_uri = "{scheme}://{user}:{password}@{host}:{port}/{database}"
    db_uri = db_uri.format(
        scheme = scheme,
        user = config['user'],
        password = config['password'],
        port = config['port'],
        host = config['host'],
        database = config['database']
    )
    engine = sqlalchemy.create_engine(db_uri)

    df["Date"] = df["Date"] + np.timedelta64(1, 'us') # to test resolution issue

    df = df.set_index("Date")

    print(df)
    print(df.dtypes)
    print(type(df.index), df.index.dtype)
    print(type(df.index[0]))

    df.to_sql("orcl", engine, flavor="mysql", if_exists="replace")

if __name__ == '__main__':
    main()

This doesn't raise any error, but when I queried the database I noticed that I only have second-resolution timestamps (no microseconds; in fact, no fractional seconds at all).

$ mysql -u root mysql -p
Enter password:
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 40
Server version: 5.5.38-0ubuntu0.14.04.1 (Ubuntu)


mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select * from orcl;
+---------------------+------+
| Date                | Open |
+---------------------+------+
| 2000-12-29 00:00:00 |    0 |
| 2000-12-30 00:00:00 |    1 |
| 2000-12-31 00:00:00 |    2 |
| 2001-01-01 00:00:00 |    3 |
| 2001-01-02 00:00:00 |    4 |
+---------------------+------+
5 rows in set (0.00 sec)

mysql> show columns from orcl;
+-------+------------+------+-----+---------+-------+
| Field | Type       | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| Date  | datetime   | YES  | MUL | NULL    |       |
| Open  | bigint(20) | YES  |     | NULL    |       |
+-------+------------+------+-----+---------+-------+
2 rows in set (0.00 sec)

Maybe storing datetime64 values as integers would be the most portable solution across databases and drivers.
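For reference, the integer round trip suggested here can be sketched as follows (a minimal sketch; nanoseconds-since-epoch is only one of several possible representations):

```python
import pandas as pd

# Build a frame with sub-second timestamps, mirroring the example above.
df = pd.DataFrame({'Date': pd.date_range('2000-12-29', freq='D', periods=3)})
df['Date'] = df['Date'] + pd.Timedelta(microseconds=1)

# Store the column as int64 nanoseconds since the epoch; any database
# integer column can hold this without losing precision.
df['Date_ns'] = df['Date'].astype('int64')

# Round-trip back to datetime64[ns]; no precision is lost.
restored = pd.to_datetime(df['Date_ns'])
assert (restored == df['Date']).all()
```

The cost is that the stored column is no longer queryable as a date on the database side, which is the main argument made against this approach below.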

@jorisvandenbossche jorisvandenbossche changed the title Timestamp sub second resolution not supported with mysqldb Timestamp sub second resolution not supported with MySQL Aug 5, 2014
@jorisvandenbossche
Member

This is a limitation of MySQL (see http://dev.mysql.com/doc/refman/5.0/en/datetime.html).

I personally think we should not do anything about this (users of MySQL should be aware of that / live with the limitation or circumvent it themselves if needed). But we can certainly add a warning to the docs.

And maybe it would be possible to add a UserWarning when there is sub-second data that would be discarded? (or would that be too costly to check?)

@danielballan @mangecoeur @hayd @jreback What do you think?

@jreback
Contributor

jreback commented Aug 5, 2014

Silly question: is this a printing issue with the db (e.g. can you have the db dump the actual values)?

@jorisvandenbossche
Member

No, quoting from the MySQL docs linked above:

A DATETIME or TIMESTAMP value can include a trailing fractional seconds part in up to microseconds (6 digits) precision. Although this fractional part is recognized, it is discarded from values stored into DATETIME or TIMESTAMP columns

So the information is lost when the values are stored in the database.

@femtotrader
Author

See also http://dev.mysql.com/doc/refman/5.6/en/fractional-seconds.html: microseconds seem to be supported by MySQL (but not nanoseconds). Do you really think that datetime64 shouldn't be stored as an integer? If you consider this a bad idea, why not use DATETIME(6) instead of DATETIME?

@jorisvandenbossche
Member

Yes, I don't think that is a good idea (but try to convince me otherwise!). My reasoning:

  • In a lot of (most?) applications, sub-second data will not be used, and in those cases the datetime type of mysql is perfectly fine. So I think that we should, certainly by default, use the datetime type of mysql, as it exists.
  • I agree that in some cases (eg this loss of precision) another representation can be better, eg ints, and can also be more general (works for all types of precision). But in the case above, this would also be a lot less convenient than storing it as a datetime, so I wouldn't do it by default.
  • The question then is, do we provide it as an option? Or leave it to the user to do the conversion? The problem here is that multiple conversions would be possible. You could eg also argue for a string representation (better interpretable than the integer value)? You could do the conversion to ints with different timebases? So I personally think (for now) it is better to leave this to the user to do explicitly.
  • If data would be lost due to this precision limitation, a warning would indeed be ideal. We should investigate if this is easily possible.

@danielballan
Contributor

For compatibility with other systems, I don't think datetime64 should be stored as integers. MySQL users would expect dates to be dates. Pandas users can convert their datetime64 columns to ints or strings before writing to the db. As @jorisvandenbossche says, there are too many reasonable ways to do this: we don't want to guess.

I think there's a case for making pandas warn. The performance cost is a concern. Not sure how I feel about this.

FYI, I remember seeing 2-3 years ago that the feature request page for robust support of fractional seconds in MySQL is more than 10 years old.

@femtotrader
Author

I understand that UserWarnings come with a computational time cost... so I can live without that.

I would like to keep the dataframe with datetime64, but I would like to be able to "tell" SQLAlchemy how to manage them for database storage. Is there an API for that (a kind of converter to overload and pass to the engine)? Something like conn.set_converter_class(Datetime64Converter) (see #7936), but one that handles not only the value but also the type to use in the DB.

@jorisvandenbossche
Member

@femtotrader I think that should be possible; take a look here: http://docs.sqlalchemy.org/en/rel_0_9/core/types.html#custom-types. You can override existing types or define new ones to change the way they are handled by SQLAlchemy.

That is maybe also a way to solve the datetime problem with mysql-connector and pymysql. If you get it to work, can you post it here? Maybe we can add it as an example to the docs.
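Such a custom type might look roughly like this (a sketch, untested against a live server; the class name Datetime64Micro is made up for illustration, and DATETIME's fsp argument requires MySQL >= 5.6.4):

```python
import sqlalchemy.types as types
from sqlalchemy.dialects.mysql import DATETIME as MySQL_DATETIME

class Datetime64Micro(types.TypeDecorator):
    """Emit DATETIME(6) on MySQL so microseconds survive; a plain
    DATETIME elsewhere. Hypothetical name, for illustration only."""
    impl = types.DateTime
    cache_ok = True

    def load_dialect_impl(self, dialect):
        if dialect.name == 'mysql':
            # fsp=6 asks for six fractional-second digits (microseconds)
            return dialect.type_descriptor(MySQL_DATETIME(fsp=6))
        return dialect.type_descriptor(types.DateTime())

# With a pandas version that supports to_sql's dtype argument, this
# could then be passed through (assuming a working engine):
# df.to_sql('orcl', engine, dtype={'Date': Datetime64Micro})
```

Nanosecond precision would still be lost, since MySQL tops out at microseconds; anything finer needs the integer representation discussed above.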

@jorisvandenbossche
Member

A possible way to check:

In [34]: def check_subsecond(s):
   ....:     return np.any(s.astype('i8') % 1000000000)
   ....:

In [36]: s = pd.date_range('2012-01-01', freq='s', periods=100000)

In [38]: check_subsecond(s)
Out[38]: False

In [39]: %timeit check_subsecond(s[[0]])
10000 loops, best of 3: 45.1 µs per loop

In [40]: %timeit check_subsecond(s)
1000 loops, best of 3: 1.36 ms per loop

In [43]: s = pd.date_range('2012-01-01', freq='us', periods=100000)

In [45]: check_subsecond(s)
Out[45]: True

In [46]: %timeit check_subsecond(s)
1000 loops, best of 3: 1.35 ms per loop

Would this be acceptable as a performance loss?
We should actually also do something similar for PostgreSQL then, as it has precision up to microseconds, not nanoseconds ..
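A warning built on that check could be as small as this (a sketch; the function name warn_if_subsecond is made up):

```python
import warnings
import numpy as np
import pandas as pd

def warn_if_subsecond(index):
    # A non-zero remainder modulo 1e9 nanoseconds means there is
    # sub-second information that a plain DATETIME column would discard.
    if np.any(index.astype('int64') % 1_000_000_000):
        warnings.warn("sub-second precision will be lost when writing "
                      "to a MySQL DATETIME column", UserWarning)

# Second-resolution data: stays silent.
warn_if_subsecond(pd.date_range('2012-01-01', freq='s', periods=100))

# Microsecond-resolution data: warns.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_if_subsecond(pd.date_range('2012-01-01', freq='us', periods=100))
```

The timings above suggest the check costs on the order of a millisecond for 100,000 rows, which is likely negligible next to the database round trip itself.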

@jreback
Contributor

jreback commented Oct 20, 2015

@jorisvandenbossche still relevant? (if so pls assign to milestone), otherwise close

@mroeschke
Member

Seems like this is a MySQL limitation and there's not a clear action on the pandas side. Going to close for now, but happy to reopen if I misunderstood.
