No way with to_json to write only date out of datetime #16492

Open · toobaz opened this issue May 25, 2017 · 14 comments
Labels
Enhancement IO JSON read_json, to_json, json_normalize

Comments

@toobaz
Member

toobaz commented May 25, 2017

Code Sample, a copy-pastable example if possible

In [2]: pd.Series(pd.to_datetime(['2017-03-15'])).to_csv()
Out[2]: '0,2017-03-15\n'

In [3]: pd.Series(pd.to_datetime(['2017-03-15'])).to_json(date_format='iso')
Out[3]: '{"0":"2017-03-15T00:00:00.000Z"}'

Problem description

From this SO comment

By default, to_csv() drops the time of day if every value is midnight. By default, to_json instead writes dates as epoch timestamps, and if passed date_format='iso' it always writes the time as well.

Not sure whether it would be optimal to have date_format='iso' behave like to_csv(), to add a new value such as date_format='iso_date', or maybe even to accept a free-form format string like that taken by pd.Series.dt.strftime(), e.g. date_format="%Y-%m-%d".

Expected Output

Out[3]: '{"0":"2017-03-15"}'

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.7.0-1-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.utf8
LOCALE: it_IT.UTF-8

pandas: 0.20.1
pytest: 3.0.6
pip: 9.0.1
setuptools: None
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: 0.9.2
IPython: 5.1.0.dev
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.2
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: 3.7.1
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: 0.2.1

@toobaz
Member Author

toobaz commented May 25, 2017

Probably worth fixing together with #12997

@jreback
Contributor

jreback commented May 25, 2017

-1 on this. This does not conform to JSON and creates non-standard formatting for dates.

#12997 is completely orthogonal to this and is an actual bug.

@jreback added the IO JSON (read_json, to_json, json_normalize) label on May 25, 2017
@toobaz
Member Author

toobaz commented May 25, 2017

This does not conform to JSON

I'm not an expert at all on JSON and/or date representation standards... but if JSON dictates the use of ISO 8601, then ISO 8601 does define a way to represent dates (without a time). If JSON instead dictates using the full ISO 8601 representation, then indeed this is worth closing.

(I know that strictly speaking JSON does not dictate anything, I'm just referring to any widespread convention.)

@chris-b1
Contributor

I think accepting a generic date-format string would be reasonable and consistent with other APIs:

pd.Series(pd.to_datetime(['2017-03-15'])).to_csv(date_format='%m/%d/%Y')
Out[30]: '0,03/15/2017\n'

@MattOates

There is no "JSON standard" for what a date should look like; major languages vastly differ in how their standard libraries serialize built-in datetimes to JSON. Most of the time the point is to hand a JS library what it expects for a date format, and sometimes that looks like %Y%m%d as a string for just a date. Being pragmatic is far more useful than having to completely reimplement JSON serialisation just to get a different date format, when there is already a parameter called date_format.

My expectation coming to this for the first time was something like datetime.strftime(), which I imagine a lot of people would expect. Perhaps add strftime as a new parameter for backwards compatibility? The default format should really be ISO rather than a raw timestamp anyway, since the epoch output loses any TZ info that might be there. So the existing default is kind of bad, and the added format functionality doesn't do a huge deal for you other than provide the sensible default.
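A minimal sketch of that strftime-style approach as things stand today (the series below is purely illustrative): pre-format the values with Series.dt.strftime so that to_json just writes plain strings.

import pandas as pd

ser = pd.Series(pd.to_datetime(['2017-03-15']))

# Format as date-only strings before serializing; to_json then emits them verbatim.
ser.dt.strftime('%Y-%m-%d').to_json()
# '{"0":"2017-03-15"}'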

@WillAyd
Member

WillAyd commented Jun 6, 2018

FWIW, in the table schema the date field descriptor accepts a pattern

https://frictionlessdata.io/specs/table-schema/

So if we wanted to specify a date format along with the provided JSON, that may be the best (i.e. only) option

@dargueta
Contributor

Any movement on this? It's a real problem when generating JSONL files for Redshift: Redshift rejects the file because a field declared as DATE contains a time portion. (I don't own the table, so switching to TIMESTAMP isn't a viable solution.)
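A rough workaround for the JSONL case, sketched with an illustrative column name (d is not from the original report): pre-format the DATE column with dt.strftime and write line-delimited records.

import pandas as pd

df = pd.DataFrame({'d': pd.to_datetime(['2017-03-15', '2017-03-16'])})

# Replace the datetime column with date-only strings, then write JSON Lines.
df['d'] = df['d'].dt.strftime('%Y-%m-%d')
df.to_json(orient='records', lines=True)
# roughly: '{"d":"2017-03-15"}\n{"d":"2017-03-16"}'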

@TomAugspurger
Contributor

TomAugspurger commented Aug 13, 2019 via email

@ttamg

ttamg commented Aug 13, 2019

I just want to second the request to enable a date serializer that allows dates as 'YYYY-MM-DD' in the to_json() method, not just the ISO format with the time included. I have other services that reject the date on this basis.

In case someone else is struggling with this and needs a quick workaround, there is a messy hack: convert to a dictionary and then use json.dumps() with a serializer for the date format you want. In case it helps anyone, here is a small function as a workaround:

import datetime
import json

def get_json(df):
    """Small function to serialise DataFrame dates as 'YYYY-MM-DD' in JSON."""

    def convert_timestamp(item_date_object):
        # pd.Timestamp subclasses datetime.datetime, so this catches it too.
        if isinstance(item_date_object, (datetime.date, datetime.datetime)):
            return item_date_object.strftime("%Y-%m-%d")

    dict_ = df.to_dict(orient='records')

    return json.dumps(dict_, default=convert_timestamp)

my_json = get_json(df)

@guyko81

guyko81 commented Sep 7, 2020

I found a great solution here

df.assign(**df.select_dtypes(['datetime']).astype(str).to_dict('list')).to_json(orient="records")

@CaselIT

CaselIT commented Jan 17, 2023

to_json already has date_unit, but it only supports seconds and sub-second resolutions.
It would be great if it could behave like unit in numpy's datetime_as_string, where you can use 'D' to export the date as YYYY-MM-DD.
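For reference, a minimal sketch of the numpy behaviour being referred to (the series is just illustrative), using np.datetime_as_string with unit='D':

import numpy as np
import pandas as pd

ser = pd.Series(pd.to_datetime(['2017-03-15']))

# numpy truncates to the requested unit when formatting datetime64 values.
np.datetime_as_string(ser.to_numpy(), unit='D')
# -> ['2017-03-15']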

@DavideCanton

Since YYYY-MM-DD is a standard ISO format, I believe adding an else if clause here could do the trick, since it seems that ujson already supports serializing dates.

I can provide a PR if you are interested in adding support, since I think it could be a useful feature for serializing dataframes containing dates without time info.

@WillAyd
Member

WillAyd commented Jan 24, 2023

@DavideCanton nice find - sure if you want to publish a PR someone can review and provide feedback

@tab1tha
Contributor

tab1tha commented Jan 27, 2023

Thank you @DavideCanton !
This is useful.
