BUG: Spurious FutureWarning when using pd.read_json() #59511

Open
2 of 3 tasks
krassowski opened this issue Aug 14, 2024 · 8 comments

Labels
Bug, IO JSON (read_json, to_json, json_normalize), Warnings (Warnings that appear or should be added to pandas)

@krassowski

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import io
s = '{"A":{"0":"X","Y":"Y"}}'
pd.read_json(io.StringIO(s), typ='frame', orient='records')

Issue Description

read_json() works correctly but generates three spurious and annoying warnings:

FutureWarning: The behavior of 'to_datetime' with 'unit' when parsing strings is deprecated. In a future version, strings will be parsed as datetime strings, matching the behavior without a 'unit'. To retain the old behavior, explicitly cast ints or floats to numeric type before calling to_datetime.
  pd.read_json(io.StringIO(s), typ='frame', orient='records')
FutureWarning: The behavior of 'to_datetime' with 'unit' when parsing strings is deprecated. In a future version, strings will be parsed as datetime strings, matching the behavior without a 'unit'. To retain the old behavior, explicitly cast ints or floats to numeric type before calling to_datetime.
  pd.read_json(io.StringIO(s), typ='frame', orient='records')
FutureWarning: The behavior of 'to_datetime' with 'unit' when parsing strings is deprecated. In a future version, strings will be parsed as datetime strings, matching the behavior without a 'unit'. To retain the old behavior, explicitly cast ints or floats to numeric type before calling to_datetime.
  pd.read_json(io.StringIO(s), typ='frame', orient='records')

These warnings are spurious because the call to to_datetime is made internally by read_json() and is not controlled by the user. They are annoying because there are so many of them: the same message is printed three times for a single call.

This was previously raised on Stack Overflow three months ago, and the question has accumulated almost 400 views: https://stackoverflow.com/questions/78454457/pandas-read-json-future-warning-the-behavior-of-to-datetime-with-unit-when

Expected Behavior

The warnings are suppressed.
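Until a fix is released for 2.2.x, users can silence just this one message themselves. The following is a minimal user-side workaround sketch, assuming the pandas 2.2.x behaviour reported above; `read_json_quietly` is a hypothetical helper name, and the message prefix is copied from the warning text. It uses the standard-library `warnings.catch_warnings()` context manager so the filter applies only to this call, and it matches only this FutureWarning, so unrelated warnings still surface. On versions where the warning has been removed, the filter is simply a no-op.

```python
import io
import warnings

import pandas as pd

# Start of the FutureWarning text reported on pandas 2.2.x; filterwarnings()
# treats this as a regex matched against the beginning of the message.
_UNIT_WARNING = "The behavior of 'to_datetime' with 'unit' when parsing strings"


def read_json_quietly(buf, **kwargs):
    """Hypothetical helper: pd.read_json with only this FutureWarning silenced."""
    with warnings.catch_warnings():
        # Scoped filter: restored automatically when the block exits.
        warnings.filterwarnings("ignore", message=_UNIT_WARNING, category=FutureWarning)
        return pd.read_json(buf, **kwargs)


# Same input as the reproducible example above.
s = '{"A":{"0":"X","Y":"Y"}}'
df = read_json_quietly(io.StringIO(s), typ="frame", orient="records")
```

Note that `warnings.catch_warnings()` swaps process-wide warning state, so this workaround is not thread-safe.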

Installed Versions

INSTALLED VERSIONS

commit : d9cdd2e
python : 3.10.10.final.0
python-bits : 64
OS : Linux
OS-release : 6.5.0-44-generic
Version : #44-Ubuntu SMP PREEMPT_DYNAMIC Fri Jun 7 15:10:09 UTC 2024
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 2.2.2
numpy : 2.0.1
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 65.5.0
pip : 22.3.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.4
IPython : 8.26.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None

@krassowski added the Bug and Needs Triage labels Aug 14, 2024
@sgysherry

sgysherry commented Aug 14, 2024

I tested with the latest dev version, and the problem seems to be solved. Will take a closer look.

@rhshadrach
Member

Thanks for the report, confirmed on 2.2.x. At a quick glance, it looks like this line:

convert_dates=True,

should be passing self.convert_dates instead of being hard-coded to True.

Additionally, we could catch the warning here:

new_data = to_datetime(new_data, errors="raise", unit=date_unit)

Further investigations and PRs to fix are welcome!
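The second suggestion, catching the warning at the internal to_datetime call, can be sketched with only the standard library. This is a toy model, not pandas code: `fake_to_datetime` is a hypothetical stand-in that emits the same FutureWarning text before "parsing", and `try_convert_to_date` is a hypothetical simplification of the internal helper, assumed here to loop over candidate units. The point is that the filter is applied inside the library, around a call the user never requested, so the deprecation does not leak out to the caller.

```python
import warnings

# Message text copied from the FutureWarning reported on pandas 2.2.x.
_UNIT_MSG = (
    "The behavior of 'to_datetime' with 'unit' when parsing strings is deprecated."
)


def fake_to_datetime(values, unit):
    """Hypothetical stand-in for to_datetime(..., unit=unit) on 2.2.x.

    It warns on every call (as the deprecated path does for strings), then
    'parses' with int(), raising ValueError on non-numeric labels. The unit
    is ignored in this toy.
    """
    warnings.warn(_UNIT_MSG, FutureWarning, stacklevel=2)
    return [int(v) for v in values]


def try_convert_to_date(values, date_units=("s", "ms", "us", "ns")):
    """Hypothetical simplification of an internal date-conversion helper.

    The user never passed `unit`, so the deprecation is silenced here, at the
    internal call site, instead of escaping from read_json().
    """
    for unit in date_units:
        with warnings.catch_warnings():
            # Silence only this deprecation; other warnings still propagate.
            warnings.filterwarnings(
                "ignore",
                message="The behavior of 'to_datetime' with 'unit'",
                category=FutureWarning,
            )
            try:
                return fake_to_datetime(values, unit), True
            except ValueError:
                continue  # try the next candidate unit
    return values, False
```

Scoping the filter to one message, rather than ignoring every FutureWarning, keeps unrelated deprecations visible to the user.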

@rhshadrach added the IO JSON and Warnings labels and removed the Needs Triage label Aug 14, 2024
@rhshadrach added this to the 2.2.3 milestone Aug 14, 2024
@KevsterAmp
Contributor

Take

@KevsterAmp
Contributor

It seems like self.convert_dates is also True by default.

The convert_axes param of pd.read_json() is True for all orient values except "Table"
https://pandas.pydata.org/docs/dev/reference/api/pandas.read_json.html

I tried comparing pd.read_json() on the 2.2.x branch with main, but couldn't find much difference, except that this specific FutureWarning has been removed from the main branch.

@rhshadrach Any ideas for a fix? Thanks

@rhshadrach
Member

seems like self.convert_dates also returns True by default.
...
The convert_axes param of pd.read_json() is True for all orient values except "Table"

I may be misunderstanding, but I think you're talking about the default value of convert_dates? The user can supply convert_dates=False to read_json. If we didn't have it hardcoded to True but rather used the argument the user passes, then I think they could suppress this warning. Of course, users shouldn't have to pass convert_dates=False to suppress the warning, but I think it's a better option than being forced to capture it.

Somewhat separately, it seems to me we should be adhering to a user's request when they pass convert_dates=False. That is not currently the case because it is hardcoded to True.
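The difference between a hard-coded flag and one that is passed through can be shown with a deliberately simplified toy class. `ToyParser` and its methods are hypothetical and only mimic the shape of the problem described above; the FutureWarning it raises stands in for the real one. With the pass-through version, a user who asks for `convert_dates=False` skips the date-conversion attempt, and with it the warning, which is the outcome suggested in the comment above.

```python
import warnings
from dataclasses import dataclass


@dataclass
class ToyParser:
    """Hypothetical, heavily simplified parser; NOT the pandas implementation.

    It only models how a user-supplied flag can be ignored (hard-coded) or
    honoured (passed through) when converting axis labels.
    """

    convert_dates: bool = True

    def _try_convert_dates(self, labels):
        # Stand-in for the internal to_datetime attempt that triggers the warning.
        warnings.warn("toy stand-in for the to_datetime/unit warning",
                      FutureWarning, stacklevel=2)
        return labels

    def _convert_axis(self, labels, convert_dates):
        return self._try_convert_dates(labels) if convert_dates else labels

    def convert_axes_hardcoded(self, labels):
        # Mirrors the reported behaviour: True regardless of the user's choice.
        return self._convert_axis(labels, convert_dates=True)

    def convert_axes_passthrough(self, labels):
        # Proposed direction: forward the user's convert_dates argument.
        return self._convert_axis(labels, convert_dates=self.convert_dates)


# Count warnings for a user who opted out of date conversion.
for method in ("convert_axes_hardcoded", "convert_axes_passthrough"):
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        getattr(ToyParser(convert_dates=False), method)(["0", "Y"])
    print(method, len(caught))
# prints:
# convert_axes_hardcoded 1
# convert_axes_passthrough 0
```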

@KevsterAmp
Contributor

Thanks for the clarification @rhshadrach. I misunderstood the code in my previous message.

@asishm
Contributor

asishm commented Sep 7, 2024

I don't think the issue in the OP reproduces on main (it was presumably fixed by #59124, though I haven't run a bisect). I think it's better to create a new issue for the following comment (speaking from the perspective of a user who reads release notes):

Somewhat separately, it seems to me we should be adhering to a user's request when they pass convert_dates=False. That is not currently the case because it is hardcoded to True.

which I believe is what @KevsterAmp's PR aims to do.

@rhshadrach
Member

I don't think this OP issue reproduces on main

I believe the warning has been removed in preparation for pandas 3.0. The warning still exists on 2.2, and I think this is eligible for a backport. In fact, we would skip a PR into main, since this is no longer an issue there, and open a PR directly against either 2.2.x or 2.3.x.

6 participants