PERF: json_normalize #15621
Comments
Yeah, this is all in Python code :< IIRC @wesm has a plan for this in pandas2, so maybe it would be possible to make use of some of that.
Converting lists of dictionaries faster in
Not sure if this is still on anyone's radar, but I've been dealing with a performance issue at least partly caused by json_normalize. From some profiling, the biggest problem in my case seems to be the use of deepcopy. For common, relatively simple cases (dictionaries and lists of string and numeric literals), deepcopy adds a lot of unnecessary overhead. Even if it's needed for some use cases, calling it recursively, when it already does its own recursive copy, is surely not optimal.
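To make the deepcopy point concrete, here is a rough sketch for measuring it on plain data. This is not the pandas code path; `copy_plain`, `compare`, and the sample `record` are hypothetical names made up for illustration, and the shortcut only holds if the leaves are immutable literals (str, int, float, bool, None).

```python
import copy
import timeit


def copy_plain(obj):
    """Copy nested dicts/lists; leaf values are shared, not copied.

    Assumption: leaves are immutable literals, so sharing them is safe.
    """
    if isinstance(obj, dict):
        return {key: copy_plain(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [copy_plain(value) for value in obj]
    return obj


# Hypothetical sample record: only dicts, lists, strings and numbers.
record = {"id": 1, "name": "a", "info": {"x": 1.5, "tags": ["p", "q"]}}


def compare(number=10_000):
    """Return (deepcopy_seconds, copy_plain_seconds) for `number` copies."""
    t_deep = timeit.timeit(lambda: copy.deepcopy(record), number=number)
    t_plain = timeit.timeit(lambda: copy_plain(record), number=number)
    return t_deep, t_plain


if __name__ == "__main__":
    t_deep, t_plain = compare()
    print(f"deepcopy:   {t_deep:.4f}s")
    print(f"copy_plain: {t_plain:.4f}s")
```

The numbers will depend on the machine and the shape of the data, so treat this as a way to check the claim on your own records rather than as a result.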
Any updates on fixing this, or suggestions for workarounds (maybe some other library that flattens the dictionary)? I found this library, https://pypi.org/project/flatten-dict/, which seems to make things a bit faster than pd.io.json.json_normalize.
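As a possible workaround in the same spirit, here is a minimal pure-Python sketch that flattens nested dictionaries into dot-separated keys. It is not the flatten-dict API and not json_normalize itself; `flatten` and the sample `rows` are hypothetical, and it assumes only nested dicts need flattening (lists and other values are left untouched).

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one dict with `sep`-joined keys.

    Assumption: only dict values are expanded; everything else is kept as-is.
    """
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into the nested dict, carrying the key prefix along.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical sample data.
rows = [
    {"id": 1, "info": {"name": "a", "geo": {"lat": 1.0}}},
    {"id": 2, "info": {"name": "b"}},
]
flat_rows = [flatten(row) for row in rows]
```

The resulting list of flat dicts can then be passed to the pd.DataFrame constructor; keys missing from some rows become NaN in the corresponding columns.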
I'm happy to take a look and potentially make a pull request. I wrote a pure Python implementation here.
brief benchmark:
take
I haven't looked much at the implementation, but I'm guessing simpler cases like this could be optimized.
Output of pd.show_versions()
pandas: 0.19.2
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.8.2
IPython: 5.1.0
sphinx: 1.3.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: None
html5lib: 0.999999999
httplib2: 0.9.2
apiclient: 1.5.3
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: 0.2.1