RecursionError in DataFrame.replace with Out of Bounds Datetime #20380
Can you make your example reproducible? http://stackoverflow.com/help/mcve |
example_df.zip Thanks! Jack |
Also here is the output of pd.show_versions(): INSTALLED VERSIONS, commit: None, pandas: 0.22.0 |
Are you able to do some digging to see what's going on? I suspect none of
the maintainers will have time.
If not, perhaps post a traceback and someone else can take a look if they
hit the same issue.
…On Fri, Mar 16, 2018 at 2:21 PM, JackKZ ***@***.***> wrote:
Also here is the output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None
pandas: 0.22.0
pytest: 3.3.2
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
|
Just to chime in on this with more info, I'm getting a RecursionError with the following code:

    import pandas as pd
    from io import BytesIO
    import zipfile
    import requests

    # Read the zipped file from this issue thread.
    res = requests.get("https://github.com/pandas-dev/pandas/files/1820497/example_df.zip")

    # Read the pkl file via pandas (pkl contains a data frame).
    with zipfile.ZipFile(BytesIO(res.content)) as z:
        pkl_file = z.namelist()[0]
        with z.open(pkl_file) as pk:
            df = pd.read_pickle(pk)

    # Simple call to replace.
    df = df.replace("blah", "NA")

Here's the traceback:
I tried isolating the issue to a specific column:

    try:
        df = df.replace("blah", "NA")
    except RecursionError:
        # Retry one column at a time to find the one that fails.
        for curr_col in df.columns:
            try:
                df[curr_col] = df.loc[:, curr_col].replace("blah", "NA")
                print("success for col %s" % curr_col)
            except RecursionError:
                print("fail for col %s" % curr_col)

This outputs:
So the issue seems to be with col production_date. I collected the types of its values:

    types = set()
    for n in df["production_date"]:
        if type(n) not in types:
            types.add(type(n))

printing
This is as far as I've gotten. Not sure what to do with all this info, but figured I'd share. |
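A more compact way to run the same type check is to count the element types with collections.Counter. This is only a sketch: the list `values` below is a made-up stand-in for df["production_date"], not the data from the attached pickle.

```python
import datetime
from collections import Counter

# Hypothetical stand-in for df["production_date"]: an object column mixing types.
values = [
    datetime.datetime(2017, 12, 20),
    None,
    "unknown",
    datetime.datetime(3017, 12, 20),
]

# Count how many elements of each Python type the column holds.
type_counts = Counter(type(v).__name__ for v in values)
print(type_counts)
```

Counting, rather than only collecting the set of types, also shows how many values of each kind are present, which can help judge whether the odd values are isolated rows.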
Oh and here's my output for
|
@TomAugspurger I believe I've narrowed this down to out-of-bounds datetime values. This works without errors (year 2017):

    import pandas as pd
    import datetime

    df = pd.DataFrame({
        "dt": [datetime.datetime(2017, 12, 20)],
        "str": ["blah"]
    })
    df.replace("blah", "cats")

However this throws a RecursionError (year 3017):

    import pandas as pd
    import datetime

    df = pd.DataFrame({
        "dt": [datetime.datetime(3017, 12, 20)],
        "str": ["blah"]
    })
    df.replace("blah", "cats")

Saving the bad code to file and then calling it from command prompt, I'm able to see the entire traceback. Near the top, this block appears a bunch of times:
Followed by a bunch of calls referencing lines 771 and 2216, and ending with this:
Looking at function check_dts_bounds in |
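For context on why the year matters: pandas datetime64[ns] values are stored as a signed 64-bit count of nanoseconds since 1970-01-01, so the largest representable timestamp falls in the year 2262. The sketch below derives that limit with the standard library only. The names are illustrative, and precision is truncated to microseconds because datetime.datetime cannot hold nanoseconds.

```python
import datetime

# Largest signed 64-bit integer, interpreted as nanoseconds since the epoch.
NS_MAX = 2**63 - 1

# datetime only resolves microseconds, so drop the last three digits.
epoch = datetime.datetime(1970, 1, 1)
upper_bound = epoch + datetime.timedelta(microseconds=NS_MAX // 1000)
print(upper_bound)

# The two dates used in the examples above, relative to that bound.
ok_date = datetime.datetime(2017, 12, 20)
oob_date = datetime.datetime(3017, 12, 20)
print(ok_date <= upper_bound)
print(oob_date <= upper_bound)
```

A date in 3017 lies well past this bound, which is consistent with only the second example failing.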
The recursion is occurring because
There is little sense in trying again if the datetime is out of bounds. This issue can be fixed by adding an except block for I'm inclined to think 'yes'. It's better to fail sooner rather than later. |
Thanks for digging. In @ChrisMuir's small example, the column is object dtype, so I think that's OK. passing |
Updated the original post with the nice example. |
Raised a PR trying to fix this. 😃 By the look of it, convert=False breaks other cases. At the moment I'm using try/except in the cast module; new ideas welcome. |
@JackKZ @ChrisMuir would you like an error to be raised (for the out-of-bounds datetime) when encountering the above example, or should it be ignored silently and the replace carried out anyhow? |
Well, personally I'd prefer that |
In the example given, and perhaps in most use cases, the OOB datetime is irrelevant. Of course that datetime could lead to other problems, but as far as replace is concerned it shouldn't matter. In my opinion, it would only make sense to raise the exception when manipulating the datetime. |
Looks like this was closed by #22108 |
RecursionError in DataFrame.replace
Leads to lots of
(thanks to @ChrisMuir for the example)