Skip to content

REF: make BaseOffset a cdef class #34167

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
May 18, 2020

Conversation

jbrockmendel
Copy link
Member

@jreback i could use help here with one last stubborn pickle/pytables test test_read_py2_hdf_file_in_py3

Within this legacy file we have the substring (sub-bytes?)

blob = b"ccopy_reg\n_reconstructor\np1\n(cpandas.tseries.offsets\nBusinessDay\np2\nc__builtin__\nobject\np3\nNtRp4\n(dp5\nS'normalize'\np6\nI00\nsS'n'\nI1\nsS'kwds'\np7\n(dp8\nsS'offset'\np9\ncdatetime\ntimedelta\np10\n(I0\nI0\nI0\ntRp11\nsb."

AFAICT it is doing pickle.loads(blob) and getting:

>>> pickle.loads(blob)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/copyreg.py", line 43, in _reconstructor
    obj = object.__new__(cls)
TypeError: object.__new__(BusinessDay) is not safe, use BusinessDay.__new__()

Then it is falling back and for reasons not clear to me is calling to_offset(blob) which raises ValueError which is what pytest shows us.

IIUC there is a way to register functions for unpickling, but none of the functions in pd.compat.pickle_compat are getting called in this process, so I'm at a loss for how to do this. Any ideas?

@jreback
Copy link
Contributor

jreback commented May 13, 2020

you can figure out what you need to patch by monkey patching the unpickler itself

and stepping thru

it’s going to be one of the dist instructions i think (i did this a number of years ago so don’t remember much more)

@jbrockmendel
Copy link
Member Author

you can figure out what you need to patch by monkey patching the unpickler itself

Very tentative, but it looks like we can make a context to temporarily patch pickle to use the Unpickler in compat.pickle_compat during calls to HDFStore.get

@jbrockmendel
Copy link
Member Author

Booyah green

@@ -620,6 +628,8 @@ class _BaseOffset:
TimeStamp
Rolled timestamp if not on offset, otherwise unchanged timestamp.
"""
# TODO: try to avoid runtime/circular import
from pandas import Timestamp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you not cimport this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need Timestamp (not cimport-able), not _Timestamp (cimport-able) here. We could re-instate a as_timestamp func to only do the runtime import in one place, and after a (fast) isinstance check.

I asked on the cython mailing list yesterday about handling of circular cimports. It seems pretty hit-and-miss.

if group is None:
raise KeyError(f"No object named {key} in the file")
return self._read_group(group)
with patch_pickle():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems pretty invasive to do this here. can we not do this in the pickle_compat.py itself?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the unpickling is done with a call to pickle.loads from within pytables (not our pytables code, but pytables' own code). the effect of this patching here is to make it use our pickle_compat code instead of the stdlib version. if there's a less invasive way to do this thatd be great, but i haven't found it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok i understand, can you add some comments here to that effect

@jbrockmendel jbrockmendel changed the title WIP/REF: make BaseOffset a cdef class REF: make BaseOffset a cdef class May 14, 2020
@jreback jreback added the Frequency DateOffsets label May 15, 2020
@jreback
Copy link
Contributor

jreback commented May 15, 2020

ok will have another look, pls rebase

@jbrockmendel
Copy link
Member Author

updated; npdev failure is unrelated AFAICT

@jbrockmendel
Copy link
Member Author

rebased+green

@@ -111,6 +111,14 @@ def as_datetime(obj: datetime) -> datetime:
return obj


cdef ABCTimestamp as_timestamp(obj):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we removed this previously? what is the purpose of this now? to centrailize the import?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to centrailize the import?

Yes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would only have 2 times where you actually need to import Timestamp (and you don't even need to check if its ABC); i would just do that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated+green

if group is None:
raise KeyError(f"No object named {key} in the file")
return self._read_group(group)
with patch_pickle():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok i understand, can you add some comments here to that effect

@jbrockmendel
Copy link
Member Author

ok i understand, can you add some comments here to that effect

updated+green

@jreback jreback added this to the 1.1 milestone May 18, 2020
@jreback jreback merged commit 1ce9f0c into pandas-dev:master May 18, 2020
@jreback
Copy link
Contributor

jreback commented May 18, 2020

very nice

@jbrockmendel jbrockmendel deleted the ref-liboffsets-cdefit branch May 18, 2020 18:04
TomAugspurger added a commit to TomAugspurger/dask that referenced this pull request Jun 16, 2020
* idxmaxmin: pandas-dev/pandas#32579
* upstream fix
* first_last: pandas-dev/pandas#34167
* test_dropna: unclear
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Frequency DateOffsets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants