-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Bug Ensure cache_readonly is only applied on instance #40329
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/tests/libs/test_lib.py
Outdated
assert instanceA.expensive_property == "A" | ||
assert instanceA.counter == 1 | ||
|
||
assert instanceB.expensive_property == "B" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on master this returns A
b3255dd
to
1e41bac
Compare
1e0e6da
to
7ccc9ca
Compare
doc/source/whatsnew/v1.3.0.rst
Outdated
@@ -601,6 +601,7 @@ ExtensionArray | |||
|
|||
- Bug in :meth:`DataFrame.where` when ``other`` is a :class:`Series` with :class:`ExtensionArray` dtype (:issue:`38729`) | |||
- Fixed bug where :meth:`Series.idxmax`, :meth:`Series.idxmin` and ``argmax/min`` fail when the underlying data is :class:`ExtensionArray` (:issue:`32749`, :issue:`33719`, :issue:`36566`) | |||
- Bug where some properties of subclasses of :class:``PandasExtensionDtype`` where improperly cached :issue:`40329` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where -> were
parens around the issue tag
single-ticks around PandasExtensionDtype
Wouldn't it be simpler to just not instantiate _cache on the class? |
We don't need a cache per instance not per class. |
fb5661f
to
d0fbc0b
Compare
I believe renaming the property cache is not the best idea since this is used all over the place. The actual problem is that these extension type caches pandas/pandas/core/dtypes/dtypes.py Line 89 in 7b5957f
are class attributes but share the same name as the commonly used instance attribute and I do not know how best to distinguish them. I would propose to rename the dtypes cache but I'm not sure if this is perceived as a breaking change or not |
It would not. Go for it. |
Why not go the other direction and change PandasExtensionDtype._cache to e.g. PandasExtensionDtype._instance_cache |
d0fbc0b
to
628a509
Compare
Sorry for not responding. I was out of office for a while. I added a runtime check to the property cache implementation which is supposed to ensure that we're not accessing any class wide state. I'm also ensuring that we're actually using the cache for ExtensionTypes and raise an appropriate exception if we do not. This led me down the serialization implementation of the |
628a509
to
abe7603
Compare
pandas/_libs/tslibs/offsets.pyx
Outdated
@@ -895,13 +895,6 @@ cdef class Tick(SingleConstructorOffset): | |||
|
|||
raise ApplyTypeError(f"Unhandled type: {type(other).__name__}") | |||
|
|||
# -------------------------------------------------------------------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what were these hitting? or were they at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I honestly do not know if these were used or not. I ran into the issue that a lot of the BaseOffset
children were not properly initialized. I ended up initializing _cache
in every class but noticed they should all do fine using the __reduce__
in
pandas/pandas/_libs/tslibs/offsets.pyx
Lines 728 to 737 in ce34c1c
def __reduce__(self): | |
# This __reduce__ implementation is for all BaseOffset subclasses | |
# except for RelativeDeltaOffset | |
# np.busdaycalendar objects do not pickle nicely, but we can reconstruct | |
# from attributes that do get pickled. | |
tup = tuple( | |
getattr(self, attr) if attr != "calendar" else None | |
for attr in self._attributes | |
) | |
return type(self), tup |
They are all tested in
pandas/pandas/tests/io/test_pickle.py
Lines 170 to 177 in ce34c1c
def test_pickles(current_pickle_data, legacy_pickle): | |
if not is_platform_little_endian(): | |
pytest.skip("known failure on non-little endian") | |
version = os.path.basename(os.path.dirname(legacy_pickle)) | |
with catch_warnings(record=True): | |
simplefilter("ignore") | |
compare(current_pickle_data, legacy_pickle, version) |
with the data of
def create_data(): |
pandas/_libs/properties.pyx
Outdated
@@ -23,12 +23,29 @@ cdef class CachedProperty: | |||
return self | |||
|
|||
# Get the cache or set a default one if needed | |||
cache = getattr(obj, '_cache', None) | |||
cache = instance_cache = getattr(obj, '_cache', None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is any of this necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't really know how to test this and implemented this runtime check. I think it's not absolutely necessary if one follows the convention properly but I believe it is very easy to make a mistake here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def test_no_cache_attribute_on_class():
for cls in list_of_all_relevant_classes_somehow():
assert not hasattr(cls, "_cache")
pandas/core/dtypes/dtypes.py
Outdated
@@ -86,7 +86,7 @@ class PandasExtensionDtype(ExtensionDtype): | |||
base: Optional[DtypeObj] = None | |||
isbuiltin = 0 | |||
isnative = 0 | |||
_cache: Dict[str_type, PandasExtensionDtype] = {} | |||
_cache_dtypes: Dict[str_type, PandasExtensionDtype] = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the changes in this file and in pandas/tests/dtypes/test_dtypes.py make sense, the rest i think are unnecessary and in the case of cache_readonly harmful
I get that the complexity in |
b46e5ee
to
3cd66a2
Compare
I reverted the changes in the cython code base. What's left is the renaming of the |
@fjetter looks fine, can you merge master and ping on green |
3cd66a2
to
cf30dc7
Compare
This was closed by #40193 |
The property cache was using the attribute
_cache
which occasionally is used by other classes as a class attribute. Most prominently this is true for all subclasses ofPandasExtensionDtype
, seepandas/pandas/core/dtypes/dtypes.py
Line 89 in ce082fb
This would cause the same cash to be used for the property caching as for the extension dtype. The dtype cache is a class attribute whereas the property cache is supposed to be instance only. I fixed this by using a longer name for the property caching attribute, hoping this would lead to less collisions.
Introducing a new attribute leads to failures for Cython extension types which previously were handled with an
AttributeError
. Firstly, I believe these cases should not silently pass and secondly the previous behaviour was wrong by not executing the function w/out cache but it returned the instance of theCachedProperty
class instead. I addressed this but enabling the caching for extension classes requires defining the attribute explicitly. I went ahead and created a mixin class for this but I'm open to other suggestions as well.