Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
BUG: hashing datetime64 objects #50960
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: hashing datetime64 objects #50960
Changes from 28 commits
7761ecd
610b0c6
0e140ae
919383c
ae8c0bb
92a39eb
2f67805
0635f86
229ab72
6e96805
24fda82
058b666
7398991
3fdf564
6e4836e
f55337a
a97dfc9
1338ca2
9fb1987
818682c
74ab540
037ba05
7a4b1ab
47e5247
dcd09dd
32d479b
d47cfd8
c091317
1ce791e
704fb69
6d962b0
f838953
0d8500a
58a29b6
95069e0
7362f3e
dd08670
998a4cc
b75730b
5c57a5e
6b4460f
c94609b
afe9493
4fecc97
b4cc41e
c620339
3633653
c55f182
6e2bbf0
23c2826
143b3a3
ffb8365
1fdfd64
5513721
875d6af
a29a56a
40e6e17
15a701c
af25f40
9d5cb46
d5a031d
bd7d432
394d86e
1766bc3
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, missed that you removed this here. If we duplicate the code, maybe just don't change it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can move this to the capsule. It would fit much more naturally there than the khash header
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The capsule also already does the PyDateTime_IMPORT, which I think will help your remaining issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to move this out of the function body? Maybe it can go into the global hashtable code instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that causes a complaint at build time. there is another use of PyDateTime_IMPORT in the json code that is also runtime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That does not seem clean to be honest. You are effectively doing a lookup
datetime.datetime_CAPI
and unpacking the capsule.This being in a header makes things a bit tricky maybe. The import can be on the top level of the cython file, but not of a C-file and thus even less of a header. If this is only used in Cython, then you could just put it there (if it gets used without the import you get a segfault, which should be relatively clean to read in gdb/lldb, but of course not super easy, maybe a comment on the
PyDateTime_FromDateAndTime
so someone who finds a crash there knows why?).If this was in a C file, you would need to add a
datetime_init()
function and make sure it gets called on import.Long story short: This is used once, so probably move
PyDateTime_IMPORT
to cython top-level.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So does this not work to move to the global hashtable code in the cython file? Also agree with @seberg would like to avoid using the macro here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ill defer to you guys on this, but im uncomfortable with having this c file have an implicit dependency on the higher-level cython file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note there's also a coment in np_datetime.c saying it would be helpful to get PyDateTime_IMPORT to work in that file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the implicit dependency is not nice, but I think that is just something to live with. You need some initialization, which needs to be a function call and that cannot happen in a header file. I would just say: If you forget, you get a clean segfault pointing at the first use of Datetime code in gdb/lldb. We put a comment there explaining things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some (hopefully helpful) context - the header file never gets "executed" in any sense, so just putting the macro in a header file doesn't by itself just run the code you've inserted.
The entry point for C extensions is a module initialization function like you see in the Python documentation - this gets executed when the Python interpreter loads the extension:
https://docs.python.org/3/extending/extending.html#the-module-s-method-table-and-initialization-function
I am assuming that the global namespace of the Cython file is akin to the module initialization; if that is true then it would be natural for the import to go there rather than the header.
With that said the workaround suggested here is fine, but I don't think the dependency problem you were worried about is too big of a deal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is decent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ops had, added an error check, although
PyDateTime_IMPORT
probably can't fail here anyway (I am sure by this time datetime had been imported somewhere).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pushed commit with this (but not the NULL check) and it failed on the CI, so reverted it for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had misspelled
PyDateTimeAPI
, with a lower T instead of capital one.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have these same defines in
khash_python.h
already. Not sure about duplicating here versus refactoringThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yah I moved some of it rather than duplicating, likely could go further down that path. I'd like to get this in soonish so prefer to do bigger refactors separately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think with C its generally really hard to track down the effects of copy/pasting defines across headers and implementations though. Maybe we just create a
cpython_hash.h
file that khash and np can include?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving it into its own file seems nice probably. I wouldn't care too much, since this is only a problem if you would include the other header here (even then it might not be since it matches).
On a general note, it might be good to add include guards to your headers:
(some pattern there, google style suggests full paths and at least a partial path would make sense I think)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The static inline should be the same as pandas inline.
inline
used to not exist, but you can now rely on it. So basicallyPANDAS_INLINE
can be blanket replaced withstatic inline
.However, you can of course also just use
static
orPANDAS_INLINE
here. You should add one of these (i.e.static
should be there) to ensure that this is private to this file.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other PRs I've asked us to move away from specifying inline explicitly instead allowing the compiler to choose for us. Is that the wrong approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC its a compiler hint mainly but I am not sure where it matters in many places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea on second read this might be a place where it actually could inline. Thought it was using the Python runtime at first but I see it doesn't now, so it might have a chance
gcc has
-Winline
to tell you when the hint is ignored would be interesting to see on thisThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could bunch everything below second (or even minute or more?) into a single 64bit number (then unfortunately the same trick to split it up if necessary on the platform).
Not sure that is worthwhile, probably a tiny bit faster, but I am mainly wondering if it might generalize a bit nicer if we use a simpler scheme that doesn't require the full datetime struct in principle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to make this public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the point of the capsule is you don't specify
sources
any more. Since you are adding a new function here, you'll want to add it to the datetime capsule itselfThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m still struggling here with builds failures I cant reproduce. Any other thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't add any new sources arguments to setup.py - this is what causes undefined symbols during parallel compilation with setuptools
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted that and now im getting a different failure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I think you also need to move the implementation out of the header file into the capsule to resolve that