Skip to content

COMPAT: Adapt to Numpy 2.0 dtype changes #57774

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

lithomas1
Copy link
Member

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

@lithomas1 lithomas1 added this to the 2.2.2 milestone Mar 7, 2024
@lithomas1 lithomas1 added the Compat pandas objects compatability with Numpy or Python functions label Mar 7, 2024
@lithomas1 lithomas1 marked this pull request as ready for review March 8, 2024 01:47
@lithomas1 lithomas1 requested a review from WillAyd as a code owner March 8, 2024 01:47
@lithomas1
Copy link
Member Author

cc @seberg for any objections to the approach here.

(I hope we aren't abusing numpy internals here).

@mattip
Copy link
Contributor

mattip commented Mar 8, 2024

I think the migration guide suggests using #include <numpy/npy_2_compat.h> and then the code change would use this macro, which includes any legacy/new dtype mangling.

- return (((PyArray_DatetimeDTypeMetaData *)dtype->c_metadata)->meta);
+ return PyDataType_C_METADATA(dtype);

But what you have should also work.

@seberg
Copy link
Contributor

seberg commented Mar 8, 2024

This is OK. I think slightly nicer would be to define #define PyDataType_C_METADATA(descr) ((descr)->c_metadata) instead and use that instead.
(That way you also don't need to vendor npy2_compat.h which is a bit much for just this purpose.

I thought there would be an elsize use in the json reader, but nice if not!

@mattip
Copy link
Contributor

mattip commented Mar 8, 2024

I think slightly nicer would be to define ...

so the complete diff would be like this?

- return (((PyArray_DatetimeDTypeMetaData *)dtype->c_metadata)->meta);
+ #if NPY_ABI_VERSION < 0x02000000
+     #define PyDataType_C_METADATA(descr) ((descr)->c_metadata)
+ #endif
+ return (PyDataType_C_METADATA(descr)->meta);

@lithomas1
Copy link
Member Author

Huh, PyDataType_C_METADATA doesn't seem to exist in the ndarraytypes.h include file.
(I think that's why I copied the legacydtype thing from numpy's datetime.c file yesterday).

Does it need to be exposed on the numpy side, or am I just missing something obvious?

@lithomas1
Copy link
Member Author

Maybe its something to do with the static inline declaration?

@seberg
Copy link
Contributor

seberg commented Mar 8, 2024

Hummmm, weird. Any chance there is some caching involved in CI and it is using an older NumPy wheel?

@jorisvandenbossche
Copy link
Member

I had the same issue locally earlier today, and that was not because of caching I think (I tried with removing my pandas build directory multiple times)

@lithomas1
Copy link
Member Author

lithomas1 commented Mar 8, 2024

No, I have it locally too

Here's what the header looks like (ndarraytypes.h)

#define PyDataType_ISLEGACY(dtype) ((dtype)->type_num < NPY_VSTRING && ((dtype)->type_num >= 0))
#define PyDataType_ISBOOL(obj) PyTypeNum_ISBOOL(((PyArray_Descr*)(obj))->type_num)
#define PyDataType_ISUNSIGNED(obj) PyTypeNum_ISUNSIGNED(((PyArray_Descr*)(obj))->type_num)
#define PyDataType_ISSIGNED(obj) PyTypeNum_ISSIGNED(((PyArray_Descr*)(obj))->type_num)
#define PyDataType_ISINTEGER(obj) PyTypeNum_ISINTEGER(((PyArray_Descr*)(obj))->type_num )
#define PyDataType_ISFLOAT(obj) PyTypeNum_ISFLOAT(((PyArray_Descr*)(obj))->type_num)
#define PyDataType_ISNUMBER(obj) PyTypeNum_ISNUMBER(((PyArray_Descr*)(obj))->type_num)
#define PyDataType_ISSTRING(obj) PyTypeNum_ISSTRING(((PyArray_Descr*)(obj))->type_num)
#define PyDataType_ISCOMPLEX(obj) PyTypeNum_ISCOMPLEX(((PyArray_Descr*)(obj))->type_num)
#define PyDataType_ISFLEXIBLE(obj) PyTypeNum_ISFLEXIBLE(((PyArray_Descr*)(obj))->type_num)
#define PyDataType_ISDATETIME(obj) PyTypeNum_ISDATETIME(((PyArray_Descr*)(obj))->type_num)
#define PyDataType_ISUSERDEF(obj) PyTypeNum_ISUSERDEF(((PyArray_Descr*)(obj))->type_num)
#define PyDataType_ISEXTENDED(obj) PyTypeNum_ISEXTENDED(((PyArray_Descr*)(obj))->type_num)
#define PyDataType_ISOBJECT(obj) PyTypeNum_ISOBJECT(((PyArray_Descr*)(obj))->type_num)
#define PyDataType_MAKEUNSIZED(dtype) ((dtype)->elsize = 0)
/*
 * PyDataType_* FLAGS, FLACHK, REFCHK, HASFIELDS, HASSUBARRAY, UNSIZED,
 * SUBARRAY, NAMES, FIELDS, C_METADATA, and METADATA require version specific
 * lookup and are defined in npy_2_compat.h.
 */


#define PyArray_ISBOOL(obj) PyTypeNum_ISBOOL(PyArray_TYPE(obj))
#define PyArray_ISUNSIGNED(obj) PyTypeNum_ISUNSIGNED(PyArray_TYPE(obj))
#define PyArray_ISSIGNED(obj) PyTypeNum_ISSIGNED(PyArray_TYPE(obj))
#define PyArray_ISINTEGER(obj) PyTypeNum_ISINTEGER(PyArray_TYPE(obj))

@lithomas1
Copy link
Member Author

lithomas1 commented Mar 8, 2024

The whitespace is where the definition of the static things should be, but they're just gone.

We still have macros like PyDataType_ISLEGACY that you added in numpy/numpy#25802 though.

@jorisvandenbossche
Copy link
Member

And I was certainly using an up to date numpy, because wasn't using a nightly wheel but locally built from latest main. For the changes in pyarrow, I did not run into that error. I was trying to see the difference between both, but couldn't directly see anything.

@seberg
Copy link
Contributor

seberg commented Mar 8, 2024

Sorry I hadn't noticed: the definitions are moved, and you need #include "numpy/ndarrayobject.h" and import the NumPy API. I had added a new import helper to just do that locally as a quick fix if needed, but that requires copying in npy2_compat.h.

I'll have a look at it.

@seberg
Copy link
Contributor

seberg commented Mar 8, 2024

Sorry, I'll need a bit more time to set up the env for NumPy 2, but this diff should be right, I think. The annoying part is that we need to use import_array(), but it seems there is a place for this already here.

diff --git a/pandas/_libs/src/datetime/pd_datetime.c b/pandas/_libs/src/datetime/pd_datetime.c
index 19de51be6e..f778d06607 100644
--- a/pandas/_libs/src/datetime/pd_datetime.c
+++ b/pandas/_libs/src/datetime/pd_datetime.c
@@ -20,6 +20,8 @@ This file is derived from NumPy 1.7. See NUMPY_LICENSE.txt
 #include <Python.h>
 
 #include "datetime.h"
+/* Need to import_array for np_datetime.c (for NumPy 1.x support) */
+#include "numpy/ndarrayobject.h"
 #include "pandas/datetime/pd_datetime.h"
 #include "pandas/portable.h"
 
@@ -255,5 +257,6 @@ static struct PyModuleDef pandas_datetimemodule = {
 
 PyMODINIT_FUNC PyInit_pandas_datetime(void) {
   PyDateTime_IMPORT;
+  import_array();
   return PyModuleDef_Init(&pandas_datetimemodule);
 }
diff --git a/pandas/_libs/src/vendored/numpy/datetime/np_datetime.c b/pandas/_libs/src/vendored/numpy/datetime/np_datetime.c
index 277d01807f..d2853b4936 100644
--- a/pandas/_libs/src/vendored/numpy/datetime/np_datetime.c
+++ b/pandas/_libs/src/vendored/numpy/datetime/np_datetime.c
@@ -16,7 +16,7 @@ This file is derived from NumPy 1.7. See NUMPY_LICENSE.txt
 
 // Licence at LICENSES/NUMPY_LICENSE
 
-#define NO_IMPORT
+#define NO_IMPORT_ARRAY
 
 #ifndef NPY_NO_DEPRECATED_API
 #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
@@ -25,7 +25,7 @@ This file is derived from NumPy 1.7. See NUMPY_LICENSE.txt
 #include <Python.h>
 
 #include "pandas/vendored/numpy/datetime/np_datetime.h"
-#include <numpy/ndarraytypes.h>
+#include <numpy/ndarrayobject.h>
 #include <numpy/npy_common.h>
 
 #if defined(_WIN32)

@lithomas1
Copy link
Member Author

Sorry, I'll need a bit more time to set up the env for NumPy 2, but this diff should be right, I think. The annoying part is that we need to use import_array(), but it seems there is a place for this already here.

Thanks, I'll test this out.

Thinking out loud, maybe a simpler way could just be to use the newly exposed datetime.c functions from here https://github.com/numpy/numpy/pull/21199/files.

Would there be any ABI problems with using numpy C API functions from a new version?

@seberg
Copy link
Contributor

seberg commented Mar 8, 2024

I opened gh-57780 to iterate myself quickly, probably done, but lets see what crops up next...

Since they are proper API Functions, if you use them you cannot support NumPy 1.x unfortunately.

@lithomas1 lithomas1 removed this from the 2.2.2 milestone Mar 8, 2024
@lithomas1
Copy link
Member Author

closing this then.

Thanks for helping debug this!

@lithomas1 lithomas1 closed this Mar 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants