Skip to content

REF: Replace np.find_common_dtype with np.result_type #49569

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

mroeschke
Copy link
Member

In preparation of numpy/numpy#22539

@mroeschke mroeschke added the Refactor Internal refactoring of code label Nov 7, 2022
@seberg
Copy link
Contributor

seberg commented Nov 8, 2022

Hmmmm, the fact that find_common_type bails on datetimes (if there is more than one) and returns object dtype when it doesn't know what to do seems to confuse things? Sounds like I should document the fact that find_common_type falls back to object and that things may be a bit strange if datetime/timedelta are involved.

@mroeschke
Copy link
Member Author

A philosophical question: does np.result_type provide the type after applying an operation (like arithmetic) to multiple different types? I think in pandas, np.find_common_dtype is used to answer "what dtype can store multiple different types"

@mroeschke
Copy link
Member Author

Sounds like I should document the fact that find_common_type falls back to object and that things may be a bit strange if datetime/timedelta are involved.

Yeah I think pandas appreciates that datetime/timedelta with non-datetime/timedelta goes to object in find_common_dtype. I notice result_type will raise a TypeError, but I guess pandas can catch and know it should be object

But I think np.result_type(datetime, timedelta) returning dtype('<M8[ns]') is odd for pandas given the "storage" comment I made above.

In [13]: np.__version__
Out[13]: '1.23.1'

----

In [14]: np.find_common_type((np.dtype("datetime64[ns]"), np.dtype("timedelta64[ns]")), [])
Out[14]: dtype('O')  # from a "storage" perspective, this makes sense

In [15]: np.result_type(*(np.dtype("datetime64[ns]"), np.dtype("timedelta64[ns]")))
Out[15]: dtype('<M8[ns]')  # from datetime +/- timedelta this makes sense

----

In [17]: np.find_common_type((np.dtype("datetime64[ns]"), np.int64), [])
Out[17]: dtype('O')


# I think this TypeError below can be okay if it can always signal to pandas that "object" is needed to store these types
In [18]: np.result_type(*(np.dtype("datetime64[ns]"), np.int64))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [18], in <cell line: 1>()
----> 1 np.result_type(*(np.dtype("datetime64[ns]"), np.int64))

TypeError: The DTypes <class 'numpy.dtype[datetime64]'> and <class 'numpy.dtype[int64]'> do not have a common DType. For example they cannot be stored in a single array unless the dtype is `object`.

----

@seberg
Copy link
Contributor

seberg commented Nov 9, 2022

is odd for pandas given the "storage" comment I made above

It is odd/wrong in NumPy as well. Another problem is timedelta+integers. I wanted to look into that a bit and think about deprecating it (this doesn't help you immediately maybe).

I suspect the reason is to make certain math promotions work easier (i.e. find timedelta + int as a loop that uses timedelta+timedelta), and there should be better solutions for that.

EDIT: I can't really think of an obvious workaround right now. Unless there is only one input, I think a non-numeric dtype return was effectively always object, so:

try:
    common_dtype = np.result_type(*dtypes)
    if common_dtype.kind in "mMSU":
        # NumPy promotion currently misbehaves for for times and strings,
        # so fall back to object (find_common_dtype did unless there was only one dtype)
        common_dtype = np.dtype('O')
except TypeError:
    # Fall back to object on error (could be more specific error eventually)
    common_dtype = np.dtype('O')

might work, but feels also a bit clunky. (The above would allow certain structured dtypes to promote correctly for example). So there might be a point in not deprecating and first cleaning up the bad time promotion (and strings).

@mroeschke mroeschke marked this pull request as draft November 9, 2022 20:15
@mroeschke
Copy link
Member Author

Another problem is timedelta+integers.

From a pandas point of view, object would be the correct "result type" in terms of storing timedelta and ints. I think generally pandas would support np.result_type(datetime/timedelta, anything else) to be a TypeError or object

might work, but feels also a bit clunky.

Yeah we would probably migrate np.find_common_dtype to that code snippet if you went forward with the deprecation. It's good to see that result_type mostly matches `find_common_dtype though.

@seberg
Copy link
Contributor

seberg commented Nov 9, 2022

Yeah, looking into fixing the timedelta/datetime promotion rules (i.e. its nonsense that they promote with bool/integers or with each other).
But it is probably a bit bigger and comes with some other fallout (as typical). That might clear up enough, but I am not sure yet.

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Dec 10, 2022
@mroeschke
Copy link
Member Author

Closing this for now since we can't fully use np.result_type everywhere yet

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Refactor Internal refactoring of code Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants