API: Index([NaT, None]) match Series([NaT, None]) #49566
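
As a quick illustration of what the title describes, the following minimal sketch compares the dtype that each constructor infers for the same input. It assumes a pandas version that includes this change (the PR is on the 2.0 milestone). No expected dtype is asserted here; the point is only that the two results agree.

```python
import pandas as pd

# Same input passed to both constructors.
ser = pd.Series([pd.NaT, None])
idx = pd.Index([pd.NaT, None])

# On a pandas version that includes this PR, the two inferred
# dtypes are expected to be the same.
print("Series dtype:", ser.dtype)
print("Index dtype: ", idx.dtype)
print("match:", ser.dtype == idx.dtype)
```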

Merged (2 commits) on Nov 8, 2022

jbrockmendel
Member

mroeschke added the Missing-data, Index, and Constructors labels on Nov 8, 2022
mroeschke added this to the 2.0 milestone on Nov 8, 2022
mroeschke merged commit 2713873 into pandas-dev:main on Nov 8, 2022
@mroeschke
Member

Thanks @jbrockmendel

jbrockmendel deleted the api-maybe_convert_objects branch on November 8, 2022 19:03
phofl pushed a commit to phofl/pandas that referenced this pull request Nov 9, 2022
* API: Index([NaT, None]) match Series([NaT, None])

* mypy fixup
@phofl
Member

phofl commented Dec 16, 2022

It looks like this caused a slowdown in explode:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.006    0.006 {built-in method builtins.exec}
        1    0.000    0.000    0.006    0.006 <string>:1(<module>)
        1    0.000    0.000    0.006    0.006 series.py:4122(explode)
        1    0.000    0.000    0.005    0.005 series.py:342(__init__)
        1    0.000    0.000    0.005    0.005 construction.py:497(sanitize_array)
        1    0.000    0.000    0.005    0.005 cast.py:1104(maybe_infer_to_datetimelike)
        1    0.005    0.005    0.005    0.005 {pandas._libs.lib.maybe_convert_objects}
        1    0.001    0.001    0.001    0.001 {pandas._libs.reshape.explode}
        1    0.000    0.000    0.000    0.000 base.py:1137(repeat)
        1    0.000    0.000    0.000    0.000 {method 'repeat' of 'numpy.ndarray' objects}
        1    0.000    0.000    0.000    0.000 _methods.py:46(_sum)
        1    0.000    0.000    0.000    0.000 {method 'reduce' of 'numpy.ufunc' objects}

https://asv-runner.github.io/asv-collection/pandas/#reshape.Explode.time_explode?p-n_rows=10000&p-max_list_length=10
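
A profile like the one above can be collected with something along these lines. This is a minimal sketch, not the exact script used here: the data setup (a Series of variable-length integer arrays) is an assumption modeled on the benchmark parameters in the link (`n_rows=10000`, `max_list_length=10`), and the variable names are illustrative.

```python
import cProfile
import pstats

import numpy as np
import pandas as pd

# Assumed setup: one variable-length integer array per row, sized to
# match the benchmark parameters n_rows=10000, max_list_length=10.
n_rows, max_list_length = 10_000, 10
rng = np.random.default_rng(0)
data = [np.arange(rng.integers(max_list_length)) for _ in range(n_rows)]
ser = pd.Series(data)

# Profile only the explode call itself.
profiler = cProfile.Profile()
profiler.enable()
exploded = ser.explode()
profiler.disable()

# Sort by cumulative time, as in the listing above, and show the top entries.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(12)
```

Running the same script on the commits before and after a change gives directly comparable listings.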

@jbrockmendel
Member Author

Any idea what the profile output looked like before this?

@phofl
Member

phofl commented Dec 16, 2022

No, but can have a look tomorrow

@phofl
Member

phofl commented Dec 17, 2022

This is how it looks before:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.001    0.001 {built-in method builtins.exec}
        1    0.000    0.000    0.001    0.001 <string>:1(<module>)
        1    0.000    0.000    0.001    0.001 series.py:4169(explode)
        1    0.001    0.001    0.001    0.001 {pandas._libs.reshape.explode}
        1    0.000    0.000    0.000    0.000 series.py:344(__init__)
        1    0.000    0.000    0.000    0.000 base.py:1176(repeat)
        1    0.000    0.000    0.000    0.000 construction.py:497(sanitize_array)
        1    0.000    0.000    0.000    0.000 managers.py:1909(from_array)
        1    0.000    0.000    0.000    0.000 {method 'repeat' of 'numpy.ndarray' objects}
        1    0.000    0.000    0.000    0.000 _methods.py:46(_sum)
        1    0.000    0.000    0.000    0.000 {method 'reduce' of 'numpy.ufunc' objects}
        1    0.000    0.000    0.000    0.000 function.py:59(__call__)
        2    0.000    0.000    0.000    0.000 config.py:262(__call__)
        1    0.000    0.000    0.000    0.000 common.py:157(is_object_dtype)
        2    0.000    0.000    0.000    0.000 config.py:134(_get_option)
    24/17    0.000    0.000    0.000    0.000 {built-in method builtins.len}
        1    0.000    0.000    0.000    0.000 _validators.py:168(validate_args_and_kwargs)
        1    0.000    0.000    0.000    0.000 common.py:1486(_is_dtype_type)
       34    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}

And this is after the commit from this PR:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.006    0.006 {built-in method builtins.exec}
        1    0.000    0.000    0.006    0.006 <string>:1(<module>)
        1    0.000    0.000    0.006    0.006 series.py:4169(explode)
        1    0.000    0.000    0.005    0.005 series.py:344(__init__)
        1    0.000    0.000    0.005    0.005 construction.py:497(sanitize_array)
        1    0.000    0.000    0.005    0.005 construction.py:755(_try_cast)
        1    0.000    0.000    0.005    0.005 cast.py:1174(maybe_infer_to_datetimelike)
        1    0.005    0.005    0.005    0.005 {pandas._libs.lib.maybe_convert_objects}
        1    0.001    0.001    0.001    0.001 {pandas._libs.reshape.explode}
        1    0.000    0.000    0.000    0.000 base.py:1176(repeat)
        1    0.000    0.000    0.000    0.000 _methods.py:46(_sum)
        1    0.000    0.000    0.000    0.000 {method 'reduce' of 'numpy.ufunc' objects}
        1    0.000    0.000    0.000    0.000 managers.py:1909(from_array)
        1    0.000    0.000    0.000    0.000 function.py:59(__call__)
        1    0.000    0.000    0.000    0.000 _validators.py:168(validate_args_and_kwargs)
        1    0.000    0.000    0.000    0.000 {method 'repeat' of 'numpy.ndarray' objects}
        1    0.000    0.000    0.000    0.000 common.py:157(is_object_dtype)
        1    0.000    0.000    0.000    0.000 numeric.py:289(full)
        1    0.000    0.000    0.000    0.000 base.py:4885(_values)
        2    0.000    0.000    0.000    0.000 config.py:262(__call__)
        1    0.000    0.000    0.000    0.000 common.py:1486(_is_dtype_type)
       34    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
        2    0.000    0.000    0.000    0.000 config.py:134(_get_option)
        1    0.000    0.000    0.000    0.000 range.py:192(_data)
    24/17    0.000    0.000    0.000    0.000 {built-in method builtins.len}

@jbrockmendel
Member Author

So it looks like the extra time is all in maybe_convert_objects. We could add a convert_numeric=False flag to allow some short-circuiting in there.
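
To make the short-circuiting idea concrete, here is a toy, pure-Python sketch. It is not pandas' implementation: `infer_kind` and its return values are hypothetical, and only the flag name `convert_numeric` is taken from the comment above. The assumption is that a caller interested only in datetime-like inference can stop scanning as soon as it sees a numeric value, since the result can then only be "object".

```python
from datetime import datetime, timedelta


def infer_kind(values, convert_numeric=True):
    """Hypothetical stand-in for an inference scan (not pandas code).

    Returns (kind, n_inspected), where kind is "datetime", "numeric"
    or "object" and n_inspected counts the elements examined.
    """
    seen_datetime = False
    seen_numeric = False
    inspected = 0
    for value in values:
        inspected += 1
        if value is None:
            # Missing values are compatible with any kind.
            continue
        if isinstance(value, (datetime, timedelta)):
            seen_datetime = True
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            seen_numeric = True
            if not convert_numeric:
                # The caller does not want numeric results, so the
                # answer can only be "object": stop scanning here.
                return "object", inspected
        else:
            # Anything else always forces "object".
            return "object", inspected
        if seen_datetime and seen_numeric:
            # Mixed datetime-like and numeric values: "object".
            return "object", inspected
    if seen_datetime:
        return "datetime", inspected
    if seen_numeric:
        return "numeric", inspected
    return "object", inspected


numbers = [1, 2, 3] * 1000
print(infer_kind(numbers))                         # scans every element
print(infer_kind(numbers, convert_numeric=False))  # stops at the first one
```

For mostly numeric input, such as the exploded values in the benchmark, the flag turns a full pass over the data into a check of the first element, which is the kind of saving the comment is after.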

Development

Successfully merging this pull request may close these issues.

API: Series([pd.NaT, None]) vs Index([pd.NaT, None])