Skip to content

Refactor DatetimeArray._generate_range #24562

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
TomAugspurger opened this issue Jan 2, 2019 · 3 comments · Fixed by #45293
Closed

Refactor DatetimeArray._generate_range #24562

TomAugspurger opened this issue Jan 2, 2019 · 3 comments · Fixed by #45293
Labels
Datetime Datetime data dtype Performance Memory or execution speed performance Refactor Internal refactoring of code
Milestone

Comments

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jan 2, 2019

Currently, DatetimeArray._generate_range calls DatetimeArray._simple_new 4 times, 3 times with i8 values and ones with M8[ns] values.

diff --git a/pandas/core/arrays/datetimes.py b/pandas/core/arrays/datetimes.py
index 8b0565a36..a7f8e303a 100644
--- a/pandas/core/arrays/datetimes.py
+++ b/pandas/core/arrays/datetimes.py
@@ -213,12 +213,16 @@ class DatetimeArrayMixin(dtl.DatetimeLikeArrayMixin,
     _dtype = None  # type: Union[np.dtype, DatetimeTZDtype]
     _freq = None
 
+    i = 0
+
     @classmethod
     def _simple_new(cls, values, freq=None, tz=None):
         """
         we require the we have a dtype compat for the values
         if we are passed a non-dtype compat, then coerce using the constructor
         """
+        cls.i += 1
+        print(f"DTA._simple_new: {cls.i}")
         assert isinstance(values, np.ndarray), type(values)
         if values.dtype == 'i8':
             # for compat with datetime/timedelta/period shared methods,
diff --git a/pandas/core/indexes/datetimes.py b/pandas/core/indexes/datetimes.py
index 690a3db28..ab08bbf6f 100644
--- a/pandas/core/indexes/datetimes.py
+++ b/pandas/core/indexes/datetimes.py
@@ -273,6 +273,7 @@ class DatetimeIndex(DatetimeIndexOpsMixin, Int64Index, DatetimeDelegateMixin):
                 dayfirst=False, yearfirst=False, dtype=None,
                 copy=False, name=None, verify_integrity=None):
 
+        print("hi")
         if verify_integrity is not None:
             warnings.warn("The 'verify_integrity' argument is deprecated, "
                           "will be removed in a future version.",
In [2]: idx = pd.date_range('2014-01-02', '2014-04-30', freq='M', tz='UTC')
DTA._simple_new: 1
DTA._simple_new: 2
DTA._simple_new: 3
DTA._simple_new: 4

I'm not familiar with this code, but I would naively hope for a function that

  1. Extracts there correct freq / tz from all the arguments (start, end, etc.)
  2. Generates the correct i8 values for start, end, tz
  3. Wraps those i8 values in a DatetimeArray._simple_new at the end.

I'm investigating if this can be done.

I'm not sure if this applies to timedelta as well.

@TomAugspurger TomAugspurger added Datetime Datetime data dtype Performance Memory or execution speed performance Timedelta Timedelta data type Refactor Internal refactoring of code labels Jan 2, 2019
@jbrockmendel
Copy link
Member

Is this actually a perf issue?

@TomAugspurger
Copy link
Contributor Author

I'd say so, especially if we implement frequency validation.

@jbrockmendel
Copy link
Member

I'd say so, especially if we implement frequency validation.

The OP refers to _simple_new. I know in the new status quo simple_new just calls __init__, but if it behaved like the old simple_new it wouldn't need to do validation.

None of which affects the main point that range generation can be improved by a refactor.

@mroeschke mroeschke removed the Timedelta Timedelta data type label Jun 25, 2021
@jreback jreback added this to the 1.5 milestone Jan 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Performance Memory or execution speed performance Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants