QST:Time index problem in version 1.1.0 #35819

maozi07 · 2020-08-20T09:46:00Z

I have searched the [pandas] tag on StackOverflow for similar questions.
I have asked my usage related question on StackOverflow.

Question about pandas

I'm using pandas 1.1.0 on macOS with Python 3.7.
As shown below, something odd happens: when I slice from a date later than 2020/03/01 an error is raised, but if the date is in 2019 everything works.
I've tried generating some random data, but couldn't reproduce the problem.
I don't know whether something is wrong with my data or whether this is a pandas bug in version 1.1.0 (it works in version 1.0.1).
I would very much appreciate any help.

pd.__version__
'1.1.0'
df.index
DatetimeIndex(['2020-07-02 08:55:59', '2020-07-03 07:59:32',
               '2020-07-06 06:57:38', '2020-07-11 09:25:35',
               '2020-07-09 10:02:25', '2020-07-13 07:10:12',
               '2020-07-16 07:58:52', '2020-07-17 10:46:43',
               '2020-07-18 08:00:36', '2020-07-22 11:47:43',
               ...
               '2020-08-01 07:58:25', '2020-08-01 07:59:05',
               '2020-08-01 07:58:50', '2020-08-01 07:57:50',
               '2020-07-29 02:50:06', '2020-08-01 07:58:20',
               '2020-08-01 07:58:30', '2020-08-01 07:58:21',
               '2020-08-01 07:59:08', '2020-08-01 07:58:53'],
              dtype='datetime64[ns]', name='sub_time', length=4715513, freq=None)

df['20200701':]
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-54-0be901d6256c> in <module>
----> 1 df['20200701':]

/opt/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2880             # either we have a slice or we have a string that can be converted
   2881             #  to a slice for partial-string date indexing
-> 2882             return self._slice(indexer, axis=0)
   2883 
   2884         # Do we have a (boolean) DataFrame?

/opt/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in _slice(self, slobj, axis)
   3546         Slicing with this method is *always* positional.
   3547         """
-> 3548         assert isinstance(slobj, slice), type(slobj)
   3549         axis = self._get_block_manager_axis(axis)
   3550         result = self._constructor(self._mgr.get_slice(slobj, axis=axis))

AssertionError: <class 'numpy.ndarray'>

df['20190701':]

                                     A         B  ...    C    D
a_time                                                   ...                    
2020-07-02 08:55:59              aa  cc  ...  22  1.2

Liam3851 · 2020-08-27T00:56:54Z

Your index is non-monotonic, see these two rows from your output above:

               '2020-08-01 07:58:50', '2020-08-01 07:57:50',
               '2020-07-29 02:50:06', '2020-08-01 07:58:20',

Since the index is non-monotonic, slicing is going to be ill-defined. I'd suggest you work that out first. If I had to guess, I imagine all your data is from 2020-03-01 to present, and so when you slice from prior to that, everything is greater than your slice bound and so the whole frame gets returned regardless.
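As a minimal sketch of "working that out first", the snippet below checks whether a `DatetimeIndex` is sorted and sorts it before slicing. The three-row series is hypothetical illustration data, not the reporter's frame.

```python
import pandas as pd

# Hypothetical series whose DatetimeIndex is out of order.
ser = pd.Series(
    [1, 2, 3],
    index=pd.DatetimeIndex(
        ["2020-07-03 00:01", "2020-07-01 00:01", "2020-07-02 00:01"]
    ),
)

# False here: the timestamps are not in ascending order.
is_sorted = ser.index.is_monotonic_increasing

# Sorting the index makes label-based slicing well-defined.
sorted_ser = ser.sort_index()
result = sorted_ser.loc["2020-07-02":]
print(is_sorted)
print(result)
```

After `sort_index()`, the slice keeps only the rows on or after 2020-07-02, in chronological order.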

maozi07 · 2020-08-27T01:12:58Z

Yes, you are right, the index is non-monotonic, and all the data is from 2020-03-01 to the present. But slicing works fine with pandas 1.0.5 and fails with pandas 1.1.0. I've looked through the release notes and found nothing about this. Is this a change in pandas 1.1.0? Thanks for your reply.

Liam3851 · 2020-08-27T13:54:30Z

While it may not have crashed, I don't think the old version would have worked for a reasonable definition of "work", since slicing is ill-defined for non-monotonic indices. That is, it's unclear what you even want from df['20200701':] with a non-monotonic index: do you want all elements positioned after the first element matching '20200701', or do you want just the elements with an index value greater than '20200701'? You will see below that with 1.0.5 you can get either, depending on whether your query element is in the index:

In [1]: ser = pd.Series([1, 2, 3], pd.DatetimeIndex(['2020-07-03 00:01', '2020-07-01 00:01', '2020-07-02 00:01']))

In [2]: ser['2020-07-01 00:01':]
Out[2]:
2020-07-01 00:01:00   2
2020-07-02 00:01:00   3
dtype: int64

In [3]: ser['2020-07-01':]
Out[3]:
2020-07-03 00:01:00   1
2020-07-01 00:01:00   2
2020-07-02 00:01:00   3
dtype: int64

Absent a reproducible example it's hard to say precisely why your code is raising an exception now (perhaps the index also has duplicate entries?), but I think your code was always logically broken under 1.0.5 regardless.
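The two readings of the slice described above can be spelled out explicitly so the result does not depend on how a given pandas version handles non-monotonic indices. This is only a sketch using the same three-row series; the variable names are illustrative.

```python
import pandas as pd

ser = pd.Series(
    [1, 2, 3],
    index=pd.DatetimeIndex(
        ["2020-07-03 00:01", "2020-07-01 00:01", "2020-07-02 00:01"]
    ),
)
bound = pd.Timestamp("2020-07-01 00:01")

# Reading 1: everything positioned at or after the first exact match.
pos = ser.index.get_loc(bound)  # integer position, since the index is unique
by_position = ser.iloc[pos:]

# Reading 2: every row whose timestamp is >= the bound, in original order.
by_value = ser[ser.index >= bound]

print(by_position)
print(by_value)
```

With this data the positional reading returns two rows and the value-based reading returns all three, which is the ambiguity in question.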

maozi07 · 2020-08-28T02:59:20Z

Yes, the index has duplicate entries, because the data set has almost 4 million records. I've checked my code again after converting the index to be monotonic, and there is no logical problem.
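For reference, a quick way to confirm both properties of an index is sketched below. The four-row frame and its `value` column are hypothetical, chosen only to contain one repeated timestamp.

```python
import pandas as pd

# Hypothetical frame: unsorted timestamps, one of them repeated.
idx = pd.DatetimeIndex(
    [
        "2020-08-01 07:58:50",
        "2020-08-01 07:57:50",
        "2020-07-29 02:50:06",
        "2020-08-01 07:58:50",
    ]
)
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=idx)

has_dupes = not df.index.is_unique
is_sorted = df.index.is_monotonic_increasing

# keep=False marks every occurrence of a repeated timestamp.
dupe_rows = df[df.index.duplicated(keep=False)]
print(has_dupes, is_sorted)
print(dupe_rows)
```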

maozi07 · 2020-08-28T03:06:27Z

Maybe my exception is related to https://pandas.pydata.org/docs/whatsnew/v1.1.0.html#non-monotonic-periodindex-partial-string-slicing

maozi07 · 2020-08-28T06:24:01Z

I've dug deeper into my data and found how to reproduce the error.
In pandas 1.1.0:

In [1]: data = {'number': [1, 2, 3, 4, 5, 6, 7, 8, 9, 0],
   ...:         'name': ['foo1', 'foo2', 'foo3', 'foo4', 'foo5', 'foo6', 'foo7', 'foo8', 'foo9', 'foo0']}

In [2]: ser = pd.DataFrame(data, index=pd.DatetimeIndex([
   ...:     '2020-07-22 11:47:42', '2020-07-23 07:58:50', '2020-07-26 05:39:43',
   ...:     '2020-07-27 05:41:12', '2020-07-28 08:52:34', '2020-07-29 11:01:01',
   ...:     '2020-07-01 00:20:08', '2020-06-30 10:05:04', '2020-07-02 09:50:04',
   ...:     '2020-07-03 09:50:05']), columns=['number', 'name'])
   ...:
In [3]: ser['20200701':]
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-78-1100bf260282> in <module>
----> 1 ser['20200701':]

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2880             # either we have a slice or we have a string that can be converted
   2881             #  to a slice for partial-string date indexing
-> 2882             return self._slice(indexer, axis=0)
   2883 
   2884         # Do we have a (boolean) DataFrame?

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/generic.py in _slice(self, slobj, axis)
   3546         Slicing with this method is *always* positional.
   3547         """
-> 3548         assert isinstance(slobj, slice), type(slobj)
   3549         axis = self._get_block_manager_axis(axis)
   3550         result = self._constructor(self._mgr.get_slice(slobj, axis=axis))

AssertionError: <class 'numpy.ndarray'>

In pandas 1.0.5, with the same data set:

In [6]: ser['20200701':]
Out[6]:
                     number  name
2020-07-22 11:47:42       1  foo1
2020-07-23 07:58:50       2  foo2
2020-07-26 05:39:43       3  foo3
2020-07-27 05:41:12       4  foo4
2020-07-28 08:52:34       5  foo5
2020-07-29 11:01:01       6  foo6
2020-07-01 00:20:08       7  foo7
2020-07-02 09:50:04       9  foo9
2020-07-03 09:50:05       0  foo0
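A possible workaround that avoids slicing altogether, sketched on the same ten rows: a boolean mask on the index keeps the original row order, while sorting first gives a chronological result. This is an illustration only, not a statement about how pandas implements the slice.

```python
import pandas as pd

data = {
    "number": [1, 2, 3, 4, 5, 6, 7, 8, 9, 0],
    "name": ["foo1", "foo2", "foo3", "foo4", "foo5",
             "foo6", "foo7", "foo8", "foo9", "foo0"],
}
index = pd.DatetimeIndex([
    "2020-07-22 11:47:42", "2020-07-23 07:58:50", "2020-07-26 05:39:43",
    "2020-07-27 05:41:12", "2020-07-28 08:52:34", "2020-07-29 11:01:01",
    "2020-07-01 00:20:08", "2020-06-30 10:05:04", "2020-07-02 09:50:04",
    "2020-07-03 09:50:05",
])
df = pd.DataFrame(data, index=index)

cutoff = pd.Timestamp("2020-07-01")

# Option A: boolean mask, rows stay in their original order.
masked = df[df.index >= cutoff]

# Option B: sort the index first, then slice by label.
chronological = df.sort_index().loc["2020-07-01":]

print(masked)
print(chronological)
```

Option A returns the same nine rows, in the same order, as the 1.0.5 output shown above; only the 2020-06-30 row is dropped.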

TomAugspurger · 2020-09-04T18:47:16Z

Thanks @maozi07, fixed your example and reproduced.

In [33]: ser = pd.DataFrame(data, index=pd.DatetimeIndex(['2020-07-22 11:47:42','2020-07-23 07:58:50','2020-07-26 05:39:43','2020-07-27 05:41:12','2020-07-28 08:52:34','2020-07-29 11:01:01','2020-07-01 00:20:08','2020-06-30 10:05:04','2020-07-02 09:50:04','2020-07-03 09:50:05']), columns=['number','name'])

simonjayhawkins · 2020-09-06T18:28:51Z

#31938 cc @jbrockmendel

https://github.com/simonjayhawkins/pandas/runs/1078438609?check_suite_focus=true

4ac1e5f is the first bad commit
commit 4ac1e5f
Author: jbrockmendel
Date: Thu Feb 13 04:42:54 2020 -0800

CLN: assorted cleanups (#31938)

simonjayhawkins · 2020-09-07T14:49:18Z

closing as duplicate of #35509

maozi07 added the Needs Triage and Usage Question labels Aug 20, 2020

TomAugspurger added the Indexing and Datetime labels and removed the Needs Triage and Usage Question labels Sep 4, 2020

simonjayhawkins added the Regression label Sep 6, 2020

simonjayhawkins modified the milestones: 1.1.2, 1.1.3 Sep 6, 2020

simonjayhawkins closed this as completed Sep 7, 2020

simonjayhawkins removed this from the 1.1.3 milestone Sep 7, 2020

simonjayhawkins added the Duplicate Report label Sep 7, 2020
