QST:Time index problem in version 1.1.0 #35819

Closed
2 tasks done
maozi07 opened this issue Aug 20, 2020 · 9 comments
Labels
Datetime Datetime data dtype Duplicate Report Duplicate issue or pull request Indexing Related to indexing on series/frames, not to indexes themselves Regression Functionality that used to work in a prior pandas version

Comments

@maozi07

maozi07 commented Aug 20, 2020

  • I have searched the [pandas] tag on StackOverflow for similar questions.

  • I have asked my usage related question on StackOverflow.


Question about pandas

Note: If you'd still like to submit a question, please read this guide detailing how to provide the necessary information for us to reproduce your question.
I'm using pandas 1.1.0 on macOS with Python 3.7.
Strangely, when I slice from a date later than 2020/03/01 it raises an error, but when the date is in 2019 everything works.
I tried generating some random data but couldn't reproduce the problem.
I don't know whether something is wrong with my data or this is a bug in pandas 1.1.0 (it works in version 1.0.1).
I'd really appreciate any help.

pd.__version__
'1.1.0'
df.index
DatetimeIndex(['2020-07-02 08:55:59', '2020-07-03 07:59:32',
               '2020-07-06 06:57:38', '2020-07-11 09:25:35',
               '2020-07-09 10:02:25', '2020-07-13 07:10:12',
               '2020-07-16 07:58:52', '2020-07-17 10:46:43',
               '2020-07-18 08:00:36', '2020-07-22 11:47:43',
               ...
               '2020-08-01 07:58:25', '2020-08-01 07:59:05',
               '2020-08-01 07:58:50', '2020-08-01 07:57:50',
               '2020-07-29 02:50:06', '2020-08-01 07:58:20',
               '2020-08-01 07:58:30', '2020-08-01 07:58:21',
               '2020-08-01 07:59:08', '2020-08-01 07:58:53'],
              dtype='datetime64[ns]', name='sub_time', length=4715513, freq=None)

df['20200701':]
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-54-0be901d6256c> in <module>
----> 1 df['20200701':]

/opt/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2880             # either we have a slice or we have a string that can be converted
   2881             #  to a slice for partial-string date indexing
-> 2882             return self._slice(indexer, axis=0)
   2883 
   2884         # Do we have a (boolean) DataFrame?

/opt/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in _slice(self, slobj, axis)
   3546         Slicing with this method is *always* positional.
   3547         """
-> 3548         assert isinstance(slobj, slice), type(slobj)
   3549         axis = self._get_block_manager_axis(axis)
   3550         result = self._constructor(self._mgr.get_slice(slobj, axis=axis))

AssertionError: <class 'numpy.ndarray'>

df['20190701':]

                                     A         B  ...    C    D
a_time                                                   ...                    
2020-07-02 08:55:59              aa  cc  ...  22  1.2
@maozi07 maozi07 added Needs Triage Issue that has not been reviewed by a pandas team member Usage Question labels Aug 20, 2020
@Liam3851
Contributor

Your index is non-monotonic, see these two rows from your output above:

               '2020-08-01 07:58:50', '2020-08-01 07:57:50',
               '2020-07-29 02:50:06', '2020-08-01 07:58:20',

Since the index is non-monotonic, slicing is going to be ill-defined. I'd suggest you work that out first. If I had to guess, I imagine all your data is from 2020-03-01 to present, and so when you slice from prior to that, everything is greater than your slice bound and so the whole frame gets returned regardless.
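A minimal sketch of how one might check for this and get a well-defined slice, assuming a small stand-in frame built from the four timestamps quoted above (the `value` column is made up for illustration):

```python
import pandas as pd

# Stand-in for the real data: the four out-of-order timestamps quoted above.
idx = pd.DatetimeIndex([
    "2020-08-01 07:58:50", "2020-08-01 07:57:50",
    "2020-07-29 02:50:06", "2020-08-01 07:58:20",
])
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=idx)  # "value" is hypothetical

# False here: the labels are not in ascending order.
is_sorted = df.index.is_monotonic_increasing

# Sorting first makes a label-based slice unambiguous.
sorted_df = df.sort_index()
sliced = sorted_df.loc["2020-08-01":]
```

After `sort_index()`, the slice keeps every row at or after the start of 2020-08-01, in time order.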

@maozi07
Author

maozi07 commented Aug 27, 2020

Yes, you are right: the index is non-monotonic, and all the data is from 2020-03-01 to present. But the slice worked fine with pandas 1.0.5 and fails with pandas 1.1.0. I looked through the release notes and found nothing about this. Is this a change in pandas 1.1.0? Thanks for your reply.

@Liam3851
Contributor

While it may not have crashed, I don't think the old version would have worked for a reasonable definition of "work", since slicing is ill-defined for non-monotonic indices. That is, it's unclear what you even want from df['20200701':] with a non-monotonic index: do you want all elements after the first element matching '20200701', or just the elements with an index value greater than '20200701'? You will see below that in 1.0.5 you can get either, depending on whether your query element is in the index:

In [1]: ser = pd.Series([1, 2, 3], pd.DatetimeIndex(['2020-07-03 00:01', '2020-07-01 00:01', '2020-07-02 00:01']))

In [2]: ser['2020-07-01 00:01':]
Out[2]:
2020-07-01 00:01:00   2
2020-07-02 00:01:00   3
dtype: int64

In [3]: ser['2020-07-01':]
Out[3]:
2020-07-03 00:01:00   1
2020-07-01 00:01:00   2
2020-07-02 00:01:00   3
dtype: int64

Absent a reproducible example it's hard to say precisely why your code is raising an exception now (perhaps the index also has duplicate entries?), but regardless, I think your code was always logically broken under 1.0.5.
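The two readings of the slice can each be written out explicitly so the result no longer depends on index order. This is only a sketch on the three-element series from the example above; the variable names are illustrative:

```python
import pandas as pd

ser = pd.Series(
    [1, 2, 3],
    index=pd.DatetimeIndex(
        ["2020-07-03 00:01", "2020-07-01 00:01", "2020-07-02 00:01"]
    ),
)

# Worth checking up front: duplicate labels add further ambiguity.
has_dupes = ser.index.has_duplicates

# Reading 1: every element whose label is on or after the bound,
# wherever it sits in the series.
by_value = ser[ser.index >= pd.Timestamp("2020-07-01")]

# Reading 2: everything from the position of an exact label onward.
# get_loc returns an integer position here because the index is unique.
start = ser.index.get_loc(pd.Timestamp("2020-07-01 00:01"))
from_first_match = ser.iloc[start:]
```

The boolean mask gives the "greater than or equal" reading, while `iloc` from a looked-up position gives the "from the first match" reading.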

@maozi07
Author

maozi07 commented Aug 28, 2020

Yes, the index has duplicate entries, because the data set has roughly 4 million records. I checked my code again after converting the index to monotonic, and there is no logical problem.

@maozi07
Author

maozi07 commented Aug 28, 2020

I've looked deeper into my data and found how to reproduce the error.
In pandas 1.1.0:

In [1]: data = {'number': [1, 2, 3, 4, 5, 6, 7, 8, 9, 0],
   ...:         'name': ['foo1', 'foo2', 'foo3', 'foo4', 'foo5', 'foo6', 'foo7', 'foo8', 'foo9', 'foo0']}

In [2]: ser = pd.DataFrame(data, index=pd.DatetimeIndex([
   ...:     '2020-07-22 11:47:42', '2020-07-23 07:58:50', '2020-07-26 05:39:43',
   ...:     '2020-07-27 05:41:12', '2020-07-28 08:52:34', '2020-07-29 11:01:01',
   ...:     '2020-07-01 00:20:08', '2020-06-30 10:05:04', '2020-07-02 09:50:04',
   ...:     '2020-07-03 09:50:05']), columns=['number', 'name'])

In [3]: ser['20200701':]                                                                           
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-78-1100bf260282> in <module>
----> 1 ser['20200701':]

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2880             # either we have a slice or we have a string that can be converted
   2881             #  to a slice for partial-string date indexing
-> 2882             return self._slice(indexer, axis=0)
   2883 
   2884         # Do we have a (boolean) DataFrame?

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/generic.py in _slice(self, slobj, axis)
   3546         Slicing with this method is *always* positional.
   3547         """
-> 3548         assert isinstance(slobj, slice), type(slobj)
   3549         axis = self._get_block_manager_axis(axis)
   3550         result = self._constructor(self._mgr.get_slice(slobj, axis=axis))

AssertionError: <class 'numpy.ndarray'>

In pandas 1.0.5, with the same data set:

In [6]: ser['20200701':]
Out[6]:
                     number  name
2020-07-22 11:47:42       1  foo1
2020-07-23 07:58:50       2  foo2
2020-07-26 05:39:43       3  foo3
2020-07-27 05:41:12       4  foo4
2020-07-28 08:52:34       5  foo5
2020-07-29 11:01:01       6  foo6
2020-07-01 00:20:08       7  foo7
2020-07-02 09:50:04       9  foo9
2020-07-03 09:50:05       0  foo0
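
As a possible workaround that does not rely on slicing an unsorted index, the same selection can be expressed with a boolean mask or by sorting first. This is a sketch on the data above, not a statement about how pandas handles the slice internally; the names `frame`, `selected` and `selected_sorted` are illustrative:

```python
import pandas as pd

data = {
    "number": [1, 2, 3, 4, 5, 6, 7, 8, 9, 0],
    "name": ["foo1", "foo2", "foo3", "foo4", "foo5",
             "foo6", "foo7", "foo8", "foo9", "foo0"],
}
index = pd.DatetimeIndex([
    "2020-07-22 11:47:42", "2020-07-23 07:58:50", "2020-07-26 05:39:43",
    "2020-07-27 05:41:12", "2020-07-28 08:52:34", "2020-07-29 11:01:01",
    "2020-07-01 00:20:08", "2020-06-30 10:05:04", "2020-07-02 09:50:04",
    "2020-07-03 09:50:05",
])
frame = pd.DataFrame(data, index=index)

# Option 1: boolean mask. Keeps the original row order and drops only
# the 2020-06-30 row, like the 1.0.5 output above.
selected = frame[frame.index >= pd.Timestamp("2020-07-01")]

# Option 2: sort the index, then use a label-based slice.
# Rows come back in time order.
selected_sorted = frame.sort_index().loc["2020-07-01":]
```

Both options return the same nine rows; they differ only in row order.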

@TomAugspurger
Contributor

Thanks @maozi07, fixed your example and reproduced.

In [33]: ser = pd.DataFrame(data, index=pd.DatetimeIndex(['2020-07-22 11:47:42','2020-07-23 07:58:50','2020-07-26 05:39:43','2020-07-27 05:41:12','2020-07-28 08:52:34','2020-07-29 11:01:01','2020-07-01 00:20:08','2020-06-30 10:05:04','2020-07-02 09:50:04','2020-07-03 09:50:05']), columns=['number','name'])

@TomAugspurger TomAugspurger added Indexing Related to indexing on series/frames, not to indexes themselves Datetime Datetime data dtype and removed Needs Triage Issue that has not been reviewed by a pandas team member Usage Question labels Sep 4, 2020
@simonjayhawkins
Member

#31938 cc @jbrockmendel

https://github.com/simonjayhawkins/pandas/runs/1078438609?check_suite_focus=true

4ac1e5f is the first bad commit
commit 4ac1e5f
Author: jbrockmendel
Date: Thu Feb 13 04:42:54 2020 -0800

CLN: assorted cleanups (#31938)

@simonjayhawkins simonjayhawkins added the Regression Functionality that used to work in a prior pandas version label Sep 6, 2020
@simonjayhawkins simonjayhawkins modified the milestones: 1.1.2, 1.1.3 Sep 6, 2020
@simonjayhawkins
Member

closing as duplicate of #35509

@simonjayhawkins simonjayhawkins removed this from the 1.1.3 milestone Sep 7, 2020
@simonjayhawkins simonjayhawkins added the Duplicate Report Duplicate issue or pull request label Sep 7, 2020