QST:Time index problem in version 1.1.0 #35819
Comments
Your index is non-monotonic; see these two rows from your output above:
'2020-08-01 07:58:50', '2020-08-01 07:57:50',
'2020-07-29 02:50:06', '2020-08-01 07:58:20',
Since the index is non-monotonic, slicing is going to be ill-defined. I'd suggest you work that out first. If I had to guess, I imagine all your data is from 2020-03-01 to present, so when you slice from a date before that, every index value is greater than your slice bound and the whole frame gets returned regardless.
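A minimal sketch of the check suggested above, using made-up values (the Series and the slice bound are assumptions, not the reporter's data): test whether the index is monotonic, and sort it before doing label-based slicing.

```python
import pandas as pd

# Hypothetical data: three timestamps deliberately out of order.
ser = pd.Series(
    [1, 2, 3],
    index=pd.DatetimeIndex(
        ["2020-08-01 07:58:50", "2020-07-29 02:50:06", "2020-08-01 07:58:20"]
    ),
)

# False here, which is the signal that label-based slicing is ambiguous.
is_sorted = ser.index.is_monotonic_increasing

# Sorting makes the slice well-defined: everything on or after the bound.
sorted_ser = ser.sort_index()
after_bound = sorted_ser.loc["2020-07-30":]
```

After sorting, the slice returns exactly the rows whose timestamps fall on or after the bound, in chronological order.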
Yes, you are right: the index is non-monotonic, and all the data is from 2020-03-01 to present. But slicing works fine with pandas 1.0.5 and fails with pandas 1.1.0. I looked through the release notes and found nothing about this. Is this a change in pandas 1.1.0? Thanks for your reply.
While it may not have crashed, I don't think the old version would have worked for a reasonable definition of "work", since slicing is ill-defined for non-monotonic indices. That is, it's unclear what you even want from df['20200701':] with a non-monotonic index: do you want all elements after the first element matching '20200701', or just the elements with an index value greater than '20200701'? You will see below that with 1.0.5 you can get either, depending on whether your query element is in the index:
In [1]: ser = pd.Series([1, 2, 3], pd.DatetimeIndex(['2020-07-03 00:01', '2020-07-01 00:01', '2020-07-02 00:01']))
In [2]: ser['2020-07-01 00:01':]
Out[2]:
2020-07-01 00:01:00 2
2020-07-02 00:01:00 3
dtype: int64
In [3]: ser['2020-07-01':]
Out[3]:
2020-07-03 00:01:00 1
2020-07-01 00:01:00 2
2020-07-02 00:01:00 3
dtype: int64
Absent a reproducible example, it's hard to say precisely why your code is raising an exception now (perhaps the index also has duplicate entries?), but regardless, I think your code was always logically broken under 1.0.5.
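To make the two readings concrete, here is a sketch that spells each one out explicitly instead of relying on the slice syntax. It reuses the three-row Series from the example above; the variable names and the '2020-07-02' bound are my own choices for illustration.

```python
import pandas as pd

ser = pd.Series(
    [1, 2, 3],
    index=pd.DatetimeIndex(
        ["2020-07-03 00:01", "2020-07-01 00:01", "2020-07-02 00:01"]
    ),
)

# Reading A: keep every row whose timestamp is on or after the bound.
# A boolean mask does not care about index order; row order is preserved.
by_value = ser[ser.index >= pd.Timestamp("2020-07-02")]

# Reading B: keep everything from the position of a matching label onward.
# get_loc returns an integer here because the label is present and unique.
start = ser.index.get_loc(pd.Timestamp("2020-07-01 00:01"))
by_position = ser.iloc[start:]
```

Writing the intent out this way gives the same answer whether or not the index is sorted.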
Yes, the index has duplicate entries, because the data set has almost 4 million records. I checked my code again after converting the index to a monotonic one, and there is no logic problem.
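For a large frame, a quick way to inspect both properties might look like the sketch below. The four-row frame and the `value` column are hypothetical stand-ins for the real data.

```python
import pandas as pd

# Hypothetical frame with one repeated timestamp and an unsorted index.
df = pd.DataFrame(
    {"value": [10, 20, 30, 40]},
    index=pd.DatetimeIndex(
        [
            "2020-07-02 09:50:04",
            "2020-07-01 00:20:08",
            "2020-07-02 09:50:04",
            "2020-06-30 10:05:04",
        ]
    ),
)

has_dupes = df.index.has_duplicates                # True
dupe_rows = df[df.index.duplicated(keep=False)]    # every row sharing a label

# A sorted index may still contain duplicates; slicing by label then works.
ordered = df.sort_index()
sliced = ordered.loc["2020-07-01":]
```

Duplicates alone do not prevent slicing once the index is sorted, since a sorted index with repeated labels is still monotonic.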
Maybe my exception is related to https://pandas.pydata.org/docs/whatsnew/v1.1.0.html#non-monotonic-periodindex-partial-string-slicing
I looked deeper into my data and found how to reproduce the error:
In [1]: data = {'number': [1, 2, 3, 4, 5, 6, 7, 8, 9, 0],
   ...:         'name': ['foo1', 'foo2', 'foo3', 'foo4', 'foo5', 'foo6', 'foo7', 'foo8', 'foo9', 'foo0']}
In [2]: ser = pd.DataFrame(data, index=pd.DatetimeIndex([
   ...:     '2020-07-22 11:47:42', '2020-07-23 07:58:50', '2020-07-26 05:39:43',
   ...:     '2020-07-27 05:41:12', '2020-07-28 08:52:34', '2020-07-29 11:01:01',
   ...:     '2020-07-01 00:20:08', '2020-06-30 10:05:04', '2020-07-02 09:50:04',
   ...:     '2020-07-03 09:50:05']), columns=['number', 'name'])
In [3]: ser['20200701':]
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-78-1100bf260282> in <module>
----> 1 ser['20200701':]
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
2880 # either we have a slice or we have a string that can be converted
2881 # to a slice for partial-string date indexing
-> 2882 return self._slice(indexer, axis=0)
2883
2884 # Do we have a (boolean) DataFrame?
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/generic.py in _slice(self, slobj, axis)
3546 Slicing with this method is *always* positional.
3547 """
-> 3548 assert isinstance(slobj, slice), type(slobj)
3549 axis = self._get_block_manager_axis(axis)
3550 result = self._constructor(self._mgr.get_slice(slobj, axis=axis))
AssertionError: <class 'numpy.ndarray'>
In pandas 1.0.5, the same data set gives:
In [6]: ser['20200701':]
Out[6]:
number name
2020-07-22 11:47:42 1 foo1
2020-07-23 07:58:50 2 foo2
2020-07-26 05:39:43 3 foo3
2020-07-27 05:41:12 4 foo4
2020-07-28 08:52:34 5 foo5
2020-07-29 11:01:01 6 foo6
2020-07-01 00:20:08 7 foo7
2020-07-02 09:50:04 9 foo9
2020-07-03 09:50:05 0 foo0
Thanks @maozi07, fixed your example and reproduced.
In [33]: ser = pd.DataFrame(data, index=pd.DatetimeIndex(['2020-07-22 11:47:42','2020-07-23 07:58:50','2020-07-26 05:39:43','2020-07-27 05:41:12','2020-07-28 08:52:34','2020-07-29 11:01:01','2020-07-01 00:20:08','2020-06-30 10:05:04','2020-07-02 09:50:04','2020-07-03 09:50:05']), columns=['number','name'])
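Until a fixed release is available, one possible workaround (a sketch, not the fix applied in pandas itself) is to select by comparison instead of by slice. On the reproducer's data this returns the same nine rows, in the same order, that 1.0.5 printed above; the names `df`, `bound` and `masked` are mine.

```python
import pandas as pd

data = {
    "number": [1, 2, 3, 4, 5, 6, 7, 8, 9, 0],
    "name": ["foo1", "foo2", "foo3", "foo4", "foo5",
             "foo6", "foo7", "foo8", "foo9", "foo0"],
}
index = pd.DatetimeIndex([
    "2020-07-22 11:47:42", "2020-07-23 07:58:50", "2020-07-26 05:39:43",
    "2020-07-27 05:41:12", "2020-07-28 08:52:34", "2020-07-29 11:01:01",
    "2020-07-01 00:20:08", "2020-06-30 10:05:04", "2020-07-02 09:50:04",
    "2020-07-03 09:50:05",
])
df = pd.DataFrame(data, index=index, columns=["number", "name"])

# Compare against an explicit Timestamp; no slicing machinery is involved,
# so the non-monotonic index is not a problem.
bound = pd.Timestamp("2020-07-01")
masked = df[df.index >= bound]
```

Only the 2020-06-30 row is dropped, and the remaining rows keep their original order.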
https://github.com/simonjayhawkins/pandas/runs/1078438609?check_suite_focus=true
4ac1e5f is the first bad commit
Closing as duplicate of #35509.
I have searched the [pandas] tag on StackOverflow for similar questions.
I have asked my usage related question on StackOverflow.
Question about pandas
I'm using pandas 1.1.0 on macOS with Python 3.7.
The behaviour is weird: when I select times later than 2020/03/01 it raises an error, but when the bound is in 2019 everything works.
I tried generating some random data, but couldn't reproduce the problem.
I don't know whether something is wrong with my data or this is a pandas bug in version 1.1.0 (it works in version 1.0.1).
I would really appreciate any help.