BUG: DatetimeIndex with time object as key #8907


Merged: 1 commit, Nov 29, 2014


behzadnouri
Contributor

closes #8667
on master:

>>> from datetime import time
>>> import numpy as np
>>> import pandas as pd
>>> from pandas.index import _SIZE_CUTOFF
>>> n = _SIZE_CUTOFF + 100
>>> idx = pd.date_range('2014-11-26', periods=n, freq='S')
>>> ts = pd.Series(np.random.randn(n), index=idx)
>>> key = time(15, 0)
>>> ts[key]
TypeError: 'datetime.time' object is not iterable
>>> ts.index.get_loc(key)
TypeError: unorderable types: int() > datetime.time()

The above works on the master branch when n is smaller than _SIZE_CUTOFF.
_SIZE_CUTOFF is set here
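
The failing case can be reproduced in outline as follows. This is a minimal sketch, assuming a recent pandas. The cutoff is hard-coded to 1,000,000 (an assumption, taken from the console session further down this thread where `_SIZE_CUTOFF - 100` is 999900) rather than imported, since the private module path may differ between versions. It uses the public `DatetimeIndex.indexer_at_time` and `Series.at_time` methods rather than `ts[key]`.

```python
from datetime import time

import numpy as np
import pandas as pd

# Assumed value of the private _SIZE_CUTOFF constant (not imported here).
SIZE_CUTOFF = 1_000_000
n = SIZE_CUTOFF + 100  # just above the cutoff, as in the report

idx = pd.date_range("2014-11-26", periods=n, freq="s")
ts = pd.Series(np.random.randn(n), index=idx)
key = time(15, 0)

# Integer positions of every timestamp whose time of day is 15:00:00.
locs = idx.indexer_at_time(key)

# The same selection expressed on the Series.
subset = ts.at_time(key)

print(len(locs), subset.index[0])
```

The index spans a little over eleven and a half days at one-second resolution, so the selection should contain one row per day on which 15:00:00 falls inside the range.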

@jreback added the Bug, Indexing, and Datetime labels Nov 27, 2014
@jreback added this to the 0.15.2 milestone Nov 27, 2014
@jreback
Contributor

jreback commented Nov 27, 2014

@behzadnouri looks good. can you run a perf check just to be sure nothing significant has changed. ping when ready (post if there is a problem, otherwise say a-ok).

@behzadnouri
Contributor Author

I ran the benchmarks and got the results below. I cannot run the vb_suite on my home machine (which is Python 3), but I tried a number of the benchmarks at the bottom with %timeit and I do not see any difference in performance.

The last benchmark, which is the worst, does not even hit this line or this line. Since the benchmarks were run on a shared server, the load may have differed between runs and affected the result.

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
frame_reindex_axis1                          | 157.1807 | 204.6683 |   0.7680 |
melt_dataframe                               |   4.6460 |   6.0183 |   0.7720 |
timeseries_day_apply                         |   0.0126 |   0.0153 |   0.8238 |
frame_reindex_axis0                          | 132.8634 | 126.6564 |   1.0490 |
....
frame_xs_mi_ix                               |   3.3040 |   3.1153 |   1.0606 |
timeseries_day_incr                          |   0.0149 |   0.0140 |   1.0682 |
frame_reindex_both_axes_ix                   |  54.4243 |  50.6121 |   1.0753 |
frame_reindex_both_axes                      |  54.3477 |  50.4670 |   1.0769 |
frame_dropna_axis0_any                       |  88.5150 |  81.9420 |   1.0802 |
datetimeindex_normalize                      |   4.4916 |   4.1403 |   1.0849 |
frame_dropna_axis0_all                       | 124.3957 | 113.2793 |   1.0981 |
index_int64_intersection                     |  30.5480 |  27.0540 |   1.1292 |
frame_mask_bools                             |  23.4393 |  20.1070 |   1.1657 |
dtype_infer_timedelta64_2                    |  25.8369 |  20.1476 |   1.2824 |
series_xs_mi_ix                              |   7.5959 |   3.0520 |   2.4888 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

Ratio < 1.0 means the target commit is faster than the baseline.
Seed used: 1234

Target [0ef5c07] : BUG: DatetimeIndex with time as key
Base   [e5fe75e] : Merge pull request #8752 from selasley/trailing_spaces_fix

Update tokenizer to fix #8679 #8661
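
The ratio column in the table above is the head time divided by the base time. A small sketch that recomputes it for the first and last rows; the helper name `ratio` is illustrative and not part of vb_suite.

```python
# (head_ms, base_ms) pairs copied from the first and last rows of the table.
rows = {
    "frame_reindex_axis1": (157.1807, 204.6683),
    "series_xs_mi_ix": (7.5959, 3.0520),
}


def ratio(head_ms, base_ms):
    """Return head/base; values below 1.0 mean the target commit is faster."""
    return head_ms / base_ms


ratios = {name: round(ratio(head, base), 4) for name, (head, base) in rows.items()}

for name, value in ratios.items():
    verdict = "faster" if value < 1.0 else "slower"
    print(f"{name}: {value:.4f} ({verdict} than base)")
```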

jreback added a commit that referenced this pull request Nov 29, 2014
BUG: DatetimeIndex with time object as key
@jreback jreback merged commit e2f8f0a into pandas-dev:master Nov 29, 2014
@jreback
Contributor

jreback commented Nov 29, 2014

thanks!

I think the timing on the last one was messed up somehow. Looked ok to me when I did it.

@jorisvandenbossche
Member

@behzadnouri we get an error in this test:

INSTALLED VERSIONS
------------------
commit: 844f7ae082be2b853f6c195b863c062962a558ba
python: 2.7.8.final.0
python-bits: 32
OS: Linux
OS-release: 3.13.0-39-generic
machine: i686
processor: i686
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.15.1-102-ga3e478d
nose: 1.3.4
Cython: 0.20.2
numpy: 1.9.1
scipy: 0.14.0
======================================================================
ERROR: test_time_loc (pandas.tests.test_index.TestDatetimeIndex)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/joris/scipy/pandas/pandas/tests/test_index.py", line 1900, in test_time_loc
    ts = pd.Series(np.random.randn(n), index=idx)
  File "/home/joris/scipy/pandas/pandas/core/series.py", line 212, in __init__
    data = SingleBlockManager(data, index, fastpath=True)
  File "/home/joris/scipy/pandas/pandas/core/internals.py", line 3371, in __init__
    ndim=1, fastpath=True)
  File "/home/joris/scipy/pandas/pandas/core/internals.py", line 2099, in make_block
    placement=placement)
  File "/home/joris/scipy/pandas/pandas/core/internals.py", line 76, in __init__
    len(self.values), len(self.mgr_locs)))
ValueError: Wrong number of items passed 999900, placement implies 2

@jorisvandenbossche
Member

A bit strange: there are two of us here, and with master we both get an error on this test, but each a different one.

@jreback
Contributor

jreback commented Nov 30, 2014

did u rebuild the cython?
eg make

@jorisvandenbossche
Member

yes, we were also thinking that, and for @papaloizouc that solved it, but I keep getting the error, strange ..

@jreback
Contributor

jreback commented Nov 30, 2014

hmm u r in 32-bit linux?
could be related to that

@jorisvandenbossche
Member

yep

@jorisvandenbossche
Member

I can't dig in further at the moment, but here are some quick runs in the console:

In [4]:  from pandas.index import _SIZE_CUTOFF

In [5]:  ns = _SIZE_CUTOFF + np.array([-100, 100])

In [6]: ns
Out[6]: array([ 999900, 1000100])

In [7]: n = ns[0]

In [8]:  idx = pd.date_range('2014-11-26', periods=n, freq='S')

In [9]: idx
Out[9]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-11-26 00:00:00, 2014-11-26 00:00:01]
Length: 2, Freq: S, Timezone: None

In [10]: n
Out[10]: 999900

So here it gives an index of length 2, although we asked for 999900 periods.
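
In the session above, `n` comes out of a NumPy array, so it is a NumPy integer scalar rather than a Python `int`. Whether that is the root cause on 32-bit builds is an assumption here, not something this thread confirms. A defensive sketch that sidesteps the question by coercing `periods` to a built-in `int`, with the cutoff hard-coded from `Out[6]` rather than imported:

```python
import numpy as np
import pandas as pd

# Assumed cutoff, inferred from Out[6] above (999900 == cutoff - 100).
SIZE_CUTOFF = 1_000_000

ns = SIZE_CUTOFF + np.array([-100, 100])
n = ns[0]  # a NumPy integer scalar, not a Python int

# Coerce to a built-in int before handing it to date_range.
idx = pd.date_range("2014-11-26", periods=int(n), freq="s")

print(type(n).__name__, len(idx))
```

If the length printed matches `n`, the index was built with the requested number of periods.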

@jreback
Contributor

jreback commented Nov 30, 2014

fixed here: e759d99

which shows a minor compat issue
#8943

@jorisvandenbossche
Member

yep, I can confirm that this fixes it


Successfully merging this pull request may close these issues.

BUG: Slicing timeseries with over 1000000 entries with time fails
3 participants