BUG: (GH3925) partial string selection with seconds resolution #3931

Merged
merged 1 commit into from
Jun 19, 2013

Conversation

jreback
Contributor

@jreback jreback commented Jun 17, 2013

this no longer has much to do with #3925, and is only fixing a bug

Minor revision to select on second frequency

In [11]: df = DataFrame(randn(5,5),columns=['open','high','low','close','volume'],index=date_range('2012-01-02 18:01:00',periods=5,tz='US/Central',freq='s'))

In [12]: df
Out[12]: 
                               open      high       low     close    volume
2012-01-02 18:01:00-06:00  0.131243  0.301542  0.128027  0.804162  1.296658
2012-01-02 18:01:01-06:00  0.341487  1.548695  0.703234  0.904201  1.422337
2012-01-02 18:01:02-06:00 -1.050453 -1.884035  1.537788 -0.821058  0.558631
2012-01-02 18:01:03-06:00  0.846885  1.045378 -0.722903 -0.613625 -0.476531
2012-01-02 18:01:04-06:00  1.186823 -0.018299 -0.513886 -1.103269 -0.311907

In [14]: df['2012-01-02 18:01:02']
Out[14]: 
                               open      high       low     close    volume
2012-01-02 18:01:02-06:00 -1.050453 -1.884035  1.537788 -0.821058  0.558631
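A minimal sketch of the same kind of selection, written against a recent pandas. Several things here are assumptions, not part of the PR: the index is tz-naive and sampled every 250 ms (so the second-resolution string is strictly coarser than the index and is treated as a slice), the values are arbitrary integers, and the lookup goes through .loc, which newer pandas releases expect for row selection by string.

```python
import numpy as np
import pandas as pd

# Assumed setup: four observations per second, tz-naive, arbitrary values.
index = pd.date_range("2012-01-02 18:01:00", periods=20, freq="250ms")
df = pd.DataFrame(
    np.arange(100).reshape(20, 5),
    columns=["open", "high", "low", "close", "volume"],
    index=index,
)

# A string with seconds resolution selects every row inside that second.
one_second = df.loc["2012-01-02 18:01:02"]
print(one_second)
```

When the string has the same resolution as the index, as in the transcript above, pandas may treat it as an exact label match instead of a slice, so the shape of the result can differ.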

Member

cpcloud commented Jun 17, 2013

this is a bit inconsistent with the to-be-deprecated arith ops broadcasting behavior is it not?

Member

cpcloud commented Jun 17, 2013

i find the following a bit confusing, maybe it's just me

In [35]: df.loc[df.index[:1]]
Out[35]:
                            open   high    low  close  volume
2012-01-02 18:01:00-06:00  0.645 -1.347 -0.257 -0.816   0.155

In [36]: df[df.index[0]]
Out[36]:
                            open   high    low  close  volume
2012-01-02 18:01:00-06:00  0.645 -1.347 -0.257 -0.816   0.155

is there really that much added by saving 5 characters? i think it makes it confusing because now you have to remember that rows are sliced if a single date is passed, but if you pass 0, __getitem__(self, 0) does a column lookup for the key 0. on top of that: if a slice is passed it slices the rows, plus the closed-intervals gotcha, column indexing, loc, iloc, ix, iat, at, xs, and how all of this plays with MultiIndex. doesn't seem like adding another thing to remember about indexing is beneficial.

Contributor Author

jreback commented Jun 17, 2013

@cpcloud you are right, this is inconsistent.

Slicing with a string is fine because it's treated as a slice (of course it could just return 1 label)

and this is fine, but a single datetime is not... (just like df[0] would be rejected in an integer-indexed frame)

In [28]: df[df.index[1]:df.index[2]]
Out[28]: 
                               open      high       low     close    volume
2012-01-02 18:01:01-06:00 -0.332505  0.705486 -0.447803 -0.134373  1.706555
2012-01-02 18:01:02-06:00 -0.590720  0.519744 -0.881321 -0.886215 -0.292298

Integer indexed frame

In [19]: df = DataFrame(randn(5,5),columns=['open','high','low','close','volume'])

In [20]: df
Out[20]: 
       open      high       low     close    volume
0  1.144082  0.842330  1.774455 -2.126466  0.139598
1  0.635929  2.139363 -0.594184  0.516679  0.348589
2  1.101361 -0.520283  2.191319 -0.072571 -0.749482
3  0.535801  0.768180  1.360995 -0.512688  2.990026
4  1.297711 -0.549184  1.457773 -1.740196  0.442782

In [21]: df[0:3]
Out[21]: 
       open      high       low     close    volume
0  1.144082  0.842330  1.774455 -2.126466  0.139598
1  0.635929  2.139363 -0.594184  0.516679  0.348589
2  1.101361 -0.520283  2.191319 -0.072571 -0.749482

In [22]: df.loc[0:3]
Out[22]: 
       open      high       low     close    volume
0  1.144082  0.842330  1.774455 -2.126466  0.139598
1  0.635929  2.139363 -0.594184  0.516679  0.348589
2  1.101361 -0.520283  2.191319 -0.072571 -0.749482
3  0.535801  0.768180  1.360995 -0.512688  2.990026

In [23]: df[0]
KeyError: u'no item named 0'
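The three behaviours in the transcript above can be checked side by side. This is a small sketch with arbitrary integer values (an assumption, the original used random data); it only restates what the session shows: df[0:3] slices rows by position with the end excluded, df.loc[0:3] slices by label with the end included, and df[0] is a column lookup that fails when no column is named 0.

```python
import numpy as np
import pandas as pd

# Default integer index 0..4, arbitrary values.
df = pd.DataFrame(
    np.arange(25).reshape(5, 5),
    columns=["open", "high", "low", "close", "volume"],
)

by_position = df[0:3]   # rows at positions 0, 1, 2
by_label = df.loc[0:3]  # rows labelled 0, 1, 2, 3 (closed interval)

# A scalar key is looked up among the columns, so this raises KeyError.
try:
    df[0]
except KeyError:
    scalar_lookup_failed = True
else:
    scalar_lookup_failed = False

print(len(by_position), len(by_label), scalar_lookup_failed)
```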

Member

cpcloud commented Jun 17, 2013

is this going to be merged? i would assume no as per the other thread....close?

Contributor Author

jreback commented Jun 17, 2013

do you have a problem with the new PR? (which reverses out the last change) and only fixes a bug

Member

cpcloud commented Jun 17, 2013

oh sorry yeah it's fine! :)

Contributor Author

jreback commented Jun 17, 2013

@wesm was there a reason that second resolution via string was not originally implemented?

Member

cpcloud commented Jun 17, 2013

what is the bug here? i wasn't aware that passing a date string with any resolution was allowed unless that was in the column index...

Contributor Author

jreback commented Jun 17, 2013

partial string indexing

try

df['2012']
df['2012-01']

etc (generate dates of string frequencies)
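For readers who have not used partial string indexing, here is a hedged illustration with a Series. The daily date range and the integer values are made up for the example, and .loc is used because recent pandas versions expect row selection by string to go through it.

```python
import pandas as pd

# Assumed data: one value per day from December 2011 through March 2012.
index = pd.date_range("2011-12-01", "2012-03-31", freq="D")
s = pd.Series(range(len(index)), index=index)

year_2012 = s.loc["2012"]    # every day that falls in 2012
january = s.loc["2012-01"]   # every day that falls in January 2012

print(len(year_2012), len(january))
```

The coarser the string, the wider the slice: a year string covers all matching rows in that year, a year-month string only that month.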

Member

cpcloud commented Jun 17, 2013

ah i c thanks.

Contributor Author

jreback commented Jun 17, 2013

http://pandas.pydata.org/pandas-docs/dev/timeseries.html#datetimeindex

I think the docs need expansion here in any event

Member

cpcloud commented Jun 18, 2013

hm so should this eventually be for all dt strings? currently not working for usecs

Contributor Author

jreback commented Jun 18, 2013

guess should add that too

though this is for string slicing

not sure how useful that would be
(as u would prob want to construct using datetime ranges)
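One way to read "construct using datetime ranges" is to slice with explicit Timestamp bounds instead of a string. The sketch below assumes a made-up series sampled every 100 microseconds; the bounds and values are illustrative only.

```python
import pandas as pd

# Assumed data: ten observations, 100 microseconds apart.
index = pd.date_range("2012-01-02 18:01:00", periods=10, freq="100us")
s = pd.Series(range(10), index=index)

start = pd.Timestamp("2012-01-02 18:01:00.000200")
end = pd.Timestamp("2012-01-02 18:01:00.000500")

# Label-based slicing includes both endpoints.
window = s.loc[start:end]
print(window)
```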

Member

cpcloud commented Jun 18, 2013

yeah i'm not sure that's that useful since at that point you should probably be resampling anyway

Contributor Author

jreback commented Jun 19, 2013

@wesm

any reason u didn't implement this originally?

Member

wesm commented Jun 19, 2013

Nope

jreback added a commit that referenced this pull request Jun 19, 2013
BUG: (GH3925) partial string selection with seconds resolution
@jreback jreback merged commit fc589c6 into pandas-dev:master Jun 19, 2013
Contributor Author

jreback commented Jun 19, 2013

@snth thanks...fixed up
