Sorting on Timestamp broken version 0.11 #3461
When you do
I get (using my example)
Floats are not dates and are very unwieldy to deal with; instead you must have dtype of Here's an example
This must be an issue with pandas.tslib.Timestamp on import then.
import pandas as pd
data = pd.read_csv('../data/txns_baseline-has10b5-filtered.csv', sep='|', parse_dates=['disclosuredate'])
type(data['disclosuredate'][0])
a = data[data['ticker']=='A']
a[['ticker', 'disclosuredate', 'txnid']].sort(['disclosuredate']).values
a[['ticker', 'disclosuredate', 'txnid']].values
import time
a['epoch'] = a['disclosuredate'].map(lambda x: time.mktime(x.timetuple()))
a[['ticker', 'disclosuredate', 'txnid', 'epoch']].sort(['epoch']).values
showing the
the disclosuredate column turns out to be an integer representation of the date
import pandas as pd
data = pd.read_csv('sample.csv', sep='|', parse_dates=['disclosuredate'])
data.info()
here is a sample of the csv
|ticker|companyname|sector|subsector|industry_id|mcap|mcapgroup|insidercik|insidername|position|iacc|txnid|code|has_10b5|value|shares|pps|round_limit_price|nickel_limit_price|quarter_limit_price|fifty_cent_limit_price|dollar_limit_price|triggertype|triggerprice|round_trigger_price|nickel_trigger_price|quarter_trigger_price|fifty_cent_trigger_price|dollar_trigger_price|disclosuredate|disclosure_t1|txndate|option_gain|option_gain_group|option_daystoexpire|option_days_group|planid|adoptiondisclosure|adoption_days|adoption_days_group|adoption_change|adoption_change_group|first_10b5|days_between_10b5|expiring|sharesinc|prevshares|sharesincpct|sharesinc_change_group|pricedown|pricedownpct|lasthighdate|lasthighprice|dayssincelasthigh|dayssincelasthigh_group|pregain3m|pregain1m|postgain1m|postgain3m|postgain6m|postgain12m|postgain24m|relgain1m|relgain3m|relgain6m|relgain12m|relgain24m
are you doing something different than this?
yes, I'm not appending .loc[:,['ticker','disclosuredate']] after the sort
In [1]: import pandas as pd
In [2]: data = pd.read_csv('sample.csv', sep='|', parse_dates=['disclosuredate'])
In [3]: data.get_dtype_counts()
In [4]: data.sort(columns='disclosuredate').loc[:,['ticker','disclosuredate']]
In [5]: data[['ticker', 'disclosuredate']].sort(columns=['disclosuredate'])
In [8]: data.sort(columns=['disclosuredate']).tail(1)['ticker']
In [9]: data.sort(columns='disclosuredate').loc[:,['ticker','disclosuredate']].tail(1)['ticker']
ok... this is a bug, but a very odd one. The difference is that I am specifying a list when it doesn't work (bottom), but a single column name when it does. These are handled by different sorters (the single argument uses argsort, which works fine with datetime64[ns]).
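For readers on a newer pandas, the comparison described above can be repeated as a quick check. This is only a sketch: the `sort` method used in this thread was later replaced by `sort_values`, and the frame below is hand-built from the six rows shown in the report rather than read from the reporter's CSV.

```python
import pandas as pd

# Hand-built stand-in for the reporter's frame `a` (rows deliberately unordered).
a = pd.DataFrame({
    "ticker": ["A"] * 6,
    "disclosuredate": pd.to_datetime([
        "2011-04-22", "2010-03-09", "2013-01-28",
        "2011-02-15", "2010-03-12", "2011-03-08",
    ]),
    "txnid": [12653015, 11110508, 15244694, 12488915, 11121853, 12563380],
})

# Single column name versus a one-element list: the two calls that
# disagreed in 0.11 should produce the same ordering.
by_name = a.sort_values("disclosuredate")
by_list = a.sort_values(["disclosuredate"])

same_order = by_name["txnid"].tolist() == by_list["txnid"].tolist()
is_sorted = by_list["disclosuredate"].is_monotonic_increasing
print(same_order, is_sorted)
```

If either flag prints False on a given installation, that would point to the same kind of discrepancy reported here.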
thanks for the report, this is merged into master if you would like to try
Issue:
frame.sort_index uses argsort for a single sort column, but _lexsort_indexer for multi-columns; need to do a transform on datetime64[ns] before passing to _lexsort_indexer
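The kind of transform mentioned above can be illustrated with plain NumPy. This is a rough sketch, not the actual pandas fix: it assumes the transform amounts to viewing the datetime64[ns] values as 64-bit integers (nanoseconds since the epoch) before a lexicographic sort, and it uses `np.lexsort` as a stand-in for the internal `_lexsort_indexer`.

```python
import numpy as np

# Dates in the mis-sorted order from the report, with their txnid values.
dates = np.array(
    ["2010-03-09", "2010-03-12", "2011-04-22",
     "2013-01-28", "2011-03-08", "2011-02-15"],
    dtype="datetime64[ns]",
)
txnid = np.array([11110508, 11121853, 12653015, 15244694, 12563380, 12488915])

# Reinterpret the datetime64[ns] buffer as int64 nanoseconds. Integer
# ordering matches chronological ordering, so this is a safe sort key.
date_keys = dates.view("i8")

# np.lexsort treats the LAST key as the primary one:
# sort by date first, then by txnid to break ties.
order = np.lexsort((txnid, date_keys))
sorted_dates = dates[order]
print(order.tolist())
```

The integer view shares memory with the original array, so no copy of the date column is made.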
In [54]: a.columns
Out[54]: Index([ticker, disclosuredate, txnid], dtype=object)
In [55]: a.values
Out[55]:
array([[A, 2010-03-09 00:00:00, 11110508],
[A, 2010-03-12 00:00:00, 11121853],
[A, 2011-02-15 00:00:00, 12488915],
[A, 2011-03-08 00:00:00, 12563380],
[A, 2011-04-22 00:00:00, 12653015],
[A, 2013-01-28 00:00:00, 15244694]], dtype=object)
In [56]: a.sort(columns=['disclosuredate']).values
Out[56]:
array([[A, 2010-03-09 00:00:00, 11110508],
[A, 2010-03-12 00:00:00, 11121853],
[A, 2011-04-22 00:00:00, 12653015],
[A, 2013-01-28 00:00:00, 15244694],
[A, 2011-03-08 00:00:00, 12563380],
[A, 2011-02-15 00:00:00, 12488915]], dtype=object)
In [57]: pd.__version__
Out[57]: '0.11.0'
In [58]: import time
In [59]: a['epoch'] = a['disclosuredate'].map(lambda x: time.mktime(x.timetuple()))
In [60]: a.sort(['epoch']).values
Out[60]:
array([[A, 2010-03-09 00:00:00, 11110508, 1268110800.0],
[A, 2010-03-12 00:00:00, 11121853, 1268370000.0],
[A, 2011-02-15 00:00:00, 12488915, 1297746000.0],
[A, 2011-03-08 00:00:00, 12563380, 1299560400.0],
[A, 2011-04-22 00:00:00, 12653015, 1303444800.0],
[A, 2013-01-28 00:00:00, 15244694, 1359349200.0]], dtype=object)
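One caveat about the epoch workaround: `time.mktime` interprets the time tuple in the machine's local timezone, so the epoch values depend on where the code runs (the values above are 18000 seconds, or five hours, past midnight UTC). A possible timezone-independent variant, sketched here with the standard library only, uses `calendar.timegm`; the helper name `epoch_utc` is made up for illustration.

```python
import calendar
from datetime import datetime

def epoch_utc(dt):
    """Seconds since the epoch, treating a naive datetime as UTC."""
    return calendar.timegm(dt.timetuple())

# A few of the disclosure dates from the report, out of order.
dates = [datetime(2011, 4, 22), datetime(2010, 3, 9), datetime(2013, 1, 28)]

# Sorting on the integer key gives chronological order on any machine.
ordered = sorted(dates, key=epoch_utc)
first_epoch = epoch_utc(datetime(2010, 3, 9))
print(first_epoch)
```

Either key sorts these rows correctly; the UTC variant just keeps the stored epoch column reproducible across machines.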