-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DataFrame sort_values and multiple "by" columns fails to order NaT correctly (since v0.19) #16836
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
yeah looks a bit buggy. happy to have you take a look. |
A bit of investigation: The problem starts here The array is converted to a int64 view and the nan is thus lost. I presume this is done for speed up? I think |
These are converted just fine. (now I wouldn't mind actually moving the key conversions inside
|
The problem is that If the function was converting the array instead, it could special case the NaT. |
actually in this case these can be simply converted to categoricals before passing to lexsort_indexer which will solve the problem (they are converted internallly anyhow).
so first thing to try is to remove the i8 conversion entirely (as Categoricals already handle all of these endge cases). |
as you suggested, just removing the conversion seems to solve the problem. |
great! |
* Removed unnecessary conversion to i8 * Fixed failed test (`test_frame_column_inplace_sort_exception`) * Added check to ensure that the test is performing its intended goal(`test_sort_nan`)
* Removed unnecessary conversion to i8 * Fixed failed test (`test_frame_column_inplace_sort_exception`) * Added check to ensure that the test is performing its intended goal(`test_sort_nan`)
* Removed unnecessary conversion to i8 * Fixed failed test (`test_frame_column_inplace_sort_exception`) * Added check to ensure that the test is performing its intended goal(`test_sort_nan`)
…aT correctly closes pandas-dev#16836 Author: Jean-Mathieu Deschenes <[email protected]> This patch had conflicts when merged, resolved by Committer: Jeff Reback <[email protected]> Closes pandas-dev#16995 from jdeschenes/datetime_sort_issues and squashes the following commits: 257e10a [Jean-Mathieu Deschenes] Changes requested by @jreback c6d55e2 [Jean-Mathieu Deschenes] Fix for pandas-dev#16836
…aT correctly closes pandas-dev#16836 Author: Jean-Mathieu Deschenes <[email protected]> This patch had conflicts when merged, resolved by Committer: Jeff Reback <[email protected]> Closes pandas-dev#16995 from jdeschenes/datetime_sort_issues and squashes the following commits: 257e10a [Jean-Mathieu Deschenes] Changes requested by @jreback c6d55e2 [Jean-Mathieu Deschenes] Fix for pandas-dev#16836
Code Used to Demonstrate Issue
Problem description
Considering sort_values(by=['a', 'b'], na_position='last')
In v0.19.2 this works as expected:
In v0.20.2 this does not work as expected, with NaT in 'a' appearing first. Sorting only on 'a' does put NaT last.
Expected Output - as observed for v0.19
Bad Output - observed for v0.20
Output of
pd.show_versions()
pandas: 0.20.2
pytest: 2.8.5
pip: 8.1.1
setuptools: 35.0.2
Cython: 0.23.4
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 4.1.1
sphinx: 1.4
patsy: 0.4.1
dateutil: 2.5.0
pytz: 2016.3
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.5.0
bs4: 4.4.1
html5lib: 0.9999999
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
pandas_gbq: None
pandas_datareader: None
None
The text was updated successfully, but these errors were encountered: