Skip to content

Fix irregular Timestamp arithmetic types #6543 #6544

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from

Conversation

rosnfeld
Copy link
Contributor

@rosnfeld rosnfeld commented Mar 4, 2014

closes #6543

This is my first PR so I may have done a few things wrong. If I should have added my tests to a different file/location, please let me know.

I should point out that this code is also tested by many other sections of the pandas codebase, on my first pass I hadn't captured the "Timestamp - datetime" case and all sorts of things blew up. Better now. :)

@jreback jreback added this to the 0.14.0 milestone Mar 4, 2014
@jreback
Copy link
Contributor

jreback commented Mar 4, 2014

no exactly sure why travis doesn't run your branch (its supposed to)...but this fails under numpy < 1.7

https://travis-ci.org/jreback/pandas/jobs/20093901

need to put a conditional not to test under 1.7 (search for _np_version_under1p7) and skip that portion of the test (its the broken timedelta support that causes an issue)

@immerrr
Copy link
Contributor

immerrr commented Mar 5, 2014

@jreback this reminds me: I was looking at BlockManager implementation the other day and it seemed to me that it was too busy micromanaging type coercions & casts. I though that pandas could benefit by wrapping all the nasty numpy interaction details in a single module that would provide a pdarray subclass of np.ndarray with all the necessary casts and workarounds already applied (i.e. boxing/unboxing values, proper datetime parsing, type inference). It could also provide basic building blocks for more advanced algorithms, e.g. pandas-dtype-aware concatenation.

This seems an obvious application of single responsibility principle, did anyone consider/start doing this before? Were there any pitfalls that prevented that?

@immerrr
Copy link
Contributor

immerrr commented Mar 5, 2014

I stand corrected: single responsibility principle is more about OOP, I meant separation of concerns.

@jreback
Copy link
Contributor

jreback commented Mar 5, 2014

@immerrr and how would this be different than the Block types? that's really their main purpose.

more interesting would be to cythonize BlockManager (but a bit of a project!)

@immerrr
Copy link
Contributor

immerrr commented Mar 5, 2014

This is not really about Blocks, but rather about handling np.datetime64, np.timedelta64 and np.bool_ -related workarounds in a centralized manner. For a person outside of project's scope all that magic is impossible to follow, because they're distributed evenly all over the code base.

This would require proxying all objects and function imports that may relate to numpy via some pdnumpy package. This is quite a project, like you say, but it would demarcate an area inside pandas and within that area everything would behave exactly as expected. This would allow to abstract away those gory details and focus more on the actual value added by pandas. Do you agree?

@jreback
Copy link
Contributor

jreback commented Mar 5, 2014

@immerrr makes sense. Though one could argue that these are these are actually conventions that pandas uses to deal with numpy. Not bugs per se, but more 'workarounds' to treat things in a nicer manner (e.g. taking views on datetime64[ns] and such). IMHO this would only be useful as a layer above numpy, but I am not sure of the utilitiy for non-pandas (my 2c).

Centralizing this would be a very nice feature as they are pretty scattered. I am all for elegant code, so I would be +1 on this.

Their was some code in core/array.py to proxy out ndarray in general (should prob be removed as its only used in a very way and doesn't really do anything, side -issue). The series not a sub of ndarray mostly 'fixed' the need for this (as pandas I don't think can really ever divorce from numpy).

@rosnfeld
Copy link
Contributor Author

rosnfeld commented Mar 5, 2014

I am new to Travis, hopefully I got it right this time - looks like the same build works after my latest change?

https://travis-ci.org/rosnfeld/pandas/jobs/20135641

@jreback
Copy link
Contributor

jreback commented Mar 5, 2014

yep looks good

pls rebase and squash to a single commit, see here: https://github.com/pydata/pandas/wiki

(I can do it for you when you merge, but its nice to try to do it yourself, makes a clean workflow)

essentially

git rebase -i origin/master

squash

git push myfork mybranch -f

@rosnfeld
Copy link
Contributor Author

rosnfeld commented Mar 5, 2014

Ok, I hope I got it right. The command you posted has me rebasing on my own fork, but not incorporating the latest upstream commits, right? So you may need to do a tiny bit of work to merge release.rst ..?

@jreback
Copy link
Contributor

jreback commented Mar 5, 2014

you can pull in the master changes by doing

git pull origin/master

or do

git pull origin/master --rebase

@rosnfeld
Copy link
Contributor Author

rosnfeld commented Mar 5, 2014

I think you mean "upstream/master" for me to incorporate new items from others? "origin/master" is my fork and doesn't have anything new.

Sadly while the change (using upstream/master) looks good to my eyes, Travis seems to be upset. I don't understand why.

@jreback
Copy link
Contributor

jreback commented Mar 5, 2014

yes..upstream/master....rebase and force push again...

immerrr and others added 9 commits March 5, 2014 16:15
Preserve .names in df.set_index(df.index)

Check that df.set_index(df.index) doesn't convert a MultiIndex to an Index

Handle general case of df.set_index([df.index,...])

Cleanup

Add to release notes

Add equality checks

Fix issue on 2.6

Add example to whatsnew
…s (GH6335)

BUG: Changes types used in packing structs

Corrected incorrect data type conversion between pandas and Stata

Remove unnecessary, potentially precision degrading cast to Series when writing data
Added function to cast columns from NumPy data types to Stata data types
Corrected tests for correct Stata datatypes
Fixed formatting in comparison after casting

Added docstring for new function and warning class

BUG: Fixes and tests for extreme values in all data types

The extreme values of float and double (Stata, pandas eqiv: float 32 and
float64) were not correct.  This resulted in incorrect truncation. The
handling of missing values have been improved and code to convert missing
values in any format has been added.  The improvement differentiated between
valid ranges for data and missing values.

Additional issues were found when handling missing Dates, where missing Dates
(NaT) were converted to non-missing dates when written.

A test has been added for extreme numeric values as well as missing values.

Fixed legacy date issue with format 114 files
Added test for 114 files

Added format 114 (Stata 9/10/11) data file

Add test for Stata data with file format 114

Added additional data files for testing alternative Stata file formats

Added expected result to test
Renamed Stata data files to include file format

Types used for integer conversion where always half the size they should be.
Produced a bug when exporting data tables with long integer data (np.int64).

Added test for integer conversion bug

Added test for incorrect integer conversion from int16, int32 and int64

Added additional data files for testing alternative Stata file formats

Added expected result to test
Renamed Stata data files to include file format

Disabled the big endian skips
BUG/TST: fix handling of slice.stop < -len, obj.iloc[:-len(obj)] should be empty

BUG/TST: fix exceptions raised by Series.iloc when slice.start > len

CLN: remove unused _check_slice_bound function and raise_on_error params
@rosnfeld
Copy link
Contributor Author

rosnfeld commented Mar 5, 2014

Ok, I am confused now. Not sure why a second rebase did anything, though it did, and now this PR shows everyone's commits instead of just my own (probably because my branch is now "ahead" of my origin/master). I guess I would have expected this result after my first rebase. Meanwhile, Travis is still upset.

@jreback
Copy link
Contributor

jreback commented Mar 5, 2014

merged via baa4c1d

@rosnfeld git takes some getting used to......but great job on this PR!

thanks!

@jreback jreback closed this Mar 5, 2014
@rosnfeld rosnfeld deleted the subtract_timedelta branch March 7, 2014 00:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Datetime Datetime data dtype Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

inconsistent types resulting from Timestamp arithmetic
6 participants