Fix irregular Timestamp arithmetic types #6543 #6544

rosnfeld · 2014-03-04T20:11:49Z

This is my first PR so I may have done a few things wrong. If I should have added my tests to a different file/location, please let me know.

I should point out that this code is also tested by many other sections of the pandas codebase, on my first pass I hadn't captured the "Timestamp - datetime" case and all sorts of things blew up. Better now. :)

jreback · 2014-03-04T23:23:11Z

no exactly sure why travis doesn't run your branch (its supposed to)...but this fails under numpy < 1.7

https://travis-ci.org/jreback/pandas/jobs/20093901

need to put a conditional not to test under 1.7 (search for _np_version_under1p7) and skip that portion of the test (its the broken timedelta support that causes an issue)

immerrr · 2014-03-05T12:16:09Z

@jreback this reminds me: I was looking at BlockManager implementation the other day and it seemed to me that it was too busy micromanaging type coercions & casts. I though that pandas could benefit by wrapping all the nasty numpy interaction details in a single module that would provide a pdarray subclass of np.ndarray with all the necessary casts and workarounds already applied (i.e. boxing/unboxing values, proper datetime parsing, type inference). It could also provide basic building blocks for more advanced algorithms, e.g. pandas-dtype-aware concatenation.

This seems an obvious application of single responsibility principle, did anyone consider/start doing this before? Were there any pitfalls that prevented that?

immerrr · 2014-03-05T12:27:12Z

I stand corrected: single responsibility principle is more about OOP, I meant separation of concerns.

jreback · 2014-03-05T12:33:00Z

@immerrr and how would this be different than the Block types? that's really their main purpose.

more interesting would be to cythonize BlockManager (but a bit of a project!)

immerrr · 2014-03-05T12:54:45Z

This is not really about Blocks, but rather about handling np.datetime64, np.timedelta64 and np.bool_ -related workarounds in a centralized manner. For a person outside of project's scope all that magic is impossible to follow, because they're distributed evenly all over the code base.

This would require proxying all objects and function imports that may relate to numpy via some pdnumpy package. This is quite a project, like you say, but it would demarcate an area inside pandas and within that area everything would behave exactly as expected. This would allow to abstract away those gory details and focus more on the actual value added by pandas. Do you agree?

jreback · 2014-03-05T13:01:08Z

@immerrr makes sense. Though one could argue that these are these are actually conventions that pandas uses to deal with numpy. Not bugs per se, but more 'workarounds' to treat things in a nicer manner (e.g. taking views on datetime64[ns] and such). IMHO this would only be useful as a layer above numpy, but I am not sure of the utilitiy for non-pandas (my 2c).

Centralizing this would be a very nice feature as they are pretty scattered. I am all for elegant code, so I would be +1 on this.

Their was some code in core/array.py to proxy out ndarray in general (should prob be removed as its only used in a very way and doesn't really do anything, side -issue). The series not a sub of ndarray mostly 'fixed' the need for this (as pandas I don't think can really ever divorce from numpy).

rosnfeld · 2014-03-05T14:57:01Z

I am new to Travis, hopefully I got it right this time - looks like the same build works after my latest change?

https://travis-ci.org/rosnfeld/pandas/jobs/20135641

jreback · 2014-03-05T14:59:27Z

yep looks good

pls rebase and squash to a single commit, see here: https://github.com/pydata/pandas/wiki

(I can do it for you when you merge, but its nice to try to do it yourself, makes a clean workflow)

essentially

git rebase -i origin/master

squash

git push myfork mybranch -f

rosnfeld · 2014-03-05T20:39:14Z

Ok, I hope I got it right. The command you posted has me rebasing on my own fork, but not incorporating the latest upstream commits, right? So you may need to do a tiny bit of work to merge release.rst ..?

jreback · 2014-03-05T20:50:48Z

you can pull in the master changes by doing

git pull origin/master

or do

git pull origin/master --rebase

rosnfeld · 2014-03-05T21:05:30Z

I think you mean "upstream/master" for me to incorporate new items from others? "origin/master" is my fork and doesn't have anything new.

Sadly while the change (using upstream/master) looks good to my eyes, Travis seems to be upset. I don't understand why.

jreback · 2014-03-05T21:11:09Z

yes..upstream/master....rebase and force push again...

…s dtypes in Series creation (GH6529)

Preserve .names in df.set_index(df.index) Check that df.set_index(df.index) doesn't convert a MultiIndex to an Index Handle general case of df.set_index([df.index,...]) Cleanup Add to release notes Add equality checks Fix issue on 2.6 Add example to whatsnew

…s (GH6335) BUG: Changes types used in packing structs Corrected incorrect data type conversion between pandas and Stata Remove unnecessary, potentially precision degrading cast to Series when writing data Added function to cast columns from NumPy data types to Stata data types Corrected tests for correct Stata datatypes Fixed formatting in comparison after casting Added docstring for new function and warning class BUG: Fixes and tests for extreme values in all data types The extreme values of float and double (Stata, pandas eqiv: float 32 and float64) were not correct. This resulted in incorrect truncation. The handling of missing values have been improved and code to convert missing values in any format has been added. The improvement differentiated between valid ranges for data and missing values. Additional issues were found when handling missing Dates, where missing Dates (NaT) were converted to non-missing dates when written. A test has been added for extreme numeric values as well as missing values. Fixed legacy date issue with format 114 files Added test for 114 files Added format 114 (Stata 9/10/11) data file Add test for Stata data with file format 114 Added additional data files for testing alternative Stata file formats Added expected result to test Renamed Stata data files to include file format Types used for integer conversion where always half the size they should be. Produced a bug when exporting data tables with long integer data (np.int64). Added test for integer conversion bug Added test for incorrect integer conversion from int16, int32 and int64 Added additional data files for testing alternative Stata file formats Added expected result to test Renamed Stata data files to include file format Disabled the big endian skips

BUG/TST: fix handling of slice.stop < -len, obj.iloc[:-len(obj)] should be empty BUG/TST: fix exceptions raised by Series.iloc when slice.start > len CLN: remove unused _check_slice_bound function and raise_on_error params

rosnfeld · 2014-03-05T21:25:44Z

Ok, I am confused now. Not sure why a second rebase did anything, though it did, and now this PR shows everyone's commits instead of just my own (probably because my branch is now "ahead" of my origin/master). I guess I would have expected this result after my first rebase. Meanwhile, Travis is still upset.

jreback · 2014-03-05T22:15:50Z

merged via baa4c1d

@rosnfeld git takes some getting used to......but great job on this PR!

thanks!

jreback added Bug labels Mar 4, 2014

jreback added this to the 0.14.0 milestone Mar 4, 2014

immerrr and others added 9 commits March 5, 2014 16:15

BUG: fix _ref_locs corruption when slice indexing across columns axis

3b0b53c

BUG: Regression from 0.13 in the treatmenet of numpy datetime64 non-n…

cd3ba57

…s dtypes in Series creation (GH6529)

informative error message

5f81319

BUG: Bug in setitem with a duplicate index and an alignable rhs (GH6541)

653cc43

BUG: Bug in setitem with loc on mixed integer Indexes (GH6546)

f467a91

BUG/TST: fix several issues with slice bound checking code

ef4228b

BUG/TST: fix handling of slice.stop < -len, obj.iloc[:-len(obj)] should be empty BUG/TST: fix exceptions raised by Series.iloc when slice.start > len CLN: remove unused _check_slice_bound function and raise_on_error params

BUG: Fix irregular Timestamp arithmetic types pandas-dev#6543

48c3969

jreback closed this Mar 5, 2014

rosnfeld deleted the subtract_timedelta branch March 7, 2014 00:19

jreback mentioned this pull request Mar 15, 2014

REGR: Timestamp - series is failing #6648

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix irregular Timestamp arithmetic types #6543 #6544

Fix irregular Timestamp arithmetic types #6543 #6544

rosnfeld commented Mar 4, 2014

jreback commented Mar 4, 2014

immerrr commented Mar 5, 2014

immerrr commented Mar 5, 2014

jreback commented Mar 5, 2014

immerrr commented Mar 5, 2014

jreback commented Mar 5, 2014

rosnfeld commented Mar 5, 2014

jreback commented Mar 5, 2014

rosnfeld commented Mar 5, 2014

jreback commented Mar 5, 2014

rosnfeld commented Mar 5, 2014

jreback commented Mar 5, 2014

rosnfeld commented Mar 5, 2014

jreback commented Mar 5, 2014

Fix irregular Timestamp arithmetic types #6543 #6544

Fix irregular Timestamp arithmetic types #6543 #6544

Conversation

rosnfeld commented Mar 4, 2014

jreback commented Mar 4, 2014

immerrr commented Mar 5, 2014

immerrr commented Mar 5, 2014

jreback commented Mar 5, 2014

immerrr commented Mar 5, 2014

jreback commented Mar 5, 2014

rosnfeld commented Mar 5, 2014

jreback commented Mar 5, 2014

rosnfeld commented Mar 5, 2014

jreback commented Mar 5, 2014

rosnfeld commented Mar 5, 2014

jreback commented Mar 5, 2014

rosnfeld commented Mar 5, 2014

jreback commented Mar 5, 2014