CLN/ENH: Provide full suite of arithmetic (and flex) methods to all NDFrame objects. #4560

jtratner · 2013-08-14T01:17:18Z

This goes on top of @jreback's Series refactoring (#3482). If you want to review, I suggest looking at the PR on top of jreback/series2, so you only see the diffs specific to this PR.

This does a few things:

Normalizes the arithmetic signature so that all objects have the full range of special methods, inplace methods and __r*__ methods. Similarly, most objects get the full range of flex methods (the exact set varies based on the object).
Moves all arithmetic methods into core/ops and adds functions that attach both special (e.g., __add__) and flex (e.g., add()) methods to PandasObjects.
Slight refactoring of the arithmetic methods themselves to abstract the common na_op idiom into a single function.
Uses numexpr throughout for most operations (except for the sparse objects).
Makes it easier to create new PandasObjects that support all arithmetic methods: you just need to define 2-5 functions.
Establishes a standard type signature for arithmetic methods [documented in core/ops], specifically:

arithmetic method: f(op, name, str_rep, default_axis=None, fill_zeros=None, **eval_kwargs)
comp/bool method: f(op, name, str_rep)
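
To make the factory signature above concrete, here is a minimal sketch of how such factories could be used to attach both special and flex methods to a class. This is not the code in core/ops: `Box`, `make_arith_method`, `flip` and `add_arithmetic_methods` are hypothetical names, and the extra keyword arguments are accepted only to mirror the signature.

```python
import operator


class Box:
    """Toy container standing in for a PandasObject (hypothetical)."""

    def __init__(self, values):
        self.values = list(values)


def make_arith_method(op, name, str_rep, default_axis=None, fill_zeros=None,
                      **eval_kwargs):
    """Factory with the f(op, name, str_rep, ...) shape; extras are unused here."""
    def method(self, other):
        # Broadcast scalars; pair up element-wise with another Box.
        if isinstance(other, Box):
            other_values = other.values
        else:
            other_values = [other] * len(self.values)
        return Box([op(a, b) for a, b in zip(self.values, other_values)])

    method.__name__ = name
    return method


def flip(op):
    """Swap operand order so the same factory can build r* methods."""
    return lambda a, b: op(b, a)


def add_arithmetic_methods(cls, factory):
    ops = {"add": (operator.add, "+"), "sub": (operator.sub, "-"),
           "mul": (operator.mul, "*"), "truediv": (operator.truediv, "/")}
    for name, (op, str_rep) in ops.items():
        # Flex methods: box.add(x), box.radd(x), ...
        setattr(cls, name, factory(op, name, str_rep))
        setattr(cls, "r" + name, factory(flip(op), "r" + name, str_rep))
        # Special methods: box + x, x + box, ...
        setattr(cls, "__%s__" % name, factory(op, "__%s__" % name, str_rep))
        setattr(cls, "__r%s__" % name,
                factory(flip(op), "__r%s__" % name, str_rep))


add_arithmetic_methods(Box, make_arith_method)
```

With this in place, `10 - Box([1, 2])` dispatches to `__rsub__` and `Box([1, 2]).rsub(10)` gives the same values through the flex method, which is the kind of symmetry the normalized signature is meant to provide.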

This is a straight continuation of #4051, but since it's different enough (especially with moving the operations to core/ops and no longer using them as classmethods), I'm doing it in a separate PR.

I'm open to changing the naming of the arithmetic functions in core/ops. I kept the current names to make it easier to move them into core/ops.

Sidenote: I don't know whether numexpr plays well with sparse objects (or if that's even an issue). It's trivial to disable numexpr on Sparse* objects, just need to know what you want to do.

jtratner · 2013-08-14T01:43:29Z

Second side note: adding a truediv method to Series and DataFrame was the major reason that I started contributing to pandas. Two months or so later, I finally got to a PR.

cpcloud · 2013-08-14T01:44:24Z

@jtratner nice! i love those little anecdotes 😄

jreback · 2013-08-14T01:45:48Z

sparse is prob not tested for Numexpr
and won't work I think; disable for now

jtratner · 2013-08-14T01:51:40Z

@jreback okay. I'll make that change.

axis creation routines now commonized under _setup_axes
ENH: more methods added
PERF: was missing multi-take opportunity in reindex was incorrectly passing to com._count_not_none doing an extra copy in certain cases
BUG: reindex called with no args will by default return a copy (fixed bug)
ENH: moved filter and added axis arg moved where,mask,align
TST: make reindex benchmarks longer
CLN: fixed up names for creation in panelnd.py
DOC: minor release notes changes
ENH: initial commit - attempt to reengineer series to inherit from NDFrame rather than ndarray
ENH: fixed SparseDataFrame constructor with scalar values reindex still broken removed refs to SparseSeries in internals (not all SparseArray)
TST: more fixed
TST: more fixes
TST: more tests
TST: fixed up indexing
TST: more sparse fixes
BUG: reindex with single block manager now correctly fills with a method
BUG: fixed pickle I think
BUG: fixed set in internals for sparse fixed boolean indexing in series I think
BUG: fixed printing and inclusion of sparse series in DataFrame (now keeps its type), converted to dense for printing
CLN: took out SeriesIndex, now uses regular indexing properties
BUG: fixed copy (was using series method, bad) block filling for datetimes now ok (was filling with NaT, not iNaT) NaN in boolean ops now correctly handled (was not working for Datetimes)
BUG: fixed set_item in SparseFrame if only a scalar is passed (needed index)
BUG: sparse join fixed, did I break something in merge?
BUG: consolidated block slicing under _slice
BUG: added Series to santize_array all numeric methods now call get_values() rather than values
ENH: partial SparsePanel support
ENH: reverted SparsePanel changes, save for later fixed up xs in SparseFrame
BUG: SparsePanel was using an inherited as_matrix(), bad
TST: fixed shift default in class creation wrapper is to not pass existing fillers added sanitize column for generality fixed count (in series)
CLN: modify core/expressions to use get_values() remove methods from SparseFrame (and use inherited): combine_first,icol,as_matrix,get_dtype_counts bug fix in core/internals/get_dtype_counts
CLN: use _values_from_object instead of direct call to get_values()
BUG: fixed set_value semantics, as it could possibly change the index
BUG: fixed tseries/period indexing fixed some bugs showing up in 32-bit (in nanops)
BUG: fix incorrect exception raised in indexing (on 32-bit)
BUG: fixed get_merge_keys (add Series to ndarray testing)
BUG: fixed pivot table maybe???x core/internals/_ref_locs will now set indexer if ref_items==items
TST: apply_reduce in tests/test_frame still failing
BUG: fixed getitem_boolean_object finally I think (was issue in set_value in Series)
BUG: fixed putmasking mess in Series, now in core/internals
BUG: more fixes
BUG: fixed core/internals/replace as choking on input
BUG: refixed groupby
BUG: fix test_where in series
BUG: fixed reindex on a sparse block (was not taking correctly)
BUG: fixed sparse filling!!!!!
BUG: fixed pivot, need to define __hash__ to raise TypeError in NDFrame
BUG: downcast argument not in SparseBlock or sparse/frame.py for fillna
BUG: fix apply_reduce?
BUG: fixes in reduce.pyx to deal with reconstructing a Series argument to the function if needed
BUG: reducer now produces a Series with its index (to the called function) ols converts to_dense to avoid some issues
ENH: fixed core/frame/apply to accept reduce argument (default True), to allow turning off the reduction attempt (to preserve the column character) if say self.values would change it
BUG: finally fixed reducer?
BUG: reduce on frame bug (showing in py3)
BUG: ols not working with sparse
TST: stats.tests.test_ols/test_wls is not testing for the correct version of statsmodels (fails on 32-bit) PTF
TST: make sure to skip the test_wls if our version isn't enough
PERF: some perf enhancements
BUG: fix sparse/array/make_sparse to take objects and extract the arrays
PERF: series construction now much faster
PERF: improvements in core/internals
MERGE: updated to master and merged in
MERGE: more merging fixes
PERF: fixed null tests to be MUCH faster
PERF: improvements in series construction via from_array
PERF: merge improvements by using _has_sparse in bms
PERF: some improvements
PERF: more internals optimizations
CLN: Index now subclassed off of PandasObject
BUG: fixed inheritance for core/index.py (Index), solves unicode issues
BUG: some merge errors in sparse
VB: modernize the sparse vb suite
BUG: fixed merging by single item (was broken for sparse for some reason) names not propagating in Series constructor on _slice
BUG: add name back to series constructor
ENH: pickle compatibility for Series/SparseSeries prior to 0.12!
ENH: added pickle_compat to common/load
BUG: in core/series on fastpath and index is actually changed (e.g. it's actually a datelike index, but is of type object), need to set the axis in the BlockManager
BUG: _getitem__bool only is active for Index/Int64Index (issues with DatetimeIndex/PeriodIndex) so default to having it call (slower) __getitem__
COMPAT: py3 compat fixes
TST: recover pickles in a particular order or names
MERGE: fixup merging with 0.11.0 final
BUG: set _subtyp in sparse (use main type of object)
BUG: fixed merging on need to reindex sparse
BUG: fixed consolidation issue prior to merge
BUG: construction of a series with another series odd bug
BUG: fix series constructor when passed a dtype (and no copy)
BUG: fixed sparse slicing via blocks (don't use a sparse block when slicing)
BUG: fixed remaining sparse issue (SparseDataFrame was converting SparseArray incorrectly)
BUG: dtypes in groupby nth fixed (converting on aggregation item_by_item)
BUG: partial fix on groupby?
BUG: restored groupby back to master (SeriesGrouper)
BUG: more fixes on groupby
BUG: fixed all groupbys!
BUG: get_median in core/nanops.py complaining
PERF: made constructions of SparseFrame have less redundant steps
PERF: minor series perf improvement
TST: trying to fix how_lambda in tseries/resample PTF
PERF: addtl groupby multi_python perf improvements
PERF: speeds up for Series.__getitem__
PERF: some perf on groupby..... added _block, _values in SingleBlockManager
PERF: more reducer improvements
BUG: fixed SeriesBinGrouper hopefully
BUG: tseries/index.py was missing __str__ = __repr__

BUG: groupby filter that return a series/ndarray truth testing
BUG: refixed GH3880, prop name index
BUG: not handling sparse block deletes in internals/_delete_from_block
BUG: refix generic/truncate
TST: refixed generic/replace (bug in core/internals/putmask) revealed as well
TST: fix spare_array to put up correct type exceptions rather than Exception
CLN: cleanups
BUG: fix stata dtype inference (error in core/internals/astype)
BUG: fix ujson handling of new series object
BUG: fixed scalar coercion (e.g. calling float(series)) to work
BUG: fixed astyping with and w/o copy
ENH: added _propogate_attributes method to generic.py to allow subclasses to automatically propagate things like name
DOC: added v0.13.0.txt feature descriptions
CLN: pep8ish cleanups
BUG: fix 32-bit,numpy 1.6.1 issue with datetimes in astype_nansafe
PERF: speedup for groupby by passing a SNDArray (Series like ndarray) object to evaluation functions if allowed, can avoid Series creation overhead
BUG: issue with older numpy (1.6.1) in SeriesGrouper, fallback to passing a Series rather than SNDArray
DOC: release notes & doc updates
DOC: fixup doc build failures
DOC: change passing of direct ndarrays to cython doc functions (enhancedperformance.rst)


Instead of the `is_series`, `is_generic`, etc. methods, you can use the ABC* classes to check for certain pandas types. This is useful because it helps decrease issues with circular imports (since they can be easily imported from core/common). The checks take advantage of the `_typ` and `_subtyp` attributes (e.g. `DataFrame` now has `_typ` of `"dataframe"`, etc.). See the code for specifics.
PERF: register _cacher as an internal name
BUG: fixed abstract base class type checking bug in py2.6
DOC: updates for abc type checking
PERF: small perf gains in _get_item_cache
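
The `_typ`-based type checks described in the commit message above can be illustrated with a small sketch. It shows one way to make `isinstance` look at an attribute instead of the inheritance chain, so the checking class never has to import the real class. `create_abc_type`, `ToySeries` and `ToyFrame` are hypothetical names for illustration, and this is not claimed to match the code in core/common.

```python
def create_abc_type(name, attr, allowed):
    """Build a class whose isinstance() check compares an attribute value."""
    def _check(cls, inst):
        # Objects without the attribute simply fail the check.
        return getattr(inst, attr, None) in allowed

    # The hook must live on the metaclass for isinstance() to find it.
    meta = type("ABCBase", (type,), {"__instancecheck__": _check})
    return meta(name, (), {})


ABCSeries = create_abc_type("ABCSeries", "_typ", ("series",))
ABCDataFrame = create_abc_type("ABCDataFrame", "_typ", ("dataframe",))


class ToySeries:
    _typ = "series"  # marker attribute read by the ABC check


class ToyFrame:
    _typ = "dataframe"


print(isinstance(ToySeries(), ABCSeries))    # True
print(isinstance(ToyFrame(), ABCSeries))     # False
print(isinstance(ToyFrame(), ABCDataFrame))  # True
```

Because the check only reads an attribute, a module can test for "series-like" objects without importing the module that defines them, which is how this approach reduces circular imports.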

TST/BUG: test/bugfix for GH4463
BUG: fix core/internals/setitem to work for boolean types (weird numpy bug!)
BUG: partial frame setting with dtype change (GH4204)
BUG: Indexing with dtype conversions fixed GH4463 (int->float), GH4204(boolean->float)
BUG: provide better ndarray compat
CLN: removed some duped methods
MERGE: fix an issue cropping up on the rebase

jreback · 2013-08-14T13:12:46Z

@jtratner ok.....just rebased, so try your git fu!

jtratner · 2013-08-14T22:07:00Z

@jreback did you change any of the Series arithmetic in your last rebase? Hard to tell b/c of all the file movement :P I thought you did, but then I rechecked and it didn't look like it.

jreback · 2013-08-14T22:09:44Z

the timedelta/datetime ops were changed around (in the wrapper); you have it now in core/ops.py, you should be able to use my version with only slight mods I think

jtratner · 2013-08-14T22:11:47Z

@jreback do you have a link to the PR you merged in? Just helpful to have it to look at. Thanks for clarifying tho.

jtratner · 2013-08-14T22:15:03Z

@jreback found the PR. it's all good ... guess I just missed that whole thing. Thanks for the tip!

jreback · 2013-08-14T22:17:55Z

trying to make it smooth
hopefully can get merged soon

* Abstract all arithmetic methods into core/ops
* Normalize arithmetic methods signature (see `ops.add_special_arithmetic_methods` and `ops.add_flex_arithmetic_methods` for signature).
* Opt-in more arithmetic operations with numexpr (except for SparsePanel, which has to opt-out because it doesn't respond to `shape`).
* BUG: Fix ``_fill_zeros`` call to work even if TypeError (previously was inconsistent).
* Add bind method to core/common
* Add full range of flex arithmetic methods to all NDFrame/ndarray PandasObjects (except for SparsePanel pow and mod, which only work for scalars)
* CLN: Remove some calls to np.putmask in favor of masker
* ENH: make series datetime ops compatible with add, radd, sub, rsub, etc


jtratner · 2013-08-15T02:42:57Z

I'm closing this for now because I'm not convinced the test suite for time arithmetic is checking the r* ops, and I don't want it to be accidentally merged.

jreback · 2013-08-16T19:34:43Z

@jtratner just merged #3482! so you are up with this!

jtratner · 2013-08-16T21:26:51Z

Awesome, now I just have to make datetime arithmetic play nicely with r* methods.
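
As an illustration of why the r* methods need care for datetime arithmetic, the sketch below uses only the standard library. `Column` is a hypothetical toy container, not a pandas class. Subtraction is not commutative, so the reflected method has to swap the operands rather than reuse the forward implementation.

```python
from datetime import datetime, timedelta


class Column:
    """Toy container of datetimes or timedeltas (hypothetical)."""

    def __init__(self, values):
        self.values = list(values)

    def __add__(self, other):
        return Column([v + other for v in self.values])

    def __radd__(self, other):
        # Called for `other + column` when `other` cannot handle a Column.
        return Column([other + v for v in self.values])

    def __sub__(self, other):
        return Column([v - other for v in self.values])

    def __rsub__(self, other):
        # Operand order is swapped: this computes other - v, not v - other.
        return Column([other - v for v in self.values])


ref = datetime(2013, 8, 16)
stamps = Column([datetime(2013, 8, 14), datetime(2013, 8, 15)])

print((stamps - ref).values)  # negative timedeltas
print((ref - stamps).values)  # positive timedeltas, via __rsub__
```

A test for the reflected ops would check both operand orders, since a suite that only exercises `stamps - ref` never calls `__rsub__` at all.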

jreback · 2013-08-16T21:30:26Z

lmk...

jtratner · 2013-08-19T03:19:34Z

@jreback what's the difference between 'i8' dtype and np.int64? On my system it looks like they are the same...

jreback · 2013-08-19T03:35:41Z

same
i8 = 8-byte integer = 8 x 8 = 64 bit
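
A quick way to confirm this, assuming NumPy is installed:

```python
import numpy as np

i8 = np.dtype("i8")

# "i8" means a signed integer that is 8 bytes wide, i.e. 64 bits.
print(i8 == np.dtype(np.int64))  # True
print(i8.itemsize)               # 8 (bytes)
print(i8.itemsize * 8)           # 64 (bits)
```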

jtratner · 2013-08-19T03:39:59Z

Thanks - that's really helpful!

cpcloud · 2013-08-19T03:55:57Z

@jtratner Also check out np.typeDict: it's a dict mapping str names to dtypes. Pretty useful 😄

jtratner mentioned this pull request Aug 14, 2013

Arithmetic refactor on top of series2 jreback/pandas#6

Closed

jreback and others added 7 commits August 14, 2013 08:29

BLD: pep8 major changes

35c4735

BUG: Bug in Series update where the parent frame is not updating its …

d7bc62a

…cache based on changes (GH4080) BUG: Series not updating properly with object dtype (GH33217) BUG: (GH3386) fillna same issue as (GH4080), not updating cacher

ENH: 'replaced' series.replace with generic.replace !

bd2106e

CLN: cleaned up internal block action routines, now always return a list of blocks

jreback mentioned this pull request Aug 14, 2013

CLN: Post Series subclass from NDFrame #4324

Closed

18 tasks

jtratner added 9 commits August 14, 2013 19:08

CLN: Add testing util in core/expressions

9b364be

Adds a set of testing methods to check that numexpr was actually used successfully. Also changes the if hasattr idiom --> getattr.
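
A rough sketch of what such a testing hook and the getattr idiom could look like follows. None of it is taken from core/expressions: every name is a placeholder, and the "fast path" is a stand-in that does not call numexpr.

```python
import operator

_TEST_MODE = False
_TEST_RESULT = []


def set_test_mode(enabled=True):
    """Turn recording on or off and clear earlier results."""
    global _TEST_MODE
    _TEST_MODE = enabled
    del _TEST_RESULT[:]


def get_test_result():
    """Return the recorded flags and reset the record."""
    result = list(_TEST_RESULT)
    del _TEST_RESULT[:]
    return result


def _can_use_fast_path(a, min_elements=4):
    # getattr with a default replaces `if hasattr(a, "shape"): ... a.shape`.
    shape = getattr(a, "shape", None)
    if shape is None:
        return False
    size = 1
    for dim in shape:
        size *= dim
    return size >= min_elements


def evaluate(op, a, b):
    used_fast = _can_use_fast_path(a)
    if _TEST_MODE:
        _TEST_RESULT.append(used_fast)
    # Stand-in: both paths compute the same thing in this sketch.
    return [op(x, b) for x in a.data]


class Vec:
    def __init__(self, data):
        self.data = list(data)
        self.shape = (len(self.data),)


class NoShape:
    def __init__(self, data):
        self.data = list(data)
```

A test can then call `set_test_mode(True)`, run an operation, and assert on `get_test_result()` to confirm which path was taken, rather than only checking the numeric result.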

TST: Flesh out test cases to include all arithmetic ops

ab51a02

* better panel arith test * TST: refactor test case in series

DOC: Document entire range of arithmetic methods in api.rst

2da3408

doc fixup

CLN: Refactor core/ops + add Panel to it

e01b0f6

CLN: Disable numexpr for sparse objects

4b1561c

restore np_version_under1p7

b8ea326

fixups to datetime handling

fc339bb

add a test for commutative addition

a0c0383

jtratner closed this Aug 15, 2013
