Quick Frame Shift Implementation #6404

Closed
wants to merge 138 commits into from
Commits
13cedfc
This is an implementation of quick shift logic
gouthambs Feb 19, 2014
4056fd6
Added a vbench to reflect quick shift implementation
gouthambs Feb 19, 2014
3a0861c
ENH: Import testing into main namespace.
jseabold Jan 30, 2014
fa9d5fd
DOC: clarify docstring of rolling/expanding moments
jorisvandenbossche Feb 15, 2014
4e96c86
DOC: fix doc build warnings
jorisvandenbossche Feb 16, 2014
f5f79f2
CLN: remove need for tz_localize, tz_convert in Series, use the generic
jreback Feb 17, 2014
6edf066
TST: comapre only full dtypes in testing
jreback Feb 17, 2014
75afc98
BUG: preserve dtypes in interpolate
Feb 6, 2014
0dc1016
check in interp_with_fill too
Feb 14, 2014
6fafaa5
ENH: Add sym_diff for index
Jan 20, 2014
d41f038
CLN: Change assert_(a in b) and assert_(a not in b) to specialized forms
bwignall Feb 17, 2014
637af57
TST: dtype comparisons on windows in test_generic.py
jreback Feb 17, 2014
f0510b3
FIX: hdfstore queries of the form where=[('date', '>=', datetime(2013…
Feb 9, 2014
909418f
added test by taking test_term_compat, and removing all Term calls
Feb 10, 2014
76609ec
DOC: added release note about #6313
Feb 14, 2014
2ee87b9
TST: checked for DeprecationWarning on tests for backwards compatabil…
Feb 17, 2014
dc59b47
BUG: Bug in Series.get, was using a buggy access method (GH6383)
jreback Feb 17, 2014
e4bb9ca
CLN: Change assert_(a is [not] None) to specialized forms
bwignall Feb 17, 2014
675caa5
CLN: Specialize assert_(np.array_equal(...))
bwignall Feb 17, 2014
afa2354
BUG: Bug in DataFrame.dropna with duplicate indicies (GH6355)
jreback Feb 17, 2014
2a69be2
DOC: add cookbook example for reading in simple binary file formats
cpcloud Feb 17, 2014
c316f85
Fix behavior of `to_offset` with leading zeroes
ischwabacher Feb 18, 2014
31c50bf
API/CLN: add in common operations to Series/Index, refactored as a Op…
jreback Feb 17, 2014
261da67
TST: Add tests for `to_offset` with leading zeroes
ischwabacher Feb 18, 2014
d2619df
CLN: Change assert_(a is [not] b) to specialized forms
bwignall Feb 18, 2014
54af30e
CLN: Change assert_(a [not] in b) to specialized forms
bwignall Feb 18, 2014
06244a2
BUG: Regression in chained getitem indexing with embedded list-like f…
jreback Feb 18, 2014
079147d
DOC: release.rst update
jreback Feb 18, 2014
afa0537
Add `to_offset` fix to release notes
ischwabacher Feb 18, 2014
5a2c649
TST: dtype issues on windows with test_getitem_dups
jreback Feb 18, 2014
d4c5305
TST: disable odd test_data/test_fred tests failing, maybe a data revi…
jreback Feb 18, 2014
318016e
DOC: missing spaces in release.rst
jreback Feb 18, 2014
e1a0938
DOC: more release.rst fixes
jreback Feb 18, 2014
b047969
BUG: Fix for GH6399, mergesort is unstable when ascending=False
Feb 18, 2014
56be317
BUG: Float64Index with nans not comparing correctly
jreback Feb 18, 2014
46c630f
ENH: pd.infer_freq() will accept a Series as input
jreback Feb 19, 2014
e885765
API: validate conversions of datetimeindex with tz, and fixup to_seri…
jreback Feb 18, 2014
79116ad
DOC: use ipython in bool replace doc warning
cpcloud Feb 19, 2014
955f951
DOC: use code block for shorter error message
cpcloud Feb 19, 2014
02f35d0
TST: GH6410 / numpy 4328
jreback Feb 19, 2014
f3b1d62
BLD: add optional numpy_dev build
jreback Feb 19, 2014
9d4d8b7
BLD: tweak numpy_dev build
jreback Feb 19, 2014
066bd98
BUG: inconcistency in contained values of a Series created from a Dat…
jreback Feb 20, 2014
6dbe501
ENH #6416: performance improvements on write - tradoff higher memory …
mangecoeur Feb 20, 2014
525bc9b
ENH: backport Python 3.3 ChainMap
cpcloud Feb 1, 2014
dbc5350
CLN/ENH: use Scope.swapkey() to update names
cpcloud Feb 16, 2014
aa69b97
ENH/CLN: track scope instead of hard coding minimal stack level
cpcloud Feb 16, 2014
2af800e
ERR/API: disallow local references in top-level calls to eval
cpcloud Feb 18, 2014
d65c80b
BUG: ChainMap m parameter only exists in Python 3.4
cpcloud Feb 18, 2014
58fe0b9
BUG: use a regular for loop in _ensure_term
cpcloud Feb 19, 2014
b783fc3
REGR: Bug in Series.reindex when specifying a method with some nan va…
jreback Feb 20, 2014
3df66d4
ENH #6416 cleanup for PR
mangecoeur Feb 20, 2014
e113df1
ENH: Add functions for creating dataframes with NaNs
jseabold Feb 20, 2014
7693029
TST: windows dtype fix for test_series/test_reindex_pad
jreback Feb 20, 2014
7e975b8
BUG: punt to user when passing overlapping replacment values in a nes…
cpcloud Feb 21, 2014
cb51b68
BUG: correctly tokenize local variable references
cpcloud Feb 21, 2014
977da4a
DOC: link to pandas-cookbook instructions
jvns Feb 21, 2014
c17ac75
PERF: Perf issue in concatting with empty objects (GH3259)
jreback Feb 21, 2014
0bb3188
API: concat will now concatenate mixed Series and DataFrames using th…
jreback Feb 21, 2014
68429b9
DOC: add explanation to doc/sphinxext
jorisvandenbossche Feb 22, 2014
c07301e
DOC: small doc fixes
jorisvandenbossche Feb 22, 2014
24f873b
TST: fix spurious tests for test_groupby (GH6436)
jreback Feb 22, 2014
010ee4a
BUG/TST: iloc will now raise IndexError on out-of-bounds list indexer…
jreback Feb 22, 2014
c554245
DOC: fix .tz attribute error in DatetimeIndex when building docs
jreback Feb 22, 2014
6317a59
CLN: minimize tokenizer passes
cpcloud Feb 21, 2014
4df6669
INT/CLN: clean up block slicing semantics
jreback Feb 22, 2014
622cf5c
str_extract should work for timeseries, bug 6348
andrewkittredge Feb 14, 2014
af652f8
BUG FIX: cartesian_product now converts all arguments to ndarrays
shoyer Feb 23, 2014
fe807b5
BUG/TST: sorting of NaNs on sym_diff
Feb 23, 2014
9dd1188
CLN: remove dead code in pandas.computation
cpcloud Feb 22, 2014
86cb092
BLD: remove pdb from test_strings.py
jreback Feb 23, 2014
438bb2a
BLD: add numpy 1.8.x builds as optional
jreback Feb 23, 2014
762d442
Add references to isnull in notnull docs and vice versa
Feb 23, 2014
dd5084e
COMPAT: infer_freq compat on passing an Index of strings (GH6463)
jreback Feb 24, 2014
a5a3c5b
BUG: split should respect maxsplit when no pat is given
cpcloud Feb 24, 2014
3dab6ab
DOC: release notes for #6466
cpcloud Feb 25, 2014
32b446e
ENH: Allow custom frequencies. Closes #4541.
jseabold Feb 25, 2014
5783d97
Modified get_data_famafrench(name) to allow for all file
kdiether Feb 25, 2014
c485492
BUG: Bug in sum of a timedelta64[ns] series (GH6462)
jreback Feb 25, 2014
9fceba8
BUG: Bug in resample with a timezone and certain offsets (GH6397)
jreback Feb 25, 2014
09d6950
CLN: remove vestigial count code
dsm054 Feb 25, 2014
bc6528f
PERF: perf improvements in DataFrame construction with a non-daily da…
jreback Feb 25, 2014
95863a1
BUG: fix non-caching of some frequency offsets for date generation
jreback Feb 26, 2014
bea86e2
DOC: Expand on usage.
jseabold Feb 26, 2014
33cdd41
Initial implementation of calculation of astronomical Julian Date.
timcera Jun 26, 2013
60c17d4
BUG: Bug in iat/iloc with duplicate indices on a Series (6493)
jreback Feb 27, 2014
d19636a
PERF: perf improvements in single-dtyped indexing (GH6484)
jreback Feb 26, 2014
dfa8b80
DOC: Clarify that methods taking a MultiIndex level index also accept…
toobaz Feb 27, 2014
bddc6b4
BUG/TST: read_html should follow pandas conventions when creating emp…
cpcloud Feb 22, 2014
86e6e0b
DOC: further homogenized the description of "level" argument
toobaz Feb 27, 2014
7fd527e
DOC: level argument description in _binop
toobaz Feb 27, 2014
618918e
DOC: show users how to emulate R c function with iloc slicing and r_
cpcloud Feb 27, 2014
55180b5
TST: windows dtype test fix for tests_indexing/test_imethods_with_dups
jreback Feb 27, 2014
785d087
PERF: optimize index.__getitem__ for slice & boolean mask indexers
immerrr Feb 22, 2014
18db035
BUG: Bug in multi-axis indexing using .loc on non-unique indices (GH6…
jreback Feb 28, 2014
0437f6e
remove semicolon from CREATE TABLE legacy template
rcarneva Feb 28, 2014
7f9ae74
ENH: add method='dense' to rank
dsm054 Mar 1, 2014
c8d7fd1
DOC: Add common error message to byte-ordering gotcha.
danielballan Mar 2, 2014
c409ccf
BUG: fix _ref_locs corruption when slice indexing across columns axis
immerrr Mar 3, 2014
094705c
BUG: Regression from 0.13 in the treatmenet of numpy datetime64 non-n…
jreback Mar 3, 2014
5b5ba81
BUG/API groupby head and tail act like filter, since they dont aggreg…
hayd Mar 3, 2014
7a9b182
ENH: Preserve .names in df.set_index(df.index)
qwhelan Feb 24, 2014
0780cfc
informative error message
clarkfitzg Mar 4, 2014
e354daa
BUG: Bug in setitem with a duplicate index and an alignable rhs (GH6541)
jreback Mar 4, 2014
b229786
BUG: Bug in setitem with loc on mixed integer Indexes (GH6546)
jreback Mar 5, 2014
4cd649b
BUG/API: Fix stata io to deal with wrong data types and missing value…
bashtage Feb 12, 2014
6d150ba
BUG/TST: fix several issues with slice bound checking code
immerrr Mar 3, 2014
d2c9adc
BUG: fix fancy indexing with empty list
immerrr Mar 5, 2014
57063f9
ENH: Keep series name in GroupBy agg/apply ops
bburan-galenea Mar 5, 2014
17b5fd9
DOC add groupby head and tail to docs
hayd Mar 5, 2014
d5401cb
BUG: Fix irregular Timestamp arithmetic types #6543
rosnfeld Mar 4, 2014
595e8fd
BLD: change wheel url to pandas.pydata.org
jreback Mar 6, 2014
2c3927e
BLD: add bottleneck 0.8.0 to 3.3 build
jreback Mar 6, 2014
2ef3bfc
BUG: Series.quantile raising on an object dtype (GH6555)
jreback Mar 5, 2014
e9857a9
ENH: Allow timestamp and data label to be set when exporting to Stata
bashtage Mar 5, 2014
56e1b39
BUG: preserve frequency across Timestamp addition/subtraction (#4547)
rosnfeld Mar 6, 2014
8aaf8fb
ENH/BUG groupby nth now filters, works with DataFrames
hayd Mar 7, 2014
790420c
TST add vbench for groupby nth
hayd Mar 7, 2014
b14cbc9
BLD: fix versions of setuptools/pip/wheel to be stable and know to work
jreback Mar 9, 2014
c69f83b
DOC: cookbook.rst entry
jreback Mar 9, 2014
8e4de94
CLN: Change assert_([not] isinstance(a,b)) to specialized forms
bwignall Mar 8, 2014
ddf9be7
CLN: Change assert_([not] isinstance(a,b)) to specialized forms
bwignall Mar 8, 2014
cea38e4
BUG: Bug in .xs with a nan in level when dropped (GH6574)
jreback Mar 9, 2014
e9a2f13
CLN: Finish changing assert_(...) to specialized forms
bwignall Mar 9, 2014
4df5bd2
Make to_csv return a string in case no buffer is supplied.
filmor Feb 19, 2014
5936bea
FIX use selected_obj rather the obj throughout groupby
hayd Mar 7, 2014
37ee8a5
PERF #6570 patch by @jreback
hayd Mar 9, 2014
9de49db
BUG: Bug in fillna with method = bfill/ffill and datetime64[ns] dtype…
jreback Mar 10, 2014
ab09f82
ENH: including offset/freq in Timestamp repr (#4553)
rosnfeld Mar 7, 2014
fc69613
FIX filter selects selected columns
hayd Mar 8, 2014
3c460f0
BUG/TST: replace iterrows with itertuples in sql insert (GH6509)
jorisvandenbossche Mar 10, 2014
d24efd7
TST: add check_exact arg to assert_frame/series_equal
jorisvandenbossche Mar 11, 2014
2e305d9
FIX: Bug whereby array_equivalent was not correctly comparing Float64…
unutbu Mar 11, 2014
ca99dfe
BUG: iloc back to values for assignment. Closes #6602.
jseabold Mar 11, 2014
57db1c2
ENH: Make sure to return int for indices
jseabold Mar 11, 2014
97a5d1e
BUG: Bug in popping from a Series (GH6600)
jreback Mar 11, 2014
b562eb2
Squashed version of the commits below.
gouthambs Feb 19, 2014
ff18ac9
Fixed the test failure
gouthambs Mar 13, 2014
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
*~
*.pyc
*.pyo
*.swp
37 changes: 31 additions & 6 deletions .travis.yml
@@ -11,40 +11,65 @@ env:
- secure: "PCzUFR8CHmw9lH84p4ygnojdF7Z8U5h7YfY0RyT+5K/aiQ1ZTU3ZkDTPI0/rR5FVMxsEEKEQKMcc5fvqW0PeD7Q2wRmluloKgT9w4EVEJ1ppKf7lITPcvZR2QgVOvjv4AfDtibLHFNiaSjzoqyJVjM4igjOu8WTlF3JfZcmOQjQ="

matrix:
fast_finish: true
include:
- python: 2.6
env:
- NOSE_ARGS="not slow and not network and not disabled"
- CLIPBOARD=xclip
- LOCALE_OVERRIDE="it_IT.UTF-8"
- JOB_NAME: "26_nslow_nnet" # ScatterCI Build name, 20 chars max
- JOB_NAME: "26_nslow_nnet"
- python: 2.7
env:
- NOSE_ARGS="slow and not network and not disabled"
- LOCALE_OVERRIDE="zh_CN.GB18030"
- FULL_DEPS=true
- JOB_TAG=_LOCALE
- JOB_NAME: "27_slow_nnet_LOCALE" # ScatterCI Build name, 20 chars max
- JOB_NAME: "27_slow_nnet_LOCALE"
- python: 2.7
env:
- NOSE_ARGS="not slow and not disabled"
- FULL_DEPS=true
- CLIPBOARD_GUI=gtk2
- JOB_NAME: "27_nslow" # ScatterCI Build name, 20 chars max
- JOB_NAME: "27_nslow"
- DOC_BUILD=true # if rst files were changed, build docs in parallel with tests
- python: 3.2
env:
- NOSE_ARGS="not slow and not disabled"
- FULL_DEPS=true
- CLIPBOARD_GUI=qt4
- JOB_NAME: "32_nslow" # ScatterCI Build name, 20 chars max
- JOB_NAME: "32_nslow"
- python: 3.3
env:
- NOSE_ARGS="not slow and not disabled"
- FULL_DEPS=true
- CLIPBOARD=xsel
- JOB_NAME: "33_nslow" # ScatterCI Build name, 20 chars max

- JOB_NAME: "33_nslow"
- python: 2.7
env:
- NOSE_ARGS="not slow and not network and not disabled"
- JOB_NAME: "27_numpy_master"
- JOB_TAG=_NUMPY_DEV_master
- NUMPY_BUILD=master
- python: 2.7
env:
- NOSE_ARGS="not slow and not network and not disabled"
- JOB_NAME: "27_numpy_1.8.x"
- JOB_TAG=_NUMPY_DEV_1_8_x
- NUMPY_BUILD=maintenance/1.8.x
allow_failures:
- python: 2.7
env:
- NOSE_ARGS="not slow and not network and not disabled"
- JOB_NAME: "27_numpy_master"
- JOB_TAG=_NUMPY_DEV_master
- NUMPY_BUILD=master
- python: 2.7
env:
- NOSE_ARGS="not slow and not network and not disabled"
- JOB_NAME: "27_numpy_1.8.x"
- JOB_TAG=_NUMPY_DEV_1_8_x
- NUMPY_BUILD=maintenance/1.8.x

# allow importing from site-packages,
# so apt-get python-x works for system pythons
43 changes: 33 additions & 10 deletions ci/install.sh
@@ -31,28 +31,47 @@ edit_init
python_major_version="${TRAVIS_PYTHON_VERSION:0:1}"
[ "$python_major_version" == "2" ] && python_major_version=""

pip install -I -U setuptools
pip install wheel
# fix these versions
pip install -I pip==1.5.1
pip install -I setuptools==2.2
pip install wheel==0.22

# comment this line to disable the fetching of wheel files
base_url=http://cache27diy-cpycloud.rhcloud.com
base_url=http://pandas.pydata.org/pandas-build/dev/wheels

wheel_box=${TRAVIS_PYTHON_VERSION}${JOB_TAG}
PIP_ARGS+=" -I --use-wheel --find-links=$base_url/$wheel_box/ --allow-external --allow-insecure"

# Force virtualenv to accept system_site_packages
rm -f $VIRTUAL_ENV/lib/python$TRAVIS_PYTHON_VERSION/no-global-site-packages.txt


if [ -n "$LOCALE_OVERRIDE" ]; then
# make sure the locale is available
# probably useless, since you would need to relogin
time sudo locale-gen "$LOCALE_OVERRIDE"
fi


# we need these for numpy
time sudo apt-get $APT_ARGS install libatlas-base-dev gfortran

if [ -n "$NUMPY_BUILD" ]; then
# building numpy
curdir=$(pwd)
echo "building numpy: $curdir"

# remove the system installed numpy
pip uninstall numpy -y

# clone & install
git clone --branch $NUMPY_BUILD https://github.com/numpy/numpy.git numpy
cd numpy
time sudo python setup.py install

cd $curdir
numpy_version=$(python -c 'import numpy; print(numpy.__version__)')
echo "numpy: $numpy_version"
else
# Force virtualenv to accept system_site_packages
rm -f $VIRTUAL_ENV/lib/python$TRAVIS_PYTHON_VERSION/no-global-site-packages.txt
fi

time pip install $PIP_ARGS -r ci/requirements-${wheel_box}.txt


@@ -98,6 +117,10 @@ export PATH=/usr/lib/ccache:/usr/lib64/ccache:$PATH
which gcc
ccache -z
time pip install $(find dist | grep gz | head -n 1)
# restore cython
time pip install $PIP_ARGS $(cat ci/requirements-${wheel_box}.txt | grep -i cython)

# restore cython (if not numpy building)
if [ -z "$NUMPY_BUILD" ]; then
time pip install $PIP_ARGS $(cat ci/requirements-${wheel_box}.txt | grep -i cython)
fi

true
1 change: 0 additions & 1 deletion ci/requirements-2.7.txt
@@ -13,7 +13,6 @@ xlrd==0.9.2
patsy==0.1.0
html5lib==1.0b2
lxml==3.2.1
scikits.timeseries==0.91.3
scipy==0.10.0
beautifulsoup4==4.2.1
statsmodels==0.5.0
3 changes: 3 additions & 0 deletions ci/requirements-2.7_NUMPY_DEV_1_8_x.txt
@@ -0,0 +1,3 @@
python-dateutil
pytz==2013b
cython==0.19.1
3 changes: 3 additions & 0 deletions ci/requirements-2.7_NUMPY_DEV_master.txt
@@ -0,0 +1,3 @@
python-dateutil
pytz
cython==0.19.1
1 change: 1 addition & 0 deletions ci/requirements-3.3.txt
@@ -8,6 +8,7 @@ numpy==1.8.0
cython==0.19.1
numexpr==2.3
tables==3.1.0
bottleneck==0.8.0
matplotlib==1.2.1
patsy==0.1.0
lxml==3.2.1
6 changes: 6 additions & 0 deletions ci/speedpack/build.sh
@@ -103,6 +103,12 @@ function generate_wheels() {
}


# generate a single wheel version
# generate_wheels "/reqf/requirements-3.3.txt"
#
# if vagrant is already up
# run as vagrant provision

for reqfile in $(ls -1 /reqf/requirements-*.*); do
generate_wheels "$reqfile"
done
23 changes: 20 additions & 3 deletions doc/source/api.rst
@@ -424,10 +424,25 @@ Time series-related
Series.shift
Series.first_valid_index
Series.last_valid_index
Series.weekday
Series.resample
Series.tz_convert
Series.tz_localize
Series.year
Series.month
Series.day
Series.hour
Series.minute
Series.second
Series.microsecond
Series.nanosecond
Series.date
Series.time
Series.dayofyear
Series.weekofyear
Series.week
Series.dayofweek
Series.weekday
Series.quarter

String handling
~~~~~~~~~~~~~~~~~~~
@@ -1129,7 +1144,9 @@ Time/Date Components
DatetimeIndex.dayofweek
DatetimeIndex.weekday
DatetimeIndex.quarter

DatetimeIndex.tz
DatetimeIndex.freq
DatetimeIndex.freqstr

Selecting
~~~~~~~~~
@@ -1159,7 +1176,7 @@ Conversion
DatetimeIndex.to_datetime
DatetimeIndex.to_period
DatetimeIndex.to_pydatetime

DatetimeIndex.to_series

GroupBy
-------
40 changes: 40 additions & 0 deletions doc/source/comparison_with_r.rst
@@ -30,6 +30,43 @@ R packages.
Base R
------

Slicing with R's |c|_
~~~~~~~~~~~~~~~~~~~~~

R makes it easy to access ``data.frame`` columns by name:

.. code-block:: r

df <- data.frame(a=rnorm(5), b=rnorm(5), c=rnorm(5), d=rnorm(5), e=rnorm(5))
df[, c("a", "c", "e")]

or by integer location:

.. code-block:: r

df <- data.frame(matrix(rnorm(1000), ncol=100))
df[, c(1:10, 25:30, 40, 50:100)]

Selecting multiple columns by name in ``pandas`` is straightforward:

.. ipython:: python

df = DataFrame(np.random.randn(10, 3), columns=list('abc'))
df[['a', 'c']]
df.loc[:, ['a', 'c']]

Selecting multiple noncontiguous columns by integer location can be achieved
with a combination of the ``iloc`` indexer attribute and ``numpy.r_``.

.. ipython:: python

named = list('abcdefg')
n = 30
columns = named + np.arange(len(named), n).tolist()
df = DataFrame(np.random.randn(n, n), columns=columns)

df.iloc[:, np.r_[:10, 24:30]]
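
A standalone sketch of the same selection outside IPython, with a comment on how R's 1-based, inclusive ranges translate into ``numpy.r_`` slices. It assumes ``numpy`` and ``pandas`` are importable as ``np`` and ``pd``; the names ``positions`` and ``subset`` are illustrative only.

```python
import numpy as np
import pandas as pd

# Same frame as above: seven named columns followed by integer-labelled
# columns, 30 columns in total.
named = list('abcdefg')
n = 30
columns = named + np.arange(len(named), n).tolist()
df = pd.DataFrame(np.random.randn(n, n), columns=columns)

# R's c(1:10, 25:30) is 1-based and inclusive at both ends, while numpy.r_
# slices are 0-based and exclusive at the stop: 1:10 becomes :10 and
# 25:30 becomes 24:30.
positions = np.r_[:10, 24:30]

# iloc selects purely by position, regardless of the column labels.
subset = df.iloc[:, positions]
```

Because ``iloc`` ignores labels, the mix of string and integer column names does not affect which columns are returned.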

|aggregate|_
~~~~~~~~~~~~

@@ -407,6 +444,9 @@ The second approach is to use the :meth:`~pandas.DataFrame.groupby` method:
For more details and examples see :ref:`the reshaping documentation
<reshaping.pivot>` or :ref:`the groupby documentation<groupby.split>`.

.. |c| replace:: ``c``
.. _c: http://stat.ethz.ch/R-manual/R-patched/library/base/html/c.html

.. |aggregate| replace:: ``aggregate``
.. _aggregate: http://finzi.psych.upenn.edu/R/library/stats/html/aggregate.html

4 changes: 2 additions & 2 deletions doc/source/conf.py
@@ -40,8 +40,8 @@
'sphinx.ext.extlinks',
'sphinx.ext.todo',
'numpydoc', # used to parse numpy-style docstrings for autodoc
'ipython_directive',
'ipython_console_highlighting',
'ipython_sphinxext.ipython_directive',
'ipython_sphinxext.ipython_console_highlighting',
'sphinx.ext.intersphinx',
'sphinx.ext.todo',
'sphinx.ext.coverage',
72 changes: 72 additions & 0 deletions doc/source/cookbook.rst
@@ -250,6 +250,9 @@ Turn a matrix with hours in columns and days in rows into a continuous row seque
`How to rearrange a python pandas DataFrame?
<http://stackoverflow.com/questions/15432659/how-to-rearrange-a-python-pandas-dataframe>`__

`Dealing with duplicates when reindexing a timeseries to a specified frequency
<http://stackoverflow.com/questions/22244383/pandas-df-refill-adding-two-columns-of-different-shape>`__

.. _cookbook.resample:

Resampling
@@ -470,6 +473,75 @@ Storing Attributes to a group node
store.close()
os.remove('test.h5')


.. _cookbook.binary:

Binary Files
~~~~~~~~~~~~

If you need to read in a binary file consisting of an array of C structs,
pandas readily accepts NumPy record arrays. For example, given this C program
in a file called ``main.c``, compiled with ``gcc main.c -std=gnu99`` on a
64-bit machine,

.. code-block:: c

#include <stdio.h>
#include <stdint.h>

typedef struct _Data
{
int32_t count;
double avg;
float scale;
} Data;

int main(int argc, const char *argv[])
{
size_t n = 10;
Data d[n];

for (int i = 0; i < n; ++i)
{
d[i].count = i;
d[i].avg = i + 1.0;
d[i].scale = (float) i + 2.0f;
}

FILE *file = fopen("binary.dat", "wb");
fwrite(&d, sizeof(Data), n, file);
fclose(file);

return 0;
}

the following Python code will read the binary file ``'binary.dat'`` into a
pandas ``DataFrame``, where each element of the struct corresponds to a column
in the frame:

.. code-block:: python

import numpy as np
from pandas import DataFrame

names = 'count', 'avg', 'scale'

# note that the offsets are larger than the combined size of the preceding
# members because of struct padding
offsets = 0, 8, 16
formats = 'i4', 'f8', 'f4'
dt = np.dtype({'names': names, 'offsets': offsets, 'formats': formats},
align=True)
df = DataFrame(np.fromfile('binary.dat', dt))

.. note::

The offsets of the structure elements may be different depending on the
architecture of the machine on which the file was created. Using a raw
binary file format like this for general data storage is not recommended, as
it is not cross-platform. We recommend either HDF5 or msgpack, both of
which are supported by pandas' IO facilities.
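
As an alternative to hard-coding the offsets, here is a minimal sketch that lets NumPy compute the C layout with ``align=True``. It assumes the file was written with the platform's default struct alignment, as is typical for gcc on x86-64. The C program is replaced by a NumPy stand-in that writes the same ten records, and the temporary path is an illustrative assumption, not part of the original recipe.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Let numpy pad the fields the way a C compiler would (align=True),
# instead of spelling out the byte offset of each struct member.
dt = np.dtype({'names': ['count', 'avg', 'scale'],
               'formats': ['i4', 'f8', 'f4']},
              align=True)

# Stand-in for the C program: build the same ten records and write them
# as raw bytes, mirroring fwrite(d, sizeof(Data), n, file).
n = 10
records = np.zeros(n, dtype=dt)
records['count'] = np.arange(n)
records['avg'] = np.arange(n) + 1.0
records['scale'] = np.arange(n) + 2.0

path = os.path.join(tempfile.mkdtemp(), 'binary.dat')
records.tofile(path)

# Read the raw bytes back as structured records; each field becomes a column.
df = pd.DataFrame(np.fromfile(path, dtype=dt))
```

Comparing ``dt.itemsize`` against ``sizeof(Data)`` printed from the C side is a quick way to catch a layout mismatch before trusting the parsed values.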

Computation
-----------
