Commit eab0553

Author: Nick Eubank
Merge branch 'master' into merge_indicator
2 parents: b797bfd + fe735be


65 files changed, +5836 -1185 lines

doc/source/advanced.rst (+1 -4)

@@ -675,10 +675,7 @@ values NOT in the categories, similarly to how you can reindex ANY pandas index.
    }).set_index('B')

 In [11]: df3.index
-Out[11]:
-CategoricalIndex([u'a', u'a', u'b', u'b', u'c', u'a'],
-categories=[u'a', u'b', u'c'],
-ordered=False)
+Out[11]: CategoricalIndex([u'a', u'a', u'b', u'b', u'c', u'a'], categories=[u'a', u'b', u'c'], ordered=False, name=u'B', dtype='category')

 In [12]: pd.concat([df2,df3]
 TypeError: categories must match existing categories when appending

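As a side note on the hunk above: the new single-line repr adds the index ``name`` and ``dtype`` fields that the old multi-line form omitted. Here is a minimal sketch (not from the commit; the data values are illustrative, and the exact repr formatting varies by pandas version) showing where each field of that repr comes from:

```python
# Sketch: the fields in the CategoricalIndex repr correspond to attributes.
# Column/index names mirror the df3 in the hunk; the data is illustrative.
import pandas as pd

df3 = pd.DataFrame({'A': range(6),
                    'B': pd.Series(list('aabbca')).astype('category')}).set_index('B')
idx = df3.index

print(type(idx).__name__)    # CategoricalIndex
print(list(idx.categories))  # ['a', 'b', 'c']
print(idx.ordered)           # False
print(idx.name)              # B
```

The ``name=u'B'`` in the new repr is simply the name of the column that became the index via ``set_index('B')``.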
doc/source/api.rst (+10)

@@ -158,6 +158,7 @@ Top-level dealing with datetimelike
 bdate_range
 period_range
 timedelta_range
+infer_freq

 Top-level evaluation
 ~~~~~~~~~~~~~~~~~~~~

@@ -491,6 +492,7 @@ These can be accessed like ``Series.dt.<property>``.
 Series.dt.to_pydatetime
 Series.dt.tz_localize
 Series.dt.tz_convert
+Series.dt.normalize

 **Timedelta Properties**

@@ -534,17 +536,22 @@ strings and apply several methods to it. These can be acccessed like
 Series.str.find
 Series.str.findall
 Series.str.get
+Series.str.index
 Series.str.join
 Series.str.len
 Series.str.ljust
 Series.str.lower
 Series.str.lstrip
 Series.str.match
+Series.str.normalize
 Series.str.pad
+Series.str.partition
 Series.str.repeat
 Series.str.replace
 Series.str.rfind
+Series.str.rindex
 Series.str.rjust
+Series.str.rpartition
 Series.str.rstrip
 Series.str.slice
 Series.str.slice_replace

@@ -553,6 +560,7 @@ strings and apply several methods to it. These can be acccessed like
 Series.str.strip
 Series.str.swapcase
 Series.str.title
+Series.str.translate
 Series.str.upper
 Series.str.wrap
 Series.str.zfill

@@ -1364,6 +1372,7 @@ Time/Date Components
 DatetimeIndex.is_quarter_end
 DatetimeIndex.is_year_start
 DatetimeIndex.is_year_end
+DatetimeIndex.inferred_freq

 Selecting
 ~~~~~~~~~

@@ -1414,6 +1423,7 @@ Components
 TimedeltaIndex.microseconds
 TimedeltaIndex.nanoseconds
 TimedeltaIndex.components
+TimedeltaIndex.inferred_freq

 Conversion
 ~~~~~~~~~~

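Several of the names this commit adds to the API reference are frequency-inference helpers. A brief sketch (dates chosen for illustration) of how the top-level ``infer_freq`` and the ``inferred_freq`` index property relate:

```python
# Sketch: pd.infer_freq and DatetimeIndex.inferred_freq (both newly listed
# in the API reference above) report a guessed frequency string.
import pandas as pd

idx = pd.date_range('2000-01-01', periods=5, freq='D')
print(pd.infer_freq(idx))  # D
print(idx.inferred_freq)   # D
```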
doc/source/basics.rst (+1 -1)

@@ -1004,7 +1004,7 @@ Note that the following also works, but is a bit less obvious / clean:

 .. ipython:: python

-   df.reindex(df.index - ['a', 'd'])
+   df.reindex(df.index.difference(['a', 'd']))

 .. _basics.rename:

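The hunk above swaps index subtraction for the explicit ``Index.difference`` method. A small sketch of the new spelling (the labels here are illustrative, not from the document's ``df``):

```python
# Sketch: Index.difference is the explicit, set-like replacement for the
# older `index - labels` spelling shown on the removed line.
import pandas as pd

idx = pd.Index(['a', 'b', 'c', 'd'])
remaining = idx.difference(['a', 'd'])
print(list(remaining))  # ['b', 'c']
```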
doc/source/computation.rst (+20 -21)

@@ -1,23 +1,22 @@
 .. currentmodule:: pandas
-.. _computation:

 .. ipython:: python
    :suppress:

    import numpy as np
    np.random.seed(123456)
-   from pandas import *
-   import pandas.util.testing as tm
-   randn = np.random.randn
    np.set_printoptions(precision=4, suppress=True)
+   import pandas as pd
    import matplotlib
    try:
       matplotlib.style.use('ggplot')
    except AttributeError:
-      options.display.mpl_style = 'default'
+      pd.options.display.mpl_style = 'default'
    import matplotlib.pyplot as plt
    plt.close('all')
-   options.display.max_rows=15
+   pd.options.display.max_rows=15
+
+.. _computation:

 Computational tools
 ===================

@@ -36,13 +35,13 @@ NA/null values *before* computing the percent change).

 .. ipython:: python

-   ser = Series(randn(8))
+   ser = pd.Series(np.random.randn(8))

    ser.pct_change()

 .. ipython:: python

-   df = DataFrame(randn(10, 4))
+   df = pd.DataFrame(np.random.randn(10, 4))

    df.pct_change(periods=3)

@@ -56,8 +55,8 @@ The ``Series`` object has a method ``cov`` to compute covariance between series

 .. ipython:: python

-   s1 = Series(randn(1000))
-   s2 = Series(randn(1000))
+   s1 = pd.Series(np.random.randn(1000))
+   s2 = pd.Series(np.random.randn(1000))
    s1.cov(s2)

 Analogously, ``DataFrame`` has a method ``cov`` to compute pairwise covariances

@@ -78,7 +77,7 @@ among the series in the DataFrame, also excluding NA/null values.

 .. ipython:: python

-   frame = DataFrame(randn(1000, 5), columns=['a', 'b', 'c', 'd', 'e'])
+   frame = pd.DataFrame(np.random.randn(1000, 5), columns=['a', 'b', 'c', 'd', 'e'])
    frame.cov()

 ``DataFrame.cov`` also supports an optional ``min_periods`` keyword that

@@ -87,7 +86,7 @@ in order to have a valid result.

 .. ipython:: python

-   frame = DataFrame(randn(20, 3), columns=['a', 'b', 'c'])
+   frame = pd.DataFrame(np.random.randn(20, 3), columns=['a', 'b', 'c'])
    frame.ix[:5, 'a'] = np.nan
    frame.ix[5:10, 'b'] = np.nan

@@ -123,7 +122,7 @@ All of these are currently computed using pairwise complete observations.

 .. ipython:: python

-   frame = DataFrame(randn(1000, 5), columns=['a', 'b', 'c', 'd', 'e'])
+   frame = pd.DataFrame(np.random.randn(1000, 5), columns=['a', 'b', 'c', 'd', 'e'])
    frame.ix[::2] = np.nan

    # Series with Series

@@ -140,7 +139,7 @@ Like ``cov``, ``corr`` also supports the optional ``min_periods`` keyword:

 .. ipython:: python

-   frame = DataFrame(randn(20, 3), columns=['a', 'b', 'c'])
+   frame = pd.DataFrame(np.random.randn(20, 3), columns=['a', 'b', 'c'])
    frame.ix[:5, 'a'] = np.nan
    frame.ix[5:10, 'b'] = np.nan

@@ -157,8 +156,8 @@ objects.

    index = ['a', 'b', 'c', 'd', 'e']
    columns = ['one', 'two', 'three', 'four']
-   df1 = DataFrame(randn(5, 4), index=index, columns=columns)
-   df2 = DataFrame(randn(4, 4), index=index[:4], columns=columns)
+   df1 = pd.DataFrame(np.random.randn(5, 4), index=index, columns=columns)
+   df2 = pd.DataFrame(np.random.randn(4, 4), index=index[:4], columns=columns)
    df1.corrwith(df2)
    df2.corrwith(df1, axis=1)

@@ -172,7 +171,7 @@ of the ranks (by default) for the group:

 .. ipython:: python

-   s = Series(np.random.randn(5), index=list('abcde'))
+   s = pd.Series(np.random.randn(5), index=list('abcde'))
    s['d'] = s['b'] # so there's a tie
    s.rank()

@@ -181,7 +180,7 @@ or the columns (``axis=1``). ``NaN`` values are excluded from the ranking.

 .. ipython:: python

-   df = DataFrame(np.random.randn(10, 6))
+   df = pd.DataFrame(np.random.randn(10, 6))
    df[4] = df[2][:5] # some ties
    df
    df.rank(1)

@@ -253,7 +252,7 @@ These functions can be applied to ndarrays or Series objects:

 .. ipython:: python

-   ts = Series(randn(1000), index=date_range('1/1/2000', periods=1000))
+   ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
    ts = ts.cumsum()

    ts.plot(style='k--')

@@ -271,7 +270,7 @@ sugar for applying the moving window operator to all of the DataFrame's columns:

 .. ipython:: python

-   df = DataFrame(randn(1000, 4), index=ts.index,
+   df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index,
                   columns=['A', 'B', 'C', 'D'])
    df = df.cumsum()

@@ -310,7 +309,7 @@ keyword. The list of recognized types are:

 .. ipython:: python

-   ser = Series(randn(10), index=date_range('1/1/2000', periods=10))
+   ser = pd.Series(np.random.randn(10), index=pd.date_range('1/1/2000', periods=10))

    rolling_window(ser, 5, 'triang')

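The methods touched throughout this file behave identically under the new ``pd.``-qualified spelling; only the namespace changes. A short sketch with deterministic data (values chosen for illustration, unlike the random data in the docs):

```python
# Sketch of the methods exercised in computation.rst, using the
# pd.-qualified spelling the commit converts to.
import pandas as pd

ser = pd.Series([1.0, 2.0, 4.0])
chg = ser.pct_change()  # NaN, then 1.0, 1.0 (each step doubles the value)

s = pd.Series([1.0, 2.0, 2.0, 3.0], index=list('abcd'))
ranks = s.rank()        # the tie at 'b'/'c' shares the average rank 2.5

other = pd.Series([2.0, 4.0, 4.0, 6.0], index=list('abcd'))
corr = s.corr(other)    # perfectly linear relationship, so 1.0
```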
doc/source/contributing.rst (+22 -8)

@@ -112,8 +112,10 @@ want to clone your fork to your machine: ::
 This creates the directory `pandas-yourname` and connects your repository to
 the upstream (main project) *pandas* repository.

-You will also need to hook up Travis-CI to your GitHub repository so the suite
-is automatically run when a Pull Request is submitted. Instructions are `here
+The testing suite will run automatically on Travis-CI once your Pull Request is
+submitted. However, if you wish to run the test suite on a branch prior to
+submitting the Pull Request, then Travis-CI needs to be hooked up to your
+GitHub repository. Instructions for doing so are `here
 <http://about.travis-ci.org/docs/user/getting-started/>`_.

 Creating a Branch

@@ -134,6 +136,17 @@ changes in this branch specific to one bug or feature so it is clear
 what the branch brings to *pandas*. You can have many shiny-new-features
 and switch in between them using the git checkout command.

+To update this branch, you need to retrieve the changes from the master branch::
+
+    git fetch upstream
+    git rebase upstream/master
+
+This will replay your commits on top of the latest pandas git master. If this
+leads to merge conflicts, you must resolve these before submitting your Pull
+Request. If you have uncommitted changes, you will need to `stash` them prior
+to updating. This will effectively store your changes and they can be reapplied
+after updating.
+
 .. _contributing.dev_env:

 Creating a Development Environment

@@ -338,7 +351,7 @@ dependencies.
 Building the documentation
 ~~~~~~~~~~~~~~~~~~~~~~~~~~

-So how do you build the docs? Navigate to your local the folder
+So how do you build the docs? Navigate to your local
 ``pandas/doc/`` directory in the console and run::

    python make.py html

@@ -358,8 +371,9 @@ If you want to do a full clean build, do::

 Starting with 0.13.1 you can tell ``make.py`` to compile only a single section
 of the docs, greatly reducing the turn-around time for checking your changes.
-You will be prompted to delete `.rst` files that aren't required, since the
-last committed version can always be restored from git.
+You will be prompted to delete `.rst` files that aren't required. This is okay
+since the prior version can be checked out from git, but make sure not to
+commit the file deletions.

 ::

@@ -417,7 +431,7 @@ deprecation warnings where needed.
 Test-driven Development/Writing Code
 ------------------------------------

-*Pandas* is serious about `Test-driven Development (TDD)
+*Pandas* is serious about testing and strongly encourages individuals to embrace `Test-driven Development (TDD)
 <http://en.wikipedia.org/wiki/Test-driven_development>`_.
 This development process "relies on the repetition of a very short development cycle:
 first the developer writes an (initially failing) automated test case that defines a desired

@@ -550,8 +564,8 @@ Doing 'git status' again should give something like ::
 # modified: /relative/path/to/file-you-added.py
 #

-Finally, commit your changes to your local repository with an explanatory message. An informal
-commit message format is in effect for the project. Please try to adhere to it. Here are
+Finally, commit your changes to your local repository with an explanatory message. *Pandas*
+uses a convention for commit message prefixes and layout. Here are
 some common prefixes along with general guidelines for when to use them:

 * ENH: Enhancement, new functionality

doc/source/ecosystem.rst (+1 -1)

@@ -57,7 +57,7 @@ large data to thin clients.
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Hadley Wickham's `ggplot2 <http://ggplot2.org/>`__ is a foundational exploratory visualization package for the R language.
-Based on `"The Grammer of Graphics" <http://www.cs.uic.edu/~wilkinson/TheGrammarOfGraphics/GOG.html>`__ it
+Based on `"The Grammar of Graphics" <http://www.cs.uic.edu/~wilkinson/TheGrammarOfGraphics/GOG.html>`__ it
 provides a powerful, declarative and extremely general way to generate bespoke plots of any kind of data.
 It's really quite incredible. Various implementations to other languages are available,
 but a faithful implementation for python users has long been missing. Although still young

doc/source/index.rst.template (+1 -1)

@@ -115,6 +115,7 @@ See the package overview for more detail about what's in the library.
 {%if not single -%}
 whatsnew
 install
+contributing
 faq
 overview
 10min

@@ -149,7 +150,6 @@ See the package overview for more detail about what's in the library.
 api
 {% endif -%}
 {%if not single -%}
-contributing
 internals
 release
 {% endif -%}

doc/source/install.rst (+2 -1)

@@ -243,7 +243,7 @@ Optional Dependencies
 * `Cython <http://www.cython.org>`__: Only necessary to build development
   version. Version 0.19.1 or higher.
 * `SciPy <http://www.scipy.org>`__: miscellaneous statistical functions
-* `PyTables <http://www.pytables.org>`__: necessary for HDF5-based storage. Version 3.0.0 or higher required.
+* `PyTables <http://www.pytables.org>`__: necessary for HDF5-based storage. Version 3.0.0 or higher required, Version 3.2.0 or higher highly recommended.
 * `SQLAlchemy <http://www.sqlalchemy.org>`__: for SQL database support. Version 0.8.1 or higher recommended.
 * `matplotlib <http://matplotlib.sourceforge.net/>`__: for plotting
 * `statsmodels <http://statsmodels.sourceforge.net/>`__

@@ -255,6 +255,7 @@ Optional Dependencies
   * Alternative Excel writer.
 * `boto <https://pypi.python.org/pypi/boto>`__: necessary for Amazon S3
   access.
+* `blosc <https://pypi.python.org/pypi/blosc>`__: for msgpack compression using ``blosc``
 * One of `PyQt4
   <http://www.riverbankcomputing.com/software/pyqt/download>`__, `PySide
   <http://qt-project.org/wiki/Category:LanguageBindings::PySide>`__, `pygtk

doc/source/internals.rst (+1 -2)

@@ -94,8 +94,7 @@ not check (or care) whether the levels themselves are sorted. Fortunately, the
 constructors ``from_tuples`` and ``from_arrays`` ensure that this is true, but
 if you compute the levels and labels yourself, please be careful.

-
-.. _:
+.. _ref-subclassing-pandas:

 Subclassing pandas Data Structures
 ----------------------------------

doc/source/io.rst (+22)

@@ -2364,6 +2364,10 @@ for some advanced strategies

 As of version 0.15.0, pandas requires ``PyTables`` >= 3.0.0. Stores written with prior versions of pandas / ``PyTables`` >= 2.3 are fully compatible (this was the previous minimum ``PyTables`` required version).

+.. warning::
+
+   There is a ``PyTables`` indexing bug which may appear when querying stores using an index. If you see a subset of results being returned, upgrade to ``PyTables`` >= 3.2. Stores created previously will need to be rewritten using the updated version.
+
 .. ipython:: python
    :suppress:
    :okexcept:

@@ -3996,6 +4000,24 @@ whether imported ``Categorical`` variables are ordered.
 a ``Categorial`` with string categories for the values that are labeled and
 numeric categories for values with no label.

+.. _io.other:
+
+Other file formats
+------------------
+
+pandas itself only supports IO with a limited set of file formats that map
+cleanly to its tabular data model. For reading and writing other file formats
+into and from pandas, we recommend these packages from the broader community.
+
+netCDF
+~~~~~~
+
+xray_ provides data structures inspired by the pandas DataFrame for working
+with multi-dimensional datasets, with a focus on the netCDF file format and
+easy conversion to and from pandas.
+
+.. _xray: http://xray.readthedocs.org/
+
 .. _io.perf:

 Performance Considerations
