Commit d0963b2

Merge remote-tracking branch 'upstream/master'
2 parents 406f61f + b3fea8f commit d0963b2

151 files changed

+5069
-2392
lines changed


CONTRIBUTING.md

+33-124
@@ -4,41 +4,38 @@ All contributions, bug reports, bug fixes, documentation improvements,
44
enhancements and ideas are welcome.
55

66
The [GitHub "issues" tab](https://github.com/pydata/pandas/issues)
7-
contains some issues labeled "Good as first PR"; these are
8-
tasks which do not require deep knowledge of the package. Look those up if you're
7+
contains some issues labeled "Good as first PR"; look those up if you're
98
looking for a quick way to help out.
109

11-
Please try and follow these guidelines, as this makes it easier for us to accept
12-
your contribution or address the issue you're having.
13-
1410
#### Bug Reports
1511

1612
- Please include a short, self-contained Python snippet reproducing the problem.
1713
You can have the code formatted nicely by using [GitHub Flavored Markdown](http://github.github.com/github-flavored-markdown/) :
1814

1915
```python
20-
16+
2117
print("I ♥ pandas!")
2218

2319
```
2420

25-
- A [test case](https://github.com/pydata/pandas/tree/master/pandas/tests) may be more helpful.
26-
- Specify the pandas (and NumPy) version used. (check `pandas.__version__`
27-
and `numpy.__version__`)
28-
- Explain what the expected behavior was, and what you saw instead.
29-
- If the issue seems to involve some of [pandas' dependencies](https://github.com/pydata/pandas#dependencies)
30-
such as
31-
[NumPy](http://numpy.org),
32-
[matplotlib](http://matplotlib.org/), and
33-
[PyTables](http://www.pytables.org/)
34-
you should include (the relevant parts of) the output of
21+
- Specify the pandas version used and those of its dependencies. You can simply include the output of
3522
[`ci/print_versions.py`](https://github.com/pydata/pandas/blob/master/ci/print_versions.py).
23+
- Explain what the expected behavior was, and what you saw instead.
3624

3725
#### Pull Requests
3826

39-
- **Make sure the test suite passes** for both python2 and python3.
40-
You can use `test_fast.sh`, **tox** locally, and/or enable **Travis-CI** on your fork.
41-
See "Getting Travis-CI going" below.
27+
- **Make sure the test suite passes** on your box. Use the provided `test_*.sh` scripts or tox.
28+
- Use [proper commit messages](http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html):
29+
- a subject line with `< 80` chars.
30+
- One blank line.
31+
- Optionally, a commit message body.
32+
- Please reference relevant Github issues in your commit message using `GH1234`
33+
or `#1234`. Either style is fine, but the '#' style generates noise when you rebase your PR.
34+
- `doc/source/release.rst` and `doc/source/vx.y.z.txt` contain an ongoing
35+
changelog for each release. Add entries to these files
36+
as needed in a separate commit in your PR: document the fix, enhancement,
37+
or (unavoidable) breaking change.
38+
- Keep style fixes to a separate commit to make your PR more readable.
4239
- An informal commit message format is in effect for the project. Please try
4340
and adhere to it. Check `git log` for examples. Here are some common prefixes
4441
along with general guidelines for when to use them:
@@ -49,119 +46,31 @@ your contribution or address the issue you're having.
4946
- **BLD**: Updates to the build process/scripts
5047
- **PERF**: Performance improvement
5148
- **CLN**: Code cleanup
52-
- Commit messages should have:
53-
- a subject line with `< 80` chars
54-
- one blank line
55-
- a commit message body, if there's a need for one
56-
- If you are changing any code, you should enable Travis-CI on your fork
57-
to make it easier for the team to see that the PR does indeed pass all the tests.
58-
- **Backward-compatibility really matters**. Pandas already has a large user base and
59-
a lot of existing user code.
60-
- Don't break old code if you can avoid it.
61-
- If there is a need, explain it in the PR.
62-
- Changes to method signatures should be made in a way which doesn't break existing
63-
code. For example, you should beware of changes to ordering and naming of keyword
64-
arguments.
49+
- Maintain backward-compatibility. Pandas has lots of users with lots of existing code. Don't break it.
50+
- If you think breakage is required, clearly state why as part of the PR.
51+
- Be careful when changing method signatures.
6552
- Add deprecation warnings where needed.
66-
- Performance matters. You can use the included `test_perf.sh`
67-
script to make sure your PR does not introduce any new performance regressions
68-
in the library.
53+
- Performance matters. Make sure your PR hasn't introduced perf regressions by using `test_perf.sh`.
6954
- Docstrings follow the [numpydoc](https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt) format.
70-
- **Don't** merge upstream into a branch you're going to submit as a PR.
71-
This can create all sorts of problems. Use `git rebase` instead. This ensures
72-
no merge conflicts occur when your code is merged by the core team.
73-
- Please reference the GH issue number in your commit message using `GH1234`
74-
or `#1234`. Either style is fine.
75-
- Use `raise AssertionError` rather then plain `assert` in library code (`assert` is fine
76-
for test code). `python -o` strips assertions. Better safe than sorry.
77-
- When writing tests, don't use "new" assertion methods added to the `unittest` module
78-
in 2.7 since pandas currently supports 2.6. The most common pitfall is:
79-
80-
with self.assertRaises(ValueError):
81-
foo
82-
83-
84-
which fails with Python 2.6. You need to use `assertRaises` from
85-
`pandas.util.testing` instead (or use `self.assertRaises(TheException,func,args)`).
86-
87-
- `doc/source/release.rst` and `doc/source/vx.y.z.txt` contain an ongoing
88-
changelog for each release. Add entries to these files
89-
as needed in a separate commit in your PR: document the fix, enhancement,
90-
or (unavoidable) breaking change.
91-
- For extra brownie points, use `git rebase -i` to squash and reorder
92-
commits in your PR so that the history makes the most sense. Use your own
93-
judgment to decide what history needs to be preserved.
94-
- Pandas source code should not -- with some exceptions, such as 3rd party licensed code --
95-
generally speaking, include an "Authors" list or attribution to individuals in source code.
96-
`RELEASE.rst` details changes and enhancements to the code over time.
97-
A "thanks goes to @JohnSmith." as part of the appropriate entry is a suitable way to acknowledge
98-
contributions. The rest is `git blame`/`git log`.
99-
Feel free to ask the commiter who merges your code to include such an entry
100-
or include it directly yourself as part of the PR if you'd like to.
101-
**We're always glad to have new contributors join us from the ever-growing pandas community.**
102-
You may also be interested in the copyright policy as detailed in the pandas [LICENSE](https://github.com/pydata/pandas/blob/master/LICENSE).
55+
- When writing tests, use 2.6 compatible `self.assertFoo` methods. Some polyfills such as `assertRaises`
56+
can be found in `pandas.util.testing`.
57+
- Generally, pandas source files should not contain attributions. You can include a "thanks to..."
58+
in the release changelog. The rest is `git blame`/`git log`.
59+
- When you start working on a PR, start by creating a new branch pointing at the latest
60+
commit on github master.
61+
- **Do not** merge upstream into a branch you're going to submit as a PR.
62+
Use `git rebase` against the current github master.
63+
- For extra brownie points, you can squash and reorder the commits in your PR using `git rebase -i`.
64+
Use your own judgment to decide what history needs to be preserved. If git frightens you, that's OK too.
65+
- Use `raise AssertionError` over `assert` unless you want the assertion stripped by `python -O`.
66+
- The pandas copyright policy is detailed in the pandas [LICENSE](https://github.com/pydata/pandas/blob/master/LICENSE).
10367
- On the subject of [PEP8](http://www.python.org/dev/peps/pep-0008/): yes.
104-
- On the subject of massive PEP8 fix PRs touching everything, please consider the following:
105-
- They create noisy merge conflicts for people working in their own fork.
106-
- They make `git blame` less effective.
107-
- Different tools / people achieve PEP8 in different styles. This can create
108-
"style wars" and churn that produces little real benefit.
109-
- If your code changes are intermixed with style fixes, they are harder to review
110-
before merging. Keep style fixes in separate commits.
111-
- It's fine to clean-up a little around an area you just worked on.
112-
- Generally it's a BAD idea to PEP8 on documentation.
113-
114-
Having said that, if you still feel a PEP8 storm is in order, go for it.
68+
- On the subject of a massive PEP8-storm touching everything: not too often (once per release works).
11569

11670
### Notes on plotting function conventions
11771

11872
https://groups.google.com/forum/#!topic/pystatsmodels/biNlCvJPNNY/discussion
11973

120-
### Getting Travis-CI going
121-
122-
Instructions for getting Travis-CI installed are available [here](http://about.travis-ci.org/docs/user/getting-started/).
123-
For those users who are new to Travis-CI and [continuous integration](https://en.wikipedia.org/wiki/Continuous_integration) in particular,
124-
Here's a few high-level notes:
125-
- Travis-CI is a free service (with premium account upgrades available) that integrates
126-
well with GitHub.
127-
- Enabling Travis-CI on your GitHub fork of a project will cause any *new* commit
128-
pushed to the repo to trigger a full build+test on Travis-CI's servers.
129-
- All the configuration for Travis-CI builds is already specified by `.travis.yml` in the repo.
130-
That means all you have to do is enable Travis-CI once, and then just push commits
131-
and you'll get full testing across py2/3 with pandas' considerable
132-
[test-suite](https://github.com/pydata/pandas/tree/master/pandas/tests).
133-
- Enabling Travis-CI will attach the test results (red/green) to the Pull-Request
134-
page for any PR you submit. For example:
135-
136-
https://github.com/pydata/pandas/pull/2532,
137-
138-
See the Green "Good to merge!" banner? that's it.
139-
140-
This is especially important for new contributors, as members of the pandas dev team
141-
like to know that the test suite passes before considering it for merging.
142-
Even regular contributors who test religiously on their local box (using tox
143-
for example) often rely on a PR+travis=green to make double sure everything
144-
works ok on another system, as occasionally, it doesn't.
145-
146-
#### Steps to enable Travis-CI
147-
148-
- Open https://travis-ci.org/
149-
- Select "Sign in with GitHub" (Top Navbar)
150-
- Select \[your username\] -> "Accounts" (Top Navbar)
151-
- Select 'Sync now' to refresh the list of repos from your GH account.
152-
- Flip the switch for the repos you want Travis-CI enabled for.
153-
"pandas", obviously.
154-
- Then, pushing a *new* commit to a certain branch on that repo
155-
will trigger a build/test for that branch. For example, the branch
156-
might be `master` or `PR1234_fix_everything__atomically`, if that's the
157-
name of your PR branch.
158-
159-
You can see the build history and current builds for your fork
160-
at: https://travis-ci.org/(your_GH_username)/pandas.
161-
162-
For example, the builds for the main pandas repo can be seen at:
163-
https://travis-ci.org/pydata/pandas.
164-
16574
#### More developer docs
16675

16776
* See the [developers](http://pandas.pydata.org/developers.html) page on the

MANIFEST.in

-2
@@ -2,9 +2,7 @@ include MANIFEST.in
22
include LICENSE
33
include RELEASE.md
44
include README.rst
5-
include TODO.rst
65
include setup.py
7-
include setupegg.py
86

97
graft doc
108
prune doc/build

ci/print_versions.py

+11-3
@@ -1,16 +1,24 @@
11
#!/usr/bin/env python
22

33

4-
def show_versions():
4+
5+
def show_versions(as_json=False):
56
import imp
67
import os
78
fn = __file__
89
this_dir = os.path.dirname(fn)
910
pandas_dir = os.path.abspath(os.path.join(this_dir,".."))
1011
sv_path = os.path.join(pandas_dir, 'pandas','util')
1112
mod = imp.load_module('pvmod', *imp.find_module('print_versions', [sv_path]))
12-
return mod.show_versions()
13+
return mod.show_versions(as_json)
1314

1415

1516
if __name__ == '__main__':
16-
show_versions()
17+
# optparse is 2.6-safe
18+
from optparse import OptionParser
19+
parser = OptionParser()
20+
parser.add_option("-j", "--json", action="store_true", help="Format output as JSON")
21+
22+
(options, args) = parser.parse_args()
23+
24+
show_versions(as_json=options.json)
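
As a rough, standalone sketch of how the `-j`/`--json` flag added above behaves, the snippet below rebuilds the same `OptionParser` configuration outside the script. The wrapper `parse_cli` is a hypothetical name, and the `bool(...)` coercion is an addition for clarity; the script in the diff passes `options.json` through unchanged.

```python
from optparse import OptionParser


def parse_cli(argv):
    """Parse a -j/--json flag configured like the __main__ block in the diff."""
    parser = OptionParser()
    parser.add_option("-j", "--json", action="store_true",
                      help="Format output as JSON")
    options, args = parser.parse_args(argv)
    # With action="store_true" and no explicit default, the attribute is
    # None when the flag is absent, so coerce it to a real bool here.
    return bool(options.json)


print(parse_cli([]))          # False
print(parse_cli(["-j"]))      # True
print(parse_cli(["--json"]))  # True
```

Because `None` is falsy, forwarding `options.json` directly as `as_json` works as long as the callee only tests truthiness.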

doc/source/10min.rst

+1
@@ -13,6 +13,7 @@
1313
import pandas as pd
1414
np.set_printoptions(precision=4, suppress=True)
1515
options.display.mpl_style='default'
16+
options.display.max_rows=15
1617
1718
#### portions of this were borrowed from the
1819
#### Pandas cheatsheet

doc/source/api.rst

+36
@@ -1112,6 +1112,42 @@ Conversion
11121112
DatetimeIndex.to_pydatetime
11131113

11141114

1115+
GroupBy
1116+
-------
1117+
.. currentmodule:: pandas.core.groupby
1118+
1119+
GroupBy objects are returned by groupby calls: :func:`pandas.DataFrame.groupby`, :func:`pandas.Series.groupby`, etc.
1120+
1121+
Indexing, iteration
1122+
~~~~~~~~~~~~~~~~~~~
1123+
.. autosummary::
1124+
:toctree: generated/
1125+
1126+
GroupBy.__iter__
1127+
GroupBy.groups
1128+
GroupBy.indices
1129+
GroupBy.get_group
1130+
1131+
Function application
1132+
~~~~~~~~~~~~~~~~~~~~
1133+
.. autosummary::
1134+
:toctree: generated/
1135+
1136+
GroupBy.apply
1137+
GroupBy.aggregate
1138+
GroupBy.transform
1139+
1140+
Computations / Descriptive Stats
1141+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1142+
.. autosummary::
1143+
:toctree: generated/
1144+
1145+
GroupBy.mean
1146+
GroupBy.median
1147+
GroupBy.std
1148+
GroupBy.var
1149+
GroupBy.ohlc
1150+
11151151
..
11161152
HACK - see github issue #4539. To ensure old links remain valid, include
11171153
here the autosummaries with previous currentmodules as a comment and add

doc/source/basics.rst

+47-15
@@ -9,6 +9,7 @@
99
randn = np.random.randn
1010
np.set_printoptions(precision=4, suppress=True)
1111
from pandas.compat import lrange
12+
options.display.max_rows=15
1213
1314
==============================
1415
Essential Basic Functionality
@@ -103,8 +104,8 @@ a set of specialized cython routines that are especially fast when dealing with
103104
Here is a sample (using 100 column x 100,000 row ``DataFrames``):
104105

105106
.. csv-table::
106-
:header: "Operation", "0.11.0 (ms)", "Prior Vern (ms)", "Ratio to Prior"
107-
:widths: 30, 30, 30, 30
107+
:header: "Operation", "0.11.0 (ms)", "Prior Version (ms)", "Ratio to Prior"
108+
:widths: 25, 25, 25, 25
108109
:delim: ;
109110

110111
``df1 > df2``; 13.32; 125.35; 0.1063
@@ -556,12 +557,11 @@ will either be of lower dimension or the same dimension.
556557
about a data set. For example, suppose we wanted to extract the date where the
557558
maximum value for each column occurred:
558559

559-
560560
.. ipython:: python
561561
562562
tsdf = DataFrame(randn(1000, 3), columns=['A', 'B', 'C'],
563563
index=date_range('1/1/2000', periods=1000))
564-
tsdf.apply(lambda x: x[x.idxmax()])
564+
tsdf.apply(lambda x: x.idxmax())
565565
566566
You may also pass additional arguments and keyword arguments to the ``apply``
567567
method. For instance, consider the following function you would like to apply:
@@ -1029,7 +1029,7 @@ with more than one group returns a DataFrame with one column per group.
10291029
10301030
Series(['a1', 'b2', 'c3']).str.extract('([ab])(\d)')
10311031
1032-
Elements that do not match return a row of ``NaN``s.
1032+
Elements that do not match return a row filled with ``NaN``.
10331033
Thus, a Series of messy strings can be "converted" into a
10341034
like-indexed Series or DataFrame of cleaned-up or more useful strings,
10351035
without necessitating ``get()`` to access tuples or ``re.match`` objects.
@@ -1051,18 +1051,35 @@ can also be used.
10511051
Testing for Strings that Match or Contain a Pattern
10521052
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
10531053

1054-
In previous versions, *extracting* match groups was accomplished by ``match``,
1055-
which returned a not-so-convenient Series of tuples. Starting in version 0.14,
1056-
the default behavior of match will change. It will return a boolean
1057-
indexer, analagous to the method ``contains``.
10581054

1059-
The distinction between
1060-
``match`` and ``contains`` is strictness: ``match`` relies on
1061-
strict ``re.match`` while ``contains`` relies on ``re.search``.
1055+
You can check whether elements contain a pattern:
1056+
1057+
.. ipython:: python
1058+
1059+
pattern = r'[a-z][0-9]'
1060+
Series(['1', '2', '3a', '3b', '03c']).str.contains(pattern)
1061+
1062+
or match a pattern:
10621063

1063-
In version 0.13, ``match`` performs its old, deprecated behavior by default,
1064-
but the new behavior is availabe through the keyword argument
1065-
``as_indexer=True``.
1064+
1065+
.. ipython:: python
1066+
1067+
Series(['1', '2', '3a', '3b', '03c']).str.match(pattern, as_indexer=True)
1068+
1069+
The distinction between ``match`` and ``contains`` is strictness: ``match``
1070+
relies on strict ``re.match``, while ``contains`` relies on ``re.search``.
1071+
1072+
.. warning::
1073+
1074+
In previous versions, ``match`` was for *extracting* groups,
1075+
returning a not-so-convenient Series of tuples. The new method ``extract``
1076+
(described in the previous section) is now preferred.
1077+
1078+
This old, deprecated behavior of ``match`` is still the default. As
1079+
demonstrated above, use the new behavior by setting ``as_indexer=True``.
1080+
In this mode, ``match`` is analogous to ``contains``, returning a boolean
1081+
Series. The new behavior will become the default behavior in a future
1082+
release.
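
The strictness difference between ``re.match`` and ``re.search`` described above can be sketched with the standard library alone, without pandas. The digit-then-letter pattern below is an assumption chosen for this illustration (it is not the pattern used in the hunk) so that the two functions disagree on one input.

```python
import re

# Illustrative pattern: one digit followed by one lowercase letter.
pattern = re.compile(r'[0-9][a-z]')
values = ['1', '2', '3a', '3b', '03c']

# re.search scans the whole string, which is what ``contains`` relies on.
contains_like = [bool(pattern.search(v)) for v in values]

# re.match only succeeds if the pattern matches at the start of the string.
match_like = [bool(pattern.match(v)) for v in values]

print(contains_like)  # [False, False, True, True, True]
print(match_like)     # [False, False, True, True, False]
```

Only ``'03c'`` differs: the digit-letter pair ``3c`` occurs inside the string, but the string starts with ``03``, so the anchored match fails.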
10661083

10671084
Methods like ``match``, ``contains``, ``startswith``, and ``endswith`` take
10681085
an extra ``na`` argument so missing values can be considered True or False:
@@ -1457,6 +1474,21 @@ It's also possible to reset multiple options at once (using a regex):
14571474
reset_option("^display")
14581475
14591476
1477+
.. versionadded:: 0.13.1
1478+
1479+
Beginning with v0.13.1 the `option_context` context manager has been exposed through
1480+
the top-level API, allowing you to execute code with given option values. Option values
1481+
are restored automatically when you exit the `with` block:
1482+
1483+
.. ipython:: python
1484+
1485+
with option_context("display.max_rows",10,"display.max_columns", 5):
1486+
    print(get_option("display.max_rows"))
1487+
    print(get_option("display.max_columns"))
1488+
1489+
print(get_option("display.max_rows"))
1490+
print(get_option("display.max_columns"))
1491+
14601492
14611493
Console Output Formatting
14621494
-------------------------
