diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 5329bad1d90e4..dc7cb7f2ab0bc 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,105 +1,569 @@ -### Guidelines +Contributing to pandas +====================== + +Where to start? +--------------- All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome. -The [GitHub "issues" tab](https://github.com/pydata/pandas/issues) -contains some issues labeled "Good as first PR"; Look those up if you're -looking for a quick way to help out. +If you are simply looking to start working with the *pandas* codebase, +navigate to the [GitHub "issues" +tab](https://github.com/pydata/pandas/issues) and start looking through +interesting issues. There are a number of issues listed under +[Docs](https://github.com/pydata/pandas/issues?labels=Docs&sort=updated&state=open) +and [Good as first +PR](https://github.com/pydata/pandas/issues?labels=Good+as+first+PR&sort=updated&state=open) +where you could start out. -#### Bug Reports +Or maybe through using *pandas* you have an idea of your own or are +looking for something in the documentation and thinking 'this can be +improved'...you can do something about it! - - Please include a short, self-contained Python snippet reproducing the problem. - You can have the code formatted nicely by using [GitHub Flavored Markdown](http://github.github.com/github-flavored-markdown/) : +Feel free to ask questions on the [mailing +list](https://groups.google.com/forum/?fromgroups#!forum/pydata). - ```python +Bug Reports/Enhancement Requests +-------------------------------- + +Bug reports are an important part of making *pandas* more stable. Having +a complete bug report will allow others to reproduce the bug and provide +insight into fixing it. Since many versions of *pandas* are supported, +knowing version information will also identify improvements made since +previous versions. 
Often trying the bug-producing code out on the +*master* branch is a worthwhile exercise to confirm the bug still +exists. It is also worth searching existing bug reports and pull +requests to see if the issue has already been reported and/or fixed. + +Bug reports must: + +1. Include a short, self-contained Python snippet reproducing the + problem. You can have the code formatted nicely by using [GitHub + Flavored + Markdown](http://github.github.com/github-flavored-markdown/): : + ```python >>> from pandas import DataFrame >>> df = DataFrame(...) ... ``` - - Include the full version string of pandas and its dependencies. In recent (>0.12) versions - of pandas you can use a built in function: - - ```python - >>> from pandas.util.print_versions import show_versions - >>> show_versions() - ``` - - and in 0.13.1 onwards: - ```python - >>> pd.show_versions() - ``` - - Explain what the expected behavior was, and what you saw instead. - -#### Pull Requests - -##### Testing: - - Every addition to the codebase whether it be a bug or new feature should have associated tests. The can be placed in the `tests` directory where your code change occurs. - - When writing tests, use 2.6 compatible `self.assertFoo` methods. Some polyfills such as `assertRaises` - can be found in `pandas.util.testing`. - - Do not attach doctrings to tests. Make the test itself readable and use comments if needed. - - **Make sure the test suite passes** on your box, use the provided `test_*.sh` scripts or tox. Pandas tests a variety of platforms and Python versions so be cognizant of cross-platorm considerations. - - Performance matters. Make sure your PR hasn't introduced performance regressions by using `test_perf.sh`. See [vbench performance tests](https://github.com/pydata/pandas/wiki/Performance-Testing) wiki for more information on running these tests. 
- - For more information on testing see [Testing advice and best practices in `pandas`](https://github.com/pydata/pandas/wiki/Testing) - -##### Documentation / Commit Messages: - - Docstrings follow the [numpydoc](https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt) format. - - Keep style fixes to a separate commit to make your PR more readable. - - An informal commit message format is in effect for the project. Please try - and adhere to it. Check `git log` for examples. Here are some common prefixes - along with general guidelines for when to use them: - - **ENH**: Enhancement, new functionality - - **BUG**: Bug fix - - **DOC**: Additions/updates to documentation - - **TST**: Additions/updates to tests - - **BLD**: Updates to the build process/scripts - - **PERF**: Performance improvement - - **CLN**: Code cleanup - - Use [proper commit messages](http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html): - - a subject line with `< 80` chars. - - One blank line. - - Optionally, a commit message body. - - Please reference relevant Github issues in your commit message using `GH1234` - or `#1234`. Either style is fine but the '#' style generates noise when your rebase your PR. - - `doc/source/vx.y.z.txt` contains an ongoing - changelog for each release. Add an entry to this file - as needed in your PR: document the fix, enhancement, - or (unavoidable) breaking change. - - Maintain backward-compatibility. Pandas has lots of users with lots of existing code. Don't break it. - - If you think breakage is required clearly state why as part of the PR. - - Be careful when changing method signatures. - - Add deprecation warnings where needed. - - Generally, pandas source files should not contain attributions. You can include a "thanks to..." - in the release changelog. The rest is `git blame`/`git log`. - -##### Workflow/Git - - When you start working on a PR, start by creating a new branch pointing at the latest - commit on github master. 
- - **Do not** merge upstream into a branch you're going to submit as a PR. - Use `git rebase` against the current github master. - - For extra brownie points, you can squash and reorder the commits in your PR using `git rebase -i`. - Use your own judgment to decide what history needs to be preserved. If git frightens you, that's OK too. - - Use `raise AssertionError` over `assert` unless you want the assertion stripped by `python -o`. - - The pandas copyright policy is detailed in the pandas [LICENSE](https://github.com/pydata/pandas/blob/master/LICENSE). - - On the subject of [PEP8](http://www.python.org/dev/peps/pep-0008/): yes. - - [Git tips and tricks](https://github.com/pydata/pandas/wiki/Using-Git) - -##### Code standards: - - We've written a tool to check that your commits are PEP8 great, - [`pip install pep8radius`](https://github.com/hayd/pep8radius). Look at PEP8 fixes in your branch - vs master with `pep8radius master --diff` and make these changes with - `pep8radius master --diff --in-place`. - - On the subject of a massive PEP8-storm touching everything: not too often (once per release works). - - Additional standards are outlined on the [code style wiki page](https://github.com/pydata/pandas/wiki/Code-Style-and-Conventions) - -### Notes on plotting function conventions - -https://groups.google.com/forum/#!topic/pystatsmodels/biNlCvJPNNY/discussion - -#### More developer docs -* See the [developers](http://pandas.pydata.org/developers.html) page on the - project website for more details. -* [`pandas` wiki](https://github.com/pydata/pandas/wiki) constains useful pages for development and general pandas usage -* [Tips and tricks](https://github.com/pydata/pandas/wiki/Tips-&-Tricks) +2. Include the full version string of *pandas* and its dependencies. 
In + recent (\>0.12) versions of *pandas* you can use a built-in + function: + + >>> from pandas.util.print_versions import show_versions + >>> show_versions() + + and in 0.13.1 onwards: + + >>> pd.show_versions() + +3. Explain why the current behavior is wrong/not desired and what you + expect instead. + +The issue will then be visible to the *pandas* community and be open to +comments/ideas from others. + +Working with the code +--------------------- + +Now that you have an issue you want to fix, enhancement to add, or +documentation to improve, you need to learn how to work with GitHub and +the *pandas* code base. + +### Version Control, Git, and GitHub + +To the new user, working with Git is one of the more daunting aspects of +contributing to *pandas*. It can very quickly become overwhelming, but +sticking to the guidelines below will make the process straightforward +and mostly trouble-free. As always, if you are having +difficulties please feel free to ask for help. + +The code is hosted on [GitHub](https://www.github.com/pydata/pandas). To +contribute you will need to sign up for a [free GitHub +account](https://github.com/signup/free). We use +[Git](http://git-scm.com/) for version control to allow many people to +work together on the project. + +Some great resources for learning git: + +> - the [GitHub help pages](http://help.github.com/). +> - the [NumPy +> documentation](http://docs.scipy.org/doc/numpy/dev/index.html). +> - Matthew Brett's +> [Pydagogue](http://matthew-brett.github.com/pydagogue/). + +### Getting Started with Git + +[GitHub has instructions](http://help.github.com/set-up-git-redirect) +for installing git, setting up your SSH key, and configuring git. All +these steps need to be completed before you can work seamlessly with your +local repository and GitHub. + +### Forking + +You will need your own fork to work on the code. Go to the [pandas +project page](https://github.com/pydata/pandas) and hit the *fork* +button. 
You will want to clone your fork to your machine: + + git clone git@github.com:your-user-name/pandas.git pandas-yourname + cd pandas-yourname + git remote add upstream git://github.com/pydata/pandas.git + +This creates the directory pandas-yourname and connects your repository +to the upstream (main project) *pandas* repository. + +You will also need to hook up Travis-CI to your GitHub repository so the +test suite is run automatically when a Pull Request is submitted. +Instructions are +[here](http://about.travis-ci.org/docs/user/getting-started/). + +### Creating a Branch + +You want your master branch to reflect only production-ready code, so +create a feature branch for making your changes. For example: + + git branch shiny-new-feature + git checkout shiny-new-feature + +The above can be simplified to: + + git checkout -b shiny-new-feature + +This changes your working directory to the shiny-new-feature branch. +Keep any changes in this branch specific to one bug or feature so it is +clear what the branch brings to *pandas*. You can have many +shiny-new-features and switch between them using the git checkout +command. + +### Making changes + +Before making your code changes, it is often necessary to build the code +that was just checked out. There are two primary methods of doing this. + +1. The best way to develop *pandas* is to build the C extensions + in-place by running: + + python setup.py build_ext --inplace + + If you start up the Python interpreter in the *pandas* source + directory, you will call the built C extensions. + +2. Another very common option is to do a `develop` install of *pandas*: + + python setup.py develop + + This makes a symbolic link that tells the Python interpreter to + import *pandas* from your development directory. Thus, you can + always be using the development version on your system without being + inside the clone directory. 
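After either build method, it can be useful to confirm which copy of a package the interpreter would actually import. The following is a minimal sketch that uses only the standard library's `importlib.util.find_spec` (available on Python 3.4 and later); the helper name `module_origin` is an illustrative assumption and is not part of *pandas*.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None.

    Hypothetical helper for illustration; not part of pandas.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # the module is not importable from this environment
    return spec.origin


if __name__ == "__main__":
    # After a develop or in-place build, this path is expected to point
    # into your clone (for example, a directory named pandas-yourname).
    print(module_origin("pandas"))
```

If the printed path points at a system-wide installation rather than your clone, the interpreter is not using your development version.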
+ +### Contributing to the documentation + +If you're not the developer type, contributing to the documentation is +still of huge value. You don't even have to be an expert on *pandas* to +do so! Something as simple as rewriting small passages for clarity as +you reference the docs is a simple but effective way to contribute. The +next person to read that passage will be in your debt! + +Actually, there are sections of the docs that are worse off by being +written by experts. If something in the docs doesn't make sense to you, +updating the relevant section after you figure it out is a simple way to +ensure it will help the next person. + +#### About the pandas documentation + +The documentation is written in **reStructuredText**, which is almost +like writing in plain English, and built using +[Sphinx](http://sphinx.pocoo.org/). The Sphinx Documentation has an +excellent [introduction to reST](http://sphinx.pocoo.org/rest.html). +Review the Sphinx docs to perform more complex changes to the +documentation as well. + +Some other important things to know about the docs: + +- The *pandas* documentation consists of two parts: the docstrings in + the code itself and the docs in this folder `pandas/doc/`. + + The docstrings provide a clear explanation of the usage of the + individual functions, while the documentation in this folder + consists of tutorial-like overviews per topic together with some + other information (what's new, installation, etc). + +- The docstrings follow the **Numpy Docstring Standard** which is used + widely in the Scientific Python community. This standard specifies + the format of the different sections of the docstring. See [this + document](https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt) + for a detailed explanation, or look at some of the existing + functions to extend it in a similar manner. 
+- The tutorials make heavy use of the [ipython + directive](http://matplotlib.org/sampledoc/ipython_directive.html) + sphinx extension. This directive lets you put code in the + documentation which will be run during the doc build. For example: + + .. ipython:: python + + x = 2 + x**3 + + will be rendered as + + In [1]: x = 2 + + In [2]: x**3 + Out[2]: 8 + + This means that almost all code examples in the docs are always run + (and the output saved) during the doc build. This way, they will + always be up to date, but it makes the doc building a bit more + complex. + +#### How to build the pandas documentation + +##### Requirements + +To build the *pandas* docs there are some extra requirements: you will +need to have `sphinx` and `ipython` installed. +[numpydoc](https://github.com/numpy/numpydoc) is used to parse the +docstrings that follow the Numpy Docstring Standard (see above), but you +don't need to install this because a local copy of `numpydoc` is +included in the *pandas* source code. + +Furthermore, it is recommended to have all [optional +dependencies](http://pandas.pydata.org/pandas-docs/dev/install.html#optional-dependencies) +installed. This is not required, but be aware that you will see some error +messages without them. Because all the code in the documentation is executed during +the doc build, the examples using these optional dependencies will +generate errors. Run `pd.show_versions()` to get an overview of the +installed versions of all dependencies. + +> **warning** +> +> Sphinx version \>= 1.2.2 or the older 1.1.3 is required. + +##### Building the documentation + +So how do you build the docs? Navigate to your local +`pandas/doc/` directory in the console and run: + + python make.py html + +And then you can find the html output in the folder +`pandas/doc/build/html/`. + +The first build will take quite a while, because it has to run all the +code examples in the documentation and build all generated docstring +pages. 
In subsequent invocations, sphinx will try to only build the pages +that have been modified. + +If you want to do a full clean build, do: + + python make.py clean + python make.py build + +Starting with 0.13.1 you can tell `make.py` to compile only a single +section of the docs, greatly reducing the turn-around time for checking +your changes. You will be prompted to delete .rst files that aren't +required. This is safe, since the last committed version can always be restored from +git. + + # omit autosummary and API section + python make.py clean + python make.py --no-api + + # compile the docs with only a single + # section, that which is in indexing.rst + python make.py clean + python make.py --single indexing + +For comparison, a full documentation build may take 10 minutes, a +`--no-api` build may take 3 minutes and a single section may take 15 +seconds. However, subsequent builds only process portions you changed. +Now, open the following file in a web browser to see the full +documentation you just built: + + pandas/doc/build/html/index.html + +And you'll have the satisfaction of seeing your new and improved +documentation! + +### Contributing to the code base + +#### Code Standards + +*pandas* uses the [PEP8](http://www.python.org/dev/peps/pep-0008/) +standard. There are several tools to ensure you abide by this standard. + +We've written a tool to check that your commits are PEP8 compliant, [pip +install pep8radius](https://github.com/hayd/pep8radius). Look at PEP8 +fixes in your branch vs master with: + + pep8radius master --diff + +and make these changes with: + + pep8radius master --diff --in-place + +Alternatively, use the [flake8](http://pypi.python.org/pypi/flake8) tool for +checking the style of your code. Additional standards are outlined on +the [code style wiki +page](https://github.com/pydata/pandas/wiki/Code-Style-and-Conventions). + +Please try to maintain backward-compatibility. *Pandas* has lots of +users with lots of existing code, so don't break it if at all possible. 
+If you think breakage is required, clearly state why as part of the Pull +Request. Also, be careful when changing method signatures and add +deprecation warnings where needed. + +#### Test-driven Development/Writing Code + +*Pandas* is serious about [Test-driven Development +(TDD)](http://en.wikipedia.org/wiki/Test-driven_development). This +development process "relies on the repetition of a very short +development cycle: first the developer writes an (initially failing) +automated test case that defines a desired improvement or new function, +then produces the minimum amount of code to pass that test." So, before +actually writing any code, you should write your tests. Often the test +can be taken from the original GitHub issue. However, it is always worth +considering additional use cases and writing corresponding tests. + +Adding tests is one of the most common requests after code is pushed to +*pandas*. It is worth getting in the habit of writing tests ahead of +time so this is never an issue. + +Like many packages, *pandas* uses the [Nose testing +system](http://somethingaboutorange.com/mrl/projects/nose/) and the +convenient extensions in +[numpy.testing](http://docs.scipy.org/doc/numpy/reference/routines.testing.html). + +##### Writing tests + +All tests should go into the *tests* subdirectory of the specific +package. There are probably many examples already there and looking to +these for inspiration is suggested. If your test requires working with +files or network connectivity, there is more information on the [testing +page](https://github.com/pydata/pandas/wiki/Testing) of the wiki. + +The `pandas.util.testing` module has many special `assert` functions +that make it easier to make statements about whether Series or DataFrame +objects are equivalent. 
The easiest way to verify that your code is +correct is to explicitly construct the result you expect, then compare +the actual result to the expected correct result: + + def test_pivot(self): + data = { + 'index': ['A', 'B', 'C', 'C', 'B', 'A'], + 'columns': ['One', 'One', 'One', 'Two', 'Two', 'Two'], + 'values': [1., 2., 3., 3., 2., 1.] + } + + frame = DataFrame(data) + pivoted = frame.pivot(index='index', columns='columns', values='values') + + expected = DataFrame({ + 'One': {'A': 1., 'B': 2., 'C': 3.}, + 'Two': {'A': 1., 'B': 2., 'C': 3.} + }) + + assert_frame_equal(pivoted, expected) + +##### Running the test suite + +The tests can then be run directly inside your git clone (without having +to install *pandas*) by typing: + + nosetests pandas + +The test suite is exhaustive and takes around 20 minutes to run. Often +it is worth running only a subset of tests first around your changes +before running the entire suite. This is done using one of the following +constructs: + + nosetests pandas/tests/[test-module].py + nosetests pandas/tests/[test-module].py:[TestClass] + nosetests pandas/tests/[test-module].py:[TestClass].[test_method] + +##### Running the performance test suite + +Performance matters and it is worth checking that your code has not +introduced performance regressions. Currently *pandas* uses the [vbench +library](https://github.com/pydata/vbench) to enable easy monitoring of +the performance of critical *pandas* operations. These benchmarks are +all found in the `pandas/vb_suite` directory. vbench currently only +works on Python 2. + +To install vbench: + + pip install git+https://github.com/pydata/vbench + +Vbench also requires sqlalchemy, gitpython, and psutil, which can all be +installed using pip. If you need to run a benchmark, change your +directory to the *pandas* root and run: + + ./test_perf.sh -b master -t HEAD + +This will check out the master revision and run the suite on both master +and your commit. 
Running the full benchmark suite can take up to one hour and +use up to 3GB of RAM. Usually it is sufficient to paste a subset of the +results into the Pull Request to show that the committed changes do not +cause unexpected performance regressions. + +You can run specific benchmarks using the *-r* flag, which takes a +regular expression. + +See the [performance testing +wiki](https://github.com/pydata/pandas/wiki/Performance-Testing) for +information on how to write a benchmark. + +#### Documenting your code + +Changes should be reflected in the release notes located in +doc/source/whatsnew/vx.y.z.txt. This file contains an ongoing change log +for each release. Add an entry to this file to document your fix, +enhancement or (unavoidable) breaking change. Make sure to include the +GitHub issue number when adding your entry. + +If your code is an enhancement, it is most likely necessary to add usage +examples to the existing documentation. This can be done following the +section regarding documentation. + +### Committing your code + +Keep style fixes to a separate commit to make your PR more readable. + +Once you've made changes, you can see them by typing: + + git status + +If you've created a new file, it is not being tracked by git. Add it by +typing: + + git add path/to/file-to-be-added.py + +Doing 'git status' again should give something like: + + # On branch shiny-new-feature + # + # modified: /relative/path/to/file-you-added.py + # + +Finally, commit your changes to your local repository with an +explanatory message. An informal commit message format is in effect for +the project. Please try to adhere to it. 
Here are some common prefixes +along with general guidelines for when to use them: + +> - ENH: Enhancement, new functionality +> - BUG: Bug fix +> - DOC: Additions/updates to documentation +> - TST: Additions/updates to tests +> - BLD: Updates to the build process/scripts +> - PERF: Performance improvement +> - CLN: Code cleanup + +The following defines how a commit message should be structured. Please +reference the relevant GitHub issues in your commit message using GH1234 +or \#1234. Either style is fine, but the former is generally preferred: + +> - a subject line with \< 80 chars. +> - One blank line. +> - Optionally, a commit message body. + +Now you can commit your changes in your local repository: + + git commit -m + +If you have multiple commits, it is common to want to combine them into +one commit, often referred to as "squashing" or "rebasing". This is a +common request by package maintainers when submitting a Pull Request as +it maintains a more compact commit history. To rebase your commits: + + git rebase -i HEAD~# + +Where \# is the number of commits you want to combine. Then you can pick +the relevant commit message and discard others. + +### Pushing your changes + +When you want your changes to appear publicly on your GitHub page, push +your forked feature branch's commits : + + git push origin shiny-new-feature + +Here origin is the default name given to your remote repository on +GitHub. You can see the remote repositories : + + git remote -v + +If you added the upstream repository as described above you will see +something like : + + origin git@github.com:yourname/pandas.git (fetch) + origin git@github.com:yourname/pandas.git (push) + upstream git://github.com/pydata/pandas.git (fetch) + upstream git://github.com/pydata/pandas.git (push) + +Now your code is on GitHub, but it is not yet a part of the *pandas* +project. For that to happen, a Pull Request needs to be submitted on +GitHub. 
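The commit message conventions described under "Committing your code" can be checked mechanically before pushing. The following is a minimal sketch, not a tool shipped with *pandas*: the function name `check_commit_message` is hypothetical, and the exact rules it enforces (a listed prefix followed by a colon, a subject shorter than 80 characters, a blank second line, and a `GH1234` or `#1234` reference) are assumptions drawn from the guidelines above.

```python
import re

# Prefixes listed in the commit message guidelines.
PREFIXES = ("ENH", "BUG", "DOC", "TST", "BLD", "PERF", "CLN")

# Matches issue references written as GH1234 or #1234.
ISSUE_REF = re.compile(r"(?:GH|#)\d+")


def check_commit_message(message):
    """Return a list of problems found in a commit message (empty if none).

    Hypothetical helper for illustration; the "PREFIX:" form is an assumption.
    """
    problems = []
    lines = message.splitlines()
    subject = lines[0] if lines else ""

    if not subject.startswith(tuple(p + ":" for p in PREFIXES)):
        problems.append("subject should start with one of: " + ", ".join(PREFIXES))
    if len(subject) >= 80:
        problems.append("subject line should be shorter than 80 characters")
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be blank")
    if not ISSUE_REF.search(message):
        problems.append("no issue reference such as GH1234 found")
    return problems
```

A message that follows the conventions yields an empty list, so the function could be called from a local commit hook if you find it useful.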
+ +Contributing your changes to *pandas* +------------------------------------- + +### Review your code + +When you're ready to ask for a code review, you will file a Pull +Request. Before you do, again make sure you've followed all the +guidelines outlined in this document regarding code style, tests, +performance tests, and documentation. You should also double check your +branch changes against the branch it was based on: + +1. Navigate to your repository on GitHub. +2. Click on Branches. +3. Click on the Compare button for your feature branch. +4. Select the base and compare branches, if necessary. This will be + master and shiny-new-feature, respectively. + +### Finally, make the Pull Request + +If everything looks good you are ready to make a Pull Request. A Pull +Request is how code from a local repository becomes available to the +GitHub community and can be looked at and eventually merged into the +master version. This Pull Request and its associated changes will +eventually be committed to the master branch and available in the next +release. To submit a Pull Request: + +1. Navigate to your repository on GitHub. +2. Click on the Pull Request button. +3. You can then click on Commits and Files Changed to make sure + everything looks okay one last time. +4. Write a description of your changes in the Preview Discussion tab. +5. Click Send Pull Request. + +This request then appears to the repository maintainers, and they will +review the code. If you need to make more changes, you can make them in +your branch, push them to GitHub, and the pull request will be +automatically updated. Pushing them to GitHub again is done by: + + git push -f origin shiny-new-feature + +This will automatically update your Pull Request with the latest code +and restart the Travis-CI tests. + +### Delete your merged branch (optional) + +Once your feature branch is accepted into upstream, you'll probably want +to get rid of the branch. 
First, merge upstream master into your branch +so git knows it is safe to delete your branch : + + git fetch upstream + git checkout master + git merge upstream/master + +Then you can just do: + + git branch -d shiny-new-feature + +Make sure you use a lower-case -d, or else git won't warn you if your +feature branch has not actually been merged. + +The branch will still exist on GitHub, so to delete it there do : + + git push origin --delete shiny-new-feature