diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 5329bad1d90e4..f7041dbabdad5 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,105 +1,571 @@ -### Guidelines +Contributing to pandas +====================== + +Where to start? +--------------- All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome. -The [GitHub "issues" tab](https://github.com/pydata/pandas/issues) -contains some issues labeled "Good as first PR"; Look those up if you're -looking for a quick way to help out. +If you are simply looking to start working with the *pandas* codebase, +navigate to the [GitHub "issues" +tab](https://github.com/pydata/pandas/issues) and start looking through +interesting issues. There are a number of issues listed under +[Docs](https://github.com/pydata/pandas/issues?labels=Docs&sort=updated&state=open) +and [Good as first +PR](https://github.com/pydata/pandas/issues?labels=Good+as+first+PR&sort=updated&state=open) +where you could start out. -#### Bug Reports +Or maybe through using *pandas* you have an idea of your own or are +looking for something in the documentation and thinking 'this can be +improved'...you can do something about it! - - Please include a short, self-contained Python snippet reproducing the problem. - You can have the code formatted nicely by using [GitHub Flavored Markdown](http://github.github.com/github-flavored-markdown/) : +Feel free to ask questions on the [mailing +list](https://groups.google.com/forum/?fromgroups#!forum/pydata). - ```python +Bug Reports/Enhancement Requests +-------------------------------- + +Bug reports are an important part of making *pandas* more stable. Having +a complete bug report will allow others to reproduce the bug and provide +insight into fixing it. Since many versions of *pandas* are supported, +knowing version information will also identify improvements made since +previous versions. 
Often trying the bug-producing code out on the +*master* branch is a worthwhile exercise to confirm the bug still +exists. It is also worth searching existing bug reports and pull +requests to see if the issue has already been reported and/or fixed. + +Bug reports must: + +1. Include a short, self-contained Python snippet reproducing the + problem. You can have the code formatted nicely by using [GitHub + Flavored + Markdown](http://github.github.com/github-flavored-markdown/): : + ```python >>> from pandas import DataFrame >>> df = DataFrame(...) ... ``` - - Include the full version string of pandas and its dependencies. In recent (>0.12) versions - of pandas you can use a built in function: - - ```python - >>> from pandas.util.print_versions import show_versions - >>> show_versions() - ``` - - and in 0.13.1 onwards: - ```python - >>> pd.show_versions() - ``` - - Explain what the expected behavior was, and what you saw instead. - -#### Pull Requests - -##### Testing: - - Every addition to the codebase whether it be a bug or new feature should have associated tests. The can be placed in the `tests` directory where your code change occurs. - - When writing tests, use 2.6 compatible `self.assertFoo` methods. Some polyfills such as `assertRaises` - can be found in `pandas.util.testing`. - - Do not attach doctrings to tests. Make the test itself readable and use comments if needed. - - **Make sure the test suite passes** on your box, use the provided `test_*.sh` scripts or tox. Pandas tests a variety of platforms and Python versions so be cognizant of cross-platorm considerations. - - Performance matters. Make sure your PR hasn't introduced performance regressions by using `test_perf.sh`. See [vbench performance tests](https://github.com/pydata/pandas/wiki/Performance-Testing) wiki for more information on running these tests. 
- - For more information on testing see [Testing advice and best practices in `pandas`](https://github.com/pydata/pandas/wiki/Testing) - -##### Documentation / Commit Messages: - - Docstrings follow the [numpydoc](https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt) format. - - Keep style fixes to a separate commit to make your PR more readable. - - An informal commit message format is in effect for the project. Please try - and adhere to it. Check `git log` for examples. Here are some common prefixes - along with general guidelines for when to use them: - - **ENH**: Enhancement, new functionality - - **BUG**: Bug fix - - **DOC**: Additions/updates to documentation - - **TST**: Additions/updates to tests - - **BLD**: Updates to the build process/scripts - - **PERF**: Performance improvement - - **CLN**: Code cleanup - - Use [proper commit messages](http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html): - - a subject line with `< 80` chars. - - One blank line. - - Optionally, a commit message body. - - Please reference relevant Github issues in your commit message using `GH1234` - or `#1234`. Either style is fine but the '#' style generates noise when your rebase your PR. - - `doc/source/vx.y.z.txt` contains an ongoing - changelog for each release. Add an entry to this file - as needed in your PR: document the fix, enhancement, - or (unavoidable) breaking change. - - Maintain backward-compatibility. Pandas has lots of users with lots of existing code. Don't break it. - - If you think breakage is required clearly state why as part of the PR. - - Be careful when changing method signatures. - - Add deprecation warnings where needed. - - Generally, pandas source files should not contain attributions. You can include a "thanks to..." - in the release changelog. The rest is `git blame`/`git log`. - -##### Workflow/Git - - When you start working on a PR, start by creating a new branch pointing at the latest - commit on github master. 
- - **Do not** merge upstream into a branch you're going to submit as a PR. - Use `git rebase` against the current github master. - - For extra brownie points, you can squash and reorder the commits in your PR using `git rebase -i`. - Use your own judgment to decide what history needs to be preserved. If git frightens you, that's OK too. - - Use `raise AssertionError` over `assert` unless you want the assertion stripped by `python -o`. - - The pandas copyright policy is detailed in the pandas [LICENSE](https://github.com/pydata/pandas/blob/master/LICENSE). - - On the subject of [PEP8](http://www.python.org/dev/peps/pep-0008/): yes. - - [Git tips and tricks](https://github.com/pydata/pandas/wiki/Using-Git) - -##### Code standards: - - We've written a tool to check that your commits are PEP8 great, - [`pip install pep8radius`](https://github.com/hayd/pep8radius). Look at PEP8 fixes in your branch - vs master with `pep8radius master --diff` and make these changes with - `pep8radius master --diff --in-place`. - - On the subject of a massive PEP8-storm touching everything: not too often (once per release works). - - Additional standards are outlined on the [code style wiki page](https://github.com/pydata/pandas/wiki/Code-Style-and-Conventions) - -### Notes on plotting function conventions - -https://groups.google.com/forum/#!topic/pystatsmodels/biNlCvJPNNY/discussion - -#### More developer docs -* See the [developers](http://pandas.pydata.org/developers.html) page on the - project website for more details. -* [`pandas` wiki](https://github.com/pydata/pandas/wiki) constains useful pages for development and general pandas usage -* [Tips and tricks](https://github.com/pydata/pandas/wiki/Tips-&-Tricks) +2. Include the full version string of *pandas* and its dependencies. 
In + recent (\>0.12) versions of *pandas* you can use a built-in + function: + + >>> from pandas.util.print_versions import show_versions + >>> show_versions() + + and in 0.13.1 onwards: + + >>> pd.show_versions() + +3. Explain why the current behavior is wrong/not desired and what you + expect instead. + +The issue will then show up to the *pandas* community and be open to +comments/ideas from others. + +Working with the code +--------------------- + +Now that you have an issue you want to fix, enhancement to add, or +documentation to improve, you need to learn how to work with GitHub and +the *pandas* code base. + +### Version Control, Git, and GitHub + +To the new user, working with Git is one of the more daunting aspects of +contributing to *pandas*. It can very quickly become overwhelming, but +sticking to the guidelines below will keep the process straightforward +and mostly trouble-free. As always, if you are having +difficulties please feel free to ask for help. + +The code is hosted on [GitHub](https://www.github.com/pydata/pandas). To +contribute you will need to sign up for a [free GitHub +account](https://github.com/signup/free). We use +[Git](http://git-scm.com/) for version control to allow many people to +work together on the project. + +Some great resources for learning git: + +- the [GitHub help pages](http://help.github.com/). +- the [NumPy + documentation](http://docs.scipy.org/doc/numpy/dev/index.html). +- Matthew Brett's + [Pydagogue](http://matthew-brett.github.com/pydagogue/). + +### Getting Started with Git + +[GitHub has instructions](http://help.github.com/set-up-git-redirect) +for installing git, setting up your SSH key, and configuring git. All +these steps need to be completed before you can work seamlessly with your +local repository and GitHub. + +### Forking + +You will need your own fork to work on the code. Go to the [pandas +project page](https://github.com/pydata/pandas) and hit the *fork* +button. 
You will want to clone your fork to your machine: + + git clone git@github.com:your-user-name/pandas.git pandas-yourname + cd pandas-yourname + git remote add upstream git://github.com/pydata/pandas.git + +This creates the directory pandas-yourname and connects your repository +to the upstream (main project) *pandas* repository. + +You will also need to hook up Travis-CI to your GitHub repository so the +test suite is run automatically when a Pull Request is submitted. +Instructions are +[here](http://about.travis-ci.org/docs/user/getting-started/). + +### Creating a Branch + +You want your master branch to reflect only production-ready code, so +create a feature branch for making your changes. For example: + + git branch shiny-new-feature + git checkout shiny-new-feature + +The above can be simplified to: + + git checkout -b shiny-new-feature + +This changes your working directory to the shiny-new-feature branch. +Keep any changes in this branch specific to one bug or feature so it is +clear what the branch brings to *pandas*. You can have many +shiny-new-features and switch between them using the git checkout +command. + +### Making changes + +Before making your code changes, it is often necessary to build the code +that was just checked out. There are two primary methods of doing this. + +1. The best way to develop *pandas* is to build the C extensions + in-place by running: + + python setup.py build_ext --inplace + + If you start up the Python interpreter in the *pandas* source + directory, it will use the built C extensions. + +2. Another very common option is to do a `develop` install of *pandas*: + + python setup.py develop + + This makes a symbolic link that tells the Python interpreter to + import *pandas* from your development directory. Thus, you can + always use the development version on your system without being + inside the clone directory. 
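To confirm which copy of *pandas* the interpreter would pick up after either build method, a small check along these lines may help. It is only a sketch: the helper names `module_location` and `is_imported_from` are hypothetical and not part of *pandas*, and the check relies solely on the standard library's `importlib.util.find_spec`.

```python
import importlib.util
from pathlib import Path


def module_location(name):
    """Return the directory a top-level module would be imported from.

    Returns None if the module cannot be found or has no file location
    (for example, a built-in module).
    """
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.has_location:
        return None
    return Path(spec.origin).resolve().parent


def is_imported_from(name, source_dir):
    """Return True if the module would be imported from inside source_dir."""
    location = module_location(name)
    if location is None:
        return False
    source = Path(source_dir).resolve()
    return location == source or source in location.parents


if __name__ == "__main__":
    # Assumption: this is run from the root of your clone.
    print("pandas would be imported from:", module_location("pandas"))
    print("inside current directory:", is_imported_from("pandas", "."))
```

Run from the root of your clone after `python setup.py build_ext --inplace`, or from anywhere after `python setup.py develop`, the printed path is expected to point inside your development directory.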
+ +Contributing to the documentation +--------------------------------- + +If you're not the developer type, contributing to the documentation is +still of huge value. You don't even have to be an expert on *pandas* to +do so! Something as simple as rewriting small passages for clarity as +you reference the docs is a simple but effective way to contribute. The +next person to read that passage will be in your debt! + +Actually, there are sections of the docs that are worse off by being +written by experts. If something in the docs doesn't make sense to you, +updating the relevant section after you figure it out is a simple way to +ensure it will help the next person. + +### About the pandas documentation + +The documentation is written in **reStructuredText**, which is almost +like writing in plain English, and built using +[Sphinx](http://sphinx.pocoo.org/). The Sphinx Documentation has an +excellent [introduction to reST](http://sphinx.pocoo.org/rest.html). +Review the Sphinx docs to perform more complex changes to the +documentation as well. + +Some other important things to know about the docs: + +- The *pandas* documentation consists of two parts: the docstrings in + the code itself and the docs in this folder `pandas/doc/`. + + The docstrings provide a clear explanation of the usage of the + individual functions, while the documentation in this folder + consists of tutorial-like overviews per topic together with some + other information (what's new, installation, etc). + +- The docstrings follow the **Numpy Docstring Standard** which is used + widely in the Scientific Python community. This standard specifies + the format of the different sections of the docstring. See [this + document](https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt) + for a detailed explanation, or look at some of the existing + functions to extend it in a similar manner. 
+- The tutorials make heavy use of the [ipython + directive](http://matplotlib.org/sampledoc/ipython_directive.html) + sphinx extension. This directive lets you put code in the + documentation which will be run during the doc build. For example: + + .. ipython:: python + + x = 2 + x**3 + + will be rendered as + + In [1]: x = 2 + + In [2]: x**3 + Out[2]: 8 + + This means that almost all code examples in the docs are always run + (and the output saved) during the doc build. This way, they will + always be up to date, but it makes the doc building a bit more + complex. + +### How to build the pandas documentation + +#### Requirements + +To build the *pandas* docs there are some extra requirements: you will +need to have `sphinx` and `ipython` installed. +[numpydoc](https://github.com/numpy/numpydoc) is used to parse the +docstrings that follow the Numpy Docstring Standard (see above), but you +don't need to install this because a local copy of `numpydoc` is +included in the *pandas* source code. + +Furthermore, it is recommended to have all [optional +dependencies](http://pandas.pydata.org/pandas-docs/dev/install.html#optional-dependencies) +installed. This is not required, but be aware that without them you will +see some error messages. Because all the code in the documentation is executed during +the doc build, the examples that use these optional dependencies will +generate errors. Run `pd.show_versions()` to get an overview of the +installed versions of all dependencies. + +> **warning** +> +> Sphinx version \>= 1.2.2 or the older 1.1.3 is required. + +#### Building the documentation + +So how do you build the docs? Navigate to your local +`pandas/doc/` directory in the console and run: + + python make.py html + +And then you can find the html output in the folder +`pandas/doc/build/html/`. + +The first time it will take quite a while, because it has to run all the +code examples in the documentation and build all generated docstring +pages. 
In subsequent invocations, sphinx will try to only build the pages +that have been modified. + +If you want to do a full clean build, do: + + python make.py clean + python make.py build + +Starting with 0.13.1 you can tell `make.py` to compile only a single +section of the docs, greatly reducing the turn-around time for checking +your changes. You will be prompted to delete .rst files that aren't +required, since the last committed version can always be restored from +git. + + # omit autosummary and API section + python make.py clean + python make.py --no-api + + # compile the docs with only a single + # section, that which is in indexing.rst + python make.py clean + python make.py --single indexing + +For comparison, a full documentation build may take 10 minutes, a +`--no-api` build may take 3 minutes and a single section may take 15 +seconds. However, subsequent builds only process portions you changed. +Now, open the following file in a web browser to see the full +documentation you just built: + + pandas/doc/build/html/index.html + +And you'll have the satisfaction of seeing your new and improved +documentation! + +Contributing to the code base +----------------------------- + +### Code Standards + +*pandas* uses the [PEP8](http://www.python.org/dev/peps/pep-0008/) +standard. There are several tools to ensure you abide by this standard. + +We've written a tool to check that your commits are PEP8 great, [pip +install pep8radius](https://github.com/hayd/pep8radius). Look at PEP8 +fixes in your branch vs master with: + + pep8radius master --diff + +and make these changes with: + + pep8radius master --diff --in-place + +Alternatively, use the [flake8](http://pypi.python.org/pypi/flake8) tool for +checking the style of your code. Additional standards are outlined on +the [code style wiki +page](https://github.com/pydata/pandas/wiki/Code-Style-and-Conventions). + +Please try to maintain backward-compatibility. 
*Pandas* has lots of +users with lots of existing code, so don't break it if at all possible. +If you think breakage is required, clearly state why as part of the Pull +Request. Also, be careful when changing method signatures and add +deprecation warnings where needed. + +### Test-driven Development/Writing Code + +*Pandas* is serious about [Test-driven Development +(TDD)](http://en.wikipedia.org/wiki/Test-driven_development). This +development process "relies on the repetition of a very short +development cycle: first the developer writes an (initially failing) +automated test case that defines a desired improvement or new function, +then produces the minimum amount of code to pass that test." So, before +actually writing any code, you should write your tests. Often the test +can be taken from the original GitHub issue. However, it is always worth +considering additional use cases and writing corresponding tests. + +Adding tests is one of the most common requests after code is pushed to +*pandas*. It is worth getting in the habit of writing tests ahead of +time so this is never an issue. + +Like many packages, *pandas* uses the [Nose testing +system](http://somethingaboutorange.com/mrl/projects/nose/) and the +convenient extensions in +[numpy.testing](http://docs.scipy.org/doc/numpy/reference/routines.testing.html). + +#### Writing tests + +All tests should go into the *tests* subdirectory of the specific +package. There are probably many examples already there, and it is worth +looking to these for inspiration. If your test requires working with +files or network connectivity, there is more information on the [testing +page](https://github.com/pydata/pandas/wiki/Testing) of the wiki. + +The `pandas.util.testing` module has many special `assert` functions +that make it easier to make statements about whether Series or DataFrame +objects are equivalent. 
The easiest way to verify that your code is +correct is to explicitly construct the result you expect, then compare +the actual result to the expected correct result: + + def test_pivot(self): + data = { + 'index' : ['A', 'B', 'C', 'C', 'B', 'A'], + 'columns' : ['One', 'One', 'One', 'Two', 'Two', 'Two'], + 'values' : [1., 2., 3., 3., 2., 1.] + } + + frame = DataFrame(data) + pivoted = frame.pivot(index='index', columns='columns', values='values') + + expected = DataFrame({ + 'One' : {'A' : 1., 'B' : 2., 'C' : 3.}, + 'Two' : {'A' : 1., 'B' : 2., 'C' : 3.} + }) + + assert_frame_equal(pivoted, expected) + +#### Running the test suite + +The tests can then be run directly inside your git clone (without having +to install *pandas*) by typing: + + nosetests pandas + +The test suite is exhaustive and takes around 20 minutes to run. Often +it is worth first running only the subset of tests around your changes +before running the entire suite. This is done using one of the following +constructs: + + nosetests pandas/tests/[test-module].py + nosetests pandas/tests/[test-module].py:[TestClass] + nosetests pandas/tests/[test-module].py:[TestClass].[test_method] + +#### Running the performance test suite + +Performance matters and it is worth checking that your code has not +introduced performance regressions. Currently *pandas* uses the [vbench +library](https://github.com/pydata/vbench) to enable easy monitoring of +the performance of critical *pandas* operations. These benchmarks are +all found in the `pandas/vb_suite` directory. vbench currently only +works on Python 2. + +To install vbench: + + pip install git+https://github.com/pydata/vbench + +Vbench also requires sqlalchemy, gitpython, and psutil, which can all be +installed using pip. If you need to run a benchmark, change your +directory to the *pandas* root and run: + + ./test_perf.sh -b master -t HEAD + +This will check out the master revision and run the suite on both master +and your commit. 
Running the full benchmark suite can take up to one hour and +use up to 3GB of RAM. Usually it is sufficient to paste a subset of the +results into the Pull Request to show that the committed changes do not +cause unexpected performance regressions. + +You can run specific benchmarks using the *-r* flag, which takes a +regular expression. + +See the [performance testing +wiki](https://github.com/pydata/pandas/wiki/Performance-Testing) for +information on how to write a benchmark. + +### Documenting your code + +Changes should be reflected in the release notes located in +doc/source/whatsnew/vx.y.z.txt. This file contains an ongoing change log +for each release. Add an entry to this file to document your fix, +enhancement or (unavoidable) breaking change. Make sure to include the +GitHub issue number when adding your entry. + +If your code is an enhancement, it is most likely necessary to add usage +examples to the existing documentation. This can be done following the +section regarding documentation. + +Contributing your changes to *pandas* +------------------------------------- + +### Committing your code + +Keep style fixes to a separate commit to make your PR more readable. + +Once you've made changes, you can see them by typing: + + git status + +If you've created a new file, it is not being tracked by git. Add it by +typing: + + git add path/to/file-to-be-added.py + +Doing 'git status' again should give something like: + + # On branch shiny-new-feature + # + # modified: /relative/path/to/file-you-added.py + # + +Finally, commit your changes to your local repository with an +explanatory message. An informal commit message format is in effect for +the project. Please try to adhere to it. 
Here are some common prefixes +along with general guidelines for when to use them: + +> - ENH: Enhancement, new functionality +> - BUG: Bug fix +> - DOC: Additions/updates to documentation +> - TST: Additions/updates to tests +> - BLD: Updates to the build process/scripts +> - PERF: Performance improvement +> - CLN: Code cleanup + +The following defines how a commit message should be structured. Please +reference the relevant GitHub issues in your commit message using GH1234 +or \#1234. Either style is fine, but the former is generally preferred: + +> - a subject line with \< 80 chars. +> - One blank line. +> - Optionally, a commit message body. + +Now you can commit your changes in your local repository: + + git commit -m + +If you have multiple commits, it is common to want to combine them into +one commit, often referred to as "squashing" or "rebasing". This is a +common request by package maintainers when submitting a Pull Request as +it maintains a more compact commit history. To rebase your commits: + + git rebase -i HEAD~# + +Where \# is the number of commits you want to combine. Then you can pick +the relevant commit message and discard others. + +### Pushing your changes + +When you want your changes to appear publicly on your GitHub page, push +your forked feature branch's commits : + + git push origin shiny-new-feature + +Here origin is the default name given to your remote repository on +GitHub. You can see the remote repositories : + + git remote -v + +If you added the upstream repository as described above you will see +something like : + + origin git@github.com:yourname/pandas.git (fetch) + origin git@github.com:yourname/pandas.git (push) + upstream git://github.com/pydata/pandas.git (fetch) + upstream git://github.com/pydata/pandas.git (push) + +Now your code is on GitHub, but it is not yet a part of the *pandas* +project. For that to happen, a Pull Request needs to be submitted on +GitHub. 
+ +### Review your code + +When you're ready to ask for a code review, you will file a Pull +Request. Before you do, make sure once more that you've followed all the +guidelines outlined in this document regarding code style, tests, +performance tests, and documentation. You should also double check your +branch changes against the branch it was based off of: + +1. Navigate to your repository on GitHub. +2. Click on Branches. +3. Click on the Compare button for your feature branch. +4. Select the base and compare branches, if necessary. This will be + master and shiny-new-feature, respectively. + +### Finally, make the Pull Request + +If everything looks good, you are ready to make a Pull Request. A Pull +Request is how code from a local repository becomes available to the +GitHub community and can be looked at and eventually merged into the +master version. This Pull Request and its associated changes will +eventually be committed to the master branch and available in the next +release. To submit a Pull Request: + +1. Navigate to your repository on GitHub. +2. Click on the Pull Request button. +3. You can then click on Commits and Files Changed to make sure + everything looks okay one last time. +4. Write a description of your changes in the Preview Discussion tab. +5. Click Send Pull Request. + +This request then appears to the repository maintainers, and they will +review the code. If you need to make more changes, you can make them in +your branch, push them to GitHub, and the pull request will be +automatically updated. Pushing them to GitHub again is done by: + + git push -f origin shiny-new-feature + +This will automatically update your Pull Request with the latest code +and restart the Travis-CI tests. + +### Delete your merged branch (optional) + +Once your feature branch is accepted into upstream, you'll probably want +to get rid of the branch. 
First, merge upstream master into your branch +so git knows it is safe to delete your branch : + + git fetch upstream + git checkout master + git merge upstream/master + +Then you can just do: + + git branch -d shiny-new-feature + +Make sure you use a lower-case -d, or else git won't warn you if your +feature branch has not actually been merged. + +The branch will still exist on GitHub, so to delete it there do : + + git push origin --delete shiny-new-feature diff --git a/doc/source/contributing.rst b/doc/source/contributing.rst index 68bd6109b85d7..b3b2d272e66c6 100644 --- a/doc/source/contributing.rst +++ b/doc/source/contributing.rst @@ -13,8 +13,8 @@ Where to start? All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome. -If you are simply looking to start working with the *pandas* codebase, navigate to the -`GitHub "issues" tab `_ and start looking through +If you are simply looking to start working with the *pandas* codebase, navigate to the +`GitHub "issues" tab `_ and start looking through interesting issues. There are a number of issues listed under `Docs `_ and `Good as first PR @@ -31,11 +31,11 @@ Feel free to ask questions on `mailing list Bug Reports/Enhancement Requests ================================ -Bug reports are an important part of making *pandas* more stable. Having a complete bug report -will allow others to reproduce the bug and provide insight into fixing. Since many versions of -*pandas* are supported, knowing version information will also identify improvements made since -previous versions. Often trying the bug-producing code out on the *master* branch is a worthwhile exercise -to confirm the bug still exists. It is also worth searching existing bug reports and pull requests +Bug reports are an important part of making *pandas* more stable. Having a complete bug report +will allow others to reproduce the bug and provide insight into fixing. 
Since many versions of +*pandas* are supported, knowing version information will also identify improvements made since +previous versions. Often trying the bug-producing code out on the *master* branch is a worthwhile exercise +to confirm the bug still exists. It is also worth searching existing bug reports and pull requests to see if the issue has already been reported and/or fixed. Bug reports must: @@ -59,7 +59,7 @@ Bug reports must: and in 0.13.1 onwards: :: >>> pd.show_versions() - + #. Explain why the current behavior is wrong/not desired and what you expect instead. The issue will then show up to the *pandas* community and be open to comments/ideas from others. @@ -67,15 +67,15 @@ The issue will then show up to the *pandas* community and be open to comments/id Working with the code ===================== -Now that you have an issue you want to fix, enhancement to add, or documentation to improve, +Now that you have an issue you want to fix, enhancement to add, or documentation to improve, you need to learn how to work with GitHub and the *pandas* code base. Version Control, Git, and GitHub -------------------------------- -To the new user, working with Git is one of the more daunting aspects of contributing to *pandas*. -It can very quickly become overwhelming, but sticking to the guidelines below will make the process -straightforward and will work without much trouble. As always, if you are having difficulties please +To the new user, working with Git is one of the more daunting aspects of contributing to *pandas*. +It can very quickly become overwhelming, but sticking to the guidelines below will make the process +straightforward and will work without much trouble. As always, if you are having difficulties please feel free to ask for help. The code is hosted on `GitHub `_. To @@ -85,14 +85,14 @@ version control to allow many people to work together on the project. Some great resources for learning git: - * the `GitHub help pages `_. 
- * the `NumPy's documentation `_. - * Matthew Brett's `Pydagogue `_. +* the `GitHub help pages `_. +* the `NumPy's documentation `_. +* Matthew Brett's `Pydagogue `_. Getting Started with Git ------------------------ -`GitHub has instructions `__ for installing git, +`GitHub has instructions `__ for installing git, setting up your SSH key, and configuring git. All these steps need to be completed before working seamlessly with your local repository and GitHub. @@ -110,7 +110,7 @@ want to clone your fork to your machine: :: This creates the directory `pandas-yourname` and connects your repository to the upstream (main project) *pandas* repository. -You will also need to hook up Travis-CI to your GitHub repository so the suite +You will also need to hook up Travis-CI to your GitHub repository so the suite is automatically run when a Pull Request is submitted. Instructions are `here `_. @@ -127,27 +127,27 @@ The above can be simplified to:: git checkout -b shiny-new-feature -This changes your working directory to the shiny-new-feature branch. Keep any -changes in this branch specific to one bug or feature so it is clear -what the branch brings to *pandas*. You can have many shiny-new-features +This changes your working directory to the shiny-new-feature branch. Keep any +changes in this branch specific to one bug or feature so it is clear +what the branch brings to *pandas*. You can have many shiny-new-features and switch in between them using the git checkout command. Making changes -------------- -Before making your code changes, it is often necessary to build the code that was -just checked out. There are two primary methods of doing this. +Before making your code changes, it is often necessary to build the code that was +just checked out. There are two primary methods of doing this. #. 
The best way to develop *pandas* is to build the C extensions in-place by running:: - + python setup.py build_ext --inplace - - If you start up the Python interpreter in the *pandas* source directory you + + If you start up the Python interpreter in the *pandas* source directory you will call the built C extensions - + #. Another very common option is to do a ``develop`` install of *pandas*:: - + python setup.py develop This makes a symbolic link that tells the Python interpreter to import *pandas* @@ -155,7 +155,7 @@ just checked out. There are two primary methods of doing this. version on your system without being inside the clone directory. Contributing to the documentation ---------------------------------- +================================= If you're not the developer type, contributing to the documentation is still of huge value. You don't even have to be an expert on @@ -173,7 +173,7 @@ help the next person. About the pandas documentation -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +------------------------------ The documentation is written in **reStructuredText**, which is almost like writing in plain English, and built using `Sphinx `__. The @@ -225,10 +225,10 @@ Some other important things to know about the docs: How to build the pandas documentation -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +------------------------------------- Requirements -"""""""""""" +~~~~~~~~~~~~ To build the *pandas* docs there are some extra requirements: you will need to have ``sphinx`` and ``ipython`` installed. `numpydoc @@ -250,7 +250,7 @@ dependencies. Sphinx version >= 1.2.2 or the older 1.1.3 is required. Building the documentation -"""""""""""""""""""""""""" +~~~~~~~~~~~~~~~~~~~~~~~~~~ So how do you build the docs? Navigate to your local ``pandas/doc/`` directory in the console and run:: @@ -287,8 +287,8 @@ last committed version can always be restored from git. python make.py --single indexing For comparison, a full documentation build may take 10 minutes.
A ``--no-api`` build -may take 3 minutes and a single section may take 15 seconds. However, subsequent -builds only process portions you changed. Now, open the following file in a web +may take 3 minutes and a single section may take 15 seconds. However, subsequent +builds only process portions you changed. Now, open the following file in a web browser to see the full documentation you just built:: pandas/doc/build/html/index.html @@ -297,40 +297,40 @@ And you'll have the satisfaction of seeing your new and improved documentation! Contributing to the code base ------------------------------ +============================= .. contents:: Code Base: :local: Code Standards -^^^^^^^^^^^^^^ +-------------- -*pandas* uses the `PEP8 `_ standard. +*pandas* uses the `PEP8 `_ standard. There are several tools to ensure you abide by this standard. -We've written a tool to check that your commits are PEP8 great, `pip install pep8radius `_. +We've written a tool to check that your commits are PEP8 great, `pip install pep8radius `_. Look at PEP8 fixes in your branch vs master with:: pep8radius master --diff and make these changes with `pep8radius master --diff --in-place` -Alternatively, use the `flake8 `_ tool for checking the style of your code. +Alternatively, use the `flake8 `_ tool for checking the style of your code. Additional standards are outlined on the `code style wiki page `_. -Please try to maintain backward-compatibility. *Pandas* has lots of users with lots of existing code, so -don't break it if at all possible. If you think breakage is required, clearly state why -as part of the Pull Request. Also, be careful when changing method signatures and add +Please try to maintain backward-compatibility. *Pandas* has lots of users with lots of existing code, so +don't break it if at all possible. If you think breakage is required, clearly state why +as part of the Pull Request. Also, be careful when changing method signatures and add deprecation warnings where needed.
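When a signature change cannot be avoided, one common pattern is to keep accepting the old keyword for a while and emit a warning when it is used. The sketch below is only an illustration built on the standard library's ``warnings`` module; ``rolling_total`` and its ``window`` and ``span`` keywords are hypothetical names chosen for the example, not part of *pandas*:

```python
import warnings


def rolling_total(values, window=None, span=None):
    """Sum the last ``window`` items of ``values``.

    ``span`` is a hypothetical old keyword kept only for backward
    compatibility; it is not a real pandas argument.
    """
    if span is not None:
        # stacklevel=2 makes the warning point at the caller's line,
        # not at this function.
        warnings.warn(
            "the 'span' keyword is deprecated, use 'window' instead",
            FutureWarning,
            stacklevel=2,
        )
        if window is None:
            window = span
    if window is None:
        window = len(values)
    # Clamp the start index so an oversized window sums the whole list.
    start = max(len(values) - window, 0)
    return sum(values[start:])
```

Existing callers that pass ``span`` keep working and see a ``FutureWarning``, which Python displays by default, while new code can move to ``window`` right away.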
Test-driven Development/Writing Code -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -*Pandas* is serious about `Test-driven Development (TDD) -`_. -This development process "relies on the repetition of a very short development cycle: -first the developer writes an (initially failing) automated test case that defines a desired -improvement or new function, then produces the minimum amount of code to pass that test." -So, before actually writing any code, you should write your tests. Often the test can be -taken from the original GitHub issue. However, it is always worth considering additional +------------------------------------ + +*Pandas* is serious about `Test-driven Development (TDD) +`_. +This development process "relies on the repetition of a very short development cycle: +first the developer writes an (initially failing) automated test case that defines a desired +improvement or new function, then produces the minimum amount of code to pass that test." +So, before actually writing any code, you should write your tests. Often the test can be +taken from the original GitHub issue. However, it is always worth considering additional use cases and writing corresponding tests. Adding tests is one of the most common requests after code is pushed to *pandas*. It is worth getting @@ -342,10 +342,10 @@ extensions in `numpy.testing `_. Writing tests -""""""""""""" +~~~~~~~~~~~~~ All tests should go into the *tests* subdirectory of the specific package. -There are probably many examples already there and looking to these for +There are probably many examples already there and looking to these for inspiration is suggested. If your test requires working with files or network connectivity there is more information on the `testing page `_ of the wiki.
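As a rough illustration of the test-first cycle described above, the sketch below shows a small test case together with the minimal implementation that was written to satisfy it. It uses only the standard library's ``unittest`` module, and ``fill_missing`` is a hypothetical helper invented for this example; a real *pandas* test would live in the relevant *tests* subdirectory and compare *pandas* objects instead of plain lists:

```python
import unittest


def fill_missing(values, fill_value=0):
    # Minimal implementation, written only after the tests below existed:
    # replace every None with fill_value and leave other items untouched.
    return [fill_value if v is None else v for v in values]


class TestFillMissing(unittest.TestCase):
    def test_replaces_none(self):
        # Build the input, call the code under test, compare with the
        # expected correct result.
        result = fill_missing([1, None, 3])
        expected = [1, 0, 3]
        self.assertEqual(result, expected)

    def test_custom_fill_value(self):
        self.assertEqual(fill_missing([None, 2], fill_value=-1), [-1, 2])

    def test_input_not_modified(self):
        # An additional use case beyond the original report: the caller's
        # list should be left as it was.
        values = [None, 1]
        fill_missing(values)
        self.assertEqual(values, [None, 1])
```

Saved to a file, a case like this can be run on its own with ``python -m unittest``, which keeps the write-test, run, fix loop short.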
@@ -376,64 +376,67 @@ the expected correct result: assert_frame_equal(pivoted, expected) Running the test suite -"""""""""""""""""""""" +~~~~~~~~~~~~~~~~~~~~~~ The tests can then be run directly inside your git clone (without having to install *pandas*) by typing:: nosetests pandas -The test suite is exhaustive and takes around 20 minutes to run. Often it is -worth running only a subset of tests first around your changes before running the +The test suite is exhaustive and takes around 20 minutes to run. Often it is +worth running only a subset of tests first around your changes before running the entire suite. This is done using one of the following constructs: :: - + nosetests pandas/tests/[test-module].py nosetests pandas/tests/[test-module].py:[TestClass] nosetests pandas/tests/[test-module].py:[TestClass].[test_method] Running the performance test suite -"""""""""""""""""""""""""""""""""" +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Performance matters and it is worth checking that your code has not introduced -performance regressions. Currently *pandas* uses the `vbench library `__ +Performance matters and it is worth checking that your code has not introduced +performance regressions. Currently *pandas* uses the `vbench library `__ to enable easy monitoring of the performance of critical *pandas* operations. -These benchmarks are all found in the ``pandas/vb_suite`` directory. vbench +These benchmarks are all found in the ``pandas/vb_suite`` directory. vbench currently only works on Python 2. To install vbench:: pip install git+https://github.com/pydata/vbench -Vbench also requires sqlalchemy, gitpython, and psutil which can all be installed +Vbench also requires sqlalchemy, gitpython, and psutil which can all be installed using pip. If you need to run a benchmark, change your directory to the *pandas* root and run:: ./test_perf.sh -b master -t HEAD -This will checkout the master revision and run the suite on both master and -your commit.
Running the full test suite can take up to one hour and use up -to 3GB of RAM. Usually it is sufficient to paste a subset of the results -into the Pull Request to show that the committed changes do not cause unexpected +This will checkout the master revision and run the suite on both master and +your commit. Running the full test suite can take up to one hour and use up +to 3GB of RAM. Usually it is sufficient to paste a subset of the results +into the Pull Request to show that the committed changes do not cause unexpected performance regressions. You can run specific benchmarks using the *-r* flag, which takes a regular expression. -See the `performance testing wiki `_ for information +See the `performance testing wiki `_ for information on how to write a benchmark. Documenting your code -^^^^^^^^^^^^^^^^^^^^^ +--------------------- -Changes should be reflected in the release notes located in `doc/source/whatsnew/vx.y.z.txt`. -This file contains an ongoing change log for each release. Add an entry to this file to -document your fix, enhancement or (unavoidable) breaking change. Make sure to include the +Changes should be reflected in the release notes located in `doc/source/whatsnew/vx.y.z.txt`. +This file contains an ongoing change log for each release. Add an entry to this file to +document your fix, enhancement or (unavoidable) breaking change. Make sure to include the GitHub issue number when adding your entry. -If your code is an enhancement, it is most likely necessary to add usage examples to the +If your code is an enhancement, it is most likely necessary to add usage examples to the existing documentation. This can be done following the section regarding documentation.
+Contributing your changes to *pandas* +===================================== + Committing your code -------------------- @@ -454,8 +457,8 @@ Doing 'git status' again should give something like :: # modified: /relative/path/to/file-you-added.py # -Finally, commit your changes to your local repository with an explanatory message. An informal -commit message format is in effect for the project. Please try to adhere to it. Here are +Finally, commit your changes to your local repository with an explanatory message. An informal +commit message format is in effect for the project. Please try to adhere to it. Here are some common prefixes along with general guidelines for when to use them: * ENH: Enhancement, new functionality @@ -466,8 +469,8 @@ some common prefixes along with general guidelines for when to use them: * PERF: Performance improvement * CLN: Code cleanup -The following defines how a commit message should be structured. Please reference the -relevant GitHub issues in your commit message using `GH1234` or `#1234`. Either style +The following defines how a commit message should be structured. Please reference the +relevant GitHub issues in your commit message using `GH1234` or `#1234`. Either style is fine, but the former is generally preferred: * a subject line with `< 80` chars. @@ -478,13 +481,13 @@ Now you can commit your changes in your local repository:: git commit -m -If you have multiple commits, it is common to want to combine them into one commit, often -referred to as "squashing" or "rebasing". This is a common request by package maintainers +If you have multiple commits, it is common to want to combine them into one commit, often +referred to as "squashing" or "rebasing". This is a common request by package maintainers when submitting a Pull Request as it maintains a more compact commit history. To rebase your commits:: git rebase -i HEAD~# -Where # is the number of commits you want to combine. 
Then you can pick the relevant +Where # is the number of commits you want to combine. Then you can pick the relevant commit message and discard others. Pushing your changes @@ -508,33 +511,30 @@ like :: upstream git://github.com/pydata/pandas.git (fetch) upstream git://github.com/pydata/pandas.git (push) -Now your code is on GitHub, but it is not yet a part of the *pandas* project. For that to +Now your code is on GitHub, but it is not yet a part of the *pandas* project. For that to happen, a Pull Request needs to be submitted on GitHub. -Contributing your changes to *pandas* -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Review your code ---------------- -When you're ready to ask for a code review, you will file a Pull Request. Before you do, -again make sure you've followed all the guidelines outlined in this document regarding -code style, tests, performance tests, and documentation. You should also double check +When you're ready to ask for a code review, you will file a Pull Request. Before you do, +again make sure you've followed all the guidelines outlined in this document regarding +code style, tests, performance tests, and documentation. You should also double check your branch changes against the branch it was based off of: #. Navigate to your repository on GitHub--https://github.com/your-user-name/pandas. #. Click on `Branches`. #. Click on the `Compare` button for your feature branch. -#. Select the `base` and `compare` branches, if necessary. This will be `master` and +#. Select the `base` and `compare` branches, if necessary. This will be `master` and `shiny-new-feature`, respectively. Finally, make the Pull Request ------------------------------ -If everything looks good you are ready to make a Pull Request. A Pull Request is how -code from a local repository becomes available to the GitHub community and can be looked -at and eventually merged into the master version. 
This Pull Request and its associated -changes will eventually be committed to the master branch and available in the next +If everything looks good you are ready to make a Pull Request. A Pull Request is how +code from a local repository becomes available to the GitHub community and can be looked +at and eventually merged into the master version. This Pull Request and its associated +changes will eventually be committed to the master branch and available in the next release. To submit a Pull Request: #. Navigate to your repository on GitHub. @@ -555,7 +555,7 @@ This will automatically update your Pull Request with the latest code and restar Delete your merged branch (optional) ------------------------------------ -Once your feature branch is accepted into upstream, you'll probably want to get rid of +Once your feature branch is accepted into upstream, you'll probably want to get rid of the branch. First, merge upstream master into your branch so git knows it is safe to delete your branch :: git fetch upstream