diff --git a/ci/environment-dev.yaml b/ci/environment-dev.yaml new file mode 100644 index 0000000000000..c3d3d59f895c6 --- /dev/null +++ b/ci/environment-dev.yaml @@ -0,0 +1,14 @@ +name: pandas-dev +channels: + - defaults + - conda-forge +dependencies: + - Cython + - NumPy + - moto + - pytest + - python-dateutil + - python=3 + - pytz + - setuptools + - sphinx diff --git a/ci/requirements_all.txt b/ci/requirements-optional-conda.txt similarity index 68% rename from ci/requirements_all.txt rename to ci/requirements-optional-conda.txt index e13afd619f105..6edb8d17337e4 100644 --- a/ci/requirements_all.txt +++ b/ci/requirements-optional-conda.txt @@ -1,28 +1,27 @@ -pytest>=3.1.0 -pytest-cov -pytest-xdist -flake8 -sphinx=1.5* -nbsphinx -ipython -python-dateutil -pytz -openpyxl -xlsxwriter -xlrd -xlwt -html5lib -patsy beautifulsoup4 -numpy -cython -scipy +blosc +bottleneck +fastparquet +feather-format +html5lib +ipython +ipykernel +jinja2 +lxml +matplotlib +nbsphinx numexpr +openpyxl +pyarrow +pymysql pytables -matplotlib +pytest-cov +pytest-xdist +s3fs +scipy seaborn -lxml sqlalchemy -bottleneck -pymysql -Jinja2 +xarray +xlrd +xlsxwriter +xlwt diff --git a/ci/requirements-optional-pip.txt b/ci/requirements-optional-pip.txt new file mode 100644 index 0000000000000..06b22bd8f2c63 --- /dev/null +++ b/ci/requirements-optional-pip.txt @@ -0,0 +1,27 @@ +# This file was autogenerated by scripts/convert_deps.py +# Do not modify directlybeautifulsoup4 +blosc +bottleneck +fastparquet +feather-format +html5lib +ipython +jinja2 +lxml +matplotlib +nbsphinx +numexpr +openpyxl +pyarrow +pymysql +tables +pytest-cov +pytest-xdist +s3fs +scipy +seaborn +sqlalchemy +xarray +xlrd +xlsxwriter +xlwt \ No newline at end of file diff --git a/ci/requirements_dev.txt b/ci/requirements_dev.txt index dbc4f6cbd6509..2fb36b7cd70d8 100644 --- a/ci/requirements_dev.txt +++ b/ci/requirements_dev.txt @@ -1,8 +1,10 @@ +# This file was autogenerated by scripts/convert_deps.py +# Do not modify directly +Cython +NumPy +moto +pytest python-dateutil pytz -numpy -cython -pytest>=3.1.0 -pytest-cov -flake8 -moto +setuptools +sphinx \ No newline at end of file diff --git a/doc/source/contributing.rst b/doc/source/contributing.rst index e345f79dad5c2..062f49980ee0e 100644 --- a/doc/source/contributing.rst +++ b/doc/source/contributing.rst @@ -11,32 +11,32 @@ Where to start? =============== All contributions, bug reports, bug fixes, documentation improvements, -enhancements and ideas are welcome. +enhancements, and ideas are welcome. -If you are simply looking to start working with the *pandas* codebase, navigate to the -`GitHub "issues" tab `_ and start looking through -interesting issues. There are a number of issues listed under `Docs +If you are brand new to pandas or open-source development, we recommend going +through the `GitHub "issues" tab `_ +to find issues that interest you. There are a number of issues listed under `Docs `_ and `Difficulty Novice `_ -where you could start out. - -Or maybe through using *pandas* you have an idea of your own or are looking for something -in the documentation and thinking 'this can be improved'...you can do something -about it! +where you could start out. Once you've found an interesting issue, you can +return here to get your development environment setup. Feel free to ask questions on the `mailing list -`_ or on `Gitter -`_. +`_ or on `Gitter`_. + +.. _contributing.bug_reports: Bug reports and enhancement requests ==================================== -Bug reports are an important part of making *pandas* more stable. Having a complete bug report -will allow others to reproduce the bug and provide insight into fixing. Because many versions of -*pandas* are supported, knowing version information will also identify improvements made since -previous versions. Trying the bug-producing code out on the *master* branch is often a worthwhile exercise -to confirm the bug still exists. It is also worth searching existing bug reports and pull requests +Bug reports are an important part of making *pandas* more stable. Having a complete bug report +will allow others to reproduce the bug and provide insight into fixing. See +`this stackoverflow article `_ for tips on +writing a good bug report. + +Trying the bug-producing code out on the *master* branch is often a worthwhile exercise +to confirm the bug still exists. It is also worth searching existing bug reports and pull requests to see if the issue has already been reported and/or fixed. Bug reports must: @@ -60,12 +60,16 @@ Bug reports must: The issue will then show up to the *pandas* community and be open to comments/ideas from others. +.. _contributing.github + Working with the code ===================== Now that you have an issue you want to fix, enhancement to add, or documentation to improve, you need to learn how to work with GitHub and the *pandas* code base. +.. _contributing.version_control: + Version control, Git, and GitHub -------------------------------- @@ -103,167 +107,164 @@ want to clone your fork to your machine:: git clone https://github.com/your-user-name/pandas.git pandas-yourname cd pandas-yourname - git remote add upstream git://github.com/pandas-dev/pandas.git + git remote add upstream https://github.com/pandas-dev/pandas.git This creates the directory `pandas-yourname` and connects your repository to the upstream (main project) *pandas* repository. -Creating a branch ------------------ +.. _contributing.dev_env: -You want your master branch to reflect only production-ready code, so create a -feature branch for making your changes. For example:: +Creating a development environment +---------------------------------- - git branch shiny-new-feature - git checkout shiny-new-feature +To test out code changes, you'll need to build pandas from source, which +requires a C compiler and python environment. If you're making documentation +changes, you can skip to :ref:`contributing.documentation` but you won't be able +to build the documentation locally before pushing your changes. -The above can be simplified to:: +.. _contributiong.dev_c: - git checkout -b shiny-new-feature +Installing a C Complier +~~~~~~~~~~~~~~~~~~~~~~~ -This changes your working directory to the shiny-new-feature branch. Keep any -changes in this branch specific to one bug or feature so it is clear -what the branch brings to *pandas*. You can have many shiny-new-features -and switch in between them using the git checkout command. +Pandas uses C extensions (mostly written using Cython) to speed up certain +operations. To install pandas from source, you need to compile these C +extensions, which means you need a C complier. This process depends on which +platform you're using. Follow the `CPython contributing guidelines +`_ for getting a +complier installed. You don't need to do any of the ``./configure`` or ``make`` +steps; you only need to install the complier. -To update this branch, you need to retrieve the changes from the master branch:: +For Windows developers, the following links may be helpful. - git fetch upstream - git rebase upstream/master +- https://blogs.msdn.microsoft.com/pythonengineering/2016/04/11/unable-to-find-vcvarsall-bat/ +- https://github.com/conda/conda-recipes/wiki/Building-from-Source-on-Windows-32-bit-and-64-bit +- https://cowboyprogrammer.org/building-python-wheels-for-windows/ +- https://blog.ionelmc.ro/2014/12/21/compiling-python-extensions-on-windows/ +- https://support.enthought.com/hc/en-us/articles/204469260-Building-Python-extensions-with-Canopy -This will replay your commits on top of the latest pandas git master. If this -leads to merge conflicts, you must resolve these before submitting your pull -request. If you have uncommitted changes, you will need to ``stash`` them prior -to updating. This will effectively store your changes and they can be reapplied -after updating. +Let us know if you have any difficulties by opening an issue or reaching out on +`Gitter`_. -.. _contributing.dev_env: +.. _contributiong.dev_python: -Creating a development environment ----------------------------------- +Creating a Python Environment +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -An easy way to create a *pandas* development environment is as follows. +Now that you have a C complier, create an isolated pandas development +environment: -- Install either :ref:`Anaconda ` or :ref:`miniconda ` +- Install either `Anaconda `_ or `miniconda + `_ +- Make sure your conda is up to date (``conda update conda``) - Make sure that you have :ref:`cloned the repository ` - ``cd`` to the *pandas* source directory -Tell conda to create a new environment, named ``pandas_dev``, or any other name you would like -for this environment, by running:: - - conda create -n pandas_dev --file ci/requirements_dev.txt - - -For a python 3 environment:: - - conda create -n pandas_dev python=3 --file ci/requirements_dev.txt - -.. warning:: - - If you are on Windows, see :ref:`here for a fully compliant Windows environment `. - -This will create the new environment, and not touch any of your existing environments, -nor any existing python installation. It will install all of the basic dependencies of -*pandas*, as well as the development and testing tools. If you would like to install -other dependencies, you can install them as follows:: +We'll now kick off a three-step process: - conda install -n pandas_dev -c pandas pytables scipy +1. Install the build dependencies +2. Build and install pandas +3. Install the optional dependencies -To install *all* pandas dependencies you can do the following:: +.. code-block:: none - conda install -n pandas_dev -c conda-forge --file ci/requirements_all.txt + # Create and activate the build environment + conda env create -f ci/environment-dev.yaml + conda activate pandas-dev -To work in this environment, Windows users should ``activate`` it as follows:: + # Build and install pandas + python setup.py build_ext --inplace -j 4 + python -m pip install -e . - activate pandas_dev + # Install the rest of the optional dependencies + conda install -c defaults -c conda-forge --file=ci/requirements-optional-conda.txt -Mac OSX / Linux users should use:: +At this point you should be able to import pandas from your locally built version:: - source activate pandas_dev + $ python # start an interpreter + >>> import pandas + >>> print(pandas.__version__) + 0.22.0.dev0+29.g4ad6d4d74 -You will then see a confirmation message to indicate you are in the new development environment. +This will create the new environment, and not touch any of your existing environments, +nor any existing python installation. To view your environments:: conda info -e -To return to your home root environment in Windows:: - - deactivate +To return to your root environment:: -To return to your home root environment in OSX / Linux:: - - source deactivate + conda deactivate See the full conda docs `here `__. -At this point you can easily do an *in-place* install, as detailed in the next section. - -.. _contributing.windows: - -Creating a Windows development environment ------------------------------------------- +.. _contributing.pip: -To build on Windows, you need to have compilers installed to build the extensions. You will need to install the appropriate Visual Studio compilers, VS 2008 for Python 2.7, VS 2010 for 3.4, and VS 2015 for Python 3.5 and 3.6. +Creating a Python Environment (pip) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -For Python 2.7, you can install the ``mingw`` compiler which will work equivalently to VS 2008:: +If you aren't using conda for you development environment, follow these instructions. +You'll need to have at least python3.5 installed on your system. - conda install -n pandas_dev libpython +.. code-block:: none -or use the `Microsoft Visual Studio VC++ compiler for Python `__. Note that you have to check the ``x64`` box to install the ``x64`` extension building capability as this is not installed by default. + # Create a virtual environment + # Use an ENV_DIR of your choice. We'll use ~/virtualenvs/pandas-dev + # Any parent directories should already exist + python3 -m venv ~/virtualenvs/pandas-dev + # Activate the virtulaenv + . ~/virtualenvs/pandas-dev/bin/activate -For Python 3.4, you can download and install the `Windows 7.1 SDK `__. Read the references below as there may be various gotchas during the installation. - -For Python 3.5 and 3.6, you can download and install the `Visual Studio 2015 Community Edition `__. - -Here are some references and blogs: - -- https://blogs.msdn.microsoft.com/pythonengineering/2016/04/11/unable-to-find-vcvarsall-bat/ -- https://github.com/conda/conda-recipes/wiki/Building-from-Source-on-Windows-32-bit-and-64-bit -- https://cowboyprogrammer.org/building-python-wheels-for-windows/ -- https://blog.ionelmc.ro/2014/12/21/compiling-python-extensions-on-windows/ -- https://support.enthought.com/hc/en-us/articles/204469260-Building-Python-extensions-with-Canopy + # Install the build dependencies + python -m pip install -r ci/requirements_dev.txt + # Build and install pandas + python setup.py build_ext --inplace -j 4 + python -m pip install -e . -.. _contributing.getting_source: + # Install additional dependencies + python -m pip install -r ci/requirements-optional-pip.txt -Making changes --------------- +Creating a branch +----------------- -Before making your code changes, it is often necessary to build the code that was -just checked out. There are two primary methods of doing this. +You want your master branch to reflect only production-ready code, so create a +feature branch for making your changes. For example:: -#. The best way to develop *pandas* is to build the C extensions in-place by - running:: + git branch shiny-new-feature + git checkout shiny-new-feature - python setup.py build_ext --inplace +The above can be simplified to:: - If you startup the Python interpreter in the *pandas* source directory you - will call the built C extensions + git checkout -b shiny-new-feature -#. Another very common option is to do a ``develop`` install of *pandas*:: +This changes your working directory to the shiny-new-feature branch. Keep any +changes in this branch specific to one bug or feature so it is clear +what the branch brings to *pandas*. You can have many shiny-new-features +and switch in between them using the git checkout command. - python setup.py develop +To update this branch, you need to retrieve the changes from the master branch:: - This makes a symbolic link that tells the Python interpreter to import *pandas* - from your development directory. Thus, you can always be using the development - version on your system without being inside the clone directory. + git fetch upstream + git rebase upstream/master +This will replay your commits on top of the latest pandas git master. If this +leads to merge conflicts, you must resolve these before submitting your pull +request. If you have uncommitted changes, you will need to ``stash`` them prior +to updating. This will effectively store your changes and they can be reapplied +after updating. .. _contributing.documentation: Contributing to the documentation ================================= -If you're not the developer type, contributing to the documentation is still -of huge value. You don't even have to be an expert on -*pandas* to do so! Something as simple as rewriting small passages for clarity -as you reference the docs is a simple but effective way to contribute. The -next person to read that passage will be in your debt! - -In fact, there are sections of the docs that are worse off after being written -by experts. If something in the docs doesn't make sense to you, updating the -relevant section after you figure it out is a simple way to ensure it will -help the next person. +If you're not the developer type, contributing to the documentation is still of +huge value. You don't even have to be an expert on *pandas* to do so! In fact, +there are sections of the docs that are worse off after being written by +experts. If something in the docs doesn't make sense to you, updating the +relevant section after you figure it out is a great way to ensure it will help +the next person. .. contents:: Documentation: :local: @@ -342,30 +343,6 @@ Requirements First, you need to have a development environment to be able to build pandas (see the docs on :ref:`creating a development environment above `). -Further, to build the docs, there are some extra requirements: you will need to -have ``sphinx`` and ``ipython`` installed. `numpydoc -`_ is used to parse the docstrings that -follow the Numpy Docstring Standard (see above), but you don't need to install -this because a local copy of numpydoc is included in the *pandas* source -code. `nbsphinx `_ is required to build -the Jupyter notebooks included in the documentation. - -If you have a conda environment named ``pandas_dev``, you can install the extra -requirements with:: - - conda install -n pandas_dev sphinx ipython nbconvert nbformat - conda install -n pandas_dev -c conda-forge nbsphinx - -Furthermore, it is recommended to have all :ref:`optional dependencies `. -installed. This is not strictly necessary, but be aware that you will see some error -messages when building the docs. This happens because all the code in the documentation -is executed during the doc build, and so code examples using optional dependencies -will generate errors. Run ``pd.show_versions()`` to get an overview of the installed -version of all dependencies. - -.. warning:: - - You need to have ``sphinx`` version >= 1.3.2. Building the documentation ~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -386,10 +363,10 @@ If you want to do a full clean build, do:: python make.py clean python make.py html -Starting with *pandas* 0.13.1 you can tell ``make.py`` to compile only a single section -of the docs, greatly reducing the turn-around time for checking your changes. -You will be prompted to delete ``.rst`` files that aren't required. This is okay because -the prior versions of these files can be checked out from git. However, you must make sure +You can tell ``make.py`` to compile only a single section of the docs, greatly +reducing the turn-around time for checking your changes. You will be prompted to +delete ``.rst`` files that aren't required. This is okay because the prior +versions of these files can be checked out from git. However, you must make sure not to commit the file deletions to your Git repository! :: @@ -422,6 +399,8 @@ the documentation are also built by Travis-CI. These docs are then hosted `here `__, see also the :ref:`Continuous Integration ` section. +.. _contributing.code: + Contributing to the code base ============================= @@ -480,7 +459,7 @@ Once configured, you can run the tool as follows:: clang-format modified-c-file This will output what your file will look like if the changes are made, and to apply -them, just run the following command:: +them, run the following command:: clang-format -i modified-c-file @@ -1033,7 +1012,7 @@ delete your branch:: git checkout master git merge upstream/master -Then you can just do:: +Then you can do:: git branch -d shiny-new-feature @@ -1043,3 +1022,6 @@ branch has not actually been merged. The branch will still exist on GitHub, so to delete it there do:: git push origin --delete shiny-new-feature + + +.. _Gitter: https://gitter.im/pydata/pandas diff --git a/scripts/convert_deps.py b/scripts/convert_deps.py new file mode 100644 index 0000000000000..aabeb24a0c3c8 --- /dev/null +++ b/scripts/convert_deps.py @@ -0,0 +1,29 @@ +""" +Convert the conda environment.yaml to a pip requirements.txt +""" +import yaml + +exclude = {'python=3'} +rename = {'pytables': 'tables'} + +with open("ci/environment-dev.yaml") as f: + dev = yaml.load(f) + +with open("ci/requirements-optional-conda.txt") as f: + optional = [x.strip() for x in f.readlines()] + +required = dev['dependencies'] +required = [rename.get(dep, dep) for dep in required if dep not in exclude] +optional = [rename.get(dep, dep) for dep in optional if dep not in exclude] + + +with open("ci/requirements_dev.txt", 'wt') as f: + f.write("# This file was autogenerated by scripts/convert_deps.py\n") + f.write("# Do not modify directly\n") + f.write('\n'.join(required)) + + +with open("ci/requirements-optional-pip.txt", 'wt') as f: + f.write("# This file was autogenerated by scripts/convert_deps.py\n") + f.write("# Do not modify directly\n") + f.write("\n".join(optional))