diff --git a/.circleci/config.yml b/.circleci/config.yml index 6e789d0aafdb4..6b516b21722ac 100644 --- a/.circleci/config.yml +++ b/.circleci/config.yml @@ -26,7 +26,13 @@ jobs: name: build command: | ./ci/circle/install_circle.sh - ./ci/circle/show_circle.sh + export PATH="$MINICONDA_DIR/bin:$PATH" + source activate pandas-dev + python -c "import pandas; pandas.show_versions();" - run: name: test - command: ./ci/circle/run_circle.sh --skip-slow --skip-network + command: | + export PATH="$MINICONDA_DIR/bin:$PATH" + source activate pandas-dev + echo "pytest -m "not slow and not network" --strict --durations=10 --color=no --junitxml=$CIRCLE_TEST_REPORTS/reports/junit.xml pandas" + pytest -m "not slow and not network" --strict --durations=10 --color=no --junitxml=$CIRCLE_TEST_REPORTS/reports/junit.xml pandas diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index 95729f845ff5c..21df1a3aacd59 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -1,24 +1,23 @@ -Contributing to pandas -====================== +# Contributing to pandas Whether you are a novice or experienced software developer, all contributions and suggestions are welcome! -Our main contribution docs can be found [here](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst), but if you do not want to read it in its entirety, we will summarize the main ways in which you can contribute and point to relevant places in the docs for further information. +Our main contributing guide can be found [in this repo](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst) or [on the website](https://pandas-docs.github.io/pandas-docs-travis/contributing.html). If you do not want to read it in its entirety, we will summarize the main ways in which you can contribute and point to relevant sections of that document for further information. + +## Getting Started -Getting Started ---------------- If you are looking to contribute to the *pandas* codebase, the best place to start is the [GitHub "issues" tab](https://github.com/pandas-dev/pandas/issues). This is also a great place for filing bug reports and making suggestions for ways in which we can improve the code and documentation. -If you have additional questions, feel free to ask them on the [mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata) or on [Gitter](https://gitter.im/pydata/pandas). Further information can also be found in our [Getting Started](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#where-to-start) section of our main contribution doc. +If you have additional questions, feel free to ask them on the [mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata) or on [Gitter](https://gitter.im/pydata/pandas). Further information can also be found in the "[Where to start?](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#where-to-start)" section. + +## Filing Issues + +If you notice a bug in the code or documentation, or have suggestions for how we can improve either, feel free to create an issue on the [GitHub "issues" tab](https://github.com/pandas-dev/pandas/issues) using [GitHub's "issue" form](https://github.com/pandas-dev/pandas/issues/new). The form contains some questions that will help us best address your issue. 
For more information regarding how to file issues against *pandas*, please refer to the "[Bug reports and enhancement requests](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#bug-reports-and-enhancement-requests)" section. -Filing Issues -------------- -If you notice a bug in the code or in docs or have suggestions for how we can improve either, feel free to create an issue on the [GitHub "issues" tab](https://github.com/pandas-dev/pandas/issues) using [GitHub's "issue" form](https://github.com/pandas-dev/pandas/issues/new). The form contains some questions that will help us best address your issue. For more information regarding how to file issues against *pandas*, please refer to the [Bug reports and enhancement requests](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#bug-reports-and-enhancement-requests) section of our main contribution doc. +## Contributing to the Codebase -Contributing to the Codebase ----------------------------- -The code is hosted on [GitHub](https://www.github.com/pandas-dev/pandas), so you will need to use [Git](http://git-scm.com/) to clone the project and make changes to the codebase. Once you have obtained a copy of the code, you should create a development environment that is separate from your existing Python environment so that you can make and test changes without compromising your own work environment. For more information, please refer to our [Working with the code](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#working-with-the-code) section of our main contribution docs. +The code is hosted on [GitHub](https://www.github.com/pandas-dev/pandas), so you will need to use [Git](http://git-scm.com/) to clone the project and make changes to the codebase. Once you have obtained a copy of the code, you should create a development environment that is separate from your existing Python environment so that you can make and test changes without compromising your own work environment. For more information, please refer to the "[Working with the code](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#working-with-the-code)" section. -Before submitting your changes for review, make sure to check that your changes do not break any tests. You can find more information about our test suites can be found [here](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#test-driven-development-code-writing). We also have guidelines regarding coding style that will be enforced during testing. Details about coding style can be found [here](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#code-standards). +Before submitting your changes for review, make sure to check that your changes do not break any tests. You can find more information about our test suites in the "[Test-driven development/code writing](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#test-driven-development-code-writing)" section. We also have guidelines regarding coding style that will be enforced during testing, which can be found in the "[Code standards](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#code-standards)" section. -Once your changes are ready to be submitted, make sure to push your changes to GitHub before creating a pull request. 
Details about how to do that can be found in the [Contributing your changes to pandas](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#contributing-your-changes-to-pandas) section of our main contribution docs. We will review your changes, and you will most likely be asked to make additional changes before it is finally ready to merge. However, once it's ready, we will merge it, and you will have successfully contributed to the codebase! +Once your changes are ready to be submitted, make sure to push your changes to GitHub before creating a pull request. Details about how to do that can be found in the "[Contributing your changes to pandas](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#contributing-your-changes-to-pandas)" section. We will review your changes, and you will most likely be asked to make additional changes before it is finally ready to merge. However, once it's ready, we will merge it, and you will have successfully contributed to the codebase! diff --git a/.travis.yml b/.travis.yml index 6d31adcbf8a43..c30a7f3f58f55 100644 --- a/.travis.yml +++ b/.travis.yml @@ -34,28 +34,28 @@ matrix: include: - dist: trusty env: - - JOB="3.7" ENV_FILE="ci/deps/travis-37.yaml" TEST_ARGS="--skip-slow --skip-network" + - JOB="3.7" ENV_FILE="ci/deps/travis-37.yaml" PATTERN="not slow and not network" - dist: trusty env: - - JOB="2.7, locale, slow, old NumPy" ENV_FILE="ci/deps/travis-27-locale.yaml" LOCALE_OVERRIDE="zh_CN.UTF-8" SLOW=true + - JOB="2.7, locale, slow, old NumPy" ENV_FILE="ci/deps/travis-27-locale.yaml" LOCALE_OVERRIDE="zh_CN.UTF-8" PATTERN="slow" addons: apt: packages: - language-pack-zh-hans - dist: trusty env: - - JOB="2.7" ENV_FILE="ci/deps/travis-27.yaml" TEST_ARGS="--skip-slow" + - JOB="2.7" ENV_FILE="ci/deps/travis-27.yaml" PATTERN="not slow" addons: apt: packages: - python-gtk2 - dist: trusty env: - - JOB="3.6, lint, coverage" ENV_FILE="ci/deps/travis-36.yaml" TEST_ARGS="--skip-slow --skip-network" PANDAS_TESTING_MODE="deprecate" COVERAGE=true LINT=true + - JOB="3.6, coverage" ENV_FILE="ci/deps/travis-36.yaml" PATTERN="not slow and not network" PANDAS_TESTING_MODE="deprecate" COVERAGE=true - dist: trusty env: - - JOB="3.7, NumPy dev" ENV_FILE="ci/deps/travis-37-numpydev.yaml" TEST_ARGS="--skip-slow --skip-network -W error" PANDAS_TESTING_MODE="deprecate" + - JOB="3.7, NumPy dev" ENV_FILE="ci/deps/travis-37-numpydev.yaml" PATTERN="not slow and not network" TEST_ARGS="-W error" PANDAS_TESTING_MODE="deprecate" addons: apt: packages: @@ -64,7 +64,7 @@ matrix: # In allow_failures - dist: trusty env: - - JOB="3.6, slow" ENV_FILE="ci/deps/travis-36-slow.yaml" SLOW=true + - JOB="3.6, slow" ENV_FILE="ci/deps/travis-36-slow.yaml" PATTERN="slow" # In allow_failures - dist: trusty @@ -73,7 +73,7 @@ matrix: allow_failures: - dist: trusty env: - - JOB="3.6, slow" ENV_FILE="ci/deps/travis-36-slow.yaml" SLOW=true + - JOB="3.6, slow" ENV_FILE="ci/deps/travis-36-slow.yaml" PATTERN="slow" - dist: trusty env: - JOB="3.6, doc" ENV_FILE="ci/deps/travis-36-doc.yaml" DOC=true @@ -90,6 +90,12 @@ before_install: - uname -a - git --version - git tag + # Because travis runs on Google Cloud and has a /etc/boto.cfg, + # it breaks moto import, see: + # https://github.com/spulec/moto/issues/1771 + # https://github.com/boto/boto/issues/3741 + # This overrides travis and tells it to look nowhere. 
+ - export BOTO_CONFIG=/dev/null install: - echo "install start" @@ -105,21 +111,17 @@ before_script: script: - echo "script start" - - ci/run_build_docs.sh - - ci/script_single.sh - - ci/script_multi.sh - - ci/code_checks.sh - -after_success: - - ci/upload_coverage.sh + - source activate pandas-dev + - ci/build_docs.sh + - ci/run_tests.sh after_script: - echo "after_script start" - - source activate pandas && pushd /tmp && python -c "import pandas; pandas.show_versions();" && popd + - source activate pandas-dev && pushd /tmp && python -c "import pandas; pandas.show_versions();" && popd - if [ -e test-data-single.xml ]; then - ci/print_skipped.py test-data-single.xml; + ci/print_skipped.py test-data-single.xml; fi - if [ -e test-data-multiple.xml ]; then - ci/print_skipped.py test-data-multiple.xml; + ci/print_skipped.py test-data-multiple.xml; fi - echo "after_script done" diff --git a/LICENSES/MUSL_LICENSE b/LICENSES/MUSL_LICENSE new file mode 100644 index 0000000000000..a8833d4bc4744 --- /dev/null +++ b/LICENSES/MUSL_LICENSE @@ -0,0 +1,132 @@ +musl as a whole is licensed under the following standard MIT license: + +---------------------------------------------------------------------- +Copyright © 2005-2014 Rich Felker, et al. + +Permission is hereby granted, free of charge, to any person obtaining +a copy of this software and associated documentation files (the +"Software"), to deal in the Software without restriction, including +without limitation the rights to use, copy, modify, merge, publish, +distribute, sublicense, and/or sell copies of the Software, and to +permit persons to whom the Software is furnished to do so, subject to +the following conditions: + +The above copyright notice and this permission notice shall be +included in all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF +MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. +IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY +CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, +TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE +SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. +---------------------------------------------------------------------- + +Authors/contributors include: + +Anthony G. Basile +Arvid Picciani +Bobby Bingham +Boris Brezillon +Brent Cook +Chris Spiegel +Clément Vasseur +Emil Renner Berthing +Hiltjo Posthuma +Isaac Dunham +Jens Gustedt +Jeremy Huntwork +John Spencer +Justin Cormack +Luca Barbato +Luka Perkov +M Farkas-Dyck (Strake) +Michael Forney +Nicholas J. Kain +orc +Pascal Cuoq +Pierre Carrier +Rich Felker +Richard Pennington +sin +Solar Designer +Stefan Kristiansson +Szabolcs Nagy +Timo Teräs +Valentin Ochs +William Haddon + +Portions of this software are derived from third-party works licensed +under terms compatible with the above MIT license: + +The TRE regular expression implementation (src/regex/reg* and +src/regex/tre*) is Copyright © 2001-2008 Ville Laurikari and licensed +under a 2-clause BSD license (license text in the source files). The +included version has been heavily modified by Rich Felker in 2012, in +the interests of size, simplicity, and namespace cleanliness. + +Much of the math library code (src/math/* and src/complex/*) is +Copyright © 1993,2004 Sun Microsystems or +Copyright © 2003-2011 David Schultz or +Copyright © 2003-2009 Steven G. 
Kargl or +Copyright © 2003-2009 Bruce D. Evans or +Copyright © 2008 Stephen L. Moshier +and labelled as such in comments in the individual source files. All +have been licensed under extremely permissive terms. + +The ARM memcpy code (src/string/armel/memcpy.s) is Copyright © 2008 +The Android Open Source Project and is licensed under a two-clause BSD +license. It was taken from Bionic libc, used on Android. + +The implementation of DES for crypt (src/misc/crypt_des.c) is +Copyright © 1994 David Burren. It is licensed under a BSD license. + +The implementation of blowfish crypt (src/misc/crypt_blowfish.c) was +originally written by Solar Designer and placed into the public +domain. The code also comes with a fallback permissive license for use +in jurisdictions that may not recognize the public domain. + +The smoothsort implementation (src/stdlib/qsort.c) is Copyright © 2011 +Valentin Ochs and is licensed under an MIT-style license. + +The BSD PRNG implementation (src/prng/random.c) and XSI search API +(src/search/*.c) functions are Copyright © 2011 Szabolcs Nagy and +licensed under following terms: "Permission to use, copy, modify, +and/or distribute this code for any purpose with or without fee is +hereby granted. There is no warranty." + +The x86_64 port was written by Nicholas J. Kain. Several files (crt) +were released into the public domain; others are licensed under the +standard MIT license terms at the top of this file. See individual +files for their copyright status. + +The mips and microblaze ports were originally written by Richard +Pennington for use in the ellcc project. The original code was adapted +by Rich Felker for build system and code conventions during upstream +integration. It is licensed under the standard MIT terms. + +The powerpc port was also originally written by Richard Pennington, +and later supplemented and integrated by John Spencer. It is licensed +under the standard MIT terms. + +All other files which have no copyright comments are original works +produced specifically for use as part of this library, written either +by Rich Felker, the main author of the library, or by one or more +contibutors listed above. Details on authorship of individual files +can be found in the git version control history of the project. The +omission of copyright and license comments in each file is in the +interest of source tree size. + +All public header files (include/* and arch/*/bits/*) should be +treated as Public Domain as they intentionally contain no content +which can be covered by copyright. Some source modules may fall in +this category as well. If you believe that a file is so trivial that +it should be in the Public Domain, please contact the authors and +request an explicit statement releasing it from copyright. + +The following files are trivial, believed not to be copyrightable in +the first place, and hereby explicitly released to the Public Domain: + +All public headers: include/*, arch/*/bits/* +Startup files: crt/* diff --git a/README.md b/README.md index b4dedecb4c697..be45faf06187e 100644 --- a/README.md +++ b/README.md @@ -171,7 +171,7 @@ pip install pandas ``` ## Dependencies -- [NumPy](https://www.numpy.org): 1.9.0 or higher +- [NumPy](https://www.numpy.org): 1.12.0 or higher - [python-dateutil](https://labix.org/python-dateutil): 2.5.0 or higher - [pytz](https://pythonhosted.org/pytz): 2011k or higher @@ -231,9 +231,9 @@ Most development discussion is taking place on github in this repo. 
Further, the All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome. -A detailed overview on how to contribute can be found in the **[contributing guide.](https://pandas.pydata.org/pandas-docs/stable/contributing.html)** +A detailed overview on how to contribute can be found in the **[contributing guide](https://pandas-docs.github.io/pandas-docs-travis/contributing.html)**. There is also an [overview](.github/CONTRIBUTING.md) on GitHub. -If you are simply looking to start working with the pandas codebase, navigate to the [GitHub “issues” tab](https://github.com/pandas-dev/pandas/issues) and start looking through interesting issues. There are a number of issues listed under [Docs](https://github.com/pandas-dev/pandas/issues?labels=Docs&sort=updated&state=open) and [good first issue](https://github.com/pandas-dev/pandas/issues?labels=good+first+issue&sort=updated&state=open) where you could start out. +If you are simply looking to start working with the pandas codebase, navigate to the [GitHub "issues" tab](https://github.com/pandas-dev/pandas/issues) and start looking through interesting issues. There are a number of issues listed under [Docs](https://github.com/pandas-dev/pandas/issues?labels=Docs&sort=updated&state=open) and [good first issue](https://github.com/pandas-dev/pandas/issues?labels=good+first+issue&sort=updated&state=open) where you could start out. You can also triage issues which may include reproducing bug reports, or asking for vital information such as version numbers or reproduction instructions. If you would like to start triaging issues, one easy way to get started is to [subscribe to pandas on CodeTriage](https://www.codetriage.com/pandas-dev/pandas). 
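Before the benchmark changes below, a note on the test-selection switch in the CI files above: the custom `--skip-slow`/`--skip-network` flags handled by the old wrapper scripts are replaced by pytest's built-in marker selection, driven by the `PATTERN` variable (`pytest -m "not slow and not network" ...`). The following is a minimal sketch of how marker-based selection works; the test functions are hypothetical, and the `slow`/`network` markers are assumed to be registered in the project's pytest configuration, which this diff does not show.

```python
# Sketch only: illustrates the pytest marker selection the CI now relies on.
import pytest


@pytest.mark.slow
def test_expensive_path():
    # deselected by: pytest -m "not slow and not network"
    # selected by:   pytest -m "slow"
    assert sum(range(10 ** 6)) == 10 ** 6 * (10 ** 6 - 1) // 2


@pytest.mark.network
def test_needs_internet():
    # deselected by any PATTERN containing "not network"
    pytest.importorskip("requests")


def test_unmarked():
    # carries no marker, so "not slow and not network" still runs it
    assert 1 + 1 == 2
```

The `--strict` flag used alongside these commands should make pytest error out on markers that are not registered, so a typo in a marker name fails the run instead of silently deselecting nothing.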
diff --git a/asv_bench/benchmarks/algorithms.py b/asv_bench/benchmarks/algorithms.py index 1ab88dc9f9e6d..7dcd7b284d66d 100644 --- a/asv_bench/benchmarks/algorithms.py +++ b/asv_bench/benchmarks/algorithms.py @@ -1,10 +1,11 @@ -import warnings from importlib import import_module import numpy as np + import pandas as pd from pandas.util import testing as tm + for imp in ['pandas.util', 'pandas.tools.hashing']: try: hashing = import_module(imp) @@ -73,10 +74,6 @@ def setup(self): self.uniques = tm.makeStringIndex(1000).values self.all = self.uniques.repeat(10) - def time_match_string(self): - with warnings.catch_warnings(record=True): - pd.match(self.all, self.uniques) - class Hashing(object): diff --git a/asv_bench/benchmarks/binary_ops.py b/asv_bench/benchmarks/binary_ops.py index dfdebec86d67c..22b8ed80f3d07 100644 --- a/asv_bench/benchmarks/binary_ops.py +++ b/asv_bench/benchmarks/binary_ops.py @@ -52,6 +52,8 @@ def setup(self): np.iinfo(np.int16).max, size=(N, N))) + self.s = Series(np.random.randn(N)) + # Division def time_frame_float_div(self): @@ -74,6 +76,17 @@ def time_frame_int_mod(self): def time_frame_float_mod(self): self.df % self.df2 + # Dot product + + def time_frame_dot(self): + self.df.dot(self.df2) + + def time_series_dot(self): + self.s.dot(self.s) + + def time_frame_series_dot(self): + self.df.dot(self.s) + class Timeseries(object): diff --git a/asv_bench/benchmarks/categoricals.py b/asv_bench/benchmarks/categoricals.py index 8a0fbc48755b5..7318b40efc8fb 100644 --- a/asv_bench/benchmarks/categoricals.py +++ b/asv_bench/benchmarks/categoricals.py @@ -46,6 +46,8 @@ def setup(self): self.values_some_nan = list(np.tile(self.categories + [np.nan], N)) self.values_all_nan = [np.nan] * len(self.values) self.values_all_int8 = np.ones(N, 'int8') + self.categorical = pd.Categorical(self.values, self.categories) + self.series = pd.Series(self.categorical) def time_regular(self): pd.Categorical(self.values, self.categories) @@ -68,6 +70,12 @@ def time_all_nan(self): def time_from_codes_all_int8(self): pd.Categorical.from_codes(self.values_all_int8, self.categories) + def time_existing_categorical(self): + pd.Categorical(self.categorical) + + def time_existing_series(self): + pd.Categorical(self.series) + class ValueCounts(object): diff --git a/asv_bench/benchmarks/frame_ctor.py b/asv_bench/benchmarks/frame_ctor.py index 60f6a66e07a7b..dfb6ab5b189b2 100644 --- a/asv_bench/benchmarks/frame_ctor.py +++ b/asv_bench/benchmarks/frame_ctor.py @@ -91,4 +91,17 @@ def time_frame_from_ndarray(self): self.df = DataFrame(self.data) +class FromLists(object): + + goal_time = 0.2 + + def setup(self): + N = 1000 + M = 100 + self.data = [[j for j in range(M)] for i in range(N)] + + def time_frame_from_lists(self): + self.df = DataFrame(self.data) + + from .pandas_vb_common import setup # noqa: F401 diff --git a/asv_bench/benchmarks/frame_methods.py b/asv_bench/benchmarks/frame_methods.py index b60b45cc29f7d..3c0dd646aa502 100644 --- a/asv_bench/benchmarks/frame_methods.py +++ b/asv_bench/benchmarks/frame_methods.py @@ -1,10 +1,10 @@ import string -import warnings import numpy as np + +from pandas import ( + DataFrame, MultiIndex, NaT, Series, date_range, isnull, period_range) import pandas.util.testing as tm -from pandas import (DataFrame, Series, MultiIndex, date_range, period_range, - isnull, NaT) class GetNumericData(object): @@ -13,8 +13,7 @@ def setup(self): self.df = DataFrame(np.random.randn(10000, 25)) self.df['foo'] = 'bar' self.df['bar'] = 'baz' - with warnings.catch_warnings(record=True): 
- self.df = self.df.consolidate() + self.df = self.df._consolidate() def time_frame_get_numeric_data(self): self.df._get_numeric_data() @@ -62,13 +61,40 @@ def time_reindex_axis1(self): def time_reindex_both_axes(self): self.df.reindex(index=self.idx, columns=self.idx) - def time_reindex_both_axes_ix(self): - self.df.ix[self.idx, self.idx] - def time_reindex_upcast(self): self.df2.reindex(np.random.permutation(range(1200))) +class Rename(object): + + def setup(self): + N = 10**3 + self.df = DataFrame(np.random.randn(N * 10, N)) + self.idx = np.arange(4 * N, 7 * N) + self.dict_idx = {k: k for k in self.idx} + self.df2 = DataFrame( + {c: {0: np.random.randint(0, 2, N).astype(np.bool_), + 1: np.random.randint(0, N, N).astype(np.int16), + 2: np.random.randint(0, N, N).astype(np.int32), + 3: np.random.randint(0, N, N).astype(np.int64)} + [np.random.randint(0, 4)] for c in range(N)}) + + def time_rename_single(self): + self.df.rename({0: 0}) + + def time_rename_axis0(self): + self.df.rename(self.dict_idx) + + def time_rename_axis1(self): + self.df.rename(columns=self.dict_idx) + + def time_rename_both_axes(self): + self.df.rename(index=self.dict_idx, columns=self.dict_idx) + + def time_dict_rename_both_axes(self): + self.df.rename(index=self.dict_idx, columns=self.dict_idx) + + class Iteration(object): def setup(self): diff --git a/asv_bench/benchmarks/groupby.py b/asv_bench/benchmarks/groupby.py index dbd79185ec006..59e43ee22afde 100644 --- a/asv_bench/benchmarks/groupby.py +++ b/asv_bench/benchmarks/groupby.py @@ -1,11 +1,13 @@ -import warnings -from string import ascii_letters -from itertools import product from functools import partial +from itertools import product +from string import ascii_letters +import warnings import numpy as np -from pandas import (DataFrame, Series, MultiIndex, date_range, period_range, - TimeGrouper, Categorical, Timestamp) + +from pandas import ( + Categorical, DataFrame, MultiIndex, Series, TimeGrouper, Timestamp, + date_range, period_range) import pandas.util.testing as tm @@ -210,7 +212,7 @@ def time_multi_int_nunique(self, df): class AggFunctions(object): - def setup_cache(): + def setup_cache(self): N = 10**5 fac1 = np.array(['A', 'B', 'C'], dtype='O') fac2 = np.array(['one', 'two'], dtype='O') @@ -471,8 +473,8 @@ def setup(self): n1 = 400 n2 = 250 index = MultiIndex(levels=[np.arange(n1), tm.makeStringIndex(n2)], - labels=[np.repeat(range(n1), n2).tolist(), - list(range(n2)) * n1], + codes=[np.repeat(range(n1), n2).tolist(), + list(range(n2)) * n1], names=['lev1', 'lev2']) arr = np.random.randn(n1 * n2, 3) arr[::10000, 0] = np.nan diff --git a/asv_bench/benchmarks/join_merge.py b/asv_bench/benchmarks/join_merge.py index 5b28d8a4eec62..a1cdb00260fc4 100644 --- a/asv_bench/benchmarks/join_merge.py +++ b/asv_bench/benchmarks/join_merge.py @@ -23,11 +23,7 @@ def setup(self): self.mdf1['obj1'] = 'bar' self.mdf1['obj2'] = 'bar' self.mdf1['int1'] = 5 - try: - with warnings.catch_warnings(record=True): - self.mdf1.consolidate(inplace=True) - except (AttributeError, TypeError): - pass + self.mdf1 = self.mdf1._consolidate() self.mdf2 = self.mdf1.copy() self.mdf2.index = self.df2.index @@ -54,7 +50,7 @@ def setup(self, axis): self.empty_right = [df, DataFrame()] def time_concat_series(self, axis): - concat(self.series, axis=axis) + concat(self.series, axis=axis, sort=False) def time_concat_small_frames(self, axis): concat(self.small_frames, axis=axis) @@ -119,16 +115,16 @@ class Join(object): def setup(self, sort): level1 = tm.makeStringIndex(10).values level2 = 
tm.makeStringIndex(1000).values - label1 = np.arange(10).repeat(1000) - label2 = np.tile(np.arange(1000), 10) + codes1 = np.arange(10).repeat(1000) + codes2 = np.tile(np.arange(1000), 10) index2 = MultiIndex(levels=[level1, level2], - labels=[label1, label2]) + codes=[codes1, codes2]) self.df_multi = DataFrame(np.random.randn(len(index2), 4), index=index2, columns=['A', 'B', 'C', 'D']) - self.key1 = np.tile(level1.take(label1), 10) - self.key2 = np.tile(level2.take(label2), 10) + self.key1 = np.tile(level1.take(codes1), 10) + self.key2 = np.tile(level2.take(codes2), 10) self.df = DataFrame({'data1': np.random.randn(100000), 'data2': np.random.randn(100000), 'key1': self.key1, diff --git a/asv_bench/benchmarks/multiindex_object.py b/asv_bench/benchmarks/multiindex_object.py index ff202322dbe84..adc6730dcd946 100644 --- a/asv_bench/benchmarks/multiindex_object.py +++ b/asv_bench/benchmarks/multiindex_object.py @@ -79,8 +79,8 @@ def setup(self): levels = [np.arange(n), tm.makeStringIndex(n).values, 1000 + np.arange(n)] - labels = [np.random.choice(n, (k * n)) for lev in levels] - self.mi = MultiIndex(levels=levels, labels=labels) + codes = [np.random.choice(n, (k * n)) for lev in levels] + self.mi = MultiIndex(levels=levels, codes=codes) def time_duplicated(self): self.mi.duplicated() diff --git a/asv_bench/benchmarks/panel_ctor.py b/asv_bench/benchmarks/panel_ctor.py index 47b3ad612f9b1..627705284481b 100644 --- a/asv_bench/benchmarks/panel_ctor.py +++ b/asv_bench/benchmarks/panel_ctor.py @@ -1,7 +1,7 @@ import warnings from datetime import datetime, timedelta -from pandas import DataFrame, Panel, DatetimeIndex, date_range +from pandas import DataFrame, Panel, date_range class DifferentIndexes(object): @@ -23,9 +23,9 @@ def time_from_dict(self): class SameIndexes(object): def setup(self): - idx = DatetimeIndex(start=datetime(1990, 1, 1), - end=datetime(2012, 1, 1), - freq='D') + idx = date_range(start=datetime(1990, 1, 1), + end=datetime(2012, 1, 1), + freq='D') df = DataFrame({'a': 0, 'b': 1, 'c': 2}, index=idx) self.data_frames = dict(enumerate([df] * 100)) @@ -40,10 +40,10 @@ def setup(self): start = datetime(1990, 1, 1) end = datetime(2012, 1, 1) df1 = DataFrame({'a': 0, 'b': 1, 'c': 2}, - index=DatetimeIndex(start=start, end=end, freq='D')) + index=date_range(start=start, end=end, freq='D')) end += timedelta(days=1) df2 = DataFrame({'a': 0, 'b': 1, 'c': 2}, - index=DatetimeIndex(start=start, end=end, freq='D')) + index=date_range(start=start, end=end, freq='D')) dfs = [df1] * 50 + [df2] * 50 self.data_frames = dict(enumerate(dfs)) diff --git a/asv_bench/benchmarks/period.py b/asv_bench/benchmarks/period.py index fc34a47fee3e1..1af1ba1fb7b0b 100644 --- a/asv_bench/benchmarks/period.py +++ b/asv_bench/benchmarks/period.py @@ -43,6 +43,7 @@ class PeriodIndexConstructor(object): def setup(self, freq): self.rng = date_range('1985', periods=1000) self.rng2 = date_range('1985', periods=1000).to_pydatetime() + self.ints = list(range(2000, 3000)) def time_from_date_range(self, freq): PeriodIndex(self.rng, freq=freq) @@ -50,6 +51,9 @@ def time_from_date_range(self, freq): def time_from_pydatetime(self, freq): PeriodIndex(self.rng2, freq=freq) + def time_from_ints(self, freq): + PeriodIndex(self.ints, freq=freq) + class DataFramePeriodColumn(object): diff --git a/asv_bench/benchmarks/plotting.py b/asv_bench/benchmarks/plotting.py index 1373d5f0b4258..4f0bbb1690d4b 100644 --- a/asv_bench/benchmarks/plotting.py +++ b/asv_bench/benchmarks/plotting.py @@ -8,17 +8,48 @@ matplotlib.use('Agg') -class 
Plotting(object): - - def setup(self): - self.s = Series(np.random.randn(1000000)) - self.df = DataFrame({'col': self.s}) - - def time_series_plot(self): - self.s.plot() - - def time_frame_plot(self): - self.df.plot() +class SeriesPlotting(object): + params = [['line', 'bar', 'area', 'barh', 'hist', 'kde', 'pie']] + param_names = ['kind'] + + def setup(self, kind): + if kind in ['bar', 'barh', 'pie']: + n = 100 + elif kind in ['kde']: + n = 10000 + else: + n = 1000000 + + self.s = Series(np.random.randn(n)) + if kind in ['area', 'pie']: + self.s = self.s.abs() + + def time_series_plot(self, kind): + self.s.plot(kind=kind) + + +class FramePlotting(object): + params = [['line', 'bar', 'area', 'barh', 'hist', 'kde', 'pie', 'scatter', + 'hexbin']] + param_names = ['kind'] + + def setup(self, kind): + if kind in ['bar', 'barh', 'pie']: + n = 100 + elif kind in ['kde', 'scatter', 'hexbin']: + n = 10000 + else: + n = 1000000 + + self.x = Series(np.random.randn(n)) + self.y = Series(np.random.randn(n)) + if kind in ['area', 'pie']: + self.x = self.x.abs() + self.y = self.y.abs() + self.df = DataFrame({'x': self.x, 'y': self.y}) + + def time_frame_plot(self, kind): + self.df.plot(x='x', y='y', kind=kind) class TimeseriesPlotting(object): diff --git a/asv_bench/benchmarks/reindex.py b/asv_bench/benchmarks/reindex.py index 82c61a98e2c34..fb47fa81d8dfd 100644 --- a/asv_bench/benchmarks/reindex.py +++ b/asv_bench/benchmarks/reindex.py @@ -1,6 +1,6 @@ import numpy as np import pandas.util.testing as tm -from pandas import (DataFrame, Series, DatetimeIndex, MultiIndex, Index, +from pandas import (DataFrame, Series, MultiIndex, Index, date_range) from .pandas_vb_common import lib @@ -8,7 +8,7 @@ class Reindex(object): def setup(self): - rng = DatetimeIndex(start='1/1/1970', periods=10000, freq='1min') + rng = date_range(start='1/1/1970', periods=10000, freq='1min') self.df = DataFrame(np.random.rand(10000, 10), index=rng, columns=range(10)) self.df['foo'] = 'bar' @@ -71,9 +71,9 @@ class LevelAlign(object): def setup(self): self.index = MultiIndex( levels=[np.arange(10), np.arange(100), np.arange(100)], - labels=[np.arange(10).repeat(10000), - np.tile(np.arange(100).repeat(100), 10), - np.tile(np.tile(np.arange(100), 100), 10)]) + codes=[np.arange(10).repeat(10000), + np.tile(np.arange(100).repeat(100), 10), + np.tile(np.tile(np.arange(100), 100), 10)]) self.df = DataFrame(np.random.randn(len(self.index), 4), index=self.index) self.df_level = DataFrame(np.random.randn(100, 4), diff --git a/asv_bench/benchmarks/reshape.py b/asv_bench/benchmarks/reshape.py index 67fdfb82e72c0..e5c2f54263a3c 100644 --- a/asv_bench/benchmarks/reshape.py +++ b/asv_bench/benchmarks/reshape.py @@ -146,4 +146,42 @@ def time_get_dummies_1d_sparse(self): pd.get_dummies(self.s, sparse=True) +class Cut(object): + params = [[4, 10, 1000]] + param_names = ['bins'] + + def setup(self, bins): + N = 10**5 + self.int_series = pd.Series(np.arange(N).repeat(5)) + self.float_series = pd.Series(np.random.randn(N).repeat(5)) + self.timedelta_series = pd.Series(np.random.randint(N, size=N), + dtype='timedelta64[ns]') + self.datetime_series = pd.Series(np.random.randint(N, size=N), + dtype='datetime64[ns]') + + def time_cut_int(self, bins): + pd.cut(self.int_series, bins) + + def time_cut_float(self, bins): + pd.cut(self.float_series, bins) + + def time_cut_timedelta(self, bins): + pd.cut(self.timedelta_series, bins) + + def time_cut_datetime(self, bins): + pd.cut(self.datetime_series, bins) + + def time_qcut_int(self, bins): + 
pd.qcut(self.int_series, bins) + + def time_qcut_float(self, bins): + pd.qcut(self.float_series, bins) + + def time_qcut_timedelta(self, bins): + pd.qcut(self.timedelta_series, bins) + + def time_qcut_datetime(self, bins): + pd.qcut(self.datetime_series, bins) + + from .pandas_vb_common import setup # noqa: F401 diff --git a/asv_bench/benchmarks/rolling.py b/asv_bench/benchmarks/rolling.py index 86294e33e1e06..659b6591fbd4b 100644 --- a/asv_bench/benchmarks/rolling.py +++ b/asv_bench/benchmarks/rolling.py @@ -21,6 +21,42 @@ def time_rolling(self, constructor, window, dtype, method): getattr(self.roll, method)() +class ExpandingMethods(object): + + sample_time = 0.2 + params = (['DataFrame', 'Series'], + ['int', 'float'], + ['median', 'mean', 'max', 'min', 'std', 'count', 'skew', 'kurt', + 'sum']) + param_names = ['constructor', 'dtype', 'method'] + + def setup(self, constructor, dtype, method): + N = 10**5 + arr = (100 * np.random.random(N)).astype(dtype) + self.expanding = getattr(pd, constructor)(arr).expanding() + + def time_expanding(self, constructor, dtype, method): + getattr(self.expanding, method)() + + +class EWMMethods(object): + + sample_time = 0.2 + params = (['DataFrame', 'Series'], + [10, 1000], + ['int', 'float'], + ['mean', 'std']) + param_names = ['constructor', 'window', 'dtype', 'method'] + + def setup(self, constructor, window, dtype, method): + N = 10**5 + arr = (100 * np.random.random(N)).astype(dtype) + self.ewm = getattr(pd, constructor)(arr).ewm(halflife=window) + + def time_ewm(self, constructor, window, dtype, method): + getattr(self.ewm, method)() + + class VariableWindowMethods(Methods): sample_time = 0.2 params = (['DataFrame', 'Series'], diff --git a/asv_bench/benchmarks/stat_ops.py b/asv_bench/benchmarks/stat_ops.py index 5c777c00261e1..500e4d74d4c4f 100644 --- a/asv_bench/benchmarks/stat_ops.py +++ b/asv_bench/benchmarks/stat_ops.py @@ -31,10 +31,10 @@ class FrameMultiIndexOps(object): def setup(self, level, op): levels = [np.arange(10), np.arange(100), np.arange(100)] - labels = [np.arange(10).repeat(10000), - np.tile(np.arange(100).repeat(100), 10), - np.tile(np.tile(np.arange(100), 100), 10)] - index = pd.MultiIndex(levels=levels, labels=labels) + codes = [np.arange(10).repeat(10000), + np.tile(np.arange(100).repeat(100), 10), + np.tile(np.tile(np.arange(100), 100), 10)] + index = pd.MultiIndex(levels=levels, codes=codes) df = pd.DataFrame(np.random.randn(len(index), 4), index=index) self.df_func = getattr(df, op) @@ -67,10 +67,10 @@ class SeriesMultiIndexOps(object): def setup(self, level, op): levels = [np.arange(10), np.arange(100), np.arange(100)] - labels = [np.arange(10).repeat(10000), - np.tile(np.arange(100).repeat(100), 10), - np.tile(np.tile(np.arange(100), 100), 10)] - index = pd.MultiIndex(levels=levels, labels=labels) + codes = [np.arange(10).repeat(10000), + np.tile(np.arange(100).repeat(100), 10), + np.tile(np.tile(np.arange(100), 100), 10)] + index = pd.MultiIndex(levels=levels, codes=codes) s = pd.Series(np.random.randn(len(index)), index=index) self.s_func = getattr(s, op) @@ -96,14 +96,42 @@ def time_average_old(self, constructor, pct): class Correlation(object): - params = ['spearman', 'kendall', 'pearson'] - param_names = ['method'] + params = [['spearman', 'kendall', 'pearson'], [True, False]] + param_names = ['method', 'use_bottleneck'] - def setup(self, method): + def setup(self, method, use_bottleneck): + try: + pd.options.compute.use_bottleneck = use_bottleneck + except TypeError: + from pandas.core import nanops +
nanops._USE_BOTTLENECK = use_bottleneck self.df = pd.DataFrame(np.random.randn(1000, 30)) + self.s = pd.Series(np.random.randn(1000)) + self.s2 = pd.Series(np.random.randn(1000)) - def time_corr(self, method): + def time_corr(self, method, use_bottleneck): self.df.corr(method=method) + def time_corr_series(self, method, use_bottleneck): + self.s.corr(self.s2, method=method) + + +class Covariance(object): + + params = [[True, False]] + param_names = ['use_bottleneck'] + + def setup(self, use_bottleneck): + try: + pd.options.compute.use_bottleneck = use_bottleneck + except TypeError: + from pandas.core import nanops + nanops._USE_BOTTLENECK = use_bottleneck + self.s = pd.Series(np.random.randn(100000)) + self.s2 = pd.Series(np.random.randn(100000)) + + def time_cov_series(self, use_bottleneck): + self.s.cov(self.s2) + from .pandas_vb_common import setup # noqa: F401 diff --git a/asv_bench/benchmarks/strings.py b/asv_bench/benchmarks/strings.py index d880fb258560d..e9f2727f64e15 100644 --- a/asv_bench/benchmarks/strings.py +++ b/asv_bench/benchmarks/strings.py @@ -26,21 +26,42 @@ def time_extract(self): def time_findall(self): self.s.str.findall('[A-Z]+') + def time_find(self): + self.s.str.find('[A-Z]+') + + def time_rfind(self): + self.s.str.rfind('[A-Z]+') + def time_get(self): self.s.str.get(0) def time_len(self): self.s.str.len() + def time_join(self): + self.s.str.join(' ') + def time_match(self): self.s.str.match('A') + def time_normalize(self): + self.s.str.normalize('NFC') + def time_pad(self): self.s.str.pad(100, side='both') + def time_partition(self): + self.s.str.partition('A') + + def time_rpartition(self): + self.s.str.rpartition('A') + def time_replace(self): self.s.str.replace('A', '\x01\x01') + def time_translate(self): + self.s.str.translate({'A': '\x01\x01'}) + def time_slice(self): self.s.str.slice(5, 15, 2) @@ -65,6 +86,12 @@ def time_upper(self): def time_lower(self): self.s.str.lower() + def time_wrap(self): + self.s.str.wrap(10) + + def time_zfill(self): + self.s.str.zfill(10) + class Repeat(object): @@ -129,6 +156,9 @@ def setup(self, expand): def time_split(self, expand): self.s.str.split('--', expand=expand) + def time_rsplit(self, expand): + self.s.str.rsplit('--', expand=expand) + class Dummies(object): diff --git a/asv_bench/benchmarks/timedelta.py b/asv_bench/benchmarks/timedelta.py index 01d53fb9cbbd9..0cfbbd536bc8b 100644 --- a/asv_bench/benchmarks/timedelta.py +++ b/asv_bench/benchmarks/timedelta.py @@ -1,7 +1,9 @@ import datetime import numpy as np -from pandas import Series, timedelta_range, to_timedelta, Timestamp, Timedelta + +from pandas import ( + DataFrame, Series, Timedelta, Timestamp, timedelta_range, to_timedelta) class TimedeltaConstructor(object): @@ -116,3 +118,36 @@ def time_timedelta_microseconds(self, series): def time_timedelta_nanoseconds(self, series): series.dt.nanoseconds + + +class TimedeltaIndexing(object): + + def setup(self): + self.index = timedelta_range(start='1985', periods=1000, freq='D') + self.index2 = timedelta_range(start='1986', periods=1000, freq='D') + self.series = Series(range(1000), index=self.index) + self.timedelta = self.index[500] + + def time_get_loc(self): + self.index.get_loc(self.timedelta) + + def time_shape(self): + self.index.shape + + def time_shallow_copy(self): + self.index._shallow_copy() + + def time_series_loc(self): + self.series.loc[self.timedelta] + + def time_align(self): + DataFrame({'a': self.series, 'b': self.series[:500]}) + + def time_intersection(self): + self.index.intersection(self.index2) 
+ + def time_union(self): + self.index.union(self.index2) + + def time_unique(self): + self.index.unique() diff --git a/asv_bench/benchmarks/timestamp.py b/asv_bench/benchmarks/timestamp.py index 64f46fe378e53..4c1d6e8533408 100644 --- a/asv_bench/benchmarks/timestamp.py +++ b/asv_bench/benchmarks/timestamp.py @@ -1,8 +1,9 @@ import datetime -from pandas import Timestamp -import pytz import dateutil +import pytz + +from pandas import Timestamp class TimestampConstruction(object): @@ -46,7 +47,7 @@ def time_dayofweek(self, tz, freq): self.ts.dayofweek def time_weekday_name(self, tz, freq): - self.ts.weekday_name + self.ts.day_name() def time_dayofyear(self, tz, freq): self.ts.dayofyear diff --git a/azure-pipelines.yml b/azure-pipelines.yml index 373c22fdf8e62..1e4e43ac03815 100644 --- a/azure-pipelines.yml +++ b/azure-pipelines.yml @@ -1,25 +1,118 @@ # Adapted from https://github.com/numba/numba/blob/master/azure-pipelines.yml jobs: -# Mac and Linux could potentially use the same template -# except it isn't clear how to use a different build matrix -# for each, so for now they are separate -- template: ci/azure/macos.yml +# Mac and Linux use the same template +- template: ci/azure/posix.yml parameters: name: macOS vmImage: xcode9-macos10.13 -- template: ci/azure/linux.yml +- template: ci/azure/posix.yml parameters: name: Linux vmImage: ubuntu-16.04 -# Windows Python 2.7 needs VC 9.0 installed, and not sure -# how to make that a conditional task, so for now these are -# separate templates as well +# Windows Python 2.7 needs VC 9.0 installed, handled in the template - template: ci/azure/windows.yml parameters: name: Windows vmImage: vs2017-win2016 -- template: ci/azure/windows-py27.yml - parameters: - name: WindowsPy27 - vmImage: vs2017-win2016 + +- job: 'Checks_and_doc' + pool: + vmImage: ubuntu-16.04 + timeoutInMinutes: 90 + steps: + - script: | + # XXX next command should avoid redefining the path in every step, but + # made the process crash as it couldn't find deactivate + #echo '##vso[task.prependpath]$HOME/miniconda3/bin' + echo '##vso[task.setvariable variable=CONDA_ENV]pandas-dev' + echo '##vso[task.setvariable variable=ENV_FILE]environment.yml' + echo '##vso[task.setvariable variable=AZURE]true' + displayName: 'Setting environment variables' + + # Do not require a conda environment + - script: | + export PATH=$HOME/miniconda3/bin:$PATH + ci/code_checks.sh patterns + displayName: 'Looking for unwanted patterns' + condition: true + + - script: | + export PATH=$HOME/miniconda3/bin:$PATH + sudo apt-get install -y libc6-dev-i386 + ci/incremental/install_miniconda.sh + ci/incremental/setup_conda_environment.sh + displayName: 'Set up environment' + + # Do not require pandas + - script: | + export PATH=$HOME/miniconda3/bin:$PATH + source activate pandas-dev + ci/code_checks.sh lint + displayName: 'Linting' + condition: true + + - script: | + export PATH=$HOME/miniconda3/bin:$PATH + source activate pandas-dev + ci/code_checks.sh dependencies + displayName: 'Dependencies consistency' + condition: true + + - script: | + export PATH=$HOME/miniconda3/bin:$PATH + source activate pandas-dev + ci/incremental/build.sh + displayName: 'Build' + condition: true + + # Require pandas + - script: | + export PATH=$HOME/miniconda3/bin:$PATH + source activate pandas-dev + ci/code_checks.sh code + displayName: 'Checks on imported code' + condition: true + + - script: | + export PATH=$HOME/miniconda3/bin:$PATH + source activate pandas-dev + ci/code_checks.sh doctests + displayName: 'Running doctests' +
condition: true + + - script: | + export PATH=$HOME/miniconda3/bin:$PATH + source activate pandas-dev + ci/code_checks.sh docstrings + displayName: 'Docstring validation' + condition: true + + - script: | + export PATH=$HOME/miniconda3/bin:$PATH + source activate pandas-dev + pytest --capture=no --strict scripts + displayName: 'Testing docstring validation script' + condition: true + + - script: | + export PATH=$HOME/miniconda3/bin:$PATH + source activate pandas-dev + git remote add upstream https://github.com/pandas-dev/pandas.git + git fetch upstream + if git diff upstream/master --name-only | grep -q "^asv_bench/"; then + cd asv_bench + asv machine --yes + ASV_OUTPUT="$(asv dev)" + if [[ $(echo "$ASV_OUTPUT" | grep "failed") ]]; then + echo "##vso[task.logissue type=error]Benchmarks run with errors" + echo "$ASV_OUTPUT" + exit 1 + else + echo "Benchmarks run without errors" + fi + else + echo "Benchmarks did not run, no changes detected" + fi + displayName: 'Running benchmarks' + condition: true diff --git a/ci/README.txt b/ci/README.txt deleted file mode 100644 index bb71dc25d6093..0000000000000 --- a/ci/README.txt +++ /dev/null @@ -1,17 +0,0 @@ -Travis is a ci service that's well-integrated with GitHub. -The following types of breakage should be detected -by Travis builds: - -1) Failing tests on any supported version of Python. -2) Pandas should install and the tests should run if no optional deps are installed. -That also means tests which rely on optional deps need to raise SkipTest() -if the dep is missing. -3) unicode related fails when running under exotic locales. - -We tried running the vbench suite for a while, but with varying load -on Travis machines, that wasn't useful. - -Travis currently (4/2013) has a 5-job concurrency limit. Exceeding it -basically doubles the total runtime for a commit through travis, and -since dep+pandas installation is already quite long, this should become -a hard limit on concurrent travis runs. 
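Before the Azure template changes below, a note for readers unfamiliar with the asv_bench files touched earlier in the patch: airspeed velocity (asv) imports each benchmark class, calls `setup()` once per parameter combination, and then times every method whose name starts with `time_`. The sketch below is illustrative and not part of this patch; it only restates the convention the new `ExpandingMethods`, `EWMMethods`, `Cut` and plotting benchmarks follow. Note that `param_names` must pair one-to-one with the lists in `params`, and the timed methods receive the same arguments as `setup()`.

```python
# Illustrative asv benchmark class (not from the diff), showing the
# params/param_names convention used throughout asv_bench in this patch.
import numpy as np

import pandas as pd


class ExampleConstructorMethods(object):

    # asv benchmarks the cross product: 2 constructors x 2 dtypes
    params = [['DataFrame', 'Series'], ['int', 'float']]
    param_names = ['constructor', 'dtype']  # one name per list in params

    def setup(self, constructor, dtype):
        # runs before timing, once per (constructor, dtype) combination
        arr = (100 * np.random.random(10 ** 5)).astype(dtype)
        self.obj = getattr(pd, constructor)(arr)

    def time_sum(self, constructor, dtype):
        # asv reports the runtime of this body for every combination
        self.obj.sum()
```

The Azure "Running benchmarks" step above uses `asv dev`, which is meant as a quick correctness pass rather than a source of stable timings, so it catches broken benchmarks without the cost of a full benchmarking run; the step then greps the output for "failed".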
diff --git a/ci/azure/macos.yml b/ci/azure/macos.yml deleted file mode 100644 index 16f2fa2d4890f..0000000000000 --- a/ci/azure/macos.yml +++ /dev/null @@ -1,68 +0,0 @@ -parameters: - name: '' - vmImage: '' - -jobs: -- job: ${{ parameters.name }} - pool: - vmImage: ${{ parameters.vmImage }} - strategy: - maxParallel: 11 - matrix: - py35_np_120: - ENV_FILE: ci/deps/azure-macos-35.yaml - CONDA_PY: "35" - CONDA_ENV: pandas - TEST_ARGS: "--skip-slow --skip-network" - - steps: - - script: | - if [ "$(uname)" == "Linux" ]; then sudo apt-get install -y libc6-dev-i386; fi - echo "Installing Miniconda" - ci/incremental/install_miniconda.sh - export PATH=$HOME/miniconda3/bin:$PATH - echo "Setting up Conda environment" - ci/incremental/setup_conda_environment.sh - displayName: 'Before Install' - - script: | - export PATH=$HOME/miniconda3/bin:$PATH - ci/incremental/build.sh - displayName: 'Build' - - script: | - export PATH=$HOME/miniconda3/bin:$PATH - ci/script_single.sh - ci/script_multi.sh - echo "[Test done]" - displayName: 'Test' - - script: | - export PATH=$HOME/miniconda3/bin:$PATH - source activate pandas && pushd /tmp && python -c "import pandas; pandas.show_versions();" && popd - - task: PublishTestResults@2 - inputs: - testResultsFiles: 'test-data-*.xml' - testRunTitle: 'MacOS-35' - - powershell: | - $junitXml = "test-data-single.xml" - $(Get-Content $junitXml | Out-String) -match 'failures="(.*?)"' - if ($matches[1] -eq 0) - { - Write-Host "No test failures in test-data-single" - } - else - { - # note that this will produce $LASTEXITCODE=1 - Write-Error "$($matches[1]) tests failed" - } - - $junitXmlMulti = "test-data-multiple.xml" - $(Get-Content $junitXmlMulti | Out-String) -match 'failures="(.*?)"' - if ($matches[1] -eq 0) - { - Write-Host "No test failures in test-data-multi" - } - else - { - # note that this will produce $LASTEXITCODE=1 - Write-Error "$($matches[1]) tests failed" - } - displayName: Check for test failures \ No newline at end of file diff --git a/ci/azure/linux.yml b/ci/azure/posix.yml similarity index 60% rename from ci/azure/linux.yml rename to ci/azure/posix.yml index b5a8e36d5097d..374a82a5ed7d0 100644 --- a/ci/azure/linux.yml +++ b/ci/azure/posix.yml @@ -7,31 +7,35 @@ jobs: pool: vmImage: ${{ parameters.vmImage }} strategy: - maxParallel: 11 matrix: - py27_np_19: - ENV_FILE: ci/deps/azure-27-compat.yaml - CONDA_PY: "27" - CONDA_ENV: pandas - TEST_ARGS: "--skip-slow --skip-network" + ${{ if eq(parameters.name, 'macOS') }}: + py35_np_120: + ENV_FILE: ci/deps/azure-macos-35.yaml + CONDA_PY: "35" + PATTERN: "not slow and not network" - py36_locale: - ENV_FILE: ci/deps/azure-37-locale.yaml - CONDA_PY: "37" - CONDA_ENV: pandas - TEST_ARGS: "--skip-slow --skip-network" - LOCALE_OVERRIDE: "zh_CN.UTF-8" + ${{ if eq(parameters.name, 'Linux') }}: + py27_np_120: + ENV_FILE: ci/deps/azure-27-compat.yaml + CONDA_PY: "27" + PATTERN: "not slow and not network" - py36_locale_slow: - ENV_FILE: ci/deps/azure-36-locale_slow.yaml - CONDA_PY: "36" - CONDA_ENV: pandas - TEST_ARGS: "--only-slow --skip-network" + py37_locale: + ENV_FILE: ci/deps/azure-37-locale.yaml + CONDA_PY: "37" + PATTERN: "not slow and not network" + LOCALE_OVERRIDE: "zh_CN.UTF-8" + + py36_locale_slow: + ENV_FILE: ci/deps/azure-36-locale_slow.yaml + CONDA_PY: "36" + PATTERN: "not slow and not network" + LOCALE_OVERRIDE: "it_IT.UTF-8" steps: - script: | if [ "$(uname)" == "Linux" ]; then sudo apt-get install -y libc6-dev-i386; fi - echo "Installing Miniconda"{ + echo "Installing Miniconda" 
ci/incremental/install_miniconda.sh export PATH=$HOME/miniconda3/bin:$PATH echo "Setting up Conda environment" @@ -39,21 +43,21 @@ jobs: displayName: 'Before Install' - script: | export PATH=$HOME/miniconda3/bin:$PATH + source activate pandas-dev ci/incremental/build.sh displayName: 'Build' - script: | export PATH=$HOME/miniconda3/bin:$PATH - ci/script_single.sh - ci/script_multi.sh - echo "[Test done]" + source activate pandas-dev + ci/run_tests.sh displayName: 'Test' - script: | export PATH=$HOME/miniconda3/bin:$PATH - source activate pandas && pushd /tmp && python -c "import pandas; pandas.show_versions();" && popd + source activate pandas-dev && pushd /tmp && python -c "import pandas; pandas.show_versions();" && popd - task: PublishTestResults@2 inputs: testResultsFiles: 'test-data-*.xml' - testRunTitle: 'Linux' + testRunTitle: ${{ format('{0}-$(CONDA_PY)', parameters.name) }} - powershell: | $junitXml = "test-data-single.xml" $(Get-Content $junitXml | Out-String) -match 'failures="(.*?)"' diff --git a/ci/azure/windows-py27.yml b/ci/azure/windows-py27.yml deleted file mode 100644 index fd72b7080e84d..0000000000000 --- a/ci/azure/windows-py27.yml +++ /dev/null @@ -1,58 +0,0 @@ -parameters: - name: '' - vmImage: '' - -jobs: -- job: ${{ parameters.name }} - pool: - vmImage: ${{ parameters.vmImage }} - strategy: - maxParallel: 11 - matrix: - py36_np121: - ENV_FILE: ci/deps/azure-windows-27.yaml - CONDA_PY: "27" - CONDA_ENV: pandas - - steps: - - task: CondaEnvironment@1 - inputs: - updateConda: no - packageSpecs: '' - - # Need to install VC 9.0 only for Python 2.7 - # Once we understand how to do tasks conditional on build matrix variables - # we could merge this into azure-windows.yml - - powershell: | - $wc = New-Object net.webclient - $wc.Downloadfile("https://download.microsoft.com/download/7/9/6/796EF2E4-801B-4FC4-AB28-B59FBF6D907B/VCForPython27.msi", "VCForPython27.msi") - Start-Process "VCForPython27.msi" /qn -Wait - displayName: 'Install VC 9.0' - - - script: | - ci\\incremental\\setup_conda_environment.cmd - displayName: 'Before Install' - - script: | - ci\\incremental\\build.cmd - displayName: 'Build' - - script: | - call activate %CONDA_ENV% - pytest --junitxml=test-data.xml --skip-slow --skip-network pandas -n 2 -r sxX --strict --durations=10 %* - displayName: 'Test' - - task: PublishTestResults@2 - inputs: - testResultsFiles: 'test-data.xml' - testRunTitle: 'Windows 27' - - powershell: | - $junitXml = "test-data.xml" - $(Get-Content $junitXml | Out-String) -match 'failures="(.*?)"' - if ($matches[1] -eq 0) - { - Write-Host "No test failures in test-data" - } - else - { - # note that this will produce $LASTEXITCODE=1 - Write-Error "$($matches[1]) tests failed" - } - displayName: Check for test failures diff --git a/ci/azure/windows.yml b/ci/azure/windows.yml index 9b87ac7711f40..cece002024936 100644 --- a/ci/azure/windows.yml +++ b/ci/azure/windows.yml @@ -7,12 +7,14 @@ jobs: pool: vmImage: ${{ parameters.vmImage }} strategy: - maxParallel: 11 matrix: py36_np14: ENV_FILE: ci/deps/azure-windows-36.yaml CONDA_PY: "36" - CONDA_ENV: pandas + + py27_np121: + ENV_FILE: ci/deps/azure-windows-27.yaml + CONDA_PY: "27" steps: - task: CondaEnvironment@1 @@ -20,20 +22,28 @@ jobs: updateConda: no packageSpecs: '' + - powershell: | + $wc = New-Object net.webclient + $wc.Downloadfile("https://download.microsoft.com/download/7/9/6/796EF2E4-801B-4FC4-AB28-B59FBF6D907B/VCForPython27.msi", "VCForPython27.msi") + Start-Process "VCForPython27.msi" /qn -Wait + displayName: 'Install VC 9.0 only for 
Python 2.7' + condition: eq(variables.CONDA_PY, '27') + - script: | ci\\incremental\\setup_conda_environment.cmd displayName: 'Before Install' - script: | + call activate pandas-dev ci\\incremental\\build.cmd displayName: 'Build' - script: | - call activate %CONDA_ENV% - pytest --junitxml=test-data.xml --skip-slow --skip-network pandas -n 2 -r sxX --strict --durations=10 %* + call activate pandas-dev + pytest -m "not slow and not network" --junitxml=test-data.xml pandas -n 2 -r sxX --strict --durations=10 %* displayName: 'Test' - task: PublishTestResults@2 inputs: testResultsFiles: 'test-data.xml' - testRunTitle: 'Windows 36' + testRunTitle: 'Windows-$(CONDA_PY)' - powershell: | $junitXml = "test-data.xml" $(Get-Content $junitXml | Out-String) -match 'failures="(.*?)"' @@ -46,4 +56,4 @@ jobs: # note that this will produce $LASTEXITCODE=1 Write-Error "$($matches[1]) tests failed" } - displayName: Check for test failures \ No newline at end of file + displayName: Check for test failures diff --git a/ci/build_docs.sh b/ci/build_docs.sh index 33340a1c038dc..fbb27286e9566 100755 --- a/ci/build_docs.sh +++ b/ci/build_docs.sh @@ -1,5 +1,7 @@ #!/bin/bash +set -e + if [ "${TRAVIS_OS_NAME}" != "linux" ]; then echo "not doing build_docs on non-linux" exit 0 @@ -12,8 +14,6 @@ if [ "$DOC" ]; then echo "Will build docs" - source activate pandas - echo ############################### echo # Log file for the doc build # echo ############################### @@ -25,28 +25,33 @@ if [ "$DOC" ]; then echo # Create and send docs # echo ######################## - cd build/html - git config --global user.email "pandas-docs-bot@localhost.foo" - git config --global user.name "pandas-docs-bot" - - # create the repo - git init - - touch README - git add README - git commit -m "Initial commit" --allow-empty - git branch gh-pages - git checkout gh-pages - touch .nojekyll - git add --all . - git commit -m "Version" --allow-empty - - git remote remove origin - git remote add origin "https://${PANDAS_GH_TOKEN}@github.com/pandas-dev/pandas-docs-travis.git" - git fetch origin - git remote -v - - git push origin gh-pages -f + echo "Only uploading docs when TRAVIS_PULL_REQUEST is 'false'" + echo "TRAVIS_PULL_REQUEST: ${TRAVIS_PULL_REQUEST}" + + if [ "${TRAVIS_PULL_REQUEST}" == "false" ]; then + cd build/html + git config --global user.email "pandas-docs-bot@localhost.foo" + git config --global user.name "pandas-docs-bot" + + # create the repo + git init + + touch README + git add README + git commit -m "Initial commit" --allow-empty + git branch gh-pages + git checkout gh-pages + touch .nojekyll + git add --all . 
+ git commit -m "Version" --allow-empty + + git remote remove origin + git remote add origin "https://${PANDAS_GH_TOKEN}@github.com/pandas-dev/pandas-docs-travis.git" + git fetch origin + git remote -v + + git push origin gh-pages -f + fi fi exit 0 diff --git a/ci/circle/install_circle.sh b/ci/circle/install_circle.sh index f8bcf6bcffc99..0918e8790fca2 100755 --- a/ci/circle/install_circle.sh +++ b/ci/circle/install_circle.sh @@ -60,9 +60,9 @@ fi # create envbuild deps echo "[create env]" -time conda env create -q -n pandas --file="${ENV_FILE}" || exit 1 +time conda env create -q --file="${ENV_FILE}" || exit 1 -source activate pandas +source activate pandas-dev # remove any installed pandas package # w/o removing anything else diff --git a/ci/circle/run_circle.sh b/ci/circle/run_circle.sh deleted file mode 100755 index 803724c2f492d..0000000000000 --- a/ci/circle/run_circle.sh +++ /dev/null @@ -1,9 +0,0 @@ -#!/usr/bin/env bash - -echo "[running tests]" -export PATH="$MINICONDA_DIR/bin:$PATH" - -source activate pandas - -echo "pytest --strict --durations=10 --color=no --junitxml=$CIRCLE_TEST_REPORTS/reports/junit.xml $@ pandas" -pytest --strict --durations=10 --color=no --junitxml=$CIRCLE_TEST_REPORTS/reports/junit.xml $@ pandas diff --git a/ci/circle/show_circle.sh b/ci/circle/show_circle.sh deleted file mode 100755 index bfaa65c1d84f2..0000000000000 --- a/ci/circle/show_circle.sh +++ /dev/null @@ -1,8 +0,0 @@ -#!/usr/bin/env bash - -echo "[installed versions]" - -export PATH="$MINICONDA_DIR/bin:$PATH" -source activate pandas - -python -c "import pandas; pandas.show_versions();" diff --git a/ci/code_checks.sh b/ci/code_checks.sh index 86e7003681e98..b594f6a2f8df6 100755 --- a/ci/code_checks.sh +++ b/ci/code_checks.sh @@ -5,26 +5,48 @@ # This script is intended for both the CI and to check locally that code standards are # respected. We are currently linting (PEP-8 and similar), looking for patterns of # common mistakes (sphinx directives with missing blank lines, old style classes, -# unwanted imports...), and we also run doctests here (currently some files only). -# In the future we may want to add the validation of docstrings and other checks here. +# unwanted imports...), we run doctests here (currently some files only), and we +# validate formatting error in docstrings. # # Usage: # $ ./ci/code_checks.sh # run all checks # $ ./ci/code_checks.sh lint # run linting only # $ ./ci/code_checks.sh patterns # check for patterns that should not exist +# $ ./ci/code_checks.sh code # checks on imported code # $ ./ci/code_checks.sh doctests # run doctests +# $ ./ci/code_checks.sh docstrings # validate docstring errors # $ ./ci/code_checks.sh dependencies # check that dependencies are consistent -echo "inside $0" -[[ $LINT ]] || { echo "NOT Linting. To lint use: LINT=true $0 $1"; exit 0; } -[[ -z "$1" || "$1" == "lint" || "$1" == "patterns" || "$1" == "doctests" || "$1" == "dependencies" ]] \ - || { echo "Unknown command $1. Usage: $0 [lint|patterns|doctests|dependencies]"; exit 9999; } +[[ -z "$1" || "$1" == "lint" || "$1" == "patterns" || "$1" == "code" || "$1" == "doctests" || "$1" == "docstrings" || "$1" == "dependencies" ]] || \ + { echo "Unknown command $1. Usage: $0 [lint|patterns|code|doctests|docstrings|dependencies]"; exit 9999; } -source activate pandas BASE_DIR="$(dirname $0)/.." 
 RET=0
 CHECK=$1
 
+function invgrep {
+    # grep with inverse exit status and formatting for azure-pipelines
+    #
+    # This function works exactly as grep, but with opposite exit status:
+    # - 0 (success) when no patterns are found
+    # - 1 (fail) when the patterns are found
+    #
+    # This is useful for the CI, as we want to fail if one of the patterns
+    # that we want to avoid is found by grep.
+    if [[ "$AZURE" == "true" ]]; then
+        set -o pipefail
+        grep -n "$@" | awk -F ":" '{print "##vso[task.logissue type=error;sourcepath=" $1 ";linenumber=" $2 ";] Found unwanted pattern: " $3}'
+    else
+        grep "$@"
+    fi
+    return $((! $?))
+}
+
+if [[ "$AZURE" == "true" ]]; then
+    FLAKE8_FORMAT="##vso[task.logissue type=error;sourcepath=%(path)s;linenumber=%(row)s;columnnumber=%(col)s;code=%(code)s;]%(text)s"
+else
+    FLAKE8_FORMAT="default"
+fi
 
 ### LINTING ###
 if [[ -z "$CHECK" || "$CHECK" == "lint" ]]; then
 
@@ -36,22 +58,22 @@ if [[ -z "$CHECK" || "$CHECK" == "lint" ]]; then
 
     # pandas/_libs/src is C code, so no need to search there.
     MSG='Linting .py code' ; echo $MSG
-    flake8 .
+    flake8 --format="$FLAKE8_FORMAT" .
    RET=$(($RET + $?)) ; echo $MSG "DONE"
 
     MSG='Linting .pyx code' ; echo $MSG
-    flake8 pandas --filename=*.pyx --select=E501,E302,E203,E111,E114,E221,E303,E128,E231,E126,E265,E305,E301,E127,E261,E271,E129,W291,E222,E241,E123,F403,C400,C401,C402,C403,C404,C405,C406,C407,C408,C409,C410,C411
+    flake8 --format="$FLAKE8_FORMAT" pandas --filename=*.pyx --select=E501,E302,E203,E111,E114,E221,E303,E128,E231,E126,E265,E305,E301,E127,E261,E271,E129,W291,E222,E241,E123,F403,C400,C401,C402,C403,C404,C405,C406,C407,C408,C409,C410,C411
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
     MSG='Linting .pxd and .pxi.in' ; echo $MSG
-    flake8 pandas/_libs --filename=*.pxi.in,*.pxd --select=E501,E302,E203,E111,E114,E221,E303,E231,E126,F403
+    flake8 --format="$FLAKE8_FORMAT" pandas/_libs --filename=*.pxi.in,*.pxd --select=E501,E302,E203,E111,E114,E221,E303,E231,E126,F403
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
     echo "flake8-rst --version"
     flake8-rst --version
 
     MSG='Linting code-blocks in .rst documentation' ; echo $MSG
-    flake8-rst doc/source --filename=*.rst
+    flake8-rst doc/source --filename=*.rst --format="$FLAKE8_FORMAT"
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
     # Check that cython casting is of the form `<type>obj` as opposed to `<type> obj`;
@@ -59,7 +81,7 @@ if [[ -z "$CHECK" || "$CHECK" == "lint" ]]; then
     # Note: this grep pattern is (intended to be) equivalent to the python
     # regex r'(?<![ ->])> '
     MSG='Linting .pyx code for spacing conventions in casting' ; echo $MSG
-    ! grep -r -E --include '*.pyx' --include '*.pxi.in' '[a-zA-Z0-9*]> ' pandas/_libs
+    invgrep -r -E --include '*.pyx' --include '*.pxi.in' '[a-zA-Z0-9*]> ' pandas/_libs
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
     # readability/casting: Warnings about C casting instead of C++ casting
@@ -89,45 +111,62 @@ if [[ -z "$CHECK" || "$CHECK" == "patterns" ]]; then
 
     # Check for imports from pandas.core.common instead of `import pandas.core.common as com`
     MSG='Check for non-standard imports' ; echo $MSG
-    ! grep -R --include="*.py*" -E "from pandas.core.common import " pandas
+    invgrep -R --include="*.py*" -E "from pandas.core.common import " pandas
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
     MSG='Check for pytest warns' ; echo $MSG
-    ! grep -r -E --include '*.py' 'pytest\.warns' pandas/tests/
+    invgrep -r -E --include '*.py' 'pytest\.warns' pandas/tests/
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
     # Check for the following code in testing: `np.testing` and `np.array_equal`
     MSG='Check for invalid testing' ; echo $MSG
-    ! grep -r -E --include '*.py' --exclude testing.py '(numpy|np)(\.testing|\.array_equal)' pandas/tests/
+    invgrep -r -E --include '*.py' --exclude testing.py '(numpy|np)(\.testing|\.array_equal)' pandas/tests/
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
     # Check for the following code in the extension array base tests: `tm.assert_frame_equal` and `tm.assert_series_equal`
     MSG='Check for invalid EA testing' ; echo $MSG
-    ! grep -r -E --include '*.py' --exclude base.py 'tm.assert_(series|frame)_equal' pandas/tests/extension/base
+    invgrep -r -E --include '*.py' --exclude base.py 'tm.assert_(series|frame)_equal' pandas/tests/extension/base
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
     MSG='Check for deprecated messages without sphinx directive' ; echo $MSG
-    ! grep -R --include="*.py" --include="*.pyx" -E "(DEPRECATED|DEPRECATE|Deprecated)(:|,|\.)" pandas
+    invgrep -R --include="*.py" --include="*.pyx" -E "(DEPRECATED|DEPRECATE|Deprecated)(:|,|\.)" pandas
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
     MSG='Check for old-style classes' ; echo $MSG
-    ! grep -R --include="*.py" -E "class\s\S*[^)]:" pandas scripts
+    invgrep -R --include="*.py" -E "class\s\S*[^)]:" pandas scripts
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
     MSG='Check for backticks incorrectly rendering because of missing spaces' ; echo $MSG
-    ! grep -R --include="*.rst" -E "[a-zA-Z0-9]\`\`?[a-zA-Z0-9]" doc/source/
+    invgrep -R --include="*.rst" -E "[a-zA-Z0-9]\`\`?[a-zA-Z0-9]" doc/source/
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
     MSG='Check for incorrect sphinx directives' ; echo $MSG
-    ! grep -R --include="*.py" --include="*.pyx" --include="*.rst" -E "\.\. (autosummary|contents|currentmodule|deprecated|function|image|important|include|ipython|literalinclude|math|module|note|raw|seealso|toctree|versionadded|versionchanged|warning):[^:]" ./pandas ./doc/source
+    invgrep -R --include="*.py" --include="*.pyx" --include="*.rst" -E "\.\. (autosummary|contents|currentmodule|deprecated|function|image|important|include|ipython|literalinclude|math|module|note|raw|seealso|toctree|versionadded|versionchanged|warning):[^:]" ./pandas ./doc/source
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
     MSG='Check that the deprecated `assert_raises_regex` is not used (`pytest.raises(match=pattern)` should be used instead)' ; echo $MSG
-    ! grep -R --exclude=*.pyc --exclude=testing.py --exclude=test_testing.py assert_raises_regex pandas
+    invgrep -R --exclude=*.pyc --exclude=testing.py --exclude=test_util.py assert_raises_regex pandas
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
-    MSG='Check for modules that pandas should not import' ; echo $MSG
-    python -c "
+    # Check that we use pytest.raises only as a context manager
+    #
+    # For any flake8-compliant code, the only way this regex gets
+    # matched is if there is no "with" statement preceding "pytest.raises"
+    MSG='Check for pytest.raises as context manager (a line starting with `pytest.raises` is invalid, needs a `with` to precede it)' ; echo $MSG
+    MSG='TODO: This check is currently skipped because so many files fail this. Please enable when all are corrected (xref gh-24332)' ; echo $MSG
+    # invgrep -R --include '*.py' -E '[[:space:]] pytest.raises' pandas/tests
+    # RET=$(($RET + $?)) ; echo $MSG "DONE"
+
+    MSG='Check that no file in the repo contains trailing whitespace' ; echo $MSG
+    invgrep --exclude="*.svg" -RI "\s$" *
+    RET=$(($RET + $?)) ; echo $MSG "DONE"
+fi
+
+### CODE ###
+if [[ -z "$CHECK" || "$CHECK" == "code" ]]; then
+
+    MSG='Check import. No warnings, and blacklist some optional dependencies' ; echo $MSG
+    python -W error -c "
 import sys
 import pandas
 
@@ -136,7 +175,7 @@ blacklist = {'bs4', 'gcsfs', 'html5lib', 'ipython', 'jinja2'
              'hypothesis', 'tables', 'xlrd', 'xlsxwriter', 'xlwt'}
 mods = blacklist & set(m.split('.')[0] for m in sys.modules)
 if mods:
-    sys.stderr.write('pandas should not import: {}\n'.format(', '.join(mods)))
+    sys.stderr.write('err: pandas should not import: {}\n'.format(', '.join(mods)))
     sys.exit(len(mods))
 "
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
@@ -148,7 +187,7 @@ if [[ -z "$CHECK" || "$CHECK" == "doctests" ]]; then
 
     MSG='Doctests frame.py' ; echo $MSG
     pytest -q --doctest-modules pandas/core/frame.py \
-        -k"-axes -combine -itertuples -join -nunique -pivot_table -quantile -query -reindex -reindex_axis -replace -round -set_index -stack"
+        -k"-axes -combine -itertuples -join -pivot_table -query -reindex -reindex_axis -round"
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
     MSG='Doctests series.py' ; echo $MSG
@@ -158,7 +197,7 @@ if [[ -z "$CHECK" || "$CHECK" == "doctests" ]]; then
 
     MSG='Doctests generic.py' ; echo $MSG
     pytest -q --doctest-modules pandas/core/generic.py \
-        -k"-_set_axis_name -_xs -describe -droplevel -groupby -interpolate -pct_change -pipe -reindex -reindex_axis -to_json -transpose -values -xs"
+        -k"-_set_axis_name -_xs -describe -droplevel -groupby -interpolate -pct_change -pipe -reindex -reindex_axis -to_json -transpose -values -xs -to_clipboard"
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
     MSG='Doctests top-level reshaping functions' ; echo $MSG
@@ -179,11 +218,22 @@
 
 fi
 
+### DOCSTRINGS ###
+if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
+
+    MSG='Validate docstrings (GL06, GL07, GL09, SS04, PR03, PR05, EX04)' ; echo $MSG
+    $BASE_DIR/scripts/validate_docstrings.py --format=azure --errors=GL06,GL07,GL09,SS04,PR03,PR05,EX04
+    RET=$(($RET + $?)) ; echo $MSG "DONE"
+
+fi
+
 ### DEPENDENCIES ###
 if [[ -z "$CHECK" || "$CHECK" == "dependencies" ]]; then
+
     MSG='Check that requirements-dev.txt has been generated from environment.yml' ; echo $MSG
-    $BASE_DIR/scripts/generate_pip_deps_from_conda.py --compare
+    $BASE_DIR/scripts/generate_pip_deps_from_conda.py --compare --azure
     RET=$(($RET + $?)) ; echo $MSG "DONE"
+
 fi
 
 exit $RET
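[Note: for readers skimming the new invgrep helper above, it inverts grep's
exit status so that finding a forbidden pattern fails the build. A rough
Python equivalent (a sketch only, not part of the patch):

    import subprocess

    def invgrep(pattern, *paths):
        """Exit-status-inverted grep: 0 when `pattern` is absent, 1 when found."""
        found = subprocess.call(["grep", "-r", "-E", pattern, *paths]) == 0
        return 1 if found else 0

    # e.g. invgrep(r"pytest\.warns", "pandas/tests/") succeeds only if no test
    # uses pytest.warns, mirroring the 'Check for pytest warns' check above

On Azure, the bash version additionally pipes matches through awk to emit
##vso[task.logissue ...] lines, which the pipeline renders as annotated errors.]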
diff --git a/ci/deps/azure-27-compat.yaml b/ci/deps/azure-27-compat.yaml
index 44c561e9c8911..f3cc615c35243 100644
--- a/ci/deps/azure-27-compat.yaml
+++ b/ci/deps/azure-27-compat.yaml
@@ -1,4 +1,4 @@
-name: pandas
+name: pandas-dev
 channels:
   - defaults
   - conda-forge
diff --git a/ci/deps/azure-36-locale_slow.yaml b/ci/deps/azure-36-locale_slow.yaml
index 7e40bd1a9979e..4bbc6a2c11f1e 100644
--- a/ci/deps/azure-36-locale_slow.yaml
+++ b/ci/deps/azure-36-locale_slow.yaml
@@ -1,4 +1,4 @@
-name: pandas
+name: pandas-dev
 channels:
   - defaults
   - conda-forge
diff --git a/ci/deps/azure-37-locale.yaml b/ci/deps/azure-37-locale.yaml
index 59c8818eaef1e..11a698ce7648e 100644
--- a/ci/deps/azure-37-locale.yaml
+++ b/ci/deps/azure-37-locale.yaml
@@ -1,4 +1,4 @@
-name: pandas
+name: pandas-dev
 channels:
   - defaults
   - conda-forge
@@ -18,7 +18,7 @@ dependencies:
   - pymysql
   - pytables
   - python-dateutil
-  - python=3.6*
+  - python=3.7*
   - pytz
   - s3fs
   - scipy
@@ -30,6 +30,6 @@ dependencies:
   # universal
   - pytest
   - pytest-xdist
-  - moto
   - pip:
     - hypothesis>=3.58.0
+    - moto  # latest moto in conda-forge fails with 3.7, move to conda dependencies when this is fixed
diff --git a/ci/deps/azure-macos-35.yaml b/ci/deps/azure-macos-35.yaml
index 6ccdc79d11b27..7a0c3b81ac8f9 100644
--- a/ci/deps/azure-macos-35.yaml
+++ b/ci/deps/azure-macos-35.yaml
@@ -1,4 +1,4 @@
-name: pandas
+name: pandas-dev
 channels:
   - defaults
 dependencies:
diff --git a/ci/deps/azure-windows-27.yaml b/ci/deps/azure-windows-27.yaml
index dc68129a5e6d3..b1533b071fa74 100644
--- a/ci/deps/azure-windows-27.yaml
+++ b/ci/deps/azure-windows-27.yaml
@@ -1,4 +1,4 @@
-name: pandas
+name: pandas-dev
 channels:
   - defaults
   - conda-forge
diff --git a/ci/deps/azure-windows-36.yaml b/ci/deps/azure-windows-36.yaml
index af42545af7971..817aab66c65aa 100644
--- a/ci/deps/azure-windows-36.yaml
+++ b/ci/deps/azure-windows-36.yaml
@@ -1,4 +1,4 @@
-name: pandas
+name: pandas-dev
 channels:
   - defaults
   - conda-forge
diff --git a/ci/deps/circle-36-locale.yaml b/ci/deps/circle-36-locale.yaml
index 59c8818eaef1e..2b38465c04512 100644
--- a/ci/deps/circle-36-locale.yaml
+++ b/ci/deps/circle-36-locale.yaml
@@ -1,4 +1,4 @@
-name: pandas
+name: pandas-dev
 channels:
   - defaults
   - conda-forge
diff --git a/ci/deps/travis-27-locale.yaml b/ci/deps/travis-27-locale.yaml
index c8d17cf190e35..0846ef5e8264e 100644
--- a/ci/deps/travis-27-locale.yaml
+++ b/ci/deps/travis-27-locale.yaml
@@ -1,4 +1,4 @@
-name: pandas
+name: pandas-dev
 channels:
   - defaults
   - conda-forge
diff --git a/ci/deps/travis-27.yaml b/ci/deps/travis-27.yaml
index 5a9e206ec2c69..8d14673ebde6d 100644
--- a/ci/deps/travis-27.yaml
+++ b/ci/deps/travis-27.yaml
@@ -1,4 +1,4 @@
-name: pandas
+name: pandas-dev
 channels:
   - defaults
   - conda-forge
diff --git a/ci/deps/travis-36-doc.yaml b/ci/deps/travis-36-doc.yaml
index fb54c784d6fac..c345af0a2983c 100644
--- a/ci/deps/travis-36-doc.yaml
+++ b/ci/deps/travis-36-doc.yaml
@@ -1,4 +1,4 @@
-name: pandas
+name: pandas-dev
 channels:
   - defaults
   - conda-forge
@@ -21,6 +21,7 @@ dependencies:
   - notebook
   - numexpr
   - numpy=1.13*
+  - numpydoc
   - openpyxl
   - pandoc
   - pyarrow
diff --git a/ci/deps/travis-36-slow.yaml b/ci/deps/travis-36-slow.yaml
index 3157ecac3a902..a6ffdb95e5e7c 100644
--- a/ci/deps/travis-36-slow.yaml
+++ b/ci/deps/travis-36-slow.yaml
@@ -1,4 +1,4 @@
-name: pandas
+name: pandas-dev
 channels:
   - defaults
   - conda-forge
diff --git a/ci/deps/travis-36.yaml b/ci/deps/travis-36.yaml
index 1880fa2501581..1085ecd008fa6 100644
--- a/ci/deps/travis-36.yaml
+++ b/ci/deps/travis-36.yaml
@@ -1,22 +1,16 @@
-name: pandas
+name: pandas-dev
 channels:
   - defaults
   - conda-forge
 dependencies:
   - beautifulsoup4
+  - botocore>=1.11
   - cython>=0.28.2
   - dask
   - fastparquet
-  - flake8>=3.5
-  - flake8-comprehensions
-  - flake8-rst=0.4.2
   - gcsfs
   - geopandas
   - html5lib
-  - ipython
-  - isort
-  - jinja2
-  - lxml
   - matplotlib
   - nomkl
   - numexpr
@@ -27,12 +21,11 @@ dependencies:
   - pymysql
   - pytables
   - python-snappy
-  - python=3.6*
+  - python=3.6.6
   - pytz
   - s3fs
   - scikit-learn
   - scipy
-  - seaborn
   - sqlalchemy
   - statsmodels
   - xarray
@@ -43,11 +36,10 @@ dependencies:
   - pytest
   - pytest-xdist
   - pytest-cov
-  - moto
   - hypothesis>=3.58.0
   - pip:
     - brotlipy
     - coverage
-    - cpplint
+    - moto
     - pandas-datareader
     - python-dateutil
diff --git a/ci/deps/travis-37-numpydev.yaml b/ci/deps/travis-37-numpydev.yaml
index 82c75b7c91b1f..99ae228f25de3 100644
--- a/ci/deps/travis-37-numpydev.yaml
+++ b/ci/deps/travis-37-numpydev.yaml
@@ -1,4 +1,4 @@
-name: pandas
+name: pandas-dev
 channels:
   - defaults
 dependencies:
diff --git a/ci/deps/travis-37.yaml b/ci/deps/travis-37.yaml
index 7dbd85ac27df6..c503124d8cd26 100644
--- a/ci/deps/travis-37.yaml
+++ b/ci/deps/travis-37.yaml
@@ -1,10 +1,11 @@
-name: pandas
+name: pandas-dev
 channels:
   - defaults
   - conda-forge
   - c3i_test
 dependencies:
   - python=3.7
+  - botocore>=1.11
   - cython>=0.28.2
   - numpy
   - python-dateutil
@@ -14,3 +15,6 @@ dependencies:
   - pytest
   - pytest-xdist
   - hypothesis>=3.58.0
+  - s3fs
+  - pip:
+    - moto
diff --git a/ci/incremental/build.cmd b/ci/incremental/build.cmd
index d2fd06d7d9e50..2cce38c03f406 100644
--- a/ci/incremental/build.cmd
+++ b/ci/incremental/build.cmd
@@ -1,5 +1,4 @@
 @rem https://github.com/numba/numba/blob/master/buildscripts/incremental/build.cmd
-call activate %CONDA_ENV%
 
 @rem Build numba extensions without silencing compile errors
 python setup.py build_ext -q --inplace
diff --git a/ci/incremental/build.sh b/ci/incremental/build.sh
index 8f2301a3b7ef5..05648037935a3 100755
--- a/ci/incremental/build.sh
+++ b/ci/incremental/build.sh
@@ -1,7 +1,5 @@
 #!/bin/bash
 
-source activate $CONDA_ENV
-
 # Make sure any error below is reported as such
 set -v -e
diff --git a/ci/incremental/setup_conda_environment.cmd b/ci/incremental/setup_conda_environment.cmd
index 35595ffb03695..c104d78591384 100644
--- a/ci/incremental/setup_conda_environment.cmd
+++ b/ci/incremental/setup_conda_environment.cmd
@@ -11,11 +11,11 @@ call deactivate
 @rem Display root environment (for debugging)
 conda list
 @rem Clean up any left-over from a previous build
-conda remove --all -q -y -n %CONDA_ENV%
+conda remove --all -q -y -n pandas-dev
 
 @rem Scipy, CFFI, jinja2 and IPython are optional dependencies, but exercised in the test suite
-conda env create -n %CONDA_ENV% --file=ci\deps\azure-windows-%CONDA_PY%.yaml
+conda env create --file=ci\deps\azure-windows-%CONDA_PY%.yaml
 
-call activate %CONDA_ENV%
+call activate pandas-dev
 conda list
 
 if %errorlevel% neq 0 exit /b %errorlevel%
diff --git a/ci/incremental/setup_conda_environment.sh b/ci/incremental/setup_conda_environment.sh
index f3ac99d5e7c5a..f174c17a614d8 100755
--- a/ci/incremental/setup_conda_environment.sh
+++ b/ci/incremental/setup_conda_environment.sh
@@ -5,6 +5,7 @@ set -v -e
 
 CONDA_INSTALL="conda install -q -y"
 PIP_INSTALL="pip install -q"
+
 # Deactivate any environment
 source deactivate
 # Display root environment (for debugging)
@@ -12,15 +13,14 @@ conda list
 # Clean up any left-over from a previous build
 # (note workaround for https://github.com/conda/conda/issues/2679:
 #  `conda env remove` issue)
-conda remove --all -q -y -n $CONDA_ENV
+conda remove --all -q -y -n pandas-dev
 
 echo
 echo "[create env]"
-time conda env create -q -n "${CONDA_ENV}" --file="${ENV_FILE}" || exit 1
+time conda env create -q --file="${ENV_FILE}" || exit 1
 
-# Activate first
 set +v
-source activate $CONDA_ENV
+source activate pandas-dev
 set -v
 
 # remove any installed pandas package
diff --git a/ci/install_travis.sh b/ci/install_travis.sh
index fd4a36f86db6c..d1a940f119228 100755
--- a/ci/install_travis.sh
+++ b/ci/install_travis.sh
@@ -80,9 +80,9 @@ echo
 echo "[create env]"
 
 # create our environment
-time conda env create -q -n pandas --file="${ENV_FILE}" || exit 1
+time conda env create -q --file="${ENV_FILE}" || exit 1
 
-source activate pandas
+source activate pandas-dev
 
 # remove any installed pandas package
 # w/o removing anything else
diff --git a/ci/print_versions.py b/ci/print_versions.py
deleted file mode 100755
index a2c93748b0388..0000000000000
--- a/ci/print_versions.py
+++ /dev/null
@@ -1,29 +0,0 @@
-#!/usr/bin/env python
-
-
-def show_versions(as_json=False):
-    import imp
-    import os
-    fn = __file__
-    this_dir = os.path.dirname(fn)
-    pandas_dir = os.path.abspath(os.path.join(this_dir, ".."))
-    sv_path = os.path.join(pandas_dir, 'pandas', 'util')
-    mod = imp.load_module(
-        'pvmod', *imp.find_module('print_versions', [sv_path]))
-    return mod.show_versions(as_json)
-
-
-if __name__ == '__main__':
-    # optparse is 2.6-safe
-    from optparse import OptionParser
-    parser = OptionParser()
-    parser.add_option("-j", "--json", metavar="FILE", nargs=1,
-                      help="Save output as JSON into file, "
-                           "pass in '-' to output to stdout")
-
-    (options, args) = parser.parse_args()
-
-    if options.json == "-":
-        options.json = True
-
-    show_versions(as_json=options.json)
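[Note: the wrapper deleted above is redundant because pandas exposes the same
report through its public API, which the CircleCI config now calls directly:

    import pandas

    pandas.show_versions()   # human-readable version/dependency report
    # pandas.show_versions(as_json=True) prints the same data as JSON,
    # roughly what the old script's `-j -` option did
]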
diff --git a/ci/run_build_docs.sh b/ci/run_build_docs.sh
deleted file mode 100755
index 2909b9619552e..0000000000000
--- a/ci/run_build_docs.sh
+++ /dev/null
@@ -1,10 +0,0 @@
-#!/bin/bash
-
-echo "inside $0"
-
-"$TRAVIS_BUILD_DIR"/ci/build_docs.sh 2>&1
-
-# wait until subprocesses finish (build_docs.sh)
-wait
-
-exit 0
diff --git a/ci/run_tests.sh b/ci/run_tests.sh
new file mode 100755
index 0000000000000..ee46da9f52eab
--- /dev/null
+++ b/ci/run_tests.sh
@@ -0,0 +1,58 @@
+#!/bin/bash
+
+set -e
+
+if [ "$DOC" ]; then
+    echo "We are not running pytest as this is a doc-build"
+    exit 0
+fi
+
+# Workaround for pytest-xdist flaky collection order
+# https://github.com/pytest-dev/pytest/issues/920
+# https://github.com/pytest-dev/pytest/issues/1075
+export PYTHONHASHSEED=$(python -c 'import random; print(random.randint(1, 4294967295))')
+
+if [ -n "$LOCALE_OVERRIDE" ]; then
+    export LC_ALL="$LOCALE_OVERRIDE"
+    export LANG="$LOCALE_OVERRIDE"
+    PANDAS_LOCALE=`python -c 'import pandas; print(pandas.get_option("display.encoding"))'`
+    if [[ "$LOCALE_OVERRIDE" != "$PANDAS_LOCALE" ]]; then
+        echo "pandas could not detect the locale. System locale: $LOCALE_OVERRIDE, pandas detected: $PANDAS_LOCALE"
+        # TODO Not really aborting the tests until https://github.com/pandas-dev/pandas/issues/23923 is fixed
+        # exit 1
+    fi
+fi
+if [[ "not network" == *"$PATTERN"* ]]; then
+    export http_proxy=http://1.2.3.4 https_proxy=http://1.2.3.4;
+fi
+
+
+if [ -n "$PATTERN" ]; then
+    PATTERN=" and $PATTERN"
+fi
+
+for TYPE in single multiple
+do
+    if [ "$COVERAGE" ]; then
+        COVERAGE_FNAME="/tmp/coc-$TYPE.xml"
+        COVERAGE="-s --cov=pandas --cov-report=xml:$COVERAGE_FNAME"
+    fi
+
+    TYPE_PATTERN=$TYPE
+    NUM_JOBS=1
+    if [[ "$TYPE_PATTERN" == "multiple" ]]; then
+        TYPE_PATTERN="not single"
+        NUM_JOBS=2
+    fi
+
+    PYTEST_CMD="pytest -m \"$TYPE_PATTERN$PATTERN\" -n $NUM_JOBS -s --strict --durations=10 --junitxml=test-data-$TYPE.xml $TEST_ARGS $COVERAGE pandas"
+    echo $PYTEST_CMD
+    # if no tests are found (the case of "single and slow"), pytest exits with code 5, and would make the script fail, if not for the below code
+    sh -c "$PYTEST_CMD; ret=\$?; [ \$ret = 5 ] && exit 0 || exit \$ret"
+
+    if [[ "$COVERAGE" && $? == 0 ]]; then
+        echo "uploading coverage for $TYPE tests"
+        echo "bash <(curl -s https://codecov.io/bash) -Z -c -F $TYPE -f $COVERAGE_FNAME"
+        bash <(curl -s https://codecov.io/bash) -Z -c -F $TYPE -f $COVERAGE_FNAME
+    fi
+done
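[Note: one subtlety in the new ci/run_tests.sh above: pytest exits with status
5 when a marker expression selects no tests at all (e.g. "single and slow" on
builds that have none), and under `set -e` that would abort the script. The
`sh -c` guard maps that status to success; a Python sketch of the same logic,
with an illustrative command line:

    import subprocess
    import sys

    ret = subprocess.call(["pytest", "-m", "single and slow", "pandas"])
    sys.exit(0 if ret == 5 else ret)   # 5 means "no tests collected", not a failure
]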
diff --git a/ci/script_multi.sh b/ci/script_multi.sh
deleted file mode 100755
index e56d5da7232b2..0000000000000
--- a/ci/script_multi.sh
+++ /dev/null
@@ -1,46 +0,0 @@
-#!/bin/bash -e
-
-echo "[script multi]"
-
-source activate pandas
-
-if [ -n "$LOCALE_OVERRIDE" ]; then
-    export LC_ALL="$LOCALE_OVERRIDE";
-    echo "Setting LC_ALL to $LOCALE_OVERRIDE"
-
-    pycmd='import pandas; print("pandas detected console encoding: %s" % pandas.get_option("display.encoding"))'
-    python -c "$pycmd"
-fi
-
-# Enforce absent network during testing by faking a proxy
-if echo "$TEST_ARGS" | grep -e --skip-network -q; then
-    export http_proxy=http://1.2.3.4 https_proxy=http://1.2.3.4;
-fi
-
-# Workaround for pytest-xdist flaky collection order
-# https://github.com/pytest-dev/pytest/issues/920
-# https://github.com/pytest-dev/pytest/issues/1075
-export PYTHONHASHSEED=$(python -c 'import random; print(random.randint(1, 4294967295))')
-echo PYTHONHASHSEED=$PYTHONHASHSEED
-
-if [ "$DOC" ]; then
-    echo "We are not running pytest as this is a doc-build"
-
-elif [ "$COVERAGE" ]; then
-    echo pytest -s -n 2 -m "not single" --durations=10 --cov=pandas --cov-report xml:/tmp/cov-multiple.xml --junitxml=test-data-multiple.xml --strict $TEST_ARGS pandas
-    pytest -s -n 2 -m "not single" --durations=10 --cov=pandas --cov-report xml:/tmp/cov-multiple.xml --junitxml=test-data-multiple.xml --strict $TEST_ARGS pandas
-
-elif [ "$SLOW" ]; then
-    TEST_ARGS="--only-slow --skip-network"
-    echo pytest -m "not single and slow" -v --durations=10 --junitxml=test-data-multiple.xml --strict $TEST_ARGS pandas
-    pytest -m "not single and slow" -v --durations=10 --junitxml=test-data-multiple.xml --strict $TEST_ARGS pandas
-
-else
-    echo pytest -n 2 -m "not single" --durations=10 --junitxml=test-data-multiple.xml --strict $TEST_ARGS pandas
-    pytest -n 2 -m "not single" --durations=10 --junitxml=test-data-multiple.xml --strict $TEST_ARGS pandas # TODO: doctest
-
-fi
-
-RET="$?"
-
-exit "$RET"
diff --git a/ci/script_single.sh b/ci/script_single.sh
deleted file mode 100755
index ea0d48bc2da8a..0000000000000
--- a/ci/script_single.sh
+++ /dev/null
@@ -1,41 +0,0 @@
-#!/bin/bash
-
-echo "[script_single]"
-
-source activate pandas
-
-if [ -n "$LOCALE_OVERRIDE" ]; then
-    echo "Setting LC_ALL and LANG to $LOCALE_OVERRIDE"
-    export LC_ALL="$LOCALE_OVERRIDE";
-    export LANG="$LOCALE_OVERRIDE";
-
-    pycmd='import pandas; print("pandas detected console encoding: %s" % pandas.get_option("display.encoding"))'
-    python -c "$pycmd"
-fi
-
-if [ "$SLOW" ]; then
-    TEST_ARGS="--only-slow --skip-network"
-fi
-
-# Enforce absent network during testing by faking a proxy
-if echo "$TEST_ARGS" | grep -e --skip-network -q; then
-    export http_proxy=http://1.2.3.4 https_proxy=http://1.2.3.4;
-fi
-
-if [ "$DOC" ]; then
-    echo "We are not running pytest as this is a doc-build"
-
-elif [ "$COVERAGE" ]; then
-    echo pytest -s -m "single" --durations=10 --strict --cov=pandas --cov-report xml:/tmp/cov-single.xml --junitxml=test-data-single.xml $TEST_ARGS pandas
-    pytest -s -m "single" --durations=10 --strict --cov=pandas --cov-report xml:/tmp/cov-single.xml --junitxml=test-data-single.xml $TEST_ARGS pandas
-    echo pytest -s --strict scripts
-    pytest -s --strict scripts
-else
-    echo pytest -m "single" --durations=10 --junitxml=test-data-single.xml --strict $TEST_ARGS pandas
-    pytest -m "single" --durations=10 --junitxml=test-data-single.xml --strict $TEST_ARGS pandas
-
-fi
-
-RET="$?"
-
-exit "$RET"
diff --git a/ci/upload_coverage.sh b/ci/upload_coverage.sh
deleted file mode 100755
index a7ef2fa908079..0000000000000
--- a/ci/upload_coverage.sh
+++ /dev/null
@@ -1,12 +0,0 @@
-#!/bin/bash
-
-if [ -z "$COVERAGE" ]; then
-    echo "coverage is not selected for this build"
-    exit 0
-fi
-
-source activate pandas
-
-echo "uploading coverage"
-bash <(curl -s https://codecov.io/bash) -Z -c -F single -f /tmp/cov-single.xml
-bash <(curl -s https://codecov.io/bash) -Z -c -F multiple -f /tmp/cov-multiple.xml
diff --git a/doc/README.rst b/doc/README.rst
index 12950d323f5d3..5423e7419d03b 100644
--- a/doc/README.rst
+++ b/doc/README.rst
@@ -1,173 +1 @@
-.. _contributing.docs:
-
-Contributing to the documentation
-=================================
-
-Whether you are someone who loves writing, teaching, or development,
-contributing to the documentation is a huge value. If you don't see yourself
-as a developer type, please don't stress and know that we want you to
-contribute. You don't even have to be an expert on *pandas* to do so!
-Something as simple as rewriting small passages for clarity
-as you reference the docs is a simple but effective way to contribute. The
-next person to read that passage will be in your debt!
-
-Actually, there are sections of the docs that are worse off by being written
-by experts. If something in the docs doesn't make sense to you, updating the
-relevant section after you figure it out is a simple way to ensure it will
-help the next person.
-
-.. contents:: Table of contents:
-   :local:
-
-
-About the pandas documentation
-------------------------------
-
-The documentation is written in **reStructuredText**, which is almost like writing
-in plain English, and built using `Sphinx `__. The
-Sphinx Documentation has an excellent `introduction to reST
-`__. Review the Sphinx docs to perform more
-complex changes to the documentation as well.
-
-Some other important things to know about the docs:
-
-- The pandas documentation consists of two parts: the docstrings in the code
-  itself and the docs in this folder ``pandas/doc/``.
-
-  The docstrings provide a clear explanation of the usage of the individual
-  functions, while the documentation in this folder consists of tutorial-like
-  overviews per topic together with some other information (what's new,
-  installation, etc).
-
-- The docstrings follow the **Numpy Docstring Standard** which is used widely
-  in the Scientific Python community. This standard specifies the format of
-  the different sections of the docstring. See `this document
-  `_
-  for a detailed explanation, or look at some of the existing functions to
-  extend it in a similar manner.
-
-- The tutorials make heavy use of the `ipython directive
-  `_ sphinx extension.
-  This directive lets you put code in the documentation which will be run
-  during the doc build. For example:
-
-  ::
-
-      .. ipython:: python
-
-          x = 2
-          x**3
-
-  will be rendered as
-
-  ::
-
-      In [1]: x = 2
-
-      In [2]: x**3
-      Out[2]: 8
-
-  This means that almost all code examples in the docs are always run (and the
-  output saved) during the doc build. This way, they will always be up to date,
-  but it makes the doc building a bit more complex.
-
-
-How to build the pandas documentation
--------------------------------------
-
-Requirements
-^^^^^^^^^^^^
-
-To build the pandas docs there are some extra requirements: you will need to
-have ``sphinx`` and ``ipython`` installed. `numpydoc
-`_ is used to parse the docstrings that
-follow the Numpy Docstring Standard (see above), but you don't need to install
-this because a local copy of ``numpydoc`` is included in the pandas source
-code. `nbsphinx `_ is used to convert
-Jupyter notebooks. You will need to install it if you intend to modify any of
-the notebooks included in the documentation.
-
-Furthermore, it is recommended to have all `optional dependencies
-`_
-installed. This is not needed, but be aware that you will see some error
-messages. Because all the code in the documentation is executed during the doc
-build, the examples using this optional dependencies will generate errors.
-Run ``pd.show_versions()`` to get an overview of the installed version of all
-dependencies.
-
-.. warning::
-
-   Sphinx version >= 1.2.2 or the older 1.1.3 is required.
-
-Building pandas
-^^^^^^^^^^^^^^^
-
-For a step-by-step overview on how to set up your environment, to work with
-the pandas code and git, see `the developer pages
-`_.
-When you start to work on some docs, be sure to update your code to the latest
-development version ('master')::
-
-    git fetch upstream
-    git rebase upstream/master
-
-Often it will be necessary to rebuild the C extension after updating::
-
-    python setup.py build_ext --inplace
-
-Building the documentation
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-So how do you build the docs? Navigate to your local folder
-``pandas/doc/`` directory in the console and run::
-
-    python make.py html
-
-And then you can find the html output in the folder ``pandas/doc/build/html/``.
-
-The first time it will take quite a while, because it has to run all the code
-examples in the documentation and build all generated docstring pages.
-In subsequent evocations, sphinx will try to only build the pages that have
-been modified.
-
-If you want to do a full clean build, do::
-
-    python make.py clean
-    python make.py build
-
-
-Starting with 0.13.1 you can tell ``make.py`` to compile only a single section
-of the docs, greatly reducing the turn-around time for checking your changes.
-You will be prompted to delete `.rst` files that aren't required, since the
-last committed version can always be restored from git.
-
-::
-
-    #omit autosummary and API section
-    python make.py clean
-    python make.py --no-api
-
-    # compile the docs with only a single
-    # section, that which is in indexing.rst
-    python make.py clean
-    python make.py --single indexing
-
-For comparison, a full doc build may take 10 minutes. a ``-no-api`` build
-may take 3 minutes and a single section may take 15 seconds.
-
-Where to start?
----------------
-
-There are a number of issues listed under `Docs
-`_
-and `good first issue
-`_
-where you could start out.
-
-Or maybe you have an idea of your own, by using pandas, looking for something
-in the documentation and thinking 'this can be improved', let's do something
-about that!
-
-Feel free to ask questions on `mailing list
-`_ or submit an
-issue on Github.
+See `contributing.rst `_ in this repo.
diff --git a/doc/make.py b/doc/make.py
index 0a3a7483fcc91..b3ea2b7a6f5fe 100755
--- a/doc/make.py
+++ b/doc/make.py
@@ -214,7 +214,10 @@ def _run_os(*args):
         # TODO check_call should be more safe, but it fails with
         # exclude patterns, needs investigation
         # subprocess.check_call(args, stderr=subprocess.STDOUT)
-        os.system(' '.join(args))
+        exit_status = os.system(' '.join(args))
+        if exit_status:
+            msg = 'Command "{}" finished with exit code {}'
+            raise RuntimeError(msg.format(' '.join(args), exit_status))
 
     def _sphinx_build(self, kind):
         """Call sphinx to build documentation.
@@ -229,9 +232,9 @@ def _sphinx_build(self, kind):
         --------
         >>> DocBuilder(num_jobs=4)._sphinx_build('html')
         """
-        if kind not in ('html', 'latex', 'spelling'):
-            raise ValueError('kind must be html, latex or '
-                             'spelling, not {}'.format(kind))
+        if kind not in ('html', 'latex'):
+            raise ValueError('kind must be html or latex, '
+                             'not {}'.format(kind))
 
         self._run_os('sphinx-build',
                      '-j{}'.format(self.num_jobs),
@@ -310,18 +313,6 @@ def zip_html(self):
                      '-q',
                      *fnames)
 
-    def spellcheck(self):
-        """Spell check the documentation."""
-        self._sphinx_build('spelling')
-        output_location = os.path.join('build', 'spelling', 'output.txt')
-        with open(output_location) as output:
-            lines = output.readlines()
-        if lines:
-            raise SyntaxError(
-                'Found misspelled words.'
-                ' Check pandas/doc/build/spelling/output.txt'
-                ' for more details.')
-
 
 def main():
     cmds = [method for method in dir(DocBuilder) if not method.startswith('_')]
diff --git a/doc/source/10min.rst b/doc/source/10min.rst
index b5938a24ce6c5..a7557e6e1d1c2 100644
--- a/doc/source/10min.rst
+++ b/doc/source/10min.rst
@@ -1,24 +1,6 @@
 .. _10min:
 
-.. currentmodule:: pandas
-
-.. ipython:: python
-   :suppress:
-
-   import numpy as np
-   import pandas as pd
-   import os
-   np.random.seed(123456)
-   np.set_printoptions(precision=4, suppress=True)
-   import matplotlib
-   # matplotlib.style.use('default')
-   pd.options.display.max_rows = 15
-
-   #### portions of this were borrowed from the
-   #### Pandas cheatsheet
-   #### created during the PyData Workshop-Sprint 2012
-   #### Hannah Chen, Henry Chow, Eric Cox, Robert Mauriello
-
+{{ header }}
 
 ********************
 10 Minutes to pandas
@@ -31,9 +13,8 @@
 Customarily, we import as follows:
 .. ipython:: python
 
-   import pandas as pd
    import numpy as np
-   import matplotlib.pyplot as plt
+   import pandas as pd
 
 Object Creation
 ---------------
@@ -55,7 +36,7 @@ and labeled columns:
 
    dates = pd.date_range('20130101', periods=6)
    dates
-   df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
+   df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
   df
 
 Creating a ``DataFrame`` by passing a dict of objects that can be converted to series-like.
@@ -64,13 +45,13 @@ Creating a ``DataFrame`` by passing a dict of objects that can be converted to s
 
    df2 = pd.DataFrame({'A': 1.,
                        'B': pd.Timestamp('20130102'),
-                       'C': pd.Series(1, index=list(range(4)),dtype='float32'),
+                       'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                        'D': np.array([3] * 4, dtype='int32'),
                        'E': pd.Categorical(["test", "train", "test", "train"]),
                        'F': 'foo'})
   df2
 
-The columns of the resulting ``DataFrame`` have different
+The columns of the resulting ``DataFrame`` have different
 :ref:`dtypes `.
 
 .. ipython:: python
 
    df2.dtypes
@@ -84,7 +65,7 @@ will be completed:
 
 .. ipython::
    @verbatim
-   In [1]: df2.
+   In [1]: df2.  # noqa: E225, E999
    df2.A                  df2.bool
    df2.abs                df2.boxplot
    df2.add                df2.C
@@ -114,13 +95,40 @@ Here is how to view the top and bottom rows of the frame:
 
    df.head()
    df.tail(3)
 
-Display the index, columns, and the underlying NumPy data:
+Display the index, columns:
 
 .. ipython:: python
 
    df.index
    df.columns
-   df.values
+
+:meth:`DataFrame.to_numpy` gives a NumPy representation of the underlying data.
+Note that this can be an expensive operation when your :class:`DataFrame` has
+columns with different data types, which comes down to a fundamental difference
+between pandas and NumPy: **NumPy arrays have one dtype for the entire array,
+while pandas DataFrames have one dtype per column**. When you call
+:meth:`DataFrame.to_numpy`, pandas will find the NumPy dtype that can hold *all*
+of the dtypes in the DataFrame. This may end up being ``object``, which requires
+casting every value to a Python object.
+
+For ``df``, our :class:`DataFrame` of all floating-point values,
+:meth:`DataFrame.to_numpy` is fast and doesn't require copying data.
+
+.. ipython:: python
+
+   df.to_numpy()
+
+For ``df2``, the :class:`DataFrame` with multiple dtypes,
+:meth:`DataFrame.to_numpy` is relatively expensive.
+
+.. ipython:: python
+
+   df2.to_numpy()
+
+.. note::
+
+   :meth:`DataFrame.to_numpy` does *not* include the index or column
+   labels in the output.
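[Note: a quick illustration of the dtype widening the new passage describes;
the values are random, only the resulting dtypes matter:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randn(3, 2), columns=["A", "B"])
    print(df.to_numpy().dtype)   # float64: homogeneous, no casting needed

    df2 = pd.DataFrame({"A": [1.0], "B": [pd.Timestamp("20130102")], "F": ["foo"]})
    print(df2.to_numpy().dtype)  # object: the only dtype that can hold all three
]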
 :func:`~DataFrame.describe` shows a quick statistic summary of your data:
@@ -190,31 +198,31 @@ Selecting on a multi-axis by label:
 
 .. ipython:: python
 
-   df.loc[:,['A','B']]
+   df.loc[:, ['A', 'B']]
 
 Showing label slicing, both endpoints are *included*:
 
 .. ipython:: python
 
-   df.loc['20130102':'20130104',['A','B']]
+   df.loc['20130102':'20130104', ['A', 'B']]
 
 Reduction in the dimensions of the returned object:
 
 .. ipython:: python
 
-   df.loc['20130102',['A','B']]
+   df.loc['20130102', ['A', 'B']]
 
 For getting a scalar value:
 
 .. ipython:: python
 
-   df.loc[dates[0],'A']
+   df.loc[dates[0], 'A']
 
 For getting fast access to a scalar (equivalent to the prior method):
 
 .. ipython:: python
 
-   df.at[dates[0],'A']
+   df.at[dates[0], 'A']
 
 Selection by Position
 ~~~~~~~~~~~~~~~~~~~~~
@@ -231,37 +239,37 @@ By integer slices, acting similar to numpy/python:
 
 .. ipython:: python
 
-   df.iloc[3:5,0:2]
+   df.iloc[3:5, 0:2]
 
 By lists of integer position locations, similar to the numpy/python style:
 
 .. ipython:: python
 
-   df.iloc[[1,2,4],[0,2]]
+   df.iloc[[1, 2, 4], [0, 2]]
 
 For slicing rows explicitly:
 
 .. ipython:: python
 
-   df.iloc[1:3,:]
+   df.iloc[1:3, :]
 
 For slicing columns explicitly:
 
 .. ipython:: python
 
-   df.iloc[:,1:3]
+   df.iloc[:, 1:3]
 
 For getting a value explicitly:
 
 .. ipython:: python
 
-   df.iloc[1,1]
+   df.iloc[1, 1]
 
 For getting fast access to a scalar (equivalent to the prior method):
 
 .. ipython:: python
 
-   df.iat[1,1]
+   df.iat[1, 1]
 
 Boolean Indexing
 ~~~~~~~~~~~~~~~~
@@ -303,19 +311,19 @@ Setting values by label:
 
 .. ipython:: python
 
-   df.at[dates[0],'A'] = 0
+   df.at[dates[0], 'A'] = 0
 
 Setting values by position:
 
 .. ipython:: python
 
-   df.iat[0,1] = 0
+   df.iat[0, 1] = 0
 
 Setting by assigning with a NumPy array:
 
 .. ipython:: python
 
-   df.loc[:,'D'] = np.array([5] * len(df))
+   df.loc[:, 'D'] = np.array([5] * len(df))
 
 The result of the prior setting operations.
 
@@ -345,7 +353,7 @@ returns a copy of the data.
 .. ipython:: python
 
    df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ['E'])
-   df1.loc[dates[0]:dates[1],'E'] = 1
+   df1.loc[dates[0]:dates[1], 'E'] = 1
   df1
 
 To drop any rows that have missing data.
@@ -487,7 +495,7 @@ Another example that can be given is:
 
 Append
 ~~~~~~
 
-Append rows to a dataframe. See the :ref:`Appending `
+Append rows to a dataframe. See the :ref:`Appending `
 section.
 
 .. ipython:: python
@@ -520,14 +528,14 @@ See the :ref:`Grouping section `.
                       'D': np.random.randn(8)})
    df
 
-Grouping and then applying the :meth:`~DataFrame.sum` function to the resulting
+Grouping and then applying the :meth:`~DataFrame.sum` function to the resulting
 groups.
 
 .. ipython:: python
 
    df.groupby('A').sum()
 
-Grouping by multiple columns forms a hierarchical index, and again we can
+Grouping by multiple columns forms a hierarchical index, and again we can
 apply the ``sum`` function.
 
 .. ipython:: python
@@ -653,7 +661,8 @@ pandas can include categorical data in a ``DataFrame``. For full docs, see the
 
 .. ipython:: python
 
-   df = pd.DataFrame({"id":[1, 2, 3, 4, 5, 6], "raw_grade":['a', 'b', 'b', 'a', 'a', 'e']})
+   df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
+                      "raw_grade": ['a', 'b', 'b', 'a', 'a', 'e']})
 
 Convert the raw grades to a categorical data type.
 
@@ -662,7 +671,7 @@ Convert the raw grades to a categorical data type.
 
    df["grade"] = df["raw_grade"].astype("category")
   df["grade"]
 
-Rename the categories to more meaningful names (assigning to
+Rename the categories to more meaningful names (assigning to
 ``Series.cat.categories`` is inplace!).
 
 .. ipython:: python
@@ -674,7 +683,8 @@ Reorder the categories and simultaneously add the missing categories (methods un
 
 .. ipython:: python
 
-   df["grade"] = df["grade"].cat.set_categories(["very bad", "bad", "medium", "good", "very good"])
+   df["grade"] = df["grade"].cat.set_categories(["very bad", "bad", "medium",
+                                                 "good", "very good"])
   df["grade"]
 
 Sorting is per order in the categories, not lexical order.
@@ -703,13 +713,14 @@ See the :ref:`Plotting ` docs.
 
 .. ipython:: python
 
-   ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
+   ts = pd.Series(np.random.randn(1000),
+                  index=pd.date_range('1/1/2000', periods=1000))
    ts = ts.cumsum()
 
    @savefig series_plot_basic.png
   ts.plot()
 
-On a DataFrame, the :meth:`~DataFrame.plot` method is a convenience to plot all
+On a DataFrame, the :meth:`~DataFrame.plot` method is a convenience to plot all
 of the columns with labels:
 
 .. ipython:: python
@@ -718,8 +729,10 @@ of the columns with labels:
                      columns=['A', 'B', 'C', 'D'])
    df = df.cumsum()
 
+   plt.figure()
+   df.plot()
    @savefig frame_plot_basic.png
-   plt.figure(); df.plot(); plt.legend(loc='best')
+   plt.legend(loc='best')
 
 Getting Data In/Out
 -------------------
@@ -742,6 +755,7 @@ CSV
 .. ipython:: python
    :suppress:
 
+   import os
    os.remove('foo.csv')
 
 HDF5
diff --git a/doc/source/_static/banklist.html b/doc/source/_static/banklist.html
index cbcce5a2d49ff..cb07c332acbe7 100644
--- a/doc/source/_static/banklist.html
+++ b/doc/source/_static/banklist.html
  • ").html( !data ? "No match!" : "Selected: " + formatted).appendTo("#result"); } - + function formatItem(row) { return row[0] + " (id: " + row[1] + ")"; } @@ -81,7 +81,7 @@ selectFirst: false }); - + $("#search2").autocomplete("/searchjs.asp", { width: 160, autoFill: false, @@ -93,7 +93,7 @@ selectFirst: false }); - + }); @@ -232,16 +232,16 @@

    Each depositor insured to at least $250,000 per insured bank

    Failed Bank List

    The FDIC is often appointed as receiver for failed banks. This page contains useful information for the customers and vendors of these banks. This includes information on the acquiring bank (if applicable), how your accounts and loans are affected, and how vendors can file claims against the receivership. Failed Financial Institution Contact Search displays point of contact information related to failed banks.

    - +

    This list includes banks which have failed since October 1, 2000. To search for banks that failed prior to those on this page, visit this link: Failures and Assistance Transactions

    - +

    Failed Bank List - CSV file (Updated on Mondays. Also opens in Excel - Excel Help)

    - +

    Due to the small screen size some information is no longer visible.
    Full information available when viewed on a larger screen.

    @@ -253,7 +253,7 @@

    Failed Bank List

    City ST CERT - Acquiring Institution + Acquiring Institution Closing Date Updated Date @@ -294,7 +294,7 @@

    Failed Bank List

    Capital Bank, N.A. May 10, 2013 May 14, 2013 - + Douglas County Bank Douglasville @@ -383,7 +383,7 @@

    Failed Bank List

    Sunwest Bank January 11, 2013 January 24, 2013 - + Community Bank of the Ozarks Sunrise Beach @@ -392,7 +392,7 @@

    Failed Bank List

    Bank of Sullivan December 14, 2012 January 24, 2013 - + Hometown Community Bank Braselton @@ -401,7 +401,7 @@

    Failed Bank List

    CertusBank, National Association November 16, 2012 January 24, 2013 - + Citizens First National Bank Princeton @@ -518,7 +518,7 @@

    Failed Bank List

    Metcalf Bank July 20, 2012 December 17, 2012 - + First Cherokee State Bank Woodstock @@ -635,7 +635,7 @@

    Failed Bank List

    Southern States Bank May 18, 2012 May 20, 2013 - + Security Bank, National Association North Lauderdale @@ -644,7 +644,7 @@

    Failed Bank List

    Banesco USA May 4, 2012 October 31, 2012 - + Palm Desert National Bank Palm Desert @@ -734,7 +734,7 @@

    Failed Bank List

    No Acquirer March 9, 2012 October 29, 2012 - + Global Commerce Bank Doraville @@ -752,7 +752,7 @@

    Failed Bank List

    No Acquirer February 24, 2012 December 17, 2012 - + Central Bank of Georgia Ellaville @@ -761,7 +761,7 @@

    Failed Bank List

    Ameris Bank February 24, 2012 August 9, 2012 - + SCB Bank Shelbyville @@ -770,7 +770,7 @@

    Failed Bank List

    First Merchants Bank, National Association February 10, 2012 March 25, 2013 - + Charter National Bank and Trust Hoffman Estates @@ -779,7 +779,7 @@

    Failed Bank List

    Barrington Bank & Trust Company, National Association February 10, 2012 March 25, 2013 - + BankEast Knoxville @@ -788,7 +788,7 @@

    Failed Bank List

    U.S.Bank National Association January 27, 2012 March 8, 2013 - + Patriot Bank Minnesota Forest Lake @@ -797,7 +797,7 @@

    Failed Bank List

    First Resource Bank January 27, 2012 September 12, 2012 - + Tennessee Commerce Bank Franklin @@ -806,7 +806,7 @@

    Failed Bank List

    Republic Bank & Trust Company January 27, 2012 November 20, 2012 - + First Guaranty Bank and Trust Company of Jacksonville Jacksonville @@ -815,7 +815,7 @@

    Failed Bank List

    CenterState Bank of Florida, N.A. January 27, 2012 September 12, 2012 - + American Eagle Savings Bank Boothwyn @@ -824,7 +824,7 @@

    Failed Bank List

    Capital Bank, N.A. January 20, 2012 January 25, 2013 - + The First State Bank Stockbridge @@ -833,7 +833,7 @@

    Failed Bank List

    Hamilton State Bank January 20, 2012 January 25, 2013 - + Central Florida State Bank Belleview @@ -842,7 +842,7 @@

    Failed Bank List

    CenterState Bank of Florida, N.A. January 20, 2012 January 25, 2013 - + Western National Bank Phoenix @@ -869,7 +869,7 @@

    Failed Bank List

    First NBC Bank November 18, 2011 August 13, 2012 - + Polk County Bank Johnston @@ -887,7 +887,7 @@

    Failed Bank List

    Century Bank of Georgia November 10, 2011 August 13, 2012 - + SunFirst Bank Saint George @@ -896,7 +896,7 @@

    Failed Bank List

    Cache Valley Bank November 4, 2011 November 16, 2012 - + Mid City Bank, Inc. Omaha @@ -905,7 +905,7 @@

    Failed Bank List

    Premier Bank November 4, 2011 August 15, 2012 - + All American Bank Des Plaines @@ -914,7 +914,7 @@

    Failed Bank List

    International Bank of Chicago October 28, 2011 August 15, 2012 - + Community Banks of Colorado Greenwood Village @@ -959,7 +959,7 @@

    Failed Bank List

    Blackhawk Bank & Trust October 14, 2011 August 15, 2012 - + First State Bank Cranford @@ -968,7 +968,7 @@

    Failed Bank List

    Northfield Bank October 14, 2011 November 8, 2012 - + Blue Ridge Savings Bank, Inc. Asheville @@ -977,7 +977,7 @@

    Failed Bank List

    Bank of North Carolina October 14, 2011 November 8, 2012 - + Piedmont Community Bank Gray @@ -986,7 +986,7 @@

    Failed Bank List

    State Bank and Trust Company October 14, 2011 January 22, 2013 - + Sun Security Bank Ellington @@ -1202,7 +1202,7 @@

    Failed Bank List

    Ameris Bank July 15, 2011 November 2, 2012 - + One Georgia Bank Atlanta @@ -1247,7 +1247,7 @@

    Failed Bank List

    First American Bank and Trust Company June 24, 2011 November 2, 2012 - + First Commercial Bank of Tampa Bay Tampa @@ -1256,7 +1256,7 @@

    Failed Bank List

    Stonegate Bank June 17, 2011 November 2, 2012 - + McIntosh State Bank Jackson @@ -1265,7 +1265,7 @@

    Failed Bank List

    Hamilton State Bank June 17, 2011 November 2, 2012 - + Atlantic Bank and Trust Charleston @@ -1274,7 +1274,7 @@

    Failed Bank List

    First Citizens Bank and Trust Company, Inc. June 3, 2011 October 31, 2012 - + First Heritage Bank Snohomish @@ -1283,7 +1283,7 @@

    Failed Bank List

    Columbia State Bank May 27, 2011 January 28, 2013 - + Summit Bank Burlington @@ -1292,7 +1292,7 @@

    Failed Bank List

    Columbia State Bank May 20, 2011 January 22, 2013 - + First Georgia Banking Company Franklin @@ -2030,7 +2030,7 @@

    Failed Bank List

    Westamerica Bank August 20, 2010 September 12, 2012 - + Los Padres Bank Solvang @@ -2624,7 +2624,7 @@

    Failed Bank List

    MB Financial Bank, N.A. April 23, 2010 August 23, 2012 - + Amcore Bank, National Association Rockford @@ -2768,7 +2768,7 @@

    Failed Bank List

    First Citizens Bank March 19, 2010 August 23, 2012 - + Bank of Hiawassee Hiawassee @@ -3480,7 +3480,7 @@

    Failed Bank List

    October 2, 2009 August 21, 2012 - + Warren Bank Warren MI @@ -3767,7 +3767,7 @@

    Failed Bank List

    Herring Bank July 31, 2009 August 20, 2012 - + Security Bank of Jones County Gray @@ -3848,7 +3848,7 @@

    Failed Bank List

    California Bank & Trust July 17, 2009 August 20, 2012 - + BankFirst Sioux Falls @@ -4811,7 +4811,7 @@

    Failed Bank List

    Bank of the Orient October 13, 2000 March 17, 2005 - + @@ -4854,7 +4854,7 @@

    Failed Bank List

    @@ -4855,7 +4855,7 @@

    Failed Bank List

    @@ -304,8 +304,8 @@

    這個頁面上的內容需要較新版本的 Adobe Flash Player。

    - + @@ -518,14 +518,14 @@

    這個頁面上的內容需要較新版本的 Adobe Flash Player。

    MIA Geographical Information
    Scope of Service
    - + Slot Application - + Macau Freight Forwarders Cargo Tracking Platform For Rent Airport Capacity - + Airport Characteristics & Traffic Statistics @@ -539,11 +539,11 @@

    這個頁面上的內容需要較新版本的 Adobe Flash Player。

    - + - + @@ -553,3116 +553,3116 @@

    這個頁面上的內容需要較新版本的 Adobe Flash Player。

    Traffic Statistics - Passengers

    - +
    - - + +
    - + Traffic Statistics - - - - + + + +


    Passengers Figure(2008-2013)

    - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - +
      201320122011201020092008
    January - + 374,917 - + 362,379 - + 301,503 - + 358,902 - + 342,323 - + 420,574
    February - + 393,152 - + 312,405 - + 301,259 - + 351,654 - + 297,755 - + 442,809
    March - + 408,755 - + 334,000 - + 318,908 - + 360,365 - + 387,879 - + 468,540
    April - + 408,860 - + 358,198 - + 339,060 - + 352,976 - + 400,553 - + 492,930
    May - + 374,397 - + 329,218 - + 321,060 - + 330,407 - + 335,967 - + 465,045
    June - + 401,995 - + 356,679 - + 343,006 - + 326,724 - + 296,748 - + 426,764
    July - - + + - + 423,081 - + 378,993 - + 356,580 - + 351,110 - + 439,425
    August - - + + - + 453,391 - + 395,883 - + 364,011 - + 404,076 - + 425,814
    September - - + + - + 384,887 - + 325,124 - + 308,940 - + 317,226 - + 379,898
    October - - + + - + 383,889 - + 333,102 - + 317,040 - + 355,935 - + 415,339
    November - - + + - + 379,065 - + 327,803 - + 303,186 - + 372,104 - + 366,411
    December - - + + - + 413,873 - + 359,313 - + 348,051 - + 388,573 - + 354,253
    Total - + 2,362,076 - + 4,491,065 - + 4,045,014 - + 4,078,836 - + 4,250,249 - + 5,097,802
    - +


    Passengers Figure(2002-2007)

    - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - +
      200720062005200420032002
    January - + 381,887 - + 323,282 - + 289,701 - + 288,507 - + 290,140 - + 268,783
    February - + 426,014 - + 360,820 - + 348,723 - + 207,710 - + 323,264 - + 323,654
    March - + 443,805 - + 389,125 - + 321,953 - + 273,910 - + 295,052 - + 360,668
    April - + 500,917 - + 431,550 - + 367,976 - + 324,931 - + 144,082 - + 380,648
    May - + 468,637 - + 399,743 - + 359,298 - + 250,601 - + 47,333 - + 359,547
    June - + 463,676 - + 393,713 - + 360,147 - + 296,000 - + 94,294 - + 326,508
    July - + 490,404 - + 465,497 - + 413,131 - + 365,454 - + 272,784 - + 388,061
    August - + 490,830 - + 478,474 - + 409,281 - + 372,802 - + 333,840 - + 384,719
    September - + 446,594 - + 412,444 - + 354,751 - + 321,456 - + 295,447 - + 334,029
    October - + 465,757 - + 461,215 - + 390,435 - + 358,362 - + 291,193 - + 372,706
    November - + 455,132 - + 425,116 - + 323,347 - + 327,593 - + 268,282 - + 350,324
    December - + 465,225 - + 435,114 - + 308,999 - + 326,933 - + 249,855 - + 322,056
    Total - + 5,498,878 - + 4,976,093 - + 4,247,742 - + 3,714,259 - + 2,905,566 - + 4,171,703
    - +


    Passengers Figure(1996-2001)

    - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + + + + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - +
      200120001999199819971996
    January - + 265,603 - + 184,381 - + 161,264 - + 161,432 - + 117,984 - - + +
    February - + 249,259 - + 264,066 - + 209,569 - + 168,777 - + 150,772 - - + +
    March - + 312,319 - + 226,483 - + 186,965 - + 172,060 - + 149,795 - - + +
    April - + 351,793 - + 296,541 - + 237,449 - + 180,241 - + 179,049 - - -
    May - + 338,692 - + 288,949 - + 230,691 - + 172,391 - + 189,925 - - + +
    June - + 332,630 - + 271,181 - + 231,328 - + 157,519 - + 175,402 - - + +
    July - + 344,658 - + 304,276 - + 243,534 - + 205,595 - + 173,103 - - + +
    August - + 360,899 - + 300,418 - + 257,616 - + 241,140 - + 178,118 - - + +
    September - + 291,817 - + 280,803 - + 210,885 - + 183,954 - + 163,385 - - + +
    October - + 327,232 - + 298,873 - + 231,251 - + 205,726 - + 176,879 - - + +
    November - + 315,538 - + 265,528 - + 228,637 - + 181,677 - + 146,804 - - + +
    December - + 314,866 - + 257,929 - + 210,922 - + 183,975 - + 151,362 - - + +
    Total - + 3,805,306 - + 3,239,428 - + 2,640,111 - + 2,214,487 - + 1,952,578 - + 0
    - +


    Passengers Figure(1995-1995)

    - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - +
    [rendered tail of a diff to an HTML traffic-statistics test fixture; every
    row below appears as a remove/re-add pair with identical cell values, so the
    changes are markup/indentation only. The recovered tables:]

    Passenger Statistics (1995, continued)

    Month        1995
    January      -
    February     -
    March        -
    April        -
    May          -
    June         -
    July         -
    August       -
    September    -
    October      -
    November     6,601
    December     37,041
    Total        43,642

    [passenger statistics chart]

    Movement Statistics (2008-2013)

    Month        2013     2012     2011     2010     2009     2008
    January      3,925    3,463    3,289    3,184    3,488    4,568
    February     3,632    2,983    2,902    3,053    3,347    4,527
    March        3,909    3,166    3,217    3,175    3,636    4,594
    April        3,903    3,258    3,146    3,023    3,709    4,574
    May          4,075    3,234    3,266    3,033    3,603    4,511
    June         4,038    3,272    3,316    2,909    3,057    4,081
    July         -        3,661    3,359    3,062    3,354    4,215
    August       -        3,942    3,417    3,077    3,395    4,139
    September    -        3,703    3,169    3,095    3,100    3,752
    October      -        3,727    3,469    3,179    3,375    3,874
    November     -        3,722    3,145    3,159    3,213    3,567
    December     -        3,866    3,251    3,199    3,324    3,362
    Total        23,482   41,997   38,946   37,148   40,601   49,764

    Movement Statistics (2002-2007)

    Month        2007     2006     2005     2004     2003     2002
    January      4,384    3,933    3,528    3,051    3,257    2,711
    February     4,131    3,667    3,331    2,372    3,003    2,747
    March        4,349    4,345    3,549    3,049    3,109    2,985
    April        4,460    4,490    3,832    3,359    2,033    2,928
    May          4,629    4,245    3,663    3,251    1,229    3,109
    June         4,365    4,124    3,752    3,414    1,217    3,049
    July         4,612    4,386    3,876    3,664    2,423    3,078
    August       4,446    4,373    3,987    3,631    3,040    3,166
    September    4,414    4,311    3,782    3,514    2,809    3,239
    October      4,445    4,455    3,898    3,744    3,052    3,562
    November     4,563    4,285    3,951    3,694    3,125    3,546
    December     4,588    4,435    3,855    3,763    2,996    3,444
    Total        53,386   51,049   45,004   40,506   31,293   37,564

    Movement Statistics (1996-2001)

    Month        2001     2000     1999     1998     1997     1996
    January      2,694    2,201    1,835    2,177    1,353    744
    February     2,364    2,357    1,826    1,740    1,339    692
    March        2,543    2,206    1,895    1,911    1,533    872
    April        2,531    2,311    2,076    1,886    1,587    1,026
    May          2,579    2,383    1,914    2,102    1,720    1,115
    June         2,681    2,370    1,890    2,038    1,716    1,037
    July         2,903    2,609    1,916    2,078    1,693    1,209
    August       3,037    2,487    1,968    2,061    1,676    1,241
    September    2,767    2,329    1,955    1,970    1,681    1,263
    October      2,922    2,417    2,267    1,969    1,809    1,368
    November     2,670    2,273    2,132    2,102    1,786    1,433
    December     2,815    2,749    2,187    1,981    1,944    1,386
    Total        32,506   28,692   23,861   24,015   19,837   13,386

    Movement Statistics (1995)

    Month        1995
    January      -
    February     -
    March        -
    April        -
    May          -
    June         -
    July         -
    August       -
    September    -
    October      -
    November     126
    December     536
    Total        662

    [passenger statistics chart]

    @@ -3676,11 +3676,11 @@ Traffic Statistics - Passengers
    [markup-only change around the page's SEARCH control]
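    Tables like these are exactly what pandas.read_html is exercised against. As
    a minimal sketch of how such a fixture is consumed (the HTML snippet and
    values here are illustrative, not taken from the fixture itself):

    import pandas as pd

    # read_html needs an HTML parser installed (lxml, or BeautifulSoup4 with
    # html5lib); the snippet below is a toy stand-in for the fixture above.
    html = """
    <table>
      <tr><th>Month</th><th>2013</th><th>2012</th></tr>
      <tr><td>January</td><td>3,925</td><td>3,463</td></tr>
      <tr><td>February</td><td>3,632</td><td>2,983</td></tr>
    </table>
    """

    # read_html returns one DataFrame per <table> found; thousands="," (the
    # default) strips the separators so the year columns parse as integers.
    tables = pd.read_html(html, thousands=",")
    print(tables[0])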
    diff --git a/pandas/tests/io/data/spam.html b/pandas/tests/io/data/spam.html index e4fadab6eafd2..a8e445ff1e176 100644 --- a/pandas/tests/io/data/spam.html +++ b/pandas/tests/io/data/spam.html
    [rendered hunks of markup/indentation-only changes to the spam.html fixture:
    a USDA "National Nutrient Database for Standard Reference, Release 25" basic
    report for item 07908, "Luncheon meat, pork with ham, minced, canned,
    includes SPAM (Hormel)", listing nutrient/unit values per 100.0 g and per
    1 NLEA serving (56 g); no cell values change]
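    The next hunks replace a per-module compression helper with the shared
    tm.write_to_compressed added by this PR. A sketch of the round trip the
    refactored test relies on (file name and data are illustrative):

    import pandas as pd
    import pandas.util.testing as tm

    data = b"a,b\n1,2\n3,4"

    # write_to_compressed(kind, path, data) supports the same "zip", "gzip",
    # "bz2" and "xz" kinds as the removed local helper shown below.
    with tm.ensure_clean("test.csv.gz") as path:
        tm.write_to_compressed("gzip", path, data)
        result = pd.read_csv(path, compression="gzip")

    print(result)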
    56g compression_method - compression_mappings = { - "zip": zipfile.ZipFile, - "gzip": gzip.GzipFile, - "bz2": bz2.BZ2File, - "xz": lzma_file(), - } - - compress_method = compression_mappings[compress_type] - - if compress_type == "zip": - mode = "w" - args = (dest, data) - method = "writestr" - else: - mode = "wb" - args = (data,) - method = "write" - - with compress_method(path, mode=mode) as f: - getattr(f, method)(*args) - - @pytest.fixture(params=[True, False]) def buffer(request): return request.param @@ -154,7 +96,7 @@ def test_compression(parser_and_data, compression_only, buffer, filename): "buffer of compressed data.") with tm.ensure_clean(filename=filename) as path: - write_to_compressed(compress_type, path, data) + tm.write_to_compressed(compress_type, path, data) compression = "infer" if filename else compress_type if buffer: diff --git a/pandas/tests/io/parser/test_dtypes.py b/pandas/tests/io/parser/test_dtypes.py index 17cd0ab16ea61..caa03fc3685f6 100644 --- a/pandas/tests/io/parser/test_dtypes.py +++ b/pandas/tests/io/parser/test_dtypes.py @@ -324,6 +324,22 @@ def test_categorical_coerces_timedelta(all_parsers): tm.assert_frame_equal(result, expected) +@pytest.mark.parametrize("data", [ + "b\nTrue\nFalse\nNA\nFalse", + "b\ntrue\nfalse\nNA\nfalse", + "b\nTRUE\nFALSE\nNA\nFALSE", + "b\nTrue\nFalse\nNA\nFALSE", +]) +def test_categorical_dtype_coerces_boolean(all_parsers, data): + # see gh-20498 + parser = all_parsers + dtype = {"b": CategoricalDtype([False, True])} + expected = DataFrame({"b": Categorical([True, False, None, False])}) + + result = parser.read_csv(StringIO(data), dtype=dtype) + tm.assert_frame_equal(result, expected) + + def test_categorical_unexpected_categories(all_parsers): parser = all_parsers dtype = {"b": CategoricalDtype(["a", "b", "d", "e"])} diff --git a/pandas/tests/io/parser/test_header.py b/pandas/tests/io/parser/test_header.py index 47b13ae6c50b1..38f4cc42357fa 100644 --- a/pandas/tests/io/parser/test_header.py +++ b/pandas/tests/io/parser/test_header.py @@ -236,7 +236,7 @@ def test_header_multi_index_common_format_malformed1(all_parsers): columns=MultiIndex(levels=[[u("a"), u("b"), u("c")], [u("r"), u("s"), u("t"), u("u"), u("v")]], - labels=[[0, 0, 1, 2, 2], [0, 1, 2, 3, 4]], + codes=[[0, 0, 1, 2, 2], [0, 1, 2, 3, 4]], names=[u("a"), u("q")])) data = """a,a,a,b,c,c q,r,s,t,u,v @@ -255,7 +255,7 @@ def test_header_multi_index_common_format_malformed2(all_parsers): columns=MultiIndex(levels=[[u("a"), u("b"), u("c")], [u("r"), u("s"), u("t"), u("u"), u("v")]], - labels=[[0, 0, 1, 2, 2], [0, 1, 2, 3, 4]], + codes=[[0, 0, 1, 2, 2], [0, 1, 2, 3, 4]], names=[None, u("q")])) data = """,a,a,b,c,c @@ -272,10 +272,10 @@ def test_header_multi_index_common_format_malformed3(all_parsers): expected = DataFrame(np.array( [[3, 4, 5, 6], [9, 10, 11, 12]], dtype="int64"), index=MultiIndex(levels=[[1, 7], [2, 8]], - labels=[[0, 1], [0, 1]]), + codes=[[0, 1], [0, 1]]), columns=MultiIndex(levels=[[u("a"), u("b"), u("c")], [u("s"), u("t"), u("u"), u("v")]], - labels=[[0, 1, 2, 2], [0, 1, 2, 3]], + codes=[[0, 1, 2, 2], [0, 1, 2, 3]], names=[None, u("q")])) data = """,a,a,b,c,c q,r,s,t,u,v diff --git a/pandas/tests/io/parser/test_index_col.py b/pandas/tests/io/parser/test_index_col.py index 8c2de40b46114..6421afba18f94 100644 --- a/pandas/tests/io/parser/test_index_col.py +++ b/pandas/tests/io/parser/test_index_col.py @@ -148,5 +148,5 @@ def test_multi_index_naming_not_all_at_beginning(all_parsers): expected = DataFrame({"Unnamed: 2": ["c", "d", "c", "d"]}, 
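# Aside on the test_dtypes.py hunk above (gh-20498): declaring a column as
# CategoricalDtype([False, True]) makes read_csv coerce string booleans into
# those categories. A minimal sketch mirroring the new test's data:
import pandas as pd
from pandas.api.types import CategoricalDtype
from pandas.compat import StringIO

data = "b\nTrue\nFalse\nNA\nFalse"
result = pd.read_csv(StringIO(data),
                     dtype={"b": CategoricalDtype([False, True])})
print(result["b"])  # Categorical: [True, False, NaN, False]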
index=MultiIndex( levels=[['a', 'b'], [1, 2, 3, 4]], - labels=[[0, 0, 1, 1], [0, 1, 2, 3]])) + codes=[[0, 0, 1, 1], [0, 1, 2, 3]])) tm.assert_frame_equal(result, expected) diff --git a/pandas/tests/io/parser/test_na_values.py b/pandas/tests/io/parser/test_na_values.py index 921984bc44e50..1b6d2ee8a062e 100644 --- a/pandas/tests/io/parser/test_na_values.py +++ b/pandas/tests/io/parser/test_na_values.py @@ -421,3 +421,21 @@ def test_na_values_with_dtype_str_and_na_filter(all_parsers, na_filter): result = parser.read_csv(StringIO(data), na_filter=na_filter, dtype=str) tm.assert_frame_equal(result, expected) + + +@pytest.mark.parametrize("data, na_values", [ + ("false,1\n,1\ntrue", None), + ("false,1\nnull,1\ntrue", None), + ("false,1\nnan,1\ntrue", None), + ("false,1\nfoo,1\ntrue", 'foo'), + ("false,1\nfoo,1\ntrue", ['foo']), + ("false,1\nfoo,1\ntrue", {'a': 'foo'}), +]) +def test_cast_NA_to_bool_raises_error(all_parsers, data, na_values): + parser = all_parsers + msg = ("(Bool column has NA values in column [0a])|" + "(cannot safely convert passed user dtype of " + "bool for object dtyped data in column 0)") + with pytest.raises(ValueError, match=msg): + parser.read_csv(StringIO(data), header=None, names=['a', 'b'], + dtype={'a': 'bool'}, na_values=na_values) diff --git a/pandas/tests/io/parser/test_parse_dates.py b/pandas/tests/io/parser/test_parse_dates.py index e70ae03e007ee..ffc8af09bf239 100644 --- a/pandas/tests/io/parser/test_parse_dates.py +++ b/pandas/tests/io/parser/test_parse_dates.py @@ -840,9 +840,9 @@ def test_parse_timezone(all_parsers): 2018-01-04 09:05:00+09:00,23400""" result = parser.read_csv(StringIO(data), parse_dates=["dt"]) - dti = pd.DatetimeIndex(start="2018-01-04 09:01:00", - end="2018-01-04 09:05:00", freq="1min", - tz=pytz.FixedOffset(540)) + dti = pd.date_range(start="2018-01-04 09:01:00", + end="2018-01-04 09:05:00", freq="1min", + tz=pytz.FixedOffset(540)) expected_data = {"dt": dti, "val": [23350, 23400, 23400, 23400, 23400]} expected = DataFrame(expected_data) diff --git a/pandas/tests/io/parser/test_parsers.py b/pandas/tests/io/parser/test_parsers.py deleted file mode 100644 index 11389a943bea2..0000000000000 --- a/pandas/tests/io/parser/test_parsers.py +++ /dev/null @@ -1,143 +0,0 @@ -# -*- coding: utf-8 -*- - -import os - -import pytest - -from pandas._libs.tslib import Timestamp -from pandas.compat import StringIO -from pandas.errors import AbstractMethodError - -from pandas import DataFrame, read_csv, read_table -import pandas.util.testing as tm - -from .common import ParserTests -from .python_parser_only import PythonParserTests -from .quoting import QuotingTests -from .usecols import UsecolsTests - - -class BaseParser(ParserTests, UsecolsTests, - QuotingTests): - - def read_csv(self, *args, **kwargs): - raise NotImplementedError - - def read_table(self, *args, **kwargs): - raise NotImplementedError - - def float_precision_choices(self): - raise AbstractMethodError(self) - - @pytest.fixture(autouse=True) - def setup_method(self, datapath): - self.dirpath = datapath('io', 'parser', 'data') - self.csv1 = os.path.join(self.dirpath, 'test1.csv') - self.csv2 = os.path.join(self.dirpath, 'test2.csv') - self.xls1 = os.path.join(self.dirpath, 'test.xls') - self.csv_shiftjs = os.path.join(self.dirpath, 'sauron.SHIFT_JIS.csv') - - -class TestCParserHighMemory(BaseParser): - engine = 'c' - low_memory = False - float_precision_choices = [None, 'high', 'round_trip'] - - def read_csv(self, *args, **kwds): - kwds = kwds.copy() - kwds['engine'] = self.engine - 
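# Aside on the test_parse_dates.py hunk above: DatetimeIndex(start=..., end=...)
# was deprecated in favor of pd.date_range, which builds the same regular,
# tz-aware index. Sketch with the test's fixed +09:00 offset:
import pandas as pd
import pytz

dti = pd.date_range(start="2018-01-04 09:01:00", end="2018-01-04 09:05:00",
                    freq="1min", tz=pytz.FixedOffset(540))
print(dti)  # five timestamps, one minute apart, at UTC+09:00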
kwds['low_memory'] = self.low_memory - return read_csv(*args, **kwds) - - def read_table(self, *args, **kwds): - kwds = kwds.copy() - kwds['engine'] = self.engine - kwds['low_memory'] = self.low_memory - with tm.assert_produces_warning(FutureWarning): - df = read_table(*args, **kwds) - return df - - -class TestCParserLowMemory(BaseParser): - engine = 'c' - low_memory = True - float_precision_choices = [None, 'high', 'round_trip'] - - def read_csv(self, *args, **kwds): - kwds = kwds.copy() - kwds['engine'] = self.engine - kwds['low_memory'] = self.low_memory - return read_csv(*args, **kwds) - - def read_table(self, *args, **kwds): - kwds = kwds.copy() - kwds['engine'] = self.engine - kwds['low_memory'] = True - with tm.assert_produces_warning(FutureWarning): - df = read_table(*args, **kwds) - return df - - -class TestPythonParser(BaseParser, PythonParserTests): - engine = 'python' - float_precision_choices = [None] - - def read_csv(self, *args, **kwds): - kwds = kwds.copy() - kwds['engine'] = self.engine - return read_csv(*args, **kwds) - - def read_table(self, *args, **kwds): - kwds = kwds.copy() - kwds['engine'] = self.engine - with tm.assert_produces_warning(FutureWarning): - df = read_table(*args, **kwds) - return df - - -class TestUnsortedUsecols(object): - def test_override__set_noconvert_columns(self): - # GH 17351 - usecols needs to be sorted in _setnoconvert_columns - # based on the test_usecols_with_parse_dates test from usecols.py - from pandas.io.parsers import CParserWrapper, TextFileReader - - s = """a,b,c,d,e - 0,1,20140101,0900,4 - 0,1,20140102,1000,4""" - - parse_dates = [[1, 2]] - cols = { - 'a': [0, 0], - 'c_d': [ - Timestamp('2014-01-01 09:00:00'), - Timestamp('2014-01-02 10:00:00') - ] - } - expected = DataFrame(cols, columns=['c_d', 'a']) - - class MyTextFileReader(TextFileReader): - def __init__(self): - self._currow = 0 - self.squeeze = False - - class MyCParserWrapper(CParserWrapper): - def _set_noconvert_columns(self): - if self.usecols_dtype == 'integer': - # self.usecols is a set, which is documented as unordered - # but in practice, a CPython set of integers is sorted. - # In other implementations this assumption does not hold. - # The following code simulates a different order, which - # before GH 17351 would cause the wrong columns to be - # converted via the parse_dates parameter - self.usecols = list(self.usecols) - self.usecols.reverse() - return CParserWrapper._set_noconvert_columns(self) - - parser = MyTextFileReader() - parser.options = {'usecols': [0, 2, 3], - 'parse_dates': parse_dates, - 'delimiter': ','} - parser._engine = MyCParserWrapper(StringIO(s), **parser.options) - df = parser.read() - - tm.assert_frame_equal(df, expected) diff --git a/pandas/tests/io/parser/test_python_parser_only.py b/pandas/tests/io/parser/test_python_parser_only.py new file mode 100644 index 0000000000000..d5a7e3549ef0f --- /dev/null +++ b/pandas/tests/io/parser/test_python_parser_only.py @@ -0,0 +1,303 @@ +# -*- coding: utf-8 -*- + +""" +Tests that apply specifically to the Python parser. Unless specifically +stated as a Python-specific issue, the goal is to eventually move as many of +these tests out of this module as soon as the C parser can accept further +arguments when parsing. 
+""" + +import csv +import sys + +import pytest + +import pandas.compat as compat +from pandas.compat import BytesIO, StringIO, u +from pandas.errors import ParserError + +from pandas import DataFrame, Index, MultiIndex +import pandas.util.testing as tm + + +def test_default_separator(python_parser_only): + # see gh-17333 + # + # csv.Sniffer in Python treats "o" as separator. + data = "aob\n1o2\n3o4" + parser = python_parser_only + expected = DataFrame({"a": [1, 3], "b": [2, 4]}) + + result = parser.read_csv(StringIO(data), sep=None) + tm.assert_frame_equal(result, expected) + + +@pytest.mark.parametrize("skipfooter", ["foo", 1.5, True]) +def test_invalid_skipfooter_non_int(python_parser_only, skipfooter): + # see gh-15925 (comment) + data = "a\n1\n2" + parser = python_parser_only + msg = "skipfooter must be an integer" + + with pytest.raises(ValueError, match=msg): + parser.read_csv(StringIO(data), skipfooter=skipfooter) + + +def test_invalid_skipfooter_negative(python_parser_only): + # see gh-15925 (comment) + data = "a\n1\n2" + parser = python_parser_only + msg = "skipfooter cannot be negative" + + with pytest.raises(ValueError, match=msg): + parser.read_csv(StringIO(data), skipfooter=-1) + + +@pytest.mark.parametrize("kwargs", [ + dict(sep=None), + dict(delimiter="|") +]) +def test_sniff_delimiter(python_parser_only, kwargs): + data = """index|A|B|C +foo|1|2|3 +bar|4|5|6 +baz|7|8|9 +""" + parser = python_parser_only + result = parser.read_csv(StringIO(data), index_col=0, **kwargs) + expected = DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], + columns=["A", "B", "C"], + index=Index(["foo", "bar", "baz"], name="index")) + tm.assert_frame_equal(result, expected) + + +@pytest.mark.parametrize("encoding", [None, "utf-8"]) +def test_sniff_delimiter_encoding(python_parser_only, encoding): + parser = python_parser_only + data = """ignore this +ignore this too +index|A|B|C +foo|1|2|3 +bar|4|5|6 +baz|7|8|9 +""" + + if encoding is not None: + data = u(data).encode(encoding) + data = BytesIO(data) + + if compat.PY3: + from io import TextIOWrapper + data = TextIOWrapper(data, encoding=encoding) + else: + data = StringIO(data) + + result = parser.read_csv(data, index_col=0, sep=None, + skiprows=2, encoding=encoding) + expected = DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], + columns=["A", "B", "C"], + index=Index(["foo", "bar", "baz"], name="index")) + tm.assert_frame_equal(result, expected) + + +def test_single_line(python_parser_only): + # see gh-6607: sniff separator + parser = python_parser_only + result = parser.read_csv(StringIO("1,2"), names=["a", "b"], + header=None, sep=None) + + expected = DataFrame({"a": [1], "b": [2]}) + tm.assert_frame_equal(result, expected) + + +@pytest.mark.parametrize("kwargs", [dict(skipfooter=2), dict(nrows=3)]) +def test_skipfooter(python_parser_only, kwargs): + # see gh-6607 + data = """A,B,C +1,2,3 +4,5,6 +7,8,9 +want to skip this +also also skip this +""" + parser = python_parser_only + result = parser.read_csv(StringIO(data), **kwargs) + + expected = DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], + columns=["A", "B", "C"]) + tm.assert_frame_equal(result, expected) + + +@pytest.mark.parametrize("compression,klass", [ + ("gzip", "GzipFile"), + ("bz2", "BZ2File"), +]) +def test_decompression_regex_sep(python_parser_only, csv1, compression, klass): + # see gh-6607 + parser = python_parser_only + + with open(csv1, "rb") as f: + data = f.read() + + data = data.replace(b",", b"::") + expected = parser.read_csv(csv1) + + module = pytest.importorskip(compression) + klass = 
getattr(module, klass) + + with tm.ensure_clean() as path: + tmp = klass(path, mode="wb") + tmp.write(data) + tmp.close() + + result = parser.read_csv(path, sep="::", + compression=compression) + tm.assert_frame_equal(result, expected) + + +def test_read_csv_buglet_4x_multi_index(python_parser_only): + # see gh-6607 + data = """ A B C D E +one two three four +a b 10.0032 5 -0.5109 -2.3358 -0.4645 0.05076 0.3640 +a q 20 4 0.4473 1.4152 0.2834 1.00661 0.1744 +x q 30 3 -0.6662 -0.5243 -0.3580 0.89145 2.5838""" + parser = python_parser_only + + expected = DataFrame([[-0.5109, -2.3358, -0.4645, 0.05076, 0.3640], + [0.4473, 1.4152, 0.2834, 1.00661, 0.1744], + [-0.6662, -0.5243, -0.3580, 0.89145, 2.5838]], + columns=["A", "B", "C", "D", "E"], + index=MultiIndex.from_tuples([ + ("a", "b", 10.0032, 5), + ("a", "q", 20, 4), + ("x", "q", 30, 3), + ], names=["one", "two", "three", "four"])) + result = parser.read_csv(StringIO(data), sep=r"\s+") + tm.assert_frame_equal(result, expected) + + +def test_read_csv_buglet_4x_multi_index2(python_parser_only): + # see gh-6893 + data = " A B C\na b c\n1 3 7 0 3 6\n3 1 4 1 5 9" + parser = python_parser_only + + expected = DataFrame.from_records( + [(1, 3, 7, 0, 3, 6), (3, 1, 4, 1, 5, 9)], + columns=list("abcABC"), index=list("abc")) + result = parser.read_csv(StringIO(data), sep=r"\s+") + tm.assert_frame_equal(result, expected) + + +@pytest.mark.parametrize("add_footer", [True, False]) +def test_skipfooter_with_decimal(python_parser_only, add_footer): + # see gh-6971 + data = "1#2\n3#4" + parser = python_parser_only + expected = DataFrame({"a": [1.2, 3.4]}) + + if add_footer: + # The stray footer line should not mess with the + # casting of the first two lines if we skip it. + kwargs = dict(skipfooter=1) + data += "\nFooter" + else: + kwargs = dict() + + result = parser.read_csv(StringIO(data), names=["a"], + decimal="#", **kwargs) + tm.assert_frame_equal(result, expected) + + +@pytest.mark.parametrize("sep", ["::", "#####", "!!!", "123", "#1!c5", + "%!c!d", "@@#4:2", "_!pd#_"]) +@pytest.mark.parametrize("encoding", ["utf-16", "utf-16-be", "utf-16-le", + "utf-32", "cp037"]) +def test_encoding_non_utf8_multichar_sep(python_parser_only, sep, encoding): + # see gh-3404 + expected = DataFrame({"a": [1], "b": [2]}) + parser = python_parser_only + + data = "1" + sep + "2" + encoded_data = data.encode(encoding) + + result = parser.read_csv(BytesIO(encoded_data), sep=sep, + names=["a", "b"], encoding=encoding) + tm.assert_frame_equal(result, expected) + + +@pytest.mark.parametrize("quoting", [csv.QUOTE_MINIMAL, csv.QUOTE_NONE]) +def test_multi_char_sep_quotes(python_parser_only, quoting): + # see gh-13374 + kwargs = dict(sep=",,") + parser = python_parser_only + + data = 'a,,b\n1,,a\n2,,"2,,b"' + msg = "ignored when a multi-char delimiter is used" + + def fail_read(): + with pytest.raises(ParserError, match=msg): + parser.read_csv(StringIO(data), quoting=quoting, **kwargs) + + if quoting == csv.QUOTE_NONE: + # We expect no match, so there should be an assertion + # error out of the inner context manager. + with pytest.raises(AssertionError): + fail_read() + else: + fail_read() + + +@tm.capture_stderr +def test_none_delimiter(python_parser_only): + # see gh-13374 and gh-17465 + parser = python_parser_only + data = "a,b,c\n0,1,2\n3,4,5,6\n7,8,9" + expected = DataFrame({"a": [0, 7], "b": [1, 8], "c": [2, 9]}) + + # We expect the third line in the data to be + # skipped because it is malformed, but we do + # not expect any errors to occur. 
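# Aside on test_skipfooter_with_decimal above: the Python engine applies the
# custom decimal marker only after skipfooter has trimmed the stray last line.
# Minimal sketch:
import pandas as pd
from pandas.compat import StringIO

data = "1#2\n3#4\nFooter"
result = pd.read_csv(StringIO(data), names=["a"], decimal="#",
                     skipfooter=1, engine="python")
print(result["a"].tolist())  # [1.2, 3.4]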
+ result = parser.read_csv(StringIO(data), header=0, + sep=None, warn_bad_lines=True, + error_bad_lines=False) + tm.assert_frame_equal(result, expected) + + warning = sys.stderr.getvalue() + assert "Skipping line 3" in warning + + +@pytest.mark.parametrize("data", [ + 'a\n1\n"b"a', 'a,b,c\ncat,foo,bar\ndog,foo,"baz']) +@pytest.mark.parametrize("skipfooter", [0, 1]) +def test_skipfooter_bad_row(python_parser_only, data, skipfooter): + # see gh-13879 and gh-15910 + msg = "parsing errors in the skipped footer rows" + parser = python_parser_only + + def fail_read(): + with pytest.raises(ParserError, match=msg): + parser.read_csv(StringIO(data), skipfooter=skipfooter) + + if skipfooter: + fail_read() + else: + # We expect no match, so there should be an assertion + # error out of the inner context manager. + with pytest.raises(AssertionError): + fail_read() + + +def test_malformed_skipfooter(python_parser_only): + parser = python_parser_only + data = """ignore +A,B,C +1,2,3 # comment +1,2,3,4,5 +2,3,4 +footer +""" + msg = "Expected 3 fields in line 4, saw 5" + with pytest.raises(ParserError, match=msg): + parser.read_csv(StringIO(data), header=1, + comment="#", skipfooter=1) diff --git a/pandas/tests/io/parser/test_quoting.py b/pandas/tests/io/parser/test_quoting.py new file mode 100644 index 0000000000000..b33a1b8448bea --- /dev/null +++ b/pandas/tests/io/parser/test_quoting.py @@ -0,0 +1,158 @@ +# -*- coding: utf-8 -*- + +""" +Tests that quoting specifications are properly handled +during parsing for all of the parsers defined in parsers.py +""" + +import csv + +import pytest + +from pandas.compat import PY2, StringIO, u +from pandas.errors import ParserError + +from pandas import DataFrame +import pandas.util.testing as tm + + +@pytest.mark.parametrize("kwargs,msg", [ + (dict(quotechar="foo"), '"quotechar" must be a(n)? 
1-character string'), + (dict(quotechar=None, quoting=csv.QUOTE_MINIMAL), + "quotechar must be set if quoting enabled"), + (dict(quotechar=2), '"quotechar" must be string, not int') +]) +def test_bad_quote_char(all_parsers, kwargs, msg): + data = "1,2,3" + parser = all_parsers + + with pytest.raises(TypeError, match=msg): + parser.read_csv(StringIO(data), **kwargs) + + +@pytest.mark.parametrize("quoting,msg", [ + ("foo", '"quoting" must be an integer'), + (5, 'bad "quoting" value'), # quoting must be in the range [0, 3] +]) +def test_bad_quoting(all_parsers, quoting, msg): + data = "1,2,3" + parser = all_parsers + + with pytest.raises(TypeError, match=msg): + parser.read_csv(StringIO(data), quoting=quoting) + + +def test_quote_char_basic(all_parsers): + parser = all_parsers + data = 'a,b,c\n1,2,"cat"' + expected = DataFrame([[1, 2, "cat"]], + columns=["a", "b", "c"]) + + result = parser.read_csv(StringIO(data), quotechar='"') + tm.assert_frame_equal(result, expected) + + +@pytest.mark.parametrize("quote_char", ["~", "*", "%", "$", "@", "P"]) +def test_quote_char_various(all_parsers, quote_char): + parser = all_parsers + expected = DataFrame([[1, 2, "cat"]], + columns=["a", "b", "c"]) + + data = 'a,b,c\n1,2,"cat"' + new_data = data.replace('"', quote_char) + + result = parser.read_csv(StringIO(new_data), quotechar=quote_char) + tm.assert_frame_equal(result, expected) + + +@pytest.mark.parametrize("quoting", [csv.QUOTE_MINIMAL, csv.QUOTE_NONE]) +@pytest.mark.parametrize("quote_char", ["", None]) +def test_null_quote_char(all_parsers, quoting, quote_char): + kwargs = dict(quotechar=quote_char, quoting=quoting) + data = "a,b,c\n1,2,3" + parser = all_parsers + + if quoting != csv.QUOTE_NONE: + # Sanity checking. + msg = "quotechar must be set if quoting enabled" + + with pytest.raises(TypeError, match=msg): + parser.read_csv(StringIO(data), **kwargs) + else: + expected = DataFrame([[1, 2, 3]], columns=["a", "b", "c"]) + result = parser.read_csv(StringIO(data), **kwargs) + tm.assert_frame_equal(result, expected) + + +@pytest.mark.parametrize("kwargs,exp_data", [ + (dict(), [[1, 2, "foo"]]), # Test default. + + # QUOTE_MINIMAL only applies to CSV writing, so no effect on reading. + (dict(quotechar='"', quoting=csv.QUOTE_MINIMAL), [[1, 2, "foo"]]), + + # QUOTE_MINIMAL only applies to CSV writing, so no effect on reading. + (dict(quotechar='"', quoting=csv.QUOTE_ALL), [[1, 2, "foo"]]), + + # QUOTE_NONE tells the reader to do no special handling + # of quote characters and leave them alone. 
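# A compact illustration of the QUOTE_* cases being parametrized here:
# QUOTE_NONE leaves quote characters in the parsed values, while
# QUOTE_NONNUMERIC makes the reader cast every unquoted field to float. Sketch:
import csv
import pandas as pd
from pandas.compat import StringIO

data = '1,2,"foo"'
raw = pd.read_csv(StringIO(data), names=list("abc"), quoting=csv.QUOTE_NONE)
print(raw.loc[0, "c"])  # '"foo"' -- quotes kept verbatim

nums = pd.read_csv(StringIO(data), names=list("abc"),
                   quoting=csv.QUOTE_NONNUMERIC)
print(nums.dtypes)  # a and b parsed as float64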
+ (dict(quotechar='"', quoting=csv.QUOTE_NONE), [[1, 2, '"foo"']]), + + # QUOTE_NONNUMERIC tells the reader to cast + # all non-quoted fields to float + (dict(quotechar='"', quoting=csv.QUOTE_NONNUMERIC), [[1.0, 2.0, "foo"]]) +]) +def test_quoting_various(all_parsers, kwargs, exp_data): + data = '1,2,"foo"' + parser = all_parsers + columns = ["a", "b", "c"] + + result = parser.read_csv(StringIO(data), names=columns, **kwargs) + expected = DataFrame(exp_data, columns=columns) + tm.assert_frame_equal(result, expected) + + +@pytest.mark.parametrize("doublequote,exp_data", [ + (True, [[3, '4 " 5']]), + (False, [[3, '4 " 5"']]), +]) +def test_double_quote(all_parsers, doublequote, exp_data): + parser = all_parsers + data = 'a,b\n3,"4 "" 5"' + + result = parser.read_csv(StringIO(data), quotechar='"', + doublequote=doublequote) + expected = DataFrame(exp_data, columns=["a", "b"]) + tm.assert_frame_equal(result, expected) + + +@pytest.mark.parametrize("quotechar", [ + u('"'), + pytest.param(u('\u0001'), marks=pytest.mark.skipif( + PY2, reason="Python 2.x does not handle unicode well."))]) +def test_quotechar_unicode(all_parsers, quotechar): + # see gh-14477 + data = "a\n1" + parser = all_parsers + expected = DataFrame({"a": [1]}) + + result = parser.read_csv(StringIO(data), quotechar=quotechar) + tm.assert_frame_equal(result, expected) + + +@pytest.mark.parametrize("balanced", [True, False]) +def test_unbalanced_quoting(all_parsers, balanced): + # see gh-22789. + parser = all_parsers + data = "a,b,c\n1,2,\"3" + + if balanced: + # Re-balance the quoting and read in without errors. + expected = DataFrame([[1, 2, 3]], columns=["a", "b", "c"]) + result = parser.read_csv(StringIO(data + '"')) + tm.assert_frame_equal(result, expected) + else: + msg = ("EOF inside string starting at row 1" if parser.engine == "c" + else "unexpected end of data") + + with pytest.raises(ParserError, match=msg): + parser.read_csv(StringIO(data)) diff --git a/pandas/tests/io/parser/test_read_fwf.py b/pandas/tests/io/parser/test_read_fwf.py index bb64a85590c8b..e8c5b37579d71 100644 --- a/pandas/tests/io/parser/test_read_fwf.py +++ b/pandas/tests/io/parser/test_read_fwf.py @@ -15,78 +15,163 @@ from pandas.compat import BytesIO, StringIO import pandas as pd -from pandas import DataFrame +from pandas import DataFrame, DatetimeIndex import pandas.util.testing as tm from pandas.io.parsers import EmptyDataError, read_csv, read_fwf -class TestFwfParsing(object): - - def test_fwf(self): - data_expected = """\ -2011,58,360.242940,149.910199,11950.7 -2011,59,444.953632,166.985655,11788.4 -2011,60,364.136849,183.628767,11806.2 -2011,61,413.836124,184.375703,11916.8 -2011,62,502.953953,173.237159,12468.3 +def test_basic(): + data = """\ +A B C D +201158 360.242940 149.910199 11950.7 +201159 444.953632 166.985655 11788.4 +201160 364.136849 183.628767 11806.2 +201161 413.836124 184.375703 11916.8 +201162 502.953953 173.237159 12468.3 """ - expected = read_csv(StringIO(data_expected), - engine='python', header=None) - - data1 = """\ + result = read_fwf(StringIO(data)) + expected = DataFrame([[201158, 360.242940, 149.910199, 11950.7], + [201159, 444.953632, 166.985655, 11788.4], + [201160, 364.136849, 183.628767, 11806.2], + [201161, 413.836124, 184.375703, 11916.8], + [201162, 502.953953, 173.237159, 12468.3]], + columns=["A", "B", "C", "D"]) + tm.assert_frame_equal(result, expected) + + +def test_colspecs(): + data = """\ +A B C D E 201158 360.242940 149.910199 11950.7 201159 444.953632 166.985655 11788.4 201160 364.136849 183.628767 
11806.2 201161 413.836124 184.375703 11916.8 201162 502.953953 173.237159 12468.3 """ - colspecs = [(0, 4), (4, 8), (8, 20), (21, 33), (34, 43)] - df = read_fwf(StringIO(data1), colspecs=colspecs, header=None) - tm.assert_frame_equal(df, expected) + colspecs = [(0, 4), (4, 8), (8, 20), (21, 33), (34, 43)] + result = read_fwf(StringIO(data), colspecs=colspecs) + + expected = DataFrame([[2011, 58, 360.242940, 149.910199, 11950.7], + [2011, 59, 444.953632, 166.985655, 11788.4], + [2011, 60, 364.136849, 183.628767, 11806.2], + [2011, 61, 413.836124, 184.375703, 11916.8], + [2011, 62, 502.953953, 173.237159, 12468.3]], + columns=["A", "B", "C", "D", "E"]) + tm.assert_frame_equal(result, expected) - data2 = """\ + +def test_widths(): + data = """\ +A B C D E 2011 58 360.242940 149.910199 11950.7 2011 59 444.953632 166.985655 11788.4 2011 60 364.136849 183.628767 11806.2 2011 61 413.836124 184.375703 11916.8 2011 62 502.953953 173.237159 12468.3 """ - df = read_fwf(StringIO(data2), widths=[5, 5, 13, 13, 7], header=None) - tm.assert_frame_equal(df, expected) - - # From Thomas Kluyver: apparently some non-space filler characters can - # be seen, this is supported by specifying the 'delimiter' character: - # http://publib.boulder.ibm.com/infocenter/dmndhelp/v6r1mx/index.jsp?topic=/com.ibm.wbit.612.help.config.doc/topics/rfixwidth.html - data3 = """\ + result = read_fwf(StringIO(data), widths=[5, 5, 13, 13, 7]) + + expected = DataFrame([[2011, 58, 360.242940, 149.910199, 11950.7], + [2011, 59, 444.953632, 166.985655, 11788.4], + [2011, 60, 364.136849, 183.628767, 11806.2], + [2011, 61, 413.836124, 184.375703, 11916.8], + [2011, 62, 502.953953, 173.237159, 12468.3]], + columns=["A", "B", "C", "D", "E"]) + tm.assert_frame_equal(result, expected) + + +def test_non_space_filler(): + # From Thomas Kluyver: + # + # Apparently, some non-space filler characters can be seen, this is + # supported by specifying the 'delimiter' character: + # + # http://publib.boulder.ibm.com/infocenter/dmndhelp/v6r1mx/index.jsp?topic=/com.ibm.wbit.612.help.config.doc/topics/rfixwidth.html + data = """\ +A~~~~B~~~~C~~~~~~~~~~~~D~~~~~~~~~~~~E 201158~~~~360.242940~~~149.910199~~~11950.7 201159~~~~444.953632~~~166.985655~~~11788.4 201160~~~~364.136849~~~183.628767~~~11806.2 201161~~~~413.836124~~~184.375703~~~11916.8 201162~~~~502.953953~~~173.237159~~~12468.3 """ - df = read_fwf( - StringIO(data3), colspecs=colspecs, delimiter='~', header=None) - tm.assert_frame_equal(df, expected) + colspecs = [(0, 4), (4, 8), (8, 20), (21, 33), (34, 43)] + result = read_fwf(StringIO(data), colspecs=colspecs, delimiter="~") - with pytest.raises(ValueError, match="must specify only one of"): - read_fwf(StringIO(data3), colspecs=colspecs, widths=[6, 10, 10, 7]) + expected = DataFrame([[2011, 58, 360.242940, 149.910199, 11950.7], + [2011, 59, 444.953632, 166.985655, 11788.4], + [2011, 60, 364.136849, 183.628767, 11806.2], + [2011, 61, 413.836124, 184.375703, 11916.8], + [2011, 62, 502.953953, 173.237159, 12468.3]], + columns=["A", "B", "C", "D", "E"]) + tm.assert_frame_equal(result, expected) - with pytest.raises(ValueError, match="Must specify either"): - read_fwf(StringIO(data3), colspecs=None, widths=None) - def test_BytesIO_input(self): - if not compat.PY3: - pytest.skip( - "Bytes-related test - only needs to work on Python 3") +def test_over_specified(): + data = """\ +A B C D E +201158 360.242940 149.910199 11950.7 +201159 444.953632 166.985655 11788.4 +201160 364.136849 183.628767 11806.2 +201161 413.836124 184.375703 11916.8 +201162 
502.953953 173.237159 12468.3 +""" + colspecs = [(0, 4), (4, 8), (8, 20), (21, 33), (34, 43)] + + with pytest.raises(ValueError, match="must specify only one of"): + read_fwf(StringIO(data), colspecs=colspecs, widths=[6, 10, 10, 7]) + + +def test_under_specified(): + data = """\ +A B C D E +201158 360.242940 149.910199 11950.7 +201159 444.953632 166.985655 11788.4 +201160 364.136849 183.628767 11806.2 +201161 413.836124 184.375703 11916.8 +201162 502.953953 173.237159 12468.3 +""" + with pytest.raises(ValueError, match="Must specify either"): + read_fwf(StringIO(data), colspecs=None, widths=None) + + +def test_read_csv_compat(): + csv_data = """\ +A,B,C,D,E +2011,58,360.242940,149.910199,11950.7 +2011,59,444.953632,166.985655,11788.4 +2011,60,364.136849,183.628767,11806.2 +2011,61,413.836124,184.375703,11916.8 +2011,62,502.953953,173.237159,12468.3 +""" + expected = read_csv(StringIO(csv_data), engine="python") + + fwf_data = """\ +A B C D E +201158 360.242940 149.910199 11950.7 +201159 444.953632 166.985655 11788.4 +201160 364.136849 183.628767 11806.2 +201161 413.836124 184.375703 11916.8 +201162 502.953953 173.237159 12468.3 +""" + colspecs = [(0, 4), (4, 8), (8, 20), (21, 33), (34, 43)] + result = read_fwf(StringIO(fwf_data), colspecs=colspecs) + tm.assert_frame_equal(result, expected) - result = read_fwf(BytesIO("שלום\nשלום".encode('utf8')), widths=[ - 2, 2], encoding='utf8') - expected = DataFrame([["של", "ום"]], columns=["של", "ום"]) - tm.assert_frame_equal(result, expected) - def test_fwf_colspecs_is_list_or_tuple(self): - data = """index,A,B,C,D +def test_bytes_io_input(): + if not compat.PY3: + pytest.skip("Bytes-related test - only needs to work on Python 3") + + result = read_fwf(BytesIO("שלום\nשלום".encode('utf8')), + widths=[2, 2], encoding="utf8") + expected = DataFrame([["של", "ום"]], columns=["של", "ום"]) + tm.assert_frame_equal(result, expected) + + +def test_fwf_colspecs_is_list_or_tuple(): + data = """index,A,B,C,D foo,2,3,4,5 bar,7,8,9,10 baz,12,13,14,15 @@ -95,13 +180,14 @@ def test_fwf_colspecs_is_list_or_tuple(self): bar2,12,13,14,15 """ - msg = 'column specifications must be a list or tuple.+' - with pytest.raises(TypeError, match=msg): - pd.io.parsers.FixedWidthReader(StringIO(data), - {'a': 1}, ',', '#') + msg = "column specifications must be a list or tuple.+" - def test_fwf_colspecs_is_list_or_tuple_of_two_element_tuples(self): - data = """index,A,B,C,D + with pytest.raises(TypeError, match=msg): + read_fwf(StringIO(data), colspecs={"a": 1}, delimiter=",") + + +def test_fwf_colspecs_is_list_or_tuple_of_two_element_tuples(): + data = """index,A,B,C,D foo,2,3,4,5 bar,7,8,9,10 baz,12,13,14,15 @@ -110,146 +196,151 @@ def test_fwf_colspecs_is_list_or_tuple_of_two_element_tuples(self): bar2,12,13,14,15 """ - msg = 'Each column specification must be.+' - with pytest.raises(TypeError, match=msg): - read_fwf(StringIO(data), [('a', 1)]) + msg = "Each column specification must be.+" + + with pytest.raises(TypeError, match=msg): + read_fwf(StringIO(data), [("a", 1)]) - def test_fwf_colspecs_None(self): - # GH 7079 - data = """\ + +@pytest.mark.parametrize("colspecs,exp_data", [ + ([(0, 3), (3, None)], [[123, 456], [456, 789]]), + ([(None, 3), (3, 6)], [[123, 456], [456, 789]]), + ([(0, None), (3, None)], [[123456, 456], [456789, 789]]), + ([(None, None), (3, 6)], [[123456, 456], [456789, 789]]), +]) +def test_fwf_colspecs_none(colspecs, exp_data): + # see gh-7079 + data = """\ 123456 456789 """ - colspecs = [(0, 3), (3, None)] - result = read_fwf(StringIO(data), 
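# Aside on the colspecs rewrite above: each (start, stop) pair slices every
# line, and None leaves that side of the slice open-ended. Sketch:
import pandas as pd
from pandas.compat import StringIO

data = "123456\n456789"
result = pd.read_fwf(StringIO(data), colspecs=[(0, 3), (3, None)], header=None)
print(result)  # columns hold [123, 456] and [456, 789]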
colspecs=colspecs, header=None) - expected = DataFrame([[123, 456], [456, 789]]) - tm.assert_frame_equal(result, expected) + expected = DataFrame(exp_data) - colspecs = [(None, 3), (3, 6)] - result = read_fwf(StringIO(data), colspecs=colspecs, header=None) - expected = DataFrame([[123, 456], [456, 789]]) - tm.assert_frame_equal(result, expected) + result = read_fwf(StringIO(data), colspecs=colspecs, header=None) + tm.assert_frame_equal(result, expected) - colspecs = [(0, None), (3, None)] - result = read_fwf(StringIO(data), colspecs=colspecs, header=None) - expected = DataFrame([[123456, 456], [456789, 789]]) - tm.assert_frame_equal(result, expected) - colspecs = [(None, None), (3, 6)] - result = read_fwf(StringIO(data), colspecs=colspecs, header=None) - expected = DataFrame([[123456, 456], [456789, 789]]) - tm.assert_frame_equal(result, expected) +@pytest.mark.parametrize("infer_nrows,exp_data", [ + # infer_nrows --> colspec == [(2, 3), (5, 6)] + (1, [[1, 2], [3, 8]]), - def test_fwf_regression(self): - # GH 3594 - # turns out 'T060' is parsable as a datetime slice! - - tzlist = [1, 10, 20, 30, 60, 80, 100] - ntz = len(tzlist) - tcolspecs = [16] + [8] * ntz - tcolnames = ['SST'] + ["T%03d" % z for z in tzlist[1:]] - data = """ 2009164202000 9.5403 9.4105 8.6571 7.8372 6.0612 5.8843 5.5192 - 2009164203000 9.5435 9.2010 8.6167 7.8176 6.0804 5.8728 5.4869 - 2009164204000 9.5873 9.1326 8.4694 7.5889 6.0422 5.8526 5.4657 - 2009164205000 9.5810 9.0896 8.4009 7.4652 6.0322 5.8189 5.4379 - 2009164210000 9.6034 9.0897 8.3822 7.4905 6.0908 5.7904 5.4039 + # infer_nrows > number of rows + (10, [[1, 2], [123, 98]]), +]) +def test_fwf_colspecs_infer_nrows(infer_nrows, exp_data): + # see gh-15138 + data = """\ + 1 2 +123 98 """ + expected = DataFrame(exp_data) + + result = read_fwf(StringIO(data), infer_nrows=infer_nrows, header=None) + tm.assert_frame_equal(result, expected) - df = read_fwf(StringIO(data), - index_col=0, - header=None, - names=tcolnames, - widths=tcolspecs, - parse_dates=True, - date_parser=lambda s: datetime.strptime(s, '%Y%j%H%M%S')) - for c in df.columns: - res = df.loc[:, c] - assert len(res) +def test_fwf_regression(): + # see gh-3594 + # + # Turns out "T060" is parsable as a datetime slice! 
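# Aside on the new infer_nrows tests above (gh-15138, added in this release):
# infer_nrows bounds how many rows read_fwf scans when guessing column widths,
# so a small value lets the first row alone define the colspecs. Sketch:
import pandas as pd
from pandas.compat import StringIO

data = "  1  2\n123 98"
narrow = pd.read_fwf(StringIO(data), infer_nrows=1, header=None)
wide = pd.read_fwf(StringIO(data), infer_nrows=10, header=None)
print(narrow.values.tolist())  # [[1, 2], [3, 8]]    -- widths from row one only
print(wide.values.tolist())    # [[1, 2], [123, 98]] -- widths from both rows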
+ tz_list = [1, 10, 20, 30, 60, 80, 100] + widths = [16] + [8] * len(tz_list) + names = ["SST"] + ["T%03d" % z for z in tz_list[1:]] - def test_fwf_for_uint8(self): - data = """1421302965.213420 PRI=3 PGN=0xef00 DST=0x17 SRC=0x28 04 154 00 00 00 00 00 127 + data = """ 2009164202000 9.5403 9.4105 8.6571 7.8372 6.0612 5.8843 5.5192 +2009164203000 9.5435 9.2010 8.6167 7.8176 6.0804 5.8728 5.4869 +2009164204000 9.5873 9.1326 8.4694 7.5889 6.0422 5.8526 5.4657 +2009164205000 9.5810 9.0896 8.4009 7.4652 6.0322 5.8189 5.4379 +2009164210000 9.6034 9.0897 8.3822 7.4905 6.0908 5.7904 5.4039 +""" + + result = read_fwf(StringIO(data), index_col=0, header=None, names=names, + widths=widths, parse_dates=True, + date_parser=lambda s: datetime.strptime(s, "%Y%j%H%M%S")) + expected = DataFrame([ + [9.5403, 9.4105, 8.6571, 7.8372, 6.0612, 5.8843, 5.5192], + [9.5435, 9.2010, 8.6167, 7.8176, 6.0804, 5.8728, 5.4869], + [9.5873, 9.1326, 8.4694, 7.5889, 6.0422, 5.8526, 5.4657], + [9.5810, 9.0896, 8.4009, 7.4652, 6.0322, 5.8189, 5.4379], + [9.6034, 9.0897, 8.3822, 7.4905, 6.0908, 5.7904, 5.4039], + ], index=DatetimeIndex(["2009-06-13 20:20:00", "2009-06-13 20:30:00", + "2009-06-13 20:40:00", "2009-06-13 20:50:00", + "2009-06-13 21:00:00"]), + columns=["SST", "T010", "T020", "T030", "T060", "T080", "T100"]) + tm.assert_frame_equal(result, expected) + + +def test_fwf_for_uint8(): + data = """1421302965.213420 PRI=3 PGN=0xef00 DST=0x17 SRC=0x28 04 154 00 00 00 00 00 127 1421302964.226776 PRI=6 PGN=0xf002 SRC=0x47 243 00 00 255 247 00 00 71""" # noqa - df = read_fwf(StringIO(data), - colspecs=[(0, 17), (25, 26), (33, 37), - (49, 51), (58, 62), (63, 1000)], - names=['time', 'pri', 'pgn', 'dst', 'src', 'data'], - converters={ - 'pgn': lambda x: int(x, 16), - 'src': lambda x: int(x, 16), - 'dst': lambda x: int(x, 16), - 'data': lambda x: len(x.split(' '))}) - - expected = DataFrame([[1421302965.213420, 3, 61184, 23, 40, 8], - [1421302964.226776, 6, 61442, None, 71, 8]], - columns=["time", "pri", "pgn", - "dst", "src", "data"]) - expected["dst"] = expected["dst"].astype(object) - - tm.assert_frame_equal(df, expected) - - def test_fwf_compression(self): - try: - import gzip - import bz2 - except ImportError: - pytest.skip("Need gzip and bz2 to run this test") - - data = """1111111111 - 2222222222 - 3333333333""".strip() - widths = [5, 5] - names = ['one', 'two'] - expected = read_fwf(StringIO(data), widths=widths, names=names) - if compat.PY3: - data = bytes(data, encoding='utf-8') - comps = [('gzip', gzip.GzipFile), ('bz2', bz2.BZ2File)] - for comp_name, compresser in comps: - with tm.ensure_clean() as path: - tmp = compresser(path, mode='wb') - tmp.write(data) - tmp.close() - result = read_fwf(path, widths=widths, names=names, - compression=comp_name) - tm.assert_frame_equal(result, expected) - - def test_comment_fwf(self): - data = """ + df = read_fwf(StringIO(data), + colspecs=[(0, 17), (25, 26), (33, 37), + (49, 51), (58, 62), (63, 1000)], + names=["time", "pri", "pgn", "dst", "src", "data"], + converters={ + "pgn": lambda x: int(x, 16), + "src": lambda x: int(x, 16), + "dst": lambda x: int(x, 16), + "data": lambda x: len(x.split(" "))}) + + expected = DataFrame([[1421302965.213420, 3, 61184, 23, 40, 8], + [1421302964.226776, 6, 61442, None, 71, 8]], + columns=["time", "pri", "pgn", + "dst", "src", "data"]) + expected["dst"] = expected["dst"].astype(object) + tm.assert_frame_equal(df, expected) + + +@pytest.mark.parametrize("comment", ["#", "~", "!"]) +def test_fwf_comment(comment): + data = """\ 1 2. 
4 #hello world 5 NaN 10.0 """ - expected = np.array([[1, 2., 4], - [5, np.nan, 10.]]) - df = read_fwf(StringIO(data), colspecs=[(0, 3), (4, 9), (9, 25)], - comment='#') - tm.assert_almost_equal(df.values, expected) - - def test_1000_fwf(self): - data = """ + data = data.replace("#", comment) + + colspecs = [(0, 3), (4, 9), (9, 25)] + expected = DataFrame([[1, 2., 4], [5, np.nan, 10.]]) + + result = read_fwf(StringIO(data), colspecs=colspecs, + header=None, comment=comment) + tm.assert_almost_equal(result, expected) + + +@pytest.mark.parametrize("thousands", [",", "#", "~"]) +def test_fwf_thousands(thousands): + data = """\ 1 2,334.0 5 10 13 10. """ - expected = np.array([[1, 2334., 5], - [10, 13, 10]]) - df = read_fwf(StringIO(data), colspecs=[(0, 3), (3, 11), (12, 16)], - thousands=',') - tm.assert_almost_equal(df.values, expected) - - def test_bool_header_arg(self): - # see gh-6114 - data = """\ + data = data.replace(",", thousands) + + colspecs = [(0, 3), (3, 11), (12, 16)] + expected = DataFrame([[1, 2334., 5], [10, 13, 10.]]) + + result = read_fwf(StringIO(data), header=None, + colspecs=colspecs, thousands=thousands) + tm.assert_almost_equal(result, expected) + + +@pytest.mark.parametrize("header", [True, False]) +def test_bool_header_arg(header): + # see gh-6114 + data = """\ MyColumn a b a b""" - for arg in [True, False]: - with pytest.raises(TypeError): - read_fwf(StringIO(data), header=arg) - def test_full_file(self): - # File with all values - test = """index A B C + msg = "Passing a bool to header is invalid" + with pytest.raises(TypeError, match=msg): + read_fwf(StringIO(data), header=header) + + +def test_full_file(): + # File with all values. + test = """index A B C 2000-01-03T00:00:00 0.980268513777 3 foo 2000-01-04T00:00:00 1.04791624281 -4 bar 2000-01-05T00:00:00 0.498580885705 73 baz @@ -257,13 +348,16 @@ def test_full_file(self): 2000-01-07T00:00:00 0.487094399463 0 bar 2000-01-10T00:00:00 0.836648671666 2 baz 2000-01-11T00:00:00 0.157160753327 34 foo""" - colspecs = ((0, 19), (21, 35), (38, 40), (42, 45)) - expected = read_fwf(StringIO(test), colspecs=colspecs) - tm.assert_frame_equal(expected, read_fwf(StringIO(test))) + colspecs = ((0, 19), (21, 35), (38, 40), (42, 45)) + expected = read_fwf(StringIO(test), colspecs=colspecs) + + result = read_fwf(StringIO(test)) + tm.assert_frame_equal(result, expected) + - def test_full_file_with_missing(self): - # File with missing values - test = """index A B C +def test_full_file_with_missing(): + # File with missing values. + test = """index A B C 2000-01-03T00:00:00 0.980268513777 3 foo 2000-01-04T00:00:00 1.04791624281 -4 bar 0.498580885705 73 baz @@ -271,165 +365,210 @@ def test_full_file_with_missing(self): 2000-01-07T00:00:00 0 bar 2000-01-10T00:00:00 0.836648671666 2 baz 34""" - colspecs = ((0, 19), (21, 35), (38, 40), (42, 45)) - expected = read_fwf(StringIO(test), colspecs=colspecs) - tm.assert_frame_equal(expected, read_fwf(StringIO(test))) + colspecs = ((0, 19), (21, 35), (38, 40), (42, 45)) + expected = read_fwf(StringIO(test), colspecs=colspecs) - def test_full_file_with_spaces(self): - # File with spaces in columns - test = """ + result = read_fwf(StringIO(test)) + tm.assert_frame_equal(result, expected) + + +def test_full_file_with_spaces(): + # File with spaces in columns. 
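# Aside on test_fwf_thousands above: read_fwf shares read_csv's thousands
# handling, so grouped digits in a fixed-width column parse as plain integers.
# A sketch with illustrative data, relying on default width inference:
import pandas as pd
from pandas.compat import StringIO

data = "A     B\n1,234 5\n9,999 6"
result = pd.read_fwf(StringIO(data), thousands=",")
print(result["A"].tolist())  # [1234, 9999]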
+ test = """ Account Name Balance CreditLimit AccountCreated 101 Keanu Reeves 9315.45 10000.00 1/17/1998 312 Gerard Butler 90.00 1000.00 8/6/2003 868 Jennifer Love Hewitt 0 17000.00 5/25/1985 761 Jada Pinkett-Smith 49654.87 100000.00 12/5/2006 317 Bill Murray 789.65 5000.00 2/5/2007 -""".strip('\r\n') - colspecs = ((0, 7), (8, 28), (30, 38), (42, 53), (56, 70)) - expected = read_fwf(StringIO(test), colspecs=colspecs) - tm.assert_frame_equal(expected, read_fwf(StringIO(test))) - - def test_full_file_with_spaces_and_missing(self): - # File with spaces and missing values in columns - test = """ +""".strip("\r\n") + colspecs = ((0, 7), (8, 28), (30, 38), (42, 53), (56, 70)) + expected = read_fwf(StringIO(test), colspecs=colspecs) + + result = read_fwf(StringIO(test)) + tm.assert_frame_equal(result, expected) + + +def test_full_file_with_spaces_and_missing(): + # File with spaces and missing values in columns. + test = """ Account Name Balance CreditLimit AccountCreated 101 10000.00 1/17/1998 312 Gerard Butler 90.00 1000.00 8/6/2003 868 5/25/1985 761 Jada Pinkett-Smith 49654.87 100000.00 12/5/2006 317 Bill Murray 789.65 -""".strip('\r\n') - colspecs = ((0, 7), (8, 28), (30, 38), (42, 53), (56, 70)) - expected = read_fwf(StringIO(test), colspecs=colspecs) - tm.assert_frame_equal(expected, read_fwf(StringIO(test))) - - def test_messed_up_data(self): - # Completely messed up file - test = """ +""".strip("\r\n") + colspecs = ((0, 7), (8, 28), (30, 38), (42, 53), (56, 70)) + expected = read_fwf(StringIO(test), colspecs=colspecs) + + result = read_fwf(StringIO(test)) + tm.assert_frame_equal(result, expected) + + +def test_messed_up_data(): + # Completely messed up file. + test = """ Account Name Balance Credit Limit Account Created 101 10000.00 1/17/1998 312 Gerard Butler 90.00 1000.00 761 Jada Pinkett-Smith 49654.87 100000.00 12/5/2006 317 Bill Murray 789.65 -""".strip('\r\n') - colspecs = ((2, 10), (15, 33), (37, 45), (49, 61), (64, 79)) - expected = read_fwf(StringIO(test), colspecs=colspecs) - tm.assert_frame_equal(expected, read_fwf(StringIO(test))) +""".strip("\r\n") + colspecs = ((2, 10), (15, 33), (37, 45), (49, 61), (64, 79)) + expected = read_fwf(StringIO(test), colspecs=colspecs) + + result = read_fwf(StringIO(test)) + tm.assert_frame_equal(result, expected) + - def test_multiple_delimiters(self): - test = r""" +def test_multiple_delimiters(): + test = r""" col1~~~~~col2 col3++++++++++++++++++col4 ~~22.....11.0+++foo~~~~~~~~~~Keanu Reeves 33+++122.33\\\bar.........Gerard Butler ++44~~~~12.01 baz~~Jennifer Love Hewitt ~~55 11+++foo++++Jada Pinkett-Smith ..66++++++.03~~~bar Bill Murray -""".strip('\r\n') - colspecs = ((0, 4), (7, 13), (15, 19), (21, 41)) - expected = read_fwf(StringIO(test), colspecs=colspecs, - delimiter=' +~.\\') - tm.assert_frame_equal(expected, read_fwf(StringIO(test), - delimiter=' +~.\\')) - - def test_variable_width_unicode(self): - if not compat.PY3: - pytest.skip( - 'Bytes-related test - only needs to work on Python 3') - test = """ +""".strip("\r\n") + delimiter = " +~.\\" + colspecs = ((0, 4), (7, 13), (15, 19), (21, 41)) + expected = read_fwf(StringIO(test), colspecs=colspecs, delimiter=delimiter) + + result = read_fwf(StringIO(test), delimiter=delimiter) + tm.assert_frame_equal(result, expected) + + +def test_variable_width_unicode(): + if not compat.PY3: + pytest.skip("Bytes-related test - only needs to work on Python 3") + + data = """ שלום שלום ום שלל של ום -""".strip('\r\n') - expected = read_fwf(BytesIO(test.encode('utf8')), - colspecs=[(0, 4), (5, 9)], - 
header=None, encoding='utf8') - tm.assert_frame_equal(expected, read_fwf( - BytesIO(test.encode('utf8')), header=None, encoding='utf8')) - - def test_dtype(self): - data = """ a b c +""".strip("\r\n") + encoding = "utf8" + kwargs = dict(header=None, encoding=encoding) + + expected = read_fwf(BytesIO(data.encode(encoding)), + colspecs=[(0, 4), (5, 9)], **kwargs) + result = read_fwf(BytesIO(data.encode(encoding)), **kwargs) + tm.assert_frame_equal(result, expected) + + +@pytest.mark.parametrize("dtype", [ + dict(), {"a": "float64", "b": str, "c": "int32"} +]) +def test_dtype(dtype): + data = """ a b c 1 2 3.2 3 4 5.2 """ - colspecs = [(0, 5), (5, 10), (10, None)] - result = pd.read_fwf(StringIO(data), colspecs=colspecs) - expected = pd.DataFrame({ - 'a': [1, 3], - 'b': [2, 4], - 'c': [3.2, 5.2]}, columns=['a', 'b', 'c']) - tm.assert_frame_equal(result, expected) + colspecs = [(0, 5), (5, 10), (10, None)] + result = read_fwf(StringIO(data), colspecs=colspecs, dtype=dtype) - expected['a'] = expected['a'].astype('float64') - expected['b'] = expected['b'].astype(str) - expected['c'] = expected['c'].astype('int32') - result = pd.read_fwf(StringIO(data), colspecs=colspecs, - dtype={'a': 'float64', 'b': str, 'c': 'int32'}) - tm.assert_frame_equal(result, expected) + expected = pd.DataFrame({ + "a": [1, 3], "b": [2, 4], + "c": [3.2, 5.2]}, columns=["a", "b", "c"]) - def test_skiprows_inference(self): - # GH11256 - test = """ + for col, dt in dtype.items(): + expected[col] = expected[col].astype(dt) + + tm.assert_frame_equal(result, expected) + + +def test_skiprows_inference(): + # see gh-11256 + data = """ Text contained in the file header DataCol1 DataCol2 0.0 1.0 101.6 956.1 """.strip() - expected = read_csv(StringIO(test), skiprows=2, - delim_whitespace=True) - tm.assert_frame_equal(expected, read_fwf( - StringIO(test), skiprows=2)) + skiprows = 2 + expected = read_csv(StringIO(data), skiprows=skiprows, + delim_whitespace=True) + + result = read_fwf(StringIO(data), skiprows=skiprows) + tm.assert_frame_equal(result, expected) - def test_skiprows_by_index_inference(self): - test = """ + +def test_skiprows_by_index_inference(): + data = """ To be skipped Not To Be Skipped Once more to be skipped 123 34 8 123 456 78 9 456 """.strip() + skiprows = [0, 2] + expected = read_csv(StringIO(data), skiprows=skiprows, + delim_whitespace=True) + + result = read_fwf(StringIO(data), skiprows=skiprows) + tm.assert_frame_equal(result, expected) - expected = read_csv(StringIO(test), skiprows=[0, 2], - delim_whitespace=True) - tm.assert_frame_equal(expected, read_fwf( - StringIO(test), skiprows=[0, 2])) - def test_skiprows_inference_empty(self): - test = """ +def test_skiprows_inference_empty(): + data = """ AA BBB C 12 345 6 78 901 2 """.strip() - with pytest.raises(EmptyDataError): - read_fwf(StringIO(test), skiprows=3) + msg = "No rows from which to infer column width" + with pytest.raises(EmptyDataError, match=msg): + read_fwf(StringIO(data), skiprows=3) + - def test_whitespace_preservation(self): - # Addresses Issue #16772 - data_expected = """ +def test_whitespace_preservation(): + # see gh-16772 + header = None + csv_data = """ a ,bbb cc,dd """ - expected = read_csv(StringIO(data_expected), header=None) - test_data = """ + fwf_data = """ a bbb ccdd """ - result = read_fwf(StringIO(test_data), widths=[3, 3], - header=None, skiprows=[0], delimiter="\n\t") + result = read_fwf(StringIO(fwf_data), widths=[3, 3], + header=header, skiprows=[0], delimiter="\n\t") + expected = read_csv(StringIO(csv_data), 
header=header) + tm.assert_frame_equal(result, expected) - tm.assert_frame_equal(result, expected) - def test_default_delimiter(self): - data_expected = """ +def test_default_delimiter(): + header = None + csv_data = """ a,bbb cc,dd""" - expected = read_csv(StringIO(data_expected), header=None) - test_data = """ + fwf_data = """ a \tbbb cc\tdd """ - result = read_fwf(StringIO(test_data), widths=[3, 3], - header=None, skiprows=[0]) + result = read_fwf(StringIO(fwf_data), widths=[3, 3], + header=header, skiprows=[0]) + expected = read_csv(StringIO(csv_data), header=header) + tm.assert_frame_equal(result, expected) + + +@pytest.mark.parametrize("compression", ["gzip", "bz2"]) +def test_fwf_compression(compression): + data = """1111111111 + 2222222222 + 3333333333""".strip() + + kwargs = dict(widths=[5, 5], names=["one", "two"]) + expected = read_fwf(StringIO(data), **kwargs) + + if compat.PY3: + data = bytes(data, encoding="utf-8") + + with tm.ensure_clean() as path: + tm.write_to_compressed(compression, path, data) + result = read_fwf(path, compression=compression, **kwargs) tm.assert_frame_equal(result, expected) diff --git a/pandas/tests/io/parser/test_usecols.py b/pandas/tests/io/parser/test_usecols.py new file mode 100644 index 0000000000000..068227908a285 --- /dev/null +++ b/pandas/tests/io/parser/test_usecols.py @@ -0,0 +1,535 @@ +# -*- coding: utf-8 -*- + +""" +Tests the usecols functionality during parsing +for all of the parsers defined in parsers.py +""" + +import numpy as np +import pytest + +from pandas._libs.tslib import Timestamp +from pandas.compat import PY2, StringIO + +from pandas import DataFrame, Index +import pandas.util.testing as tm + +_msg_validate_usecols_arg = ("'usecols' must either be list-like " + "of all strings, all unicode, all " + "integers or a callable.") +_msg_validate_usecols_names = ("Usecols do not match columns, columns " + "expected but not found: {0}") + + +def test_raise_on_mixed_dtype_usecols(all_parsers): + # See gh-12678 + data = """a,b,c + 1000,2000,3000 + 4000,5000,6000 + """ + usecols = [0, "b", 2] + parser = all_parsers + + with pytest.raises(ValueError, match=_msg_validate_usecols_arg): + parser.read_csv(StringIO(data), usecols=usecols) + + +@pytest.mark.parametrize("usecols", [(1, 2), ("b", "c")]) +def test_usecols(all_parsers, usecols): + data = """\ +a,b,c +1,2,3 +4,5,6 +7,8,9 +10,11,12""" + parser = all_parsers + result = parser.read_csv(StringIO(data), usecols=usecols) + + expected = DataFrame([[2, 3], [5, 6], [8, 9], + [11, 12]], columns=["b", "c"]) + tm.assert_frame_equal(result, expected) + + +def test_usecols_with_names(all_parsers): + data = """\ +a,b,c +1,2,3 +4,5,6 +7,8,9 +10,11,12""" + parser = all_parsers + names = ["foo", "bar"] + result = parser.read_csv(StringIO(data), names=names, + usecols=[1, 2], header=0) + + expected = DataFrame([[2, 3], [5, 6], [8, 9], + [11, 12]], columns=names) + tm.assert_frame_equal(result, expected) + + +@pytest.mark.parametrize("names,usecols", [ + (["b", "c"], [1, 2]), + (["a", "b", "c"], ["b", "c"]) +]) +def test_usecols_relative_to_names(all_parsers, names, usecols): + data = """\ +1,2,3 +4,5,6 +7,8,9 +10,11,12""" + parser = all_parsers + result = parser.read_csv(StringIO(data), names=names, + header=None, usecols=usecols) + + expected = DataFrame([[2, 3], [5, 6], [8, 9], + [11, 12]], columns=["b", "c"]) + tm.assert_frame_equal(result, expected) + + +def test_usecols_relative_to_names2(all_parsers): + # see gh-5766 + data = """\ +1,2,3 +4,5,6 +7,8,9 +10,11,12""" + parser = all_parsers + result 
= parser.read_csv(StringIO(data), names=["a", "b"], + header=None, usecols=[0, 1]) + + expected = DataFrame([[1, 2], [4, 5], [7, 8], + [10, 11]], columns=["a", "b"]) + tm.assert_frame_equal(result, expected) + + +def test_usecols_name_length_conflict(all_parsers): + data = """\ +1,2,3 +4,5,6 +7,8,9 +10,11,12""" + parser = all_parsers + msg = ("Number of passed names did not " + "match number of header fields in the file" + if parser.engine == "python" else + "Passed header names mismatches usecols") + + with pytest.raises(ValueError, match=msg): + parser.read_csv(StringIO(data), names=["a", "b"], + header=None, usecols=[1]) + + +def test_usecols_single_string(all_parsers): + # see gh-20558 + parser = all_parsers + data = """foo, bar, baz +1000, 2000, 3000 +4000, 5000, 6000""" + + with pytest.raises(ValueError, match=_msg_validate_usecols_arg): + parser.read_csv(StringIO(data), usecols="foo") + + +@pytest.mark.parametrize("data", ["a,b,c,d\n1,2,3,4\n5,6,7,8", + "a,b,c,d\n1,2,3,4,\n5,6,7,8,"]) +def test_usecols_index_col_false(all_parsers, data): + # see gh-9082 + parser = all_parsers + usecols = ["a", "c", "d"] + expected = DataFrame({"a": [1, 5], "c": [3, 7], "d": [4, 8]}) + + result = parser.read_csv(StringIO(data), usecols=usecols, index_col=False) + tm.assert_frame_equal(result, expected) + + +@pytest.mark.parametrize("index_col", ["b", 0]) +@pytest.mark.parametrize("usecols", [["b", "c"], [1, 2]]) +def test_usecols_index_col_conflict(all_parsers, usecols, index_col): + # see gh-4201: test that index_col as integer reflects usecols + parser = all_parsers + data = "a,b,c,d\nA,a,1,one\nB,b,2,two" + expected = DataFrame({"c": [1, 2]}, index=Index(["a", "b"], name="b")) + + result = parser.read_csv(StringIO(data), usecols=usecols, + index_col=index_col) + tm.assert_frame_equal(result, expected) + + +def test_usecols_index_col_conflict2(all_parsers): + # see gh-4201: test that index_col as integer reflects usecols + parser = all_parsers + data = "a,b,c,d\nA,a,1,one\nB,b,2,two" + + expected = DataFrame({"b": ["a", "b"], "c": [1, 2], "d": ("one", "two")}) + expected = expected.set_index(["b", "c"]) + + result = parser.read_csv(StringIO(data), usecols=["b", "c", "d"], + index_col=["b", "c"]) + tm.assert_frame_equal(result, expected) + + +def test_usecols_implicit_index_col(all_parsers): + # see gh-2654 + parser = all_parsers + data = "a,b,c\n4,apple,bat,5.7\n8,orange,cow,10" + + result = parser.read_csv(StringIO(data), usecols=["a", "b"]) + expected = DataFrame({"a": ["apple", "orange"], + "b": ["bat", "cow"]}, index=[4, 8]) + tm.assert_frame_equal(result, expected) + + +def test_usecols_regex_sep(all_parsers): + # see gh-2733 + parser = all_parsers + data = "a b c\n4 apple bat 5.7\n8 orange cow 10" + result = parser.read_csv(StringIO(data), sep=r"\s+", usecols=("a", "b")) + + expected = DataFrame({"a": ["apple", "orange"], + "b": ["bat", "cow"]}, index=[4, 8]) + tm.assert_frame_equal(result, expected) + + +def test_usecols_with_whitespace(all_parsers): + parser = all_parsers + data = "a b c\n4 apple bat 5.7\n8 orange cow 10" + + result = parser.read_csv(StringIO(data), delim_whitespace=True, + usecols=("a", "b")) + expected = DataFrame({"a": ["apple", "orange"], + "b": ["bat", "cow"]}, index=[4, 8]) + tm.assert_frame_equal(result, expected) + + +@pytest.mark.parametrize("usecols,expected", [ + # Column selection by index. + ([0, 1], DataFrame(data=[[1000, 2000], [4000, 5000]], + columns=["2", "0"])), + + # Column selection by name. 
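# A compact illustration of the cases being parametrized here: integer usecols
# select by position while strings match header names, even when the header
# itself looks numeric. Sketch:
import pandas as pd
from pandas.compat import StringIO

data = "2,0,1\n1000,2000,3000\n4000,5000,6000"
by_pos = pd.read_csv(StringIO(data), usecols=[0, 1])
by_name = pd.read_csv(StringIO(data), usecols=["0", "1"])
print(list(by_pos.columns))   # ['2', '0'] -- first two positions
print(list(by_name.columns))  # ['0', '1'] -- matched by header name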
+ (["0", "1"], DataFrame(data=[[2000, 3000], [5000, 6000]], + columns=["0", "1"])), +]) +def test_usecols_with_integer_like_header(all_parsers, usecols, expected): + parser = all_parsers + data = """2,0,1 +1000,2000,3000 +4000,5000,6000""" + + result = parser.read_csv(StringIO(data), usecols=usecols) + tm.assert_frame_equal(result, expected) + + +@pytest.mark.parametrize("usecols", [[0, 2, 3], [3, 0, 2]]) +def test_usecols_with_parse_dates(all_parsers, usecols): + # see gh-9755 + data = """a,b,c,d,e +0,1,20140101,0900,4 +0,1,20140102,1000,4""" + parser = all_parsers + parse_dates = [[1, 2]] + + cols = { + "a": [0, 0], + "c_d": [ + Timestamp("2014-01-01 09:00:00"), + Timestamp("2014-01-02 10:00:00") + ] + } + expected = DataFrame(cols, columns=["c_d", "a"]) + result = parser.read_csv(StringIO(data), usecols=usecols, + parse_dates=parse_dates) + tm.assert_frame_equal(result, expected) + + +def test_usecols_with_parse_dates2(all_parsers): + # see gh-13604 + parser = all_parsers + data = """2008-02-07 09:40,1032.43 +2008-02-07 09:50,1042.54 +2008-02-07 10:00,1051.65""" + + names = ["date", "values"] + usecols = names[:] + parse_dates = [0] + + index = Index([Timestamp("2008-02-07 09:40"), + Timestamp("2008-02-07 09:50"), + Timestamp("2008-02-07 10:00")], + name="date") + cols = {"values": [1032.43, 1042.54, 1051.65]} + expected = DataFrame(cols, index=index) + + result = parser.read_csv(StringIO(data), parse_dates=parse_dates, + index_col=0, usecols=usecols, + header=None, names=names) + tm.assert_frame_equal(result, expected) + + +def test_usecols_with_parse_dates3(all_parsers): + # see gh-14792 + parser = all_parsers + data = """a,b,c,d,e,f,g,h,i,j +2016/09/21,1,1,2,3,4,5,6,7,8""" + + usecols = list("abcdefghij") + parse_dates = [0] + + cols = {"a": Timestamp("2016-09-21"), + "b": [1], "c": [1], "d": [2], + "e": [3], "f": [4], "g": [5], + "h": [6], "i": [7], "j": [8]} + expected = DataFrame(cols, columns=usecols) + + result = parser.read_csv(StringIO(data), usecols=usecols, + parse_dates=parse_dates) + tm.assert_frame_equal(result, expected) + + +def test_usecols_with_parse_dates4(all_parsers): + data = "a,b,c,d,e,f,g,h,i,j\n2016/09/21,1,1,2,3,4,5,6,7,8" + usecols = list("abcdefghij") + parse_dates = [[0, 1]] + parser = all_parsers + + cols = {"a_b": "2016/09/21 1", + "c": [1], "d": [2], "e": [3], "f": [4], + "g": [5], "h": [6], "i": [7], "j": [8]} + expected = DataFrame(cols, columns=["a_b"] + list("cdefghij")) + + result = parser.read_csv(StringIO(data), usecols=usecols, + parse_dates=parse_dates) + tm.assert_frame_equal(result, expected) + + +@pytest.mark.parametrize("usecols", [[0, 2, 3], [3, 0, 2]]) +@pytest.mark.parametrize("names", [ + list("abcde"), # Names span all columns in original data. + list("acd"), # Names span only the selected columns. 
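+    # Both shapes are accepted: "names" may label every column in the
+    # data, or only the columns kept by "usecols" (see gh-9755 below).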
+]) +def test_usecols_with_parse_dates_and_names(all_parsers, usecols, names): + # see gh-9755 + s = """0,1,20140101,0900,4 +0,1,20140102,1000,4""" + parse_dates = [[1, 2]] + parser = all_parsers + + cols = { + "a": [0, 0], + "c_d": [ + Timestamp("2014-01-01 09:00:00"), + Timestamp("2014-01-02 10:00:00") + ] + } + expected = DataFrame(cols, columns=["c_d", "a"]) + + result = parser.read_csv(StringIO(s), names=names, + parse_dates=parse_dates, + usecols=usecols) + tm.assert_frame_equal(result, expected) + + +def test_usecols_with_unicode_strings(all_parsers): + # see gh-13219 + data = """AAA,BBB,CCC,DDD +0.056674973,8,True,a +2.613230982,2,False,b +3.568935038,7,False,a""" + parser = all_parsers + + exp_data = { + "AAA": { + 0: 0.056674972999999997, + 1: 2.6132309819999997, + 2: 3.5689350380000002 + }, + "BBB": {0: 8, 1: 2, 2: 7} + } + expected = DataFrame(exp_data) + + result = parser.read_csv(StringIO(data), usecols=[u"AAA", u"BBB"]) + tm.assert_frame_equal(result, expected) + + +def test_usecols_with_single_byte_unicode_strings(all_parsers): + # see gh-13219 + data = """A,B,C,D +0.056674973,8,True,a +2.613230982,2,False,b +3.568935038,7,False,a""" + parser = all_parsers + + exp_data = { + "A": { + 0: 0.056674972999999997, + 1: 2.6132309819999997, + 2: 3.5689350380000002 + }, + "B": {0: 8, 1: 2, 2: 7} + } + expected = DataFrame(exp_data) + + result = parser.read_csv(StringIO(data), usecols=[u"A", u"B"]) + tm.assert_frame_equal(result, expected) + + +@pytest.mark.parametrize("usecols", [[u"AAA", b"BBB"], [b"AAA", u"BBB"]]) +def test_usecols_with_mixed_encoding_strings(all_parsers, usecols): + data = """AAA,BBB,CCC,DDD +0.056674973,8,True,a +2.613230982,2,False,b +3.568935038,7,False,a""" + parser = all_parsers + + with pytest.raises(ValueError, match=_msg_validate_usecols_arg): + parser.read_csv(StringIO(data), usecols=usecols) + + +@pytest.mark.parametrize("usecols", [ + ["あああ", "いい"], + pytest.param([u"あああ", u"いい"], marks=pytest.mark.skipif( + PY2, reason="Buggy behavior: see gh-13253")) +]) +def test_usecols_with_multi_byte_characters(all_parsers, usecols): + data = """あああ,いい,ううう,ええええ +0.056674973,8,True,a +2.613230982,2,False,b +3.568935038,7,False,a""" + parser = all_parsers + + exp_data = { + "あああ": { + 0: 0.056674972999999997, + 1: 2.6132309819999997, + 2: 3.5689350380000002 + }, + "いい": {0: 8, 1: 2, 2: 7} + } + expected = DataFrame(exp_data) + + result = parser.read_csv(StringIO(data), usecols=usecols) + tm.assert_frame_equal(result, expected) + + +def test_empty_usecols(all_parsers): + data = "a,b,c\n1,2,3\n4,5,6" + expected = DataFrame() + parser = all_parsers + + result = parser.read_csv(StringIO(data), usecols=set()) + tm.assert_frame_equal(result, expected) + + +def test_np_array_usecols(all_parsers): + # see gh-12546 + parser = all_parsers + data = "a,b,c\n1,2,3" + usecols = np.array(["a", "b"]) + + expected = DataFrame([[1, 2]], columns=usecols) + result = parser.read_csv(StringIO(data), usecols=usecols) + tm.assert_frame_equal(result, expected) + + +@pytest.mark.parametrize("usecols,expected", [ + (lambda x: x.upper() in ["AAA", "BBB", "DDD"], + DataFrame({ + "AaA": { + 0: 0.056674972999999997, + 1: 2.6132309819999997, + 2: 3.5689350380000002 + }, + "bBb": {0: 8, 1: 2, 2: 7}, + "ddd": {0: "a", 1: "b", 2: "a"} + })), + (lambda x: False, DataFrame()), +]) +def test_callable_usecols(all_parsers, usecols, expected): + # see gh-14154 + data = """AaA,bBb,CCC,ddd +0.056674973,8,True,a +2.613230982,2,False,b +3.568935038,7,False,a""" + parser = all_parsers + + result = 
parser.read_csv(StringIO(data), usecols=usecols) + tm.assert_frame_equal(result, expected) + + +@pytest.mark.parametrize("usecols", [["a", "c"], lambda x: x in ["a", "c"]]) +def test_incomplete_first_row(all_parsers, usecols): + # see gh-6710 + data = "1,2\n1,2,3" + parser = all_parsers + names = ["a", "b", "c"] + expected = DataFrame({"a": [1, 1], "c": [np.nan, 3]}) + + result = parser.read_csv(StringIO(data), names=names, usecols=usecols) + tm.assert_frame_equal(result, expected) + + +@pytest.mark.parametrize("data,usecols,kwargs,expected", [ + # see gh-8985 + ("19,29,39\n" * 2 + "10,20,30,40", [0, 1, 2], + dict(header=None), DataFrame([[19, 29, 39], [19, 29, 39], [10, 20, 30]])), + + # see gh-9549 + (("A,B,C\n1,2,3\n3,4,5\n1,2,4,5,1,6\n" + "1,2,3,,,1,\n1,2,3\n5,6,7"), ["A", "B", "C"], + dict(), DataFrame({"A": [1, 3, 1, 1, 1, 5], + "B": [2, 4, 2, 2, 2, 6], + "C": [3, 5, 4, 3, 3, 7]})), +]) +def test_uneven_length_cols(all_parsers, data, usecols, kwargs, expected): + # see gh-8985 + parser = all_parsers + result = parser.read_csv(StringIO(data), usecols=usecols, **kwargs) + tm.assert_frame_equal(result, expected) + + +@pytest.mark.parametrize("usecols,kwargs,expected,msg", [ + (["a", "b", "c", "d"], dict(), + DataFrame({"a": [1, 5], "b": [2, 6], "c": [3, 7], "d": [4, 8]}), None), + (["a", "b", "c", "f"], dict(), None, + _msg_validate_usecols_names.format(r"\['f'\]")), + (["a", "b", "f"], dict(), None, + _msg_validate_usecols_names.format(r"\['f'\]")), + (["a", "b", "f", "g"], dict(), None, + _msg_validate_usecols_names.format(r"\[('f', 'g'|'g', 'f')\]")), + + # see gh-14671 + (None, dict(header=0, names=["A", "B", "C", "D"]), + DataFrame({"A": [1, 5], "B": [2, 6], "C": [3, 7], + "D": [4, 8]}), None), + (["A", "B", "C", "f"], dict(header=0, names=["A", "B", "C", "D"]), + None, _msg_validate_usecols_names.format(r"\['f'\]")), + (["A", "B", "f"], dict(names=["A", "B", "C", "D"]), + None, _msg_validate_usecols_names.format(r"\['f'\]")), +]) +def test_raises_on_usecols_names_mismatch(all_parsers, usecols, + kwargs, expected, msg): + data = "a,b,c,d\n1,2,3,4\n5,6,7,8" + kwargs.update(usecols=usecols) + parser = all_parsers + + if expected is None: + with pytest.raises(ValueError, match=msg): + parser.read_csv(StringIO(data), **kwargs) + else: + result = parser.read_csv(StringIO(data), **kwargs) + tm.assert_frame_equal(result, expected) + + +@pytest.mark.xfail( + reason="see gh-16469: works on the C engine but not the Python engine", + strict=False) +@pytest.mark.parametrize("usecols", [["A", "C"], [0, 2]]) +def test_usecols_subset_names_mismatch_orig_columns(all_parsers, usecols): + data = "a,b,c,d\n1,2,3,4\n5,6,7,8" + names = ["A", "B", "C", "D"] + parser = all_parsers + + result = parser.read_csv(StringIO(data), header=0, + names=names, usecols=usecols) + expected = DataFrame({"A": [1, 5], "C": [3, 7]}) + tm.assert_frame_equal(result, expected) diff --git a/pandas/tests/io/parser/usecols.py b/pandas/tests/io/parser/usecols.py deleted file mode 100644 index e9bb72be124d3..0000000000000 --- a/pandas/tests/io/parser/usecols.py +++ /dev/null @@ -1,550 +0,0 @@ -# -*- coding: utf-8 -*- - -""" -Tests the usecols functionality during parsing -for all of the parsers defined in parsers.py -""" - -import numpy as np -import pytest - -from pandas._libs.tslib import Timestamp -from pandas.compat import StringIO - -from pandas import DataFrame, Index -import pandas.util.testing as tm - - -class UsecolsTests(object): - msg_validate_usecols_arg = ("'usecols' must either be list-like of all " - "strings, all 
unicode, all integers or a " - "callable.") - msg_validate_usecols_names = ("Usecols do not match columns, columns " - "expected but not found: {0}") - - def test_raise_on_mixed_dtype_usecols(self): - # See gh-12678 - data = """a,b,c - 1000,2000,3000 - 4000,5000,6000 - """ - - usecols = [0, 'b', 2] - - with pytest.raises(ValueError, match=self.msg_validate_usecols_arg): - self.read_csv(StringIO(data), usecols=usecols) - - def test_usecols(self): - data = """\ -a,b,c -1,2,3 -4,5,6 -7,8,9 -10,11,12""" - - result = self.read_csv(StringIO(data), usecols=(1, 2)) - result2 = self.read_csv(StringIO(data), usecols=('b', 'c')) - exp = self.read_csv(StringIO(data)) - - assert len(result.columns) == 2 - assert (result['b'] == exp['b']).all() - assert (result['c'] == exp['c']).all() - - tm.assert_frame_equal(result, result2) - - result = self.read_csv(StringIO(data), usecols=[1, 2], header=0, - names=['foo', 'bar']) - expected = self.read_csv(StringIO(data), usecols=[1, 2]) - expected.columns = ['foo', 'bar'] - tm.assert_frame_equal(result, expected) - - data = """\ -1,2,3 -4,5,6 -7,8,9 -10,11,12""" - result = self.read_csv(StringIO(data), names=['b', 'c'], - header=None, usecols=[1, 2]) - - expected = self.read_csv(StringIO(data), names=['a', 'b', 'c'], - header=None) - expected = expected[['b', 'c']] - tm.assert_frame_equal(result, expected) - - result2 = self.read_csv(StringIO(data), names=['a', 'b', 'c'], - header=None, usecols=['b', 'c']) - tm.assert_frame_equal(result2, result) - - # see gh-5766 - result = self.read_csv(StringIO(data), names=['a', 'b'], - header=None, usecols=[0, 1]) - - expected = self.read_csv(StringIO(data), names=['a', 'b', 'c'], - header=None) - expected = expected[['a', 'b']] - tm.assert_frame_equal(result, expected) - - # length conflict, passed names and usecols disagree - pytest.raises(ValueError, self.read_csv, StringIO(data), - names=['a', 'b'], usecols=[1], header=None) - - def test_usecols_single_string(self): - # GH 20558 - data = """foo, bar, baz - 1000, 2000, 3000 - 4000, 5000, 6000 - """ - - usecols = 'foo' - - with pytest.raises(ValueError, match=self.msg_validate_usecols_arg): - self.read_csv(StringIO(data), usecols=usecols) - - def test_usecols_index_col_False(self): - # see gh-9082 - s = "a,b,c,d\n1,2,3,4\n5,6,7,8" - s_malformed = "a,b,c,d\n1,2,3,4,\n5,6,7,8," - cols = ['a', 'c', 'd'] - expected = DataFrame({'a': [1, 5], 'c': [3, 7], 'd': [4, 8]}) - df = self.read_csv(StringIO(s), usecols=cols, index_col=False) - tm.assert_frame_equal(expected, df) - df = self.read_csv(StringIO(s_malformed), - usecols=cols, index_col=False) - tm.assert_frame_equal(expected, df) - - def test_usecols_index_col_conflict(self): - # see gh-4201: test that index_col as integer reflects usecols - data = 'a,b,c,d\nA,a,1,one\nB,b,2,two' - expected = DataFrame({'c': [1, 2]}, index=Index( - ['a', 'b'], name='b')) - - df = self.read_csv(StringIO(data), usecols=['b', 'c'], - index_col=0) - tm.assert_frame_equal(expected, df) - - df = self.read_csv(StringIO(data), usecols=['b', 'c'], - index_col='b') - tm.assert_frame_equal(expected, df) - - df = self.read_csv(StringIO(data), usecols=[1, 2], - index_col='b') - tm.assert_frame_equal(expected, df) - - df = self.read_csv(StringIO(data), usecols=[1, 2], - index_col=0) - tm.assert_frame_equal(expected, df) - - expected = DataFrame( - {'b': ['a', 'b'], 'c': [1, 2], 'd': ('one', 'two')}) - expected = expected.set_index(['b', 'c']) - df = self.read_csv(StringIO(data), usecols=['b', 'c', 'd'], - index_col=['b', 'c']) - 
tm.assert_frame_equal(expected, df) - - def test_usecols_implicit_index_col(self): - # see gh-2654 - data = 'a,b,c\n4,apple,bat,5.7\n8,orange,cow,10' - - result = self.read_csv(StringIO(data), usecols=['a', 'b']) - expected = DataFrame({'a': ['apple', 'orange'], - 'b': ['bat', 'cow']}, index=[4, 8]) - - tm.assert_frame_equal(result, expected) - - def test_usecols_regex_sep(self): - # see gh-2733 - data = 'a b c\n4 apple bat 5.7\n8 orange cow 10' - - df = self.read_csv(StringIO(data), sep=r'\s+', usecols=('a', 'b')) - - expected = DataFrame({'a': ['apple', 'orange'], - 'b': ['bat', 'cow']}, index=[4, 8]) - tm.assert_frame_equal(df, expected) - - def test_usecols_with_whitespace(self): - data = 'a b c\n4 apple bat 5.7\n8 orange cow 10' - - result = self.read_csv(StringIO(data), delim_whitespace=True, - usecols=('a', 'b')) - expected = DataFrame({'a': ['apple', 'orange'], - 'b': ['bat', 'cow']}, index=[4, 8]) - - tm.assert_frame_equal(result, expected) - - def test_usecols_with_integer_like_header(self): - data = """2,0,1 - 1000,2000,3000 - 4000,5000,6000 - """ - - usecols = [0, 1] # column selection by index - expected = DataFrame(data=[[1000, 2000], - [4000, 5000]], - columns=['2', '0']) - df = self.read_csv(StringIO(data), usecols=usecols) - tm.assert_frame_equal(df, expected) - - usecols = ['0', '1'] # column selection by name - expected = DataFrame(data=[[2000, 3000], - [5000, 6000]], - columns=['0', '1']) - df = self.read_csv(StringIO(data), usecols=usecols) - tm.assert_frame_equal(df, expected) - - def test_usecols_with_parse_dates(self): - # See gh-9755 - s = """a,b,c,d,e - 0,1,20140101,0900,4 - 0,1,20140102,1000,4""" - parse_dates = [[1, 2]] - - cols = { - 'a': [0, 0], - 'c_d': [ - Timestamp('2014-01-01 09:00:00'), - Timestamp('2014-01-02 10:00:00') - ] - } - expected = DataFrame(cols, columns=['c_d', 'a']) - - df = self.read_csv(StringIO(s), usecols=[0, 2, 3], - parse_dates=parse_dates) - tm.assert_frame_equal(df, expected) - - df = self.read_csv(StringIO(s), usecols=[3, 0, 2], - parse_dates=parse_dates) - tm.assert_frame_equal(df, expected) - - # See gh-13604 - s = """2008-02-07 09:40,1032.43 - 2008-02-07 09:50,1042.54 - 2008-02-07 10:00,1051.65 - """ - parse_dates = [0] - names = ['date', 'values'] - usecols = names[:] - - index = Index([Timestamp('2008-02-07 09:40'), - Timestamp('2008-02-07 09:50'), - Timestamp('2008-02-07 10:00')], - name='date') - cols = {'values': [1032.43, 1042.54, 1051.65]} - expected = DataFrame(cols, index=index) - - df = self.read_csv(StringIO(s), parse_dates=parse_dates, index_col=0, - usecols=usecols, header=None, names=names) - tm.assert_frame_equal(df, expected) - - # See gh-14792 - s = """a,b,c,d,e,f,g,h,i,j - 2016/09/21,1,1,2,3,4,5,6,7,8""" - parse_dates = [0] - usecols = list('abcdefghij') - cols = {'a': Timestamp('2016-09-21'), - 'b': [1], 'c': [1], 'd': [2], - 'e': [3], 'f': [4], 'g': [5], - 'h': [6], 'i': [7], 'j': [8]} - expected = DataFrame(cols, columns=usecols) - df = self.read_csv(StringIO(s), usecols=usecols, - parse_dates=parse_dates) - tm.assert_frame_equal(df, expected) - - s = """a,b,c,d,e,f,g,h,i,j\n2016/09/21,1,1,2,3,4,5,6,7,8""" - parse_dates = [[0, 1]] - usecols = list('abcdefghij') - cols = {'a_b': '2016/09/21 1', - 'c': [1], 'd': [2], 'e': [3], 'f': [4], - 'g': [5], 'h': [6], 'i': [7], 'j': [8]} - expected = DataFrame(cols, columns=['a_b'] + list('cdefghij')) - df = self.read_csv(StringIO(s), usecols=usecols, - parse_dates=parse_dates) - tm.assert_frame_equal(df, expected) - - def 
test_usecols_with_parse_dates_and_full_names(self): - # See gh-9755 - s = """0,1,20140101,0900,4 - 0,1,20140102,1000,4""" - parse_dates = [[1, 2]] - names = list('abcde') - - cols = { - 'a': [0, 0], - 'c_d': [ - Timestamp('2014-01-01 09:00:00'), - Timestamp('2014-01-02 10:00:00') - ] - } - expected = DataFrame(cols, columns=['c_d', 'a']) - - df = self.read_csv(StringIO(s), names=names, - usecols=[0, 2, 3], - parse_dates=parse_dates) - tm.assert_frame_equal(df, expected) - - df = self.read_csv(StringIO(s), names=names, - usecols=[3, 0, 2], - parse_dates=parse_dates) - tm.assert_frame_equal(df, expected) - - def test_usecols_with_parse_dates_and_usecol_names(self): - # See gh-9755 - s = """0,1,20140101,0900,4 - 0,1,20140102,1000,4""" - parse_dates = [[1, 2]] - names = list('acd') - - cols = { - 'a': [0, 0], - 'c_d': [ - Timestamp('2014-01-01 09:00:00'), - Timestamp('2014-01-02 10:00:00') - ] - } - expected = DataFrame(cols, columns=['c_d', 'a']) - - df = self.read_csv(StringIO(s), names=names, - usecols=[0, 2, 3], - parse_dates=parse_dates) - tm.assert_frame_equal(df, expected) - - df = self.read_csv(StringIO(s), names=names, - usecols=[3, 0, 2], - parse_dates=parse_dates) - tm.assert_frame_equal(df, expected) - - def test_usecols_with_unicode_strings(self): - # see gh-13219 - - s = '''AAA,BBB,CCC,DDD - 0.056674973,8,True,a - 2.613230982,2,False,b - 3.568935038,7,False,a - ''' - - data = { - 'AAA': { - 0: 0.056674972999999997, - 1: 2.6132309819999997, - 2: 3.5689350380000002 - }, - 'BBB': {0: 8, 1: 2, 2: 7} - } - expected = DataFrame(data) - - df = self.read_csv(StringIO(s), usecols=[u'AAA', u'BBB']) - tm.assert_frame_equal(df, expected) - - def test_usecols_with_single_byte_unicode_strings(self): - # see gh-13219 - - s = '''A,B,C,D - 0.056674973,8,True,a - 2.613230982,2,False,b - 3.568935038,7,False,a - ''' - - data = { - 'A': { - 0: 0.056674972999999997, - 1: 2.6132309819999997, - 2: 3.5689350380000002 - }, - 'B': {0: 8, 1: 2, 2: 7} - } - expected = DataFrame(data) - - df = self.read_csv(StringIO(s), usecols=[u'A', u'B']) - tm.assert_frame_equal(df, expected) - - def test_usecols_with_mixed_encoding_strings(self): - s = '''AAA,BBB,CCC,DDD - 0.056674973,8,True,a - 2.613230982,2,False,b - 3.568935038,7,False,a - ''' - - with pytest.raises(ValueError, match=self.msg_validate_usecols_arg): - self.read_csv(StringIO(s), usecols=[u'AAA', b'BBB']) - - with pytest.raises(ValueError, match=self.msg_validate_usecols_arg): - self.read_csv(StringIO(s), usecols=[b'AAA', u'BBB']) - - def test_usecols_with_multibyte_characters(self): - s = '''あああ,いい,ううう,ええええ - 0.056674973,8,True,a - 2.613230982,2,False,b - 3.568935038,7,False,a - ''' - data = { - 'あああ': { - 0: 0.056674972999999997, - 1: 2.6132309819999997, - 2: 3.5689350380000002 - }, - 'いい': {0: 8, 1: 2, 2: 7} - } - expected = DataFrame(data) - - df = self.read_csv(StringIO(s), usecols=['あああ', 'いい']) - tm.assert_frame_equal(df, expected) - - def test_usecols_with_multibyte_unicode_characters(self): - pytest.skip('TODO: see gh-13253') - - s = '''あああ,いい,ううう,ええええ - 0.056674973,8,True,a - 2.613230982,2,False,b - 3.568935038,7,False,a - ''' - data = { - 'あああ': { - 0: 0.056674972999999997, - 1: 2.6132309819999997, - 2: 3.5689350380000002 - }, - 'いい': {0: 8, 1: 2, 2: 7} - } - expected = DataFrame(data) - - df = self.read_csv(StringIO(s), usecols=[u'あああ', u'いい']) - tm.assert_frame_equal(df, expected) - - def test_empty_usecols(self): - # should not raise - data = 'a,b,c\n1,2,3\n4,5,6' - expected = DataFrame() - result = self.read_csv(StringIO(data), 
usecols=set()) - tm.assert_frame_equal(result, expected) - - def test_np_array_usecols(self): - # See gh-12546 - data = 'a,b,c\n1,2,3' - usecols = np.array(['a', 'b']) - - expected = DataFrame([[1, 2]], columns=usecols) - result = self.read_csv(StringIO(data), usecols=usecols) - tm.assert_frame_equal(result, expected) - - def test_callable_usecols(self): - # See gh-14154 - s = '''AaA,bBb,CCC,ddd - 0.056674973,8,True,a - 2.613230982,2,False,b - 3.568935038,7,False,a - ''' - - data = { - 'AaA': { - 0: 0.056674972999999997, - 1: 2.6132309819999997, - 2: 3.5689350380000002 - }, - 'bBb': {0: 8, 1: 2, 2: 7}, - 'ddd': {0: 'a', 1: 'b', 2: 'a'} - } - expected = DataFrame(data) - df = self.read_csv(StringIO(s), usecols=lambda x: - x.upper() in ['AAA', 'BBB', 'DDD']) - tm.assert_frame_equal(df, expected) - - # Check that a callable returning only False returns - # an empty DataFrame - expected = DataFrame() - df = self.read_csv(StringIO(s), usecols=lambda x: False) - tm.assert_frame_equal(df, expected) - - def test_incomplete_first_row(self): - # see gh-6710 - data = '1,2\n1,2,3' - names = ['a', 'b', 'c'] - expected = DataFrame({'a': [1, 1], - 'c': [np.nan, 3]}) - - usecols = ['a', 'c'] - df = self.read_csv(StringIO(data), names=names, usecols=usecols) - tm.assert_frame_equal(df, expected) - - usecols = lambda x: x in ['a', 'c'] - df = self.read_csv(StringIO(data), names=names, usecols=usecols) - tm.assert_frame_equal(df, expected) - - def test_uneven_length_cols(self): - # see gh-8985 - usecols = [0, 1, 2] - data = '19,29,39\n' * 2 + '10,20,30,40' - expected = DataFrame([[19, 29, 39], - [19, 29, 39], - [10, 20, 30]]) - df = self.read_csv(StringIO(data), header=None, usecols=usecols) - tm.assert_frame_equal(df, expected) - - # see gh-9549 - usecols = ['A', 'B', 'C'] - data = ('A,B,C\n1,2,3\n3,4,5\n1,2,4,5,1,6\n' - '1,2,3,,,1,\n1,2,3\n5,6,7') - expected = DataFrame({'A': [1, 3, 1, 1, 1, 5], - 'B': [2, 4, 2, 2, 2, 6], - 'C': [3, 5, 4, 3, 3, 7]}) - df = self.read_csv(StringIO(data), usecols=usecols) - tm.assert_frame_equal(df, expected) - - def test_raise_on_usecols_names_mismatch(self): - # GH 14671 - data = 'a,b,c,d\n1,2,3,4\n5,6,7,8' - - usecols = ['a', 'b', 'c', 'd'] - df = self.read_csv(StringIO(data), usecols=usecols) - expected = DataFrame({'a': [1, 5], 'b': [2, 6], 'c': [3, 7], - 'd': [4, 8]}) - tm.assert_frame_equal(df, expected) - - usecols = ['a', 'b', 'c', 'f'] - msg = self.msg_validate_usecols_names.format(r"\['f'\]") - - with pytest.raises(ValueError, match=msg): - self.read_csv(StringIO(data), usecols=usecols) - - usecols = ['a', 'b', 'f'] - msg = self.msg_validate_usecols_names.format(r"\['f'\]") - - with pytest.raises(ValueError, match=msg): - self.read_csv(StringIO(data), usecols=usecols) - - usecols = ['a', 'b', 'f', 'g'] - msg = self.msg_validate_usecols_names.format( - r"\[('f', 'g'|'g', 'f')\]") - with pytest.raises(ValueError, match=msg): - self.read_csv(StringIO(data), usecols=usecols) - - names = ['A', 'B', 'C', 'D'] - - df = self.read_csv(StringIO(data), header=0, names=names) - expected = DataFrame({'A': [1, 5], 'B': [2, 6], 'C': [3, 7], - 'D': [4, 8]}) - tm.assert_frame_equal(df, expected) - - # TODO: https://github.com/pandas-dev/pandas/issues/16469 - # usecols = ['A','C'] - # df = self.read_csv(StringIO(data), header=0, names=names, - # usecols=usecols) - # expected = DataFrame({'A': [1,5], 'C': [3,7]}) - # tm.assert_frame_equal(df, expected) - # - # usecols = [0,2] - # df = self.read_csv(StringIO(data), header=0, names=names, - # usecols=usecols) - # expected = 
DataFrame({'A': [1,5], 'C': [3,7]}) - # tm.assert_frame_equal(df, expected) - - usecols = ['A', 'B', 'C', 'f'] - msg = self.msg_validate_usecols_names.format(r"\['f'\]") - - with pytest.raises(ValueError, match=msg): - self.read_csv(StringIO(data), header=0, names=names, - usecols=usecols) - - usecols = ['A', 'B', 'f'] - msg = self.msg_validate_usecols_names.format(r"\['f'\]") - - with pytest.raises(ValueError, match=msg): - self.read_csv(StringIO(data), names=names, usecols=usecols) diff --git a/pandas/tests/io/test_excel.py b/pandas/tests/io/test_excel.py index 34fcb17127439..033d600ffc09b 100644 --- a/pandas/tests/io/test_excel.py +++ b/pandas/tests/io/test_excel.py @@ -1,31 +1,31 @@ -# pylint: disable=E1101 -import os -import warnings -from datetime import datetime, date, time, timedelta +from collections import OrderedDict +import contextlib +from datetime import date, datetime, time, timedelta from distutils.version import LooseVersion from functools import partial +import os +import warnings from warnings import catch_warnings -from collections import OrderedDict import numpy as np -import pytest from numpy import nan +import pytest -import pandas as pd -import pandas.util.testing as tm +from pandas.compat import PY36, BytesIO, iteritems, map, range, u import pandas.util._test_decorators as td + +import pandas as pd from pandas import DataFrame, Index, MultiIndex, Series -from pandas.compat import u, range, map, BytesIO, iteritems, PY36 -from pandas.core.config import set_option, get_option +from pandas.core.config import get_option, set_option +import pandas.util.testing as tm +from pandas.util.testing import ensure_clean, makeCustomDataframe as mkdf + from pandas.io.common import URLError from pandas.io.excel import ( - ExcelFile, ExcelWriter, read_excel, _XlwtWriter, _OpenpyxlWriter, - register_writer, _XlsxWriter -) + ExcelFile, ExcelWriter, _OpenpyxlWriter, _XlsxWriter, _XlwtWriter, + read_excel, register_writer) from pandas.io.formats.excel import ExcelFormatter from pandas.io.parsers import read_csv -from pandas.util.testing import ensure_clean, makeCustomDataframe as mkdf - _seriesd = tm.getSeriesData() _tsd = tm.getTimeSeriesData() @@ -36,6 +36,20 @@ _mixed_frame['foo'] = 'bar' +@contextlib.contextmanager +def ignore_xlrd_time_clock_warning(): + """ + Context manager to ignore warnings raised by the xlrd library, + regarding the deprecation of `time.clock` in Python 3.7. 
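+
+    Usage sketch ("data.xlsx" is a placeholder path, not a fixture)::
+
+        with ignore_xlrd_time_clock_warning():
+            pd.read_excel("data.xlsx")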
+ """ + with warnings.catch_warnings(): + warnings.filterwarnings( + action='ignore', + message='time.clock has been deprecated', + category=DeprecationWarning) + yield + + @td.skip_if_no('xlrd', '1.0.0') class SharedItems(object): @@ -114,20 +128,23 @@ def test_usecols_int(self, ext): # usecols as int with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - df1 = self.get_exceldf("test1", ext, "Sheet1", - index_col=0, usecols=3) + with ignore_xlrd_time_clock_warning(): + df1 = self.get_exceldf("test1", ext, "Sheet1", + index_col=0, usecols=3) # usecols as int with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - df2 = self.get_exceldf("test1", ext, "Sheet2", skiprows=[1], - index_col=0, usecols=3) + with ignore_xlrd_time_clock_warning(): + df2 = self.get_exceldf("test1", ext, "Sheet2", skiprows=[1], + index_col=0, usecols=3) # parse_cols instead of usecols, usecols as int with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - df3 = self.get_exceldf("test1", ext, "Sheet2", skiprows=[1], - index_col=0, parse_cols=3) + with ignore_xlrd_time_clock_warning(): + df3 = self.get_exceldf("test1", ext, "Sheet2", skiprows=[1], + index_col=0, parse_cols=3) # TODO add index to xls file) tm.assert_frame_equal(df1, df_ref, check_names=False) @@ -145,8 +162,9 @@ def test_usecols_list(self, ext): index_col=0, usecols=[0, 2, 3]) with tm.assert_produces_warning(FutureWarning): - df3 = self.get_exceldf('test1', ext, 'Sheet2', skiprows=[1], - index_col=0, parse_cols=[0, 2, 3]) + with ignore_xlrd_time_clock_warning(): + df3 = self.get_exceldf('test1', ext, 'Sheet2', skiprows=[1], + index_col=0, parse_cols=[0, 2, 3]) # TODO add index to xls file) tm.assert_frame_equal(df1, dfref, check_names=False) @@ -165,8 +183,9 @@ def test_usecols_str(self, ext): index_col=0, usecols='A:D') with tm.assert_produces_warning(FutureWarning): - df4 = self.get_exceldf('test1', ext, 'Sheet2', skiprows=[1], - index_col=0, parse_cols='A:D') + with ignore_xlrd_time_clock_warning(): + df4 = self.get_exceldf('test1', ext, 'Sheet2', skiprows=[1], + index_col=0, parse_cols='A:D') # TODO add index to xls, read xls ignores index name ? 
tm.assert_frame_equal(df2, df1, check_names=False) @@ -241,10 +260,22 @@ def test_index_col_empty(self, ext): index_col=["A", "B", "C"]) expected = DataFrame(columns=["D", "E", "F"], index=MultiIndex(levels=[[]] * 3, - labels=[[]] * 3, + codes=[[]] * 3, names=["A", "B", "C"])) tm.assert_frame_equal(result, expected) + @pytest.mark.parametrize("index_col", [None, 2]) + def test_index_col_with_unnamed(self, ext, index_col): + # see gh-18792 + result = self.get_exceldf("test1", ext, "Sheet4", + index_col=index_col) + expected = DataFrame([["i1", "a", "x"], ["i2", "b", "y"]], + columns=["Unnamed: 0", "col1", "col2"]) + if index_col: + expected = expected.set_index(expected.columns[index_col]) + + tm.assert_frame_equal(result, expected) + def test_usecols_pass_non_existent_column(self, ext): msg = ("Usecols do not match columns, " "columns expected but not found: " + r"\['E'\]") @@ -618,8 +649,9 @@ def test_sheet_name_and_sheetname(self, ext): df1 = self.get_exceldf(filename, ext, sheet_name=sheet_name, index_col=0) # doc with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - df2 = self.get_exceldf(filename, ext, index_col=0, - sheetname=sheet_name) # backward compat + with ignore_xlrd_time_clock_warning(): + df2 = self.get_exceldf(filename, ext, index_col=0, + sheetname=sheet_name) # backward compat excel = self.get_excelfile(filename, ext) df1_parse = excel.parse(sheet_name=sheet_name, index_col=0) # doc @@ -691,25 +723,19 @@ def test_read_from_http_url(self, ext): local_table = self.get_exceldf('test1', ext) tm.assert_frame_equal(url_table, local_table) - @td.skip_if_no("s3fs") @td.skip_if_not_us_locale - def test_read_from_s3_url(self, ext): - moto = pytest.importorskip("moto") - boto3 = pytest.importorskip("boto3") + def test_read_from_s3_url(self, ext, s3_resource): + # Bucket "pandas-test" created in tests/io/conftest.py + file_name = os.path.join(self.dirpath, 'test1' + ext) - with moto.mock_s3(): - conn = boto3.resource("s3", region_name="us-east-1") - conn.create_bucket(Bucket="pandas-test") - file_name = os.path.join(self.dirpath, 'test1' + ext) + with open(file_name, "rb") as f: + s3_resource.Bucket("pandas-test").put_object(Key="test1" + ext, + Body=f) - with open(file_name, "rb") as f: - conn.Bucket("pandas-test").put_object(Key="test1" + ext, - Body=f) - - url = ('s3://pandas-test/test1' + ext) - url_table = read_excel(url) - local_table = self.get_exceldf('test1', ext) - tm.assert_frame_equal(url_table, local_table) + url = ('s3://pandas-test/test1' + ext) + url_table = read_excel(url) + local_table = self.get_exceldf('test1', ext) + tm.assert_frame_equal(url_table, local_table) @pytest.mark.slow # ignore warning from old xlrd @@ -903,9 +929,9 @@ def test_read_excel_multiindex_empty_level(self, ext): }) expected = DataFrame({ - ("One", u"x"): {0: 1}, - ("Two", u"X"): {0: 3}, - ("Two", u"Y"): {0: 7}, + ("One", "x"): {0: 1}, + ("Two", "X"): {0: 3}, + ("Two", "Y"): {0: 7}, ("Zero", "Unnamed: 4_level_1"): {0: 0} }) @@ -922,9 +948,9 @@ def test_read_excel_multiindex_empty_level(self, ext): expected = pd.DataFrame({ ("Beg", "Unnamed: 1_level_1"): {0: 0}, - ("Middle", u"x"): {0: 1}, - ("Tail", u"X"): {0: 3}, - ("Tail", u"Y"): {0: 7} + ("Middle", "x"): {0: 1}, + ("Tail", "X"): {0: 3}, + ("Tail", "Y"): {0: 7} }) df.to_excel(path) @@ -988,7 +1014,7 @@ def test_excel_old_index_format(self, ext): "R_l0_g2", "R_l0_g3", "R_l0_g4"], ["R1", "R_l1_g0", "R_l1_g1", "R_l1_g2", "R_l1_g3", "R_l1_g4"]], - labels=[[0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5]], + codes=[[0, 1, 2, 3, 4, 5], 
[0, 1, 2, 3, 4, 5]], names=[None, None]) si = Index(["R0", "R_l0_g0", "R_l0_g1", "R_l0_g2", "R_l0_g3", "R_l0_g4"], name=None) @@ -1015,7 +1041,7 @@ def test_excel_old_index_format(self, ext): "R_l0_g3", "R_l0_g4"], ["R_l1_g0", "R_l1_g1", "R_l1_g2", "R_l1_g3", "R_l1_g4"]], - labels=[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4]], + codes=[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4]], names=[None, None]) si = Index(["R_l0_g0", "R_l0_g1", "R_l0_g2", "R_l0_g3", "R_l0_g4"], name=None) @@ -2357,8 +2383,7 @@ def check_called(func): pytest.param('xlwt', marks=pytest.mark.xfail(reason='xlwt does not support ' 'openpyxl-compatible ' - 'style dicts', - strict=True)), + 'style dicts')), 'xlsxwriter', 'openpyxl', ]) diff --git a/pandas/tests/io/test_feather.py b/pandas/tests/io/test_feather.py index 16b59526c8233..44d642399ced9 100644 --- a/pandas/tests/io/test_feather.py +++ b/pandas/tests/io/test_feather.py @@ -26,13 +26,16 @@ def check_error_on_write(self, df, exc): with ensure_clean() as path: to_feather(df, path) - def check_round_trip(self, df, **kwargs): + def check_round_trip(self, df, expected=None, **kwargs): + + if expected is None: + expected = df with ensure_clean() as path: to_feather(df, path) result = read_feather(path, **kwargs) - assert_frame_equal(result, df) + assert_frame_equal(result, expected) def test_error(self): @@ -74,6 +77,16 @@ def test_stringify_columns(self): df = pd.DataFrame(np.arange(12).reshape(4, 3)).copy() self.check_error_on_write(df, ValueError) + def test_read_columns(self): + # GH 24025 + df = pd.DataFrame({'col1': list('abc'), + 'col2': list(range(1, 4)), + 'col3': list('xyz'), + 'col4': list(range(4, 7))}) + columns = ['col1', 'col3'] + self.check_round_trip(df, expected=df[columns], + columns=columns) + def test_unsupported_other(self): # period @@ -87,15 +100,19 @@ def test_rw_nthreads(self): "the 'nthreads' keyword is deprecated, " "use 'use_threads' instead" ) - with tm.assert_produces_warning(FutureWarning) as w: + # TODO: make the warning work with check_stacklevel=True + with tm.assert_produces_warning( + FutureWarning, check_stacklevel=False) as w: self.check_round_trip(df, nthreads=2) - assert len(w) == 1 - assert expected_warning in str(w[0]) + # we have an extra FutureWarning because of #GH23752 + assert any(expected_warning in str(x) for x in w) - with tm.assert_produces_warning(FutureWarning) as w: + # TODO: make the warning work with check_stacklevel=True + with tm.assert_produces_warning( + FutureWarning, check_stacklevel=False) as w: self.check_round_trip(df, nthreads=1) - assert len(w) == 1 - assert expected_warning in str(w[0]) + # we have an extra FutureWarnings because of #GH23752 + assert any(expected_warning in str(x) for x in w) def test_rw_use_threads(self): df = pd.DataFrame({'A': np.arange(100000)}) diff --git a/pandas/tests/io/test_html.py b/pandas/tests/io/test_html.py index 4201f751959b5..492089644fb15 100644 --- a/pandas/tests/io/test_html.py +++ b/pandas/tests/io/test_html.py @@ -798,7 +798,7 @@ def test_header_inferred_from_rows_with_only_th(self): """)[0] columns = MultiIndex(levels=[['A', 'B'], ['a', 'b']], - labels=[[0, 1], [0, 1]]) + codes=[[0, 1], [0, 1]]) expected = DataFrame(data=[[1, 2]], columns=columns) tm.assert_frame_equal(result, expected) @@ -995,7 +995,7 @@ def test_ignore_empty_rows_when_inferring_header(self): """)[0] columns = MultiIndex(levels=[['A', 'B'], ['a', 'b']], - labels=[[0, 1], [0, 1]]) + codes=[[0, 1], [0, 1]]) expected = DataFrame(data=[[1, 2]], columns=columns) tm.assert_frame_equal(result, expected) diff --git 
a/pandas/tests/io/test_packers.py b/pandas/tests/io/test_packers.py index 8b7151620ee0c..4cccac83e0a35 100644 --- a/pandas/tests/io/test_packers.py +++ b/pandas/tests/io/test_packers.py @@ -940,3 +940,9 @@ def test_msgpacks_legacy(self, current_packers_data, all_packers_data, except ImportError: # blosc not installed pass + + def test_msgpack_period_freq(self): + # https://github.com/pandas-dev/pandas/issues/24135 + s = Series(np.random.rand(5), index=date_range('20130101', periods=5)) + r = read_msgpack(s.to_msgpack()) + repr(r) diff --git a/pandas/tests/io/test_parquet.py b/pandas/tests/io/test_parquet.py index 6024fccb15c76..5964c44a31f48 100644 --- a/pandas/tests/io/test_parquet.py +++ b/pandas/tests/io/test_parquet.py @@ -201,8 +201,7 @@ def test_options_get_engine(fp, pa): @pytest.mark.xfail(is_platform_windows() or is_platform_mac(), - reason="reading pa metadata failing on Windows/mac", - strict=True) + reason="reading pa metadata failing on Windows/mac") def test_cross_engine_pa_fp(df_cross_compat, pa, fp): # cross-compat with differing reading/writing engines @@ -404,7 +403,8 @@ def test_basic(self, pa, df_full): check_round_trip(df, pa) # TODO: This doesn't fail on all systems; track down which - @pytest.mark.xfail(reason="pyarrow fails on this (ARROW-1883)") + @pytest.mark.xfail(reason="pyarrow fails on this (ARROW-1883)", + strict=False) def test_basic_subset_columns(self, pa, df_full): # GH18628 @@ -422,7 +422,6 @@ def test_duplicate_columns(self, pa): columns=list('aaa')).copy() self.check_error_on_write(df, pa, ValueError) - @pytest.mark.xfail(reason="failing for pyarrow < 0.11.0") def test_unsupported(self, pa): # period df = pd.DataFrame({'a': pd.period_range('2013', freq='M', periods=3)}) diff --git a/pandas/tests/io/test_pytables.py b/pandas/tests/io/test_pytables.py index 4a68719eedc9a..1c4d00c8b3e15 100644 --- a/pandas/tests/io/test_pytables.py +++ b/pandas/tests/io/test_pytables.py @@ -146,6 +146,11 @@ def teardown_method(self, method): @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") class TestHDFStore(Base): + def test_format_kwarg_in_constructor(self): + # GH 13291 + with ensure_clean_path(self.path) as path: + pytest.raises(ValueError, HDFStore, path, format='table') + def test_context(self): path = create_tempfile(self.path) try: @@ -199,8 +204,6 @@ def roundtrip(key, obj, **kwargs): def test_long_strings(self): # GH6166 - # unconversion of long strings was being chopped in earlier - # versions of numpy < 1.7.2 df = DataFrame({'a': tm.rands_array(100, size=10)}, index=tm.rands_array(100, size=10)) @@ -1776,8 +1779,8 @@ def test_append_diff_item_order(self): def test_append_hierarchical(self): index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two', 'three']], - labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], - [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]], + codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], + [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]], names=['foo', 'bar']) df = DataFrame(np.random.randn(10, 3), index=index, columns=['A', 'B', 'C']) @@ -1910,8 +1913,8 @@ def test_select_columns_in_where(self): # in the `where` argument index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two', 'three']], - labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], - [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]], + codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], + [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]], names=['foo_name', 'bar_name']) # With a DataFrame @@ -2879,8 +2882,8 @@ def test_can_serialize_dates(self): def test_store_hierarchical(self): index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'], 
['one', 'two', 'three']], - labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], - [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]], + codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], + [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]], names=['foo', 'bar']) frame = DataFrame(np.random.randn(10, 3), index=index, columns=['A', 'B', 'C']) diff --git a/pandas/tests/io/test_stata.py b/pandas/tests/io/test_stata.py index fb08af36e8325..3413b8fdf18d1 100644 --- a/pandas/tests/io/test_stata.py +++ b/pandas/tests/io/test_stata.py @@ -16,7 +16,7 @@ import pandas as pd import pandas.util.testing as tm import pandas.compat as compat -from pandas.compat import iterkeys +from pandas.compat import iterkeys, PY3, ResourceWarning from pandas.core.dtypes.common import is_categorical_dtype from pandas.core.frame import DataFrame, Series from pandas.io.parsers import read_csv @@ -1546,3 +1546,33 @@ def test_all_none_exception(self, version): output.to_stata(path, version=version) assert 'Only string-like' in excinfo.value.args[0] assert 'Column `none`' in excinfo.value.args[0] + + @pytest.mark.parametrize('version', [114, 117]) + def test_invalid_file_not_written(self, version): + content = 'Here is one __�__ Another one __·__ Another one __½__' + df = DataFrame([content], columns=['invalid']) + expected_exc = UnicodeEncodeError if PY3 else UnicodeDecodeError + with tm.ensure_clean() as path: + with pytest.raises(expected_exc): + with tm.assert_produces_warning(ResourceWarning): + df.to_stata(path) + + def test_strl_latin1(self): + # GH 23573, correct GSO data to reflect correct size + output = DataFrame([[u'pandas'] * 2, [u'þâÑÐŧ'] * 2], + columns=['var_str', 'var_strl']) + + with tm.ensure_clean() as path: + output.to_stata(path, version=117, convert_strl=['var_strl']) + with open(path, 'rb') as reread: + content = reread.read() + expected = u'þâÑÐŧ' + assert expected.encode('latin-1') in content + assert expected.encode('utf-8') in content + gsos = content.split(b'strls')[1][1:-2] + for gso in gsos.split(b'GSO')[1:]: + val = gso.split(b'\x00')[-2] + size = gso[gso.find(b'\x82') + 1] + if not PY3: + size = ord(size) + assert len(val) == size - 1 diff --git a/pandas/tests/plotting/test_datetimelike.py b/pandas/tests/plotting/test_datetimelike.py index 4865638671ea9..7a28f05514dd5 100644 --- a/pandas/tests/plotting/test_datetimelike.py +++ b/pandas/tests/plotting/test_datetimelike.py @@ -1,5 +1,5 @@ """ Test cases for time series specific (freq conversion, etc) """ - +import sys from datetime import datetime, timedelta, date, time import pickle @@ -1075,7 +1075,7 @@ def test_irreg_dtypes(self): _, ax = self.plt.subplots() _check_plot_works(df.plot, ax=ax) - @pytest.mark.xfail(not PY3, reason="failing on mpl 1.4.3 on PY2") + @pytest.mark.xfail(reason="fails with py2.7.15", strict=False) @pytest.mark.slow def test_time(self): t = datetime(1, 1, 1, 3, 30, 0) @@ -1274,7 +1274,7 @@ def test_format_date_axis(self): @pytest.mark.slow def test_ax_plot(self): - x = DatetimeIndex(start='2012-01-02', periods=10, freq='D') + x = date_range(start='2012-01-02', periods=10, freq='D') y = lrange(len(x)) _, ax = self.plt.subplots() lines = ax.plot(x, y, label='Y') @@ -1557,7 +1557,10 @@ def _check_plot_works(f, freq=None, series=None, *args, **kwargs): # GH18439 # this is supported only in Python 3 pickle since # pickle in Python2 doesn't support instancemethod pickling - if PY3: + # TODO(statsmodels 0.10.0): Remove the statsmodels check + # https://github.com/pandas-dev/pandas/issues/24088 + # https://github.com/statsmodels/statsmodels/issues/4772 + if PY3 and 
'statsmodels' not in sys.modules: with ensure_clean(return_filelike=True) as path: pickle.dump(fig, path) finally: diff --git a/pandas/tests/plotting/test_frame.py b/pandas/tests/plotting/test_frame.py index 25dfbaba762c9..350d1bb153274 100644 --- a/pandas/tests/plotting/test_frame.py +++ b/pandas/tests/plotting/test_frame.py @@ -69,8 +69,7 @@ def test_plot(self): self._check_axes_shape(axes, axes_num=4, layout=(4, 1)) df = DataFrame({'x': [1, 2], 'y': [3, 4]}) - # mpl >= 1.5.2 (or slightly below) throw AttributError - with pytest.raises((TypeError, AttributeError)): + with pytest.raises(AttributeError, match='Unknown property blarg'): df.plot.line(blarg=True) df = DataFrame(np.random.rand(10, 3), @@ -489,8 +488,7 @@ def test_subplots_timeseries_y_axis(self): testdata.plot(y="text") @pytest.mark.xfail(reason='not support for period, categorical, ' - 'datetime_mixed_tz', - strict=True) + 'datetime_mixed_tz') def test_subplots_timeseries_y_axis_not_supported(self): """ This test will fail for: @@ -2558,6 +2556,7 @@ def test_errorbar_asymmetrical(self): tm.close() + # This XPASSES when tested with mpl == 3.0.1 @td.xfail_if_mpl_2_2 def test_table(self): df = DataFrame(np.random.rand(10, 3), @@ -2967,13 +2966,9 @@ def test_passed_bar_colors(self): def test_rcParams_bar_colors(self): import matplotlib as mpl color_tuples = [(0.9, 0, 0, 1), (0, 0.9, 0, 1), (0, 0, 0.9, 1)] - try: # mpl 1.5 - with mpl.rc_context( - rc={'axes.prop_cycle': mpl.cycler("color", color_tuples)}): - barplot = pd.DataFrame([[1, 2, 3]]).plot(kind="bar") - except (AttributeError, KeyError): # mpl 1.4 - with mpl.rc_context(rc={'axes.color_cycle': color_tuples}): - barplot = pd.DataFrame([[1, 2, 3]]).plot(kind="bar") + with mpl.rc_context( + rc={'axes.prop_cycle': mpl.cycler("color", color_tuples)}): + barplot = pd.DataFrame([[1, 2, 3]]).plot(kind="bar") assert color_tuples == [c.get_facecolor() for c in barplot.patches] @pytest.mark.parametrize('method', ['line', 'barh', 'bar']) @@ -2993,6 +2988,22 @@ def test_secondary_axis_font_size(self, method): self._check_ticks_props(axes=ax.right_ax, ylabelsize=fontsize) + def test_misc_bindings(self, mock): + df = pd.DataFrame(randn(10, 10), columns=list('abcdefghij')) + p1 = mock.patch('pandas.plotting._misc.scatter_matrix', + return_value=2) + p2 = mock.patch('pandas.plotting._misc.andrews_curves', + return_value=2) + p3 = mock.patch('pandas.plotting._misc.parallel_coordinates', + return_value=2) + p4 = mock.patch('pandas.plotting._misc.radviz', + return_value=2) + with p1, p2, p3, p4: + assert df.plot.scatter_matrix() == 2 + assert df.plot.andrews_curves('a') == 2 + assert df.plot.parallel_coordinates('a') == 2 + assert df.plot.radviz('a') == 2 + def _generate_4_axes_via_gridspec(): import matplotlib.pyplot as plt diff --git a/pandas/tests/plotting/test_misc.py b/pandas/tests/plotting/test_misc.py index 1f0a0d6bfee95..9ae3e7fc423f4 100644 --- a/pandas/tests/plotting/test_misc.py +++ b/pandas/tests/plotting/test_misc.py @@ -61,6 +61,7 @@ def test_bootstrap_plot(self): @td.skip_if_no_mpl class TestDataFramePlots(TestPlotBase): + # This XPASSES when tested with mpl == 3.0.1 @td.xfail_if_mpl_2_2 @td.skip_if_no_scipy def test_scatter_matrix_axis(self): diff --git a/pandas/tests/plotting/test_series.py b/pandas/tests/plotting/test_series.py index dc708278836d2..b5c69bb9e6443 100644 --- a/pandas/tests/plotting/test_series.py +++ b/pandas/tests/plotting/test_series.py @@ -767,10 +767,11 @@ def test_errorbar_plot(self): s.plot(yerr=np.arange(11)) s_err = ['zzz'] * 10 - # in mpl 1.5+ this is 
a TypeError
-        with pytest.raises((ValueError, TypeError)):
+        # MPL > 2.0.0 will most likely use TypeError here
+        with pytest.raises((TypeError, ValueError)):
             s.plot(yerr=s_err)
 
+    # This XPASSES when tested with mpl == 3.0.1
     @td.xfail_if_mpl_2_2
     def test_table(self):
         _check_plot_works(self.series.plot, table=True)
@@ -876,3 +877,16 @@ def test_custom_business_day_freq(self):
                     freq=CustomBusinessDay(holidays=['2014-05-26'])))
 
         _check_plot_works(s.plot)
+
+    def test_misc_bindings(self, mock):
+        s = Series(randn(10))
+        p1 = mock.patch('pandas.plotting._misc.lag_plot',
+                        return_value=2)
+        p2 = mock.patch('pandas.plotting._misc.autocorrelation_plot',
+                        return_value=2)
+        p3 = mock.patch('pandas.plotting._misc.bootstrap_plot',
+                        return_value=2)
+        with p1, p2, p3:
+            assert s.plot.lag() == 2
+            assert s.plot.autocorrelation() == 2
+            assert s.plot.bootstrap() == 2
diff --git a/pandas/tests/reductions/__init__.py b/pandas/tests/reductions/__init__.py
new file mode 100644
index 0000000000000..e3851753b6742
--- /dev/null
+++ b/pandas/tests/reductions/__init__.py
@@ -0,0 +1,4 @@
+"""
+Tests for reductions where we want to test for matching behavior across
+Array, Index, Series, and DataFrame methods.
+"""
diff --git a/pandas/tests/reductions/test_reductions.py b/pandas/tests/reductions/test_reductions.py
new file mode 100644
index 0000000000000..e7f984919d80b
--- /dev/null
+++ b/pandas/tests/reductions/test_reductions.py
@@ -0,0 +1,817 @@
+# -*- coding: utf-8 -*-
+from datetime import datetime
+
+import numpy as np
+import pytest
+
+import pandas as pd
+from pandas import Categorical, DataFrame, Index, PeriodIndex, Series, compat
+from pandas.core import nanops
+import pandas.util.testing as tm
+
+
+def get_objs():
+    indexes = [
+        tm.makeBoolIndex(10, name='a'),
+        tm.makeIntIndex(10, name='a'),
+        tm.makeFloatIndex(10, name='a'),
+        tm.makeDateIndex(10, name='a'),
+        tm.makeDateIndex(10, name='a').tz_localize(tz='US/Eastern'),
+        tm.makePeriodIndex(10, name='a'),
+        tm.makeStringIndex(10, name='a'),
+        tm.makeUnicodeIndex(10, name='a')
+    ]
+
+    arr = np.random.randn(10)
+    series = [Series(arr, index=idx, name='a') for idx in indexes]
+
+    objs = indexes + series
+    return objs
+
+
+objs = get_objs()
+
+
+class TestReductions(object):
+
+    @pytest.mark.parametrize('opname', ['max', 'min'])
+    @pytest.mark.parametrize('obj', objs)
+    def test_ops(self, opname, obj):
+        result = getattr(obj, opname)()
+        if not isinstance(obj, PeriodIndex):
+            expected = getattr(obj.values, opname)()
+        else:
+            expected = pd.Period(
+                ordinal=getattr(obj._ndarray_values, opname)(),
+                freq=obj.freq)
+        try:
+            assert result == expected
+        except TypeError:
+            # comparing tz-aware series with np.array results in
+            # TypeError
+            expected = expected.astype('M8[ns]').astype('int64')
+            assert result.value == expected
+
+    def test_nanops(self):
+        # GH#7261
+        for opname in ['max', 'min']:
+            for klass in [Index, Series]:
+
+                obj = klass([np.nan, 2.0])
+                assert getattr(obj, opname)() == 2.0
+
+                obj = klass([np.nan])
+                assert pd.isna(getattr(obj, opname)())
+
+                obj = klass([])
+                assert pd.isna(getattr(obj, opname)())
+
+                obj = klass([pd.NaT, datetime(2011, 11, 1)])
+                # check DatetimeIndex monotonic path
+                assert getattr(obj, opname)() == datetime(2011, 11, 1)
+
+                obj = klass([pd.NaT, datetime(2011, 11, 1), pd.NaT])
+                # check DatetimeIndex non-monotonic path
+                assert getattr(obj, opname)() == datetime(2011, 11, 1)
+
+        # argmin/max
+        obj = Index(np.arange(5, dtype='int64'))
+        assert obj.argmin() == 0
+        assert obj.argmax() == 4
+
+        obj = Index([np.nan, 1, np.nan, 2])
+        assert obj.argmin() == 1
+        assert obj.argmax() == 3
+
+        obj = Index([np.nan])
+        assert obj.argmin() == -1
+        assert obj.argmax() == -1
+
+        obj = Index([pd.NaT, datetime(2011, 11, 1), datetime(2011, 11, 2),
+                     pd.NaT])
+        assert obj.argmin() == 1
+        assert obj.argmax() == 2
+
+        obj = Index([pd.NaT])
+        assert obj.argmin() == -1
+        assert obj.argmax() == -1
+
+
+class TestSeriesReductions(object):
+    # Note: the name TestSeriesReductions indicates these tests
+    # were moved from a series-specific test file, _not_ that these tests are
+    # intended long-term to be series-specific
+
+    def test_sum_inf(self):
+        s = Series(np.random.randn(10))
+        s2 = s.copy()
+
+        s[5:8] = np.inf
+        s2[5:8] = np.nan
+
+        assert np.isinf(s.sum())
+
+        arr = np.random.randn(100, 100).astype('f4')
+        arr[:, 2] = np.inf
+
+        with pd.option_context("mode.use_inf_as_na", True):
+            tm.assert_almost_equal(s.sum(), s2.sum())
+
+        res = nanops.nansum(arr, axis=1)
+        assert np.isinf(res).all()
+
+    @pytest.mark.parametrize("use_bottleneck", [True, False])
+    @pytest.mark.parametrize("method, unit", [
+        ("sum", 0.0),
+        ("prod", 1.0)
+    ])
+    def test_empty(self, method, unit, use_bottleneck):
+        with pd.option_context("use_bottleneck", use_bottleneck):
+            # GH#9422 / GH#18921
+            # Entirely empty
+            s = Series([])
+            # NA by default
+            result = getattr(s, method)()
+            assert result == unit
+
+            # Explicit
+            result = getattr(s, method)(min_count=0)
+            assert result == unit
+
+            result = getattr(s, method)(min_count=1)
+            assert pd.isna(result)
+
+            # Skipna, default
+            result = getattr(s, method)(skipna=True)
+            assert result == unit
+
+            # Skipna, explicit
+            result = getattr(s, method)(skipna=True, min_count=0)
+            assert result == unit
+
+            result = getattr(s, method)(skipna=True, min_count=1)
+            assert pd.isna(result)
+
+            # All-NA
+            s = Series([np.nan])
+            # NA by default
+            result = getattr(s, method)()
+            assert result == unit
+
+            # Explicit
+            result = getattr(s, method)(min_count=0)
+            assert result == unit
+
+            result = getattr(s, method)(min_count=1)
+            assert pd.isna(result)
+
+            # Skipna, default
+            result = getattr(s, method)(skipna=True)
+            assert result == unit
+
+            # skipna, explicit
+            result = getattr(s, method)(skipna=True, min_count=0)
+            assert result == unit
+
+            result = getattr(s, method)(skipna=True, min_count=1)
+            assert pd.isna(result)
+
+            # Mix of valid, empty
+            s = Series([np.nan, 1])
+            # Default
+            result = getattr(s, method)()
+            assert result == 1.0
+
+            # Explicit
+            result = getattr(s, method)(min_count=0)
+            assert result == 1.0
+
+            result = getattr(s, method)(min_count=1)
+            assert result == 1.0
+
+            # Skipna
+            result = getattr(s, method)(skipna=True)
+            assert result == 1.0
+
+            result = getattr(s, method)(skipna=True, min_count=0)
+            assert result == 1.0
+
+            result = getattr(s, method)(skipna=True, min_count=1)
+            assert result == 1.0
+
+            # GH#844 (changed in GH#9422)
+            df = DataFrame(np.empty((10, 0)))
+            assert (getattr(df, method)(1) == unit).all()
+
+            s = pd.Series([1])
+            result = getattr(s, method)(min_count=2)
+            assert pd.isna(result)
+
+            s = pd.Series([np.nan])
+            result = getattr(s, method)(min_count=2)
+            assert pd.isna(result)
+
+            s = pd.Series([np.nan, 1])
+            result = getattr(s, method)(min_count=2)
+            assert pd.isna(result)
+
+    @pytest.mark.parametrize('method, unit', [
+        ('sum', 0.0),
+        ('prod', 1.0),
+    ])
+    def test_empty_multi(self, method, unit):
+        s = pd.Series([1, np.nan, np.nan, np.nan],
+                      index=pd.MultiIndex.from_product([('a', 'b'), (0, 1)]))
+        # 1 / 0 by default
+        result = getattr(s, method)(level=0)
+        expected = pd.Series([1, unit], 
index=['a', 'b']) + tm.assert_series_equal(result, expected) + + # min_count=0 + result = getattr(s, method)(level=0, min_count=0) + expected = pd.Series([1, unit], index=['a', 'b']) + tm.assert_series_equal(result, expected) + + # min_count=1 + result = getattr(s, method)(level=0, min_count=1) + expected = pd.Series([1, np.nan], index=['a', 'b']) + tm.assert_series_equal(result, expected) + + @pytest.mark.parametrize( + "method", ['mean', 'median', 'std', 'var']) + def test_ops_consistency_on_empty(self, method): + + # GH#7869 + # consistency on empty + + # float + result = getattr(Series(dtype=float), method)() + assert pd.isna(result) + + # timedelta64[ns] + result = getattr(Series(dtype='m8[ns]'), method)() + assert result is pd.NaT + + def test_nansum_buglet(self): + ser = Series([1.0, np.nan], index=[0, 1]) + result = np.nansum(ser) + tm.assert_almost_equal(result, 1) + + @pytest.mark.parametrize("use_bottleneck", [True, False]) + def test_sum_overflow(self, use_bottleneck): + + with pd.option_context('use_bottleneck', use_bottleneck): + # GH#6915 + # overflowing on the smaller int dtypes + for dtype in ['int32', 'int64']: + v = np.arange(5000000, dtype=dtype) + s = Series(v) + + result = s.sum(skipna=False) + assert int(result) == v.sum(dtype='int64') + result = s.min(skipna=False) + assert int(result) == 0 + result = s.max(skipna=False) + assert int(result) == v[-1] + + for dtype in ['float32', 'float64']: + v = np.arange(5000000, dtype=dtype) + s = Series(v) + + result = s.sum(skipna=False) + assert result == v.sum(dtype=dtype) + result = s.min(skipna=False) + assert np.allclose(float(result), 0.0) + result = s.max(skipna=False) + assert np.allclose(float(result), v[-1]) + + def test_empty_timeseries_reductions_return_nat(self): + # covers GH#11245 + for dtype in ('m8[ns]', 'm8[ns]', 'M8[ns]', 'M8[ns, UTC]'): + assert Series([], dtype=dtype).min() is pd.NaT + assert Series([], dtype=dtype).max() is pd.NaT + + def test_numpy_argmin_deprecated(self): + # See GH#16830 + data = np.arange(1, 11) + + s = Series(data, index=data) + with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + # The deprecation of Series.argmin also causes a deprecation + # warning when calling np.argmin. This behavior is temporary + # until the implementation of Series.argmin is corrected. + result = np.argmin(s) + + assert result == 1 + + with tm.assert_produces_warning(FutureWarning): + # argmin is aliased to idxmin + result = s.argmin() + + assert result == 1 + + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + msg = "the 'out' parameter is not supported" + with pytest.raises(ValueError, match=msg): + np.argmin(s, out=data) + + def test_numpy_argmax_deprecated(self): + # See GH#16830 + data = np.arange(1, 11) + + s = Series(data, index=data) + with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + # The deprecation of Series.argmax also causes a deprecation + # warning when calling np.argmax. This behavior is temporary + # until the implementation of Series.argmax is corrected. 
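+            # np.argmax dispatches to Series.argmax, which is aliased to
+            # idxmax during the deprecation, so the expected result below is
+            # the index *label* 10, not the position 9.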
+ result = np.argmax(s) + assert result == 10 + + with tm.assert_produces_warning(FutureWarning): + # argmax is aliased to idxmax + result = s.argmax() + + assert result == 10 + + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + msg = "the 'out' parameter is not supported" + with pytest.raises(ValueError, match=msg): + np.argmax(s, out=data) + + def test_idxmin(self): + # test idxmin + # _check_stat_op approach can not be used here because of isna check. + string_series = tm.makeStringSeries().rename('series') + + # add some NaNs + string_series[5:15] = np.NaN + + # skipna or no + assert string_series[string_series.idxmin()] == string_series.min() + assert pd.isna(string_series.idxmin(skipna=False)) + + # no NaNs + nona = string_series.dropna() + assert nona[nona.idxmin()] == nona.min() + assert (nona.index.values.tolist().index(nona.idxmin()) == + nona.values.argmin()) + + # all NaNs + allna = string_series * np.nan + assert pd.isna(allna.idxmin()) + + # datetime64[ns] + s = Series(pd.date_range('20130102', periods=6)) + result = s.idxmin() + assert result == 0 + + s[0] = np.nan + result = s.idxmin() + assert result == 1 + + def test_idxmax(self): + # test idxmax + # _check_stat_op approach can not be used here because of isna check. + string_series = tm.makeStringSeries().rename('series') + + # add some NaNs + string_series[5:15] = np.NaN + + # skipna or no + assert string_series[string_series.idxmax()] == string_series.max() + assert pd.isna(string_series.idxmax(skipna=False)) + + # no NaNs + nona = string_series.dropna() + assert nona[nona.idxmax()] == nona.max() + assert (nona.index.values.tolist().index(nona.idxmax()) == + nona.values.argmax()) + + # all NaNs + allna = string_series * np.nan + assert pd.isna(allna.idxmax()) + + from pandas import date_range + s = Series(date_range('20130102', periods=6)) + result = s.idxmax() + assert result == 5 + + s[5] = np.nan + result = s.idxmax() + assert result == 4 + + # Float64Index + # GH#5914 + s = pd.Series([1, 2, 3], [1.1, 2.1, 3.1]) + result = s.idxmax() + assert result == 3.1 + result = s.idxmin() + assert result == 1.1 + + s = pd.Series(s.index, s.index) + result = s.idxmax() + assert result == 3.1 + result = s.idxmin() + assert result == 1.1 + + def test_all_any(self): + ts = tm.makeTimeSeries() + bool_series = ts > 0 + assert not bool_series.all() + assert bool_series.any() + + # Alternative types, with implicit 'object' dtype. + s = Series(['abc', True]) + assert 'abc' == s.any() # 'abc' || True => 'abc' + + def test_all_any_params(self): + # Check skipna, with implicit 'object' dtype. + s1 = Series([np.nan, True]) + s2 = Series([np.nan, False]) + assert s1.all(skipna=False) # nan && True => True + assert s1.all(skipna=True) + assert np.isnan(s2.any(skipna=False)) # nan || False => nan + assert not s2.any(skipna=True) + + # Check level. + s = pd.Series([False, False, True, True, False, True], + index=[0, 0, 1, 1, 2, 2]) + tm.assert_series_equal(s.all(level=0), Series([False, True, False])) + tm.assert_series_equal(s.any(level=0), Series([False, True, True])) + + # bool_only is not implemented with level option. + with pytest.raises(NotImplementedError): + s.any(bool_only=True, level=0) + with pytest.raises(NotImplementedError): + s.all(bool_only=True, level=0) + + # bool_only is not implemented alone. 
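+        # Series.any/all raise NotImplementedError for bool_only whether or
+        # not level is passed; only the DataFrame reductions implement it,
+        # as the two raises below demonstrate.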
+        with pytest.raises(NotImplementedError):
+            s.any(bool_only=True)
+        with pytest.raises(NotImplementedError):
+            s.all(bool_only=True)
+
+    def test_timedelta64_analytics(self):
+
+        # index min/max
+        dti = pd.date_range('2012-1-1', periods=3, freq='D')
+        td = Series(dti) - pd.Timestamp('20120101')
+
+        result = td.idxmin()
+        assert result == 0
+
+        result = td.idxmax()
+        assert result == 2
+
+        # GH#2982
+        # with NaT
+        td[0] = np.nan
+
+        result = td.idxmin()
+        assert result == 1
+
+        result = td.idxmax()
+        assert result == 2
+
+        # abs
+        s1 = Series(pd.date_range('20120101', periods=3))
+        s2 = Series(pd.date_range('20120102', periods=3))
+        expected = Series(s2 - s1)
+
+        # FIXME: don't leave commented-out code
+        # this fails as numpy returns timedelta64[us]
+        # result = np.abs(s1-s2)
+        # assert_frame_equal(result,expected)
+
+        result = (s1 - s2).abs()
+        tm.assert_series_equal(result, expected)
+
+        # max/min
+        result = td.max()
+        expected = pd.Timedelta('2 days')
+        assert result == expected
+
+        result = td.min()
+        expected = pd.Timedelta('1 days')
+        assert result == expected
+
+    @pytest.mark.parametrize(
+        "test_input,error_type",
+        [
+            (pd.Series([]), ValueError),
+
+            # For strings, or any Series with dtype 'O'
+            (pd.Series(['foo', 'bar', 'baz']), TypeError),
+            (pd.Series([(1,), (2,)]), TypeError),
+
+            # For mixed data types
+            (
+                pd.Series(['foo', 'foo', 'bar', 'bar', None, np.nan, 'baz']),
+                TypeError
+            ),
+        ]
+    )
+    def test_assert_idxminmax_raises(self, test_input, error_type):
+        """
+        Cases where ``Series.argmax`` and related should raise an exception
+        """
+        with pytest.raises(error_type):
+            test_input.idxmin()
+        with pytest.raises(error_type):
+            test_input.idxmin(skipna=False)
+        with pytest.raises(error_type):
+            test_input.idxmax()
+        with pytest.raises(error_type):
+            test_input.idxmax(skipna=False)
+
+    def test_idxminmax_with_inf(self):
+        # For numeric data with NA and Inf (GH #13595)
+        s = pd.Series([0, -np.inf, np.inf, np.nan])
+
+        assert s.idxmin() == 1
+        assert np.isnan(s.idxmin(skipna=False))
+
+        assert s.idxmax() == 2
+        assert np.isnan(s.idxmax(skipna=False))
+
+        # Using old-style behavior that treats floating point nan, -inf, and
+        # +inf as missing
+        with pd.option_context('mode.use_inf_as_na', True):
+            assert s.idxmin() == 0
+            assert np.isnan(s.idxmin(skipna=False))
+            assert s.idxmax() == 0
+            assert np.isnan(s.idxmax(skipna=False))
+
+
+class TestDatetime64SeriesReductions(object):
+    # Note: the name TestDatetime64SeriesReductions indicates these tests
+    # were moved from a series-specific test file, _not_ that these tests are
+    # intended long-term to be series-specific
+
+    @pytest.mark.parametrize('nat_ser', [
+        Series([pd.NaT, pd.NaT]),
+        Series([pd.NaT, pd.Timedelta('nat')]),
+        Series([pd.Timedelta('nat'), pd.Timedelta('nat')])])
+    def test_minmax_nat_series(self, nat_ser):
+        # GH#23282
+        assert nat_ser.min() is pd.NaT
+        assert nat_ser.max() is pd.NaT
+
+    @pytest.mark.parametrize('nat_df', [
+        pd.DataFrame([pd.NaT, pd.NaT]),
+        pd.DataFrame([pd.NaT, pd.Timedelta('nat')]),
+        pd.DataFrame([pd.Timedelta('nat'), pd.Timedelta('nat')])])
+    def test_minmax_nat_dataframe(self, nat_df):
+        # GH#23282
+        assert nat_df.min()[0] is pd.NaT
+        assert nat_df.max()[0] is pd.NaT
+
+    def test_min_max(self):
+        rng = pd.date_range('1/1/2000', '12/31/2000')
+        rng2 = rng.take(np.random.permutation(len(rng)))
+
+        the_min = rng2.min()
+        the_max = rng2.max()
+        assert isinstance(the_min, pd.Timestamp)
+        assert isinstance(the_max, pd.Timestamp)
+        assert the_min == rng[0]
+        assert the_max == rng[-1]
+
+        assert
rng.min() == rng[0] + assert rng.max() == rng[-1] + + def test_min_max_series(self): + rng = pd.date_range('1/1/2000', periods=10, freq='4h') + lvls = ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'] + df = DataFrame({'TS': rng, 'V': np.random.randn(len(rng)), 'L': lvls}) + + result = df.TS.max() + exp = pd.Timestamp(df.TS.iat[-1]) + assert isinstance(result, pd.Timestamp) + assert result == exp + + result = df.TS.min() + exp = pd.Timestamp(df.TS.iat[0]) + assert isinstance(result, pd.Timestamp) + assert result == exp + + +class TestCategoricalSeriesReductions(object): + # Note: the name TestCategoricalSeriesReductions indicates these tests + # were moved from a series-specific test file, _not_ that these tests are + # intended long-term to be series-specific + + def test_min_max(self): + # unordered cats have no min/max + cat = Series(Categorical(["a", "b", "c", "d"], ordered=False)) + with pytest.raises(TypeError): + cat.min() + with pytest.raises(TypeError): + cat.max() + + cat = Series(Categorical(["a", "b", "c", "d"], ordered=True)) + _min = cat.min() + _max = cat.max() + assert _min == "a" + assert _max == "d" + + cat = Series(Categorical(["a", "b", "c", "d"], categories=[ + 'd', 'c', 'b', 'a'], ordered=True)) + _min = cat.min() + _max = cat.max() + assert _min == "d" + assert _max == "a" + + cat = Series(Categorical( + [np.nan, "b", "c", np.nan], categories=['d', 'c', 'b', 'a' + ], ordered=True)) + _min = cat.min() + _max = cat.max() + assert np.isnan(_min) + assert _max == "b" + + cat = Series(Categorical( + [np.nan, 1, 2, np.nan], categories=[5, 4, 3, 2, 1], ordered=True)) + _min = cat.min() + _max = cat.max() + assert np.isnan(_min) + assert _max == 1 + + +class TestSeriesMode(object): + # Note: the name TestSeriesMode indicates these tests + # were moved from a series-specific test file, _not_ that these tests are + # intended long-term to be series-specific + + @pytest.mark.parametrize('dropna, expected', [ + (True, Series([], dtype=np.float64)), + (False, Series([], dtype=np.float64)) + ]) + def test_mode_empty(self, dropna, expected): + s = Series([], dtype=np.float64) + result = s.mode(dropna) + tm.assert_series_equal(result, expected) + + @pytest.mark.parametrize('dropna, data, expected', [ + (True, [1, 1, 1, 2], [1]), + (True, [1, 1, 1, 2, 3, 3, 3], [1, 3]), + (False, [1, 1, 1, 2], [1]), + (False, [1, 1, 1, 2, 3, 3, 3], [1, 3]), + ]) + @pytest.mark.parametrize( + 'dt', + list(np.typecodes['AllInteger'] + np.typecodes['Float']) + ) + def test_mode_numerical(self, dropna, data, expected, dt): + s = Series(data, dtype=dt) + result = s.mode(dropna) + expected = Series(expected, dtype=dt) + tm.assert_series_equal(result, expected) + + @pytest.mark.parametrize('dropna, expected', [ + (True, [1.0]), + (False, [1, np.nan]), + ]) + def test_mode_numerical_nan(self, dropna, expected): + s = Series([1, 1, 2, np.nan, np.nan]) + result = s.mode(dropna) + expected = Series(expected) + tm.assert_series_equal(result, expected) + + @pytest.mark.parametrize('dropna, expected1, expected2, expected3', [ + (True, ['b'], ['bar'], ['nan']), + (False, ['b'], [np.nan], ['nan']) + ]) + def test_mode_str_obj(self, dropna, expected1, expected2, expected3): + # Test string and object types. 
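# The parametrization above pins down the dropna behavior for object dtype;
# restated as a sketch (values come from the expected2 cases above, not an
# independent spec):
#
#   >>> import numpy as np, pandas as pd
#   >>> data = ['foo', 'bar', 'bar', np.nan, np.nan, np.nan]
#   >>> pd.Series(data, dtype=object).mode(dropna=True)
#   0    bar
#   dtype: object
#   >>> pd.Series(data, dtype=object).mode(dropna=False)
#   0    NaN
#   dtype: object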
+ data = ['a'] * 2 + ['b'] * 3 + + s = Series(data, dtype='c') + result = s.mode(dropna) + expected1 = Series(expected1, dtype='c') + tm.assert_series_equal(result, expected1) + + data = ['foo', 'bar', 'bar', np.nan, np.nan, np.nan] + + s = Series(data, dtype=object) + result = s.mode(dropna) + expected2 = Series(expected2, dtype=object) + tm.assert_series_equal(result, expected2) + + data = ['foo', 'bar', 'bar', np.nan, np.nan, np.nan] + + s = Series(data, dtype=object).astype(str) + result = s.mode(dropna) + expected3 = Series(expected3, dtype=str) + tm.assert_series_equal(result, expected3) + + @pytest.mark.parametrize('dropna, expected1, expected2', [ + (True, ['foo'], ['foo']), + (False, ['foo'], [np.nan]) + ]) + def test_mode_mixeddtype(self, dropna, expected1, expected2): + s = Series([1, 'foo', 'foo']) + result = s.mode(dropna) + expected = Series(expected1) + tm.assert_series_equal(result, expected) + + s = Series([1, 'foo', 'foo', np.nan, np.nan, np.nan]) + result = s.mode(dropna) + expected = Series(expected2, dtype=object) + tm.assert_series_equal(result, expected) + + @pytest.mark.parametrize('dropna, expected1, expected2', [ + (True, ['1900-05-03', '2011-01-03', '2013-01-02'], + ['2011-01-03', '2013-01-02']), + (False, [np.nan], [np.nan, '2011-01-03', '2013-01-02']), + ]) + def test_mode_datetime(self, dropna, expected1, expected2): + s = Series(['2011-01-03', '2013-01-02', + '1900-05-03', 'nan', 'nan'], dtype='M8[ns]') + result = s.mode(dropna) + expected1 = Series(expected1, dtype='M8[ns]') + tm.assert_series_equal(result, expected1) + + s = Series(['2011-01-03', '2013-01-02', '1900-05-03', + '2011-01-03', '2013-01-02', 'nan', 'nan'], + dtype='M8[ns]') + result = s.mode(dropna) + expected2 = Series(expected2, dtype='M8[ns]') + tm.assert_series_equal(result, expected2) + + @pytest.mark.parametrize('dropna, expected1, expected2', [ + (True, ['-1 days', '0 days', '1 days'], ['2 min', '1 day']), + (False, [np.nan], [np.nan, '2 min', '1 day']), + ]) + def test_mode_timedelta(self, dropna, expected1, expected2): + # gh-5986: Test timedelta types. 
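# Note: the 'nan' strings in the constructors below parse to NaT, so with
# dropna=False NaT competes as a mode value. A sketch, with the result taken
# from the (False, [np.nan], ...) case above:
#
#   >>> import pandas as pd
#   >>> s = pd.Series(['1 days', '-1 days', '0 days', 'nan', 'nan'],
#   ...               dtype='timedelta64[ns]')
#   >>> s.mode(dropna=False)
#   0   NaT
#   dtype: timedelta64[ns]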
+ + s = Series(['1 days', '-1 days', '0 days', 'nan', 'nan'], + dtype='timedelta64[ns]') + result = s.mode(dropna) + expected1 = Series(expected1, dtype='timedelta64[ns]') + tm.assert_series_equal(result, expected1) + + s = Series(['1 day', '1 day', '-1 day', '-1 day 2 min', + '2 min', '2 min', 'nan', 'nan'], + dtype='timedelta64[ns]') + result = s.mode(dropna) + expected2 = Series(expected2, dtype='timedelta64[ns]') + tm.assert_series_equal(result, expected2) + + @pytest.mark.parametrize('dropna, expected1, expected2, expected3', [ + (True, Categorical([1, 2], categories=[1, 2]), + Categorical(['a'], categories=[1, 'a']), + Categorical([3, 1], categories=[3, 2, 1], ordered=True)), + (False, Categorical([np.nan], categories=[1, 2]), + Categorical([np.nan, 'a'], categories=[1, 'a']), + Categorical([np.nan, 3, 1], categories=[3, 2, 1], ordered=True)), + ]) + def test_mode_category(self, dropna, expected1, expected2, expected3): + s = Series(Categorical([1, 2, np.nan, np.nan])) + result = s.mode(dropna) + expected1 = Series(expected1, dtype='category') + tm.assert_series_equal(result, expected1) + + s = Series(Categorical([1, 'a', 'a', np.nan, np.nan])) + result = s.mode(dropna) + expected2 = Series(expected2, dtype='category') + tm.assert_series_equal(result, expected2) + + s = Series(Categorical([1, 1, 2, 3, 3, np.nan, np.nan], + categories=[3, 2, 1], ordered=True)) + result = s.mode(dropna) + expected3 = Series(expected3, dtype='category') + tm.assert_series_equal(result, expected3) + + @pytest.mark.parametrize('dropna, expected1, expected2', [ + (True, [2**63], [1, 2**63]), + (False, [2**63], [1, 2**63]) + ]) + def test_mode_intoverflow(self, dropna, expected1, expected2): + # Test for uint64 overflow. + s = Series([1, 2**63, 2**63], dtype=np.uint64) + result = s.mode(dropna) + expected1 = Series(expected1, dtype=np.uint64) + tm.assert_series_equal(result, expected1) + + s = Series([1, 2**63], dtype=np.uint64) + result = s.mode(dropna) + expected2 = Series(expected2, dtype=np.uint64) + tm.assert_series_equal(result, expected2) + + @pytest.mark.skipif(not compat.PY3, reason="only PY3") + def test_mode_sortwarning(self): + # Check for the warning that is raised when the mode + # results cannot be sorted + + expected = Series(['foo', np.nan]) + s = Series([1, 'foo', 'foo', np.nan, np.nan]) + + with tm.assert_produces_warning(UserWarning, check_stacklevel=False): + result = s.mode(dropna=False) + result = result.sort_values().reset_index(drop=True) + + tm.assert_series_equal(result, expected) diff --git a/pandas/tests/reductions/test_stat_reductions.py b/pandas/tests/reductions/test_stat_reductions.py new file mode 100644 index 0000000000000..1146e0793d4f5 --- /dev/null +++ b/pandas/tests/reductions/test_stat_reductions.py @@ -0,0 +1,202 @@ +# -*- coding: utf-8 -*- +""" +Tests for statistical reductions of 2nd moment or higher: var, skew, kurt, ... 
+""" + +import numpy as np +import pytest + +from pandas.compat import lrange +import pandas.util._test_decorators as td + +import pandas as pd +from pandas import DataFrame, Series, compat +import pandas.util.testing as tm + + +class TestSeriesStatReductions(object): + # Note: the name TestSeriesStatReductions indicates these tests + # were moved from a series-specific test file, _not_ that these tests are + # intended long-term to be series-specific + + def _check_stat_op(self, name, alternate, string_series_, + check_objects=False, check_allna=False): + + with pd.option_context('use_bottleneck', False): + f = getattr(Series, name) + + # add some NaNs + string_series_[5:15] = np.NaN + + # idxmax, idxmin, min, and max are valid for dates + if name not in ['max', 'min']: + ds = Series(pd.date_range('1/1/2001', periods=10)) + with pytest.raises(TypeError): + f(ds) + + # skipna or no + assert pd.notna(f(string_series_)) + assert pd.isna(f(string_series_, skipna=False)) + + # check the result is correct + nona = string_series_.dropna() + tm.assert_almost_equal(f(nona), alternate(nona.values)) + tm.assert_almost_equal(f(string_series_), alternate(nona.values)) + + allna = string_series_ * np.nan + + if check_allna: + assert np.isnan(f(allna)) + + # dtype=object with None, it works! + s = Series([1, 2, 3, None, 5]) + f(s) + + # GH#2888 + items = [0] + items.extend(lrange(2 ** 40, 2 ** 40 + 1000)) + s = Series(items, dtype='int64') + tm.assert_almost_equal(float(f(s)), float(alternate(s.values))) + + # check date range + if check_objects: + s = Series(pd.bdate_range('1/1/2000', periods=10)) + res = f(s) + exp = alternate(s) + assert res == exp + + # check on string data + if name not in ['sum', 'min', 'max']: + with pytest.raises(TypeError): + f(Series(list('abc'))) + + # Invalid axis. + with pytest.raises(ValueError): + f(string_series_, axis=1) + + # Unimplemented numeric_only parameter. 
+ if 'numeric_only' in compat.signature(f).args: + with pytest.raises(NotImplementedError, match=name): + f(string_series_, numeric_only=True) + + def test_sum(self): + string_series = tm.makeStringSeries().rename('series') + self._check_stat_op('sum', np.sum, string_series, check_allna=False) + + def test_mean(self): + string_series = tm.makeStringSeries().rename('series') + self._check_stat_op('mean', np.mean, string_series) + + def test_median(self): + string_series = tm.makeStringSeries().rename('series') + self._check_stat_op('median', np.median, string_series) + + # test with integers, test failure + int_ts = Series(np.ones(10, dtype=int), index=lrange(10)) + tm.assert_almost_equal(np.median(int_ts), int_ts.median()) + + def test_prod(self): + string_series = tm.makeStringSeries().rename('series') + self._check_stat_op('prod', np.prod, string_series) + + def test_min(self): + string_series = tm.makeStringSeries().rename('series') + self._check_stat_op('min', np.min, string_series, check_objects=True) + + def test_max(self): + string_series = tm.makeStringSeries().rename('series') + self._check_stat_op('max', np.max, string_series, check_objects=True) + + def test_var_std(self): + string_series = tm.makeStringSeries().rename('series') + datetime_series = tm.makeTimeSeries().rename('ts') + + alt = lambda x: np.std(x, ddof=1) + self._check_stat_op('std', alt, string_series) + + alt = lambda x: np.var(x, ddof=1) + self._check_stat_op('var', alt, string_series) + + result = datetime_series.std(ddof=4) + expected = np.std(datetime_series.values, ddof=4) + tm.assert_almost_equal(result, expected) + + result = datetime_series.var(ddof=4) + expected = np.var(datetime_series.values, ddof=4) + tm.assert_almost_equal(result, expected) + + # 1 - element series with ddof=1 + s = datetime_series.iloc[[0]] + result = s.var(ddof=1) + assert pd.isna(result) + + result = s.std(ddof=1) + assert pd.isna(result) + + def test_sem(self): + string_series = tm.makeStringSeries().rename('series') + datetime_series = tm.makeTimeSeries().rename('ts') + + alt = lambda x: np.std(x, ddof=1) / np.sqrt(len(x)) + self._check_stat_op('sem', alt, string_series) + + result = datetime_series.sem(ddof=4) + expected = np.std(datetime_series.values, + ddof=4) / np.sqrt(len(datetime_series.values)) + tm.assert_almost_equal(result, expected) + + # 1 - element series with ddof=1 + s = datetime_series.iloc[[0]] + result = s.sem(ddof=1) + assert pd.isna(result) + + @td.skip_if_no_scipy + def test_skew(self): + from scipy.stats import skew + + string_series = tm.makeStringSeries().rename('series') + + alt = lambda x: skew(x, bias=False) + self._check_stat_op('skew', alt, string_series) + + # test corner cases, skew() returns NaN unless there's at least 3 + # values + min_N = 3 + for i in range(1, min_N + 1): + s = Series(np.ones(i)) + df = DataFrame(np.ones((i, i))) + if i < min_N: + assert np.isnan(s.skew()) + assert np.isnan(df.skew()).all() + else: + assert 0 == s.skew() + assert (df.skew() == 0).all() + + @td.skip_if_no_scipy + def test_kurt(self): + from scipy.stats import kurtosis + + string_series = tm.makeStringSeries().rename('series') + + alt = lambda x: kurtosis(x, bias=False) + self._check_stat_op('kurt', alt, string_series) + + index = pd.MultiIndex( + levels=[['bar'], ['one', 'two', 'three'], [0, 1]], + codes=[[0, 0, 0, 0, 0, 0], [0, 1, 2, 0, 1, 2], [0, 1, 0, 1, 0, 1]] + ) + s = Series(np.random.randn(6), index=index) + tm.assert_almost_equal(s.kurt(), s.kurt(level=0)['bar']) + + # test corner cases, kurt() returns 
NaN unless there's at least 4 + # values + min_N = 4 + for i in range(1, min_N + 1): + s = Series(np.ones(i)) + df = DataFrame(np.ones((i, i))) + if i < min_N: + assert np.isnan(s.kurt()) + assert np.isnan(df.kurt()).all() + else: + assert 0 == s.kurt() + assert (df.kurt() == 0).all() diff --git a/pandas/tests/resample/__init__.py b/pandas/tests/resample/__init__.py new file mode 100644 index 0000000000000..e69de29bb2d1d diff --git a/pandas/tests/resample/conftest.py b/pandas/tests/resample/conftest.py new file mode 100644 index 0000000000000..b84f88da85cc0 --- /dev/null +++ b/pandas/tests/resample/conftest.py @@ -0,0 +1,110 @@ +from datetime import datetime + +import numpy as np +import pytest + +from pandas import DataFrame, Series +from pandas.core.indexes.datetimes import date_range +from pandas.core.indexes.period import period_range + +# The various methods we support +downsample_methods = ['min', 'max', 'first', 'last', 'sum', 'mean', 'sem', + 'median', 'prod', 'var', 'std', 'ohlc', 'quantile'] +upsample_methods = ['count', 'size'] +series_methods = ['nunique'] +resample_methods = downsample_methods + upsample_methods + series_methods + + +@pytest.fixture(params=downsample_methods) +def downsample_method(request): + """Fixture for parametrization of Grouper downsample methods.""" + return request.param + + +@pytest.fixture(params=upsample_methods) +def upsample_method(request): + """Fixture for parametrization of Grouper upsample methods.""" + return request.param + + +@pytest.fixture(params=resample_methods) +def resample_method(request): + """Fixture for parametrization of Grouper resample methods.""" + return request.param + + +@pytest.fixture +def simple_date_range_series(): + """ + Series with date range index and random data for test purposes. + """ + def _simple_date_range_series(start, end, freq='D'): + rng = date_range(start, end, freq=freq) + return Series(np.random.randn(len(rng)), index=rng) + return _simple_date_range_series + + +@pytest.fixture +def simple_period_range_series(): + """ + Series with period range index and random data for test purposes. 
+ """ + def _simple_period_range_series(start, end, freq='D'): + rng = period_range(start, end, freq=freq) + return Series(np.random.randn(len(rng)), index=rng) + return _simple_period_range_series + + +@pytest.fixture +def _index_start(): + return datetime(2005, 1, 1) + + +@pytest.fixture +def _index_end(): + return datetime(2005, 1, 10) + + +@pytest.fixture +def _index_freq(): + return 'D' + + +@pytest.fixture +def index(_index_factory, _index_start, _index_end, _index_freq): + return _index_factory(_index_start, _index_end, freq=_index_freq) + + +@pytest.fixture +def _static_values(index): + return np.arange(len(index)) + + +@pytest.fixture +def series(index, _series_name, _static_values): + return Series(_static_values, index=index, name=_series_name) + + +@pytest.fixture +def empty_series(series): + return series[:0] + + +@pytest.fixture +def frame(index, _series_name, _static_values): + # _series_name is intentionally unused + return DataFrame({'value': _static_values}, index=index) + + +@pytest.fixture +def empty_frame(series): + index = series.index[:0] + return DataFrame(index=index) + + +@pytest.fixture(params=[Series, DataFrame]) +def series_and_frame(request, series, frame): + if request.param == Series: + return series + if request.param == DataFrame: + return frame diff --git a/pandas/tests/resample/test_base.py b/pandas/tests/resample/test_base.py new file mode 100644 index 0000000000000..31199dc01b659 --- /dev/null +++ b/pandas/tests/resample/test_base.py @@ -0,0 +1,222 @@ +from datetime import datetime, timedelta + +import numpy as np +import pytest + +from pandas.compat import range, zip + +import pandas as pd +from pandas import DataFrame, Series +from pandas.core.groupby.groupby import DataError +from pandas.core.indexes.datetimes import date_range +from pandas.core.indexes.period import PeriodIndex, period_range +from pandas.core.indexes.timedeltas import TimedeltaIndex, timedelta_range +from pandas.core.resample import TimeGrouper +import pandas.util.testing as tm +from pandas.util.testing import ( + assert_almost_equal, assert_frame_equal, assert_index_equal, + assert_series_equal) + +# a fixture value can be overridden by the test parameter value. Note that the +# value of the fixture can be overridden this way even if the test doesn't use +# it directly (doesn't mention it in the function prototype). 
+# see https://docs.pytest.org/en/latest/fixture.html#override-a-fixture-with-direct-test-parametrization # noqa +# in this module we override the fixture values defined in conftest.py +# tuples of '_index_factory,_series_name,_index_start,_index_end' +DATE_RANGE = (date_range, 'dti', datetime(2005, 1, 1), datetime(2005, 1, 10)) +PERIOD_RANGE = ( + period_range, 'pi', datetime(2005, 1, 1), datetime(2005, 1, 10)) +TIMEDELTA_RANGE = (timedelta_range, 'tdi', '1 day', '10 day') + +ALL_TIMESERIES_INDEXES = [DATE_RANGE, PERIOD_RANGE, TIMEDELTA_RANGE] + + +def pytest_generate_tests(metafunc): + # called once per each test function + if metafunc.function.__name__.endswith('_all_ts'): + metafunc.parametrize( + '_index_factory,_series_name,_index_start,_index_end', + ALL_TIMESERIES_INDEXES) + + +@pytest.fixture +def create_index(_index_factory): + def _create_index(*args, **kwargs): + """ return the _index_factory created using the args, kwargs """ + return _index_factory(*args, **kwargs) + return _create_index + + +@pytest.mark.parametrize('freq', ['2D', '1H']) +@pytest.mark.parametrize( + '_index_factory,_series_name,_index_start,_index_end', + [DATE_RANGE, TIMEDELTA_RANGE] +) +def test_asfreq(series_and_frame, freq, create_index): + obj = series_and_frame + + result = obj.resample(freq).asfreq() + new_index = create_index(obj.index[0], obj.index[-1], freq=freq) + expected = obj.reindex(new_index) + assert_almost_equal(result, expected) + + +@pytest.mark.parametrize( + '_index_factory,_series_name,_index_start,_index_end', + [DATE_RANGE, TIMEDELTA_RANGE] +) +def test_asfreq_fill_value(series, create_index): + # test for fill value during resampling, issue 3715 + + s = series + + result = s.resample('1H').asfreq() + new_index = create_index(s.index[0], s.index[-1], freq='1H') + expected = s.reindex(new_index) + assert_series_equal(result, expected) + + frame = s.to_frame('value') + frame.iloc[1] = None + result = frame.resample('1H').asfreq(fill_value=4.0) + new_index = create_index(frame.index[0], + frame.index[-1], freq='1H') + expected = frame.reindex(new_index, fill_value=4.0) + assert_frame_equal(result, expected) + + +def test_resample_interpolate_all_ts(frame): + # # 12925 + df = frame + assert_frame_equal( + df.resample('1T').asfreq().interpolate(), + df.resample('1T').interpolate()) + + +def test_raises_on_non_datetimelike_index(): + # this is a non datetimelike index + xp = DataFrame() + pytest.raises(TypeError, lambda: xp.resample('A').mean()) + + +@pytest.mark.parametrize('freq', ['M', 'D', 'H']) +def test_resample_empty_series_all_ts(freq, empty_series, resample_method): + # GH12771 & GH12868 + + if resample_method == 'ohlc': + pytest.skip('need to test for ohlc from GH13083') + + s = empty_series + result = getattr(s.resample(freq), resample_method)() + + expected = s.copy() + expected.index = s.index._shallow_copy(freq=freq) + assert_index_equal(result.index, expected.index) + assert result.index.freq == expected.index.freq + assert_series_equal(result, expected, check_dtype=False) + + +@pytest.mark.parametrize('freq', ['M', 'D', 'H']) +def test_resample_empty_dataframe_all_ts(empty_frame, freq, resample_method): + # GH13212 + df = empty_frame + # count retains dimensions too + result = getattr(df.resample(freq), resample_method)() + if resample_method != 'size': + expected = df.copy() + else: + # GH14962 + expected = Series([]) + + expected.index = df.index._shallow_copy(freq=freq) + assert_index_equal(result.index, expected.index) + assert result.index.freq == expected.index.freq + 
assert_almost_equal(result, expected, check_dtype=False) + + # test size for GH13212 (currently stays as df) + + +@pytest.mark.parametrize("index", tm.all_timeseries_index_generator(0)) +@pytest.mark.parametrize( + "dtype", + [np.float, np.int, np.object, 'datetime64[ns]']) +def test_resample_empty_dtypes(index, dtype, resample_method): + + # Empty series were sometimes causing a segfault (for the functions + # with Cython bounds-checking disabled) or an IndexError. We just run + # them to ensure they no longer do. (GH #10228) + empty_series = Series([], index, dtype) + try: + getattr(empty_series.resample('d'), resample_method)() + except DataError: + # Ignore these since some combinations are invalid + # (ex: doing mean with dtype of np.object) + pass + + +def test_resample_loffset_arg_type_all_ts(frame, create_index): + # GH 13218, 15002 + df = frame + expected_means = [df.values[i:i + 2].mean() + for i in range(0, len(df.values), 2)] + expected_index = create_index(df.index[0], + periods=len(df.index) / 2, + freq='2D') + + # loffset coerces PeriodIndex to DateTimeIndex + if isinstance(expected_index, PeriodIndex): + expected_index = expected_index.to_timestamp() + + expected_index += timedelta(hours=2) + expected = DataFrame({'value': expected_means}, index=expected_index) + + for arg in ['mean', {'value': 'mean'}, ['mean']]: + + result_agg = df.resample('2D', loffset='2H').agg(arg) + + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + result_how = df.resample('2D', how=arg, loffset='2H') + + if isinstance(arg, list): + expected.columns = pd.MultiIndex.from_tuples([('value', + 'mean')]) + + # GH 13022, 7687 - TODO: fix resample w/ TimedeltaIndex + if isinstance(expected.index, TimedeltaIndex): + with pytest.raises(AssertionError): + assert_frame_equal(result_agg, expected) + assert_frame_equal(result_how, expected) + else: + assert_frame_equal(result_agg, expected) + assert_frame_equal(result_how, expected) + + +def test_apply_to_empty_series_all_ts(empty_series): + # GH 14313 + s = empty_series + for freq in ['M', 'D', 'H']: + result = s.resample(freq).apply(lambda x: 1) + expected = s.resample(freq).apply(np.sum) + + assert_series_equal(result, expected, check_dtype=False) + + +def test_resampler_is_iterable_all_ts(series): + # GH 15314 + freq = 'H' + tg = TimeGrouper(freq, convention='start') + grouped = series.groupby(tg) + resampled = series.resample(freq) + for (rk, rv), (gk, gv) in zip(resampled, grouped): + assert rk == gk + assert_series_equal(rv, gv) + + +def test_resample_quantile_all_ts(series): + # GH 15023 + s = series + q = 0.75 + freq = 'H' + result = s.resample(freq).quantile(q) + expected = s.resample(freq).agg(lambda x: x.quantile(q)) + tm.assert_series_equal(result, expected) diff --git a/pandas/tests/resample/test_datetime_index.py b/pandas/tests/resample/test_datetime_index.py new file mode 100644 index 0000000000000..57a276fb0fe14 --- /dev/null +++ b/pandas/tests/resample/test_datetime_index.py @@ -0,0 +1,1508 @@ +from datetime import datetime, timedelta +from functools import partial +from warnings import catch_warnings, simplefilter + +import numpy as np +import pytest +import pytz + +from pandas.compat import range +from pandas.errors import UnsupportedFunctionCall + +import pandas as pd +from pandas import ( + DataFrame, Index, Panel, Series, Timedelta, Timestamp, isna, notna) +from pandas.core.indexes.datetimes import date_range +from pandas.core.indexes.period import Period, period_range +from pandas.core.indexes.timedeltas import 
timedelta_range +from pandas.core.resample import ( + DatetimeIndex, TimeGrouper, _get_timestamp_range_edges) +import pandas.util.testing as tm +from pandas.util.testing import ( + assert_almost_equal, assert_frame_equal, assert_series_equal) + +import pandas.tseries.offsets as offsets +from pandas.tseries.offsets import BDay, Minute + + +class TestDatetimeIndex(object): + def setup_method(self, method): + dti = date_range(start=datetime(2005, 1, 1), + end=datetime(2005, 1, 10), freq='Min') + + self.series = Series(np.random.rand(len(dti)), dti) + + def test_custom_grouper(self): + + dti = date_range(freq='Min', start=datetime(2005, 1, 1), + end=datetime(2005, 1, 10)) + + s = Series(np.array([1] * len(dti)), index=dti, dtype='int64') + + b = TimeGrouper(Minute(5)) + g = s.groupby(b) + + # check all cython functions work + funcs = ['add', 'mean', 'prod', 'ohlc', 'min', 'max', 'var'] + for f in funcs: + g._cython_agg_general(f) + + b = TimeGrouper(Minute(5), closed='right', label='right') + g = s.groupby(b) + # check all cython functions work + funcs = ['add', 'mean', 'prod', 'ohlc', 'min', 'max', 'var'] + for f in funcs: + g._cython_agg_general(f) + + assert g.ngroups == 2593 + assert notna(g.mean()).all() + + # construct expected val + arr = [1] + [5] * 2592 + idx = dti[0:-1:5] + idx = idx.append(dti[-1:]) + expect = Series(arr, index=idx) + + # GH2763 - return in put dtype if we can + result = g.agg(np.sum) + assert_series_equal(result, expect) + + df = DataFrame(np.random.rand(len(dti), 10), + index=dti, dtype='float64') + r = df.groupby(b).agg(np.sum) + + assert len(r.columns) == 10 + assert len(r.index) == 2593 + + def test_resample_basic(self): + rng = date_range('1/1/2000 00:00:00', '1/1/2000 00:13:00', freq='min', + name='index') + s = Series(np.random.randn(14), index=rng) + + result = s.resample('5min', closed='right', label='right').mean() + + exp_idx = date_range('1/1/2000', periods=4, freq='5min', name='index') + expected = Series([s[0], s[1:6].mean(), s[6:11].mean(), s[11:].mean()], + index=exp_idx) + assert_series_equal(result, expected) + assert result.index.name == 'index' + + result = s.resample('5min', closed='left', label='right').mean() + + exp_idx = date_range('1/1/2000 00:05', periods=3, freq='5min', + name='index') + expected = Series([s[:5].mean(), s[5:10].mean(), + s[10:].mean()], index=exp_idx) + assert_series_equal(result, expected) + + s = self.series + result = s.resample('5Min').last() + grouper = TimeGrouper(Minute(5), closed='left', label='left') + expect = s.groupby(grouper).agg(lambda x: x[-1]) + assert_series_equal(result, expect) + + def test_resample_string_kwargs(self): + # Test for issue #19303 + rng = date_range('1/1/2000 00:00:00', '1/1/2000 00:13:00', freq='min', + name='index') + s = Series(np.random.randn(14), index=rng) + + # Check that wrong keyword argument strings raise an error + with pytest.raises(ValueError): + s.resample('5min', label='righttt').mean() + with pytest.raises(ValueError): + s.resample('5min', closed='righttt').mean() + with pytest.raises(ValueError): + s.resample('5min', convention='starttt').mean() + + def test_resample_how(self, downsample_method): + if downsample_method == 'ohlc': + pytest.skip('covered by test_resample_how_ohlc') + + rng = date_range('1/1/2000 00:00:00', '1/1/2000 00:13:00', freq='min', + name='index') + s = Series(np.random.randn(14), index=rng) + + grouplist = np.ones_like(s) + grouplist[0] = 0 + grouplist[1:6] = 1 + grouplist[6:11] = 2 + grouplist[11:] = 3 + expected = 
s.groupby(grouplist).agg(downsample_method) + expected.index = date_range( + '1/1/2000', periods=4, freq='5min', name='index') + + result = getattr(s.resample( + '5min', closed='right', label='right'), downsample_method)() + + assert result.index.name == 'index' # redundant assert? + assert_series_equal(result, expected) + + def test_resample_how_ohlc(self): + rng = date_range('1/1/2000 00:00:00', '1/1/2000 00:13:00', freq='min', + name='index') + s = Series(np.random.randn(14), index=rng) + grouplist = np.ones_like(s) + grouplist[0] = 0 + grouplist[1:6] = 1 + grouplist[6:11] = 2 + grouplist[11:] = 3 + + def _ohlc(group): + if isna(group).all(): + return np.repeat(np.nan, 4) + return [group[0], group.max(), group.min(), group[-1]] + + inds = date_range('1/1/2000', periods=4, freq='5min', name='index') + expected = s.groupby(grouplist).agg(_ohlc) + expected = DataFrame(expected.values.tolist(), + index=Index(inds, name='index'), + columns=['open', 'high', 'low', 'close']) + + result = s.resample('5min', closed='right', label='right').ohlc() + + assert result.index.name == 'index' # redundant assert? + assert_frame_equal(result, expected) + + def test_numpy_compat(self): + # see gh-12811 + s = Series([1, 2, 3, 4, 5], index=date_range( + '20130101', periods=5, freq='s')) + r = s.resample('2s') + + msg = "numpy operations are not valid with resample" + + for func in ('min', 'max', 'sum', 'prod', + 'mean', 'var', 'std'): + with pytest.raises(UnsupportedFunctionCall, match=msg): + getattr(r, func)(func, 1, 2, 3) + with pytest.raises(UnsupportedFunctionCall, match=msg): + getattr(r, func)(axis=1) + + def test_resample_how_callables(self): + # GH#7929 + data = np.arange(5, dtype=np.int64) + ind = date_range(start='2014-01-01', periods=len(data), freq='d') + df = DataFrame({"A": data, "B": data}, index=ind) + + def fn(x, a=1): + return str(type(x)) + + class FnClass(object): + + def __call__(self, x): + return str(type(x)) + + df_standard = df.resample("M").apply(fn) + df_lambda = df.resample("M").apply(lambda x: str(type(x))) + df_partial = df.resample("M").apply(partial(fn)) + df_partial2 = df.resample("M").apply(partial(fn, a=2)) + df_class = df.resample("M").apply(FnClass()) + + assert_frame_equal(df_standard, df_lambda) + assert_frame_equal(df_standard, df_partial) + assert_frame_equal(df_standard, df_partial2) + assert_frame_equal(df_standard, df_class) + + def test_resample_with_timedeltas(self): + + expected = DataFrame({'A': np.arange(1480)}) + expected = expected.groupby(expected.index // 30).sum() + expected.index = pd.timedelta_range('0 days', freq='30T', periods=50) + + df = DataFrame({'A': np.arange(1480)}, index=pd.to_timedelta( + np.arange(1480), unit='T')) + result = df.resample('30T').sum() + + assert_frame_equal(result, expected) + + s = df['A'] + result = s.resample('30T').sum() + assert_series_equal(result, expected['A']) + + def test_resample_single_period_timedelta(self): + + s = Series(list(range(5)), index=pd.timedelta_range( + '1 day', freq='s', periods=5)) + result = s.resample('2s').sum() + expected = Series([1, 5, 4], index=pd.timedelta_range( + '1 day', freq='2s', periods=3)) + assert_series_equal(result, expected) + + def test_resample_timedelta_idempotency(self): + + # GH 12072 + index = pd.timedelta_range('0', periods=9, freq='10L') + series = Series(range(9), index=index) + result = series.resample('10L').mean() + expected = series + assert_series_equal(result, expected) + + def test_resample_rounding(self): + # GH 8371 + # odd results when rounding is needed + + 
data = """date,time,value +11-08-2014,00:00:01.093,1 +11-08-2014,00:00:02.159,1 +11-08-2014,00:00:02.667,1 +11-08-2014,00:00:03.175,1 +11-08-2014,00:00:07.058,1 +11-08-2014,00:00:07.362,1 +11-08-2014,00:00:08.324,1 +11-08-2014,00:00:08.830,1 +11-08-2014,00:00:08.982,1 +11-08-2014,00:00:09.815,1 +11-08-2014,00:00:10.540,1 +11-08-2014,00:00:11.061,1 +11-08-2014,00:00:11.617,1 +11-08-2014,00:00:13.607,1 +11-08-2014,00:00:14.535,1 +11-08-2014,00:00:15.525,1 +11-08-2014,00:00:17.960,1 +11-08-2014,00:00:20.674,1 +11-08-2014,00:00:21.191,1""" + + from pandas.compat import StringIO + df = pd.read_csv(StringIO(data), parse_dates={'timestamp': [ + 'date', 'time']}, index_col='timestamp') + df.index.name = None + result = df.resample('6s').sum() + expected = DataFrame({'value': [ + 4, 9, 4, 2 + ]}, index=date_range('2014-11-08', freq='6s', periods=4)) + assert_frame_equal(result, expected) + + result = df.resample('7s').sum() + expected = DataFrame({'value': [ + 4, 10, 4, 1 + ]}, index=date_range('2014-11-08', freq='7s', periods=4)) + assert_frame_equal(result, expected) + + result = df.resample('11s').sum() + expected = DataFrame({'value': [ + 11, 8 + ]}, index=date_range('2014-11-08', freq='11s', periods=2)) + assert_frame_equal(result, expected) + + result = df.resample('13s').sum() + expected = DataFrame({'value': [ + 13, 6 + ]}, index=date_range('2014-11-08', freq='13s', periods=2)) + assert_frame_equal(result, expected) + + result = df.resample('17s').sum() + expected = DataFrame({'value': [ + 16, 3 + ]}, index=date_range('2014-11-08', freq='17s', periods=2)) + assert_frame_equal(result, expected) + + def test_resample_basic_from_daily(self): + # from daily + dti = date_range(start=datetime(2005, 1, 1), + end=datetime(2005, 1, 10), freq='D', name='index') + + s = Series(np.random.rand(len(dti)), dti) + + # to weekly + result = s.resample('w-sun').last() + + assert len(result) == 3 + assert (result.index.dayofweek == [6, 6, 6]).all() + assert result.iloc[0] == s['1/2/2005'] + assert result.iloc[1] == s['1/9/2005'] + assert result.iloc[2] == s.iloc[-1] + + result = s.resample('W-MON').last() + assert len(result) == 2 + assert (result.index.dayofweek == [0, 0]).all() + assert result.iloc[0] == s['1/3/2005'] + assert result.iloc[1] == s['1/10/2005'] + + result = s.resample('W-TUE').last() + assert len(result) == 2 + assert (result.index.dayofweek == [1, 1]).all() + assert result.iloc[0] == s['1/4/2005'] + assert result.iloc[1] == s['1/10/2005'] + + result = s.resample('W-WED').last() + assert len(result) == 2 + assert (result.index.dayofweek == [2, 2]).all() + assert result.iloc[0] == s['1/5/2005'] + assert result.iloc[1] == s['1/10/2005'] + + result = s.resample('W-THU').last() + assert len(result) == 2 + assert (result.index.dayofweek == [3, 3]).all() + assert result.iloc[0] == s['1/6/2005'] + assert result.iloc[1] == s['1/10/2005'] + + result = s.resample('W-FRI').last() + assert len(result) == 2 + assert (result.index.dayofweek == [4, 4]).all() + assert result.iloc[0] == s['1/7/2005'] + assert result.iloc[1] == s['1/10/2005'] + + # to biz day + result = s.resample('B').last() + assert len(result) == 7 + assert (result.index.dayofweek == [4, 0, 1, 2, 3, 4, 0]).all() + + assert result.iloc[0] == s['1/2/2005'] + assert result.iloc[1] == s['1/3/2005'] + assert result.iloc[5] == s['1/9/2005'] + assert result.index.name == 'index' + + def test_resample_upsampling_picked_but_not_correct(self): + + # Test for issue #3020 + dates = date_range('01-Jan-2014', '05-Jan-2014', freq='D') + series = Series(1, 
index=dates) + + result = series.resample('D').mean() + assert result.index[0] == dates[0] + + # GH 5955 + # incorrect deciding to upsample when the axis frequency matches the + # resample frequency + + import datetime + s = Series(np.arange(1., 6), index=[datetime.datetime( + 1975, 1, i, 12, 0) for i in range(1, 6)]) + expected = Series(np.arange(1., 6), index=date_range( + '19750101', periods=5, freq='D')) + + result = s.resample('D').count() + assert_series_equal(result, Series(1, index=expected.index)) + + result1 = s.resample('D').sum() + result2 = s.resample('D').mean() + assert_series_equal(result1, expected) + assert_series_equal(result2, expected) + + def test_resample_frame_basic(self): + df = tm.makeTimeDataFrame() + + b = TimeGrouper('M') + g = df.groupby(b) + + # check all cython functions work + funcs = ['add', 'mean', 'prod', 'min', 'max', 'var'] + for f in funcs: + g._cython_agg_general(f) + + result = df.resample('A').mean() + assert_series_equal(result['A'], df['A'].resample('A').mean()) + + result = df.resample('M').mean() + assert_series_equal(result['A'], df['A'].resample('M').mean()) + + df.resample('M', kind='period').mean() + df.resample('W-WED', kind='period').mean() + + @pytest.mark.parametrize('loffset', [timedelta(minutes=1), + '1min', Minute(1), + np.timedelta64(1, 'm')]) + def test_resample_loffset(self, loffset): + # GH 7687 + rng = date_range('1/1/2000 00:00:00', '1/1/2000 00:13:00', freq='min') + s = Series(np.random.randn(14), index=rng) + + result = s.resample('5min', closed='right', label='right', + loffset=loffset).mean() + idx = date_range('1/1/2000', periods=4, freq='5min') + expected = Series([s[0], s[1:6].mean(), s[6:11].mean(), s[11:].mean()], + index=idx + timedelta(minutes=1)) + assert_series_equal(result, expected) + assert result.index.freq == Minute(5) + + # from daily + dti = date_range(start=datetime(2005, 1, 1), + end=datetime(2005, 1, 10), freq='D') + ser = Series(np.random.rand(len(dti)), dti) + + # to weekly + result = ser.resample('w-sun').last() + business_day_offset = BDay() + expected = ser.resample('w-sun', loffset=-business_day_offset).last() + assert result.index[0] - business_day_offset == expected.index[0] + + def test_resample_loffset_upsample(self): + # GH 20744 + rng = date_range('1/1/2000 00:00:00', '1/1/2000 00:13:00', freq='min') + s = Series(np.random.randn(14), index=rng) + + result = s.resample('5min', closed='right', label='right', + loffset=timedelta(minutes=1)).ffill() + idx = date_range('1/1/2000', periods=4, freq='5min') + expected = Series([s[0], s[5], s[10], s[-1]], + index=idx + timedelta(minutes=1)) + + assert_series_equal(result, expected) + + def test_resample_loffset_count(self): + # GH 12725 + start_time = '1/1/2000 00:00:00' + rng = date_range(start_time, periods=100, freq='S') + ts = Series(np.random.randn(len(rng)), index=rng) + + result = ts.resample('10S', loffset='1s').count() + + expected_index = ( + date_range(start_time, periods=10, freq='10S') + + timedelta(seconds=1) + ) + expected = Series(10, index=expected_index) + + assert_series_equal(result, expected) + + # Same issue should apply to .size() since it goes through + # same code path + result = ts.resample('10S', loffset='1s').size() + + assert_series_equal(result, expected) + + def test_resample_upsample(self): + # from daily + dti = date_range(start=datetime(2005, 1, 1), + end=datetime(2005, 1, 10), freq='D', name='index') + + s = Series(np.random.rand(len(dti)), dti) + + # to minutely, by padding + result = s.resample('Min').pad() + assert 
len(result) == 12961 + assert result[0] == s[0] + assert result[-1] == s[-1] + + assert result.index.name == 'index' + + def test_resample_how_method(self): + # GH9915 + s = Series([11, 22], + index=[Timestamp('2015-03-31 21:48:52.672000'), + Timestamp('2015-03-31 21:49:52.739000')]) + expected = Series([11, np.NaN, np.NaN, np.NaN, np.NaN, np.NaN, 22], + index=[Timestamp('2015-03-31 21:48:50'), + Timestamp('2015-03-31 21:49:00'), + Timestamp('2015-03-31 21:49:10'), + Timestamp('2015-03-31 21:49:20'), + Timestamp('2015-03-31 21:49:30'), + Timestamp('2015-03-31 21:49:40'), + Timestamp('2015-03-31 21:49:50')]) + assert_series_equal(s.resample("10S").mean(), expected) + + def test_resample_extra_index_point(self): + # GH#9756 + index = date_range(start='20150101', end='20150331', freq='BM') + expected = DataFrame({'A': Series([21, 41, 63], index=index)}) + + index = date_range(start='20150101', end='20150331', freq='B') + df = DataFrame( + {'A': Series(range(len(index)), index=index)}, dtype='int64') + result = df.resample('BM').last() + assert_frame_equal(result, expected) + + def test_upsample_with_limit(self): + rng = date_range('1/1/2000', periods=3, freq='5t') + ts = Series(np.random.randn(len(rng)), rng) + + result = ts.resample('t').ffill(limit=2) + expected = ts.reindex(result.index, method='ffill', limit=2) + assert_series_equal(result, expected) + + def test_nearest_upsample_with_limit(self): + rng = date_range('1/1/2000', periods=3, freq='5t') + ts = Series(np.random.randn(len(rng)), rng) + + result = ts.resample('t').nearest(limit=2) + expected = ts.reindex(result.index, method='nearest', limit=2) + assert_series_equal(result, expected) + + def test_resample_ohlc(self): + s = self.series + + grouper = TimeGrouper(Minute(5)) + expect = s.groupby(grouper).agg(lambda x: x[-1]) + result = s.resample('5Min').ohlc() + + assert len(result) == len(expect) + assert len(result.columns) == 4 + + xs = result.iloc[-2] + assert xs['open'] == s[-6] + assert xs['high'] == s[-6:-1].max() + assert xs['low'] == s[-6:-1].min() + assert xs['close'] == s[-2] + + xs = result.iloc[0] + assert xs['open'] == s[0] + assert xs['high'] == s[:5].max() + assert xs['low'] == s[:5].min() + assert xs['close'] == s[4] + + def test_resample_ohlc_result(self): + + # GH 12332 + index = pd.date_range('1-1-2000', '2-15-2000', freq='h') + index = index.union(pd.date_range('4-15-2000', '5-15-2000', freq='h')) + s = Series(range(len(index)), index=index) + + a = s.loc[:'4-15-2000'].resample('30T').ohlc() + assert isinstance(a, DataFrame) + + b = s.loc[:'4-14-2000'].resample('30T').ohlc() + assert isinstance(b, DataFrame) + + # GH12348 + # raising on odd period + rng = date_range('2013-12-30', '2014-01-07') + index = rng.drop([Timestamp('2014-01-01'), + Timestamp('2013-12-31'), + Timestamp('2014-01-04'), + Timestamp('2014-01-05')]) + df = DataFrame(data=np.arange(len(index)), index=index) + result = df.resample('B').mean() + expected = df.reindex(index=date_range(rng[0], rng[-1], freq='B')) + assert_frame_equal(result, expected) + + def test_resample_ohlc_dataframe(self): + df = ( + DataFrame({ + 'PRICE': { + Timestamp('2011-01-06 10:59:05', tz=None): 24990, + Timestamp('2011-01-06 12:43:33', tz=None): 25499, + Timestamp('2011-01-06 12:54:09', tz=None): 25499}, + 'VOLUME': { + Timestamp('2011-01-06 10:59:05', tz=None): 1500000000, + Timestamp('2011-01-06 12:43:33', tz=None): 5000000000, + Timestamp('2011-01-06 12:54:09', tz=None): 100000000}}) + ).reindex(['VOLUME', 'PRICE'], axis=1) + res = df.resample('H').ohlc() + exp = 
pd.concat([df['VOLUME'].resample('H').ohlc(), + df['PRICE'].resample('H').ohlc()], + axis=1, + keys=['VOLUME', 'PRICE']) + assert_frame_equal(exp, res) + + df.columns = [['a', 'b'], ['c', 'd']] + res = df.resample('H').ohlc() + exp.columns = pd.MultiIndex.from_tuples([ + ('a', 'c', 'open'), ('a', 'c', 'high'), ('a', 'c', 'low'), + ('a', 'c', 'close'), ('b', 'd', 'open'), ('b', 'd', 'high'), + ('b', 'd', 'low'), ('b', 'd', 'close')]) + assert_frame_equal(exp, res) + + # dupe columns fail atm + # df.columns = ['PRICE', 'PRICE'] + + def test_resample_dup_index(self): + + # GH 4812 + # dup columns with resample raising + df = DataFrame(np.random.randn(4, 12), index=[2000, 2000, 2000, 2000], + columns=[Period(year=2000, month=i + 1, freq='M') + for i in range(12)]) + df.iloc[3, :] = np.nan + result = df.resample('Q', axis=1).mean() + expected = df.groupby(lambda x: int((x.month - 1) / 3), axis=1).mean() + expected.columns = [ + Period(year=2000, quarter=i + 1, freq='Q') for i in range(4)] + assert_frame_equal(result, expected) + + def test_resample_reresample(self): + dti = date_range(start=datetime(2005, 1, 1), + end=datetime(2005, 1, 10), freq='D') + s = Series(np.random.rand(len(dti)), dti) + bs = s.resample('B', closed='right', label='right').mean() + result = bs.resample('8H').mean() + assert len(result) == 22 + assert isinstance(result.index.freq, offsets.DateOffset) + assert result.index.freq == offsets.Hour(8) + + def test_resample_timestamp_to_period(self, simple_date_range_series): + ts = simple_date_range_series('1/1/1990', '1/1/2000') + + result = ts.resample('A-DEC', kind='period').mean() + expected = ts.resample('A-DEC').mean() + expected.index = period_range('1990', '2000', freq='a-dec') + assert_series_equal(result, expected) + + result = ts.resample('A-JUN', kind='period').mean() + expected = ts.resample('A-JUN').mean() + expected.index = period_range('1990', '2000', freq='a-jun') + assert_series_equal(result, expected) + + result = ts.resample('M', kind='period').mean() + expected = ts.resample('M').mean() + expected.index = period_range('1990-01', '2000-01', freq='M') + assert_series_equal(result, expected) + + result = ts.resample('M', kind='period').mean() + expected = ts.resample('M').mean() + expected.index = period_range('1990-01', '2000-01', freq='M') + assert_series_equal(result, expected) + + def test_ohlc_5min(self): + def _ohlc(group): + if isna(group).all(): + return np.repeat(np.nan, 4) + return [group[0], group.max(), group.min(), group[-1]] + + rng = date_range('1/1/2000 00:00:00', '1/1/2000 5:59:50', freq='10s') + ts = Series(np.random.randn(len(rng)), index=rng) + + resampled = ts.resample('5min', closed='right', + label='right').ohlc() + + assert (resampled.loc['1/1/2000 00:00'] == ts[0]).all() + + exp = _ohlc(ts[1:31]) + assert (resampled.loc['1/1/2000 00:05'] == exp).all() + + exp = _ohlc(ts['1/1/2000 5:55:01':]) + assert (resampled.loc['1/1/2000 6:00:00'] == exp).all() + + def test_downsample_non_unique(self): + rng = date_range('1/1/2000', '2/29/2000') + rng2 = rng.repeat(5).values + ts = Series(np.random.randn(len(rng2)), index=rng2) + + result = ts.resample('M').mean() + + expected = ts.groupby(lambda x: x.month).mean() + assert len(result) == 2 + assert_almost_equal(result[0], expected[1]) + assert_almost_equal(result[1], expected[2]) + + def test_asfreq_non_unique(self): + # GH #1077 + rng = date_range('1/1/2000', '2/29/2000') + rng2 = rng.repeat(2).values + ts = Series(np.random.randn(len(rng2)), index=rng2) + + pytest.raises(Exception, ts.asfreq, 
'B') + + def test_resample_axis1(self): + rng = date_range('1/1/2000', '2/29/2000') + df = DataFrame(np.random.randn(3, len(rng)), columns=rng, + index=['a', 'b', 'c']) + + result = df.resample('M', axis=1).mean() + expected = df.T.resample('M').mean().T + tm.assert_frame_equal(result, expected) + + def test_resample_panel(self): + rng = date_range('1/1/2000', '6/30/2000') + n = len(rng) + + with catch_warnings(record=True): + simplefilter("ignore", FutureWarning) + panel = Panel(np.random.randn(3, n, 5), + items=['one', 'two', 'three'], + major_axis=rng, + minor_axis=['a', 'b', 'c', 'd', 'e']) + + result = panel.resample('M', axis=1).mean() + + def p_apply(panel, f): + result = {} + for item in panel.items: + result[item] = f(panel[item]) + return Panel(result, items=panel.items) + + expected = p_apply(panel, lambda x: x.resample('M').mean()) + tm.assert_panel_equal(result, expected) + + panel2 = panel.swapaxes(1, 2) + result = panel2.resample('M', axis=2).mean() + expected = p_apply(panel2, + lambda x: x.resample('M', axis=1).mean()) + tm.assert_panel_equal(result, expected) + + @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") + def test_resample_panel_numpy(self): + rng = date_range('1/1/2000', '6/30/2000') + n = len(rng) + + with catch_warnings(record=True): + panel = Panel(np.random.randn(3, n, 5), + items=['one', 'two', 'three'], + major_axis=rng, + minor_axis=['a', 'b', 'c', 'd', 'e']) + + result = panel.resample('M', axis=1).apply(lambda x: x.mean(1)) + expected = panel.resample('M', axis=1).mean() + tm.assert_panel_equal(result, expected) + + panel = panel.swapaxes(1, 2) + result = panel.resample('M', axis=2).apply(lambda x: x.mean(2)) + expected = panel.resample('M', axis=2).mean() + tm.assert_panel_equal(result, expected) + + def test_resample_anchored_ticks(self): + # If a fixed delta (5 minute, 4 hour) evenly divides a day, we should + # "anchor" the origin at midnight so we get regular intervals rather + # than starting from the first timestamp which might start in the + # middle of a desired interval + + rng = date_range('1/1/2000 04:00:00', periods=86400, freq='s') + ts = Series(np.random.randn(len(rng)), index=rng) + ts[:2] = np.nan # so results are the same + + freqs = ['t', '5t', '15t', '30t', '4h', '12h'] + for freq in freqs: + result = ts[2:].resample(freq, closed='left', label='left').mean() + expected = ts.resample(freq, closed='left', label='left').mean() + assert_series_equal(result, expected) + + def test_resample_single_group(self): + mysum = lambda x: x.sum() + + rng = date_range('2000-1-1', '2000-2-10', freq='D') + ts = Series(np.random.randn(len(rng)), index=rng) + assert_series_equal(ts.resample('M').sum(), + ts.resample('M').apply(mysum)) + + rng = date_range('2000-1-1', '2000-1-10', freq='D') + ts = Series(np.random.randn(len(rng)), index=rng) + assert_series_equal(ts.resample('M').sum(), + ts.resample('M').apply(mysum)) + + # GH 3849 + s = Series([30.1, 31.6], index=[Timestamp('20070915 15:30:00'), + Timestamp('20070915 15:40:00')]) + expected = Series([0.75], index=[Timestamp('20070915')]) + result = s.resample('D').apply(lambda x: np.std(x)) + assert_series_equal(result, expected) + + def test_resample_base(self): + rng = date_range('1/1/2000 00:00:00', '1/1/2000 02:00', freq='s') + ts = Series(np.random.randn(len(rng)), index=rng) + + resampled = ts.resample('5min', base=2).mean() + exp_rng = date_range('12/31/1999 23:57:00', '1/1/2000 01:57', + freq='5min') + tm.assert_index_equal(resampled.index, exp_rng) + + def 
test_resample_base_with_timedeltaindex(self): + + # GH 10530 + rng = timedelta_range(start='0s', periods=25, freq='s') + ts = Series(np.random.randn(len(rng)), index=rng) + + with_base = ts.resample('2s', base=5).mean() + without_base = ts.resample('2s').mean() + + exp_without_base = timedelta_range(start='0s', end='25s', freq='2s') + exp_with_base = timedelta_range(start='5s', end='29s', freq='2s') + + tm.assert_index_equal(without_base.index, exp_without_base) + tm.assert_index_equal(with_base.index, exp_with_base) + + def test_resample_categorical_data_with_timedeltaindex(self): + # GH #12169 + df = DataFrame({'Group_obj': 'A'}, + index=pd.to_timedelta(list(range(20)), unit='s')) + df['Group'] = df['Group_obj'].astype('category') + result = df.resample('10s').agg(lambda x: (x.value_counts().index[0])) + expected = DataFrame({'Group_obj': ['A', 'A'], + 'Group': ['A', 'A']}, + index=pd.to_timedelta([0, 10], unit='s')) + expected = expected.reindex(['Group_obj', 'Group'], axis=1) + expected['Group'] = expected['Group_obj'].astype('category') + tm.assert_frame_equal(result, expected) + + def test_resample_daily_anchored(self): + rng = date_range('1/1/2000 0:00:00', periods=10000, freq='T') + ts = Series(np.random.randn(len(rng)), index=rng) + ts[:2] = np.nan # so results are the same + + result = ts[2:].resample('D', closed='left', label='left').mean() + expected = ts.resample('D', closed='left', label='left').mean() + assert_series_equal(result, expected) + + def test_resample_to_period_monthly_buglet(self): + # GH #1259 + + rng = date_range('1/1/2000', '12/31/2000') + ts = Series(np.random.randn(len(rng)), index=rng) + + result = ts.resample('M', kind='period').mean() + exp_index = period_range('Jan-2000', 'Dec-2000', freq='M') + tm.assert_index_equal(result.index, exp_index) + + def test_period_with_agg(self): + + # aggregate a period resampler with a lambda + s2 = Series(np.random.randint(0, 5, 50), + index=pd.period_range('2012-01-01', freq='H', periods=50), + dtype='float64') + + expected = s2.to_timestamp().resample('D').mean().to_period() + result = s2.resample('D').agg(lambda x: x.mean()) + assert_series_equal(result, expected) + + def test_resample_segfault(self): + # GH 8573 + # segfaulting in older versions + all_wins_and_wagers = [ + (1, datetime(2013, 10, 1, 16, 20), 1, 0), + (2, datetime(2013, 10, 1, 16, 10), 1, 0), + (2, datetime(2013, 10, 1, 18, 15), 1, 0), + (2, datetime(2013, 10, 1, 16, 10, 31), 1, 0)] + + df = DataFrame.from_records(all_wins_and_wagers, + columns=("ID", "timestamp", "A", "B") + ).set_index("timestamp") + result = df.groupby("ID").resample("5min").sum() + expected = df.groupby("ID").apply(lambda x: x.resample("5min").sum()) + assert_frame_equal(result, expected) + + def test_resample_dtype_preservation(self): + + # GH 12202 + # validation tests for dtype preservation + + df = DataFrame({'date': pd.date_range(start='2016-01-01', + periods=4, freq='W'), + 'group': [1, 1, 2, 2], + 'val': Series([5, 6, 7, 8], + dtype='int32')} + ).set_index('date') + + result = df.resample('1D').ffill() + assert result.val.dtype == np.int32 + + result = df.groupby('group').resample('1D').ffill() + assert result.val.dtype == np.int32 + + def test_resample_dtype_coerceion(self): + + pytest.importorskip('scipy.interpolate') + + # GH 16361 + df = {"a": [1, 3, 1, 4]} + df = DataFrame(df, index=pd.date_range("2017-01-01", "2017-01-04")) + + expected = (df.astype("float64") + .resample("H") + .mean() + ["a"] + .interpolate("cubic") + ) + + result = 
df.resample("H")["a"].mean().interpolate("cubic") + tm.assert_series_equal(result, expected) + + result = df.resample("H").mean()["a"].interpolate("cubic") + tm.assert_series_equal(result, expected) + + def test_weekly_resample_buglet(self): + # #1327 + rng = date_range('1/1/2000', freq='B', periods=20) + ts = Series(np.random.randn(len(rng)), index=rng) + + resampled = ts.resample('W').mean() + expected = ts.resample('W-SUN').mean() + assert_series_equal(resampled, expected) + + def test_monthly_resample_error(self): + # #1451 + dates = date_range('4/16/2012 20:00', periods=5000, freq='h') + ts = Series(np.random.randn(len(dates)), index=dates) + # it works! + ts.resample('M') + + def test_nanosecond_resample_error(self): + # GH 12307 - Values falls after last bin when + # Resampling using pd.tseries.offsets.Nano as period + start = 1443707890427 + exp_start = 1443707890400 + indx = pd.date_range( + start=pd.to_datetime(start), + periods=10, + freq='100n' + ) + ts = Series(range(len(indx)), index=indx) + r = ts.resample(pd.tseries.offsets.Nano(100)) + result = r.agg('mean') + + exp_indx = pd.date_range( + start=pd.to_datetime(exp_start), + periods=10, + freq='100n' + ) + exp = Series(range(len(exp_indx)), index=exp_indx) + + assert_series_equal(result, exp) + + def test_resample_anchored_intraday(self, simple_date_range_series): + # #1471, #1458 + + rng = date_range('1/1/2012', '4/1/2012', freq='100min') + df = DataFrame(rng.month, index=rng) + + result = df.resample('M').mean() + expected = df.resample( + 'M', kind='period').mean().to_timestamp(how='end') + expected.index += Timedelta(1, 'ns') - Timedelta(1, 'D') + tm.assert_frame_equal(result, expected) + + result = df.resample('M', closed='left').mean() + exp = df.tshift(1, freq='D').resample('M', kind='period').mean() + exp = exp.to_timestamp(how='end') + + exp.index = exp.index + Timedelta(1, 'ns') - Timedelta(1, 'D') + tm.assert_frame_equal(result, exp) + + rng = date_range('1/1/2012', '4/1/2012', freq='100min') + df = DataFrame(rng.month, index=rng) + + result = df.resample('Q').mean() + expected = df.resample( + 'Q', kind='period').mean().to_timestamp(how='end') + expected.index += Timedelta(1, 'ns') - Timedelta(1, 'D') + tm.assert_frame_equal(result, expected) + + result = df.resample('Q', closed='left').mean() + expected = df.tshift(1, freq='D').resample('Q', kind='period', + closed='left').mean() + expected = expected.to_timestamp(how='end') + expected.index += Timedelta(1, 'ns') - Timedelta(1, 'D') + tm.assert_frame_equal(result, expected) + + ts = simple_date_range_series('2012-04-29 23:00', '2012-04-30 5:00', + freq='h') + resampled = ts.resample('M').mean() + assert len(resampled) == 1 + + def test_resample_anchored_monthstart(self, simple_date_range_series): + ts = simple_date_range_series('1/1/2000', '12/31/2002') + + freqs = ['MS', 'BMS', 'QS-MAR', 'AS-DEC', 'AS-JUN'] + + for freq in freqs: + ts.resample(freq).mean() + + def test_resample_anchored_multiday(self): + # When resampling a range spanning multiple days, ensure that the + # start date gets used to determine the offset. Fixes issue where + # a one day period is not a multiple of the frequency. 
+ # + # See: https://github.com/pandas-dev/pandas/issues/8683 + + index = pd.date_range( + '2014-10-14 23:06:23.206', periods=3, freq='400L' + ) | pd.date_range( + '2014-10-15 23:00:00', periods=2, freq='2200L') + + s = Series(np.random.randn(5), index=index) + + # Ensure left closing works + result = s.resample('2200L').mean() + assert result.index[-1] == Timestamp('2014-10-15 23:00:02.000') + + # Ensure right closing works + result = s.resample('2200L', label='right').mean() + assert result.index[-1] == Timestamp('2014-10-15 23:00:04.200') + + def test_corner_cases(self, simple_period_range_series, + simple_date_range_series): + # miscellaneous test coverage + + rng = date_range('1/1/2000', periods=12, freq='t') + ts = Series(np.random.randn(len(rng)), index=rng) + + result = ts.resample('5t', closed='right', label='left').mean() + ex_index = date_range('1999-12-31 23:55', periods=4, freq='5t') + tm.assert_index_equal(result.index, ex_index) + + len0pts = simple_period_range_series( + '2007-01', '2010-05', freq='M')[:0] + # it works + result = len0pts.resample('A-DEC').mean() + assert len(result) == 0 + + # resample to periods + ts = simple_date_range_series( + '2000-04-28', '2000-04-30 11:00', freq='h') + result = ts.resample('M', kind='period').mean() + assert len(result) == 1 + assert result.index[0] == Period('2000-04', freq='M') + + def test_anchored_lowercase_buglet(self): + dates = date_range('4/16/2012 20:00', periods=50000, freq='s') + ts = Series(np.random.randn(len(dates)), index=dates) + # it works! + ts.resample('d').mean() + + def test_upsample_apply_functions(self): + # #1596 + rng = pd.date_range('2012-06-12', periods=4, freq='h') + + ts = Series(np.random.randn(len(rng)), index=rng) + + result = ts.resample('20min').aggregate(['mean', 'sum']) + assert isinstance(result, DataFrame) + + def test_resample_not_monotonic(self): + rng = pd.date_range('2012-06-12', periods=200, freq='h') + ts = Series(np.random.randn(len(rng)), index=rng) + + ts = ts.take(np.random.permutation(len(ts))) + + result = ts.resample('D').sum() + exp = ts.sort_index().resample('D').sum() + assert_series_equal(result, exp) + + def test_resample_median_bug_1688(self): + + for dtype in ['int64', 'int32', 'float64', 'float32']: + df = DataFrame([1, 2], index=[datetime(2012, 1, 1, 0, 0, 0), + datetime(2012, 1, 1, 0, 5, 0)], + dtype=dtype) + + result = df.resample("T").apply(lambda x: x.mean()) + exp = df.asfreq('T') + tm.assert_frame_equal(result, exp) + + result = df.resample("T").median() + exp = df.asfreq('T') + tm.assert_frame_equal(result, exp) + + def test_how_lambda_functions(self, simple_date_range_series): + + ts = simple_date_range_series('1/1/2000', '4/1/2000') + + result = ts.resample('M').apply(lambda x: x.mean()) + exp = ts.resample('M').mean() + tm.assert_series_equal(result, exp) + + foo_exp = ts.resample('M').mean() + foo_exp.name = 'foo' + bar_exp = ts.resample('M').std() + bar_exp.name = 'bar' + + result = ts.resample('M').apply( + [lambda x: x.mean(), lambda x: x.std(ddof=1)]) + result.columns = ['foo', 'bar'] + tm.assert_series_equal(result['foo'], foo_exp) + tm.assert_series_equal(result['bar'], bar_exp) + + # this is a MI Series, so comparing the names of the results + # doesn't make sense + result = ts.resample('M').aggregate({'foo': lambda x: x.mean(), + 'bar': lambda x: x.std(ddof=1)}) + tm.assert_series_equal(result['foo'], foo_exp, check_names=False) + tm.assert_series_equal(result['bar'], bar_exp, check_names=False) + + def test_resample_unequal_times(self): + # 
#1772 + start = datetime(1999, 3, 1, 5) + # end hour is less than start + end = datetime(2012, 7, 31, 4) + bad_ind = date_range(start, end, freq="30min") + df = DataFrame({'close': 1}, index=bad_ind) + + # it works! + df.resample('AS').sum() + + def test_resample_consistency(self): + + # GH 6418 + # resample with bfill / limit / reindex consistency + + i30 = pd.date_range('2002-02-02', periods=4, freq='30T') + s = Series(np.arange(4.), index=i30) + s[2] = np.NaN + + # Upsample by factor 3 with reindex() and resample() methods: + i10 = pd.date_range(i30[0], i30[-1], freq='10T') + + s10 = s.reindex(index=i10, method='bfill') + s10_2 = s.reindex(index=i10, method='bfill', limit=2) + rl = s.reindex_like(s10, method='bfill', limit=2) + r10_2 = s.resample('10Min').bfill(limit=2) + r10 = s.resample('10Min').bfill() + + # s10_2, r10, r10_2, rl should all be equal + assert_series_equal(s10_2, r10) + assert_series_equal(s10_2, r10_2) + assert_series_equal(s10_2, rl) + + def test_resample_timegrouper(self): + # GH 7227 + dates1 = [datetime(2014, 10, 1), datetime(2014, 9, 3), + datetime(2014, 11, 5), datetime(2014, 9, 5), + datetime(2014, 10, 8), datetime(2014, 7, 15)] + + dates2 = dates1[:2] + [pd.NaT] + dates1[2:4] + [pd.NaT] + dates1[4:] + dates3 = [pd.NaT] + dates1 + [pd.NaT] + + for dates in [dates1, dates2, dates3]: + df = DataFrame(dict(A=dates, B=np.arange(len(dates)))) + result = df.set_index('A').resample('M').count() + exp_idx = pd.DatetimeIndex(['2014-07-31', '2014-08-31', + '2014-09-30', + '2014-10-31', '2014-11-30'], + freq='M', name='A') + expected = DataFrame({'B': [1, 0, 2, 2, 1]}, index=exp_idx) + assert_frame_equal(result, expected) + + result = df.groupby(pd.Grouper(freq='M', key='A')).count() + assert_frame_equal(result, expected) + + df = DataFrame(dict(A=dates, B=np.arange(len(dates)), C=np.arange( + len(dates)))) + result = df.set_index('A').resample('M').count() + expected = DataFrame({'B': [1, 0, 2, 2, 1], 'C': [1, 0, 2, 2, 1]}, + index=exp_idx, columns=['B', 'C']) + assert_frame_equal(result, expected) + + result = df.groupby(pd.Grouper(freq='M', key='A')).count() + assert_frame_equal(result, expected) + + def test_resample_nunique(self): + + # GH 12352 + df = DataFrame({ + 'ID': {Timestamp('2015-06-05 00:00:00'): '0010100903', + Timestamp('2015-06-08 00:00:00'): '0010150847'}, + 'DATE': {Timestamp('2015-06-05 00:00:00'): '2015-06-05', + Timestamp('2015-06-08 00:00:00'): '2015-06-08'}}) + r = df.resample('D') + g = df.groupby(pd.Grouper(freq='D')) + expected = df.groupby(pd.Grouper(freq='D')).ID.apply(lambda x: + x.nunique()) + assert expected.name == 'ID' + + for t in [r, g]: + result = r.ID.nunique() + assert_series_equal(result, expected) + + result = df.ID.resample('D').nunique() + assert_series_equal(result, expected) + + result = df.ID.groupby(pd.Grouper(freq='D')).nunique() + assert_series_equal(result, expected) + + def test_resample_nunique_with_date_gap(self): + # GH 13453 + index = pd.date_range('1-1-2000', '2-15-2000', freq='h') + index2 = pd.date_range('4-15-2000', '5-15-2000', freq='h') + index3 = index.append(index2) + s = Series(range(len(index3)), index=index3, dtype='int64') + r = s.resample('M') + + # Since all elements are unique, these should all be the same + results = [ + r.count(), + r.nunique(), + r.agg(Series.nunique), + r.agg('nunique') + ] + + assert_series_equal(results[0], results[1]) + assert_series_equal(results[0], results[2]) + assert_series_equal(results[0], results[3]) + + @pytest.mark.parametrize('n', [10000, 100000]) + 
@pytest.mark.parametrize('k', [10, 100, 1000]) + def test_resample_group_info(self, n, k): + # GH10914 + dr = date_range(start='2015-08-27', periods=n // 10, freq='T') + ts = Series(np.random.randint(0, n // k, n).astype('int64'), + index=np.random.choice(dr, n)) + + left = ts.resample('30T').nunique() + ix = date_range(start=ts.index.min(), end=ts.index.max(), + freq='30T') + + vals = ts.values + bins = np.searchsorted(ix.values, ts.index, side='right') + + sorter = np.lexsort((vals, bins)) + vals, bins = vals[sorter], bins[sorter] + + mask = np.r_[True, vals[1:] != vals[:-1]] + mask |= np.r_[True, bins[1:] != bins[:-1]] + + arr = np.bincount(bins[mask] - 1, + minlength=len(ix)).astype('int64', copy=False) + right = Series(arr, index=ix) + + assert_series_equal(left, right) + + def test_resample_size(self): + n = 10000 + dr = date_range('2015-09-19', periods=n, freq='T') + ts = Series(np.random.randn(n), index=np.random.choice(dr, n)) + + left = ts.resample('7T').size() + ix = date_range(start=left.index.min(), end=ts.index.max(), freq='7T') + + bins = np.searchsorted(ix.values, ts.index.values, side='right') + val = np.bincount(bins, minlength=len(ix) + 1)[1:].astype('int64', + copy=False) + + right = Series(val, index=ix) + assert_series_equal(left, right) + + def test_resample_across_dst(self): + # The test resamples a DatetimeIndex with values before and after a + # DST change + # Issue: 14682 + + # The DatetimeIndex we will start with + # (note that DST happens at 03:00+02:00 -> 02:00+01:00) + # 2016-10-30 02:23:00+02:00, 2016-10-30 02:23:00+01:00 + df1 = DataFrame([1477786980, 1477790580], columns=['ts']) + dti1 = DatetimeIndex(pd.to_datetime(df1.ts, unit='s') + .dt.tz_localize('UTC') + .dt.tz_convert('Europe/Madrid')) + + # The expected DatetimeIndex after resampling. 
+ # 2016-10-30 02:00:00+02:00, 2016-10-30 02:00:00+01:00 + df2 = DataFrame([1477785600, 1477789200], columns=['ts']) + dti2 = DatetimeIndex(pd.to_datetime(df2.ts, unit='s') + .dt.tz_localize('UTC') + .dt.tz_convert('Europe/Madrid')) + df = DataFrame([5, 5], index=dti1) + + result = df.resample(rule='H').sum() + expected = DataFrame([5, 5], index=dti2) + + assert_frame_equal(result, expected) + + def test_resample_dst_anchor(self): + # 5172 + dti = DatetimeIndex([datetime(2012, 11, 4, 23)], tz='US/Eastern') + df = DataFrame([5], index=dti) + assert_frame_equal(df.resample(rule='D').sum(), + DataFrame([5], index=df.index.normalize())) + df.resample(rule='MS').sum() + assert_frame_equal( + df.resample(rule='MS').sum(), + DataFrame([5], index=DatetimeIndex([datetime(2012, 11, 1)], + tz='US/Eastern'))) + + dti = date_range('2013-09-30', '2013-11-02', freq='30Min', + tz='Europe/Paris') + values = range(dti.size) + df = DataFrame({"a": values, + "b": values, + "c": values}, index=dti, dtype='int64') + how = {"a": "min", "b": "max", "c": "count"} + + assert_frame_equal( + df.resample("W-MON").agg(how)[["a", "b", "c"]], + DataFrame({"a": [0, 48, 384, 720, 1056, 1394], + "b": [47, 383, 719, 1055, 1393, 1586], + "c": [48, 336, 336, 336, 338, 193]}, + index=date_range('9/30/2013', '11/4/2013', + freq='W-MON', tz='Europe/Paris')), + 'W-MON Frequency') + + assert_frame_equal( + df.resample("2W-MON").agg(how)[["a", "b", "c"]], + DataFrame({"a": [0, 48, 720, 1394], + "b": [47, 719, 1393, 1586], + "c": [48, 672, 674, 193]}, + index=date_range('9/30/2013', '11/11/2013', + freq='2W-MON', tz='Europe/Paris')), + '2W-MON Frequency') + + assert_frame_equal( + df.resample("MS").agg(how)[["a", "b", "c"]], + DataFrame({"a": [0, 48, 1538], + "b": [47, 1537, 1586], + "c": [48, 1490, 49]}, + index=date_range('9/1/2013', '11/1/2013', + freq='MS', tz='Europe/Paris')), + 'MS Frequency') + + assert_frame_equal( + df.resample("2MS").agg(how)[["a", "b", "c"]], + DataFrame({"a": [0, 1538], + "b": [1537, 1586], + "c": [1538, 49]}, + index=date_range('9/1/2013', '11/1/2013', + freq='2MS', tz='Europe/Paris')), + '2MS Frequency') + + df_daily = df['10/26/2013':'10/29/2013'] + assert_frame_equal( + df_daily.resample("D").agg({"a": "min", "b": "max", "c": "count"}) + [["a", "b", "c"]], + DataFrame({"a": [1248, 1296, 1346, 1394], + "b": [1295, 1345, 1393, 1441], + "c": [48, 50, 48, 48]}, + index=date_range('10/26/2013', '10/29/2013', + freq='D', tz='Europe/Paris')), + 'D Frequency') + + def test_downsample_across_dst(self): + # GH 8531 + tz = pytz.timezone('Europe/Berlin') + dt = datetime(2014, 10, 26) + dates = date_range(tz.localize(dt), periods=4, freq='2H') + result = Series(5, index=dates).resample('H').mean() + expected = Series([5., np.nan] * 3 + [5.], + index=date_range(tz.localize(dt), periods=7, + freq='H')) + tm.assert_series_equal(result, expected) + + def test_downsample_across_dst_weekly(self): + # GH 9119, GH 21459 + df = DataFrame(index=DatetimeIndex([ + '2017-03-25', '2017-03-26', '2017-03-27', + '2017-03-28', '2017-03-29' + ], tz='Europe/Amsterdam'), + data=[11, 12, 13, 14, 15]) + result = df.resample('1W').sum() + expected = DataFrame([23, 42], index=pd.DatetimeIndex([ + '2017-03-26', '2017-04-02' + ], tz='Europe/Amsterdam')) + tm.assert_frame_equal(result, expected) + + idx = pd.date_range("2013-04-01", "2013-05-01", tz='Europe/London', + freq='H') + s = Series(index=idx) + result = s.resample('W').mean() + expected = Series(index=pd.date_range( + '2013-04-07', freq='W', periods=5, tz='Europe/London' + )) + 
tm.assert_series_equal(result, expected) + + def test_resample_with_nat(self): + # GH 13020 + index = DatetimeIndex([pd.NaT, + '1970-01-01 00:00:00', + pd.NaT, + '1970-01-01 00:00:01', + '1970-01-01 00:00:02']) + frame = DataFrame([2, 3, 5, 7, 11], index=index) + + index_1s = DatetimeIndex(['1970-01-01 00:00:00', + '1970-01-01 00:00:01', + '1970-01-01 00:00:02']) + frame_1s = DataFrame([3, 7, 11], index=index_1s) + assert_frame_equal(frame.resample('1s').mean(), frame_1s) + + index_2s = DatetimeIndex(['1970-01-01 00:00:00', + '1970-01-01 00:00:02']) + frame_2s = DataFrame([5, 11], index=index_2s) + assert_frame_equal(frame.resample('2s').mean(), frame_2s) + + index_3s = DatetimeIndex(['1970-01-01 00:00:00']) + frame_3s = DataFrame([7], index=index_3s) + assert_frame_equal(frame.resample('3s').mean(), frame_3s) + + assert_frame_equal(frame.resample('60s').mean(), frame_3s) + + def test_resample_timedelta_values(self): + # GH 13119 + # check that timedelta dtype is preserved when NaT values are + # introduced by the resampling + + times = timedelta_range('1 day', '4 day', freq='4D') + df = DataFrame({'time': times}, index=times) + + times2 = timedelta_range('1 day', '4 day', freq='2D') + exp = Series(times2, index=times2, name='time') + exp.iloc[1] = pd.NaT + + res = df.resample('2D').first()['time'] + tm.assert_series_equal(res, exp) + res = df['time'].resample('2D').first() + tm.assert_series_equal(res, exp) + + def test_resample_datetime_values(self): + # GH 13119 + # check that datetime dtype is preserved when NaT values are + # introduced by the resampling + + dates = [datetime(2016, 1, 15), datetime(2016, 1, 19)] + df = DataFrame({'timestamp': dates}, index=dates) + + exp = Series([datetime(2016, 1, 15), pd.NaT, datetime(2016, 1, 19)], + index=date_range('2016-01-15', periods=3, freq='2D'), + name='timestamp') + + res = df.resample('2D').first()['timestamp'] + tm.assert_series_equal(res, exp) + res = df['timestamp'].resample('2D').first() + tm.assert_series_equal(res, exp) + + def test_resample_apply_with_additional_args(self): + # GH 14615 + def f(data, add_arg): + return np.mean(data) * add_arg + + multiplier = 10 + result = self.series.resample('D').apply(f, multiplier) + expected = self.series.resample('D').mean().multiply(multiplier) + tm.assert_series_equal(result, expected) + + # Testing as kwarg + result = self.series.resample('D').apply(f, add_arg=multiplier) + expected = self.series.resample('D').mean().multiply(multiplier) + tm.assert_series_equal(result, expected) + + # Testing dataframe + df = pd.DataFrame({"A": 1, "B": 2}, + index=pd.date_range('2017', periods=10)) + result = df.groupby("A").resample("D").agg(f, multiplier) + expected = df.groupby("A").resample('D').mean().multiply(multiplier) + assert_frame_equal(result, expected) + + @pytest.mark.parametrize('k', [1, 2, 3]) + @pytest.mark.parametrize('n1, freq1, n2, freq2', [ + (30, 'S', 0.5, 'Min'), + (60, 'S', 1, 'Min'), + (3600, 'S', 1, 'H'), + (60, 'Min', 1, 'H'), + (21600, 'S', 0.25, 'D'), + (86400, 'S', 1, 'D'), + (43200, 'S', 0.5, 'D'), + (1440, 'Min', 1, 'D'), + (12, 'H', 0.5, 'D'), + (24, 'H', 1, 'D'), + ]) + def test_resample_equivalent_offsets(self, n1, freq1, n2, freq2, k): + # GH 24127 + n1_ = n1 * k + n2_ = n2 * k + s = pd.Series(0, index=pd.date_range('19910905 13:00', + '19911005 07:00', + freq=freq1)) + s = s + range(len(s)) + + result1 = s.resample(str(n1_) + freq1).mean() + result2 = s.resample(str(n2_) + freq2).mean() + assert_series_equal(result1, result2) + + 
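The parametrization that closes above encodes the GH 24127 invariant: two spellings of the same bin width (for example '60S' and '1Min', or '86400S' and '1D') must produce bin-for-bin identical results. A minimal standalone sketch of that property, using illustrative data that is not part of this patch:

```python
import numpy as np
import pandas as pd

# Illustrative series: a month of 30-second observations
# (dates chosen to echo the test's '19910905 13:00' start).
idx = pd.date_range('1991-09-05 13:00', '1991-10-05 07:00', freq='30S')
s = pd.Series(np.arange(len(idx)), index=idx)

# '60S' and '1Min' name the same offset, so the two resampled
# series should be equal bin for bin.
assert s.resample('60S').mean().equals(s.resample('1Min').mean())
```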
@pytest.mark.parametrize('first,last,offset,exp_first,exp_last', [ + ('19910905', '19920406', 'D', '19910905', '19920407'), + ('19910905 00:00', '19920406 06:00', 'D', '19910905', '19920407'), + ('19910905 06:00', '19920406 06:00', 'H', '19910905 06:00', + '19920406 07:00'), + ('19910906', '19920406', 'M', '19910831', '19920430'), + ('19910831', '19920430', 'M', '19910831', '19920531'), + ('1991-08', '1992-04', 'M', '19910831', '19920531'), + ]) + def test_get_timestamp_range_edges(self, first, last, offset, + exp_first, exp_last): + first = pd.Period(first) + first = first.to_timestamp(first.freq) + last = pd.Period(last) + last = last.to_timestamp(last.freq) + + exp_first = pd.Timestamp(exp_first, freq=offset) + exp_last = pd.Timestamp(exp_last, freq=offset) + + offset = pd.tseries.frequencies.to_offset(offset) + result = _get_timestamp_range_edges(first, last, offset) + expected = (exp_first, exp_last) + assert result == expected diff --git a/pandas/tests/resample/test_period_index.py b/pandas/tests/resample/test_period_index.py new file mode 100644 index 0000000000000..0b393437a3072 --- /dev/null +++ b/pandas/tests/resample/test_period_index.py @@ -0,0 +1,757 @@ +from datetime import datetime, timedelta + +import dateutil +import numpy as np +import pytest +import pytz + +from pandas._libs.tslibs.ccalendar import DAYS, MONTHS +from pandas._libs.tslibs.period import IncompatibleFrequency +from pandas.compat import lrange, range, zip + +import pandas as pd +from pandas import DataFrame, Series, Timestamp +from pandas.core.indexes.datetimes import date_range +from pandas.core.indexes.period import Period, PeriodIndex, period_range +from pandas.core.resample import _get_period_range_edges +import pandas.util.testing as tm +from pandas.util.testing import ( + assert_almost_equal, assert_frame_equal, assert_series_equal) + +import pandas.tseries.offsets as offsets + + +@pytest.fixture() +def _index_factory(): + return period_range + + +@pytest.fixture +def _series_name(): + return 'pi' + + +class TestPeriodIndex(object): + + @pytest.mark.parametrize('freq', ['2D', '1H', '2H']) + @pytest.mark.parametrize('kind', ['period', None, 'timestamp']) + def test_asfreq(self, series_and_frame, freq, kind): + # GH 12884, 15944 + # make sure .asfreq() returns PeriodIndex (except kind='timestamp') + + obj = series_and_frame + if kind == 'timestamp': + expected = obj.to_timestamp().resample(freq).asfreq() + else: + start = obj.index[0].to_timestamp(how='start') + end = (obj.index[-1] + obj.index.freq).to_timestamp(how='start') + new_index = date_range(start=start, end=end, freq=freq, + closed='left') + expected = obj.to_timestamp().reindex(new_index).to_period(freq) + result = obj.resample(freq, kind=kind).asfreq() + assert_almost_equal(result, expected) + + def test_asfreq_fill_value(self, series): + # test for fill value during resampling, issue 3715 + + s = series + new_index = date_range(s.index[0].to_timestamp(how='start'), + (s.index[-1]).to_timestamp(how='start'), + freq='1H') + expected = s.to_timestamp().reindex(new_index, fill_value=4.0) + result = s.resample('1H', kind='timestamp').asfreq(fill_value=4.0) + assert_series_equal(result, expected) + + frame = s.to_frame('value') + new_index = date_range(frame.index[0].to_timestamp(how='start'), + (frame.index[-1]).to_timestamp(how='start'), + freq='1H') + expected = frame.to_timestamp().reindex(new_index, fill_value=3.0) + result = frame.resample('1H', kind='timestamp').asfreq(fill_value=3.0) + assert_frame_equal(result, expected) + + 
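test_asfreq_fill_value above checks that upsampling a PeriodIndex series to timestamps fills the newly created slots with the requested value rather than NaN. A small sketch of the behaviour under test, with made-up values (the data and fill value here are illustrative only):

```python
import pandas as pd

# Illustrative daily PeriodIndex series.
pi = pd.period_range('2000-01-01', periods=3, freq='D')
s = pd.Series([1.0, 2.0, 3.0], index=pi)

# Upsampling to hourly creates 23 new slots per day; fill_value
# replaces the NaNs that .asfreq() would otherwise leave behind.
result = s.resample('H', kind='timestamp').asfreq(fill_value=4.0)
assert result.iloc[0] == 1.0             # original daily value at midnight
assert (result.iloc[1:24] == 4.0).all()  # filled hourly slots
```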
@pytest.mark.parametrize('freq', ['H', '12H', '2D', 'W']) + @pytest.mark.parametrize('kind', [None, 'period', 'timestamp']) + def test_selection(self, index, freq, kind): + # This is a bug, these should be implemented + # GH 14008 + rng = np.arange(len(index), dtype=np.int64) + df = DataFrame({'date': index, 'a': rng}, + index=pd.MultiIndex.from_arrays([rng, index], + names=['v', 'd'])) + with pytest.raises(NotImplementedError): + df.resample(freq, on='date', kind=kind) + with pytest.raises(NotImplementedError): + df.resample(freq, level='d', kind=kind) + + @pytest.mark.parametrize('month', MONTHS) + @pytest.mark.parametrize('meth', ['ffill', 'bfill']) + @pytest.mark.parametrize('conv', ['start', 'end']) + @pytest.mark.parametrize('targ', ['D', 'B', 'M']) + def test_annual_upsample_cases(self, targ, conv, meth, month, + simple_period_range_series): + ts = simple_period_range_series( + '1/1/1990', '12/31/1991', freq='A-%s' % month) + + result = getattr(ts.resample(targ, convention=conv), meth)() + expected = result.to_timestamp(targ, how=conv) + expected = expected.asfreq(targ, meth).to_period() + assert_series_equal(result, expected) + + def test_basic_downsample(self, simple_period_range_series): + ts = simple_period_range_series('1/1/1990', '6/30/1995', freq='M') + result = ts.resample('a-dec').mean() + + expected = ts.groupby(ts.index.year).mean() + expected.index = period_range('1/1/1990', '6/30/1995', freq='a-dec') + assert_series_equal(result, expected) + + # this is ok + assert_series_equal(ts.resample('a-dec').mean(), result) + assert_series_equal(ts.resample('a').mean(), result) + + def test_not_subperiod(self, simple_period_range_series): + # These are incompatible period rules for resampling + ts = simple_period_range_series('1/1/1990', '6/30/1995', freq='w-wed') + pytest.raises(ValueError, lambda: ts.resample('a-dec').mean()) + pytest.raises(ValueError, lambda: ts.resample('q-mar').mean()) + pytest.raises(ValueError, lambda: ts.resample('M').mean()) + pytest.raises(ValueError, lambda: ts.resample('w-thu').mean()) + + @pytest.mark.parametrize('freq', ['D', '2D']) + def test_basic_upsample(self, freq, simple_period_range_series): + ts = simple_period_range_series('1/1/1990', '6/30/1995', freq='M') + result = ts.resample('a-dec').mean() + + resampled = result.resample(freq, convention='end').ffill() + expected = result.to_timestamp(freq, how='end') + expected = expected.asfreq(freq, 'ffill').to_period(freq) + assert_series_equal(resampled, expected) + + def test_upsample_with_limit(self): + rng = period_range('1/1/2000', periods=5, freq='A') + ts = Series(np.random.randn(len(rng)), rng) + + result = ts.resample('M', convention='end').ffill(limit=2) + expected = ts.asfreq('M').reindex(result.index, method='ffill', + limit=2) + assert_series_equal(result, expected) + + def test_annual_upsample(self, simple_period_range_series): + ts = simple_period_range_series('1/1/1990', '12/31/1995', freq='A-DEC') + df = DataFrame({'a': ts}) + rdf = df.resample('D').ffill() + exp = df['a'].resample('D').ffill() + assert_series_equal(rdf['a'], exp) + + rng = period_range('2000', '2003', freq='A-DEC') + ts = Series([1, 2, 3, 4], index=rng) + + result = ts.resample('M').ffill() + ex_index = period_range('2000-01', '2003-12', freq='M') + + expected = ts.asfreq('M', how='start').reindex(ex_index, + method='ffill') + assert_series_equal(result, expected) + + @pytest.mark.parametrize('month', MONTHS) + @pytest.mark.parametrize('target', ['D', 'B', 'M']) + @pytest.mark.parametrize('convention', ['start', 
'end']) + def test_quarterly_upsample(self, month, target, convention, + simple_period_range_series): + freq = 'Q-{month}'.format(month=month) + ts = simple_period_range_series('1/1/1990', '12/31/1995', freq=freq) + result = ts.resample(target, convention=convention).ffill() + expected = result.to_timestamp(target, how=convention) + expected = expected.asfreq(target, 'ffill').to_period() + assert_series_equal(result, expected) + + @pytest.mark.parametrize('target', ['D', 'B']) + @pytest.mark.parametrize('convention', ['start', 'end']) + def test_monthly_upsample(self, target, convention, + simple_period_range_series): + ts = simple_period_range_series('1/1/1990', '12/31/1995', freq='M') + result = ts.resample(target, convention=convention).ffill() + expected = result.to_timestamp(target, how=convention) + expected = expected.asfreq(target, 'ffill').to_period() + assert_series_equal(result, expected) + + def test_resample_basic(self): + # GH3609 + s = Series(range(100), index=date_range( + '20130101', freq='s', periods=100, name='idx'), dtype='float') + s[10:30] = np.nan + index = PeriodIndex([ + Period('2013-01-01 00:00', 'T'), + Period('2013-01-01 00:01', 'T')], name='idx') + expected = Series([34.5, 79.5], index=index) + result = s.to_period().resample('T', kind='period').mean() + assert_series_equal(result, expected) + result2 = s.resample('T', kind='period').mean() + assert_series_equal(result2, expected) + + @pytest.mark.parametrize('freq,expected_vals', [('M', [31, 29, 31, 9]), + ('2M', [31 + 29, 31 + 9])]) + def test_resample_count(self, freq, expected_vals): + # GH12774 + series = Series(1, index=pd.period_range(start='2000', periods=100)) + result = series.resample(freq).count() + expected_index = pd.period_range(start='2000', freq=freq, + periods=len(expected_vals)) + expected = Series(expected_vals, index=expected_index) + assert_series_equal(result, expected) + + def test_resample_same_freq(self, resample_method): + + # GH12770 + series = Series(range(3), index=pd.period_range( + start='2000', periods=3, freq='M')) + expected = series + + result = getattr(series.resample('M'), resample_method)() + assert_series_equal(result, expected) + + def test_resample_incompat_freq(self): + + with pytest.raises(IncompatibleFrequency): + Series(range(3), index=pd.period_range( + start='2000', periods=3, freq='M')).resample('W').mean() + + def test_with_local_timezone_pytz(self): + # see gh-5430 + local_timezone = pytz.timezone('America/Los_Angeles') + + start = datetime(year=2013, month=11, day=1, hour=0, minute=0, + tzinfo=pytz.utc) + # 1 day later + end = datetime(year=2013, month=11, day=2, hour=0, minute=0, + tzinfo=pytz.utc) + + index = pd.date_range(start, end, freq='H') + + series = Series(1, index=index) + series = series.tz_convert(local_timezone) + result = series.resample('D', kind='period').mean() + + # Create the expected series + # Index is moved back a day with the timezone conversion from UTC to + # Pacific + expected_index = (pd.period_range(start=start, end=end, freq='D') - + offsets.Day()) + expected = Series(1, index=expected_index) + assert_series_equal(result, expected) + + def test_resample_with_pytz(self): + # GH 13238 + s = Series(2, index=pd.date_range('2017-01-01', periods=48, freq="H", + tz="US/Eastern")) + result = s.resample("D").mean() + expected = Series(2, index=pd.DatetimeIndex(['2017-01-01', + '2017-01-02'], + tz="US/Eastern")) + assert_series_equal(result, expected) + # Especially assert that the timezone is LMT for pytz + assert result.index.tz == 
pytz.timezone('US/Eastern') + + def test_with_local_timezone_dateutil(self): + # see gh-5430 + local_timezone = 'dateutil/America/Los_Angeles' + + start = datetime(year=2013, month=11, day=1, hour=0, minute=0, + tzinfo=dateutil.tz.tzutc()) + # 1 day later + end = datetime(year=2013, month=11, day=2, hour=0, minute=0, + tzinfo=dateutil.tz.tzutc()) + + index = pd.date_range(start, end, freq='H', name='idx') + + series = Series(1, index=index) + series = series.tz_convert(local_timezone) + result = series.resample('D', kind='period').mean() + + # Create the expected series + # Index is moved back a day with the timezone conversion from UTC to + # Pacific + expected_index = (pd.period_range(start=start, end=end, freq='D', + name='idx') - offsets.Day()) + expected = Series(1, index=expected_index) + assert_series_equal(result, expected) + + def test_resample_nonexistent_time_bin_edge(self): + # GH 19375 + index = date_range('2017-03-12', '2017-03-12 1:45:00', freq='15T') + s = Series(np.zeros(len(index)), index=index) + expected = s.tz_localize('US/Pacific') + result = expected.resample('900S').mean() + tm.assert_series_equal(result, expected) + + # GH 23742 + index = date_range(start='2017-10-10', end='2017-10-20', freq='1H') + index = index.tz_localize('UTC').tz_convert('America/Sao_Paulo') + df = DataFrame(data=list(range(len(index))), index=index) + result = df.groupby(pd.Grouper(freq='1D')).count() + expected = date_range(start='2017-10-09', end='2017-10-20', freq='D', + tz="America/Sao_Paulo", nonexistent='shift', + closed='left') + tm.assert_index_equal(result.index, expected) + + def test_resample_ambiguous_time_bin_edge(self): + # GH 10117 + idx = pd.date_range("2014-10-25 22:00:00", "2014-10-26 00:30:00", + freq="30T", tz="Europe/London") + expected = Series(np.zeros(len(idx)), index=idx) + result = expected.resample('30T').mean() + tm.assert_series_equal(result, expected) + + def test_fill_method_and_how_upsample(self): + # GH2073 + s = Series(np.arange(9, dtype='int64'), + index=date_range('2010-01-01', periods=9, freq='Q')) + last = s.resample('M').ffill() + both = s.resample('M').ffill().resample('M').last().astype('int64') + assert_series_equal(last, both) + + @pytest.mark.parametrize('day', DAYS) + @pytest.mark.parametrize('target', ['D', 'B']) + @pytest.mark.parametrize('convention', ['start', 'end']) + def test_weekly_upsample(self, day, target, convention, + simple_period_range_series): + freq = 'W-{day}'.format(day=day) + ts = simple_period_range_series('1/1/1990', '12/31/1995', freq=freq) + result = ts.resample(target, convention=convention).ffill() + expected = result.to_timestamp(target, how=convention) + expected = expected.asfreq(target, 'ffill').to_period() + assert_series_equal(result, expected) + + def test_resample_to_timestamps(self, simple_period_range_series): + ts = simple_period_range_series('1/1/1990', '12/31/1995', freq='M') + + result = ts.resample('A-DEC', kind='timestamp').mean() + expected = ts.to_timestamp(how='start').resample('A-DEC').mean() + assert_series_equal(result, expected) + + def test_resample_to_quarterly(self, simple_period_range_series): + for month in MONTHS: + ts = simple_period_range_series( + '1990', '1992', freq='A-%s' % month) + quar_ts = ts.resample('Q-%s' % month).ffill() + + stamps = ts.to_timestamp('D', how='start') + qdates = period_range(ts.index[0].asfreq('D', 'start'), + ts.index[-1].asfreq('D', 'end'), + freq='Q-%s' % month) + + expected = stamps.reindex(qdates.to_timestamp('D', 's'), + method='ffill') + expected.index = 
qdates + + assert_series_equal(quar_ts, expected) + + # conforms, but different month + ts = simple_period_range_series('1990', '1992', freq='A-JUN') + + for how in ['start', 'end']: + result = ts.resample('Q-MAR', convention=how).ffill() + expected = ts.asfreq('Q-MAR', how=how) + expected = expected.reindex(result.index, method='ffill') + + # .to_timestamp('D') + # expected = expected.resample('Q-MAR').ffill() + + assert_series_equal(result, expected) + + def test_resample_fill_missing(self): + rng = PeriodIndex([2000, 2005, 2007, 2009], freq='A') + + s = Series(np.random.randn(4), index=rng) + + stamps = s.to_timestamp() + filled = s.resample('A').ffill() + expected = stamps.resample('A').ffill().to_period('A') + assert_series_equal(filled, expected) + + def test_cant_fill_missing_dups(self): + rng = PeriodIndex([2000, 2005, 2005, 2007, 2007], freq='A') + s = Series(np.random.randn(5), index=rng) + pytest.raises(Exception, lambda: s.resample('A').ffill()) + + @pytest.mark.parametrize('freq', ['5min']) + @pytest.mark.parametrize('kind', ['period', None, 'timestamp']) + def test_resample_5minute(self, freq, kind): + rng = period_range('1/1/2000', '1/5/2000', freq='T') + ts = Series(np.random.randn(len(rng)), index=rng) + expected = ts.to_timestamp().resample(freq).mean() + if kind != 'timestamp': + expected = expected.to_period(freq) + result = ts.resample(freq, kind=kind).mean() + assert_series_equal(result, expected) + + def test_upsample_daily_business_daily(self, simple_period_range_series): + ts = simple_period_range_series('1/1/2000', '2/1/2000', freq='B') + + result = ts.resample('D').asfreq() + expected = ts.asfreq('D').reindex(period_range('1/3/2000', '2/1/2000')) + assert_series_equal(result, expected) + + ts = simple_period_range_series('1/1/2000', '2/1/2000') + result = ts.resample('H', convention='s').asfreq() + exp_rng = period_range('1/1/2000', '2/1/2000 23:00', freq='H') + expected = ts.asfreq('H', how='s').reindex(exp_rng) + assert_series_equal(result, expected) + + def test_resample_irregular_sparse(self): + dr = date_range(start='1/1/2012', freq='5min', periods=1000) + s = Series(np.array(100), index=dr) + # subset the data. 
+ subset = s[:'2012-01-04 06:55'] + + result = subset.resample('10min').apply(len) + expected = s.resample('10min').apply(len).loc[result.index] + assert_series_equal(result, expected) + + def test_resample_weekly_all_na(self): + rng = date_range('1/1/2000', periods=10, freq='W-WED') + ts = Series(np.random.randn(len(rng)), index=rng) + + result = ts.resample('W-THU').asfreq() + + assert result.isna().all() + + result = ts.resample('W-THU').asfreq().ffill()[:-1] + expected = ts.asfreq('W-THU').ffill() + assert_series_equal(result, expected) + + def test_resample_tz_localized(self): + dr = date_range(start='2012-4-13', end='2012-5-1') + ts = Series(lrange(len(dr)), dr) + + ts_utc = ts.tz_localize('UTC') + ts_local = ts_utc.tz_convert('America/Los_Angeles') + + result = ts_local.resample('W').mean() + + ts_local_naive = ts_local.copy() + ts_local_naive.index = [x.replace(tzinfo=None) + for x in ts_local_naive.index.to_pydatetime()] + + exp = ts_local_naive.resample( + 'W').mean().tz_localize('America/Los_Angeles') + + assert_series_equal(result, exp) + + # it works + result = ts_local.resample('D').mean() + + # #2245 + idx = date_range('2001-09-20 15:59', '2001-09-20 16:00', freq='T', + tz='Australia/Sydney') + s = Series([1, 2], index=idx) + + result = s.resample('D', closed='right', label='right').mean() + ex_index = date_range('2001-09-21', periods=1, freq='D', + tz='Australia/Sydney') + expected = Series([1.5], index=ex_index) + + assert_series_equal(result, expected) + + # for good measure + result = s.resample('D', kind='period').mean() + ex_index = period_range('2001-09-20', periods=1, freq='D') + expected = Series([1.5], index=ex_index) + assert_series_equal(result, expected) + + # GH 6397 + # comparing an offset that doesn't propagate tz's + rng = date_range('1/1/2011', periods=20000, freq='H') + rng = rng.tz_localize('EST') + ts = DataFrame(index=rng) + ts['first'] = np.random.randn(len(rng)) + ts['second'] = np.cumsum(np.random.randn(len(rng))) + expected = DataFrame( + { + 'first': ts.resample('A').sum()['first'], + 'second': ts.resample('A').mean()['second']}, + columns=['first', 'second']) + result = ts.resample( + 'A').agg({'first': np.sum, + 'second': np.mean}).reindex(columns=['first', 'second']) + assert_frame_equal(result, expected) + + def test_closed_left_corner(self): + # #1465 + s = Series(np.random.randn(21), + index=date_range(start='1/1/2012 9:30', + freq='1min', periods=21)) + s[0] = np.nan + + result = s.resample('10min', closed='left', label='right').mean() + exp = s[1:].resample('10min', closed='left', label='right').mean() + assert_series_equal(result, exp) + + result = s.resample('10min', closed='left', label='left').mean() + exp = s[1:].resample('10min', closed='left', label='left').mean() + + ex_index = date_range(start='1/1/2012 9:30', freq='10min', periods=3) + + tm.assert_index_equal(result.index, ex_index) + assert_series_equal(result, exp) + + def test_quarterly_resampling(self): + rng = period_range('2000Q1', periods=10, freq='Q-DEC') + ts = Series(np.arange(10), index=rng) + + result = ts.resample('A').mean() + exp = ts.to_timestamp().resample('A').mean().to_period() + assert_series_equal(result, exp) + + def test_resample_weekly_bug_1726(self): + # 8/6/12 is a Monday + ind = date_range(start="8/6/2012", end="8/26/2012", freq="D") + n = len(ind) + data = [[x] * 5 for x in range(n)] + df = DataFrame(data, columns=['open', 'high', 'low', 'close', 'vol'], + index=ind) + + # it works! 
+ df.resample('W-MON', closed='left', label='left').first() + + def test_resample_with_dst_time_change(self): + # GH 15549 + index = pd.DatetimeIndex([1457537600000000000, 1458059600000000000], + tz='UTC').tz_convert('America/Chicago') + df = pd.DataFrame([1, 2], index=index) + result = df.resample('12h', closed='right', + label='right').last().ffill() + + expected_index_values = ['2016-03-09 12:00:00-06:00', + '2016-03-10 00:00:00-06:00', + '2016-03-10 12:00:00-06:00', + '2016-03-11 00:00:00-06:00', + '2016-03-11 12:00:00-06:00', + '2016-03-12 00:00:00-06:00', + '2016-03-12 12:00:00-06:00', + '2016-03-13 00:00:00-06:00', + '2016-03-13 13:00:00-05:00', + '2016-03-14 01:00:00-05:00', + '2016-03-14 13:00:00-05:00', + '2016-03-15 01:00:00-05:00', + '2016-03-15 13:00:00-05:00'] + index = pd.to_datetime(expected_index_values, utc=True).tz_convert( + 'America/Chicago') + expected = pd.DataFrame([1.0, 1.0, 1.0, 1.0, 1.0, + 1.0, 1.0, 1.0, 1.0, 1.0, + 1.0, 1.0, 2.0], index=index) + assert_frame_equal(result, expected) + + def test_resample_bms_2752(self): + # GH2753 + foo = Series(index=pd.bdate_range('20000101', '20000201')) + res1 = foo.resample("BMS").mean() + res2 = foo.resample("BMS").mean().resample("B").mean() + assert res1.index[0] == Timestamp('20000103') + assert res1.index[0] == res2.index[0] + + # def test_monthly_convention_span(self): + # rng = period_range('2000-01', periods=3, freq='M') + # ts = Series(np.arange(3), index=rng) + + # # hacky way to get same thing + # exp_index = period_range('2000-01-01', '2000-03-31', freq='D') + # expected = ts.asfreq('D', how='end').reindex(exp_index) + # expected = expected.fillna(method='bfill') + + # result = ts.resample('D', convention='span').mean() + + # assert_series_equal(result, expected) + + def test_default_right_closed_label(self): + end_freq = ['D', 'Q', 'M', 'D'] + end_types = ['M', 'A', 'Q', 'W'] + + for from_freq, to_freq in zip(end_freq, end_types): + idx = date_range(start='8/15/2012', periods=100, freq=from_freq) + df = DataFrame(np.random.randn(len(idx), 2), idx) + + resampled = df.resample(to_freq).mean() + assert_frame_equal(resampled, df.resample(to_freq, closed='right', + label='right').mean()) + + def test_default_left_closed_label(self): + others = ['MS', 'AS', 'QS', 'D', 'H'] + others_freq = ['D', 'Q', 'M', 'H', 'T'] + + for from_freq, to_freq in zip(others_freq, others): + idx = date_range(start='8/15/2012', periods=100, freq=from_freq) + df = DataFrame(np.random.randn(len(idx), 2), idx) + + resampled = df.resample(to_freq).mean() + assert_frame_equal(resampled, df.resample(to_freq, closed='left', + label='left').mean()) + + def test_all_values_single_bin(self): + # 2070 + index = period_range(start="2012-01-01", end="2012-12-31", freq="M") + s = Series(np.random.randn(len(index)), index=index) + + result = s.resample("A").mean() + tm.assert_almost_equal(result[0], s.mean()) + + def test_evenly_divisible_with_no_extra_bins(self): + # 4076 + # when the frequency is evenly divisible, sometimes extra bins + + df = DataFrame(np.random.randn(9, 3), + index=date_range('2000-1-1', periods=9)) + result = df.resample('5D').mean() + expected = pd.concat( + [df.iloc[0:5].mean(), df.iloc[5:].mean()], axis=1).T + expected.index = [Timestamp('2000-1-1'), Timestamp('2000-1-6')] + assert_frame_equal(result, expected) + + index = date_range(start='2001-5-4', periods=28) + df = DataFrame( + [{'REST_KEY': 1, 'DLY_TRN_QT': 80, 'DLY_SLS_AMT': 90, + 'COOP_DLY_TRN_QT': 30, 'COOP_DLY_SLS_AMT': 20}] * 28 + + [{'REST_KEY': 2, 'DLY_TRN_QT': 70, 
'DLY_SLS_AMT': 10, + 'COOP_DLY_TRN_QT': 50, 'COOP_DLY_SLS_AMT': 20}] * 28, + index=index.append(index)).sort_index() + + index = date_range('2001-5-4', periods=4, freq='7D') + expected = DataFrame( + [{'REST_KEY': 14, 'DLY_TRN_QT': 14, 'DLY_SLS_AMT': 14, + 'COOP_DLY_TRN_QT': 14, 'COOP_DLY_SLS_AMT': 14}] * 4, + index=index) + result = df.resample('7D').count() + assert_frame_equal(result, expected) + + expected = DataFrame( + [{'REST_KEY': 21, 'DLY_TRN_QT': 1050, 'DLY_SLS_AMT': 700, + 'COOP_DLY_TRN_QT': 560, 'COOP_DLY_SLS_AMT': 280}] * 4, + index=index) + result = df.resample('7D').sum() + assert_frame_equal(result, expected) + + @pytest.mark.parametrize('kind', ['period', None, 'timestamp']) + @pytest.mark.parametrize('agg_arg', ['mean', {'value': 'mean'}, ['mean']]) + def test_loffset_returns_datetimeindex(self, frame, kind, agg_arg): + # make sure passing loffset returns DatetimeIndex in all cases + # basic method taken from Base.test_resample_loffset_arg_type() + df = frame + expected_means = [df.values[i:i + 2].mean() + for i in range(0, len(df.values), 2)] + expected_index = period_range( + df.index[0], periods=len(df.index) / 2, freq='2D') + + # loffset coerces PeriodIndex to DateTimeIndex + expected_index = expected_index.to_timestamp() + expected_index += timedelta(hours=2) + expected = DataFrame({'value': expected_means}, index=expected_index) + + result_agg = df.resample('2D', loffset='2H', kind=kind).agg(agg_arg) + with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): + result_how = df.resample('2D', how=agg_arg, loffset='2H', + kind=kind) + if isinstance(agg_arg, list): + expected.columns = pd.MultiIndex.from_tuples([('value', 'mean')]) + assert_frame_equal(result_agg, expected) + assert_frame_equal(result_how, expected) + + @pytest.mark.parametrize('freq, period_mult', [('H', 24), ('12H', 2)]) + @pytest.mark.parametrize('kind', [None, 'period']) + def test_upsampling_ohlc(self, freq, period_mult, kind): + # GH 13083 + pi = PeriodIndex(start='2000', freq='D', periods=10) + s = Series(range(len(pi)), index=pi) + expected = s.to_timestamp().resample(freq).ohlc().to_period(freq) + + # timestamp-based resampling doesn't include all sub-periods + # of the last original period, so extend accordingly: + new_index = PeriodIndex(start='2000', freq=freq, + periods=period_mult * len(pi)) + expected = expected.reindex(new_index) + result = s.resample(freq, kind=kind).ohlc() + assert_frame_equal(result, expected) + + @pytest.mark.parametrize('periods, values', + [([pd.NaT, '1970-01-01 00:00:00', pd.NaT, + '1970-01-01 00:00:02', '1970-01-01 00:00:03'], + [2, 3, 5, 7, 11]), + ([pd.NaT, pd.NaT, '1970-01-01 00:00:00', pd.NaT, + pd.NaT, pd.NaT, '1970-01-01 00:00:02', + '1970-01-01 00:00:03', pd.NaT, pd.NaT], + [1, 2, 3, 5, 6, 8, 7, 11, 12, 13])]) + @pytest.mark.parametrize('freq, expected_values', + [('1s', [3, np.NaN, 7, 11]), + ('2s', [3, int((7 + 11) / 2)]), + ('3s', [int((3 + 7) / 2), 11])]) + def test_resample_with_nat(self, periods, values, freq, expected_values): + # GH 13224 + index = PeriodIndex(periods, freq='S') + frame = DataFrame(values, index=index) + + expected_index = period_range('1970-01-01 00:00:00', + periods=len(expected_values), freq=freq) + expected = DataFrame(expected_values, index=expected_index) + result = frame.resample(freq).mean() + assert_frame_equal(result, expected) + + def test_resample_with_only_nat(self): + # GH 13224 + pi = PeriodIndex([pd.NaT] * 3, freq='S') + frame = DataFrame([2, 3, 5], index=pi) + expected_index = PeriodIndex(data=[], 
freq=pi.freq) + expected = DataFrame([], index=expected_index) + result = frame.resample('1s').mean() + assert_frame_equal(result, expected) + + @pytest.mark.parametrize('start,end,start_freq,end_freq,base', [ + ('19910905', '19910909 03:00', 'H', '24H', 10), + ('19910905', '19910909 12:00', 'H', '24H', 10), + ('19910905', '19910909 23:00', 'H', '24H', 10), + ('19910905 10:00', '19910909', 'H', '24H', 10), + ('19910905 10:00', '19910909 10:00', 'H', '24H', 10), + ('19910905', '19910909 10:00', 'H', '24H', 10), + ('19910905 12:00', '19910909', 'H', '24H', 10), + ('19910905 12:00', '19910909 03:00', 'H', '24H', 10), + ('19910905 12:00', '19910909 12:00', 'H', '24H', 10), + ('19910905 12:00', '19910909 12:00', 'H', '24H', 34), + ('19910905 12:00', '19910909 12:00', 'H', '17H', 10), + ('19910905 12:00', '19910909 12:00', 'H', '17H', 3), + ('19910905 12:00', '19910909 1:00', 'H', 'M', 3), + ('19910905', '19910913 06:00', '2H', '24H', 10), + ('19910905', '19910905 01:39', 'Min', '5Min', 3), + ('19910905', '19910905 03:18', '2Min', '5Min', 3), + ]) + def test_resample_with_non_zero_base(self, start, end, start_freq, + end_freq, base): + # GH 23882 + s = pd.Series(0, index=pd.period_range(start, end, freq=start_freq)) + s = s + np.arange(len(s)) + result = s.resample(end_freq, base=base).mean() + result = result.to_timestamp(end_freq) + # to_timestamp casts 24H -> D + result = result.asfreq(end_freq) if end_freq == '24H' else result + expected = s.to_timestamp().resample(end_freq, base=base).mean() + assert_series_equal(result, expected) + + @pytest.mark.parametrize('first,last,offset,exp_first,exp_last', [ + ('19910905', '19920406', 'D', '19910905', '19920406'), + ('19910905 00:00', '19920406 06:00', 'D', '19910905', '19920406'), + ('19910905 06:00', '19920406 06:00', 'H', '19910905 06:00', + '19920406 06:00'), + ('19910906', '19920406', 'M', '1991-09', '1992-04'), + ('19910831', '19920430', 'M', '1991-08', '1992-04'), + ('1991-08', '1992-04', 'M', '1991-08', '1992-04'), + ]) + def test_get_period_range_edges(self, first, last, offset, + exp_first, exp_last): + first = pd.Period(first) + last = pd.Period(last) + + exp_first = pd.Period(exp_first, freq=offset) + exp_last = pd.Period(exp_last, freq=offset) + + offset = pd.tseries.frequencies.to_offset(offset) + result = _get_period_range_edges(first, last, offset) + expected = (exp_first, exp_last) + assert result == expected diff --git a/pandas/tests/resample/test_resample_api.py b/pandas/tests/resample/test_resample_api.py new file mode 100644 index 0000000000000..69684daf05f3d --- /dev/null +++ b/pandas/tests/resample/test_resample_api.py @@ -0,0 +1,544 @@ +# pylint: disable=E1101 + +from datetime import datetime + +import numpy as np +import pytest + +from pandas.compat import OrderedDict, range + +import pandas as pd +from pandas import DataFrame, Series +from pandas.core.indexes.datetimes import date_range +import pandas.util.testing as tm +from pandas.util.testing import assert_frame_equal, assert_series_equal + +dti = date_range(start=datetime(2005, 1, 1), + end=datetime(2005, 1, 10), freq='Min') + +test_series = Series(np.random.rand(len(dti)), dti) +test_frame = DataFrame( + {'A': test_series, 'B': test_series, 'C': np.arange(len(dti))}) + + +def test_str(): + + r = test_series.resample('H') + assert ('DatetimeIndexResampler [freq=<Hour>, axis=0, closed=left, ' + 'label=left, convention=start, base=0]' in str(r)) + + +def test_api(): + + r = test_series.resample('H') + result = r.mean() + assert isinstance(result, Series) + assert len(result) 
== 217 + + r = test_series.to_frame().resample('H') + result = r.mean() + assert isinstance(result, DataFrame) + assert len(result) == 217 + + +def test_groupby_resample_api(): + + # GH 12448 + # .groupby(...).resample(...) hitting warnings + # when appropriate + df = DataFrame({'date': pd.date_range(start='2016-01-01', + periods=4, + freq='W'), + 'group': [1, 1, 2, 2], + 'val': [5, 6, 7, 8]}).set_index('date') + + # replication step + i = pd.date_range('2016-01-03', periods=8).tolist() + \ + pd.date_range('2016-01-17', periods=8).tolist() + index = pd.MultiIndex.from_arrays([[1] * 8 + [2] * 8, i], + names=['group', 'date']) + expected = DataFrame({'val': [5] * 7 + [6] + [7] * 7 + [8]}, + index=index) + result = df.groupby('group').apply( + lambda x: x.resample('1D').ffill())[['val']] + assert_frame_equal(result, expected) + + +def test_groupby_resample_on_api(): + + # GH 15021 + # .groupby(...).resample(on=...) results in an unexpected + # keyword warning. + df = DataFrame({'key': ['A', 'B'] * 5, + 'dates': pd.date_range('2016-01-01', periods=10), + 'values': np.random.randn(10)}) + + expected = df.set_index('dates').groupby('key').resample('D').mean() + + result = df.groupby('key').resample('D', on='dates').mean() + assert_frame_equal(result, expected) + + +def test_pipe(): + # GH17905 + + # series + r = test_series.resample('H') + expected = r.max() - r.mean() + result = r.pipe(lambda x: x.max() - x.mean()) + tm.assert_series_equal(result, expected) + + # dataframe + r = test_frame.resample('H') + expected = r.max() - r.mean() + result = r.pipe(lambda x: x.max() - x.mean()) + tm.assert_frame_equal(result, expected) + + +def test_getitem(): + + r = test_frame.resample('H') + tm.assert_index_equal(r._selected_obj.columns, test_frame.columns) + + r = test_frame.resample('H')['B'] + assert r._selected_obj.name == test_frame.columns[1] + + # technically this is allowed + r = test_frame.resample('H')['A', 'B'] + tm.assert_index_equal(r._selected_obj.columns, + test_frame.columns[[0, 1]]) + + r = test_frame.resample('H')['A', 'B'] + tm.assert_index_equal(r._selected_obj.columns, + test_frame.columns[[0, 1]]) + + +def test_select_bad_cols(): + + g = test_frame.resample('H') + pytest.raises(KeyError, g.__getitem__, ['D']) + + pytest.raises(KeyError, g.__getitem__, ['A', 'D']) + with pytest.raises(KeyError, match='^[^A]+$'): + # A should not be referenced as a bad column... + # will have to rethink regex if you change message! 
+ g[['A', 'D']] + + +def test_attribute_access(): + + r = test_frame.resample('H') + tm.assert_series_equal(r.A.sum(), r['A'].sum()) + + +def test_api_compat_before_use(): + + # make sure that we are setting the binner + # on these attributes + for attr in ['groups', 'ngroups', 'indices']: + rng = pd.date_range('1/1/2012', periods=100, freq='S') + ts = Series(np.arange(len(rng)), index=rng) + rs = ts.resample('30s') + + # before use + getattr(rs, attr) + + # after grouper is initialized is ok + rs.mean() + getattr(rs, attr) + + +def tests_skip_nuisance(): + + df = test_frame + df['D'] = 'foo' + r = df.resample('H') + result = r[['A', 'B']].sum() + expected = pd.concat([r.A.sum(), r.B.sum()], axis=1) + assert_frame_equal(result, expected) + + expected = r[['A', 'B', 'C']].sum() + result = r.sum() + assert_frame_equal(result, expected) + + +def test_downsample_but_actually_upsampling(): + + # this is reindex / asfreq + rng = pd.date_range('1/1/2012', periods=100, freq='S') + ts = Series(np.arange(len(rng), dtype='int64'), index=rng) + result = ts.resample('20s').asfreq() + expected = Series([0, 20, 40, 60, 80], + index=pd.date_range('2012-01-01 00:00:00', + freq='20s', + periods=5)) + assert_series_equal(result, expected) + + +def test_combined_up_downsampling_of_irregular(): + + # since we are really doing an operation like this + # ts2.resample('2s').mean().ffill() + # preserve these semantics + + rng = pd.date_range('1/1/2012', periods=100, freq='S') + ts = Series(np.arange(len(rng)), index=rng) + ts2 = ts.iloc[[0, 1, 2, 3, 5, 7, 11, 15, 16, 25, 30]] + + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + result = ts2.resample('2s', how='mean', fill_method='ffill') + expected = ts2.resample('2s').mean().ffill() + assert_series_equal(result, expected) + + +def test_transform(): + + r = test_series.resample('20min') + expected = test_series.groupby( + pd.Grouper(freq='20min')).transform('mean') + result = r.transform('mean') + assert_series_equal(result, expected) + + +def test_fillna(): + + # need to upsample here + rng = pd.date_range('1/1/2012', periods=10, freq='2S') + ts = Series(np.arange(len(rng), dtype='int64'), index=rng) + r = ts.resample('s') + + expected = r.ffill() + result = r.fillna(method='ffill') + assert_series_equal(result, expected) + + expected = r.bfill() + result = r.fillna(method='bfill') + assert_series_equal(result, expected) + + with pytest.raises(ValueError): + r.fillna(0) + + +def test_apply_without_aggregation(): + + # both resample and groupby should work w/o aggregation + r = test_series.resample('20min') + g = test_series.groupby(pd.Grouper(freq='20min')) + + for t in [g, r]: + result = t.apply(lambda x: x) + assert_series_equal(result, test_series) + + +def test_agg_consistency(): + + # make sure that we are consistent across + # similar aggregations with and w/o selection list + df = DataFrame(np.random.randn(1000, 3), + index=pd.date_range('1/1/2012', freq='S', periods=1000), + columns=['A', 'B', 'C']) + + r = df.resample('3T') + + with tm.assert_produces_warning(FutureWarning, + check_stacklevel=False): + expected = r[['A', 'B', 'C']].agg({'r1': 'mean', 'r2': 'sum'}) + result = r.agg({'r1': 'mean', 'r2': 'sum'}) + assert_frame_equal(result, expected) + +# TODO: once GH 14008 is fixed, move these tests into +# `Base` test class + + +def test_agg(): + # test with all three Resampler apis and TimeGrouper + + np.random.seed(1234) + index = date_range(datetime(2005, 1, 1), + datetime(2005, 1, 10), freq='D') + index.name = 'date' + df = 
DataFrame(np.random.rand(10, 2), columns=list('AB'), index=index)
+    df_col = df.reset_index()
+    df_mult = df_col.copy()
+    df_mult.index = pd.MultiIndex.from_arrays([range(10), df.index],
+                                              names=['index', 'date'])
+    r = df.resample('2D')
+    cases = [
+        r,
+        df_col.resample('2D', on='date'),
+        df_mult.resample('2D', level='date'),
+        df.groupby(pd.Grouper(freq='2D'))
+    ]
+
+    a_mean = r['A'].mean()
+    a_std = r['A'].std()
+    a_sum = r['A'].sum()
+    b_mean = r['B'].mean()
+    b_std = r['B'].std()
+    b_sum = r['B'].sum()
+
+    expected = pd.concat([a_mean, a_std, b_mean, b_std], axis=1)
+    expected.columns = pd.MultiIndex.from_product([['A', 'B'],
+                                                   ['mean', 'std']])
+    for t in cases:
+        result = t.aggregate([np.mean, np.std])
+        assert_frame_equal(result, expected)
+
+    expected = pd.concat([a_mean, b_std], axis=1)
+    for t in cases:
+        result = t.aggregate({'A': np.mean,
+                              'B': np.std})
+        assert_frame_equal(result, expected, check_like=True)
+
+    expected = pd.concat([a_mean, a_std], axis=1)
+    expected.columns = pd.MultiIndex.from_tuples([('A', 'mean'),
+                                                  ('A', 'std')])
+    for t in cases:
+        result = t.aggregate({'A': ['mean', 'std']})
+        assert_frame_equal(result, expected)
+
+    expected = pd.concat([a_mean, a_sum], axis=1)
+    expected.columns = ['mean', 'sum']
+    for t in cases:
+        result = t['A'].aggregate(['mean', 'sum'])
+        assert_frame_equal(result, expected)
+
+    expected = pd.concat([a_mean, a_sum], axis=1)
+    expected.columns = pd.MultiIndex.from_tuples([('A', 'mean'),
+                                                  ('A', 'sum')])
+    for t in cases:
+        with tm.assert_produces_warning(FutureWarning,
+                                        check_stacklevel=False):
+            result = t.aggregate({'A': {'mean': 'mean', 'sum': 'sum'}})
+        assert_frame_equal(result, expected, check_like=True)
+
+    expected = pd.concat([a_mean, a_sum, b_mean, b_sum], axis=1)
+    expected.columns = pd.MultiIndex.from_tuples([('A', 'mean'),
+                                                  ('A', 'sum'),
+                                                  ('B', 'mean2'),
+                                                  ('B', 'sum2')])
+    for t in cases:
+        with tm.assert_produces_warning(FutureWarning,
+                                        check_stacklevel=False):
+            result = t.aggregate({'A': {'mean': 'mean', 'sum': 'sum'},
+                                  'B': {'mean2': 'mean', 'sum2': 'sum'}})
+        assert_frame_equal(result, expected, check_like=True)
+
+    expected = pd.concat([a_mean, a_std, b_mean, b_std], axis=1)
+    expected.columns = pd.MultiIndex.from_tuples([('A', 'mean'),
+                                                  ('A', 'std'),
+                                                  ('B', 'mean'),
+                                                  ('B', 'std')])
+    for t in cases:
+        result = t.aggregate({'A': ['mean', 'std'],
+                              'B': ['mean', 'std']})
+        assert_frame_equal(result, expected, check_like=True)
+
+    expected = pd.concat([a_mean, a_sum, b_mean, b_sum], axis=1)
+    expected.columns = pd.MultiIndex.from_tuples([('r1', 'A', 'mean'),
+                                                  ('r1', 'A', 'sum'),
+                                                  ('r2', 'B', 'mean'),
+                                                  ('r2', 'B', 'sum')])
+
+
+def test_agg_misc():
+    # test with all three Resampler apis and TimeGrouper
+
+    np.random.seed(1234)
+    index = date_range(datetime(2005, 1, 1),
+                       datetime(2005, 1, 10), freq='D')
+    index.name = 'date'
+    df = DataFrame(np.random.rand(10, 2), columns=list('AB'), index=index)
+    df_col = df.reset_index()
+    df_mult = df_col.copy()
+    df_mult.index = pd.MultiIndex.from_arrays([range(10), df.index],
+                                              names=['index', 'date'])
+
+    r = df.resample('2D')
+    cases = [
+        r,
+        df_col.resample('2D', on='date'),
+        df_mult.resample('2D', level='date'),
+        df.groupby(pd.Grouper(freq='2D'))
+    ]
+
+    # passed lambda
+    for t in cases:
+        result = t.agg({'A': np.sum,
+                        'B': lambda x: np.std(x, ddof=1)})
+        rcustom = t['B'].apply(lambda x: np.std(x, ddof=1))
+        expected = pd.concat([r['A'].sum(), rcustom], axis=1)
+        assert_frame_equal(result, expected, check_like=True)
+
+    # agg with renamers
+    expected = pd.concat([t['A'].sum(),
+                          t['B'].sum(),
+                          t['A'].mean(),
+                          t['B'].mean()],
+                         axis=1)
+    expected.columns = pd.MultiIndex.from_tuples([('result1', 'A'),
+                                                  ('result1', 'B'),
+                                                  ('result2', 'A'),
+                                                  ('result2', 'B')])
+
+    for t in cases:
+        with tm.assert_produces_warning(FutureWarning,
+                                        check_stacklevel=False):
+            result = t[['A', 'B']].agg(OrderedDict([('result1', np.sum),
+                                                    ('result2', np.mean)]))
+        assert_frame_equal(result, expected, check_like=True)
+
+    # agg with different hows
+    expected = pd.concat([t['A'].sum(),
+                          t['A'].std(),
+                          t['B'].mean(),
+                          t['B'].std()],
+                         axis=1)
+    expected.columns = pd.MultiIndex.from_tuples([('A', 'sum'),
+                                                  ('A', 'std'),
+                                                  ('B', 'mean'),
+                                                  ('B', 'std')])
+    for t in cases:
+        result = t.agg(OrderedDict([('A', ['sum', 'std']),
+                                    ('B', ['mean', 'std'])]))
+        assert_frame_equal(result, expected, check_like=True)
+
+    # equivalent of using a selection list / or not
+    for t in cases:
+        result = t[['A', 'B']].agg({'A': ['sum', 'std'],
+                                    'B': ['mean', 'std']})
+        assert_frame_equal(result, expected, check_like=True)
+
+    # series like aggs
+    for t in cases:
+        with tm.assert_produces_warning(FutureWarning,
+                                        check_stacklevel=False):
+            result = t['A'].agg({'A': ['sum', 'std']})
+        expected = pd.concat([t['A'].sum(),
+                              t['A'].std()],
+                             axis=1)
+        expected.columns = pd.MultiIndex.from_tuples([('A', 'sum'),
+                                                      ('A', 'std')])
+        assert_frame_equal(result, expected, check_like=True)
+
+        expected = pd.concat([t['A'].agg(['sum', 'std']),
+                              t['A'].agg(['mean', 'std'])],
+                             axis=1)
+        expected.columns = pd.MultiIndex.from_tuples([('A', 'sum'),
+                                                      ('A', 'std'),
+                                                      ('B', 'mean'),
+                                                      ('B', 'std')])
+        with tm.assert_produces_warning(FutureWarning,
+                                        check_stacklevel=False):
+            result = t['A'].agg({'A': ['sum', 'std'],
+                                 'B': ['mean', 'std']})
+        assert_frame_equal(result, expected, check_like=True)
+
+    # errors
+    # invalid names in the agg specification
+    for t in cases:
+        with pytest.raises(KeyError):
+            with tm.assert_produces_warning(FutureWarning,
+                                            check_stacklevel=False):
+                t[['A']].agg({'A': ['sum', 'std'],
+                              'B': ['mean', 'std']})
+
+
+def test_agg_nested_dicts():
+
+    np.random.seed(1234)
+    index = date_range(datetime(2005, 1, 1),
+                       datetime(2005, 1, 10), freq='D')
+    index.name = 'date'
+    df = DataFrame(np.random.rand(10, 2), columns=list('AB'), index=index)
+    df_col = df.reset_index()
+    df_mult = df_col.copy()
+    df_mult.index = pd.MultiIndex.from_arrays([range(10), df.index],
+                                              names=['index', 'date'])
+    r = df.resample('2D')
+    cases = [
+        r,
+        df_col.resample('2D', on='date'),
+        df_mult.resample('2D', level='date'),
+        df.groupby(pd.Grouper(freq='2D'))
+    ]
+
+    for t in cases:
+        def f():
+            t.aggregate({'r1': {'A': ['mean', 'sum']},
+                         'r2': {'B': ['mean', 'sum']}})
+        pytest.raises(ValueError, f)
+
+    for t in cases:
+        expected = pd.concat([t['A'].mean(), t['A'].std(), t['B'].mean(),
+                              t['B'].std()], axis=1)
+        expected.columns = pd.MultiIndex.from_tuples([('ra', 'mean'), (
+            'ra', 'std'), ('rb', 'mean'), ('rb', 'std')])
+
+        with tm.assert_produces_warning(FutureWarning,
+                                        check_stacklevel=False):
+            result = t[['A', 'B']].agg({'A': {'ra': ['mean', 'std']},
+                                        'B': {'rb': ['mean', 'std']}})
+        assert_frame_equal(result, expected, check_like=True)
+
+        with tm.assert_produces_warning(FutureWarning,
+                                        check_stacklevel=False):
+            result = t.agg({'A': {'ra': ['mean', 'std']},
+                            'B': {'rb': ['mean', 'std']}})
+        assert_frame_equal(result, expected, check_like=True)
+
+
+def test_try_aggregate_non_existing_column():
+    # GH 16766
+    data = [
+        {'dt': datetime(2017, 6, 1, 0), 'x': 1.0, 'y': 2.0},
+        {'dt': datetime(2017, 6, 1, 1), 'x': 2.0, 'y': 2.0},
+        {'dt': datetime(2017, 6, 1, 2), 'x': 3.0, 'y': 1.5}
+    ]
+    df = DataFrame(data).set_index('dt')
+
+    # Error as we don't have 'z' column
+    with pytest.raises(KeyError):
+        df.resample('30T').agg({'x': ['mean'],
+                                'y': ['median'],
+                                'z': ['sum']})
+
+
+def test_selection_api_validation():
+    # GH 13500
+    index = date_range(datetime(2005, 1, 1),
+                       datetime(2005, 1, 10), freq='D')
+
+    rng = np.arange(len(index), dtype=np.int64)
+    df = DataFrame({'date': index, 'a': rng},
+                   index=pd.MultiIndex.from_arrays([rng, index],
+                                                   names=['v', 'd']))
+    df_exp = DataFrame({'a': rng}, index=index)
+
+    # non DatetimeIndex
+    with pytest.raises(TypeError):
+        df.resample('2D', level='v')
+
+    with pytest.raises(ValueError):
+        df.resample('2D', on='date', level='d')
+
+    with pytest.raises(TypeError):
+        df.resample('2D', on=['a', 'date'])
+
+    with pytest.raises(KeyError):
+        df.resample('2D', level=['a', 'date'])
+
+    # upsampling not allowed
+    with pytest.raises(ValueError):
+        df.resample('2D', level='d').asfreq()
+
+    with pytest.raises(ValueError):
+        df.resample('2D', on='date').asfreq()
+
+    exp = df_exp.resample('2D').sum()
+    exp.index.name = 'date'
+    assert_frame_equal(exp, df.resample('2D', on='date').sum())
+
+    exp.index.name = 'd'
+    assert_frame_equal(exp, df.resample('2D', level='d').sum())
diff --git a/pandas/tests/resample/test_resampler_grouper.py b/pandas/tests/resample/test_resampler_grouper.py
new file mode 100644
index 0000000000000..b61acfc3d2c5e
--- /dev/null
+++ b/pandas/tests/resample/test_resampler_grouper.py
@@ -0,0 +1,260 @@
+# pylint: disable=E1101
+
+from textwrap import dedent
+
+import numpy as np
+
+from pandas.compat import range
+
+import pandas as pd
+from pandas import DataFrame, Series, Timestamp
+from pandas.core.indexes.datetimes import date_range
+import pandas.util.testing as tm
+from pandas.util.testing import assert_frame_equal, assert_series_equal
+
+test_frame = DataFrame({'A': [1] * 20 + [2] * 12 + [3] * 8,
+                        'B': np.arange(40)},
+                       index=date_range('1/1/2000',
+                                        freq='s',
+                                        periods=40))
+
+
+def test_tab_complete_ipython6_warning(ip):
+    from IPython.core.completer import provisionalcompleter
+    code = dedent("""\
+    import pandas.util.testing as tm
+    s = tm.makeTimeSeries()
+    rs = s.resample("D")
+    """)
+    ip.run_code(code)
+
+    with tm.assert_produces_warning(None):
+        with provisionalcompleter('ignore'):
+            list(ip.Completer.completions('rs.', 1))
+
+
+def test_deferred_with_groupby():
+
+    # GH 12486
+    # support deferred resample ops with groupby
+    data = [['2010-01-01', 'A', 2], ['2010-01-02', 'A', 3],
+            ['2010-01-05', 'A', 8], ['2010-01-10', 'A', 7],
+            ['2010-01-13', 'A', 3], ['2010-01-01', 'B', 5],
+            ['2010-01-03', 'B', 2], ['2010-01-04', 'B', 1],
+            ['2010-01-11', 'B', 7], ['2010-01-14', 'B', 3]]
+
+    df = DataFrame(data, columns=['date', 'id', 'score'])
+    df.date = pd.to_datetime(df.date)
+
+    def f(x):
+        return x.set_index('date').resample('D').asfreq()
+    expected = df.groupby('id').apply(f)
+    result = df.set_index('date').groupby('id').resample('D').asfreq()
+    assert_frame_equal(result, expected)
+
+    df = DataFrame({'date': pd.date_range(start='2016-01-01',
+                                          periods=4,
+                                          freq='W'),
+                    'group': [1, 1, 2, 2],
+                    'val': [5, 6, 7, 8]}).set_index('date')
+
+    def f(x):
+        return x.resample('1D').ffill()
+    expected = df.groupby('group').apply(f)
+    result = df.groupby('group').resample('1D').ffill()
+    assert_frame_equal(result, expected)
+
+
+def test_getitem():
+    g = test_frame.groupby('A')
+
+    expected = g.B.apply(lambda x: x.resample('2s').mean())
+
+    result = g.resample('2s').B.mean()
+    assert_series_equal(result, expected)
+
+    result = g.B.resample('2s').mean()
+    assert_series_equal(result, expected)
+
+    result = g.resample('2s').mean().B
+    assert_series_equal(result, expected)
+
+
+def test_getitem_multiple():
+
+    # GH 13174
+    # multiple calls after selection causing an issue with aliasing
+    data = [{'id': 1, 'buyer': 'A'}, {'id': 2, 'buyer': 'B'}]
+    df = DataFrame(data, index=pd.date_range('2016-01-01', periods=2))
+    r = df.groupby('id').resample('1D')
+    result = r['buyer'].count()
+    expected = Series([1, 1],
+                      index=pd.MultiIndex.from_tuples(
+                          [(1, Timestamp('2016-01-01')),
+                           (2, Timestamp('2016-01-02'))],
+                          names=['id', None]),
+                      name='buyer')
+    assert_series_equal(result, expected)
+
+    result = r['buyer'].count()
+    assert_series_equal(result, expected)
+
+
+def test_groupby_resample_on_api_with_getitem():
+    # GH 17813
+    df = pd.DataFrame({'id': list('aabbb'),
+                       'date': pd.date_range('1-1-2016', periods=5),
+                       'data': 1})
+    exp = df.set_index('date').groupby('id').resample('2D')['data'].sum()
+    result = df.groupby('id').resample('2D', on='date')['data'].sum()
+    assert_series_equal(result, exp)
+
+
+def test_nearest():
+
+    # GH 17496
+    # Resample nearest
+    index = pd.date_range('1/1/2000', periods=3, freq='T')
+    result = Series(range(3), index=index).resample('20s').nearest()
+
+    expected = Series(
+        [0, 0, 1, 1, 1, 2, 2],
+        index=pd.DatetimeIndex(
+            ['2000-01-01 00:00:00', '2000-01-01 00:00:20',
+             '2000-01-01 00:00:40', '2000-01-01 00:01:00',
+             '2000-01-01 00:01:20', '2000-01-01 00:01:40',
+             '2000-01-01 00:02:00'],
+            dtype='datetime64[ns]',
+            freq='20S'))
+    assert_series_equal(result, expected)
+
+
+def test_methods():
+    g = test_frame.groupby('A')
+    r = g.resample('2s')
+
+    for f in ['first', 'last', 'median', 'sem', 'sum', 'mean',
+              'min', 'max']:
+        result = getattr(r, f)()
+        expected = g.apply(lambda x: getattr(x.resample('2s'), f)())
+        assert_frame_equal(result, expected)
+
+    for f in ['size']:
+        result = getattr(r, f)()
+        expected = g.apply(lambda x: getattr(x.resample('2s'), f)())
+        assert_series_equal(result, expected)
+
+    for f in ['count']:
+        result = getattr(r, f)()
+        expected = g.apply(lambda x: getattr(x.resample('2s'), f)())
+        assert_frame_equal(result, expected)
+
+    # series only
+    for f in ['nunique']:
+        result = getattr(r.B, f)()
+        expected = g.B.apply(lambda x: getattr(x.resample('2s'), f)())
+        assert_series_equal(result, expected)
+
+    for f in ['nearest', 'backfill', 'ffill', 'asfreq']:
+        result = getattr(r, f)()
+        expected = g.apply(lambda x: getattr(x.resample('2s'), f)())
+        assert_frame_equal(result, expected)
+
+    result = r.ohlc()
+    expected = g.apply(lambda x: x.resample('2s').ohlc())
+    assert_frame_equal(result, expected)
+
+    for f in ['std', 'var']:
+        result = getattr(r, f)(ddof=1)
+        expected = g.apply(lambda x: getattr(x.resample('2s'), f)(ddof=1))
+        assert_frame_equal(result, expected)
+
+
+def test_apply():
+
+    g = test_frame.groupby('A')
+    r = g.resample('2s')
+
+    # reduction
+    expected = g.resample('2s').sum()
+
+    def f(x):
+        return x.resample('2s').sum()
+
+    result = r.apply(f)
+    assert_frame_equal(result, expected)
+
+    def f(x):
+        return x.resample('2s').apply(lambda y: y.sum())
+
+    result = g.apply(f)
+    assert_frame_equal(result, expected)
+
+
+def test_apply_with_mutated_index():
+    # GH 15169
+    index = pd.date_range('1-1-2015', '12-31-15', freq='D')
+    df = DataFrame(data={'col1': np.random.rand(len(index))}, index=index)
+
+    def f(x):
+        s = Series([1, 2], index=['a', 'b'])
+        return s
+
+    expected = df.groupby(pd.Grouper(freq='M')).apply(f)
+
+    result = df.resample('M').apply(f)
+    assert_frame_equal(result, expected)
+
+    # A case for series
+    expected = df['col1'].groupby(pd.Grouper(freq='M')).apply(f)
+    result = df['col1'].resample('M').apply(f)
+    assert_series_equal(result, expected)
+
+
+def test_resample_groupby_with_label():
+    # GH 13235
+    index = date_range('2000-01-01', freq='2D', periods=5)
+    df = DataFrame(index=index,
+                   data={'col0': [0, 0, 1, 1, 2], 'col1': [1, 1, 1, 1, 1]}
+                   )
+    result = df.groupby('col0').resample('1W', label='left').sum()
+
+    mi = [np.array([0, 0, 1, 2]),
+          pd.to_datetime(np.array(['1999-12-26', '2000-01-02',
+                                   '2000-01-02', '2000-01-02'])
+                         )
+          ]
+    mindex = pd.MultiIndex.from_arrays(mi, names=['col0', None])
+    expected = DataFrame(data={'col0': [0, 0, 2, 2], 'col1': [1, 1, 2, 1]},
+                         index=mindex
+                         )
+
+    assert_frame_equal(result, expected)
+
+
+def test_consistency_with_window():
+
+    # consistent return values with window
+    df = test_frame
+    expected = pd.Int64Index([1, 2, 3], name='A')
+    result = df.groupby('A').resample('2s').mean()
+    assert result.index.nlevels == 2
+    tm.assert_index_equal(result.index.levels[0], expected)
+
+    result = df.groupby('A').rolling(20).mean()
+    assert result.index.nlevels == 2
+    tm.assert_index_equal(result.index.levels[0], expected)
+
+
+def test_median_duplicate_columns():
+    # GH 14233
+
+    df = DataFrame(np.random.randn(20, 3),
+                   columns=list('aaa'),
+                   index=pd.date_range('2012-01-01', periods=20, freq='s'))
+    df2 = df.copy()
+    df2.columns = ['a', 'b', 'c']
+    expected = df2.resample('5s').median()
+    result = df.resample('5s').median()
+    expected.columns = result.columns
+    assert_frame_equal(result, expected)
diff --git a/pandas/tests/resample/test_time_grouper.py b/pandas/tests/resample/test_time_grouper.py
new file mode 100644
index 0000000000000..ec29b55ac9d67
--- /dev/null
+++ b/pandas/tests/resample/test_time_grouper.py
@@ -0,0 +1,287 @@
+from datetime import datetime
+from operator import methodcaller
+
+import numpy as np
+import pytest
+
+import pandas as pd
+from pandas import DataFrame, Panel, Series
+from pandas.core.indexes.datetimes import date_range
+from pandas.core.resample import TimeGrouper
+import pandas.util.testing as tm
+from pandas.util.testing import assert_frame_equal, assert_series_equal
+
+test_series = Series(np.random.randn(1000),
+                     index=date_range('1/1/2000', periods=1000))
+
+
+def test_apply():
+    with tm.assert_produces_warning(FutureWarning,
+                                    check_stacklevel=False):
+        grouper = pd.TimeGrouper(freq='A', label='right', closed='right')
+
+    grouped = test_series.groupby(grouper)
+
+    def f(x):
+        return x.sort_values()[-3:]
+
+    applied = grouped.apply(f)
+    expected = test_series.groupby(lambda x: x.year).apply(f)
+
+    applied.index = applied.index.droplevel(0)
+    expected.index = expected.index.droplevel(0)
+    assert_series_equal(applied, expected)
+
+
+def test_count():
+    test_series[::3] = np.nan
+
+    expected = test_series.groupby(lambda x: x.year).count()
+
+    with tm.assert_produces_warning(FutureWarning,
+                                    check_stacklevel=False):
+        grouper = pd.TimeGrouper(freq='A', label='right', closed='right')
+    result = test_series.groupby(grouper).count()
+    expected.index = result.index
+    assert_series_equal(result, expected)
+
+    result = test_series.resample('A').count()
+    expected.index = result.index
+    assert_series_equal(result, expected)
+
+
+def test_numpy_reduction():
+    result = test_series.resample('A', closed='right').prod()
+
+    expected = test_series.groupby(lambda x: x.year).agg(np.prod)
+    expected.index = result.index
+
+    assert_series_equal(result, expected)
+
+
+def test_apply_iteration():
+    # #2300
+    N = 1000
+    ind = pd.date_range(start="2000-01-01", freq="D", periods=N)
+    df = DataFrame({'open': 1, 'close': 2}, index=ind)
+    tg = TimeGrouper('M')
+
+    _, grouper, _ = tg._get_grouper(df)
+
+    # Errors
+    grouped = df.groupby(grouper, group_keys=False)
+
+    def f(df):
+        return df['close'] / df['open']
+
+    # it works!
+    result = grouped.apply(f)
+    tm.assert_index_equal(result.index, df.index)
+
+
+@pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning")
+def test_panel_aggregation():
+    ind = pd.date_range('1/1/2000', periods=100)
+    data = np.random.randn(2, len(ind), 4)
+
+    wp = Panel(data, items=['Item1', 'Item2'], major_axis=ind,
+               minor_axis=['A', 'B', 'C', 'D'])
+
+    tg = TimeGrouper('M', axis=1)
+    _, grouper, _ = tg._get_grouper(wp)
+    bingrouped = wp.groupby(grouper)
+    binagg = bingrouped.mean()
+
+    def f(x):
+        assert (isinstance(x, Panel))
+        return x.mean(1)
+
+    result = bingrouped.agg(f)
+    tm.assert_panel_equal(result, binagg)
+
+
+@pytest.mark.parametrize('name, func', [
+    ('Int64Index', tm.makeIntIndex),
+    ('Index', tm.makeUnicodeIndex),
+    ('Float64Index', tm.makeFloatIndex),
+    ('MultiIndex', lambda m: tm.makeCustomIndex(m, 2))
+])
+def test_fails_on_no_datetime_index(name, func):
+    n = 2
+    index = func(n)
+    df = DataFrame({'a': np.random.randn(n)}, index=index)
+
+    msg = ("Only valid with DatetimeIndex, TimedeltaIndex "
+           "or PeriodIndex, but got an instance of %r" % name)
+    with pytest.raises(TypeError, match=msg):
+        df.groupby(TimeGrouper('D'))
+
+
+def test_aaa_group_order():
+    # GH 12840
+    # check TimeGrouper perform stable sorts
+    n = 20
+    data = np.random.randn(n, 4)
+    df = DataFrame(data, columns=['A', 'B', 'C', 'D'])
+    df['key'] = [datetime(2013, 1, 1), datetime(2013, 1, 2),
+                 datetime(2013, 1, 3), datetime(2013, 1, 4),
+                 datetime(2013, 1, 5)] * 4
+    grouped = df.groupby(TimeGrouper(key='key', freq='D'))
+
+    tm.assert_frame_equal(grouped.get_group(datetime(2013, 1, 1)),
+                          df[::5])
+    tm.assert_frame_equal(grouped.get_group(datetime(2013, 1, 2)),
+                          df[1::5])
+    tm.assert_frame_equal(grouped.get_group(datetime(2013, 1, 3)),
+                          df[2::5])
+    tm.assert_frame_equal(grouped.get_group(datetime(2013, 1, 4)),
+                          df[3::5])
+    tm.assert_frame_equal(grouped.get_group(datetime(2013, 1, 5)),
+                          df[4::5])
+
+
+def test_aggregate_normal(resample_method):
+    """Check TimeGrouper's aggregation is identical as normal groupby."""
+
+    if resample_method == 'ohlc':
+        pytest.xfail(reason='DataError: No numeric types to aggregate')
+
+    data = np.random.randn(20, 4)
+    normal_df = DataFrame(data, columns=['A', 'B', 'C', 'D'])
+    normal_df['key'] = [1, 2, 3, 4, 5] * 4
+
+    dt_df = DataFrame(data, columns=['A', 'B', 'C', 'D'])
+    dt_df['key'] = [datetime(2013, 1, 1), datetime(2013, 1, 2),
+                    datetime(2013, 1, 3), datetime(2013, 1, 4),
+                    datetime(2013, 1, 5)] * 4
+
+    normal_grouped = normal_df.groupby('key')
+    dt_grouped = dt_df.groupby(TimeGrouper(key='key', freq='D'))
+
+    expected = getattr(normal_grouped, resample_method)()
+    dt_result = getattr(dt_grouped, resample_method)()
+    expected.index = date_range(start='2013-01-01', freq='D',
+                                periods=5, name='key')
+    tm.assert_equal(expected, dt_result)
+
+    # if TimeGrouper is used included, 'nth' doesn't work yet
+
+    """
+    for func in ['nth']:
+        expected = getattr(normal_grouped, func)(3)
+        expected.index = date_range(start='2013-01-01',
+                                    freq='D', periods=5, name='key')
+        dt_result = getattr(dt_grouped, func)(3)
+        assert_frame_equal(expected, dt_result)
+    """
+
+
+@pytest.mark.parametrize('method, method_args, unit', [
+    ('sum', dict(), 0),
+    ('sum', dict(min_count=0), 0),
+    ('sum', dict(min_count=1), np.nan),
+    ('prod', dict(), 1),
+    ('prod', dict(min_count=0), 1),
+    ('prod', dict(min_count=1), np.nan)
+])
+def test_resample_entirly_nat_window(method, method_args, unit):
+    s = pd.Series([0] * 2 + [np.nan] * 2,
+                  index=pd.date_range('2017', periods=4))
+    result = methodcaller(method, **method_args)(s.resample("2d"))
+    expected = pd.Series([0.0, unit],
+                         index=pd.to_datetime(['2017-01-01',
+                                               '2017-01-03']))
+    tm.assert_series_equal(result, expected)
+
+
+@pytest.mark.parametrize('func, fill_value', [
+    ('min', np.nan),
+    ('max', np.nan),
+    ('sum', 0),
+    ('prod', 1),
+    ('count', 0),
+])
+def test_aggregate_with_nat(func, fill_value):
+    # check TimeGrouper's aggregation is identical as normal groupby
+    # if NaT is included, 'var', 'std', 'mean', 'first','last'
+    # and 'nth' doesn't work yet
+
+    n = 20
+    data = np.random.randn(n, 4).astype('int64')
+    normal_df = DataFrame(data, columns=['A', 'B', 'C', 'D'])
+    normal_df['key'] = [1, 2, np.nan, 4, 5] * 4
+
+    dt_df = DataFrame(data, columns=['A', 'B', 'C', 'D'])
+    dt_df['key'] = [datetime(2013, 1, 1), datetime(2013, 1, 2), pd.NaT,
+                    datetime(2013, 1, 4), datetime(2013, 1, 5)] * 4
+
+    normal_grouped = normal_df.groupby('key')
+    dt_grouped = dt_df.groupby(TimeGrouper(key='key', freq='D'))
+
+    normal_result = getattr(normal_grouped, func)()
+    dt_result = getattr(dt_grouped, func)()
+
+    pad = DataFrame([[fill_value] * 4], index=[3],
+                    columns=['A', 'B', 'C', 'D'])
+    expected = normal_result.append(pad)
+    expected = expected.sort_index()
+    expected.index = date_range(start='2013-01-01', freq='D',
+                                periods=5, name='key')
+    assert_frame_equal(expected, dt_result)
+    assert dt_result.index.name == 'key'
+
+
+def test_aggregate_with_nat_size():
+    # GH 9925
+    n = 20
+    data = np.random.randn(n, 4).astype('int64')
+    normal_df = DataFrame(data, columns=['A', 'B', 'C', 'D'])
+    normal_df['key'] = [1, 2, np.nan, 4, 5] * 4
+
+    dt_df = DataFrame(data, columns=['A', 'B', 'C', 'D'])
+    dt_df['key'] = [datetime(2013, 1, 1), datetime(2013, 1, 2), pd.NaT,
+                    datetime(2013, 1, 4), datetime(2013, 1, 5)] * 4
+
+    normal_grouped = normal_df.groupby('key')
+    dt_grouped = dt_df.groupby(TimeGrouper(key='key', freq='D'))
+
+    normal_result = normal_grouped.size()
+    dt_result = dt_grouped.size()
+
+    pad = Series([0], index=[3])
+    expected = normal_result.append(pad)
+    expected = expected.sort_index()
+    expected.index = date_range(start='2013-01-01', freq='D',
+                                periods=5, name='key')
+    assert_series_equal(expected, dt_result)
+    assert dt_result.index.name == 'key'
+
+
+def test_repr():
+    # GH18203
+    result = repr(TimeGrouper(key='A', freq='H'))
+    expected = ("TimeGrouper(key='A', freq=<Hour>, axis=0, sort=True, "
+                "closed='left', label='left', how='mean', "
+                "convention='e', base=0)")
+    assert result == expected
+
+
+@pytest.mark.parametrize('method, method_args, expected_values', [
+    ('sum', dict(), [1, 0, 1]),
+    ('sum', dict(min_count=0), [1, 0, 1]),
+    ('sum', dict(min_count=1), [1, np.nan, 1]),
+    ('sum', dict(min_count=2), [np.nan, np.nan, np.nan]),
+    ('prod', dict(), [1, 1, 1]),
+    ('prod', dict(min_count=0), [1, 1, 1]),
+    ('prod', dict(min_count=1), [1, np.nan, 1]),
+    ('prod', dict(min_count=2), [np.nan, np.nan, np.nan]),
+])
+def test_upsample_sum(method, method_args, expected_values):
+    s = pd.Series(1, index=pd.date_range("2017", periods=2, freq="H"))
resampled = s.resample("30T") + index = pd.to_datetime(['2017-01-01T00:00:00', + '2017-01-01T00:30:00', + '2017-01-01T01:00:00']) + result = methodcaller(method, **method_args)(resampled) + expected = pd.Series(expected_values, index=index) + tm.assert_series_equal(result, expected) diff --git a/pandas/tests/resample/test_timedelta.py b/pandas/tests/resample/test_timedelta.py new file mode 100644 index 0000000000000..5c81370d0d04b --- /dev/null +++ b/pandas/tests/resample/test_timedelta.py @@ -0,0 +1,41 @@ +import numpy as np + +import pandas as pd +from pandas import DataFrame +from pandas.core.indexes.timedeltas import timedelta_range +import pandas.util.testing as tm +from pandas.util.testing import assert_frame_equal + + +class TestTimedeltaIndex(object): + def test_asfreq_bug(self): + import datetime as dt + df = DataFrame(data=[1, 3], + index=[dt.timedelta(), dt.timedelta(minutes=3)]) + result = df.resample('1T').asfreq() + expected = DataFrame(data=[1, np.nan, np.nan, 3], + index=timedelta_range('0 day', + periods=4, + freq='1T')) + assert_frame_equal(result, expected) + + def test_resample_with_nat(self): + # GH 13223 + index = pd.to_timedelta(['0s', pd.NaT, '2s']) + result = DataFrame({'value': [2, 3, 5]}, index).resample('1s').mean() + expected = DataFrame({'value': [2.5, np.nan, 5.0]}, + index=timedelta_range('0 day', + periods=3, + freq='1S')) + assert_frame_equal(result, expected) + + def test_resample_as_freq_with_subperiod(self): + # GH 13022 + index = timedelta_range('00:00:00', '00:10:00', freq='5T') + df = DataFrame(data={'value': [1, 5, 10]}, index=index) + result = df.resample('2T').asfreq() + expected_data = {'value': [1, np.nan, np.nan, np.nan, np.nan, 10]} + expected = DataFrame(data=expected_data, + index=timedelta_range('00:00:00', + '00:10:00', freq='2T')) + tm.assert_frame_equal(result, expected) diff --git a/pandas/tests/reshape/data/cut_data.csv b/pandas/tests/reshape/data/cut_data.csv index 7d9d480599579..c198ec77e45da 100644 --- a/pandas/tests/reshape/data/cut_data.csv +++ b/pandas/tests/reshape/data/cut_data.csv @@ -1 +1 @@ -1.001 0.994 0.9951 0.9956 0.9956 0.9951 0.9949 1.001 0.994 0.9938 0.9908 0.9947 0.992 0.9912 1.0002 0.9914 0.9928 0.9892 0.9917 0.9955 0.9892 0.9912 0.993 0.9937 0.9951 0.9955 0.993 0.9961 0.9914 0.9906 0.9974 0.9934 0.992 0.9939 0.9962 0.9905 0.9934 0.9906 0.9999 0.9999 0.9937 0.9937 0.9954 0.9934 0.9934 0.9931 0.994 0.9939 0.9954 0.995 0.9917 0.9914 0.991 0.9911 0.993 0.9908 0.9962 0.9972 0.9931 0.9926 0.9951 0.9972 0.991 0.9931 0.9927 0.9934 0.9903 0.992 0.9926 0.9962 0.9956 0.9958 0.9964 0.9941 0.9926 0.9962 0.9898 0.9912 0.9961 0.9949 0.9929 0.9985 0.9946 0.9966 0.9974 0.9975 0.9974 0.9972 0.9974 0.9975 0.9974 0.9957 0.99 0.9899 0.9916 0.9969 0.9979 0.9913 0.9956 0.9979 0.9975 0.9962 0.997 1 0.9975 0.9974 0.9962 0.999 0.999 0.9927 0.9959 1 0.9982 0.9968 0.9968 0.994 0.9914 0.9911 0.9982 0.9982 0.9934 0.9984 0.9952 0.9952 0.9928 0.9912 0.994 0.9958 0.9924 0.9924 0.994 0.9958 0.9979 0.9982 0.9961 0.9979 0.992 0.9975 0.9917 0.9923 0.9927 0.9975 0.992 0.9947 0.9921 0.9905 0.9918 0.9951 0.9917 0.994 0.9934 0.9968 0.994 0.9919 0.9966 0.9979 0.9979 0.9898 0.9894 0.9894 0.9898 0.998 0.9932 0.9979 0.997 0.9972 0.9974 0.9896 0.9968 0.9958 0.9906 0.9917 0.9902 0.9918 0.999 0.9927 0.991 0.9972 0.9931 0.995 0.9951 0.9936 1.001 0.9979 0.997 0.9972 0.9954 0.9924 0.9906 0.9962 0.9962 1.001 0.9928 0.9942 0.9942 0.9942 0.9942 0.9961 0.998 0.9961 0.9984 0.998 0.9973 0.9949 0.9924 0.9972 0.9958 0.9968 0.9938 0.993 0.994 0.9918 0.9958 0.9944 0.9912 
0.9974 0.9957 0.9931 0.9947 0.9953 0.9931 0.9946 0.9978 0.9989 1.0004 0.9938 0.9934 0.9978 0.9956 0.9982 0.9948 0.9956 0.9982 0.9926 0.991 0.9945 0.9916 0.9953 0.9938 0.9956 0.9906 0.9956 0.9932 0.9914 0.9938 0.996 0.9906 0.98815 0.9942 0.9903 0.9906 0.9935 1.0024 0.9968 0.9906 0.9941 0.9919 0.9928 0.9958 0.9932 0.9957 0.9937 0.9982 0.9928 0.9919 0.9956 0.9957 0.9954 0.993 0.9954 0.9987 0.9956 0.9928 0.9951 0.993 0.9928 0.9926 0.9938 1.0001 0.9933 0.9952 0.9934 0.9988 0.993 0.9952 0.9948 0.9998 0.9971 0.9998 0.9962 0.9948 0.99 0.9942 0.9965 0.9912 0.9978 0.9928 1.0103 0.9956 0.9936 0.9929 0.9966 0.9964 0.996 0.9959 0.9954 0.9914 1.0103 1.0004 0.9911 0.9938 0.9927 0.9922 0.9924 0.9963 0.9936 0.9951 0.9951 0.9955 0.9961 0.9936 0.992 0.9944 0.9944 1.0008 0.9962 0.9986 0.9986 1 0.9986 0.9982 1 0.9949 0.9915 0.9951 0.9986 0.9927 0.9955 0.9952 0.9928 0.9982 0.9914 0.9927 0.9918 0.9944 0.9969 0.9955 0.9954 0.9955 0.9921 0.9934 0.9998 0.9946 0.9984 0.9924 0.9939 0.995 0.9957 0.9953 0.9912 0.9939 0.9921 0.9954 0.9933 0.9941 0.995 0.9977 0.9912 0.9945 0.9952 0.9924 0.9986 0.9953 0.9939 0.9929 0.9988 0.9906 0.9914 0.9978 0.9928 0.9948 0.9978 0.9946 0.9908 0.9954 0.9906 0.99705 0.9982 0.9932 0.9977 0.994 0.9982 0.9929 0.9924 0.9966 0.9921 0.9967 0.9934 0.9914 0.99705 0.9961 0.9967 0.9926 0.99605 0.99435 0.9948 0.9916 0.997 0.9961 0.9967 0.9961 0.9955 0.9922 0.9918 0.9955 0.9941 0.9955 0.9955 0.9924 0.9973 0.999 0.9941 0.9922 0.9922 0.9953 0.9945 0.9945 0.9957 0.9932 0.9945 0.9913 0.9909 0.9939 0.991 0.9954 0.9943 0.993 1.0002 0.9946 0.9953 0.9918 0.9936 0.9984 0.9956 0.9966 0.9942 0.9984 0.9956 0.9966 0.9974 0.9944 1.0008 0.9974 1.0008 0.9928 0.9944 0.9908 0.9917 0.9911 0.9912 0.9953 0.9932 0.9896 0.9889 0.9912 0.9926 0.9911 0.9964 0.9974 0.9944 0.9974 0.9964 0.9963 0.9948 0.9948 0.9953 0.9948 0.9953 0.9949 0.9988 0.9954 0.992 0.9984 0.9954 0.9926 0.992 0.9976 0.9972 0.991 0.998 0.9966 0.998 1.0007 0.992 0.9925 0.991 0.9934 0.9955 0.9944 0.9981 0.9968 0.9946 0.9946 0.9981 0.9946 0.997 0.9924 0.9958 0.994 0.9958 0.9984 0.9948 0.9932 0.9952 0.9924 0.9945 0.9976 0.9976 0.9938 0.9997 0.994 0.9921 0.9986 0.9987 0.9991 0.9987 0.9991 0.9991 0.9948 0.9987 0.993 0.9988 1 0.9932 0.9991 0.9989 1 1 0.9952 0.9969 0.9966 0.9966 0.9976 0.99 0.9988 0.9942 0.9984 0.9932 0.9969 0.9966 0.9933 0.9916 0.9914 0.9966 0.9958 0.9926 0.9939 0.9953 0.9906 0.9914 0.9958 0.9926 0.9991 0.9994 0.9976 0.9966 0.9953 0.9923 0.993 0.9931 0.9932 0.9926 0.9938 0.9966 0.9974 0.9924 0.9948 0.9964 0.9924 0.9966 0.9974 0.9938 0.9928 0.9959 1.0001 0.9959 1.0001 0.9968 0.9932 0.9954 0.9992 0.9932 0.9939 0.9952 0.9996 0.9966 0.9925 0.996 0.9996 0.9973 0.9937 0.9966 1.0017 0.993 0.993 0.9959 0.9958 1.0017 0.9958 0.9979 0.9941 0.997 0.9934 0.9927 0.9944 0.9927 0.9963 1.0011 1.0011 0.9959 0.9973 0.9966 0.9932 0.9984 0.999 0.999 0.999 0.999 0.999 1.0006 0.9937 0.9954 0.997 0.9912 0.9939 0.999 0.9957 0.9926 0.9994 1.0004 0.9994 1.0004 1.0004 1.0002 0.9922 0.9922 0.9934 0.9926 0.9941 0.9994 1.0004 0.9924 0.9948 0.9935 0.9918 0.9948 0.9924 0.9979 0.993 0.994 0.991 0.993 0.9922 0.9979 0.9937 0.9928 0.9965 0.9928 0.9991 0.9948 0.9925 0.9958 0.9962 0.9965 0.9951 0.9944 0.9916 0.9987 0.9928 0.9926 0.9934 0.9944 0.9949 0.9926 0.997 0.9949 0.9948 0.992 0.9964 0.9926 0.9982 0.9955 0.9955 0.9958 0.9997 1.0001 1.0001 0.9918 0.9918 0.9931 1.0001 0.9926 0.9966 0.9932 0.9969 0.9925 0.9914 0.996 0.9952 0.9934 0.9939 0.9939 0.9906 0.9901 0.9948 0.995 0.9953 0.9953 0.9952 0.996 0.9948 0.9951 0.9931 0.9962 0.9948 0.9959 0.9962 0.9958 0.9948 0.9948 0.994 0.9942 0.9942 
0.9948 0.9964 0.9958 0.9932 0.9986 0.9986 0.9988 0.9953 0.9983 1 0.9951 0.9983 0.9906 0.9981 0.9936 0.9951 0.9953 1.0005 0.9972 1 0.9969 1.0001 1.0001 1.0001 0.9934 0.9969 1.0001 0.9902 0.993 0.9914 0.9941 0.9967 0.9918 0.998 0.9967 0.9918 0.9957 0.9986 0.9958 0.9948 0.9918 0.9923 0.9998 0.9998 0.9914 0.9939 0.9966 0.995 0.9966 0.994 0.9972 0.9998 0.9998 0.9982 0.9924 0.9972 0.997 0.9954 0.9962 0.9972 0.9921 0.9905 0.9998 0.993 0.9941 0.9994 0.9962 0.992 0.9922 0.994 0.9897 0.9954 0.99 0.9948 0.9922 0.998 0.9944 0.9944 0.9986 0.9986 0.9986 0.9986 0.9986 0.996 0.9999 0.9986 0.9986 0.996 0.9951 0.9999 0.993 0.9982 0.992 0.9963 0.995 0.9956 0.997 0.9936 0.9935 0.9963 0.9967 0.9912 0.9981 0.9966 0.9967 0.9963 0.9935 0.9902 0.99 0.996 0.9966 0.9962 0.994 0.996 0.994 0.9944 0.9974 0.996 0.9922 0.9917 0.9918 0.9936 0.9938 0.9918 0.9939 0.9917 0.9981 0.9941 0.9928 0.9952 0.9898 0.9914 0.9981 0.9957 0.998 0.9957 0.9986 0.9983 0.9982 0.997 0.9947 0.997 0.9947 0.99416 0.99516 0.99496 0.9974 0.99579 0.9983 0.99471 0.9974 0.99644 0.99579 0.99699 0.99758 0.9977 0.99397 0.9983 0.99471 0.99243 0.9962 1.00182 0.99384 0.99582 0.9962 0.9924 0.99466 0.99212 0.99449 0.99748 0.99449 0.99748 0.99475 0.99189 0.99827 0.99752 0.99827 0.99479 0.99752 0.99642 1.00047 0.99382 0.99784 0.99486 0.99537 0.99382 0.99838 0.99566 0.99268 0.99566 0.99468 0.9933 0.99307 0.99907 0.99907 0.99907 0.99907 0.99471 0.99471 0.99907 0.99148 0.99383 0.99365 0.99272 0.99148 0.99235 0.99508 0.9946 0.99674 0.99018 0.99235 0.99084 0.99856 0.99591 0.9975 0.9944 0.99173 0.99378 0.99805 0.99534 0.99232 0.99805 0.99078 0.99534 0.99061 0.99182 0.9966 0.9912 0.99779 0.99814 0.99096 0.99379 0.99426 0.99228 0.99335 0.99595 0.99297 0.99687 0.99297 0.99687 0.99445 0.9986 0.99154 0.9981 0.98993 1.00241 0.99716 0.99437 0.9972 0.99756 0.99509 0.99572 0.99756 0.99175 0.99254 0.99509 0.99676 0.9979 0.99194 0.99077 0.99782 0.99942 0.99708 0.99353 0.99256 0.99199 0.9918 0.99354 0.99244 0.99831 0.99396 0.99724 0.99524 0.9927 0.99802 0.99512 0.99438 0.99679 0.99652 0.99698 0.99474 0.99511 0.99582 0.99125 0.99256 0.9911 0.99168 0.9911 0.99556 1.00098 0.99516 0.99516 0.99518 0.99347 0.9929 0.99347 0.99841 0.99362 0.99361 0.9914 0.99114 0.9925 0.99453 0.9938 0.9938 0.99806 0.9961 1.00016 0.9916 0.99116 0.99319 0.99517 0.99514 0.99566 0.99166 0.99587 0.99558 0.99117 0.99399 0.99741 0.99405 0.99622 1.00051 0.99803 0.99405 0.99773 0.99397 0.99622 0.99713 0.99274 1.00118 0.99176 0.9969 0.99771 0.99411 0.99771 0.99411 0.99194 0.99558 0.99194 0.99558 0.99577 0.99564 0.99578 0.99888 1.00014 0.99441 0.99594 0.99437 0.99594 0.9979 0.99434 0.99203 0.998 0.99316 0.998 0.99314 0.99316 0.99612 0.99295 0.99394 0.99642 0.99642 0.99248 0.99268 0.99954 0.99692 0.99592 0.99592 0.99692 0.99822 0.99822 0.99402 0.99404 0.99787 0.99347 0.99838 0.99839 0.99375 0.99155 0.9936 0.99434 0.9922 0.99571 0.99658 0.99076 0.99496 0.9937 0.99076 0.99542 0.99825 0.99289 0.99432 0.99523 0.99542 0.9959 0.99543 0.99662 0.99088 0.99088 0.99922 0.9966 0.99466 0.99922 0.99836 0.99836 0.99238 0.99645 1 1 0.99376 1 0.99513 0.99556 0.99556 0.99543 0.99886 0.99526 0.99166 0.99691 0.99732 0.99573 0.99656 0.99112 0.99214 0.99165 0.99004 0.99463 0.99683 0.99004 0.99596 0.99898 0.99114 0.99508 0.99306 0.99898 0.99508 0.99114 0.99342 0.99345 0.99772 0.99239 0.99502 0.99502 0.99479 0.99207 0.99497 0.99828 0.99542 0.99542 0.99228 0.99706 0.99497 0.99669 0.99828 0.99269 0.99196 0.99662 0.99475 0.99544 0.99944 0.99475 0.99544 0.9966 0.99066 0.9907 0.99066 0.998 0.9907 0.99066 0.99307 0.99106 0.99696 0.99106 
0.99307 0.99167 0.99902 0.98992 0.99182 0.99556 0.99582 0.99182 0.98972 0.99352 0.9946 0.99273 0.99628 0.99582 0.99553 0.98914 0.99354 0.99976 0.99808 0.99808 0.99808 0.99808 0.99808 0.99808 0.9919 0.99808 0.99499 0.99655 0.99615 0.99296 0.99482 0.99079 0.99366 0.99434 0.98958 0.99434 0.99938 0.99059 0.99835 0.98958 0.99159 0.99159 0.98931 0.9938 0.99558 0.99563 0.98931 0.99691 0.9959 0.99159 0.99628 0.99076 0.99678 0.99678 0.99678 0.99089 0.99537 1.0002 0.99628 0.99089 0.99678 0.99076 0.99332 0.99316 0.99272 0.99636 0.99202 0.99148 0.99064 0.99884 0.99773 1.00013 0.98974 0.99773 1.00013 0.99112 0.99136 0.99132 0.99642 0.99488 0.99527 0.99578 0.99352 0.99199 0.99198 0.99756 0.99578 0.99561 0.99347 0.98936 0.99786 0.99705 0.9942 0.9948 0.99116 0.99688 0.98974 0.99542 0.99154 0.99118 0.99044 0.9914 0.9979 0.98892 0.99114 0.99188 0.99583 0.98892 0.98892 0.99704 0.9911 0.99334 0.99334 0.99094 0.99014 0.99304 0.99652 0.98944 0.99772 0.99367 0.99304 0.99183 0.99126 0.98944 0.99577 0.99772 0.99652 0.99428 0.99388 0.99208 0.99256 0.99388 0.9925 0.99904 0.99216 0.99208 0.99428 0.99165 0.99924 0.99924 0.99924 0.9956 0.99562 0.9972 0.99924 0.9958 0.99976 0.99976 0.99296 0.9957 0.9958 0.99579 0.99541 0.99976 0.99518 0.99168 0.99276 0.99085 0.99873 0.99172 0.99312 0.99276 0.9972 0.99278 0.99092 0.9962 0.99053 0.99858 0.9984 0.99335 0.99053 0.9949 0.9962 0.99092 0.99532 0.99727 0.99026 0.99668 0.99727 0.9952 0.99144 0.99144 0.99015 0.9914 0.99693 0.99035 0.99693 0.99035 0.99006 0.99126 0.98994 0.98985 0.9971 0.99882 0.99477 0.99478 0.99576 0.99578 0.99354 0.99244 0.99084 0.99612 0.99356 0.98952 0.99612 0.99084 0.99244 0.99955 0.99374 0.9892 0.99144 0.99352 0.99352 0.9935 0.99237 0.99144 0.99022 0.99032 1.03898 0.99587 0.99587 0.99587 0.99976 0.99354 0.99976 0.99552 0.99552 0.99587 0.99604 0.99584 0.98894 0.9963 0.993 0.98894 0.9963 0.99068 0.98964 0.99604 0.99584 0.9923 0.99437 0.993 0.99238 0.99801 0.99802 0.99566 0.99067 0.99066 0.9929 0.9934 0.99067 0.98912 0.99066 0.99228 0.98912 0.9958 0.99052 0.99312 0.9968 0.99502 0.99084 0.99573 0.99256 0.9959 0.99084 0.99084 0.99644 0.99526 0.9954 0.99095 0.99188 0.9909 0.99256 0.9959 0.99581 0.99132 0.98936 0.99136 0.99142 0.99232 0.99232 0.993 0.99311 0.99132 0.98993 0.99208 0.99776 0.99839 0.99574 0.99093 0.99156 0.99278 0.9924 0.98984 0.99035 0.9924 0.99165 0.9923 0.99278 0.99008 0.98964 0.99156 0.9909 0.98984 0.9889 0.99178 0.99076 0.9889 0.99046 0.98999 0.98946 0.98976 0.99046 0.99672 0.99482 0.98945 0.98883 0.99362 0.99075 0.99436 0.98988 0.99158 0.99265 0.99195 0.99168 0.9918 0.99313 0.9895 0.9932 0.99848 0.9909 0.99014 0.9952 0.99652 0.99848 0.99104 0.99772 0.9922 0.99076 0.99622 0.9902 0.99114 0.9938 0.99594 0.9902 0.99035 0.99032 0.99558 0.99622 0.99076 0.99413 0.99043 0.99043 0.98982 0.98934 0.9902 0.99449 0.99629 0.9948 0.98984 0.99326 0.99834 0.99555 0.98975 0.99216 0.99216 0.99834 0.9901 0.98975 0.99573 0.99326 0.99215 0.98993 0.99218 0.99555 0.99564 0.99564 0.99397 0.99576 0.99601 0.99564 0.99397 0.98713 0.99308 0.99308 0.99582 0.99494 0.9929 0.99471 0.9929 0.9929 0.99037 0.99304 0.99026 0.98986 0.99471 0.98951 0.99634 0.99368 0.99792 0.99026 0.99362 0.98919 0.99835 0.99835 0.99038 0.99104 0.99038 0.99286 0.99296 0.99835 0.9954 0.9914 0.99286 0.99604 0.99604 0.99119 0.99007 0.99507 0.99596 0.99011 0.99184 0.99469 0.99469 0.99406 0.99305 0.99096 0.98956 0.9921 0.99496 0.99406 0.99406 0.9888 0.98942 0.99082 0.98802 17.3 1.4 1.3 1.6 5.25 2.4 14.6 11.8 1.5 1.8 7.7 2 1.8 1.4 16.7 8.1 8 4.7 8.1 2.1 16.7 6.4 1.5 7.6 1.5 12.4 1.3 1.7 8.1 7.1 7.6 2.3 6.5 
1.4 12.7 1.6 1.1 1.2 6.5 4.6 0.6 10.6 4.6 4.8 2.7 12.6 0.6 9.2 6.6 7 8.45 11.1 18.15 18.15 4.1 4.1 4.6 18.15 4.9 8.3 1.4 11.5 1.8 1.6 2.4 4.9 1.8 4.3 4.4 1.4 1.6 1.3 5.2 5.6 5.3 4.9 2.4 1.6 2.1 1.4 7.1 1.6 10.7 11.1 10.7 1.6 1.6 1.5 1.5 1.6 1.6 8 7.7 2.7 15.1 15.1 8.9 6 12.3 13.1 6.7 12.3 2.3 11.1 1.5 6.7 6 15.2 10.2 13.1 10.7 17.1 17.1 17.1 1.9 10.7 17.1 1.2 1.2 3.1 1.5 10.7 4.9 12.6 10.7 4.9 12.15 12 1.7 2.6 1.4 1.9 16.9 16.9 2.1 7 7.1 5.9 7.1 8.7 13.2 15.3 15.3 13.2 2.7 10.65 10 6.8 15.6 13.2 5.1 3 15.3 2.1 1.9 8.6 8.75 3.6 4.7 1.3 1.8 9.7 4 2.4 4.7 18.8 1.8 1.8 12.8 12.8 12.8 12.8 12.8 7.8 16.75 12.8 12.8 7.8 5.4 16.75 1.3 10.1 3.8 10.9 6.6 9.8 11.7 1.2 1.4 9.6 12.2 2.6 10.7 4.9 12.2 9.6 1.4 1.1 1 8.2 11.3 7.3 2.3 8.2 2.1 2 10 15.75 3.9 2 1.5 1.6 1.4 1.5 1.4 2 13.8 1.3 3.8 6.9 2.2 1.6 13.8 10.8 12.8 10.8 15.3 12.1 12 11.6 9.2 11.6 9.2 2.8 1.6 6.1 8.5 7.8 14.9 6.2 8.5 8.2 7.8 10.6 11.2 11.6 7.1 14.9 6.2 1.7 7.7 17.3 1.4 7.7 7.7 3.4 1.6 1.4 1.4 10.4 1.4 10.4 4.1 2.8 15.7 10.9 15.7 6.5 10.9 5.9 17.3 1.4 13.5 8.5 6.2 1.4 14.95 7.7 1.3 7.7 1.3 1.3 1.3 15.6 15.6 15.6 15.6 4.9 5 15.6 6.5 1.4 2.7 1.2 6.5 6.4 6.9 7.2 10.6 3.5 6.4 2.3 12.05 7 11.8 1.4 5 2.2 14.6 1.6 1.3 14.6 2.8 1.6 3.3 6.3 8.1 1.6 10.6 11.8 1.7 8.1 1.4 1.3 1.8 7.2 1.1 11.95 1.1 11.95 2.2 12.7 1.4 10.6 1.9 17.8 10.2 4.8 9.8 8.4 7.2 4.8 8.4 4.5 1.4 7.2 11 11.1 2.6 2 10.1 13.3 11.4 1.3 1.4 1.4 7 2 1.2 12.9 5 10.1 3.75 1.7 12.6 1.3 1.6 7.6 8.1 14.9 6 6 7.2 3 1.2 2 4.9 2 8.9 16.45 2 1.9 5.1 4.4 5.8 4.4 12.9 1.3 1.3 1.2 2.7 1.7 8.2 1.5 1.5 12.9 3.9 17.75 4.9 1.6 1.4 2 2 8.2 2.1 1.8 8.5 4.45 5.8 13 2.7 7.3 19.1 8.8 2.7 7.4 2.3 6.85 11.4 0.9 19.35 7.9 11.75 7.7 3 7.7 3 1.5 7.5 1.5 7.5 8.3 7.05 8.4 13.9 17.5 5.6 9.4 4.8 9.4 9.7 6.3 1.6 14.6 2.5 14.6 2.6 2.5 8.2 1.5 2.3 10 10 1.6 1.6 16 10.4 7.4 7.4 10.4 16.05 16.05 2.6 2.5 10.8 1.2 12.1 11.95 1.7 0.8 1.4 1.3 6.3 10.3 15.55 1.5 1.5 1.4 1.5 7.9 13 1 4.85 7.1 7.9 7.5 7.6 10.3 1.7 1.7 19.95 7.7 5.3 19.95 12.7 12.7 1.5 11.3 18.1 18.1 7 18.1 6.4 1.4 1.4 3.1 14.1 7.7 5.2 11.6 10.4 7.5 11.2 0.8 1.4 4.7 3.1 4 11.3 3.1 8.1 14.8 1.4 8.1 3.5 14.8 8.1 1.4 1.5 1.5 12.8 1.6 7.1 7.1 11.2 1.7 6.7 17.3 8.6 8.6 1.5 12.1 6.7 10.7 17.3 1.8 1.4 7.5 4.8 7.1 16.9 4.8 7.1 11.3 1.1 1.2 1.1 12.9 1.2 1.1 1.2 2.3 10 2.3 1.2 1.4 14.9 1.8 1.8 7 8.6 1.8 1.1 1.3 4.9 1.9 10.4 10 8.6 1.7 1.7 18.95 12.8 12.8 12.8 12.8 12.8 12.8 0.7 12.8 1.4 13.3 8.5 1.5 11.7 5 1.2 2.1 1.4 2.1 16 1.1 15.3 1.4 2.8 2.8 0.9 2.5 8.1 8.2 0.9 11.1 7.8 2.8 10.1 3.2 14.2 14.2 14.2 2.9 6 20.4 10.1 2.9 14.2 3.2 0.95 1.7 1.7 9 1.3 1.4 2.4 16 11.4 14.35 2.1 11.4 14.35 1.1 1.1 1.2 15.8 5.2 5.2 9.6 5.2 1.2 0.8 14.45 9.6 6.9 3.4 2.3 11 5.95 5.1 5.4 1.2 12.6 1 6.6 1.5 1 1.1 6.6 8.2 2 1.4 2 7.5 2 2 13.3 2.85 5.6 5.6 1 3.2 1 7.1 2.4 11.2 9.5 1 1.8 2.6 2.4 8 11.2 7.1 3.3 10.3 1.2 1.6 10.3 9.65 16.4 1.5 1.2 3.3 5 16.3 16.3 16.3 6.5 6.4 10.2 16.3 7.4 13.7 13.7 1.3 7.4 7.4 7.45 7.2 13.7 10.4 1.1 6.5 4.6 13.9 5.2 1.7 6.5 16.4 3.6 1.5 12.4 1.7 6.2 6.2 2.6 1.7 9.3 12.4 1.5 9.1 12 4.8 12.3 12 2.7 3.6 3.6 4.3 1.8 11.8 1.8 11.8 1.8 1.4 6.6 1.55 0.7 6.4 11.8 4.3 5.1 5.8 5.9 1.3 1.4 1.2 7.4 10.8 1.8 7.4 1.2 1.4 14.4 1.7 3.6 3.6 10.05 10.05 10.5 1.9 3.6 1.65 1.9 65.8 6.85 7.4 7.4 20.2 11 20.2 6.2 6.2 6.85 8 8.2 2.2 10.1 7.2 2.2 10.1 1.6 1.3 8 8.2 5.3 14 7.2 1.6 11.8 9.6 6.1 2.7 3.6 1.7 1.6 2.7 1 0.9 1.6 1 10.6 2 1.2 6.2 9.2 5 6.3 3.3 8 1.2 1.2 16.2 11.6 7.2 1.1 3.4 1.4 3.3 8 9.3 2.3 0.9 3.5 1.7 1.3 1.3 5.6 7.4 2.3 1 1.5 10 14.9 9.3 1 1 5.9 5 1.25 3.9 5 0.8 1 5.9 1.6 1.3 1 1.1 1.25 1.4 1.2 5 1.4 1.7 1.8 1.6 1.5 1.7 13.9 5.9 2.1 1.1 6.7 2.7 6.7 3.95 7.75 10.6 1.6 2.5 0.7 
11.1 5.15 4.7 9.7 1.7 1.4 2 7.5 9.7 0.8 13.1 1.1 2.2 8.9 1.1 0.9 1.7 6.9 1.1 1 1 7.6 8.9 2.2 1.2 1 1 3.1 1.95 2.2 8.75 11.9 2.7 5.45 6.3 14.4 7.8 1.6 9.1 9.1 14.4 1.3 1.6 11.3 6.3 0.7 1.25 0.7 7.8 10.3 10.3 7.8 8.7 8.3 10.3 7.8 1.2 8.3 8.3 6.2 5 1.8 1.6 1.8 1.8 2.9 6 0.9 1.1 1.6 5.45 14.05 8 13.1 4.9 1.3 2.2 14.9 14.9 0.95 1.4 0.95 1.7 5.6 14.9 7.1 1.2 9.6 11.4 11.4 7.9 5 11.1 8 3.8 10.55 10.2 10.2 9.8 6.3 1.1 4.5 6.3 10.9 9.8 9.8 0.8 0.8 1.2 1.3 9.8 10.2 10.9 6.3 6.3 1.2 0.9 1.1 4.5 3.7 18.1 1.35 5.5 3.1 12.85 19.8 8.25 12.85 3.8 6.9 8.25 11.7 4.6 4 19.8 12.85 1.2 8.9 11.7 6.2 14.8 14.8 10.8 1.6 8.3 8.4 2.5 3.5 17.2 2.1 12.2 11.8 16.8 17.2 1.1 14.7 5.5 6.1 1.2 1.3 8.7 1.7 8.7 10.2 4.5 5.9 1.7 1.4 5.4 7.9 1.1 7 7 7.6 7 12.3 15.3 12.3 1.2 2.3 6.1 7.6 10.2 4.1 2.9 8.5 1.5 3.1 7.9 3.5 4.9 1.1 7 1.2 4.5 2.6 9.9 4.5 9.5 1.5 3.2 2.6 11.2 3.2 2.3 4.9 4.9 1.4 1.5 6.7 2.1 4.3 10.9 7 2.3 2.5 2.6 3.2 2.5 14.7 4.5 2.2 1.9 1.6 17.3 4.2 4.2 2.5 1.9 1.4 0.8 8 1.6 1.7 5.5 17.3 8.6 6.9 2.1 2.2 1.5 2.5 17.6 4.2 2.9 4.8 11.9 0.9 1.3 6.4 4.3 11.9 8.1 1.3 0.9 17.2 17.2 17.2 8.7 17.2 8.7 7.5 17.2 4.6 3.7 2.2 7.4 15.1 7.4 4.8 7.9 1 15.1 7.4 4.8 4.6 1.4 6.2 6.1 5.1 6.3 0.9 2.3 6.6 7.5 8.6 11.9 2.3 7.1 4.3 1.1 1 7.9 1 1 1 7.3 1.7 1.3 6.4 1.8 1.5 3.8 7.9 1 1.2 5.3 9.1 6.5 9.1 6.3 5.1 6.5 2.4 9.1 7.5 5 6.75 1.2 1.6 16.05 5 12.4 0.95 4.6 1.7 1 1.3 5 2.5 2.6 2.1 12.75 1.1 12.4 3.7 2.65 2.5 8.2 7.3 1.1 6.6 7 14.5 11.8 3 3.7 6 4.6 2.5 3.3 1 1.1 1.4 3.3 8.55 2.5 6.7 3.8 4.5 4.6 4.2 11.3 5.5 4.2 2.2 14.5 14.5 14.5 14.5 14.5 14.5 1.5 18.75 3.6 1.4 5.1 10.5 2 2.6 9.2 1.8 5.7 2.4 1.9 1.4 0.9 4.6 1.4 9.2 1.4 1.8 2.3 2.3 4.4 6.4 2.9 2.8 2.9 4.4 8.2 1 2.9 7 1.8 1.5 7 8.2 7.6 2.3 8.7 1 2.9 6.7 5 1.9 2 1.9 8.5 12.6 5.2 2.1 1.1 1.3 1.1 9.2 1.2 1.1 8.3 1.8 1.4 15.7 4.35 1.8 1.6 2 5 1.8 1.3 1 1.4 8.1 8.6 3.7 5.7 2.35 13.65 13.65 13.65 15.2 4.6 1.2 4.6 6.65 13.55 13.65 9.8 10.3 6.7 15.2 9.9 7.2 1.1 8.3 11.25 12.8 9.65 12.6 12.2 8.3 11.25 1.3 9.9 7.2 1.1 1.1 4.8 1.1 1.4 1.7 10.6 1.4 1.1 5.55 2.1 1.7 9 1.7 1.8 4.7 11.3 3.6 6.9 3.6 4.9 6.95 1.9 4.7 11.3 1.8 11.3 8.2 8.3 9.55 8.4 7.8 7.8 10.2 5.5 7.8 7.4 3.3 5 3.3 5 1.3 1.2 7.4 7.8 9.9 0.7 4.6 5.6 9.5 14.8 4.6 2.1 11.6 1.2 11.6 2.1 20.15 4.7 4.3 14.5 4.9 14.55 14.55 10.05 4.9 14.5 14.55 15.25 3.15 1.3 5.2 1.1 7.1 8.8 18.5 8.8 1.4 1.2 5 1.6 18.75 6 9.4 9.7 4.75 6 5.35 5.35 6.8 6.9 1.4 0.9 1.2 1.3 2.6 12 9.85 3.85 2 1.6 7.8 1.9 2 10.3 1.1 12 3.85 9.85 2 4 1.1 10.4 6.1 1.8 10.4 4.7 4 1.1 6.4 8.15 6.1 4.8 1.2 1.1 1.4 7.4 1.8 1 15.5 15.5 8.4 2.4 3.95 19.95 2 3 15.5 8.4 14.3 4.2 1.4 3 4.9 2.4 14.3 10.7 11 1.4 1.2 12.9 10.8 1.3 2 1.8 1.2 7.5 9.7 3.8 7.2 9.7 6.3 6.3 0.8 8.6 6.3 3.1 7.2 7.1 6.4 14.7 7.2 7.1 1.9 1.2 4.8 1.2 3.4 4.3 8.5 1.8 1.8 19.5 8.5 19.9 8.3 1.8 1.1 16.65 16.65 16.65 0.9 6.1 10.2 0.9 16.65 3.85 4.4 4.5 3.2 4.5 4.4 9.7 4.2 4.2 1.1 9.7 4.2 5.6 4.2 1.6 1.6 1.1 14.6 2.6 1.2 7.25 6.55 7 1.5 1.4 7.25 1 4.2 17.5 17.5 17.5 1.5 1.3 3.9 4.2 7.6 1 1.1 11.8 1.4 9.7 12.9 1.6 7.2 7.1 1.9 8.8 7.2 1.4 14.3 14.3 8.8 1.4 1.8 14.3 7.2 1.2 11.8 0.9 12.6 26.05 4.7 12.6 1.2 26.05 6.1 11.8 0.9 5.6 5.3 5.7 8 8 17.6 8 8.8 1.5 1.4 4.8 2.4 3.7 4.9 5.7 5.7 4.9 2 5.1 4.5 3.2 6.65 1.6 4 17.75 1.4 17.75 7.2 5.7 8.5 11.4 5.4 2.7 4.3 1.2 1.8 1.3 5.7 2.7 11.7 4.3 11 1.6 11.6 6.2 1.8 1.2 1 2.4 1.2 8.2 18.8 9.6 12.9 9.2 1.2 12.9 8 12.9 1.6 12 2.5 9.2 4.4 8.8 9.6 8 18.8 1.3 1.2 12.9 1.2 1.6 1.5 18.15 13.1 13.1 13.1 13.1 1 1.6 11.8 1.4 1 13.1 10.6 10.4 1.1 7.4 1.2 3.4 18.15 8 2.5 2 2 6.9 1.2 9.4 2.9 6.9 5.4 1.3 20.8 10.3 1.3 1.6 13.1 1.8 8 1.6 1.4 14.7 14.7 14.7 14.7 14.7 14.7 14.7 1.8 10.6 12.5 6.8 14.7 2.9 1.4 
1.4 2.1 7.4 2.9 1.4 1.4 7.4 5 2.5 6.1 2.7 2.1 12.9 12.9 12.9 13.7 12.9 2.4 9.8 13.7 1.3 12.1 6.1 7.7 6.1 1.4 7.7 12.1 6.8 9.2 8.3 17.4 2.7 12.8 8.2 8.1 8.2 8.3 8 11.8 12 1.7 17.4 13.9 10.7 2 2.2 1.3 1.1 2 6.4 1.3 1.1 10.7 6.4 6.3 6.4 15.1 2 2 2.2 12.1 8.8 8.8 5.1 6.8 6.8 3.7 12.2 5.7 8.1 2.5 4 6.8 1 5.1 5.8 10.6 3.5 3.5 16.4 4.8 3.3 1.2 1.2 4.8 3.3 2.5 8.7 1.6 4 2.5 16.2 9 16.2 1.4 7 9 3.1 1.5 4.6 4.8 4.6 1.5 2.7 6.3 7.2 7.2 12.4 6.6 6.6 4 4.8 1.3 7.2 11.1 12.4 9.8 6.6 13.3 11.7 8 1.6 16.55 1.5 10.2 6.6 17.8 17.8 1.5 7.4 17.8 2 7.4 2 17.8 12.1 8.2 1.5 8.7 3.5 6.4 2.1 7.7 12.3 1.3 8.7 3.5 1.1 2.8 3.5 1.9 3.8 3.8 2.4 4.8 4.8 6.2 1.3 3.8 1.5 4.8 1.9 6.2 7.9 1.6 1.4 2.6 14.8 2.4 0.9 0.9 1.2 9.9 3.9 15.6 15.6 1.5 1.6 7.8 5.6 1.3 16.7 7.95 6.7 1.1 6.3 8.9 1 1.5 6.6 6.2 6.3 2.1 2.2 5.4 8.9 1 17.9 2.6 1.3 17.9 2.6 2.3 4.3 7.1 7.1 11.9 11.7 5.8 3.8 12.4 6.5 7.1 7.6 7.9 2.8 10.6 2.8 1.5 7.6 7.9 1.7 7.6 7.5 1.7 1.7 12.1 4.5 1.7 8 7.6 8.6 8.6 14.6 1.6 8.6 14.6 1.1 3.7 8.9 8.9 4.7 8.9 3.1 5.8 5.8 5.8 1 15.8 1.5 5.2 1.5 2.5 1 15.8 5.9 3.1 3.1 5.8 11.5 18 4.8 8.5 1.6 18 4.8 5.9 1.1 8.5 13.1 4.1 2.9 13.1 1.1 1.5 7.75 1.15 1 17.8 5.7 17.8 7.4 1.4 1.4 1 4.4 1.6 7.9 15.5 15.5 15.5 15.5 17.55 13.5 13.5 1.3 15.5 11.6 7.9 15.5 17.55 11.6 13.15 1.9 13.5 1.3 6.1 6.1 1.9 1.9 1.6 11.3 8.4 8.3 8.4 12.2 8 1.3 12.7 1.3 10.5 12.5 9.6 1.5 1.5 7.8 10.8 12.5 8.6 1.2 14.5 3.7 1.1 1.1 3.8 4.6 10.2 7.9 2.4 10.7 4.9 10.7 1.1 7.9 5.6 2.4 14.2 9.5 9.5 4.1 4.7 1.4 0.9 20.3 3.5 2.7 1.2 1.2 2 1.1 1.5 1.2 18.1 18.1 3.6 3.5 12.1 17.45 12.1 3 1.6 5.7 5.6 6.8 15.6 6 1.8 8.6 8.6 11.5 7.8 2.4 5 8.6 1.5 5.4 11.9 11.9 9 10 11.9 11.9 15.5 5.4 15 1.4 9.4 3.7 15 1.4 6.5 1.4 6.3 13.7 13.7 13.7 13.7 13.7 13.7 1.5 1.6 1.4 3.5 1 1.4 1.5 13.7 1.6 5.2 1.4 11.9 2.4 3.2 1.7 4.2 15.4 13 5.6 9.7 2.5 4 15.4 1.2 2 1.2 5.1 1.4 1.2 6.5 1.3 6.5 2.7 1.3 7.4 12.9 1.3 1.2 2.6 2.3 1.3 10.5 2.6 14.4 1.2 3.1 1.7 6 11.8 6.2 1.4 12.1 12.1 12.1 3.9 4.6 12.1 1.2 8.1 3.9 1.1 6.5 10.1 10.7 3.2 12.4 5.2 5 2.5 9.2 6.9 2 15 15 1.2 15 1.8 10.8 3.9 4.2 2 13.5 13.3 2.2 1.4 1.6 2.2 14.8 1.8 14.8 1.3 9.9 5.1 5.1 1.5 1.5 11.1 5.25 2.3 7.9 8 1.4 5.25 2.3 2.3 3.5 13.7 9.9 15.4 16 16 16 16 2.4 5.5 2.3 16.8 16 17.8 17.8 6.8 6.8 6.8 6.8 1.6 4.7 11.8 17.8 15.7 5.8 15.7 9 15.7 5.8 8.8 10.2 6.6 6.5 8.9 11.1 4.2 1.6 7.4 11.5 1.6 2 4.8 9.8 1.9 4.2 1.6 7.3 5.4 10.4 1.9 7.3 5.4 7.7 11.5 1.2 2.2 1 8.2 8.3 8.2 9.3 8.1 8.2 8.3 13.9 13.9 13.9 13.9 13.9 13.9 13.9 2 13.9 15.7 1.2 1.5 1.2 3.2 1.2 2.6 13.2 10.4 5.7 2.5 1.6 1.4 7.4 2.5 5.6 3.6 7.5 5.8 1.6 1.5 2.9 11.2 9.65 10.1 3.2 11.2 11.45 9.65 4.5 2.7 3.5 1.7 2.1 4.8 5 2.6 6.6 5 7.3 5 1.7 2.6 8.2 8.2 5 1.2 7.1 9.5 15.8 15.5 15.8 17.05 12.7 12.3 11.8 11.8 11.8 12.3 11.8 13.6 5.2 6.2 7.9 7.9 3.3 2.8 7.9 3.3 6.3 4.9 10.4 4.9 10.4 16 6.3 2.2 17.3 17.3 17.3 17.3 2.2 2.2 17.3 6.6 6.5 12.3 5 2.8 13.6 2.8 5.4 10.9 1.7 9.15 4.5 9.15 1.4 5.9 16.4 1.2 16.4 5.9 7.8 7.8 2.8 2.9 2.5 12.8 12.2 7.7 2.8 2.9 17.3 19.3 19.3 19.3 2.7 6.4 17.3 2.4 2.8 1.7 15.4 15.4 4.1 6.6 1.2 2.1 1 1.1 1.4 1.6 9.8 1.9 1.3 7.9 7.9 4.5 22.6 7.9 3.5 1.2 4.5 2 7.8 0.9 2.9 2.9 3.5 4.2 9.7 10.5 1.1 16.1 1.1 8.1 6.2 7.7 2.4 16.3 2.3 8.4 8.5 6 1.1 1.75 2.6 1.3 2.1 1.1 1.1 2.8 9 2.8 2.2 5.1 3.5 12.7 7.5 2 3.5 14.3 9.8 12.7 12.7 5.1 3.5 12.7 12.9 12.9 1.3 10.5 1.5 12.7 12.9 1.2 6.2 8.8 3.9 1.3 9.1 9.1 3.9 1.8 2.1 1.4 14.7 9.1 1.9 1.8 9.6 3.9 1.3 11.8 1.9 12 7.9 9.3 4.6 2.2 10.2 10.6 1.4 9.1 11.1 9.1 4.4 2.8 1.1 1.3 1.2 3.3 9.7 2.3 1.1 11.4 1.2 14.7 13.8 1.3 6.3 7.9 2 11.8 1.2 10 5.2 1.2 7.2 9.9 5.3 13.55 2.2 9.9 4.3 13 13.55 1 1.1 6.9 13.4 4.6 9.9 3 5.8 12.9 3.2 0.8 2.5 2.4 7.2 7.3 6.3 
4.25 1.2 2 4.25 4.7 4.5 1.4 4.1 5.3 4.2 6.65 8.2 2.6 2.6 2 12.2 2.3 8.2 5 10.7 10.8 1.7 1.3 1.7 12.7 1.3 1.2 1.3 5.7 3.4 1.1 1 1 1.65 6.8 6.8 4.9 1.4 2.5 10.8 10.8 10.8 10.8 2.8 1.3 2 1.1 8.2 6 6.1 8.2 8.8 6.1 6 1.2 11.4 1.3 1.3 6.2 3.2 4.5 9.9 6.2 11.4 1.3 1.3 0.9 0.7 1 1 10.4 1.3 12.5 12.5 12.5 12.5 19.25 1.1 12.5 19.25 9 1.2 9 1.3 12.8 12.8 7.6 7.6 1.4 8.3 9 1.85 12.55 1.4 1.8 4 12.55 9 3 1.85 7.9 2.6 1.2 7.1 7.9 1.3 10.7 7.7 8.4 10.7 12.7 1.8 7.7 10.5 1.6 1.85 10.5 10.5 1 1.2 1.7 1.6 9 1.9 1.2 1.5 3.9 3.6 1.2 5 2.9 10.4 11.4 18.35 18.4 1.2 7.1 1.3 1.5 10.2 2.2 3.5 3.5 3.9 7.4 7.4 11 1.5 3.9 5.4 1.5 5 1.2 13 13 13 13 8.6 1.7 1.2 1.2 1.2 2 19.4 0.8 6.3 6.4 12.1 12.1 12.9 2.4 4.3 4.2 12.9 1.7 2.2 12.1 3.4 7.4 7.3 1.1 1.1 1.4 14.5 8 1.1 1.1 2.2 5.8 0.9 6.4 10.9 7.3 8.3 1.3 3.3 1 1.1 1 5.1 3.2 12.6 3.7 1.7 5.1 1 1.3 1.5 4.6 10.3 6.1 6.1 1.2 10.3 9.9 1.6 1.1 1.5 1.2 1.5 1.1 11.5 7.8 7.4 1.45 8.9 1.1 1 2.5 1.1 2.4 2.3 5.1 2.5 8.9 2.5 8.9 1.6 1.4 3.9 13.7 13.7 9.2 7.8 7.6 7.7 3 1.3 4 1.1 2 1.9 1.4 4.5 10.1 6.6 1.9 12.4 1.6 2.5 1.2 2.5 0.8 0.9 8.1 8.1 11.75 1.3 1.9 8.3 8.1 5.7 1.9 1.2 11.75 2.2 0.9 1.3 1.6 8 1.2 1.1 0.8 \ No newline at end of file diff --git a/pandas/tests/reshape/merge/test_join.py b/pandas/tests/reshape/merge/test_join.py index 5387a1043e00e..083ce16ef9296 100644 --- a/pandas/tests/reshape/merge/test_join.py +++ b/pandas/tests/reshape/merge/test_join.py @@ -401,8 +401,8 @@ def test_join_inner_multiindex(self): index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two', 'three']], - labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], - [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]], + codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], + [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]], names=['first', 'second']) to_join = DataFrame(np.random.randn(10, 3), index=index, columns=['j_one', 'j_two', 'j_three']) @@ -730,6 +730,31 @@ def test_panel_join_many(self): pytest.raises(ValueError, panels[0].join, panels[1:], how='right') + def test_join_multi_to_multi(self, join_type): + # GH 20475 + leftindex = MultiIndex.from_product([list('abc'), list('xy'), [1, 2]], + names=['abc', 'xy', 'num']) + left = DataFrame({'v1': range(12)}, index=leftindex) + + rightindex = MultiIndex.from_product([list('abc'), list('xy')], + names=['abc', 'xy']) + right = DataFrame({'v2': [100 * i for i in range(1, 7)]}, + index=rightindex) + + result = left.join(right, on=['abc', 'xy'], how=join_type) + expected = (left.reset_index() + .merge(right.reset_index(), + on=['abc', 'xy'], how=join_type) + .set_index(['abc', 'xy', 'num']) + ) + assert_frame_equal(expected, result) + + with pytest.raises(ValueError): + left.join(right, on='xy', how=join_type) + + with pytest.raises(ValueError): + right.join(left, on=['abc', 'xy'], how=join_type) + def _check_join(left, right, result, join_col, how='left', lsuffix='_x', rsuffix='_y'): diff --git a/pandas/tests/reshape/merge/test_merge.py b/pandas/tests/reshape/merge/test_merge.py index 7ee88f223cd95..970802e94662a 100644 --- a/pandas/tests/reshape/merge/test_merge.py +++ b/pandas/tests/reshape/merge/test_merge.py @@ -1326,6 +1326,16 @@ def test_merging_with_bool_or_int_cateorical_column(self, category_column, CDT(categories, ordered=ordered)) assert_frame_equal(expected, result) + def test_merge_on_int_array(self): + # GH 23020 + df = pd.DataFrame({'A': pd.Series([1, 2, np.nan], dtype='Int64'), + 'B': 1}) + result = pd.merge(df, df, on='A') + expected = pd.DataFrame({'A': pd.Series([1, 2, np.nan], dtype='Int64'), + 'B_x': 1, + 'B_y': 1}) + assert_frame_equal(result, expected) + @pytest.fixture def left_df(): @@ 
diff --git a/pandas/tests/reshape/merge/test_merge.py b/pandas/tests/reshape/merge/test_merge.py
index 7ee88f223cd95..970802e94662a 100644
--- a/pandas/tests/reshape/merge/test_merge.py
+++ b/pandas/tests/reshape/merge/test_merge.py
@@ -1326,6 +1326,16 @@ def test_merging_with_bool_or_int_cateorical_column(self, category_column,
                                      CDT(categories, ordered=ordered))
         assert_frame_equal(expected, result)
 
+    def test_merge_on_int_array(self):
+        # GH 23020
+        df = pd.DataFrame({'A': pd.Series([1, 2, np.nan], dtype='Int64'),
+                           'B': 1})
+        result = pd.merge(df, df, on='A')
+        expected = pd.DataFrame({'A': pd.Series([1, 2, np.nan], dtype='Int64'),
+                                 'B_x': 1,
+                                 'B_y': 1})
+        assert_frame_equal(result, expected)
+
 
 @pytest.fixture
 def left_df():
@@ -1397,16 +1407,16 @@ def test_merge_index_types(index):
     assert_frame_equal(result, expected)
 
 
-@pytest.mark.parametrize("on,left_on,right_on,left_index,right_index,nms,nm", [
-    (['outer', 'inner'], None, None, False, False, ['outer', 'inner'], 'B'),
-    (None, None, None, True, True, ['outer', 'inner'], 'B'),
-    (None, ['outer', 'inner'], None, False, True, None, 'B'),
-    (None, None, ['outer', 'inner'], True, False, None, 'B'),
-    (['outer', 'inner'], None, None, False, False, ['outer', 'inner'], None),
-    (None, None, None, True, True, ['outer', 'inner'], None),
-    (None, ['outer', 'inner'], None, False, True, None, None),
-    (None, None, ['outer', 'inner'], True, False, None, None)])
-def test_merge_series(on, left_on, right_on, left_index, right_index, nms, nm):
+@pytest.mark.parametrize("on,left_on,right_on,left_index,right_index,nm", [
+    (['outer', 'inner'], None, None, False, False, 'B'),
+    (None, None, None, True, True, 'B'),
+    (None, ['outer', 'inner'], None, False, True, 'B'),
+    (None, None, ['outer', 'inner'], True, False, 'B'),
+    (['outer', 'inner'], None, None, False, False, None),
+    (None, None, None, True, True, None),
+    (None, ['outer', 'inner'], None, False, True, None),
+    (None, None, ['outer', 'inner'], True, False, None)])
+def test_merge_series(on, left_on, right_on, left_index, right_index, nm):
     # GH 21220
     a = pd.DataFrame({"A": [1, 2, 3, 4]},
                      index=pd.MultiIndex.from_product([['a', 'b'], [0, 1]],
@@ -1416,7 +1426,7 @@ def test_merge_series(on, left_on, right_on, left_index, right_index, nms, nm):
                                                       names=['outer', 'inner']), name=nm)
     expected = pd.DataFrame({"A": [2, 4], "B": [1, 3]},
                             index=pd.MultiIndex.from_product([['a', 'b'], [1]],
-                                                             names=nms))
+                                                             names=['outer', 'inner']))
     if nm is not None:
         result = pd.merge(a, b, on=on, left_on=left_on, right_on=right_on,
                           left_index=left_index, right_index=right_index)
diff --git a/pandas/tests/reshape/merge/test_merge_asof.py b/pandas/tests/reshape/merge/test_merge_asof.py
index 71db7844a9db5..3035412d7b836 100644
--- a/pandas/tests/reshape/merge/test_merge_asof.py
+++ b/pandas/tests/reshape/merge/test_merge_asof.py
@@ -621,22 +621,22 @@ def test_tolerance_nearest(self):
     def test_tolerance_tz(self):
         # GH 14844
         left = pd.DataFrame(
-            {'date': pd.DatetimeIndex(start=pd.to_datetime('2016-01-02'),
-                                      freq='D', periods=5,
-                                      tz=pytz.timezone('UTC')),
+            {'date': pd.date_range(start=pd.to_datetime('2016-01-02'),
+                                   freq='D', periods=5,
+                                   tz=pytz.timezone('UTC')),
              'value1': np.arange(5)})
         right = pd.DataFrame(
-            {'date': pd.DatetimeIndex(start=pd.to_datetime('2016-01-01'),
-                                      freq='D', periods=5,
-                                      tz=pytz.timezone('UTC')),
+            {'date': pd.date_range(start=pd.to_datetime('2016-01-01'),
+                                   freq='D', periods=5,
+                                   tz=pytz.timezone('UTC')),
              'value2': list("ABCDE")})
 
         result = pd.merge_asof(left, right, on='date',
                                tolerance=pd.Timedelta('1 day'))
         expected = pd.DataFrame(
-            {'date': pd.DatetimeIndex(start=pd.to_datetime('2016-01-02'),
-                                      freq='D', periods=5,
-                                      tz=pytz.timezone('UTC')),
+            {'date': pd.date_range(start=pd.to_datetime('2016-01-02'),
+                                   freq='D', periods=5,
+                                   tz=pytz.timezone('UTC')),
              'value1': np.arange(5),
              'value2': list("BCDEE")})
         assert_frame_equal(result, expected)
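
The test_tolerance_tz churn above is mechanical: constructing a DatetimeIndex from `start`/`periods`/`freq` keywords was deprecated, and pd.date_range takes the same keywords and returns the equivalent index. A one-line sketch of the substitution (dates chosen to match the test's shape):

    import pandas as pd

    # Deprecated spelling:
    #   pd.DatetimeIndex(start='2016-01-02', freq='D', periods=5, tz='UTC')
    # Replacement used throughout this patch:
    idx = pd.date_range(start='2016-01-02', freq='D', periods=5, tz='UTC')
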
diff --git a/pandas/tests/reshape/merge/test_multi.py b/pandas/tests/reshape/merge/test_multi.py
index a1158201844b0..7e8b5b1120bc6 100644
--- a/pandas/tests/reshape/merge/test_multi.py
+++ b/pandas/tests/reshape/merge/test_multi.py
@@ -32,8 +32,8 @@ def right():
     """right dataframe (multi-indexed) for multi-index join tests"""
     index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
                                ['one', 'two', 'three']],
-                       labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
-                               [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
+                       codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
+                              [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
                        names=['key1', 'key2'])
 
     return DataFrame(np.random.randn(10, 3), index=index,
@@ -83,8 +83,8 @@ class TestMergeMulti(object):
     def setup_method(self):
         self.index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
                                         ['one', 'two', 'three']],
-                                labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
-                                        [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
+                                codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
+                                       [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
                                 names=['first', 'second'])
         self.to_join = DataFrame(np.random.randn(10, 3), index=self.index,
                                  columns=['j_one', 'j_two', 'j_three'])
@@ -505,19 +505,15 @@ def test_join_multi_levels(self):
 
         # invalid cases
         household.index.name = 'foo'
 
-        def f():
+        with pytest.raises(ValueError):
             household.join(portfolio, how='inner')
 
-        pytest.raises(ValueError, f)
-
         portfolio2 = portfolio.copy()
         portfolio2.index.set_names(['household_id', 'foo'])
 
-        def f():
+        with pytest.raises(ValueError):
             portfolio2.join(portfolio, how='inner')
 
-        pytest.raises(ValueError, f)
-
     def test_join_multi_levels2(self):
 
         # some more advanced merges
diff --git a/pandas/tests/reshape/test_concat.py b/pandas/tests/reshape/test_concat.py
index 07b00cef2669e..0706cb12ac5d0 100644
--- a/pandas/tests/reshape/test_concat.py
+++ b/pandas/tests/reshape/test_concat.py
@@ -335,9 +335,9 @@ def test_concatlike_datetimetz(self, tz_aware_fixture):
     @pytest.mark.parametrize('tz',
                              ['UTC', 'US/Eastern', 'Asia/Tokyo', 'EST5EDT'])
     def test_concatlike_datetimetz_short(self, tz):
-        # GH 7795
-        ix1 = pd.DatetimeIndex(start='2014-07-15', end='2014-07-17',
-                               freq='D', tz=tz)
+        # GH#7795
+        ix1 = pd.date_range(start='2014-07-15', end='2014-07-17',
+                            freq='D', tz=tz)
         ix2 = pd.DatetimeIndex(['2014-07-11', '2014-07-21'], tz=tz)
         df1 = pd.DataFrame(0, index=ix1, columns=['A', 'B'])
         df2 = pd.DataFrame(0, index=ix2, columns=['A', 'B'])
@@ -1188,8 +1188,8 @@ def test_concat_ignore_index(self, sort):
     def test_concat_multiindex_with_keys(self):
         index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
                                    ['one', 'two', 'three']],
-                           labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
-                                   [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
+                           codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
+                                  [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
                            names=['first', 'second'])
         frame = DataFrame(np.random.randn(10, 3), index=index,
                           columns=Index(['A', 'B', 'C'], name='exp'))
@@ -1258,8 +1258,8 @@ def test_concat_keys_and_levels(self):
                         names=names)
         expected = concat([df, df2, df, df2])
         exp_index = MultiIndex(levels=levels + [[0]],
-                               labels=[[0, 0, 1, 1], [0, 1, 0, 1],
-                                       [0, 0, 0, 0]],
+                               codes=[[0, 0, 1, 1], [0, 1, 0, 1],
+                                      [0, 0, 0, 0]],
                                names=names + [None])
         expected.index = exp_index
@@ -1591,10 +1591,10 @@ def test_concat_series(self):
 
         ts.index = DatetimeIndex(np.array(ts.index.values, dtype='M8[ns]'))
 
-        exp_labels = [np.repeat([0, 1, 2], [len(x) for x in pieces]),
-                      np.arange(len(ts))]
+        exp_codes = [np.repeat([0, 1, 2], [len(x) for x in pieces]),
+                     np.arange(len(ts))]
         exp_index = MultiIndex(levels=[[0, 1, 2], ts.index],
-                               labels=exp_labels)
+                               codes=exp_codes)
         expected.index = exp_index
         tm.assert_series_equal(result, expected)
 
@@ -2141,8 +2141,8 @@ def test_concat_multiindex_rangeindex(self):
         df = DataFrame(np.random.randn(9, 2))
         df.index = MultiIndex(levels=[pd.RangeIndex(3), pd.RangeIndex(3)],
-                              labels=[np.repeat(np.arange(3), 3),
-                                      np.tile(np.arange(3), 3)])
+                              codes=[np.repeat(np.arange(3), 3),
+                                     np.tile(np.arange(3), 3)])
 
         res = concat([df.iloc[[2, 3, 4], :], df.iloc[[5], :]])
         exp = df.iloc[[2, 3, 4, 5], :]
@@ -2161,7 +2161,7 @@ def test_concat_multiindex_dfs_with_deepcopy(self):
         expected_index = pd.MultiIndex(levels=[['s1', 's2'],
                                                ['a'],
                                                ['b', 'c']],
-                                       labels=[[0, 1], [0, 0], [0, 1]],
+                                       codes=[[0, 1], [0, 0], [0, 1]],
                                        names=['testname', None, None])
         expected = pd.DataFrame([[0], [1]], index=expected_index)
         result_copy = pd.concat(deepcopy(example_dict), names=['testname'])
@@ -2552,3 +2552,16 @@ def test_concat_series_name_npscalar_tuple(s1name, s2name):
     result = pd.concat([s1, s2])
     expected = pd.Series({'a': 1, 'b': 2, 'c': 5, 'd': 6})
     tm.assert_series_equal(result, expected)
+
+
+def test_concat_categorical_tz():
+    # GH-23816
+    a = pd.Series(pd.date_range('2017-01-01', periods=2, tz='US/Pacific'))
+    b = pd.Series(['a', 'b'], dtype='category')
+    result = pd.concat([a, b], ignore_index=True)
+    expected = pd.Series([
+        pd.Timestamp('2017-01-01', tz="US/Pacific"),
+        pd.Timestamp('2017-01-02', tz="US/Pacific"),
+        'a', 'b'
+    ])
+    tm.assert_series_equal(result, expected)
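
The recurring `labels=` to `codes=` edits in these test files track the rename of the MultiIndex constructor argument; the semantics are unchanged, with each inner list of integers indexing into the corresponding entry of `levels`. A minimal sketch (values illustrative):

    import pandas as pd

    # 'codes' plays the role the deprecated 'labels' argument used to:
    # position 0 -> ('foo', 'one'), 1 -> ('foo', 'two'), 2 -> ('bar', 'one')
    mi = pd.MultiIndex(levels=[['foo', 'bar'], ['one', 'two']],
                       codes=[[0, 0, 1], [0, 1, 0]],
                       names=['first', 'second'])
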
diff --git a/pandas/tests/reshape/test_cut.py b/pandas/tests/reshape/test_cut.py
new file mode 100644
index 0000000000000..6833460fa515b
--- /dev/null
+++ b/pandas/tests/reshape/test_cut.py
@@ -0,0 +1,458 @@
+import numpy as np
+import pytest
+
+import pandas as pd
+from pandas import (
+    Categorical, DataFrame, DatetimeIndex, Index, Interval, IntervalIndex,
+    Series, TimedeltaIndex, Timestamp, cut, date_range, isna, qcut,
+    timedelta_range, to_datetime)
+from pandas.api.types import CategoricalDtype as CDT
+import pandas.core.reshape.tile as tmod
+import pandas.util.testing as tm
+
+
+def test_simple():
+    data = np.ones(5, dtype="int64")
+    result = cut(data, 4, labels=False)
+
+    expected = np.array([1, 1, 1, 1, 1])
+    tm.assert_numpy_array_equal(result, expected, check_dtype=False)
+
+
+def test_bins():
+    data = np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1])
+    result, bins = cut(data, 3, retbins=True)
+
+    intervals = IntervalIndex.from_breaks(bins.round(3))
+    intervals = intervals.take([0, 0, 0, 1, 2, 0])
+    expected = Categorical(intervals, ordered=True)
+
+    tm.assert_categorical_equal(result, expected)
+    tm.assert_almost_equal(bins, np.array([0.1905, 3.36666667,
+                                           6.53333333, 9.7]))
+
+
+def test_right():
+    data = np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1, 2.575])
+    result, bins = cut(data, 4, right=True, retbins=True)
+
+    intervals = IntervalIndex.from_breaks(bins.round(3))
+    expected = Categorical(intervals, ordered=True)
+    expected = expected.take([0, 0, 0, 2, 3, 0, 0])
+
+    tm.assert_categorical_equal(result, expected)
+    tm.assert_almost_equal(bins, np.array([0.1905, 2.575, 4.95, 7.325, 9.7]))
+
+
+def test_no_right():
+    data = np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1, 2.575])
+    result, bins = cut(data, 4, right=False, retbins=True)
+
+    intervals = IntervalIndex.from_breaks(bins.round(3), closed="left")
+    intervals = intervals.take([0, 0, 0, 2, 3, 0, 1])
+    expected = Categorical(intervals, ordered=True)
+
+    tm.assert_categorical_equal(result, expected)
+    tm.assert_almost_equal(bins, np.array([0.2, 2.575, 4.95, 7.325, 9.7095]))
+
+
+def test_array_like():
+    data = [.2, 1.4, 2.5, 6.2, 9.7, 2.1]
+    result, bins = cut(data, 3, retbins=True)
+
+    intervals = IntervalIndex.from_breaks(bins.round(3))
+    intervals = intervals.take([0, 0, 0, 1, 2, 0])
+    expected = Categorical(intervals, ordered=True)
+
+    tm.assert_categorical_equal(result, expected)
+    tm.assert_almost_equal(bins, np.array([0.1905, 3.36666667,
+                                           6.53333333, 9.7]))
+
+
+def test_bins_from_interval_index():
+    c = cut(range(5), 3)
+    expected = c
+    result = cut(range(5), bins=expected.categories)
+    tm.assert_categorical_equal(result, expected)
+
+    expected = Categorical.from_codes(np.append(c.codes, -1),
+                                      categories=c.categories,
+                                      ordered=True)
+    result = cut(range(6), bins=expected.categories)
+    tm.assert_categorical_equal(result, expected)
+
+
+def test_bins_from_interval_index_doc_example():
+    # Make sure we preserve the bins.
+    ages = np.array([10, 15, 13, 12, 23, 25, 28, 59, 60])
+    c = cut(ages, bins=[0, 18, 35, 70])
+    expected = IntervalIndex.from_tuples([(0, 18), (18, 35), (35, 70)])
+    tm.assert_index_equal(c.categories, expected)
+
+    result = cut([25, 20, 50], bins=c.categories)
+    tm.assert_index_equal(result.categories, expected)
+    tm.assert_numpy_array_equal(result.codes,
+                                np.array([1, 1, 2], dtype="int8"))
+
+
+def test_bins_not_overlapping_from_interval_index():
+    # see gh-23980
+    msg = "Overlapping IntervalIndex is not accepted"
+    ii = IntervalIndex.from_tuples([(0, 10), (2, 12), (4, 14)])
+
+    with pytest.raises(ValueError, match=msg):
+        cut([5, 6], bins=ii)
+
+
+def test_bins_not_monotonic():
+    msg = "bins must increase monotonically"
+    data = [.2, 1.4, 2.5, 6.2, 9.7, 2.1]
+
+    with pytest.raises(ValueError, match=msg):
+        cut(data, [0.1, 1.5, 1, 10])
+
+
+def test_wrong_num_labels():
+    msg = "Bin labels must be one fewer than the number of bin edges"
+    data = [.2, 1.4, 2.5, 6.2, 9.7, 2.1]
+
+    with pytest.raises(ValueError, match=msg):
+        cut(data, [0, 1, 10], labels=["foo", "bar", "baz"])
+
+
+@pytest.mark.parametrize("x,bins,msg", [
+    ([], 2, "Cannot cut empty array"),
+    ([1, 2, 3], 0.5, "`bins` should be a positive integer")
+])
+def test_cut_corner(x, bins, msg):
+    with pytest.raises(ValueError, match=msg):
+        cut(x, bins)
+
+
+@pytest.mark.parametrize("arg", [2, np.eye(2), DataFrame(np.eye(2))])
+@pytest.mark.parametrize("cut_func", [cut, qcut])
+def test_cut_not_1d_arg(arg, cut_func):
+    msg = "Input array must be 1 dimensional"
+    with pytest.raises(ValueError, match=msg):
+        cut_func(arg, 2)
+
+
+@pytest.mark.parametrize('data', [
+    [0, 1, 2, 3, 4, np.inf],
+    [-np.inf, 0, 1, 2, 3, 4],
+    [-np.inf, 0, 1, 2, 3, 4, np.inf]])
+def test_int_bins_with_inf(data):
+    # GH 24314
+    msg = 'cannot specify integer `bins` when input data contains infinity'
+    with pytest.raises(ValueError, match=msg):
+        cut(data, bins=3)
+
+
+def test_cut_out_of_range_more():
+    # see gh-1511
+    name = "x"
+
+    ser = Series([0, -1, 0, 1, -3], name=name)
+    ind = cut(ser, [0, 1], labels=False)
+
+    exp = Series([np.nan, np.nan, np.nan, 0, np.nan], name=name)
+    tm.assert_series_equal(ind, exp)
+
+
+@pytest.mark.parametrize("right,breaks,closed", [
+    (True, [-1e-3, 0.25, 0.5, 0.75, 1], "right"),
+    (False, [0, 0.25, 0.5, 0.75, 1 + 1e-3], "left")
+])
+def test_labels(right, breaks, closed):
+    arr = np.tile(np.arange(0, 1.01, 0.1), 4)
+
+    result, bins = cut(arr, 4, retbins=True, right=right)
+    ex_levels = IntervalIndex.from_breaks(breaks, closed=closed)
+    tm.assert_index_equal(result.categories, ex_levels)
+
+
+def test_cut_pass_series_name_to_factor():
+    name = "foo"
+    ser = Series(np.random.randn(100), name=name)
+
+    factor = cut(ser, 4)
+    assert factor.name == name
+
+
+def test_label_precision():
+    arr = np.arange(0, 0.73, 0.01)
+    result = cut(arr, 4, precision=2)
+
+    ex_levels = IntervalIndex.from_breaks([-0.00072, 0.18, 0.36, 0.54, 0.72])
+    tm.assert_index_equal(result.categories, ex_levels)
+
+
+@pytest.mark.parametrize("labels", [None, False])
+def test_na_handling(labels):
+    arr = np.arange(0, 0.75, 0.01)
+    arr[::3] = np.nan
+
+    result = cut(arr, 4, labels=labels)
+    result = np.asarray(result)
+
+    expected = np.where(isna(arr), np.nan, result)
+    tm.assert_almost_equal(result, expected)
+
+
+def test_inf_handling():
+    data = np.arange(6)
+    data_ser = Series(data, dtype="int64")
+
+    bins = [-np.inf, 2, 4, np.inf]
+    result = cut(data, bins)
+    result_ser = cut(data_ser, bins)
+
+    ex_uniques = IntervalIndex.from_breaks(bins)
+    tm.assert_index_equal(result.categories, ex_uniques)
+
+    assert result[5] == Interval(4, np.inf)
+    assert result[0] == Interval(-np.inf, 2)
+    assert result_ser[5] == Interval(4, np.inf)
+    assert result_ser[0] == Interval(-np.inf, 2)
+
+
+def test_cut_out_of_bounds():
+    arr = np.random.randn(100)
+    result = cut(arr, [-1, 0, 1])
+
+    mask = isna(result)
+    ex_mask = (arr < -1) | (arr > 1)
+    tm.assert_numpy_array_equal(mask, ex_mask)
+
+
+@pytest.mark.parametrize("get_labels,get_expected", [
+    (lambda labels: labels,
+     lambda labels: Categorical(["Medium"] + 4 * ["Small"] +
+                                ["Medium", "Large"],
+                                categories=labels, ordered=True)),
+    (lambda labels: Categorical.from_codes([0, 1, 2], labels),
+     lambda labels: Categorical.from_codes([1] + 4 * [0] + [1, 2], labels))
+])
+def test_cut_pass_labels(get_labels, get_expected):
+    bins = [0, 25, 50, 100]
+    arr = [50, 5, 10, 15, 20, 30, 70]
+    labels = ["Small", "Medium", "Large"]
+
+    result = cut(arr, bins, labels=get_labels(labels))
+    tm.assert_categorical_equal(result, get_expected(labels))
+
+
+def test_cut_pass_labels_compat():
+    # see gh-16459
+    arr = [50, 5, 10, 15, 20, 30, 70]
+    labels = ["Good", "Medium", "Bad"]
+
+    result = cut(arr, 3, labels=labels)
+    exp = cut(arr, 3, labels=Categorical(labels, categories=labels,
+                                         ordered=True))
+    tm.assert_categorical_equal(result, exp)
+
+
+@pytest.mark.parametrize("x", [np.arange(11.), np.arange(11.) / 1e10])
+def test_round_frac_just_works(x):
+    # It works.
+    cut(x, 2)
+
+
+@pytest.mark.parametrize("val,precision,expected", [
+    (-117.9998, 3, -118),
+    (117.9998, 3, 118),
+    (117.9998, 2, 118),
+    (0.000123456, 2, 0.00012)
+])
+def test_round_frac(val, precision, expected):
+    # see gh-1979
+    result = tmod._round_frac(val, precision=precision)
+    assert result == expected
+
+
+def test_cut_return_intervals():
+    ser = Series([0, 1, 2, 3, 4, 5, 6, 7, 8])
+    result = cut(ser, 3)
+
+    exp_bins = np.linspace(0, 8, num=4).round(3)
+    exp_bins[0] -= 0.008
+
+    expected = Series(IntervalIndex.from_breaks(exp_bins, closed="right").take(
+        [0, 0, 0, 1, 1, 1, 2, 2, 2])).astype(CDT(ordered=True))
+    tm.assert_series_equal(result, expected)
+
+
+def test_series_ret_bins():
+    # see gh-8589
+    ser = Series(np.arange(4))
+    result, bins = cut(ser, 2, retbins=True)
+
+    expected = Series(IntervalIndex.from_breaks(
+        [-0.003, 1.5, 3], closed="right").repeat(2)).astype(CDT(ordered=True))
+    tm.assert_series_equal(result, expected)
+
+
+@pytest.mark.parametrize("kwargs,msg", [
+    (dict(duplicates="drop"), None),
+    (dict(), "Bin edges must be unique"),
+    (dict(duplicates="raise"), "Bin edges must be unique"),
+    (dict(duplicates="foo"), "invalid value for 'duplicates' parameter")
+])
+def test_cut_duplicates_bin(kwargs, msg):
+    # see gh-20947
+    bins = [0, 2, 4, 6, 10, 10]
+    values = Series(np.array([1, 3, 5, 7, 9]), index=["a", "b", "c", "d", "e"])
+
+    if msg is not None:
+        with pytest.raises(ValueError, match=msg):
+            cut(values, bins, **kwargs)
+    else:
+        result = cut(values, bins, **kwargs)
+        expected = cut(values, pd.unique(bins))
+        tm.assert_series_equal(result, expected)
+
+
+@pytest.mark.parametrize("data", [9.0, -9.0, 0.0])
+@pytest.mark.parametrize("length", [1, 2])
+def test_single_bin(data, length):
+    # see gh-14652, gh-15428
+    ser = Series([data] * length)
+    result = cut(ser, 1, labels=False)
+
+    expected = Series([0] * length)
+    tm.assert_series_equal(result, expected)
+
+
+@pytest.mark.parametrize(
+    "array_1_writeable,array_2_writeable",
+    [(True, True), (True, False), (False, False)])
+def test_cut_read_only(array_1_writeable, array_2_writeable):
+    # issue 18773
+    array_1 = np.arange(0, 100, 10)
+    array_1.flags.writeable = array_1_writeable
+
+    array_2 = np.arange(0, 100, 10)
+    array_2.flags.writeable = array_2_writeable
+
+    hundred_elements = np.arange(100)
+    tm.assert_categorical_equal(cut(hundred_elements, array_1),
+                                cut(hundred_elements, array_2))
+
+
+@pytest.mark.parametrize("conv", [
+    lambda v: Timestamp(v),
+    lambda v: to_datetime(v),
+    lambda v: np.datetime64(v),
+    lambda v: Timestamp(v).to_pydatetime(),
+])
+def test_datetime_bin(conv):
+    data = [np.datetime64("2012-12-13"), np.datetime64("2012-12-15")]
+    bin_data = ["2012-12-12", "2012-12-14", "2012-12-16"]
+
+    expected = Series(IntervalIndex([
+        Interval(Timestamp(bin_data[0]), Timestamp(bin_data[1])),
+        Interval(Timestamp(bin_data[1]), Timestamp(bin_data[2]))])).astype(
+        CDT(ordered=True))
+
+    bins = [conv(v) for v in bin_data]
+    result = Series(cut(data, bins=bins))
+    tm.assert_series_equal(result, expected)
+
+
+@pytest.mark.parametrize("data", [
+    to_datetime(Series(["2013-01-01", "2013-01-02", "2013-01-03"])),
+    [np.datetime64("2013-01-01"), np.datetime64("2013-01-02"),
+     np.datetime64("2013-01-03")],
+    np.array([np.datetime64("2013-01-01"), np.datetime64("2013-01-02"),
+              np.datetime64("2013-01-03")]),
+    DatetimeIndex(["2013-01-01", "2013-01-02", "2013-01-03"])
+])
+def test_datetime_cut(data):
+    # see gh-14714
+    #
+    # Testing time data when it comes in various collection types.
+    result, _ = cut(data, 3, retbins=True)
+    expected = Series(IntervalIndex([
+        Interval(Timestamp("2012-12-31 23:57:07.200000"),
+                 Timestamp("2013-01-01 16:00:00")),
+        Interval(Timestamp("2013-01-01 16:00:00"),
+                 Timestamp("2013-01-02 08:00:00")),
+        Interval(Timestamp("2013-01-02 08:00:00"),
+                 Timestamp("2013-01-03 00:00:00"))])).astype(CDT(ordered=True))
+    tm.assert_series_equal(Series(result), expected)
+
+
+@pytest.mark.parametrize("bins", [
+    3, [Timestamp("2013-01-01 04:57:07.200000"),
+        Timestamp("2013-01-01 21:00:00"),
+        Timestamp("2013-01-02 13:00:00"),
+        Timestamp("2013-01-03 05:00:00")]])
+@pytest.mark.parametrize("box", [list, np.array, Index, Series])
+def test_datetime_tz_cut(bins, box):
+    # see gh-19872
+    tz = "US/Eastern"
+    s = Series(date_range("20130101", periods=3, tz=tz))
+
+    if not isinstance(bins, int):
+        bins = box(bins)
+
+    result = cut(s, bins)
+    expected = Series(IntervalIndex([
+        Interval(Timestamp("2012-12-31 23:57:07.200000", tz=tz),
+                 Timestamp("2013-01-01 16:00:00", tz=tz)),
+        Interval(Timestamp("2013-01-01 16:00:00", tz=tz),
+                 Timestamp("2013-01-02 08:00:00", tz=tz)),
+        Interval(Timestamp("2013-01-02 08:00:00", tz=tz),
+                 Timestamp("2013-01-03 00:00:00", tz=tz))])).astype(
+        CDT(ordered=True))
+    tm.assert_series_equal(result, expected)
+
+
+def test_datetime_nan_error():
+    msg = "bins must be of datetime64 dtype"
+
+    with pytest.raises(ValueError, match=msg):
+        cut(date_range("20130101", periods=3), bins=[0, 2, 4])
+
+
+def test_datetime_nan_mask():
+    result = cut(date_range("20130102", periods=5),
+                 bins=date_range("20130101", periods=2))
+
+    mask = result.categories.isna()
+    tm.assert_numpy_array_equal(mask, np.array([False]))
+
+    mask = result.isna()
+    tm.assert_numpy_array_equal(mask, np.array([False, True, True,
+                                                True, True]))
+
+
+@pytest.mark.parametrize("tz", [None, "UTC", "US/Pacific"])
+def test_datetime_cut_roundtrip(tz):
+    # see gh-19891
+    ser = Series(date_range("20180101", periods=3, tz=tz))
+    result, result_bins = cut(ser, 2, retbins=True)
+
+    expected = cut(ser, result_bins)
+    tm.assert_series_equal(result, expected)
+
+    expected_bins = DatetimeIndex(["2017-12-31 23:57:07.200000",
+                                   "2018-01-02 00:00:00",
+                                   "2018-01-03 00:00:00"])
+    expected_bins = expected_bins.tz_localize(tz)
+    tm.assert_index_equal(result_bins, expected_bins)
+
+
+def test_timedelta_cut_roundtrip():
+    # see gh-19891
+    ser = Series(timedelta_range("1day", periods=3))
+    result, result_bins = cut(ser, 2, retbins=True)
+
+    expected = cut(ser, result_bins)
+    tm.assert_series_equal(result, expected)
+
+    expected_bins = TimedeltaIndex(["0 days 23:57:07.200000",
+                                    "2 days 00:00:00",
+                                    "3 days 00:00:00"])
+    tm.assert_index_equal(result_bins, expected_bins)
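
As background for the new test_cut.py module above: pd.cut bins values into intervals and, with retbins=True, also returns the computed edges so they can be reapplied to new data, which is the round-trip several of these tests exercise. A small sketch with made-up data:

    import pandas as pd

    ages = pd.Series([10, 15, 23, 59, 60])
    binned, edges = pd.cut(ages, bins=3, retbins=True)

    # Reusing the returned edges reproduces the same categories on new data.
    rebinned = pd.cut(pd.Series([12, 40]), bins=edges)
    assert binned.cat.categories.equals(rebinned.cat.categories)
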
diff --git a/pandas/tests/reshape/test_pivot.py b/pandas/tests/reshape/test_pivot.py
index 69572f75fea1b..e32e1999836ec 100644
--- a/pandas/tests/reshape/test_pivot.py
+++ b/pandas/tests/reshape/test_pivot.py
@@ -451,7 +451,7 @@ def test_pivot_with_list_like_values(self, values, method):
                 [4, 5, 6, 'q', 'w', 't']]
         index = Index(data=['one', 'two'], name='foo')
         columns = MultiIndex(levels=[['baz', 'zoo'], ['A', 'B', 'C']],
-                             labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]],
+                             codes=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]],
                              names=[None, 'bar'])
         expected = DataFrame(data=data, index=index,
                              columns=columns, dtype='object')
@@ -482,15 +482,14 @@ def test_pivot_with_list_like_values_nans(self, values, method):
                 ['C', np.nan, 3, np.nan]]
         index = Index(data=['q', 't', 'w', 'x', 'y', 'z'], name='zoo')
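[test_pivot.py diff continues below]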
         columns = MultiIndex(levels=[['bar', 'baz'], ['one', 'two']],
-                             labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
+                             codes=[[0, 0, 1, 1], [0, 1, 0, 1]],
                              names=[None, 'foo'])
         expected = DataFrame(data=data, index=index,
                              columns=columns, dtype='object')
         tm.assert_frame_equal(result, expected)
 
     @pytest.mark.xfail(reason='MultiIndexed unstack with tuple names fails'
-                              'with KeyError GH#19966',
-                       strict=True)
+                              'with KeyError GH#19966')
     @pytest.mark.parametrize('method', [True, False])
     def test_pivot_with_multiindex(self, method):
         # issue #17160
@@ -502,7 +501,7 @@ def test_pivot_with_multiindex(self, method):
                 ['two', 'B', 5, 'w'],
                 ['two', 'C', 6, 't']]
         columns = MultiIndex(levels=[['bar', 'baz'], ['first', 'second']],
-                             labels=[[0, 0, 1, 1], [0, 1, 0, 1]])
+                             codes=[[0, 0, 1, 1], [0, 1, 0, 1]])
         df = DataFrame(data=data, index=index, columns=columns,
                        dtype='object')
         if method:
             result = df.pivot(index=('bar', 'first'),
@@ -616,8 +615,7 @@ def test_margins_dtype(self):
         tm.assert_frame_equal(expected, result)
 
     @pytest.mark.xfail(reason='GH#17035 (len of floats is casted back to '
-                              'floats)',
-                       strict=True)
+                              'floats)')
     def test_margins_dtype_len(self):
         mi_val = list(product(['bar', 'foo'], ['one', 'two'])) + [('All', '')]
         mi = MultiIndex.from_tuples(mi_val, names=('A', 'B'))
@@ -1102,8 +1100,7 @@ def test_pivot_table_margins_name_with_aggfunc_list(self):
         tm.assert_frame_equal(table, expected)
 
     @pytest.mark.xfail(reason='GH#17035 (np.mean of ints is casted back to '
-                              'ints)',
-                       strict=True)
+                              'ints)')
     def test_categorical_margins(self, observed):
         # GH 10989
         df = pd.DataFrame({'x': np.arange(8),
@@ -1118,8 +1115,7 @@ def test_categorical_margins(self, observed):
         tm.assert_frame_equal(table, expected)
 
     @pytest.mark.xfail(reason='GH#17035 (np.mean of ints is casted back to '
-                              'ints)',
-                       strict=True)
+                              'ints)')
     def test_categorical_margins_category(self, observed):
         df = pd.DataFrame({'x': np.arange(8),
                            'y': np.arange(8) // 4,
@@ -1242,7 +1238,7 @@ def test_pivot_string_as_func(self):
         result = pivot_table(data, index='A', columns='B', aggfunc='sum')
         mi = MultiIndex(levels=[['C'], ['one', 'two']],
-                        labels=[[0, 0], [0, 1]], names=[None, 'B'])
+                        codes=[[0, 0], [0, 1]], names=[None, 'B'])
         expected = DataFrame({('C', 'one'): {'bar': 15, 'foo': 13},
                               ('C', 'two'): {'bar': 7, 'foo': 20}},
                              columns=mi).rename_axis('A')
@@ -1251,7 +1247,7 @@ def test_pivot_string_as_func(self):
         result = pivot_table(data, index='A', columns='B',
                              aggfunc=['sum', 'mean'])
         mi = MultiIndex(levels=[['sum', 'mean'], ['C'], ['one', 'two']],
-                        labels=[[0, 0, 1, 1], [0, 0, 0, 0], [0, 1, 0, 1]],
+                        codes=[[0, 0, 1, 1], [0, 0, 0, 0], [0, 1, 0, 1]],
                         names=[None, None, 'B'])
         expected = DataFrame({('mean', 'C', 'one'): {'bar': 5.0, 'foo': 3.25},
                               ('mean', 'C', 'two'): {'bar': 7.0,
@@ -1728,8 +1724,8 @@ def test_crosstab_with_numpy_size(self):
                                values=df['D'])
         expected_index = pd.MultiIndex(levels=[['All', 'one', 'three', 'two'],
                                                ['', 'A', 'B', 'C']],
-                                       labels=[[1, 1, 1, 2, 2, 2, 3, 3, 3, 0],
-                                               [1, 2, 3, 1, 2, 3, 1, 2, 3, 0]],
+                                       codes=[[1, 1, 1, 2, 2, 2, 3, 3, 3, 0],
+                                              [1, 2, 3, 1, 2, 3, 1, 2, 3, 0]],
                                        names=['A', 'B'])
         expected_column = pd.Index(['bar', 'foo', 'All'],
                                    dtype='object',

test_pivot_string_as_func above relies on aggfunc accepting a string or a list of strings; with a list, the function names become the outermost column level, which is why the expected MultiIndex gains a ['sum', 'mean'] level. A sketch with made-up data:

    import pandas as pd

    data = pd.DataFrame({'A': ['foo', 'foo', 'bar', 'bar'],
                         'B': ['one', 'two', 'one', 'two'],
                         'C': [1, 2, 3, 4]})

    t1 = pd.pivot_table(data, index='A', columns='B', aggfunc='sum')
    t2 = pd.pivot_table(data, index='A', columns='B', aggfunc=['sum', 'mean'])
    assert t2.columns.nlevels == t1.columns.nlevels + 1  # extra aggfunc level

diff --git a/pandas/tests/reshape/test_qcut.py b/pandas/tests/reshape/test_qcut.py
new file mode 100644
index 0000000000000..997df7fd7aa4c
--- /dev/null
+++ b/pandas/tests/reshape/test_qcut.py
@@ -0,0 +1,199 @@
+import os
+
+import numpy as np
+import pytest
+
+from pandas.compat import zip
+
+from pandas import (
+    Categorical, DatetimeIndex, Interval, IntervalIndex, NaT, Series,
+    TimedeltaIndex, Timestamp, cut,
date_range, isna, qcut, timedelta_range) +from pandas.api.types import CategoricalDtype as CDT +from pandas.core.algorithms import quantile +import pandas.util.testing as tm + +from pandas.tseries.offsets import Day, Nano + + +def test_qcut(): + arr = np.random.randn(1000) + + # We store the bins as Index that have been + # rounded, so comparisons are a bit tricky. + labels, bins = qcut(arr, 4, retbins=True) + ex_bins = quantile(arr, [0, .25, .5, .75, 1.]) + + result = labels.categories.left.values + assert np.allclose(result, ex_bins[:-1], atol=1e-2) + + result = labels.categories.right.values + assert np.allclose(result, ex_bins[1:], atol=1e-2) + + ex_levels = cut(arr, ex_bins, include_lowest=True) + tm.assert_categorical_equal(labels, ex_levels) + + +def test_qcut_bounds(): + arr = np.random.randn(1000) + + factor = qcut(arr, 10, labels=False) + assert len(np.unique(factor)) == 10 + + +def test_qcut_specify_quantiles(): + arr = np.random.randn(100) + factor = qcut(arr, [0, .25, .5, .75, 1.]) + + expected = qcut(arr, 4) + tm.assert_categorical_equal(factor, expected) + + +def test_qcut_all_bins_same(): + with pytest.raises(ValueError, match="edges.*unique"): + qcut([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 3) + + +def test_qcut_include_lowest(): + values = np.arange(10) + ii = qcut(values, 4) + + ex_levels = IntervalIndex([Interval(-0.001, 2.25), Interval(2.25, 4.5), + Interval(4.5, 6.75), Interval(6.75, 9)]) + tm.assert_index_equal(ii.categories, ex_levels) + + +def test_qcut_nas(): + arr = np.random.randn(100) + arr[:20] = np.nan + + result = qcut(arr, 4) + assert isna(result[:20]).all() + + +def test_qcut_index(): + result = qcut([0, 2], 2) + intervals = [Interval(-0.001, 1), Interval(1, 2)] + + expected = Categorical(intervals, ordered=True) + tm.assert_categorical_equal(result, expected) + + +def test_qcut_binning_issues(datapath): + # see gh-1978, gh-1979 + cut_file = datapath(os.path.join("reshape", "data", "cut_data.csv")) + arr = np.loadtxt(cut_file) + result = qcut(arr, 20) + + starts = [] + ends = [] + + for lev in np.unique(result): + s = lev.left + e = lev.right + assert s != e + + starts.append(float(s)) + ends.append(float(e)) + + for (sp, sn), (ep, en) in zip(zip(starts[:-1], starts[1:]), + zip(ends[:-1], ends[1:])): + assert sp < sn + assert ep < en + assert ep <= sn + + +def test_qcut_return_intervals(): + ser = Series([0, 1, 2, 3, 4, 5, 6, 7, 8]) + res = qcut(ser, [0, 0.333, 0.666, 1]) + + exp_levels = np.array([Interval(-0.001, 2.664), + Interval(2.664, 5.328), Interval(5.328, 8)]) + exp = Series(exp_levels.take([0, 0, 0, 1, 1, 1, 2, 2, 2])).astype( + CDT(ordered=True)) + tm.assert_series_equal(res, exp) + + +@pytest.mark.parametrize("kwargs,msg", [ + (dict(duplicates="drop"), None), + (dict(), "Bin edges must be unique"), + (dict(duplicates="raise"), "Bin edges must be unique"), + (dict(duplicates="foo"), "invalid value for 'duplicates' parameter") +]) +def test_qcut_duplicates_bin(kwargs, msg): + # see gh-7751 + values = [0, 0, 0, 0, 1, 2, 3] + + if msg is not None: + with pytest.raises(ValueError, match=msg): + qcut(values, 3, **kwargs) + else: + result = qcut(values, 3, **kwargs) + expected = IntervalIndex([Interval(-0.001, 1), Interval(1, 3)]) + tm.assert_index_equal(result.categories, expected) + + +@pytest.mark.parametrize("data,start,end", [ + (9.0, 8.999, 9.0), + (0.0, -0.001, 0.0), + (-9.0, -9.001, -9.0), +]) +@pytest.mark.parametrize("length", [1, 2]) +@pytest.mark.parametrize("labels", [None, False]) +def test_single_quantile(data, start, end, length, labels): + #
see gh-15431 + ser = Series([data] * length) + result = qcut(ser, 1, labels=labels) + + if labels is None: + intervals = IntervalIndex([Interval(start, end)] * + length, closed="right") + expected = Series(intervals).astype(CDT(ordered=True)) + else: + expected = Series([0] * length) + + tm.assert_series_equal(result, expected) + + +@pytest.mark.parametrize("ser", [ + Series(DatetimeIndex(["20180101", NaT, "20180103"])), + Series(TimedeltaIndex(["0 days", NaT, "2 days"]))], + ids=lambda x: str(x.dtype)) +def test_qcut_nat(ser): + # see gh-19768 + intervals = IntervalIndex.from_tuples([ + (ser[0] - Nano(), ser[2] - Day()), + np.nan, (ser[2] - Day(), ser[2])]) + expected = Series(Categorical(intervals, ordered=True)) + + result = qcut(ser, 2) + tm.assert_series_equal(result, expected) + + +@pytest.mark.parametrize("bins", [3, np.linspace(0, 1, 4)]) +def test_datetime_tz_qcut(bins): + # see gh-19872 + tz = "US/Eastern" + ser = Series(date_range("20130101", periods=3, tz=tz)) + + result = qcut(ser, bins) + expected = Series(IntervalIndex([ + Interval(Timestamp("2012-12-31 23:59:59.999999999", tz=tz), + Timestamp("2013-01-01 16:00:00", tz=tz)), + Interval(Timestamp("2013-01-01 16:00:00", tz=tz), + Timestamp("2013-01-02 08:00:00", tz=tz)), + Interval(Timestamp("2013-01-02 08:00:00", tz=tz), + Timestamp("2013-01-03 00:00:00", tz=tz))])).astype( + CDT(ordered=True)) + tm.assert_series_equal(result, expected) + + +@pytest.mark.parametrize("arg,expected_bins", [ + [timedelta_range("1day", periods=3), + TimedeltaIndex(["1 days", "2 days", "3 days"])], + [date_range("20180101", periods=3), + DatetimeIndex(["2018-01-01", "2018-01-02", "2018-01-03"])]]) +def test_date_like_qcut_bins(arg, expected_bins): + # see gh-19891 + ser = Series(arg) + result, result_bins = qcut(ser, 2, retbins=True) + tm.assert_index_equal(result_bins, expected_bins) diff --git a/pandas/tests/reshape/test_reshape.py b/pandas/tests/reshape/test_reshape.py index d8b3d9588f2f1..edbe70d308b96 100644 --- a/pandas/tests/reshape/test_reshape.py +++ b/pandas/tests/reshape/test_reshape.py @@ -5,6 +5,7 @@ from collections import OrderedDict from pandas import DataFrame, Series +from pandas.core.dtypes.common import is_integer_dtype from pandas.core.sparse.api import SparseDtype, SparseArray import pandas as pd @@ -54,23 +55,16 @@ def test_basic(self, sparse, dtype): 'b': [0, 1, 0], 'c': [0, 0, 1]}, dtype=self.effective_dtype(dtype)) - result = get_dummies(s_list, sparse=sparse, dtype=dtype) if sparse: - tm.assert_sp_frame_equal(result, - expected.to_sparse(kind='integer', - fill_value=0)) - else: - assert_frame_equal(result, expected) + expected = expected.apply(pd.SparseArray, fill_value=0.0) + result = get_dummies(s_list, sparse=sparse, dtype=dtype) + assert_frame_equal(result, expected) result = get_dummies(s_series, sparse=sparse, dtype=dtype) - if sparse: - expected = expected.to_sparse(kind='integer', fill_value=0) assert_frame_equal(result, expected) expected.index = list('ABC') result = get_dummies(s_series_index, sparse=sparse, dtype=dtype) - if sparse: - expected.to_sparse(kind='integer', fill_value=0) assert_frame_equal(result, expected) def test_basic_types(self, sparse, dtype): @@ -86,23 +80,27 @@ def test_basic_types(self, sparse, dtype): 'c': [0, 0, 1]}, dtype=self.effective_dtype(dtype), columns=list('abc')) - if not sparse: - compare = tm.assert_frame_equal - else: - expected = expected.to_sparse(fill_value=0, kind='integer') - compare = tm.assert_sp_frame_equal - + if sparse: + if is_integer_dtype(dtype): + fill_value = 0 
+ elif dtype == bool: + fill_value = False + else: + fill_value = 0.0 + + expected = expected.apply(SparseArray, fill_value=fill_value) result = get_dummies(s_list, sparse=sparse, dtype=dtype) - compare(result, expected) + tm.assert_frame_equal(result, expected) result = get_dummies(s_series, sparse=sparse, dtype=dtype) - compare(result, expected) + tm.assert_frame_equal(result, expected) result = get_dummies(s_df, columns=s_df.columns, sparse=sparse, dtype=dtype) if sparse: - dtype_name = 'Sparse[{}, 0]'.format( - self.effective_dtype(dtype).name + dtype_name = 'Sparse[{}, {}]'.format( + self.effective_dtype(dtype).name, + fill_value ) else: dtype_name = self.effective_dtype(dtype).name @@ -137,14 +135,13 @@ def test_just_na(self, sparse): assert res_series_index.index.tolist() == ['A'] def test_include_na(self, sparse, dtype): - if sparse: - pytest.xfail(reason='nan in index is problematic (GH 16894)') - s = ['a', 'b', np.nan] res = get_dummies(s, sparse=sparse, dtype=dtype) exp = DataFrame({'a': [1, 0, 0], 'b': [0, 1, 0]}, dtype=self.effective_dtype(dtype)) + if sparse: + exp = exp.apply(pd.SparseArray, fill_value=0.0) assert_frame_equal(res, exp) # Sparse dataframes do not allow nan labelled columns, see #GH8822 @@ -156,6 +153,8 @@ def test_include_na(self, sparse, dtype): exp_na = exp_na.reindex(['a', 'b', nan], axis=1) # hack (NaN handling in assert_index_equal) exp_na.columns = res_na.columns + if sparse: + exp_na = exp_na.apply(pd.SparseArray, fill_value=0.0) assert_frame_equal(res_na, exp_na) res_just_na = get_dummies([nan], dummy_na=True, @@ -175,10 +174,8 @@ def test_unicode(self, sparse): u('letter_%s') % eacute: [0, 1, 1]}, dtype=np.uint8) if sparse: - tm.assert_sp_frame_equal(res, exp.to_sparse(fill_value=0, - kind='integer')) - else: - assert_frame_equal(res, exp) + exp = exp.apply(pd.SparseArray, fill_value=0) + assert_frame_equal(res, exp) def test_dataframe_dummies_all_obj(self, df, sparse): df = df[['A', 'B']] @@ -189,16 +186,14 @@ def test_dataframe_dummies_all_obj(self, df, sparse): 'B_c': [0, 0, 1]}, dtype=np.uint8) if sparse: - expected = pd.SparseDataFrame({ + expected = pd.DataFrame({ "A_a": pd.SparseArray([1, 0, 1], dtype='uint8'), "A_b": pd.SparseArray([0, 1, 0], dtype='uint8'), "B_b": pd.SparseArray([1, 1, 0], dtype='uint8'), "B_c": pd.SparseArray([0, 0, 1], dtype='uint8'), }) - tm.assert_sp_frame_equal(result, expected) - else: - assert_frame_equal(result, expected) + assert_frame_equal(result, expected) def test_dataframe_dummies_mix_default(self, df, sparse, dtype): result = get_dummies(df, sparse=sparse, dtype=dtype) @@ -402,7 +397,7 @@ def test_basic_drop_first(self, sparse): result = get_dummies(s_list, drop_first=True, sparse=sparse) if sparse: - expected = expected.to_sparse(fill_value=0, kind='integer') + expected = expected.apply(pd.SparseArray, fill_value=0) assert_frame_equal(result, expected) result = get_dummies(s_series, drop_first=True, sparse=sparse) @@ -436,7 +431,7 @@ def test_basic_drop_first_NA(self, sparse): res = get_dummies(s_NA, drop_first=True, sparse=sparse) exp = DataFrame({'b': [0, 1, 0]}, dtype=np.uint8) if sparse: - exp = exp.to_sparse(fill_value=0, kind='integer') + exp = exp.apply(pd.SparseArray, fill_value=0) assert_frame_equal(res, exp) @@ -447,7 +442,7 @@ def test_basic_drop_first_NA(self, sparse): nan: [0, 0, 1]}, dtype=np.uint8).reindex(['b', nan], axis=1) if sparse: - exp_na = exp_na.to_sparse(fill_value=0, kind='integer') + exp_na = exp_na.apply(pd.SparseArray, fill_value=0) assert_frame_equal(res_na, exp_na) res_just_na = 
get_dummies([nan], dummy_na=True, drop_first=True, @@ -462,7 +457,7 @@ def test_dataframe_dummies_drop_first(self, df, sparse): 'B_c': [0, 0, 1]}, dtype=np.uint8) if sparse: - expected = expected.to_sparse(fill_value=0, kind='integer') + expected = expected.apply(pd.SparseArray, fill_value=0) assert_frame_equal(result, expected) def test_dataframe_dummies_drop_first_with_categorical( @@ -613,7 +608,7 @@ def test_preserve_categorical_dtype(self): for ordered in [False, True]: cidx = pd.CategoricalIndex(list("xyz"), ordered=ordered) midx = pd.MultiIndex(levels=[['a'], cidx], - labels=[[0, 0], [0, 1]]) + codes=[[0, 0], [0, 1]]) df = DataFrame([[10, 11]], index=midx) expected = DataFrame([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], diff --git a/pandas/tests/reshape/test_tile.py b/pandas/tests/reshape/test_tile.py deleted file mode 100644 index f04e9a55a6c8d..0000000000000 --- a/pandas/tests/reshape/test_tile.py +++ /dev/null @@ -1,643 +0,0 @@ -import os -import pytest - -import numpy as np -from pandas.compat import zip - -import pandas as pd -from pandas import (DataFrame, Series, isna, to_datetime, DatetimeIndex, Index, - Timestamp, Interval, IntervalIndex, Categorical, - cut, qcut, date_range, timedelta_range, NaT, - TimedeltaIndex) -from pandas.tseries.offsets import Nano, Day -import pandas.util.testing as tm -from pandas.api.types import CategoricalDtype as CDT - -from pandas.core.algorithms import quantile -import pandas.core.reshape.tile as tmod - - -class TestCut(object): - - def test_simple(self): - data = np.ones(5, dtype='int64') - result = cut(data, 4, labels=False) - expected = np.array([1, 1, 1, 1, 1]) - tm.assert_numpy_array_equal(result, expected, - check_dtype=False) - - def test_bins(self): - data = np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1]) - result, bins = cut(data, 3, retbins=True) - - intervals = IntervalIndex.from_breaks(bins.round(3)) - intervals = intervals.take([0, 0, 0, 1, 2, 0]) - expected = Categorical(intervals, ordered=True) - tm.assert_categorical_equal(result, expected) - tm.assert_almost_equal(bins, np.array([0.1905, 3.36666667, - 6.53333333, 9.7])) - - def test_right(self): - data = np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1, 2.575]) - result, bins = cut(data, 4, right=True, retbins=True) - intervals = IntervalIndex.from_breaks(bins.round(3)) - expected = Categorical(intervals, ordered=True) - expected = expected.take([0, 0, 0, 2, 3, 0, 0]) - tm.assert_categorical_equal(result, expected) - tm.assert_almost_equal(bins, np.array([0.1905, 2.575, 4.95, - 7.325, 9.7])) - - def test_noright(self): - data = np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1, 2.575]) - result, bins = cut(data, 4, right=False, retbins=True) - intervals = IntervalIndex.from_breaks(bins.round(3), closed='left') - intervals = intervals.take([0, 0, 0, 2, 3, 0, 1]) - expected = Categorical(intervals, ordered=True) - tm.assert_categorical_equal(result, expected) - tm.assert_almost_equal(bins, np.array([0.2, 2.575, 4.95, - 7.325, 9.7095])) - - def test_arraylike(self): - data = [.2, 1.4, 2.5, 6.2, 9.7, 2.1] - result, bins = cut(data, 3, retbins=True) - intervals = IntervalIndex.from_breaks(bins.round(3)) - intervals = intervals.take([0, 0, 0, 1, 2, 0]) - expected = Categorical(intervals, ordered=True) - tm.assert_categorical_equal(result, expected) - tm.assert_almost_equal(bins, np.array([0.1905, 3.36666667, - 6.53333333, 9.7])) - - def test_bins_from_intervalindex(self): - c = cut(range(5), 3) - expected = c - result = cut(range(5), bins=expected.categories) - tm.assert_categorical_equal(result, expected) - - expected 
= Categorical.from_codes(np.append(c.codes, -1), - categories=c.categories, - ordered=True) - result = cut(range(6), bins=expected.categories) - tm.assert_categorical_equal(result, expected) - - # doc example - # make sure we preserve the bins - ages = np.array([10, 15, 13, 12, 23, 25, 28, 59, 60]) - c = cut(ages, bins=[0, 18, 35, 70]) - expected = IntervalIndex.from_tuples([(0, 18), (18, 35), (35, 70)]) - tm.assert_index_equal(c.categories, expected) - - result = cut([25, 20, 50], bins=c.categories) - tm.assert_index_equal(result.categories, expected) - tm.assert_numpy_array_equal(result.codes, - np.array([1, 1, 2], dtype='int8')) - - def test_bins_not_monotonic(self): - data = [.2, 1.4, 2.5, 6.2, 9.7, 2.1] - pytest.raises(ValueError, cut, data, [0.1, 1.5, 1, 10]) - - def test_wrong_num_labels(self): - data = [.2, 1.4, 2.5, 6.2, 9.7, 2.1] - pytest.raises(ValueError, cut, data, [0, 1, 10], - labels=['foo', 'bar', 'baz']) - - def test_cut_corner(self): - # h3h - pytest.raises(ValueError, cut, [], 2) - - pytest.raises(ValueError, cut, [1, 2, 3], 0.5) - - @pytest.mark.parametrize('arg', [2, np.eye(2), DataFrame(np.eye(2))]) - @pytest.mark.parametrize('cut_func', [cut, qcut]) - def test_cut_not_1d_arg(self, arg, cut_func): - with pytest.raises(ValueError): - cut_func(arg, 2) - - def test_cut_out_of_range_more(self): - # #1511 - s = Series([0, -1, 0, 1, -3], name='x') - ind = cut(s, [0, 1], labels=False) - exp = Series([np.nan, np.nan, np.nan, 0, np.nan], name='x') - tm.assert_series_equal(ind, exp) - - def test_labels(self): - arr = np.tile(np.arange(0, 1.01, 0.1), 4) - - result, bins = cut(arr, 4, retbins=True) - ex_levels = IntervalIndex.from_breaks([-1e-3, 0.25, 0.5, 0.75, 1]) - tm.assert_index_equal(result.categories, ex_levels) - - result, bins = cut(arr, 4, retbins=True, right=False) - ex_levels = IntervalIndex.from_breaks([0, 0.25, 0.5, 0.75, 1 + 1e-3], - closed='left') - tm.assert_index_equal(result.categories, ex_levels) - - def test_cut_pass_series_name_to_factor(self): - s = Series(np.random.randn(100), name='foo') - - factor = cut(s, 4) - assert factor.name == 'foo' - - def test_label_precision(self): - arr = np.arange(0, 0.73, 0.01) - - result = cut(arr, 4, precision=2) - ex_levels = IntervalIndex.from_breaks([-0.00072, 0.18, 0.36, - 0.54, 0.72]) - tm.assert_index_equal(result.categories, ex_levels) - - def test_na_handling(self): - arr = np.arange(0, 0.75, 0.01) - arr[::3] = np.nan - - result = cut(arr, 4) - - result_arr = np.asarray(result) - - ex_arr = np.where(isna(arr), np.nan, result_arr) - - tm.assert_almost_equal(result_arr, ex_arr) - - result = cut(arr, 4, labels=False) - ex_result = np.where(isna(arr), np.nan, result) - tm.assert_almost_equal(result, ex_result) - - def test_inf_handling(self): - data = np.arange(6) - data_ser = Series(data, dtype='int64') - - bins = [-np.inf, 2, 4, np.inf] - result = cut(data, bins) - result_ser = cut(data_ser, bins) - - ex_uniques = IntervalIndex.from_breaks(bins) - tm.assert_index_equal(result.categories, ex_uniques) - assert result[5] == Interval(4, np.inf) - assert result[0] == Interval(-np.inf, 2) - assert result_ser[5] == Interval(4, np.inf) - assert result_ser[0] == Interval(-np.inf, 2) - - def test_qcut(self): - arr = np.random.randn(1000) - - # We store the bins as Index that have been rounded - # to comparisons are a bit tricky. 
- labels, bins = qcut(arr, 4, retbins=True) - ex_bins = quantile(arr, [0, .25, .5, .75, 1.]) - result = labels.categories.left.values - assert np.allclose(result, ex_bins[:-1], atol=1e-2) - result = labels.categories.right.values - assert np.allclose(result, ex_bins[1:], atol=1e-2) - - ex_levels = cut(arr, ex_bins, include_lowest=True) - tm.assert_categorical_equal(labels, ex_levels) - - def test_qcut_bounds(self): - arr = np.random.randn(1000) - - factor = qcut(arr, 10, labels=False) - assert len(np.unique(factor)) == 10 - - def test_qcut_specify_quantiles(self): - arr = np.random.randn(100) - - factor = qcut(arr, [0, .25, .5, .75, 1.]) - expected = qcut(arr, 4) - tm.assert_categorical_equal(factor, expected) - - def test_qcut_all_bins_same(self): - with pytest.raises(ValueError, match="edges.*unique"): - qcut([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 3) - - def test_cut_out_of_bounds(self): - arr = np.random.randn(100) - - result = cut(arr, [-1, 0, 1]) - - mask = isna(result) - ex_mask = (arr < -1) | (arr > 1) - tm.assert_numpy_array_equal(mask, ex_mask) - - def test_cut_pass_labels(self): - arr = [50, 5, 10, 15, 20, 30, 70] - bins = [0, 25, 50, 100] - labels = ['Small', 'Medium', 'Large'] - - result = cut(arr, bins, labels=labels) - exp = Categorical(['Medium'] + 4 * ['Small'] + ['Medium', 'Large'], - categories=labels, - ordered=True) - tm.assert_categorical_equal(result, exp) - - result = cut(arr, bins, labels=Categorical.from_codes([0, 1, 2], - labels)) - exp = Categorical.from_codes([1] + 4 * [0] + [1, 2], labels) - tm.assert_categorical_equal(result, exp) - - # issue 16459 - labels = ['Good', 'Medium', 'Bad'] - result = cut(arr, 3, labels=labels) - exp = cut(arr, 3, labels=Categorical(labels, categories=labels, - ordered=True)) - tm.assert_categorical_equal(result, exp) - - def test_qcut_include_lowest(self): - values = np.arange(10) - - ii = qcut(values, 4) - - ex_levels = IntervalIndex( - [Interval(-0.001, 2.25), - Interval(2.25, 4.5), - Interval(4.5, 6.75), - Interval(6.75, 9)]) - tm.assert_index_equal(ii.categories, ex_levels) - - def test_qcut_nas(self): - arr = np.random.randn(100) - arr[:20] = np.nan - - result = qcut(arr, 4) - assert isna(result[:20]).all() - - def test_qcut_index(self): - result = qcut([0, 2], 2) - intervals = [Interval(-0.001, 1), Interval(1, 2)] - expected = Categorical(intervals, ordered=True) - tm.assert_categorical_equal(result, expected) - - def test_round_frac(self): - # it works - result = cut(np.arange(11.), 2) - - result = cut(np.arange(11.) 
/ 1e10, 2) - - # #1979, negative numbers - - result = tmod._round_frac(-117.9998, precision=3) - assert result == -118 - result = tmod._round_frac(117.9998, precision=3) - assert result == 118 - - result = tmod._round_frac(117.9998, precision=2) - assert result == 118 - result = tmod._round_frac(0.000123456, precision=2) - assert result == 0.00012 - - def test_qcut_binning_issues(self, datapath): - # #1978, 1979 - cut_file = datapath(os.path.join('reshape', 'data', 'cut_data.csv')) - arr = np.loadtxt(cut_file) - - result = qcut(arr, 20) - - starts = [] - ends = [] - for lev in np.unique(result): - s = lev.left - e = lev.right - assert s != e - - starts.append(float(s)) - ends.append(float(e)) - - for (sp, sn), (ep, en) in zip(zip(starts[:-1], starts[1:]), - zip(ends[:-1], ends[1:])): - assert sp < sn - assert ep < en - assert ep <= sn - - def test_cut_return_intervals(self): - s = Series([0, 1, 2, 3, 4, 5, 6, 7, 8]) - res = cut(s, 3) - exp_bins = np.linspace(0, 8, num=4).round(3) - exp_bins[0] -= 0.008 - exp = Series(IntervalIndex.from_breaks(exp_bins, closed='right').take( - [0, 0, 0, 1, 1, 1, 2, 2, 2])).astype(CDT(ordered=True)) - tm.assert_series_equal(res, exp) - - def test_qcut_return_intervals(self): - s = Series([0, 1, 2, 3, 4, 5, 6, 7, 8]) - res = qcut(s, [0, 0.333, 0.666, 1]) - exp_levels = np.array([Interval(-0.001, 2.664), - Interval(2.664, 5.328), Interval(5.328, 8)]) - exp = Series(exp_levels.take([0, 0, 0, 1, 1, 1, 2, 2, 2])).astype( - CDT(ordered=True)) - tm.assert_series_equal(res, exp) - - def test_series_retbins(self): - # GH 8589 - s = Series(np.arange(4)) - result, bins = cut(s, 2, retbins=True) - expected = Series(IntervalIndex.from_breaks( - [-0.003, 1.5, 3], closed='right').repeat(2)).astype( - CDT(ordered=True)) - tm.assert_series_equal(result, expected) - - result, bins = qcut(s, 2, retbins=True) - expected = Series(IntervalIndex.from_breaks( - [-0.001, 1.5, 3], closed='right').repeat(2)).astype( - CDT(ordered=True)) - tm.assert_series_equal(result, expected) - - def test_cut_duplicates_bin(self): - # issue 20947 - values = Series(np.array([1, 3, 5, 7, 9]), - index=["a", "b", "c", "d", "e"]) - bins = [0, 2, 4, 6, 10, 10] - result = cut(values, bins, duplicates='drop') - expected = cut(values, pd.unique(bins)) - tm.assert_series_equal(result, expected) - - pytest.raises(ValueError, cut, values, bins) - pytest.raises(ValueError, cut, values, bins, duplicates='raise') - - # invalid - pytest.raises(ValueError, cut, values, bins, duplicates='foo') - - def test_qcut_duplicates_bin(self): - # GH 7751 - values = [0, 0, 0, 0, 1, 2, 3] - expected = IntervalIndex([Interval(-0.001, 1), Interval(1, 3)]) - - result = qcut(values, 3, duplicates='drop') - tm.assert_index_equal(result.categories, expected) - - pytest.raises(ValueError, qcut, values, 3) - pytest.raises(ValueError, qcut, values, 3, duplicates='raise') - - # invalid - pytest.raises(ValueError, qcut, values, 3, duplicates='foo') - - def test_single_quantile(self): - # issue 15431 - expected = Series([0, 0]) - - s = Series([9., 9.]) - result = qcut(s, 1, labels=False) - tm.assert_series_equal(result, expected) - result = qcut(s, 1) - intervals = IntervalIndex([Interval(8.999, 9.0), - Interval(8.999, 9.0)], closed='right') - expected = Series(intervals).astype(CDT(ordered=True)) - tm.assert_series_equal(result, expected) - - s = Series([-9., -9.]) - expected = Series([0, 0]) - result = qcut(s, 1, labels=False) - tm.assert_series_equal(result, expected) - result = qcut(s, 1) - intervals = IntervalIndex([Interval(-9.001, 
-9.0), - Interval(-9.001, -9.0)], closed='right') - expected = Series(intervals).astype(CDT(ordered=True)) - tm.assert_series_equal(result, expected) - - s = Series([0., 0.]) - expected = Series([0, 0]) - result = qcut(s, 1, labels=False) - tm.assert_series_equal(result, expected) - result = qcut(s, 1) - intervals = IntervalIndex([Interval(-0.001, 0.0), - Interval(-0.001, 0.0)], closed='right') - expected = Series(intervals).astype(CDT(ordered=True)) - tm.assert_series_equal(result, expected) - - s = Series([9]) - expected = Series([0]) - result = qcut(s, 1, labels=False) - tm.assert_series_equal(result, expected) - result = qcut(s, 1) - intervals = IntervalIndex([Interval(8.999, 9.0)], closed='right') - expected = Series(intervals).astype(CDT(ordered=True)) - tm.assert_series_equal(result, expected) - - s = Series([-9]) - expected = Series([0]) - result = qcut(s, 1, labels=False) - tm.assert_series_equal(result, expected) - result = qcut(s, 1) - intervals = IntervalIndex([Interval(-9.001, -9.0)], closed='right') - expected = Series(intervals).astype(CDT(ordered=True)) - tm.assert_series_equal(result, expected) - - s = Series([0]) - expected = Series([0]) - result = qcut(s, 1, labels=False) - tm.assert_series_equal(result, expected) - result = qcut(s, 1) - intervals = IntervalIndex([Interval(-0.001, 0.0)], closed='right') - expected = Series(intervals).astype(CDT(ordered=True)) - tm.assert_series_equal(result, expected) - - def test_single_bin(self): - # issue 14652 - expected = Series([0, 0]) - - s = Series([9., 9.]) - result = cut(s, 1, labels=False) - tm.assert_series_equal(result, expected) - - s = Series([-9., -9.]) - result = cut(s, 1, labels=False) - tm.assert_series_equal(result, expected) - - expected = Series([0]) - - s = Series([9]) - result = cut(s, 1, labels=False) - tm.assert_series_equal(result, expected) - - s = Series([-9]) - result = cut(s, 1, labels=False) - tm.assert_series_equal(result, expected) - - # issue 15428 - expected = Series([0, 0]) - - s = Series([0., 0.]) - result = cut(s, 1, labels=False) - tm.assert_series_equal(result, expected) - - expected = Series([0]) - - s = Series([0]) - result = cut(s, 1, labels=False) - tm.assert_series_equal(result, expected) - - @pytest.mark.parametrize( - "array_1_writeable, array_2_writeable", - [(True, True), (True, False), (False, False)]) - def test_cut_read_only(self, array_1_writeable, array_2_writeable): - # issue 18773 - array_1 = np.arange(0, 100, 10) - array_1.flags.writeable = array_1_writeable - - array_2 = np.arange(0, 100, 10) - array_2.flags.writeable = array_2_writeable - - hundred_elements = np.arange(100) - - tm.assert_categorical_equal(cut(hundred_elements, array_1), - cut(hundred_elements, array_2)) - - -class TestDatelike(object): - - @pytest.mark.parametrize('s', [ - Series(DatetimeIndex(['20180101', NaT, '20180103'])), - Series(TimedeltaIndex(['0 days', NaT, '2 days']))], - ids=lambda x: str(x.dtype)) - def test_qcut_nat(self, s): - # GH 19768 - intervals = IntervalIndex.from_tuples( - [(s[0] - Nano(), s[2] - Day()), np.nan, (s[2] - Day(), s[2])]) - expected = Series(Categorical(intervals, ordered=True)) - result = qcut(s, 2) - tm.assert_series_equal(result, expected) - - def test_datetime_cut(self): - # GH 14714 - # testing for time data to be present as series - data = to_datetime(Series(['2013-01-01', '2013-01-02', '2013-01-03'])) - - result, bins = cut(data, 3, retbins=True) - expected = ( - Series(IntervalIndex([ - Interval(Timestamp('2012-12-31 23:57:07.200000'), - Timestamp('2013-01-01 16:00:00')), 
- Interval(Timestamp('2013-01-01 16:00:00'), - Timestamp('2013-01-02 08:00:00')), - Interval(Timestamp('2013-01-02 08:00:00'), - Timestamp('2013-01-03 00:00:00'))])) - .astype(CDT(ordered=True))) - - tm.assert_series_equal(result, expected) - - # testing for time data to be present as list - data = [np.datetime64('2013-01-01'), np.datetime64('2013-01-02'), - np.datetime64('2013-01-03')] - result, bins = cut(data, 3, retbins=True) - tm.assert_series_equal(Series(result), expected) - - # testing for time data to be present as ndarray - data = np.array([np.datetime64('2013-01-01'), - np.datetime64('2013-01-02'), - np.datetime64('2013-01-03')]) - result, bins = cut(data, 3, retbins=True) - tm.assert_series_equal(Series(result), expected) - - # testing for time data to be present as datetime index - data = DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03']) - result, bins = cut(data, 3, retbins=True) - tm.assert_series_equal(Series(result), expected) - - @pytest.mark.parametrize('bins', [ - 3, [Timestamp('2013-01-01 04:57:07.200000'), - Timestamp('2013-01-01 21:00:00'), - Timestamp('2013-01-02 13:00:00'), - Timestamp('2013-01-03 05:00:00')]]) - @pytest.mark.parametrize('box', [list, np.array, Index, Series]) - def test_datetimetz_cut(self, bins, box): - # GH 19872 - tz = 'US/Eastern' - s = Series(date_range('20130101', periods=3, tz=tz)) - if not isinstance(bins, int): - bins = box(bins) - result = cut(s, bins) - expected = ( - Series(IntervalIndex([ - Interval(Timestamp('2012-12-31 23:57:07.200000', tz=tz), - Timestamp('2013-01-01 16:00:00', tz=tz)), - Interval(Timestamp('2013-01-01 16:00:00', tz=tz), - Timestamp('2013-01-02 08:00:00', tz=tz)), - Interval(Timestamp('2013-01-02 08:00:00', tz=tz), - Timestamp('2013-01-03 00:00:00', tz=tz))])) - .astype(CDT(ordered=True))) - tm.assert_series_equal(result, expected) - - @pytest.mark.parametrize('bins', [3, np.linspace(0, 1, 4)]) - def test_datetimetz_qcut(self, bins): - # GH 19872 - tz = 'US/Eastern' - s = Series(date_range('20130101', periods=3, tz=tz)) - result = qcut(s, bins) - expected = ( - Series(IntervalIndex([ - Interval(Timestamp('2012-12-31 23:59:59.999999999', tz=tz), - Timestamp('2013-01-01 16:00:00', tz=tz)), - Interval(Timestamp('2013-01-01 16:00:00', tz=tz), - Timestamp('2013-01-02 08:00:00', tz=tz)), - Interval(Timestamp('2013-01-02 08:00:00', tz=tz), - Timestamp('2013-01-03 00:00:00', tz=tz))])) - .astype(CDT(ordered=True))) - tm.assert_series_equal(result, expected) - - def test_datetime_bin(self): - data = [np.datetime64('2012-12-13'), np.datetime64('2012-12-15')] - bin_data = ['2012-12-12', '2012-12-14', '2012-12-16'] - expected = ( - Series(IntervalIndex([ - Interval(Timestamp(bin_data[0]), Timestamp(bin_data[1])), - Interval(Timestamp(bin_data[1]), Timestamp(bin_data[2]))])) - .astype(CDT(ordered=True))) - - for conv in [Timestamp, Timestamp, np.datetime64]: - bins = [conv(v) for v in bin_data] - result = cut(data, bins=bins) - tm.assert_series_equal(Series(result), expected) - - bin_pydatetime = [Timestamp(v).to_pydatetime() for v in bin_data] - result = cut(data, bins=bin_pydatetime) - tm.assert_series_equal(Series(result), expected) - - bins = to_datetime(bin_data) - result = cut(data, bins=bin_pydatetime) - tm.assert_series_equal(Series(result), expected) - - def test_datetime_nan(self): - - def f(): - cut(date_range('20130101', periods=3), bins=[0, 2, 4]) - pytest.raises(ValueError, f) - - result = cut(date_range('20130102', periods=5), - bins=date_range('20130101', periods=2)) - mask = result.categories.isna() - 
tm.assert_numpy_array_equal(mask, np.array([False])) - mask = result.isna() - tm.assert_numpy_array_equal( - mask, np.array([False, True, True, True, True])) - - @pytest.mark.parametrize('tz', [None, 'UTC', 'US/Pacific']) - def test_datetime_cut_roundtrip(self, tz): - # GH 19891 - s = Series(date_range('20180101', periods=3, tz=tz)) - result, result_bins = cut(s, 2, retbins=True) - expected = cut(s, result_bins) - tm.assert_series_equal(result, expected) - expected_bins = DatetimeIndex(['2017-12-31 23:57:07.200000', - '2018-01-02 00:00:00', - '2018-01-03 00:00:00']) - expected_bins = expected_bins.tz_localize(tz) - tm.assert_index_equal(result_bins, expected_bins) - - def test_timedelta_cut_roundtrip(self): - # GH 19891 - s = Series(timedelta_range('1day', periods=3)) - result, result_bins = cut(s, 2, retbins=True) - expected = cut(s, result_bins) - tm.assert_series_equal(result, expected) - expected_bins = TimedeltaIndex(['0 days 23:57:07.200000', - '2 days 00:00:00', - '3 days 00:00:00']) - tm.assert_index_equal(result_bins, expected_bins) - - @pytest.mark.parametrize('arg, expected_bins', [ - [timedelta_range('1day', periods=3), - TimedeltaIndex(['1 days', '2 days', '3 days'])], - [date_range('20180101', periods=3), - DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03'])]]) - def test_datelike_qcut_bins(self, arg, expected_bins): - # GH 19891 - s = Series(arg) - result, result_bins = qcut(s, 2, retbins=True) - tm.assert_index_equal(result_bins, expected_bins) diff --git a/pandas/tests/scalar/period/test_asfreq.py b/pandas/tests/scalar/period/test_asfreq.py index 064d1a96878c2..f46f2da6c076d 100644 --- a/pandas/tests/scalar/period/test_asfreq.py +++ b/pandas/tests/scalar/period/test_asfreq.py @@ -5,7 +5,6 @@ from pandas.errors import OutOfBoundsDatetime from pandas import Period, offsets -from pandas.util import testing as tm class TestFreqConversion(object): @@ -16,17 +15,15 @@ def test_asfreq_near_zero(self, freq): per = Period('0001-01-01', freq=freq) tup1 = (per.year, per.hour, per.day) - with tm.assert_produces_warning(FutureWarning): - prev = per - 1 + prev = per - 1 assert prev.ordinal == per.ordinal - 1 tup2 = (prev.year, prev.month, prev.day) assert tup2 < tup1 def test_asfreq_near_zero_weekly(self): # GH#19834 - with tm.assert_produces_warning(FutureWarning): - per1 = Period('0001-01-01', 'D') + 6 - per2 = Period('0001-01-01', 'D') - 6 + per1 = Period('0001-01-01', 'D') + 6 + per2 = Period('0001-01-01', 'D') - 6 week1 = per1.asfreq('W') week2 = per2.asfreq('W') assert week1 != week2 @@ -34,8 +31,7 @@ def test_asfreq_near_zero_weekly(self): assert week2.asfreq('D', 'S') <= per2 @pytest.mark.xfail(reason='GH#19643 period_helper asfreq functions fail ' - 'to check for overflows', - strict=True) + 'to check for overflows') def test_to_timestamp_out_of_bounds(self): # GH#19643, currently gives Timestamp('1754-08-30 22:43:41.128654848') per = Period('0001-01-01', freq='B') diff --git a/pandas/tests/scalar/period/test_period.py b/pandas/tests/scalar/period/test_period.py index 4d3aa1109c120..d0f87618ad3af 100644 --- a/pandas/tests/scalar/period/test_period.py +++ b/pandas/tests/scalar/period/test_period.py @@ -303,11 +303,10 @@ def test_multiples(self): assert result1.freq == offsets.YearEnd(2) assert result2.freq == offsets.YearEnd() - with tm.assert_produces_warning(FutureWarning): - assert (result1 + 1).ordinal == result1.ordinal + 2 - assert (1 + result1).ordinal == result1.ordinal + 2 - assert (result1 - 1).ordinal == result2.ordinal - 2 - assert (-1 + result1).ordinal == 
result2.ordinal - 2 + assert (result1 + 1).ordinal == result1.ordinal + 2 + assert (1 + result1).ordinal == result1.ordinal + 2 + assert (result1 - 1).ordinal == result2.ordinal - 2 + assert (-1 + result1).ordinal == result2.ordinal - 2 @pytest.mark.parametrize('month', MONTHS) def test_period_cons_quarterly(self, month): @@ -331,8 +330,7 @@ def test_period_cons_annual(self, month): stamp = exp.to_timestamp('D', how='end') + timedelta(days=30) p = Period(stamp, freq=freq) - with tm.assert_produces_warning(FutureWarning): - assert p == exp + 1 + assert p == exp + 1 assert isinstance(p, Period) @pytest.mark.parametrize('day', DAYS) @@ -385,16 +383,14 @@ def test_period_cons_mult(self): assert p2.freq == offsets.MonthEnd() assert p2.freqstr == 'M' - with tm.assert_produces_warning(FutureWarning): - result = p1 + 1 - assert result.ordinal == (p2 + 3).ordinal + result = p1 + 1 + assert result.ordinal == (p2 + 3).ordinal assert result.freq == p1.freq assert result.freqstr == '3M' - with tm.assert_produces_warning(FutureWarning): - result = p1 - 1 - assert result.ordinal == (p2 - 3).ordinal + result = p1 - 1 + assert result.ordinal == (p2 - 3).ordinal assert result.freq == p1.freq assert result.freqstr == '3M' @@ -428,27 +424,23 @@ def test_period_cons_combined(self): assert p3.freq == offsets.Hour() assert p3.freqstr == 'H' - with tm.assert_produces_warning(FutureWarning): - result = p1 + 1 - assert result.ordinal == (p3 + 25).ordinal + result = p1 + 1 + assert result.ordinal == (p3 + 25).ordinal assert result.freq == p1.freq assert result.freqstr == '25H' - with tm.assert_produces_warning(FutureWarning): - result = p2 + 1 - assert result.ordinal == (p3 + 25).ordinal + result = p2 + 1 + assert result.ordinal == (p3 + 25).ordinal assert result.freq == p2.freq assert result.freqstr == '25H' - with tm.assert_produces_warning(FutureWarning): - result = p1 - 1 - assert result.ordinal == (p3 - 25).ordinal + result = p1 - 1 + assert result.ordinal == (p3 - 25).ordinal assert result.freq == p1.freq assert result.freqstr == '25H' - with tm.assert_produces_warning(FutureWarning): - result = p2 - 1 - assert result.ordinal == (p3 - 25).ordinal + result = p2 - 1 + assert result.ordinal == (p3 - 25).ordinal assert result.freq == p2.freq assert result.freqstr == '25H' @@ -803,16 +795,14 @@ def test_properties_quarterly(self): # for x in range(3): for qd in (qedec_date, qejan_date, qejun_date): - with tm.assert_produces_warning(FutureWarning): - assert (qd + x).qyear == 2007 - assert (qd + x).quarter == x + 1 + assert (qd + x).qyear == 2007 + assert (qd + x).quarter == x + 1 def test_properties_monthly(self): # Test properties on Periods with monthly frequency.
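# A quick illustrative sketch of the Period + integer arithmetic relied on # below (behavior per the Period API, not introduced by this change): adding # an integer shifts a Period by that many units of its own frequency, e.g. # Period(freq='M', year=2007, month=1) + 11 == Period('2007-12', freq='M'), # so x in range(11) walks month by month across 2007.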
m_date = Period(freq='M', year=2007, month=1) for x in range(11): - with tm.assert_produces_warning(FutureWarning): - m_ival_x = m_date + x + m_ival_x = m_date + x assert m_ival_x.year == 2007 if 1 <= x + 1 <= 3: assert m_ival_x.quarter == 1 @@ -832,8 +822,7 @@ def test_properties_weekly(self): assert w_date.quarter == 1 assert w_date.month == 1 assert w_date.week == 1 - with tm.assert_produces_warning(FutureWarning): - assert (w_date - 1).week == 52 + assert (w_date - 1).week == 52 assert w_date.days_in_month == 31 assert Period(freq='W', year=2012, month=2, day=1).days_in_month == 29 @@ -845,8 +834,7 @@ def test_properties_weekly_legacy(self): assert w_date.quarter == 1 assert w_date.month == 1 assert w_date.week == 1 - with tm.assert_produces_warning(FutureWarning): - assert (w_date - 1).week == 52 + assert (w_date - 1).week == 52 assert w_date.days_in_month == 31 exp = Period(freq='W', year=2012, month=2, day=1) @@ -1039,9 +1027,8 @@ def test_sub_delta(self): def test_add_integer(self): per1 = Period(freq='D', year=2008, month=1, day=1) per2 = Period(freq='D', year=2008, month=1, day=2) - with tm.assert_produces_warning(FutureWarning): - assert per1 + 1 == per2 - assert 1 + per1 == per2 + assert per1 + 1 == per2 + assert 1 + per1 == per2 def test_add_sub_nat(self): # GH#13071 @@ -1106,6 +1093,38 @@ def test_sub(self): with pytest.raises(period.IncompatibleFrequency, match=msg): per1 - Period('2011-02', freq='M') + @pytest.mark.parametrize('n', [1, 2, 3, 4]) + def test_sub_n_gt_1_ticks(self, tick_classes, n): + # GH 23878 + p1 = pd.Period('19910905', freq=tick_classes(n)) + p2 = pd.Period('19920406', freq=tick_classes(n)) + + expected = (pd.Period(str(p2), freq=p2.freq.base) + - pd.Period(str(p1), freq=p1.freq.base)) + + assert (p2 - p1) == expected + + @pytest.mark.parametrize('normalize', [True, False]) + @pytest.mark.parametrize('n', [1, 2, 3, 4]) + @pytest.mark.parametrize('offset, kwd_name', [ + (pd.offsets.YearEnd, 'month'), + (pd.offsets.QuarterEnd, 'startingMonth'), + (pd.offsets.MonthEnd, None), + (pd.offsets.Week, 'weekday') + ]) + def test_sub_n_gt_1_offsets(self, offset, kwd_name, n, normalize): + # GH 23878 + kwds = {kwd_name: 3} if kwd_name is not None else {} + p1_d = '19910905' + p2_d = '19920406' + p1 = pd.Period(p1_d, freq=offset(n, normalize, **kwds)) + p2 = pd.Period(p2_d, freq=offset(n, normalize, **kwds)) + + expected = (pd.Period(p2_d, freq=p2.freq.base) + - pd.Period(p1_d, freq=p1.freq.base)) + + assert (p2 - p1) == expected + def test_add_offset(self): # freq is DateOffset for freq in ['A', '2A', '3A']: @@ -1468,7 +1487,8 @@ def test_period_immutable(): # TODO: This doesn't fail on all systems; track down which -@pytest.mark.xfail(reason="Parses as Jan 1, 0007 on some systems") +@pytest.mark.xfail(reason="Parses as Jan 1, 0007 on some systems", + strict=False) def test_small_year_parsing(): per1 = Period('0001-01-07', 'D') assert per1.year == 1 diff --git a/pandas/tests/scalar/test_nat.py b/pandas/tests/scalar/test_nat.py index ddf3984744114..abf95b276cda1 100644 --- a/pandas/tests/scalar/test_nat.py +++ b/pandas/tests/scalar/test_nat.py @@ -4,25 +4,25 @@ import pytest import pytz -from pandas._libs.tslib import iNaT +from pandas._libs.tslibs import iNaT +import pandas.compat as compat from pandas import ( DatetimeIndex, Index, NaT, Period, Series, Timedelta, TimedeltaIndex, - Timestamp, isna) + Timestamp) from pandas.core.arrays import PeriodArray from pandas.util import testing as tm -@pytest.mark.parametrize('nat, idx', [(Timestamp('NaT'), DatetimeIndex), - 
(Timedelta('NaT'), TimedeltaIndex), - (Period('NaT', freq='M'), PeriodArray)]) +@pytest.mark.parametrize("nat,idx", [(Timestamp("NaT"), DatetimeIndex), + (Timedelta("NaT"), TimedeltaIndex), + (Period("NaT", freq="M"), PeriodArray)]) def test_nat_fields(nat, idx): for field in idx._field_ops: - # weekday is a property of DTI, but a method # on NaT/Timestamp for compat with datetime - if field == 'weekday': + if field == "weekday": continue result = getattr(NaT, field) @@ -41,289 +41,301 @@ def test_nat_fields(nat, idx): def test_nat_vector_field_access(): - idx = DatetimeIndex(['1/1/2000', None, None, '1/4/2000']) + idx = DatetimeIndex(["1/1/2000", None, None, "1/4/2000"]) for field in DatetimeIndex._field_ops: # weekday is a property of DTI, but a method # on NaT/Timestamp for compat with datetime - if field == 'weekday': + if field == "weekday": continue result = getattr(idx, field) expected = Index([getattr(x, field) for x in idx]) tm.assert_index_equal(result, expected) - s = Series(idx) + ser = Series(idx) for field in DatetimeIndex._field_ops: - # weekday is a property of DTI, but a method # on NaT/Timestamp for compat with datetime - if field == 'weekday': + if field == "weekday": continue - result = getattr(s.dt, field) + result = getattr(ser.dt, field) expected = [getattr(x, field) for x in idx] tm.assert_series_equal(result, Series(expected)) for field in DatetimeIndex._bool_ops: - result = getattr(s.dt, field) + result = getattr(ser.dt, field) expected = [getattr(x, field) for x in idx] tm.assert_series_equal(result, Series(expected)) -@pytest.mark.parametrize('klass', [Timestamp, Timedelta, Period]) -def test_identity(klass): - assert klass(None) is NaT - - result = klass(np.nan) - assert result is NaT - - result = klass(None) - assert result is NaT - - result = klass(iNaT) - assert result is NaT - - result = klass(np.nan) - assert result is NaT - - result = klass(float('nan')) - assert result is NaT - - result = klass(NaT) - assert result is NaT - - result = klass('NaT') - assert result is NaT - - assert isna(klass('nat')) - - -@pytest.mark.parametrize('klass', [Timestamp, Timedelta, Period]) -def test_equality(klass): - - # nat - if klass is not Period: - klass('').value == iNaT - klass('nat').value == iNaT - klass('NAT').value == iNaT - klass(None).value == iNaT - klass(np.nan).value == iNaT - assert isna(klass('nat')) - - -@pytest.mark.parametrize('klass', [Timestamp, Timedelta]) -def test_round_nat(klass): - # GH14940 - ts = klass('nat') - for method in ["round", "floor", "ceil"]: - round_method = getattr(ts, method) - for freq in ["s", "5s", "min", "5min", "h", "5h"]: - assert round_method(freq) is ts - - -def test_NaT_methods(): - # GH 9513 - # GH 17329 for `timestamp` - raise_methods = ['astimezone', 'combine', 'ctime', 'dst', - 'fromordinal', 'fromtimestamp', 'isocalendar', - 'strftime', 'strptime', 'time', 'timestamp', - 'timetuple', 'timetz', 'toordinal', 'tzname', - 'utcfromtimestamp', 'utcnow', 'utcoffset', - 'utctimetuple', 'timestamp'] - nat_methods = ['date', 'now', 'replace', 'to_datetime', 'today', - 'tz_convert', 'tz_localize'] - nan_methods = ['weekday', 'isoweekday'] +@pytest.mark.parametrize("klass", [Timestamp, Timedelta, Period]) +@pytest.mark.parametrize("value", [None, np.nan, iNaT, float("nan"), + NaT, "NaT", "nat"]) +def test_identity(klass, value): + assert klass(value) is NaT + + +@pytest.mark.parametrize("klass", [Timestamp, Timedelta, Period]) +@pytest.mark.parametrize("value", ["", "nat", "NAT", None, np.nan]) +def test_equality(klass, value): + 
if klass is Period and value == "": + pytest.skip("Period cannot parse empty string") + + assert klass(value).value == iNaT + + +@pytest.mark.parametrize("klass", [Timestamp, Timedelta]) +@pytest.mark.parametrize("method", ["round", "floor", "ceil"]) +@pytest.mark.parametrize("freq", ["s", "5s", "min", "5min", "h", "5h"]) +def test_round_nat(klass, method, freq): + # see gh-14940 + ts = klass("nat") + + round_method = getattr(ts, method) + assert round_method(freq) is ts + + +@pytest.mark.parametrize("method", [ + "astimezone", "combine", "ctime", "dst", "fromordinal", + "fromtimestamp", "isocalendar", "strftime", "strptime", + "time", "timestamp", "timetuple", "timetz", "toordinal", + "tzname", "utcfromtimestamp", "utcnow", "utcoffset", + "utctimetuple", "timestamp" +]) +def test_nat_methods_raise(method): + # see gh-9513, gh-17329 + msg = "NaTType does not support {method}".format(method=method) + + with pytest.raises(ValueError, match=msg): + getattr(NaT, method)() + + +@pytest.mark.parametrize("method", [ + "weekday", "isoweekday" +]) +def test_nat_methods_nan(method): + # see gh-9513, gh-17329 + assert np.isnan(getattr(NaT, method)()) + + +@pytest.mark.parametrize("method", [ + "date", "now", "replace", "today", + "tz_convert", "tz_localize" +]) +def test_nat_methods_nat(method): + # see gh-8254, gh-9513, gh-17329 + assert getattr(NaT, method)() is NaT + + +@pytest.mark.parametrize("get_nat", [ + lambda x: NaT, + lambda x: Timedelta(x), + lambda x: Timestamp(x) +]) +def test_nat_iso_format(get_nat): + # see gh-12300 + assert get_nat("NaT").isoformat() == "NaT" + + +@pytest.mark.parametrize("klass,expected", [ + (Timestamp, ["freqstr", "normalize", "to_julian_date", "to_period", "tz"]), + (Timedelta, ["components", "delta", "is_populated", "to_pytimedelta", + "to_timedelta64", "view"]) +]) +def test_missing_public_nat_methods(klass, expected): + # see gh-17327 + # + # NaT should have *most* of the Timestamp and Timedelta methods. + # Here, we check which public methods NaT does not have. We + # ignore any missing private methods. + nat_names = dir(NaT) + klass_names = dir(klass) - for method in raise_methods: - if hasattr(NaT, method): - with pytest.raises(ValueError): - getattr(NaT, method)() + missing = [x for x in klass_names if x not in nat_names and + not x.startswith("_")] + missing.sort() - for method in nan_methods: - if hasattr(NaT, method): - assert np.isnan(getattr(NaT, method)()) + assert missing == expected - for method in nat_methods: - if hasattr(NaT, method): - # see gh-8254 - exp_warning = None - if method == 'to_datetime': - exp_warning = FutureWarning - with tm.assert_produces_warning( - exp_warning, check_stacklevel=False): - assert getattr(NaT, method)() is NaT - # GH 12300 - assert NaT.isoformat() == 'NaT' +def _get_overlap_public_nat_methods(klass, as_tuple=False): + """ + Get overlapping public methods between NaT and another class. + Parameters + ---------- + klass : type + The class to compare with NaT + as_tuple : bool, default False + Whether to return a list of tuples of the form (klass, method). -def test_NaT_docstrings(): - # GH#17327 + Returns + ------- + overlap : list + """ nat_names = dir(NaT) - - # NaT should have *most* of the Timestamp methods, with matching - # docstrings. The attributes that are not expected to be present in NaT - # are private methods plus `ts_expected` below. 
- ts_names = dir(Timestamp) - ts_missing = [x for x in ts_names if x not in nat_names and - not x.startswith('_')] - ts_missing.sort() - ts_expected = ['freqstr', 'normalize', - 'to_julian_date', - 'to_period', 'tz'] - assert ts_missing == ts_expected - - ts_overlap = [x for x in nat_names if x in ts_names and - not x.startswith('_') and - callable(getattr(Timestamp, x))] - for name in ts_overlap: - tsdoc = getattr(Timestamp, name).__doc__ - natdoc = getattr(NaT, name).__doc__ - assert tsdoc == natdoc - - # NaT should have *most* of the Timedelta methods, with matching - # docstrings. The attributes that are not expected to be present in NaT - # are private methods plus `td_expected` below. - # For methods that are both Timestamp and Timedelta methods, the - # Timestamp docstring takes priority. - td_names = dir(Timedelta) - td_missing = [x for x in td_names if x not in nat_names and - not x.startswith('_')] - td_missing.sort() - td_expected = ['components', 'delta', 'is_populated', - 'to_pytimedelta', 'to_timedelta64', 'view'] - assert td_missing == td_expected - - td_overlap = [x for x in nat_names if x in td_names and - x not in ts_names and # Timestamp __doc__ takes priority - not x.startswith('_') and - callable(getattr(Timedelta, x))] - assert td_overlap == ['total_seconds'] - for name in td_overlap: - tddoc = getattr(Timedelta, name).__doc__ - natdoc = getattr(NaT, name).__doc__ - assert tddoc == natdoc - - -@pytest.mark.parametrize('klass', [Timestamp, Timedelta]) -def test_isoformat(klass): - - result = klass('NaT').isoformat() - expected = 'NaT' - assert result == expected - - -def test_nat_arithmetic(): - # GH 6873 - i = 2 - f = 1.5 - - for (left, right) in [(NaT, i), (NaT, f), (NaT, np.nan)]: - assert left / right is NaT - assert left * right is NaT - assert right * left is NaT - with pytest.raises(TypeError): - right / left - - # Timestamp / datetime - t = Timestamp('2014-01-01') - dt = datetime(2014, 1, 1) - for (left, right) in [(NaT, NaT), (NaT, t), (NaT, dt)]: - # NaT __add__ or __sub__ Timestamp-like (or inverse) returns NaT - assert right + left is NaT - assert left + right is NaT - assert left - right is NaT - assert right - left is NaT - - # timedelta-like - # offsets are tested in test_offsets.py - - delta = timedelta(3600) - td = Timedelta('5s') - - for (left, right) in [(NaT, delta), (NaT, td)]: - # NaT + timedelta-like returns NaT - assert right + left is NaT - assert left + right is NaT - assert right - left is NaT - assert left - right is NaT - assert np.isnan(left / right) - assert np.isnan(right / left) - - # GH 11718 - t_utc = Timestamp('2014-01-01', tz='UTC') - t_tz = Timestamp('2014-01-01', tz='US/Eastern') - dt_tz = pytz.timezone('Asia/Tokyo').localize(dt) - - for (left, right) in [(NaT, t_utc), (NaT, t_tz), - (NaT, dt_tz)]: - # NaT __add__ or __sub__ Timestamp-like (or inverse) returns NaT - assert right + left is NaT - assert left + right is NaT - assert left - right is NaT - assert right - left is NaT - - # int addition / subtraction - for (left, right) in [(NaT, 2), (NaT, 0), (NaT, -3)]: - assert right + left is NaT - assert left + right is NaT - assert left - right is NaT - assert right - left is NaT - - -def test_nat_rfloordiv_timedelta(): - # GH#18846 + klass_names = dir(klass) + + overlap = [x for x in nat_names if x in klass_names and + not x.startswith("_") and + callable(getattr(klass, x))] + + # Timestamp takes precedence over Timedelta in terms of overlap. 
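+    # An illustrative sketch, consistent with the parametrized expectations + # below (not an exhaustive spec): with Timestamp's names filtered out, + # only the Timedelta-specific overlap remains, e.g. + # _get_overlap_public_nat_methods(Timedelta) == ["total_seconds"] + # _get_overlap_public_nat_methods(Timedelta, as_tuple=True) == + # [(Timedelta, "total_seconds")]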
+ if klass is Timedelta: + ts_names = dir(Timestamp) + overlap = [x for x in overlap if x not in ts_names] + + if as_tuple: + overlap = [(klass, method) for method in overlap] + + overlap.sort() + return overlap + + +@pytest.mark.parametrize("klass,expected", [ + (Timestamp, ["astimezone", "ceil", "combine", "ctime", "date", "day_name", + "dst", "floor", "fromisoformat", "fromordinal", + "fromtimestamp", "isocalendar", "isoformat", "isoweekday", + "month_name", "now", "replace", "round", "strftime", + "strptime", "time", "timestamp", "timetuple", "timetz", + "to_datetime64", "to_pydatetime", "today", "toordinal", + "tz_convert", "tz_localize", "tzname", "utcfromtimestamp", + "utcnow", "utcoffset", "utctimetuple", "weekday"]), + (Timedelta, ["total_seconds"]) +]) +def test_overlap_public_nat_methods(klass, expected): + # see gh-17327 + # + # NaT should have *most* of the Timestamp and Timedelta methods. + # When Timestamp, Timedelta, and NaT methods overlap, the overlap + # is considered to be with Timestamp and NaT, not Timedelta. + + # "fromisoformat" was introduced in 3.7 + if klass is Timestamp and not compat.PY37: + expected.remove("fromisoformat") + + assert _get_overlap_public_nat_methods(klass) == expected + + +@pytest.mark.parametrize("compare", ( + _get_overlap_public_nat_methods(Timestamp, True) + + _get_overlap_public_nat_methods(Timedelta, True)) +) +def test_nat_doc_strings(compare): + # see gh-17327 + # + # The docstrings for overlapping methods should match. + klass, method = compare + klass_doc = getattr(klass, method).__doc__ + + nat_doc = getattr(NaT, method).__doc__ + assert klass_doc == nat_doc + + +_ops = { + "left_plus_right": lambda a, b: a + b, + "right_plus_left": lambda a, b: b + a, + "left_minus_right": lambda a, b: a - b, + "right_minus_left": lambda a, b: b - a, + "left_times_right": lambda a, b: a * b, + "right_times_left": lambda a, b: b * a, + "left_div_right": lambda a, b: a / b, + "right_div_left": lambda a, b: b / a, +} + + +@pytest.mark.parametrize("op_name", list(_ops.keys())) +@pytest.mark.parametrize("value,val_type", [ + (2, "scalar"), + (1.5, "scalar"), + (np.nan, "scalar"), + (timedelta(3600), "timedelta"), + (Timedelta("5s"), "timedelta"), + (datetime(2014, 1, 1), "timestamp"), + (Timestamp("2014-01-01"), "timestamp"), + (Timestamp("2014-01-01", tz="UTC"), "timestamp"), + (Timestamp("2014-01-01", tz="US/Eastern"), "timestamp"), + (pytz.timezone("Asia/Tokyo").localize(datetime(2014, 1, 1)), "timestamp"), +]) +def test_nat_arithmetic_scalar(op_name, value, val_type): + # see gh-6873 + invalid_ops = { + "scalar": {"right_div_left"}, + "timedelta": {"left_times_right", "right_times_left"}, + "timestamp": {"left_times_right", "right_times_left", + "left_div_right", "right_div_left"} + } + + op = _ops[op_name] + + if op_name in invalid_ops.get(val_type, set()): + if (val_type == "timedelta" and "times" in op_name and + isinstance(value, Timedelta)): + msg = "Cannot multiply" + else: + msg = "unsupported operand type" + + with pytest.raises(TypeError, match=msg): + op(NaT, value) + else: + if val_type == "timedelta" and "div" in op_name: + expected = np.nan + else: + expected = NaT + + assert op(NaT, value) is expected + + +@pytest.mark.parametrize("val,expected", [ + (np.nan, NaT), + (NaT, np.nan), + (np.timedelta64("NaT"), np.nan) +]) +def test_nat_rfloordiv_timedelta(val, expected): + # see gh-18846 + # + # See also test_timedelta.TestTimedeltaArithmetic.test_floordiv td = Timedelta(hours=3, minutes=4) + - assert td // np.nan is NaT - assert
np.isnan(td // NaT) - assert np.isnan(td // np.timedelta64('NaT')) - - -def test_nat_arithmetic_index(): - # GH 11718 - - dti = DatetimeIndex(['2011-01-01', '2011-01-02'], name='x') - exp = DatetimeIndex([NaT, NaT], name='x') - tm.assert_index_equal(dti + NaT, exp) - tm.assert_index_equal(NaT + dti, exp) - - dti_tz = DatetimeIndex(['2011-01-01', '2011-01-02'], - tz='US/Eastern', name='x') - exp = DatetimeIndex([NaT, NaT], name='x', tz='US/Eastern') - tm.assert_index_equal(dti_tz + NaT, exp) - tm.assert_index_equal(NaT + dti_tz, exp) - - exp = TimedeltaIndex([NaT, NaT], name='x') - for (left, right) in [(NaT, dti), (NaT, dti_tz)]: - tm.assert_index_equal(left - right, exp) - tm.assert_index_equal(right - left, exp) - - # timedelta # GH#19124 - tdi = TimedeltaIndex(['1 day', '2 day'], name='x') - tdi_nat = TimedeltaIndex([NaT, NaT], name='x') - - tm.assert_index_equal(tdi + NaT, tdi_nat) - tm.assert_index_equal(NaT + tdi, tdi_nat) - tm.assert_index_equal(tdi - NaT, tdi_nat) - tm.assert_index_equal(NaT - tdi, tdi_nat) - - -@pytest.mark.parametrize('box', [TimedeltaIndex, Series]) -def test_nat_arithmetic_td64_vector(box): - # GH#19124 - vec = box(['1 day', '2 day'], dtype='timedelta64[ns]') - box_nat = box([NaT, NaT], dtype='timedelta64[ns]') - - tm.assert_equal(vec + NaT, box_nat) - tm.assert_equal(NaT + vec, box_nat) - tm.assert_equal(vec - NaT, box_nat) - tm.assert_equal(NaT - vec, box_nat) + assert td // val is expected + + +@pytest.mark.parametrize("op_name", [ + "left_plus_right", "right_plus_left", + "left_minus_right", "right_minus_left" +]) +@pytest.mark.parametrize("value", [ + DatetimeIndex(["2011-01-01", "2011-01-02"], name="x"), + DatetimeIndex(["2011-01-01", "2011-01-02"], name="x", tz="US/Eastern"), + TimedeltaIndex(["1 day", "2 day"], name="x"), +]) +def test_nat_arithmetic_index(op_name, value): + # see gh-11718 + exp_name = "x" + exp_data = [NaT] * 2 + + if isinstance(value, DatetimeIndex) and "plus" in op_name: + expected = DatetimeIndex(exp_data, name=exp_name, tz=value.tz) + else: + expected = TimedeltaIndex(exp_data, name=exp_name) + + tm.assert_index_equal(_ops[op_name](NaT, value), expected) + + +@pytest.mark.parametrize("op_name", [ + "left_plus_right", "right_plus_left", + "left_minus_right", "right_minus_left" +]) +@pytest.mark.parametrize("box", [TimedeltaIndex, Series]) +def test_nat_arithmetic_td64_vector(op_name, box): + # see gh-19124 + vec = box(["1 day", "2 day"], dtype="timedelta64[ns]") + box_nat = box([NaT, NaT], dtype="timedelta64[ns]") + tm.assert_equal(_ops[op_name](vec, NaT), box_nat) def test_nat_pinned_docstrings(): - # GH17327 + # see gh-17327 assert NaT.ctime.__doc__ == datetime.ctime.__doc__ diff --git a/pandas/tests/scalar/timedelta/test_timedelta.py b/pandas/tests/scalar/timedelta/test_timedelta.py index f7dac81a5b8d7..db0c848eaeb4b 100644 --- a/pandas/tests/scalar/timedelta/test_timedelta.py +++ b/pandas/tests/scalar/timedelta/test_timedelta.py @@ -4,7 +4,7 @@ import numpy as np import pytest -from pandas._libs.tslib import NaT, iNaT +from pandas._libs.tslibs import NaT, iNaT import pandas.compat as compat import pandas as pd @@ -44,10 +44,8 @@ def test_ops_error_str(self): with pytest.raises(TypeError): left + right - # GH 20829: python 2 comparison naturally does not raise TypeError - if compat.PY3: - with pytest.raises(TypeError): - left > right + with pytest.raises(TypeError): + left > right assert not left == right assert left != right @@ -107,9 +105,12 @@ def test_compare_timedelta_ndarray(self): expected = np.array([False, False])
         tm.assert_numpy_array_equal(result, expected)
 
+    @pytest.mark.skip(reason="GH#20829 is reverted until after 0.24.0")
     def test_compare_custom_object(self):
-        """Make sure non supported operations on Timedelta returns NonImplemented
-        and yields to other operand (GH20829)."""
+        """
+        Make sure non-supported operations on Timedelta return NotImplemented
+        and yield to the other operand (GH#20829).
+        """
 
         class CustomClass(object):
 
             def __init__(self, cmp_result=None):
@@ -139,11 +140,7 @@ def __gt__(self, other):
 
         assert t == CustomClass(cmp_result=True)
 
-    @pytest.mark.skipif(compat.PY2,
-                        reason="python 2 does not raise TypeError for \
-                        comparisons of different types")
-    @pytest.mark.parametrize("val", [
-        "string", 1])
+    @pytest.mark.parametrize("val", ["string", 1])
     def test_compare_unknown_type(self, val):
         # GH20829
         t = Timedelta('1s')
@@ -550,7 +547,7 @@ def test_overflow(self):
 
         # mean
         result = (s - s.min()).mean()
-        expected = pd.Timedelta((pd.DatetimeIndex((s - s.min())).asi8 / len(s)
+        expected = pd.Timedelta((pd.TimedeltaIndex((s - s.min())).asi8 / len(s)
                                  ).sum())
 
         # the computation is converted to float so
diff --git a/pandas/tests/scalar/timestamp/test_timestamp.py b/pandas/tests/scalar/timestamp/test_timestamp.py
index 2d5c8f77dd338..b2c05d1564a48 100644
--- a/pandas/tests/scalar/timestamp/test_timestamp.py
+++ b/pandas/tests/scalar/timestamp/test_timestamp.py
@@ -589,6 +589,11 @@ def test_depreciate_tz_and_tzinfo_in_datetime_input(self, box):
         with tm.assert_produces_warning(FutureWarning):
             Timestamp(box(**kwargs), tz='US/Pacific')
 
+    def test_dont_convert_dateutil_utc_to_pytz_utc(self):
+        result = Timestamp(datetime(2018, 1, 1), tz=tzutc())
+        expected = Timestamp(datetime(2018, 1, 1)).tz_localize(tzutc())
+        assert result == expected
+
 
 class TestTimestamp(object):
 
@@ -612,7 +617,7 @@ def test_tz(self):
         assert conv.hour == 19
 
     def test_utc_z_designator(self):
-        assert get_timezone(Timestamp('2014-11-02 01:00Z').tzinfo) == 'UTC'
+        assert get_timezone(Timestamp('2014-11-02 01:00Z').tzinfo) is utc
 
     def test_asm8(self):
         np.random.seed(7960929)
diff --git a/pandas/tests/scalar/timestamp/test_timezones.py b/pandas/tests/scalar/timestamp/test_timezones.py
index 72e4fd42ae15a..c02dc1083c366 100644
--- a/pandas/tests/scalar/timestamp/test_timezones.py
+++ b/pandas/tests/scalar/timestamp/test_timezones.py
@@ -11,6 +11,7 @@
 import pytz
 from pytz.exceptions import AmbiguousTimeError, NonExistentTimeError
 
+from pandas._libs.tslibs import timezones
 from pandas.errors import OutOfBoundsDatetime
 import pandas.util._test_decorators as td
 
@@ -342,10 +343,7 @@ def test_timestamp_add_timedelta_push_over_dst_boundary(self, tz):
     def test_timestamp_timetz_equivalent_with_datetime_tz(self,
                                                           tz_naive_fixture):
         # GH21358
-        if tz_naive_fixture is not None:
-            tz = dateutil.tz.gettz(tz_naive_fixture)
-        else:
-            tz = None
+        tz = timezones.maybe_get_tz(tz_naive_fixture)
 
         stamp = Timestamp('2018-06-04 10:20:30', tz=tz)
         _datetime = datetime(2018, 6, 4, hour=10,
diff --git a/pandas/tests/series/indexing/test_boolean.py b/pandas/tests/series/indexing/test_boolean.py
index b94104a89627a..9d024bffa3240 100644
--- a/pandas/tests/series/indexing/test_boolean.py
+++ b/pandas/tests/series/indexing/test_boolean.py
@@ -49,16 +49,12 @@ def test_getitem_boolean_empty():
 
     # invalid because of the boolean indexer
     # that's empty or not-aligned
-    def f():
+    with pytest.raises(IndexingError):
         s[Series([], dtype=bool)]
 
-    pytest.raises(IndexingError, f)
-
-    def f():
+    with pytest.raises(IndexingError):
         s[Series([True], dtype=bool)]
-
pytest.raises(IndexingError, f) - def test_getitem_boolean_object(test_data): # using column from DataFrame @@ -210,16 +206,12 @@ def test_where_unsafe(): s = Series(np.arange(10)) mask = s > 5 - def f(): + with pytest.raises(ValueError): s[mask] = [5, 4, 3, 2, 1] - pytest.raises(ValueError, f) - - def f(): + with pytest.raises(ValueError): s[mask] = [0] * 5 - pytest.raises(ValueError, f) - # dtype changes s = Series([1, 2, 3, 4]) result = s.where(s > 2, np.nan) @@ -360,11 +352,9 @@ def test_where_setitem_invalid(): # slice s = Series(list('abc')) - def f(): + with pytest.raises(ValueError): s[0:3] = list(range(27)) - pytest.raises(ValueError, f) - s[0:3] = list(range(3)) expected = Series([0, 1, 2]) assert_series_equal(s.astype(np.int64), expected, ) @@ -372,11 +362,9 @@ def f(): # slice with step s = Series(list('abcdef')) - def f(): + with pytest.raises(ValueError): s[0:4:2] = list(range(27)) - pytest.raises(ValueError, f) - s = Series(list('abcdef')) s[0:4:2] = list(range(2)) expected = Series([0, 'b', 1, 'd', 'e', 'f']) @@ -385,11 +373,9 @@ def f(): # neg slices s = Series(list('abcdef')) - def f(): + with pytest.raises(ValueError): s[:-1] = list(range(27)) - pytest.raises(ValueError, f) - s[-3:-1] = list(range(2)) expected = Series(['a', 'b', 'c', 0, 1, 'f']) assert_series_equal(s, expected) @@ -397,18 +383,14 @@ def f(): # list s = Series(list('abc')) - def f(): + with pytest.raises(ValueError): s[[0, 1, 2]] = list(range(27)) - pytest.raises(ValueError, f) - s = Series(list('abc')) - def f(): + with pytest.raises(ValueError): s[[0, 1, 2]] = list(range(2)) - pytest.raises(ValueError, f) - # scalar s = Series(list('abc')) s[0] = list(range(10)) diff --git a/pandas/tests/series/indexing/test_datetime.py b/pandas/tests/series/indexing/test_datetime.py index cdcc423e3410c..21395f6004760 100644 --- a/pandas/tests/series/indexing/test_datetime.py +++ b/pandas/tests/series/indexing/test_datetime.py @@ -23,8 +23,8 @@ def test_fancy_getitem(): - dti = DatetimeIndex(freq='WOM-1FRI', start=datetime(2005, 1, 1), - end=datetime(2010, 1, 1)) + dti = date_range(freq='WOM-1FRI', start=datetime(2005, 1, 1), + end=datetime(2010, 1, 1)) s = Series(np.arange(len(dti)), index=dti) @@ -40,8 +40,8 @@ def test_fancy_getitem(): def test_fancy_setitem(): - dti = DatetimeIndex(freq='WOM-1FRI', start=datetime(2005, 1, 1), - end=datetime(2010, 1, 1)) + dti = date_range(freq='WOM-1FRI', start=datetime(2005, 1, 1), + end=datetime(2010, 1, 1)) s = Series(np.arange(len(dti)), index=dti) s[48] = -1 @@ -69,7 +69,7 @@ def test_dti_snap(): def test_dti_reset_index_round_trip(): - dti = DatetimeIndex(start='1/1/2001', end='6/1/2001', freq='D') + dti = date_range(start='1/1/2001', end='6/1/2001', freq='D') d1 = DataFrame({'v': np.random.rand(len(dti))}, index=dti) d2 = d1.reset_index() assert d2.dtypes[0] == np.dtype('M8[ns]') diff --git a/pandas/tests/series/indexing/test_indexing.py b/pandas/tests/series/indexing/test_indexing.py index f969619d5acb0..92c41f65eb831 100644 --- a/pandas/tests/series/indexing/test_indexing.py +++ b/pandas/tests/series/indexing/test_indexing.py @@ -711,8 +711,8 @@ def test_type_promote_putmask(): def test_multilevel_preserve_name(): index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two', 'three']], - labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], - [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]], + codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], + [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]], names=['first', 'second']) s = Series(np.random.randn(len(index)), index=index, name='sth') diff --git 
a/pandas/tests/series/indexing/test_loc.py b/pandas/tests/series/indexing/test_loc.py index 36c26267ecd5f..27d0eee673c11 100644 --- a/pandas/tests/series/indexing/test_loc.py +++ b/pandas/tests/series/indexing/test_loc.py @@ -11,6 +11,16 @@ from pandas.util.testing import assert_series_equal +@pytest.mark.parametrize("val,expected", [ + (2**63 - 1, 3), + (2**63, 4), +]) +def test_loc_uint64(val, expected): + # see gh-19399 + s = Series({2**63 - 1: 3, 2**63: 4}) + assert s.loc[val] == expected + + def test_loc_getitem(test_data): inds = test_data.series.index[[3, 4, 7]] assert_series_equal( diff --git a/pandas/tests/series/indexing/test_numeric.py b/pandas/tests/series/indexing/test_numeric.py index da0e15b8a96fc..8a4fdc7e12e4d 100644 --- a/pandas/tests/series/indexing/test_numeric.py +++ b/pandas/tests/series/indexing/test_numeric.py @@ -96,11 +96,9 @@ def test_delitem(): # empty s = Series() - def f(): + with pytest.raises(KeyError): del s[0] - pytest.raises(KeyError, f) - # only 1 left, del, add, del s = Series(1) del s[0] @@ -207,11 +205,9 @@ def test_setitem_float_labels(): def test_slice_float_get_set(test_data): pytest.raises(TypeError, lambda: test_data.ts[4.0:10.0]) - def f(): + with pytest.raises(TypeError): test_data.ts[4.0:10.0] = 0 - pytest.raises(TypeError, f) - pytest.raises(TypeError, test_data.ts.__getitem__, slice(4.5, 10.0)) pytest.raises(TypeError, test_data.ts.__setitem__, slice(4.5, 10.0), 0) diff --git a/pandas/tests/series/test_alter_axes.py b/pandas/tests/series/test_alter_axes.py index 79de3dc3be19f..99a4f0c424ce9 100644 --- a/pandas/tests/series/test_alter_axes.py +++ b/pandas/tests/series/test_alter_axes.py @@ -133,8 +133,8 @@ def test_reset_index(self): # level index = MultiIndex(levels=[['bar'], ['one', 'two', 'three'], [0, 1]], - labels=[[0, 0, 0, 0, 0, 0], [0, 1, 2, 0, 1, 2], - [0, 1, 0, 1, 0, 1]]) + codes=[[0, 0, 0, 0, 0, 0], [0, 1, 2, 0, 1, 2], + [0, 1, 0, 1, 0, 1]]) s = Series(np.random.randn(6), index=index) rs = s.reset_index(level=1) assert len(rs.columns) == 2 @@ -204,8 +204,8 @@ def test_reset_index_range(self): def test_reorder_levels(self): index = MultiIndex(levels=[['bar'], ['one', 'two', 'three'], [0, 1]], - labels=[[0, 0, 0, 0, 0, 0], [0, 1, 2, 0, 1, 2], - [0, 1, 0, 1, 0, 1]], + codes=[[0, 0, 0, 0, 0, 0], [0, 1, 2, 0, 1, 2], + [0, 1, 0, 1, 0, 1]], names=['L0', 'L1', 'L2']) s = Series(np.arange(6), index=index) @@ -220,8 +220,8 @@ def test_reorder_levels(self): # rotate, position result = s.reorder_levels([1, 2, 0]) e_idx = MultiIndex(levels=[['one', 'two', 'three'], [0, 1], ['bar']], - labels=[[0, 1, 2, 0, 1, 2], [0, 1, 0, 1, 0, 1], - [0, 0, 0, 0, 0, 0]], + codes=[[0, 1, 2, 0, 1, 2], [0, 1, 0, 1, 0, 1], + [0, 0, 0, 0, 0, 0]], names=['L1', 'L2', 'L0']) expected = Series(np.arange(6), index=e_idx) tm.assert_series_equal(result, expected) diff --git a/pandas/tests/series/test_analytics.py b/pandas/tests/series/test_analytics.py index a5a7cc2217864..b5140a5319c01 100644 --- a/pandas/tests/series/test_analytics.py +++ b/pandas/tests/series/test_analytics.py @@ -14,12 +14,11 @@ import pandas as pd from pandas import ( - Categorical, CategoricalIndex, DataFrame, Series, bdate_range, compat, - date_range, isna, notna) + Categorical, CategoricalIndex, DataFrame, Series, compat, date_range, isna, + notna) +from pandas.api.types import is_scalar from pandas.core.index import MultiIndex from pandas.core.indexes.datetimes import Timestamp -from pandas.core.indexes.timedeltas import Timedelta -import pandas.core.nanops as nanops import pandas.util.testing as 
tm from pandas.util.testing import ( assert_almost_equal, assert_frame_equal, assert_index_equal, @@ -28,292 +27,6 @@ class TestSeriesAnalytics(object): - @pytest.mark.parametrize("use_bottleneck", [True, False]) - @pytest.mark.parametrize("method, unit", [ - ("sum", 0.0), - ("prod", 1.0) - ]) - def test_empty(self, method, unit, use_bottleneck): - with pd.option_context("use_bottleneck", use_bottleneck): - # GH 9422 / 18921 - # Entirely empty - s = Series([]) - # NA by default - result = getattr(s, method)() - assert result == unit - - # Explicit - result = getattr(s, method)(min_count=0) - assert result == unit - - result = getattr(s, method)(min_count=1) - assert isna(result) - - # Skipna, default - result = getattr(s, method)(skipna=True) - result == unit - - # Skipna, explicit - result = getattr(s, method)(skipna=True, min_count=0) - assert result == unit - - result = getattr(s, method)(skipna=True, min_count=1) - assert isna(result) - - # All-NA - s = Series([np.nan]) - # NA by default - result = getattr(s, method)() - assert result == unit - - # Explicit - result = getattr(s, method)(min_count=0) - assert result == unit - - result = getattr(s, method)(min_count=1) - assert isna(result) - - # Skipna, default - result = getattr(s, method)(skipna=True) - result == unit - - # skipna, explicit - result = getattr(s, method)(skipna=True, min_count=0) - assert result == unit - - result = getattr(s, method)(skipna=True, min_count=1) - assert isna(result) - - # Mix of valid, empty - s = Series([np.nan, 1]) - # Default - result = getattr(s, method)() - assert result == 1.0 - - # Explicit - result = getattr(s, method)(min_count=0) - assert result == 1.0 - - result = getattr(s, method)(min_count=1) - assert result == 1.0 - - # Skipna - result = getattr(s, method)(skipna=True) - assert result == 1.0 - - result = getattr(s, method)(skipna=True, min_count=0) - assert result == 1.0 - - result = getattr(s, method)(skipna=True, min_count=1) - assert result == 1.0 - - # GH #844 (changed in 9422) - df = DataFrame(np.empty((10, 0))) - assert (getattr(df, method)(1) == unit).all() - - s = pd.Series([1]) - result = getattr(s, method)(min_count=2) - assert isna(result) - - s = pd.Series([np.nan]) - result = getattr(s, method)(min_count=2) - assert isna(result) - - s = pd.Series([np.nan, 1]) - result = getattr(s, method)(min_count=2) - assert isna(result) - - @pytest.mark.parametrize('method, unit', [ - ('sum', 0.0), - ('prod', 1.0), - ]) - def test_empty_multi(self, method, unit): - s = pd.Series([1, np.nan, np.nan, np.nan], - index=pd.MultiIndex.from_product([('a', 'b'), (0, 1)])) - # 1 / 0 by default - result = getattr(s, method)(level=0) - expected = pd.Series([1, unit], index=['a', 'b']) - tm.assert_series_equal(result, expected) - - # min_count=0 - result = getattr(s, method)(level=0, min_count=0) - expected = pd.Series([1, unit], index=['a', 'b']) - tm.assert_series_equal(result, expected) - - # min_count=1 - result = getattr(s, method)(level=0, min_count=1) - expected = pd.Series([1, np.nan], index=['a', 'b']) - tm.assert_series_equal(result, expected) - - @pytest.mark.parametrize( - "method", ['mean', 'median', 'std', 'var']) - def test_ops_consistency_on_empty(self, method): - - # GH 7869 - # consistency on empty - - # float - result = getattr(Series(dtype=float), method)() - assert isna(result) - - # timedelta64[ns] - result = getattr(Series(dtype='m8[ns]'), method)() - assert result is pd.NaT - - def test_nansum_buglet(self): - s = Series([1.0, np.nan], index=[0, 1]) - result = np.nansum(s) - 
assert_almost_equal(result, 1) - - @pytest.mark.parametrize("use_bottleneck", [True, False]) - def test_sum_overflow(self, use_bottleneck): - - with pd.option_context('use_bottleneck', use_bottleneck): - # GH 6915 - # overflowing on the smaller int dtypes - for dtype in ['int32', 'int64']: - v = np.arange(5000000, dtype=dtype) - s = Series(v) - - result = s.sum(skipna=False) - assert int(result) == v.sum(dtype='int64') - result = s.min(skipna=False) - assert int(result) == 0 - result = s.max(skipna=False) - assert int(result) == v[-1] - - for dtype in ['float32', 'float64']: - v = np.arange(5000000, dtype=dtype) - s = Series(v) - - result = s.sum(skipna=False) - assert result == v.sum(dtype=dtype) - result = s.min(skipna=False) - assert np.allclose(float(result), 0.0) - result = s.max(skipna=False) - assert np.allclose(float(result), v[-1]) - - def test_sum(self, string_series): - self._check_stat_op('sum', np.sum, string_series, check_allna=False) - - def test_sum_inf(self): - s = Series(np.random.randn(10)) - s2 = s.copy() - - s[5:8] = np.inf - s2[5:8] = np.nan - - assert np.isinf(s.sum()) - - arr = np.random.randn(100, 100).astype('f4') - arr[:, 2] = np.inf - - with pd.option_context("mode.use_inf_as_na", True): - assert_almost_equal(s.sum(), s2.sum()) - - res = nanops.nansum(arr, axis=1) - assert np.isinf(res).all() - - def test_mean(self, string_series): - self._check_stat_op('mean', np.mean, string_series) - - def test_median(self, string_series): - self._check_stat_op('median', np.median, string_series) - - # test with integers, test failure - int_ts = Series(np.ones(10, dtype=int), index=lrange(10)) - tm.assert_almost_equal(np.median(int_ts), int_ts.median()) - - def test_prod(self, string_series): - self._check_stat_op('prod', np.prod, string_series) - - def test_min(self, string_series): - self._check_stat_op('min', np.min, string_series, check_objects=True) - - def test_max(self, string_series): - self._check_stat_op('max', np.max, string_series, check_objects=True) - - def test_var_std(self, datetime_series, string_series): - alt = lambda x: np.std(x, ddof=1) - self._check_stat_op('std', alt, string_series) - - alt = lambda x: np.var(x, ddof=1) - self._check_stat_op('var', alt, string_series) - - result = datetime_series.std(ddof=4) - expected = np.std(datetime_series.values, ddof=4) - assert_almost_equal(result, expected) - - result = datetime_series.var(ddof=4) - expected = np.var(datetime_series.values, ddof=4) - assert_almost_equal(result, expected) - - # 1 - element series with ddof=1 - s = datetime_series.iloc[[0]] - result = s.var(ddof=1) - assert isna(result) - - result = s.std(ddof=1) - assert isna(result) - - def test_sem(self, datetime_series, string_series): - alt = lambda x: np.std(x, ddof=1) / np.sqrt(len(x)) - self._check_stat_op('sem', alt, string_series) - - result = datetime_series.sem(ddof=4) - expected = np.std(datetime_series.values, - ddof=4) / np.sqrt(len(datetime_series.values)) - assert_almost_equal(result, expected) - - # 1 - element series with ddof=1 - s = datetime_series.iloc[[0]] - result = s.sem(ddof=1) - assert isna(result) - - @td.skip_if_no_scipy - def test_skew(self, string_series): - from scipy.stats import skew - alt = lambda x: skew(x, bias=False) - self._check_stat_op('skew', alt, string_series) - - # test corner cases, skew() returns NaN unless there's at least 3 - # values - min_N = 3 - for i in range(1, min_N + 1): - s = Series(np.ones(i)) - df = DataFrame(np.ones((i, i))) - if i < min_N: - assert np.isnan(s.skew()) - assert 
np.isnan(df.skew()).all() - else: - assert 0 == s.skew() - assert (df.skew() == 0).all() - - @td.skip_if_no_scipy - def test_kurt(self, string_series): - from scipy.stats import kurtosis - alt = lambda x: kurtosis(x, bias=False) - self._check_stat_op('kurt', alt, string_series) - - index = MultiIndex(levels=[['bar'], ['one', 'two', 'three'], [0, 1]], - labels=[[0, 0, 0, 0, 0, 0], [0, 1, 2, 0, 1, 2], - [0, 1, 0, 1, 0, 1]]) - s = Series(np.random.randn(6), index=index) - tm.assert_almost_equal(s.kurt(), s.kurt(level=0)['bar']) - - # test corner cases, kurt() returns NaN unless there's at least 4 - # values - min_N = 4 - for i in range(1, min_N + 1): - s = Series(np.ones(i)) - df = DataFrame(np.ones((i, i))) - if i < min_N: - assert np.isnan(s.kurt()) - assert np.isnan(df.kurt()).all() - else: - assert 0 == s.kurt() - assert (df.kurt() == 0).all() - def test_describe(self): s = Series([0, 1, 2, 3, 4], name='int_data') result = s.describe() @@ -338,7 +51,7 @@ def test_describe(self): def test_describe_with_tz(self, tz_naive_fixture): # GH 21332 tz = tz_naive_fixture - name = tz_naive_fixture + name = str(tz_naive_fixture) start = Timestamp(2018, 1, 1) end = Timestamp(2018, 1, 5) s = Series(date_range(start, end, tz=tz), name=name) @@ -507,63 +220,6 @@ def test_npdiff(self): r = np.diff(s) assert_series_equal(Series([nan, 0, 0, 0, nan]), r) - def _check_stat_op(self, name, alternate, string_series_, - check_objects=False, check_allna=False): - - with pd.option_context('use_bottleneck', False): - f = getattr(Series, name) - - # add some NaNs - string_series_[5:15] = np.NaN - - # idxmax, idxmin, min, and max are valid for dates - if name not in ['max', 'min']: - ds = Series(date_range('1/1/2001', periods=10)) - pytest.raises(TypeError, f, ds) - - # skipna or no - assert notna(f(string_series_)) - assert isna(f(string_series_, skipna=False)) - - # check the result is correct - nona = string_series_.dropna() - assert_almost_equal(f(nona), alternate(nona.values)) - assert_almost_equal(f(string_series_), alternate(nona.values)) - - allna = string_series_ * nan - - if check_allna: - assert np.isnan(f(allna)) - - # dtype=object with None, it works! - s = Series([1, 2, 3, None, 5]) - f(s) - - # 2888 - items = [0] - items.extend(lrange(2 ** 40, 2 ** 40 + 1000)) - s = Series(items, dtype='int64') - assert_almost_equal(float(f(s)), float(alternate(s.values))) - - # check date range - if check_objects: - s = Series(bdate_range('1/1/2000', periods=10)) - res = f(s) - exp = alternate(s) - assert res == exp - - # check on string data - if name not in ['sum', 'min', 'max']: - pytest.raises(TypeError, f, Series(list('abc'))) - - # Invalid axis. - pytest.raises(ValueError, f, string_series_, axis=1) - - # Unimplemented numeric_only parameter. - if 'numeric_only' in compat.signature(f).args: - with pytest.raises(NotImplementedError, match=name): - f(string_series_, numeric_only=True) - def _check_accum_op(self, name, datetime_series_, check_dtype=True): func = getattr(np, name) tm.assert_numpy_array_equal(func(datetime_series_).values, @@ -648,75 +304,6 @@ def test_prod_numpy16_bug(self): assert not isinstance(result, Series) - def test_all_any(self): - ts = tm.makeTimeSeries() - bool_series = ts > 0 - assert not bool_series.all() - assert bool_series.any() - - # Alternative types, with implicit 'object' dtype. - s = Series(['abc', True]) - assert 'abc' == s.any() # 'abc' || True => 'abc' - - def test_all_any_params(self): - # Check skipna, with implicit 'object' dtype. 
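# A minimal runnable sketch of the all/any skipna semantics that the
# surrounding removed test_all_any_params relied on; the values mirror the
# deleted assertions (behavior as of the pandas version this patch targets):
import numpy as np
import pandas as pd

s1 = pd.Series([np.nan, True])
s2 = pd.Series([np.nan, False])
assert s1.all(skipna=False)              # nan && True => True
assert s1.all(skipna=True)               # NaN dropped, only True remains
assert np.isnan(s2.any(skipna=False))    # nan || False => nan
assert not s2.any(skipna=True)           # NaN dropped, only False remains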
- s1 = Series([np.nan, True]) - s2 = Series([np.nan, False]) - assert s1.all(skipna=False) # nan && True => True - assert s1.all(skipna=True) - assert np.isnan(s2.any(skipna=False)) # nan || False => nan - assert not s2.any(skipna=True) - - # Check level. - s = pd.Series([False, False, True, True, False, True], - index=[0, 0, 1, 1, 2, 2]) - assert_series_equal(s.all(level=0), Series([False, True, False])) - assert_series_equal(s.any(level=0), Series([False, True, True])) - - # bool_only is not implemented with level option. - pytest.raises(NotImplementedError, s.any, bool_only=True, level=0) - pytest.raises(NotImplementedError, s.all, bool_only=True, level=0) - - # bool_only is not implemented alone. - pytest.raises(NotImplementedError, s.any, bool_only=True) - pytest.raises(NotImplementedError, s.all, bool_only=True) - - def test_modulo(self): - with np.errstate(all='ignore'): - - # GH3590, modulo as ints - p = DataFrame({'first': [3, 4, 5, 8], 'second': [0, 0, 0, 3]}) - result = p['first'] % p['second'] - expected = Series(p['first'].values % p['second'].values, - dtype='float64') - expected.iloc[0:3] = np.nan - assert_series_equal(result, expected) - - result = p['first'] % 0 - expected = Series(np.nan, index=p.index, name='first') - assert_series_equal(result, expected) - - p = p.astype('float64') - result = p['first'] % p['second'] - expected = Series(p['first'].values % p['second'].values) - assert_series_equal(result, expected) - - p = p.astype('float64') - result = p['first'] % p['second'] - result2 = p['second'] % p['first'] - assert not result.equals(result2) - - # GH 9144 - s = Series([0, 1]) - - result = s % 0 - expected = Series([nan, nan]) - assert_series_equal(result, expected) - - result = 0 % s - expected = Series([nan, 0.0]) - assert_series_equal(result, expected) - @td.skip_if_no_scipy def test_corr(self, datetime_series): import scipy.stats as stats @@ -960,8 +547,10 @@ def test_matmul(self): def test_clip(self, datetime_series): val = datetime_series.median() - assert datetime_series.clip_lower(val).min() == val - assert datetime_series.clip_upper(val).max() == val + with tm.assert_produces_warning(FutureWarning): + assert datetime_series.clip_lower(val).min() == val + with tm.assert_produces_warning(FutureWarning): + assert datetime_series.clip_upper(val).max() == val assert datetime_series.clip(lower=val).min() == val assert datetime_series.clip(upper=val).max() == val @@ -979,8 +568,10 @@ def test_clip_types_and_nulls(self): for s in sers: thresh = s[2] - lower = s.clip_lower(thresh) - upper = s.clip_upper(thresh) + with tm.assert_produces_warning(FutureWarning): + lower = s.clip_lower(thresh) + with tm.assert_produces_warning(FutureWarning): + upper = s.clip_upper(thresh) assert lower[notna(lower)].min() == thresh assert upper[notna(upper)].max() == thresh assert list(isna(s)) == list(isna(lower)) @@ -1007,8 +598,12 @@ def test_clip_against_series(self): s = Series([1.0, 1.0, 4.0]) threshold = Series([1.0, 2.0, 3.0]) - assert_series_equal(s.clip_lower(threshold), Series([1.0, 2.0, 4.0])) - assert_series_equal(s.clip_upper(threshold), Series([1.0, 1.0, 3.0])) + with tm.assert_produces_warning(FutureWarning): + assert_series_equal(s.clip_lower(threshold), + Series([1.0, 2.0, 4.0])) + with tm.assert_produces_warning(FutureWarning): + assert_series_equal(s.clip_upper(threshold), + Series([1.0, 1.0, 3.0])) lower = Series([1.0, 2.0, 3.0]) upper = Series([1.5, 2.5, 3.5]) @@ -1051,12 +646,6 @@ def test_clip_with_datetimes(self): def test_cummethods_bool(self): # GH 6270 - 
# looks like a buggy np.maximum.accumulate for numpy 1.6.1, py 3.2 - def cummin(x): - return np.minimum.accumulate(x) - - def cummax(x): - return np.maximum.accumulate(x) a = pd.Series([False, False, False, True, True, False, False]) b = ~a @@ -1064,8 +653,8 @@ def cummax(x): d = ~c methods = {'cumsum': np.cumsum, 'cumprod': np.cumprod, - 'cummin': cummin, - 'cummax': cummax} + 'cummin': np.minimum.accumulate, + 'cummax': np.maximum.accumulate} args = product((a, b, c, d), methods) for s, method in args: expected = Series(methods[method](s.values)) @@ -1157,174 +746,6 @@ def test_isin_empty(self, empty): result = s.isin(empty) tm.assert_series_equal(expected, result) - def test_timedelta64_analytics(self): - from pandas import date_range - - # index min/max - td = Series(date_range('2012-1-1', periods=3, freq='D')) - \ - Timestamp('20120101') - - result = td.idxmin() - assert result == 0 - - result = td.idxmax() - assert result == 2 - - # GH 2982 - # with NaT - td[0] = np.nan - - result = td.idxmin() - assert result == 1 - - result = td.idxmax() - assert result == 2 - - # abs - s1 = Series(date_range('20120101', periods=3)) - s2 = Series(date_range('20120102', periods=3)) - expected = Series(s2 - s1) - - # this fails as numpy returns timedelta64[us] - # result = np.abs(s1-s2) - # assert_frame_equal(result,expected) - - result = (s1 - s2).abs() - assert_series_equal(result, expected) - - # max/min - result = td.max() - expected = Timedelta('2 days') - assert result == expected - - result = td.min() - expected = Timedelta('1 days') - assert result == expected - - def test_idxmin(self, string_series): - # test idxmin - # _check_stat_op approach can not be used here because of isna check. - - # add some NaNs - string_series[5:15] = np.NaN - - # skipna or no - assert string_series[string_series.idxmin()] == string_series.min() - assert isna(string_series.idxmin(skipna=False)) - - # no NaNs - nona = string_series.dropna() - assert nona[nona.idxmin()] == nona.min() - assert (nona.index.values.tolist().index(nona.idxmin()) == - nona.values.argmin()) - - # all NaNs - allna = string_series * nan - assert isna(allna.idxmin()) - - # datetime64[ns] - from pandas import date_range - s = Series(date_range('20130102', periods=6)) - result = s.idxmin() - assert result == 0 - - s[0] = np.nan - result = s.idxmin() - assert result == 1 - - def test_numpy_argmin_deprecated(self): - # See gh-16830 - data = np.arange(1, 11) - - s = Series(data, index=data) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - # The deprecation of Series.argmin also causes a deprecation - # warning when calling np.argmin. This behavior is temporary - # until the implementation of Series.argmin is corrected. - result = np.argmin(s) - - assert result == 1 - - with tm.assert_produces_warning(FutureWarning): - # argmin is aliased to idxmin - result = s.argmin() - - assert result == 1 - - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - msg = "the 'out' parameter is not supported" - with pytest.raises(ValueError, match=msg): - np.argmin(s, out=data) - - def test_idxmax(self, string_series): - # test idxmax - # _check_stat_op approach can not be used here because of isna check. 
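# For reference, a short sketch of the idxmin/idxmax semantics asserted by
# the removed tests in this block (values taken from the deleted assertions;
# assumes the pandas version under test):
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])
assert s.idxmax() == 2                     # NaN is skipped by default
assert np.isnan(s.idxmax(skipna=False))    # NaN propagates when skipna=False
assert s.idxmin() == 0

t = pd.Series(pd.date_range('20130102', periods=6))
assert t.idxmax() == 5
t[5] = np.nan
assert t.idxmax() == 4                     # NaT is skipped too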
- - # add some NaNs - string_series[5:15] = np.NaN - - # skipna or no - assert string_series[string_series.idxmax()] == string_series.max() - assert isna(string_series.idxmax(skipna=False)) - - # no NaNs - nona = string_series.dropna() - assert nona[nona.idxmax()] == nona.max() - assert (nona.index.values.tolist().index(nona.idxmax()) == - nona.values.argmax()) - - # all NaNs - allna = string_series * nan - assert isna(allna.idxmax()) - - from pandas import date_range - s = Series(date_range('20130102', periods=6)) - result = s.idxmax() - assert result == 5 - - s[5] = np.nan - result = s.idxmax() - assert result == 4 - - # Float64Index - # GH 5914 - s = pd.Series([1, 2, 3], [1.1, 2.1, 3.1]) - result = s.idxmax() - assert result == 3.1 - result = s.idxmin() - assert result == 1.1 - - s = pd.Series(s.index, s.index) - result = s.idxmax() - assert result == 3.1 - result = s.idxmin() - assert result == 1.1 - - def test_numpy_argmax_deprecated(self): - # See gh-16830 - data = np.arange(1, 11) - - s = Series(data, index=data) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - # The deprecation of Series.argmax also causes a deprecation - # warning when calling np.argmax. This behavior is temporary - # until the implementation of Series.argmax is corrected. - result = np.argmax(s) - assert result == 10 - - with tm.assert_produces_warning(FutureWarning): - # argmax is aliased to idxmax - result = s.argmax() - - assert result == 10 - - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - msg = "the 'out' parameter is not supported" - with pytest.raises(ValueError, match=msg): - np.argmax(s, out=data) - def test_ptp(self): # GH21614 N = 1000 @@ -1366,12 +787,6 @@ def test_ptp(self): check_stacklevel=False): s.ptp(numeric_only=True) - def test_empty_timeseries_redections_return_nat(self): - # covers #11245 - for dtype in ('m8[ns]', 'm8[ns]', 'M8[ns]', 'M8[ns, UTC]'): - assert Series([], dtype=dtype).min() is pd.NaT - assert Series([], dtype=dtype).max() is pd.NaT - def test_repeat(self): s = Series(np.random.randn(3), index=['a', 'b', 'c']) @@ -1398,17 +813,19 @@ def test_numpy_repeat(self): def test_searchsorted(self): s = Series([1, 2, 3]) - idx = s.searchsorted(1, side='left') - tm.assert_numpy_array_equal(idx, np.array([0], dtype=np.intp)) + result = s.searchsorted(1, side='left') + assert is_scalar(result) + assert result == 0 - idx = s.searchsorted(1, side='right') - tm.assert_numpy_array_equal(idx, np.array([1], dtype=np.intp)) + result = s.searchsorted(1, side='right') + assert is_scalar(result) + assert result == 1 def test_searchsorted_numeric_dtypes_scalar(self): s = Series([1, 2, 90, 1000, 3e9]) r = s.searchsorted(30) - e = 2 - assert r == e + assert is_scalar(r) + assert r == 2 r = s.searchsorted([30]) e = np.array([2], dtype=np.intp) @@ -1424,8 +841,8 @@ def test_search_sorted_datetime64_scalar(self): s = Series(pd.date_range('20120101', periods=10, freq='2D')) v = pd.Timestamp('20120102') r = s.searchsorted(v) - e = 1 - assert r == e + assert is_scalar(r) + assert r == 1 def test_search_sorted_datetime64_list(self): s = Series(pd.date_range('20120101', periods=10, freq='2D')) @@ -1523,7 +940,7 @@ def test_unstack(self): from numpy import nan index = MultiIndex(levels=[['bar', 'foo'], ['one', 'three', 'two']], - labels=[[1, 1, 0, 0], [0, 1, 0, 2]]) + codes=[[1, 1, 0, 0], [0, 1, 0, 2]]) s = Series(np.arange(4.), index=index) unstacked = s.unstack() @@ -1538,11 +955,11 @@ def test_unstack(self): assert_frame_equal(unstacked, expected.T) 
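# The labels= -> codes= churn throughout this patch tracks the pandas 0.24
# rename of the MultiIndex keyword holding the integer positions into
# `levels`; a minimal sketch of the new spelling, assuming pandas >= 0.24:
import pandas as pd

mi = pd.MultiIndex(levels=[['bar', 'foo'], ['one', 'two']],
                   codes=[[0, 0, 1, 1], [0, 1, 0, 1]])
assert list(mi.get_level_values(0)) == ['bar', 'bar', 'foo', 'foo']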
index = MultiIndex(levels=[['bar'], ['one', 'two', 'three'], [0, 1]], - labels=[[0, 0, 0, 0, 0, 0], [0, 1, 2, 0, 1, 2], - [0, 1, 0, 1, 0, 1]]) + codes=[[0, 0, 0, 0, 0, 0], [0, 1, 2, 0, 1, 2], + [0, 1, 0, 1, 0, 1]]) s = Series(np.random.randn(6), index=index) exp_index = MultiIndex(levels=[['one', 'two', 'three'], [0, 1]], - labels=[[0, 1, 2, 0, 1, 2], [0, 1, 0, 1, 0, 1]]) + codes=[[0, 1, 2, 0, 1, 2], [0, 1, 0, 1, 0, 1]]) expected = DataFrame({'bar': s.values}, index=exp_index).sort_index(level=0) unstacked = s.unstack(0).sort_index() @@ -1675,6 +1092,42 @@ def test_value_counts_categorical_not_ordered(self): tm.assert_series_equal(s.value_counts(normalize=True), exp) tm.assert_series_equal(idx.value_counts(normalize=True), exp) + @pytest.mark.parametrize("func", [np.any, np.all]) + @pytest.mark.parametrize("kwargs", [ + dict(keepdims=True), + dict(out=object()), + ]) + @td.skip_if_np_lt_115 + def test_validate_any_all_out_keepdims_raises(self, kwargs, func): + s = pd.Series([1, 2]) + param = list(kwargs)[0] + name = func.__name__ + + msg = "the '{}' parameter .* {}".format(param, name) + with pytest.raises(ValueError, match=msg): + func(s, **kwargs) + + @td.skip_if_np_lt_115 + def test_validate_sum_initial(self): + s = pd.Series([1, 2]) + with pytest.raises(ValueError, match="the 'initial' .* sum"): + np.sum(s, initial=10) + + def test_validate_median_initial(self): + s = pd.Series([1, 2]) + with pytest.raises(ValueError, + match="the 'overwrite_input' .* median"): + # It seems like np.median doesn't dispatch, so we use the + # method instead of the ufunc. + s.median(overwrite_input=True) + + @td.skip_if_np_lt_115 + def test_validate_stat_keepdims(self): + s = pd.Series([1, 2]) + with pytest.raises(ValueError, + match="the 'keepdims'"): + np.sum(s, keepdims=True) + main_dtypes = [ 'datetime', @@ -1730,180 +1183,6 @@ def s_main_dtypes_split(request, s_main_dtypes): return s_main_dtypes[request.param] -class TestMode(object): - - @pytest.mark.parametrize('dropna, expected', [ - (True, Series([], dtype=np.float64)), - (False, Series([], dtype=np.float64)) - ]) - def test_mode_empty(self, dropna, expected): - s = Series([], dtype=np.float64) - result = s.mode(dropna) - tm.assert_series_equal(result, expected) - - @pytest.mark.parametrize('dropna, data, expected', [ - (True, [1, 1, 1, 2], [1]), - (True, [1, 1, 1, 2, 3, 3, 3], [1, 3]), - (False, [1, 1, 1, 2], [1]), - (False, [1, 1, 1, 2, 3, 3, 3], [1, 3]), - ]) - @pytest.mark.parametrize( - 'dt', - list(np.typecodes['AllInteger'] + np.typecodes['Float']) - ) - def test_mode_numerical(self, dropna, data, expected, dt): - s = Series(data, dtype=dt) - result = s.mode(dropna) - expected = Series(expected, dtype=dt) - tm.assert_series_equal(result, expected) - - @pytest.mark.parametrize('dropna, expected', [ - (True, [1.0]), - (False, [1, np.nan]), - ]) - def test_mode_numerical_nan(self, dropna, expected): - s = Series([1, 1, 2, np.nan, np.nan]) - result = s.mode(dropna) - expected = Series(expected) - tm.assert_series_equal(result, expected) - - @pytest.mark.parametrize('dropna, expected1, expected2, expected3', [ - (True, ['b'], ['bar'], ['nan']), - (False, ['b'], [np.nan], ['nan']) - ]) - def test_mode_str_obj(self, dropna, expected1, expected2, expected3): - # Test string and object types. 
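# A small sketch of the Series.mode(dropna) behavior covered by the removed
# TestMode cases above, in runnable form (values mirror the deleted
# test_mode_numerical_nan parametrization):
import numpy as np
import pandas as pd

s = pd.Series([1, 1, 2, np.nan, np.nan])
assert list(s.mode()) == [1.0]        # NaN dropped: only 1 repeats
result = s.mode(dropna=False)         # 1 and NaN both appear twice
assert result[0] == 1.0 and np.isnan(result[1])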
- data = ['a'] * 2 + ['b'] * 3 - - s = Series(data, dtype='c') - result = s.mode(dropna) - expected1 = Series(expected1, dtype='c') - tm.assert_series_equal(result, expected1) - - data = ['foo', 'bar', 'bar', np.nan, np.nan, np.nan] - - s = Series(data, dtype=object) - result = s.mode(dropna) - expected2 = Series(expected2, dtype=object) - tm.assert_series_equal(result, expected2) - - data = ['foo', 'bar', 'bar', np.nan, np.nan, np.nan] - - s = Series(data, dtype=object).astype(str) - result = s.mode(dropna) - expected3 = Series(expected3, dtype=str) - tm.assert_series_equal(result, expected3) - - @pytest.mark.parametrize('dropna, expected1, expected2', [ - (True, ['foo'], ['foo']), - (False, ['foo'], [np.nan]) - ]) - def test_mode_mixeddtype(self, dropna, expected1, expected2): - s = Series([1, 'foo', 'foo']) - result = s.mode(dropna) - expected = Series(expected1) - tm.assert_series_equal(result, expected) - - s = Series([1, 'foo', 'foo', np.nan, np.nan, np.nan]) - result = s.mode(dropna) - expected = Series(expected2, dtype=object) - tm.assert_series_equal(result, expected) - - @pytest.mark.parametrize('dropna, expected1, expected2', [ - (True, ['1900-05-03', '2011-01-03', '2013-01-02'], - ['2011-01-03', '2013-01-02']), - (False, [np.nan], [np.nan, '2011-01-03', '2013-01-02']), - ]) - def test_mode_datetime(self, dropna, expected1, expected2): - s = Series(['2011-01-03', '2013-01-02', - '1900-05-03', 'nan', 'nan'], dtype='M8[ns]') - result = s.mode(dropna) - expected1 = Series(expected1, dtype='M8[ns]') - tm.assert_series_equal(result, expected1) - - s = Series(['2011-01-03', '2013-01-02', '1900-05-03', - '2011-01-03', '2013-01-02', 'nan', 'nan'], - dtype='M8[ns]') - result = s.mode(dropna) - expected2 = Series(expected2, dtype='M8[ns]') - tm.assert_series_equal(result, expected2) - - @pytest.mark.parametrize('dropna, expected1, expected2', [ - (True, ['-1 days', '0 days', '1 days'], ['2 min', '1 day']), - (False, [np.nan], [np.nan, '2 min', '1 day']), - ]) - def test_mode_timedelta(self, dropna, expected1, expected2): - # gh-5986: Test timedelta types. 
- - s = Series(['1 days', '-1 days', '0 days', 'nan', 'nan'], - dtype='timedelta64[ns]') - result = s.mode(dropna) - expected1 = Series(expected1, dtype='timedelta64[ns]') - tm.assert_series_equal(result, expected1) - - s = Series(['1 day', '1 day', '-1 day', '-1 day 2 min', - '2 min', '2 min', 'nan', 'nan'], - dtype='timedelta64[ns]') - result = s.mode(dropna) - expected2 = Series(expected2, dtype='timedelta64[ns]') - tm.assert_series_equal(result, expected2) - - @pytest.mark.parametrize('dropna, expected1, expected2, expected3', [ - (True, Categorical([1, 2], categories=[1, 2]), - Categorical(['a'], categories=[1, 'a']), - Categorical([3, 1], categories=[3, 2, 1], ordered=True)), - (False, Categorical([np.nan], categories=[1, 2]), - Categorical([np.nan, 'a'], categories=[1, 'a']), - Categorical([np.nan, 3, 1], categories=[3, 2, 1], ordered=True)), - ]) - def test_mode_category(self, dropna, expected1, expected2, expected3): - s = Series(Categorical([1, 2, np.nan, np.nan])) - result = s.mode(dropna) - expected1 = Series(expected1, dtype='category') - tm.assert_series_equal(result, expected1) - - s = Series(Categorical([1, 'a', 'a', np.nan, np.nan])) - result = s.mode(dropna) - expected2 = Series(expected2, dtype='category') - tm.assert_series_equal(result, expected2) - - s = Series(Categorical([1, 1, 2, 3, 3, np.nan, np.nan], - categories=[3, 2, 1], ordered=True)) - result = s.mode(dropna) - expected3 = Series(expected3, dtype='category') - tm.assert_series_equal(result, expected3) - - @pytest.mark.parametrize('dropna, expected1, expected2', [ - (True, [2**63], [1, 2**63]), - (False, [2**63], [1, 2**63]) - ]) - def test_mode_intoverflow(self, dropna, expected1, expected2): - # Test for uint64 overflow. - s = Series([1, 2**63, 2**63], dtype=np.uint64) - result = s.mode(dropna) - expected1 = Series(expected1, dtype=np.uint64) - tm.assert_series_equal(result, expected1) - - s = Series([1, 2**63], dtype=np.uint64) - result = s.mode(dropna) - expected2 = Series(expected2, dtype=np.uint64) - tm.assert_series_equal(result, expected2) - - @pytest.mark.skipif(not compat.PY3, reason="only PY3") - def test_mode_sortwarning(self): - # Check for the warning that is raised when the mode - # results cannot be sorted - - expected = Series(['foo', np.nan]) - s = Series([1, 'foo', 'foo', np.nan, np.nan]) - - with tm.assert_produces_warning(UserWarning, check_stacklevel=False): - result = s.mode(dropna=False) - result = result.sort_values().reset_index(drop=True) - - tm.assert_series_equal(result, expected) - - def assert_check_nselect_boundary(vals, dtype, method): # helper function for 'test_boundary_{dtype}' tests s = Series(vals, dtype=dtype) @@ -2042,40 +1321,6 @@ def test_count(self): result = s.count() assert result == 2 - def test_min_max(self): - # unordered cats have no min/max - cat = Series(Categorical(["a", "b", "c", "d"], ordered=False)) - pytest.raises(TypeError, lambda: cat.min()) - pytest.raises(TypeError, lambda: cat.max()) - - cat = Series(Categorical(["a", "b", "c", "d"], ordered=True)) - _min = cat.min() - _max = cat.max() - assert _min == "a" - assert _max == "d" - - cat = Series(Categorical(["a", "b", "c", "d"], categories=[ - 'd', 'c', 'b', 'a'], ordered=True)) - _min = cat.min() - _max = cat.max() - assert _min == "d" - assert _max == "a" - - cat = Series(Categorical( - [np.nan, "b", "c", np.nan], categories=['d', 'c', 'b', 'a' - ], ordered=True)) - _min = cat.min() - _max = cat.max() - assert np.isnan(_min) - assert _max == "b" - - cat = Series(Categorical( - [np.nan, 1, 2, 
np.nan], categories=[5, 4, 3, 2, 1], ordered=True)) - _min = cat.min() - _max = cat.max() - assert np.isnan(_min) - assert _max == 1 - def test_value_counts(self): # GH 12835 cats = Categorical(list('abcccb'), categories=list('cabd')) @@ -2140,7 +1385,7 @@ def test_value_counts_with_nan(self): "dtype", ["int_", "uint", "float_", "unicode_", "timedelta64[h]", pytest.param("datetime64[D]", - marks=pytest.mark.xfail(reason="GH#7996", strict=True))] + marks=pytest.mark.xfail(reason="GH#7996"))] ) @pytest.mark.parametrize("is_ordered", [True, False]) def test_drop_duplicates_categorical_non_bool(self, dtype, is_ordered): diff --git a/pandas/tests/series/test_apply.py b/pandas/tests/series/test_apply.py index f4c8ebe64630c..90cf6916df0d1 100644 --- a/pandas/tests/series/test_apply.py +++ b/pandas/tests/series/test_apply.py @@ -216,24 +216,20 @@ def test_transform(self, string_series): def test_transform_and_agg_error(self, string_series): # we are trying to transform with an aggregator - def f(): + with pytest.raises(ValueError): string_series.transform(['min', 'max']) - pytest.raises(ValueError, f) - def f(): + with pytest.raises(ValueError): with np.errstate(all='ignore'): string_series.agg(['sqrt', 'max']) - pytest.raises(ValueError, f) - def f(): + with pytest.raises(ValueError): with np.errstate(all='ignore'): string_series.transform(['sqrt', 'max']) - pytest.raises(ValueError, f) - def f(): + with pytest.raises(ValueError): with np.errstate(all='ignore'): string_series.agg({'foo': np.sqrt, 'bar': 'sum'}) - pytest.raises(ValueError, f) def test_demo(self): # demonstration tests diff --git a/pandas/tests/series/test_arithmetic.py b/pandas/tests/series/test_arithmetic.py index d1d6aa8b51c0d..687ed59772d18 100644 --- a/pandas/tests/series/test_arithmetic.py +++ b/pandas/tests/series/test_arithmetic.py @@ -1,5 +1,4 @@ # -*- coding: utf-8 -*- -from datetime import timedelta import operator import numpy as np @@ -59,16 +58,6 @@ def test_flex_method_equivalence(self, opname, ts): class TestSeriesArithmetic(object): # Some of these may end up in tests/arithmetic, but are not yet sorted - def test_empty_series_add_sub(self): - # GH#13844 - a = Series(dtype='M8[ns]') - b = Series(dtype='m8[ns]') - tm.assert_series_equal(a, a + b) - tm.assert_series_equal(a, a - b) - tm.assert_series_equal(a, b + a) - with pytest.raises(TypeError): - b - a - def test_add_series_with_period_index(self): rng = pd.period_range('1/1/2000', '1/1/2010', freq='A') ts = Series(np.random.randn(len(rng)), index=rng) @@ -85,32 +74,6 @@ def test_add_series_with_period_index(self): with pytest.raises(IncompatibleFrequency, match=msg): ts + ts.asfreq('D', how="end") - def test_operators_datetimelike(self): - - # ## timedelta64 ### - td1 = Series([timedelta(minutes=5, seconds=3)] * 3) - td1.iloc[2] = np.nan - - # ## datetime64 ### - dt1 = Series([pd.Timestamp('20111230'), pd.Timestamp('20120101'), - pd.Timestamp('20120103')]) - dt1.iloc[2] = np.nan - dt2 = Series([pd.Timestamp('20111231'), pd.Timestamp('20120102'), - pd.Timestamp('20120104')]) - dt1 - dt2 - dt2 - dt1 - - # ## datetime64 with timetimedelta ### - dt1 + td1 - td1 + dt1 - dt1 - td1 - # TODO: Decide if this ought to work. 
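# A runnable sketch of the datetime64/timedelta64 arithmetic combinations
# that the surrounding removed test_operators_datetimelike smoke-tested,
# assuming standard pandas semantics (variable names are illustrative):
import pandas as pd

dt1 = pd.Series([pd.Timestamp('20111230'), pd.Timestamp('20120101')])
td1 = pd.Series([pd.Timedelta(minutes=5, seconds=3)] * 2)

assert ((dt1 + td1) - td1).equals(dt1)    # datetime +/- timedelta round-trips
delta = dt1 - pd.Timestamp('20111229')    # datetime - datetime -> timedelta
assert str(delta.dtype) == 'timedelta64[ns]'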
- # td1 - dt1 - - # ## timetimedelta with datetime64 ### - td1 + dt1 - dt1 + td1 - # ------------------------------------------------------------------ # Comparisons @@ -207,18 +170,3 @@ def test_ser_cmp_result_names(self, names, op): ser = Series(cidx).rename(names[1]) result = op(ser, cidx) assert result.name == names[2] - - -def test_pow_ops_object(): - # 22922 - # pow is weird with masking & 1, so testing here - a = Series([1, np.nan, 1, np.nan], dtype=object) - b = Series([1, np.nan, np.nan, 1], dtype=object) - result = a ** b - expected = Series(a.values ** b.values, dtype=object) - tm.assert_series_equal(result, expected) - - result = b ** a - expected = Series(b.values ** a.values, dtype=object) - - tm.assert_series_equal(result, expected) diff --git a/pandas/tests/series/test_block_internals.py b/pandas/tests/series/test_block_internals.py new file mode 100644 index 0000000000000..ccfb169cc2f8d --- /dev/null +++ b/pandas/tests/series/test_block_internals.py @@ -0,0 +1,42 @@ +# -*- coding: utf-8 -*- + +import pandas as pd + +# Segregated collection of methods that require the BlockManager internal data +# structure + + +class TestSeriesBlockInternals(object): + + def test_setitem_invalidates_datetime_index_freq(self): + # GH#24096 altering a datetime64tz Series inplace invalidates the + # `freq` attribute on the underlying DatetimeIndex + + dti = pd.date_range('20130101', periods=3, tz='US/Eastern') + ts = dti[1] + ser = pd.Series(dti) + assert ser._values is not dti + assert ser._values._data.base is not dti._data.base + assert dti.freq == 'D' + ser.iloc[1] = pd.NaT + assert ser._values.freq is None + + # check that the DatetimeIndex was not altered in place + assert ser._values is not dti + assert ser._values._data.base is not dti._data.base + assert dti[1] == ts + assert dti.freq == 'D' + + def test_dt64tz_setitem_does_not_mutate_dti(self): + # GH#21907, GH#24096 + dti = pd.date_range('2016-01-01', periods=10, tz='US/Pacific') + ts = dti[0] + ser = pd.Series(dti) + assert ser._values is not dti + assert ser._values._data.base is not dti._data.base + assert ser._data.blocks[0].values is not dti + assert ser._data.blocks[0].values._data.base is not dti._data.base + + ser[::3] = pd.NaT + assert ser[0] is pd.NaT + assert dti[0] == ts diff --git a/pandas/tests/series/test_datetime_values.py b/pandas/tests/series/test_datetime_values.py index b1c92c2b82a56..745a9eee6c300 100644 --- a/pandas/tests/series/test_datetime_values.py +++ b/pandas/tests/series/test_datetime_values.py @@ -335,8 +335,8 @@ def test_dt_accessor_datetime_name_accessors(self, time_locale): expected_days = calendar.day_name[:] expected_months = calendar.month_name[1:] - s = Series(DatetimeIndex(freq='D', start=datetime(1998, 1, 1), - periods=365)) + s = Series(date_range(freq='D', start=datetime(1998, 1, 1), + periods=365)) english_days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'] for day, name, eng_name in zip(range(4, 11), @@ -348,7 +348,7 @@ def test_dt_accessor_datetime_name_accessors(self, time_locale): s = s.append(Series([pd.NaT])) assert np.isnan(s.dt.day_name(locale=time_locale).iloc[-1]) - s = Series(DatetimeIndex(freq='M', start='2012', end='2013')) + s = Series(date_range(freq='M', start='2012', end='2013')) result = s.dt.month_name(locale=time_locale) expected = Series([month.capitalize() for month in expected_months]) @@ -531,27 +531,19 @@ def test_dt_timetz_accessor(self, tz_naive_fixture): result = s.dt.timetz tm.assert_series_equal(result, expected) - 
@pytest.mark.parametrize('nat', [ - pd.Series([pd.NaT, pd.NaT]), - pd.Series([pd.NaT, pd.Timedelta('nat')]), - pd.Series([pd.Timedelta('nat'), pd.Timedelta('nat')])]) - def test_minmax_nat_series(self, nat): - # GH 23282 - assert nat.min() is pd.NaT - assert nat.max() is pd.NaT - - @pytest.mark.parametrize('nat', [ - # GH 23282 - pd.DataFrame([pd.NaT, pd.NaT]), - pd.DataFrame([pd.NaT, pd.Timedelta('nat')]), - pd.DataFrame([pd.Timedelta('nat'), pd.Timedelta('nat')])]) - def test_minmax_nat_dataframe(self, nat): - assert nat.min()[0] is pd.NaT - assert nat.max()[0] is pd.NaT - def test_setitem_with_string_index(self): # GH 23451 x = pd.Series([1, 2, 3], index=['Date', 'b', 'other']) x['Date'] = date.today() assert x.Date == date.today() assert x['Date'] == date.today() + + def test_setitem_with_different_tz(self): + # GH#24024 + ser = pd.Series(pd.date_range('2000', periods=2, tz="US/Central")) + ser[0] = pd.Timestamp("2000", tz='US/Eastern') + expected = pd.Series([ + pd.Timestamp("2000-01-01 00:00:00-05:00", tz="US/Eastern"), + pd.Timestamp("2000-01-02 00:00:00-06:00", tz="US/Central"), + ], dtype=object) + tm.assert_series_equal(ser, expected) diff --git a/pandas/tests/series/test_dtypes.py b/pandas/tests/series/test_dtypes.py index 79b1bc10b9f4b..2bc009c5a2fc8 100644 --- a/pandas/tests/series/test_dtypes.py +++ b/pandas/tests/series/test_dtypes.py @@ -492,3 +492,13 @@ def test_is_homogeneous_type(self): assert Series()._is_homogeneous_type assert Series([1, 2])._is_homogeneous_type assert Series(pd.Categorical([1, 2]))._is_homogeneous_type + + @pytest.mark.parametrize("data", [ + pd.period_range("2000", periods=4), + pd.IntervalIndex.from_breaks([1, 2, 3, 4]) + ]) + def test_values_compatibility(self, data): + # https://github.com/pandas-dev/pandas/issues/23995 + result = pd.Series(data).values + expected = np.array(data.astype(object)) + tm.assert_numpy_array_equal(result, expected) diff --git a/pandas/tests/series/test_duplicates.py b/pandas/tests/series/test_duplicates.py index f41483405f6cc..26222637e3509 100644 --- a/pandas/tests/series/test_duplicates.py +++ b/pandas/tests/series/test_duplicates.py @@ -91,8 +91,11 @@ def __ne__(self, other): ('last', Series([False, True, True, False, False, False, False])), (False, Series([False, True, True, False, True, True, False])) ]) -def test_drop_duplicates_non_bool(any_numpy_dtype, keep, expected): - tc = Series([1, 2, 3, 5, 3, 2, 4], dtype=np.dtype(any_numpy_dtype)) +def test_drop_duplicates(any_numpy_dtype, keep, expected): + tc = Series([1, 0, 3, 5, 3, 0, 4], dtype=np.dtype(any_numpy_dtype)) + + if tc.dtype == 'bool': + pytest.skip('tested separately in test_drop_duplicates_bool') tm.assert_series_equal(tc.duplicated(keep=keep), expected) tm.assert_series_equal(tc.drop_duplicates(keep=keep), tc[~expected]) diff --git a/pandas/tests/series/test_operators.py b/pandas/tests/series/test_operators.py index bcecedc2bba97..f6fb5f0c46cc8 100644 --- a/pandas/tests/series/test_operators.py +++ b/pandas/tests/series/test_operators.py @@ -12,7 +12,7 @@ import pandas as pd from pandas import ( - Categorical, DataFrame, Index, NaT, Series, bdate_range, date_range, isna) + Categorical, DataFrame, Index, Series, bdate_range, date_range, isna) from pandas.core import ops import pandas.core.nanops as nanops import pandas.util.testing as tm @@ -543,49 +543,6 @@ def test_unequal_categorical_comparison_raises_type_error(self): tm.assert_series_equal(cat == "d", Series([False, False, False])) tm.assert_series_equal(cat != "d", Series([True, True, 
True])) - @pytest.mark.parametrize('pair', [ - ([pd.Timestamp('2011-01-01'), NaT, pd.Timestamp('2011-01-03')], - [NaT, NaT, pd.Timestamp('2011-01-03')]), - - ([pd.Timedelta('1 days'), NaT, pd.Timedelta('3 days')], - [NaT, NaT, pd.Timedelta('3 days')]), - - ([pd.Period('2011-01', freq='M'), NaT, - pd.Period('2011-03', freq='M')], - [NaT, NaT, pd.Period('2011-03', freq='M')]), - - ]) - @pytest.mark.parametrize('reverse', [True, False]) - @pytest.mark.parametrize('box', [Series, Index]) - @pytest.mark.parametrize('dtype', [None, object]) - def test_nat_comparisons(self, dtype, box, reverse, pair): - l, r = pair - if reverse: - # add lhs / rhs switched data - l, r = r, l - - left = Series(l, dtype=dtype) - right = box(r, dtype=dtype) - # Series, Index - - expected = Series([False, False, True]) - assert_series_equal(left == right, expected) - - expected = Series([True, True, False]) - assert_series_equal(left != right, expected) - - expected = Series([False, False, False]) - assert_series_equal(left < right, expected) - - expected = Series([False, False, False]) - assert_series_equal(left > right, expected) - - expected = Series([False, False, True]) - assert_series_equal(left >= right, expected) - - expected = Series([False, False, True]) - assert_series_equal(left <= right, expected) - def test_ne(self): ts = Series([3, 4, 5, 6, 7], [3, 4, 5, 6, 7], dtype=float) expected = [True, True, False, True, True] @@ -793,53 +750,6 @@ def test_op_duplicate_index(self): expected = pd.Series([11, 12, np.nan], index=[1, 1, 2]) assert_series_equal(result, expected) - @pytest.mark.parametrize( - "test_input,error_type", - [ - (pd.Series([]), ValueError), - - # For strings, or any Series with dtype 'O' - (pd.Series(['foo', 'bar', 'baz']), TypeError), - (pd.Series([(1,), (2,)]), TypeError), - - # For mixed data types - ( - pd.Series(['foo', 'foo', 'bar', 'bar', None, np.nan, 'baz']), - TypeError - ), - ] - ) - def test_assert_idxminmax_raises(self, test_input, error_type): - """ - Cases where ``Series.argmax`` and related should raise an exception - """ - with pytest.raises(error_type): - test_input.idxmin() - with pytest.raises(error_type): - test_input.idxmin(skipna=False) - with pytest.raises(error_type): - test_input.idxmax() - with pytest.raises(error_type): - test_input.idxmax(skipna=False) - - def test_idxminmax_with_inf(self): - # For numeric data with NA and Inf (GH #13595) - s = pd.Series([0, -np.inf, np.inf, np.nan]) - - assert s.idxmin() == 1 - assert np.isnan(s.idxmin(skipna=False)) - - assert s.idxmax() == 2 - assert np.isnan(s.idxmax(skipna=False)) - - # Using old-style behavior that treats floating point nan, -inf, and - # +inf as missing - with pd.option_context('mode.use_inf_as_na', True): - assert s.idxmin() == 0 - assert np.isnan(s.idxmin(skipna=False)) - assert s.idxmax() == 0 - np.isnan(s.idxmax(skipna=False)) - class TestSeriesUnaryOps(object): # __neg__, __pos__, __inv__ diff --git a/pandas/tests/series/test_period.py b/pandas/tests/series/test_period.py index ce620db8d9c1b..0a86bb0b67797 100644 --- a/pandas/tests/series/test_period.py +++ b/pandas/tests/series/test_period.py @@ -64,8 +64,7 @@ def test_between(self): # --------------------------------------------------------------------- # NaT support - @pytest.mark.xfail(reason="PeriodDtype Series not supported yet", - strict=True) + @pytest.mark.xfail(reason="PeriodDtype Series not supported yet") def test_NaT_scalar(self): series = Series([0, 1000, 2000, pd._libs.iNaT], dtype='period[D]') @@ -75,8 +74,7 @@ def 
test_NaT_scalar(self): series[2] = val assert pd.isna(series[2]) - @pytest.mark.xfail(reason="PeriodDtype Series not supported yet", - strict=True) + @pytest.mark.xfail(reason="PeriodDtype Series not supported yet") def test_NaT_cast(self): result = Series([np.nan]).astype('period[D]') expected = Series([pd.NaT]) diff --git a/pandas/tests/series/test_rank.py b/pandas/tests/series/test_rank.py index 72d05cb4839ef..da414a577ae0b 100644 --- a/pandas/tests/series/test_rank.py +++ b/pandas/tests/series/test_rank.py @@ -222,8 +222,7 @@ def test_rank_signature(self): 'int64', marks=pytest.mark.xfail( reason="iNaT is equivalent to minimum value of dtype" - "int64 pending issue GH#16674", - strict=True)), + "int64 pending issue GH#16674")), ([NegInfinity(), '1', 'A', 'BA', 'Ba', 'C', Infinity()], 'object') ]) diff --git a/pandas/tests/series/test_repr.py b/pandas/tests/series/test_repr.py index ef96274746655..86de8176a9a65 100644 --- a/pandas/tests/series/test_repr.py +++ b/pandas/tests/series/test_repr.py @@ -25,8 +25,8 @@ class TestSeriesRepr(TestData): def test_multilevel_name_print(self): index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two', 'three']], - labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], - [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]], + codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], + [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]], names=['first', 'second']) s = Series(lrange(0, len(index)), index=index, name='sth') expected = ["first second", "foo one 0", @@ -364,11 +364,11 @@ def test_categorical_series_repr_datetime_ordered(self): def test_categorical_series_repr_period(self): idx = period_range('2011-01-01 09:00', freq='H', periods=5) s = Series(Categorical(idx)) - exp = """0 2011-01-01 09:00 -1 2011-01-01 10:00 -2 2011-01-01 11:00 -3 2011-01-01 12:00 -4 2011-01-01 13:00 + exp = """0 2011-01-01 09:00 +1 2011-01-01 10:00 +2 2011-01-01 11:00 +3 2011-01-01 12:00 +4 2011-01-01 13:00 dtype: category Categories (5, period[H]): [2011-01-01 09:00, 2011-01-01 10:00, 2011-01-01 11:00, 2011-01-01 12:00, 2011-01-01 13:00]""" # noqa @@ -377,11 +377,11 @@ def test_categorical_series_repr_period(self): idx = period_range('2011-01', freq='M', periods=5) s = Series(Categorical(idx)) - exp = """0 2011-01 -1 2011-02 -2 2011-03 -3 2011-04 -4 2011-05 + exp = """0 2011-01 +1 2011-02 +2 2011-03 +3 2011-04 +4 2011-05 dtype: category Categories (5, period[M]): [2011-01, 2011-02, 2011-03, 2011-04, 2011-05]""" @@ -390,11 +390,11 @@ def test_categorical_series_repr_period(self): def test_categorical_series_repr_period_ordered(self): idx = period_range('2011-01-01 09:00', freq='H', periods=5) s = Series(Categorical(idx, ordered=True)) - exp = """0 2011-01-01 09:00 -1 2011-01-01 10:00 -2 2011-01-01 11:00 -3 2011-01-01 12:00 -4 2011-01-01 13:00 + exp = """0 2011-01-01 09:00 +1 2011-01-01 10:00 +2 2011-01-01 11:00 +3 2011-01-01 12:00 +4 2011-01-01 13:00 dtype: category Categories (5, period[H]): [2011-01-01 09:00 < 2011-01-01 10:00 < 2011-01-01 11:00 < 2011-01-01 12:00 < 2011-01-01 13:00]""" # noqa @@ -403,11 +403,11 @@ def test_categorical_series_repr_period_ordered(self): idx = period_range('2011-01', freq='M', periods=5) s = Series(Categorical(idx, ordered=True)) - exp = """0 2011-01 -1 2011-02 -2 2011-03 -3 2011-04 -4 2011-05 + exp = """0 2011-01 +1 2011-02 +2 2011-03 +3 2011-04 +4 2011-05 dtype: category Categories (5, period[M]): [2011-01 < 2011-02 < 2011-03 < 2011-04 < 2011-05]""" diff --git a/pandas/tests/series/test_timeseries.py b/pandas/tests/series/test_timeseries.py index 969c20601c7c8..4f47c308c9a13 100644 --- 
a/pandas/tests/series/test_timeseries.py +++ b/pandas/tests/series/test_timeseries.py @@ -967,35 +967,6 @@ def test_setops_preserve_freq(self, tz): assert result.freq == rng.freq assert result.tz == rng.tz - def test_min_max(self): - rng = date_range('1/1/2000', '12/31/2000') - rng2 = rng.take(np.random.permutation(len(rng))) - - the_min = rng2.min() - the_max = rng2.max() - assert isinstance(the_min, Timestamp) - assert isinstance(the_max, Timestamp) - assert the_min == rng[0] - assert the_max == rng[-1] - - assert rng.min() == rng[0] - assert rng.max() == rng[-1] - - def test_min_max_series(self): - rng = date_range('1/1/2000', periods=10, freq='4h') - lvls = ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'] - df = DataFrame({'TS': rng, 'V': np.random.randn(len(rng)), 'L': lvls}) - - result = df.TS.max() - exp = Timestamp(df.TS.iat[-1]) - assert isinstance(result, Timestamp) - assert result == exp - - result = df.TS.min() - exp = Timestamp(df.TS.iat[0]) - assert isinstance(result, Timestamp) - assert result == exp - def test_from_M8_structured(self): dates = [(datetime(2012, 9, 9, 0, 0), datetime(2012, 9, 8, 15, 10))] arr = np.array(dates, @@ -1018,8 +989,18 @@ def test_get_level_values_box(self): dates = date_range('1/1/2000', periods=4) levels = [dates, [0, 1]] - labels = [[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]] + codes = [[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]] - index = MultiIndex(levels=levels, labels=labels) + index = MultiIndex(levels=levels, codes=codes) assert isinstance(index.get_level_values(0)[0], Timestamp) + + def test_view_tz(self): + # GH#24024 + ser = pd.Series(pd.date_range('2000', periods=4, tz='US/Central')) + result = ser.view("i8") + expected = pd.Series([946706400000000000, + 946792800000000000, + 946879200000000000, + 946965600000000000]) + tm.assert_series_equal(result, expected) diff --git a/pandas/tests/series/test_timezones.py b/pandas/tests/series/test_timezones.py index bdf5944cab408..2e52f7ddbac9c 100644 --- a/pandas/tests/series/test_timezones.py +++ b/pandas/tests/series/test_timezones.py @@ -343,7 +343,7 @@ def test_getitem_pydatetime_tz(self, tzstr): def test_series_truncate_datetimeindex_tz(self): # GH 9243 - idx = date_range('4/1/2005', '4/30/2005', freq='CD', tz='US/Pacific') + idx = date_range('4/1/2005', '4/30/2005', freq='D', tz='US/Pacific') s = Series(range(len(idx)), index=idx) result = s.truncate(datetime(2005, 4, 2), datetime(2005, 4, 4)) expected = Series([1, 2, 3], index=idx[1:4]) diff --git a/pandas/tests/sparse/frame/test_analytics.py b/pandas/tests/sparse/frame/test_analytics.py index 54e3ddbf2f1cf..2d9ccaa059a8c 100644 --- a/pandas/tests/sparse/frame/test_analytics.py +++ b/pandas/tests/sparse/frame/test_analytics.py @@ -4,8 +4,7 @@ from pandas.util import testing as tm -@pytest.mark.xfail(reason='Wrong SparseBlock initialization (GH#17386)', - strict=True) +@pytest.mark.xfail(reason='Wrong SparseBlock initialization (GH#17386)') def test_quantile(): # GH 17386 data = [[1, 1], [2, 10], [3, 100], [np.nan, np.nan]] @@ -22,8 +21,7 @@ def test_quantile(): tm.assert_sp_series_equal(result, sparse_expected) -@pytest.mark.xfail(reason='Wrong SparseBlock initialization (GH#17386)', - strict=True) +@pytest.mark.xfail(reason='Wrong SparseBlock initialization (GH#17386)') def test_quantile_multi(): # GH 17386 data = [[1, 1], [2, 10], [3, 100], [np.nan, np.nan]] diff --git a/pandas/tests/sparse/frame/test_apply.py b/pandas/tests/sparse/frame/test_apply.py index 2d7a537f0fb3b..c26776ac4fd49 100644 --- 
a/pandas/tests/sparse/frame/test_apply.py +++ b/pandas/tests/sparse/frame/test_apply.py @@ -91,3 +91,14 @@ def test_applymap(frame): # just test that it works result = frame.applymap(lambda x: x * 2) assert isinstance(result, SparseDataFrame) + + +def test_apply_keep_sparse_dtype(): + # GH 23744 + sdf = SparseDataFrame(np.array([[0, 1, 0], [0, 0, 0], [0, 0, 1]]), + columns=['b', 'a', 'c'], default_fill_value=1) + df = DataFrame(sdf) + + expected = sdf.apply(np.exp) + result = df.apply(np.exp) + tm.assert_frame_equal(expected, result) diff --git a/pandas/tests/sparse/frame/test_frame.py b/pandas/tests/sparse/frame/test_frame.py index f802598542cb9..21100e3c3ffeb 100644 --- a/pandas/tests/sparse/frame/test_frame.py +++ b/pandas/tests/sparse/frame/test_frame.py @@ -101,9 +101,7 @@ def test_constructor(self, float_frame, float_frame_int_kind, assert isinstance(series, SparseSeries) # construct from nested dict - data = {} - for c, s in compat.iteritems(float_frame): - data[c] = s.to_dict() + data = {c: s.to_dict() for c, s in compat.iteritems(float_frame)} sdf = SparseDataFrame(data) tm.assert_sp_frame_equal(sdf, float_frame) @@ -1142,7 +1140,7 @@ def test_combine_first(self, float_frame): tm.assert_sp_frame_equal(result, expected) - @pytest.mark.xfail(reason="No longer supported.", strict=True) + @pytest.mark.xfail(reason="No longer supported.") def test_combine_first_with_dense(self): # We could support this if we allow # pd.core.dtypes.cast.find_common_type to special case SparseDtype @@ -1198,8 +1196,7 @@ def test_as_blocks(self): tm.assert_frame_equal(df_blocks['Sparse[float64, nan]'], df) @pytest.mark.xfail(reason='nan column names in _init_dict problematic ' - '(GH#16894)', - strict=True) + '(GH#16894)') def test_nan_columnname(self): # GH 8822 nan_colname = DataFrame(Series(1.0, index=[0]), columns=[nan]) @@ -1316,8 +1313,7 @@ def test_numpy_func_call(self, float_frame): for func in funcs: getattr(np, func)(float_frame) - @pytest.mark.xfail(reason='Wrong SparseBlock initialization (GH 17386)', - strict=True) + @pytest.mark.xfail(reason='Wrong SparseBlock initialization (GH 17386)') def test_quantile(self): # GH 17386 data = [[1, 1], [2, 10], [3, 100], [nan, nan]] @@ -1333,8 +1329,7 @@ def test_quantile(self): tm.assert_series_equal(result, dense_expected) tm.assert_sp_series_equal(result, sparse_expected) - @pytest.mark.xfail(reason='Wrong SparseBlock initialization (GH 17386)', - strict=True) + @pytest.mark.xfail(reason='Wrong SparseBlock initialization (GH 17386)') def test_quantile_multi(self): # GH 17386 data = [[1, 1], [2, 10], [3, 100], [nan, nan]] diff --git a/pandas/tests/sparse/frame/test_indexing.py b/pandas/tests/sparse/frame/test_indexing.py index 607eb2da6ded0..e4ca3b90ff8d0 100644 --- a/pandas/tests/sparse/frame/test_indexing.py +++ b/pandas/tests/sparse/frame/test_indexing.py @@ -18,8 +18,7 @@ [np.nan, np.nan] ] ]) -@pytest.mark.xfail(reason='Wrong SparseBlock initialization (GH#17386)', - strict=True) +@pytest.mark.xfail(reason='Wrong SparseBlock initialization (GH#17386)') def test_where_with_numeric_data(data): # GH 17386 lower_bound = 1.5 @@ -52,8 +51,7 @@ def test_where_with_numeric_data(data): 0.1, 100.0 + 100.0j ]) -@pytest.mark.xfail(reason='Wrong SparseBlock initialization (GH#17386)', - strict=True) +@pytest.mark.xfail(reason='Wrong SparseBlock initialization (GH#17386)') def test_where_with_numeric_data_and_other(data, other): # GH 17386 lower_bound = 1.5 @@ -70,8 +68,7 @@ def test_where_with_numeric_data_and_other(data, other): 
tm.assert_sp_frame_equal(result, sparse_expected) -@pytest.mark.xfail(reason='Wrong SparseBlock initialization (GH#17386)', - strict=True) +@pytest.mark.xfail(reason='Wrong SparseBlock initialization (GH#17386)') def test_where_with_bool_data(): # GH 17386 data = [[False, False], [True, True], [False, False]] @@ -94,8 +91,7 @@ def test_where_with_bool_data(): 0.1, 100.0 + 100.0j ]) -@pytest.mark.xfail(reason='Wrong SparseBlock initialization (GH#17386)', - strict=True) +@pytest.mark.xfail(reason='Wrong SparseBlock initialization (GH#17386)') def test_where_with_bool_data_and_other(other): # GH 17386 data = [[False, False], [True, True], [False, False]] diff --git a/pandas/tests/sparse/frame/test_to_from_scipy.py b/pandas/tests/sparse/frame/test_to_from_scipy.py index 1a10ff83d3097..e5c50e9574f90 100644 --- a/pandas/tests/sparse/frame/test_to_from_scipy.py +++ b/pandas/tests/sparse/frame/test_to_from_scipy.py @@ -1,5 +1,6 @@ import pytest import numpy as np +import pandas as pd from pandas.util import testing as tm from pandas import SparseDataFrame, SparseSeries from pandas.core.sparse.api import SparseDtype @@ -168,3 +169,16 @@ def test_from_scipy_fillna(spmatrix): expected[col].fill_value = -1 tm.assert_sp_frame_equal(sdf, expected) + + +def test_index_names_multiple_nones(): + # https://github.com/pandas-dev/pandas/pull/24092 + sparse = pytest.importorskip("scipy.sparse") + + s = (pd.Series(1, index=pd.MultiIndex.from_product([['A', 'B'], [0, 1]])) + .to_sparse()) + result, _, _ = s.to_coo() + assert isinstance(result, sparse.coo_matrix) + result = result.toarray() + expected = np.ones((2, 2), dtype="int64") + tm.assert_numpy_array_equal(result, expected) diff --git a/pandas/tests/sparse/series/test_indexing.py b/pandas/tests/sparse/series/test_indexing.py index 998285d933492..989cf3b974560 100644 --- a/pandas/tests/sparse/series/test_indexing.py +++ b/pandas/tests/sparse/series/test_indexing.py @@ -18,8 +18,7 @@ np.nan, np.nan ] ]) -@pytest.mark.xfail(reason='Wrong SparseBlock initialization (GH#17386)', - strict=True) +@pytest.mark.xfail(reason='Wrong SparseBlock initialization (GH#17386)') def test_where_with_numeric_data(data): # GH 17386 lower_bound = 1.5 @@ -70,8 +69,7 @@ def test_where_with_numeric_data_and_other(data, other): tm.assert_sp_series_equal(result, sparse_expected) -@pytest.mark.xfail(reason='Wrong SparseBlock initialization (GH#17386)', - strict=True) +@pytest.mark.xfail(reason='Wrong SparseBlock initialization (GH#17386)') def test_where_with_bool_data(): # GH 17386 data = [False, False, True, True, False, False] diff --git a/pandas/tests/sparse/series/test_series.py b/pandas/tests/sparse/series/test_series.py index fd5dbcd932993..225ef96581e72 100644 --- a/pandas/tests/sparse/series/test_series.py +++ b/pandas/tests/sparse/series/test_series.py @@ -843,10 +843,10 @@ def test_dropna(self): def test_homogenize(self): def _check_matches(indices, expected): - data = {} - for i, idx in enumerate(indices): - data[i] = SparseSeries(idx.to_int_index().indices, - sparse_index=idx, fill_value=np.nan) + data = {i: SparseSeries(idx.to_int_index().indices, + sparse_index=idx, fill_value=np.nan) + for i, idx in enumerate(indices)} + # homogenized is only valid with NaN fill values homogenized = spf.homogenize(data) diff --git a/pandas/tests/sparse/test_reshape.py b/pandas/tests/sparse/test_reshape.py index b492c47375bcf..d4ba672607982 100644 --- a/pandas/tests/sparse/test_reshape.py +++ b/pandas/tests/sparse/test_reshape.py @@ -35,4 +35,8 @@ def 
test_sparse_frame_unstack(sparse_df): def test_sparse_series_unstack(sparse_df, multi_index3): frame = pd.SparseSeries(np.ones(3), index=multi_index3).unstack() - tm.assert_sp_frame_equal(frame, sparse_df) + + arr = np.array([1, np.nan, np.nan]) + arrays = {i: pd.SparseArray(np.roll(arr, i)) for i in range(3)} + expected = pd.DataFrame(arrays) + tm.assert_frame_equal(frame, expected) diff --git a/pandas/tests/test_algos.py b/pandas/tests/test_algos.py index fa33a1ceae0b9..c9d403f6696af 100644 --- a/pandas/tests/test_algos.py +++ b/pandas/tests/test_algos.py @@ -1361,6 +1361,14 @@ def test_hashtable_unique(self, htable, tm_dtype, writable): result_unique = htable().unique(s_duplicated.values) tm.assert_numpy_array_equal(result_unique, expected_unique) + # test return_inverse=True + # reconstruction can only succeed if the inverse is correct + result_unique, result_inverse = htable().unique(s_duplicated.values, + return_inverse=True) + tm.assert_numpy_array_equal(result_unique, expected_unique) + reconstr = result_unique[result_inverse] + tm.assert_numpy_array_equal(reconstr, s_duplicated.values) + @pytest.mark.parametrize('htable, tm_dtype', [ (ht.PyObjectHashTable, 'String'), (ht.StringHashTable, 'String'), @@ -1383,7 +1391,7 @@ def test_hashtable_factorize(self, htable, tm_dtype, writable): s_duplicated.values.setflags(write=writable) na_mask = s_duplicated.isna().values - result_inverse, result_unique = htable().factorize(s_duplicated.values) + result_unique, result_inverse = htable().factorize(s_duplicated.values) # drop_duplicates has own cython code (hash_table_func_helper.pxi) # and is tested separately; keeps first occurrence like ht.factorize() diff --git a/pandas/tests/test_base.py b/pandas/tests/test_base.py index 084477d8202b1..91e1af5c8887c 100644 --- a/pandas/tests/test_base.py +++ b/pandas/tests/test_base.py @@ -10,12 +10,12 @@ import pandas as pd import pandas.compat as compat from pandas.core.dtypes.common import ( - is_object_dtype, is_datetimetz, is_datetime64_dtype, + is_object_dtype, is_datetime64_dtype, is_datetime64tz_dtype, needs_i8_conversion) import pandas.util.testing as tm from pandas import (Series, Index, DatetimeIndex, TimedeltaIndex, PeriodIndex, Timedelta, IntervalIndex, Interval, - CategoricalIndex, Timestamp) + CategoricalIndex, Timestamp, DataFrame, Panel) from pandas.compat import StringIO, PYPY, long from pandas.compat.numpy import np_array_datetime64_compat from pandas.core.accessor import PandasDelegate @@ -49,8 +49,8 @@ class CheckImmutable(object): def check_mutable_error(self, *args, **kwargs): # Pass whatever function you normally would to pytest.raises # (after the Exception kind). - pytest.raises( - TypeError, self.mutable_regex, *args, **kwargs) + with pytest.raises(TypeError): + self.mutable_regex(*args, **kwargs) def test_no_mutable_funcs(self): def setitem(): @@ -132,21 +132,15 @@ def test_invalid_delegation(self): delegate = self.Delegate(self.Delegator()) - def f(): + with pytest.raises(TypeError): delegate.foo - pytest.raises(TypeError, f) - - def f(): + with pytest.raises(TypeError): delegate.foo = 5 - pytest.raises(TypeError, f) - - def f(): + with pytest.raises(TypeError): delegate.foo() - pytest.raises(TypeError, f) - @pytest.mark.skipif(PYPY, reason="not relevant for PyPy") def test_memory_usage(self): # Delegate does not implement memory_usage. 
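The pandas/tests/test_algos.py hunk above exercises two coordinated changes to the private pandas._libs.hashtable API: unique() gains a return_inverse flag, and factorize() now returns (uniques, inverse) rather than (inverse, uniques). A minimal sketch of the invariant those tests rely on, assuming the private 0.24-era hashtable API as exercised in the hunk (Float64HashTable stands in for the parametrized table types and is not public API):

    import numpy as np
    from pandas._libs import hashtable as ht

    values = np.array([1.0, 2.0, 1.0, 3.0, 2.0])
    table = ht.Float64HashTable()

    # unique(..., return_inverse=True) yields the de-duplicated values plus an
    # inverse array mapping each input element to its slot in the uniques, so
    # indexing the uniques with the inverse reconstructs the input exactly.
    uniques, inverse = table.unique(values, return_inverse=True)
    assert (uniques[inverse] == values).all()

    # After this change factorize() follows the same (uniques, inverse) return
    # order, so both methods can be verified with the same round-trip.
    uniques2, codes = table.factorize(values)
    assert (uniques2[codes] == values).all()

As the hunk's own comment notes, reconstruction can only succeed if the inverse is correct, which is why the test round-trips the duplicated values instead of comparing the inverse against hard-coded expectations.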
@@ -233,14 +227,15 @@ def check_ops_properties(self, props, filter=None, ignore_failures=False): # an object that is datetimelike will raise a TypeError, # otherwise an AttributeError + err = AttributeError if issubclass(type(o), DatetimeIndexOpsMixin): - pytest.raises(TypeError, lambda: getattr(o, op)) - else: - pytest.raises(AttributeError, - lambda: getattr(o, op)) + err = TypeError - def test_binary_ops_docs(self): - from pandas import DataFrame, Panel + with pytest.raises(err): + getattr(o, op) + + @pytest.mark.parametrize('klass', [Series, DataFrame, Panel]) + def test_binary_ops_docs(self, klass): op_map = {'add': '+', 'sub': '-', 'mul': '*', @@ -248,18 +243,16 @@ def test_binary_ops_docs(self): 'pow': '**', 'truediv': '/', 'floordiv': '//'} - for op_name in ['add', 'sub', 'mul', 'mod', 'pow', 'truediv', - 'floordiv']: - for klass in [Series, DataFrame, Panel]: - operand1 = klass.__name__.lower() - operand2 = 'other' - op = op_map[op_name] - expected_str = ' '.join([operand1, op, operand2]) - assert expected_str in getattr(klass, op_name).__doc__ + for op_name in op_map: + operand1 = klass.__name__.lower() + operand2 = 'other' + op = op_map[op_name] + expected_str = ' '.join([operand1, op, operand2]) + assert expected_str in getattr(klass, op_name).__doc__ - # reverse version of the binary ops - expected_str = ' '.join([operand2, op, operand1]) - assert expected_str in getattr(klass, 'r' + op_name).__doc__ + # reverse version of the binary ops + expected_str = ' '.join([operand2, op, operand1]) + assert expected_str in getattr(klass, 'r' + op_name).__doc__ class TestIndexOps(Ops): @@ -292,12 +285,11 @@ def test_none_comparison(self): assert not result.iat[0] assert not result.iat[1] - # this fails for numpy < 1.9 - # and oddly for *some* platforms - # result = None != o # noqa - # assert result.iat[0] - # assert result.iat[1] - if (is_datetime64_dtype(o) or is_datetimetz(o)): + result = None != o # noqa + assert result.iat[0] + assert result.iat[1] + + if (is_datetime64_dtype(o) or is_datetime64tz_dtype(o)): # Following DatetimeIndex (and Timestamp) convention, # inequality comparisons with Series[datetime64] raise with pytest.raises(TypeError): @@ -345,68 +337,6 @@ def test_ndarray_compat_properties(self): assert Index([1]).item() == 1 assert Series([1]).item() == 1 - def test_ops(self): - for op in ['max', 'min']: - for o in self.objs: - result = getattr(o, op)() - if not isinstance(o, PeriodIndex): - expected = getattr(o.values, op)() - else: - expected = pd.Period( - ordinal=getattr(o._ndarray_values, op)(), - freq=o.freq) - try: - assert result == expected - except TypeError: - # comparing tz-aware series with np.array results in - # TypeError - expected = expected.astype('M8[ns]').astype('int64') - assert result.value == expected - - def test_nanops(self): - # GH 7261 - for op in ['max', 'min']: - for klass in [Index, Series]: - - obj = klass([np.nan, 2.0]) - assert getattr(obj, op)() == 2.0 - - obj = klass([np.nan]) - assert pd.isna(getattr(obj, op)()) - - obj = klass([]) - assert pd.isna(getattr(obj, op)()) - - obj = klass([pd.NaT, datetime(2011, 11, 1)]) - # check DatetimeIndex monotonic path - assert getattr(obj, op)() == datetime(2011, 11, 1) - - obj = klass([pd.NaT, datetime(2011, 11, 1), pd.NaT]) - # check DatetimeIndex non-monotonic path - assert getattr(obj, op)(), datetime(2011, 11, 1) - - # argmin/max - obj = Index(np.arange(5, dtype='int64')) - assert obj.argmin() == 0 - assert obj.argmax() == 4 - - obj = Index([np.nan, 1, np.nan, 2]) - assert obj.argmin() == 1 - 
assert obj.argmax() == 3 - - obj = Index([np.nan]) - assert obj.argmin() == -1 - assert obj.argmax() == -1 - - obj = Index([pd.NaT, datetime(2011, 11, 1), datetime(2011, 11, 2), - pd.NaT]) - assert obj.argmin() == 1 - assert obj.argmax() == 2 - - obj = Index([pd.NaT]) - assert obj.argmin() == -1 - assert obj.argmax() == -1 - def test_value_counts_unique_nunique(self): for orig in self.objs: o = orig.copy() @@ -447,7 +377,7 @@ def test_value_counts_unique_nunique(self): if isinstance(o, Index): assert isinstance(result, o.__class__) tm.assert_index_equal(result, orig) - elif is_datetimetz(o): + elif is_datetime64tz_dtype(o): # datetimetz Series returns array of Timestamp assert result[0] == orig[0] for r in result: @@ -471,7 +401,7 @@ def test_value_counts_unique_nunique_null(self): continue # special assign to the numpy array - if is_datetimetz(o): + if is_datetime64tz_dtype(o): if isinstance(o, DatetimeIndex): v = o.asi8 v[0:2] = iNaT @@ -500,7 +430,7 @@ def test_value_counts_unique_nunique_null(self): o = klass(values.repeat(range(1, len(o) + 1))) o.name = 'a' else: - if is_datetimetz(o): + if is_datetime64tz_dtype(o): expected_index = orig._values._shallow_copy(values) else: expected_index = Index(values) @@ -539,7 +469,7 @@ def test_value_counts_unique_nunique_null(self): if isinstance(o, Index): tm.assert_index_equal(result, Index(values[1:], name='a')) - elif is_datetimetz(o): + elif is_datetime64tz_dtype(o): # unable to compare NaT / nan vals = values[2:].astype(object).values tm.assert_numpy_array_equal(result[1:], vals) @@ -553,106 +483,105 @@ def test_value_counts_unique_nunique_null(self): assert o.nunique() == 8 assert o.nunique(dropna=False) == 9 - def test_value_counts_inferred(self): - klasses = [Index, Series] - for klass in klasses: - s_values = ['a', 'b', 'b', 'b', 'b', 'c', 'd', 'd', 'a', 'a'] - s = klass(s_values) - expected = Series([4, 3, 2, 1], index=['b', 'a', 'd', 'c']) - tm.assert_series_equal(s.value_counts(), expected) - - if isinstance(s, Index): - exp = Index(np.unique(np.array(s_values, dtype=np.object_))) - tm.assert_index_equal(s.unique(), exp) - else: - exp = np.unique(np.array(s_values, dtype=np.object_)) - tm.assert_numpy_array_equal(s.unique(), exp) - - assert s.nunique() == 4 - # don't sort, have to sort after the fact as not sorting is - # platform-dep - hist = s.value_counts(sort=False).sort_values() - expected = Series([3, 1, 4, 2], index=list('acbd')).sort_values() - tm.assert_series_equal(hist, expected) - - # sort ascending - hist = s.value_counts(ascending=True) - expected = Series([1, 2, 3, 4], index=list('cdab')) - tm.assert_series_equal(hist, expected) - - # relative histogram. 
- hist = s.value_counts(normalize=True) - expected = Series([.4, .3, .2, .1], index=['b', 'a', 'd', 'c']) - tm.assert_series_equal(hist, expected) - - def test_value_counts_bins(self): - klasses = [Index, Series] - for klass in klasses: - s_values = ['a', 'b', 'b', 'b', 'b', 'c', 'd', 'd', 'a', 'a'] - s = klass(s_values) - - # bins - pytest.raises(TypeError, lambda bins: s.value_counts(bins=bins), 1) - - s1 = Series([1, 1, 2, 3]) - res1 = s1.value_counts(bins=1) - exp1 = Series({Interval(0.997, 3.0): 4}) - tm.assert_series_equal(res1, exp1) - res1n = s1.value_counts(bins=1, normalize=True) - exp1n = Series({Interval(0.997, 3.0): 1.0}) - tm.assert_series_equal(res1n, exp1n) - - if isinstance(s1, Index): - tm.assert_index_equal(s1.unique(), Index([1, 2, 3])) - else: - exp = np.array([1, 2, 3], dtype=np.int64) - tm.assert_numpy_array_equal(s1.unique(), exp) - - assert s1.nunique() == 3 - - # these return the same - res4 = s1.value_counts(bins=4, dropna=True) - intervals = IntervalIndex.from_breaks([0.997, 1.5, 2.0, 2.5, 3.0]) - exp4 = Series([2, 1, 1, 0], index=intervals.take([0, 3, 1, 2])) - tm.assert_series_equal(res4, exp4) - - res4 = s1.value_counts(bins=4, dropna=False) - intervals = IntervalIndex.from_breaks([0.997, 1.5, 2.0, 2.5, 3.0]) - exp4 = Series([2, 1, 1, 0], index=intervals.take([0, 3, 1, 2])) - tm.assert_series_equal(res4, exp4) - - res4n = s1.value_counts(bins=4, normalize=True) - exp4n = Series([0.5, 0.25, 0.25, 0], - index=intervals.take([0, 3, 1, 2])) - tm.assert_series_equal(res4n, exp4n) - - # handle NA's properly - s_values = ['a', 'b', 'b', 'b', np.nan, np.nan, - 'd', 'd', 'a', 'a', 'b'] - s = klass(s_values) - expected = Series([4, 3, 2], index=['b', 'a', 'd']) - tm.assert_series_equal(s.value_counts(), expected) - - if isinstance(s, Index): - exp = Index(['a', 'b', np.nan, 'd']) - tm.assert_index_equal(s.unique(), exp) - else: - exp = np.array(['a', 'b', np.nan, 'd'], dtype=object) - tm.assert_numpy_array_equal(s.unique(), exp) - assert s.nunique() == 3 - - s = klass({}) - expected = Series([], dtype=np.int64) - tm.assert_series_equal(s.value_counts(), expected, - check_index_type=False) - # returned dtype differs depending on original - if isinstance(s, Index): - tm.assert_index_equal(s.unique(), Index([]), exact=False) - else: - tm.assert_numpy_array_equal(s.unique(), np.array([]), - check_dtype=False) + @pytest.mark.parametrize('klass', [Index, Series]) + def test_value_counts_inferred(self, klass): + s_values = ['a', 'b', 'b', 'b', 'b', 'c', 'd', 'd', 'a', 'a'] + s = klass(s_values) + expected = Series([4, 3, 2, 1], index=['b', 'a', 'd', 'c']) + tm.assert_series_equal(s.value_counts(), expected) + + if isinstance(s, Index): + exp = Index(np.unique(np.array(s_values, dtype=np.object_))) + tm.assert_index_equal(s.unique(), exp) + else: + exp = np.unique(np.array(s_values, dtype=np.object_)) + tm.assert_numpy_array_equal(s.unique(), exp) + + assert s.nunique() == 4 + # don't sort, have to sort after the fact as not sorting is + # platform-dep + hist = s.value_counts(sort=False).sort_values() + expected = Series([3, 1, 4, 2], index=list('acbd')).sort_values() + tm.assert_series_equal(hist, expected) + + # sort ascending + hist = s.value_counts(ascending=True) + expected = Series([1, 2, 3, 4], index=list('cdab')) + tm.assert_series_equal(hist, expected) + + # relative histogram. 
+ hist = s.value_counts(normalize=True) + expected = Series([.4, .3, .2, .1], index=['b', 'a', 'd', 'c']) + tm.assert_series_equal(hist, expected) + + @pytest.mark.parametrize('klass', [Index, Series]) + def test_value_counts_bins(self, klass): + s_values = ['a', 'b', 'b', 'b', 'b', 'c', 'd', 'd', 'a', 'a'] + s = klass(s_values) + + # bins + with pytest.raises(TypeError): + s.value_counts(bins=1) + + s1 = Series([1, 1, 2, 3]) + res1 = s1.value_counts(bins=1) + exp1 = Series({Interval(0.997, 3.0): 4}) + tm.assert_series_equal(res1, exp1) + res1n = s1.value_counts(bins=1, normalize=True) + exp1n = Series({Interval(0.997, 3.0): 1.0}) + tm.assert_series_equal(res1n, exp1n) + + if isinstance(s1, Index): + tm.assert_index_equal(s1.unique(), Index([1, 2, 3])) + else: + exp = np.array([1, 2, 3], dtype=np.int64) + tm.assert_numpy_array_equal(s1.unique(), exp) + + assert s1.nunique() == 3 + + # these return the same + res4 = s1.value_counts(bins=4, dropna=True) + intervals = IntervalIndex.from_breaks([0.997, 1.5, 2.0, 2.5, 3.0]) + exp4 = Series([2, 1, 1, 0], index=intervals.take([0, 3, 1, 2])) + tm.assert_series_equal(res4, exp4) + + res4 = s1.value_counts(bins=4, dropna=False) + intervals = IntervalIndex.from_breaks([0.997, 1.5, 2.0, 2.5, 3.0]) + exp4 = Series([2, 1, 1, 0], index=intervals.take([0, 3, 1, 2])) + tm.assert_series_equal(res4, exp4) + + res4n = s1.value_counts(bins=4, normalize=True) + exp4n = Series([0.5, 0.25, 0.25, 0], + index=intervals.take([0, 3, 1, 2])) + tm.assert_series_equal(res4n, exp4n) + + # handle NA's properly + s_values = ['a', 'b', 'b', 'b', np.nan, np.nan, + 'd', 'd', 'a', 'a', 'b'] + s = klass(s_values) + expected = Series([4, 3, 2], index=['b', 'a', 'd']) + tm.assert_series_equal(s.value_counts(), expected) - assert s.nunique() == 0 + if isinstance(s, Index): + exp = Index(['a', 'b', np.nan, 'd']) + tm.assert_index_equal(s.unique(), exp) + else: + exp = np.array(['a', 'b', np.nan, 'd'], dtype=object) + tm.assert_numpy_array_equal(s.unique(), exp) + assert s.nunique() == 3 + + s = klass({}) + expected = Series([], dtype=np.int64) + tm.assert_series_equal(s.value_counts(), expected, + check_index_type=False) + # returned dtype differs depending on original + if isinstance(s, Index): + tm.assert_index_equal(s.unique(), Index([]), exact=False) + else: + tm.assert_numpy_array_equal(s.unique(), np.array([]), + check_dtype=False) + + assert s.nunique() == 0 @pytest.mark.parametrize('klass', [Index, Series]) def test_value_counts_datetime64(self, klass): @@ -1008,8 +937,10 @@ def test_getitem(self): assert i[-1] == i[9] - pytest.raises(IndexError, i.__getitem__, 20) - pytest.raises(IndexError, s.iloc.__getitem__, 20) + with pytest.raises(IndexError): + i[20] + with pytest.raises(IndexError): + s.iloc[20] @pytest.mark.parametrize('indexer_klass', [list, pd.Index]) @pytest.mark.parametrize('indexer', [[True] * 10, [False] * 10, @@ -1029,10 +960,7 @@ class TestTranspose(Ops): def test_transpose(self): for obj in self.objs: - if isinstance(obj, Index): - tm.assert_index_equal(obj.transpose(), obj) - else: - tm.assert_series_equal(obj.transpose(), obj) + tm.assert_equal(obj.transpose(), obj) def test_transpose_non_default_axes(self): for obj in self.objs: @@ -1043,10 +971,7 @@ def test_transpose_non_default_axes(self): def test_numpy_transpose(self): for obj in self.objs: - if isinstance(obj, Index): - tm.assert_index_equal(np.transpose(obj), obj) - else: - tm.assert_series_equal(np.transpose(obj), obj) + tm.assert_equal(np.transpose(obj), obj) with pytest.raises(ValueError, 
match=self.errmsg): np.transpose(obj, axes=1) @@ -1068,10 +993,9 @@ class T(NoNewAttributesMixin): assert "__frozen" in dir(t) assert getattr(t, "__frozen") - def f(): + with pytest.raises(AttributeError): t.b = "test" - pytest.raises(AttributeError, f) assert not hasattr(t, "b") @@ -1100,9 +1024,10 @@ class TestToIterable(object): 'method', [ lambda x: x.tolist(), + lambda x: x.to_list(), lambda x: list(x), lambda x: list(x.__iter__()), - ], ids=['tolist', 'list', 'iter']) + ], ids=['tolist', 'to_list', 'list', 'iter']) @pytest.mark.parametrize('typ', [Series, Index]) def test_iterable(self, typ, method, dtype, rdtype): # gh-10904 @@ -1123,9 +1048,10 @@ def test_iterable(self, typ, method, dtype, rdtype): 'method', [ lambda x: x.tolist(), + lambda x: x.to_list(), lambda x: list(x), lambda x: list(x.__iter__()), - ], ids=['tolist', 'list', 'iter']) + ], ids=['tolist', 'to_list', 'list', 'iter']) @pytest.mark.parametrize('typ', [Series, Index]) def test_iterable_object_and_category(self, typ, method, dtype, rdtype, obj): @@ -1168,9 +1094,10 @@ def test_iterable_map(self, typ, dtype, rdtype): 'method', [ lambda x: x.tolist(), + lambda x: x.to_list(), lambda x: list(x), lambda x: list(x.__iter__()), - ], ids=['tolist', 'list', 'iter']) + ], ids=['tolist', 'to_list', 'list', 'iter']) def test_categorial_datetimelike(self, method): i = CategoricalIndex([Timestamp('1999-12-31'), Timestamp('2000-12-31')]) @@ -1235,21 +1162,7 @@ def test_values_consistent(array, expected_type, dtype): assert type(l_values) is expected_type assert type(l_values) is type(r_values) - if isinstance(l_values, np.ndarray): - tm.assert_numpy_array_equal(l_values, r_values) - elif isinstance(l_values, pd.Index): - tm.assert_index_equal(l_values, r_values) - elif pd.api.types.is_categorical(l_values): - tm.assert_categorical_equal(l_values, r_values) - elif pd.api.types.is_period_dtype(l_values): - tm.assert_period_array_equal(l_values, r_values) - elif pd.api.types.is_interval_dtype(l_values): - tm.assert_interval_array_equal(l_values, r_values) - else: - raise TypeError("Unexpected type {}".format(type(l_values))) - - assert l_values.dtype == dtype - assert r_values.dtype == dtype + tm.assert_equal(l_values, r_values) @pytest.mark.parametrize('array, expected', [ @@ -1269,3 +1182,94 @@ def test_ndarray_values(array, expected): r_values = pd.Index(array)._ndarray_values tm.assert_numpy_array_equal(l_values, r_values) tm.assert_numpy_array_equal(l_values, expected) + + +@pytest.mark.parametrize("array, attr", [ + (np.array([1, 2], dtype=np.int64), None), + (pd.Categorical(['a', 'b']), '_codes'), + (pd.core.arrays.period_array(['2000', '2001'], freq='D'), '_data'), + (pd.core.arrays.integer_array([0, np.nan]), '_data'), + (pd.core.arrays.IntervalArray.from_breaks([0, 1]), '_left'), + (pd.SparseArray([0, 1]), '_sparse_values'), + # TODO: DatetimeArray(add) +]) +@pytest.mark.parametrize('box', [pd.Series, pd.Index]) +def test_array(array, attr, box): + if array.dtype.name in ('Int64', 'Sparse[int64, 0]') and box is pd.Index: + pytest.skip("No index type for {}".format(array.dtype)) + result = box(array, copy=False).array + + if attr: + array = getattr(array, attr) + result = getattr(result, attr) + + assert result is array + + +def test_array_multiindex_raises(): + idx = pd.MultiIndex.from_product([['A'], ['a', 'b']]) + with pytest.raises(ValueError, match='MultiIndex'): + idx.array + + +@pytest.mark.parametrize('array, expected', [ + (np.array([1, 2], dtype=np.int64), np.array([1, 2], dtype=np.int64)), + 
(pd.Categorical(['a', 'b']), np.array(['a', 'b'], dtype=object)), + (pd.core.arrays.period_array(['2000', '2001'], freq='D'), + np.array([pd.Period('2000', freq="D"), pd.Period('2001', freq='D')])), + (pd.core.arrays.integer_array([0, np.nan]), + np.array([0, np.nan], dtype=object)), + (pd.core.arrays.IntervalArray.from_breaks([0, 1, 2]), + np.array([pd.Interval(0, 1), pd.Interval(1, 2)], dtype=object)), + (pd.SparseArray([0, 1]), np.array([0, 1], dtype=np.int64)), + # TODO: DatetimeArray(add) +]) +@pytest.mark.parametrize('box', [pd.Series, pd.Index]) +def test_to_numpy(array, expected, box): + thing = box(array) + + if array.dtype.name in ('Int64', 'Sparse[int64, 0]') and box is pd.Index: + pytest.skip("No index type for {}".format(array.dtype)) + + result = thing.to_numpy() + tm.assert_numpy_array_equal(result, expected) + + +@pytest.mark.parametrize("as_series", [True, False]) +@pytest.mark.parametrize("arr", [ + np.array([1, 2, 3], dtype="int64"), + np.array(['a', 'b', 'c'], dtype=object), +]) +def test_to_numpy_copy(arr, as_series): + obj = pd.Index(arr, copy=False) + if as_series: + obj = pd.Series(obj.values, copy=False) + + # no copy by default + result = obj.to_numpy() + assert np.shares_memory(arr, result) is True + + result = obj.to_numpy(copy=False) + assert np.shares_memory(arr, result) is True + + # copy=True + result = obj.to_numpy(copy=True) + assert np.shares_memory(arr, result) is False + + +@pytest.mark.parametrize("as_series", [True, False]) +def test_to_numpy_dtype(as_series): + tz = "US/Eastern" + obj = pd.DatetimeIndex(['2000', '2001'], tz=tz) + if as_series: + obj = pd.Series(obj) + result = obj.to_numpy(dtype=object) + expected = np.array([pd.Timestamp('2000', tz=tz), + pd.Timestamp('2001', tz=tz)], + dtype=object) + tm.assert_numpy_array_equal(result, expected) + + result = obj.to_numpy() + expected = np.array(['2000-01-01T05', '2001-01-01T05'], + dtype='M8[ns]') + tm.assert_numpy_array_equal(result, expected) diff --git a/pandas/tests/test_downstream.py b/pandas/tests/test_downstream.py index abcfa4b320b22..1d17b514a5b67 100644 --- a/pandas/tests/test_downstream.py +++ b/pandas/tests/test_downstream.py @@ -101,7 +101,7 @@ def test_pandas_gbq(df): pandas_gbq = import_module('pandas_gbq') # noqa -@pytest.mark.xfail(reason="0.7.0 pending", strict=True) +@pytest.mark.xfail(reason="0.7.0 pending") @tm.network def test_pandas_datareader(): diff --git a/pandas/tests/test_lib.py b/pandas/tests/test_lib.py index 3e34b48fb6795..d0812eae80f2d 100644 --- a/pandas/tests/test_lib.py +++ b/pandas/tests/test_lib.py @@ -24,8 +24,8 @@ def test_max_len_string_array(self): assert libwriters.max_len_string_array(arr) == 3 # raises - pytest.raises(TypeError, - lambda: libwriters.max_len_string_array(arr.astype('U'))) + with pytest.raises(TypeError): + libwriters.max_len_string_array(arr.astype('U')) def test_fast_unique_multiple_list_gen_sort(self): keys = [['p', 'a'], ['n', 'd'], ['a', 's']] diff --git a/pandas/tests/test_multilevel.py b/pandas/tests/test_multilevel.py index 70d2c9080ab94..6c1a2490ea76e 100644 --- a/pandas/tests/test_multilevel.py +++ b/pandas/tests/test_multilevel.py @@ -10,16 +10,13 @@ import numpy as np from pandas.core.index import Index, MultiIndex -from pandas import (Panel, DataFrame, Series, notna, isna, Timestamp, concat, - read_csv) +from pandas import (Panel, DataFrame, Series, isna, Timestamp) from pandas.core.dtypes.common import is_float_dtype, is_integer_dtype -import pandas.core.common as com import pandas.util.testing as tm from pandas.compat import 
(range, lrange, StringIO, lzip, u, product as cart_product, zip) import pandas as pd -import pandas._libs.index as _index AGG_FUNCTIONS = ['sum', 'prod', 'min', 'max', 'median', 'mean', 'skew', 'mad', 'std', 'var', 'sem'] @@ -31,14 +28,14 @@ def setup_method(self, method): index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two', 'three']], - labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], - [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]], + codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], + [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]], names=['first', 'second']) self.frame = DataFrame(np.random.randn(10, 3), index=index, columns=Index(['A', 'B', 'C'], name='exp')) self.single_level = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux']], - labels=[[0, 1, 2, 3]], names=['first']) + codes=[[0, 1, 2, 3]], names=['first']) # create test series object arrays = [['bar', 'bar', 'baz', 'baz', 'qux', 'qux', 'foo', 'foo'], @@ -239,493 +236,6 @@ def test_repr_name_coincide(self): lines = repr(df).split('\n') assert lines[2].startswith('a 0 foo') - def test_getitem_simple(self): - df = self.frame.T - - col = df['foo', 'one'] - tm.assert_almost_equal(col.values, df.values[:, 0]) - with pytest.raises(KeyError): - df[('foo', 'four')] - with pytest.raises(KeyError): - df['foobar'] - - def test_series_getitem(self): - s = self.ymd['A'] - - result = s[2000, 3] - - # TODO(wesm): unused? - # result2 = s.loc[2000, 3] - - expected = s.reindex(s.index[42:65]) - expected.index = expected.index.droplevel(0).droplevel(0) - tm.assert_series_equal(result, expected) - - result = s[2000, 3, 10] - expected = s[49] - assert result == expected - - # fancy - expected = s.reindex(s.index[49:51]) - result = s.loc[[(2000, 3, 10), (2000, 3, 13)]] - tm.assert_series_equal(result, expected) - - with catch_warnings(record=True): - simplefilter("ignore", DeprecationWarning) - result = s.ix[[(2000, 3, 10), (2000, 3, 13)]] - tm.assert_series_equal(result, expected) - - # key error - pytest.raises(KeyError, s.__getitem__, (2000, 3, 4)) - - def test_series_getitem_corner(self): - s = self.ymd['A'] - - # don't segfault, GH #495 - # out of bounds access - pytest.raises(IndexError, s.__getitem__, len(self.ymd)) - - # generator - result = s[(x > 0 for x in s)] - expected = s[s > 0] - tm.assert_series_equal(result, expected) - - def test_series_setitem(self): - s = self.ymd['A'] - - s[2000, 3] = np.nan - assert isna(s.values[42:65]).all() - assert notna(s.values[:42]).all() - assert notna(s.values[65:]).all() - - s[2000, 3, 10] = np.nan - assert isna(s[49]) - - def test_series_slice_partial(self): - pass - - def test_frame_getitem_setitem_boolean(self): - df = self.frame.T.copy() - values = df.values - - result = df[df > 0] - expected = df.where(df > 0) - tm.assert_frame_equal(result, expected) - - df[df > 0] = 5 - values[values > 0] = 5 - tm.assert_almost_equal(df.values, values) - - df[df == 5] = 0 - values[values == 5] = 0 - tm.assert_almost_equal(df.values, values) - - # a df that needs alignment first - df[df[:-1] < 0] = 2 - np.putmask(values[:-1], values[:-1] < 0, 2) - tm.assert_almost_equal(df.values, values) - - with pytest.raises(TypeError, match='boolean values only'): - df[df * 0] = 2 - - def test_frame_getitem_setitem_slice(self): - # getitem - result = self.frame.iloc[:4] - expected = self.frame[:4] - tm.assert_frame_equal(result, expected) - - # setitem - cp = self.frame.copy() - cp.iloc[:4] = 0 - - assert (cp.values[:4] == 0).all() - assert (cp.values[4:] != 0).all() - - def test_frame_getitem_setitem_multislice(self): - levels = [['t1', 't2'], ['a', 'b', 'c']] 
- labels = [[0, 0, 0, 1, 1], [0, 1, 2, 0, 1]] - midx = MultiIndex(labels=labels, levels=levels, names=[None, 'id']) - df = DataFrame({'value': [1, 2, 3, 7, 8]}, index=midx) - - result = df.loc[:, 'value'] - tm.assert_series_equal(df['value'], result) - - with catch_warnings(record=True): - simplefilter("ignore", DeprecationWarning) - result = df.ix[:, 'value'] - tm.assert_series_equal(df['value'], result) - - result = df.loc[df.index[1:3], 'value'] - tm.assert_series_equal(df['value'][1:3], result) - - result = df.loc[:, :] - tm.assert_frame_equal(df, result) - - result = df - df.loc[:, 'value'] = 10 - result['value'] = 10 - tm.assert_frame_equal(df, result) - - df.loc[:, :] = 10 - tm.assert_frame_equal(df, result) - - def test_frame_getitem_multicolumn_empty_level(self): - f = DataFrame({'a': ['1', '2', '3'], 'b': ['2', '3', '4']}) - f.columns = [['level1 item1', 'level1 item2'], ['', 'level2 item2'], - ['level3 item1', 'level3 item2']] - - result = f['level1 item1'] - expected = DataFrame([['1'], ['2'], ['3']], index=f.index, - columns=['level3 item1']) - tm.assert_frame_equal(result, expected) - - def test_frame_setitem_multi_column(self): - df = DataFrame(randn(10, 4), columns=[['a', 'a', 'b', 'b'], - [0, 1, 0, 1]]) - - cp = df.copy() - cp['a'] = cp['b'] - tm.assert_frame_equal(cp['a'], cp['b']) - - # set with ndarray - cp = df.copy() - cp['a'] = cp['b'].values - tm.assert_frame_equal(cp['a'], cp['b']) - - # --------------------------------------- - # #1803 - columns = MultiIndex.from_tuples([('A', '1'), ('A', '2'), ('B', '1')]) - df = DataFrame(index=[1, 3, 5], columns=columns) - - # Works, but adds a column instead of updating the two existing ones - df['A'] = 0.0 # Doesn't work - assert (df['A'].values == 0).all() - - # it broadcasts - df['B', '1'] = [1, 2, 3] - df['A'] = df['B', '1'] - - sliced_a1 = df['A', '1'] - sliced_a2 = df['A', '2'] - sliced_b1 = df['B', '1'] - tm.assert_series_equal(sliced_a1, sliced_b1, check_names=False) - tm.assert_series_equal(sliced_a2, sliced_b1, check_names=False) - assert sliced_a1.name == ('A', '1') - assert sliced_a2.name == ('A', '2') - assert sliced_b1.name == ('B', '1') - - def test_getitem_tuple_plus_slice(self): - # GH #671 - df = DataFrame({'a': lrange(10), - 'b': lrange(10), - 'c': np.random.randn(10), - 'd': np.random.randn(10)}) - - idf = df.set_index(['a', 'b']) - - result = idf.loc[(0, 0), :] - expected = idf.loc[0, 0] - expected2 = idf.xs((0, 0)) - with catch_warnings(record=True): - simplefilter("ignore", DeprecationWarning) - expected3 = idf.ix[0, 0] - - tm.assert_series_equal(result, expected) - tm.assert_series_equal(result, expected2) - tm.assert_series_equal(result, expected3) - - def test_getitem_setitem_tuple_plus_columns(self): - # GH #1013 - - df = self.ymd[:5] - - result = df.loc[(2000, 1, 6), ['A', 'B', 'C']] - expected = df.loc[2000, 1, 6][['A', 'B', 'C']] - tm.assert_series_equal(result, expected) - - def test_xs(self): - xs = self.frame.xs(('bar', 'two')) - xs2 = self.frame.loc[('bar', 'two')] - - tm.assert_series_equal(xs, xs2) - tm.assert_almost_equal(xs.values, self.frame.values[4]) - - # GH 6574 - # missing values in returned index should be preserrved - acc = [ - ('a', 'abcde', 1), - ('b', 'bbcde', 2), - ('y', 'yzcde', 25), - ('z', 'xbcde', 24), - ('z', None, 26), - ('z', 'zbcde', 25), - ('z', 'ybcde', 26), - ] - df = DataFrame(acc, - columns=['a1', 'a2', 'cnt']).set_index(['a1', 'a2']) - expected = DataFrame({'cnt': [24, 26, 25, 26]}, index=Index( - ['xbcde', np.nan, 'zbcde', 'ybcde'], name='a2')) - - result = 
df.xs('z', level='a1') - tm.assert_frame_equal(result, expected) - - def test_xs_partial(self): - result = self.frame.xs('foo') - result2 = self.frame.loc['foo'] - expected = self.frame.T['foo'].T - tm.assert_frame_equal(result, expected) - tm.assert_frame_equal(result, result2) - - result = self.ymd.xs((2000, 4)) - expected = self.ymd.loc[2000, 4] - tm.assert_frame_equal(result, expected) - - # ex from #1796 - index = MultiIndex(levels=[['foo', 'bar'], ['one', 'two'], [-1, 1]], - labels=[[0, 0, 0, 0, 1, 1, 1, 1], - [0, 0, 1, 1, 0, 0, 1, 1], [0, 1, 0, 1, 0, 1, - 0, 1]]) - df = DataFrame(np.random.randn(8, 4), index=index, - columns=list('abcd')) - - result = df.xs(['foo', 'one']) - expected = df.loc['foo', 'one'] - tm.assert_frame_equal(result, expected) - - def test_xs_with_duplicates(self): - # Issue #13719 - df_dup = concat([self.frame] * 2) - assert df_dup.index.is_unique is False - expected = concat([self.frame.xs('one', level='second')] * 2) - tm.assert_frame_equal(df_dup.xs('one', level='second'), expected) - tm.assert_frame_equal(df_dup.xs(['one'], level=['second']), expected) - - def test_xs_level(self): - result = self.frame.xs('two', level='second') - expected = self.frame[self.frame.index.get_level_values(1) == 'two'] - expected.index = expected.index.droplevel(1) - - tm.assert_frame_equal(result, expected) - - index = MultiIndex.from_tuples([('x', 'y', 'z'), ('a', 'b', 'c'), ( - 'p', 'q', 'r')]) - df = DataFrame(np.random.randn(3, 5), index=index) - result = df.xs('c', level=2) - expected = df[1:2] - expected.index = expected.index.droplevel(2) - tm.assert_frame_equal(result, expected) - - # this is a copy in 0.14 - result = self.frame.xs('two', level='second') - - # setting this will give a SettingWithCopyError - # as we are trying to write a view - def f(x): - x[:] = 10 - - pytest.raises(com.SettingWithCopyError, f, result) - - def test_xs_level_multiple(self): - text = """ A B C D E -one two three four -a b 10.0032 5 -0.5109 -2.3358 -0.4645 0.05076 0.3640 -a q 20 4 0.4473 1.4152 0.2834 1.00661 0.1744 -x q 30 3 -0.6662 -0.5243 -0.3580 0.89145 2.5838""" - - df = read_csv(StringIO(text), sep=r'\s+', engine='python') - - result = df.xs(('a', 4), level=['one', 'four']) - expected = df.xs('a').xs(4, level='four') - tm.assert_frame_equal(result, expected) - - # this is a copy in 0.14 - result = df.xs(('a', 4), level=['one', 'four']) - - # setting this will give a SettingWithCopyError - # as we are trying to write a view - def f(x): - x[:] = 10 - - pytest.raises(com.SettingWithCopyError, f, result) - - # GH2107 - dates = lrange(20111201, 20111205) - ids = 'abcde' - idx = MultiIndex.from_tuples([x for x in cart_product(dates, ids)]) - idx.names = ['date', 'secid'] - df = DataFrame(np.random.randn(len(idx), 3), idx, ['X', 'Y', 'Z']) - - rs = df.xs(20111201, level='date') - xp = df.loc[20111201, :] - tm.assert_frame_equal(rs, xp) - - def test_xs_level0(self): - text = """ A B C D E -one two three four -a b 10.0032 5 -0.5109 -2.3358 -0.4645 0.05076 0.3640 -a q 20 4 0.4473 1.4152 0.2834 1.00661 0.1744 -x q 30 3 -0.6662 -0.5243 -0.3580 0.89145 2.5838""" - - df = read_csv(StringIO(text), sep=r'\s+', engine='python') - - result = df.xs('a', level=0) - expected = df.xs('a') - assert len(result) == 2 - tm.assert_frame_equal(result, expected) - - def test_xs_level_series(self): - s = self.frame['A'] - result = s[:, 'two'] - expected = self.frame.xs('two', level=1)['A'] - tm.assert_series_equal(result, expected) - - s = self.ymd['A'] - result = s[2000, 5] - expected = self.ymd.loc[2000, 
5]['A'] - tm.assert_series_equal(result, expected) - - # not implementing this for now - - pytest.raises(TypeError, s.__getitem__, (2000, slice(3, 4))) - - # result = s[2000, 3:4] - # lv =s.index.get_level_values(1) - # expected = s[(lv == 3) | (lv == 4)] - # expected.index = expected.index.droplevel(0) - # tm.assert_series_equal(result, expected) - - # can do this though - - def test_get_loc_single_level(self): - s = Series(np.random.randn(len(self.single_level)), - index=self.single_level) - for k in self.single_level.values: - s[k] - - def test_getitem_toplevel(self): - df = self.frame.T - - result = df['foo'] - expected = df.reindex(columns=df.columns[:3]) - expected.columns = expected.columns.droplevel(0) - tm.assert_frame_equal(result, expected) - - result = df['bar'] - result2 = df.loc[:, 'bar'] - - expected = df.reindex(columns=df.columns[3:5]) - expected.columns = expected.columns.droplevel(0) - tm.assert_frame_equal(result, expected) - tm.assert_frame_equal(result, result2) - - def test_getitem_setitem_slice_integers(self): - index = MultiIndex(levels=[[0, 1, 2], [0, 2]], - labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]]) - - frame = DataFrame(np.random.randn(len(index), 4), index=index, - columns=['a', 'b', 'c', 'd']) - res = frame.loc[1:2] - exp = frame.reindex(frame.index[2:]) - tm.assert_frame_equal(res, exp) - - frame.loc[1:2] = 7 - assert (frame.loc[1:2] == 7).values.all() - - series = Series(np.random.randn(len(index)), index=index) - - res = series.loc[1:2] - exp = series.reindex(series.index[2:]) - tm.assert_series_equal(res, exp) - - series.loc[1:2] = 7 - assert (series.loc[1:2] == 7).values.all() - - def test_getitem_int(self): - levels = [[0, 1], [0, 1, 2]] - labels = [[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]] - index = MultiIndex(levels=levels, labels=labels) - - frame = DataFrame(np.random.randn(6, 2), index=index) - - result = frame.loc[1] - expected = frame[-3:] - expected.index = expected.index.droplevel(0) - tm.assert_frame_equal(result, expected) - - # raises exception - pytest.raises(KeyError, frame.loc.__getitem__, 3) - - # however this will work - result = self.frame.iloc[2] - expected = self.frame.xs(self.frame.index[2]) - tm.assert_series_equal(result, expected) - - def test_getitem_partial(self): - ymd = self.ymd.T - result = ymd[2000, 2] - - expected = ymd.reindex(columns=ymd.columns[ymd.columns.labels[1] == 1]) - expected.columns = expected.columns.droplevel(0).droplevel(0) - tm.assert_frame_equal(result, expected) - - def test_setitem_change_dtype(self): - dft = self.frame.T - s = dft['foo', 'two'] - dft['foo', 'two'] = s > s.median() - tm.assert_series_equal(dft['foo', 'two'], s > s.median()) - # assert isinstance(dft._data.blocks[1].items, MultiIndex) - - reindexed = dft.reindex(columns=[('foo', 'two')]) - tm.assert_series_equal(reindexed['foo', 'two'], s > s.median()) - - def test_frame_setitem_ix(self): - self.frame.loc[('bar', 'two'), 'B'] = 5 - assert self.frame.loc[('bar', 'two'), 'B'] == 5 - - # with integer labels - df = self.frame.copy() - df.columns = lrange(3) - df.loc[('bar', 'two'), 1] = 7 - assert df.loc[('bar', 'two'), 1] == 7 - - with catch_warnings(record=True): - simplefilter("ignore", DeprecationWarning) - df = self.frame.copy() - df.columns = lrange(3) - df.ix[('bar', 'two'), 1] = 7 - assert df.loc[('bar', 'two'), 1] == 7 - - def test_fancy_slice_partial(self): - result = self.frame.loc['bar':'baz'] - expected = self.frame[3:7] - tm.assert_frame_equal(result, expected) - - result = self.ymd.loc[(2000, 2):(2000, 4)] - lev = 
self.ymd.index.labels[1] - expected = self.ymd[(lev >= 1) & (lev <= 3)] - tm.assert_frame_equal(result, expected) - - def test_getitem_partial_column_select(self): - idx = MultiIndex(labels=[[0, 0, 0], [0, 1, 1], [1, 0, 1]], - levels=[['a', 'b'], ['x', 'y'], ['p', 'q']]) - df = DataFrame(np.random.rand(3, 2), index=idx) - - result = df.loc[('a', 'y'), :] - expected = df.loc[('a', 'y')] - tm.assert_frame_equal(result, expected) - - result = df.loc[('a', 'y'), [1, 0]] - expected = df.loc[('a', 'y')][[1, 0]] - tm.assert_frame_equal(result, expected) - - with catch_warnings(record=True): - simplefilter("ignore", DeprecationWarning) - result = df.ix[('a', 'y'), [1, 0]] - tm.assert_frame_equal(result, expected) - - pytest.raises(KeyError, df.loc.__getitem__, - (('a', 'foo'), slice(None, None))) - def test_delevel_infer_dtype(self): tuples = [tuple for tuple in cart_product( @@ -782,7 +292,7 @@ def _check_counts(frame, axis=0): def test_count_level_series(self): index = MultiIndex(levels=[['foo', 'bar', 'baz'], ['one', 'two', 'three', 'four']], - labels=[[0, 0, 0, 2, 2], [2, 0, 1, 1, 2]]) + codes=[[0, 0, 0, 2, 2], [2, 0, 1, 1, 2]]) s = Series(np.random.randn(len(index)), index=index) @@ -900,7 +410,7 @@ def check(left, right): columns=['1st', '2nd', '3rd']) mi = MultiIndex(levels=[['a', 'b'], ['1st', '2nd', '3rd']], - labels=[np.tile( + codes=[np.tile( np.arange(2).repeat(3), 2), np.tile( np.arange(3), 4)]) @@ -908,7 +418,7 @@ def check(left, right): check(left, right) df.columns = ['1st', '2nd', '1st'] - mi = MultiIndex(levels=[['a', 'b'], ['1st', '2nd']], labels=[np.tile( + mi = MultiIndex(levels=[['a', 'b'], ['1st', '2nd']], codes=[np.tile( np.arange(2).repeat(3), 2), np.tile( [0, 1, 0], 4)]) @@ -918,7 +428,7 @@ def check(left, right): tpls = ('a', 2), ('b', 1), ('a', 1), ('b', 2) df.index = MultiIndex.from_tuples(tpls) mi = MultiIndex(levels=[['a', 'b'], [1, 2], ['1st', '2nd']], - labels=[np.tile( + codes=[np.tile( np.arange(2).repeat(3), 2), np.repeat( [1, 0, 1], [3, 6, 3]), np.tile( [0, 1, 0], 4)]) @@ -1198,9 +708,9 @@ def test_unstack_sparse_keyspace(self): def test_unstack_unobserved_keys(self): # related to #2278 refactoring levels = [[0, 1], [0, 1, 2, 3]] - labels = [[0, 0, 1, 1], [0, 2, 0, 2]] + codes = [[0, 0, 1, 1], [0, 2, 0, 2]] - index = MultiIndex(levels, labels) + index = MultiIndex(levels, codes) df = DataFrame(np.random.randn(4, 2), index=index) @@ -1226,8 +736,8 @@ def manual_compare_stacked(df, df_stacked, lev0, lev1): for levels in levels_poss: columns = MultiIndex(levels=levels, - labels=[[0, 0, 1, 1], - [0, 1, 0, 1]]) + codes=[[0, 0, 1, 1], + [0, 1, 0, 1]]) df = DataFrame(columns=columns, data=[range(4)]) for stack_lev in range(2): df_stacked = df.stack(stack_lev) @@ -1236,14 +746,14 @@ def manual_compare_stacked(df, df_stacked, lev0, lev1): # check multi-row case mi = MultiIndex(levels=[["A", "C", "B"], ["B", "A", "C"]], - labels=[np.repeat(range(3), 3), np.tile(range(3), 3)]) + codes=[np.repeat(range(3), 3), np.tile(range(3), 3)]) df = DataFrame(columns=mi, index=range(5), data=np.arange(5 * len(mi)).reshape(5, -1)) manual_compare_stacked(df, df.stack(0), 0, 1) def test_groupby_corner(self): midx = MultiIndex(levels=[['foo'], ['bar'], ['baz']], - labels=[[0], [0], [0]], + codes=[[0], [0], [0]], names=['one', 'two', 'three']) df = DataFrame([np.random.rand(4)], columns=['a', 'b', 'c', 'd'], index=midx) @@ -1355,31 +865,6 @@ def test_alignment(self): exp = x.reindex(exp_index) - y.reindex(exp_index) tm.assert_series_equal(res, exp) - def test_frame_getitem_view(self): - df 
= self.frame.T.copy() - - # this works because we are modifying the underlying array - # really a no-no - df['foo'].values[:] = 0 - assert (df['foo'].values == 0).all() - - # but not if it's mixed-type - df['foo', 'four'] = 'foo' - df = df.sort_index(level=0, axis=1) - - # this will work, but will raise/warn as its chained assignment - def f(): - df['foo']['one'] = 2 - return df - - pytest.raises(com.SettingWithCopyError, f) - - try: - df = f() - except ValueError: - pass - assert (df['foo', 'one'] == 0).all() - def test_count(self): frame = self.frame.copy() frame.index.names = ['a', 'b'] @@ -1544,26 +1029,6 @@ def test_ix_preserve_names(self): assert result.index.name == self.ymd.index.names[2] assert result2.index.name == self.ymd.index.names[2] - def test_partial_set(self): - # GH #397 - df = self.ymd.copy() - exp = self.ymd.copy() - df.loc[2000, 4] = 0 - exp.loc[2000, 4].values[:] = 0 - tm.assert_frame_equal(df, exp) - - df['A'].loc[2000, 4] = 1 - exp['A'].loc[2000, 4].values[:] = 1 - tm.assert_frame_equal(df, exp) - - df.loc[2000] = 5 - exp.loc[2000].values[:] = 5 - tm.assert_frame_equal(df, exp) - - # this works...for now - df['A'].iloc[14] = 5 - assert df['A'][14] == 5 - def test_unstack_preserve_types(self): # GH #403 self.ymd['E'] = 'foo' @@ -1575,11 +1040,11 @@ def test_unstack_preserve_types(self): assert unstacked['F', 1].dtype == np.float64 def test_unstack_group_index_overflow(self): - labels = np.tile(np.arange(500), 2) + codes = np.tile(np.arange(500), 2) level = np.arange(500) index = MultiIndex(levels=[level] * 8 + [[0, 1]], - labels=[labels] * 8 + [np.arange(2).repeat(500)]) + codes=[codes] * 8 + [np.arange(2).repeat(500)]) s = Series(np.arange(1000), index=index) result = s.unstack() @@ -1591,7 +1056,7 @@ def test_unstack_group_index_overflow(self): # put it at beginning index = MultiIndex(levels=[[0, 1]] + [level] * 8, - labels=[np.arange(2).repeat(500)] + [labels] * 8) + codes=[np.arange(2).repeat(500)] + [codes] * 8) s = Series(np.arange(1000), index=index) result = s.unstack(0) @@ -1599,8 +1064,8 @@ def test_unstack_group_index_overflow(self): # put it in middle index = MultiIndex(levels=[level] * 4 + [[0, 1]] + [level] * 4, - labels=([labels] * 4 + [np.arange(2).repeat(500)] + - [labels] * 4)) + codes=([codes] * 4 + [np.arange(2).repeat(500)] + + [codes] * 4)) s = Series(np.arange(1000), index=index) result = s.unstack(4) @@ -1638,35 +1103,6 @@ def test_pyint_engine(self): result = index.get_indexer([missing] + [keys[i] for i in idces]) tm.assert_numpy_array_equal(result, expected) - def test_getitem_lowerdim_corner(self): - pytest.raises(KeyError, self.frame.loc.__getitem__, - (('bar', 'three'), 'B')) - - # in theory should be inserting in a sorted space???? - self.frame.loc[('bar', 'three'), 'B'] = 0 - assert self.frame.sort_index().loc[('bar', 'three'), 'B'] == 0 - - # --------------------------------------------------------------------- - # AMBIGUOUS CASES! - - def test_partial_ix_missing(self): - pytest.skip("skipping for now") - - result = self.ymd.loc[2000, 0] - expected = self.ymd.loc[2000]['A'] - tm.assert_series_equal(result, expected) - - # need to put in some work here - - # self.ymd.loc[2000, 0] = 0 - # assert (self.ymd.loc[2000]['A'] == 0).all() - - # Pretty sure the second (and maybe even the first) is already wrong. 
- pytest.raises(Exception, self.ymd.loc.__getitem__, (2000, 6)) - pytest.raises(Exception, self.ymd.loc.__getitem__, (2000, 6), 0) - - # --------------------------------------------------------------------- - def test_to_html(self): self.ymd.columns.name = 'foo' self.ymd.to_html() @@ -1675,7 +1111,7 @@ def test_to_html(self): def test_level_with_tuples(self): index = MultiIndex(levels=[[('foo', 'bar', 0), ('foo', 'baz', 0), ( 'foo', 'qux', 0)], [0, 1]], - labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]]) + codes=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]]) series = Series(np.random.randn(6), index=index) frame = DataFrame(np.random.randn(6, 4), index=index) @@ -1698,7 +1134,7 @@ def test_level_with_tuples(self): index = MultiIndex(levels=[[('foo', 'bar'), ('foo', 'baz'), ( 'foo', 'qux')], [0, 1]], - labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]]) + codes=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]]) series = Series(np.random.randn(6), index=index) frame = DataFrame(np.random.randn(6, 4), index=index) @@ -1717,62 +1153,6 @@ def test_level_with_tuples(self): tm.assert_frame_equal(result, expected) tm.assert_frame_equal(result2, expected) - def test_int_series_slicing(self): - s = self.ymd['A'] - result = s[5:] - expected = s.reindex(s.index[5:]) - tm.assert_series_equal(result, expected) - - exp = self.ymd['A'].copy() - s[5:] = 0 - exp.values[5:] = 0 - tm.assert_numpy_array_equal(s.values, exp.values) - - result = self.ymd[5:] - expected = self.ymd.reindex(s.index[5:]) - tm.assert_frame_equal(result, expected) - - @pytest.mark.parametrize('unicode_strings', [True, False]) - def test_mixed_depth_get(self, unicode_strings): - # If unicode_strings is True, the column labels in dataframe - # construction will use unicode strings in Python 2 (pull request - # #17099). 
- - arrays = [['a', 'top', 'top', 'routine1', 'routine1', 'routine2'], - ['', 'OD', 'OD', 'result1', 'result2', 'result1'], - ['', 'wx', 'wy', '', '', '']] - - if unicode_strings: - arrays = [[u(s) for s in arr] for arr in arrays] - - tuples = sorted(zip(*arrays)) - index = MultiIndex.from_tuples(tuples) - df = DataFrame(np.random.randn(4, 6), columns=index) - - result = df['a'] - expected = df['a', '', ''].rename('a') - tm.assert_series_equal(result, expected) - - result = df['routine1', 'result1'] - expected = df['routine1', 'result1', ''] - expected = expected.rename(('routine1', 'result1')) - tm.assert_series_equal(result, expected) - - def test_mixed_depth_insert(self): - arrays = [['a', 'top', 'top', 'routine1', 'routine1', 'routine2'], - ['', 'OD', 'OD', 'result1', 'result2', 'result1'], - ['', 'wx', 'wy', '', '', '']] - - tuples = sorted(zip(*arrays)) - index = MultiIndex.from_tuples(tuples) - df = DataFrame(randn(4, 6), columns=index) - - result = df.copy() - expected = df.copy() - result['b'] = [1, 2, 3, 4] - expected['b', '', ''] = [1, 2, 3, 4] - tm.assert_frame_equal(result, expected) - def test_mixed_depth_drop(self): arrays = [['a', 'top', 'top', 'routine1', 'routine1', 'routine2'], ['', 'OD', 'OD', 'result1', 'result2', 'result1'], @@ -1864,35 +1244,6 @@ def test_reindex_level_partial_selection(self): result = self.frame.T.loc[:, ['foo', 'qux']] tm.assert_frame_equal(result, expected.T) - def test_setitem_multiple_partial(self): - expected = self.frame.copy() - result = self.frame.copy() - result.loc[['foo', 'bar']] = 0 - expected.loc['foo'] = 0 - expected.loc['bar'] = 0 - tm.assert_frame_equal(result, expected) - - expected = self.frame.copy() - result = self.frame.copy() - result.loc['foo':'bar'] = 0 - expected.loc['foo'] = 0 - expected.loc['bar'] = 0 - tm.assert_frame_equal(result, expected) - - expected = self.frame['A'].copy() - result = self.frame['A'].copy() - result.loc[['foo', 'bar']] = 0 - expected.loc['foo'] = 0 - expected.loc['bar'] = 0 - tm.assert_series_equal(result, expected) - - expected = self.frame['A'].copy() - result = self.frame['A'].copy() - result.loc['foo':'bar'] = 0 - expected.loc['foo'] = 0 - expected.loc['bar'] = 0 - tm.assert_series_equal(result, expected) - def test_drop_level(self): result = self.frame.drop(['bar', 'qux'], level='first') expected = self.frame.iloc[[0, 1, 2, 5, 6]] @@ -1955,8 +1306,8 @@ def test_drop_preserve_names(self): def test_unicode_repr_issues(self): levels = [Index([u('a/\u03c3'), u('b/\u03c3'), u('c/\u03c3')]), Index([0, 1])] - labels = [np.arange(3).repeat(2), np.tile(np.arange(2), 3)] - index = MultiIndex(levels=levels, labels=labels) + codes = [np.arange(3).repeat(2), np.tile(np.arange(2), 3)] + index = MultiIndex(levels=levels, codes=codes) repr(index.levels) @@ -1972,15 +1323,6 @@ def test_unicode_repr_level_names(self): repr(s) repr(df) - def test_dataframe_insert_column_all_na(self): - # GH #1534 - mix = MultiIndex.from_tuples([('1a', '2a'), ('1a', '2b'), ('1a', '2c') - ]) - df = DataFrame([[1, 2], [3, 4], [5, 6]], index=mix) - s = Series({(1, 1): 1, (1, 2): 2}) - df['new'] = s - assert df['new'].isna().all() - def test_join_segfault(self): # 1532 df1 = DataFrame({'a': [1, 1], 'b': [1, 2], 'x': [1, 2]}) @@ -1991,16 +1333,6 @@ def test_join_segfault(self): for how in ['left', 'right', 'outer']: df1.join(df2, how=how) - def test_set_column_scalar_with_ix(self): - subset = self.frame.index[[1, 4, 5]] - - self.frame.loc[subset] = 99 - assert (self.frame.loc[subset].values == 99).all() - - col = self.frame['B'] - 
col[subset] = 97 - assert (self.frame.loc[subset, 'B'] == 97).all() - def test_frame_dict_constructor_empty_series(self): s1 = Series([ 1, 2, 3, 4 @@ -2014,47 +1346,6 @@ def test_frame_dict_constructor_empty_series(self): DataFrame({'foo': s1, 'bar': s2, 'baz': s3}) DataFrame.from_dict({'foo': s1, 'baz': s3, 'bar': s2}) - def test_indexing_ambiguity_bug_1678(self): - columns = MultiIndex.from_tuples([('Ohio', 'Green'), ('Ohio', 'Red'), ( - 'Colorado', 'Green')]) - index = MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1), ('b', 2) - ]) - - frame = DataFrame(np.arange(12).reshape((4, 3)), index=index, - columns=columns) - - result = frame.iloc[:, 1] - exp = frame.loc[:, ('Ohio', 'Red')] - assert isinstance(result, Series) - tm.assert_series_equal(result, exp) - - def test_nonunique_assignment_1750(self): - df = DataFrame([[1, 1, "x", "X"], [1, 1, "y", "Y"], [1, 2, "z", "Z"]], - columns=list("ABCD")) - - df = df.set_index(['A', 'B']) - ix = MultiIndex.from_tuples([(1, 1)]) - - df.loc[ix, "C"] = '_' - - assert (df.xs((1, 1))['C'] == '_').all() - - def test_indexing_over_hashtable_size_cutoff(self): - n = 10000 - - old_cutoff = _index._SIZE_CUTOFF - _index._SIZE_CUTOFF = 20000 - - s = Series(np.arange(n), - MultiIndex.from_arrays((["a"] * n, np.arange(n)))) - - # hai it works! - assert s[("a", 5)] == 5 - assert s[("a", 6)] == 6 - assert s[("a", 7)] == 7 - - _index._SIZE_CUTOFF = old_cutoff - def test_multiindex_na_repr(self): # only an issue with long columns @@ -2088,8 +1379,8 @@ def test_assign_index_sequences(self): def test_tuples_have_na(self): index = MultiIndex(levels=[[1, 0], [0, 1, 2, 3]], - labels=[[1, 1, 1, 1, -1, 0, 0, 0], [0, 1, 2, 3, 0, - 1, 2, 3]]) + codes=[[1, 1, 1, 1, -1, 0, 0, 0], + [0, 1, 2, 3, 0, 1, 2, 3]]) assert isna(index[4][0]) assert isna(index.values[4][0]) @@ -2424,24 +1715,6 @@ def test_repeat(self): m_df = Series(data, index=m_idx) assert m_df.repeat(3).shape == (3 * len(data), ) - def test_iloc_mi(self): - # GH 13797 - # Test if iloc can handle integer locations in MultiIndexed DataFrame - - data = [['str00', 'str01'], ['str10', 'str11'], ['str20', 'srt21'], - ['str30', 'str31'], ['str40', 'str41']] - - mi = MultiIndex.from_tuples( - [('CC', 'A'), ('CC', 'B'), ('CC', 'B'), ('BB', 'a'), ('BB', 'b')]) - - expected = DataFrame(data) - df_mi = DataFrame(data, index=mi) - - result = DataFrame([[df_mi.iloc[r, c] for c in range(2)] - for r in range(5)]) - - tm.assert_frame_equal(result, expected) - class TestSorted(Base): """ everything you wanted to test about sorting """ @@ -2554,99 +1827,18 @@ def test_is_lexsorted(self): levels = [[0, 1], [0, 1, 2]] index = MultiIndex(levels=levels, - labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]]) + codes=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]]) assert index.is_lexsorted() index = MultiIndex(levels=levels, - labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 2, 1]]) + codes=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 2, 1]]) assert not index.is_lexsorted() index = MultiIndex(levels=levels, - labels=[[0, 0, 1, 0, 1, 1], [0, 1, 0, 2, 2, 1]]) + codes=[[0, 0, 1, 0, 1, 1], [0, 1, 0, 2, 2, 1]]) assert not index.is_lexsorted() assert index.lexsort_depth == 0 - def test_getitem_multilevel_index_tuple_not_sorted(self): - index_columns = list("abc") - df = DataFrame([[0, 1, 0, "x"], [0, 0, 1, "y"]], - columns=index_columns + ["data"]) - df = df.set_index(index_columns) - query_index = df.index[:1] - rs = df.loc[query_index, "data"] - - xp_idx = MultiIndex.from_tuples([(0, 1, 0)], names=['a', 'b', 'c']) - xp = Series(['x'], index=xp_idx, name='data') - 
tm.assert_series_equal(rs, xp) - - def test_getitem_slice_not_sorted(self): - df = self.frame.sort_index(level=1).T - - # buglet with int typechecking - result = df.iloc[:, :np.int32(3)] - expected = df.reindex(columns=df.columns[:3]) - tm.assert_frame_equal(result, expected) - - def test_frame_getitem_not_sorted2(self): - # 13431 - df = DataFrame({'col1': ['b', 'd', 'b', 'a'], - 'col2': [3, 1, 1, 2], - 'data': ['one', 'two', 'three', 'four']}) - - df2 = df.set_index(['col1', 'col2']) - df2_original = df2.copy() - - df2.index.set_levels(['b', 'd', 'a'], level='col1', inplace=True) - df2.index.set_labels([0, 1, 0, 2], level='col1', inplace=True) - assert not df2.index.is_lexsorted() - assert not df2.index.is_monotonic - - assert df2_original.index.equals(df2.index) - expected = df2.sort_index() - assert expected.index.is_lexsorted() - assert expected.index.is_monotonic - - result = df2.sort_index(level=0) - assert result.index.is_lexsorted() - assert result.index.is_monotonic - tm.assert_frame_equal(result, expected) - - def test_frame_getitem_not_sorted(self): - df = self.frame.T - df['foo', 'four'] = 'foo' - - arrays = [np.array(x) for x in zip(*df.columns.values)] - - result = df['foo'] - result2 = df.loc[:, 'foo'] - expected = df.reindex(columns=df.columns[arrays[0] == 'foo']) - expected.columns = expected.columns.droplevel(0) - tm.assert_frame_equal(result, expected) - tm.assert_frame_equal(result2, expected) - - df = df.T - result = df.xs('foo') - result2 = df.loc['foo'] - expected = df.reindex(df.index[arrays[0] == 'foo']) - expected.index = expected.index.droplevel(0) - tm.assert_frame_equal(result, expected) - tm.assert_frame_equal(result2, expected) - - def test_series_getitem_not_sorted(self): - arrays = [['bar', 'bar', 'baz', 'baz', 'qux', 'qux', 'foo', 'foo'], - ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']] - tuples = lzip(*arrays) - index = MultiIndex.from_tuples(tuples) - s = Series(randn(8), index=index) - - arrays = [np.array(x) for x in zip(*index.values)] - - result = s['qux'] - result2 = s.loc['qux'] - expected = s[arrays[0] == 'qux'] - expected.index = expected.index.droplevel(0) - tm.assert_series_equal(result, expected) - tm.assert_series_equal(result2, expected) - def test_sort_index_and_reconstruction(self): # 15622 @@ -2673,7 +1865,7 @@ def test_sort_index_and_reconstruction(self): result = DataFrame( [[1, 1], [2, 2], [1, 1], [2, 2]], index=MultiIndex(levels=[[0.5, 0.8], ['a', 'b']], - labels=[[0, 0, 1, 1], [0, 1, 0, 1]])) + codes=[[0, 0, 1, 1], [0, 1, 0, 1]])) result = result.sort_index() assert result.index.is_lexsorted() @@ -2711,7 +1903,7 @@ def test_sort_index_and_reconstruction_doc_example(self): df = DataFrame({'value': [1, 2, 3, 4]}, index=MultiIndex( levels=[['a', 'b'], ['bb', 'aa']], - labels=[[0, 0, 1, 1], [0, 1, 0, 1]])) + codes=[[0, 0, 1, 1], [0, 1, 0, 1]])) assert df.index.is_lexsorted() assert not df.index.is_monotonic @@ -2719,7 +1911,7 @@ def test_sort_index_and_reconstruction_doc_example(self): expected = DataFrame({'value': [2, 1, 4, 3]}, index=MultiIndex( levels=[['a', 'b'], ['aa', 'bb']], - labels=[[0, 0, 1, 1], [0, 1, 0, 1]])) + codes=[[0, 0, 1, 1], [0, 1, 0, 1]])) result = df.sort_index() assert result.index.is_lexsorted() assert result.index.is_monotonic diff --git a/pandas/tests/test_nanops.py b/pandas/tests/test_nanops.py index 49dbccb82fac8..e214d4c1985a9 100644 --- a/pandas/tests/test_nanops.py +++ b/pandas/tests/test_nanops.py @@ -464,7 +464,6 @@ def test_nankurt(self): allow_str=False, allow_date=False, allow_tdelta=False) 
- @td.skip_if_no("numpy", min_version="1.10.0") def test_nanprod(self): self.check_funs(nanops.nanprod, np.prod, allow_str=False, allow_date=False, allow_tdelta=False, diff --git a/pandas/tests/test_panel.py b/pandas/tests/test_panel.py index 6d5d07b00398c..33f2c34400373 100644 --- a/pandas/tests/test_panel.py +++ b/pandas/tests/test_panel.py @@ -85,7 +85,6 @@ def test_sum(self): def test_mean(self): self._check_stat_op('mean', np.mean) - @td.skip_if_no("numpy", min_version="1.10.0") def test_prod(self): self._check_stat_op('prod', np.prod, skipna_alternative=np.nanprod) @@ -1761,7 +1760,7 @@ def test_to_frame_multi_major(self): def test_to_frame_multi_major_minor(self): cols = MultiIndex(levels=[['C_A', 'C_B'], ['C_1', 'C_2']], - labels=[[0, 0, 1, 1], [0, 1, 0, 1]]) + codes=[[0, 0, 1, 1], [0, 1, 0, 1]]) idx = MultiIndex.from_tuples([(1, 'one'), (1, 'two'), (2, 'one'), ( 2, 'two'), (3, 'three'), (4, 'four')]) df = DataFrame([[1, 2, 11, 12], [3, 4, 13, 14], @@ -2487,10 +2486,10 @@ def is_sorted(arr): return (arr[1:] > arr[:-1]).any() sorted_minor = self.panel.sort_index(level=1) - assert is_sorted(sorted_minor.index.labels[1]) + assert is_sorted(sorted_minor.index.codes[1]) sorted_major = sorted_minor.sort_index(level=0) - assert is_sorted(sorted_major.index.labels[0]) + assert is_sorted(sorted_major.index.codes[0]) def test_to_string(self): buf = StringIO() @@ -2562,7 +2561,7 @@ def test_axis_dummies(self): def test_get_dummies(self): from pandas.core.reshape.reshape import get_dummies, make_axis_dummies - self.panel['Label'] = self.panel.index.labels[1] + self.panel['Label'] = self.panel.index.codes[1] minor_dummies = make_axis_dummies(self.panel, 'minor').astype(np.uint8) dummies = get_dummies(self.panel['Label']) tm.assert_numpy_array_equal(dummies.values, minor_dummies.values) @@ -2585,14 +2584,14 @@ def test_count(self): index = self.panel.index major_count = self.panel.count(level=0)['ItemA'] - labels = index.labels[0] + level_codes = index.codes[0] for i, idx in enumerate(index.levels[0]): - assert major_count[i] == (labels == i).sum() + assert major_count[i] == (level_codes == i).sum() minor_count = self.panel.count(level=1)['ItemA'] - labels = index.labels[1] + level_codes = index.codes[1] for i, idx in enumerate(index.levels[1]): - assert minor_count[i] == (labels == i).sum() + assert minor_count[i] == (level_codes == i).sum() def test_join(self): lp1 = self.panel.filter(['ItemA', 'ItemB']) diff --git a/pandas/tests/test_resample.py b/pandas/tests/test_resample.py deleted file mode 100644 index d38f2a237c31d..0000000000000 --- a/pandas/tests/test_resample.py +++ /dev/null @@ -1,3529 +0,0 @@ -# pylint: disable=E1101 - -from warnings import catch_warnings, simplefilter -from datetime import datetime, timedelta -from functools import partial -from textwrap import dedent -from operator import methodcaller - -import pytz -import pytest -import dateutil -import numpy as np - -from pandas._libs.tslibs.period import IncompatibleFrequency -from pandas._libs.tslibs.ccalendar import DAYS, MONTHS - -import pandas.util.testing as tm -from pandas.util.testing import (assert_series_equal, assert_almost_equal, - assert_frame_equal, assert_index_equal) - -import pandas as pd - -from pandas import (Series, DataFrame, Panel, Index, isna, - notna, Timestamp, Timedelta) - -from pandas.compat import range, lrange, zip, OrderedDict -from pandas.errors import AbstractMethodError, UnsupportedFunctionCall -import pandas.tseries.offsets as offsets -from pandas.tseries.offsets import Minute, BDay - -from 
pandas.core.groupby.groupby import DataError - -from pandas.core.indexes.datetimes import date_range -from pandas.core.indexes.period import period_range, PeriodIndex, Period -from pandas.core.resample import DatetimeIndex, TimeGrouper -from pandas.core.indexes.timedeltas import timedelta_range, TimedeltaIndex - -bday = BDay() - -# The various methods we support -downsample_methods = ['min', 'max', 'first', 'last', 'sum', 'mean', 'sem', - 'median', 'prod', 'var', 'ohlc', 'quantile'] -upsample_methods = ['count', 'size'] -series_methods = ['nunique'] -resample_methods = downsample_methods + upsample_methods + series_methods - - -def _simple_ts(start, end, freq='D'): - rng = date_range(start, end, freq=freq) - return Series(np.random.randn(len(rng)), index=rng) - - -def _simple_pts(start, end, freq='D'): - rng = period_range(start, end, freq=freq) - return Series(np.random.randn(len(rng)), index=rng) - - -class TestResampleAPI(object): - - def setup_method(self, method): - dti = DatetimeIndex(start=datetime(2005, 1, 1), - end=datetime(2005, 1, 10), freq='Min') - - self.series = Series(np.random.rand(len(dti)), dti) - self.frame = DataFrame( - {'A': self.series, 'B': self.series, 'C': np.arange(len(dti))}) - - def test_str(self): - - r = self.series.resample('H') - assert ('DatetimeIndexResampler [freq=<Hour>, axis=0, closed=left, ' - 'label=left, convention=start, base=0]' in str(r)) - - def test_api(self): - - r = self.series.resample('H') - result = r.mean() - assert isinstance(result, Series) - assert len(result) == 217 - - r = self.series.to_frame().resample('H') - result = r.mean() - assert isinstance(result, DataFrame) - assert len(result) == 217 - - def test_groupby_resample_api(self): - - # GH 12448 - # .groupby(...).resample(...) hitting warnings - # when appropriate - df = DataFrame({'date': pd.date_range(start='2016-01-01', - periods=4, - freq='W'), - 'group': [1, 1, 2, 2], - 'val': [5, 6, 7, 8]}).set_index('date') - - # replication step - i = pd.date_range('2016-01-03', periods=8).tolist() + \ - pd.date_range('2016-01-17', periods=8).tolist() - index = pd.MultiIndex.from_arrays([[1] * 8 + [2] * 8, i], - names=['group', 'date']) - expected = DataFrame({'val': [5] * 7 + [6] + [7] * 7 + [8]}, - index=index) - result = df.groupby('group').apply( - lambda x: x.resample('1D').ffill())[['val']] - assert_frame_equal(result, expected) - - def test_groupby_resample_on_api(self): - - # GH 15021 - # .groupby(...).resample(on=...) results in an unexpected - # keyword warning.
- df = DataFrame({'key': ['A', 'B'] * 5, - 'dates': pd.date_range('2016-01-01', periods=10), - 'values': np.random.randn(10)}) - - expected = df.set_index('dates').groupby('key').resample('D').mean() - - result = df.groupby('key').resample('D', on='dates').mean() - assert_frame_equal(result, expected) - - def test_pipe(self): - # GH17905 - - # series - r = self.series.resample('H') - expected = r.max() - r.mean() - result = r.pipe(lambda x: x.max() - x.mean()) - tm.assert_series_equal(result, expected) - - # dataframe - r = self.frame.resample('H') - expected = r.max() - r.mean() - result = r.pipe(lambda x: x.max() - x.mean()) - tm.assert_frame_equal(result, expected) - - def test_getitem(self): - - r = self.frame.resample('H') - tm.assert_index_equal(r._selected_obj.columns, self.frame.columns) - - r = self.frame.resample('H')['B'] - assert r._selected_obj.name == self.frame.columns[1] - - # technically this is allowed - r = self.frame.resample('H')['A', 'B'] - tm.assert_index_equal(r._selected_obj.columns, - self.frame.columns[[0, 1]]) - - r = self.frame.resample('H')['A', 'B'] - tm.assert_index_equal(r._selected_obj.columns, - self.frame.columns[[0, 1]]) - - def test_select_bad_cols(self): - - g = self.frame.resample('H') - pytest.raises(KeyError, g.__getitem__, ['D']) - - pytest.raises(KeyError, g.__getitem__, ['A', 'D']) - with pytest.raises(KeyError, match='^[^A]+$'): - # A should not be referenced as a bad column... - # will have to rethink regex if you change message! - g[['A', 'D']] - - def test_attribute_access(self): - - r = self.frame.resample('H') - tm.assert_series_equal(r.A.sum(), r['A'].sum()) - - def test_api_compat_before_use(self): - - # make sure that we are setting the binner - # on these attributes - for attr in ['groups', 'ngroups', 'indices']: - rng = pd.date_range('1/1/2012', periods=100, freq='S') - ts = Series(np.arange(len(rng)), index=rng) - rs = ts.resample('30s') - - # before use - getattr(rs, attr) - - # after grouper is initialized is ok - rs.mean() - getattr(rs, attr) - - def tests_skip_nuisance(self): - - df = self.frame - df['D'] = 'foo' - r = df.resample('H') - result = r[['A', 'B']].sum() - expected = pd.concat([r.A.sum(), r.B.sum()], axis=1) - assert_frame_equal(result, expected) - - expected = r[['A', 'B', 'C']].sum() - result = r.sum() - assert_frame_equal(result, expected) - - def test_downsample_but_actually_upsampling(self): - - # this is reindex / asfreq - rng = pd.date_range('1/1/2012', periods=100, freq='S') - ts = Series(np.arange(len(rng), dtype='int64'), index=rng) - result = ts.resample('20s').asfreq() - expected = Series([0, 20, 40, 60, 80], - index=pd.date_range('2012-01-01 00:00:00', - freq='20s', - periods=5)) - assert_series_equal(result, expected) - - def test_combined_up_downsampling_of_irregular(self): - - # since we are really doing an operation like this - # ts2.resample('2s').mean().ffill() - # preserve these semantics - - rng = pd.date_range('1/1/2012', periods=100, freq='S') - ts = Series(np.arange(len(rng)), index=rng) - ts2 = ts.iloc[[0, 1, 2, 3, 5, 7, 11, 15, 16, 25, 30]] - - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - result = ts2.resample('2s', how='mean', fill_method='ffill') - expected = ts2.resample('2s').mean().ffill() - assert_series_equal(result, expected) - - def test_transform(self): - - r = self.series.resample('20min') - expected = self.series.groupby( - pd.Grouper(freq='20min')).transform('mean') - result = r.transform('mean') - assert_series_equal(result, expected) - - def
test_fillna(self): - - # need to upsample here - rng = pd.date_range('1/1/2012', periods=10, freq='2S') - ts = Series(np.arange(len(rng), dtype='int64'), index=rng) - r = ts.resample('s') - - expected = r.ffill() - result = r.fillna(method='ffill') - assert_series_equal(result, expected) - - expected = r.bfill() - result = r.fillna(method='bfill') - assert_series_equal(result, expected) - - with pytest.raises(ValueError): - r.fillna(0) - - def test_apply_without_aggregation(self): - - # both resample and groupby should work w/o aggregation - r = self.series.resample('20min') - g = self.series.groupby(pd.Grouper(freq='20min')) - - for t in [g, r]: - result = t.apply(lambda x: x) - assert_series_equal(result, self.series) - - def test_agg_consistency(self): - - # make sure that we are consistent across - # similar aggregations with and w/o selection list - df = DataFrame(np.random.randn(1000, 3), - index=pd.date_range('1/1/2012', freq='S', periods=1000), - columns=['A', 'B', 'C']) - - r = df.resample('3T') - - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - expected = r[['A', 'B', 'C']].agg({'r1': 'mean', 'r2': 'sum'}) - result = r.agg({'r1': 'mean', 'r2': 'sum'}) - assert_frame_equal(result, expected) - - # TODO: once GH 14008 is fixed, move these tests into - # `Base` test class - def test_agg(self): - # test with all three Resampler apis and TimeGrouper - - np.random.seed(1234) - index = date_range(datetime(2005, 1, 1), - datetime(2005, 1, 10), freq='D') - index.name = 'date' - df = DataFrame(np.random.rand(10, 2), columns=list('AB'), index=index) - df_col = df.reset_index() - df_mult = df_col.copy() - df_mult.index = pd.MultiIndex.from_arrays([range(10), df.index], - names=['index', 'date']) - r = df.resample('2D') - cases = [ - r, - df_col.resample('2D', on='date'), - df_mult.resample('2D', level='date'), - df.groupby(pd.Grouper(freq='2D')) - ] - - a_mean = r['A'].mean() - a_std = r['A'].std() - a_sum = r['A'].sum() - b_mean = r['B'].mean() - b_std = r['B'].std() - b_sum = r['B'].sum() - - expected = pd.concat([a_mean, a_std, b_mean, b_std], axis=1) - expected.columns = pd.MultiIndex.from_product([['A', 'B'], - ['mean', 'std']]) - for t in cases: - result = t.aggregate([np.mean, np.std]) - assert_frame_equal(result, expected) - - expected = pd.concat([a_mean, b_std], axis=1) - for t in cases: - result = t.aggregate({'A': np.mean, - 'B': np.std}) - assert_frame_equal(result, expected, check_like=True) - - expected = pd.concat([a_mean, a_std], axis=1) - expected.columns = pd.MultiIndex.from_tuples([('A', 'mean'), - ('A', 'std')]) - for t in cases: - result = t.aggregate({'A': ['mean', 'std']}) - assert_frame_equal(result, expected) - - expected = pd.concat([a_mean, a_sum], axis=1) - expected.columns = ['mean', 'sum'] - for t in cases: - result = t['A'].aggregate(['mean', 'sum']) - assert_frame_equal(result, expected) - - expected = pd.concat([a_mean, a_sum], axis=1) - expected.columns = pd.MultiIndex.from_tuples([('A', 'mean'), - ('A', 'sum')]) - for t in cases: - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - result = t.aggregate({'A': {'mean': 'mean', 'sum': 'sum'}}) - assert_frame_equal(result, expected, check_like=True) - - expected = pd.concat([a_mean, a_sum, b_mean, b_sum], axis=1) - expected.columns = pd.MultiIndex.from_tuples([('A', 'mean'), - ('A', 'sum'), - ('B', 'mean2'), - ('B', 'sum2')]) - for t in cases: - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - result = t.aggregate({'A': {'mean': 
'mean', 'sum': 'sum'}, - 'B': {'mean2': 'mean', 'sum2': 'sum'}}) - assert_frame_equal(result, expected, check_like=True) - - expected = pd.concat([a_mean, a_std, b_mean, b_std], axis=1) - expected.columns = pd.MultiIndex.from_tuples([('A', 'mean'), - ('A', 'std'), - ('B', 'mean'), - ('B', 'std')]) - for t in cases: - result = t.aggregate({'A': ['mean', 'std'], - 'B': ['mean', 'std']}) - assert_frame_equal(result, expected, check_like=True) - - expected = pd.concat([a_mean, a_sum, b_mean, b_sum], axis=1) - expected.columns = pd.MultiIndex.from_tuples([('r1', 'A', 'mean'), - ('r1', 'A', 'sum'), - ('r2', 'B', 'mean'), - ('r2', 'B', 'sum')]) - - def test_agg_misc(self): - # test with all three Resampler apis and TimeGrouper - - np.random.seed(1234) - index = date_range(datetime(2005, 1, 1), - datetime(2005, 1, 10), freq='D') - index.name = 'date' - df = DataFrame(np.random.rand(10, 2), columns=list('AB'), index=index) - df_col = df.reset_index() - df_mult = df_col.copy() - df_mult.index = pd.MultiIndex.from_arrays([range(10), df.index], - names=['index', 'date']) - - r = df.resample('2D') - cases = [ - r, - df_col.resample('2D', on='date'), - df_mult.resample('2D', level='date'), - df.groupby(pd.Grouper(freq='2D')) - ] - - # passed lambda - for t in cases: - result = t.agg({'A': np.sum, - 'B': lambda x: np.std(x, ddof=1)}) - rcustom = t['B'].apply(lambda x: np.std(x, ddof=1)) - expected = pd.concat([r['A'].sum(), rcustom], axis=1) - assert_frame_equal(result, expected, check_like=True) - - # agg with renamers - expected = pd.concat([t['A'].sum(), - t['B'].sum(), - t['A'].mean(), - t['B'].mean()], - axis=1) - expected.columns = pd.MultiIndex.from_tuples([('result1', 'A'), - ('result1', 'B'), - ('result2', 'A'), - ('result2', 'B')]) - - for t in cases: - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - result = t[['A', 'B']].agg(OrderedDict([('result1', np.sum), - ('result2', np.mean)])) - assert_frame_equal(result, expected, check_like=True) - - # agg with different hows - expected = pd.concat([t['A'].sum(), - t['A'].std(), - t['B'].mean(), - t['B'].std()], - axis=1) - expected.columns = pd.MultiIndex.from_tuples([('A', 'sum'), - ('A', 'std'), - ('B', 'mean'), - ('B', 'std')]) - for t in cases: - result = t.agg(OrderedDict([('A', ['sum', 'std']), - ('B', ['mean', 'std'])])) - assert_frame_equal(result, expected, check_like=True) - - # equivalent of using a selection list / or not - for t in cases: - result = t[['A', 'B']].agg({'A': ['sum', 'std'], - 'B': ['mean', 'std']}) - assert_frame_equal(result, expected, check_like=True) - - # series like aggs - for t in cases: - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - result = t['A'].agg({'A': ['sum', 'std']}) - expected = pd.concat([t['A'].sum(), - t['A'].std()], - axis=1) - expected.columns = pd.MultiIndex.from_tuples([('A', 'sum'), - ('A', 'std')]) - assert_frame_equal(result, expected, check_like=True) - - expected = pd.concat([t['A'].agg(['sum', 'std']), - t['A'].agg(['mean', 'std'])], - axis=1) - expected.columns = pd.MultiIndex.from_tuples([('A', 'sum'), - ('A', 'std'), - ('B', 'mean'), - ('B', 'std')]) - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - result = t['A'].agg({'A': ['sum', 'std'], - 'B': ['mean', 'std']}) - assert_frame_equal(result, expected, check_like=True) - - # errors - # invalid names in the agg specification - for t in cases: - def f(): - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - t[['A']].agg({'A': 
['sum', 'std'], - 'B': ['mean', 'std']}) - - pytest.raises(KeyError, f) - - def test_agg_nested_dicts(self): - - np.random.seed(1234) - index = date_range(datetime(2005, 1, 1), - datetime(2005, 1, 10), freq='D') - index.name = 'date' - df = DataFrame(np.random.rand(10, 2), columns=list('AB'), index=index) - df_col = df.reset_index() - df_mult = df_col.copy() - df_mult.index = pd.MultiIndex.from_arrays([range(10), df.index], - names=['index', 'date']) - r = df.resample('2D') - cases = [ - r, - df_col.resample('2D', on='date'), - df_mult.resample('2D', level='date'), - df.groupby(pd.Grouper(freq='2D')) - ] - - for t in cases: - def f(): - t.aggregate({'r1': {'A': ['mean', 'sum']}, - 'r2': {'B': ['mean', 'sum']}}) - pytest.raises(ValueError, f) - - for t in cases: - expected = pd.concat([t['A'].mean(), t['A'].std(), t['B'].mean(), - t['B'].std()], axis=1) - expected.columns = pd.MultiIndex.from_tuples([('ra', 'mean'), ( - 'ra', 'std'), ('rb', 'mean'), ('rb', 'std')]) - - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - result = t[['A', 'B']].agg({'A': {'ra': ['mean', 'std']}, - 'B': {'rb': ['mean', 'std']}}) - assert_frame_equal(result, expected, check_like=True) - - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - result = t.agg({'A': {'ra': ['mean', 'std']}, - 'B': {'rb': ['mean', 'std']}}) - assert_frame_equal(result, expected, check_like=True) - - def test_try_aggregate_non_existing_column(self): - # GH 16766 - data = [ - {'dt': datetime(2017, 6, 1, 0), 'x': 1.0, 'y': 2.0}, - {'dt': datetime(2017, 6, 1, 1), 'x': 2.0, 'y': 2.0}, - {'dt': datetime(2017, 6, 1, 2), 'x': 3.0, 'y': 1.5} - ] - df = DataFrame(data).set_index('dt') - - # Error as we don't have 'z' column - with pytest.raises(KeyError): - df.resample('30T').agg({'x': ['mean'], - 'y': ['median'], - 'z': ['sum']}) - - def test_selection_api_validation(self): - # GH 13500 - index = date_range(datetime(2005, 1, 1), - datetime(2005, 1, 10), freq='D') - - rng = np.arange(len(index), dtype=np.int64) - df = DataFrame({'date': index, 'a': rng}, - index=pd.MultiIndex.from_arrays([rng, index], - names=['v', 'd'])) - df_exp = DataFrame({'a': rng}, index=index) - - # non DatetimeIndex - with pytest.raises(TypeError): - df.resample('2D', level='v') - - with pytest.raises(ValueError): - df.resample('2D', on='date', level='d') - - with pytest.raises(TypeError): - df.resample('2D', on=['a', 'date']) - - with pytest.raises(KeyError): - df.resample('2D', level=['a', 'date']) - - # upsampling not allowed - with pytest.raises(ValueError): - df.resample('2D', level='d').asfreq() - - with pytest.raises(ValueError): - df.resample('2D', on='date').asfreq() - - exp = df_exp.resample('2D').sum() - exp.index.name = 'date' - assert_frame_equal(exp, df.resample('2D', on='date').sum()) - - exp.index.name = 'd' - assert_frame_equal(exp, df.resample('2D', level='d').sum()) - - -class Base(object): - """ - base class for resampling testing, calling - .create_series() generates a series of each index type - """ - - def create_index(self, *args, **kwargs): - """ return the _index_factory created using the args, kwargs """ - factory = self._index_factory() - return factory(*args, **kwargs) - - @pytest.fixture - def _index_start(self): - return datetime(2005, 1, 1) - - @pytest.fixture - def _index_end(self): - return datetime(2005, 1, 10) - - @pytest.fixture - def _index_freq(self): - return 'D' - - @pytest.fixture - def index(self, _index_start, _index_end, _index_freq): - return self.create_index(_index_start, 
_index_end, freq=_index_freq) - - @pytest.fixture - def _series_name(self): - raise AbstractMethodError(self) - - @pytest.fixture - def _static_values(self, index): - return np.arange(len(index)) - - @pytest.fixture - def series(self, index, _series_name, _static_values): - return Series(_static_values, index=index, name=_series_name) - - @pytest.fixture - def frame(self, index, _static_values): - return DataFrame({'value': _static_values}, index=index) - - @pytest.fixture(params=[Series, DataFrame]) - def series_and_frame(self, request, index, _series_name, _static_values): - if request.param == Series: - return Series(_static_values, index=index, name=_series_name) - if request.param == DataFrame: - return DataFrame({'value': _static_values}, index=index) - - @pytest.mark.parametrize('freq', ['2D', '1H']) - def test_asfreq(self, series_and_frame, freq): - obj = series_and_frame - - result = obj.resample(freq).asfreq() - new_index = self.create_index(obj.index[0], obj.index[-1], freq=freq) - expected = obj.reindex(new_index) - assert_almost_equal(result, expected) - - def test_asfreq_fill_value(self): - # test for fill value during resampling, issue 3715 - - s = self.create_series() - - result = s.resample('1H').asfreq() - new_index = self.create_index(s.index[0], s.index[-1], freq='1H') - expected = s.reindex(new_index) - assert_series_equal(result, expected) - - frame = s.to_frame('value') - frame.iloc[1] = None - result = frame.resample('1H').asfreq(fill_value=4.0) - new_index = self.create_index(frame.index[0], - frame.index[-1], freq='1H') - expected = frame.reindex(new_index, fill_value=4.0) - assert_frame_equal(result, expected) - - def test_resample_interpolate(self): - # # 12925 - df = self.create_series().to_frame('value') - assert_frame_equal( - df.resample('1T').asfreq().interpolate(), - df.resample('1T').interpolate()) - - def test_raises_on_non_datetimelike_index(self): - # this is a non datetimelike index - xp = DataFrame() - pytest.raises(TypeError, lambda: xp.resample('A').mean()) - - def test_resample_empty_series(self): - # GH12771 & GH12868 - - s = self.create_series()[:0] - - for freq in ['M', 'D', 'H']: - # need to test for ohlc from GH13083 - methods = [method for method in resample_methods - if method != 'ohlc'] - for method in methods: - result = getattr(s.resample(freq), method)() - - expected = s.copy() - expected.index = s.index._shallow_copy(freq=freq) - assert_index_equal(result.index, expected.index) - assert result.index.freq == expected.index.freq - assert_series_equal(result, expected, check_dtype=False) - - def test_resample_empty_dataframe(self): - # GH13212 - index = self.create_series().index[:0] - f = DataFrame(index=index) - - for freq in ['M', 'D', 'H']: - # count retains dimensions too - methods = downsample_methods + upsample_methods - for method in methods: - result = getattr(f.resample(freq), method)() - if method != 'size': - expected = f.copy() - else: - # GH14962 - expected = Series([]) - - expected.index = f.index._shallow_copy(freq=freq) - assert_index_equal(result.index, expected.index) - assert result.index.freq == expected.index.freq - assert_almost_equal(result, expected, check_dtype=False) - - # test size for GH13212 (currently stays as df) - - @pytest.mark.parametrize("index", tm.all_timeseries_index_generator(0)) - @pytest.mark.parametrize( - "dtype", - [np.float, np.int, np.object, 'datetime64[ns]']) - def test_resample_empty_dtypes(self, index, dtype): - - # Empty series were sometimes causing a segfault (for the functions - # 
with Cython bounds-checking disabled) or an IndexError. We just run - # them to ensure they no longer do. (GH #10228) - for how in downsample_methods + upsample_methods: - empty_series = Series([], index, dtype) - try: - getattr(empty_series.resample('d'), how)() - except DataError: - # Ignore these since some combinations are invalid - # (ex: doing mean with dtype of np.object) - pass - - def test_resample_loffset_arg_type(self): - # GH 13218, 15002 - df = self.create_series().to_frame('value') - expected_means = [df.values[i:i + 2].mean() - for i in range(0, len(df.values), 2)] - expected_index = self.create_index(df.index[0], - periods=len(df.index) / 2, - freq='2D') - - # loffset coerces PeriodIndex to DateTimeIndex - if isinstance(expected_index, PeriodIndex): - expected_index = expected_index.to_timestamp() - - expected_index += timedelta(hours=2) - expected = DataFrame({'value': expected_means}, index=expected_index) - - for arg in ['mean', {'value': 'mean'}, ['mean']]: - - result_agg = df.resample('2D', loffset='2H').agg(arg) - - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - result_how = df.resample('2D', how=arg, loffset='2H') - - if isinstance(arg, list): - expected.columns = pd.MultiIndex.from_tuples([('value', - 'mean')]) - - # GH 13022, 7687 - TODO: fix resample w/ TimedeltaIndex - if isinstance(expected.index, TimedeltaIndex): - with pytest.raises(AssertionError): - assert_frame_equal(result_agg, expected) - assert_frame_equal(result_how, expected) - else: - assert_frame_equal(result_agg, expected) - assert_frame_equal(result_how, expected) - - def test_apply_to_empty_series(self): - # GH 14313 - series = self.create_series()[:0] - - for freq in ['M', 'D', 'H']: - result = series.resample(freq).apply(lambda x: 1) - expected = series.resample(freq).apply(np.sum) - - assert_series_equal(result, expected, check_dtype=False) - - def test_resampler_is_iterable(self): - # GH 15314 - series = self.create_series() - freq = 'H' - tg = TimeGrouper(freq, convention='start') - grouped = series.groupby(tg) - resampled = series.resample(freq) - for (rk, rv), (gk, gv) in zip(resampled, grouped): - assert rk == gk - assert_series_equal(rv, gv) - - def test_resample_quantile(self): - # GH 15023 - s = self.create_series() - q = 0.75 - freq = 'H' - result = s.resample(freq).quantile(q) - expected = s.resample(freq).agg(lambda x: x.quantile(q)) - tm.assert_series_equal(result, expected) - - -class TestDatetimeIndex(Base): - _index_factory = lambda x: date_range - - @pytest.fixture - def _series_name(self): - return 'dti' - - def setup_method(self, method): - dti = DatetimeIndex(start=datetime(2005, 1, 1), - end=datetime(2005, 1, 10), freq='Min') - - self.series = Series(np.random.rand(len(dti)), dti) - - def create_series(self): - i = date_range(datetime(2005, 1, 1), - datetime(2005, 1, 10), freq='D') - - return Series(np.arange(len(i)), index=i, name='dti') - - def test_custom_grouper(self): - - dti = DatetimeIndex(freq='Min', start=datetime(2005, 1, 1), - end=datetime(2005, 1, 10)) - - s = Series(np.array([1] * len(dti)), index=dti, dtype='int64') - - b = TimeGrouper(Minute(5)) - g = s.groupby(b) - - # check all cython functions work - funcs = ['add', 'mean', 'prod', 'ohlc', 'min', 'max', 'var'] - for f in funcs: - g._cython_agg_general(f) - - b = TimeGrouper(Minute(5), closed='right', label='right') - g = s.groupby(b) - # check all cython functions work - funcs = ['add', 'mean', 'prod', 'ohlc', 'min', 'max', 'var'] - for f in funcs: - g._cython_agg_general(f) - - 
assert g.ngroups == 2593 - assert notna(g.mean()).all() - - # construct expected val - arr = [1] + [5] * 2592 - idx = dti[0:-1:5] - idx = idx.append(dti[-1:]) - expect = Series(arr, index=idx) - - # GH2763 - return input dtype if we can - result = g.agg(np.sum) - assert_series_equal(result, expect) - - df = DataFrame(np.random.rand(len(dti), 10), - index=dti, dtype='float64') - r = df.groupby(b).agg(np.sum) - - assert len(r.columns) == 10 - assert len(r.index) == 2593 - - def test_resample_basic(self): - rng = date_range('1/1/2000 00:00:00', '1/1/2000 00:13:00', freq='min', - name='index') - s = Series(np.random.randn(14), index=rng) - - result = s.resample('5min', closed='right', label='right').mean() - - exp_idx = date_range('1/1/2000', periods=4, freq='5min', name='index') - expected = Series([s[0], s[1:6].mean(), s[6:11].mean(), s[11:].mean()], - index=exp_idx) - assert_series_equal(result, expected) - assert result.index.name == 'index' - - result = s.resample('5min', closed='left', label='right').mean() - - exp_idx = date_range('1/1/2000 00:05', periods=3, freq='5min', - name='index') - expected = Series([s[:5].mean(), s[5:10].mean(), - s[10:].mean()], index=exp_idx) - assert_series_equal(result, expected) - - s = self.series - result = s.resample('5Min').last() - grouper = TimeGrouper(Minute(5), closed='left', label='left') - expect = s.groupby(grouper).agg(lambda x: x[-1]) - assert_series_equal(result, expect) - - def test_resample_string_kwargs(self): - # Test for issue #19303 - rng = date_range('1/1/2000 00:00:00', '1/1/2000 00:13:00', freq='min', - name='index') - s = Series(np.random.randn(14), index=rng) - - # Check that wrong keyword argument strings raise an error - with pytest.raises(ValueError): - s.resample('5min', label='righttt').mean() - with pytest.raises(ValueError): - s.resample('5min', closed='righttt').mean() - with pytest.raises(ValueError): - s.resample('5min', convention='starttt').mean() - - def test_resample_how(self): - rng = date_range('1/1/2000 00:00:00', '1/1/2000 00:13:00', freq='min', - name='index') - s = Series(np.random.randn(14), index=rng) - grouplist = np.ones_like(s) - grouplist[0] = 0 - grouplist[1:6] = 1 - grouplist[6:11] = 2 - grouplist[11:] = 3 - args = downsample_methods - - def _ohlc(group): - if isna(group).all(): - return np.repeat(np.nan, 4) - return [group[0], group.max(), group.min(), group[-1]] - - inds = date_range('1/1/2000', periods=4, freq='5min', name='index') - - for arg in args: - if arg == 'ohlc': - func = _ohlc - else: - func = arg - try: - result = getattr(s.resample( - '5min', closed='right', label='right'), arg)() - - expected = s.groupby(grouplist).agg(func) - assert result.index.name == 'index' - if arg == 'ohlc': - expected = DataFrame(expected.values.tolist()) - expected.columns = ['open', 'high', 'low', 'close'] - expected.index = Index(inds, name='index') - assert_frame_equal(result, expected) - else: - expected.index = inds - assert_series_equal(result, expected) - except BaseException as exc: - - exc.args += ('how=%s' % arg,) - raise - - def test_numpy_compat(self): - # see gh-12811 - s = Series([1, 2, 3, 4, 5], index=date_range( - '20130101', periods=5, freq='s')) - r = s.resample('2s') - - msg = "numpy operations are not valid with resample" - - for func in ('min', 'max', 'sum', 'prod', - 'mean', 'var', 'std'): - with pytest.raises(UnsupportedFunctionCall, match=msg): - getattr(r, func)(func, 1, 2, 3) - with pytest.raises(UnsupportedFunctionCall, match=msg): - getattr(r, func)(axis=1) - - def
test_resample_how_callables(self): - # GH 7929 - data = np.arange(5, dtype=np.int64) - ind = pd.DatetimeIndex(start='2014-01-01', periods=len(data), freq='d') - df = DataFrame({"A": data, "B": data}, index=ind) - - def fn(x, a=1): - return str(type(x)) - - class FnClass(object): - - def __call__(self, x): - return str(type(x)) - - df_standard = df.resample("M").apply(fn) - df_lambda = df.resample("M").apply(lambda x: str(type(x))) - df_partial = df.resample("M").apply(partial(fn)) - df_partial2 = df.resample("M").apply(partial(fn, a=2)) - df_class = df.resample("M").apply(FnClass()) - - assert_frame_equal(df_standard, df_lambda) - assert_frame_equal(df_standard, df_partial) - assert_frame_equal(df_standard, df_partial2) - assert_frame_equal(df_standard, df_class) - - def test_resample_with_timedeltas(self): - - expected = DataFrame({'A': np.arange(1480)}) - expected = expected.groupby(expected.index // 30).sum() - expected.index = pd.timedelta_range('0 days', freq='30T', periods=50) - - df = DataFrame({'A': np.arange(1480)}, index=pd.to_timedelta( - np.arange(1480), unit='T')) - result = df.resample('30T').sum() - - assert_frame_equal(result, expected) - - s = df['A'] - result = s.resample('30T').sum() - assert_series_equal(result, expected['A']) - - def test_resample_single_period_timedelta(self): - - s = Series(list(range(5)), index=pd.timedelta_range( - '1 day', freq='s', periods=5)) - result = s.resample('2s').sum() - expected = Series([1, 5, 4], index=pd.timedelta_range( - '1 day', freq='2s', periods=3)) - assert_series_equal(result, expected) - - def test_resample_timedelta_idempotency(self): - - # GH 12072 - index = pd.timedelta_range('0', periods=9, freq='10L') - series = Series(range(9), index=index) - result = series.resample('10L').mean() - expected = series - assert_series_equal(result, expected) - - def test_resample_rounding(self): - # GH 8371 - # odd results when rounding is needed - - data = """date,time,value -11-08-2014,00:00:01.093,1 -11-08-2014,00:00:02.159,1 -11-08-2014,00:00:02.667,1 -11-08-2014,00:00:03.175,1 -11-08-2014,00:00:07.058,1 -11-08-2014,00:00:07.362,1 -11-08-2014,00:00:08.324,1 -11-08-2014,00:00:08.830,1 -11-08-2014,00:00:08.982,1 -11-08-2014,00:00:09.815,1 -11-08-2014,00:00:10.540,1 -11-08-2014,00:00:11.061,1 -11-08-2014,00:00:11.617,1 -11-08-2014,00:00:13.607,1 -11-08-2014,00:00:14.535,1 -11-08-2014,00:00:15.525,1 -11-08-2014,00:00:17.960,1 -11-08-2014,00:00:20.674,1 -11-08-2014,00:00:21.191,1""" - - from pandas.compat import StringIO - df = pd.read_csv(StringIO(data), parse_dates={'timestamp': [ - 'date', 'time']}, index_col='timestamp') - df.index.name = None - result = df.resample('6s').sum() - expected = DataFrame({'value': [ - 4, 9, 4, 2 - ]}, index=date_range('2014-11-08', freq='6s', periods=4)) - assert_frame_equal(result, expected) - - result = df.resample('7s').sum() - expected = DataFrame({'value': [ - 4, 10, 4, 1 - ]}, index=date_range('2014-11-08', freq='7s', periods=4)) - assert_frame_equal(result, expected) - - result = df.resample('11s').sum() - expected = DataFrame({'value': [ - 11, 8 - ]}, index=date_range('2014-11-08', freq='11s', periods=2)) - assert_frame_equal(result, expected) - - result = df.resample('13s').sum() - expected = DataFrame({'value': [ - 13, 6 - ]}, index=date_range('2014-11-08', freq='13s', periods=2)) - assert_frame_equal(result, expected) - - result = df.resample('17s').sum() - expected = DataFrame({'value': [ - 16, 3 - ]}, index=date_range('2014-11-08', freq='17s', periods=2)) - assert_frame_equal(result, expected) 
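The rounding test above hinges on resample's default binning for sub-second timestamps: for a frequency like '6s' the bins are left-closed and left-labeled, so each point is assigned to the whole-frequency bucket at or before it. A minimal sketch of that behavior, reusing a few timestamps from the fixture and nothing beyond the public resample API:

import pandas as pd

# a handful of sub-second timestamps taken from the fixture above
idx = pd.to_datetime(['2014-11-08 00:00:01.093',
                      '2014-11-08 00:00:02.159',
                      '2014-11-08 00:00:07.058',
                      '2014-11-08 00:00:08.324'])
s = pd.Series(1, index=idx)

# left-closed, left-labeled '6s' bins: the first two points fall in
# [00:00:00, 00:00:06) and the last two in [00:00:06, 00:00:12),
# so the resampled sums are 2 and 2
print(s.resample('6s').sum())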
- - def test_resample_basic_from_daily(self): - # from daily - dti = DatetimeIndex(start=datetime(2005, 1, 1), - end=datetime(2005, 1, 10), freq='D', name='index') - - s = Series(np.random.rand(len(dti)), dti) - - # to weekly - result = s.resample('w-sun').last() - - assert len(result) == 3 - assert (result.index.dayofweek == [6, 6, 6]).all() - assert result.iloc[0] == s['1/2/2005'] - assert result.iloc[1] == s['1/9/2005'] - assert result.iloc[2] == s.iloc[-1] - - result = s.resample('W-MON').last() - assert len(result) == 2 - assert (result.index.dayofweek == [0, 0]).all() - assert result.iloc[0] == s['1/3/2005'] - assert result.iloc[1] == s['1/10/2005'] - - result = s.resample('W-TUE').last() - assert len(result) == 2 - assert (result.index.dayofweek == [1, 1]).all() - assert result.iloc[0] == s['1/4/2005'] - assert result.iloc[1] == s['1/10/2005'] - - result = s.resample('W-WED').last() - assert len(result) == 2 - assert (result.index.dayofweek == [2, 2]).all() - assert result.iloc[0] == s['1/5/2005'] - assert result.iloc[1] == s['1/10/2005'] - - result = s.resample('W-THU').last() - assert len(result) == 2 - assert (result.index.dayofweek == [3, 3]).all() - assert result.iloc[0] == s['1/6/2005'] - assert result.iloc[1] == s['1/10/2005'] - - result = s.resample('W-FRI').last() - assert len(result) == 2 - assert (result.index.dayofweek == [4, 4]).all() - assert result.iloc[0] == s['1/7/2005'] - assert result.iloc[1] == s['1/10/2005'] - - # to biz day - result = s.resample('B').last() - assert len(result) == 7 - assert (result.index.dayofweek == [4, 0, 1, 2, 3, 4, 0]).all() - - assert result.iloc[0] == s['1/2/2005'] - assert result.iloc[1] == s['1/3/2005'] - assert result.iloc[5] == s['1/9/2005'] - assert result.index.name == 'index' - - def test_resample_upsampling_picked_but_not_correct(self): - - # Test for issue #3020 - dates = date_range('01-Jan-2014', '05-Jan-2014', freq='D') - series = Series(1, index=dates) - - result = series.resample('D').mean() - assert result.index[0] == dates[0] - - # GH 5955 - # incorrectly deciding to upsample when the axis frequency matches the - # resample frequency - - import datetime - s = Series(np.arange(1., 6), index=[datetime.datetime( - 1975, 1, i, 12, 0) for i in range(1, 6)]) - expected = Series(np.arange(1., 6), index=date_range( - '19750101', periods=5, freq='D')) - - result = s.resample('D').count() - assert_series_equal(result, Series(1, index=expected.index)) - - result1 = s.resample('D').sum() - result2 = s.resample('D').mean() - assert_series_equal(result1, expected) - assert_series_equal(result2, expected) - - def test_resample_frame_basic(self): - df = tm.makeTimeDataFrame() - - b = TimeGrouper('M') - g = df.groupby(b) - - # check all cython functions work - funcs = ['add', 'mean', 'prod', 'min', 'max', 'var'] - for f in funcs: - g._cython_agg_general(f) - - result = df.resample('A').mean() - assert_series_equal(result['A'], df['A'].resample('A').mean()) - - result = df.resample('M').mean() - assert_series_equal(result['A'], df['A'].resample('M').mean()) - - df.resample('M', kind='period').mean() - df.resample('W-WED', kind='period').mean() - - @pytest.mark.parametrize('loffset', [timedelta(minutes=1), - '1min', Minute(1), - np.timedelta64(1, 'm')]) - def test_resample_loffset(self, loffset): - # GH 7687 - rng = date_range('1/1/2000 00:00:00', '1/1/2000 00:13:00', freq='min') - s = Series(np.random.randn(14), index=rng) - - result = s.resample('5min', closed='right', label='right', - loffset=loffset).mean() - idx = date_range('1/1/2000',
periods=4, freq='5min') - expected = Series([s[0], s[1:6].mean(), s[6:11].mean(), s[11:].mean()], - index=idx + timedelta(minutes=1)) - assert_series_equal(result, expected) - assert result.index.freq == Minute(5) - - # from daily - dti = DatetimeIndex(start=datetime(2005, 1, 1), - end=datetime(2005, 1, 10), freq='D') - ser = Series(np.random.rand(len(dti)), dti) - - # to weekly - result = ser.resample('w-sun').last() - expected = ser.resample('w-sun', loffset=-bday).last() - assert result.index[0] - bday == expected.index[0] - - def test_resample_loffset_upsample(self): - # GH 20744 - rng = date_range('1/1/2000 00:00:00', '1/1/2000 00:13:00', freq='min') - s = Series(np.random.randn(14), index=rng) - - result = s.resample('5min', closed='right', label='right', - loffset=timedelta(minutes=1)).ffill() - idx = date_range('1/1/2000', periods=4, freq='5min') - expected = Series([s[0], s[5], s[10], s[-1]], - index=idx + timedelta(minutes=1)) - - assert_series_equal(result, expected) - - def test_resample_loffset_count(self): - # GH 12725 - start_time = '1/1/2000 00:00:00' - rng = date_range(start_time, periods=100, freq='S') - ts = Series(np.random.randn(len(rng)), index=rng) - - result = ts.resample('10S', loffset='1s').count() - - expected_index = ( - date_range(start_time, periods=10, freq='10S') + - timedelta(seconds=1) - ) - expected = Series(10, index=expected_index) - - assert_series_equal(result, expected) - - # Same issue should apply to .size() since it goes through - # same code path - result = ts.resample('10S', loffset='1s').size() - - assert_series_equal(result, expected) - - def test_resample_upsample(self): - # from daily - dti = DatetimeIndex(start=datetime(2005, 1, 1), - end=datetime(2005, 1, 10), freq='D', name='index') - - s = Series(np.random.rand(len(dti)), dti) - - # to minutely, by padding - result = s.resample('Min').pad() - assert len(result) == 12961 - assert result[0] == s[0] - assert result[-1] == s[-1] - - assert result.index.name == 'index' - - def test_resample_how_method(self): - # GH9915 - s = Series([11, 22], - index=[Timestamp('2015-03-31 21:48:52.672000'), - Timestamp('2015-03-31 21:49:52.739000')]) - expected = Series([11, np.NaN, np.NaN, np.NaN, np.NaN, np.NaN, 22], - index=[Timestamp('2015-03-31 21:48:50'), - Timestamp('2015-03-31 21:49:00'), - Timestamp('2015-03-31 21:49:10'), - Timestamp('2015-03-31 21:49:20'), - Timestamp('2015-03-31 21:49:30'), - Timestamp('2015-03-31 21:49:40'), - Timestamp('2015-03-31 21:49:50')]) - assert_series_equal(s.resample("10S").mean(), expected) - - def test_resample_extra_index_point(self): - # GH 9756 - index = DatetimeIndex(start='20150101', end='20150331', freq='BM') - expected = DataFrame({'A': Series([21, 41, 63], index=index)}) - - index = DatetimeIndex(start='20150101', end='20150331', freq='B') - df = DataFrame( - {'A': Series(range(len(index)), index=index)}, dtype='int64') - result = df.resample('BM').last() - assert_frame_equal(result, expected) - - def test_upsample_with_limit(self): - rng = date_range('1/1/2000', periods=3, freq='5t') - ts = Series(np.random.randn(len(rng)), rng) - - result = ts.resample('t').ffill(limit=2) - expected = ts.reindex(result.index, method='ffill', limit=2) - assert_series_equal(result, expected) - - def test_nearest_upsample_with_limit(self): - rng = date_range('1/1/2000', periods=3, freq='5t') - ts = Series(np.random.randn(len(rng)), rng) - - result = ts.resample('t').nearest(limit=2) - expected = ts.reindex(result.index, method='nearest', limit=2) - assert_series_equal(result, 
expected) - - def test_resample_ohlc(self): - s = self.series - - grouper = TimeGrouper(Minute(5)) - expect = s.groupby(grouper).agg(lambda x: x[-1]) - result = s.resample('5Min').ohlc() - - assert len(result) == len(expect) - assert len(result.columns) == 4 - - xs = result.iloc[-2] - assert xs['open'] == s[-6] - assert xs['high'] == s[-6:-1].max() - assert xs['low'] == s[-6:-1].min() - assert xs['close'] == s[-2] - - xs = result.iloc[0] - assert xs['open'] == s[0] - assert xs['high'] == s[:5].max() - assert xs['low'] == s[:5].min() - assert xs['close'] == s[4] - - def test_resample_ohlc_result(self): - - # GH 12332 - index = pd.date_range('1-1-2000', '2-15-2000', freq='h') - index = index.union(pd.date_range('4-15-2000', '5-15-2000', freq='h')) - s = Series(range(len(index)), index=index) - - a = s.loc[:'4-15-2000'].resample('30T').ohlc() - assert isinstance(a, DataFrame) - - b = s.loc[:'4-14-2000'].resample('30T').ohlc() - assert isinstance(b, DataFrame) - - # GH12348 - # raising on odd period - rng = date_range('2013-12-30', '2014-01-07') - index = rng.drop([Timestamp('2014-01-01'), - Timestamp('2013-12-31'), - Timestamp('2014-01-04'), - Timestamp('2014-01-05')]) - df = DataFrame(data=np.arange(len(index)), index=index) - result = df.resample('B').mean() - expected = df.reindex(index=date_range(rng[0], rng[-1], freq='B')) - assert_frame_equal(result, expected) - - def test_resample_ohlc_dataframe(self): - df = ( - DataFrame({ - 'PRICE': { - Timestamp('2011-01-06 10:59:05', tz=None): 24990, - Timestamp('2011-01-06 12:43:33', tz=None): 25499, - Timestamp('2011-01-06 12:54:09', tz=None): 25499}, - 'VOLUME': { - Timestamp('2011-01-06 10:59:05', tz=None): 1500000000, - Timestamp('2011-01-06 12:43:33', tz=None): 5000000000, - Timestamp('2011-01-06 12:54:09', tz=None): 100000000}}) - ).reindex(['VOLUME', 'PRICE'], axis=1) - res = df.resample('H').ohlc() - exp = pd.concat([df['VOLUME'].resample('H').ohlc(), - df['PRICE'].resample('H').ohlc()], - axis=1, - keys=['VOLUME', 'PRICE']) - assert_frame_equal(exp, res) - - df.columns = [['a', 'b'], ['c', 'd']] - res = df.resample('H').ohlc() - exp.columns = pd.MultiIndex.from_tuples([ - ('a', 'c', 'open'), ('a', 'c', 'high'), ('a', 'c', 'low'), - ('a', 'c', 'close'), ('b', 'd', 'open'), ('b', 'd', 'high'), - ('b', 'd', 'low'), ('b', 'd', 'close')]) - assert_frame_equal(exp, res) - - # dupe columns fail atm - # df.columns = ['PRICE', 'PRICE'] - - def test_resample_dup_index(self): - - # GH 4812 - # dup columns with resample raising - df = DataFrame(np.random.randn(4, 12), index=[2000, 2000, 2000, 2000], - columns=[Period(year=2000, month=i + 1, freq='M') - for i in range(12)]) - df.iloc[3, :] = np.nan - result = df.resample('Q', axis=1).mean() - expected = df.groupby(lambda x: int((x.month - 1) / 3), axis=1).mean() - expected.columns = [ - Period(year=2000, quarter=i + 1, freq='Q') for i in range(4)] - assert_frame_equal(result, expected) - - def test_resample_reresample(self): - dti = DatetimeIndex(start=datetime(2005, 1, 1), - end=datetime(2005, 1, 10), freq='D') - s = Series(np.random.rand(len(dti)), dti) - bs = s.resample('B', closed='right', label='right').mean() - result = bs.resample('8H').mean() - assert len(result) == 22 - assert isinstance(result.index.freq, offsets.DateOffset) - assert result.index.freq == offsets.Hour(8) - - def test_resample_timestamp_to_period(self): - ts = _simple_ts('1/1/1990', '1/1/2000') - - result = ts.resample('A-DEC', kind='period').mean() - expected = ts.resample('A-DEC').mean() - expected.index = 
period_range('1990', '2000', freq='a-dec') - assert_series_equal(result, expected) - - result = ts.resample('A-JUN', kind='period').mean() - expected = ts.resample('A-JUN').mean() - expected.index = period_range('1990', '2000', freq='a-jun') - assert_series_equal(result, expected) - - result = ts.resample('M', kind='period').mean() - expected = ts.resample('M').mean() - expected.index = period_range('1990-01', '2000-01', freq='M') - assert_series_equal(result, expected) - - result = ts.resample('M', kind='period').mean() - expected = ts.resample('M').mean() - expected.index = period_range('1990-01', '2000-01', freq='M') - assert_series_equal(result, expected) - - def test_ohlc_5min(self): - def _ohlc(group): - if isna(group).all(): - return np.repeat(np.nan, 4) - return [group[0], group.max(), group.min(), group[-1]] - - rng = date_range('1/1/2000 00:00:00', '1/1/2000 5:59:50', freq='10s') - ts = Series(np.random.randn(len(rng)), index=rng) - - resampled = ts.resample('5min', closed='right', - label='right').ohlc() - - assert (resampled.loc['1/1/2000 00:00'] == ts[0]).all() - - exp = _ohlc(ts[1:31]) - assert (resampled.loc['1/1/2000 00:05'] == exp).all() - - exp = _ohlc(ts['1/1/2000 5:55:01':]) - assert (resampled.loc['1/1/2000 6:00:00'] == exp).all() - - def test_downsample_non_unique(self): - rng = date_range('1/1/2000', '2/29/2000') - rng2 = rng.repeat(5).values - ts = Series(np.random.randn(len(rng2)), index=rng2) - - result = ts.resample('M').mean() - - expected = ts.groupby(lambda x: x.month).mean() - assert len(result) == 2 - assert_almost_equal(result[0], expected[1]) - assert_almost_equal(result[1], expected[2]) - - def test_asfreq_non_unique(self): - # GH #1077 - rng = date_range('1/1/2000', '2/29/2000') - rng2 = rng.repeat(2).values - ts = Series(np.random.randn(len(rng2)), index=rng2) - - pytest.raises(Exception, ts.asfreq, 'B') - - def test_resample_axis1(self): - rng = date_range('1/1/2000', '2/29/2000') - df = DataFrame(np.random.randn(3, len(rng)), columns=rng, - index=['a', 'b', 'c']) - - result = df.resample('M', axis=1).mean() - expected = df.T.resample('M').mean().T - tm.assert_frame_equal(result, expected) - - def test_resample_panel(self): - rng = date_range('1/1/2000', '6/30/2000') - n = len(rng) - - with catch_warnings(record=True): - simplefilter("ignore", FutureWarning) - panel = Panel(np.random.randn(3, n, 5), - items=['one', 'two', 'three'], - major_axis=rng, - minor_axis=['a', 'b', 'c', 'd', 'e']) - - result = panel.resample('M', axis=1).mean() - - def p_apply(panel, f): - result = {} - for item in panel.items: - result[item] = f(panel[item]) - return Panel(result, items=panel.items) - - expected = p_apply(panel, lambda x: x.resample('M').mean()) - tm.assert_panel_equal(result, expected) - - panel2 = panel.swapaxes(1, 2) - result = panel2.resample('M', axis=2).mean() - expected = p_apply(panel2, - lambda x: x.resample('M', axis=1).mean()) - tm.assert_panel_equal(result, expected) - - @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") - def test_resample_panel_numpy(self): - rng = date_range('1/1/2000', '6/30/2000') - n = len(rng) - - with catch_warnings(record=True): - panel = Panel(np.random.randn(3, n, 5), - items=['one', 'two', 'three'], - major_axis=rng, - minor_axis=['a', 'b', 'c', 'd', 'e']) - - result = panel.resample('M', axis=1).apply(lambda x: x.mean(1)) - expected = panel.resample('M', axis=1).mean() - tm.assert_panel_equal(result, expected) - - panel = panel.swapaxes(1, 2) - result = panel.resample('M', axis=2).apply(lambda x: x.mean(2)) - 
expected = panel.resample('M', axis=2).mean() - tm.assert_panel_equal(result, expected) - - def test_resample_anchored_ticks(self): - # If a fixed delta (5 minute, 4 hour) evenly divides a day, we should - # "anchor" the origin at midnight so we get regular intervals rather - # than starting from the first timestamp which might start in the - # middle of a desired interval - - rng = date_range('1/1/2000 04:00:00', periods=86400, freq='s') - ts = Series(np.random.randn(len(rng)), index=rng) - ts[:2] = np.nan # so results are the same - - freqs = ['t', '5t', '15t', '30t', '4h', '12h'] - for freq in freqs: - result = ts[2:].resample(freq, closed='left', label='left').mean() - expected = ts.resample(freq, closed='left', label='left').mean() - assert_series_equal(result, expected) - - def test_resample_single_group(self): - mysum = lambda x: x.sum() - - rng = date_range('2000-1-1', '2000-2-10', freq='D') - ts = Series(np.random.randn(len(rng)), index=rng) - assert_series_equal(ts.resample('M').sum(), - ts.resample('M').apply(mysum)) - - rng = date_range('2000-1-1', '2000-1-10', freq='D') - ts = Series(np.random.randn(len(rng)), index=rng) - assert_series_equal(ts.resample('M').sum(), - ts.resample('M').apply(mysum)) - - # GH 3849 - s = Series([30.1, 31.6], index=[Timestamp('20070915 15:30:00'), - Timestamp('20070915 15:40:00')]) - expected = Series([0.75], index=[Timestamp('20070915')]) - result = s.resample('D').apply(lambda x: np.std(x)) - assert_series_equal(result, expected) - - def test_resample_base(self): - rng = date_range('1/1/2000 00:00:00', '1/1/2000 02:00', freq='s') - ts = Series(np.random.randn(len(rng)), index=rng) - - resampled = ts.resample('5min', base=2).mean() - exp_rng = date_range('12/31/1999 23:57:00', '1/1/2000 01:57', - freq='5min') - tm.assert_index_equal(resampled.index, exp_rng) - - def test_resample_base_with_timedeltaindex(self): - - # GH 10530 - rng = timedelta_range(start='0s', periods=25, freq='s') - ts = Series(np.random.randn(len(rng)), index=rng) - - with_base = ts.resample('2s', base=5).mean() - without_base = ts.resample('2s').mean() - - exp_without_base = timedelta_range(start='0s', end='25s', freq='2s') - exp_with_base = timedelta_range(start='5s', end='29s', freq='2s') - - tm.assert_index_equal(without_base.index, exp_without_base) - tm.assert_index_equal(with_base.index, exp_with_base) - - def test_resample_categorical_data_with_timedeltaindex(self): - # GH #12169 - df = DataFrame({'Group_obj': 'A'}, - index=pd.to_timedelta(list(range(20)), unit='s')) - df['Group'] = df['Group_obj'].astype('category') - result = df.resample('10s').agg(lambda x: (x.value_counts().index[0])) - expected = DataFrame({'Group_obj': ['A', 'A'], - 'Group': ['A', 'A']}, - index=pd.to_timedelta([0, 10], unit='s')) - expected = expected.reindex(['Group_obj', 'Group'], axis=1) - expected['Group'] = expected['Group_obj'].astype('category') - tm.assert_frame_equal(result, expected) - - def test_resample_daily_anchored(self): - rng = date_range('1/1/2000 0:00:00', periods=10000, freq='T') - ts = Series(np.random.randn(len(rng)), index=rng) - ts[:2] = np.nan # so results are the same - - result = ts[2:].resample('D', closed='left', label='left').mean() - expected = ts.resample('D', closed='left', label='left').mean() - assert_series_equal(result, expected) - - def test_resample_to_period_monthly_buglet(self): - # GH #1259 - - rng = date_range('1/1/2000', '12/31/2000') - ts = Series(np.random.randn(len(rng)), index=rng) - - result = ts.resample('M', kind='period').mean() - exp_index = 
period_range('Jan-2000', 'Dec-2000', freq='M') - tm.assert_index_equal(result.index, exp_index) - - def test_period_with_agg(self): - - # aggregate a period resampler with a lambda - s2 = Series(np.random.randint(0, 5, 50), - index=pd.period_range('2012-01-01', freq='H', periods=50), - dtype='float64') - - expected = s2.to_timestamp().resample('D').mean().to_period() - result = s2.resample('D').agg(lambda x: x.mean()) - assert_series_equal(result, expected) - - def test_resample_segfault(self): - # GH 8573 - # segfaulting in older versions - all_wins_and_wagers = [ - (1, datetime(2013, 10, 1, 16, 20), 1, 0), - (2, datetime(2013, 10, 1, 16, 10), 1, 0), - (2, datetime(2013, 10, 1, 18, 15), 1, 0), - (2, datetime(2013, 10, 1, 16, 10, 31), 1, 0)] - - df = DataFrame.from_records(all_wins_and_wagers, - columns=("ID", "timestamp", "A", "B") - ).set_index("timestamp") - result = df.groupby("ID").resample("5min").sum() - expected = df.groupby("ID").apply(lambda x: x.resample("5min").sum()) - assert_frame_equal(result, expected) - - def test_resample_dtype_preservation(self): - - # GH 12202 - # validation tests for dtype preservation - - df = DataFrame({'date': pd.date_range(start='2016-01-01', - periods=4, freq='W'), - 'group': [1, 1, 2, 2], - 'val': Series([5, 6, 7, 8], - dtype='int32')} - ).set_index('date') - - result = df.resample('1D').ffill() - assert result.val.dtype == np.int32 - - result = df.groupby('group').resample('1D').ffill() - assert result.val.dtype == np.int32 - - def test_resample_dtype_coerceion(self): - - pytest.importorskip('scipy.interpolate') - - # GH 16361 - df = {"a": [1, 3, 1, 4]} - df = DataFrame(df, index=pd.date_range("2017-01-01", "2017-01-04")) - - expected = (df.astype("float64") - .resample("H") - .mean() - ["a"] - .interpolate("cubic") - ) - - result = df.resample("H")["a"].mean().interpolate("cubic") - tm.assert_series_equal(result, expected) - - result = df.resample("H").mean()["a"].interpolate("cubic") - tm.assert_series_equal(result, expected) - - def test_weekly_resample_buglet(self): - # #1327 - rng = date_range('1/1/2000', freq='B', periods=20) - ts = Series(np.random.randn(len(rng)), index=rng) - - resampled = ts.resample('W').mean() - expected = ts.resample('W-SUN').mean() - assert_series_equal(resampled, expected) - - def test_monthly_resample_error(self): - # #1451 - dates = date_range('4/16/2012 20:00', periods=5000, freq='h') - ts = Series(np.random.randn(len(dates)), index=dates) - # it works! 
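The deleted `test_resample_dtype_preservation` (GH 12202) above encodes a point worth spelling out: `ffill()` on an upsample introduces no missing values, so a narrow integer dtype survives instead of being coerced to float. A minimal sketch under that assumption:

```python
import numpy as np
import pandas as pd

# Weekly int32 data upsampled to daily; ffill leaves no NaNs,
# so the int32 dtype is preserved.
df = pd.DataFrame(
    {'val': np.array([5, 6, 7, 8], dtype='int32')},
    index=pd.date_range('2016-01-01', periods=4, freq='W'))

up = df.resample('1D').ffill()
assert up['val'].dtype == 'int32'
```

(The deleted `test_monthly_resample_error` then simply issues the monthly resample call that follows.)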
- ts.resample('M') - - def test_nanosecond_resample_error(self): - # GH 12307 - Values falls after last bin when - # Resampling using pd.tseries.offsets.Nano as period - start = 1443707890427 - exp_start = 1443707890400 - indx = pd.date_range( - start=pd.to_datetime(start), - periods=10, - freq='100n' - ) - ts = Series(range(len(indx)), index=indx) - r = ts.resample(pd.tseries.offsets.Nano(100)) - result = r.agg('mean') - - exp_indx = pd.date_range( - start=pd.to_datetime(exp_start), - periods=10, - freq='100n' - ) - exp = Series(range(len(exp_indx)), index=exp_indx) - - assert_series_equal(result, exp) - - def test_resample_anchored_intraday(self): - # #1471, #1458 - - rng = date_range('1/1/2012', '4/1/2012', freq='100min') - df = DataFrame(rng.month, index=rng) - - result = df.resample('M').mean() - expected = df.resample( - 'M', kind='period').mean().to_timestamp(how='end') - expected.index += Timedelta(1, 'ns') - Timedelta(1, 'D') - tm.assert_frame_equal(result, expected) - - result = df.resample('M', closed='left').mean() - exp = df.tshift(1, freq='D').resample('M', kind='period').mean() - exp = exp.to_timestamp(how='end') - - exp.index = exp.index + Timedelta(1, 'ns') - Timedelta(1, 'D') - tm.assert_frame_equal(result, exp) - - rng = date_range('1/1/2012', '4/1/2012', freq='100min') - df = DataFrame(rng.month, index=rng) - - result = df.resample('Q').mean() - expected = df.resample( - 'Q', kind='period').mean().to_timestamp(how='end') - expected.index += Timedelta(1, 'ns') - Timedelta(1, 'D') - tm.assert_frame_equal(result, expected) - - result = df.resample('Q', closed='left').mean() - expected = df.tshift(1, freq='D').resample('Q', kind='period', - closed='left').mean() - expected = expected.to_timestamp(how='end') - expected.index += Timedelta(1, 'ns') - Timedelta(1, 'D') - tm.assert_frame_equal(result, expected) - - ts = _simple_ts('2012-04-29 23:00', '2012-04-30 5:00', freq='h') - resampled = ts.resample('M').mean() - assert len(resampled) == 1 - - def test_resample_anchored_monthstart(self): - ts = _simple_ts('1/1/2000', '12/31/2002') - - freqs = ['MS', 'BMS', 'QS-MAR', 'AS-DEC', 'AS-JUN'] - - for freq in freqs: - ts.resample(freq).mean() - - def test_resample_anchored_multiday(self): - # When resampling a range spanning multiple days, ensure that the - # start date gets used to determine the offset. Fixes issue where - # a one day period is not a multiple of the frequency. 
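To make the anchoring rule above concrete: when a tick frequency evenly divides a day, bin edges are derived from midnight rather than from the first observation. A minimal sketch with synthetic data:

```python
import numpy as np
import pandas as pd

# Data starting at 04:17 still lands in midnight-anchored bins,
# so the first 5-minute bin is labelled 04:15, not 04:17.
rng = pd.date_range('2000-01-01 04:17', periods=10, freq='T')
ts = pd.Series(np.arange(10.0), index=rng)

first_label = ts.resample('5T').mean().index[0]
assert first_label == pd.Timestamp('2000-01-01 04:15:00')
```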
- # - # See: https://github.com/pandas-dev/pandas/issues/8683 - - index = pd.date_range( - '2014-10-14 23:06:23.206', periods=3, freq='400L' - ) | pd.date_range( - '2014-10-15 23:00:00', periods=2, freq='2200L') - - s = Series(np.random.randn(5), index=index) - - # Ensure left closing works - result = s.resample('2200L').mean() - assert result.index[-1] == Timestamp('2014-10-15 23:00:02.000') - - # Ensure right closing works - result = s.resample('2200L', label='right').mean() - assert result.index[-1] == Timestamp('2014-10-15 23:00:04.200') - - def test_corner_cases(self): - # miscellaneous test coverage - - rng = date_range('1/1/2000', periods=12, freq='t') - ts = Series(np.random.randn(len(rng)), index=rng) - - result = ts.resample('5t', closed='right', label='left').mean() - ex_index = date_range('1999-12-31 23:55', periods=4, freq='5t') - tm.assert_index_equal(result.index, ex_index) - - len0pts = _simple_pts('2007-01', '2010-05', freq='M')[:0] - # it works - result = len0pts.resample('A-DEC').mean() - assert len(result) == 0 - - # resample to periods - ts = _simple_ts('2000-04-28', '2000-04-30 11:00', freq='h') - result = ts.resample('M', kind='period').mean() - assert len(result) == 1 - assert result.index[0] == Period('2000-04', freq='M') - - def test_anchored_lowercase_buglet(self): - dates = date_range('4/16/2012 20:00', periods=50000, freq='s') - ts = Series(np.random.randn(len(dates)), index=dates) - # it works! - ts.resample('d').mean() - - def test_upsample_apply_functions(self): - # #1596 - rng = pd.date_range('2012-06-12', periods=4, freq='h') - - ts = Series(np.random.randn(len(rng)), index=rng) - - result = ts.resample('20min').aggregate(['mean', 'sum']) - assert isinstance(result, DataFrame) - - def test_resample_not_monotonic(self): - rng = pd.date_range('2012-06-12', periods=200, freq='h') - ts = Series(np.random.randn(len(rng)), index=rng) - - ts = ts.take(np.random.permutation(len(ts))) - - result = ts.resample('D').sum() - exp = ts.sort_index().resample('D').sum() - assert_series_equal(result, exp) - - def test_resample_median_bug_1688(self): - - for dtype in ['int64', 'int32', 'float64', 'float32']: - df = DataFrame([1, 2], index=[datetime(2012, 1, 1, 0, 0, 0), - datetime(2012, 1, 1, 0, 5, 0)], - dtype=dtype) - - result = df.resample("T").apply(lambda x: x.mean()) - exp = df.asfreq('T') - tm.assert_frame_equal(result, exp) - - result = df.resample("T").median() - exp = df.asfreq('T') - tm.assert_frame_equal(result, exp) - - def test_how_lambda_functions(self): - - ts = _simple_ts('1/1/2000', '4/1/2000') - - result = ts.resample('M').apply(lambda x: x.mean()) - exp = ts.resample('M').mean() - tm.assert_series_equal(result, exp) - - foo_exp = ts.resample('M').mean() - foo_exp.name = 'foo' - bar_exp = ts.resample('M').std() - bar_exp.name = 'bar' - - result = ts.resample('M').apply( - [lambda x: x.mean(), lambda x: x.std(ddof=1)]) - result.columns = ['foo', 'bar'] - tm.assert_series_equal(result['foo'], foo_exp) - tm.assert_series_equal(result['bar'], bar_exp) - - # this is a MI Series, so comparing the names of the results - # doesn't make sense - result = ts.resample('M').aggregate({'foo': lambda x: x.mean(), - 'bar': lambda x: x.std(ddof=1)}) - tm.assert_series_equal(result['foo'], foo_exp, check_names=False) - tm.assert_series_equal(result['bar'], bar_exp, check_names=False) - - def test_resample_unequal_times(self): - # #1772 - start = datetime(1999, 3, 1, 5) - # end hour is less than start - end = datetime(2012, 7, 31, 4) - bad_ind = date_range(start, 
end, freq="30min") - df = DataFrame({'close': 1}, index=bad_ind) - - # it works! - df.resample('AS').sum() - - def test_resample_consistency(self): - - # GH 6418 - # resample with bfill / limit / reindex consistency - - i30 = pd.date_range('2002-02-02', periods=4, freq='30T') - s = Series(np.arange(4.), index=i30) - s[2] = np.NaN - - # Upsample by factor 3 with reindex() and resample() methods: - i10 = pd.date_range(i30[0], i30[-1], freq='10T') - - s10 = s.reindex(index=i10, method='bfill') - s10_2 = s.reindex(index=i10, method='bfill', limit=2) - rl = s.reindex_like(s10, method='bfill', limit=2) - r10_2 = s.resample('10Min').bfill(limit=2) - r10 = s.resample('10Min').bfill() - - # s10_2, r10, r10_2, rl should all be equal - assert_series_equal(s10_2, r10) - assert_series_equal(s10_2, r10_2) - assert_series_equal(s10_2, rl) - - def test_resample_timegrouper(self): - # GH 7227 - dates1 = [datetime(2014, 10, 1), datetime(2014, 9, 3), - datetime(2014, 11, 5), datetime(2014, 9, 5), - datetime(2014, 10, 8), datetime(2014, 7, 15)] - - dates2 = dates1[:2] + [pd.NaT] + dates1[2:4] + [pd.NaT] + dates1[4:] - dates3 = [pd.NaT] + dates1 + [pd.NaT] - - for dates in [dates1, dates2, dates3]: - df = DataFrame(dict(A=dates, B=np.arange(len(dates)))) - result = df.set_index('A').resample('M').count() - exp_idx = pd.DatetimeIndex(['2014-07-31', '2014-08-31', - '2014-09-30', - '2014-10-31', '2014-11-30'], - freq='M', name='A') - expected = DataFrame({'B': [1, 0, 2, 2, 1]}, index=exp_idx) - assert_frame_equal(result, expected) - - result = df.groupby(pd.Grouper(freq='M', key='A')).count() - assert_frame_equal(result, expected) - - df = DataFrame(dict(A=dates, B=np.arange(len(dates)), C=np.arange( - len(dates)))) - result = df.set_index('A').resample('M').count() - expected = DataFrame({'B': [1, 0, 2, 2, 1], 'C': [1, 0, 2, 2, 1]}, - index=exp_idx, columns=['B', 'C']) - assert_frame_equal(result, expected) - - result = df.groupby(pd.Grouper(freq='M', key='A')).count() - assert_frame_equal(result, expected) - - def test_resample_nunique(self): - - # GH 12352 - df = DataFrame({ - 'ID': {Timestamp('2015-06-05 00:00:00'): '0010100903', - Timestamp('2015-06-08 00:00:00'): '0010150847'}, - 'DATE': {Timestamp('2015-06-05 00:00:00'): '2015-06-05', - Timestamp('2015-06-08 00:00:00'): '2015-06-08'}}) - r = df.resample('D') - g = df.groupby(pd.Grouper(freq='D')) - expected = df.groupby(pd.Grouper(freq='D')).ID.apply(lambda x: - x.nunique()) - assert expected.name == 'ID' - - for t in [r, g]: - result = r.ID.nunique() - assert_series_equal(result, expected) - - result = df.ID.resample('D').nunique() - assert_series_equal(result, expected) - - result = df.ID.groupby(pd.Grouper(freq='D')).nunique() - assert_series_equal(result, expected) - - def test_resample_nunique_with_date_gap(self): - # GH 13453 - index = pd.date_range('1-1-2000', '2-15-2000', freq='h') - index2 = pd.date_range('4-15-2000', '5-15-2000', freq='h') - index3 = index.append(index2) - s = Series(range(len(index3)), index=index3, dtype='int64') - r = s.resample('M') - - # Since all elements are unique, these should all be the same - results = [ - r.count(), - r.nunique(), - r.agg(Series.nunique), - r.agg('nunique') - ] - - assert_series_equal(results[0], results[1]) - assert_series_equal(results[0], results[2]) - assert_series_equal(results[0], results[3]) - - @pytest.mark.parametrize('n', [10000, 100000]) - @pytest.mark.parametrize('k', [10, 100, 1000]) - def test_resample_group_info(self, n, k): - # GH10914 - dr = date_range(start='2015-08-27', periods=n // 
10, freq='T') - ts = Series(np.random.randint(0, n // k, n).astype('int64'), - index=np.random.choice(dr, n)) - - left = ts.resample('30T').nunique() - ix = date_range(start=ts.index.min(), end=ts.index.max(), - freq='30T') - - vals = ts.values - bins = np.searchsorted(ix.values, ts.index, side='right') - - sorter = np.lexsort((vals, bins)) - vals, bins = vals[sorter], bins[sorter] - - mask = np.r_[True, vals[1:] != vals[:-1]] - mask |= np.r_[True, bins[1:] != bins[:-1]] - - arr = np.bincount(bins[mask] - 1, - minlength=len(ix)).astype('int64', copy=False) - right = Series(arr, index=ix) - - assert_series_equal(left, right) - - def test_resample_size(self): - n = 10000 - dr = date_range('2015-09-19', periods=n, freq='T') - ts = Series(np.random.randn(n), index=np.random.choice(dr, n)) - - left = ts.resample('7T').size() - ix = date_range(start=left.index.min(), end=ts.index.max(), freq='7T') - - bins = np.searchsorted(ix.values, ts.index.values, side='right') - val = np.bincount(bins, minlength=len(ix) + 1)[1:].astype('int64', - copy=False) - - right = Series(val, index=ix) - assert_series_equal(left, right) - - def test_resample_across_dst(self): - # The test resamples a DatetimeIndex with values before and after a - # DST change - # Issue: 14682 - - # The DatetimeIndex we will start with - # (note that DST happens at 03:00+02:00 -> 02:00+01:00) - # 2016-10-30 02:23:00+02:00, 2016-10-30 02:23:00+01:00 - df1 = DataFrame([1477786980, 1477790580], columns=['ts']) - dti1 = DatetimeIndex(pd.to_datetime(df1.ts, unit='s') - .dt.tz_localize('UTC') - .dt.tz_convert('Europe/Madrid')) - - # The expected DatetimeIndex after resampling. - # 2016-10-30 02:00:00+02:00, 2016-10-30 02:00:00+01:00 - df2 = DataFrame([1477785600, 1477789200], columns=['ts']) - dti2 = DatetimeIndex(pd.to_datetime(df2.ts, unit='s') - .dt.tz_localize('UTC') - .dt.tz_convert('Europe/Madrid')) - df = DataFrame([5, 5], index=dti1) - - result = df.resample(rule='H').sum() - expected = DataFrame([5, 5], index=dti2) - - assert_frame_equal(result, expected) - - def test_resample_dst_anchor(self): - # 5172 - dti = DatetimeIndex([datetime(2012, 11, 4, 23)], tz='US/Eastern') - df = DataFrame([5], index=dti) - assert_frame_equal(df.resample(rule='CD').sum(), - DataFrame([5], index=df.index.normalize())) - df.resample(rule='MS').sum() - assert_frame_equal( - df.resample(rule='MS').sum(), - DataFrame([5], index=DatetimeIndex([datetime(2012, 11, 1)], - tz='US/Eastern'))) - - dti = date_range('2013-09-30', '2013-11-02', freq='30Min', - tz='Europe/Paris') - values = range(dti.size) - df = DataFrame({"a": values, - "b": values, - "c": values}, index=dti, dtype='int64') - how = {"a": "min", "b": "max", "c": "count"} - - assert_frame_equal( - df.resample("W-MON").agg(how)[["a", "b", "c"]], - DataFrame({"a": [0, 48, 384, 720, 1056, 1394], - "b": [47, 383, 719, 1055, 1393, 1586], - "c": [48, 336, 336, 336, 338, 193]}, - index=date_range('9/30/2013', '11/4/2013', - freq='W-MON', tz='Europe/Paris')), - 'W-MON Frequency') - - assert_frame_equal( - df.resample("2W-MON").agg(how)[["a", "b", "c"]], - DataFrame({"a": [0, 48, 720, 1394], - "b": [47, 719, 1393, 1586], - "c": [48, 672, 674, 193]}, - index=date_range('9/30/2013', '11/11/2013', - freq='2W-MON', tz='Europe/Paris')), - '2W-MON Frequency') - - assert_frame_equal( - df.resample("MS").agg(how)[["a", "b", "c"]], - DataFrame({"a": [0, 48, 1538], - "b": [47, 1537, 1586], - "c": [48, 1490, 49]}, - index=date_range('9/1/2013', '11/1/2013', - freq='MS', tz='Europe/Paris')), - 'MS Frequency') - - 
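The wall-clock behaviour asserted by the deleted `test_resample_across_dst` above can be reproduced directly; this sketch reuses the two UTC epoch seconds from that test, which straddle the Europe/Madrid fall-back and land in a repeated 02:00 hour:

```python
import pandas as pd

# 2016-10-30 02:23:00+02:00 and 2016-10-30 02:23:00+01:00
idx = pd.to_datetime([1477786980, 1477790580], unit='s', utc=True)
idx = idx.tz_convert('Europe/Madrid')
df = pd.DataFrame([5, 5], index=idx)

# One row per wall-clock hour: 02:00+02:00 and 02:00+01:00.
print(df.resample('H').sum())
```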
assert_frame_equal( - df.resample("2MS").agg(how)[["a", "b", "c"]], - DataFrame({"a": [0, 1538], - "b": [1537, 1586], - "c": [1538, 49]}, - index=date_range('9/1/2013', '11/1/2013', - freq='2MS', tz='Europe/Paris')), - '2MS Frequency') - - df_daily = df['10/26/2013':'10/29/2013'] - assert_frame_equal( - df_daily.resample("CD").agg({"a": "min", "b": "max", "c": "count"}) - [["a", "b", "c"]], - DataFrame({"a": [1248, 1296, 1346, 1394], - "b": [1295, 1345, 1393, 1441], - "c": [48, 50, 48, 48]}, - index=date_range('10/26/2013', '10/29/2013', - freq='CD', tz='Europe/Paris')), - 'CD Frequency') - - def test_downsample_across_dst(self): - # GH 8531 - tz = pytz.timezone('Europe/Berlin') - dt = datetime(2014, 10, 26) - dates = date_range(tz.localize(dt), periods=4, freq='2H') - result = Series(5, index=dates).resample('H').mean() - expected = Series([5., np.nan] * 3 + [5.], - index=date_range(tz.localize(dt), periods=7, - freq='H')) - tm.assert_series_equal(result, expected) - - def test_downsample_across_dst_weekly(self): - # GH 9119, GH 21459 - df = DataFrame(index=DatetimeIndex([ - '2017-03-25', '2017-03-26', '2017-03-27', - '2017-03-28', '2017-03-29' - ], tz='Europe/Amsterdam'), - data=[11, 12, 13, 14, 15]) - result = df.resample('1W').sum() - expected = DataFrame([23, 42], index=pd.DatetimeIndex([ - '2017-03-26', '2017-04-02' - ], tz='Europe/Amsterdam')) - tm.assert_frame_equal(result, expected) - - idx = pd.date_range("2013-04-01", "2013-05-01", tz='Europe/London', - freq='H') - s = Series(index=idx) - result = s.resample('W').mean() - expected = Series(index=pd.date_range( - '2013-04-07', freq='W', periods=5, tz='Europe/London' - )) - tm.assert_series_equal(result, expected) - - def test_resample_with_nat(self): - # GH 13020 - index = DatetimeIndex([pd.NaT, - '1970-01-01 00:00:00', - pd.NaT, - '1970-01-01 00:00:01', - '1970-01-01 00:00:02']) - frame = DataFrame([2, 3, 5, 7, 11], index=index) - - index_1s = DatetimeIndex(['1970-01-01 00:00:00', - '1970-01-01 00:00:01', - '1970-01-01 00:00:02']) - frame_1s = DataFrame([3, 7, 11], index=index_1s) - assert_frame_equal(frame.resample('1s').mean(), frame_1s) - - index_2s = DatetimeIndex(['1970-01-01 00:00:00', - '1970-01-01 00:00:02']) - frame_2s = DataFrame([5, 11], index=index_2s) - assert_frame_equal(frame.resample('2s').mean(), frame_2s) - - index_3s = DatetimeIndex(['1970-01-01 00:00:00']) - frame_3s = DataFrame([7], index=index_3s) - assert_frame_equal(frame.resample('3s').mean(), frame_3s) - - assert_frame_equal(frame.resample('60s').mean(), frame_3s) - - def test_resample_timedelta_values(self): - # GH 13119 - # check that timedelta dtype is preserved when NaT values are - # introduced by the resampling - - times = timedelta_range('1 day', '4 day', freq='4D') - df = DataFrame({'time': times}, index=times) - - times2 = timedelta_range('1 day', '4 day', freq='2D') - exp = Series(times2, index=times2, name='time') - exp.iloc[1] = pd.NaT - - res = df.resample('2D').first()['time'] - tm.assert_series_equal(res, exp) - res = df['time'].resample('2D').first() - tm.assert_series_equal(res, exp) - - def test_resample_datetime_values(self): - # GH 13119 - # check that datetime dtype is preserved when NaT values are - # introduced by the resampling - - dates = [datetime(2016, 1, 15), datetime(2016, 1, 19)] - df = DataFrame({'timestamp': dates}, index=dates) - - exp = Series([datetime(2016, 1, 15), pd.NaT, datetime(2016, 1, 19)], - index=date_range('2016-01-15', periods=3, freq='2D'), - name='timestamp') - - res = 
df.resample('2D').first()['timestamp'] - tm.assert_series_equal(res, exp) - res = df['timestamp'].resample('2D').first() - tm.assert_series_equal(res, exp) - - def test_resample_apply_with_additional_args(self): - # GH 14615 - def f(data, add_arg): - return np.mean(data) * add_arg - - multiplier = 10 - result = self.series.resample('D').apply(f, multiplier) - expected = self.series.resample('D').mean().multiply(multiplier) - tm.assert_series_equal(result, expected) - - # Testing as kwarg - result = self.series.resample('D').apply(f, add_arg=multiplier) - expected = self.series.resample('D').mean().multiply(multiplier) - tm.assert_series_equal(result, expected) - - # Testing dataframe - df = pd.DataFrame({"A": 1, "B": 2}, - index=pd.date_range('2017', periods=10)) - result = df.groupby("A").resample("D").agg(f, multiplier) - expected = df.groupby("A").resample('D').mean().multiply(multiplier) - assert_frame_equal(result, expected) - - -class TestPeriodIndex(Base): - _index_factory = lambda x: period_range - - @pytest.fixture - def _series_name(self): - return 'pi' - - def create_series(self): - # TODO: replace calls to .create_series() by injecting the series - # fixture - i = period_range(datetime(2005, 1, 1), - datetime(2005, 1, 10), freq='D') - - return Series(np.arange(len(i)), index=i, name='pi') - - @pytest.mark.parametrize('freq', ['2D', '1H', '2H']) - @pytest.mark.parametrize('kind', ['period', None, 'timestamp']) - def test_asfreq(self, series_and_frame, freq, kind): - # GH 12884, 15944 - # make sure .asfreq() returns PeriodIndex (except kind='timestamp') - - obj = series_and_frame - if kind == 'timestamp': - expected = obj.to_timestamp().resample(freq).asfreq() - else: - start = obj.index[0].to_timestamp(how='start') - end = (obj.index[-1] + obj.index.freq).to_timestamp(how='start') - new_index = date_range(start=start, end=end, freq=freq, - closed='left') - expected = obj.to_timestamp().reindex(new_index).to_period(freq) - result = obj.resample(freq, kind=kind).asfreq() - assert_almost_equal(result, expected) - - def test_asfreq_fill_value(self): - # test for fill value during resampling, issue 3715 - - s = self.create_series() - new_index = date_range(s.index[0].to_timestamp(how='start'), - (s.index[-1]).to_timestamp(how='start'), - freq='1H') - expected = s.to_timestamp().reindex(new_index, fill_value=4.0) - result = s.resample('1H', kind='timestamp').asfreq(fill_value=4.0) - assert_series_equal(result, expected) - - frame = s.to_frame('value') - new_index = date_range(frame.index[0].to_timestamp(how='start'), - (frame.index[-1]).to_timestamp(how='start'), - freq='1H') - expected = frame.to_timestamp().reindex(new_index, fill_value=3.0) - result = frame.resample('1H', kind='timestamp').asfreq(fill_value=3.0) - assert_frame_equal(result, expected) - - @pytest.mark.parametrize('freq', ['H', '12H', '2D', 'W']) - @pytest.mark.parametrize('kind', [None, 'period', 'timestamp']) - def test_selection(self, index, freq, kind): - # This is a bug, these should be implemented - # GH 14008 - rng = np.arange(len(index), dtype=np.int64) - df = DataFrame({'date': index, 'a': rng}, - index=pd.MultiIndex.from_arrays([rng, index], - names=['v', 'd'])) - with pytest.raises(NotImplementedError): - df.resample(freq, on='date', kind=kind) - with pytest.raises(NotImplementedError): - df.resample(freq, level='d', kind=kind) - - def test_annual_upsample_D_s_f(self): - self._check_annual_upsample_cases('D', 'start', 'ffill') - - def test_annual_upsample_D_e_f(self): - 
self._check_annual_upsample_cases('D', 'end', 'ffill') - - def test_annual_upsample_D_s_b(self): - self._check_annual_upsample_cases('D', 'start', 'bfill') - - def test_annual_upsample_D_e_b(self): - self._check_annual_upsample_cases('D', 'end', 'bfill') - - def test_annual_upsample_B_s_f(self): - self._check_annual_upsample_cases('B', 'start', 'ffill') - - def test_annual_upsample_B_e_f(self): - self._check_annual_upsample_cases('B', 'end', 'ffill') - - def test_annual_upsample_B_s_b(self): - self._check_annual_upsample_cases('B', 'start', 'bfill') - - def test_annual_upsample_B_e_b(self): - self._check_annual_upsample_cases('B', 'end', 'bfill') - - def test_annual_upsample_M_s_f(self): - self._check_annual_upsample_cases('M', 'start', 'ffill') - - def test_annual_upsample_M_e_f(self): - self._check_annual_upsample_cases('M', 'end', 'ffill') - - def test_annual_upsample_M_s_b(self): - self._check_annual_upsample_cases('M', 'start', 'bfill') - - def test_annual_upsample_M_e_b(self): - self._check_annual_upsample_cases('M', 'end', 'bfill') - - def _check_annual_upsample_cases(self, targ, conv, meth, end='12/31/1991'): - for month in MONTHS: - ts = _simple_pts('1/1/1990', end, freq='A-%s' % month) - - result = getattr(ts.resample(targ, convention=conv), meth)() - expected = result.to_timestamp(targ, how=conv) - expected = expected.asfreq(targ, meth).to_period() - assert_series_equal(result, expected) - - def test_basic_downsample(self): - ts = _simple_pts('1/1/1990', '6/30/1995', freq='M') - result = ts.resample('a-dec').mean() - - expected = ts.groupby(ts.index.year).mean() - expected.index = period_range('1/1/1990', '6/30/1995', freq='a-dec') - assert_series_equal(result, expected) - - # this is ok - assert_series_equal(ts.resample('a-dec').mean(), result) - assert_series_equal(ts.resample('a').mean(), result) - - def test_not_subperiod(self): - # These are incompatible period rules for resampling - ts = _simple_pts('1/1/1990', '6/30/1995', freq='w-wed') - pytest.raises(ValueError, lambda: ts.resample('a-dec').mean()) - pytest.raises(ValueError, lambda: ts.resample('q-mar').mean()) - pytest.raises(ValueError, lambda: ts.resample('M').mean()) - pytest.raises(ValueError, lambda: ts.resample('w-thu').mean()) - - @pytest.mark.parametrize('freq', ['D', '2D']) - def test_basic_upsample(self, freq): - ts = _simple_pts('1/1/1990', '6/30/1995', freq='M') - result = ts.resample('a-dec').mean() - - resampled = result.resample(freq, convention='end').ffill() - expected = result.to_timestamp(freq, how='end') - expected = expected.asfreq(freq, 'ffill').to_period(freq) - assert_series_equal(resampled, expected) - - def test_upsample_with_limit(self): - rng = period_range('1/1/2000', periods=5, freq='A') - ts = Series(np.random.randn(len(rng)), rng) - - result = ts.resample('M', convention='end').ffill(limit=2) - expected = ts.asfreq('M').reindex(result.index, method='ffill', - limit=2) - assert_series_equal(result, expected) - - def test_annual_upsample(self): - ts = _simple_pts('1/1/1990', '12/31/1995', freq='A-DEC') - df = DataFrame({'a': ts}) - rdf = df.resample('D').ffill() - exp = df['a'].resample('D').ffill() - assert_series_equal(rdf['a'], exp) - - rng = period_range('2000', '2003', freq='A-DEC') - ts = Series([1, 2, 3, 4], index=rng) - - result = ts.resample('M').ffill() - ex_index = period_range('2000-01', '2003-12', freq='M') - - expected = ts.asfreq('M', how='start').reindex(ex_index, - method='ffill') - assert_series_equal(result, expected) - - @pytest.mark.parametrize('month', MONTHS) - 
@pytest.mark.parametrize('target', ['D', 'B', 'M']) - @pytest.mark.parametrize('convention', ['start', 'end']) - def test_quarterly_upsample(self, month, target, convention): - freq = 'Q-{month}'.format(month=month) - ts = _simple_pts('1/1/1990', '12/31/1995', freq=freq) - result = ts.resample(target, convention=convention).ffill() - expected = result.to_timestamp(target, how=convention) - expected = expected.asfreq(target, 'ffill').to_period() - assert_series_equal(result, expected) - - @pytest.mark.parametrize('target', ['D', 'B']) - @pytest.mark.parametrize('convention', ['start', 'end']) - def test_monthly_upsample(self, target, convention): - ts = _simple_pts('1/1/1990', '12/31/1995', freq='M') - result = ts.resample(target, convention=convention).ffill() - expected = result.to_timestamp(target, how=convention) - expected = expected.asfreq(target, 'ffill').to_period() - assert_series_equal(result, expected) - - def test_resample_basic(self): - # GH3609 - s = Series(range(100), index=date_range( - '20130101', freq='s', periods=100, name='idx'), dtype='float') - s[10:30] = np.nan - index = PeriodIndex([ - Period('2013-01-01 00:00', 'T'), - Period('2013-01-01 00:01', 'T')], name='idx') - expected = Series([34.5, 79.5], index=index) - result = s.to_period().resample('T', kind='period').mean() - assert_series_equal(result, expected) - result2 = s.resample('T', kind='period').mean() - assert_series_equal(result2, expected) - - @pytest.mark.parametrize('freq,expected_vals', [('M', [31, 29, 31, 9]), - ('2M', [31 + 29, 31 + 9])]) - def test_resample_count(self, freq, expected_vals): - # GH12774 - series = Series(1, index=pd.period_range(start='2000', periods=100)) - result = series.resample(freq).count() - expected_index = pd.period_range(start='2000', freq=freq, - periods=len(expected_vals)) - expected = Series(expected_vals, index=expected_index) - assert_series_equal(result, expected) - - def test_resample_same_freq(self): - - # GH12770 - series = Series(range(3), index=pd.period_range( - start='2000', periods=3, freq='M')) - expected = series - - for method in resample_methods: - result = getattr(series.resample('M'), method)() - assert_series_equal(result, expected) - - def test_resample_incompat_freq(self): - - with pytest.raises(IncompatibleFrequency): - Series(range(3), index=pd.period_range( - start='2000', periods=3, freq='M')).resample('W').mean() - - def test_with_local_timezone_pytz(self): - # see gh-5430 - local_timezone = pytz.timezone('America/Los_Angeles') - - start = datetime(year=2013, month=11, day=1, hour=0, minute=0, - tzinfo=pytz.utc) - # 1 day later - end = datetime(year=2013, month=11, day=2, hour=0, minute=0, - tzinfo=pytz.utc) - - index = pd.date_range(start, end, freq='H') - - series = Series(1, index=index) - series = series.tz_convert(local_timezone) - result = series.resample('D', kind='period').mean() - - # Create the expected series - # Index is moved back a day with the timezone conversion from UTC to - # Pacific - expected_index = (pd.period_range(start=start, end=end, freq='D') - - offsets.Day()) - expected = Series(1, index=expected_index) - assert_series_equal(result, expected) - - def test_resample_with_pytz(self): - # GH 13238 - s = Series(2, index=pd.date_range('2017-01-01', periods=48, freq="H", - tz="US/Eastern")) - result = s.resample("D").mean() - expected = Series(2, index=pd.DatetimeIndex(['2017-01-01', - '2017-01-02'], - tz="US/Eastern")) - assert_series_equal(result, expected) - # Especially assert that the timezone is LMT for pytz - assert 
result.index.tz == pytz.timezone('US/Eastern') - - def test_with_local_timezone_dateutil(self): - # see gh-5430 - local_timezone = 'dateutil/America/Los_Angeles' - - start = datetime(year=2013, month=11, day=1, hour=0, minute=0, - tzinfo=dateutil.tz.tzutc()) - # 1 day later - end = datetime(year=2013, month=11, day=2, hour=0, minute=0, - tzinfo=dateutil.tz.tzutc()) - - index = pd.date_range(start, end, freq='H', name='idx') - - series = Series(1, index=index) - series = series.tz_convert(local_timezone) - result = series.resample('D', kind='period').mean() - - # Create the expected series - # Index is moved back a day with the timezone conversion from UTC to - # Pacific - expected_index = (pd.period_range(start=start, end=end, freq='D', - name='idx') - offsets.Day()) - expected = Series(1, index=expected_index) - assert_series_equal(result, expected) - - def test_resample_nonexistent_time_bin_edge(self): - # GH 19375 - index = date_range('2017-03-12', '2017-03-12 1:45:00', freq='15T') - s = Series(np.zeros(len(index)), index=index) - expected = s.tz_localize('US/Pacific') - result = expected.resample('900S').mean() - tm.assert_series_equal(result, expected) - - # GH 23742 - index = date_range(start='2017-10-10', end='2017-10-20', freq='1H') - index = index.tz_localize('UTC').tz_convert('America/Sao_Paulo') - df = DataFrame(data=list(range(len(index))), index=index) - result = df.groupby(pd.Grouper(freq='1D')) - expected = date_range(start='2017-10-09', end='2017-10-20', freq='D', - tz="America/Sao_Paulo") - tm.assert_index_equal(result.count().index, expected) - - def test_resample_ambiguous_time_bin_edge(self): - # GH 10117 - idx = pd.date_range("2014-10-25 22:00:00", "2014-10-26 00:30:00", - freq="30T", tz="Europe/London") - expected = Series(np.zeros(len(idx)), index=idx) - result = expected.resample('30T').mean() - tm.assert_series_equal(result, expected) - - def test_fill_method_and_how_upsample(self): - # GH2073 - s = Series(np.arange(9, dtype='int64'), - index=date_range('2010-01-01', periods=9, freq='Q')) - last = s.resample('M').ffill() - both = s.resample('M').ffill().resample('M').last().astype('int64') - assert_series_equal(last, both) - - @pytest.mark.parametrize('day', DAYS) - @pytest.mark.parametrize('target', ['D', 'B']) - @pytest.mark.parametrize('convention', ['start', 'end']) - def test_weekly_upsample(self, day, target, convention): - freq = 'W-{day}'.format(day=day) - ts = _simple_pts('1/1/1990', '12/31/1995', freq=freq) - result = ts.resample(target, convention=convention).ffill() - expected = result.to_timestamp(target, how=convention) - expected = expected.asfreq(target, 'ffill').to_period() - assert_series_equal(result, expected) - - def test_resample_to_timestamps(self): - ts = _simple_pts('1/1/1990', '12/31/1995', freq='M') - - result = ts.resample('A-DEC', kind='timestamp').mean() - expected = ts.to_timestamp(how='start').resample('A-DEC').mean() - assert_series_equal(result, expected) - - def test_resample_to_quarterly(self): - for month in MONTHS: - ts = _simple_pts('1990', '1992', freq='A-%s' % month) - quar_ts = ts.resample('Q-%s' % month).ffill() - - stamps = ts.to_timestamp('D', how='start') - qdates = period_range(ts.index[0].asfreq('D', 'start'), - ts.index[-1].asfreq('D', 'end'), - freq='Q-%s' % month) - - expected = stamps.reindex(qdates.to_timestamp('D', 's'), - method='ffill') - expected.index = qdates - - assert_series_equal(quar_ts, expected) - - # conforms, but different month - ts = _simple_pts('1990', '1992', freq='A-JUN') - - for how in 
['start', 'end']: - result = ts.resample('Q-MAR', convention=how).ffill() - expected = ts.asfreq('Q-MAR', how=how) - expected = expected.reindex(result.index, method='ffill') - - # .to_timestamp('D') - # expected = expected.resample('Q-MAR').ffill() - - assert_series_equal(result, expected) - - def test_resample_fill_missing(self): - rng = PeriodIndex([2000, 2005, 2007, 2009], freq='A') - - s = Series(np.random.randn(4), index=rng) - - stamps = s.to_timestamp() - filled = s.resample('A').ffill() - expected = stamps.resample('A').ffill().to_period('A') - assert_series_equal(filled, expected) - - def test_cant_fill_missing_dups(self): - rng = PeriodIndex([2000, 2005, 2005, 2007, 2007], freq='A') - s = Series(np.random.randn(5), index=rng) - pytest.raises(Exception, lambda: s.resample('A').ffill()) - - @pytest.mark.parametrize('freq', ['5min']) - @pytest.mark.parametrize('kind', ['period', None, 'timestamp']) - def test_resample_5minute(self, freq, kind): - rng = period_range('1/1/2000', '1/5/2000', freq='T') - ts = Series(np.random.randn(len(rng)), index=rng) - expected = ts.to_timestamp().resample(freq).mean() - if kind != 'timestamp': - expected = expected.to_period(freq) - result = ts.resample(freq, kind=kind).mean() - assert_series_equal(result, expected) - - def test_upsample_daily_business_daily(self): - ts = _simple_pts('1/1/2000', '2/1/2000', freq='B') - - result = ts.resample('D').asfreq() - expected = ts.asfreq('D').reindex(period_range('1/3/2000', '2/1/2000')) - assert_series_equal(result, expected) - - ts = _simple_pts('1/1/2000', '2/1/2000') - result = ts.resample('H', convention='s').asfreq() - exp_rng = period_range('1/1/2000', '2/1/2000 23:00', freq='H') - expected = ts.asfreq('H', how='s').reindex(exp_rng) - assert_series_equal(result, expected) - - def test_resample_irregular_sparse(self): - dr = date_range(start='1/1/2012', freq='5min', periods=1000) - s = Series(np.array(100), index=dr) - # subset the data. 
- subset = s[:'2012-01-04 06:55'] - - result = subset.resample('10min').apply(len) - expected = s.resample('10min').apply(len).loc[result.index] - assert_series_equal(result, expected) - - def test_resample_weekly_all_na(self): - rng = date_range('1/1/2000', periods=10, freq='W-WED') - ts = Series(np.random.randn(len(rng)), index=rng) - - result = ts.resample('W-THU').asfreq() - - assert result.isna().all() - - result = ts.resample('W-THU').asfreq().ffill()[:-1] - expected = ts.asfreq('W-THU').ffill() - assert_series_equal(result, expected) - - def test_resample_tz_localized(self): - dr = date_range(start='2012-4-13', end='2012-5-1') - ts = Series(lrange(len(dr)), dr) - - ts_utc = ts.tz_localize('UTC') - ts_local = ts_utc.tz_convert('America/Los_Angeles') - - result = ts_local.resample('W').mean() - - ts_local_naive = ts_local.copy() - ts_local_naive.index = [x.replace(tzinfo=None) - for x in ts_local_naive.index.to_pydatetime()] - - exp = ts_local_naive.resample( - 'W').mean().tz_localize('America/Los_Angeles') - - assert_series_equal(result, exp) - - # it works - result = ts_local.resample('D').mean() - - # #2245 - idx = date_range('2001-09-20 15:59', '2001-09-20 16:00', freq='T', - tz='Australia/Sydney') - s = Series([1, 2], index=idx) - - result = s.resample('D', closed='right', label='right').mean() - ex_index = date_range('2001-09-21', periods=1, freq='D', - tz='Australia/Sydney') - expected = Series([1.5], index=ex_index) - - assert_series_equal(result, expected) - - # for good measure - result = s.resample('D', kind='period').mean() - ex_index = period_range('2001-09-20', periods=1, freq='D') - expected = Series([1.5], index=ex_index) - assert_series_equal(result, expected) - - # GH 6397 - # comparing an offset that doesn't propagate tz's - rng = date_range('1/1/2011', periods=20000, freq='H') - rng = rng.tz_localize('EST') - ts = DataFrame(index=rng) - ts['first'] = np.random.randn(len(rng)) - ts['second'] = np.cumsum(np.random.randn(len(rng))) - expected = DataFrame( - { - 'first': ts.resample('A').sum()['first'], - 'second': ts.resample('A').mean()['second']}, - columns=['first', 'second']) - result = ts.resample( - 'A').agg({'first': np.sum, - 'second': np.mean}).reindex(columns=['first', 'second']) - assert_frame_equal(result, expected) - - def test_closed_left_corner(self): - # #1465 - s = Series(np.random.randn(21), - index=date_range(start='1/1/2012 9:30', - freq='1min', periods=21)) - s[0] = np.nan - - result = s.resample('10min', closed='left', label='right').mean() - exp = s[1:].resample('10min', closed='left', label='right').mean() - assert_series_equal(result, exp) - - result = s.resample('10min', closed='left', label='left').mean() - exp = s[1:].resample('10min', closed='left', label='left').mean() - - ex_index = date_range(start='1/1/2012 9:30', freq='10min', periods=3) - - tm.assert_index_equal(result.index, ex_index) - assert_series_equal(result, exp) - - def test_quarterly_resampling(self): - rng = period_range('2000Q1', periods=10, freq='Q-DEC') - ts = Series(np.arange(10), index=rng) - - result = ts.resample('A').mean() - exp = ts.to_timestamp().resample('A').mean().to_period() - assert_series_equal(result, exp) - - def test_resample_weekly_bug_1726(self): - # 8/6/12 is a Monday - ind = DatetimeIndex(start="8/6/2012", end="8/26/2012", freq="D") - n = len(ind) - data = [[x] * 5 for x in range(n)] - df = DataFrame(data, columns=['open', 'high', 'low', 'close', 'vol'], - index=ind) - - # it works! 
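The `closed`/`label` interplay exercised by the deleted `test_closed_left_corner` above, in short form: which edge a bin owns and how the bin is named are independent choices. A minimal sketch with synthetic data:

```python
import numpy as np
import pandas as pd

# Left-closed bins [09:30, 09:40), [09:40, 09:50), ... that are
# nevertheless labelled by their right edge.
rng = pd.date_range('2012-01-01 09:30', periods=21, freq='T')
s = pd.Series(np.arange(21.0), index=rng)

out = s.resample('10min', closed='left', label='right').mean()
assert out.index[0] == pd.Timestamp('2012-01-01 09:40:00')
```

(The deleted weekly-bug test then simply runs the `W-MON` call that follows.)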
- df.resample('W-MON', closed='left', label='left').first() - - def test_resample_with_dst_time_change(self): - # GH 15549 - index = pd.DatetimeIndex([1457537600000000000, 1458059600000000000], - tz='UTC').tz_convert('America/Chicago') - df = pd.DataFrame([1, 2], index=index) - result = df.resample('12h', closed='right', - label='right').last().ffill() - - expected_index_values = ['2016-03-09 12:00:00-06:00', - '2016-03-10 00:00:00-06:00', - '2016-03-10 12:00:00-06:00', - '2016-03-11 00:00:00-06:00', - '2016-03-11 12:00:00-06:00', - '2016-03-12 00:00:00-06:00', - '2016-03-12 12:00:00-06:00', - '2016-03-13 00:00:00-06:00', - '2016-03-13 13:00:00-05:00', - '2016-03-14 01:00:00-05:00', - '2016-03-14 13:00:00-05:00', - '2016-03-15 01:00:00-05:00', - '2016-03-15 13:00:00-05:00'] - index = pd.to_datetime(expected_index_values, utc=True).tz_convert( - 'America/Chicago') - expected = pd.DataFrame([1.0, 1.0, 1.0, 1.0, 1.0, - 1.0, 1.0, 1.0, 1.0, 1.0, - 1.0, 1.0, 2.0], index=index) - assert_frame_equal(result, expected) - - def test_resample_bms_2752(self): - # GH2753 - foo = Series(index=pd.bdate_range('20000101', '20000201')) - res1 = foo.resample("BMS").mean() - res2 = foo.resample("BMS").mean().resample("B").mean() - assert res1.index[0] == Timestamp('20000103') - assert res1.index[0] == res2.index[0] - - # def test_monthly_convention_span(self): - # rng = period_range('2000-01', periods=3, freq='M') - # ts = Series(np.arange(3), index=rng) - - # # hacky way to get same thing - # exp_index = period_range('2000-01-01', '2000-03-31', freq='D') - # expected = ts.asfreq('D', how='end').reindex(exp_index) - # expected = expected.fillna(method='bfill') - - # result = ts.resample('D', convention='span').mean() - - # assert_series_equal(result, expected) - - def test_default_right_closed_label(self): - end_freq = ['D', 'Q', 'M', 'D'] - end_types = ['M', 'A', 'Q', 'W'] - - for from_freq, to_freq in zip(end_freq, end_types): - idx = DatetimeIndex(start='8/15/2012', periods=100, freq=from_freq) - df = DataFrame(np.random.randn(len(idx), 2), idx) - - resampled = df.resample(to_freq).mean() - assert_frame_equal(resampled, df.resample(to_freq, closed='right', - label='right').mean()) - - def test_default_left_closed_label(self): - others = ['MS', 'AS', 'QS', 'D', 'H'] - others_freq = ['D', 'Q', 'M', 'H', 'T'] - - for from_freq, to_freq in zip(others_freq, others): - idx = DatetimeIndex(start='8/15/2012', periods=100, freq=from_freq) - df = DataFrame(np.random.randn(len(idx), 2), idx) - - resampled = df.resample(to_freq).mean() - assert_frame_equal(resampled, df.resample(to_freq, closed='left', - label='left').mean()) - - def test_all_values_single_bin(self): - # 2070 - index = period_range(start="2012-01-01", end="2012-12-31", freq="M") - s = Series(np.random.randn(len(index)), index=index) - - result = s.resample("A").mean() - tm.assert_almost_equal(result[0], s.mean()) - - def test_evenly_divisible_with_no_extra_bins(self): - # 4076 - # when the frequency is evenly divisible, sometimes extra bins - - df = DataFrame(np.random.randn(9, 3), - index=date_range('2000-1-1', periods=9)) - result = df.resample('5D').mean() - expected = pd.concat( - [df.iloc[0:5].mean(), df.iloc[5:].mean()], axis=1).T - expected.index = [Timestamp('2000-1-1'), Timestamp('2000-1-6')] - assert_frame_equal(result, expected) - - index = date_range(start='2001-5-4', periods=28) - df = DataFrame( - [{'REST_KEY': 1, 'DLY_TRN_QT': 80, 'DLY_SLS_AMT': 90, - 'COOP_DLY_TRN_QT': 30, 'COOP_DLY_SLS_AMT': 20}] * 28 + - [{'REST_KEY': 2, 'DLY_TRN_QT': 
70, 'DLY_SLS_AMT': 10, - 'COOP_DLY_TRN_QT': 50, 'COOP_DLY_SLS_AMT': 20}] * 28, - index=index.append(index)).sort_index() - - index = date_range('2001-5-4', periods=4, freq='7D') - expected = DataFrame( - [{'REST_KEY': 14, 'DLY_TRN_QT': 14, 'DLY_SLS_AMT': 14, - 'COOP_DLY_TRN_QT': 14, 'COOP_DLY_SLS_AMT': 14}] * 4, - index=index) - result = df.resample('7D').count() - assert_frame_equal(result, expected) - - expected = DataFrame( - [{'REST_KEY': 21, 'DLY_TRN_QT': 1050, 'DLY_SLS_AMT': 700, - 'COOP_DLY_TRN_QT': 560, 'COOP_DLY_SLS_AMT': 280}] * 4, - index=index) - result = df.resample('7D').sum() - assert_frame_equal(result, expected) - - @pytest.mark.parametrize('kind', ['period', None, 'timestamp']) - @pytest.mark.parametrize('agg_arg', ['mean', {'value': 'mean'}, ['mean']]) - def test_loffset_returns_datetimeindex(self, frame, kind, agg_arg): - # make sure passing loffset returns DatetimeIndex in all cases - # basic method taken from Base.test_resample_loffset_arg_type() - df = frame - expected_means = [df.values[i:i + 2].mean() - for i in range(0, len(df.values), 2)] - expected_index = self.create_index(df.index[0], - periods=len(df.index) / 2, - freq='2D') - - # loffset coerces PeriodIndex to DateTimeIndex - expected_index = expected_index.to_timestamp() - expected_index += timedelta(hours=2) - expected = DataFrame({'value': expected_means}, index=expected_index) - - result_agg = df.resample('2D', loffset='2H', kind=kind).agg(agg_arg) - with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): - result_how = df.resample('2D', how=agg_arg, loffset='2H', - kind=kind) - if isinstance(agg_arg, list): - expected.columns = pd.MultiIndex.from_tuples([('value', 'mean')]) - assert_frame_equal(result_agg, expected) - assert_frame_equal(result_how, expected) - - @pytest.mark.parametrize('freq, period_mult', [('H', 24), ('12H', 2)]) - @pytest.mark.parametrize('kind', [None, 'period']) - def test_upsampling_ohlc(self, freq, period_mult, kind): - # GH 13083 - pi = PeriodIndex(start='2000', freq='D', periods=10) - s = Series(range(len(pi)), index=pi) - expected = s.to_timestamp().resample(freq).ohlc().to_period(freq) - - # timestamp-based resampling doesn't include all sub-periods - # of the last original period, so extend accordingly: - new_index = PeriodIndex(start='2000', freq=freq, - periods=period_mult * len(pi)) - expected = expected.reindex(new_index) - result = s.resample(freq, kind=kind).ohlc() - assert_frame_equal(result, expected) - - @pytest.mark.parametrize('periods, values', - [([pd.NaT, '1970-01-01 00:00:00', pd.NaT, - '1970-01-01 00:00:02', '1970-01-01 00:00:03'], - [2, 3, 5, 7, 11]), - ([pd.NaT, pd.NaT, '1970-01-01 00:00:00', pd.NaT, - pd.NaT, pd.NaT, '1970-01-01 00:00:02', - '1970-01-01 00:00:03', pd.NaT, pd.NaT], - [1, 2, 3, 5, 6, 8, 7, 11, 12, 13])]) - @pytest.mark.parametrize('freq, expected_values', - [('1s', [3, np.NaN, 7, 11]), - ('2s', [3, int((7 + 11) / 2)]), - ('3s', [int((3 + 7) / 2), 11])]) - def test_resample_with_nat(self, periods, values, freq, expected_values): - # GH 13224 - index = PeriodIndex(periods, freq='S') - frame = DataFrame(values, index=index) - - expected_index = period_range('1970-01-01 00:00:00', - periods=len(expected_values), freq=freq) - expected = DataFrame(expected_values, index=expected_index) - result = frame.resample(freq).mean() - assert_frame_equal(result, expected) - - def test_resample_with_only_nat(self): - # GH 13224 - pi = PeriodIndex([pd.NaT] * 3, freq='S') - frame = DataFrame([2, 3, 5], index=pi) - expected_index = 
PeriodIndex(data=[], freq=pi.freq) - expected = DataFrame([], index=expected_index) - result = frame.resample('1s').mean() - assert_frame_equal(result, expected) - - -class TestTimedeltaIndex(Base): - _index_factory = lambda x: timedelta_range - - @pytest.fixture - def _index_start(self): - return '1 day' - - @pytest.fixture - def _index_end(self): - return '10 day' - - @pytest.fixture - def _series_name(self): - return 'tdi' - - def create_series(self): - i = timedelta_range('1 day', - '10 day', freq='D') - - return Series(np.arange(len(i)), index=i, name='tdi') - - def test_asfreq_bug(self): - import datetime as dt - df = DataFrame(data=[1, 3], - index=[dt.timedelta(), dt.timedelta(minutes=3)]) - result = df.resample('1T').asfreq() - expected = DataFrame(data=[1, np.nan, np.nan, 3], - index=timedelta_range('0 day', - periods=4, - freq='1T')) - assert_frame_equal(result, expected) - - def test_resample_with_nat(self): - # GH 13223 - index = pd.to_timedelta(['0s', pd.NaT, '2s']) - result = DataFrame({'value': [2, 3, 5]}, index).resample('1s').mean() - expected = DataFrame({'value': [2.5, np.nan, 5.0]}, - index=timedelta_range('0 day', - periods=3, - freq='1S')) - assert_frame_equal(result, expected) - - def test_resample_as_freq_with_subperiod(self): - # GH 13022 - index = timedelta_range('00:00:00', '00:10:00', freq='5T') - df = DataFrame(data={'value': [1, 5, 10]}, index=index) - result = df.resample('2T').asfreq() - expected_data = {'value': [1, np.nan, np.nan, np.nan, np.nan, 10]} - expected = DataFrame(data=expected_data, - index=timedelta_range('00:00:00', - '00:10:00', freq='2T')) - tm.assert_frame_equal(result, expected) - - -class TestResamplerGrouper(object): - - def setup_method(self, method): - self.frame = DataFrame({'A': [1] * 20 + [2] * 12 + [3] * 8, - 'B': np.arange(40)}, - index=date_range('1/1/2000', - freq='s', - periods=40)) - - def test_tab_complete_ipython6_warning(self, ip): - from IPython.core.completer import provisionalcompleter - code = dedent("""\ - import pandas.util.testing as tm - s = tm.makeTimeSeries() - rs = s.resample("D") - """) - ip.run_code(code) - - with tm.assert_produces_warning(None): - with provisionalcompleter('ignore'): - list(ip.Completer.completions('rs.', 1)) - - def test_deferred_with_groupby(self): - - # GH 12486 - # support deferred resample ops with groupby - data = [['2010-01-01', 'A', 2], ['2010-01-02', 'A', 3], - ['2010-01-05', 'A', 8], ['2010-01-10', 'A', 7], - ['2010-01-13', 'A', 3], ['2010-01-01', 'B', 5], - ['2010-01-03', 'B', 2], ['2010-01-04', 'B', 1], - ['2010-01-11', 'B', 7], ['2010-01-14', 'B', 3]] - - df = DataFrame(data, columns=['date', 'id', 'score']) - df.date = pd.to_datetime(df.date) - f = lambda x: x.set_index('date').resample('D').asfreq() - expected = df.groupby('id').apply(f) - result = df.set_index('date').groupby('id').resample('D').asfreq() - assert_frame_equal(result, expected) - - df = DataFrame({'date': pd.date_range(start='2016-01-01', - periods=4, - freq='W'), - 'group': [1, 1, 2, 2], - 'val': [5, 6, 7, 8]}).set_index('date') - - f = lambda x: x.resample('1D').ffill() - expected = df.groupby('group').apply(f) - result = df.groupby('group').resample('1D').ffill() - assert_frame_equal(result, expected) - - def test_getitem(self): - g = self.frame.groupby('A') - - expected = g.B.apply(lambda x: x.resample('2s').mean()) - - result = g.resample('2s').B.mean() - assert_series_equal(result, expected) - - result = g.B.resample('2s').mean() - assert_series_equal(result, expected) - - result = 
g.resample('2s').mean().B - assert_series_equal(result, expected) - - def test_getitem_multiple(self): - - # GH 13174 - # multiple calls after selection causing an issue with aliasing - data = [{'id': 1, 'buyer': 'A'}, {'id': 2, 'buyer': 'B'}] - df = DataFrame(data, index=pd.date_range('2016-01-01', periods=2)) - r = df.groupby('id').resample('1D') - result = r['buyer'].count() - expected = Series([1, 1], - index=pd.MultiIndex.from_tuples( - [(1, Timestamp('2016-01-01')), - (2, Timestamp('2016-01-02'))], - names=['id', None]), - name='buyer') - assert_series_equal(result, expected) - - result = r['buyer'].count() - assert_series_equal(result, expected) - - def test_groupby_resample_on_api_with_getitem(self): - # GH 17813 - df = pd.DataFrame({'id': list('aabbb'), - 'date': pd.date_range('1-1-2016', periods=5), - 'data': 1}) - exp = df.set_index('date').groupby('id').resample('2D')['data'].sum() - result = df.groupby('id').resample('2D', on='date')['data'].sum() - assert_series_equal(result, exp) - - def test_nearest(self): - - # GH 17496 - # Resample nearest - index = pd.date_range('1/1/2000', periods=3, freq='T') - result = Series(range(3), index=index).resample('20s').nearest() - - expected = Series( - [0, 0, 1, 1, 1, 2, 2], - index=pd.DatetimeIndex( - ['2000-01-01 00:00:00', '2000-01-01 00:00:20', - '2000-01-01 00:00:40', '2000-01-01 00:01:00', - '2000-01-01 00:01:20', '2000-01-01 00:01:40', - '2000-01-01 00:02:00'], - dtype='datetime64[ns]', - freq='20S')) - assert_series_equal(result, expected) - - def test_methods(self): - g = self.frame.groupby('A') - r = g.resample('2s') - - for f in ['first', 'last', 'median', 'sem', 'sum', 'mean', - 'min', 'max']: - result = getattr(r, f)() - expected = g.apply(lambda x: getattr(x.resample('2s'), f)()) - assert_frame_equal(result, expected) - - for f in ['size']: - result = getattr(r, f)() - expected = g.apply(lambda x: getattr(x.resample('2s'), f)()) - assert_series_equal(result, expected) - - for f in ['count']: - result = getattr(r, f)() - expected = g.apply(lambda x: getattr(x.resample('2s'), f)()) - assert_frame_equal(result, expected) - - # series only - for f in ['nunique']: - result = getattr(r.B, f)() - expected = g.B.apply(lambda x: getattr(x.resample('2s'), f)()) - assert_series_equal(result, expected) - - for f in ['nearest', 'backfill', 'ffill', 'asfreq']: - result = getattr(r, f)() - expected = g.apply(lambda x: getattr(x.resample('2s'), f)()) - assert_frame_equal(result, expected) - - result = r.ohlc() - expected = g.apply(lambda x: x.resample('2s').ohlc()) - assert_frame_equal(result, expected) - - for f in ['std', 'var']: - result = getattr(r, f)(ddof=1) - expected = g.apply(lambda x: getattr(x.resample('2s'), f)(ddof=1)) - assert_frame_equal(result, expected) - - def test_apply(self): - - g = self.frame.groupby('A') - r = g.resample('2s') - - # reduction - expected = g.resample('2s').sum() - - def f(x): - return x.resample('2s').sum() - - result = r.apply(f) - assert_frame_equal(result, expected) - - def f(x): - return x.resample('2s').apply(lambda y: y.sum()) - - result = g.apply(f) - assert_frame_equal(result, expected) - - def test_apply_with_mutated_index(self): - # GH 15169 - index = pd.date_range('1-1-2015', '12-31-15', freq='D') - df = DataFrame(data={'col1': np.random.rand(len(index))}, index=index) - - def f(x): - s = Series([1, 2], index=['a', 'b']) - return s - - expected = df.groupby(pd.Grouper(freq='M')).apply(f) - - result = df.resample('M').apply(f) - assert_frame_equal(result, expected) - - # A case for series - 
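The invariant asserted throughout `TestResamplerGrouper` is that resampling inside a groupby agrees with applying a per-group resample. A minimal sketch with a synthetic frame, assuming the pandas API of the era this diff targets:

```python
import pandas as pd

df = pd.DataFrame({'A': [1] * 4 + [2] * 4, 'B': range(8)},
                  index=pd.date_range('2000-01-01', periods=8,
                                      freq='s'))
g = df.groupby('A')

# Both spellings yield the same (A, timestamp)-indexed series.
left = g.resample('2s').B.mean()
right = g.B.apply(lambda x: x.resample('2s').mean())
pd.testing.assert_series_equal(left, right)
```

(The series variant of the mutated-index test continues below.)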
expected = df['col1'].groupby(pd.Grouper(freq='M')).apply(f) - result = df['col1'].resample('M').apply(f) - assert_series_equal(result, expected) - - def test_resample_groupby_with_label(self): - # GH 13235 - index = date_range('2000-01-01', freq='2D', periods=5) - df = DataFrame(index=index, - data={'col0': [0, 0, 1, 1, 2], 'col1': [1, 1, 1, 1, 1]} - ) - result = df.groupby('col0').resample('1W', label='left').sum() - - mi = [np.array([0, 0, 1, 2]), - pd.to_datetime(np.array(['1999-12-26', '2000-01-02', - '2000-01-02', '2000-01-02']) - ) - ] - mindex = pd.MultiIndex.from_arrays(mi, names=['col0', None]) - expected = DataFrame(data={'col0': [0, 0, 2, 2], 'col1': [1, 1, 2, 1]}, - index=mindex - ) - - assert_frame_equal(result, expected) - - def test_consistency_with_window(self): - - # consistent return values with window - df = self.frame - expected = pd.Int64Index([1, 2, 3], name='A') - result = df.groupby('A').resample('2s').mean() - assert result.index.nlevels == 2 - tm.assert_index_equal(result.index.levels[0], expected) - - result = df.groupby('A').rolling(20).mean() - assert result.index.nlevels == 2 - tm.assert_index_equal(result.index.levels[0], expected) - - def test_median_duplicate_columns(self): - # GH 14233 - - df = DataFrame(np.random.randn(20, 3), - columns=list('aaa'), - index=pd.date_range('2012-01-01', periods=20, freq='s')) - df2 = df.copy() - df2.columns = ['a', 'b', 'c'] - expected = df2.resample('5s').median() - result = df.resample('5s').median() - expected.columns = result.columns - assert_frame_equal(result, expected) - - -class TestTimeGrouper(object): - - def setup_method(self, method): - self.ts = Series(np.random.randn(1000), - index=date_range('1/1/2000', periods=1000)) - - def test_apply(self): - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - grouper = pd.TimeGrouper(freq='A', label='right', closed='right') - - grouped = self.ts.groupby(grouper) - - f = lambda x: x.sort_values()[-3:] - - applied = grouped.apply(f) - expected = self.ts.groupby(lambda x: x.year).apply(f) - - applied.index = applied.index.droplevel(0) - expected.index = expected.index.droplevel(0) - assert_series_equal(applied, expected) - - def test_count(self): - self.ts[::3] = np.nan - - expected = self.ts.groupby(lambda x: x.year).count() - - with tm.assert_produces_warning(FutureWarning, - check_stacklevel=False): - grouper = pd.TimeGrouper(freq='A', label='right', closed='right') - result = self.ts.groupby(grouper).count() - expected.index = result.index - assert_series_equal(result, expected) - - result = self.ts.resample('A').count() - expected.index = result.index - assert_series_equal(result, expected) - - def test_numpy_reduction(self): - result = self.ts.resample('A', closed='right').prod() - - expected = self.ts.groupby(lambda x: x.year).agg(np.prod) - expected.index = result.index - - assert_series_equal(result, expected) - - def test_apply_iteration(self): - # #2300 - N = 1000 - ind = pd.date_range(start="2000-01-01", freq="D", periods=N) - df = DataFrame({'open': 1, 'close': 2}, index=ind) - tg = TimeGrouper('M') - - _, grouper, _ = tg._get_grouper(df) - - # Errors - grouped = df.groupby(grouper, group_keys=False) - f = lambda df: df['close'] / df['open'] - - # it works! 
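As the `FutureWarning` assertions above indicate, `pd.Grouper(freq=...)` is the public replacement for `pd.TimeGrouper`, and time-based grouping agrees with `resample` on the same rule. A minimal sketch:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(6.0),
              index=pd.date_range('2013-01-01', periods=6,
                                  freq='D'))

by_grouper = s.groupby(pd.Grouper(freq='2D')).sum()
by_resample = s.resample('2D').sum()
print(by_grouper.equals(by_resample))   # True
```

(The deleted `test_apply_iteration` then applies `f` on the line that follows.)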
- result = grouped.apply(f) - tm.assert_index_equal(result.index, df.index) - - @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") - def test_panel_aggregation(self): - ind = pd.date_range('1/1/2000', periods=100) - data = np.random.randn(2, len(ind), 4) - - wp = Panel(data, items=['Item1', 'Item2'], major_axis=ind, - minor_axis=['A', 'B', 'C', 'D']) - - tg = TimeGrouper('M', axis=1) - _, grouper, _ = tg._get_grouper(wp) - bingrouped = wp.groupby(grouper) - binagg = bingrouped.mean() - - def f(x): - assert (isinstance(x, Panel)) - return x.mean(1) - - result = bingrouped.agg(f) - tm.assert_panel_equal(result, binagg) - - def test_fails_on_no_datetime_index(self): - index_names = ('Int64Index', 'Index', 'Float64Index', 'MultiIndex') - index_funcs = (tm.makeIntIndex, - tm.makeUnicodeIndex, tm.makeFloatIndex, - lambda m: tm.makeCustomIndex(m, 2)) - n = 2 - for name, func in zip(index_names, index_funcs): - index = func(n) - df = DataFrame({'a': np.random.randn(n)}, index=index) - - msg = ("Only valid with DatetimeIndex, TimedeltaIndex " - "or PeriodIndex, but got an instance of %r" % name) - with pytest.raises(TypeError, match=msg): - df.groupby(TimeGrouper('D')) - - def test_aaa_group_order(self): - # GH 12840 - # check TimeGrouper perform stable sorts - n = 20 - data = np.random.randn(n, 4) - df = DataFrame(data, columns=['A', 'B', 'C', 'D']) - df['key'] = [datetime(2013, 1, 1), datetime(2013, 1, 2), - datetime(2013, 1, 3), datetime(2013, 1, 4), - datetime(2013, 1, 5)] * 4 - grouped = df.groupby(TimeGrouper(key='key', freq='D')) - - tm.assert_frame_equal(grouped.get_group(datetime(2013, 1, 1)), - df[::5]) - tm.assert_frame_equal(grouped.get_group(datetime(2013, 1, 2)), - df[1::5]) - tm.assert_frame_equal(grouped.get_group(datetime(2013, 1, 3)), - df[2::5]) - tm.assert_frame_equal(grouped.get_group(datetime(2013, 1, 4)), - df[3::5]) - tm.assert_frame_equal(grouped.get_group(datetime(2013, 1, 5)), - df[4::5]) - - def test_aggregate_normal(self): - # check TimeGrouper's aggregation is identical as normal groupby - - n = 20 - data = np.random.randn(n, 4) - normal_df = DataFrame(data, columns=['A', 'B', 'C', 'D']) - normal_df['key'] = [1, 2, 3, 4, 5] * 4 - - dt_df = DataFrame(data, columns=['A', 'B', 'C', 'D']) - dt_df['key'] = [datetime(2013, 1, 1), datetime(2013, 1, 2), - datetime(2013, 1, 3), datetime(2013, 1, 4), - datetime(2013, 1, 5)] * 4 - - normal_grouped = normal_df.groupby('key') - dt_grouped = dt_df.groupby(TimeGrouper(key='key', freq='D')) - - for func in ['min', 'max', 'prod', 'var', 'std', 'mean']: - expected = getattr(normal_grouped, func)() - dt_result = getattr(dt_grouped, func)() - expected.index = date_range(start='2013-01-01', freq='D', - periods=5, name='key') - assert_frame_equal(expected, dt_result) - - for func in ['count', 'sum']: - expected = getattr(normal_grouped, func)() - expected.index = date_range(start='2013-01-01', freq='D', - periods=5, name='key') - dt_result = getattr(dt_grouped, func)() - assert_frame_equal(expected, dt_result) - - # GH 7453 - for func in ['size']: - expected = getattr(normal_grouped, func)() - expected.index = date_range(start='2013-01-01', freq='D', - periods=5, name='key') - dt_result = getattr(dt_grouped, func)() - assert_series_equal(expected, dt_result) - - # GH 7453 - for func in ['first', 'last']: - expected = getattr(normal_grouped, func)() - expected.index = date_range(start='2013-01-01', freq='D', - periods=5, name='key') - dt_result = getattr(dt_grouped, func)() - assert_frame_equal(expected, dt_result) - - # if 
TimeGrouper is used included, 'nth' doesn't work yet - - """ - for func in ['nth']: - expected = getattr(normal_grouped, func)(3) - expected.index = date_range(start='2013-01-01', - freq='D', periods=5, name='key') - dt_result = getattr(dt_grouped, func)(3) - assert_frame_equal(expected, dt_result) - """ - - @pytest.mark.parametrize('method, unit', [ - ('sum', 0), - ('prod', 1), - ]) - def test_resample_entirly_nat_window(self, method, unit): - s = pd.Series([0] * 2 + [np.nan] * 2, - index=pd.date_range('2017', periods=4)) - # 0 / 1 by default - result = methodcaller(method)(s.resample("2d")) - expected = pd.Series([0.0, unit], - index=pd.to_datetime(['2017-01-01', - '2017-01-03'])) - tm.assert_series_equal(result, expected) - - # min_count=0 - result = methodcaller(method, min_count=0)(s.resample("2d")) - expected = pd.Series([0.0, unit], - index=pd.to_datetime(['2017-01-01', - '2017-01-03'])) - tm.assert_series_equal(result, expected) - - # min_count=1 - result = methodcaller(method, min_count=1)(s.resample("2d")) - expected = pd.Series([0.0, np.nan], - index=pd.to_datetime(['2017-01-01', - '2017-01-03'])) - tm.assert_series_equal(result, expected) - - @pytest.mark.parametrize('func, fill_value', [ - ('min', np.nan), - ('max', np.nan), - ('sum', 0), - ('prod', 1), - ('count', 0), - ]) - def test_aggregate_with_nat(self, func, fill_value): - # check TimeGrouper's aggregation is identical as normal groupby - # if NaT is included, 'var', 'std', 'mean', 'first','last' - # and 'nth' doesn't work yet - - n = 20 - data = np.random.randn(n, 4).astype('int64') - normal_df = DataFrame(data, columns=['A', 'B', 'C', 'D']) - normal_df['key'] = [1, 2, np.nan, 4, 5] * 4 - - dt_df = DataFrame(data, columns=['A', 'B', 'C', 'D']) - dt_df['key'] = [datetime(2013, 1, 1), datetime(2013, 1, 2), pd.NaT, - datetime(2013, 1, 4), datetime(2013, 1, 5)] * 4 - - normal_grouped = normal_df.groupby('key') - dt_grouped = dt_df.groupby(TimeGrouper(key='key', freq='D')) - - normal_result = getattr(normal_grouped, func)() - dt_result = getattr(dt_grouped, func)() - - pad = DataFrame([[fill_value] * 4], index=[3], - columns=['A', 'B', 'C', 'D']) - expected = normal_result.append(pad) - expected = expected.sort_index() - expected.index = date_range(start='2013-01-01', freq='D', - periods=5, name='key') - assert_frame_equal(expected, dt_result) - assert dt_result.index.name == 'key' - - def test_aggregate_with_nat_size(self): - # GH 9925 - n = 20 - data = np.random.randn(n, 4).astype('int64') - normal_df = DataFrame(data, columns=['A', 'B', 'C', 'D']) - normal_df['key'] = [1, 2, np.nan, 4, 5] * 4 - - dt_df = DataFrame(data, columns=['A', 'B', 'C', 'D']) - dt_df['key'] = [datetime(2013, 1, 1), datetime(2013, 1, 2), pd.NaT, - datetime(2013, 1, 4), datetime(2013, 1, 5)] * 4 - - normal_grouped = normal_df.groupby('key') - dt_grouped = dt_df.groupby(TimeGrouper(key='key', freq='D')) - - normal_result = normal_grouped.size() - dt_result = dt_grouped.size() - - pad = Series([0], index=[3]) - expected = normal_result.append(pad) - expected = expected.sort_index() - expected.index = date_range(start='2013-01-01', freq='D', - periods=5, name='key') - assert_series_equal(expected, dt_result) - assert dt_result.index.name == 'key' - - def test_repr(self): - # GH18203 - result = repr(TimeGrouper(key='A', freq='H')) - expected = ("TimeGrouper(key='A', freq=, axis=0, sort=True, " - "closed='left', label='left', how='mean', " - "convention='e', base=0)") - assert result == expected - - @pytest.mark.parametrize('method, unit', [ - ('sum', 
0),
-        ('prod', 1),
-    ])
-    def test_upsample_sum(self, method, unit):
-        s = pd.Series(1, index=pd.date_range("2017", periods=2, freq="H"))
-        resampled = s.resample("30T")
-        index = pd.to_datetime(['2017-01-01T00:00:00',
-                                '2017-01-01T00:30:00',
-                                '2017-01-01T01:00:00'])
-
-        # 0 / 1 by default
-        result = methodcaller(method)(resampled)
-        expected = pd.Series([1, unit, 1], index=index)
-        tm.assert_series_equal(result, expected)
-
-        # min_count=0
-        result = methodcaller(method, min_count=0)(resampled)
-        expected = pd.Series([1, unit, 1], index=index)
-        tm.assert_series_equal(result, expected)
-
-        # min_count=1
-        result = methodcaller(method, min_count=1)(resampled)
-        expected = pd.Series([1, np.nan, 1], index=index)
-        tm.assert_series_equal(result, expected)
-
-        # min_count>1
-        result = methodcaller(method, min_count=2)(resampled)
-        expected = pd.Series([np.nan, np.nan, np.nan], index=index)
-        tm.assert_series_equal(result, expected)
diff --git a/pandas/tests/test_sorting.py b/pandas/tests/test_sorting.py
index 22e758a0e59a7..333b93dbdf580 100644
--- a/pandas/tests/test_sorting.py
+++ b/pandas/tests/test_sorting.py
@@ -127,13 +127,6 @@ def test_nargsort(self):
         # np.argsort(items2) may not place NaNs first
         items2 = np.array(items, dtype='O')
 
-        try:
-            # GH 2785; due to a regression in NumPy1.6.2
-            np.argsort(np.array([[1, 2], [1, 3], [1, 2]], dtype='i'))
-            np.argsort(items2, kind='mergesort')
-        except TypeError:
-            pytest.skip('requested sort not available for type')
-
         # mergesort is the most difficult to get right because we want it to be
         # stable.
 
diff --git a/pandas/tests/test_strings.py b/pandas/tests/test_strings.py
index bfabaa7a1069a..c5a4e9511a6ef 100644
--- a/pandas/tests/test_strings.py
+++ b/pandas/tests/test_strings.py
@@ -9,7 +9,7 @@
 import numpy as np
 from numpy.random import randint
 
-from pandas.compat import range, u
+from pandas.compat import range, u, PY3
 import pandas.compat as compat
 
 from pandas import Index, Series, DataFrame, isna, MultiIndex, notna, concat
@@ -118,6 +118,55 @@ def any_string_method(request):
     return request.param
 
 
+# subset of the full set from pandas/conftest.py
+_any_allowed_skipna_inferred_dtype = [
+    ('string', ['a', np.nan, 'c']),
+    ('unicode' if not PY3 else 'string', [u('a'), np.nan, u('c')]),
+    ('bytes' if PY3 else 'string', [b'a', np.nan, b'c']),
+    ('empty', [np.nan, np.nan, np.nan]),
+    ('empty', []),
+    ('mixed-integer', ['a', np.nan, 2])
+]
+ids, _ = zip(*_any_allowed_skipna_inferred_dtype)  # use inferred type as id
+
+
+@pytest.fixture(params=_any_allowed_skipna_inferred_dtype, ids=ids)
+def any_allowed_skipna_inferred_dtype(request):
+    """
+    Fixture for all (inferred) dtypes allowed in StringMethods.__init__
+
+    The covered (inferred) types are:
+    * 'string'
+    * 'unicode' (if PY2)
+    * 'empty'
+    * 'bytes' (if PY3)
+    * 'mixed'
+    * 'mixed-integer'
+
+    Returns
+    -------
+    inferred_dtype : str
+        The string for the inferred dtype from _libs.lib.infer_dtype
+    values : np.ndarray
+        An array of object dtype that will be inferred to have
+        `inferred_dtype`
+
+    Examples
+    --------
+    >>> import pandas._libs.lib as lib
+    >>>
+    >>> def test_something(any_allowed_skipna_inferred_dtype):
+    ...     inferred_dtype, values = any_allowed_skipna_inferred_dtype
+    ...     # will pass
+    ...
assert lib.infer_dtype(values, skipna=True) == inferred_dtype + """ + inferred_dtype, values = request.param + values = np.array(values, dtype=object) # object dtype to avoid casting + + # correctness of inference tested in tests/dtypes/test_inference.py + return inferred_dtype, values + + class TestStringMethods(object): def test_api(self): @@ -126,11 +175,101 @@ def test_api(self): assert Series.str is strings.StringMethods assert isinstance(Series(['']).str, strings.StringMethods) - # GH 9184 - invalid = Series([1]) - with pytest.raises(AttributeError, match="only use .str accessor"): - invalid.str - assert not hasattr(invalid, 'str') + @pytest.mark.parametrize('dtype', [object, 'category']) + @pytest.mark.parametrize('box', [Series, Index]) + def test_api_per_dtype(self, box, dtype, any_skipna_inferred_dtype): + # one instance of parametrized fixture + inferred_dtype, values = any_skipna_inferred_dtype + + t = box(values, dtype=dtype) # explicit dtype to avoid casting + + # TODO: get rid of these xfails + if dtype == 'category' and inferred_dtype in ['period', 'interval']: + pytest.xfail(reason='Conversion to numpy array fails because ' + 'the ._values-attribute is not a numpy array for ' + 'PeriodArray/IntervalArray; see GH 23553') + if box == Index and inferred_dtype in ['empty', 'bytes']: + pytest.xfail(reason='Raising too restrictively; ' + 'solved by GH 23167') + if (box == Index and dtype == object + and inferred_dtype in ['boolean', 'date', 'time']): + pytest.xfail(reason='Inferring incorrectly because of NaNs; ' + 'solved by GH 23167') + if (box == Series + and (dtype == object and inferred_dtype not in [ + 'string', 'unicode', 'empty', + 'bytes', 'mixed', 'mixed-integer']) + or (dtype == 'category' + and inferred_dtype in ['decimal', 'boolean', 'time'])): + pytest.xfail(reason='Not raising correctly; solved by GH 23167') + + types_passing_constructor = ['string', 'unicode', 'empty', + 'bytes', 'mixed', 'mixed-integer'] + if inferred_dtype in types_passing_constructor: + # GH 6106 + assert isinstance(t.str, strings.StringMethods) + else: + # GH 9184, GH 23011, GH 23163 + with pytest.raises(AttributeError, match='Can only use .str ' + 'accessor with string values.*'): + t.str + assert not hasattr(t, 'str') + + @pytest.mark.parametrize('dtype', [object, 'category']) + @pytest.mark.parametrize('box', [Series, Index]) + def test_api_per_method(self, box, dtype, + any_allowed_skipna_inferred_dtype, + any_string_method): + # this test does not check correctness of the different methods, + # just that the methods work on the specified (inferred) dtypes, + # and raise on all others + + # one instance of each parametrized fixture + inferred_dtype, values = any_allowed_skipna_inferred_dtype + method_name, args, kwargs = any_string_method + + # TODO: get rid of these xfails + if (method_name not in ['encode', 'decode', 'len'] + and inferred_dtype == 'bytes'): + pytest.xfail(reason='Not raising for "bytes", see GH 23011;' + 'Also: malformed method names, see GH 23551; ' + 'solved by GH 23167') + if (method_name == 'cat' + and inferred_dtype in ['mixed', 'mixed-integer']): + pytest.xfail(reason='Bad error message; should raise better; ' + 'solved by GH 23167') + if box == Index and inferred_dtype in ['empty', 'bytes']: + pytest.xfail(reason='Raising too restrictively; ' + 'solved by GH 23167') + if (box == Index and dtype == object + and inferred_dtype in ['boolean', 'date', 'time']): + pytest.xfail(reason='Inferring incorrectly because of NaNs; ' + 'solved by GH 23167') + + t = box(values, 
dtype=dtype) # explicit dtype to avoid casting + method = getattr(t.str, method_name) + + bytes_allowed = method_name in ['encode', 'decode', 'len'] + # as of v0.23.4, all methods except 'cat' are very lenient with the + # allowed data types, just returning NaN for entries that error. + # This could be changed with an 'errors'-kwarg to the `str`-accessor, + # see discussion in GH 13877 + mixed_allowed = method_name not in ['cat'] + + allowed_types = (['string', 'unicode', 'empty'] + + ['bytes'] * bytes_allowed + + ['mixed', 'mixed-integer'] * mixed_allowed) + + if inferred_dtype in allowed_types: + # xref GH 23555, GH 23556 + method(*args, **kwargs) # works! + else: + # GH 23011, GH 23163 + msg = ('Cannot use .str.{name} with values of inferred dtype ' + '{inferred_dtype!r}.'.format(name=method_name, + inferred_dtype=inferred_dtype)) + with pytest.raises(TypeError, match=msg): + method(*args, **kwargs) def test_api_for_categorical(self, any_string_method): # https://github.com/pandas-dev/pandas/issues/10661 @@ -489,11 +628,31 @@ def test_str_cat_align_mixed_inputs(self, join): with pytest.raises(ValueError, match=rgx): s.str.cat([t, z], join=join) - def test_str_cat_raises(self): - # non-strings hiding behind object dtype - s = Series([1, 2, 3, 4], dtype='object') - with pytest.raises(TypeError, match="unsupported operand type.*"): - s.str.cat(s) + @pytest.mark.parametrize('box', [Series, Index]) + @pytest.mark.parametrize('other', [Series, Index]) + def test_str_cat_all_na(self, box, other): + # GH 24044 + + # check that all NaNs in caller / target work + s = Index(['a', 'b', 'c', 'd']) + s = s if box == Index else Series(s, index=s) + t = other([np.nan] * 4, dtype=object) + # add index of s for alignment + t = t if other == Index else Series(t, index=s) + + # all-NA target + if box == Series: + expected = Series([np.nan] * 4, index=s.index, dtype=object) + else: # box == Index + expected = Index([np.nan] * 4, dtype=object) + result = s.str.cat(t, join='left') + assert_series_or_index_equal(result, expected) + + # all-NA caller (only for Series) + if other == Series: + expected = Series([np.nan] * 4, dtype=object, index=t.index) + result = t.str.cat(s, join='left') + tm.assert_series_equal(result, expected) def test_str_cat_special_cases(self): s = Series(['a', 'b', 'c', 'd']) diff --git a/pandas/tests/test_window.py b/pandas/tests/test_window.py index 31ea5c11f5bd1..b53aca2c9852b 100644 --- a/pandas/tests/test_window.py +++ b/pandas/tests/test_window.py @@ -695,8 +695,7 @@ def test_numpy_compat(self, method): 'expander', [1, pytest.param('ls', marks=pytest.mark.xfail( reason='GH#16425 expanding with ' - 'offset not supported', - strict=True))]) + 'offset not supported'))]) def test_empty_df_expanding(self, expander): # GH 15819 Verifies that datetime and integer expanding windows can be # applied to empty DataFrames diff --git a/pandas/tests/tseries/offsets/conftest.py b/pandas/tests/tseries/offsets/conftest.py index 4766e7e277b13..c192a56b205ca 100644 --- a/pandas/tests/tseries/offsets/conftest.py +++ b/pandas/tests/tseries/offsets/conftest.py @@ -1,4 +1,5 @@ import pytest + import pandas.tseries.offsets as offsets @@ -18,12 +19,3 @@ def month_classes(request): Fixture for month based datetime offsets available for a time series. 
""" return request.param - - -@pytest.fixture(params=[getattr(offsets, o) for o in offsets.__all__ if - issubclass(getattr(offsets, o), offsets.Tick)]) -def tick_classes(request): - """ - Fixture for Tick based datetime offsets available for a time series. - """ - return request.param diff --git a/pandas/tests/tseries/offsets/test_fiscal.py b/pandas/tests/tseries/offsets/test_fiscal.py index 2f17a61917320..a5d7460921fb4 100644 --- a/pandas/tests/tseries/offsets/test_fiscal.py +++ b/pandas/tests/tseries/offsets/test_fiscal.py @@ -7,10 +7,12 @@ from dateutil.relativedelta import relativedelta import pytest +from pandas._libs.tslibs.frequencies import INVALID_FREQ_ERR_MSG + from pandas import Timestamp + from pandas.tseries.frequencies import get_offset -from pandas._libs.tslibs.frequencies import INVALID_FREQ_ERR_MSG -from pandas.tseries.offsets import FY5253Quarter, FY5253 +from pandas.tseries.offsets import FY5253, FY5253Quarter from .common import assert_offset_equal, assert_onOffset from .test_offsets import Base, WeekDay diff --git a/pandas/tests/tseries/offsets/test_offsets.py b/pandas/tests/tseries/offsets/test_offsets.py index d68dd65c9841b..a938c1fe9a8fe 100644 --- a/pandas/tests/tseries/offsets/test_offsets.py +++ b/pandas/tests/tseries/offsets/test_offsets.py @@ -1,42 +1,32 @@ -from distutils.version import LooseVersion from datetime import date, datetime, timedelta - -import pytest -import pytz -from pandas.compat import range -from pandas import compat +from distutils.version import LooseVersion import numpy as np +import pytest +from pandas._libs.tslibs import ( + NaT, OutOfBoundsDatetime, Timestamp, conversion, timezones) +from pandas._libs.tslibs.frequencies import ( + INVALID_FREQ_ERR_MSG, get_freq_code, get_freq_str) +import pandas._libs.tslibs.offsets as liboffsets +import pandas.compat as compat +from pandas.compat import range from pandas.compat.numpy import np_datetime64_compat +from pandas.core.indexes.datetimes import DatetimeIndex, _to_m8, date_range from pandas.core.series import Series -from pandas._libs.tslibs import conversion -from pandas._libs.tslibs.frequencies import (get_freq_code, get_freq_str, - INVALID_FREQ_ERR_MSG) -from pandas.tseries.frequencies import _offset_map, get_offset -from pandas.core.indexes.datetimes import _to_m8, DatetimeIndex -from pandas.core.indexes.timedeltas import TimedeltaIndex -import pandas._libs.tslibs.offsets as liboffsets -from pandas.tseries.offsets import (BDay, CDay, BQuarterEnd, BMonthEnd, - BusinessHour, WeekOfMonth, CBMonthEnd, - CustomBusinessHour, - CBMonthBegin, BYearEnd, MonthEnd, - MonthBegin, SemiMonthBegin, SemiMonthEnd, - BYearBegin, QuarterBegin, BQuarterBegin, - BMonthBegin, DateOffset, Week, YearBegin, - YearEnd, Day, - QuarterEnd, FY5253, - Nano, Easter, FY5253Quarter, - LastWeekOfMonth, Tick, CalendarDay) -import pandas.tseries.offsets as offsets -from pandas.io.pickle import read_pickle -from pandas._libs.tslibs import timezones -from pandas._libs.tslib import NaT, Timestamp -from pandas._libs.tslibs.timedeltas import Timedelta -import pandas._libs.tslib as tslib import pandas.util.testing as tm + +from pandas.io.pickle import read_pickle +from pandas.tseries.frequencies import _offset_map, get_offset from pandas.tseries.holiday import USFederalHolidayCalendar +import pandas.tseries.offsets as offsets +from pandas.tseries.offsets import ( + FY5253, BDay, BMonthBegin, BMonthEnd, BQuarterBegin, BQuarterEnd, + BusinessHour, BYearBegin, BYearEnd, CBMonthBegin, CBMonthEnd, CDay, + CustomBusinessHour, DateOffset, 
Day, Easter, FY5253Quarter, + LastWeekOfMonth, MonthBegin, MonthEnd, Nano, QuarterBegin, QuarterEnd, + SemiMonthBegin, SemiMonthEnd, Tick, Week, WeekOfMonth, YearBegin, YearEnd) from .common import assert_offset_equal, assert_onOffset @@ -61,17 +51,11 @@ def test_to_m8(): valb = datetime(2007, 10, 1) valu = _to_m8(valb) assert isinstance(valu, np.datetime64) - # assert valu == np.datetime64(datetime(2007,10,1)) - # def test_datetime64_box(): - # valu = np.datetime64(datetime(2007,10,1)) - # valb = _dt_box(valu) - # assert type(valb) == datetime - # assert valb == datetime(2007,10,1) - ##### - # DateOffset Tests - ##### +##### +# DateOffset Tests +##### class Base(object): @@ -130,7 +114,7 @@ def test_apply_out_of_range(self, tz_naive_fixture): assert isinstance(result, datetime) assert t.tzinfo == result.tzinfo - except tslib.OutOfBoundsDatetime: + except OutOfBoundsDatetime: raise except (ValueError, KeyError): # we are creating an invalid offset @@ -206,7 +190,6 @@ class TestCommon(Base): # are applied to 2011/01/01 09:00 (Saturday) # used for .apply and .rollforward expecteds = {'Day': Timestamp('2011-01-02 09:00:00'), - 'CalendarDay': Timestamp('2011-01-02 09:00:00'), 'DateOffset': Timestamp('2011-01-02 09:00:00'), 'BusinessDay': Timestamp('2011-01-03 09:00:00'), 'CustomBusinessDay': Timestamp('2011-01-03 09:00:00'), @@ -375,7 +358,7 @@ def test_rollforward(self, offset_types): # result will not be changed if the target is on the offset no_changes = ['Day', 'MonthBegin', 'SemiMonthBegin', 'YearBegin', 'Week', 'Hour', 'Minute', 'Second', 'Milli', 'Micro', - 'Nano', 'DateOffset', 'CalendarDay'] + 'Nano', 'DateOffset'] for n in no_changes: expecteds[n] = Timestamp('2011/01/01 09:00') @@ -388,7 +371,6 @@ def test_rollforward(self, offset_types): norm_expected[k] = Timestamp(norm_expected[k].date()) normalized = {'Day': Timestamp('2011-01-02 00:00:00'), - 'CalendarDay': Timestamp('2011-01-02 00:00:00'), 'DateOffset': Timestamp('2011-01-02 00:00:00'), 'MonthBegin': Timestamp('2011-02-01 00:00:00'), 'SemiMonthBegin': Timestamp('2011-01-15 00:00:00'), @@ -441,7 +423,7 @@ def test_rollback(self, offset_types): # result will not be changed if the target is on the offset for n in ['Day', 'MonthBegin', 'SemiMonthBegin', 'YearBegin', 'Week', 'Hour', 'Minute', 'Second', 'Milli', 'Micro', 'Nano', - 'DateOffset', 'CalendarDay']: + 'DateOffset']: expecteds[n] = Timestamp('2011/01/01 09:00') # but be changed when normalize=True @@ -450,7 +432,6 @@ def test_rollback(self, offset_types): norm_expected[k] = Timestamp(norm_expected[k].date()) normalized = {'Day': Timestamp('2010-12-31 00:00:00'), - 'CalendarDay': Timestamp('2010-12-31 00:00:00'), 'DateOffset': Timestamp('2010-12-31 00:00:00'), 'MonthBegin': Timestamp('2010-12-01 00:00:00'), 'SemiMonthBegin': Timestamp('2010-12-15 00:00:00'), @@ -1367,10 +1348,10 @@ def test_apply_nanoseconds(self): assert_offset_equal(offset, base, expected) def test_datetimeindex(self): - idx1 = DatetimeIndex(start='2014-07-04 15:00', end='2014-07-08 10:00', - freq='BH') - idx2 = DatetimeIndex(start='2014-07-04 15:00', periods=12, freq='BH') - idx3 = DatetimeIndex(end='2014-07-08 10:00', periods=12, freq='BH') + idx1 = date_range(start='2014-07-04 15:00', end='2014-07-08 10:00', + freq='BH') + idx2 = date_range(start='2014-07-04 15:00', periods=12, freq='BH') + idx3 = date_range(end='2014-07-08 10:00', periods=12, freq='BH') expected = DatetimeIndex(['2014-07-04 15:00', '2014-07-04 16:00', '2014-07-07 09:00', '2014-07-07 10:00', '2014-07-07 11:00', @@ -1383,10 +1364,10 @@ 
def test_datetimeindex(self): for idx in [idx1, idx2, idx3]: tm.assert_index_equal(idx, expected) - idx1 = DatetimeIndex(start='2014-07-04 15:45', end='2014-07-08 10:45', - freq='BH') - idx2 = DatetimeIndex(start='2014-07-04 15:45', periods=12, freq='BH') - idx3 = DatetimeIndex(end='2014-07-08 10:45', periods=12, freq='BH') + idx1 = date_range(start='2014-07-04 15:45', end='2014-07-08 10:45', + freq='BH') + idx2 = date_range(start='2014-07-04 15:45', periods=12, freq='BH') + idx3 = date_range(end='2014-07-08 10:45', periods=12, freq='BH') expected = DatetimeIndex(['2014-07-04 15:45', '2014-07-04 16:45', '2014-07-07 09:45', @@ -2005,8 +1986,8 @@ def test_datetimeindex(self): hcal = USFederalHolidayCalendar() freq = CBMonthEnd(calendar=hcal) - assert (DatetimeIndex(start='20120101', end='20130101', - freq=freq).tolist()[0] == datetime(2012, 1, 31)) + assert (date_range(start='20120101', end='20130101', + freq=freq).tolist()[0] == datetime(2012, 1, 31)) class TestCustomBusinessMonthBegin(CustomBusinessMonthBase, Base): @@ -2122,8 +2103,8 @@ def test_holidays(self): def test_datetimeindex(self): hcal = USFederalHolidayCalendar() cbmb = CBMonthBegin(calendar=hcal) - assert (DatetimeIndex(start='20120101', end='20130101', - freq=cbmb).tolist()[0] == datetime(2012, 1, 3)) + assert (date_range(start='20120101', end='20130101', + freq=cbmb).tolist()[0] == datetime(2012, 1, 3)) class TestWeek(Base): @@ -2425,7 +2406,7 @@ def test_offset_whole_year(self): tm.assert_index_equal(result, exp) # ensure generating a range with DatetimeIndex gives same result - result = DatetimeIndex(start=dates[0], end=dates[-1], freq='SM') + result = date_range(start=dates[0], end=dates[-1], freq='SM') exp = DatetimeIndex(dates) tm.assert_index_equal(result, exp) @@ -2612,7 +2593,7 @@ def test_offset_whole_year(self): tm.assert_index_equal(result, exp) # ensure generating a range with DatetimeIndex gives same result - result = DatetimeIndex(start=dates[0], end=dates[-1], freq='SMS') + result = date_range(start=dates[0], end=dates[-1], freq='SMS') exp = DatetimeIndex(dates) tm.assert_index_equal(result, exp) @@ -3160,71 +3141,3 @@ def test_last_week_of_month_on_offset(): slow = (ts + offset) - offset == ts fast = offset.onOffset(ts) assert fast == slow - - -class TestCalendarDay(object): - - def test_add_across_dst_scalar(self): - # GH 22274 - ts = Timestamp('2016-10-30 00:00:00+0300', tz='Europe/Helsinki') - expected = Timestamp('2016-10-31 00:00:00+0200', tz='Europe/Helsinki') - result = ts + CalendarDay(1) - assert result == expected - - result = result - CalendarDay(1) - assert result == ts - - @pytest.mark.parametrize('box', [DatetimeIndex, Series]) - def test_add_across_dst_array(self, box): - # GH 22274 - ts = Timestamp('2016-10-30 00:00:00+0300', tz='Europe/Helsinki') - expected = Timestamp('2016-10-31 00:00:00+0200', tz='Europe/Helsinki') - arr = box([ts]) - expected = box([expected]) - result = arr + CalendarDay(1) - tm.assert_equal(result, expected) - - result = result - CalendarDay(1) - tm.assert_equal(arr, result) - - @pytest.mark.parametrize('arg', [ - Timestamp("2018-11-03 01:00:00", tz='US/Pacific'), - DatetimeIndex([Timestamp("2018-11-03 01:00:00", tz='US/Pacific')]) - ]) - def test_raises_AmbiguousTimeError(self, arg): - # GH 22274 - with pytest.raises(pytz.AmbiguousTimeError): - arg + CalendarDay(1) - - @pytest.mark.parametrize('arg', [ - Timestamp("2019-03-09 02:00:00", tz='US/Pacific'), - DatetimeIndex([Timestamp("2019-03-09 02:00:00", tz='US/Pacific')]) - ]) - def 
test_raises_NonExistentTimeError(self, arg): - # GH 22274 - with pytest.raises(pytz.NonExistentTimeError): - arg + CalendarDay(1) - - @pytest.mark.parametrize('arg, exp', [ - [1, 2], - [-1, 0], - [-5, -4] - ]) - def test_arithmetic(self, arg, exp): - # GH 22274 - result = CalendarDay(1) + CalendarDay(arg) - expected = CalendarDay(exp) - assert result == expected - - @pytest.mark.parametrize('arg', [ - timedelta(1), - Day(1), - Timedelta(1), - TimedeltaIndex([timedelta(1)]) - ]) - def test_invalid_arithmetic(self, arg): - # GH 22274 - # CalendarDay (relative time) cannot be added to Timedelta-like objects - # (absolute time) - with pytest.raises(TypeError): - CalendarDay(1) + arg diff --git a/pandas/tests/tseries/offsets/test_offsets_properties.py b/pandas/tests/tseries/offsets/test_offsets_properties.py index 07a6895d1e231..cd5f2a2a25e58 100644 --- a/pandas/tests/tseries/offsets/test_offsets_properties.py +++ b/pandas/tests/tseries/offsets/test_offsets_properties.py @@ -10,18 +10,16 @@ """ import warnings -import pytest -from hypothesis import given, assume, strategies as st -from hypothesis.extra.pytz import timezones as pytz_timezones +from hypothesis import assume, given, strategies as st from hypothesis.extra.dateutil import timezones as dateutil_timezones +from hypothesis.extra.pytz import timezones as pytz_timezones +import pytest import pandas as pd from pandas.tseries.offsets import ( - MonthEnd, MonthBegin, BMonthEnd, BMonthBegin, - QuarterEnd, QuarterBegin, BQuarterEnd, BQuarterBegin, - YearEnd, YearBegin, BYearEnd, BYearBegin, -) + BMonthBegin, BMonthEnd, BQuarterBegin, BQuarterEnd, BYearBegin, BYearEnd, + MonthBegin, MonthEnd, QuarterBegin, QuarterEnd, YearBegin, YearEnd) # ---------------------------------------------------------------- # Helpers for generating random data @@ -74,7 +72,7 @@ def test_on_offset_implementations(dt, offset): assert offset.onOffset(dt) == (compare == dt) -@pytest.mark.xfail(strict=True) +@pytest.mark.xfail @given(gen_yqm_offset, gen_date_range) def test_apply_index_implementations(offset, rng): # offset.apply_index(dti)[i] should match dti[i] + offset @@ -96,7 +94,7 @@ def test_apply_index_implementations(offset, rng): # TODO: Check randomly assorted entries, not just first/last -@pytest.mark.xfail(strict=True) +@pytest.mark.xfail @given(gen_yqm_offset) def test_shift_across_dst(offset): # GH#18319 check that 1) timezone is correctly normalized and diff --git a/pandas/tests/tseries/offsets/test_ticks.py b/pandas/tests/tseries/offsets/test_ticks.py index 128010fe6d32c..a1940241b4c56 100644 --- a/pandas/tests/tseries/offsets/test_ticks.py +++ b/pandas/tests/tseries/offsets/test_ticks.py @@ -4,14 +4,14 @@ """ from datetime import datetime, timedelta -import pytest +from hypothesis import assume, example, given, strategies as st import numpy as np -from hypothesis import given, assume, example, strategies as st +import pytest from pandas import Timedelta, Timestamp + from pandas.tseries import offsets -from pandas.tseries.offsets import (Day, Hour, Minute, Second, Milli, Micro, - Nano) +from pandas.tseries.offsets import Hour, Micro, Milli, Minute, Nano, Second from .common import assert_offset_equal @@ -212,13 +212,6 @@ def test_Nanosecond(): assert Micro(5) + Nano(1) == Nano(5001) -def test_Day_equals_24_Hours(): - ts = Timestamp('2016-10-30 00:00:00+0300', tz='Europe/Helsinki') - result = ts + Day(1) - expected = ts + Hour(24) - assert result == expected - - @pytest.mark.parametrize('kls, expected', [(Hour, Timedelta(hours=5)), (Minute, 
Timedelta(hours=2, minutes=3)), diff --git a/pandas/tests/tseries/offsets/test_yqm_offsets.py b/pandas/tests/tseries/offsets/test_yqm_offsets.py index 22b8cf6119d18..8023ee3139dd5 100644 --- a/pandas/tests/tseries/offsets/test_yqm_offsets.py +++ b/pandas/tests/tseries/offsets/test_yqm_offsets.py @@ -7,22 +7,19 @@ import pytest import pandas as pd -from pandas import Timestamp -from pandas import compat +from pandas import Timestamp, compat -from pandas.tseries.offsets import (BMonthBegin, BMonthEnd, - MonthBegin, MonthEnd, - YearEnd, YearBegin, BYearEnd, BYearBegin, - QuarterEnd, QuarterBegin, - BQuarterEnd, BQuarterBegin) +from pandas.tseries.offsets import ( + BMonthBegin, BMonthEnd, BQuarterBegin, BQuarterEnd, BYearBegin, BYearEnd, + MonthBegin, MonthEnd, QuarterBegin, QuarterEnd, YearBegin, YearEnd) -from .test_offsets import Base from .common import assert_offset_equal, assert_onOffset - +from .test_offsets import Base # -------------------------------------------------------------------- # Misc + def test_quarterly_dont_normalize(): date = datetime(2012, 3, 31, 5, 30) diff --git a/pandas/tests/tseries/test_frequencies.py b/pandas/tests/tseries/test_frequencies.py index a8def56aa06d4..d2ca70795be80 100644 --- a/pandas/tests/tseries/test_frequencies.py +++ b/pandas/tests/tseries/test_frequencies.py @@ -1,26 +1,24 @@ from datetime import datetime, timedelta -from pandas.compat import range -import pytest import numpy as np +import pytest -from pandas import (Index, DatetimeIndex, Timestamp, Series, - date_range, period_range) - -from pandas._libs.tslibs.frequencies import (_period_code_map, - INVALID_FREQ_ERR_MSG) -from pandas._libs.tslibs.ccalendar import MONTHS from pandas._libs.tslibs import resolution -import pandas.tseries.frequencies as frequencies -from pandas.core.tools.datetimes import to_datetime - -import pandas.tseries.offsets as offsets -from pandas.core.indexes.period import PeriodIndex +from pandas._libs.tslibs.ccalendar import MONTHS +from pandas._libs.tslibs.frequencies import ( + INVALID_FREQ_ERR_MSG, _period_code_map) import pandas.compat as compat -from pandas.compat import is_platform_windows +from pandas.compat import is_platform_windows, range +from pandas import ( + DatetimeIndex, Index, Series, Timedelta, Timestamp, date_range, + period_range) +from pandas.core.indexes.period import PeriodIndex +from pandas.core.tools.datetimes import to_datetime import pandas.util.testing as tm -from pandas import Timedelta + +import pandas.tseries.frequencies as frequencies +import pandas.tseries.offsets as offsets class TestToOffset(object): diff --git a/pandas/tests/tseries/test_holiday.py b/pandas/tests/tseries/test_holiday.py index 3ea7e5b8620f2..86f154ed1acc2 100644 --- a/pandas/tests/tseries/test_holiday.py +++ b/pandas/tests/tseries/test_holiday.py @@ -1,22 +1,19 @@ +from datetime import datetime + import pytest +from pytz import utc -from datetime import datetime +from pandas import DatetimeIndex, compat import pandas.util.testing as tm -from pandas import compat -from pandas import DatetimeIndex -from pandas.tseries.holiday import (USFederalHolidayCalendar, USMemorialDay, - USThanksgivingDay, nearest_workday, - next_monday_or_tuesday, next_monday, - previous_friday, sunday_to_monday, Holiday, - DateOffset, MO, SA, Timestamp, - AbstractHolidayCalendar, get_calendar, - HolidayCalendarFactory, next_workday, - previous_workday, before_nearest_workday, - EasterMonday, GoodFriday, - after_nearest_workday, weekend_to_monday, - USLaborDay, USColumbusDay, - 
USMartinLutherKingJr, USPresidentsDay) -from pytz import utc + +from pandas.tseries.holiday import ( + MO, SA, AbstractHolidayCalendar, DateOffset, EasterMonday, GoodFriday, + Holiday, HolidayCalendarFactory, Timestamp, USColumbusDay, + USFederalHolidayCalendar, USLaborDay, USMartinLutherKingJr, USMemorialDay, + USPresidentsDay, USThanksgivingDay, after_nearest_workday, + before_nearest_workday, get_calendar, nearest_workday, next_monday, + next_monday_or_tuesday, next_workday, previous_friday, previous_workday, + sunday_to_monday, weekend_to_monday) class TestCalendar(object): diff --git a/pandas/tests/tslibs/test_conversion.py b/pandas/tests/tslibs/test_conversion.py index de36c0bb2f789..6bfc686ba830e 100644 --- a/pandas/tests/tslibs/test_conversion.py +++ b/pandas/tests/tslibs/test_conversion.py @@ -2,6 +2,7 @@ import numpy as np import pytest +from pytz import UTC from pandas._libs.tslib import iNaT from pandas._libs.tslibs import conversion, timezones @@ -11,15 +12,15 @@ def compare_utc_to_local(tz_didx, utc_didx): - f = lambda x: conversion.tz_convert_single(x, 'UTC', tz_didx.tz) - result = conversion.tz_convert(tz_didx.asi8, 'UTC', tz_didx.tz) + f = lambda x: conversion.tz_convert_single(x, UTC, tz_didx.tz) + result = conversion.tz_convert(tz_didx.asi8, UTC, tz_didx.tz) result_single = np.vectorize(f)(tz_didx.asi8) tm.assert_numpy_array_equal(result, result_single) def compare_local_to_utc(tz_didx, utc_didx): - f = lambda x: conversion.tz_convert_single(x, tz_didx.tz, 'UTC') - result = conversion.tz_convert(utc_didx.asi8, tz_didx.tz, 'UTC') + f = lambda x: conversion.tz_convert_single(x, tz_didx.tz, UTC) + result = conversion.tz_convert(utc_didx.asi8, tz_didx.tz, UTC) result_single = np.vectorize(f)(utc_didx.asi8) tm.assert_numpy_array_equal(result, result_single) @@ -56,3 +57,15 @@ def test_tz_convert_corner(self, arr): timezones.maybe_get_tz('US/Eastern'), timezones.maybe_get_tz('Asia/Tokyo')) tm.assert_numpy_array_equal(result, arr) + + +class TestEnsureDatetime64NS(object): + @pytest.mark.parametrize('copy', [True, False]) + @pytest.mark.parametrize('dtype', ['M8[ns]', 'M8[s]']) + def test_length_zero_copy(self, dtype, copy): + arr = np.array([], dtype=dtype) + result = conversion.ensure_datetime64ns(arr, copy=copy) + if copy: + assert result.base is None + else: + assert result.base is arr diff --git a/pandas/tests/util/conftest.py b/pandas/tests/util/conftest.py new file mode 100644 index 0000000000000..5eff49ab774b5 --- /dev/null +++ b/pandas/tests/util/conftest.py @@ -0,0 +1,26 @@ +import pytest + + +@pytest.fixture(params=[True, False]) +def check_dtype(request): + return request.param + + +@pytest.fixture(params=[True, False]) +def check_exact(request): + return request.param + + +@pytest.fixture(params=[True, False]) +def check_index_type(request): + return request.param + + +@pytest.fixture(params=[True, False]) +def check_less_precise(request): + return request.param + + +@pytest.fixture(params=[True, False]) +def check_categorical(request): + return request.param diff --git a/pandas/tests/util/test_assert_almost_equal.py b/pandas/tests/util/test_assert_almost_equal.py new file mode 100644 index 0000000000000..afee9c008295f --- /dev/null +++ b/pandas/tests/util/test_assert_almost_equal.py @@ -0,0 +1,350 @@ +# -*- coding: utf-8 -*- + +import numpy as np +import pytest + +from pandas import DataFrame, Index, Series, Timestamp +from pandas.util.testing import assert_almost_equal + + +def _assert_almost_equal_both(a, b, **kwargs): + """ + Check that two objects are 
approximately equal. + + This check is performed commutatively. + + Parameters + ---------- + a : object + The first object to compare. + b : object + The second object to compare. + kwargs : dict + The arguments passed to `assert_almost_equal`. + """ + assert_almost_equal(a, b, **kwargs) + assert_almost_equal(b, a, **kwargs) + + +def _assert_not_almost_equal(a, b, **kwargs): + """ + Check that two objects are not approximately equal. + + Parameters + ---------- + a : object + The first object to compare. + b : object + The second object to compare. + kwargs : dict + The arguments passed to `assert_almost_equal`. + """ + try: + assert_almost_equal(a, b, **kwargs) + msg = ("{a} and {b} were approximately equal " + "when they shouldn't have been").format(a=a, b=b) + pytest.fail(msg=msg) + except AssertionError: + pass + + +def _assert_not_almost_equal_both(a, b, **kwargs): + """ + Check that two objects are not approximately equal. + + This check is performed commutatively. + + Parameters + ---------- + a : object + The first object to compare. + b : object + The second object to compare. + kwargs : dict + The arguments passed to `tm.assert_almost_equal`. + """ + _assert_not_almost_equal(a, b, **kwargs) + _assert_not_almost_equal(b, a, **kwargs) + + +@pytest.mark.parametrize("a,b", [ + (1.1, 1.1), (1.1, 1.100001), (np.int16(1), 1.000001), + (np.float64(1.1), 1.1), (np.uint32(5), 5), +]) +def test_assert_almost_equal_numbers(a, b): + _assert_almost_equal_both(a, b) + + +@pytest.mark.parametrize("a,b", [ + (1.1, 1), (1.1, True), (1, 2), (1.0001, np.int16(1)), +]) +def test_assert_not_almost_equal_numbers(a, b): + _assert_not_almost_equal_both(a, b) + + +@pytest.mark.parametrize("a,b", [ + (0, 0), (0, 0.0), (0, np.float64(0)), (0.000001, 0), +]) +def test_assert_almost_equal_numbers_with_zeros(a, b): + _assert_almost_equal_both(a, b) + + +@pytest.mark.parametrize("a,b", [ + (0.001, 0), (1, 0), +]) +def test_assert_not_almost_equal_numbers_with_zeros(a, b): + _assert_not_almost_equal_both(a, b) + + +@pytest.mark.parametrize("a,b", [ + (1, "abc"), (1, [1, ]), (1, object()), +]) +def test_assert_not_almost_equal_numbers_with_mixed(a, b): + _assert_not_almost_equal_both(a, b) + + +@pytest.mark.parametrize( + "left_dtype", ["M8[ns]", "m8[ns]", "float64", "int64", "object"]) +@pytest.mark.parametrize( + "right_dtype", ["M8[ns]", "m8[ns]", "float64", "int64", "object"]) +def test_assert_almost_equal_edge_case_ndarrays(left_dtype, right_dtype): + # Empty compare. 
+ _assert_almost_equal_both(np.array([], dtype=left_dtype), + np.array([], dtype=right_dtype), + check_dtype=False) + + +def test_assert_almost_equal_dicts(): + _assert_almost_equal_both({"a": 1, "b": 2}, {"a": 1, "b": 2}) + + +@pytest.mark.parametrize("a,b", [ + ({"a": 1, "b": 2}, {"a": 1, "b": 3}), + ({"a": 1, "b": 2}, {"a": 1, "b": 2, "c": 3}), + ({"a": 1}, 1), ({"a": 1}, "abc"), ({"a": 1}, [1, ]), +]) +def test_assert_not_almost_equal_dicts(a, b): + _assert_not_almost_equal_both(a, b) + + +@pytest.mark.parametrize("val", [1, 2]) +def test_assert_almost_equal_dict_like_object(val): + dict_val = 1 + real_dict = dict(a=val) + + class DictLikeObj(object): + def keys(self): + return "a", + + def __getitem__(self, item): + if item == "a": + return dict_val + + func = (_assert_almost_equal_both if val == dict_val + else _assert_not_almost_equal_both) + func(real_dict, DictLikeObj(), check_dtype=False) + + +def test_assert_almost_equal_strings(): + _assert_almost_equal_both("abc", "abc") + + +@pytest.mark.parametrize("a,b", [ + ("abc", "abcd"), ("abc", "abd"), ("abc", 1), ("abc", [1, ]), +]) +def test_assert_not_almost_equal_strings(a, b): + _assert_not_almost_equal_both(a, b) + + +@pytest.mark.parametrize("a,b", [ + ([1, 2, 3], [1, 2, 3]), (np.array([1, 2, 3]), np.array([1, 2, 3])), +]) +def test_assert_almost_equal_iterables(a, b): + _assert_almost_equal_both(a, b) + + +@pytest.mark.parametrize("a,b", [ + # Class is different. + (np.array([1, 2, 3]), [1, 2, 3]), + + # Dtype is different. + (np.array([1, 2, 3]), np.array([1., 2., 3.])), + + # Can't compare generators. + (iter([1, 2, 3]), [1, 2, 3]), ([1, 2, 3], [1, 2, 4]), + ([1, 2, 3], [1, 2, 3, 4]), ([1, 2, 3], 1), +]) +def test_assert_not_almost_equal_iterables(a, b): + _assert_not_almost_equal(a, b) + + +def test_assert_almost_equal_null(): + _assert_almost_equal_both(None, None) + + +@pytest.mark.parametrize("a,b", [ + (None, np.NaN), (None, 0), (np.NaN, 0), +]) +def test_assert_not_almost_equal_null(a, b): + _assert_not_almost_equal(a, b) + + +@pytest.mark.parametrize("a,b", [ + (np.inf, np.inf), (np.inf, float("inf")), + (np.array([np.inf, np.nan, -np.inf]), + np.array([np.inf, np.nan, -np.inf])), + (np.array([np.inf, None, -np.inf], dtype=np.object_), + np.array([np.inf, np.nan, -np.inf], dtype=np.object_)), +]) +def test_assert_almost_equal_inf(a, b): + _assert_almost_equal_both(a, b) + + +def test_assert_not_almost_equal_inf(): + _assert_not_almost_equal_both(np.inf, 0) + + +@pytest.mark.parametrize("a,b", [ + (Index([1., 1.1]), Index([1., 1.100001])), + (Series([1., 1.1]), Series([1., 1.100001])), + (np.array([1.1, 2.000001]), np.array([1.1, 2.0])), + (DataFrame({"a": [1., 1.1]}), DataFrame({"a": [1., 1.100001]})) +]) +def test_assert_almost_equal_pandas(a, b): + _assert_almost_equal_both(a, b) + + +def test_assert_almost_equal_object(): + a = [Timestamp("2011-01-01"), Timestamp("2011-01-01")] + b = [Timestamp("2011-01-01"), Timestamp("2011-01-01")] + _assert_almost_equal_both(a, b) + + +def test_assert_almost_equal_value_mismatch(): + msg = "expected 2\\.00000 but got 1\\.00000, with decimal 5" + + with pytest.raises(AssertionError, match=msg): + assert_almost_equal(1, 2) + + +@pytest.mark.parametrize("a,b,klass1,klass2", [ + (np.array([1]), 1, "ndarray", "int"), + (1, np.array([1]), "int", "ndarray"), +]) +def test_assert_almost_equal_class_mismatch(a, b, klass1, klass2): + msg = """numpy array are different + +numpy array classes are different +\\[left\\]: {klass1} +\\[right\\]: {klass2}""".format(klass1=klass1, klass2=klass2) + 
+    with pytest.raises(AssertionError, match=msg):
+        assert_almost_equal(a, b)
+
+
+def test_assert_almost_equal_value_mismatch1():
+    msg = """numpy array are different
+
+numpy array values are different \\(66\\.66667 %\\)
+\\[left\\]: \\[nan, 2\\.0, 3\\.0\\]
+\\[right\\]: \\[1\\.0, nan, 3\\.0\\]"""
+
+    with pytest.raises(AssertionError, match=msg):
+        assert_almost_equal(np.array([np.nan, 2, 3]),
+                            np.array([1, np.nan, 3]))
+
+
+def test_assert_almost_equal_value_mismatch2():
+    msg = """numpy array are different
+
+numpy array values are different \\(50\\.0 %\\)
+\\[left\\]: \\[1, 2\\]
+\\[right\\]: \\[1, 3\\]"""
+
+    with pytest.raises(AssertionError, match=msg):
+        assert_almost_equal(np.array([1, 2]), np.array([1, 3]))
+
+
+def test_assert_almost_equal_value_mismatch3():
+    msg = """numpy array are different
+
+numpy array values are different \\(16\\.66667 %\\)
+\\[left\\]: \\[\\[1, 2\\], \\[3, 4\\], \\[5, 6\\]\\]
+\\[right\\]: \\[\\[1, 3\\], \\[3, 4\\], \\[5, 6\\]\\]"""
+
+    with pytest.raises(AssertionError, match=msg):
+        assert_almost_equal(np.array([[1, 2], [3, 4], [5, 6]]),
+                            np.array([[1, 3], [3, 4], [5, 6]]))
+
+
+def test_assert_almost_equal_value_mismatch4():
+    msg = """numpy array are different
+
+numpy array values are different \\(25\\.0 %\\)
+\\[left\\]: \\[\\[1, 2\\], \\[3, 4\\]\\]
+\\[right\\]: \\[\\[1, 3\\], \\[3, 4\\]\\]"""
+
+    with pytest.raises(AssertionError, match=msg):
+        assert_almost_equal(np.array([[1, 2], [3, 4]]),
+                            np.array([[1, 3], [3, 4]]))
+
+
+def test_assert_almost_equal_shape_mismatch_override():
+    msg = """Index are different
+
+Index shapes are different
+\\[left\\]: \\(2L*,\\)
+\\[right\\]: \\(3L*,\\)"""
+    with pytest.raises(AssertionError, match=msg):
+        assert_almost_equal(np.array([1, 2]),
+                            np.array([3, 4, 5]),
+                            obj="Index")
+
+
+def test_assert_almost_equal_unicode():
+    # see gh-20503
+    msg = """numpy array are different
+
+numpy array values are different \\(33\\.33333 %\\)
+\\[left\\]: \\[á, à, ä\\]
+\\[right\\]: \\[á, à, å\\]"""
+
+    with pytest.raises(AssertionError, match=msg):
+        assert_almost_equal(np.array([u"á", u"à", u"ä"]),
+                            np.array([u"á", u"à", u"å"]))
+
+
+def test_assert_almost_equal_timestamp():
+    a = np.array([Timestamp("2011-01-01"), Timestamp("2011-01-01")])
+    b = np.array([Timestamp("2011-01-01"), Timestamp("2011-01-02")])
+
+    msg = """numpy array are different
+
+numpy array values are different \\(50\\.0 %\\)
+\\[left\\]: \\[2011-01-01 00:00:00, 2011-01-01 00:00:00\\]
+\\[right\\]: \\[2011-01-01 00:00:00, 2011-01-02 00:00:00\\]"""
+
+    with pytest.raises(AssertionError, match=msg):
+        assert_almost_equal(a, b)
+
+
+def test_assert_almost_equal_iterable_length_mismatch():
+    msg = """Iterable are different
+
+Iterable length are different
+\\[left\\]: 2
+\\[right\\]: 3"""
+
+    with pytest.raises(AssertionError, match=msg):
+        assert_almost_equal([1, 2], [3, 4, 5])
+
+
+def test_assert_almost_equal_iterable_values_mismatch():
+    msg = """Iterable are different
+
+Iterable values are different \\(50\\.0 %\\)
+\\[left\\]: \\[1, 2\\]
+\\[right\\]: \\[1, 3\\]"""
+
+    with pytest.raises(AssertionError, match=msg):
+        assert_almost_equal([1, 2], [1, 3])
diff --git a/pandas/tests/util/test_assert_categorical_equal.py b/pandas/tests/util/test_assert_categorical_equal.py
new file mode 100644
index 0000000000000..04c8301027039
--- /dev/null
+++ b/pandas/tests/util/test_assert_categorical_equal.py
@@ -0,0 +1,92 @@
+# -*- coding: utf-8 -*-
+
+import pytest
+
+from pandas import Categorical
+from pandas.util.testing import assert_categorical_equal
+
+
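# Editor's aside (not part of the patch): the `match` argument of
# `pytest.raises` is treated as a regular expression and applied with
# `re.search`, which is why the `msg` literals in these new tests hand-escape
# every metacharacter. A hedged sketch of the same check written with
# `re.escape`, so the expected fragment can be stated verbatim (the test name
# here is illustrative, not part of the patch):

import re

import pytest


def test_match_is_a_regex():
    expected = "values are different (50.0 %)"
    with pytest.raises(AssertionError, match=re.escape(expected)):
        raise AssertionError("numpy array values are different (50.0 %)")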
+@pytest.mark.parametrize("c", [ + Categorical([1, 2, 3, 4]), + Categorical([1, 2, 3, 4], categories=[1, 2, 3, 4, 5]), +]) +def test_categorical_equal(c): + assert_categorical_equal(c, c) + + +@pytest.mark.parametrize("check_category_order", [True, False]) +def test_categorical_equal_order_mismatch(check_category_order): + c1 = Categorical([1, 2, 3, 4], categories=[1, 2, 3, 4]) + c2 = Categorical([1, 2, 3, 4], categories=[4, 3, 2, 1]) + kwargs = dict(check_category_order=check_category_order) + + if check_category_order: + msg = """Categorical\\.categories are different + +Categorical\\.categories values are different \\(100\\.0 %\\) +\\[left\\]: Int64Index\\(\\[1, 2, 3, 4\\], dtype='int64'\\) +\\[right\\]: Int64Index\\(\\[4, 3, 2, 1\\], dtype='int64'\\)""" + with pytest.raises(AssertionError, match=msg): + assert_categorical_equal(c1, c2, **kwargs) + else: + assert_categorical_equal(c1, c2, **kwargs) + + +def test_categorical_equal_categories_mismatch(): + msg = """Categorical\\.categories are different + +Categorical\\.categories values are different \\(25\\.0 %\\) +\\[left\\]: Int64Index\\(\\[1, 2, 3, 4\\], dtype='int64'\\) +\\[right\\]: Int64Index\\(\\[1, 2, 3, 5\\], dtype='int64'\\)""" + + c1 = Categorical([1, 2, 3, 4]) + c2 = Categorical([1, 2, 3, 5]) + + with pytest.raises(AssertionError, match=msg): + assert_categorical_equal(c1, c2) + + +def test_categorical_equal_codes_mismatch(): + categories = [1, 2, 3, 4] + msg = """Categorical\\.codes are different + +Categorical\\.codes values are different \\(50\\.0 %\\) +\\[left\\]: \\[0, 1, 3, 2\\] +\\[right\\]: \\[0, 1, 2, 3\\]""" + + c1 = Categorical([1, 2, 4, 3], categories=categories) + c2 = Categorical([1, 2, 3, 4], categories=categories) + + with pytest.raises(AssertionError, match=msg): + assert_categorical_equal(c1, c2) + + +def test_categorical_equal_ordered_mismatch(): + data = [1, 2, 3, 4] + msg = """Categorical are different + +Attribute "ordered" are different +\\[left\\]: False +\\[right\\]: True""" + + c1 = Categorical(data, ordered=False) + c2 = Categorical(data, ordered=True) + + with pytest.raises(AssertionError, match=msg): + assert_categorical_equal(c1, c2) + + +@pytest.mark.parametrize("obj", ["index", "foo", "pandas"]) +def test_categorical_equal_object_override(obj): + data = [1, 2, 3, 4] + msg = """{obj} are different + +Attribute "ordered" are different +\\[left\\]: False +\\[right\\]: True""".format(obj=obj) + + c1 = Categorical(data, ordered=False) + c2 = Categorical(data, ordered=True) + + with pytest.raises(AssertionError, match=msg): + assert_categorical_equal(c1, c2, obj=obj) diff --git a/pandas/tests/util/test_assert_extension_array_equal.py b/pandas/tests/util/test_assert_extension_array_equal.py new file mode 100644 index 0000000000000..3149078a56783 --- /dev/null +++ b/pandas/tests/util/test_assert_extension_array_equal.py @@ -0,0 +1,102 @@ +# -*- coding: utf-8 -*- + +import numpy as np +import pytest + +from pandas.core.arrays.sparse import SparseArray +from pandas.util.testing import assert_extension_array_equal + + +@pytest.mark.parametrize("kwargs", [ + dict(), # Default is check_exact=False + dict(check_exact=False), dict(check_exact=True) +]) +def test_assert_extension_array_equal_not_exact(kwargs): + # see gh-23709 + arr1 = SparseArray([-0.17387645482451206, 0.3414148016424936]) + arr2 = SparseArray([-0.17387645482451206, 0.3414148016424937]) + + if kwargs.get("check_exact", False): + msg = """\ +ExtensionArray are different + +ExtensionArray values are different \\(50\\.0 %\\) +\\[left\\]: 
\\[-0\\.17387645482.*, 0\\.341414801642.*\\] +\\[right\\]: \\[-0\\.17387645482.*, 0\\.341414801642.*\\]""" + + with pytest.raises(AssertionError, match=msg): + assert_extension_array_equal(arr1, arr2, **kwargs) + else: + assert_extension_array_equal(arr1, arr2, **kwargs) + + +@pytest.mark.parametrize("check_less_precise", [ + True, False, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 +]) +def test_assert_extension_array_equal_less_precise(check_less_precise): + arr1 = SparseArray([0.5, 0.123456]) + arr2 = SparseArray([0.5, 0.123457]) + + kwargs = dict(check_less_precise=check_less_precise) + + if check_less_precise is False or check_less_precise >= 5: + msg = """\ +ExtensionArray are different + +ExtensionArray values are different \\(50\\.0 %\\) +\\[left\\]: \\[0\\.5, 0\\.123456\\] +\\[right\\]: \\[0\\.5, 0\\.123457\\]""" + + with pytest.raises(AssertionError, match=msg): + assert_extension_array_equal(arr1, arr2, **kwargs) + else: + assert_extension_array_equal(arr1, arr2, **kwargs) + + +def test_assert_extension_array_equal_dtype_mismatch(check_dtype): + end = 5 + kwargs = dict(check_dtype=check_dtype) + + arr1 = SparseArray(np.arange(end, dtype="int64")) + arr2 = SparseArray(np.arange(end, dtype="int32")) + + if check_dtype: + msg = """\ +ExtensionArray are different + +Attribute "dtype" are different +\\[left\\]: Sparse\\[int64, 0\\] +\\[right\\]: Sparse\\[int32, 0\\]""" + + with pytest.raises(AssertionError, match=msg): + assert_extension_array_equal(arr1, arr2, **kwargs) + else: + assert_extension_array_equal(arr1, arr2, **kwargs) + + +def test_assert_extension_array_equal_missing_values(): + arr1 = SparseArray([np.nan, 1, 2, np.nan]) + arr2 = SparseArray([np.nan, 1, 2, 3]) + + msg = """\ +ExtensionArray NA mask are different + +ExtensionArray NA mask values are different \\(25\\.0 %\\) +\\[left\\]: \\[True, False, False, True\\] +\\[right\\]: \\[True, False, False, False\\]""" + + with pytest.raises(AssertionError, match=msg): + assert_extension_array_equal(arr1, arr2) + + +@pytest.mark.parametrize("side", ["left", "right"]) +def test_assert_extension_array_equal_non_extension_array(side): + numpy_array = np.arange(5) + extension_array = SparseArray(numpy_array) + + msg = "{side} is not an ExtensionArray".format(side=side) + args = ((numpy_array, extension_array) if side == "left" + else (extension_array, numpy_array)) + + with pytest.raises(AssertionError, match=msg): + assert_extension_array_equal(*args) diff --git a/pandas/tests/util/test_assert_frame_equal.py b/pandas/tests/util/test_assert_frame_equal.py new file mode 100644 index 0000000000000..1a941c0f0c265 --- /dev/null +++ b/pandas/tests/util/test_assert_frame_equal.py @@ -0,0 +1,209 @@ +# -*- coding: utf-8 -*- + +import pytest + +from pandas import DataFrame +from pandas.util.testing import assert_frame_equal + + +@pytest.fixture(params=[True, False]) +def by_blocks(request): + return request.param + + +def _assert_frame_equal_both(a, b, **kwargs): + """ + Check that two DataFrame equal. + + This check is performed commutatively. + + Parameters + ---------- + a : DataFrame + The first DataFrame to compare. + b : DataFrame + The second DataFrame to compare. + kwargs : dict + The arguments passed to `assert_frame_equal`. + """ + assert_frame_equal(a, b, **kwargs) + assert_frame_equal(b, a, **kwargs) + + +def _assert_not_frame_equal(a, b, **kwargs): + """ + Check that two DataFrame are not equal. + + Parameters + ---------- + a : DataFrame + The first DataFrame to compare. + b : DataFrame + The second DataFrame to compare. 
+ kwargs : dict + The arguments passed to `assert_frame_equal`. + """ + try: + assert_frame_equal(a, b, **kwargs) + msg = "The two DataFrames were equal when they shouldn't have been" + + pytest.fail(msg=msg) + except AssertionError: + pass + + +def _assert_not_frame_equal_both(a, b, **kwargs): + """ + Check that two DataFrame are not equal. + + This check is performed commutatively. + + Parameters + ---------- + a : DataFrame + The first DataFrame to compare. + b : DataFrame + The second DataFrame to compare. + kwargs : dict + The arguments passed to `assert_frame_equal`. + """ + _assert_not_frame_equal(a, b, **kwargs) + _assert_not_frame_equal(b, a, **kwargs) + + +@pytest.mark.parametrize("check_like", [True, False]) +def test_frame_equal_row_order_mismatch(check_like): + df1 = DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, + index=["a", "b", "c"]) + df2 = DataFrame({"A": [3, 2, 1], "B": [6, 5, 4]}, + index=["c", "b", "a"]) + + if not check_like: # Do not ignore row-column orderings. + msg = "DataFrame.index are different" + with pytest.raises(AssertionError, match=msg): + assert_frame_equal(df1, df2, check_like=check_like) + else: + _assert_frame_equal_both(df1, df2, check_like=check_like) + + +@pytest.mark.parametrize("df1,df2", [ + (DataFrame({"A": [1, 2, 3]}), DataFrame({"A": [1, 2, 3, 4]})), + (DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}), DataFrame({"A": [1, 2, 3]})), +]) +def test_frame_equal_shape_mismatch(df1, df2): + msg = "DataFrame are different" + + with pytest.raises(AssertionError, match=msg): + assert_frame_equal(df1, df2) + + +@pytest.mark.parametrize("df1,df2,msg", [ + # Index + (DataFrame.from_records({"a": [1, 2], + "c": ["l1", "l2"]}, index=["a"]), + DataFrame.from_records({"a": [1.0, 2.0], + "c": ["l1", "l2"]}, index=["a"]), + "DataFrame\\.index are different"), + + # MultiIndex + (DataFrame.from_records({"a": [1, 2], "b": [2.1, 1.5], + "c": ["l1", "l2"]}, index=["a", "b"]), + DataFrame.from_records({"a": [1.0, 2.0], "b": [2.1, 1.5], + "c": ["l1", "l2"]}, index=["a", "b"]), + "MultiIndex level \\[0\\] are different") +]) +def test_frame_equal_index_dtype_mismatch(df1, df2, msg, check_index_type): + kwargs = dict(check_index_type=check_index_type) + + if check_index_type: + with pytest.raises(AssertionError, match=msg): + assert_frame_equal(df1, df2, **kwargs) + else: + assert_frame_equal(df1, df2, **kwargs) + + +def test_empty_dtypes(check_dtype): + columns = ["col1", "col2"] + df1 = DataFrame(columns=columns) + df2 = DataFrame(columns=columns) + + kwargs = dict(check_dtype=check_dtype) + df1["col1"] = df1["col1"].astype("int64") + + if check_dtype: + msg = "Attributes are different" + with pytest.raises(AssertionError, match=msg): + assert_frame_equal(df1, df2, **kwargs) + else: + assert_frame_equal(df1, df2, **kwargs) + + +def test_frame_equal_index_mismatch(): + msg = """DataFrame\\.index are different + +DataFrame\\.index values are different \\(33\\.33333 %\\) +\\[left\\]: Index\\(\\[u?'a', u?'b', u?'c'\\], dtype='object'\\) +\\[right\\]: Index\\(\\[u?'a', u?'b', u?'d'\\], dtype='object'\\)""" + + df1 = DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, + index=["a", "b", "c"]) + df2 = DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, + index=["a", "b", "d"]) + + with pytest.raises(AssertionError, match=msg): + assert_frame_equal(df1, df2) + + +def test_frame_equal_columns_mismatch(): + msg = """DataFrame\\.columns are different + +DataFrame\\.columns values are different \\(50\\.0 %\\) +\\[left\\]: Index\\(\\[u?'A', u?'B'\\], dtype='object'\\) +\\[right\\]: Index\\(\\[u?'A', 
u?'b'\\], dtype='object'\\)"""
+
+    df1 = DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]},
+                    index=["a", "b", "c"])
+    df2 = DataFrame({"A": [1, 2, 3], "b": [4, 5, 6]},
+                    index=["a", "b", "c"])
+
+    with pytest.raises(AssertionError, match=msg):
+        assert_frame_equal(df1, df2)
+
+
+def test_frame_equal_block_mismatch(by_blocks):
+    msg = """DataFrame\\.iloc\\[:, 1\\] are different
+
+DataFrame\\.iloc\\[:, 1\\] values are different \\(33\\.33333 %\\)
+\\[left\\]: \\[4, 5, 6\\]
+\\[right\\]: \\[4, 5, 7\\]"""
+
+    df1 = DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
+    df2 = DataFrame({"A": [1, 2, 3], "B": [4, 5, 7]})
+
+    with pytest.raises(AssertionError, match=msg):
+        assert_frame_equal(df1, df2, by_blocks=by_blocks)
+
+
+@pytest.mark.parametrize("df1,df2,msg", [
+    (DataFrame({"A": [u"á", u"à", u"ä"], "E": [u"é", u"è", u"ë"]}),
+     DataFrame({"A": [u"á", u"à", u"ä"], "E": [u"é", u"è", u"e̊"]}),
+     """DataFrame\\.iloc\\[:, 1\\] are different
+
+DataFrame\\.iloc\\[:, 1\\] values are different \\(33\\.33333 %\\)
+\\[left\\]: \\[é, è, ë\\]
+\\[right\\]: \\[é, è, e̊\\]"""),
+    (DataFrame({"A": [u"á", u"à", u"ä"], "E": [u"é", u"è", u"ë"]}),
+     DataFrame({"A": ["a", "a", "a"], "E": ["e", "e", "e"]}),
+     """DataFrame\\.iloc\\[:, 0\\] are different
+
+DataFrame\\.iloc\\[:, 0\\] values are different \\(100\\.0 %\\)
+\\[left\\]: \\[á, à, ä\\]
+\\[right\\]: \\[a, a, a\\]"""),
+])
+def test_frame_equal_unicode(df1, df2, msg, by_blocks):
+    # see gh-20503
+    #
+    # Test ensures that `assert_frame_equal` raises the right exception
+    # when comparing DataFrames containing differing unicode objects.
+    with pytest.raises(AssertionError, match=msg):
+        assert_frame_equal(df1, df2, by_blocks=by_blocks)
diff --git a/pandas/tests/util/test_assert_index_equal.py b/pandas/tests/util/test_assert_index_equal.py
new file mode 100644
index 0000000000000..b96345d4bd7ce
--- /dev/null
+++ b/pandas/tests/util/test_assert_index_equal.py
@@ -0,0 +1,179 @@
+# -*- coding: utf-8 -*-
+
+import numpy as np
+import pytest
+
+from pandas import Categorical, Index, MultiIndex, NaT
+from pandas.util.testing import assert_index_equal
+
+
+def test_index_equal_levels_mismatch():
+    msg = """Index are different
+
+Index levels are different
+\\[left\\]: 1, Int64Index\\(\\[1, 2, 3\\], dtype='int64'\\)
+\\[right\\]: 2, MultiIndex\\(levels=\\[\\[u?'A', u?'B'\\], \\[1, 2, 3, 4\\]\\],
+           labels=\\[\\[0, 0, 1, 1\\], \\[0, 1, 2, 3\\]\\]\\)"""
+
+    idx1 = Index([1, 2, 3])
+    idx2 = MultiIndex.from_tuples([("A", 1), ("A", 2),
+                                   ("B", 3), ("B", 4)])
+
+    with pytest.raises(AssertionError, match=msg):
+        assert_index_equal(idx1, idx2, exact=False)
+
+
+def test_index_equal_values_mismatch(check_exact):
+    msg = """MultiIndex level \\[1\\] are different
+
+MultiIndex level \\[1\\] values are different \\(25\\.0 %\\)
+\\[left\\]: Int64Index\\(\\[2, 2, 3, 4\\], dtype='int64'\\)
+\\[right\\]: Int64Index\\(\\[1, 2, 3, 4\\], dtype='int64'\\)"""
+
+    idx1 = MultiIndex.from_tuples([("A", 2), ("A", 2),
+                                   ("B", 3), ("B", 4)])
+    idx2 = MultiIndex.from_tuples([("A", 1), ("A", 2),
+                                   ("B", 3), ("B", 4)])
+
+    with pytest.raises(AssertionError, match=msg):
+        assert_index_equal(idx1, idx2, check_exact=check_exact)
+
+
+def test_index_equal_length_mismatch(check_exact):
+    msg = """Index are different
+
+Index length are different
+\\[left\\]: 3, Int64Index\\(\\[1, 2, 3\\], dtype='int64'\\)
+\\[right\\]: 4, Int64Index\\(\\[1, 2, 3, 4\\], dtype='int64'\\)"""
+
+    idx1 = Index([1, 2, 3])
+    idx2 = Index([1, 2, 3, 4])
+
+    with pytest.raises(AssertionError, match=msg):
+        assert_index_equal(idx1, idx2, 
check_exact=check_exact) + + +def test_index_equal_class_mismatch(check_exact): + msg = """Index are different + +Index classes are different +\\[left\\]: Int64Index\\(\\[1, 2, 3\\], dtype='int64'\\) +\\[right\\]: Float64Index\\(\\[1\\.0, 2\\.0, 3\\.0\\], dtype='float64'\\)""" + + idx1 = Index([1, 2, 3]) + idx2 = Index([1, 2, 3.0]) + + with pytest.raises(AssertionError, match=msg): + assert_index_equal(idx1, idx2, exact=True, check_exact=check_exact) + + +def test_index_equal_values_close(check_exact): + idx1 = Index([1, 2, 3.]) + idx2 = Index([1, 2, 3.0000000001]) + + if check_exact: + msg = """Index are different + +Index values are different \\(33\\.33333 %\\) +\\[left\\]: Float64Index\\(\\[1.0, 2.0, 3.0], dtype='float64'\\) +\\[right\\]: Float64Index\\(\\[1.0, 2.0, 3.0000000001\\], dtype='float64'\\)""" + + with pytest.raises(AssertionError, match=msg): + assert_index_equal(idx1, idx2, check_exact=check_exact) + else: + assert_index_equal(idx1, idx2, check_exact=check_exact) + + +def test_index_equal_values_less_close(check_exact, check_less_precise): + idx1 = Index([1, 2, 3.]) + idx2 = Index([1, 2, 3.0001]) + kwargs = dict(check_exact=check_exact, + check_less_precise=check_less_precise) + + if check_exact or not check_less_precise: + msg = """Index are different + +Index values are different \\(33\\.33333 %\\) +\\[left\\]: Float64Index\\(\\[1.0, 2.0, 3.0], dtype='float64'\\) +\\[right\\]: Float64Index\\(\\[1.0, 2.0, 3.0001\\], dtype='float64'\\)""" + + with pytest.raises(AssertionError, match=msg): + assert_index_equal(idx1, idx2, **kwargs) + else: + assert_index_equal(idx1, idx2, **kwargs) + + +def test_index_equal_values_too_far(check_exact, check_less_precise): + idx1 = Index([1, 2, 3]) + idx2 = Index([1, 2, 4]) + kwargs = dict(check_exact=check_exact, + check_less_precise=check_less_precise) + + msg = """Index are different + +Index values are different \\(33\\.33333 %\\) +\\[left\\]: Int64Index\\(\\[1, 2, 3\\], dtype='int64'\\) +\\[right\\]: Int64Index\\(\\[1, 2, 4\\], dtype='int64'\\)""" + + with pytest.raises(AssertionError, match=msg): + assert_index_equal(idx1, idx2, **kwargs) + + +def test_index_equal_level_values_mismatch(check_exact, check_less_precise): + idx1 = MultiIndex.from_tuples([("A", 2), ("A", 2), + ("B", 3), ("B", 4)]) + idx2 = MultiIndex.from_tuples([("A", 1), ("A", 2), + ("B", 3), ("B", 4)]) + kwargs = dict(check_exact=check_exact, + check_less_precise=check_less_precise) + + msg = """MultiIndex level \\[1\\] are different + +MultiIndex level \\[1\\] values are different \\(25\\.0 %\\) +\\[left\\]: Int64Index\\(\\[2, 2, 3, 4\\], dtype='int64'\\) +\\[right\\]: Int64Index\\(\\[1, 2, 3, 4\\], dtype='int64'\\)""" + + with pytest.raises(AssertionError, match=msg): + assert_index_equal(idx1, idx2, **kwargs) + + +@pytest.mark.parametrize("name1,name2", [ + (None, "x"), ("x", "x"), (np.nan, np.nan), (NaT, NaT), (np.nan, NaT) +]) +def test_index_equal_names(name1, name2): + msg = """Index are different + +Attribute "names" are different +\\[left\\]: \\[{name1}\\] +\\[right\\]: \\[{name2}\\]""" + + idx1 = Index([1, 2, 3], name=name1) + idx2 = Index([1, 2, 3], name=name2) + + if name1 == name2 or name1 is name2: + assert_index_equal(idx1, idx2) + else: + name1 = "u?'x'" if name1 == "x" else name1 + name2 = "u?'x'" if name2 == "x" else name2 + msg = msg.format(name1=name1, name2=name2) + + with pytest.raises(AssertionError, match=msg): + assert_index_equal(idx1, idx2) + + +def test_index_equal_category_mismatch(check_categorical): + msg = """Index are different + 
+Attribute "dtype" are different +\\[left\\]: CategoricalDtype\\(categories=\\[u?'a', u?'b'\\], ordered=False\\) +\\[right\\]: CategoricalDtype\\(categories=\\[u?'a', u?'b', u?'c'\\], \ +ordered=False\\)""" + + idx1 = Index(Categorical(["a", "b"])) + idx2 = Index(Categorical(["a", "b"], categories=["a", "b", "c"])) + + if check_categorical: + with pytest.raises(AssertionError, match=msg): + assert_index_equal(idx1, idx2, check_categorical=check_categorical) + else: + assert_index_equal(idx1, idx2, check_categorical=check_categorical) diff --git a/pandas/tests/util/test_assert_interval_array_equal.py b/pandas/tests/util/test_assert_interval_array_equal.py new file mode 100644 index 0000000000000..c81a27f9b3f19 --- /dev/null +++ b/pandas/tests/util/test_assert_interval_array_equal.py @@ -0,0 +1,80 @@ +# -*- coding: utf-8 -*- + +import pytest + +from pandas import interval_range +from pandas.util.testing import assert_interval_array_equal + + +@pytest.mark.parametrize("kwargs", [ + dict(start=0, periods=4), + dict(start=1, periods=5), + dict(start=5, end=10, closed="left"), +]) +def test_interval_array_equal(kwargs): + arr = interval_range(**kwargs).values + assert_interval_array_equal(arr, arr) + + +def test_interval_array_equal_closed_mismatch(): + kwargs = dict(start=0, periods=5) + arr1 = interval_range(closed="left", **kwargs).values + arr2 = interval_range(closed="right", **kwargs).values + + msg = """\ +IntervalArray are different + +Attribute "closed" are different +\\[left\\]: left +\\[right\\]: right""" + + with pytest.raises(AssertionError, match=msg): + assert_interval_array_equal(arr1, arr2) + + +def test_interval_array_equal_periods_mismatch(): + kwargs = dict(start=0) + arr1 = interval_range(periods=5, **kwargs).values + arr2 = interval_range(periods=6, **kwargs).values + + msg = """\ +IntervalArray.left are different + +IntervalArray.left length are different +\\[left\\]: 5, Int64Index\\(\\[0, 1, 2, 3, 4\\], dtype='int64'\\) +\\[right\\]: 6, Int64Index\\(\\[0, 1, 2, 3, 4, 5\\], dtype='int64'\\)""" + + with pytest.raises(AssertionError, match=msg): + assert_interval_array_equal(arr1, arr2) + + +def test_interval_array_equal_end_mismatch(): + kwargs = dict(start=0, periods=5) + arr1 = interval_range(end=10, **kwargs).values + arr2 = interval_range(end=20, **kwargs).values + + msg = """\ +IntervalArray.left are different + +IntervalArray.left values are different \\(80.0 %\\) +\\[left\\]: Int64Index\\(\\[0, 2, 4, 6, 8\\], dtype='int64'\\) +\\[right\\]: Int64Index\\(\\[0, 4, 8, 12, 16\\], dtype='int64'\\)""" + + with pytest.raises(AssertionError, match=msg): + assert_interval_array_equal(arr1, arr2) + + +def test_interval_array_equal_start_mismatch(): + kwargs = dict(periods=4) + arr1 = interval_range(start=0, **kwargs).values + arr2 = interval_range(start=1, **kwargs).values + + msg = """\ +IntervalArray.left are different + +IntervalArray.left values are different \\(100.0 %\\) +\\[left\\]: Int64Index\\(\\[0, 1, 2, 3\\], dtype='int64'\\) +\\[right\\]: Int64Index\\(\\[1, 2, 3, 4\\], dtype='int64'\\)""" + + with pytest.raises(AssertionError, match=msg): + assert_interval_array_equal(arr1, arr2) diff --git a/pandas/tests/util/test_assert_numpy_array_equal.py b/pandas/tests/util/test_assert_numpy_array_equal.py new file mode 100644 index 0000000000000..99037fcf96194 --- /dev/null +++ b/pandas/tests/util/test_assert_numpy_array_equal.py @@ -0,0 +1,177 @@ +# -*- coding: utf-8 -*- + +import numpy as np +import pytest + +from pandas import Timestamp +from pandas.util.testing import 
assert_numpy_array_equal + + +def test_assert_numpy_array_equal_shape_mismatch(): + msg = """numpy array are different + +numpy array shapes are different +\\[left\\]: \\(2L*,\\) +\\[right\\]: \\(3L*,\\)""" + + with pytest.raises(AssertionError, match=msg): + assert_numpy_array_equal(np.array([1, 2]), np.array([3, 4, 5])) + + +def test_assert_numpy_array_equal_bad_type(): + expected = "Expected type" + + with pytest.raises(AssertionError, match=expected): + assert_numpy_array_equal(1, 2) + + +@pytest.mark.parametrize("a,b,klass1,klass2", [ + (np.array([1]), 1, "ndarray", "int"), + (1, np.array([1]), "int", "ndarray"), +]) +def test_assert_numpy_array_equal_class_mismatch(a, b, klass1, klass2): + msg = """numpy array are different + +numpy array classes are different +\\[left\\]: {klass1} +\\[right\\]: {klass2}""".format(klass1=klass1, klass2=klass2) + + with pytest.raises(AssertionError, match=msg): + assert_numpy_array_equal(a, b) + + +def test_assert_numpy_array_equal_value_mismatch1(): + msg = """numpy array are different + +numpy array values are different \\(66\\.66667 %\\) +\\[left\\]: \\[nan, 2\\.0, 3\\.0\\] +\\[right\\]: \\[1\\.0, nan, 3\\.0\\]""" + + with pytest.raises(AssertionError, match=msg): + assert_numpy_array_equal(np.array([np.nan, 2, 3]), + np.array([1, np.nan, 3])) + + +def test_assert_numpy_array_equal_value_mismatch2(): + msg = """numpy array are different + +numpy array values are different \\(50\\.0 %\\) +\\[left\\]: \\[1, 2\\] +\\[right\\]: \\[1, 3\\]""" + + with pytest.raises(AssertionError, match=msg): + assert_numpy_array_equal(np.array([1, 2]), np.array([1, 3])) + + +def test_assert_numpy_array_equal_value_mismatch3(): + msg = """numpy array are different + +numpy array values are different \\(16\\.66667 %\\) +\\[left\\]: \\[\\[1, 2\\], \\[3, 4\\], \\[5, 6\\]\\] +\\[right\\]: \\[\\[1, 3\\], \\[3, 4\\], \\[5, 6\\]\\]""" + + with pytest.raises(AssertionError, match=msg): + assert_numpy_array_equal(np.array([[1, 2], [3, 4], [5, 6]]), + np.array([[1, 3], [3, 4], [5, 6]])) + + +def test_assert_numpy_array_equal_value_mismatch4(): + msg = """numpy array are different + +numpy array values are different \\(50\\.0 %\\) +\\[left\\]: \\[1\\.1, 2\\.000001\\] +\\[right\\]: \\[1\\.1, 2.0\\]""" + + with pytest.raises(AssertionError, match=msg): + assert_numpy_array_equal(np.array([1.1, 2.000001]), + np.array([1.1, 2.0])) + + +def test_assert_numpy_array_equal_value_mismatch5(): + msg = """numpy array are different + +numpy array values are different \\(16\\.66667 %\\) +\\[left\\]: \\[\\[1, 2\\], \\[3, 4\\], \\[5, 6\\]\\] +\\[right\\]: \\[\\[1, 3\\], \\[3, 4\\], \\[5, 6\\]\\]""" + + with pytest.raises(AssertionError, match=msg): + assert_numpy_array_equal(np.array([[1, 2], [3, 4], [5, 6]]), + np.array([[1, 3], [3, 4], [5, 6]])) + + +def test_assert_numpy_array_equal_value_mismatch6(): + msg = """numpy array are different + +numpy array values are different \\(25\\.0 %\\) +\\[left\\]: \\[\\[1, 2\\], \\[3, 4\\]\\] +\\[right\\]: \\[\\[1, 3\\], \\[3, 4\\]\\]""" + + with pytest.raises(AssertionError, match=msg): + assert_numpy_array_equal(np.array([[1, 2], [3, 4]]), + np.array([[1, 3], [3, 4]])) + + +def test_assert_numpy_array_equal_shape_mismatch_override(): + msg = """Index are different + +Index shapes are different +\\[left\\]: \\(2L*,\\) +\\[right\\]: \\(3L*,\\)""" + + with pytest.raises(AssertionError, match=msg): + assert_numpy_array_equal(np.array([1, 2]), + np.array([3, 4, 5]), + obj="Index") + + +def test_numpy_array_equal_unicode(): + # see gh-20503 + # + # Test 
ensures that `assert_numpy_array_equal` raises the right
+    # exception when comparing np.arrays containing differing unicode objects.
+    msg = """numpy array are different
+
+numpy array values are different \\(33\\.33333 %\\)
+\\[left\\]: \\[á, à, ä\\]
+\\[right\\]: \\[á, à, å\\]"""
+
+    with pytest.raises(AssertionError, match=msg):
+        assert_numpy_array_equal(np.array([u"á", u"à", u"ä"]),
+                                 np.array([u"á", u"à", u"å"]))
+
+
+def test_numpy_array_equal_object():
+    a = np.array([Timestamp("2011-01-01"), Timestamp("2011-01-01")])
+    b = np.array([Timestamp("2011-01-01"), Timestamp("2011-01-02")])
+
+    msg = """numpy array are different
+
+numpy array values are different \\(50\\.0 %\\)
+\\[left\\]: \\[2011-01-01 00:00:00, 2011-01-01 00:00:00\\]
+\\[right\\]: \\[2011-01-01 00:00:00, 2011-01-02 00:00:00\\]"""
+
+    with pytest.raises(AssertionError, match=msg):
+        assert_numpy_array_equal(a, b)
+
+
+@pytest.mark.parametrize("other_type", ["same", "copy"])
+@pytest.mark.parametrize("check_same", ["same", "copy"])
+def test_numpy_array_equal_copy_flag(other_type, check_same):
+    a = np.array([1, 2, 3])
+    msg = None
+
+    if other_type == "same":
+        other = a.view()
+    else:
+        other = a.copy()
+
+    if check_same != other_type:
+        msg = (r"array\(\[1, 2, 3\]\) is not array\(\[1, 2, 3\]\)"
+               if check_same == "same"
+               else r"array\(\[1, 2, 3\]\) is array\(\[1, 2, 3\]\)")
+
+    if msg is not None:
+        with pytest.raises(AssertionError, match=msg):
+            assert_numpy_array_equal(a, other, check_same=check_same)
+    else:
+        assert_numpy_array_equal(a, other, check_same=check_same)
diff --git a/pandas/tests/util/test_assert_series_equal.py b/pandas/tests/util/test_assert_series_equal.py
new file mode 100644
index 0000000000000..537a0e01ff85f
--- /dev/null
+++ b/pandas/tests/util/test_assert_series_equal.py
@@ -0,0 +1,185 @@
+# -*- coding: utf-8 -*-
+
+import pytest
+
+from pandas import Categorical, DataFrame, Series
+from pandas.util.testing import assert_series_equal
+
+
+def _assert_series_equal_both(a, b, **kwargs):
+    """
+    Check that two Series are equal.
+
+    This check is performed commutatively.
+
+    Parameters
+    ----------
+    a : Series
+        The first Series to compare.
+    b : Series
+        The second Series to compare.
+    kwargs : dict
+        The arguments passed to `assert_series_equal`.
+    """
+    assert_series_equal(a, b, **kwargs)
+    assert_series_equal(b, a, **kwargs)
+
+
+def _assert_not_series_equal(a, b, **kwargs):
+    """
+    Check that two Series are not equal.
+
+    Parameters
+    ----------
+    a : Series
+        The first Series to compare.
+    b : Series
+        The second Series to compare.
+    kwargs : dict
+        The arguments passed to `assert_series_equal`.
+    """
+    try:
+        assert_series_equal(a, b, **kwargs)
+        msg = "The two Series were equal when they shouldn't have been"
+
+        pytest.fail(msg=msg)
+    except AssertionError:
+        pass
+
+
+def _assert_not_series_equal_both(a, b, **kwargs):
+    """
+    Check that two Series are not equal.
+
+    This check is performed commutatively.
+
+    Parameters
+    ----------
+    a : Series
+        The first Series to compare.
+    b : Series
+        The second Series to compare.
+    kwargs : dict
+        The arguments passed to `assert_series_equal`.
+ """ + _assert_not_series_equal(a, b, **kwargs) + _assert_not_series_equal(b, a, **kwargs) + + +@pytest.mark.parametrize("data", [ + range(3), list("abc"), list(u"áàä"), +]) +def test_series_equal(data): + _assert_series_equal_both(Series(data), Series(data)) + + +@pytest.mark.parametrize("data1,data2", [ + (range(3), range(1, 4)), + (list("abc"), list("xyz")), + (list(u"áàä"), list(u"éèë")), + (list(u"áàä"), list(b"aaa")), + (range(3), range(4)), +]) +def test_series_not_equal_value_mismatch(data1, data2): + _assert_not_series_equal_both(Series(data1), Series(data2)) + + +@pytest.mark.parametrize("kwargs", [ + dict(dtype="float64"), # dtype mismatch + dict(index=[1, 2, 4]), # index mismatch + dict(name="foo"), # name mismatch +]) +def test_series_not_equal_metadata_mismatch(kwargs): + data = range(3) + s1 = Series(data) + + s2 = Series(data, **kwargs) + _assert_not_series_equal_both(s1, s2) + + +@pytest.mark.parametrize("data1,data2", [(0.12345, 0.12346), (0.1235, 0.1236)]) +@pytest.mark.parametrize("dtype", ["float32", "float64"]) +@pytest.mark.parametrize("check_less_precise", [False, True, 0, 1, 2, 3, 10]) +def test_less_precise(data1, data2, dtype, check_less_precise): + s1 = Series([data1], dtype=dtype) + s2 = Series([data2], dtype=dtype) + + kwargs = dict(check_less_precise=check_less_precise) + + if ((check_less_precise is False or check_less_precise == 10) or + ((check_less_precise is True or check_less_precise >= 3) and + abs(data1 - data2) >= 0.0001)): + msg = "Series values are different" + with pytest.raises(AssertionError, match=msg): + assert_series_equal(s1, s2, **kwargs) + else: + _assert_series_equal_both(s1, s2, **kwargs) + + +@pytest.mark.parametrize("s1,s2,msg", [ + # Index + (Series(["l1", "l2"], index=[1, 2]), + Series(["l1", "l2"], index=[1., 2.]), + "Series\\.index are different"), + + # MultiIndex + (DataFrame.from_records({"a": [1, 2], "b": [2.1, 1.5], + "c": ["l1", "l2"]}, index=["a", "b"]).c, + DataFrame.from_records({"a": [1., 2.], "b": [2.1, 1.5], + "c": ["l1", "l2"]}, index=["a", "b"]).c, + "MultiIndex level \\[0\\] are different") +]) +def test_series_equal_index_dtype(s1, s2, msg, check_index_type): + kwargs = dict(check_index_type=check_index_type) + + if check_index_type: + with pytest.raises(AssertionError, match=msg): + assert_series_equal(s1, s2, **kwargs) + else: + assert_series_equal(s1, s2, **kwargs) + + +def test_series_equal_length_mismatch(check_less_precise): + msg = """Series are different + +Series length are different +\\[left\\]: 3, RangeIndex\\(start=0, stop=3, step=1\\) +\\[right\\]: 4, RangeIndex\\(start=0, stop=4, step=1\\)""" + + s1 = Series([1, 2, 3]) + s2 = Series([1, 2, 3, 4]) + + with pytest.raises(AssertionError, match=msg): + assert_series_equal(s1, s2, check_less_precise=check_less_precise) + + +def test_series_equal_values_mismatch(check_less_precise): + msg = """Series are different + +Series values are different \\(33\\.33333 %\\) +\\[left\\]: \\[1, 2, 3\\] +\\[right\\]: \\[1, 2, 4\\]""" + + s1 = Series([1, 2, 3]) + s2 = Series([1, 2, 4]) + + with pytest.raises(AssertionError, match=msg): + assert_series_equal(s1, s2, check_less_precise=check_less_precise) + + +def test_series_equal_categorical_mismatch(check_categorical): + msg = """Attributes are different + +Attribute "dtype" are different +\\[left\\]: CategoricalDtype\\(categories=\\[u?'a', u?'b'\\], ordered=False\\) +\\[right\\]: CategoricalDtype\\(categories=\\[u?'a', u?'b', u?'c'\\], \ +ordered=False\\)""" + + s1 = Series(Categorical(["a", "b"])) + s2 = 
Series(Categorical(["a", "b"], categories=list("abc"))) + + if check_categorical: + with pytest.raises(AssertionError, match=msg): + assert_series_equal(s1, s2, check_categorical=check_categorical) + else: + _assert_series_equal_both(s1, s2, check_categorical=check_categorical) diff --git a/pandas/tests/util/test_deprecate.py b/pandas/tests/util/test_deprecate.py new file mode 100644 index 0000000000000..7fa7989eff690 --- /dev/null +++ b/pandas/tests/util/test_deprecate.py @@ -0,0 +1,63 @@ +from textwrap import dedent + +import pytest + +from pandas.util._decorators import deprecate + +import pandas.util.testing as tm + + +def new_func(): + """ + This is the summary. The deprecate directive goes next. + + This is the extended summary. The deprecate directive goes before this. + """ + return 'new_func called' + + +def new_func_no_docstring(): + return 'new_func_no_docstring called' + + +def new_func_wrong_docstring(): + """Summary should be in the next line.""" + return 'new_func_wrong_docstring called' + + +def new_func_with_deprecation(): + """ + This is the summary. The deprecate directive goes next. + + .. deprecated:: 1.0 + Use new_func instead. + + This is the extended summary. The deprecate directive goes before this. + """ + pass + + +def test_deprecate_ok(): + depr_func = deprecate('depr_func', new_func, '1.0', + msg='Use new_func instead.') + + with tm.assert_produces_warning(FutureWarning): + result = depr_func() + + assert result == 'new_func called' + assert depr_func.__doc__ == dedent(new_func_with_deprecation.__doc__) + + +def test_deprecate_no_docstring(): + depr_func = deprecate('depr_func', new_func_no_docstring, '1.0', + msg='Use new_func instead.') + with tm.assert_produces_warning(FutureWarning): + result = depr_func() + assert result == 'new_func_no_docstring called' + + +def test_deprecate_wrong_docstring(): + with pytest.raises(AssertionError, match='deprecate needs a correctly ' + 'formatted docstring'): + deprecate('depr_func', new_func_wrong_docstring, '1.0', + msg='Use new_func instead.') diff --git a/pandas/tests/util/test_deprecate_kwarg.py b/pandas/tests/util/test_deprecate_kwarg.py new file mode 100644 index 0000000000000..7287df9db8a62 --- /dev/null +++ b/pandas/tests/util/test_deprecate_kwarg.py @@ -0,0 +1,93 @@ +# -*- coding: utf-8 -*- +import pytest + +from pandas.util._decorators import deprecate_kwarg + +import pandas.util.testing as tm + + +@deprecate_kwarg("old", "new") +def _f1(new=False): + return new + + +_f2_mappings = {"yes": True, "no": False} + + +@deprecate_kwarg("old", "new", _f2_mappings) +def _f2(new=False): + return new + + +def _f3_mapping(x): + return x + 1 + + +@deprecate_kwarg("old", "new", _f3_mapping) +def _f3(new=0): + return new + + +@pytest.mark.parametrize("key,klass", [ + ("old", FutureWarning), + ("new", None) +]) +def test_deprecate_kwarg(key, klass): + x = 78 + + with tm.assert_produces_warning(klass): + assert _f1(**{key: x}) == x + + +@pytest.mark.parametrize("key", list(_f2_mappings.keys())) +def test_dict_deprecate_kwarg(key): + with tm.assert_produces_warning(FutureWarning): + assert _f2(old=key) == _f2_mappings[key] + + +@pytest.mark.parametrize("key", ["bogus", 12345, -1.23]) +def test_missing_deprecate_kwarg(key): + with tm.assert_produces_warning(FutureWarning): + assert _f2(old=key) == key + + +@pytest.mark.parametrize("x", [1, -1.4, 0]) +def test_callable_deprecate_kwarg(x): + with tm.assert_produces_warning(FutureWarning): + assert _f3(old=x) == _f3_mapping(x) + + +def test_callable_deprecate_kwarg_fail(): + msg = 
"((can only|cannot) concatenate)|(must be str)|(Can't convert)" + + with pytest.raises(TypeError, match=msg): + _f3(old="hello") + + +def test_bad_deprecate_kwarg(): + msg = "mapping from old to new argument values must be dict or callable!" + + with pytest.raises(TypeError, match=msg): + @deprecate_kwarg("old", "new", 0) + def f4(new=None): + return new + + +@deprecate_kwarg("old", None) +def _f4(old=True, unchanged=True): + return old, unchanged + + +@pytest.mark.parametrize("key", ["old", "unchanged"]) +def test_deprecate_keyword(key): + x = 9 + + if key == "old": + klass = FutureWarning + expected = (x, True) + else: + klass = None + expected = (True, x) + + with tm.assert_produces_warning(klass): + assert _f4(**{key: x}) == expected diff --git a/pandas/tests/util/test_hashing.py b/pandas/tests/util/test_hashing.py index 9f5b4f7b90d9f..d36de931e2610 100644 --- a/pandas/tests/util/test_hashing.py +++ b/pandas/tests/util/test_hashing.py @@ -10,272 +10,319 @@ import pandas.util.testing as tm -class TestHashing(object): - - @pytest.fixture(params=[ - Series([1, 2, 3] * 3, dtype='int32'), - Series([None, 2.5, 3.5] * 3, dtype='float32'), - Series(['a', 'b', 'c'] * 3, dtype='category'), - Series(['d', 'e', 'f'] * 3), - Series([True, False, True] * 3), - Series(pd.date_range('20130101', periods=9)), - Series(pd.date_range('20130101', periods=9, tz='US/Eastern')), - Series(pd.timedelta_range('2000', periods=9))]) - def series(self, request): - return request.param - - def test_consistency(self): - # check that our hash doesn't change because of a mistake - # in the actual code; this is the ground truth - result = hash_pandas_object(Index(['foo', 'bar', 'baz'])) - expected = Series(np.array([3600424527151052760, 1374399572096150070, - 477881037637427054], dtype='uint64'), - index=['foo', 'bar', 'baz']) - tm.assert_series_equal(result, expected) - - def test_hash_array(self, series): - a = series.values - tm.assert_numpy_array_equal(hash_array(a), hash_array(a)) - - def test_hash_array_mixed(self): - result1 = hash_array(np.array([3, 4, 'All'])) - result2 = hash_array(np.array(['3', '4', 'All'])) - result3 = hash_array(np.array([3, 4, 'All'], dtype=object)) - tm.assert_numpy_array_equal(result1, result2) - tm.assert_numpy_array_equal(result1, result3) - - @pytest.mark.parametrize('val', [5, 'foo', pd.Timestamp('20130101')]) - def test_hash_array_errors(self, val): - msg = 'must pass a ndarray-like' - with pytest.raises(TypeError, match=msg): - hash_array(val) - - def check_equal(self, obj, **kwargs): - a = hash_pandas_object(obj, **kwargs) - b = hash_pandas_object(obj, **kwargs) - tm.assert_series_equal(a, b) - - kwargs.pop('index', None) - a = hash_pandas_object(obj, **kwargs) - b = hash_pandas_object(obj, **kwargs) - tm.assert_series_equal(a, b) - - def check_not_equal_with_index(self, obj): - - # check that we are not hashing the same if - # we include the index - if not isinstance(obj, Index): - a = hash_pandas_object(obj, index=True) - b = hash_pandas_object(obj, index=False) - if len(obj): - assert not (a == b).all() - - def test_hash_tuples(self): - tups = [(1, 'one'), (1, 'two'), (2, 'one')] - result = hash_tuples(tups) - expected = hash_pandas_object(MultiIndex.from_tuples(tups)).values - tm.assert_numpy_array_equal(result, expected) - - result = hash_tuples(tups[0]) - assert result == expected[0] - - @pytest.mark.parametrize('tup', [ - (1, 'one'), (1, np.nan), (1.0, pd.NaT, 'A'), - ('A', pd.Timestamp("2012-01-01"))]) - def test_hash_tuple(self, tup): - # test equivalence between 
hash_tuples and hash_tuple - result = hash_tuple(tup) - expected = hash_tuples([tup])[0] - assert result == expected - - @pytest.mark.parametrize('val', [ - 1, 1.4, 'A', b'A', u'A', pd.Timestamp("2012-01-01"), - pd.Timestamp("2012-01-01", tz='Europe/Brussels'), - datetime.datetime(2012, 1, 1), - pd.Timestamp("2012-01-01", tz='EST').to_pydatetime(), - pd.Timedelta('1 days'), datetime.timedelta(1), - pd.Period('2012-01-01', freq='D'), pd.Interval(0, 1), - np.nan, pd.NaT, None]) - def test_hash_scalar(self, val): - result = _hash_scalar(val) - expected = hash_array(np.array([val], dtype=object), categorize=True) - assert result[0] == expected[0] - - @pytest.mark.parametrize('val', [5, 'foo', pd.Timestamp('20130101')]) - def test_hash_tuples_err(self, val): - msg = 'must be convertible to a list-of-tuples' - with pytest.raises(TypeError, match=msg): - hash_tuples(val) - - def test_multiindex_unique(self): - mi = MultiIndex.from_tuples([(118, 472), (236, 118), - (51, 204), (102, 51)]) - assert mi.is_unique is True - result = hash_pandas_object(mi) - assert result.is_unique is True - - def test_multiindex_objects(self): - mi = MultiIndex(levels=[['b', 'd', 'a'], [1, 2, 3]], - labels=[[0, 1, 0, 2], [2, 0, 0, 1]], - names=['col1', 'col2']) - recons = mi._sort_levels_monotonic() - - # these are equal - assert mi.equals(recons) - assert Index(mi.values).equals(Index(recons.values)) - - # _hashed_values and hash_pandas_object(..., index=False) - # equivalency - expected = hash_pandas_object( - mi, index=False).values - result = mi._hashed_values - tm.assert_numpy_array_equal(result, expected) - - expected = hash_pandas_object( - recons, index=False).values - result = recons._hashed_values - tm.assert_numpy_array_equal(result, expected) - - expected = mi._hashed_values - result = recons._hashed_values - - # values should match, but in different order - tm.assert_numpy_array_equal(np.sort(result), - np.sort(expected)) - - @pytest.mark.parametrize('obj', [ - Series([1, 2, 3]), - Series([1.0, 1.5, 3.2]), - Series([1.0, 1.5, np.nan]), - Series([1.0, 1.5, 3.2], index=[1.5, 1.1, 3.3]), - Series(['a', 'b', 'c']), - Series(['a', np.nan, 'c']), - Series(['a', None, 'c']), - Series([True, False, True]), - Series(), - Index([1, 2, 3]), - Index([True, False, True]), - DataFrame({'x': ['a', 'b', 'c'], 'y': [1, 2, 3]}), - DataFrame(), - tm.makeMissingDataframe(), - tm.makeMixedDataFrame(), - tm.makeTimeDataFrame(), - tm.makeTimeSeries(), - tm.makeTimedeltaIndex(), - tm.makePeriodIndex(), - Series(tm.makePeriodIndex()), - Series(pd.date_range('20130101', periods=3, tz='US/Eastern')), - MultiIndex.from_product([range(5), ['foo', 'bar', 'baz'], - pd.date_range('20130101', periods=2)]), - MultiIndex.from_product([pd.CategoricalIndex(list('aabc')), range(3)]) - ]) - def test_hash_pandas_object(self, obj): - self.check_equal(obj) - self.check_not_equal_with_index(obj) - - def test_hash_pandas_object2(self, series): - self.check_equal(series) - self.check_not_equal_with_index(series) - - @pytest.mark.parametrize('obj', [ - Series([], dtype='float64'), Series([], dtype='object'), Index([])]) - def test_hash_pandas_empty_object(self, obj): - # these are by-definition the same with - # or w/o the index as the data is empty - self.check_equal(obj) - - @pytest.mark.parametrize('s1', [ - Series(['a', 'b', 'c', 'd']), - Series([1000, 2000, 3000, 4000]), - Series(pd.date_range(0, periods=4))]) - @pytest.mark.parametrize('categorize', [True, False]) - def test_categorical_consistency(self, s1, categorize): - # GH15143 - # Check 
that categoricals hash consistent with their values, not codes - # This should work for categoricals of any dtype - s2 = s1.astype('category').cat.set_categories(s1) - s3 = s2.cat.set_categories(list(reversed(s1))) - - # These should all hash identically - h1 = hash_pandas_object(s1, categorize=categorize) - h2 = hash_pandas_object(s2, categorize=categorize) - h3 = hash_pandas_object(s3, categorize=categorize) - tm.assert_series_equal(h1, h2) - tm.assert_series_equal(h1, h3) - - def test_categorical_with_nan_consistency(self): - c = pd.Categorical.from_codes( - [-1, 0, 1, 2, 3, 4], - categories=pd.date_range('2012-01-01', periods=5, name='B')) - expected = hash_array(c, categorize=False) - c = pd.Categorical.from_codes( - [-1, 0], - categories=[pd.Timestamp('2012-01-01')]) - result = hash_array(c, categorize=False) - assert result[0] in expected - assert result[1] in expected - - @pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") - def test_pandas_errors(self): - with pytest.raises(TypeError): - hash_pandas_object(pd.Timestamp('20130101')) - - obj = tm.makePanel() - - with pytest.raises(TypeError): - hash_pandas_object(obj) - - def test_hash_keys(self): - # using different hash keys, should have different hashes - # for the same data - - # this only matters for object dtypes - obj = Series(list('abc')) - a = hash_pandas_object(obj, hash_key='9876543210123456') - b = hash_pandas_object(obj, hash_key='9876543210123465') - assert (a != b).all() - - def test_invalid_key(self): - # this only matters for object dtypes - msg = 'key should be a 16-byte string encoded' - with pytest.raises(ValueError, match=msg): - hash_pandas_object(Series(list('abc')), hash_key='foo') - - def test_alread_encoded(self): - # if already encoded then ok - - obj = Series(list('abc')).str.encode('utf8') - self.check_equal(obj) - - def test_alternate_encoding(self): - - obj = Series(list('abc')) - self.check_equal(obj, encoding='ascii') - - @pytest.mark.parametrize('l_exp', range(8)) - @pytest.mark.parametrize('l_add', [0, 1]) - def test_same_len_hash_collisions(self, l_exp, l_add): - length = 2**(l_exp + 8) + l_add - s = tm.rands_array(length, 2) - result = hash_array(s, 'utf8') - assert not result[0] == result[1] - - def test_hash_collisions(self): - - # hash collisions are bad - # https://github.com/pandas-dev/pandas/issues/14711#issuecomment-264885726 - L = ['Ingrid-9Z9fKIZmkO7i7Cn51Li34pJm44fgX6DYGBNj3VPlOH50m7HnBlPxfIwFMrcNJNMP6PSgLmwWnInciMWrCSAlLEvt7JkJl4IxiMrVbXSa8ZQoVaq5xoQPjltuJEfwdNlO6jo8qRRHvD8sBEBMQASrRa6TsdaPTPCBo3nwIBpE7YzzmyH0vMBhjQZLx1aCT7faSEx7PgFxQhHdKFWROcysamgy9iVj8DO2Fmwg1NNl93rIAqC3mdqfrCxrzfvIY8aJdzin2cHVzy3QUJxZgHvtUtOLxoqnUHsYbNTeq0xcLXpTZEZCxD4PGubIuCNf32c33M7HFsnjWSEjE2yVdWKhmSVodyF8hFYVmhYnMCztQnJrt3O8ZvVRXd5IKwlLexiSp4h888w7SzAIcKgc3g5XQJf6MlSMftDXm9lIsE1mJNiJEv6uY6pgvC3fUPhatlR5JPpVAHNSbSEE73MBzJrhCAbOLXQumyOXigZuPoME7QgJcBalliQol7YZ9', # noqa - 'Tim-b9MddTxOWW2AT1Py6vtVbZwGAmYCjbp89p8mxsiFoVX4FyDOF3wFiAkyQTUgwg9sVqVYOZo09Dh1AzhFHbgij52ylF0SEwgzjzHH8TGY8Lypart4p4onnDoDvVMBa0kdthVGKl6K0BDVGzyOXPXKpmnMF1H6rJzqHJ0HywfwS4XYpVwlAkoeNsiicHkJUFdUAhG229INzvIAiJuAHeJDUoyO4DCBqtoZ5TDend6TK7Y914yHlfH3g1WZu5LksKv68VQHJriWFYusW5e6ZZ6dKaMjTwEGuRgdT66iU5nqWTHRH8WSzpXoCFwGcTOwyuqPSe0fTe21DVtJn1FKj9F9nEnR9xOvJUO7E0piCIF4Ad9yAIDY4DBimpsTfKXCu1vdHpKYerzbndfuFe5AhfMduLYZJi5iAw8qKSwR5h86ttXV0Mc0QmXz8dsRvDgxjXSmupPxBggdlqUlC828hXiTPD7am0yETBV0F3bEtvPiNJfremszcV8NcqAoARMe'] # noqa - - # these should be different! 
-        result1 = hash_array(np.asarray(L[0:1], dtype=object), 'utf8')
-        expected1 = np.array([14963968704024874985], dtype=np.uint64)
-        tm.assert_numpy_array_equal(result1, expected1)
-
-        result2 = hash_array(np.asarray(L[1:2], dtype=object), 'utf8')
-        expected2 = np.array([16428432627716348016], dtype=np.uint64)
-        tm.assert_numpy_array_equal(result2, expected2)
-
-        result = hash_array(np.asarray(L, dtype=object), 'utf8')
-        tm.assert_numpy_array_equal(
-            result, np.concatenate([expected1, expected2], axis=0))
+@pytest.fixture(params=[
+    Series([1, 2, 3] * 3, dtype="int32"),
+    Series([None, 2.5, 3.5] * 3, dtype="float32"),
+    Series(["a", "b", "c"] * 3, dtype="category"),
+    Series(["d", "e", "f"] * 3),
+    Series([True, False, True] * 3),
+    Series(pd.date_range("20130101", periods=9)),
+    Series(pd.date_range("20130101", periods=9, tz="US/Eastern")),
+    Series(pd.timedelta_range("2000", periods=9))])
+def series(request):
+    return request.param
+
+
+@pytest.fixture(params=[True, False])
+def index(request):
+    return request.param
+
+
+def _check_equal(obj, **kwargs):
+    """
+    Check that hashing an object produces the same value each time.
+
+    Parameters
+    ----------
+    obj : object
+        The object to hash.
+    kwargs : dict
+        Keyword arguments to pass to the hashing function.
+    """
+    a = hash_pandas_object(obj, **kwargs)
+    b = hash_pandas_object(obj, **kwargs)
+    tm.assert_series_equal(a, b)
+
+
+def _check_not_equal_with_index(obj):
+    """
+    Check that the hash of an object differs with and without its index.
+
+    Parameters
+    ----------
+    obj : object
+        The object to hash.
+    """
+    if not isinstance(obj, Index):
+        a = hash_pandas_object(obj, index=True)
+        b = hash_pandas_object(obj, index=False)
+
+        if len(obj):
+            assert not (a == b).all()
+
+
+def test_consistency():
+    # Check that our hash doesn't change because of a mistake
+    # in the actual code; this is the ground truth.
+    result = hash_pandas_object(Index(["foo", "bar", "baz"]))
+    expected = Series(np.array([3600424527151052760, 1374399572096150070,
+                                477881037637427054], dtype="uint64"),
+                      index=["foo", "bar", "baz"])
+    tm.assert_series_equal(result, expected)
+
+
+def test_hash_array(series):
+    arr = series.values
+    tm.assert_numpy_array_equal(hash_array(arr), hash_array(arr))
+
+
+@pytest.mark.parametrize("arr2", [
+    np.array([3, 4, "All"]),
+    np.array([3, 4, "All"], dtype=object),
+])
+def test_hash_array_mixed(arr2):
+    result1 = hash_array(np.array(["3", "4", "All"]))
+    result2 = hash_array(arr2)
+
+    tm.assert_numpy_array_equal(result1, result2)
+
+
+@pytest.mark.parametrize("val", [5, "foo", pd.Timestamp("20130101")])
+def test_hash_array_errors(val):
+    msg = "must pass a ndarray-like"
+    with pytest.raises(TypeError, match=msg):
+        hash_array(val)
+
+
+def test_hash_tuples():
+    tuples = [(1, "one"), (1, "two"), (2, "one")]
+    result = hash_tuples(tuples)
+
+    expected = hash_pandas_object(MultiIndex.from_tuples(tuples)).values
+    tm.assert_numpy_array_equal(result, expected)
+
+    result = hash_tuples(tuples[0])
+    assert result == expected[0]
+
+
+@pytest.mark.parametrize("tup", [
+    (1, "one"), (1, np.nan), (1.0, pd.NaT, "A"),
+    ("A", pd.Timestamp("2012-01-01"))])
+def test_hash_tuple(tup):
+    # Test equivalence between
+    # hash_tuples and hash_tuple.
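+    #
+    # A minimal illustration of the invariant (hypothetical values):
+    # hash_tuple((1, "one")) is expected to equal
+    # hash_tuples([(1, "one")])[0].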
+ result = hash_tuple(tup) + expected = hash_tuples([tup])[0] + + assert result == expected + + +@pytest.mark.parametrize("val", [ + 1, 1.4, "A", b"A", u"A", pd.Timestamp("2012-01-01"), + pd.Timestamp("2012-01-01", tz="Europe/Brussels"), + datetime.datetime(2012, 1, 1), + pd.Timestamp("2012-01-01", tz="EST").to_pydatetime(), + pd.Timedelta("1 days"), datetime.timedelta(1), + pd.Period("2012-01-01", freq="D"), pd.Interval(0, 1), + np.nan, pd.NaT, None]) +def test_hash_scalar(val): + result = _hash_scalar(val) + expected = hash_array(np.array([val], dtype=object), categorize=True) + + assert result[0] == expected[0] + + +@pytest.mark.parametrize("val", [5, "foo", pd.Timestamp("20130101")]) +def test_hash_tuples_err(val): + msg = "must be convertible to a list-of-tuples" + with pytest.raises(TypeError, match=msg): + hash_tuples(val) + + +def test_multiindex_unique(): + mi = MultiIndex.from_tuples([(118, 472), (236, 118), + (51, 204), (102, 51)]) + assert mi.is_unique is True + + result = hash_pandas_object(mi) + assert result.is_unique is True + + +def test_multiindex_objects(): + mi = MultiIndex(levels=[["b", "d", "a"], [1, 2, 3]], + codes=[[0, 1, 0, 2], [2, 0, 0, 1]], + names=["col1", "col2"]) + recons = mi._sort_levels_monotonic() + + # These are equal. + assert mi.equals(recons) + assert Index(mi.values).equals(Index(recons.values)) + + # _hashed_values and hash_pandas_object(..., index=False) equivalency. + expected = hash_pandas_object(mi, index=False).values + result = mi._hashed_values + + tm.assert_numpy_array_equal(result, expected) + + expected = hash_pandas_object(recons, index=False).values + result = recons._hashed_values + + tm.assert_numpy_array_equal(result, expected) + + expected = mi._hashed_values + result = recons._hashed_values + + # Values should match, but in different order. + tm.assert_numpy_array_equal(np.sort(result), np.sort(expected)) + + +@pytest.mark.parametrize("obj", [ + Series([1, 2, 3]), + Series([1.0, 1.5, 3.2]), + Series([1.0, 1.5, np.nan]), + Series([1.0, 1.5, 3.2], index=[1.5, 1.1, 3.3]), + Series(["a", "b", "c"]), + Series(["a", np.nan, "c"]), + Series(["a", None, "c"]), + Series([True, False, True]), + Series(), + Index([1, 2, 3]), + Index([True, False, True]), + DataFrame({"x": ["a", "b", "c"], "y": [1, 2, 3]}), + DataFrame(), + tm.makeMissingDataframe(), + tm.makeMixedDataFrame(), + tm.makeTimeDataFrame(), + tm.makeTimeSeries(), + tm.makeTimedeltaIndex(), + tm.makePeriodIndex(), + Series(tm.makePeriodIndex()), + Series(pd.date_range("20130101", periods=3, tz="US/Eastern")), + MultiIndex.from_product([range(5), ["foo", "bar", "baz"], + pd.date_range("20130101", periods=2)]), + MultiIndex.from_product([pd.CategoricalIndex(list("aabc")), range(3)]) +]) +def test_hash_pandas_object(obj, index): + _check_equal(obj, index=index) + _check_not_equal_with_index(obj) + + +def test_hash_pandas_object2(series, index): + _check_equal(series, index=index) + _check_not_equal_with_index(series) + + +@pytest.mark.parametrize("obj", [ + Series([], dtype="float64"), Series([], dtype="object"), Index([])]) +def test_hash_pandas_empty_object(obj, index): + # These are by-definition the same with + # or without the index as the data is empty. 
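+    # (Sketch of why: hash_pandas_object(Series([]), index=True) and
+    # hash_pandas_object(Series([]), index=False) both return an empty
+    # uint64 Series, so no pair of values can disagree.)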
+ _check_equal(obj, index=index) + + +@pytest.mark.parametrize("s1", [ + Series(["a", "b", "c", "d"]), + Series([1000, 2000, 3000, 4000]), + Series(pd.date_range(0, periods=4))]) +@pytest.mark.parametrize("categorize", [True, False]) +def test_categorical_consistency(s1, categorize): + # see gh-15143 + # + # Check that categoricals hash consistent with their values, + # not codes. This should work for categoricals of any dtype. + s2 = s1.astype("category").cat.set_categories(s1) + s3 = s2.cat.set_categories(list(reversed(s1))) + + # These should all hash identically. + h1 = hash_pandas_object(s1, categorize=categorize) + h2 = hash_pandas_object(s2, categorize=categorize) + h3 = hash_pandas_object(s3, categorize=categorize) + + tm.assert_series_equal(h1, h2) + tm.assert_series_equal(h1, h3) + + +def test_categorical_with_nan_consistency(): + c = pd.Categorical.from_codes( + [-1, 0, 1, 2, 3, 4], + categories=pd.date_range("2012-01-01", periods=5, name="B")) + expected = hash_array(c, categorize=False) + + c = pd.Categorical.from_codes( + [-1, 0], + categories=[pd.Timestamp("2012-01-01")]) + result = hash_array(c, categorize=False) + + assert result[0] in expected + assert result[1] in expected + + +@pytest.mark.filterwarnings("ignore:\\nPanel:FutureWarning") +@pytest.mark.parametrize("obj", [pd.Timestamp("20130101"), tm.makePanel()]) +def test_pandas_errors(obj): + msg = "Unexpected type for hashing" + with pytest.raises(TypeError, match=msg): + hash_pandas_object(obj) + + +def test_hash_keys(): + # Using different hash keys, should have + # different hashes for the same data. + # + # This only matters for object dtypes. + obj = Series(list("abc")) + + a = hash_pandas_object(obj, hash_key="9876543210123456") + b = hash_pandas_object(obj, hash_key="9876543210123465") + + assert (a != b).all() + + +def test_invalid_key(): + # This only matters for object dtypes. + msg = "key should be a 16-byte string encoded" + + with pytest.raises(ValueError, match=msg): + hash_pandas_object(Series(list("abc")), hash_key="foo") + + +def test_already_encoded(index): + # If already encoded, then ok. + obj = Series(list("abc")).str.encode("utf8") + _check_equal(obj, index=index) + + +def test_alternate_encoding(index): + obj = Series(list("abc")) + _check_equal(obj, index=index, encoding="ascii") + + +@pytest.mark.parametrize("l_exp", range(8)) +@pytest.mark.parametrize("l_add", [0, 1]) +def test_same_len_hash_collisions(l_exp, l_add): + length = 2**(l_exp + 8) + l_add + s = tm.rands_array(length, 2) + + result = hash_array(s, "utf8") + assert not result[0] == result[1] + + +def test_hash_collisions(): + # Hash collisions are bad. 
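+    # The two long, near-identical strings below come from the linked
+    # report; their expected uint64 hashes are pinned (assumed regression
+    # values) so that a change back to colliding hashes fails loudly.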
+ # + # https://github.com/pandas-dev/pandas/issues/14711#issuecomment-264885726 + hashes = ["Ingrid-9Z9fKIZmkO7i7Cn51Li34pJm44fgX6DYGBNj3VPlOH50m7HnBlPxfIwFMrcNJNMP6PSgLmwWnInciMWrCSAlLEvt7JkJl4IxiMrVbXSa8ZQoVaq5xoQPjltuJEfwdNlO6jo8qRRHvD8sBEBMQASrRa6TsdaPTPCBo3nwIBpE7YzzmyH0vMBhjQZLx1aCT7faSEx7PgFxQhHdKFWROcysamgy9iVj8DO2Fmwg1NNl93rIAqC3mdqfrCxrzfvIY8aJdzin2cHVzy3QUJxZgHvtUtOLxoqnUHsYbNTeq0xcLXpTZEZCxD4PGubIuCNf32c33M7HFsnjWSEjE2yVdWKhmSVodyF8hFYVmhYnMCztQnJrt3O8ZvVRXd5IKwlLexiSp4h888w7SzAIcKgc3g5XQJf6MlSMftDXm9lIsE1mJNiJEv6uY6pgvC3fUPhatlR5JPpVAHNSbSEE73MBzJrhCAbOLXQumyOXigZuPoME7QgJcBalliQol7YZ9", # noqa + "Tim-b9MddTxOWW2AT1Py6vtVbZwGAmYCjbp89p8mxsiFoVX4FyDOF3wFiAkyQTUgwg9sVqVYOZo09Dh1AzhFHbgij52ylF0SEwgzjzHH8TGY8Lypart4p4onnDoDvVMBa0kdthVGKl6K0BDVGzyOXPXKpmnMF1H6rJzqHJ0HywfwS4XYpVwlAkoeNsiicHkJUFdUAhG229INzvIAiJuAHeJDUoyO4DCBqtoZ5TDend6TK7Y914yHlfH3g1WZu5LksKv68VQHJriWFYusW5e6ZZ6dKaMjTwEGuRgdT66iU5nqWTHRH8WSzpXoCFwGcTOwyuqPSe0fTe21DVtJn1FKj9F9nEnR9xOvJUO7E0piCIF4Ad9yAIDY4DBimpsTfKXCu1vdHpKYerzbndfuFe5AhfMduLYZJi5iAw8qKSwR5h86ttXV0Mc0QmXz8dsRvDgxjXSmupPxBggdlqUlC828hXiTPD7am0yETBV0F3bEtvPiNJfremszcV8NcqAoARMe"] # noqa + + # These should be different. + result1 = hash_array(np.asarray(hashes[0:1], dtype=object), "utf8") + expected1 = np.array([14963968704024874985], dtype=np.uint64) + tm.assert_numpy_array_equal(result1, expected1) + + result2 = hash_array(np.asarray(hashes[1:2], dtype=object), "utf8") + expected2 = np.array([16428432627716348016], dtype=np.uint64) + tm.assert_numpy_array_equal(result2, expected2) + + result = hash_array(np.asarray(hashes, dtype=object), "utf8") + tm.assert_numpy_array_equal(result, np.concatenate([expected1, + expected2], axis=0)) diff --git a/pandas/tests/util/test_locale.py b/pandas/tests/util/test_locale.py new file mode 100644 index 0000000000000..b848b22994e7a --- /dev/null +++ b/pandas/tests/util/test_locale.py @@ -0,0 +1,94 @@ +# -*- coding: utf-8 -*- +import codecs +import locale +import os + +import pytest + +from pandas.compat import is_platform_windows + +import pandas.core.common as com +import pandas.util.testing as tm + +_all_locales = tm.get_locales() or [] +_current_locale = locale.getlocale() + +# Don't run any of these tests if we are on Windows or have no locales. +pytestmark = pytest.mark.skipif(is_platform_windows() or not _all_locales, + reason="Need non-Windows and locales") + +_skip_if_only_one_locale = pytest.mark.skipif( + len(_all_locales) <= 1, reason="Need multiple locales for meaningful test") + + +def test_can_set_locale_valid_set(): + # Can set the default locale. + assert tm.can_set_locale("") + + +def test_can_set_locale_invalid_set(): + # Cannot set an invalid locale. + assert not tm.can_set_locale("non-existent_locale") + + +def test_can_set_locale_invalid_get(monkeypatch): + # see gh-22129 + # + # In some cases, an invalid locale can be set, + # but a subsequent getlocale() raises a ValueError. + + def mock_get_locale(): + raise ValueError() + + with monkeypatch.context() as m: + m.setattr(locale, "getlocale", mock_get_locale) + assert not tm.can_set_locale("") + + +def test_get_locales_at_least_one(): + # see gh-9744 + assert len(_all_locales) > 0 + + +@_skip_if_only_one_locale +def test_get_locales_prefix(): + first_locale = _all_locales[0] + assert len(tm.get_locales(prefix=first_locale[:2])) > 0 + + +@_skip_if_only_one_locale +def test_set_locale(): + if com._all_none(_current_locale): + # Not sure why, but on some Travis runs with pytest, + # getlocale() returned (None, None). 
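+        # Skipping is the safe choice here: with no meaningful current
+        # locale there is nothing to restore, so the restoration check at
+        # the end of this test could not be verified anyway.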
+ pytest.skip("Current locale is not set.") + + locale_override = os.environ.get("LOCALE_OVERRIDE", None) + + if locale_override is None: + lang, enc = "it_CH", "UTF-8" + elif locale_override == "C": + lang, enc = "en_US", "ascii" + else: + lang, enc = locale_override.split(".") + + enc = codecs.lookup(enc).name + new_locale = lang, enc + + if not tm.can_set_locale(new_locale): + msg = "unsupported locale setting" + + with pytest.raises(locale.Error, match=msg): + with tm.set_locale(new_locale): + pass + else: + with tm.set_locale(new_locale) as normalized_locale: + new_lang, new_enc = normalized_locale.split(".") + new_enc = codecs.lookup(enc).name + + normalized_locale = new_lang, new_enc + assert normalized_locale == new_locale + + # Once we exit the "with" statement, locale should be back to what it was. + current_locale = locale.getlocale() + assert current_locale == _current_locale diff --git a/pandas/tests/util/test_move.py b/pandas/tests/util/test_move.py new file mode 100644 index 0000000000000..ef98f2032e6ca --- /dev/null +++ b/pandas/tests/util/test_move.py @@ -0,0 +1,79 @@ +# -*- coding: utf-8 -*- +import sys +from uuid import uuid4 + +import pytest + +from pandas.compat import PY3, intern +from pandas.util._move import BadMove, move_into_mutable_buffer, stolenbuf + + +def test_cannot_create_instance_of_stolen_buffer(): + # Stolen buffers need to be created through the smart constructor + # "move_into_mutable_buffer," which has a bunch of checks in it. + + msg = "cannot create 'pandas.util._move.stolenbuf' instances" + with pytest.raises(TypeError, match=msg): + stolenbuf() + + +def test_more_than_one_ref(): + # Test case for when we try to use "move_into_mutable_buffer" + # when the object being moved has other references. + + b = b"testing" + + with pytest.raises(BadMove, match="testing") as e: + def handle_success(type_, value, tb): + assert value.args[0] is b + return type(e).handle_success(e, type_, value, tb) # super + + e.handle_success = handle_success + move_into_mutable_buffer(b) + + +def test_exactly_one_ref(): + # Test case for when the object being moved has exactly one reference. + + b = b"testing" + + # We need to pass an expression on the stack to ensure that there are + # not extra references hanging around. We cannot rewrite this test as + # buf = b[:-3] + # as_stolen_buf = move_into_mutable_buffer(buf) + # because then we would have more than one reference to buf. + as_stolen_buf = move_into_mutable_buffer(b[:-3]) + + # Materialize as byte-array to show that it is mutable. + assert bytearray(as_stolen_buf) == b"test" + + +@pytest.mark.skipif(PY3, reason="bytes objects cannot be interned in PY3") +def test_interned(): + salt = uuid4().hex + + def make_string(): + # We need to actually create a new string so that it has refcount + # one. We use a uuid so that we know the string could not already + # be in the intern table. + return "".join(("testing: ", salt)) + + # This should work, the string has one reference on the stack. + move_into_mutable_buffer(make_string()) + refcount = [None] # nonlocal + + def ref_capture(ob): + # Subtract two because those are the references owned by this frame: + # 1. The local variables of this stack frame. + # 2. The python data stack of this stack frame. + refcount[0] = sys.getrefcount(ob) - 2 + return ob + + with pytest.raises(BadMove, match="testing"): + # If we intern the string, it will still have one reference. 
Now, + # it is in the intern table, so if other people intern the same + # string while the mutable buffer holds the first string they will + # be the same instance. + move_into_mutable_buffer(ref_capture(intern(make_string()))) # noqa + + assert refcount[0] == 1 diff --git a/pandas/tests/util/test_safe_import.py b/pandas/tests/util/test_safe_import.py new file mode 100644 index 0000000000000..a9c52ef788390 --- /dev/null +++ b/pandas/tests/util/test_safe_import.py @@ -0,0 +1,45 @@ +# -*- coding: utf-8 -*- +import sys +import types + +import pytest + +import pandas.util._test_decorators as td + + +@pytest.mark.parametrize("name", ["foo", "hello123"]) +def test_safe_import_non_existent(name): + assert not td.safe_import(name) + + +def test_safe_import_exists(): + assert td.safe_import("pandas") + + +@pytest.mark.parametrize("min_version,valid", [ + ("0.0.0", True), + ("99.99.99", False) +]) +def test_safe_import_versions(min_version, valid): + result = td.safe_import("pandas", min_version=min_version) + result = result if valid else not result + assert result + + +@pytest.mark.parametrize("min_version,valid", [ + (None, False), + ("1.0", True), + ("2.0", False) +]) +def test_safe_import_dummy(monkeypatch, min_version, valid): + mod_name = "hello123" + + mod = types.ModuleType(mod_name) + mod.__version__ = "1.5" + + if min_version is not None: + monkeypatch.setitem(sys.modules, mod_name, mod) + + result = td.safe_import(mod_name, min_version=min_version) + result = result if valid else not result + assert result diff --git a/pandas/tests/util/test_testing.py b/pandas/tests/util/test_testing.py deleted file mode 100644 index e649cea14ec39..0000000000000 --- a/pandas/tests/util/test_testing.py +++ /dev/null @@ -1,984 +0,0 @@ -# -*- coding: utf-8 -*- -import os -import sys -import textwrap - -import numpy as np -import pytest - -from pandas.compat import raise_with_traceback -import pandas.util._test_decorators as td - -import pandas as pd -from pandas import DataFrame, Series, compat -from pandas.core.arrays.sparse import SparseArray -import pandas.util.testing as tm -from pandas.util.testing import ( - RNGContext, assert_almost_equal, assert_extension_array_equal, - assert_frame_equal, assert_index_equal, assert_numpy_array_equal, - assert_series_equal) - - -class TestAssertAlmostEqual(object): - - def _assert_almost_equal_both(self, a, b, **kwargs): - assert_almost_equal(a, b, **kwargs) - assert_almost_equal(b, a, **kwargs) - - def _assert_not_almost_equal_both(self, a, b, **kwargs): - pytest.raises(AssertionError, assert_almost_equal, a, b, **kwargs) - pytest.raises(AssertionError, assert_almost_equal, b, a, **kwargs) - - def test_assert_almost_equal_numbers(self): - self._assert_almost_equal_both(1.1, 1.1) - self._assert_almost_equal_both(1.1, 1.100001) - self._assert_almost_equal_both(np.int16(1), 1.000001) - self._assert_almost_equal_both(np.float64(1.1), 1.1) - self._assert_almost_equal_both(np.uint32(5), 5) - - self._assert_not_almost_equal_both(1.1, 1) - self._assert_not_almost_equal_both(1.1, True) - self._assert_not_almost_equal_both(1, 2) - self._assert_not_almost_equal_both(1.0001, np.int16(1)) - - def test_assert_almost_equal_numbers_with_zeros(self): - self._assert_almost_equal_both(0, 0) - self._assert_almost_equal_both(0, 0.0) - self._assert_almost_equal_both(0, np.float64(0)) - self._assert_almost_equal_both(0.000001, 0) - - self._assert_not_almost_equal_both(0.001, 0) - self._assert_not_almost_equal_both(1, 0) - - def test_assert_almost_equal_numbers_with_mixed(self): - 
self._assert_not_almost_equal_both(1, 'abc') - self._assert_not_almost_equal_both(1, [1, ]) - self._assert_not_almost_equal_both(1, object()) - - @pytest.mark.parametrize( - "left_dtype", - ['M8[ns]', 'm8[ns]', 'float64', 'int64', 'object']) - @pytest.mark.parametrize( - "right_dtype", - ['M8[ns]', 'm8[ns]', 'float64', 'int64', 'object']) - def test_assert_almost_equal_edge_case_ndarrays( - self, left_dtype, right_dtype): - - # empty compare - self._assert_almost_equal_both(np.array([], dtype=left_dtype), - np.array([], dtype=right_dtype), - check_dtype=False) - - def test_assert_almost_equal_dicts(self): - self._assert_almost_equal_both({'a': 1, 'b': 2}, {'a': 1, 'b': 2}) - - self._assert_not_almost_equal_both({'a': 1, 'b': 2}, {'a': 1, 'b': 3}) - self._assert_not_almost_equal_both({'a': 1, 'b': 2}, - {'a': 1, 'b': 2, 'c': 3}) - self._assert_not_almost_equal_both({'a': 1}, 1) - self._assert_not_almost_equal_both({'a': 1}, 'abc') - self._assert_not_almost_equal_both({'a': 1}, [1, ]) - - def test_assert_almost_equal_dict_like_object(self): - class DictLikeObj(object): - - def keys(self): - return ('a', ) - - def __getitem__(self, item): - if item == 'a': - return 1 - - self._assert_almost_equal_both({'a': 1}, DictLikeObj(), - check_dtype=False) - - self._assert_not_almost_equal_both({'a': 2}, DictLikeObj(), - check_dtype=False) - - def test_assert_almost_equal_strings(self): - self._assert_almost_equal_both('abc', 'abc') - - self._assert_not_almost_equal_both('abc', 'abcd') - self._assert_not_almost_equal_both('abc', 'abd') - self._assert_not_almost_equal_both('abc', 1) - self._assert_not_almost_equal_both('abc', [1, ]) - - def test_assert_almost_equal_iterables(self): - self._assert_almost_equal_both([1, 2, 3], [1, 2, 3]) - self._assert_almost_equal_both(np.array([1, 2, 3]), - np.array([1, 2, 3])) - - # class / dtype are different - self._assert_not_almost_equal_both(np.array([1, 2, 3]), [1, 2, 3]) - self._assert_not_almost_equal_both(np.array([1, 2, 3]), - np.array([1., 2., 3.])) - - # Can't compare generators - self._assert_not_almost_equal_both(iter([1, 2, 3]), [1, 2, 3]) - - self._assert_not_almost_equal_both([1, 2, 3], [1, 2, 4]) - self._assert_not_almost_equal_both([1, 2, 3], [1, 2, 3, 4]) - self._assert_not_almost_equal_both([1, 2, 3], 1) - - def test_assert_almost_equal_null(self): - self._assert_almost_equal_both(None, None) - - self._assert_not_almost_equal_both(None, np.NaN) - self._assert_not_almost_equal_both(None, 0) - self._assert_not_almost_equal_both(np.NaN, 0) - - def test_assert_almost_equal_inf(self): - self._assert_almost_equal_both(np.inf, np.inf) - self._assert_almost_equal_both(np.inf, float("inf")) - self._assert_not_almost_equal_both(np.inf, 0) - self._assert_almost_equal_both(np.array([np.inf, np.nan, -np.inf]), - np.array([np.inf, np.nan, -np.inf])) - self._assert_almost_equal_both(np.array([np.inf, None, -np.inf], - dtype=np.object_), - np.array([np.inf, np.nan, -np.inf], - dtype=np.object_)) - - def test_assert_almost_equal_pandas(self): - tm.assert_almost_equal(pd.Index([1., 1.1]), - pd.Index([1., 1.100001])) - tm.assert_almost_equal(pd.Series([1., 1.1]), - pd.Series([1., 1.100001])) - tm.assert_almost_equal(pd.DataFrame({'a': [1., 1.1]}), - pd.DataFrame({'a': [1., 1.100001]})) - - def test_assert_almost_equal_object(self): - a = [pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-01')] - b = [pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-01')] - self._assert_almost_equal_both(a, b) - - -class TestUtilTesting(object): - - def 
test_raise_with_traceback(self): - with pytest.raises(LookupError, match="error_text"): - try: - raise ValueError("THIS IS AN ERROR") - except ValueError as e: - e = LookupError("error_text") - raise_with_traceback(e) - with pytest.raises(LookupError, match="error_text"): - try: - raise ValueError("This is another error") - except ValueError: - e = LookupError("error_text") - _, _, traceback = sys.exc_info() - raise_with_traceback(e, traceback) - - def test_convert_rows_list_to_csv_str(self): - rows_list = ["aaa", "bbb", "ccc"] - ret = tm.convert_rows_list_to_csv_str(rows_list) - - if compat.is_platform_windows(): - expected = "aaa\r\nbbb\r\nccc\r\n" - else: - expected = "aaa\nbbb\nccc\n" - - assert ret == expected - - -class TestAssertNumpyArrayEqual(object): - - @td.skip_if_windows - def test_numpy_array_equal_message(self): - - expected = """numpy array are different - -numpy array shapes are different -\\[left\\]: \\(2,\\) -\\[right\\]: \\(3,\\)""" - - with pytest.raises(AssertionError, match=expected): - assert_numpy_array_equal(np.array([1, 2]), np.array([3, 4, 5])) - - with pytest.raises(AssertionError, match=expected): - assert_almost_equal(np.array([1, 2]), np.array([3, 4, 5])) - - # scalar comparison - expected = """Expected type """ - with pytest.raises(AssertionError, match=expected): - assert_numpy_array_equal(1, 2) - expected = """expected 2\\.00000 but got 1\\.00000, with decimal 5""" - with pytest.raises(AssertionError, match=expected): - assert_almost_equal(1, 2) - - # array / scalar array comparison - expected = """numpy array are different - -numpy array classes are different -\\[left\\]: ndarray -\\[right\\]: int""" - - with pytest.raises(AssertionError, match=expected): - # numpy_array_equal only accepts np.ndarray - assert_numpy_array_equal(np.array([1]), 1) - with pytest.raises(AssertionError, match=expected): - assert_almost_equal(np.array([1]), 1) - - # scalar / array comparison - expected = """numpy array are different - -numpy array classes are different -\\[left\\]: int -\\[right\\]: ndarray""" - - with pytest.raises(AssertionError, match=expected): - assert_numpy_array_equal(1, np.array([1])) - with pytest.raises(AssertionError, match=expected): - assert_almost_equal(1, np.array([1])) - - expected = """numpy array are different - -numpy array values are different \\(66\\.66667 %\\) -\\[left\\]: \\[nan, 2\\.0, 3\\.0\\] -\\[right\\]: \\[1\\.0, nan, 3\\.0\\]""" - - with pytest.raises(AssertionError, match=expected): - assert_numpy_array_equal(np.array([np.nan, 2, 3]), - np.array([1, np.nan, 3])) - with pytest.raises(AssertionError, match=expected): - assert_almost_equal(np.array([np.nan, 2, 3]), - np.array([1, np.nan, 3])) - - expected = """numpy array are different - -numpy array values are different \\(50\\.0 %\\) -\\[left\\]: \\[1, 2\\] -\\[right\\]: \\[1, 3\\]""" - - with pytest.raises(AssertionError, match=expected): - assert_numpy_array_equal(np.array([1, 2]), np.array([1, 3])) - with pytest.raises(AssertionError, match=expected): - assert_almost_equal(np.array([1, 2]), np.array([1, 3])) - - expected = """numpy array are different - -numpy array values are different \\(50\\.0 %\\) -\\[left\\]: \\[1\\.1, 2\\.000001\\] -\\[right\\]: \\[1\\.1, 2.0\\]""" - - with pytest.raises(AssertionError, match=expected): - assert_numpy_array_equal( - np.array([1.1, 2.000001]), np.array([1.1, 2.0])) - - # must pass - assert_almost_equal(np.array([1.1, 2.000001]), np.array([1.1, 2.0])) - - expected = """numpy array are different - -numpy array values are different 
\\(16\\.66667 %\\) -\\[left\\]: \\[\\[1, 2\\], \\[3, 4\\], \\[5, 6\\]\\] -\\[right\\]: \\[\\[1, 3\\], \\[3, 4\\], \\[5, 6\\]\\]""" - - with pytest.raises(AssertionError, match=expected): - assert_numpy_array_equal(np.array([[1, 2], [3, 4], [5, 6]]), - np.array([[1, 3], [3, 4], [5, 6]])) - with pytest.raises(AssertionError, match=expected): - assert_almost_equal(np.array([[1, 2], [3, 4], [5, 6]]), - np.array([[1, 3], [3, 4], [5, 6]])) - - expected = """numpy array are different - -numpy array values are different \\(25\\.0 %\\) -\\[left\\]: \\[\\[1, 2\\], \\[3, 4\\]\\] -\\[right\\]: \\[\\[1, 3\\], \\[3, 4\\]\\]""" - - with pytest.raises(AssertionError, match=expected): - assert_numpy_array_equal(np.array([[1, 2], [3, 4]]), - np.array([[1, 3], [3, 4]])) - with pytest.raises(AssertionError, match=expected): - assert_almost_equal(np.array([[1, 2], [3, 4]]), - np.array([[1, 3], [3, 4]])) - - # allow to overwrite message - expected = """Index are different - -Index shapes are different -\\[left\\]: \\(2,\\) -\\[right\\]: \\(3,\\)""" - - with pytest.raises(AssertionError, match=expected): - assert_numpy_array_equal(np.array([1, 2]), np.array([3, 4, 5]), - obj='Index') - with pytest.raises(AssertionError, match=expected): - assert_almost_equal(np.array([1, 2]), np.array([3, 4, 5]), - obj='Index') - - def test_numpy_array_equal_unicode_message(self): - # Test ensures that `assert_numpy_array_equals` raises the right - # exception when comparing np.arrays containing differing - # unicode objects (#20503) - - expected = """numpy array are different - -numpy array values are different \\(33\\.33333 %\\) -\\[left\\]: \\[á, à, ä\\] -\\[right\\]: \\[á, à, å\\]""" - - with pytest.raises(AssertionError, match=expected): - assert_numpy_array_equal(np.array([u'á', u'à', u'ä']), - np.array([u'á', u'à', u'å'])) - with pytest.raises(AssertionError, match=expected): - assert_almost_equal(np.array([u'á', u'à', u'ä']), - np.array([u'á', u'à', u'å'])) - - @td.skip_if_windows - def test_numpy_array_equal_object_message(self): - - a = np.array([pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-01')]) - b = np.array([pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-02')]) - - expected = """numpy array are different - -numpy array values are different \\(50\\.0 %\\) -\\[left\\]: \\[2011-01-01 00:00:00, 2011-01-01 00:00:00\\] -\\[right\\]: \\[2011-01-01 00:00:00, 2011-01-02 00:00:00\\]""" - - with pytest.raises(AssertionError, match=expected): - assert_numpy_array_equal(a, b) - with pytest.raises(AssertionError, match=expected): - assert_almost_equal(a, b) - - def test_numpy_array_equal_copy_flag(self): - a = np.array([1, 2, 3]) - b = a.copy() - c = a.view() - expected = r'array\(\[1, 2, 3\]\) is not array\(\[1, 2, 3\]\)' - with pytest.raises(AssertionError, match=expected): - assert_numpy_array_equal(a, b, check_same='same') - expected = r'array\(\[1, 2, 3\]\) is array\(\[1, 2, 3\]\)' - with pytest.raises(AssertionError, match=expected): - assert_numpy_array_equal(a, c, check_same='copy') - - def test_assert_almost_equal_iterable_message(self): - - expected = """Iterable are different - -Iterable length are different -\\[left\\]: 2 -\\[right\\]: 3""" - - with pytest.raises(AssertionError, match=expected): - assert_almost_equal([1, 2], [3, 4, 5]) - - expected = """Iterable are different - -Iterable values are different \\(50\\.0 %\\) -\\[left\\]: \\[1, 2\\] -\\[right\\]: \\[1, 3\\]""" - - with pytest.raises(AssertionError, match=expected): - assert_almost_equal([1, 2], [1, 3]) - - -class TestAssertIndexEqual(object): - 
- def test_index_equal_message(self): - - expected = """Index are different - -Index levels are different -\\[left\\]: 1, Int64Index\\(\\[1, 2, 3\\], dtype='int64'\\) -\\[right\\]: 2, MultiIndex\\(levels=\\[\\[u?'A', u?'B'\\], \\[1, 2, 3, 4\\]\\], - labels=\\[\\[0, 0, 1, 1\\], \\[0, 1, 2, 3\\]\\]\\)""" - - idx1 = pd.Index([1, 2, 3]) - idx2 = pd.MultiIndex.from_tuples([('A', 1), ('A', 2), - ('B', 3), ('B', 4)]) - with pytest.raises(AssertionError, match=expected): - assert_index_equal(idx1, idx2, exact=False) - - expected = """MultiIndex level \\[1\\] are different - -MultiIndex level \\[1\\] values are different \\(25\\.0 %\\) -\\[left\\]: Int64Index\\(\\[2, 2, 3, 4\\], dtype='int64'\\) -\\[right\\]: Int64Index\\(\\[1, 2, 3, 4\\], dtype='int64'\\)""" - - idx1 = pd.MultiIndex.from_tuples([('A', 2), ('A', 2), - ('B', 3), ('B', 4)]) - idx2 = pd.MultiIndex.from_tuples([('A', 1), ('A', 2), - ('B', 3), ('B', 4)]) - with pytest.raises(AssertionError, match=expected): - assert_index_equal(idx1, idx2) - with pytest.raises(AssertionError, match=expected): - assert_index_equal(idx1, idx2, check_exact=False) - - expected = """Index are different - -Index length are different -\\[left\\]: 3, Int64Index\\(\\[1, 2, 3\\], dtype='int64'\\) -\\[right\\]: 4, Int64Index\\(\\[1, 2, 3, 4\\], dtype='int64'\\)""" - - idx1 = pd.Index([1, 2, 3]) - idx2 = pd.Index([1, 2, 3, 4]) - with pytest.raises(AssertionError, match=expected): - assert_index_equal(idx1, idx2) - with pytest.raises(AssertionError, match=expected): - assert_index_equal(idx1, idx2, check_exact=False) - - expected = """Index are different - -Index classes are different -\\[left\\]: Int64Index\\(\\[1, 2, 3\\], dtype='int64'\\) -\\[right\\]: Float64Index\\(\\[1\\.0, 2\\.0, 3\\.0\\], dtype='float64'\\)""" - - idx1 = pd.Index([1, 2, 3]) - idx2 = pd.Index([1, 2, 3.0]) - with pytest.raises(AssertionError, match=expected): - assert_index_equal(idx1, idx2, exact=True) - with pytest.raises(AssertionError, match=expected): - assert_index_equal(idx1, idx2, exact=True, check_exact=False) - - expected = """Index are different - -Index values are different \\(33\\.33333 %\\) -\\[left\\]: Float64Index\\(\\[1.0, 2.0, 3.0], dtype='float64'\\) -\\[right\\]: Float64Index\\(\\[1.0, 2.0, 3.0000000001\\], dtype='float64'\\)""" - - idx1 = pd.Index([1, 2, 3.]) - idx2 = pd.Index([1, 2, 3.0000000001]) - with pytest.raises(AssertionError, match=expected): - assert_index_equal(idx1, idx2) - - # must success - assert_index_equal(idx1, idx2, check_exact=False) - - expected = """Index are different - -Index values are different \\(33\\.33333 %\\) -\\[left\\]: Float64Index\\(\\[1.0, 2.0, 3.0], dtype='float64'\\) -\\[right\\]: Float64Index\\(\\[1.0, 2.0, 3.0001\\], dtype='float64'\\)""" - - idx1 = pd.Index([1, 2, 3.]) - idx2 = pd.Index([1, 2, 3.0001]) - with pytest.raises(AssertionError, match=expected): - assert_index_equal(idx1, idx2) - with pytest.raises(AssertionError, match=expected): - assert_index_equal(idx1, idx2, check_exact=False) - # must success - assert_index_equal(idx1, idx2, check_exact=False, - check_less_precise=True) - - expected = """Index are different - -Index values are different \\(33\\.33333 %\\) -\\[left\\]: Int64Index\\(\\[1, 2, 3\\], dtype='int64'\\) -\\[right\\]: Int64Index\\(\\[1, 2, 4\\], dtype='int64'\\)""" - - idx1 = pd.Index([1, 2, 3]) - idx2 = pd.Index([1, 2, 4]) - with pytest.raises(AssertionError, match=expected): - assert_index_equal(idx1, idx2) - with pytest.raises(AssertionError, match=expected): - assert_index_equal(idx1, idx2, 
check_less_precise=True) - - expected = """MultiIndex level \\[1\\] are different - -MultiIndex level \\[1\\] values are different \\(25\\.0 %\\) -\\[left\\]: Int64Index\\(\\[2, 2, 3, 4\\], dtype='int64'\\) -\\[right\\]: Int64Index\\(\\[1, 2, 3, 4\\], dtype='int64'\\)""" - - idx1 = pd.MultiIndex.from_tuples([('A', 2), ('A', 2), - ('B', 3), ('B', 4)]) - idx2 = pd.MultiIndex.from_tuples([('A', 1), ('A', 2), - ('B', 3), ('B', 4)]) - with pytest.raises(AssertionError, match=expected): - assert_index_equal(idx1, idx2) - with pytest.raises(AssertionError, match=expected): - assert_index_equal(idx1, idx2, check_exact=False) - - def test_index_equal_metadata_message(self): - - expected = """Index are different - -Attribute "names" are different -\\[left\\]: \\[None\\] -\\[right\\]: \\[u?'x'\\]""" - - idx1 = pd.Index([1, 2, 3]) - idx2 = pd.Index([1, 2, 3], name='x') - with pytest.raises(AssertionError, match=expected): - assert_index_equal(idx1, idx2) - - # same name, should pass - assert_index_equal(pd.Index([1, 2, 3], name=np.nan), - pd.Index([1, 2, 3], name=np.nan)) - assert_index_equal(pd.Index([1, 2, 3], name=pd.NaT), - pd.Index([1, 2, 3], name=pd.NaT)) - - expected = """Index are different - -Attribute "names" are different -\\[left\\]: \\[nan\\] -\\[right\\]: \\[NaT\\]""" - - idx1 = pd.Index([1, 2, 3], name=np.nan) - idx2 = pd.Index([1, 2, 3], name=pd.NaT) - with pytest.raises(AssertionError, match=expected): - assert_index_equal(idx1, idx2) - - def test_categorical_index_equality(self): - expected = """Index are different - -Attribute "dtype" are different -\\[left\\]: CategoricalDtype\\(categories=\\[u?'a', u?'b'\\], ordered=False\\) -\\[right\\]: CategoricalDtype\\(categories=\\[u?'a', u?'b', u?'c'\\], \ -ordered=False\\)""" - - with pytest.raises(AssertionError, match=expected): - assert_index_equal(pd.Index(pd.Categorical(['a', 'b'])), - pd.Index(pd.Categorical(['a', 'b'], - categories=['a', 'b', 'c']))) - - def test_categorical_index_equality_relax_categories_check(self): - assert_index_equal(pd.Index(pd.Categorical(['a', 'b'])), - pd.Index(pd.Categorical(['a', 'b'], - categories=['a', 'b', 'c'])), - check_categorical=False) - - -class TestAssertSeriesEqual(object): - - def _assert_equal(self, x, y, **kwargs): - assert_series_equal(x, y, **kwargs) - assert_series_equal(y, x, **kwargs) - - def _assert_not_equal(self, a, b, **kwargs): - pytest.raises(AssertionError, assert_series_equal, a, b, **kwargs) - pytest.raises(AssertionError, assert_series_equal, b, a, **kwargs) - - def test_equal(self): - self._assert_equal(Series(range(3)), Series(range(3))) - self._assert_equal(Series(list('abc')), Series(list('abc'))) - self._assert_equal(Series(list(u'áàä')), Series(list(u'áàä'))) - - def test_not_equal(self): - self._assert_not_equal(Series(range(3)), Series(range(3)) + 1) - self._assert_not_equal(Series(list('abc')), Series(list('xyz'))) - self._assert_not_equal(Series(list(u'áàä')), Series(list(u'éèë'))) - self._assert_not_equal(Series(list(u'áàä')), Series(list(b'aaa'))) - self._assert_not_equal(Series(range(3)), Series(range(4))) - self._assert_not_equal( - Series(range(3)), Series( - range(3), dtype='float64')) - self._assert_not_equal( - Series(range(3)), Series( - range(3), index=[1, 2, 4])) - - # ATM meta data is not checked in assert_series_equal - # self._assert_not_equal(Series(range(3)),Series(range(3),name='foo'),check_names=True) - - def test_less_precise(self): - s1 = Series([0.12345], dtype='float64') - s2 = Series([0.12346], dtype='float64') - - 
pytest.raises(AssertionError, assert_series_equal, s1, s2) - self._assert_equal(s1, s2, check_less_precise=True) - for i in range(4): - self._assert_equal(s1, s2, check_less_precise=i) - pytest.raises(AssertionError, assert_series_equal, s1, s2, 10) - - s1 = Series([0.12345], dtype='float32') - s2 = Series([0.12346], dtype='float32') - - pytest.raises(AssertionError, assert_series_equal, s1, s2) - self._assert_equal(s1, s2, check_less_precise=True) - for i in range(4): - self._assert_equal(s1, s2, check_less_precise=i) - pytest.raises(AssertionError, assert_series_equal, s1, s2, 10) - - # even less than less precise - s1 = Series([0.1235], dtype='float32') - s2 = Series([0.1236], dtype='float32') - - pytest.raises(AssertionError, assert_series_equal, s1, s2) - pytest.raises(AssertionError, assert_series_equal, s1, s2, True) - - def test_index_dtype(self): - df1 = DataFrame.from_records( - {'a': [1, 2], 'c': ['l1', 'l2']}, index=['a']) - df2 = DataFrame.from_records( - {'a': [1.0, 2.0], 'c': ['l1', 'l2']}, index=['a']) - self._assert_not_equal(df1.c, df2.c, check_index_type=True) - - def test_multiindex_dtype(self): - df1 = DataFrame.from_records( - {'a': [1, 2], 'b': [2.1, 1.5], - 'c': ['l1', 'l2']}, index=['a', 'b']) - df2 = DataFrame.from_records( - {'a': [1.0, 2.0], 'b': [2.1, 1.5], - 'c': ['l1', 'l2']}, index=['a', 'b']) - self._assert_not_equal(df1.c, df2.c, check_index_type=True) - - def test_series_equal_message(self): - - expected = """Series are different - -Series length are different -\\[left\\]: 3, RangeIndex\\(start=0, stop=3, step=1\\) -\\[right\\]: 4, RangeIndex\\(start=0, stop=4, step=1\\)""" - - with pytest.raises(AssertionError, match=expected): - assert_series_equal(pd.Series([1, 2, 3]), pd.Series([1, 2, 3, 4])) - - expected = """Series are different - -Series values are different \\(33\\.33333 %\\) -\\[left\\]: \\[1, 2, 3\\] -\\[right\\]: \\[1, 2, 4\\]""" - - with pytest.raises(AssertionError, match=expected): - assert_series_equal(pd.Series([1, 2, 3]), pd.Series([1, 2, 4])) - with pytest.raises(AssertionError, match=expected): - assert_series_equal(pd.Series([1, 2, 3]), pd.Series([1, 2, 4]), - check_less_precise=True) - - def test_categorical_series_equality(self): - expected = """Attributes are different - -Attribute "dtype" are different -\\[left\\]: CategoricalDtype\\(categories=\\[u?'a', u?'b'\\], ordered=False\\) -\\[right\\]: CategoricalDtype\\(categories=\\[u?'a', u?'b', u?'c'\\], \ -ordered=False\\)""" - - with pytest.raises(AssertionError, match=expected): - assert_series_equal(pd.Series(pd.Categorical(['a', 'b'])), - pd.Series(pd.Categorical(['a', 'b'], - categories=['a', 'b', 'c']))) - - def test_categorical_series_equality_relax_categories_check(self): - assert_series_equal(pd.Series(pd.Categorical(['a', 'b'])), - pd.Series(pd.Categorical(['a', 'b'], - categories=['a', 'b', 'c'])), - check_categorical=False) - - -class TestAssertFrameEqual(object): - - def _assert_equal(self, x, y, **kwargs): - assert_frame_equal(x, y, **kwargs) - assert_frame_equal(y, x, **kwargs) - - def _assert_not_equal(self, a, b, **kwargs): - pytest.raises(AssertionError, assert_frame_equal, a, b, **kwargs) - pytest.raises(AssertionError, assert_frame_equal, b, a, **kwargs) - - def test_equal_with_different_row_order(self): - # check_like=True ignores row-column orderings - df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, - index=['a', 'b', 'c']) - df2 = pd.DataFrame({'A': [3, 2, 1], 'B': [6, 5, 4]}, - index=['c', 'b', 'a']) - - self._assert_equal(df1, df2, check_like=True) - 
self._assert_not_equal(df1, df2) - - def test_not_equal_with_different_shape(self): - self._assert_not_equal(pd.DataFrame({'A': [1, 2, 3]}), - pd.DataFrame({'A': [1, 2, 3, 4]})) - - def test_index_dtype(self): - df1 = DataFrame.from_records( - {'a': [1, 2], 'c': ['l1', 'l2']}, index=['a']) - df2 = DataFrame.from_records( - {'a': [1.0, 2.0], 'c': ['l1', 'l2']}, index=['a']) - self._assert_not_equal(df1, df2, check_index_type=True) - - def test_multiindex_dtype(self): - df1 = DataFrame.from_records( - {'a': [1, 2], 'b': [2.1, 1.5], - 'c': ['l1', 'l2']}, index=['a', 'b']) - df2 = DataFrame.from_records( - {'a': [1.0, 2.0], 'b': [2.1, 1.5], - 'c': ['l1', 'l2']}, index=['a', 'b']) - self._assert_not_equal(df1, df2, check_index_type=True) - - def test_empty_dtypes(self): - df1 = pd.DataFrame(columns=["col1", "col2"]) - df1["col1"] = df1["col1"].astype('int64') - df2 = pd.DataFrame(columns=["col1", "col2"]) - self._assert_equal(df1, df2, check_dtype=False) - self._assert_not_equal(df1, df2, check_dtype=True) - - def test_frame_equal_message(self): - - expected = """DataFrame are different - -DataFrame shape mismatch -\\[left\\]: \\(3, 2\\) -\\[right\\]: \\(3, 1\\)""" - - with pytest.raises(AssertionError, match=expected): - assert_frame_equal(pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}), - pd.DataFrame({'A': [1, 2, 3]})) - - expected = """DataFrame\\.index are different - -DataFrame\\.index values are different \\(33\\.33333 %\\) -\\[left\\]: Index\\(\\[u?'a', u?'b', u?'c'\\], dtype='object'\\) -\\[right\\]: Index\\(\\[u?'a', u?'b', u?'d'\\], dtype='object'\\)""" - - with pytest.raises(AssertionError, match=expected): - assert_frame_equal(pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, - index=['a', 'b', 'c']), - pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, - index=['a', 'b', 'd'])) - - expected = """DataFrame\\.columns are different - -DataFrame\\.columns values are different \\(50\\.0 %\\) -\\[left\\]: Index\\(\\[u?'A', u?'B'\\], dtype='object'\\) -\\[right\\]: Index\\(\\[u?'A', u?'b'\\], dtype='object'\\)""" - - with pytest.raises(AssertionError, match=expected): - assert_frame_equal(pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, - index=['a', 'b', 'c']), - pd.DataFrame({'A': [1, 2, 3], 'b': [4, 5, 6]}, - index=['a', 'b', 'c'])) - - expected = """DataFrame\\.iloc\\[:, 1\\] are different - -DataFrame\\.iloc\\[:, 1\\] values are different \\(33\\.33333 %\\) -\\[left\\]: \\[4, 5, 6\\] -\\[right\\]: \\[4, 5, 7\\]""" - - with pytest.raises(AssertionError, match=expected): - assert_frame_equal(pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}), - pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 7]})) - - with pytest.raises(AssertionError, match=expected): - assert_frame_equal(pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}), - pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 7]}), - by_blocks=True) - - def test_frame_equal_message_unicode(self): - # Test ensures that `assert_frame_equals` raises the right - # exception when comparing DataFrames containing differing - # unicode objects (#20503) - - expected = """DataFrame\\.iloc\\[:, 1\\] are different - -DataFrame\\.iloc\\[:, 1\\] values are different \\(33\\.33333 %\\) -\\[left\\]: \\[é, è, ë\\] -\\[right\\]: \\[é, è, e̊\\]""" - - with pytest.raises(AssertionError, match=expected): - assert_frame_equal(pd.DataFrame({'A': [u'á', u'à', u'ä'], - 'E': [u'é', u'è', u'ë']}), - pd.DataFrame({'A': [u'á', u'à', u'ä'], - 'E': [u'é', u'è', u'e̊']})) - - with pytest.raises(AssertionError, match=expected): - assert_frame_equal(pd.DataFrame({'A': [u'á', u'à', u'ä'], - 'E': 
[u'é', u'è', u'ë']}), - pd.DataFrame({'A': [u'á', u'à', u'ä'], - 'E': [u'é', u'è', u'e̊']}), - by_blocks=True) - - expected = """DataFrame\\.iloc\\[:, 0\\] are different - -DataFrame\\.iloc\\[:, 0\\] values are different \\(100\\.0 %\\) -\\[left\\]: \\[á, à, ä\\] -\\[right\\]: \\[a, a, a\\]""" - - with pytest.raises(AssertionError, match=expected): - assert_frame_equal(pd.DataFrame({'A': [u'á', u'à', u'ä'], - 'E': [u'é', u'è', u'ë']}), - pd.DataFrame({'A': ['a', 'a', 'a'], - 'E': ['e', 'e', 'e']})) - - with pytest.raises(AssertionError, match=expected): - assert_frame_equal(pd.DataFrame({'A': [u'á', u'à', u'ä'], - 'E': [u'é', u'è', u'ë']}), - pd.DataFrame({'A': ['a', 'a', 'a'], - 'E': ['e', 'e', 'e']}), - by_blocks=True) - - -class TestAssertCategoricalEqual(object): - - def test_categorical_equal_message(self): - - expected = """Categorical\\.categories are different - -Categorical\\.categories values are different \\(25\\.0 %\\) -\\[left\\]: Int64Index\\(\\[1, 2, 3, 4\\], dtype='int64'\\) -\\[right\\]: Int64Index\\(\\[1, 2, 3, 5\\], dtype='int64'\\)""" - - a = pd.Categorical([1, 2, 3, 4]) - b = pd.Categorical([1, 2, 3, 5]) - with pytest.raises(AssertionError, match=expected): - tm.assert_categorical_equal(a, b) - - expected = """Categorical\\.codes are different - -Categorical\\.codes values are different \\(50\\.0 %\\) -\\[left\\]: \\[0, 1, 3, 2\\] -\\[right\\]: \\[0, 1, 2, 3\\]""" - - a = pd.Categorical([1, 2, 4, 3], categories=[1, 2, 3, 4]) - b = pd.Categorical([1, 2, 3, 4], categories=[1, 2, 3, 4]) - with pytest.raises(AssertionError, match=expected): - tm.assert_categorical_equal(a, b) - - expected = """Categorical are different - -Attribute "ordered" are different -\\[left\\]: False -\\[right\\]: True""" - - a = pd.Categorical([1, 2, 3, 4], ordered=False) - b = pd.Categorical([1, 2, 3, 4], ordered=True) - with pytest.raises(AssertionError, match=expected): - tm.assert_categorical_equal(a, b) - - -class TestAssertIntervalArrayEqual(object): - def test_interval_array_equal_message(self): - a = pd.interval_range(0, periods=4).values - b = pd.interval_range(1, periods=4).values - - msg = textwrap.dedent("""\ - IntervalArray.left are different - - IntervalArray.left values are different \\(100.0 %\\) - \\[left\\]: Int64Index\\(\\[0, 1, 2, 3\\], dtype='int64'\\) - \\[right\\]: Int64Index\\(\\[1, 2, 3, 4\\], dtype='int64'\\)""") - with pytest.raises(AssertionError, match=msg): - tm.assert_interval_array_equal(a, b) - - -class TestAssertExtensionArrayEqual(object): - - def test_check_exact(self): - # GH 23709 - left = SparseArray([-0.17387645482451206, 0.3414148016424936]) - right = SparseArray([-0.17387645482451206, 0.3414148016424937]) - - # passes with check_exact=False (should be default) - assert_extension_array_equal(left, right) - assert_extension_array_equal(left, right, check_exact=False) - - # raises with check_exact=True - msg = textwrap.dedent("""\ - ExtensionArray are different - - ExtensionArray values are different \\(50\\.0 %\\) - \\[left\\]: \\[-0\\.17387645482.*, 0\\.341414801642.*\\] - \\[right\\]: \\[-0\\.17387645482.*, 0\\.341414801642.*\\]""") - with pytest.raises(AssertionError, match=msg): - assert_extension_array_equal(left, right, check_exact=True) - - @pytest.mark.parametrize('check_less_precise', [True, 0, 1, 2, 3, 4]) - def test_check_less_precise_passes(self, check_less_precise): - left = SparseArray([0.5, 0.123456]) - right = SparseArray([0.5, 0.123457]) - assert_extension_array_equal( - left, right, check_less_precise=check_less_precise) - - 
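The split between the passing parametrization above and the failing one below hinges on how `check_less_precise` maps to a tolerance. A minimal sketch of that rule, assuming the relative-ratio comparison used in pandas/_libs/testing.pyx of this era; the helper name `approx_equal` is hypothetical and illustrative only, not the verbatim implementation:

def approx_equal(a, b, check_less_precise=False):
    # Assumed mapping: False -> decimal 5, True -> decimal 3,
    # and an integer is taken as the decimal directly.
    if check_less_precise is True:
        decimal = 3
    elif check_less_precise is False:
        decimal = 5
    else:
        decimal = check_less_precise
    # Assumed relative check: compare the ratio of the values against 1.0.
    return abs(1.0 - b / a) < 0.5 * 10.0 ** -decimal

# 0.123456 and 0.123457 differ by roughly 8.1e-6 in relative terms, so
# decimal 4 tolerates the difference while decimal 5 does not:
assert approx_equal(0.123456, 0.123457, check_less_precise=4)
assert not approx_equal(0.123456, 0.123457, check_less_precise=5)

Under that reading, [True, 0, 1, 2, 3, 4] pass and [False, 5, 6, 7, 8, 9] fail for the 0.123456 vs 0.123457 pair used in these tests.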
@pytest.mark.parametrize('check_less_precise', [False, 5, 6, 7, 8, 9]) - def test_check_less_precise_fails(self, check_less_precise): - left = SparseArray([0.5, 0.123456]) - right = SparseArray([0.5, 0.123457]) - - msg = textwrap.dedent("""\ - ExtensionArray are different - - ExtensionArray values are different \\(50\\.0 %\\) - \\[left\\]: \\[0\\.5, 0\\.123456\\] - \\[right\\]: \\[0\\.5, 0\\.123457\\]""") - with pytest.raises(AssertionError, match=msg): - assert_extension_array_equal( - left, right, check_less_precise=check_less_precise) - - def test_check_dtype(self): - left = SparseArray(np.arange(5, dtype='int64')) - right = SparseArray(np.arange(5, dtype='int32')) - - # passes with check_dtype=False - assert_extension_array_equal(left, right, check_dtype=False) - - # raises with check_dtype=True - msg = textwrap.dedent("""\ - ExtensionArray are different - - Attribute "dtype" are different - \\[left\\]: Sparse\\[int64, 0\\] - \\[right\\]: Sparse\\[int32, 0\\]""") - with pytest.raises(AssertionError, match=msg): - assert_extension_array_equal(left, right, check_dtype=True) - - def test_missing_values(self): - left = SparseArray([np.nan, 1, 2, np.nan]) - right = SparseArray([np.nan, 1, 2, 3]) - - msg = textwrap.dedent("""\ - ExtensionArray NA mask are different - - ExtensionArray NA mask values are different \\(25\\.0 %\\) - \\[left\\]: \\[True, False, False, True\\] - \\[right\\]: \\[True, False, False, False\\]""") - with pytest.raises(AssertionError, match=msg): - assert_extension_array_equal(left, right) - - def test_non_extension_array(self): - numpy_array = np.arange(5) - extension_array = SparseArray(np.arange(5)) - - msg = 'left is not an ExtensionArray' - with pytest.raises(AssertionError, match=msg): - assert_extension_array_equal(numpy_array, extension_array) - - msg = 'right is not an ExtensionArray' - with pytest.raises(AssertionError, match=msg): - assert_extension_array_equal(extension_array, numpy_array) - - -class TestRNGContext(object): - - def test_RNGContext(self): - expected0 = 1.764052345967664 - expected1 = 1.6243453636632417 - - with RNGContext(0): - with RNGContext(1): - assert np.random.randn() == expected1 - assert np.random.randn() == expected0 - - -def test_datapath_missing(datapath, request): - if not request.config.getoption("--strict-data-files"): - pytest.skip("Need to set '--strict-data-files'") - - with pytest.raises(ValueError): - datapath('not_a_file') - - result = datapath('data', 'iris.csv') - expected = os.path.join( - os.path.dirname(os.path.dirname(__file__)), - 'data', - 'iris.csv' - ) - - assert result == expected - - -def test_create_temp_directory(): - with tm.ensure_clean_dir() as path: - assert os.path.exists(path) - assert os.path.isdir(path) - assert not os.path.exists(path) - - -def test_assert_raises_regex_deprecated(): - # see gh-23592 - - with tm.assert_produces_warning(FutureWarning): - msg = "Not equal!" 
- - with tm.assert_raises_regex(AssertionError, msg): - assert 1 == 2, msg diff --git a/pandas/tests/util/test_util.py b/pandas/tests/util/test_util.py index a6cb54ee43909..a2dc9b699566a 100644 --- a/pandas/tests/util/test_util.py +++ b/pandas/tests/util/test_util.py @@ -1,532 +1,127 @@ # -*- coding: utf-8 -*- -import codecs -from collections import OrderedDict -import locale import os import sys -from uuid import uuid4 import pytest -from pandas.compat import PY3, intern +import pandas.compat as compat +from pandas.compat import raise_with_traceback from pandas.util._decorators import deprecate_kwarg, make_signature -from pandas.util._move import BadMove, move_into_mutable_buffer, stolenbuf -import pandas.util._test_decorators as td -from pandas.util._validators import ( - validate_args, validate_args_and_kwargs, validate_bool_kwarg, - validate_kwargs) +from pandas.util._validators import validate_kwargs -import pandas.core.common as com import pandas.util.testing as tm -class TestDecorators(object): - - def setup_method(self, method): - @deprecate_kwarg('old', 'new') - def _f1(new=False): - return new - - @deprecate_kwarg('old', 'new', {'yes': True, 'no': False}) - def _f2(new=False): - return new - - @deprecate_kwarg('old', 'new', lambda x: x + 1) - def _f3(new=0): - return new - - @deprecate_kwarg('old', None) - def _f4(old=True, unchanged=True): - return old - - self.f1 = _f1 - self.f2 = _f2 - self.f3 = _f3 - self.f4 = _f4 - - def test_deprecate_kwarg(self): - x = 78 - with tm.assert_produces_warning(FutureWarning): - result = self.f1(old=x) - assert result is x - with tm.assert_produces_warning(None): - self.f1(new=x) - - def test_dict_deprecate_kwarg(self): - x = 'yes' - with tm.assert_produces_warning(FutureWarning): - result = self.f2(old=x) - assert result - - def test_missing_deprecate_kwarg(self): - x = 'bogus' - with tm.assert_produces_warning(FutureWarning): - result = self.f2(old=x) - assert result == 'bogus' - - def test_callable_deprecate_kwarg(self): - x = 5 - with tm.assert_produces_warning(FutureWarning): - result = self.f3(old=x) - assert result == x + 1 - with pytest.raises(TypeError): - self.f3(old='hello') - - def test_bad_deprecate_kwarg(self): - with pytest.raises(TypeError): - @deprecate_kwarg('old', 'new', 0) - def f4(new=None): - pass - - def test_deprecate_keyword(self): - x = 9 - with tm.assert_produces_warning(FutureWarning): - result = self.f4(old=x) - assert result is x - with tm.assert_produces_warning(None): - result = self.f4(unchanged=x) - assert result is True - - def test_rands(): r = tm.rands(10) - assert(len(r) == 10) + assert len(r) == 10 -def test_rands_array(): +def test_rands_array_1d(): arr = tm.rands_array(5, size=10) - assert(arr.shape == (10,)) - assert(len(arr[0]) == 5) + assert arr.shape == (10,) + assert len(arr[0]) == 5 + +def test_rands_array_2d(): arr = tm.rands_array(7, size=(10, 10)) - assert(arr.shape == (10, 10)) - assert(len(arr[1, 1]) == 7) - - -class TestValidateArgs(object): - fname = 'func' - - def test_bad_min_fname_arg_count(self): - msg = "'max_fname_arg_count' must be non-negative" - with pytest.raises(ValueError, match=msg): - validate_args(self.fname, (None,), -1, 'foo') - - def test_bad_arg_length_max_value_single(self): - args = (None, None) - compat_args = ('foo',) - - min_fname_arg_count = 0 - max_length = len(compat_args) + min_fname_arg_count - actual_length = len(args) + min_fname_arg_count - msg = (r"{fname}\(\) takes at most {max_length} " - r"argument \({actual_length} given\)" - .format(fname=self.fname, 
max_length=max_length, - actual_length=actual_length)) - - with pytest.raises(TypeError, match=msg): - validate_args(self.fname, args, - min_fname_arg_count, - compat_args) - - def test_bad_arg_length_max_value_multiple(self): - args = (None, None) - compat_args = dict(foo=None) - - min_fname_arg_count = 2 - max_length = len(compat_args) + min_fname_arg_count - actual_length = len(args) + min_fname_arg_count - msg = (r"{fname}\(\) takes at most {max_length} " - r"arguments \({actual_length} given\)" - .format(fname=self.fname, max_length=max_length, - actual_length=actual_length)) - - with pytest.raises(TypeError, match=msg): - validate_args(self.fname, args, - min_fname_arg_count, - compat_args) - - def test_not_all_defaults(self): - bad_arg = 'foo' - msg = ("the '{arg}' parameter is not supported " - r"in the pandas implementation of {func}\(\)". - format(arg=bad_arg, func=self.fname)) - - compat_args = OrderedDict() - compat_args['foo'] = 2 - compat_args['bar'] = -1 - compat_args['baz'] = 3 - - arg_vals = (1, -1, 3) - - for i in range(1, 3): - with pytest.raises(ValueError, match=msg): - validate_args(self.fname, arg_vals[:i], 2, compat_args) - - def test_validation(self): - # No exceptions should be thrown - validate_args(self.fname, (None,), 2, dict(out=None)) - - compat_args = OrderedDict() - compat_args['axis'] = 1 - compat_args['out'] = None - - validate_args(self.fname, (1, None), 2, compat_args) - - -class TestValidateKwargs(object): - fname = 'func' - - def test_bad_kwarg(self): - goodarg = 'f' - badarg = goodarg + 'o' - - compat_args = OrderedDict() - compat_args[goodarg] = 'foo' - compat_args[badarg + 'o'] = 'bar' - kwargs = {goodarg: 'foo', badarg: 'bar'} - msg = (r"{fname}\(\) got an unexpected " - r"keyword argument '{arg}'".format( - fname=self.fname, arg=badarg)) - - with pytest.raises(TypeError, match=msg): - validate_kwargs(self.fname, kwargs, compat_args) - - def test_not_all_none(self): - bad_arg = 'foo' - msg = (r"the '{arg}' parameter is not supported " - r"in the pandas implementation of {func}\(\)". 
- format(arg=bad_arg, func=self.fname)) - - compat_args = OrderedDict() - compat_args['foo'] = 1 - compat_args['bar'] = 's' - compat_args['baz'] = None - - kwarg_keys = ('foo', 'bar', 'baz') - kwarg_vals = (2, 's', None) - - for i in range(1, 3): - kwargs = dict(zip(kwarg_keys[:i], - kwarg_vals[:i])) - - with pytest.raises(ValueError, match=msg): - validate_kwargs(self.fname, kwargs, compat_args) - - def test_validation(self): - # No exceptions should be thrown - compat_args = OrderedDict() - compat_args['f'] = None - compat_args['b'] = 1 - compat_args['ba'] = 's' - kwargs = dict(f=None, b=1) - validate_kwargs(self.fname, kwargs, compat_args) - - def test_validate_bool_kwarg(self): - arg_names = ['inplace', 'copy'] - invalid_values = [1, "True", [1, 2, 3], 5.0] - valid_values = [True, False, None] - - for name in arg_names: - for value in invalid_values: - msg = ("For argument \"%s\" " - "expected type bool, " - "received type %s" % - (name, type(value).__name__)) - with pytest.raises(ValueError, match=msg): - validate_bool_kwarg(value, name) - - for value in valid_values: - assert validate_bool_kwarg(value, name) == value - - -class TestValidateKwargsAndArgs(object): - fname = 'func' - - def test_invalid_total_length_max_length_one(self): - compat_args = ('foo',) - kwargs = {'foo': 'FOO'} - args = ('FoO', 'BaZ') - - min_fname_arg_count = 0 - max_length = len(compat_args) + min_fname_arg_count - actual_length = len(kwargs) + len(args) + min_fname_arg_count - msg = (r"{fname}\(\) takes at most {max_length} " - r"argument \({actual_length} given\)" - .format(fname=self.fname, max_length=max_length, - actual_length=actual_length)) - - with pytest.raises(TypeError, match=msg): - validate_args_and_kwargs(self.fname, args, kwargs, - min_fname_arg_count, - compat_args) - - def test_invalid_total_length_max_length_multiple(self): - compat_args = ('foo', 'bar', 'baz') - kwargs = {'foo': 'FOO', 'bar': 'BAR'} - args = ('FoO', 'BaZ') - - min_fname_arg_count = 2 - max_length = len(compat_args) + min_fname_arg_count - actual_length = len(kwargs) + len(args) + min_fname_arg_count - msg = (r"{fname}\(\) takes at most {max_length} " - r"arguments \({actual_length} given\)" - .format(fname=self.fname, max_length=max_length, - actual_length=actual_length)) - - with pytest.raises(TypeError, match=msg): - validate_args_and_kwargs(self.fname, args, kwargs, - min_fname_arg_count, - compat_args) - - def test_no_args_with_kwargs(self): - bad_arg = 'bar' - min_fname_arg_count = 2 - - compat_args = OrderedDict() - compat_args['foo'] = -5 - compat_args[bad_arg] = 1 - - msg = (r"the '{arg}' parameter is not supported " - r"in the pandas implementation of {func}\(\)". 
- format(arg=bad_arg, func=self.fname)) - - args = () - kwargs = {'foo': -5, bad_arg: 2} - with pytest.raises(ValueError, match=msg): - validate_args_and_kwargs(self.fname, args, kwargs, - min_fname_arg_count, compat_args) - - args = (-5, 2) - kwargs = {} - with pytest.raises(ValueError, match=msg): - validate_args_and_kwargs(self.fname, args, kwargs, - min_fname_arg_count, compat_args) - - def test_duplicate_argument(self): - min_fname_arg_count = 2 - compat_args = OrderedDict() - compat_args['foo'] = None - compat_args['bar'] = None - compat_args['baz'] = None - kwargs = {'foo': None, 'bar': None} - args = (None,) # duplicate value for 'foo' - - msg = (r"{fname}\(\) got multiple values for keyword " - r"argument '{arg}'".format(fname=self.fname, arg='foo')) - - with pytest.raises(TypeError, match=msg): - validate_args_and_kwargs(self.fname, args, kwargs, - min_fname_arg_count, - compat_args) - - def test_validation(self): - # No exceptions should be thrown - compat_args = OrderedDict() - compat_args['foo'] = 1 - compat_args['bar'] = None - compat_args['baz'] = -2 - kwargs = {'baz': -2} - args = (1, None) - - min_fname_arg_count = 2 - validate_args_and_kwargs(self.fname, args, kwargs, - min_fname_arg_count, - compat_args) - - -class TestMove(object): - - def test_cannot_create_instance_of_stolenbuffer(self): - """Stolen buffers need to be created through the smart constructor - ``move_into_mutable_buffer`` which has a bunch of checks in it. - """ - msg = "cannot create 'pandas.util._move.stolenbuf' instances" - with pytest.raises(TypeError, match=msg): - stolenbuf() - - def test_more_than_one_ref(self): - """Test case for when we try to use ``move_into_mutable_buffer`` when - the object being moved has other references. - """ - b = b'testing' - - with pytest.raises(BadMove) as e: - def handle_success(type_, value, tb): - assert value.args[0] is b - return type(e).handle_success(e, type_, value, tb) # super - - e.handle_success = handle_success - move_into_mutable_buffer(b) - - def test_exactly_one_ref(self): - """Test case for when the object being moved has exactly one reference. - """ - b = b'testing' - - # We need to pass an expression on the stack to ensure that there are - # not extra references hanging around. We cannot rewrite this test as - # buf = b[:-3] - # as_stolen_buf = move_into_mutable_buffer(buf) - # because then we would have more than one reference to buf. - as_stolen_buf = move_into_mutable_buffer(b[:-3]) - - # materialize as bytearray to show that it is mutable - assert bytearray(as_stolen_buf) == b'test' - - @pytest.mark.skipif(PY3, reason='bytes objects cannot be interned in py3') - def test_interned(self): - salt = uuid4().hex - - def make_string(): - # We need to actually create a new string so that it has refcount - # one. We use a uuid so that we know the string could not already - # be in the intern table. - return ''.join(('testing: ', salt)) - - # This should work, the string has one reference on the stack. - move_into_mutable_buffer(make_string()) - - refcount = [None] # nonlocal - - def ref_capture(ob): - # Subtract two because those are the references owned by this - # frame: - # 1. The local variables of this stack frame. - # 2. The python data stack of this stack frame. 
- refcount[0] = sys.getrefcount(ob) - 2 - return ob - - with pytest.raises(BadMove): - # If we intern the string it will still have one reference but now - # it is in the intern table so if other people intern the same - # string while the mutable buffer holds the first string they will - # be the same instance. - move_into_mutable_buffer(ref_capture(intern(make_string()))) # noqa - - assert refcount[0] == 1 - - -def test_numpy_errstate_is_default(): - # The defaults since numpy 1.6.0 - expected = {'over': 'warn', 'divide': 'warn', 'invalid': 'warn', - 'under': 'ignore'} + assert arr.shape == (10, 10) + assert len(arr[1, 1]) == 7 + + +def test_numpy_err_state_is_default(): + expected = {"over": "warn", "divide": "warn", + "invalid": "warn", "under": "ignore"} import numpy as np - from pandas.compat import numpy # noqa - # The errstate should be unchanged after that import. + + # The error state should be unchanged after that import. assert np.geterr() == expected -@td.skip_if_windows -class TestLocaleUtils(object): - - @classmethod - def setup_class(cls): - cls.locales = tm.get_locales() - cls.current_locale = locale.getlocale() - - if not cls.locales: - pytest.skip("No locales found") - - @classmethod - def teardown_class(cls): - del cls.locales - del cls.current_locale - - def test_can_set_locale_valid_set(self): - # Setting the default locale should return True - assert tm.can_set_locale('') is True - - def test_can_set_locale_invalid_set(self): - # Setting an invalid locale should return False - assert tm.can_set_locale('non-existent_locale') is False - - def test_can_set_locale_invalid_get(self, monkeypatch): - # In some cases, an invalid locale can be set, - # but a subsequent getlocale() raises a ValueError - # See GH 22129 - - def mockgetlocale(): - raise ValueError() - - with monkeypatch.context() as m: - m.setattr(locale, 'getlocale', mockgetlocale) - assert tm.can_set_locale('') is False - - def test_get_locales(self): - # all systems should have at least a single locale - # GH9744 - assert len(tm.get_locales()) > 0 - - def test_get_locales_prefix(self): - if len(self.locales) == 1: - pytest.skip("Only a single locale found, no point in " - "trying to test filtering locale prefixes") - first_locale = self.locales[0] - assert len(tm.get_locales(prefix=first_locale[:2])) > 0 - - def test_set_locale(self): - if len(self.locales) == 1: - pytest.skip("Only a single locale found, no point in " - "trying to test setting another locale") - - if com._all_none(*self.current_locale): - # Not sure why, but on some travis runs with pytest, - # getlocale() returned (None, None). 
- pytest.skip("Current locale is not set.") - - locale_override = os.environ.get('LOCALE_OVERRIDE', None) - - if locale_override is None: - lang, enc = 'it_CH', 'UTF-8' - elif locale_override == 'C': - lang, enc = 'en_US', 'ascii' - else: - lang, enc = locale_override.split('.') - - enc = codecs.lookup(enc).name - new_locale = lang, enc - - if not tm.can_set_locale(new_locale): - with pytest.raises(locale.Error): - with tm.set_locale(new_locale): - pass - else: - with tm.set_locale(new_locale) as normalized_locale: - new_lang, new_enc = normalized_locale.split('.') - new_enc = codecs.lookup(enc).name - normalized_locale = new_lang, new_enc - assert normalized_locale == new_locale - - current_locale = locale.getlocale() - assert current_locale == self.current_locale - - -def test_make_signature(): - # See GH 17608 - # Case where the func does not have default kwargs - sig = make_signature(validate_kwargs) - assert sig == (['fname', 'kwargs', 'compat_args'], - ['fname', 'kwargs', 'compat_args']) - - # Case where the func does have default kwargs - sig = make_signature(deprecate_kwarg) - assert sig == (['old_arg_name', 'new_arg_name', - 'mapping=None', 'stacklevel=2'], - ['old_arg_name', 'new_arg_name', 'mapping', 'stacklevel']) - - -def test_safe_import(monkeypatch): - assert not td.safe_import("foo") - assert not td.safe_import("pandas", min_version="99.99.99") - - # Create dummy module to be imported - import types - import sys - mod_name = "hello123" - mod = types.ModuleType(mod_name) - mod.__version__ = "1.5" - - assert not td.safe_import(mod_name) - monkeypatch.setitem(sys.modules, mod_name, mod) - assert not td.safe_import(mod_name, min_version="2.0") - assert td.safe_import(mod_name, min_version="1.0") +@pytest.mark.parametrize("func,expected", [ + # Case where the func does not have default kwargs. + (validate_kwargs, (["fname", "kwargs", "compat_args"], + ["fname", "kwargs", "compat_args"])), + + # Case where the func does have default kwargs. + (deprecate_kwarg, (["old_arg_name", "new_arg_name", + "mapping=None", "stacklevel=2"], + ["old_arg_name", "new_arg_name", + "mapping", "stacklevel"])) +]) +def test_make_signature(func, expected): + # see gh-17608 + assert make_signature(func) == expected + + +def test_raise_with_traceback(): + with pytest.raises(LookupError, match="error_text"): + try: + raise ValueError("THIS IS AN ERROR") + except ValueError: + e = LookupError("error_text") + raise_with_traceback(e) + + with pytest.raises(LookupError, match="error_text"): + try: + raise ValueError("This is another error") + except ValueError: + e = LookupError("error_text") + _, _, traceback = sys.exc_info() + raise_with_traceback(e, traceback) + + +def test_convert_rows_list_to_csv_str(): + rows_list = ["aaa", "bbb", "ccc"] + ret = tm.convert_rows_list_to_csv_str(rows_list) + + if compat.is_platform_windows(): + expected = "aaa\r\nbbb\r\nccc\r\n" + else: + expected = "aaa\nbbb\nccc\n" + + assert ret == expected + + +def test_create_temp_directory(): + with tm.ensure_clean_dir() as path: + assert os.path.exists(path) + assert os.path.isdir(path) + assert not os.path.exists(path) + + +def test_assert_raises_regex_deprecated(): + # see gh-23592 + + with tm.assert_produces_warning(FutureWarning): + msg = "Not equal!" 
+ + with tm.assert_raises_regex(AssertionError, msg): + assert 1 == 2, msg + + +def test_datapath_missing(datapath, request): + if not request.config.getoption("--strict-data-files"): + pytest.skip("Need to set '--strict-data-files'") + + with pytest.raises(ValueError, match="Could not find file"): + datapath("not_a_file") + + args = ("data", "iris.csv") + + result = datapath(*args) + expected = os.path.join(os.path.dirname(os.path.dirname(__file__)), *args) + + assert result == expected + + +def test_rng_context(): + import numpy as np + + expected0 = 1.764052345967664 + expected1 = 1.6243453636632417 + + with tm.RNGContext(0): + with tm.RNGContext(1): + assert np.random.randn() == expected1 + assert np.random.randn() == expected0 diff --git a/pandas/tests/util/test_validate_args.py b/pandas/tests/util/test_validate_args.py new file mode 100644 index 0000000000000..ca71b0c9d2522 --- /dev/null +++ b/pandas/tests/util/test_validate_args.py @@ -0,0 +1,76 @@ +# -*- coding: utf-8 -*- +from collections import OrderedDict + +import pytest + +from pandas.util._validators import validate_args + +_fname = "func" + + +def test_bad_min_fname_arg_count(): + msg = "'max_fname_arg_count' must be non-negative" + + with pytest.raises(ValueError, match=msg): + validate_args(_fname, (None,), -1, "foo") + + +def test_bad_arg_length_max_value_single(): + args = (None, None) + compat_args = ("foo",) + + min_fname_arg_count = 0 + max_length = len(compat_args) + min_fname_arg_count + actual_length = len(args) + min_fname_arg_count + msg = (r"{fname}\(\) takes at most {max_length} " + r"argument \({actual_length} given\)" + .format(fname=_fname, max_length=max_length, + actual_length=actual_length)) + + with pytest.raises(TypeError, match=msg): + validate_args(_fname, args, min_fname_arg_count, compat_args) + + +def test_bad_arg_length_max_value_multiple(): + args = (None, None) + compat_args = dict(foo=None) + + min_fname_arg_count = 2 + max_length = len(compat_args) + min_fname_arg_count + actual_length = len(args) + min_fname_arg_count + msg = (r"{fname}\(\) takes at most {max_length} " + r"arguments \({actual_length} given\)" + .format(fname=_fname, max_length=max_length, + actual_length=actual_length)) + + with pytest.raises(TypeError, match=msg): + validate_args(_fname, args, min_fname_arg_count, compat_args) + + +@pytest.mark.parametrize("i", range(1, 3)) +def test_not_all_defaults(i): + bad_arg = "foo" + msg = ("the '{arg}' parameter is not supported " + r"in the pandas implementation of {func}\(\)". + format(arg=bad_arg, func=_fname)) + + compat_args = OrderedDict() + compat_args["foo"] = 2 + compat_args["bar"] = -1 + compat_args["baz"] = 3 + + arg_vals = (1, -1, 3) + + with pytest.raises(ValueError, match=msg): + validate_args(_fname, arg_vals[:i], 2, compat_args) + + +def test_validation(): + # No exceptions should be raised. 
+ validate_args(_fname, (None,), 2, dict(out=None)) + + compat_args = OrderedDict() + compat_args["axis"] = 1 + compat_args["out"] = None + + validate_args(_fname, (1, None), 2, compat_args) diff --git a/pandas/tests/util/test_validate_args_and_kwargs.py b/pandas/tests/util/test_validate_args_and_kwargs.py new file mode 100644 index 0000000000000..c3c0b3dedc085 --- /dev/null +++ b/pandas/tests/util/test_validate_args_and_kwargs.py @@ -0,0 +1,105 @@ +# -*- coding: utf-8 -*- +from collections import OrderedDict + +import pytest + +from pandas.util._validators import validate_args_and_kwargs + +_fname = "func" + + +def test_invalid_total_length_max_length_one(): + compat_args = ("foo",) + kwargs = {"foo": "FOO"} + args = ("FoO", "BaZ") + + min_fname_arg_count = 0 + max_length = len(compat_args) + min_fname_arg_count + actual_length = len(kwargs) + len(args) + min_fname_arg_count + + msg = (r"{fname}\(\) takes at most {max_length} " + r"argument \({actual_length} given\)" + .format(fname=_fname, max_length=max_length, + actual_length=actual_length)) + + with pytest.raises(TypeError, match=msg): + validate_args_and_kwargs(_fname, args, kwargs, + min_fname_arg_count, + compat_args) + + +def test_invalid_total_length_max_length_multiple(): + compat_args = ("foo", "bar", "baz") + kwargs = {"foo": "FOO", "bar": "BAR"} + args = ("FoO", "BaZ") + + min_fname_arg_count = 2 + max_length = len(compat_args) + min_fname_arg_count + actual_length = len(kwargs) + len(args) + min_fname_arg_count + + msg = (r"{fname}\(\) takes at most {max_length} " + r"arguments \({actual_length} given\)" + .format(fname=_fname, max_length=max_length, + actual_length=actual_length)) + + with pytest.raises(TypeError, match=msg): + validate_args_and_kwargs(_fname, args, kwargs, + min_fname_arg_count, + compat_args) + + +@pytest.mark.parametrize("args,kwargs", [ + ((), {"foo": -5, "bar": 2}), + ((-5, 2), {}) +]) +def test_missing_args_or_kwargs(args, kwargs): + bad_arg = "bar" + min_fname_arg_count = 2 + + compat_args = OrderedDict() + compat_args["foo"] = -5 + compat_args[bad_arg] = 1 + + msg = (r"the '{arg}' parameter is not supported " + r"in the pandas implementation of {func}\(\)". + format(arg=bad_arg, func=_fname)) + + with pytest.raises(ValueError, match=msg): + validate_args_and_kwargs(_fname, args, kwargs, + min_fname_arg_count, compat_args) + + +def test_duplicate_argument(): + min_fname_arg_count = 2 + + compat_args = OrderedDict() + compat_args["foo"] = None + compat_args["bar"] = None + compat_args["baz"] = None + + kwargs = {"foo": None, "bar": None} + args = (None,) # duplicate value for "foo" + + msg = (r"{fname}\(\) got multiple values for keyword " + r"argument '{arg}'".format(fname=_fname, arg="foo")) + + with pytest.raises(TypeError, match=msg): + validate_args_and_kwargs(_fname, args, kwargs, + min_fname_arg_count, + compat_args) + + +def test_validation(): + # No exceptions should be raised. 
+ compat_args = OrderedDict() + compat_args["foo"] = 1 + compat_args["bar"] = None + compat_args["baz"] = -2 + kwargs = {"baz": -2} + + args = (1, None) + min_fname_arg_count = 2 + + validate_args_and_kwargs(_fname, args, kwargs, + min_fname_arg_count, + compat_args) diff --git a/pandas/tests/util/test_validate_kwargs.py b/pandas/tests/util/test_validate_kwargs.py new file mode 100644 index 0000000000000..f36818ddfc9a8 --- /dev/null +++ b/pandas/tests/util/test_validate_kwargs.py @@ -0,0 +1,72 @@ +# -*- coding: utf-8 -*- +from collections import OrderedDict + +import pytest + +from pandas.util._validators import validate_bool_kwarg, validate_kwargs + +_fname = "func" + + +def test_bad_kwarg(): + good_arg = "f" + bad_arg = good_arg + "o" + + compat_args = OrderedDict() + compat_args[good_arg] = "foo" + compat_args[bad_arg + "o"] = "bar" + kwargs = {good_arg: "foo", bad_arg: "bar"} + + msg = (r"{fname}\(\) got an unexpected " + r"keyword argument '{arg}'".format(fname=_fname, arg=bad_arg)) + + with pytest.raises(TypeError, match=msg): + validate_kwargs(_fname, kwargs, compat_args) + + +@pytest.mark.parametrize("i", range(1, 3)) +def test_not_all_none(i): + bad_arg = "foo" + msg = (r"the '{arg}' parameter is not supported " + r"in the pandas implementation of {func}\(\)". + format(arg=bad_arg, func=_fname)) + + compat_args = OrderedDict() + compat_args["foo"] = 1 + compat_args["bar"] = "s" + compat_args["baz"] = None + + kwarg_keys = ("foo", "bar", "baz") + kwarg_vals = (2, "s", None) + + kwargs = dict(zip(kwarg_keys[:i], kwarg_vals[:i])) + + with pytest.raises(ValueError, match=msg): + validate_kwargs(_fname, kwargs, compat_args) + + +def test_validation(): + # No exceptions should be raised. + compat_args = OrderedDict() + compat_args["f"] = None + compat_args["b"] = 1 + compat_args["ba"] = "s" + + kwargs = dict(f=None, b=1) + validate_kwargs(_fname, kwargs, compat_args) + + +@pytest.mark.parametrize("name", ["inplace", "copy"]) +@pytest.mark.parametrize("value", [1, "True", [1, 2, 3], 5.0]) +def test_validate_bool_kwarg_fail(name, value): + msg = ("For argument \"%s\" expected type bool, received type %s" % + (name, type(value).__name__)) + + with pytest.raises(ValueError, match=msg): + validate_bool_kwarg(value, name) + + +@pytest.mark.parametrize("name", ["inplace", "copy"]) +@pytest.mark.parametrize("value", [True, False, None]) +def test_validate_bool_kwarg(name, value): + assert validate_bool_kwarg(value, name) == value diff --git a/pandas/tseries/frequencies.py b/pandas/tseries/frequencies.py index 95904fab05322..8cdec31d7ce8a 100644 --- a/pandas/tseries/frequencies.py +++ b/pandas/tseries/frequencies.py @@ -17,6 +17,7 @@ from pandas._libs.tslibs.offsets import _offset_to_period_map # noqa:E402 import pandas._libs.tslibs.resolution as libresolution from pandas._libs.tslibs.resolution import Resolution +from pandas._libs.tslibs.timezones import UTC import pandas.compat as compat from pandas.compat import zip from pandas.util._decorators import cache_readonly @@ -287,15 +288,15 @@ def __init__(self, index, warn=True): # the timezone so they are in local time if hasattr(index, 'tz'): if index.tz is not None: - self.values = tz_convert(self.values, 'UTC', index.tz) + self.values = tz_convert(self.values, UTC, index.tz) self.warn = warn if len(index) < 3: raise ValueError('Need at least 3 dates to infer frequency') - self.is_monotonic = (self.index.is_monotonic_increasing or - self.index.is_monotonic_decreasing) + self.is_monotonic = (self.index._is_monotonic_increasing or + 
self.index._is_monotonic_decreasing) @cache_readonly def deltas(self): @@ -322,7 +323,7 @@ def get_freq(self): # noqa:F811 ------- freqstr : str or None """ - if not self.is_monotonic or not self.index.is_unique: + if not self.is_monotonic or not self.index._is_unique: return None delta = self.deltas[0] diff --git a/pandas/tseries/holiday.py b/pandas/tseries/holiday.py index 40e2b76672a4e..4016114919f5b 100644 --- a/pandas/tseries/holiday.py +++ b/pandas/tseries/holiday.py @@ -7,7 +7,7 @@ from pandas.compat import add_metaclass from pandas.errors import PerformanceWarning -from pandas import DateOffset, DatetimeIndex, Series, Timestamp +from pandas import DateOffset, Series, Timestamp, date_range from pandas.tseries.offsets import Day, Easter @@ -254,9 +254,9 @@ def _reference_dates(self, start_date, end_date): reference_end_date = Timestamp( datetime(end_date.year + 1, self.month, self.day)) # Don't process unnecessary holidays - dates = DatetimeIndex(start=reference_start_date, - end=reference_end_date, - freq=year_offset, tz=start_date.tz) + dates = date_range(start=reference_start_date, + end=reference_end_date, + freq=year_offset, tz=start_date.tz) return dates diff --git a/pandas/tseries/offsets.py b/pandas/tseries/offsets.py index ca81b3bcfef2a..573f02fe0aa52 100644 --- a/pandas/tseries/offsets.py +++ b/pandas/tseries/offsets.py @@ -9,7 +9,7 @@ from pandas._libs.tslibs import ( NaT, OutOfBoundsDatetime, Timedelta, Timestamp, ccalendar, conversion, delta_to_nanoseconds, frequencies as libfrequencies, normalize_date, - offsets as liboffsets) + offsets as liboffsets, timezones) from pandas._libs.tslibs.offsets import ( ApplyTypeError, BaseOffset, _get_calendar, _is_normalized, _to_dt64, apply_index_wraps, as_datetime, roll_yearday, shift_month) @@ -32,7 +32,7 @@ 'LastWeekOfMonth', 'FY5253Quarter', 'FY5253', 'Week', 'WeekOfMonth', 'Easter', 'Hour', 'Minute', 'Second', 'Milli', 'Micro', 'Nano', - 'DateOffset', 'CalendarDay'] + 'DateOffset'] # convert to/from datetime/timestamp to allow invalid Timestamp ranges to # pass thru @@ -81,7 +81,7 @@ def wrapper(self, other): if result.tz is not None: # convert to UTC value = conversion.tz_convert_single( - result.value, 'UTC', result.tz) + result.value, timezones.UTC, result.tz) else: value = result.value result = Timestamp(value + nano) @@ -248,7 +248,7 @@ def apply_index(self, i): """ Vectorized apply of DateOffset to DatetimeIndex, raises NotImplentedError for offsets without a - vectorized implementation + vectorized implementation. Parameters ---------- @@ -275,14 +275,17 @@ def apply_index(self, i): kwds.get('months', 0)) * self.n) if months: shifted = liboffsets.shift_months(i.asi8, months) - i = i._shallow_copy(shifted) + i = type(i)(shifted, freq=i.freq, dtype=i.dtype) weeks = (kwds.get('weeks', 0)) * self.n if weeks: # integer addition on PeriodIndex is deprecated, # so we directly use _time_shift instead asper = i.to_period('W') - shifted = asper._data._time_shift(weeks) + if not isinstance(asper._data, np.ndarray): + # unwrap PeriodIndex --> PeriodArray + asper = asper._data + shifted = asper._time_shift(weeks) i = shifted.to_timestamp() + i.to_perioddelta('W') timedelta_kwds = {k: v for k, v in kwds.items() @@ -330,14 +333,18 @@ def name(self): return self.rule_code def rollback(self, dt): - """Roll provided date backward to next offset only if not on offset""" + """ + Roll provided date backward to next offset only if not on offset. 
+ """ dt = as_timestamp(dt) if not self.onOffset(dt): dt = dt - self.__class__(1, normalize=self.normalize, **self.kwds) return dt def rollforward(self, dt): - """Roll provided date forward to next offset only if not on offset""" + """ + Roll provided date forward to next offset only if not on offset. + """ dt = as_timestamp(dt) if not self.onOffset(dt): dt = dt + self.__class__(1, normalize=self.normalize, **self.kwds) @@ -407,7 +414,7 @@ def _from_name(cls, suffix=None): class _CustomMixin(object): """ Mixin for classes that define and validate calendar, holidays, - and weekdays attributes + and weekdays attributes. """ def __init__(self, weekmask, holidays, calendar): calendar, holidays = _get_calendar(weekmask=weekmask, @@ -423,11 +430,15 @@ def __init__(self, weekmask, holidays, calendar): class BusinessMixin(object): - """ Mixin to business types to provide related functions """ + """ + Mixin to business types to provide related functions. + """ @property def offset(self): - """Alias for self._offset""" + """ + Alias for self._offset. + """ # Alias for backward compat return self._offset @@ -444,7 +455,7 @@ def _repr_attrs(self): class BusinessDay(BusinessMixin, SingleConstructorOffset): """ - DateOffset subclass representing possibly n business days + DateOffset subclass representing possibly n business days. """ _prefix = 'B' _adjust_dst = True @@ -531,17 +542,21 @@ def apply_index(self, i): # to_period rolls forward to next BDay; track and # reduce n where it does when rolling forward asper = i.to_period('B') + if not isinstance(asper._data, np.ndarray): + # unwrap PeriodIndex --> PeriodArray + asper = asper._data + if self.n > 0: shifted = (i.to_perioddelta('B') - time).asi8 != 0 # Integer-array addition is deprecated, so we use # _time_shift directly roll = np.where(shifted, self.n - 1, self.n) - shifted = asper._data._addsub_int_array(roll, operator.add) + shifted = asper._addsub_int_array(roll, operator.add) else: # Integer addition is deprecated, so we use _time_shift directly roll = self.n - shifted = asper._data._time_shift(roll) + shifted = asper._time_shift(roll) result = shifted.to_timestamp() + time return result @@ -564,7 +579,9 @@ def __init__(self, start='09:00', end='17:00', offset=timedelta(0)): @cache_readonly def next_bday(self): - """used for moving to next businessday""" + """ + Used for moving to next business day. + """ if self.n >= 0: nb_offset = 1 else: @@ -637,7 +654,9 @@ def _get_business_hours_by_sec(self): @apply_wraps def rollback(self, dt): - """Roll provided date backward to next offset only if not on offset""" + """ + Roll provided date backward to next offset only if not on offset. + """ if not self.onOffset(dt): businesshours = self._get_business_hours_by_sec if self.n >= 0: @@ -650,7 +669,9 @@ def rollback(self, dt): @apply_wraps def rollforward(self, dt): - """Roll provided date forward to next offset only if not on offset""" + """ + Roll provided date forward to next offset only if not on offset. + """ if not self.onOffset(dt): if self.n >= 0: return self._next_opening_time(dt) @@ -747,7 +768,7 @@ def onOffset(self, dt): def _onOffset(self, dt, businesshours): """ - Slight speedups using calculated values + Slight speedups using calculated values. """ # if self.normalize and not _is_normalized(dt): # return False @@ -775,10 +796,9 @@ def _repr_attrs(self): class BusinessHour(BusinessHourMixin, SingleConstructorOffset): """ - DateOffset subclass representing possibly n business days + DateOffset subclass representing possibly n business days. .. 
versionadded:: 0.16.1 - """ _prefix = 'BH' _anchor = 0 @@ -793,7 +813,7 @@ def __init__(self, n=1, normalize=False, start='09:00', class CustomBusinessDay(_CustomMixin, BusinessDay): """ DateOffset subclass representing possibly n custom business days, - excluding holidays + excluding holidays. Parameters ---------- @@ -860,10 +880,9 @@ def onOffset(self, dt): class CustomBusinessHour(_CustomMixin, BusinessHourMixin, SingleConstructorOffset): """ - DateOffset subclass representing possibly n custom business days + DateOffset subclass representing possibly n custom business hours. .. versionadded:: 0.18.1 - """ _prefix = 'CBH' _anchor = 0 @@ -914,29 +933,39 @@ def apply(self, other): @apply_index_wraps def apply_index(self, i): shifted = liboffsets.shift_months(i.asi8, self.n, self._day_opt) - return i._shallow_copy(shifted) + # TODO: going through __new__ raises on call to _validate_frequency; + # are we passing incorrect freq? + return type(i)._simple_new(shifted, freq=i.freq, tz=i.tz) class MonthEnd(MonthOffset): - """DateOffset of one month end""" + """ + DateOffset of one month end. + """ _prefix = 'M' _day_opt = 'end' class MonthBegin(MonthOffset): - """DateOffset of one month at beginning""" + """ + DateOffset of one month at beginning. + """ _prefix = 'MS' _day_opt = 'start' class BusinessMonthEnd(MonthOffset): - """DateOffset increments between business EOM dates""" + """ + DateOffset increments between business EOM dates. + """ _prefix = 'BM' _day_opt = 'business_end' class BusinessMonthBegin(MonthOffset): - """DateOffset of one business month at beginning""" + """ + DateOffset of one business month at beginning. + """ _prefix = 'BMS' _day_opt = 'business_start' @@ -944,7 +973,7 @@ class BusinessMonthBegin(MonthOffset): class _CustomBusinessMonth(_CustomMixin, BusinessMixin, MonthOffset): """ DateOffset subclass representing one custom business month, incrementing - between [BEGIN/END] of month dates + between [BEGIN/END] of month dates. Parameters ---------- @@ -974,7 +1003,9 @@ def __init__(self, n=1, normalize=False, weekmask='Mon Tue Wed Thu Fri', @cache_readonly def cbday_roll(self): - """Define default roll function to be called in apply method""" + """ + Define default roll function to be called in apply method. + """ cbday = CustomBusinessDay(n=self.n, normalize=False, **self.kwds) if self._prefix.endswith('S'): @@ -997,7 +1028,9 @@ def m_offset(self): @cache_readonly def month_roll(self): - """Define default roll function to be called in apply method""" + """ + Define default roll function to be called in apply method. + """ if self._prefix.endswith('S'): # MonthBegin roll_func = self.m_offset.rollback @@ -1087,7 +1120,9 @@ def apply(self, other): return self._apply(n, other) def _apply(self, n, other): - """Handle specific apply logic for child classes""" + """ + Handle specific apply logic for child classes.
+ """ raise AbstractMethodError(self) @apply_index_wraps @@ -1114,7 +1149,11 @@ def apply_index(self, i): # integer-array addition on PeriodIndex is deprecated, # so we use _addsub_int_array directly asper = i.to_period('M') - shifted = asper._data._addsub_int_array(roll // 2, operator.add) + if not isinstance(asper._data, np.ndarray): + # unwrap PeriodIndex --> PeriodArray + asper = asper._data + + shifted = asper._addsub_int_array(roll // 2, operator.add) i = type(dti)(shifted.to_timestamp()) # apply the correct day @@ -1123,7 +1162,8 @@ def apply_index(self, i): return i + time def _get_roll(self, i, before_day_of_month, after_day_of_month): - """Return an array with the correct n for each date in i. + """ + Return an array with the correct n for each date in i. The roll array is based on the fact that i gets rolled back to the first day of the month. @@ -1131,7 +1171,9 @@ def _get_roll(self, i, before_day_of_month, after_day_of_month): raise AbstractMethodError(self) def _apply_index_days(self, i, roll): - """Apply the correct day for each date in i""" + """ + Apply the correct day for each date in i. + """ raise AbstractMethodError(self) @@ -1178,7 +1220,8 @@ def _get_roll(self, i, before_day_of_month, after_day_of_month): return roll def _apply_index_days(self, i, roll): - """Add days portion of offset to DatetimeIndex i + """ + Add days portion of offset to DatetimeIndex i. Parameters ---------- @@ -1235,7 +1278,8 @@ def _get_roll(self, i, before_day_of_month, after_day_of_month): return roll def _apply_index_days(self, i, roll): - """Add days portion of offset to DatetimeIndex i + """ + Add days portion of offset to DatetimeIndex i. Parameters ---------- @@ -1255,7 +1299,7 @@ def _apply_index_days(self, i, roll): class Week(DateOffset): """ - Weekly offset + Weekly offset. Parameters ---------- @@ -1298,13 +1342,19 @@ def apply_index(self, i): if self.weekday is None: # integer addition on PeriodIndex is deprecated, # so we use _time_shift directly - shifted = i.to_period('W')._data._time_shift(self.n) + asper = i.to_period('W') + if not isinstance(asper._data, np.ndarray): + # unwrap PeriodIndex --> PeriodArray + asper = asper._data + + shifted = asper._time_shift(self.n) return shifted.to_timestamp() + i.to_perioddelta('W') else: return self._end_apply_index(i) def _end_apply_index(self, dtindex): - """Add self to the given DatetimeIndex, specialized for case where + """ + Add self to the given DatetimeIndex, specialized for case where self.weekday is non-null. 
Parameters @@ -1319,6 +1369,10 @@ def _end_apply_index(self, dtindex): base, mult = libfrequencies.get_freq_code(self.freqstr) base_period = dtindex.to_period(base) + if not isinstance(base_period._data, np.ndarray): + # unwrap PeriodIndex --> PeriodArray + base_period = base_period._data + if self.n > 0: # when adding, dates on end roll to next normed = dtindex - off + Timedelta(1, 'D') - Timedelta(1, 'ns') @@ -1326,13 +1380,13 @@ def _end_apply_index(self, dtindex): self.n, self.n - 1) # integer-array addition on PeriodIndex is deprecated, # so we use _addsub_int_array directly - shifted = base_period._data._addsub_int_array(roll, operator.add) + shifted = base_period._addsub_int_array(roll, operator.add) base = shifted.to_timestamp(how='end') else: # integer addition on PeriodIndex is deprecated, # so we use _time_shift directly roll = self.n - base = base_period._data._time_shift(roll).to_timestamp(how='end') + base = base_period._time_shift(roll).to_timestamp(how='end') return base + off + Timedelta(1, 'ns') - Timedelta(1, 'D') @@ -1361,7 +1415,9 @@ def _from_name(cls, suffix=None): class _WeekOfMonthMixin(object): - """Mixin for methods common to WeekOfMonth and LastWeekOfMonth""" + """ + Mixin for methods common to WeekOfMonth and LastWeekOfMonth. + """ @apply_wraps def apply(self, other): compare_day = self._get_offset_day(other) @@ -1384,7 +1440,7 @@ def onOffset(self, dt): class WeekOfMonth(_WeekOfMonthMixin, DateOffset): """ - Describes monthly dates like "the Tuesday of the 2nd week of each month" + Describes monthly dates like "the Tuesday of the 2nd week of each month". Parameters ---------- @@ -1456,7 +1512,7 @@ def _from_name(cls, suffix=None): class LastWeekOfMonth(_WeekOfMonthMixin, DateOffset): """ Describes monthly dates in last week of month like "the last Tuesday of - each month" + each month". Parameters ---------- @@ -1469,7 +1525,6 @@ class LastWeekOfMonth(_WeekOfMonthMixin, DateOffset): 4: Fridays 5: Saturdays 6: Sundays - """ _prefix = 'LWOM' _adjust_dst = True @@ -1525,7 +1580,9 @@ def _from_name(cls, suffix=None): class QuarterOffset(DateOffset): - """Quarter representation - doesn't call super""" + """ + Quarter representation - doesn't call super. + """ _default_startingMonth = None _from_name_startingMonth = None _adjust_dst = True @@ -1582,11 +1639,16 @@ def onOffset(self, dt): def apply_index(self, dtindex): shifted = liboffsets.shift_quarters(dtindex.asi8, self.n, self.startingMonth, self._day_opt) - return dtindex._shallow_copy(shifted) + # TODO: going through __new__ raises on call to _validate_frequency; + # are we passing incorrect freq? + return type(dtindex)._simple_new(shifted, freq=dtindex.freq, + tz=dtindex.tz) class BQuarterEnd(QuarterOffset): - """DateOffset increments between business Quarter dates + """ + DateOffset increments between business Quarter dates. + startingMonth = 1 corresponds to dates like 1/31/2007, 4/30/2007, ... startingMonth = 2 corresponds to dates like 2/28/2007, 5/31/2007, ... startingMonth = 3 corresponds to dates like 3/30/2007, 6/29/2007, ... @@ -1609,7 +1671,9 @@ class BQuarterBegin(QuarterOffset): class QuarterEnd(QuarterOffset): - """DateOffset increments between business Quarter dates + """ + DateOffset increments between Quarter end dates. + startingMonth = 1 corresponds to dates like 1/31/2007, 4/30/2007, ... startingMonth = 2 corresponds to dates like 2/28/2007, 5/31/2007, ... startingMonth = 3 corresponds to dates like 3/31/2007, 6/30/2007, ...
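For context on the `apply_index` hunks above: adding an anchored offset to a whole DatetimeIndex dispatches to this vectorized path rather than shifting date-by-date. A minimal sketch against the public API (the dates and offset below are illustrative, not taken from this patch):

```python
import pandas as pd
from pandas.tseries.offsets import QuarterEnd

# DatetimeIndex + DateOffset goes through the vectorized apply_index
# path patched above instead of applying the offset element-wise.
idx = pd.date_range("2007-01-15", periods=3, freq="D")
print(idx + QuarterEnd(startingMonth=3))
# Each date rolls forward to the next quarter end, here 2007-03-31.
```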
@@ -1632,7 +1696,9 @@ class QuarterBegin(QuarterOffset): # Year-Based Offset Classes class YearOffset(DateOffset): - """DateOffset that just needs a month""" + """ + DateOffset that just needs a month. + """ _adjust_dst = True _attributes = frozenset(['n', 'normalize', 'month']) @@ -1653,7 +1719,10 @@ def apply_index(self, dtindex): shifted = liboffsets.shift_quarters(dtindex.asi8, self.n, self.month, self._day_opt, modby=12) - return dtindex._shallow_copy(shifted) + # TODO: going through __new__ raises on call to _validate_frequency; + # are we passing incorrect freq? + return type(dtindex)._simple_new(shifted, freq=dtindex.freq, + tz=dtindex.tz) def onOffset(self, dt): if self.normalize and not _is_normalized(dt): @@ -1683,7 +1752,9 @@ def rule_code(self): class BYearEnd(YearOffset): - """DateOffset increments between business EOM dates""" + """ + DateOffset increments between business EOM dates. + """ _outputName = 'BusinessYearEnd' _default_month = 12 _prefix = 'BA' @@ -1691,7 +1762,9 @@ class BYearEnd(YearOffset): class BYearBegin(YearOffset): - """DateOffset increments between business year begin dates""" + """ + DateOffset increments between business year begin dates. + """ _outputName = 'BusinessYearBegin' _default_month = 1 _prefix = 'BAS' @@ -1699,14 +1772,18 @@ class BYearBegin(YearOffset): class YearEnd(YearOffset): - """DateOffset increments between calendar year ends""" + """ + DateOffset increments between calendar year ends. + """ _default_month = 12 _prefix = 'A' _day_opt = 'end' class YearBegin(YearOffset): - """DateOffset increments between calendar year begin dates""" + """ + DateOffset increments between calendar year begin dates. + """ _default_month = 1 _prefix = 'AS' _day_opt = 'start' @@ -1976,8 +2053,11 @@ def isAnchored(self): return self.n == 1 and self._offset.isAnchored() def _rollback_to_year(self, other): - """roll `other` back to the most recent date that was on a fiscal year - end. Return the date of that year-end, the number of full quarters + """ + Roll `other` back to the most recent date that was on a fiscal year + end. + + Return the date of that year-end, the number of full quarters elapsed between that year-end and other, and the remaining Timedelta since the most recent quarter-end. @@ -2100,10 +2180,9 @@ def _from_name(cls, *args): class Easter(DateOffset): """ - DateOffset for the Easter holiday using - logic defined in dateutil. Right now uses - the revised method which is valid in years - 1583-4099. + DateOffset for the Easter holiday using logic defined in dateutil. + + Right now uses the revised method which is valid in years 1583-4099. """ _adjust_dst = True _attributes = frozenset(['n', 'normalize']) @@ -2137,54 +2216,6 @@ def onOffset(self, dt): return False return date(dt.year, dt.month, dt.day) == easter(dt.year) - -class CalendarDay(SingleConstructorOffset): - """ - Calendar day offset. Respects calendar arithmetic as opposed to Day which - respects absolute time. - """ - _adjust_dst = True - _inc = Timedelta(days=1) - _prefix = 'CD' - _attributes = frozenset(['n', 'normalize']) - - def __init__(self, n=1, normalize=False): - BaseOffset.__init__(self, n, normalize) - - @apply_wraps - def apply(self, other): - """ - Apply scalar arithmetic with CalendarDay offset. Incoming datetime - objects can be tz-aware or naive. 
- """ - if type(other) == type(self): - # Add other CalendarDays - return type(self)(self.n + other.n, normalize=self.normalize) - tzinfo = getattr(other, 'tzinfo', None) - if tzinfo is not None: - other = other.replace(tzinfo=None) - - other = other + self.n * self._inc - - if tzinfo is not None: - # This can raise a AmbiguousTimeError or NonExistentTimeError - other = conversion.localize_pydatetime(other, tzinfo) - - try: - return as_timestamp(other) - except TypeError: - raise TypeError("Cannot perform arithmetic between {other} and " - "CalendarDay".format(other=type(other))) - - @apply_index_wraps - def apply_index(self, i): - """ - Apply the CalendarDay offset to a DatetimeIndex. Incoming DatetimeIndex - objects are assumed to be tz_naive - """ - return i + self.n * self._inc - - # --------------------------------------------------------------------- # Ticks @@ -2378,12 +2409,11 @@ class Nano(Tick): # --------------------------------------------------------------------- -def generate_range(start=None, end=None, periods=None, - offset=BDay(), time_rule=None): +def generate_range(start=None, end=None, periods=None, offset=BDay()): """ Generates a sequence of dates corresponding to the specified time offset. Similar to dateutil.rrule except uses pandas DateOffset - objects to represent time increments + objects to represent time increments. Parameters ---------- @@ -2391,8 +2421,6 @@ def generate_range(start=None, end=None, periods=None, end : datetime (default None) periods : int, (default None) offset : DateOffset, (default BDay()) - time_rule : (legacy) name of DateOffset object to be used, optional - Corresponds with names expected by tseries.frequencies.get_offset Notes ----- @@ -2400,17 +2428,13 @@ def generate_range(start=None, end=None, periods=None, * At least two of (start, end, periods) must be specified. * If both start and end are specified, the returned dates will satisfy start <= date <= end. - * If both time_rule and offset are specified, time_rule supersedes offset. Returns ------- dates : generator object - """ - if time_rule is not None: - from pandas.tseries.frequencies import get_offset - - offset = get_offset(time_rule) + from pandas.tseries.frequencies import to_offset + offset = to_offset(offset) start = to_datetime(start) end = to_datetime(end) @@ -2485,6 +2509,5 @@ def generate_range(start=None, end=None, periods=None, Day, # 'D' WeekOfMonth, # 'WOM' FY5253, - FY5253Quarter, - CalendarDay # 'CD' + FY5253Quarter ]} diff --git a/pandas/util/_decorators.py b/pandas/util/_decorators.py index 818c7a51becdf..86cd8b1e698c6 100644 --- a/pandas/util/_decorators.py +++ b/pandas/util/_decorators.py @@ -1,6 +1,6 @@ -from functools import WRAPPER_ASSIGNMENTS, update_wrapper, wraps +from functools import wraps import inspect -from textwrap import dedent, wrap +from textwrap import dedent import warnings from pandas._libs.properties import cache_readonly # noqa @@ -39,26 +39,37 @@ def deprecate(name, alternative, version, alt_name=None, warning_msg = msg or '{} is deprecated, use {} instead'.format(name, alt_name) - # adding deprecated directive to the docstring - msg = msg or 'Use `{alt_name}` instead.'.format(alt_name=alt_name) - msg = '\n '.join(wrap(msg, 70)) - - @Substitution(version=version, msg=msg) - @Appender(alternative.__doc__) + @wraps(alternative) def wrapper(*args, **kwargs): - """ - .. 
deprecated:: %(version)s - - %(msg)s - - """ warnings.warn(warning_msg, klass, stacklevel=stacklevel) return alternative(*args, **kwargs) - # Since we are using Substitution to create the required docstring, - # remove that from the attributes that should be assigned to the wrapper - assignments = tuple(x for x in WRAPPER_ASSIGNMENTS if x != '__doc__') - update_wrapper(wrapper, alternative, assigned=assignments) + # adding deprecated directive to the docstring + msg = msg or 'Use `{alt_name}` instead.'.format(alt_name=alt_name) + doc_error_msg = ('deprecate needs a correctly formatted docstring in ' + 'the target function (should have a one liner short ' + 'summary, and opening quotes should be in their own ' + 'line). Found:\n{}'.format(alternative.__doc__)) + + # when python is running in optimized mode (i.e. `-OO`), docstrings are + # removed, so we check that a docstring with correct formatting is used + # but we allow empty docstrings + if alternative.__doc__: + if alternative.__doc__.count('\n') < 3: + raise AssertionError(doc_error_msg) + empty1, summary, empty2, doc = alternative.__doc__.split('\n', 3) + if empty1 or empty2 and not summary: + raise AssertionError(doc_error_msg) + wrapper.__doc__ = dedent(""" + {summary} + + .. deprecated:: {depr_version} + {depr_msg} + + {rest_of_docstring}""").format(summary=summary.strip(), + depr_version=version, + depr_msg=msg, + rest_of_docstring=dedent(doc)) return wrapper @@ -107,7 +118,6 @@ def deprecate_kwarg(old_arg_name, new_arg_name, mapping=None, stacklevel=2): warnings.warn(msg, FutureWarning) yes! - To raise a warning that a keyword will be removed entirely in the future >>> @deprecate_kwarg(old_arg_name='cols', new_arg_name=None) diff --git a/pandas/util/_print_versions.py b/pandas/util/_print_versions.py index 3016bf04b5258..a5c86c2cc80b3 100644 --- a/pandas/util/_print_versions.py +++ b/pandas/util/_print_versions.py @@ -85,7 +85,7 @@ def show_versions(as_json=False): ("xlrd", lambda mod: mod.__VERSION__), ("xlwt", lambda mod: mod.__VERSION__), ("xlsxwriter", lambda mod: mod.__version__), - ("lxml", lambda mod: mod.etree.__version__), + ("lxml.etree", lambda mod: mod.__version__), ("bs4", lambda mod: mod.__version__), ("html5lib", lambda mod: mod.__version__), ("sqlalchemy", lambda mod: mod.__version__), diff --git a/pandas/util/_test_decorators.py b/pandas/util/_test_decorators.py index 3f8332ade4487..0331661c3131f 100644 --- a/pandas/util/_test_decorators.py +++ b/pandas/util/_test_decorators.py @@ -158,7 +158,8 @@ def decorated_func(func): skip_if_mpl = pytest.mark.skipif(not _skip_if_no_mpl(), reason="matplotlib is present") xfail_if_mpl_2_2 = pytest.mark.xfail(_skip_if_mpl_2_2(), - reason="matplotlib 2.2") + reason="matplotlib 2.2", + strict=False) skip_if_32bit = pytest.mark.skipif(is_platform_32bit(), reason="skipping for 32 bit") skip_if_windows = pytest.mark.skipif(is_platform_windows(), diff --git a/pandas/util/move.c b/pandas/util/move.c index 9a8af5bbfbdf6..62860adb1c1f6 100644 --- a/pandas/util/move.c +++ b/pandas/util/move.c @@ -20,7 +20,7 @@ #define Py_TPFLAGS_HAVE_NEWBUFFER 0 #endif -PyObject *badmove; /* bad move exception class */ +static PyObject *badmove; /* bad move exception class */ typedef struct { PyObject_HEAD @@ -28,7 +28,7 @@ typedef struct { PyObject *invalid_bytes; } stolenbufobject; -PyTypeObject stolenbuf_type; /* forward declare type */ +static PyTypeObject stolenbuf_type; /* forward declare type */ static void stolenbuf_dealloc(stolenbufobject *self) @@ -71,7 +71,7 @@ 
stolenbuf_getsegcount(stolenbufobject *self, Py_ssize_t *len) return 1; } -PyBufferProcs stolenbuf_as_buffer = { +static PyBufferProcs stolenbuf_as_buffer = { (readbufferproc) stolenbuf_getreadwritebuf, (writebufferproc) stolenbuf_getreadwritebuf, (segcountproc) stolenbuf_getsegcount, @@ -81,7 +81,7 @@ PyBufferProcs stolenbuf_as_buffer = { #else /* Python 3 */ -PyBufferProcs stolenbuf_as_buffer = { +static PyBufferProcs stolenbuf_as_buffer = { (getbufferproc) stolenbuf_getbuffer, NULL, }; @@ -91,7 +91,7 @@ PyBufferProcs stolenbuf_as_buffer = { PyDoc_STRVAR(stolenbuf_doc, "A buffer that is wrapping a stolen bytes object's buffer."); -PyTypeObject stolenbuf_type = { +static PyTypeObject stolenbuf_type = { PyVarObject_HEAD_INIT(NULL, 0) "pandas.util._move.stolenbuf", /* tp_name */ sizeof(stolenbufobject), /* tp_basicsize */ @@ -185,7 +185,7 @@ move_into_mutable_buffer(PyObject *self, PyObject *bytes_rvalue) return (PyObject*) ret; } -PyMethodDef methods[] = { +static PyMethodDef methods[] = { {"move_into_mutable_buffer", (PyCFunction) move_into_mutable_buffer, METH_O, @@ -196,7 +196,7 @@ PyMethodDef methods[] = { #define MODULE_NAME "pandas.util._move" #if !COMPILING_IN_PY2 -PyModuleDef _move_module = { +static PyModuleDef move_module = { PyModuleDef_HEAD_INIT, MODULE_NAME, NULL, @@ -242,7 +242,7 @@ init_move(void) } #if !COMPILING_IN_PY2 - if (!(m = PyModule_Create(&_move_module))) + if (!(m = PyModule_Create(&move_module))) #else if (!(m = Py_InitModule(MODULE_NAME, methods))) #endif /* !COMPILING_IN_PY2 */ diff --git a/pandas/util/testing.py b/pandas/util/testing.py index 9025573c8cf6f..c9c4b99b1701a 100644 --- a/pandas/util/testing.py +++ b/pandas/util/testing.py @@ -34,7 +34,7 @@ from pandas import ( Categorical, CategoricalIndex, DataFrame, DatetimeIndex, Index, IntervalIndex, MultiIndex, Panel, PeriodIndex, RangeIndex, Series, - TimedeltaIndex, bdate_range) + bdate_range) from pandas.core.algorithms import take_1d from pandas.core.arrays import ( DatetimeArrayMixin as DatetimeArray, ExtensionArray, IntervalArray, @@ -208,6 +208,55 @@ def decompress_file(path, compression): zip_file.close() +def write_to_compressed(compression, path, data, dest="test"): + """ + Write data to a compressed file. + + Parameters + ---------- + compression : {'gzip', 'bz2', 'zip', 'xz'} + The compression type to use. + path : str + The file path to write the data. + data : str + The data to write. + dest : str, default "test" + The destination file (for ZIP only) + + Raises + ------ + ValueError : An invalid compression value was passed in. 
+ """ + + if compression == "zip": + import zipfile + compress_method = zipfile.ZipFile + elif compression == "gzip": + import gzip + compress_method = gzip.GzipFile + elif compression == "bz2": + import bz2 + compress_method = bz2.BZ2File + elif compression == "xz": + lzma = compat.import_lzma() + compress_method = lzma.LZMAFile + else: + msg = "Unrecognized compression type: {}".format(compression) + raise ValueError(msg) + + if compression == "zip": + mode = "w" + args = (dest, data) + method = "writestr" + else: + mode = "wb" + args = (data,) + method = "write" + + with compress_method(path, mode=mode) as f: + getattr(f, method)(*args) + + def assert_almost_equal(left, right, check_dtype="equiv", check_less_precise=False, **kwargs): """ @@ -625,7 +674,7 @@ def capture_stdout(f): AssertionError: assert 'foo\n' == 'bar\n' """ - @wraps(f) + @compat.wraps(f) def wrapper(*args, **kwargs): try: sys.stdout = StringIO() @@ -782,6 +831,22 @@ def ensure_clean_dir(): pass +@contextmanager +def ensure_safe_environment_variables(): + """ + Get a context manager to safely set environment variables + + All changes will be undone on close, hence environment variables set + within this contextmanager will neither persist nor change global state. + """ + saved_environ = dict(os.environ) + try: + yield + finally: + os.environ.clear() + os.environ.update(saved_environ) + + # ----------------------------------------------------------------------------- # Comparators @@ -838,7 +903,7 @@ def _check_types(l, r, obj='Index'): def _get_ilevel_values(index, level): # accept level number only unique = index.levels[level] - labels = index.labels[level] + labels = index.codes[level] filled = take_1d(unique.values, labels, fill_value=unique._na_value) values = unique._shallow_copy(filled, name=index.names[level]) return values @@ -1073,6 +1138,7 @@ def assert_period_array_equal(left, right, obj='PeriodArray'): def assert_datetime_array_equal(left, right, obj='DatetimeArray'): + __tracebackhide__ = True _check_isinstance(left, right, DatetimeArray) assert_numpy_array_equal(left._data, right._data, @@ -1082,6 +1148,7 @@ def assert_datetime_array_equal(left, right, obj='DatetimeArray'): def assert_timedelta_array_equal(left, right, obj='TimedeltaArray'): + __tracebackhide__ = True _check_isinstance(left, right, TimedeltaArray) assert_numpy_array_equal(left._data, right._data, obj='{obj}._data'.format(obj=obj)) @@ -1338,11 +1405,11 @@ def assert_series_equal(left, right, check_dtype=True, assert_numpy_array_equal(left.get_values(), right.get_values(), check_dtype=check_dtype) elif is_interval_dtype(left) or is_interval_dtype(right): - assert_interval_array_equal(left.values, right.values) + assert_interval_array_equal(left.array, right.array) elif (is_extension_array_dtype(left) and not is_categorical_dtype(left) and is_extension_array_dtype(right) and not is_categorical_dtype(right)): - return assert_extension_array_equal(left.values, right.values) + return assert_extension_array_equal(left.array, right.array) else: _testing.assert_almost_equal(left.get_values(), right.get_values(), @@ -1659,9 +1726,9 @@ def to_array(obj): if is_period_dtype(obj): return period_array(obj) elif is_datetime64_dtype(obj) or is_datetime64tz_dtype(obj): - return DatetimeArray(obj) + return DatetimeArray._from_sequence(obj) elif is_timedelta64_dtype(obj): - return TimedeltaArray(obj) + return TimedeltaArray._from_sequence(obj) else: return np.array(obj) @@ -1938,8 +2005,8 @@ def makeDateIndex(k=10, freq='B', name=None, **kwargs): def 
makeTimedeltaIndex(k=10, freq='D', name=None, **kwargs): - return TimedeltaIndex(start='1 day', periods=k, freq=freq, - name=name, **kwargs) + return pd.timedelta_range(start='1 day', periods=k, freq=freq, + name=name, **kwargs) def makePeriodIndex(k=10, name=None, **kwargs): diff --git a/requirements-dev.txt b/requirements-dev.txt index d01a21ac5fed5..a7aa0bacb5bd6 100644 --- a/requirements-dev.txt +++ b/requirements-dev.txt @@ -1,23 +1,24 @@ -NumPy +numpy>=1.15 python-dateutil>=2.5.0 pytz -Cython>=0.28.2 +asv +cython>=0.28.2 flake8 flake8-comprehensions -flake8-rst==0.4.2 +flake8-rst>=0.6.0 gitpython -hypothesis>=3.58.0 +hypothesis>=3.82 isort moto -pytest>=3.6 -setuptools>=24.2.0 +pytest>=4.0 sphinx -sphinxcontrib-spelling +numpydoc beautifulsoup4>=4.2.1 blosc +botocore>=1.11 +boto3 bottleneck>=1.2.0 fastparquet>=0.1.2 -gcsfs html5lib ipython>=5.6.0 ipykernel @@ -25,19 +26,19 @@ jinja2 lxml matplotlib>=2.0.0 nbsphinx -numexpr>=2.6.1 +numexpr>=2.6.8 openpyxl pyarrow>=0.7.0 -pymysql tables>=3.4.2 pytest-cov pytest-xdist s3fs -scipy>=0.18.1 +scipy>=1.1 seaborn sqlalchemy statsmodels xarray xlrd xlsxwriter -xlwt \ No newline at end of file +xlwt +cpplint \ No newline at end of file diff --git a/scripts/generate_pip_deps_from_conda.py b/scripts/generate_pip_deps_from_conda.py index 1f79b23a259dc..7b6eb1f9a32b5 100755 --- a/scripts/generate_pip_deps_from_conda.py +++ b/scripts/generate_pip_deps_from_conda.py @@ -75,7 +75,18 @@ def main(conda_fname, pip_fname, compare=False): with open(conda_fname) as conda_fd: deps = yaml.safe_load(conda_fd)['dependencies'] - pip_content = '\n'.join(filter(None, map(conda_package_to_pip, deps))) + pip_deps = [] + for dep in deps: + if isinstance(dep, str): + conda_dep = conda_package_to_pip(dep) + if conda_dep: + pip_deps.append(conda_dep) + elif isinstance(dep, dict) and len(dep) == 1 and 'pip' in dep: + pip_deps += dep['pip'] + else: + raise ValueError('Unexpected dependency {}'.format(dep)) + + pip_content = '\n'.join(pip_deps) if compare: with open(pip_fname) as pip_fd: @@ -92,6 +103,9 @@ def main(conda_fname, pip_fname, compare=False): argparser.add_argument('--compare', action='store_true', help='compare whether the two files are equivalent') + argparser.add_argument('--azure', + action='store_true', + help='show the output in azure-pipelines format') args = argparser.parse_args() repo_path = os.path.dirname(os.path.abspath(os.path.dirname(__file__))) @@ -99,7 +113,10 @@ def main(conda_fname, pip_fname, compare=False): os.path.join(repo_path, 'requirements-dev.txt'), compare=args.compare) if res: - sys.stderr.write('`requirements-dev.txt` has to be generated with ' - '`{}` after `environment.yml` is modified.\n'.format( - sys.argv[0])) + msg = ('`requirements-dev.txt` has to be generated with `{}` after ' + '`environment.yml` is modified.\n'.format(sys.argv[0])) + if args.azure: + msg = ('##vso[task.logissue type=error;' + 'sourcepath=requirements-dev.txt]{}'.format(msg)) + sys.stderr.write(msg) sys.exit(res) diff --git a/scripts/tests/test_validate_docstrings.py b/scripts/tests/test_validate_docstrings.py index ca3efbfce20a7..ca09cbb23d145 100644 --- a/scripts/tests/test_validate_docstrings.py +++ b/scripts/tests/test_validate_docstrings.py @@ -407,6 +407,21 @@ def sections_in_wrong_order(self): before Examples. """ + def deprecation_in_wrong_order(self): + """ + This docstring has the deprecation warning in the wrong order. + + This is the extended summary. The correct order should be + summary, deprecation warning, extended summary. + + .. 
deprecated:: 1.0 + This should generate an error as it needs to go before + the extended summary. + """ + + def method_wo_docstrings(self): + pass + class BadSummaries(object): @@ -769,6 +784,8 @@ def test_bad_generic_functions(self, func): ('BadGenericDocStrings', 'sections_in_wrong_order', ('Sections are in the wrong order. Correct order is: Parameters, ' 'See Also, Examples',)), + ('BadGenericDocStrings', 'deprecation_in_wrong_order', + ('Deprecation warning should precede extended summary',)), ('BadSeeAlso', 'desc_no_period', ('Missing period at end of description for See Also "Series.iloc"',)), ('BadSeeAlso', 'desc_first_letter_lowercase', @@ -826,6 +843,8 @@ def test_bad_generic_functions(self, func): ('Do not import numpy, as it is imported automatically',)), ('BadGenericDocStrings', 'method', ('Do not import pandas, as it is imported automatically',)), + ('BadGenericDocStrings', 'method_wo_docstrings', + ("The object does not have a docstring",)), # See Also tests ('BadSeeAlso', 'prefix_pandas', ('pandas.Series.rename in `See Also` section ' diff --git a/scripts/validate_docstrings.py b/scripts/validate_docstrings.py index 2039fda90ef0f..2baac5f2c7e31 100755 --- a/scripts/validate_docstrings.py +++ b/scripts/validate_docstrings.py @@ -78,6 +78,8 @@ '{allowed_sections}', 'GL07': 'Sections are in the wrong order. Correct order is: ' '{correct_sections}', + 'GL08': 'The object does not have a docstring', + 'GL09': 'Deprecation warning should precede extended summary', 'SS01': 'No summary found (a short summary in a single line should be ' 'present at the beginning of the docstring)', 'SS02': 'Summary does not start with a capital letter', @@ -173,6 +175,7 @@ def get_api_items(api_doc_fd): The name of the subsection in the API page where the object item is located. """ + current_module = 'pandas' previous_line = current_section = current_subsection = '' position = None for line in api_doc_fd: @@ -490,12 +493,14 @@ def first_line_ends_in_dot(self): if self.doc: return self.doc.split('\n')[0][-1] == '.' + @property + def deprecated_with_directive(self): + return '.. deprecated:: ' in (self.summary + self.extended_summary) + @property def deprecated(self): - pattern = re.compile('.. deprecated:: ') return (self.name.startswith('pandas.Panel') - or bool(pattern.search(self.summary)) - or bool(pattern.search(self.extended_summary))) + or self.deprecated_with_directive) @property def mentioned_private_classes(self): @@ -545,20 +550,24 @@ def validate_pep8(self): yield from application.guide.stats.statistics_for('') -def validate_one(func_name): +def get_validation_data(doc): """ - Validate the docstring for the given func_name + Validate the docstring. Parameters ---------- - func_name : function - Function whose docstring will be evaluated (e.g. pandas.read_csv). + doc : Docstring + A Docstring object with the given function name. Returns ------- - dict - A dictionary containing all the information obtained from validating - the docstring. + tuple + errors : list of tuple + Errors that occurred during validation. + warnings : list of tuple + Warnings that occurred during validation. + examples_errs : str + Examples usage displayed along with the errors, otherwise an empty string. Notes ----- @@ -585,10 +594,13 @@ def validate_one(func_name): they are validated, are not documented more than in the source code of this function.
""" - doc = Docstring(func_name) errs = [] wrns = [] + if not doc.raw_doc: + errs.append(error('GL08')) + return errs, wrns, '' + if doc.start_blank_lines != 1: errs.append(error('GL01')) if doc.end_blank_lines != 1: @@ -616,6 +628,10 @@ def validate_one(func_name): errs.append(error('GL07', correct_sections=', '.join(correct_order))) + if (doc.deprecated_with_directive + and not doc.extended_summary.startswith('.. deprecated:: ')): + errs.append(error('GL09')) + if not doc.summary: errs.append(error('SS01')) else: @@ -706,7 +722,26 @@ def validate_one(func_name): for wrong_import in ('numpy', 'pandas'): if 'import {}'.format(wrong_import) in examples_source_code: errs.append(error('EX04', imported_library=wrong_import)) + return errs, wrns, examples_errs + +def validate_one(func_name): + """ + Validate the docstring for the given func_name + + Parameters + ---------- + func_name : function + Function whose docstring will be evaluated (e.g. pandas.read_csv). + + Returns + ------- + dict + A dictionary containing all the information obtained from validating + the docstring. + """ + doc = Docstring(func_name) + errs, wrns, examples_errs = get_validation_data(doc) return {'type': doc.type, 'docstring': doc.clean_doc, 'deprecated': doc.deprecated, diff --git a/setup.cfg b/setup.cfg index e8db1308741aa..380100df774c1 100644 --- a/setup.cfg +++ b/setup.cfg @@ -31,25 +31,30 @@ exclude = env # exclude asv benchmark environments from linting [flake8-rst] -ignore = - F821, # undefined name - W391, # blank line at end of file [Seems to be a bug (v0.4.1)] +bootstrap = + import numpy as np + import pandas as pd + np # avoiding error when importing again numpy or pandas + pd # (in some cases we want to do it to show users) +ignore = E402, # module level import not at top of file + W503, # line break before binary operator + # Classes/functions in different blocks can generate those errors + E302, # expected 2 blank lines, found 0 + E305, # expected 2 blank lines after class or function definition, found 0 + # We use semicolon at the end to avoid displaying plot objects + E703, # statement ends with a semicolon + exclude = - doc/source/whatsnew/v0.7.0.rst - doc/source/whatsnew/v0.10.1.rst - doc/source/whatsnew/v0.12.0.rst - doc/source/whatsnew/v0.13.0.rst - doc/source/whatsnew/v0.13.1.rst - doc/source/whatsnew/v0.14.0.rst doc/source/whatsnew/v0.15.0.rst - doc/source/whatsnew/v0.16.0.rst - doc/source/whatsnew/v0.16.2.rst + doc/source/whatsnew/v0.15.1.rst + doc/source/whatsnew/v0.15.2.rst doc/source/whatsnew/v0.17.0.rst - doc/source/whatsnew/v0.18.0.rst - doc/source/whatsnew/v0.18.1.rst - doc/source/whatsnew/v0.20.0.rst - doc/source/whatsnew/v0.21.0.rst - doc/source/whatsnew/v0.23.0.rst + doc/source/whatsnew/v0.17.1.rst + doc/source/basics.rst + doc/source/contributing_docstring.rst + doc/source/enhancingperf.rst + doc/source/groupby.rst + [yapf] based_on_style = pep8 @@ -67,6 +72,7 @@ markers = clipboard: mark a pd.read_clipboard test doctest_optionflags = NORMALIZE_WHITESPACE IGNORE_EXCEPTION_DETAIL addopts = --strict-data-files +xfail_strict = True [coverage:run] branch = False @@ -105,39 +111,14 @@ known_post_core=pandas.tseries,pandas.io,pandas.plotting sections=FUTURE,STDLIB,THIRDPARTY,PRE_CORE,DTYPES,FIRSTPARTY,POST_CORE,LOCALFOLDER known_first_party=pandas -known_third_party=Cython,numpy,python-dateutil,pytz,pyarrow,pytest +known_third_party=Cython,numpy,dateutil,matplotlib,python-dateutil,pytz,pyarrow,pytest multi_line_output=4 force_grid_wrap=0 combine_as_imports=True 
force_sort_within_sections=True skip= - pandas/core/ops.py, - pandas/core/categorical.py, pandas/core/api.py, - pandas/core/indexing.py, - pandas/core/apply.py, - pandas/core/generic.py, - pandas/core/sorting.py, pandas/core/frame.py, - pandas/core/nanops.py, - pandas/core/algorithms.py, - pandas/core/strings.py, - pandas/core/panel.py, - pandas/core/config.py, - pandas/core/resample.py, - pandas/core/base.py, - pandas/core/common.py, - pandas/core/missing.py, - pandas/core/config_init.py, - pandas/core/indexes/category.py, - pandas/core/indexes/api.py, - pandas/core/indexes/numeric.py, - pandas/core/indexes/interval.py, - pandas/core/indexes/multi.py, - pandas/core/indexes/base.py, - pandas/core/indexes/accessors.py, - pandas/core/indexes/period.py, - pandas/core/indexes/frozen.py, pandas/tests/test_errors.py, pandas/tests/test_base.py, pandas/tests/test_register_accessor.py, @@ -147,7 +128,6 @@ skip= pandas/tests/test_common.py, pandas/tests/test_compat.py, pandas/tests/test_sorting.py, - pandas/tests/test_resample.py, pandas/tests/test_algos.py, pandas/tests/test_expressions.py, pandas/tests/test_strings.py, @@ -200,7 +180,6 @@ skip= pandas/tests/io/test_parquet.py, pandas/tests/io/generate_legacy_storage_files.py, pandas/tests/io/test_common.py, - pandas/tests/io/test_excel.py, pandas/tests/io/test_feather.py, pandas/tests/io/test_s3.py, pandas/tests/io/test_html.py, @@ -278,14 +257,6 @@ skip= pandas/tests/groupby/aggregate/test_cython.py, pandas/tests/groupby/aggregate/test_other.py, pandas/tests/groupby/aggregate/test_aggregate.py, - pandas/tests/tseries/test_frequencies.py, - pandas/tests/tseries/test_holiday.py, - pandas/tests/tseries/offsets/test_offsets_properties.py, - pandas/tests/tseries/offsets/test_yqm_offsets.py, - pandas/tests/tseries/offsets/test_offsets.py, - pandas/tests/tseries/offsets/test_ticks.py, - pandas/tests/tseries/offsets/conftest.py, - pandas/tests/tseries/offsets/test_fiscal.py, pandas/tests/plotting/test_datetimelike.py, pandas/tests/plotting/test_series.py, pandas/tests/plotting/test_groupby.py, @@ -331,11 +302,6 @@ skip= pandas/tests/frame/test_mutate_columns.py, pandas/tests/frame/test_alter_axes.py, pandas/tests/frame/test_rank.py, - pandas/tests/generic/test_generic.py, - pandas/tests/generic/test_label_or_level_utils.py, - pandas/tests/generic/test_series.py, - pandas/tests/generic/test_frame.py, - pandas/tests/generic/test_panel.py, pandas/tests/reshape/test_concat.py, pandas/tests/reshape/test_util.py, pandas/tests/reshape/test_reshape.py, @@ -367,13 +333,40 @@ skip= pandas/tests/sparse/frame/conftest.py, pandas/tests/computation/test_compat.py, pandas/tests/computation/test_eval.py, - pandas/plotting/_core.py, - pandas/plotting/_style.py, - pandas/plotting/_timeseries.py, - pandas/plotting/_tools.py, - pandas/plotting/_converter.py, - pandas/plotting/_misc.py, pandas/types/common.py, - pandas/plotting/_compat.py, - pandas/tests/extension/arrow/test_bool.py - doc/source/conf.py + pandas/tests/extension/arrow/test_bool.py, + doc/source/conf.py, + asv_bench/benchmarks/algorithms.py, + asv_bench/benchmarks/attrs_caching.py, + asv_bench/benchmarks/binary_ops.py, + asv_bench/benchmarks/categoricals.py, + asv_bench/benchmarks/ctors.py, + asv_bench/benchmarks/eval.py, + asv_bench/benchmarks/frame_ctor.py, + asv_bench/benchmarks/frame_methods.py, + asv_bench/benchmarks/gil.py, + asv_bench/benchmarks/groupby.py, + asv_bench/benchmarks/index_object.py, + asv_bench/benchmarks/indexing.py, + asv_bench/benchmarks/inference.py, + asv_bench/benchmarks/io/csv.py, + 
asv_bench/benchmarks/io/excel.py, + asv_bench/benchmarks/io/hdf.py, + asv_bench/benchmarks/io/json.py, + asv_bench/benchmarks/io/msgpack.py, + asv_bench/benchmarks/io/pickle.py, + asv_bench/benchmarks/io/sql.py, + asv_bench/benchmarks/io/stata.py, + asv_bench/benchmarks/join_merge.py, + asv_bench/benchmarks/multiindex_object.py, + asv_bench/benchmarks/panel_ctor.py, + asv_bench/benchmarks/panel_methods.py, + asv_bench/benchmarks/plotting.py, + asv_bench/benchmarks/reindex.py, + asv_bench/benchmarks/replace.py, + asv_bench/benchmarks/reshape.py, + asv_bench/benchmarks/rolling.py, + asv_bench/benchmarks/series_methods.py, + asv_bench/benchmarks/sparse.py, + asv_bench/benchmarks/stat_ops.py, + asv_bench/benchmarks/timeseries.py
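To close, a short round-trip through the `write_to_compressed` helper added to `pandas/util/testing.py` above. This is a minimal sketch assuming a checkout of this branch; `ensure_clean` is the existing temp-file helper in `pandas.util.testing`, and passing the payload as bytes is an assumption for Python 3, since the gzip/bz2/xz writers open the file in binary mode:

```python
import gzip

import pandas.util.testing as tm

# ensure_clean yields a temporary file path that is removed on exit;
# write_to_compressed gzips the payload (bytes, as the file is opened
# in binary mode), which we read back to verify the round trip.
with tm.ensure_clean("tmp.csv.gz") as path:
    tm.write_to_compressed("gzip", path, b"a,b\n1,2\n")
    with gzip.open(path, "rb") as f:
        assert f.read() == b"a,b\n1,2\n"
```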