get_indexer_non_unique for orderable indexes #15372

Closed · wants to merge 54 commits

Commits
c1b657e
get_indexer_non_unique for orderable indexes
horta Feb 12, 2017
2f971a2
BUG: Avoid grafting missing examples directory (#15373)
neirbowj Feb 12, 2017
1bad601
CLN: remove pandas/io/auth.py, from ga.py (now removed) (#15374)
jreback Feb 12, 2017
5fb5228
TST: consolidate remaining tests under pandas.tests
jreback Feb 12, 2017
1bcc10d
TST: fix locations for github based url tests
jreback Feb 12, 2017
f87db63
DOC: fix path in whatsnew
jreback Feb 12, 2017
1190ac6
TST: use xdist for multiple cpu testing
jreback Feb 11, 2017
0915857
Typo (#15377)
andrewkittredge Feb 12, 2017
a0f7fc0
TST: control skipping of numexpr tests if its installed / used
jreback Feb 12, 2017
dda3c42
TST: make test_gbq single cpu
jreback Feb 12, 2017
47f7ce3
C level list
horta Feb 12, 2017
09dd91b
no gil
horta Feb 12, 2017
010393c
ENH: expose Int64VectorData in hashtable.pxd
jreback Feb 13, 2017
d9e75c7
TST: xfail most test_gbq tests for now
jreback Feb 13, 2017
2e55efc
capture index error
horta Feb 13, 2017
6916dad
wrong exception handling
horta Feb 13, 2017
86ca84d
TST: Fix gbq integration tests. gbq._Dataset.dataset() would not retu…
parthea Feb 14, 2017
ff0deec
Bug: Raise ValueError with interpolate & fillna limit = 0 (#9217)
mroeschke Feb 14, 2017
5959fe1
CLN: create core/sorting.py
jreback Feb 14, 2017
4b97db4
TST: disable gbq tests again
jreback Feb 15, 2017
25fb173
TST: fix incorrect url in compressed url network tests in parser
jreback Feb 15, 2017
03bb900
TST: incorrect skip in when --skip-network is run
jreback Feb 15, 2017
bbb583c
TST: fix test_nework.py fixture under py27
jreback Feb 15, 2017
2372d27
BLD: Numexpr 2.4.6 required
Feb 15, 2017
b261dfe
TST: print skipped tests files
jreback Feb 15, 2017
e351ed0
PERF: high memory in MI
jreback Feb 15, 2017
93f5e3a
STYLE: flake8 upgraded to 3.3 on conda (#15412)
jreback Feb 15, 2017
86ef3ca
DOC: use shared_docs for Index.get_indexer, get_indexer_non_unique (#…
jreback Feb 15, 2017
d6f8b46
BLD: use latest conda version with latest miniconda installer on appv…
jreback Feb 15, 2017
f2246cf
TST: convert yield based test_pickle.py to parametrized to remove war…
jreback Feb 16, 2017
ddb22f5
TST: Parametrize simple yield tests
QuLogic Feb 16, 2017
5a8883b
BUG: Ensure the right values are set in SeriesGroupBy.nunique
Feb 16, 2017
c7300ea
BUG: Concat with inner join and empty DataFrame
abaldenko Feb 16, 2017
9b5d848
ENH: Added ability to freeze panes from DataFrame.to_excel() (#15160)
jeffcarey Feb 16, 2017
c588dd1
Documents touch-up for DataFrame.to_excel() freeze_panes option (#15436)
jeffcarey Feb 17, 2017
f4e672c
BUG: to_sql convert index name to string (#15404) (#15423)
redbullpeter Feb 17, 2017
54b6c6e
DOC: add whatsnew for #15423
jorisvandenbossche Feb 17, 2017
763f42f
TST: remove yielding tests from test_msgpacks.py (#15427)
jreback Feb 17, 2017
f65a641
ENH: Don't add rowspan/colspan if it's 1.
QuLogic Feb 17, 2017
a17a03a
DOC: correct rpy2 examples (GH15142) (#15450)
jorisvandenbossche Feb 18, 2017
29aeffb
BUG: rolling not accepting Timedelta-like window args (#15443)
mroeschke Feb 18, 2017
be4a63f
BUG: testing on windows
jreback Feb 18, 2017
c7a1e00
get_indexer_non_unique for orderable indexes
horta Feb 12, 2017
34545d4
Merge branch 'master' of https://github.com/Horta/pandas
horta Feb 20, 2017
390bfb2
get_indexer_non_unique for orderable indexes
horta Feb 12, 2017
f38cf52
C level list
horta Feb 12, 2017
9dabf34
no gil
horta Feb 12, 2017
f61b98f
capture index error
horta Feb 13, 2017
6afb8c9
wrong exception handling
horta Feb 13, 2017
5494a4c
Merge branch 'master' of https://github.com/Horta/pandas
horta Feb 20, 2017
bf4b3f5
fixed-size arrays for get_index mapping
horta Feb 20, 2017
0f37a64
dtype=np.int64
horta Feb 20, 2017
3c218ce
empty and zeros with np.int64
horta Feb 20, 2017
74ce239
as array
horta Feb 20, 2017
6 changes: 4 additions & 2 deletions .travis.yml
@@ -320,7 +320,8 @@ before_script:
script:
- echo "script start"
- ci/run_build_docs.sh
- ci/script.sh
- ci/script_single.sh
- ci/script_multi.sh
- ci/lint.sh
- echo "script done"

@@ -331,5 +332,6 @@ after_script:
- echo "after_script start"
- ci/install_test.sh
- source activate pandas && python -c "import pandas; pandas.show_versions();"
- ci/print_skipped.py /tmp/pytest.xml
- ci/print_skipped.py /tmp/single.xml
- ci/print_skipped.py /tmp/multiple.xml
- echo "after_script done"
1 change: 0 additions & 1 deletion MANIFEST.in
@@ -7,7 +7,6 @@ include setup.py
graft doc
prune doc/build

graft examples
graft pandas

global-exclude *.so
25 changes: 10 additions & 15 deletions appveyor.yml
@@ -18,19 +18,19 @@ environment:

matrix:

- CONDA_ROOT: "C:\\Miniconda3.5_64"
- CONDA_ROOT: "C:\\Miniconda3_64"
PYTHON_VERSION: "3.6"
PYTHON_ARCH: "64"
CONDA_PY: "36"
CONDA_NPY: "111"
CONDA_NPY: "112"

- CONDA_ROOT: "C:\\Miniconda3.5_64"
- CONDA_ROOT: "C:\\Miniconda3_64"
PYTHON_VERSION: "2.7"
PYTHON_ARCH: "64"
CONDA_PY: "27"
CONDA_NPY: "110"

- CONDA_ROOT: "C:\\Miniconda3.5_64"
- CONDA_ROOT: "C:\\Miniconda3_64"
PYTHON_VERSION: "3.5"
PYTHON_ARCH: "64"
CONDA_PY: "35"
@@ -66,8 +66,7 @@ install:

# install our build environment
- cmd: conda config --set show_channel_urls true --set always_yes true --set changeps1 false
#- cmd: conda update -q conda
- cmd: conda install conda=4.2.15
- cmd: conda update -q conda
- cmd: conda config --set ssl_verify false

# add the pandas channel *before* defaults to have defaults take priority
@@ -79,23 +78,19 @@
# this is now the downloaded conda...
- cmd: conda info -a

# build em using the local source checkout in the correct windows env
- cmd: '%CMD_IN_ENV% conda build ci\appveyor.recipe -q'

# create our env
- cmd: conda create -q -n pandas python=%PYTHON_VERSION% nose pytest
- cmd: conda create -q -n pandas python=%PYTHON_VERSION% cython pytest
- cmd: activate pandas
- SET REQ=ci\requirements-%PYTHON_VERSION%-%PYTHON_ARCH%.run
- cmd: echo "installing requirements from %REQ%"
- cmd: conda install -n pandas -q --file=%REQ%
- cmd: conda list -n pandas
- cmd: echo "installing requirements from %REQ% - done"
- ps: conda install -n pandas (conda build ci\appveyor.recipe -q --output)

# build em using the local source checkout in the correct windows env
- cmd: '%CMD_IN_ENV% python setup.py build_ext --inplace'

test_script:
# tests
- cmd: activate pandas
- cmd: conda list
- cmd: cd \
- cmd: python -c "import pandas; pandas.test(['--skip-slow', '--skip-network'])"

- cmd: test.bat
30 changes: 28 additions & 2 deletions asv_bench/benchmarks/indexing.py
@@ -88,7 +88,7 @@ def setup(self):

def time_getitem_scalar(self):
self.ts[self.dt]


class DataFrameIndexing(object):
goal_time = 0.2
@@ -189,6 +189,15 @@ def setup(self):
self.eps_C = 5
self.eps_D = 5000
self.mdt2 = self.mdt.set_index(['A', 'B', 'C', 'D']).sortlevel()
self.miint = MultiIndex.from_product(
[np.arange(1000),
np.arange(1000)], names=['one', 'two'])

import string
self.mistring = MultiIndex.from_product(
[np.arange(1000),
np.arange(20), list(string.ascii_letters)],
names=['one', 'two', 'three'])

def time_series_xs_mi_ix(self):
self.s.ix[999]
@@ -197,7 +206,24 @@ def time_frame_xs_mi_ix(self):
self.df.ix[999]

def time_multiindex_slicers(self):
self.mdt2.loc[self.idx[(self.test_A - self.eps_A):(self.test_A + self.eps_A), (self.test_B - self.eps_B):(self.test_B + self.eps_B), (self.test_C - self.eps_C):(self.test_C + self.eps_C), (self.test_D - self.eps_D):(self.test_D + self.eps_D)], :]
self.mdt2.loc[self.idx[
(self.test_A - self.eps_A):(self.test_A + self.eps_A),
(self.test_B - self.eps_B):(self.test_B + self.eps_B),
(self.test_C - self.eps_C):(self.test_C + self.eps_C),
(self.test_D - self.eps_D):(self.test_D + self.eps_D)], :]

def time_multiindex_get_indexer(self):
self.miint.get_indexer(
np.array([(0, 10), (0, 11), (0, 12),
(0, 13), (0, 14), (0, 15),
(0, 16), (0, 17), (0, 18),
(0, 19)], dtype=object))

def time_multiindex_string_get_loc(self):
self.mistring.get_loc((999, 19, 'Z'))

def time_is_monotonic(self):
self.miint.is_monotonic


class PanelIndexing(object):
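The benchmarks above exercise MultiIndex lookups; for context, the method this PR targets, `get_indexer_non_unique`, returns both matched positions and missing targets. A minimal sketch with illustrative values (not the benchmark data):

```python
import pandas as pd

# A non-unique index: the label 3 appears twice.
idx = pd.Index([1, 3, 3, 5])

# get_indexer_non_unique returns (indexer, missing): every position of
# each target in the index (-1 when a target is absent), plus the
# positions within the target list that were not found at all.
indexer, missing = idx.get_indexer_non_unique([3, 4])
print(list(indexer))   # [1, 2, -1]
print(list(missing))   # [1]
```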
4 changes: 2 additions & 2 deletions asv_bench/benchmarks/reindex.py
@@ -16,8 +16,8 @@ def setup(self):
data=np.random.rand(10000, 30), columns=range(30))

# multi-index
N = 1000
K = 20
N = 5000
K = 200
level1 = tm.makeStringIndex(N).values.repeat(K)
level2 = np.tile(tm.makeStringIndex(K).values, N)
index = MultiIndex.from_arrays([level1, level2])
2 changes: 0 additions & 2 deletions ci/appveyor.recipe/bld.bat

This file was deleted.

2 changes: 0 additions & 2 deletions ci/appveyor.recipe/build.sh

This file was deleted.

37 changes: 0 additions & 37 deletions ci/appveyor.recipe/meta.yaml

This file was deleted.

1 change: 1 addition & 0 deletions ci/install_travis.sh
@@ -112,6 +112,7 @@ fi
source activate pandas

pip install pytest-xdist

if [ "$LINT" ]; then
conda install flake8
pip install cpplint
Expand Down
7 changes: 4 additions & 3 deletions ci/print_skipped.py
@@ -30,20 +30,21 @@ def parse_results(filename):
i += 1
assert i - 1 == len(skipped)
assert len(skipped) == int(root.attrib['skip'])
# assert len(skipped) == int(root.attrib['skip'])
return '\n'.join(skipped)


def main(args):
print('SKIPPED TESTS:')
print(parse_results(args.filename))
for fn in args.filename:
print(parse_results(fn))
return 0


def parse_args():
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('filename', help='XUnit file to parse')
parser.add_argument('filename', nargs='+', help='XUnit file to parse')
return parser.parse_args()


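The `argparse` change above — `nargs='+'` — is what lets the script accept several XML report paths; a small stdlib sketch (the report names are hypothetical):

```python
import argparse

parser = argparse.ArgumentParser()
# nargs='+' collects one or more positional arguments into a list,
# so main() can loop over every JUnit XML report it is given.
parser.add_argument('filename', nargs='+', help='XUnit file to parse')

args = parser.parse_args(['/tmp/single.xml', '/tmp/multiple.xml'])
print(args.filename)   # ['/tmp/single.xml', '/tmp/multiple.xml']
```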
2 changes: 1 addition & 1 deletion ci/requirements-3.4_SLOW.run
@@ -9,7 +9,7 @@ html5lib
patsy
beautiful-soup
scipy
numexpr=2.4.4
numexpr=2.4.6
pytables
matplotlib
lxml
2 changes: 1 addition & 1 deletion ci/requirements-3.5-64.run
@@ -1,6 +1,6 @@
python-dateutil
pytz
numpy
numpy=1.11*
openpyxl
xlsxwriter
xlrd
4 changes: 2 additions & 2 deletions ci/requirements-3.6-64.run
@@ -1,10 +1,10 @@
python-dateutil
pytz
numpy
numpy=1.12*
openpyxl
xlsxwriter
xlrd
#xlwt
xlwt
scipy
feather-format
numexpr
38 changes: 38 additions & 0 deletions ci/script_multi.sh
@@ -0,0 +1,38 @@
#!/bin/bash

echo "[script multi]"

source activate pandas

# don't run the tests for the doc build
if [ x"$DOC_BUILD" != x"" ]; then
exit 0
fi

if [ -n "$LOCALE_OVERRIDE" ]; then
export LC_ALL="$LOCALE_OVERRIDE";
echo "Setting LC_ALL to $LOCALE_OVERRIDE"

pycmd='import pandas; print("pandas detected console encoding: %s" % pandas.get_option("display.encoding"))'
python -c "$pycmd"
fi

# Workaround for pytest-xdist flaky collection order
# https://github.com/pytest-dev/pytest/issues/920
# https://github.com/pytest-dev/pytest/issues/1075
export PYTHONHASHSEED=$(python -c 'import random; print(random.randint(1, 4294967295))')
echo PYTHONHASHSEED=$PYTHONHASHSEED

if [ "$BUILD_TEST" ]; then
echo "We are not running pytest as this is simply a build test."
elif [ "$COVERAGE" ]; then
echo pytest -s -n 2 -m "not single" --cov=pandas --cov-append --cov-report xml:/tmp/cov.xml --junitxml=/tmp/multiple.xml $TEST_ARGS pandas
pytest -s -n 2 -m "not single" --cov=pandas --cov-append --cov-report xml:/tmp/cov.xml --junitxml=/tmp/multiple.xml $TEST_ARGS pandas
else
echo pytest -n 2 -m "not single" --junitxml=/tmp/multiple.xml $TEST_ARGS pandas
pytest -n 2 -m "not single" --junitxml=/tmp/multiple.xml $TEST_ARGS pandas # TODO: doctest
fi

RET="$?"

exit "$RET"
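The `PYTHONHASHSEED` export above pins string hashing so every pytest-xdist worker collects tests in the same order; the seed generation in the script is equivalent to this stdlib snippet:

```python
import random

# Mirrors: python -c 'import random; print(random.randint(1, 4294967295))'
# PYTHONHASHSEED accepts an integer in [0, 4294967295]; exporting a single
# value makes hash-randomized set/dict ordering identical across workers.
seed = random.randint(1, 4294967295)
print(seed)
```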
10 changes: 5 additions & 5 deletions ci/script.sh → ci/script_single.sh
@@ -1,6 +1,6 @@
#!/bin/bash

echo "inside $0"
echo "[script_single]"

source activate pandas

@@ -20,11 +20,11 @@ fi
if [ "$BUILD_TEST" ]; then
echo "We are not running pytest as this is simply a build test."
elif [ "$COVERAGE" ]; then
echo pytest -s --cov=pandas --cov-report xml:/tmp/pytest.xml $TEST_ARGS pandas
pytest -s --cov=pandas --cov-report xml:/tmp/pytest.xml $TEST_ARGS pandas
echo pytest -s -m "single" --cov=pandas --cov-report xml:/tmp/cov.xml --junitxml=/tmp/single.xml $TEST_ARGS pandas
pytest -s -m "single" --cov=pandas --cov-report xml:/tmp/cov.xml --junitxml=/tmp/single.xml $TEST_ARGS pandas
else
echo pytest $TEST_ARGS pandas
pytest $TEST_ARGS pandas # TODO: doctest
echo pytest -m "single" --junitxml=/tmp/single.xml $TEST_ARGS pandas
pytest -m "single" --junitxml=/tmp/single.xml $TEST_ARGS pandas # TODO: doctest
fi

RET="$?"
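The CI split above relies on a pytest marker: tests marked `single` run serially via `ci/script_single.sh`, while `-m "not single"` sends the rest through xdist. A hedged sketch of how such a marker is attached (the test name here is hypothetical; the marker name comes from the scripts above):

```python
import pytest

# Tests that must not run in parallel (e.g. they bind a port or mutate
# global state) get the "single" tag, so -m "single" / -m "not single"
# can route them to the serial or the xdist run respectively.
@pytest.mark.single
def test_uses_exclusive_resource():
    assert True
```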
2 changes: 1 addition & 1 deletion doc/source/advanced.rst
@@ -59,7 +59,7 @@ Creating a MultiIndex (hierarchical index) object

The ``MultiIndex`` object is the hierarchical analogue of the standard
``Index`` object which typically stores the axis labels in pandas objects. You
can think of ``MultiIndex`` an array of tuples where each tuple is unique. A
can think of ``MultiIndex`` as an array of tuples where each tuple is unique. A
``MultiIndex`` can be created from a list of arrays (using
``MultiIndex.from_arrays``), an array of tuples (using
``MultiIndex.from_tuples``), or a crossed set of iterables (using
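The corrected sentence describes a ``MultiIndex`` as an array of tuples where each tuple is unique; a minimal illustration:

```python
import pandas as pd

# Each entry of a MultiIndex behaves like a tuple, and the tuples are unique.
mi = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1)],
                               names=['letter', 'number'])
print(mi[0])          # ('a', 1)
print(mi.is_unique)   # True
```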
2 changes: 1 addition & 1 deletion doc/source/install.rst
@@ -226,7 +226,7 @@ Recommended Dependencies

* `numexpr <https://github.com/pydata/numexpr>`__: for accelerating certain numerical operations.
``numexpr`` uses multiple cores as well as smart chunking and caching to achieve large speedups.
If installed, must be Version 2.1 or higher (excluding a buggy 2.4.4). Version 2.4.6 or higher is highly recommended.
If installed, must be Version 2.4.6 or higher.

* `bottleneck <http://berkeleyanalytics.com/bottleneck>`__: for accelerating certain types of ``nan``
evaluations. ``bottleneck`` uses specialized cython routines to achieve large speedups.
13 changes: 13 additions & 0 deletions doc/source/io.rst
@@ -2777,6 +2777,7 @@ Added support for Openpyxl >= 2.2
``'xlsxwriter'`` will produce an Excel 2007-format workbook (xlsx). If
omitted, an Excel 2007-formatted workbook is produced.


.. _io.excel.writers:

Excel writer engines
@@ -2823,6 +2824,18 @@ argument to ``to_excel`` and to ``ExcelWriter``. The built-in engines are:

df.to_excel('path_to_file.xlsx', sheet_name='Sheet1')

.. _io.excel.style:

Style and Formatting
''''''''''''''''''''

The look and feel of Excel worksheets created from pandas can be modified using the following parameters on the ``DataFrame``'s ``to_excel`` method.

- ``float_format`` : Format string for floating point numbers (default None)
- ``freeze_panes`` : A tuple of two integers representing the bottommost row and rightmost column to freeze. Each of these parameters is one-based, so (1, 1) will freeze the first row and first column (default None)



.. _io.clipboard:

Clipboard
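A sketch of the documented ``freeze_panes`` option (writing requires an Excel engine such as ``openpyxl`` or ``xlsxwriter``, so the call is guarded here; the file name is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3]})
try:
    # freeze_panes is one-based: (1, 1) keeps the first row and the
    # first column visible while scrolling.
    df.to_excel('frozen.xlsx', sheet_name='Sheet1', freeze_panes=(1, 1))
except ImportError:
    pass  # no Excel engine installed in this environment
```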
11 changes: 6 additions & 5 deletions doc/source/r_interface.rst
@@ -41,15 +41,17 @@ In the remainder of this page, a few examples of explicit conversion is given. T
Transferring R data sets into Python
------------------------------------

The ``pandas2ri.ri2py`` function retrieves an R data set and converts it to the
appropriate pandas object (most likely a DataFrame):
Once the pandas conversion is activated (``pandas2ri.activate()``), many conversions
of R to pandas objects will be done automatically. For example, to obtain the 'iris' dataset as a pandas DataFrame:

.. ipython:: python

r.data('iris')
df_iris = pandas2ri.ri2py(r['iris'])
df_iris.head()
r['iris'].head()

If the pandas conversion was not activated, the above could also be accomplished
by explicitly converting it with the ``pandas2ri.ri2py`` function
(``pandas2ri.ri2py(r['iris'])``).

Converting DataFrames into R objects
------------------------------------
@@ -65,7 +67,6 @@ DataFrames into the equivalent R object (that is, **data.frame**):
print(type(r_dataframe))
print(r_dataframe)


The DataFrame's index is stored as the ``rownames`` attribute of the
data.frame instance.
