Commit 61e746b

Use np.histogram_bin_edges with NumPy >= 1.15.0
Before that function was exposed, finding the bin edges required an unnecessary calculation of the combined histogram. We now detect the NumPy version and use `np.histogram_bin_edges()` if it's available.
1 parent d74a5ff commit 61e746b
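
As a rough illustration of the point in the commit message, the sketch below compares the two ways of obtaining bin edges for a combined sample. The data, seed, and variable names are made up for the example, and it assumes NumPy >= 1.15 is installed so that `np.histogram_bin_edges()` exists.

```python
import numpy as np

# Illustrative data only; any pair of 1-D samples would do.
rng = np.random.RandomState(0)
first = rng.normal(0.0, 1.0, size=500)
second = rng.normal(0.5, 1.5, size=500)
combined = np.concatenate([first, second])
value_range = (combined.min(), combined.max())

# Older approach: compute the full histogram, then discard the counts.
_, edges_from_histogram = np.histogram(combined, bins='auto',
                                       range=value_range)

# Approach available since NumPy 1.15: request only the edges.
edges_direct = np.histogram_bin_edges(combined, bins='auto',
                                      range=value_range)

# Both routes should agree on the edges; only the second skips the counting.
same_edges = np.allclose(edges_from_histogram, edges_direct)
print(same_edges)
```

The counts returned by `np.histogram()` are never used in the first route, which is the unnecessary work the commit removes.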

3 files changed, +28 -14 lines changed

README.rst

+5 -4
@@ -172,11 +172,12 @@ Limitations and Caveats
 
 - ``emd_samples()``:
 
-  - Using the default ``bins='auto'`` results in an extra call to
-    ``np.histogram()`` to determine the bin lengths, since `the NumPy
-    bin-selectors are not exposed in the public API
+  - With ``numpy < 1.15.0``, using the default ``bins='auto'`` results in an
+    extra call to ``np.histogram()`` to determine the bin lengths, since `the
+    NumPy bin-selectors are not exposed in the public API
     <https://github.com/numpy/numpy/issues/10183>`_. For performance, you may
-    want to set the bins yourself.
+    want to set the bins yourself. If ``numpy >= 1.15`` is available,
+    ``np.histogram_bin_edges()`` is called instead, which is more efficient.
 
 
 Contributing
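
The README's advice to "set the bins yourself" can be pictured with plain NumPy. This is only a sketch under stated assumptions: the sample data and the choice of 20 bins are hypothetical, and it mirrors the shared `range` computation visible in the `emd_samples()` diff rather than calling pyemd itself.

```python
import numpy as np

# Illustrative samples with different sizes and supports.
rng = np.random.RandomState(1)
first = rng.uniform(0.0, 1.0, size=200)
second = rng.uniform(0.2, 1.2, size=300)

# Shared range covering both samples, as emd_samples() does when range is None.
shared_range = (min(first.min(), second.min()),
                max(first.max(), second.max()))

# Hypothetical fixed bin count; with an integer, no bin-selector has to run.
n_bins = 20

first_hist, first_edges = np.histogram(first, bins=n_bins,
                                       range=shared_range)
second_hist, second_edges = np.histogram(second, bins=n_bins,
                                         range=shared_range)

# A fixed count plus a shared range gives both histograms the same edges.
print(np.array_equal(first_edges, second_edges))
```

With an integer bin count, the edges follow directly from the range, so neither the extra `np.histogram()` call nor `np.histogram_bin_edges()` has to estimate anything from the data.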

pyemd/emd.pyx

+16 -8
@@ -3,6 +3,8 @@
 # distutils: language = c++
 # emd.pyx
 
+from pkg_resources import parse_version
+
 from libcpp.pair cimport pair
 from libcpp.vector cimport vector
 import cython
@@ -139,6 +141,16 @@ def euclidean_pairwise_distance_matrix(x):
     return distance_matrix.reshape(len(x), len(x))
 
 
+# Use `np.histogram_bin_edges` if available (since NumPy version 1.15.0)
+if parse_version(np.__version__) >= parse_version('1.15.0'):
+    get_bins = np.histogram_bin_edges
+else:
+    def get_bins(a, bins=10, **kwargs):
+        if isinstance(bins, str):
+            hist, bins = np.histogram(a, bins=bins, **kwargs)
+        return bins
+
+
 def emd_samples(first_array,
                 second_array,
                 extra_mass_penalty=DEFAULT_EXTRA_MASS_PENALTY,
@@ -196,14 +208,10 @@ def emd_samples(first_array,
     if range is None:
         range = (min(np.min(first_array), np.min(second_array)),
                  max(np.max(first_array), np.max(second_array)))
-    # Use automatic binning from `np.histogram()`
-    # TODO: Use `np.histogram_bin_edges()` when it's available;
-    # see https://github.com/numpy/numpy/issues/10183
-    if isinstance(bins, str):
-        hist, _ = np.histogram(np.concatenate([first_array, second_array]),
-                               range=range,
-                               bins=bins)
-        bins = len(hist)
+    # Get bin edges using both arrays
+    bins = get_bins(np.concatenate([first_array, second_array]),
+                    range=range,
+                    bins=bins)
     # Compute histograms
     first_histogram, bin_edges = np.histogram(first_array,
                                               range=range,
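
The version check in this file goes through `parse_version` rather than comparing `np.__version__` as a string. A small standard-library sketch suggests why. Note that `version_tuple` is a hypothetical, simplified stand-in written for this example, not the `pkg_resources` function the commit uses, and it only handles plain dotted numeric versions.

```python
def version_tuple(version):
    """Hypothetical helper: turn '1.15.0' into (1, 15, 0).

    Handles plain dotted numeric versions only; pre-release tags such as
    '1.15.0rc1' are out of scope for this sketch.
    """
    return tuple(int(part) for part in version.split('.'))


# Raw strings compare character by character, so '9' sorts after '1'
# and NumPy 1.9.0 would wrongly look newer than 1.15.0.
string_says_new_enough = '1.9.0' >= '1.15.0'

# Parsed components compare 9 and 15 as integers.
tuple_says_new_enough = version_tuple('1.9.0') >= version_tuple('1.15.0')

print(string_says_new_enough, tuple_says_new_enough)  # True False
```

A real version parser also copes with suffixes such as release candidates, which this sketch deliberately ignores.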

tox.ini

+7 -2
@@ -1,7 +1,12 @@
 [tox]
-envlist = py{27,34,35,36}
+envlist = py{27,34,35,36}-numpy{114,115}
 
 [testenv]
-deps = -r{toxinidir}/test_requirements.txt
+deps =
+    -r{toxinidir}/test_requirements.txt
+    # Use NumPy < 1.14 and NumPy >= 1.15 for `get_bins()` switch
+    numpy114: numpy<1.15
+    numpy115: numpy>=1.15
+
 commands = make test
 whitelist_externals = make
