Commit 15b7d47

Romain Brault authored and amueller committed
relax skewness assumption (scikit-learn#7573)
relax skewedness assumption
last corrections
whats_new merge
complying whats_new
removed unnecessary _assert_X
increased coverage
remove cythonize.dat and merge
simplify tests
1 parent 55c9443 commit 15b7d47

3 files changed, with 23 additions and 5 deletions

doc/whats_new.rst

Lines changed: 7 additions & 0 deletions
@@ -87,6 +87,13 @@ Enhancements
      by :user:`Alyssa Batula <abatula>`, :user:`Dylan Werner-Meier <unautre>`,
      and :user:`Stephen Hoover <stephen-hoover>`.
 
+   - Relax assumption on the data for the ``SkewedChi2Sampler``. Since the
+     Skewed-Chi2 kernel is defined on the open interval :math: `(-skewedness;
+     +\infty)^d`, the transform function should not check whether X < 0 but
+     whether ``X < -self.skewedness``. (`#7573
+     <https://github.com/scikit-learn/scikit-learn/pull/7573>`_) by `Romain
+     Brault`_.
+
    - The ``min_weight_fraction_leaf`` constraint in tree construction is now
      more efficient, taking a fast path to declare a node a leaf if its weight
      is less than 2 * the minimum. Note that the constructed tree will be

sklearn/kernel_approximation.py

Lines changed: 5 additions & 3 deletions
@@ -185,7 +185,8 @@ def transform(self, X, y=None):
         ----------
         X : array-like, shape (n_samples, n_features)
             New data, where n_samples in the number of samples
-            and n_features is the number of features.
+            and n_features is the number of features. All values of X must be
+            strictly greater than "-skewedness".
 
         Returns
         -------
@@ -195,8 +196,9 @@ def transform(self, X, y=None):
 
         X = as_float_array(X, copy=True)
         X = check_array(X, copy=False)
-        if (X < 0).any():
-            raise ValueError("X may not contain entries smaller than zero.")
+        if (X <= -self.skewedness).any():
+            raise ValueError("X may not contain entries smaller than"
+                             " -skewedness.")
 
         X += self.skewedness
         np.log(X, X)
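
The effect of the relaxed check can be illustrated with a small standard-library sketch. It is not the scikit-learn implementation: `shifted_log` is a hypothetical helper that only mirrors the validation and the `log(X + skewedness)` step visible in the hunk above, and it works on plain nested lists rather than NumPy arrays.

```python
import math


def shifted_log(X, skewedness):
    """Hypothetical helper: validate X against -skewedness, then shift and log."""
    # Entries at or below -skewedness would make log(x + skewedness) undefined,
    # so they are rejected, matching the `X <= -self.skewedness` test in the diff.
    if any(value <= -skewedness for row in X for value in row):
        raise ValueError("X may not contain entries smaller than -skewedness.")
    return [[math.log(value + skewedness) for value in row] for row in X]


c = 0.03

# -c / 2 is negative, but it still lies inside the open interval (-c, +inf).
accepted = shifted_log([[-c / 2.0, 0.5]], skewedness=c)

# -c * 2 lies outside the interval and must be rejected.
try:
    shifted_log([[-c * 2.0, 0.5]], skewedness=c)
    rejected = False
except ValueError:
    rejected = True
```

Under the previous `X < 0` check, the first call would have been rejected even though every shifted entry is strictly positive.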

sklearn/tests/test_kernel_approximation.py

Lines changed: 11 additions & 2 deletions
@@ -84,6 +84,11 @@ def test_skewed_chi2_sampler():
 
     # compute exact kernel
     c = 0.03
+    # set on negative component but greater than c to ensure that the kernel
+    # approximation is valid on the group (-c; +\infty) endowed with the skewed
+    # multiplication.
+    Y[0, 0] = -c / 2.
+
     # abbreviations for easier formula
     X_c = (X + c)[:, np.newaxis, :]
     Y_c = (Y + c)[np.newaxis, :, :]
@@ -103,10 +108,14 @@ def test_skewed_chi2_sampler():
 
     kernel_approx = np.dot(X_trans, Y_trans.T)
     assert_array_almost_equal(kernel, kernel_approx, 1)
+    assert_true(np.isfinite(kernel).all(),
+                'NaNs found in the Gram matrix')
+    assert_true(np.isfinite(kernel_approx).all(),
+                'NaNs found in the approximate Gram matrix')
 
-    # test error is raised on negative input
+    # test error is raised on when inputs contains values smaller than -c
     Y_neg = Y.copy()
-    Y_neg[0, 0] = -1
+    Y_neg[0, 0] = -c * 2.
     assert_raises(ValueError, transform.transform, Y_neg)
 
 
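
To see why the test can safely set one entry to `-c / 2`, it may help to evaluate the exact kernel by hand. The sketch below assumes the usual definition of the skewed chi-squared kernel, k(x, y) = prod_i 2 sqrt(x_i + c) sqrt(y_i + c) / (x_i + y_i + 2c). The lines of the test that compute the exact kernel are not shown in the hunks above, so this formula and the helper name `skewed_chi2_kernel` are assumptions made for illustration.

```python
import math


def skewed_chi2_kernel(x, y, c):
    """Illustrative exact skewed chi-squared kernel between two vectors."""
    log_k = 0.0
    for x_i, y_i in zip(x, y):
        # Each shifted entry must be strictly positive for the logs to exist.
        if x_i <= -c or y_i <= -c:
            raise ValueError("entries must be strictly greater than -c")
        # Work in log space, one factor of the product per feature.
        log_k += (0.5 * math.log(x_i + c) + 0.5 * math.log(y_i + c)
                  + math.log(2.0) - math.log(x_i + y_i + 2.0 * c))
    return math.exp(log_k)


c = 0.03
x = [0.2, 0.7]
y = [-c / 2.0, 0.4]  # one slightly negative entry, as in the updated test

k_xy = skewed_chi2_kernel(x, y, c)
k_yy = skewed_chi2_kernel(y, y, c)
```

With this definition each factor is at most 1, so the kernel stays finite and bounded for any inputs strictly greater than `-c`, and a vector compared with itself gives 1 up to floating-point error.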
