Commit 7a83653

razvan and NickLarsenNZ authored
Update docs/modules/demos/pages/jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data.adoc
Co-authored-by: Nick <[email protected]>
1 parent c7d2006 commit 7a83653

File tree: 1 file changed, +1 −1 lines changed


docs/modules/demos/pages/jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data.adoc

Lines changed: 1 addition & 1 deletion

@@ -132,7 +132,7 @@ In practice, clients of Spark Connect do not need a full-blown Spark installatio
 == Model details
 
 The job uses an implementation of the Isolation Forest {forest-algo}[algorithm] provided by the scikit-learn {scikit-lib}[library]:
-the model is trained and then invoked by a user-defined function (see {forest-article}[this article] for how to call the sklearn library with a pyspark UDF), all of which is run using the Spark connect executors.
+the model is trained and then invoked by a user-defined function (see {forest-article}[this article] for how to call the sklearn library with a pyspark UDF), all of which is run using the Spark Connect executors.
 This type of model attempts to isolate each data point by continually partitioning the data.
 Data closely packed together will require more partitions to separate data points.
 In contrast, any outliers will require less: the number of partitions needed for a particular data point is thus inversely proportional to the anomaly "score".
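The partitioning intuition described in the changed paragraph can be sketched in plain Python. This is an illustrative toy, not the demo's actual scikit-learn/PySpark code: all names (`isolation_depth`, `avg_depth`) and the 1-D data are hypothetical, and it only shows why an outlier is isolated in fewer random partitions than a point inside a dense cluster.

```python
import random

def isolation_depth(point, data, depth=0, max_depth=10):
    """Count random partitions needed to isolate `point` (1-D toy sketch)."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(data), max(data)
    if lo == hi:
        return depth
    split = random.uniform(lo, hi)
    # Keep only the points that fall on the same side of the split as `point`.
    same_side = [x for x in data if (x < split) == (point < split)]
    return isolation_depth(point, same_side, depth + 1, max_depth)

random.seed(42)
cluster = [random.gauss(0.0, 0.1) for _ in range(200)]  # densely packed points
outlier = 5.0                                           # far from the cluster
data = cluster + [outlier]

def avg_depth(point, trees=50):
    """Average depth over several random trees, as a forest would."""
    return sum(isolation_depth(point, data) for _ in range(trees)) / trees
```

Averaging over many random trees, `avg_depth(outlier)` comes out well below `avg_depth(cluster[0])`: fewer partitions to isolate, hence a higher anomaly score, which is the inverse relationship the paragraph describes.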
