Commit b6a0f4b

Chapter 6 production polish (#86)
* starting work on ch5+6; categorical type change; remove commented out R code
* value counts, class name remap, replace in ch5
* remove warnings
* polished ch5+6 up to euclidean dist
* minor bugfix
* minor bugfix
* fixed worksheets link at end of chp
* fix minor section heading wording in Ch1
* added nsmallest + note; better chaining for dist comps; removed comments; fixed colors (not working yet)
* initial fit and predict polished; model spec -> model object
* polishing preprocessing
* balancing polished
* pipelines
* learning objs
* mute warnings in ch5
* warn mute code; fixed links at end
* restore cls2 to main branch
* remove caption hack; minor fix to learning objs
* Remove caption hack
* initial improved seed explanation
* random seed section polish done
* polished ch6 up to tuning
* initial cross val example done
* in python -> in scikit
* working on cross-val
* polished ch6 up to predictor selection
* commented out predictor selection
* done ch6 except final under/overfit plot
* warnings filter in ch6; remove seed hack cell
* remove reference to random state in train/test split
* minor typesetting .method() vs method
* put setup.md back in to fix broken links
* Update source/classification2.md Co-authored-by: Joel Ostblom <[email protected]>
* Update source/classification2.md Co-authored-by: Joel Ostblom <[email protected]>
* Update source/classification2.md Co-authored-by: Joel Ostblom <[email protected]>
* Update source/classification2.md Co-authored-by: Joel Ostblom <[email protected]>
* Update source/classification2.md Co-authored-by: Joel Ostblom <[email protected]>
* Update source/classification2.md Co-authored-by: Joel Ostblom <[email protected]>
* Update source/classification2.md Co-authored-by: Joel Ostblom <[email protected]>
* Update source/classification2.md Co-authored-by: Joel Ostblom <[email protected]>
* Update source/classification2.md Co-authored-by: Joel Ostblom <[email protected]>
* values -> to_numpy in randomness section
* Update source/classification2.md Co-authored-by: Joel Ostblom <[email protected]>
* Update source/classification2.md Co-authored-by: Joel Ostblom <[email protected]>
* remove code for area plot at the end of ch6 Co-authored-by: Joel Ostblom <[email protected]>
1 parent 220f8d9 commit b6a0f4b

4 files changed: +415 -601 lines changed

source/_toc.yml

+1-1
@@ -9,7 +9,7 @@ parts:
 - file: acknowledgements-python.md
 - file: authors.md
 - file: editors.md
-#- file: setup.md
+- file: setup.md
 - caption: Chapters
 numbered: 3
 chapters:

source/classification1.md

+1-4
@@ -942,7 +942,6 @@ we will discuss how to choose $K$ in the next chapter.
 > which weigh each neighbor's vote differently, can be found on
 > [the `scikit-learn` website](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html?highlight=kneighborsclassifier#sklearn.neighbors.KNeighborsClassifier).
 
-
 ```{code-cell} ipython3
 knn = KNeighborsClassifier(n_neighbors=5)
 knn
@@ -1048,7 +1047,6 @@ unscaled_cancer['Class'] = unscaled_cancer['Class'].replace({
 'B' : 'Benign'
 }).astype('category')
 unscaled_cancer
-unscaled_cancer
 ```
 
 Looking at the unscaled and uncentered data above, you can see that the differences
@@ -1146,7 +1144,7 @@ is to *drop* the remaining columns. This default behavior works well with the re
 in the {ref}`08:puttingittogetherworkflow` section), but for visualizing the result of preprocessing it can be useful to keep the other columns
 in our original data frame, such as the `Class` variable here.
 To keep other columns, we need to set the `remainder` argument to `'passthrough'` in the `make_column_transformer` function.
-Furthermore, you can see that the new column names---{glue:}`scaled-cancer-column-0`
+Furthermore, you can see that the new column names---{glue:}`scaled-cancer-column-0`
 and {glue:}`scaled-cancer-column-1`---include the name
 of the preprocessing step separated by underscores. This default behavior is useful in `sklearn` because we sometimes want to apply
 multiple different preprocessing steps to the same columns; but again, for visualization it can be useful to preserve
@@ -1742,7 +1740,6 @@ unscaled_cancer['Class'] = unscaled_cancer['Class'].replace({
 }).astype('category')
 unscaled_cancer
 
-
 # create the KNN model
 knn = KNeighborsClassifier(n_neighbors=7)
 
