
Commit f908581

Merge pull request #502 from UBC-DSCI/response-target-outcome

Replace "target" and "outcome" with "response"

2 parents: 70f8f04 + b8b653d

File tree

3 files changed: +12 −12 lines

3 files changed

+12
-12
lines changed

source/classification1.Rmd

Lines changed: 6 additions & 6 deletions
@@ -749,8 +749,8 @@ knn_spec
 
 In order to fit the model on the breast cancer data, we need to pass the model specification
 and the data set to the `fit` function. We also need to specify what variables to use as predictors
-and what variable to use as the target. Below, the `Class ~ Perimeter + Concavity` argument specifies
-that `Class` is the target variable (the one we want to predict),
+and what variable to use as the response. Below, the `Class ~ Perimeter + Concavity` argument specifies
+that `Class` is the response variable (the one we want to predict),
 and both `Perimeter` and `Concavity` are to be used as the predictors.
 
 ```{r 05-tidymodels-4}
@@ -861,7 +861,7 @@ In the `tidymodels` framework, all data preprocessing happens
 using a `recipe` from [the `recipes` R package](https://recipes.tidymodels.org/) [@recipes].
 Here we will initialize a recipe\index{recipe} \index{tidymodels!recipe|see{recipe}} for
 the `unscaled_cancer` data above, specifying
-that the `Class` variable is the target, and all other variables are predictors:
+that the `Class` variable is the response, and all other variables are predictors:
 
 ```{r 05-scaling-2, results=FALSE, message=FALSE, echo = TRUE}
 uc_recipe <- recipe(Class ~ ., data = unscaled_cancer)
@@ -872,7 +872,7 @@ uc_recipe
 hidden_print_cli(uc_recipe)
 ```
 
-So far, there is not much in the recipe; just a statement about the number of targets
+So far, there is not much in the recipe; just a statement about the number of response variables
 and predictors. Let's add
 scaling (`step_scale`) \index{recipe!step\_scale} and
 centering (`step_center`) \index{recipe!step\_center} steps for
@@ -904,7 +904,7 @@ as well as naming particular columns with the same syntax as the `select` functi
 For example:
 
 - `all_nominal()` and `all_numeric()`: specify all categorical or all numeric variables
-- `all_predictors()` and `all_outcomes()`: specify all predictor or all target variables
+- `all_predictors()` and `all_outcomes()`: specify all predictor or all response variables
 - `Area, Smoothness`: specify both the `Area` and `Smoothness` variable
 - `-Class`: specify everything except the `Class` variable
 
@@ -1324,7 +1324,7 @@ First we will load the data, create a model, and specify a recipe for how the da
 
 ```{r 05-workflow, message = FALSE, warning = FALSE}
 # load the unscaled cancer data
-# and make sure the target Class variable is a factor
+# and make sure the response variable, Class, is a factor
 unscaled_cancer <- read_csv("data/unscaled_wdbc.csv") |>
   mutate(Class = as_factor(Class))
 

source/regression1.Rmd

Lines changed: 2 additions & 2 deletions
@@ -148,7 +148,7 @@ The scientific question guides our initial exploration: the columns in the
 data that we are interested in are `sqft` (house size, in livable square feet)
 and `price` (house sale price, in US dollars (USD)). The first step is to visualize
 the data as a scatter plot where we place the predictor variable
-(house size) on the x-axis, and we place the target/response variable that we
+(house size) on the x-axis, and we place the response variable that we
 want to predict (sale price) on the y-axis.
 \index{ggplot!geom\_point}
 \index{visualization!scatter}
@@ -687,7 +687,7 @@ As the algorithm is the same, we will not cover it again in this chapter.
 We will now demonstrate a multivariable KNN regression \index{K-nearest neighbors!multivariable regression} analysis of the
 Sacramento real estate \index{Sacramento real estate} data using `tidymodels`. This time we will use
 house size (measured in square feet) as well as number of bedrooms as our
-predictors, and continue to use house sale price as our outcome/target variable
+predictors, and continue to use house sale price as our response variable
 that we are trying to predict.
 It is always a good practice to do exploratory data analysis, such as
 visualizing the data, before we start modeling the data. Figure \@ref(fig:07-bedscatter)

source/regression2.Rmd

Lines changed: 4 additions & 4 deletions
@@ -286,7 +286,7 @@ lm_test_results
 
 Our final model's test error as assessed by RMSPE \index{RMSPE}
 is `r format(round(lm_test_results |> filter(.metric == 'rmse') |> pull(.estimate)), big.mark=",", nsmall=0, scientific=FALSE)`.
-Remember that this is in units of the target/response variable, and here that
+Remember that this is in units of the response variable, and here that
 is US Dollars (USD). Does this mean our model is "good" at predicting house
 sale price based off of the predictor of home size? Again, answering this is
 tricky and requires knowledge of how you intend to use the prediction.
@@ -402,13 +402,13 @@ flexible and can be quite wiggly. But there is a major interpretability advantag
 model to a straight line. A
 straight line can be defined by two numbers, the
 vertical intercept and the slope. The intercept tells us what the prediction is when
-all of the predictors are equal to 0; and the slope tells us what unit increase in the target/response
+all of the predictors are equal to 0; and the slope tells us what unit increase in the response
 variable we predict given a unit increase in the predictor
 variable. KNN regression, as simple as it is to implement and understand, has no such
 interpretability from its wiggly line.
 
 There can, however, also be a disadvantage to using a simple linear regression
-model in some cases, particularly when the relationship between the target and
+model in some cases, particularly when the relationship between the response and
 the predictor is not linear, but instead some other shape (e.g., curved or oscillating). In
 these cases the prediction model from a simple linear regression
 will underfit \index{underfitting!regression} (have high bias), meaning that model/predicted values do not
@@ -889,7 +889,7 @@ predictive performance.
 
 So far in this textbook we have used regression only in the context of
 prediction. However, regression can also be seen as a method to understand and
-quantify the effects of individual variables on a response / outcome of interest.
+quantify the effects of individual predictor variables on a response variable of interest.
 In the housing example from this chapter, beyond just using past data
 to predict future sale prices,
 we might also be interested in describing the
