Add index feature #26

Closed
GloriaWYY opened this issue Jul 19, 2022 · 2 comments · Fixed by #316
Labels
1st edition Planned for inclusion in 1st print edition

Comments

@GloriaWYY
Contributor

GloriaWYY commented Jul 19, 2022

PR #30

Because Python and R use different functions, I needed to rename several index entries; for example, tidymodels becomes scikit-learn. Where there is no one-to-one match between functions, some index entries are combined into a single entry.
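
As a rough illustration of the renaming described above, the sketch below turns a LaTeX `\index{parent!child}` entry into the `parent; child` form used in the lists that follow. The helper `convert_entry` and the `RENAMES` table are hypothetical names made up for this example; this is not the script used in PR #30, and it deliberately rejects entries with nested braces such as `|see{...}`.

```python
import re

# Hypothetical lookup table with a few renames taken from the lists in this issue.
RENAMES = {
    "tidymodels": "scikit-learn",
    "glimpse": "info",
    "mutate": "assign",
}

# Matches a simple \index{...} entry with no nested braces.
INDEX_RE = re.compile(r"\\index\{([^{}]*)\}")


def convert_entry(latex_entry: str) -> str:
    """Convert '\\index{a!b}' to 'a; b', renaming any term found in RENAMES."""
    match = INDEX_RE.fullmatch(latex_entry.strip())
    if match is None:
        raise ValueError(f"unsupported index entry: {latex_entry!r}")
    parts = [part.strip() for part in match.group(1).split("!")]
    return "; ".join(RENAMES.get(part, part) for part in parts)


converted = [
    convert_entry(r"\index{glimpse}"),
    convert_entry(r"\index{tidymodels!predict}"),
]
print(converted)  # ['info', 'scikit-learn; predict']
```

Entries with no direct counterpart (the DELETE and ADD items below) still need to be handled by hand.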

Below, I will document the index entries that I have changed for easier tracking:

classification 1

  • glimpse -> info
  • DELETE: \index{factor!as_factor}, \index{levels}\index{factor!levels}
  • group_by -> groupby, summarize -> count
  • mutate -> assign
  • \index{tidymodels}\index{parsnip} -> scikit-learn
  • tidymodels!model specification -> scikit-learn; model instance
  • tidymodels!engine -> scikit-learn; KNeighborsClassifier
  • tidymodels!model formula -> scikit-learn; X & y
  • tidymodels!predict -> scikit-learn; predict
  • recipe -> pipeline
  • recipe!step_scale, recipe!step_center -> scikit-learn; StandardScaler
  • ADD: scikit-learn; ColumnTransformer
  • tidymodels!prep -> scikit-learn; fit
  • tidymodels!bake -> scikit-learn; transform
  • DELETE: \index{recipe!all_predictors}
  • recipe!step_upsample -> scikit-learn; resample
  • DELETE: tidymodels!add_recipe, tidymodels!add_model
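
For reference, here is a minimal sketch of how the Python-side entries above fit together in code. The toy data and column names are invented for illustration and are not taken from the book.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Invented toy data (assumption: two numeric predictors and one label column).
toy = pd.DataFrame({
    "perimeter": [1.0, 1.2, 0.9, 3.0, 3.2, 2.9],
    "concavity": [0.10, 0.20, 0.15, 0.80, 0.90, 0.85],
    "label": ["B", "B", "B", "M", "M", "M"],
})

toy.info()                                                 # glimpse -> info
label_counts = toy.groupby("label")["perimeter"].count()   # group_by + summarize -> groupby + count

X = toy[["perimeter", "concavity"]]                        # model formula -> X & y
y = toy["label"]

# recipe + step_scale/step_center -> ColumnTransformer + StandardScaler
preprocessor = ColumnTransformer(
    [("scale", StandardScaler(), ["perimeter", "concavity"])]
)

# model specification + engine -> a KNeighborsClassifier instance inside a pipeline
knn_pipeline = Pipeline([
    ("preprocess", preprocessor),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
])

knn_pipeline.fit(X, y)                                     # prep -> fit
new_obs = pd.DataFrame({"perimeter": [3.1], "concavity": [0.82]})
prediction = knn_pipeline.predict(new_obs)                 # predict -> predict
print(prediction)
```

The pipeline applies the scaler during both `fit` and `predict`, which is why the separate add_recipe and add_model entries have no counterpart here.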

classification 2

  • \index{seed!set.seed} -> seed; numpy.random.seed
  • \index{sample!function} -> sample; numpy.random.choice
  • tidymodels -> scikit-learn
  • \index{tidymodels!initial_split} -> scikit-learn; train_test_split
  • glimpse -> info
  • \index{tidymodels!vfold_cv}\index{cross-validation!vfold_cv} -> cross-validation; cross_validate, scikit-learn; cross_validate
  • DELETE tidymodels!fit_resamples
  • DELETE \index{tidymodels!collect_metrics}\index{cross-validation!collect_metrics}
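
A short sketch of the splitting and cross-validation entries above, using randomly generated data invented for this example. The choice of classifier and of 5 folds is an assumption made only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.neighbors import KNeighborsClassifier

np.random.seed(1)  # set.seed -> numpy.random.seed

# Invented data: two numeric predictors and a balanced two-class label.
X = pd.DataFrame({
    "x1": np.random.normal(size=40),
    "x2": np.random.normal(size=40),
})
y = pd.Series(np.where(X["x1"] > X["x1"].median(), "pos", "neg"))

# initial_split -> train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, stratify=y
)

# vfold_cv + fit_resamples + collect_metrics -> a single cross_validate call
cv_results = pd.DataFrame(
    cross_validate(KNeighborsClassifier(n_neighbors=3), X_train, y_train, cv=5)
)
mean_accuracy = cv_results["test_score"].mean()
print(cv_results[["test_score"]])
```

Because `cross_validate` returns the per-fold scores directly, there is nothing that corresponds to a separate metric-collection step.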

regression 1

  • DELETE \index{seed!set.seed}
  • \index{ggplot!geom_point} -> altair; mark_circle
  • \index{slice_sample} -> pandas.DataFrame; sample
  • \index{mutate}\index{slice}\index{arrange}\index{abs} -> pandas.DataFrame; assign, head, pandas.DataFrame; sort_values, abs
  • \index{tidymodels}\index{recipe}\index{workflow} -> scikit-learn, scikit-learn; pipeline, scikit-learn; make_pipeline, scikit-learn; make_column_transformer
  • DELETE \index{cross-validation!collect_metrics} -> ADD scikit-learn; GridSearchCV
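
The following sketch shows one way the entries above could appear together. The data are invented, and the use of `KNeighborsRegressor` and the particular parameter grid are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Invented data: price grows roughly linearly with size, plus noise.
rng = np.random.default_rng(0)
houses = pd.DataFrame({"sqft": rng.uniform(500, 3000, size=60)})
houses = houses.assign(price=houses["sqft"] * 100 + rng.normal(0, 5000, size=60))

small_sample = houses.sample(n=5)              # slice_sample -> sample

# mutate + arrange + slice + abs -> assign, sort_values, head, abs
closest = (
    houses.assign(diff=(houses["sqft"] - 2000).abs())
    .sort_values("diff")
    .head(3)
)

# recipe + workflow -> make_column_transformer + make_pipeline
preprocessor = make_column_transformer((StandardScaler(), ["sqft"]))
pipeline = make_pipeline(preprocessor, KNeighborsRegressor())

# make_pipeline names each step after its lowercased class name.
grid = GridSearchCV(
    pipeline,
    param_grid={"kneighborsregressor__n_neighbors": [1, 3, 5, 7]},
    cv=5,
    scoring="neg_root_mean_squared_error",
)
grid.fit(houses[["sqft"]], houses["price"])
best_k = grid.best_params_["kneighborsregressor__n_neighbors"]
print(best_k)
```

`GridSearchCV` keeps the per-candidate results in `grid.cv_results_`, which is where the cross-validation metrics end up.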

regression 2

  • tidymodels -> scikit-learn
  • \index{seed!set.seed} -> scikit-learn; random_state
  • \index{regression!multivariable linear}\index{regression!multivariable linear equation|see{plane equation}} ->
    regression; multivariable linear, regression; multivariable linear equation
    see: multivariable linear equation; plane equation

inference

  • DELETE \index{seed!set.seed}
  • \index{pull}\index{sum}\index{nrow} -> pandas.DataFrame; df[], count, len
  • \index{rep_sample_n} -> pandas.DataFrame; sample
  • DELETE \index{infer}
  • DELETE \index{rep_sample_n!reps argument}\index{rep_sample_n!size argument}
  • \index{bootstrap!in R}\index{rep_sample_n!bootstrap} -> bootstrap; in Python, scikit-learn; resample (bootstrap)
  • \index{quantile} -> numpy; percentile
  • \index{pull}\index{select} -> pandas.DataFrame; df[]
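
A minimal sketch of the sampling and bootstrap entries above, on invented data. The sample size, number of bootstrap replicates, and the 95% interval are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.utils import resample

np.random.seed(0)

# Invented population of prices.
listings = pd.DataFrame({"price": np.random.exponential(scale=100, size=200)})

# rep_sample_n -> DataFrame.sample
one_sample = listings.sample(n=40)
n_rows = len(one_sample)                       # nrow -> len
sample_mean = one_sample["price"].mean()       # pull -> df[]

# bootstrap -> scikit-learn's resample, drawing with replacement
boot_means = [
    resample(one_sample, replace=True, n_samples=n_rows)["price"].mean()
    for _ in range(500)
]

# quantile -> numpy percentile
lower, upper = np.percentile(boot_means, [2.5, 97.5])
print(lower, upper)
```

Each call to `resample` produces one bootstrap sample, so the loop plays the role that the reps and size arguments had in the R version.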

wrangling

  • ADD pandas; data frame
  • DELETE \index{vector}\index{atomic vector|see{vector}}\index{c function}
    \index{data types}\index{character}\index{chr|see{character}}\index{integer}\index{int|see{integer}}\index{double}\index{dbl|see{double}}\index{logical}\index{lgl|see{logical}}\index{factor}\index{fct|see{factor}} -> data types, string, integer, floating point number, boolean, list, set, dictionary, tuple, none
  • class -> type
  • DELETE \index{tibble}
  • \index{pivot_longer} -> pandas.DataFrame; melt
  • \index{pivot_wider} -> pandas.DataFrame; pivot
  • \index{separate} -> pandas.Series; str.split
  • DELETE \index{select!helpers}
  • \index{select!starts_with} -> pandas.Series; str.startswith
  • \index{select!contains} -> pandas.Series; str.contains
  • \index{mutate} -> pandas.DataFrame; df[]
  • DELETE \index{pipe}\index{aaapipesymb@\vert{}>|see{pipe}}
  • ADD chaining methods
  • \index{NA|see{missing data}} -> see: NaN; missing data
  • \index{group_by} -> pandas.DataFrame; groupby
  • DELETE \index{across} \index{map} \index{map!map_* functions}
  • ADD pandas.DataFrame; apply
  • DELETE \index{rowwise}
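
To make the wrangling mappings above concrete, here is a small sketch on an invented data frame. The column names and values are made up, and `startswith` is applied to the column index rather than a Series purely to keep the example short.

```python
import pandas as pd

# Invented wide-format data.
wide = pd.DataFrame({
    "region": ["Toronto", "Vancouver"],
    "2011": [100, 80],
    "2016": [120, 90],
})

# pivot_longer -> melt
long_df = wide.melt(id_vars="region", var_name="year", value_name="count")

# pivot_wider -> pivot
back_to_wide = long_df.pivot(
    index="region", columns="year", values="count"
).reset_index()

# separate -> str.split
ranges = pd.Series(["10/20", "30/40"])
parts = ranges.str.split("/", expand=True)

# select + starts_with -> boolean mask from str.startswith
year_cols = wide.loc[:, wide.columns.str.startswith("20")]

# group_by -> groupby; the pipe is replaced by chaining methods
totals = long_df.groupby("region")["count"].sum()

# across / map -> apply
col_max = year_cols.apply(lambda col: col.max())

print(long_df)
```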

intro

  • \index{library} -> import
  • \index{tidyverse} -> pandas
  • \index{filter}\index{select} -> pandas.DataFrame; df[], pandas.DataFrame; loc[]
  • \index{arrange}\index{slice} -> pandas.DataFrame; sort_values, pandas.DataFrame; iloc[]
  • \index{ggplot} -> altair
  • \index{aaaplussymb@$+$|see{ggplot (add layer)}} -> see: .; chaining methods
  • \index{plot!layers} -> plot; labels
  • \index{reorder} -> altair; sort
  • \index{aaaquestionmark@?|see{documentation}}\index{help|see{documentation}}\index{documentation} ->
    documentation
    see: help; documentation
    see: doc; documentation
  • tidyverse -> pandas
  • \index{warning} -> Error
  • \index{read function!skip argument} -> read function; skiprows argument
  • \index{read function!delim argument} -> read function; sep argument
  • \index{rename} -> pandas.DataFrame; rename
  • \index{read function!col_names argument} -> read function; names argument
  • DELETE \index{readxl}
  • ADD SQLAlchemy, SQLAlchemy; create_engine, database; SQLAlchemy
  • \index{database!tbl} -> database; select, SQLAlchemy; select
  • \index{database!collect} -> database; fetchall, SQLAlchemy; fetchall
  • \index{database!show_query} -> database; show query, SQLAlchemy; query.compile
  • filter -> database; filter data, SQLAlchemy; where
  • \index{nrow} -> pandas.DataFrame; shape
  • \index{tail} -> pandas.DataFrame; tail
  • \index{write function!write_csv} -> write function; to_csv, pandas.DataFrame; to_csv

clustering

  • \index{seed!set.seed} -> seed; numpy.random.seed
  • DELETE \index{mutate}
  • \index{ggplot}\index{ggplot!geom_point} -> altair, altair; mark_circle
  • ADD scikit-learn; KMeans
  • DELETE \index{broom}\index{broom}\index{augment}
  • ADD K-means; inertia_, K-means; cluster_centers_, K-means; labels_, K-means; predict
  • \index{K-means!restart, nstart} -> K-means; init argument
  • \index{WSSD!total} -> WSSD; total, K-means; inertia_
    see: WSSD; K-means inertia
  • mutate -> pandas.DataFrame; assign
  • DELETE \index{rowwise}\index{glance}
  • ADD pandas.DataFrame; iloc[]
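
As an illustration of the K-means entries above, the sketch below fits scikit-learn's `KMeans` to six invented points. The data, the number of clusters, and the `n_init=10` setting (how many random restarts to run) are assumptions for this example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

np.random.seed(1)  # set.seed -> numpy.random.seed

# Invented data: two well-separated groups of three points each.
points = pd.DataFrame({
    "x": [0.0, 0.0, 1.0, 10.0, 10.0, 11.0],
    "y": [0.0, 1.0, 0.0, 10.0, 11.0, 10.0],
})

kmeans = KMeans(n_clusters=2, n_init=10)
kmeans.fit(points)

# augment -> assign the fitted labels_ as a new column
clustered = points.assign(cluster=kmeans.labels_)

total_wssd = kmeans.inertia_           # total WSSD -> inertia_
centers = kmeans.cluster_centers_      # one row per cluster center

# Assign a new observation to the nearest fitted center.
new_point = pd.DataFrame({"x": [9.0], "y": [9.0]})
new_label = kmeans.predict(new_point)[0]

first_rows = clustered.iloc[0:3]       # positional row selection
print(clustered)
```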

viz

  • ggplot -> altair
  • \index{ggplot!aesthetic mapping}\index{ggplot!geometric object} -> altair; geometric object, altair; geometric encoding, geometric object, geometric encoding
  • DELETE \index{ggplot!aes}\index{ggplot!geom_point}
  • \index{ggplot!geom_line} -> altair; mark_line
  • \index{ggplot!xlab,ylab}\index{ggplot!theme} -> altair; alt.X, altair; alt.Y, altair; configure_axis
  • \index{ggplot!scales} -> altair; alt.Scale
  • \index{ggplot!geom_point} -> altair; mark_circle
  • filter -> pandas.DataFrame; loc[]
  • \index{ggplot!logarithmic scaling} -> altair; logarithmic scaling
  • \index{mutate}\index{select} -> pandas.DataFrame; assign, pandas.DataFrame; [[]]
  • DELETE \index{color palette}
  • \index{ggplot!reorder} -> altair; sort
  • \index{ggplot!geom_vline} -> altair; mark_rule
  • \index{factor}\index{factor!usage in ggplot} -> nominal, altair; :N
  • \index{ggplot!facet_grid} -> altair; facet
  • \index{ggplot!add layer} -> altair; +
@GloriaWYY
Contributor Author

Changes in citations

classification 1

  • [@parsnip][@recipes] -> {cite:p}scikit-learn

classification 2

  • [@wickham2016r] -> {cite:p}mckinney2012python

regression 2

wrangling

  • DELETE [@dplyr] [@tidyselect] [@wickham2016r]
  • ADD {cite:p}mckinney2012python

intro

  • [@tidyverse; @wickham2019tidverse] -> {cite:p}reback2020pandas,mckinney-proc-scipy-2010
  • [@tidyversestyleguide] -> {cite:p}pep8-style-guide

reading

  • The content from this section onward has not been edited (it is still the R version), since Trevor said the instructor team will decide at a later stage whether it will be kept.
  • [@wickham2016r] -> {cite:p}mckinney2012python
  • DELETE [readr documentation] [@here] [readxl documentation][@rio]
  • ADD pandas documentation

viz

  • [@ggplot] -> {cite:p}altair
  • DELETE [@wickham2016r]
  • ADD {cite:p}mckinney2012python

@trevorcampbell
Contributor

This issue is done in PR #30 -- but I'll leave it open for now as a checklist for when we review/edit in detail.

@trevorcampbell trevorcampbell added the 1st edition Planned for inclusion in 1st print edition label Sep 17, 2023