
Commit dc57f90

Update content of Data Science tab
Follow-up to numpygh-242, alternative to numpygh-262. The rewritten content and the inclusion of more relevant libraries aim to make the tab read naturally, sketch the breadth of the Python data science ecosystem, and keep tools such as DVC and MLFlow that beginning to intermediate data scientists need to learn about. The rewrite also shrinks the content to a more reasonable size.
1 parent 797ad7d commit dc57f90

1 file changed: +24 -44 lines

layouts/partials/data-science.html (+24 -44)
@@ -9,73 +9,53 @@
 <div>
 <p>
 NumPy lies at the core of a rich ecosystem of data science libraries.
-</p>
-<p>
-Data science is the analysis of massive amounts of data
-to gain insight. A typical workflow might be:
+A typical exploratory data science workflow might look like:
 
 <ul class="content-tab">
-<li><b>Extract, Transform, Load (ETL):</b>
+<li><b>Extract, Transform, Load: </b>
 <a href="https://pandas.pydata.org">Pandas</a>,
-<a href="https://www.crummy.com/software/BeautifulSoup/">Beautiful Soup</a>,
-<a href="https://intake.readthedocs.io/en/latest/"> Intake</a>
+<a href="https://intake.readthedocs.io"> Intake</a>,
+<a href="https://pyjanitor.readthedocs.io/">PyJanitor</a>
 </li>
 
-<li><b>Explore:</b>
+<li><b>Exploratory analysis: </b>
+<a href="https://jupyter.org">Jupyter</a>,
 <a href="https://seaborn.pydata.org"> Seaborn</a>,
-<a href="https://matplotlib.org">Matplotlib</a>,
+<a href="https://matplotlib.org"> Matplotlib</a>,
+<a href="https://altair-viz.github.io"> Altair</a>
 
 </li>
 
-<li><b>Model:</b>
+<li><b>Model and evaluate: </b>
 <a href="https://scikit-learn.org">scikit-learn</a>,
-<a href="https://www.scipy.org">SciPy</a>,
-<a href="https://www.statsmodels.org/stable/index.html"> statsmodels</a>.
+<a href="https://www.statsmodels.org/stable/index.html"> statsmodels</a>,
+<a href="https://docs.pymc.io"> PyMC3</a>,
+<a href="https://spacy.io"> spaCy</a>
 </li>
 
-<li><b>Evaluate:</b>
-NumPy,
-<a href="https://www.tensorflow.org">TensorFlow</a>
-</li>
-
-<li>
-<b>Display:</b>
-<a href="./index.html/#tab-visual"> Data Visualization Tools</a>
+<li><b>Report in a dashboard: </b>
+<a href="https://plotly.com/dash">Dash</a>,
+<a href="https://panel.holoviz.org"> Panel</a>,
+<a href="https://github.com/voila-dashboards/voila"> Voila</a>
 </li>
 </ul>
 </p>
 </div>
 </div>
 <div class="grid-container">
 <div>
-<p>
-<a href="https://pandas.pydata.org">Pandas </a>helps in data discovery and handling,
-<a href="https://intake.readthedocs.io/en/latest/"> Intake</a> helps with
-data access and distribution, while
-<a href="https://www.crummy.com/software/BeautifulSoup/">Beautiful Soup</a>
-is widely used for web-scraping and gathering data sets.
-<a href="https://seaborn.pydata.org"> Seaborn</a> is well known for
-<a href="https://towardsdatascience.com/how-to-perform-exploratory-data-analysis-with-seaborn-97e3413e841d">exploratory data analysis (EDA)</a>;
-<a href="https://scikit-learn.org">scikit-learn</a> and
-<a href="https://www.scipy.org">SciPy</a> (statistical computing) serve some
-of the backbone processes required for machine learning (regression methods,
-classification, clustering, model validation and selection).
-Statistical data exploration, estimation of various statistical models,
-and conducting statistical tests are some of the functions offered by
-<a href="https://www.statsmodels.org/stable/index.html"> statsmodels</a>.
+<p></p><p>
+When data volume increases, one may need <a href="https://dask.org">Dask</a> or
+<a href="https://ray.io/">Ray</a>.
+Effective and reproducible data science in a production environment requires
+yet more tools, such as <a href="https://dvc.org"> DVC</a> for data versioning,
+<a href="https://mlflow.org">MLFlow</a> for experiment tracking, and
+<a href="https://airflow.apache.org">Airflow</a> or
+<a href="https://www.prefect.io">Prefect</a> for workflow automation.
 </p>
 </div>
 <div>
 <img src="images/content_images/data-science.png" alt="Diagram of three overlapping circle. The circles labeled 'Mathematics', 'Computer Science' and 'Domain Expertise'. In the middle of the diagram, which has the three circles overlapping it, is an area labeled 'Data Science'." align="centre" width="75%">
 </div>
 </div>
-<p>
-Effective data analytics requires deep knowledge of the data domain (e.g.,
-retail, healthcare, marketing, finance, social media, automation, sales, travel,
-etc.) as well as other core disciplines of data science, data engineering, and
-data visualization. Tools such as <a href="https://mlflow.org">MLFlow</a> address
-experiment hyperparameter and result tracking needs, while
-<a href="https://dvc.org"> DVC</a> provides data version control for data science
-and machine learning workflows.
-</p>
 </li>
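As an illustration only (this code is not part of the commit), the exploratory workflow named in the new tab text (extract and transform, explore, then model and evaluate) can be sketched with NumPy alone. The CSV data, the column names `x` and `y`, and the helper names `extract_transform`, `explore`, and `fit_and_evaluate` are hypothetical; a straight-line least-squares fit via `np.linalg.lstsq`, scored with R², stands in for the modeling step.

```python
import io

import numpy as np

# Hypothetical input for the "Extract" step; the values are made up for illustration.
RAW_CSV = """x,y
1.0,2.1
2.0,3.9
3.0,6.2
4.0,7.8
5.0,10.1
"""


def extract_transform(text):
    """Extract and transform: parse CSV text, then drop rows containing NaN."""
    data = np.genfromtxt(io.StringIO(text), delimiter=",", names=True)
    x, y = data["x"], data["y"]
    keep = ~(np.isnan(x) | np.isnan(y))  # minimal cleaning step
    return x[keep], y[keep]


def explore(x, y):
    """Exploratory analysis: a few summary statistics."""
    return {
        "n": int(x.size),
        "mean_y": float(y.mean()),
        "corr_xy": float(np.corrcoef(x, y)[0, 1]),
    }


def fit_and_evaluate(x, y):
    """Model and evaluate: least-squares line y ~ a*x + b, scored with R^2."""
    design = np.column_stack([x, np.ones_like(x)])
    (a, b), *_ = np.linalg.lstsq(design, y, rcond=None)
    pred = a * x + b
    ss_res = np.sum((y - pred) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
    return float(a), float(b), float(1.0 - ss_res / ss_tot)


if __name__ == "__main__":
    x, y = extract_transform(RAW_CSV)
    print(explore(x, y))
    print(fit_and_evaluate(x, y))
```

In practice the libraries listed in the tab would take over these stages, for example pandas for loading and cleaning and scikit-learn or statsmodels for fitting and scoring; the sketch only shows how each stage hands NumPy arrays to the next.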
