jupytext

kernelspec

cell_metadata_filter

formats

text_representation

-all

md:myst,ipynb

extension	format_name	format_version	jupytext_version
.md	myst	0.13	1.13.8

display_name	language	name
Python 3 (ipykernel)	python	python3

Preface

This textbook aims to be an approachable introduction to the world of data science. In this book, we define data science as the process of generating insight from data through reproducible and auditable processes. If you analyze some data and give your analysis to a friend or colleague, they should be able to re-run the analysis from start to finish and get the same result you did (reproducibility). They should also be able to see and understand all the steps in the analysis, as well as the history of how the analysis developed (auditability). Creating reproducible and auditable analyses allows both you and others to easily double-check and validate your work.

At a high level, in this book, you will learn how to

identify common problems in data science, and
solve those problems with reproducible and auditable workflows.

{numref}preface-overview-fig summarizes what you will learn in each chapter of this book. Throughout, you will learn how to use the Python programming language to perform all the tasks associated with data analysis. You will spend the first four chapters learning how to use Python to load, clean, wrangle (i.e., restructure the data into a usable format) and visualize data while answering descriptive and exploratory data analysis questions. In the next six chapters, you will learn how to answer predictive, exploratory, and inferential data analysis questions with common methods in data science, including classification, regression, clustering, and estimation. In the final chapters you will learn how to combine Python code, formatted text, and images in a single coherent document with Jupyter, use version control for collaboration, and install and configure the software needed for data science on your own computer. If you are reading this book as part of a course that you are taking, the instructor may have set up all of these tools already for you; in this case, you can continue on through the book reading the chapters in order. But if you are reading this independently, you may want to jump to these last three chapters early before going on to make sure your computer is set up in such a way that you can try out the example code that we include throughout the book.

---
name: preface-overview-fig
---
Where are we going?

Each chapter in the book has an accompanying worksheet that provides exercises to help you practice the concepts you will learn. We strongly recommend that you work through the worksheet for each chapter before moving on to the next chapter. All of the worksheets are available at https://worksheets.python.datasciencebook.ca; the "Exercises" section at the end of each chapter points you to the right worksheet for that chapter. For each worksheet, you can preview a non-interactive version of the worksheet by clicking "view worksheet." To work on the exercises interactively, follow the instructions in the worksheets repository to download all worksheets, and follow the instructions for computer setup found in {numref}Chapter %s <move-to-your-own-machine>. This will ensure that the automated feedback and guidance that the worksheets provide will function as intended.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

preface-text.md

preface-text.md

Preface

Files

preface-text.md

Latest commit

History

preface-text.md

File metadata and controls

Preface