Commit dd24f4f

committed
minor stylistic edits
1 parent 2096673 commit dd24f4f

File tree

4 files changed: +68 −66 lines


lessons/01_preprocessing.ipynb

Lines changed: 4 additions & 6 deletions
@@ -362,7 +362,7 @@
 "source": [
 "### Lowercasing\n",
 "\n",
-"While we acknowledge that the **casing** of words is informative, we often don't work in contexts where we can properly utilize this information.\n",
+"While we acknowledge that a word's casing is informative, we often don't work in contexts where we can properly utilize this information.\n",
 "\n",
 "More often, the subsequent analysis we perform is **case-insensitive**. For instance, in frequency analysis, we want to account for various forms of the same word. Lowercasing the text data aids in this process and simplifies our analysis.\n",
 "\n",
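The case-insensitive frequency analysis described in the cell above can be sketched with the standard library alone (the sentence is a made-up example, not from the lesson data):

```python
from collections import Counter

# Lowercasing before counting makes the analysis case-insensitive,
# so "The" and "the" are treated as the same word.
text = "The cat saw the dog. The dog ran."
tokens = text.lower().split()
counts = Counter(tokens)
print(counts["the"])  # 3
```

Without `.lower()`, `Counter` would report "The" and "the" separately (2 and 1), which is exactly the split that frequency analysis usually wants to avoid.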
@@ -435,9 +435,7 @@
 "\n",
 "Our goal in this workshop is not to provide a deep (or even shallow) dive into regex; instead, we want to expose you to them so that you are better prepared to do deep dives in the future!\n",
 "\n",
-"The following example is a poem by William Wordsworth. Like many poems, the text may contain extra line breaks (i.e., newline characters, `\\n`) that we want to remove.\n",
-"\n",
-"Let's read the data in!"
+"The following example is a poem by William Wordsworth. Like many poems, the text may contain extra line breaks (i.e., newline characters, `\\n`) that we want to remove."
 ]
 },
 {
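The newline-removal step that this cell introduces can be sketched with `re.sub` (the snippet of verse is used only as a stand-in for the lesson's poem file):

```python
import re

# Collapse one or more newline characters into a single space,
# then trim leading/trailing whitespace.
poem = "I wandered lonely as a cloud\n\nThat floats on high o'er vales and hills,\n"
cleaned = re.sub(r"\n+", " ", poem).strip()
print(cleaned)
```

The pattern `\n+` matches runs of consecutive line breaks, so blank lines between stanzas collapse to a single space rather than leaving double spaces behind.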
@@ -1096,7 +1094,7 @@
 "\n",
 "The first package we'll be using is called **Natural Language Toolkit**, or `nltk`. \n",
 "\n",
-"Let's install a couple modules within the package."
+"Let's install a couple modules from the package."
 ]
 },
 {
@@ -1841,7 +1839,7 @@
 "\n",
 "In this section, we will demonstrate tokenization in **BERT** (Bidirectional Encoder Representations from Transformers), which utilizes a tokenization algorithm called [**WordPiece**](https://huggingface.co/learn/nlp-course/en/chapter6/6). \n",
 "\n",
-"We will load the tokenizer of BERT from the package `transformers`, which hosts a number of Transformer-based LLMs (e.g., GPT-2). We won't go into the architecture of Transformer in this workshop, but feel free to check out the D-lab workshop on [GPT Fundamentals](https://github.com/dlab-berkeley/GPT-Fundamentals)!"
+"We will load the tokenizer of BERT from the package `transformers`, which hosts a number of Transformer-based LLMs (e.g., BERT). We won't go into the architecture of Transformer in this workshop, but feel free to check out the D-lab workshop on [GPT Fundamentals](https://github.com/dlab-berkeley/GPT-Fundamentals)!"
 ]
 },
 {
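The core idea behind the WordPiece algorithm mentioned in this cell — greedy longest-match-first subword splitting, with continuation pieces marked by `##` — can be illustrated without `transformers` at all. This is a toy sketch, not BERT's actual tokenizer: the vocabulary below is hypothetical, whereas real BERT vocabularies contain roughly 30,000 learned pieces.

```python
# Hypothetical mini-vocabulary; "##" marks pieces that continue a word.
vocab = {"token", "##ization", "##ize", "un", "##break", "##able"}

def wordpiece_tokenize(word, vocab):
    """Split a word by greedily matching the longest vocabulary piece from the left."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:                 # non-initial pieces get the "##" prefix
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1                      # shrink the candidate and retry
        if piece is None:
            return ["[UNK]"]              # no piece matched: whole word is unknown
        pieces.append(piece)
        start = end
    return pieces

print(wordpiece_tokenize("tokenization", vocab))  # ['token', '##ization']
```

In the real library, `AutoTokenizer.from_pretrained("bert-base-uncased")` performs this splitting with BERT's learned vocabulary; the sketch only shows why an unseen word like "unbreakable" can still be represented as known subword pieces.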
