Translation up to just before tokeniz

juvizueteva · juvizueteva · commit eaeb81c0c58c · 2025-03-26T04:41:43.000Z
diff --git a/lessons/01_preprocessing.ipynb b/lessons/01_preprocessing.ipynb
@@ -710,11 +710,11 @@
    "id": "7087dc0c-5fef-4f1c-8662-7cbc8a978f34",
    "metadata": {},
    "source": [
-    "### Remove Punctuation Marks\n",
+    "### Eliminar los Signos de Puntuación\n",
     "\n",
-    "Sometimes we are only interested in analyzing **alphanumeric characters** (i.e., the letters and numbers), in which case we might want to remove punctuation marks. \n",
+    "A veces solo estamos interesados en analizar **caracteres alfanuméricos** (es decir, las letras y los números), en cuyo caso podríamos querer eliminar los signos de puntuación.\n",
     "\n",
-    "The `string` module contains a list of predefined punctuation marks. Let's print them out."
+    "El módulo `string` contiene una lista de signos de puntuación predefinidos. Vamos a imprimirlos."
    ]
   },
   {
@@ -742,7 +742,7 @@
    "id": "91119c9e-431c-42cb-afea-f7e607698929",
    "metadata": {},
    "source": [
-    "In practice, to remove these punctuation characters, we can simply iterate over the text and remove characters found in the list, such as shown below in the `remove_punct` function."
+    "En la práctica, para eliminar estos caracteres de puntuación, podemos simplemente iterar sobre el texto y eliminar los caracteres que se encuentren en la lista, como se muestra a continuación en la función `remove_punct`.\n"
    ]
   },
   {
@@ -772,7 +772,7 @@
    "id": "d4fc768b-c2dd-4386-8212-483c4485e4be",
    "metadata": {},
    "source": [
-    "Let's apply the function to the example below. "
+    "Aplicamos la función al siguiente ejemplo.\n"
    ]
   },
   {
@@ -815,7 +815,7 @@
    "id": "853a4b83-f503-4405-aedd-66bbc088e3e7",
    "metadata": {},
    "source": [
-    "Let's give it a try with another tweet. What have you noticed?"
+    "Intentémoslo con otro tweet. ¿Qué has notado?\n"
    ]
   },
   {
@@ -857,7 +857,7 @@
    "id": "1af02ce5-b674-4cb4-8e08-7d7416963f9c",
    "metadata": {},
    "source": [
-    "What about the following example?"
+    "¿Qué pasa con el siguiente ejemplo?\n"
    ]
   },
   {
@@ -890,24 +890,24 @@
    "id": "62574c66-db3f-4500-9c3b-cea2f3eb2a30",
    "metadata": {},
    "source": [
-    "⚠️ **Warning:** In many cases, we want to remove punctuation marks **after** tokenization, which we will discuss in a minute. This tells us that the **order** of preprocessing is a matter of importance!"
+    "⚠️ **Advertencia:** En muchos casos, queremos eliminar los signos de puntuación **después** de la tokenización, lo cual discutiremos en un momento. ¡Esto nos dice que el **orden** del preprocesamiento es importante!\n"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "58c6b85e-58e7-4f56-9b4a-b60c85b394ba",
    "metadata": {},
    "source": [
-    "## 🥊 Challenge 1: Preprocessing with Multiple Steps\n",
+    "## 🥊 Desafío 1: Preprocesamiento con Múltiples Pasos\n",
     "\n",
-    "So far we've learned a few preprocessing operations, let's put them together in a function! This function would be a handy one to refer to if you happen to work with some messy English text data, and you want to preprocess it with a single function. \n",
+    "Hasta ahora hemos aprendido algunas operaciones de preprocesamiento, ¡vamos a combinarlas en una función! Esta función sería útil para referirse a ella si alguna vez trabajas con datos de texto en inglés desordenados y deseas preprocesarlos con una sola función.\n",
     "\n",
-    "The example text data for challenge 1 is shown below. Write a function to:\n",
-    "- Lowercase the text\n",
-    "- Remove punctuation marks\n",
-    "- Remove extra whitespace characters\n",
+    "Los datos de texto de ejemplo para el desafío 1 se muestran a continuación. Escribe una función que:\n",
+    "- Ponga el texto en minúsculas\n",
+    "- Elimine los signos de puntuación\n",
+    "- Elimine los caracteres de espacio en blanco extra\n",
     "\n",
-    "Feel free to recycle the codes we've used above!"
+    "¡Siéntete libre de reutilizar los códigos que hemos usado anteriormente!\n"
    ]
   },
   {
@@ -986,25 +986,25 @@
    "id": "67c159cb-8eaa-4c30-b8ff-38a712d2bb0f",
    "metadata": {},
    "source": [
-    "## Task-specific Processes\n",
+    "## Procesos Específicos para Tareas\n",
     "\n",
-    "Now that we understand common preprocessing operations, there are still a few additional operations to consider. Our text data might require further normalization depending on the language, source, and content of the data.\n",
+    "Ahora que entendemos las operaciones comunes de preprocesamiento, aún hay algunas operaciones adicionales a considerar. Nuestros datos de texto pueden requerir una mayor normalización dependiendo del idioma, la fuente y el contenido de los datos.\n",
     "\n",
-    "For example, if we are working with financial documents, we might want to standardize monetary symbols by converting them to digits. It our tweets data, there are numerous hashtags and URLs. These can be replaced with placeholders to simplify the subsequent analysis."
+    "Por ejemplo, si estamos trabajando con documentos financieros, podríamos querer estandarizar los símbolos monetarios convirtiéndolos en dígitos. En nuestros datos de tweets, hay numerosos hashtags y URLs. Estos pueden ser reemplazados por marcadores de posición para simplificar el análisis posterior.\n"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "c2936cea-74e9-40c2-bfbe-6ba8129330de",
    "metadata": {},
    "source": [
-    "### 🎬 **Demo**: Remove Hashtags and URLs \n",
+    "### 🎬 **Demostración**: Eliminar Hashtags y URLs\n",
     "\n",
-    "Although URLs, hashtags, and numbers are informative in their own right, oftentimes we don't necessarily care about the exact meaning of each of them. \n",
+    "Aunque las URLs, los hashtags y los números son informativos por derecho propio, a menudo no nos importa necesariamente el significado exacto de cada uno de ellos.\n",
     "\n",
-    "While we could remove them completely, it's often informative to know that there **exists** a URL or a hashtag. In practice, we replace individual URLs and hashtags with a \"symbol\" that preserves the fact these structures exist in the text. It's standard to just use the strings \"URL\" and \"HASHTAG.\"\n",
+    "Si bien podríamos eliminarlos por completo, a menudo es informativo saber que **existe** una URL o un hashtag. En la práctica, reemplazamos las URLs y los hashtags individuales por un \"símbolo\" que conserva el hecho de que estas estructuras existen en el texto. Es común usar simplemente las cadenas \"URL\" y \"HASHTAG\".\n",
     "\n",
-    "Since these types of text often follow a regular structure, they're an apt case for using regular expressions. Let's apply these patterns to the tweets data."
+    "Dado que estos tipos de texto suelen seguir una estructura regular, son un buen caso para usar expresiones regulares. Apliquemos estos patrones a los datos de tweets.\n"
    ]
   },
   {