Commit 7db7a26

Update README.md
1 parent 67051e1 commit 7db7a26

1 file changed

+4
-1
lines changed

README.md

Lines changed: 4 additions & 1 deletion
@@ -27,7 +27,10 @@ The development of the project used the following main tools:
2727
- The project is currently hosted on cloud: Heroku
2828

2929
## Data Scraping
30-
As the name suggests, this is a data science project, so the first step is to collect the data needed for our purpose. We use web scraping, a common strategy that downloads each HTML page one at a time and builds a CSV file from the useful features found in the YouTube HTML tags. We took this approach because no ready-made database was available.
30+
As the name suggests, this is a data science project, so the first step is to collect the data needed for our purpose. We use web scraping, a common strategy that downloads each HTML page one at a time and builds a CSV file from the useful features found in the YouTube HTML tags. We took this approach because no ready-made database was available. For this project, the scraping targets the YouTube search page with the following keywords:
31+
- Machine Learning
32+
- Data Science
33+
- Kaggle
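
As a rough illustration of how the keywords above could drive the scraping step, the sketch below builds one search URL per keyword. The base URL, the `search_query` parameter, and the helper names `build_search_url` and `output_filename` are assumptions made for this example, not taken from the project's code. Downloading the pages is left out on purpose.

```python
from urllib.parse import urlencode

# Keywords listed in the README.
KEYWORDS = ["Machine Learning", "Data Science", "Kaggle"]

# Assumed base URL for YouTube search result pages.
SEARCH_URL = "https://www.youtube.com/results"


def build_search_url(keyword: str) -> str:
    """Return the search results URL for one keyword."""
    return f"{SEARCH_URL}?{urlencode({'search_query': keyword})}"


def output_filename(keyword: str) -> str:
    """Hypothetical naming scheme for the saved raw HTML pages."""
    return keyword.lower().replace(" ", "_") + ".html"


if __name__ == "__main__":
    for kw in KEYWORDS:
        print(build_search_url(kw), "->", output_filename(kw))
```

Each URL can then be fetched with any HTTP client and the response saved under the matching file name, ready for the cleaning step.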
3134

3235
## Data Cleaning
3336
When we download the entire HTML page, we also get a lot of useless information. To clean this data, we use BS4 to parse the HTML and find which tags/classes hold values useful for the main problem. In the end, the information we chose to keep is:
