# Video recommender
## What is it
This project was built following the Data Science course [Como criar uma solução completa de Data Science | Mario Filho](http://mariofilho.com/curso/).
The objective of this project is to build a YouTube video recommender: given a search criterion, it lists a series of videos that, according to a machine learning score, we consider worth recommending.
The data science process of this project is divided into 4 macro steps:
- Data Scraping
The development of the project used the following main tools:
- The project is currently hosted on the cloud: Heroku
## Data Scraping
As the name says, this is a data science project, so the first step is to collect the data that is necessary and useful for our purpose. What we do here is web scraping, a common strategy: we fetch each HTML page and build a CSV file from the useful features extracted from the YouTube HTML tags. We do this because we don't have a ready-to-use database, so web scraping was the solution found. For this project, we scrape the YouTube search page with the keywords:
- Machine Learning
- Data Science
- Kaggle
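The extraction step can be sketched with the standard library alone: parse video features out of raw HTML and write them as CSV rows. The tag and attribute names below are hypothetical placeholders — real YouTube markup differs and changes often, so the actual project has to adapt its selectors.

```python
import csv
import io
from html.parser import HTMLParser

class VideoLinkParser(HTMLParser):
    """Collects (title, href) pairs from hypothetical <a class="video"> tags."""
    def __init__(self):
        super().__init__()
        self.videos = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "video":
            self._current = {"title": "", "href": attrs.get("href", "")}

    def handle_data(self, data):
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.videos.append(self._current)
            self._current = None

def videos_to_csv(html: str) -> str:
    """Turn one scraped search-results page into CSV rows of features."""
    parser = VideoLinkParser()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["title", "href"])
    writer.writeheader()
    writer.writerows(parser.videos)
    return out.getvalue()

page = '<a class="video" href="/watch?v=abc">Intro to Kaggle</a>'
print(videos_to_csv(page))
```

In the real pipeline this would run once per downloaded search page, appending rows to a single CSV.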
To be more specific, 1600 videos were collected and I chose 500 videos to label.
- Videos with more than 1500 views
- No live streams
- No conference talks, except TED Talks
- Videos with reasonable image and audio resolution
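The criteria above can be expressed as a simple filter function. The field names (`views`, `is_live`, `category`) are hypothetical stand-ins for the real dataset's columns, and the resolution check is omitted since it needs media metadata.

```python
def passes_criteria(video: dict) -> bool:
    """Apply the labeling criteria: enough views, no live streams,
    no conference talks except TED Talks."""
    if video["views"] <= 1500:
        return False
    if video["is_live"]:
        return False
    if video["category"] == "conference" and "TED" not in video["title"]:
        return False
    return True

candidates = [
    {"title": "TED talk on AI", "views": 9000, "is_live": False, "category": "conference"},
    {"title": "Live coding session", "views": 5000, "is_live": True, "category": "tutorial"},
    {"title": "Kaggle tips", "views": 200, "is_live": False, "category": "tutorial"},
]
kept = [v["title"] for v in candidates if passes_criteria(v)]
print(kept)  # ['TED talk on AI']
```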
The definition of the criteria depends a lot on the needs and objectives of the project. In addition, well-defined labeling makes metric validation easier.
After the first round of labeling, we use an active learning strategy, which consists in using machine learning to find a specific group of examples that we call hard decisions, so that I can label them manually and optimize the learning. In this project a **Random Forest** finds the videos with predicted probability between 0.45 and 0.55, and I then did a second round of manual labeling on them.
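The selection step of this active learning loop is just a band filter over the model's predicted probabilities. The probabilities below are illustrative stand-ins for real Random Forest output.

```python
def hard_decisions(probs, low=0.45, high=0.55):
    """Return the indices of the unlabeled examples whose predicted
    probability falls in the uncertain band, for manual labeling."""
    return [i for i, p in enumerate(probs) if low <= p <= high]

pool_probs = [0.90, 0.48, 0.12, 0.52, 0.55, 0.30]
to_label = hard_decisions(pool_probs)
print(to_label)  # [1, 3, 4] -- the videos to label manually
```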
Now we just split the data into test and validation sets and convert the titles to [Tf-IDF](https://pt.wikipedia.org/wiki/Tf%E2%80%93idf).
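As a sketch of the title vectorization, here is a minimal Tf-IDF computation using the textbook definition `tf(t, d) * log(N / df(t))`. A real project would typically use scikit-learn's `TfidfVectorizer`, which also smooths and normalizes; this is just to show the idea.

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute a {term: weight} vector per document, with
    weight = term frequency * log(N / document frequency)."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    df = Counter(t for doc in tokenized for t in set(doc))
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

titles = ["machine learning tutorial", "kaggle machine learning", "data science live"]
vecs = tfidf(titles)
# "machine" appears in 2 of 3 titles, so it gets a lower weight
# than the rarer term "kaggle"
```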
## Modeling
We already know that this is a classification problem, so the candidate models include **Decision Tree**, **Random Forest**, **LightGBM**, **Logistic Regression** and **SVM**. After testing each one, I decided to combine **Random Forest**, **LightGBM** and **Logistic Regression** with a simple arithmetic mean. For **LightGBM** in particular, Bayesian optimization was used to tune the parameters. Now we have trained models, and with them we can already score the videos.
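The ensemble step is just an element-wise arithmetic mean of the three models' predicted probabilities. The per-model scores below are illustrative placeholders for real model outputs.

```python
def ensemble_mean(*model_scores):
    """Average per-video probabilities across models, position by position."""
    return [sum(scores) / len(scores) for scores in zip(*model_scores)]

rf_probs = [0.80, 0.20, 0.55]      # Random Forest
lgbm_probs = [0.70, 0.30, 0.65]    # LightGBM
logreg_probs = [0.90, 0.10, 0.60]  # Logistic Regression
final = ensemble_mean(rf_probs, lgbm_probs, logreg_probs)
print(final)  # approximately [0.8, 0.2, 0.6]
```

A weighted mean is a natural next step if one model is clearly stronger, but the simple mean keeps the combination easy to reason about.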
Of course, the project also includes a little web app to display the list of videos with the best scores. The logic is simple: we run the web scraping periodically, apply the trained models to get the scores, and display the list sorted by score.
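The display logic can be sketched as: score the freshly scraped videos and sort them highest first. Here `score` is a hypothetical stand-in for the trained ensemble.

```python
def display_list(videos, score):
    """Rank scraped videos by model score, best first."""
    return sorted(videos, key=score, reverse=True)

videos = [{"title": "A", "views": 100}, {"title": "B", "views": 900}]
ranked = display_list(videos, score=lambda v: v["views"] / 1000)
print([v["title"] for v in ranked])  # ['B', 'A']
```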