1078 - Refactor csv_importer.py and csv_to_database.py #1103

BrainIsDead · 2023-03-01T14:52:33Z

Closes #1078.

Moved the csv_to_database functionality into csv_importer (csv_to_database deleted)
Fixed imports
Changed mocks
Added handling for pandas.errors exceptions around pd.read_csv in the load_csv method
Changed the is_header_valid method to use issubset
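For readers without the diff at hand, the last two items can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `REQUIRED_COLUMNS`, `load_table`, and the choice to return `None` on failure are assumptions made for the example, and only `pd.read_csv`, `pd.errors.EmptyDataError`, `pd.errors.ParserError`, and `set.issubset` are real library calls.

```python
import io
from typing import Iterable, Optional

import pandas as pd

# Hypothetical column set for illustration; the real importer defines its own.
REQUIRED_COLUMNS = {"geo_id", "val", "se", "sample_size"}


def is_header_valid(columns: Iterable[str]) -> bool:
    """Return True when every required column is present (extras are allowed)."""
    return REQUIRED_COLUMNS.issubset(set(columns))


def load_table(source) -> Optional[pd.DataFrame]:
    """Read a CSV, returning None instead of raising on unreadable input."""
    try:
        # Read everything as strings so validation can happen afterwards.
        table = pd.read_csv(source, dtype=str)
    except pd.errors.EmptyDataError:
        # Raised when the input has no columns to parse at all.
        return None
    except pd.errors.ParserError:
        # Raised for malformed rows, e.g. more fields than the header declares.
        return None
    if not is_header_valid(table.columns):
        return None
    return table
```

With `issubset`, a file that carries extra columns still passes the header check, while a file missing any required column is rejected before row-level processing starts.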

krivard · 2023-03-03T15:15:49Z

NB releasing this will need to be coordinated with configuration changes to Automation and/or Cronicle in prod.

melange396

great start, but i think we need to change another test file -- since we got rid of csv_to_database.py and added its content to csv_importer.py (in src/acquisition/covidcast/), we should merge test_csv_to_database.py into test_csv_importer.py (in tests/acquisition/covidcast/)

melange396 · 2023-03-03T16:23:10Z

src/acquisition/covidcast/csv_importer.py

@@ -414,3 +426,169 @@ def load_csv(filepath: str, details: PathDetails) -> Iterator[Optional[Covidcast
        details.issue,
        details.lag,
      )
+


from this line down, all this text is a straight copy-paste from csv_to_database.py (line 15 down) with no modifications, right?

src/acquisition/covidcast/csv_importer.py

dshemetov · 2023-03-03T21:56:52Z

I want to point out that this is a partial fix of #1078. This PR can handle merging the files together and adding some Pandas exceptions, but I'm hoping that in another PR we can:

rewrite load_csv so that it does not loop over the dataframe with df.itertuples and instead relies on Pandas data type-casting to do most of our validation
delete all the floaty_int and maybe_apply stuff
replace extract_and_check_row, which is a very slow and manual way of doing things that Pandas can do column-by-column
stop having extract_and_check_row return a (Value, Error) tuple, which is unnecessary: a CSV file is archived as failed if a single row fails, so we should just detect the failure and return immediately instead of parsing the rest of the CSV. It would be a bit challenging to reproduce the same error log as the current code with this setup (the current code returns the string of the full row that failed, while the Pandas code would just give an error for a full column), though we could probably use Pandas to find the offending row.
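To make the last point concrete, here is one possible way to validate column-by-column and still recover the offending row. It is only a sketch under stated assumptions: `first_invalid_row` and the column names are hypothetical, and treating initially empty cells as allowed missing values is a choice made for the example, not something taken from the importer.

```python
import pandas as pd


def first_invalid_row(table: pd.DataFrame, numeric_columns):
    """Return the index label of the first row holding a non-numeric entry.

    Returns None when every listed column converts cleanly. Cells that are
    missing to begin with are treated as allowed (an assumption of this sketch).
    """
    bad = pd.Series(False, index=table.index)
    for column in numeric_columns:
        # Unparseable entries become NaN instead of raising.
        converted = pd.to_numeric(table[column], errors="coerce")
        # Invalid means: the cell had content, but conversion produced NaN.
        bad |= converted.isna() & table[column].notna()
    if bad.any():
        return bad[bad].index[0]
    return None
```

A caller could stop at the first non-None result and log `table.loc[label]` to get something close to the current full-row error message, without parsing the file row by row.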

BrainIsDead · 2023-03-29T20:07:35Z

src/acquisition/covidcast/csv_importer.py

I implemented extract_and_check_row, validate_missing_code, validate_quantity and the other validations with pandas. The only way I found to create CovidcastRows is with itertuples.
Not sure whether we need to keep the @staticmethods; they make no sense if we are not using them anywhere else.
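The pattern described here (validate with pandas first, then build row objects with itertuples) might look roughly like the sketch below. `Row` is a hypothetical stand-in for the importer's row type, and the `geo_id` and `val` columns are assumed for illustration only.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

import pandas as pd


@dataclass
class Row:
    """Hypothetical stand-in for the importer's row type."""
    geo_id: str
    value: Optional[float]


def iter_rows(table: pd.DataFrame) -> Iterator[Row]:
    """Yield one Row per record of an already validated frame."""
    for record in table.itertuples(index=False):
        # NaN marks a missing value after type-casting; expose it as None.
        value = None if pd.isna(record.val) else float(record.val)
        yield Row(geo_id=record.geo_id, value=value)
```

Because the checks have already run on whole columns, the loop only constructs objects and does no per-row validation.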

sonarqubecloud · 2023-04-11T14:32:05Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
14 Code Smells

No Coverage information
0.0% Duplication

dshemetov

This should be good to go, as long as we prepare a rollback / hotfix plan in case there is a bug here undetected by our tests/integrations.

After this is merged, we should:

watch the ingestion logs in Kibana (Epidata dashboard) and make sure they look healthy
get some queries ready to validate that the new data can be pulled from the database and that it looks fine

csv importer with issues

b9d8cfa

BrainIsDead commented Mar 1, 2023

BrainIsDead requested review from dmytrotsko, melange396 and rzats March 1, 2023 17:01

melange396 reviewed Mar 1, 2023

made rollback for sanity check, csv_to_database functionality mover…

65a8da3

… to `csv_importer`, fixed imports, changed mocks

BrainIsDead changed the title ~~csv importer with issues~~ 1078 - Refactor csv_importer.py and csv_to_database.py Mar 3, 2023

BrainIsDead marked this pull request as ready for review March 3, 2023 14:48

BrainIsDead requested a review from melange396 March 3, 2023 15:01

melange396 requested changes Mar 3, 2023

test+csv_to_database moved to test_csv_imported, minor changes

73c38d6

BrainIsDead requested a review from melange396 March 7, 2023 15:04

load_csv handled with pandas

04eaada

BrainIsDead commented Mar 29, 2023

BrainIsDead requested a review from dshemetov March 29, 2023 20:08

refactor: csv_importer

5de1dae

* rely on Pandas built-ins more for speed
* assume Pandas data types are not strings
* update tests

dshemetov mentioned this pull request Mar 31, 2023

Refactor csv_importer.py with Pandas #1116

Merged

4 tasks

dshemetov and others added 2 commits March 30, 2023 19:50

refactor: improve a few error messages

175c900

Merge pull request #1116 from cmu-delphi/ds/csv_importer_pandas

5bd8408

Refactor csv_importer.py with Pandas

dshemetov approved these changes Apr 11, 2023

dshemetov mentioned this pull request May 27, 2023

style(black): format acquisition with black, line-length=200 #1186

Closed

4 tasks

melange396 removed the request for review from rzats March 8, 2025 06:11
