Skip to content

[Refactor] Clarify a few csv_importer types and names #949

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 20, 2023

Conversation

dshemetov
Copy link
Contributor

@dshemetov dshemetov commented Aug 2, 2022

A few small refactors for clarity that I couldn't help myself in making while working on #947.

Most of the changes are purely for better readability. No functionality is altered, except for the interface of load_csv, but this should not affect actual usage.

Changes:

  • rename RowValues to CsvRowValue
  • make CsvRowValue into a dataclass
  • remove pandas optional argument from load_csv that was only used for test mocks
  • use unittest.patch to mock test instead
  • update other tests
  • update csv_to_database with a few relative imports

Further details:

I changed

# From
def load_csv(filepath, geo_type, pandas=pandas):

# To
def load_csv(filepath, geo_type):

where pandas was just an abstract argument to the Pandas library meant for mocking. Instead of complicating this function's interface, we can just use @patch("pandas.read_csv") in the tests, which automatically mocks any calls to that function with our desired mock object.

Prerequisites:

  • Unless it is a documentation hotfix it should be merged against the dev branch
  • Branch is up-to-date with the branch to be merged with, i.e. dev
  • Build is successful
  • Code is cleaned up and formatted

@dshemetov
Copy link
Contributor Author

dshemetov commented Aug 2, 2022

Problem

CI is failing. I can repro on my local machine by running the same command CI runs

docker run -i --rm --network delphi-net \
		--env "SQLALCHEMY_DATABASE_URI=mysql+mysqldb://user:pass@delphi_database_epidata:3306/epidata" \
		--env "FLASK_SECRET=abc" \
		delphi_web_python python -m pytest --import-mode importlib repos/delphi/delphi-epidata/tests/

wherein it complains about the relative imports I use

_ ERROR collecting repos/delphi/delphi-epidata/tests/acquisition/covidcast/test_csv_importer.py _
ImportError while importing test module '/usr/src/app/repos/delphi/delphi-epidata/tests/acquisition/covidcast/test_csv_importer.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
repos/delphi/delphi-epidata/tests/acquisition/covidcast/test_csv_importer.py:16: in <module>
    from ....src.acquisition.covidcast.csv_importer import CsvImporter, CsvRowValue
E   ModuleNotFoundError: No module named 'repos.delphi.delphi-epidata.src'

But I didn't run into this issue locally when running the same command with a volume mount

docker run -i --rm --network delphi-net \
		--mount type=bind,source=$(CWD)repos/delphi/delphi-epidata,target=/usr/src/app/repos/delphi/delphi-epidata,readonly \
		--mount type=bind,source=$(CWD)repos/delphi/delphi-epidata/src,target=/usr/src/app/delphi/epidata,readonly \
		--env "SQLALCHEMY_DATABASE_URI=mysql+mysqldb://user:pass@delphi_database_epidata:3306/epidata" \
		--env "FLASK_SECRET=abc" \
		delphi_web_python python -m pytest --import-mode importlib repos/delphi/delphi-epidata/tests/

How does the volume mount remove this issue? 🤔 It's not changing the paths at all, only updating the contents.


Answer

UPDATE: so it appears that we're adding a new source directory with the volume mount that is usually not present.

# CI Version
> docker run -i --rm -t delphi_web_python ls repos/delphi/delphi-epidata/
Jenkinsfile  deploy.json  integrations       package.json          scripts
LICENSE      dev          labels             pyproject.toml        tasks.py
README.md    devops       mypy.ini           requirements.dev.txt  testdata
build.js     docs         package-lock.json  requirements.txt      tests

# Mount Version
> docker run -i --rm --network delphi-net \
                --mount type=bind,source=/home/dskel/Documents/Code/Delphi/delphi-epidata-dev/driver/repos/delphi/delphi-epidata,target=/usr/src/app/repos/delphi/delphi-epidata,readonly \
                --mount type=bind,source=/home/dskel/Documents/Code/Delphi/delphi-epidata-dev/driver/repos/delphi/delphi-epidata/src,target=/usr/src/app/delphi/epidata,readonly \
                -t delphi_web_python ls repos/delphi/delphi-epidata/
Jenkinsfile  dev           mypy.ini              requirements.txt  tests
LICENSE      devops        package-lock.json     scripts
README.md    docs          package.json          src
build.js     integrations  pyproject.toml        tasks.py
deploy.json  labels        requirements.dev.txt  testdata

So in the CI version, the src directory is not there, which is what my relative imports rely on.

Future Work

Can we just add that directory to our Docker image instructions? Looking at the CI file, we can see that the setup.sh script here actually moves the src directories from all the repos over. We could fix this by changing the mv command to a cp command.

I'm going to leave this alone for now. There are many ways to "fix" our directory issues there and I'm not ready to do an exhaustive review of all our options for fixing that at this point (maybe relative imports, maybe directory restructuring, maybe renaming delphi-epidata to something without a hypen, etc.).

Resolution

So I'm just going to let this go, focus on the JIT meta computation priority, and revert all the relative directory imports.

@dshemetov dshemetov force-pushed the ds/refactor-csv-importer branch from b5384d7 to 7c111f5 Compare August 2, 2022 22:28
@dshemetov dshemetov changed the title [Refactor] Refactor csv_importer [Refactor] Clarify a few csv_importer types and names Aug 2, 2022
@dshemetov dshemetov added refactor Substantial projects to make the code do the same thing, better. enhancement good first issue and removed enhancement labels Aug 23, 2022
@dshemetov dshemetov requested a review from krivard January 19, 2023 21:57
* rename RowValues to CsvRowValue
* make CsvRowValue into a dataclass
* remove argument in load_csv only used for test mocks; correctly mock test
* update tests
@dshemetov dshemetov force-pushed the ds/refactor-csv-importer branch from 9f60604 to 87220a0 Compare January 19, 2023 22:10
Copy link
Contributor

@krivard krivard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gorgeous change, I'll accept the long line lengths as a necessary evil if I must

@@ -37,6 +49,7 @@ class CsvImporter:
MIN_YEAR = 2019
MAX_YEAR = 2030

# The datatypes expected by pandas.read_csv. Int64 is like float in that it can handle both numbers and nans.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🏆

missing_value, missing_stderr, missing_sample_size
)
return (row_values, None)
return (CsvRowValue(geo_id, value, stderr, sample_size, missing_value, missing_stderr, missing_sample_size), None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this harder to read but I will hold my nose and accept it as the way of the future

@@ -6,8 +6,7 @@
import unittest
from unittest.mock import MagicMock

from delphi.epidata.acquisition.covidcast.csv_to_database import get_argument_parser, main, \
collect_files, upload_archive, make_handlers
from delphi.epidata.acquisition.covidcast.csv_to_database import get_argument_parser, main, collect_files, upload_archive, make_handlers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh same, I don't like it but it's fine

@dshemetov
Copy link
Contributor Author

I guess I have to admit it... I'm a long-line lover. There are dozens of us. Dozens!

@dshemetov dshemetov merged commit 54722a4 into dev Jan 20, 2023
@dshemetov dshemetov deleted the ds/refactor-csv-importer branch January 20, 2023 22:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue refactor Substantial projects to make the code do the same thing, better.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants