
Optimize with dask #1981

Merged: 6 commits, Jul 1, 2024

Conversation

aysim319
Contributor

@aysim319 aysim319 commented Jul 1, 2024

doctor_visit_EDI_AGG_OUTPATIENT_26062024_1455CDT.log
doctor_visit_refactored_EDI_AGG_OUTPATIENT_26062024_1455CDT.log
doctor_visit_refactored_using_dask_7b6e0764_EDI_AGG_OUTPATIENT_26062024_1455CDT.log
!WORK IN PROGRESS! Still need to write tests and do more optimization.

Description

Continuing to optimize doctors_visit.
Main ran: 569
doctor_visit_refactor: 3135
optimize_with_dask (this branch): 174

Changelog

Itemize code/test/documentation changes and files added/removed.

  • Refactored update_sensor and moved the CSV processing into a separate module
  • Using datetime date params instead of strings
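A minimal sketch of the second changelog item, assuming the dates arrive as `YYYY-MM-DD` strings at the entry point. The helper names `parse_date` and `days_in_window` and the date format are hypothetical and not taken from this PR; the idea is only that strings are parsed once at the boundary and `datetime` objects are passed everywhere else.

```python
from datetime import datetime

# Assumed input format; the real parameter format is not shown in this PR.
DATE_FORMAT = "%Y-%m-%d"


def parse_date(value: str) -> datetime:
    """Convert a date string to a datetime once, at the boundary."""
    return datetime.strptime(value, DATE_FORMAT)


def days_in_window(startdate: datetime, enddate: datetime) -> int:
    """Hypothetical downstream helper that works on datetime objects only."""
    if enddate < startdate:
        raise ValueError("enddate must not precede startdate")
    return (enddate - startdate).days


startdate = parse_date("2024-06-01")
enddate = parse_date("2024-06-26")
window = days_in_window(startdate, enddate)
```

With typed dates, comparisons and arithmetic no longer depend on the string format, and a malformed date fails at parse time instead of deep inside the sensor update.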

Fixes

  • Fixes #(issue)

@aysim319 aysim319 requested a review from minhkhul July 1, 2024 14:19
@aysim319
Contributor Author

aysim319 commented Jul 1, 2024

Need to check whether there is enough memory to pass the processed dataframe around in memory; otherwise, we need to look into chunking.
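As a rough illustration of the chunking fallback, here is a library-agnostic sketch using only the standard library. The column names `geo` and `visits`, the helper names, and the sample data are hypothetical; the actual pipeline works on dataframes, where the same idea is available through the `chunksize` parameter of `pandas.read_csv`.

```python
import csv
import io
from collections import Counter
from itertools import islice
from typing import Dict, Iterator, List, TextIO


def iter_chunks(handle: TextIO, chunk_size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most chunk_size rows so the whole file is never held in memory."""
    reader = csv.DictReader(handle)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def total_by_key(handle: TextIO, key: str, value: str, chunk_size: int = 1000) -> Counter:
    """Aggregate one chunk at a time, keeping only the running totals."""
    totals: Counter = Counter()
    for chunk in iter_chunks(handle, chunk_size):
        for row in chunk:
            totals[row[key]] += int(row[value])
    return totals


# Hypothetical sample data standing in for a large input file.
sample = io.StringIO("geo,visits\nA,1\nB,2\nA,3\n")
totals = total_by_key(sample, "geo", "visits", chunk_size=2)
```

This only works directly for aggregations that can be combined across chunks, such as sums and counts; anything that needs all rows for a group at once would need a different split.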

-------
'''
filename = Path(filepath).name
logger.info(f"Processing {filename}")
Contributor

nitpick: filepath might be more helpful, as it includes input dir.

Suggested change
- logger.info(f"Processing {filename}")
+ logger.info(f"Processing {filepath}")
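To make the difference in the suggestion concrete, here is a small sketch with a hypothetical path (the directory `input_dir` and the `.csv.gz` extension are made up). It uses `PurePosixPath` instead of `Path` so the result does not depend on the platform.

```python
from pathlib import PurePosixPath

# Hypothetical input path, for illustration only.
filepath = "input_dir/EDI_AGG_OUTPATIENT_26062024_1455CDT.csv.gz"

# .name keeps only the final path component and drops the directory.
filename = PurePosixPath(filepath).name

short_message = f"Processing {filename}"
full_message = f"Processing {filepath}"
```

Logging the full path makes it clear which input directory a file came from when several directories contain files with the same name.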

  def update_sensor(
-     filepath, startdate, enddate, dropdate, geo, parallel,
-     weekday, se, logger
+     data:pd.DataFrame, startdate:datetime, enddate:datetime, dropdate:datetime, geo:str, parallel: bool,
Contributor

👍 we should start doing type specification.

Contributor Author

Yeah, but I think that should be a separate ticket, maybe? I don't want to add more PRs/confusion within an already scoped-out feature.

@aysim319 aysim319 force-pushed the optimize_with_dask branch 4 times, most recently from e919fda to 935b7dd Compare July 1, 2024 17:35
@aysim319 aysim319 force-pushed the optimize_with_dask branch from 935b7dd to d1ee4ce Compare July 1, 2024 17:42
@aysim319 aysim319 merged commit fc2c58d into doctor_visits_refactor_for_speed Jul 1, 2024
3 checks passed
@aysim319 aysim319 deleted the optimize_with_dask branch July 1, 2024 17:46