DRAFT: 285 rfm model #422

bdeck8317 · 2021-08-16T14:29:50Z

Let's meet to discuss integrating this into the pipeline. create_scores.py will need a variable passed in that is str(query_date).

Updates the rfm_scores table in the Postgres database.

Adds a number of module dependencies to requirements.txt.

c-simpson · 2021-09-14T02:50:17Z

I'm getting this error running create_scores():

  File "C:\Projects\paws-data-pipeline\src\server\rfm_funcs\create_scores.py", line 73, in create_scores
    grouped_past_year['recency_score'] = pd.cut(grouped_past_year[('days_since','min')], bins= recency_bins, labels=recency_labels, include_lowest = True)
  File "C:\Python38\lib\site-packages\pandas\core\reshape\tile.py", line 262, in cut
    raise ValueError("bins must increase monotonically.")
ValueError: bins must increase monotonically.

with rfm_edges as:

{
"r": {"5": 0, "4": 262, "3": 1097, "2": 1910, "1": 2851}, 
"f": {"1": 0, "2": 1, "3": 2, "4": 3, "5": 4}, 
"m": {"1": 0.0, "2": 50.0, "3": 75.0, "4": 100.0, "5": 210.0}
}

It looks like the issue is with the 'r' set. Do the values need to be reversed wrt the keys?
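A minimal sketch, assuming pandas is installed, of how pd.cut treats edges like these. Only the bins have to increase, and the 'r' values (0, 262, 1097, 1910, 2851) already do, so they should not need reversing. The labels can run in any order as long as there is exactly one fewer label than bin edges. The closing edge of 4000 is an arbitrary illustrative value and is not part of rfm_edges.

```python
import pandas as pd

# 'r' edges as posted above: key = score label, value = lower edge in days.
r_edges = {"5": 0, "4": 262, "3": 1097, "2": 1910, "1": 2851}

# Order the pairs by edge value; the labels simply follow that order.
pairs = sorted(r_edges.items(), key=lambda kv: kv[1])
labels = [label for label, _ in pairs]  # ['5', '4', '3', '2', '1']
bins = [edge for _, edge in pairs]      # [0, 262, 1097, 1910, 2851]

# pd.cut needs exactly one more edge than labels, so close the last bin.
# ASSUMPTION: 4000 is an arbitrary illustrative upper edge.
bins.append(4000)

days_since = pd.Series([0, 10, 300, 2000, 3000])
scores = pd.cut(days_since, bins=bins, labels=labels, include_lowest=True)
print(list(scores))  # ['5', '5', '4', '2', '1']
```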

bdeck8317 · 2021-09-14T18:59:37Z

@c-simpson this looks like a pandas issue with how it handles numpy.diff. Could you try updating pandas to 1.3.2?

The bug is apparently fixed in that version, since I haven't hit this issue. I also ran the script through the API and it seemed okay.

I'll keep digging into this, but my temporary workaround is to update pandas to 1.3.2.

pandas-dev/pandas#40969

c-simpson · 2021-09-14T20:40:40Z

I'm still getting it with 1.3.2.

c-simpson · 2021-09-14T23:48:21Z

We poked at this while you weren't around and noticed that

paws-data-pipeline/src/server/rfm_funcs/create_scores.py

Line 71 in 5367b6e

recency_bins.append(grouped_past_year[('days_since', 'min')].max())

adds 43 to the end of recency_bins, which causes the 'monotonic' error, since 43 is smaller than the edge before it. In the debugger, if I move 43 to its proper sorted position, we don't get the monotonic complaint. We didn't know exactly what you were doing, so we stopped there.

c-simpson · 2021-09-15T00:17:06Z

@bdeck8317 I created a branch (285-sort-after-append) off yours that does nothing more than sort recency_bins after the append. It runs, but only creates scores for 754 matching_ids. So I'll stop messing with things I don't understand and let you carry on!
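A minimal reproduction of what the two comments above describe, assuming pandas and a toy series whose maximum is 43 (the real data is not shown here). Appending that maximum after 2851 makes the edges decrease, which is exactly what pd.cut rejects. Sorting silences the error, but it also turns 43 into an interior edge, so the first interval becomes [0, 43] instead of [0, 262] and the edges are no longer the ones that were intended.

```python
import pandas as pd

labels = ["5", "4", "3", "2", "1"]
edges = [0, 262, 1097, 1910, 2851]
days_since = pd.Series([0, 10, 43])  # toy data; its max is 43

# Mimics the append on line 71: the data max becomes the last edge.
appended = edges + [int(days_since.max())]  # [0, 262, 1097, 1910, 2851, 43]

try:
    pd.cut(days_since, bins=appended, labels=labels, include_lowest=True)
    error = None
except ValueError as exc:
    error = str(exc)
print(error)  # bins must increase monotonically.

# Sorting makes the edges monotonic again, but the intended edges changed.
sorted_bins = sorted(appended)  # [0, 43, 262, 1097, 1910, 2851]
scores = pd.cut(days_since, bins=sorted_bins, labels=labels, include_lowest=True)
print(sorted_bins, list(scores))
```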

bdeck8317 · 2021-09-17T00:33:21Z

@c-simpson @sposerina ready for review and initial integration. There is one merge conflict; I'm not sure how to resolve it.

c-simpson · 2021-09-17T15:22:25Z

The 'append 43' is making me think about robustness:

What happens if date_differences() gets a badly formatted date (shouldn't happen)? A future date (could happen)? Can callers handle that?
In create_scores(), what should happen if read_rfm_edges() returns None?
It might be cleaner to put the R, F, and M processing in separate functions, each with its own try/except. What should each return on failure?
If one of those fails, what should we do? We don't want to push bad data. @kfettich Any thoughts?
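One way the per-component idea could be structured, as a sketch only: run_component and run_rfm are hypothetical names, each scorer is assumed to return a dict of matching_id to score label, and a failed component yields None so the caller can decide not to push partial data.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical result shape: matching_id -> score label.
Scores = Dict[int, str]


def run_component(name: str, scorer: Callable[[], Scores]) -> Optional[Scores]:
    """Run one R, F, or M step; log the failure and return None instead of raising."""
    try:
        return scorer()
    except (ValueError, KeyError, TypeError) as exc:
        print(f"{name} scoring failed: {exc}")
        return None


def run_rfm(
    scorers: Dict[str, Callable[[], Scores]],
) -> Tuple[Dict[str, Optional[Scores]], List[str]]:
    """Run every component and report which ones failed."""
    results = {name: run_component(name, fn) for name, fn in scorers.items()}
    failed = [name for name, res in results.items() if res is None]
    return results, failed


def broken_recency() -> Scores:
    # Stand-in for the pd.cut failure reported earlier in this thread.
    raise ValueError("bins must increase monotonically.")


results, failed = run_rfm(
    {"r": broken_recency, "f": lambda: {1: "3"}, "m": lambda: {1: "5"}}
)
if failed:
    print(f"not pushing scores; failed components: {failed}")
```

Returning None rather than re-raising keeps the decision about what to push in one place (the caller), which is where the "don't push bad data" policy would live.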

c-simpson · 2021-09-28T14:02:02Z

Spoke to BD (who is swamped); I'll make the above changes this week.

kfettich · 2021-09-28T21:59:36Z

Sorry, I missed the mention! I think we discussed this briefly, but here are my thoughts on what to do if any of the three failures happens:

What happens if date_differences() gets a badly-formatted date (shouldn't happen)? A future date (could)? Can callers handle? -> Future dates should be filtered out. Meanwhile, for a user whose donation data has badly formatted dates, we should display some kind of warning (either an "error" label instead of an RFM label, or something else that tells the user there was a problem calculating the score).
In create_scores(), what should happen if read_rfm_edges() returns None? -> Revert to the previous set of edges that worked, and alert the dev team.
It might be cleaner to put the r,f,m processing in a separate function for each and put a try/catch in each. What to return if a failure? -> Return the person's last RFM score, with some indication that it is not the most recent score.

If one of those fails, what should we do? We don't want to push bad data. @kfettich Any thoughts? -> See above. I don't expect dramatic changes from one cycle to the next, so using the previous cycle's RFM scores should be OK; relying on stale scores for more than one cycle would be problematic, though.
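A small sketch of two of these rules using only the standard library: fall back to the previous edges when the current read returns None (and report that a fallback happened so the dev team can be alerted), and set aside future or malformed dates before scoring. choose_edges and partition_dates are hypothetical helpers, and ISO-formatted date strings are an assumption.

```python
import datetime as dt
from typing import Dict, List, Optional, Tuple

# Hypothetical shape mirroring rfm_edges: component -> {label: edge}.
Edges = Dict[str, Dict[str, float]]


def choose_edges(current: Optional[Edges], previous: Edges) -> Tuple[Edges, bool]:
    """Return (edges, used_fallback); fall back when the current read returned None."""
    if current is None:
        return previous, True  # caller should alert the dev team
    return current, False


def partition_dates(
    raw_dates: List[Optional[str]], today: dt.date
) -> Tuple[List[dt.date], List[dt.date], List[Optional[str]]]:
    """Split ISO date strings into (usable, future, malformed)."""
    usable: List[dt.date] = []
    future: List[dt.date] = []
    malformed: List[Optional[str]] = []
    for raw in raw_dates:
        try:
            parsed = dt.date.fromisoformat(raw)
        except (TypeError, ValueError):
            malformed.append(raw)
            continue
        (future if parsed > today else usable).append(parsed)
    return usable, future, malformed


previous = {"r": {"5": 0, "4": 262, "3": 1097, "2": 1910, "1": 2851}}
edges, used_fallback = choose_edges(None, previous)
usable, future, malformed = partition_dates(
    ["2021-09-01", "2030-01-01", "09/01/2021", None], dt.date(2021, 9, 28)
)
print(used_fallback, usable, future, malformed)
```

Keeping malformed values separate (rather than silently dropping them) leaves room for the "error" label kfettich suggests.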

bdeck8317 added 19 commits July 15, 2021 19:49

Creating a number of functions and documentation for RFM model.

06011d2

Creating functions and documentation for RFM model

105c5bb

Creating new funcs for RFM analysis. Main script is 'donations_rfm.py'

aed1985

Adding donations,recency bins, and recency score function changes

be92b57

Adding amounts bin function and editing frequency bin func. Finished bins func and writes txt file with bins, needs testing

9b5ec1d

Adding further documentation on the RFM process notebook. Removing all LLCs from RFM score analysis based on recommendations

816bf3a

Adding edge dicts func which creates a dictionary for each edge and label list @c-simpson

a79941c

Adding doc strings to edges_dicts

632bf5c

adding create score func and edits to edges_dicts funct

f2befb5

creating one single function call file and a single create scores function which pulls data from tables and creates scores based on tabeled bins

782baa8

Pulling edges from database using read_rfm_edges

f4d1843

committing syntax error for read_rfm_edges import statement.

06f0183

Adding pull_donations_for_rfm to create_scores function to pull in list of tuples for cleaned and matched data

3ff8810

Appending max score to bins so we don't lose anyone when creating scores

ccb5bb1

running it locally

a1d4616

Ready for integration

502c08c

removed a mistake adding secrets folder

87ad5b5

Moving from playground folder to server/rfm folder

2b8fcbd

Deleted html, amount_bins, edges_dicts docs. Added .md doc for run info. Edges_dicts moved into rfm_functions.py. Updated requirements.txt to import modules necessary to run funcs. Data is uploaded to rfm_database once scores are calculated.

f770c08
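Commit ccb5bb1 above appends a max value to the bins "so we don't lose anyone when creating scores". The reason, sketched with the 'm' edges from the rfm_edges example in this thread and made-up donation totals (pandas assumed): five labels need six edges, and pd.cut returns NaN for any value above the closing edge. The fixed closing edge of 300.0 is an arbitrary choice for contrast. Note that if the data max were below the last edge, the same append would produce the non-monotonic bins from the traceback.

```python
import pandas as pd

# 'm' edges from the rfm_edges example (key = label, value = lower edge).
m_edges = {"1": 0.0, "2": 50.0, "3": 75.0, "4": 100.0, "5": 210.0}
labels = list(m_edges)          # ['1', '2', '3', '4', '5']
edges = list(m_edges.values())  # [0.0, 50.0, 75.0, 100.0, 210.0]

amounts = pd.Series([20.0, 80.0, 250.0, 500.0])  # made-up donation totals

# Closing with a fixed edge below the data max: 500.0 falls outside and becomes NaN.
fixed = pd.cut(amounts, bins=edges + [300.0], labels=labels, include_lowest=True)
print(fixed.isna().tolist())  # [False, False, False, True]

# Closing with the data max keeps every donor in some bin.
closed = pd.cut(amounts, bins=edges + [float(amounts.max())], labels=labels, include_lowest=True)
print(closed.tolist())  # ['1', '3', '5', '5']
```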

bdeck8317 requested review from sposerina and c-simpson August 16, 2021 14:29

Stephen Poserina and others added 9 commits August 16, 2021 17:36

populate donation data in dataframe

f55f844

Merge branch 'master' into 285-rfm-model

f94dc1a

Moved rfm_funcs up a level

8520045

Tweaks for create_scores

93b95d7

Reverting error I made in removing values()

9494472

Delete kustomization.yaml

Contained secrets.

482f7bd

Merge remote-tracking branch 'origin' into 285-rfm-model

d2675cf

Restore COMMIT in populate_rfm_mapping

41259b6

Merge branch '285-rfm-model' of github.com:CodeForPhilly/paws-data-pipeline into 285-rfm-model

974e595

c-simpson and others added 2 commits August 17, 2021 20:24

Changed _id to matching_id in recency

dc74cbd

changes to create_scores, removing opp_id and adding placeholder variable/renaming var lines 85-90. date_difference calculation now uses max close donation date instead of query date.

b0b93c0
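Commit b0b93c0 measures recency from the latest close date in the data rather than from the query date. A sketch of that calculation, assuming pandas and hypothetical column names (matching_id, close_date, amount); it also shows where a label like ('days_since', 'min') comes from, since aggregating with a list per column produces two-level column labels.

```python
import pandas as pd

# Made-up donations; column names are assumptions for illustration.
donations = pd.DataFrame(
    {
        "matching_id": [1, 1, 2, 3],
        "close_date": pd.to_datetime(["2021-01-10", "2021-06-01", "2021-05-15", "2021-06-30"]),
        "amount": [25.0, 50.0, 100.0, 10.0],
    }
)

# Recency is measured from the latest close date in the data, not the query date.
reference = donations["close_date"].max()
donations["days_since"] = (reference - donations["close_date"]).dt.days

# A list of aggregations per column yields two-level labels like ('days_since', 'min').
grouped = donations.groupby("matching_id").agg(
    {"days_since": ["min"], "close_date": ["count"], "amount": ["sum"]}
)
print(grouped[("days_since", "min")].tolist())  # [29, 46, 0]
```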

bdeck8317 marked this pull request as draft August 24, 2021 13:55

bdeck8317 changed the title ~~285 rfm model~~ DRAFT 285 rfm model Aug 24, 2021

bdeck8317 changed the title ~~DRAFT 285 rfm model~~ DRAFT: 285 rfm model Aug 24, 2021

bdeck8317 and others added 5 commits August 24, 2021 18:43

Cleaning up RFM create_scores.py admin_api.py: Removing server

eeb27d2

Fixed response for API endpoint

7dfbcf8

Fixed typo

fc0317e

Test script which creates mock data

c0669a7

Merge branch '285-rfm-model' of https://github.com/CodeForPhilly/paws-data-pipeline into 285-rfm-model

5367b6e

c-simpson and others added 3 commits September 16, 2021 18:19

Ensure recency last post greater than existing

0769d1f

Improving robustness of recency score creation. @c-simpson and I worked out a bug which was appending improper max values. Now it is robust to max values which are either max for data if it is below max bin edge or will use pre-existing max bin edge

95c130f

server/rfm_funcs/create_scores.py

ac25f46
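The fix in 0769d1f and 95c130f is described only in the commit messages, so the following is a hypothetical sketch of the rule rather than the committed code: whatever the data max is, the closing edge must end up strictly above the last existing edge. close_bins and its "last edge + 1" fallback are assumptions for illustration (standard library only).

```python
from typing import List, Sequence


def close_bins(edges: Sequence[float], data_max: float) -> List[float]:
    """Append an upper edge that is always strictly above the last edge.

    Hypothetical helper: use the data max when it exceeds the last edge;
    otherwise fall back to one unit past the last existing edge.
    """
    if not edges:
        raise ValueError("need at least one edge")
    if any(b <= a for a, b in zip(edges, edges[1:])):
        raise ValueError("edges must be strictly increasing")
    last = edges[-1]
    upper = data_max if data_max > last else last + 1
    return list(edges) + [upper]


r_edges = [0, 262, 1097, 1910, 2851]
print(close_bins(r_edges, 43))    # [0, 262, 1097, 1910, 2851, 2852]
print(close_bins(r_edges, 4000))  # [0, 262, 1097, 1910, 2851, 4000]
```

Passing float("inf") as the closing edge is another common way to guarantee that no value falls off the top of the last bin.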

bdeck8317 marked this pull request as ready for review September 17, 2021 00:31

c-simpson mentioned this pull request Oct 3, 2021

Improve robustness of RFM scoring #436

Closed

Merge branch 'master' into 285-rfm-model

87cf8bb

c-simpson merged commit 57a6426 into master Oct 3, 2021

c-simpson deleted the 285-rfm-model branch October 12, 2021 01:33
