Update anonymizer.py

jaSunny · web-flow · commit e6f62b5ab3c2 · 2019-01-04T20:04:16.000+01:00
Fix NA issue: fill NA values with -1 before grouping so that rows containing NA are not dropped. As explained in pandas-dev/pandas#23050: "The problem are the NA entries in your dataset. Each row in your dataset has at least one NA somewhere. When you apply .groupby to NA entries, it wouldn't know how to group NAs so it removes them, leaving an empty result (length 0)."
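
The behaviour described above can be reproduced with a minimal sketch. The toy DataFrame, its column names (`zip`, `age`) and the `keys` list are hypothetical and not taken from anonymizer.py; only the `fillna(-1).groupby(..., sort=False).size()` chain mirrors the change in this commit.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: every row has an NA in at least one key column.
df = pd.DataFrame({
    "zip": [10115.0, np.nan, 10117.0],
    "age": [np.nan, 34.0, np.nan],
})
keys = ["zip", "age"]  # assumed grouping columns, for illustration only

# With default settings, groupby drops every row whose key contains NA,
# so grouping this frame leaves no groups at all.
plain = df.groupby(keys, sort=False).size()

# Replacing NA with a sentinel first turns the missing values into
# ordinary keys, so each row ends up in a group.
filled = df.fillna(-1).groupby(keys, sort=False).size()

print(len(plain))
print(len(filled))
```

Note that -1 is only safe as a sentinel if it cannot occur as a real value in the grouped columns. On pandas 1.1 or later, passing `dropna=False` to `groupby` may be an alternative that keeps NA keys without a sentinel.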
diff --git a/anonymizer.py b/anonymizer.py
@@ -783,7 +783,7 @@ def identify_1st_identifier(colname):
     include an error rate for null values etc.
     """
     global statistics
-    representatives = df.groupby(colname, sort=False).size().reset_index().rename(columns={0:'count'})
+    representatives = df.fillna(-1).groupby(colname, sort=False).size().reset_index().rename(columns={0:'count'})
     unique_entries = representatives.loc[representatives['count']==1]['count'].count()
     coverage_of_uniques = unique_entries / ( len(df.index) - df[colname].isnull().sum() )