
Commit e6f62b5

Update anonymizer.py
Fixing NA issue: "The problem are the NA entries in your dataset. Each row in your dataset has at least one NA somewhere. When you apply .groupby to NA entries, it wouldn't know how to group NAs so it removes them, leaving an empty result (length 0)." pandas-dev/pandas#23050
1 parent a7ad546 commit e6f62b5


1 file changed: 1 addition, 1 deletion


anonymizer.py

Lines changed: 1 addition & 1 deletion
@@ -783,7 +783,7 @@ def identify_1st_identifier(colname):
 include an error rate for null values etc.
 """
 global statistics
-representatives = df.groupby(colname, sort=False).size().reset_index().rename(columns={0:'count'})
+representatives = df.fillna(-1).groupby(colname, sort=False).size().reset_index().rename(columns={0:'count'})
 unique_entries = representatives.loc[representatives['count']==1]['count'].count()
 coverage_of_uniques = unique_entries / ( len(df.index) - df[colname].isnull().sum() )
