-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
CLN: ASV categoricals benchmark #18465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice clean-up! Left a few comments
asv_bench/benchmarks/categoricals.py
Outdated
try: | ||
from pandas.types.concat import union_categoricals | ||
except ImportError: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you leave this try .. except in place? It is meant to be able to run the benchmarks with different older versions
asv_bench/benchmarks/categoricals.py
Outdated
np.random.seed(2718281) | ||
arr = ['s%04d' % i for i in np.random.randint(0, n // 10, size=n)] | ||
self.ts = Series(arr).astype('category') | ||
self.ts = pd.Series(arr).astype('category') | ||
self.dropna = dropna |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not needed I think? You can directly use the arg dropna
in the time method
asv_bench/benchmarks/categoricals.py
Outdated
self.dropna = dropna | ||
|
||
def time_value_counts(self, dropna): | ||
self.ts.value_counts(dropna=self.dropna) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.dropna -> dropna
asv_bench/benchmarks/categoricals.py
Outdated
|
||
def time_union(self): | ||
union_categoricals([self.a, self.b]) | ||
|
||
def time_constructor_regular(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we wan to remove the _constructor
part in the time names? As you already have it in the class name (now you have 'constructor' twice and making the name long)
Codecov Report
@@ Coverage Diff @@
## master #18465 +/- ##
==========================================
- Coverage 91.34% 91.32% -0.02%
==========================================
Files 163 163
Lines 49717 49717
==========================================
- Hits 45413 45404 -9
- Misses 4304 4313 +9
Continue to review full report at Codecov.
|
More splitting in classes & flake 8
b73774c
to
3fcf30f
Compare
Kept the removed imports, changed the redundant method names, and changed |
thanks! |
Split classes from
Categoricals/2/3
-->Concat
,ValueCounts
,Rank
,Repr
,SetCategories
, andConstructor
Utilized
params
andparam_names
forValueCounts
Added
np.random.seed(1234)
in setup classes where random data is created xref BENCH: put in np.random.seed on vbenches #8144Ran flake8 and replaced star imports