ENH: Extract strategy performance method compute_stats #281
Merged

Commits (14 total; the changes shown below are from 13 commits)
- 3f1b1a4 (crazy25000): ENH: Extract methods _compute_stats and _compute_drawdown_duration_pe…
- f99ce5f (crazy25000): ENH: Move compute stats methods to new file, _stats.py
- d302b13 (crazy25000): TST: Update unit tests for compute_drawdown_duration_peaks
- 691f39f (crazy25000): TST: Remove ignore type for CI test failure
- bd1075b (crazy25000): REF: Remove broker dependency from compute_stats, update sharpe ratio…
- 3d3164d (crazy25000): REF: Update self._results to account for compute_stats change, fix typo
- eb80044 (crazy25000): Update backtesting/backtesting.py
- dc0b136 (crazy25000): Update backtesting/backtesting.py
- 5cefccc (crazy25000): REF: Add risk_free_rate to Sortino Ratio
- edf812d (crazy25000): ENH: Add compute_stats to lib, provide public method
- 68da66e (crazy25000): REF: Extract params to reduce line length
- bc83e31 (crazy25000): REF: Use strategy broker to calculate equity
- 5b47fd5 (crazy25000): REF: Use example from test
- c3e62ac (kernc): Update, make more idempotent, add doc, test
backtesting/_stats.py (new file)

@@ -0,0 +1,142 @@
from typing import List

import numpy as np
import pandas as pd

from ._util import _data_period


def compute_drawdown_duration_peaks(dd: pd.Series):
    iloc = np.unique(np.r_[(dd == 0).values.nonzero()[0], len(dd) - 1])
    iloc = pd.Series(iloc, index=dd.index[iloc])
    df = iloc.to_frame('iloc').assign(prev=iloc.shift())
    df = df[df['iloc'] > df['prev'] + 1].astype(int)

    # If no drawdown since no trade, avoid below for pandas sake and return nan series
    if not len(df):
        return (dd.replace(0, np.nan),) * 2

    df['duration'] = df['iloc'].map(dd.index.__getitem__) - df['prev'].map(dd.index.__getitem__)
    df['peak_dd'] = df.apply(lambda row: dd.iloc[row['prev']:row['iloc'] + 1].max(), axis=1)
    df = df.reindex(dd.index)
    return df['duration'], df['peak_dd']


def geometric_mean(returns: pd.Series) -> float:
    returns = returns.fillna(0) + 1
    if np.any(returns <= 0):
        return 0

    return np.exp(np.log(returns).sum() / (len(returns) or np.nan)) - 1


def compute_stats(
        trades: List[pd.DataFrame],
        equity: np.ndarray,
        ohlc_data: pd.DataFrame,
        risk_free_rate: float = 0) -> pd.Series:

    index = ohlc_data.index
    dd = 1 - equity / np.maximum.accumulate(equity)
    dd_dur, dd_peaks = compute_drawdown_duration_peaks(pd.Series(dd, index=index))

    equity_df = pd.DataFrame({
        'Equity': equity,
        'DrawdownPct': dd,
        'DrawdownDuration': dd_dur},
        index=index)

    trades_df = pd.DataFrame({
        'Size': [t.size for t in trades],
        'EntryBar': [t.entry_bar for t in trades],
        'ExitBar': [t.exit_bar for t in trades],
        'EntryPrice': [t.entry_price for t in trades],
        'ExitPrice': [t.exit_price for t in trades],
        'PnL': [t.pl for t in trades],
        'ReturnPct': [t.pl_pct for t in trades],
        'EntryTime': [t.entry_time for t in trades],
        'ExitTime': [t.exit_time for t in trades],
    })
    trades_df['Duration'] = trades_df['ExitTime'] - trades_df['EntryTime']

    pl = trades_df['PnL']
    returns = trades_df['ReturnPct']
    durations = trades_df['Duration']

    def _round_timedelta(value, _period=_data_period(index)):
        if not isinstance(value, pd.Timedelta):
            return value
        resolution = getattr(_period, 'resolution_string', None) or _period.resolution
        return value.ceil(resolution)

    s = pd.Series(dtype=object)
    s.loc['Start'] = index[0]
    s.loc['End'] = index[-1]
    s.loc['Duration'] = s.End - s.Start

    have_position = np.repeat(0, len(index))
    for t in trades:
        have_position[t.entry_bar:t.exit_bar + 1] = 1

    s.loc['Exposure Time [%]'] = have_position.mean() * 100  # In "n bars" time, not index time
    s.loc['Equity Final [$]'] = equity[-1]
    s.loc['Equity Peak [$]'] = equity.max()
    s.loc['Return [%]'] = (equity[-1] - equity[0]) / equity[0] * 100
    c = ohlc_data.Close.values
    s.loc['Buy & Hold Return [%]'] = (c[-1] - c[0]) / c[0] * 100  # long-only return

    gmean_day_return: float = 0
    day_returns = np.array(np.nan)
    annual_trading_days = np.nan
    if isinstance(index, pd.DatetimeIndex):
        day_returns = equity_df['Equity'].resample('D').last().dropna().pct_change()
        gmean_day_return = geometric_mean(day_returns)
        annual_trading_days = float(
            365 if index.dayofweek.to_series().between(5, 6).mean() > 2/7 * .6 else
            252)

    # Annualized return and risk metrics are computed based on the (mostly correct)
    # assumption that the returns are compounded. See: https://dx.doi.org/10.2139/ssrn.3054517
    # Our annualized return matches `empyrical.annual_return(day_returns)` whereas
    # our risk doesn't; they use the simpler approach below.
    annualized_return = (1 + gmean_day_return)**annual_trading_days - 1
    s.loc['Return (Ann.) [%]'] = annualized_return * 100
    s.loc['Volatility (Ann.) [%]'] = np.sqrt((day_returns.var(ddof=int(bool(day_returns.shape))) + (1 + gmean_day_return)**2)**annual_trading_days - (1 + gmean_day_return)**(2*annual_trading_days)) * 100  # noqa: E501
    # s.loc['Return (Ann.) [%]'] = gmean_day_return * annual_trading_days * 100
    # s.loc['Risk (Ann.) [%]'] = day_returns.std(ddof=1) * np.sqrt(annual_trading_days) * 100

    # Our Sharpe mismatches `empyrical.sharpe_ratio()` because they use arithmetic mean return
    # and simple standard deviation
    s.loc['Sharpe Ratio'] = np.clip((s.loc['Return (Ann.) [%]'] - risk_free_rate) / (s.loc['Volatility (Ann.) [%]'] or np.nan), 0, np.inf)  # noqa: E501
    # Our Sortino mismatches `empyrical.sortino_ratio()` because they use arithmetic mean return
    s.loc['Sortino Ratio'] = np.clip((annualized_return - risk_free_rate) / (np.sqrt(np.mean(day_returns.clip(-np.inf, 0)**2)) * np.sqrt(annual_trading_days)), 0, np.inf)  # noqa: E501
    max_dd = -np.nan_to_num(dd.max())
    s.loc['Calmar Ratio'] = np.clip(annualized_return / (-max_dd or np.nan), 0, np.inf)
    s.loc['Max. Drawdown [%]'] = max_dd * 100
    s.loc['Avg. Drawdown [%]'] = -dd_peaks.mean() * 100
    s.loc['Max. Drawdown Duration'] = _round_timedelta(dd_dur.max())
    s.loc['Avg. Drawdown Duration'] = _round_timedelta(dd_dur.mean())
    s.loc['# Trades'] = n_trades = len(trades)
    s.loc['Win Rate [%]'] = np.nan if not n_trades else (pl > 0).sum() / n_trades * 100  # noqa: E501
    s.loc['Best Trade [%]'] = returns.max() * 100
    s.loc['Worst Trade [%]'] = returns.min() * 100
    mean_return = geometric_mean(returns)
    s.loc['Avg. Trade [%]'] = mean_return * 100
    s.loc['Max. Trade Duration'] = _round_timedelta(durations.max())
    s.loc['Avg. Trade Duration'] = _round_timedelta(durations.mean())
    s.loc['Profit Factor'] = returns[returns > 0].sum() / (abs(returns[returns < 0].sum()) or np.nan)  # noqa: E501
    s.loc['Expectancy [%]'] = returns.mean() * 100
    s.loc['SQN'] = np.sqrt(n_trades) * pl.mean() / (pl.std() or np.nan)

    s.loc['_equity_curve'] = equity_df
    s.loc['_trades'] = trades_df

    s = _Stats(s)
    return s


class _Stats(pd.Series):
    def __repr__(self):
        # Prevent expansion due to _equity and _trades dfs
        with pd.option_context('max_colwidth', 20):
            return super().__repr__()
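To make the drawdown bookkeeping concrete, here is a minimal sketch, not part of the PR, that runs compute_drawdown_duration_peaks on a made-up eight-bar drawdown series. The dates and values are invented for illustration, and it assumes the backtesting._stats module added by this diff is importable.

```python
import pandas as pd

from backtesting._stats import compute_drawdown_duration_peaks  # module added in this PR

# Toy drawdown series: 0 marks bars where equity sits at a running high,
# other values are fractional drawdowns below that high.
index = pd.date_range('2021-01-01', periods=8, freq='D')
dd = pd.Series([0, .10, .20, .10, 0, 0, .30, .15], index=index)

durations, peaks = compute_drawdown_duration_peaks(dd)

# Both outputs are reindexed to dd.index and are NaN except at the bar where a
# drawdown episode ends: durations.dropna() holds Timedelta('4 days') on
# 2021-01-05 and Timedelta('2 days') on 2021-01-08, while peaks.dropna() holds
# the matching maximum drawdowns 0.20 and 0.30. compute_stats() then aggregates
# these into 'Max./Avg. Drawdown Duration' and 'Avg. Drawdown [%]'.
```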
backtesting/lib.py

@@ -15,13 +15,14 @@
 from itertools import compress
 from numbers import Number
 from inspect import currentframe
-from typing import Sequence, Optional, Union, Callable
+from typing import Sequence, Optional, Union, Callable, List

 import numpy as np
 import pandas as pd

 from .backtesting import Strategy
 from ._plotting import plot_heatmaps as _plot_heatmaps
+from ._stats import compute_stats as _compute_stats
 from ._util import _Array, _as_str

 __pdoc__ = {}

@@ -77,6 +78,23 @@ def barssince(condition: Sequence[bool], default=np.inf) -> int:
     return next(compress(range(len(condition)), reversed(condition)), default)


+def compute_stats(
+        trades: List[pd.DataFrame],
+        equity: np.ndarray,
+        ohlc_data: pd.DataFrame,
+        risk_free_rate: float = 0) -> pd.Series:
+    # TODO: Add details
+    """
+    Computes strategy performance metrics.
+
+    >>> broker = stats._strategy._broker
+    >>> equity = pd.Series(broker._equity).bfill().fillna(broker._cash).values
+    >>> perf = compute_stats(trades=broker.closed_trades, equity=equity, ohlc_data=GOOG)
+    """
+
+    return _compute_stats(trades, equity, ohlc_data, risk_free_rate)
+
+
 def cross(series1: Sequence, series2: Sequence) -> bool:
     """
     Return `True` if `series1` and `series2` just crossed (either
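As a usage note, the docstring example above can be expanded into a self-contained run, roughly as in the sketch below. It assumes the GOOG data and SmaCross example strategy shipped in backtesting.test, and it reaches into private attributes (stats._strategy._broker) exactly as the docstring does, so treat it as illustrative rather than a supported API.

```python
import pandas as pd

from backtesting import Backtest
from backtesting.lib import compute_stats
from backtesting.test import GOOG, SmaCross  # example data and strategy bundled with the library

stats = Backtest(GOOG, SmaCross).run()
broker = stats._strategy._broker

# Rebuild the equity curve the same way the docstring example does
equity = pd.Series(broker._equity).bfill().fillna(broker._cash).values

perf = compute_stats(trades=broker.closed_trades,
                     equity=equity,
                     ohlc_data=GOOG,
                     risk_free_rate=0)
print(perf[['Return [%]', 'Sharpe Ratio', 'Max. Drawdown [%]']])
```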
Review comment: Good. Now let's expose+document this function publicly in backtesting.lib. The user will be able to pass trades: pd.DataFrame as obtained by this exact method (stats._trades), so you have to figure out a way for that.
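This is not part of the diff, but one hypothetical way to honor the request, accepting the stats._trades DataFrame as well as raw trade objects, would be to map the DataFrame columns back onto the attribute names the internal routine reads. The helper name below is invented purely for illustration.

```python
from types import SimpleNamespace
from typing import List

import pandas as pd


def _trades_from_dataframe(trades_df: pd.DataFrame) -> List[SimpleNamespace]:
    # Recreate lightweight records exposing the attributes _compute_stats() reads
    # from Trade objects (size, entry_bar, exit_bar, pl, pl_pct, ...), using the
    # column names that stats._trades carries ('Size', 'EntryBar', 'PnL', ...).
    return [
        SimpleNamespace(size=t.Size, entry_bar=t.EntryBar, exit_bar=t.ExitBar,
                        entry_price=t.EntryPrice, exit_price=t.ExitPrice,
                        pl=t.PnL, pl_pct=t.ReturnPct,
                        entry_time=t.EntryTime, exit_time=t.ExitTime)
        for t in trades_df.itertuples(index=False)
    ]

# Usage idea: compute_stats(trades=_trades_from_dataframe(stats._trades), ...)
```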