Butterwire’s premise is that superior active returns derive from selecting fundamentally attractive stocks with some controversy attached to them (whether related to the business, its finance, its stock valuation, management, ownership, etc.), and from correctly estimating that the controversy will evolve favourably over the following quarters.
Hence Butterwire’s fundamental stock scoring is based on a 12-month alpha forecast which is the product of a z-score of relative fundamental attractiveness and a controversy level expressed in % (see Under the Hood of Butterwire’s Digital Alpha Engine). In this note we detail how we enhance this “bottom-up” signal by dynamically flexing the importance of fundamental variables according to prevailing global macro (“top-down”) signals. We show how the resulting model generates stable, coherent, value-adding outputs that are entirely explainable and immediately exploitable by (human) users, as well as providing a learned foundation for an artificial neural network overlay.
Nowcasting the global macro: global growth, monetary policy, and inflation expectations
A company financial model may have its key revenue, margin, and cash flow drivers underpinned by fundamental analysis, but above them all are (implicit and explicit) macro assumptions, whether related to future economic growth, the price and time value of money (currencies, rates, spreads), the price of energy and materials, etc. Over the years we had compiled a long list of metrics that had proved useful in shaping a macro view that would inform our equity portfolio exposures. Back in 2015, we narrowed the list down to 25 metrics, all based on daily priced, highly liquid financial assets, from the value of the US dollar (trade-weighted), to the spread of BBB corporate bonds, to the price of gold in barrels of oil, to the ratio of the Kospi to Nikkei indices, etc. We ran a Principal Component Analysis (PCA) on the daily log returns of these metrics and focused on the first 3 PCs, which have the advantages of capturing a large share (about 50%) of the total variance of the 25 initial variables, and of being totally uncorrelated with each other.
Reduction of 25 “Global Macro” variables into 3 “Principal Component” variables
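The reduction step can be sketched as follows. This is a minimal illustration rather than the engine's actual code: it assumes the metrics are standardised before the decomposition (the note does not say whether they are), it runs on synthetic data instead of the 25 real metrics, and the function names are ours.

```python
import numpy as np

def standardise(log_returns):
    """Centre each metric and scale it to unit variance (an assumption)."""
    return (log_returns - log_returns.mean(axis=0)) / log_returns.std(axis=0)

def top_principal_components(log_returns, n_components=3):
    """Return (loadings, explained_variance_ratio) for the leading PCs.

    log_returns: array of shape (n_days, n_metrics) of daily log returns.
    """
    x = standardise(log_returns)
    # Eigen-decomposition of the symmetric covariance matrix of the
    # standardised data (proportional to the correlation matrix).
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(x, rowvar=False))
    # eigh returns ascending eigenvalues; keep the largest ones first.
    order = np.argsort(eigenvalues)[::-1][:n_components]
    loadings = eigenvectors[:, order]
    explained = eigenvalues[order] / eigenvalues.sum()
    return loadings, explained

# Synthetic stand-in for the 25 macro metrics (NOT real market data).
rng = np.random.default_rng(0)
hidden_drivers = rng.normal(size=(1500, 3))
mixing = rng.normal(size=(3, 25))
log_returns = hidden_drivers @ mixing + rng.normal(size=(1500, 25))

loadings, explained = top_principal_components(log_returns)
# Daily PC scores: projections of the standardised returns on each axis.
scores = standardise(log_returns) @ loadings
```

The columns of `scores` are mutually uncorrelated by construction, which is the property highlighted above.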
To begin with, these 3 PC vectors have proved remarkably stable over time (e.g. applying the vectors computed in 2011, based on 2005-11 data, to the 2012-19 period yields a very similar picture to one based on PCs derived from 2011-18 data). In addition, they independently capture 3 “macro” aspects of special relevance to equities (as an asset class or as individual securities), which is not surprising, since relevance to equities was our primary selection criterion for the PCA input variables:
- PC axis 1 is the most important (capturing 31% of total variance). We looked at the PC loadings for an indication of a possible macro theme, in this case growth. Once PC1 is smoothed by taking an exponentially weighted moving average of its daily return, we found that it leads global GDP growth expressed in current US$ by about two quarters. PC1 can therefore be interpreted as a real-time indicator of “global growth expectations”, one that is solidly underpinned by a huge amount of trading activity across a wide array of financial assets. We called it “iGDP”, for “market-Implied global GDP growth”.
- Likewise, the macro theme for PC axis 2 (12% of total variance) revolves around a dovish Fed stance, higher supply than demand for US$, capital inflows into Emerging Markets, etc. It can therefore be loosely associated with a lead indicator of balance sheet expansion by central banks (at least to the extent that market expectations coincide with future Fed actions!), or with the relative outperformance of EM equities. We called this axis “iEMC”, for “market-Implied EM Capital inflows”.
- The theme for PC axis 3 is one of late cycle conditions, dominated by the building of inflationary pressures (rising cost of commodities and credit) and growth fears (e.g. rising financial stress index, underperformance of US bellwether growth stocks). Outside of specific sectors (e.g. commodities), equity returns overwhelmingly correlate positively with disinflationary expectations. We called this axis “iLCI”, for “market-Implied Late Cycle Indicator”.
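The smoothing applied to the daily PC returns can be illustrated with a plain exponentially weighted moving average. The sketch below is ours: the smoothing factor of 0.015, the choice to seed the average with the first observation, and the input numbers are all illustrative assumptions.

```python
def ewma(values, alpha):
    """Exponentially weighted moving average with smoothing factor alpha."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for value in values:
        if not smoothed:
            # Seed the average with the first observation (an assumption).
            smoothed.append(value)
        else:
            smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

daily_pc1_returns = [0.4, -0.2, 0.1, 0.3, -0.5, 0.2]  # made-up numbers
smoothed_pc1 = ewma(daily_pc1_returns, alpha=0.015)
```

A small alpha gives a slow-moving series, which suits an indicator meant to be compared with quarterly GDP data.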
Forecasting fundamental stock alphas: dynamic adjustment based on global macro regime
As with the set of 25 macro variables eventually shortlisted for their ability to triangulate “top-down” signals relevant to equities in a stable and coherent manner, we selected 25 fundamental variables to triangulate “bottom-up” signals:
- 11 associated with what we labelled “Fitness”, split into Business Fitness (7 variables, e.g. long-term EVA) and Financial Fitness (4 variables, e.g. Merton’s distance to default)
- 7 related to “Value”, split into Stock Valuation (4 variables, e.g. Free Cash Flow Yield) and Management Behaviour (3 variables, e.g. Change in Operating Accruals)
- 6 reflecting “Momentum”, both Fundamental (3 variables, e.g. Earnings Revision Index) and Technical (3 variables, e.g. Net Institutional Buyers)
- 1 associated with “Controversy” (proxied by residual share price volatility)
To link our macro indicators to the way we assessed the fundamentals of individual stocks, we first designed a simple “macro regime” indicator to be able to label market conditions as upcycle (green) or downcycle (red). This was done using trailing information from the 3 macro PC axes (30-day MA and 1.5% EWMA), together with an indicator of equity return (trailing 3-month ACWI returns).
Up/Downcycle indicator based on iGDP, iEMC, iLCI and ACWI’s trailing 3-month returns. Regime change occurs once every 10 months on average. ACWI returns average 7.3% in upcycle and -0.3% in downcycle conditions
We then used our perfect hindsight on past market conditions to select a small sample of up and down cycle conditions that together would cover all the macro scenario permutations (3 PC axes with 2 possible states each = 8 permutations). For instance, in 2016 we had selected a day in the 2nd half of 2012 for a specific set of “green” conditions (following Draghi’s “whatever it takes” in July), a day in the 4th quarter of 2014 for a specific set of “red” conditions (following the oil price crash in September), etc. Then by 2018 we added a day in late 2015 (leading up to a big sell-off) and one in early 2016 (textbook recovery trade). These “memory” days are available to the engine to choose from, within the limit on the number of memories imposed on a training set (6 upcycle + 6 downcycle).
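The scenario count above is simply 2 states raised to the power of 3 axes. The short sketch below enumerates the permutations and checks which ones a set of memory days still leaves uncovered; the state names ("up"/"down"), the helper function and the sample memory are hypothetical and only serve the illustration.

```python
from itertools import product

AXES = ("iGDP", "iEMC", "iLCI")
STATES = ("up", "down")  # assumed labels for the two states of each axis

# Every combination of one state per axis: 2 ** 3 = 8 scenarios.
scenarios = [dict(zip(AXES, combo)) for combo in product(STATES, repeat=len(AXES))]

def uncovered(scenarios, memories):
    """Scenarios that no memory day has been labelled with yet."""
    seen = {tuple(memory[axis] for axis in AXES) for memory in memories}
    return [s for s in scenarios if tuple(s[axis] for axis in AXES) not in seen]

# One hypothetical memory day and its (made-up) macro labels.
memories = [{"iGDP": "up", "iEMC": "up", "iLCI": "down"}]
still_missing = uncovered(scenarios, memories)
```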
As with the macro variables, where we used PCA, we again resorted to a simple approach, this time using multiple linear regressions (MLR) on cross-sectional samples of stocks (50% of stocks covered) for each of the trading days (i.e. memories) selected, using the fundamental variables as inputs (after a regional winsorization and normalisation), and both 3- and 12-month forward excess returns as targets (transformed to lie between -1 and +1, using 2 x the normal cumulative distribution function - 1). This represented 32 sets of MLRs, consisting of 2 macro regimes (up/down) x 2 return targets (3 and 12 months) x 8 regional subsets across 4 regions (EM, EU, NA, JP): 3 sub-regions for EM (East Asia, West Asia, Rest of EM), 2 mega-sectors each for EU and NA (Production and Transaction vs. Consumption and Innovation stocks), and 1 for Japan.
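Two pieces of arithmetic in this paragraph are easy to make concrete: the target transformation (2 x the standard normal integral - 1) and the count of regression sets. The sketch below assumes the excess return has already been standardised into a z-value before the transformation; the names are ours.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bounded_target(z):
    """Map a standardised excess return onto (-1, +1) via 2 * CDF - 1."""
    return 2.0 * normal_cdf(z) - 1.0

# Count of regression sets: regimes x return horizons x regional subsets.
REGIMES = ("upcycle", "downcycle")
HORIZONS_MONTHS = (3, 12)
SUBSETS_PER_REGION = {"EM": 3, "EU": 2, "NA": 2, "JP": 1}
n_mlr_sets = len(REGIMES) * len(HORIZONS_MONTHS) * sum(SUBSETS_PER_REGION.values())
```

With this mapping an average stock (z = 0) gets a target of 0, and a stock one standard deviation above average gets roughly +0.68, so extreme returns are compressed rather than allowed to dominate the regression.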
Unlike the PCA approach, the engine does not use the output straight from the runs, as the regressors are first forced to comply with pre-set constraints. This has the effect of making the regression “sub-optimal” (looking backward) but more coherent, if not more robust (looking forward). The constraints include:
- No regressor can be negative (and must contribute a minimum weight)
- No regressor can represent more than a set % of the category it belongs to (e.g. Free Cash Flow Yield cannot represent more than a set % of the overall Value z-score)
- No category can exceed a % of the total (e.g. Value cannot represent more or less than a set % of the overall Fundamental z-score)
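To give a feel for what such constraints do to a regression, here is a minimal sketch of least squares with a floor and a cap on every regressor, solved by projected gradient descent. It is not the engine's procedure: the solver, the bound values (0.02 and 0.40) and the synthetic data are assumptions, and the category-level limits are left out because they would need a solver that handles general linear constraints.

```python
import numpy as np

def bounded_least_squares(x, y, lower, upper, n_iter=5000):
    """Least squares with per-regressor bounds, via projected gradient descent."""
    # Step size 1/L, where L = largest eigenvalue of X'X (the squared
    # largest singular value of X) is the gradient's Lipschitz constant.
    step = 1.0 / np.linalg.norm(x, ord=2) ** 2
    w = np.clip(np.zeros(x.shape[1]), lower, upper)
    for _ in range(n_iter):
        gradient = x.T @ (x @ w - y)
        # Gradient step, then projection back onto the allowed box.
        w = np.clip(w - step * gradient, lower, upper)
    return w

rng = np.random.default_rng(1)
x = rng.normal(size=(400, 4))              # 4 made-up fundamental variables
true_w = np.array([0.5, 0.3, -0.2, 0.05])  # one driver with the "wrong" sign
y = x @ true_w + rng.normal(scale=0.5, size=400)

weights = bounded_least_squares(x, y, lower=0.02, upper=0.40)
```

In this toy run the variable with a negative unconstrained coefficient is pinned at the minimum weight instead of being shorted, which is the kind of backward-looking "sub-optimality" accepted in exchange for coherence.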
The MLR regressors are used to compute the z-score (relative fundamental attractiveness) of each individual stock, which is then multiplied by the stock’s residual volatility to yield an alpha forecast (or fundamental base score which takes the alpha forecast’s Gaussian density function x 10).
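As a worked illustration of the alpha calculation, the sketch below treats the z-score as a weighted sum of a stock's normalised variables (an assumption about how the regressors are applied) and multiplies it by residual volatility expressed as a fraction. All inputs are made up, and the conversion into a base score is deliberately left out.

```python
def fundamental_z_score(normalised_values, weights):
    """Weighted sum of a stock's normalised fundamental variables."""
    if len(normalised_values) != len(weights):
        raise ValueError("one weight per variable is required")
    return sum(v * w for v, w in zip(normalised_values, weights))

def alpha_forecast(z_score, residual_volatility):
    """Alpha forecast: z-score times residual volatility (as a fraction)."""
    return z_score * residual_volatility

z = fundamental_z_score([1.2, 0.4, -0.3], [0.5, 0.3, 0.2])  # made-up inputs
alpha = alpha_forecast(z, residual_volatility=0.25)
```

Here a z-score of 0.66 combined with a 25% residual volatility gives an alpha forecast of about 16.5%; the same z-score on a quieter stock would yield a proportionally smaller forecast.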
The difference in regressor weights between upcycle and downcycle conditions is highly significant. For instance, the graph below illustrates how, for industrial and financial firms, value creation track record, indebtedness and earnings revision come to the fore in downcycle conditions, while growth metrics, insider buying and EVA momentum recede in importance.
Relative Importance of Regressors: Downcycle Over Upcycle (Production and Transaction Stocks)
Having established a useful link between the global macro “nowcast” and a stock alpha “forecast”, we must now validate the performance of the approach. MLR runs are assessed according to their ability to generate successful investment ideas (especially 12 months out on the long side and 3 months out on the short side).
Our benchmark is that winning investment ideas must be generated with a probability that improves significantly on a random draw. Specifically, we look at the % out(under)performers and/or average excess returns within the Top (Bottom) 20% Base Scores and compare it to a (cross-sectional) random draw baseline. We tend to use % outperformers and excess returns interchangeably given their strong correlation -- 10% better odds of selecting an outperformer typically equals +2% average excess return, bearing in mind the very substantial dispersion associated with such an average.
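The comparison against a random draw can be sketched as follows. The expected outcome of a random draw from a cross-section is the cross-sectional average, so the edge is the gap between the top quintile and the whole sample. The sketch assumes an "outperformer" is any stock with a positive excess return; the function name and the data are hypothetical.

```python
def top_quintile_edge(base_scores, excess_returns):
    """Compare the top 20% by base score against the whole cross-section."""
    if len(base_scores) != len(excess_returns) or not base_scores:
        raise ValueError("need matching, non-empty inputs")
    ranked = sorted(zip(base_scores, excess_returns), key=lambda p: p[0], reverse=True)
    n_top = max(1, len(ranked) // 5)
    top = [r for _, r in ranked[:n_top]]

    def hit_rate(returns):
        # Share of stocks with a positive excess return.
        return sum(r > 0 for r in returns) / len(returns)

    def mean(returns):
        return sum(returns) / len(returns)

    return {
        "hit_rate_edge": hit_rate(top) - hit_rate(excess_returns),
        "excess_return_edge": mean(top) - mean(excess_returns),
    }

scores = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]  # made-up base scores
returns = [0.10, 0.04, -0.02, 0.03, -0.05, 0.01, -0.08, 0.02, -0.06, -0.09]
edge = top_quintile_edge(scores, returns)
```

The same function applied to the bottom quintile (by reversing the sort) would give the short-side equivalent.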
Random Draw Baseline (Monthly from Jun-11): % Outperformers and 1Yr Excess Returns (ACWI); Example of 1Yr Excess Return Distribution (28/2/17)
New memories have been added over the years and new MLR runs performed (once or twice a year), but the approach and the constraints set on the regressors have ensured a high correlation of regressor weights across versions, and in turn the continuity of stocks’ base scores. The result is a stable global alpha model whose overall top/bottom quintiles deliver a 5% average long-short spread +/- 5% (consistent across regions, regional sectors and firm sizes).
Top/Bottom Quintile Base Scores (all regions, all firm sizes): % Outperformers and 1Yr Excess Returns
The wide dispersion of returns and the small, albeit significant, edge provided by the model mean that even in a good month like Feb-17, around 15% of top (bottom) quintile stocks will under(out)-perform their regional benchmark by over 15% over the following 12 months. This percentage may double in bad periods such as 4Q15.
We therefore needed to look beyond the stock alpha forecast model, to balance its stability and long-term view (which makes it compatible with a bottom-up, low turnover investment style) with a complementary, more responsive, signalling system.
Beyond stock alpha forecasts: flags, alerts, and research candidate status
It is in the nature of the model (and of markets) that bad periods occur, for instance due to an abrupt change in macro regime (e.g. 4Q15), or to the overriding influence of factors viewed negatively by the model (e.g. 2H18, see Guess who’s been outperforming lately?).
To address the former, we built two complementary scores, of which only one gets selected by the engine at any given time depending on macro conditions:
- The recession-resilience score combines a stock’s base score (assuming downcycle conditions) with its sensitivity to changes in iGDP. The highest recession scores are therefore attributed to stocks with high (downcycle) base scores and low iGDP score (equivalent to the stock’s trailing beta to iGDP changes). This score is the default choice by the engine as it helps maintain active portfolios with better odds of avoiding large (relative) drawdowns should recession expectations suddenly rise.
- The recovery score gets activated when adverse macro conditions look to be bottoming out. It is attributed only to the bottom 20% performers in each region (based on trailing 12-month returns). The highest recovery scores are attributed to stocks with the best combination of high (downcycle) base score, low fitness, and low momentum scores. This score helps surface “bottom-fishing” opportunities at times of (likely) market recovery.
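The iGDP sensitivity used in the recession-resilience score is described as a trailing beta, which can be computed as an ordinary least squares slope. The first function below does just that. The second one is purely hypothetical: the note does not say how the downcycle base score and the iGDP beta are blended, so the linear penalty is our own placeholder.

```python
def trailing_beta(stock_returns, igdp_changes):
    """OLS slope of stock returns on iGDP changes over the same window."""
    n = len(stock_returns)
    if n != len(igdp_changes) or n < 2:
        raise ValueError("need two aligned series of at least two points")
    mean_s = sum(stock_returns) / n
    mean_g = sum(igdp_changes) / n
    covariance = sum((s - mean_s) * (g - mean_g)
                     for s, g in zip(stock_returns, igdp_changes))
    variance = sum((g - mean_g) ** 2 for g in igdp_changes)
    if variance == 0:
        raise ValueError("iGDP changes have zero variance")
    return covariance / variance

def recession_resilience(downcycle_base_score, igdp_beta, beta_penalty=1.0):
    """Hypothetical blend: reward the downcycle score, penalise iGDP beta."""
    return downcycle_base_score - beta_penalty * igdp_beta

beta = trailing_beta([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])  # made-up series
```

Whatever the exact blend, the ordering it should produce is the one described above: for a given downcycle base score, the stock with the lower iGDP beta ranks higher.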
Recession-Resilience Scores help protect performance prior to a downturn (e.g. Apr-18); Recovery Scores help boost performance after the trough of a downturn (e.g. Feb-16)
To mitigate the impact of the model getting it wrong, we combine the various bottom-up signals compiled by the engine on each individual stock (evolution of fundamental score and residual volatility, trailing residual share return/skew/kurtosis, brokers’ views, trend-following price patterns, etc.) to build a set of indicators that single out “outliers” and complement (override even) the typically slow-moving behaviour of base scores. We use a simple, programmatic, approach to surface combinations where near-term returns look likely to be highly skewed and/or highly volatile. These are:
- Green flags are attributed to any stock with a top quintile rise in base score and top quintile residual returns over the past 3 months (i.e. distinctively improving fundamentals and outperformance). As illustrated in the graph below for the Jan-17 to Aug-19 period, green-flagged stocks are typically 10% more likely to outperform by over 15% over the following 3 months and 10% less likely to underperform by over -15%.
- Amber flags detect outliers in terms of residual returns over the past 3 months (top decile) and valuation (bottom tercile Value scores). While an amber flag does not indicate the likely direction of future performance, it does reflect the increased likelihood of volatile near-term relative returns (ca. 30% higher chance of either +15% or -15% excess return over the following quarter).
- Red flags may occur for several reasons, the main one being a top quintile drop in base score coupled with bottom quintile residual returns over the past 3 months (another one is an extremely high Value score coupled with extremely low Momentum score). While on balance a red flag indicates a much higher chance (over 200%) of extreme underperformance over both 3 months and 12 months, it also comes with an increased likelihood of a snap-back and subsequent strong outperformance.
- Exit? points to highly (negatively) skewed future returns and offers an especially useful warning on otherwise highly ranked stocks. For instance, over the past 2 years, stocks in the Top Quintile for Base Score that received an Exit? alert were over 2x more likely to lose 15% over the following 3 months, while none ended up outperforming materially.
- Take Profit? is programmed to single out stocks whose strong recent performance may have overshot their fundamentals, and whose run therefore looks likely to abate. Over the past 2 years, Top Quintile Base Score stocks with such an alert were 20% less likely to be strong performers over the following quarter and 10% more likely to be severe underperformers.
- Check Thesis! reflects a disconnect between the (negative) trailing residual return of a stock and the otherwise (positive) array of signals that have been produced by the engine over the period. This alert typically affects 1% of stocks and indicates the presence of a controversy that the engine is unable to capture. A Check Thesis! alert does not indicate the likely direction of future performance, but it does increase the likelihood of extreme near-term relative returns (ca. 30% higher chance of +15% or -15% out/underperformance over the following quarter).
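The flag rules above are conjunctions of cross-sectional rank conditions. As a minimal sketch, the green flag rule (top quintile rise in base score and top quintile residual return over 3 months) could be written as below. The ranking convention, the tie handling and the data are our assumptions, and the other flags and alerts would need inputs that are not detailed in this note.

```python
def quantile_rank(values):
    """Fraction of the cross-section strictly below each value (0 = lowest)."""
    ordered = sorted(values)
    n = len(values)
    # index() returns the first position, so tied values share the lowest rank.
    return [ordered.index(v) / n for v in values]

def green_flags(base_score_change_3m, residual_return_3m, cutoff=0.8):
    """True where both the score change and the residual return are top quintile."""
    score_rank = quantile_rank(base_score_change_3m)
    return_rank = quantile_rank(residual_return_3m)
    return [s >= cutoff and r >= cutoff for s, r in zip(score_rank, return_rank)]

# Made-up 3-month changes in base score and residual returns for 10 stocks.
changes = [0.1, 1.5, -0.4, 0.3, 2.0, 0.0, -1.0, 0.6, 0.2, -0.2]
residuals = [0.02, 0.20, -0.05, 0.01, 0.03, 0.00, -0.10, 0.25, 0.04, -0.01]
flags = green_flags(changes, residuals)
```

In this example only the second stock is flagged: the fifth has the largest score increase but a mid-ranking residual return, and the eighth has the best residual return but only a moderate score increase.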
Bottom-Up Signals of Skewed Future Returns: Green Flags, Exit? and Take Profit? Alerts
Interesting Research Candidate Status and the special case of “High Octane” stocks
- Behind the question “Is it an interesting research candidate?”, answered daily for each of the 5,300 stock snapshots available on Butterwire, lies an assessment of whether a stock looks worth spending research time on. We need a “Yes” answer on a small enough percentage of stocks to add focus and value, and a large enough percentage to ensure an adequate representation across sizes, sectors, countries and fundamental profiles. Hence 10-15% of the answers are “Yes” (interesting LONG candidate) and ca. 10% are “Potential Short”. For the rest, there is always a material proportion of future outperformers among the ca. 70% of stocks labelled “Better Odds Elsewhere”, but none carry bottom-up signals salient enough to justify their inclusion as distinctly interesting research candidates.
- There is a special case of interesting stocks whose high alpha forecasts are predicated on extremely high residual volatilities (remember that alpha = fundamental z-score x residual volatility), or extreme spreads of fundamental sub-scores (e.g. 0 for Fitness, 10 for Value and Momentum), or some other (non-alpha forecast related) provocative features (e.g. a bottom decile Brokers’ view score). These stocks typically represent 5-10% of the covered universe and are labelled as “High Octane”. On balance, High Octane stocks offer the prospects of outsized (near and long-term) excess returns but are high-maintenance stocks that consume a disproportionate amount of a portfolio’s risk budget, with typical residual volatilities of 40% vs. 25% for an average stock.
Bottom-Up Signals of Volatile Future Returns: Red/Amber Flags, Check Thesis! Alert, and High-Octane Stocks
Conclusion: on the value of learned algorithms and their role in deep learning experiments
All the above was, indeed had to be, achieved without resorting to artificial neural networks. By drawing on straightforward techniques and systematic algorithms that are relatively fast to develop, train (without overfitting issues), validate, test, put in production, refresh, control and improve, we guaranteed an accurately coded, entirely explainable, already self-learning, easy to customise, expert investment model.
In the process, we also created a suitable, ready to use, library of data, labels, parameters, memories on which to overlay Deep Learning networks, not as substitutes to the pre-existing algorithms but as add-ons, as “augmenters”. One such example is the determination of a stock’s candidate status. We are training a deep neural network to learn how to interpret the whole set of signals presented in each stock snapshot and decide when to assign it an “interesting long” or “interesting short” label.
Stock snapshots share all the (current and historical) inputs from which a deep learning network can determine the likelihood of a stock’s material future out(under)performance, and in turn whether it is an “interesting research candidate”
Even with today’s tooling it is still difficult to explain fully how a deep learning network achieves its results, but those results are interpretable in the sense that each decision derives from the same set of (current and historical) snapshot data available to all Butterwire users. In addition, by using decision trees alongside our neural network we can cast a little light on the internal workings of the learning network and ensure that we apply a clear set of guardrails, based on our fundamental and macro expertise, to these results.
This feature is not active in the commercial version of Butterwire as we continue to develop, train and test the performance and practicality of deploying it, but it is likely to be the first DL overlay to soon be introduced on the platform, alongside the algorithmic determination of whether a stock looks interesting.