Andrew Yimou Li: All right. This is the last talk of today, and I'm going to talk about machine learning. I guess after you hear what you're about to hear, you'll realize that we're all going to lose our jobs to machines and bots, so maybe we'd better not hear it. Just kidding. Here is an actual summary of what I will talk about. I will discuss some opportunities and challenges we see as we think about machine learning for finance. I will share three tangible examples where we think machine learning can add value to the process of making stock investment decisions. And lastly, I'll focus on what we think is a key barrier to adoption, the black box problem: how can we understand machine learning model predictions, and how do we use the model fingerprint tool that we propose to bring more transparency into the machine learning process? Now, first of all, what does machine learning bring to the table compared with traditional techniques? Machines give us scale and speed. Machine learning models can process large amounts of data in a very short period of time. We can build machine learning models to read documents, listen to earnings calls, and analyze satellite images, hundreds of thousands of them, very fast, giving us timely insights into the financial markets. And speaking of having machines read documents and analyze images, that's an area called alternative data, or unstructured data. Market-moving information is not limited to what's already covered by traditional financial data sources, and machine learning can give us an information edge in understanding data that comes from unconventional sources.
Another attraction of applying machine learning to financial markets is the promise of discovering relationships and dynamics that are not already specified, or even known, by practitioners. Machine learning models are very effective at capturing non-linear and interaction effects in the data, and that gives us a modeling edge. But financial markets are different from other domains where machine learning has made tremendous strides, and they therefore face a unique set of challenges when we think about applying machine learning to investment decision making. First of all, the amount of financial data is relatively small. Even for the US market, which has the longest and most reliable data, for any given asset we can probably go back at most 100 years, and that gives us 1,200 true monthly observations. That number of observations is probably just a rounding error in other domains, where the number of data points is in the magnitude of millions or billions. With electronic systems and more digitization we are creating data every millisecond, so that problem will be mitigated in the future, but overall the amount of financial data is still relatively small. Next, financial markets are not stationary, so the dynamics and rules that our machine learning models are asked to predict change over time.
And that calls into question the validity of using historical data to train those machine learning models; and again, historical data series are not long enough to begin with. Lastly, the black box problem. Investing in financial markets is different in the sense that when dollars are on the line, there is a huge demand for transparency and for understanding model behavior, and that's not a straightforward task to tackle. Now, we can characterize the path from raw data to investment decisions as a two-step process: first, from raw data to structured signals; second, from structured signals to investment decisions, in this context stock return predictions. Where do we see machine learning adding value in this process? Let's get a bit more specific and start with the first step, from raw data to signals. There, I think the key to success is the ability to generate timely, differentiated, and accurate signals, and that is where alternative data can create a lot of opportunities by gaining an information edge and producing faster and better information inputs. I'll share two text-based machine learning examples. The first one is a great product that I use a lot in my own predictive modeling: the MediaStats sentiment score. I think you've already heard about MediaStats a couple of times; Romney just mentioned it many times, so you should definitely remember it, and I won't dwell too much on this slide. But there is obviously a machine learning angle here. We humans are unfortunately extremely limited in how many newspaper or digital media articles we can read in, say, one minute, but we can train machine learning models to read thousands of them very fast.
So MediaStats can leverage natural language processing algorithms to provide a near real-time pulse of market sentiment, for example. And if you are interested in this, there's a breakout session tomorrow morning with more content, featuring experts in this area: one of our academic partners and my colleague Gianna will talk more about the new frontier of AI and LLMs. Now, a second text-based machine learning example. Identifying peer stocks can be meaningful for a number of equity strategies. For example, industry momentum is an effect that has been documented in a number of articles since, I believe, the 1999 paper by Moskowitz and Grinblatt. On the other hand, the evolution of existing business models and the emergence of new ones are taking place at an ever-increasing pace, so it's becoming more and more challenging for traditional industry classification schemes to keep up with the changing business landscape. So we ran an experiment: we created a natural language processing model to read company 10-K filings. In those Form 10-Ks you will find a required section called Item 1, where each company is asked to describe its business.
So we developed an NLP algorithm to read those company descriptions. The algorithm is called Doc2Vec, and what it does is transform those textual company descriptions into mathematical vectors in a high-dimensional space. What's cool about this technique is that the semantic information embedded in those descriptions is preserved and represented as spatial relationships among the vectors in that space. What does that mean? Descriptions that are similar to each other, that have similar meanings, will be placed close to each other in the high-dimensional space, and that allows us to apply another layer of machine learning, a clustering algorithm, to identify groups of company vectors that are similar to each other. What this clustering algorithm achieves is to identify groups with maximum intra-group similarity and minimum inter-group similarity. With those machine learning models, we get a machine-driven company clustering mechanism that is dynamic and purely data driven. Now, we tested a pretty simple cross-momentum signal based on those machine-driven clusters. For each company in the S&P 500 universe, we take the difference between the momentum of its machine-identified peers and the momentum of its peers under the traditional industry classification. We use that difference as a score to rank stocks in the cross section, form a long-short portfolio, and rebalance it every month.
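To make that pipeline concrete, here is a minimal sketch, assuming a hypothetical dict `descriptions` that maps tickers to Item 1 text; the vector size, number of clusters, and tokenization are illustrative choices, not the production settings.

```python
# Minimal sketch: embed Item 1 business descriptions with Doc2Vec, then
# cluster the vectors into machine-driven peer groups.
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.cluster import KMeans

def peer_clusters(descriptions, n_clusters=50):
    # Tag each description with its ticker so we can look up its vector later
    docs = [TaggedDocument(words=text.lower().split(), tags=[ticker])
            for ticker, text in descriptions.items()]
    # Learn fixed-length vectors that preserve semantic similarity
    model = Doc2Vec(docs, vector_size=100, min_count=5, epochs=40)
    tickers = list(descriptions)
    vectors = np.array([model.dv[t] for t in tickers])
    # Group companies so intra-group similarity is high, inter-group low
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(vectors)
    return dict(zip(tickers, labels))
```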
Now, we have a few ideas to refine this test and research. For example, we could use a fuzzy clustering algorithm to accommodate companies that are not pure plays, conglomerates with footprints in multiple business lines. But as a proof of concept, I think the results of our test suggest there is information content captured by the machine-driven company clusters that is not already covered by the more widely used industry classification schemes. Now let's look at the second step. Imagine you already have a group of structured signals, and you want to go from there to stock return predictions and make investment decisions based on those predictions. The key to success there is the ability to obtain smart models that can discover meaningful patterns from your group of signals, and the opportunity machine learning brings to the equation is that those models are more powerful in capturing non-linear patterns and interactions among signals. Let me show you an example. We did another experiment where we trained three machine learning models: a random forest, boosted trees, and a neural network. For those of you who are familiar with our minds-and-machines product, these should be no strangers; these are the three guys, Forest, Boost, and Neuro. In this experiment, we're using them to make return predictions in a universe of US large-cap stocks, so on average we're looking at around 400 stocks every month.
We give them a group of input variables, those are already structured signals, and we ask the machine learning models to predict total returns for the next month. We form a long-short portfolio every month to test the efficacy of those machine learning predictions. Now, a little bit on the input variables. Our philosophy is that instead of giving the machines hundreds of input variables and letting the machines figure them out, it's a better idea to start with a smaller number of input variables that are easier to understand, while still covering a wide range of aspects. For example, we have company fundamentals like size, value, and profitability; we have price trend signals, momentum and reversal; we have an indicator based on our proprietary industry flows data; we also include, as I just mentioned, the media sentiment indicator; and we include the US equity market turbulence indicator as a regime variable. Now, a little bit on the rolling process of model calibration. As I mentioned, markets are non-stationary; they are dynamic and adaptive, and we are mindful that relationships and rules can change over time. So we implemented a rolling process of model calibration. At the beginning, we give those models six years of in-sample data to train and calibrate on, then lock down the model and use it to make predictions for the next year. After 12 months, we recalibrate the models with the latest six years of data and lock them down again for the next 12 months, and so on.
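As a rough illustration of that rolling scheme, here is a minimal sketch under stated assumptions: hypothetical arrays `X` holding the structured signals for stock-month observations, `y` the next-month total returns, and `months` an integer month index per row; a random forest stands in for any of the three models.

```python
# Minimal sketch of the rolling calibration: train on the trailing six years
# of monthly data, lock the model, score the next 12 months, then re-fit.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def walk_forward_predictions(X, y, months, train_months=72, hold_months=12):
    preds = np.full(len(y), np.nan)
    for m0 in range(int(months.min()) + train_months, int(months.max()) + 1, hold_months):
        in_sample = (months >= m0 - train_months) & (months < m0)  # trailing 6 years
        model = RandomForestRegressor(n_estimators=500, random_state=0)
        model.fit(X[in_sample], y[in_sample])                      # calibrate and lock
        hold = (months >= m0) & (months < m0 + hold_months)        # next 12 months
        preds[hold] = model.predict(X[hold])                       # out-of-sample scores
    return preds
```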
Now, here's a summary of the out-of-sample performance. We identify the top 15 and bottom 50 stocks according to each model's predictions, form long-short portfolios, and rebalance every month. We include the predictions from the three machine learning models, plus a regression model as a reference benchmark. And you can see that in our out-of-sample period, in terms of risk-adjusted return, all three machine learning models outperformed the regression benchmark by at least 24%. Neuro was actually the best-performing machine in terms of risk-adjusted return; it improved on the benchmark by more than a factor of two. And all three machine learning models had a higher hit rate, measured as the percentage of months with a positive return, compared with the regression model. Now, beyond the black box: it's not sufficient to have a machine learning model, or any model, that can give you a good-looking test result. Especially for machine learning, we need to understand what is going on inside the machine; we need to develop an understanding of how a model makes predictions. Machine learning models are complex, and that's the source of their power and also the source of their opacity. Think of ChatGPT: some of the earlier GPT models have billions of parameters in them.
By some estimates, the latest GPT model has more than a trillion parameters. That's hard to explain, but we don't think we have to be intimidated by the sheer number of parameters in the machine. We think it's very helpful to compare machines to people. If we think about the human brain for a second, on average it has 86 billion neurons, and we only truly understand a small portion of it, yet a person can still express a simple and understandable idea. So for us, it's not very relevant to ask how we can understand each and every one of the billions of parameters in the machine. It's more relevant to ask: can we find interpretable relationships even in a complicated machine learning model? So we proposed a framework called model fingerprint, based on the theoretical concept of partial dependence. What it does is decompose any machine learning model's predictions into basic, understandable subcomponents, namely linear, non-linear, and interaction effects, and it presents these subcomponents in comparable units, which makes it easy to summarize key characteristics, similarities, and differences across machine learning models. Now, for those of you who are familiar with machine learning interpretability, you may have heard of tools like SHAP and the Shapley value. We can show that the model fingerprint tool has the same desirable properties claimed for SHAP. For example, efficiency: local feature attributions should add up to the original model prediction. Symmetry: if two features contribute equally to a model prediction, they will have the same fingerprint value.
Dummy: if there is a variable in your data set that does not participate in the model's predictions at all, it will have a zero fingerprint value. Additivity: if your model is an ensemble of many individual submodels, the fingerprints of the submodels will add up to the fingerprint of the overall model. On top of that, model fingerprint also has some additional advantages. First of all, model fingerprint is truly model agnostic: regardless of whether your model is a deep neural network or a simple linear regression, you implement model fingerprint in exactly the same way. And because model fingerprint is based on the concept of partial dependence, whereas a tool like Shapley is based on the game theory concept of coalitions, model fingerprint does not require calculating different coalitions and permutations of the variables in your input set, so it is more efficient to compute. And what I think is really important is that when we designed the model fingerprint framework, we treated non-linear and interaction effects as features in their own right. Take Shapley as an example: what Shapley tries to do is come up with one score that explains the feature importance of each and every input variable. For us, from the get-go we want to decompose model predictions into linear, non-linear, and interaction effects, and I'll show you some examples to demonstrate why we think that's a very helpful approach.
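To make the partial dependence idea concrete, here is a minimal sketch of how one might separate a feature's linear and non-linear effects; this is an illustrative reading of the fingerprint decomposition, with a hypothetical function name and grid choices, not the production implementation.

```python
# Minimal sketch: decompose feature k's partial dependence into a linear
# component (a least-squares line through the curve) and a non-linear
# component (the deviation from that line).
import numpy as np

def fingerprint_linear_nonlinear(model, X, k, grid_size=20):
    grid = np.quantile(X[:, k], np.linspace(0.05, 0.95, grid_size))
    pd_vals = []
    for v in grid:
        Xv = X.copy()
        Xv[:, k] = v                              # hold feature k fixed at v...
        pd_vals.append(model.predict(Xv).mean())  # ...and average over the rest
    pd_vals = np.array(pd_vals)
    slope, intercept = np.polyfit(grid, pd_vals, deg=1)
    line = slope * grid + intercept
    linear_effect = np.mean(np.abs(line - line.mean()))  # size of the linear part
    nonlinear_effect = np.mean(np.abs(pd_vals - line))   # deviation from the line
    return linear_effect, nonlinear_effect
```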
Now let's take a look at some fingerprints from the three machine learning models we developed and used to make return predictions for US large-cap stocks. The bar charts here summarize feature importance: a higher value means that variation in that input variable changes the model's predictions a lot; a lower value means the model is not using that input variable much. And as I mentioned, model fingerprint presents those results in comparable units, so you can put fingerprints of different models alongside each other and compare them all at a glance. I want to point out one interesting observation for Neuro: the green bars are the interaction subcomponents, and you see a lot of interactions being captured by Neuro compared with Forest and Boost. It's actually quite intuitive. Neural networks are really trying to figure out an overarching theme that explains everything, and their design makes them very flexible in capturing interaction effects; that's reflected in our model fingerprint visualization. Now we can dig a bit deeper and focus on one machine. I'm using Neuro as an example, trying to understand how Neuro thinks about the value factor. On the horizontal axis I'm showing value as measured by the price-to-book ratio; from left to right, that's growth stocks to value stocks.
The vertical axis, going from bottom to top, runs from lower to higher average return predictions by the Neuro model. So what do we get? This is a very faithful reflection of how Neuro thinks about the value factor, and apparently Neuro prefers growth stocks over value stocks. But there is more nuance to the value factor and how Neuro uses it, and that nuance we can understand from the interaction fingerprint. Here I'm showing a visualization of the interaction effect between market turbulence and value. The horizontal axis is still value, going from left to right, growth stocks to value stocks. The vertical axis is now US equity market turbulence, going from bottom to top, low turbulence to high turbulence. In the color code, red means a higher average return prediction by the Neuro model; blue means a lower average return prediction. As I mentioned, we see the first-order effect that Neuro likes growth stocks over value stocks. But on top of that, if we look at the bottom section of this heatmap, it shows that when turbulence is low, when the market is in a calm environment, Neuro doubles down: it really prefers growth stocks over value stocks. But if you look at the top section of the heatmap, when turbulence is high, Neuro starts to like value stocks in that kind of environment.
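This kind of heatmap comes from a two-way partial dependence grid. Here is a minimal sketch, assuming a numeric feature matrix and illustrative column indices i and j standing in for, say, value and turbulence.

```python
# Minimal sketch: build a two-way partial dependence grid for features i
# and j, like the turbulence-versus-value heatmap.
import numpy as np

def pairwise_partial_dependence(model, X, i, j, grid_size=10):
    gi = np.quantile(X[:, i], np.linspace(0.05, 0.95, grid_size))
    gj = np.quantile(X[:, j], np.linspace(0.05, 0.95, grid_size))
    heat = np.zeros((grid_size, grid_size))
    for a, vi in enumerate(gi):
        for b, vj in enumerate(gj):
            Xv = X.copy()
            Xv[:, i] = vi                          # e.g. value (price-to-book)
            Xv[:, j] = vj                          # e.g. market turbulence
            heat[b, a] = model.predict(Xv).mean()  # average prediction per cell
    return gi, gj, heat  # plot with, e.g., matplotlib's pcolormesh
```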
Again, we didn't tell the machine any rules or dynamics to look for; whatever the pattern is, it was learned by the model from the data. Let me show you another example. Another input variable is the analyst upgrade ratio. What we did there is, for each stock, look at the number of analyst upgrades in the previous month and divide it by the total number of analyst upgrades and downgrades, making it an analyst upgrade ratio; we then transform those into cross-sectional Z-scores and feed them into the machines. Here is a visualization of how Neuro thinks about the analyst upgrade ratio: horizontal axis, left to right, lower to higher upgrade ratio; vertical axis, bottom to top, lower to higher return prediction. This is non-linearity in action. Below a certain level of the analyst upgrade ratio, a higher upgrade ratio leads to a higher model prediction, but that relationship flattens after a certain level. Beyond that, Neuro no longer thinks an incrementally higher analyst upgrade ratio leads to a higher return prediction; in fact, super-high conviction or consensus among analyst upgrades results in a slightly less positive return prediction, according to Neuro.
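For concreteness, here is a minimal sketch of that feature construction in pandas, assuming a hypothetical DataFrame layout with monthly upgrade and downgrade counts per stock.

```python
# Minimal sketch: prior-month analyst upgrade ratio, z-scored across the
# universe within each month.
import pandas as pd

def upgrade_ratio_zscore(df):
    """df columns: ['month', 'ticker', 'upgrades', 'downgrades']."""
    ratio = df['upgrades'] / (df['upgrades'] + df['downgrades'])
    by_month = ratio.groupby(df['month'])
    # Cross-sectional z-score within each month
    z = (ratio - by_month.transform('mean')) / by_month.transform('std')
    return df.assign(upgrade_ratio_z=z)
```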
Now, staying on this analyst upgrade ratio a little bit: as I mentioned, another input we give the machine is the media sentiment score, and one of the most influential interaction effects is actually between media sentiment and analyst upgrades. Speaking loosely, it's public sentiment versus analyst sentiment. On the horizontal axis is the analyst upgrade ratio, from lower to higher; on the vertical axis, going from bottom to top, media sentiment runs from more negative to more positive. Now, if we focus on the areas with a lot of red, meaning high average model return predictions, the regions where Neuro makes higher return predictions are actually where public sentiment disagrees with analyst upgrades. For example, in the top left, where you see a heavy concentration of red, of high return predictions, analyst upgrade ratios are very low in the cross section, but we see a lot of positive public sentiment according to digital media. Again, we didn't specify any patterns; these are rules and dynamics learned by the model from the data. As I mentioned, financial markets are adaptive, so the rules can change over time. Here is another view of the model fingerprint: we plot the evolution of factor importance over time, again for the Neuro model. A higher value means that factor is very important, so variation in that input variable would change the model's predictions a lot.
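Because the models are re-fitted every 12 months, one can trace a factor's importance through time. Here is a minimal sketch under assumed inputs: hypothetical dicts `models_by_year` and `X_by_year` mapping each year to that vintage's fitted model and signal matrix; the importance proxy is an illustrative partial-dependence measure, not the exact fingerprint.

```python
# Minimal sketch: track how a factor's importance evolves across the annually
# re-fitted models, as in the importance-over-time chart. The proxy here is
# how much feature k's partial dependence moves the average prediction.
import numpy as np

def importance_over_time(models_by_year, X_by_year, k, grid_size=20):
    history = {}
    for year, model in models_by_year.items():
        X = X_by_year[year]
        grid = np.quantile(X[:, k], np.linspace(0.05, 0.95, grid_size))
        pd_vals = []
        for v in grid:
            Xv = X.copy()
            Xv[:, k] = v                              # hold feature k fixed
            pd_vals.append(model.predict(Xv).mean())  # average over the rest
        pd_vals = np.array(pd_vals)
        history[year] = float(np.mean(np.abs(pd_vals - pd_vals.mean())))
    return history
```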
I'll just highlight one line chart, the green one, which is profitability as measured by return on equity. In 2019 it was really not a very important input variable according to Neuro; variation in return on equity didn't move the dial much on model predictions. But in 2020 and 2021, according to Neuro, ROE became more important, so variation in companies' ROE would actually make a lot of difference to the model's return predictions. And of course things will keep changing over time, but again, this emphasizes the idea that financial markets are adaptive: it's a good idea to incorporate that thinking into your modeling process, and you can use model fingerprint to visualize and understand how models are thinking about those input factors. Some final thoughts on whether we are all going to lose our jobs to machines and bots. Well, we are trying very hard to make the argument that humans and machines can coexist peacefully. We think that if we can open the black box, if we can get an understanding of what's going on inside the machine, we can actually bring minds and machines together, having them talk to each other, identifying the areas of opportunity, being mindful of the risks when applying machine learning to financial applications, and having minds and machines learn from each other to improve both processes. So I think that's all. Thank you. I'll take some questions.
Speaker2: All righty. Thanks so much, Andrew. Do we have any questions in the crowd before we hand it over to Slido? Oh, great, we have one over here. Do we have a mic? There you go. Thank you.
Speaker3: So what sorts of performance metrics do you use to test how accurate these predictions are?
Andrew Yimou Li: How accurate the predictions are? Well, again, think of the two-step process from raw data to signals. If we go back to the example of stock groupings and industry classification, you could design specific metrics to evaluate the peer stocks identified by our machine learning model. For example, you can use traditional metrics such as revenue or other company fundamentals to evaluate whether the peer stocks identified by the model really look like each other. And if you don't have a lot of trust in having machines read company descriptions, you could have an army of humans read them and assess whether the companies are truly similar to each other. In the second step, from signals to stock return predictions, I think it's whatever metric, system, or process you have in place for testing a quant strategy. Romney actually gave a pretty good demonstration of robustness tests and different evaluation metrics, and I think those apply just the same to machine-learning-discovered patterns. But the key there, and I want to emphasize this, is that you have to have some sort of tool to help you open the black box. For example, the patterns and relationships I showed come from the three machine learning models we trained to make predictions on US stocks. You can ask: are those sensible, are those reasonable? I don't think it's a right-or-wrong or yes-or-no question, but at least you need a tool to understand what is being modeled, what patterns are being identified by the models. Then you can make your decision.
Speaker3: And do you believe that, with the tools we have now, this will be more useful in strongly efficient markets or in markets that are less efficient?
Andrew Yimou Li: Yeah, well, the way I think of it, going back to the advantages machine learning can provide: one is an information edge, two is a modeling edge. The information edge is probably more relevant in a more efficient market. If all the traditional financial data is already pretty much commoditized, where do you seek a source of alpha? You could seek it in alternative data, data that's out there in real life, not already analyzed, not already captured by models. And if you have an edge, the ability to understand those data and transform them into actionable signals, then you probably have an advantage. The modeling edge is probably more relevant for a less efficient market: there are patterns, rules, and biases out there not already exploited by people, and you can have a machine learning model, without much pre-specification, discover those patterns and biases and give you a modeling edge. So it's a case-specific question, but I hope the presentation showed where machine learning can add value at different stages of this process. Thank you.