Hello. Thank you very much Dwyfor and it's wonderful to be with all of you. We're excited to share this research that we've been working on for quite some time now, and this is joint work that Megan and I have done, along with Mark Kritzman, who is one of our founding partners at State Street Associates. And... The motivation for this research is to figure out how we can... reunite quantitative processes, and the way that people predict from data that's usually done in a classical statistical way, with the more intuitive approach that people use to try to make sense of the future and learning from the past. People tend to think about past experience as narratives and as experiences, and to extrapolate from relevant experiences in the past to what might happen in the future. But that doesn't seem at all like the way statistics works, it has a whole separate language that doesn't reconcile with that natural, intuitive approach. And, as we've looked into this, we've found... you know, it's very fascinating to change your perspective on data-driven analysis, and to be able to tell stories about which experiences are most relevant, and how you can learn from them to make better predictions. We... actually ended up writing a book on this, that's the level to which we got interested in this topic, and it really is quite wide-ranging, but hopefully in this presentation we can convince you that there's some interesting new perspectives to be had here, and a new way to use data that's intuitive. So, let's make it concrete with a case study to keep in mind throughout the presentation, and that is, suppose you want to predict inflation, which would certainly be a useful thing to do these days, and you have some historical experiences of different economic environments and what the inflation was over the following year. How should you use these historical data points to try to predict inflation going forward from today? 
So, the setup here is, imagine data that could be in a spreadsheet, like what we're showing, that consists of a bunch of economic measurements, like wage growth, spending, unemployment, money supply, interest rates, things of that nature, and also the subsequent inflation that followed each of those environments, and we know what these variables are now, which is at the top, the current circumstances, and we know what they were at various points in history, going back to the 1940s. And the question is... how do you use this data to predict? Now, we're going to contrast two different approaches, and we have two individuals here hypothetically to... represent them, we have Ava and Ben, and this is a little bit of a play on, if some of you are familiar with cryptography and some of the crypto stuff, they talk about Alice and... Alice and Bob are always the two players that they do hypothetical examples with, so we wanted to switch it up a little bit, and we have Ava here, and we're going to have Ben in a moment. So, the first approach is quant analysis. We have Ava, who is an analyst, and she is well-trained in regression analysis, and she takes the data set we were just looking at, and sees there's a bunch of variables there, and so she regresses inflation on a bunch of predictive economic variables. And she's going to find that some of them have an important effect, they have large betas and some of them are statistically significant, and we have t-statistics, and this is all the language of classical statistical analysis. The emphasis is on the variables: which measurements affect the thing you're predicting? Those are the columns in a data set, that's how you would typically look at it. And then, of course, you form a prediction by saying, well, if I have identified these two variables that are very important and predictive of inflation, I'm going to... extrapolate those from their values today. 
So, that's the quantitative approach, but there is a very different way to approach this problem that is more intuitive to many people, so consider Ben, who is more of an economic historian, less quantitatively orientated. Now, what he might do, is say there are periods in history that are reminiscent of the circumstances we face today, and we can identify them with judgment and good intuition, and say, maybe today is a little bit like something that happened 30 years ago, and maybe it's a little bit like something that happened 50 years ago, and we're going to make a prediction by saying, well, if that's the case, why don't we predict something similar will also occur now? So, this is learning from experience, and it's a totally different approach, because what Ben is doing here is finding observations or experiences which are the rows in the data, the actual time periods, and that tells a very different story. There's no discussion of variables here, although they're being used to implicitly identify which periods are important. So... the advantage of the traditional statistics approach, Ava's approach, is that it's rigorous using data. But the disadvantage is that it's hard to tell a narrative story with it, and overlay the judgment and intuition that a lot of people bring. The advantage of Ben's approach is, it has that narrative, but it's subject to bias of human judgment, and maybe only has a couple of data points to draw from. So, what we'd like to propose is the best of both worlds. How do we unite these? And we're going to argue that it's done through a concept called relevance. That is to say we have all these experiences or observations of things that have happened in the past, and we want to pick the ones that are most relevant to our present circumstances so that we can extrapolate from those to predict the future. And... 
this means that prediction is all about determining the relevance of each of these historical experiences or historical observations. So, what does that mean? It's intuitive that that should be the case, I think we all... when we face a new situation, try to think of similar events that we've experienced in the past. But it turns out that relevance is quite a precise thing, and the benefit that we get from this approach is in being able to understand what makes something relevant and then further, which Meg will talk about soon, how we can measure that precisely using data. So, the first component of relevance is informativeness, and the idea here is that we may have a bunch of historical experiences, like the economic environments that led to high, or low, or medium inflation, but these experiences are not all created equal. We shouldn't devote equal attention to them, because many of them probably mostly just reflect noise. There are a lot of historical periods where nothing eventful happened, and so the patterns that we see in those data are due to pure chance. They're random and we really shouldn't rely on them. But there are other times when significant events happened in markets. And... those events are not going to be due to noise, they're going to be due to something truly substantial that happened, and that means that they're going to reveal a lot about the relationships between circumstances and, say, inflation in the future. So, it's worth digging a little bit more into this concept, because this underpins the rigor of how we can use data to identify these relevant events. And it's based on something called information theory, which is extremely powerful, and it was introduced by Claude Shannon back in 1949. It's surprising that more people are not familiar with Claude Shannon's contribution, in fact a lot of people who are in the know consider him to be perhaps the greatest genius of the 20th century, with Einstein coming in 2nd. 
Now, whether or not that's the case, it may also be that Shannon had more impact on the world that we live in and experience. What he did was come up with a theory that showed that all information could be represented in zeros and ones, so he ushered in really the whole information age that we have today. But more importantly, he created this theory of communication and of information that shows exactly how much information is contained in a particular message or an experience. So, one way to think about this is, you know, suppose that you have a friend who goes to the grocery store, buys some apples and comes back and they tell you that story. That is a very boring story. And it's not surprising, it's very typical, and it contains very little information. All this friend has to do is say, I went to the store, nothing happened, and you know exactly what's going on. But imagine that a different friend goes to the store, and says something truly crazy happened, you'll never believe what I saw. You need a ton of information to figure out what that was, there's going to be all this richness in explaining that and understanding it. So surprising things, unusual things, are more interesting and they contain more information. And what Shannon showed is, there's this direct inverse relationship between how probable something is, and how much information it contains. Rare things contain a ton of information. So, this implies that, when we're looking through all these past experiences, we should focus more on the ones that are unusual. And that's kind of... a counter-intuitive idea. Classical statistics tends to say the opposite: you should treat unusual occurrences with skepticism, maybe these outliers are not reliable. But in fact, the unusual occurrences are the most important of all. So we really want to pay a lot of attention to them. 
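The inverse relationship between probability and information can be made concrete with a minimal sketch. It uses the standard Shannon measure of the information in a single event, the negative base-2 logarithm of its probability, measured in bits. The function name and the two probabilities are hypothetical, chosen only to mirror the grocery store story.

```python
import math


def information_bits(probability: float) -> float:
    """Shannon information (surprisal) of an event in bits: -log2(p)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


# Hypothetical probabilities, for illustration only.
routine_trip = information_bits(0.5)     # a common, unsurprising event
crazy_trip = information_bits(1 / 1024)  # a rare, surprising event

print(routine_trip, crazy_trip)
```

The rarer event carries more bits, which is the sense in which unusual past periods are treated as more informative.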
But there is, of course, a second component to relevance, which is similarity to present circumstances. So, for example, if you're trying to predict what's going to happen amidst the depths of a recession, a lot of economists and pundits would intuitively say, let's look to past recessions that have some similar character, and see what happened there. Few people would say, let's predict the recession outcomes by looking at the most different circumstances, the best positive growth in the past, and then predict from those. So you're usually looking to draw from similar experiences, and all of what we're saying here applies outside of finance, and outside of economics. Another example to keep in mind would be something like healthcare. If we want to know how well a certain drug or a treatment is going to work for some disease, isn't it intuitive to draw your experience from people that are similar to you? Maybe of the same age, similar health conditions, same sex, etc. So this is something that we all intuitively do, and what's really fascinating is that it turns out that quantitative techniques do this too. They do it to some extent, but they can actually do it better. So, the punchline here is, we can measure relevance of our experiences as the sum of two pieces: how informative a past experience is, and how similar it is to what we're encountering right now. And if we do that, we can then extrapolate that similar things may happen from what was the case in those past episodes. So this becomes a powerful heuristic, a model for predicting that's very different than the way data is usually used, and we'll go through an example of this, and hopefully make it more concrete to see how you can apply this to a lot of different situations. 
What we have not covered yet is how you actually measure any of these things. We've talked very conceptually about these ideas, but Megan is going to show you now that there is a precise statistical measurement of these things that's very intuitive and helpful, and so I'm going to turn it over to you, Meg, to go through that.

As Dave just introduced, relevant observations are going to be those that are both informative and similar to today, assuming today is the observation that we're looking to predict. So now we'll dig a little bit more into how we can measure that in a really precise, mathematical way. I think, to help visualise, a scatter plot is a nice thing to start with, so this is a scatter plot here of hypothetical observations for two of the attributes from our earlier data set, so here we're just thinking about wage growth and spending growth. So, each dot that you see on the scatter plot corresponds to a hypothetical observation, and each observation's location on the scatter plot reflects the combination of its wage growth, which is on the horizontal X axis, and then also its spending growth, which is on the vertical Y axis. And then, at the centre of this chart, which is where the two axes meet, this indicates the most typical or average values for these two attributes that we're considering. So, let's walk through how we can actually determine which of these observations are relevant to today, and in the scatter plot today is indicated by that blue dot that you see in the upper right quadrant. So, for each of the other three observations on our scatter plot, we determine relevance by measuring and considering two things. First, we're going to take an observation, and we're going to look at or measure how distant it is from the centre of our chart, and remember the centre of our chart indicates the most typical or average values. 
So, if a data point or an observation is far away from the centre, that indicates that it's unusual and therefore that it's informative. The second thing that we're then going to consider is how distant an observation is from our observation of interest, which again is our blue data point, which is [unclear word]. And so this distance is going to tell us about the similarity of that observation to our current circumstances, so actually the negative of that distance is going to be similarity, so how distant is it, and the smaller that distance, the more similar the observation. So just working through that exercise with the observations that you see here, the orange dot is informative, because it's far away from the centre, but unfortunately it's dissimilar because it's far away from today, so that makes it not particularly relevant. The grey observation is kind of the opposite situation, it's very similar to today, which is good, but it's also similar to average so it's not particularly informative. And then the green observation is distant from the centre, so it's informative, it's close to today, so it's similar, so that means that, of the three observations that we're considering, the green one is our most relevant observation, it's both informative, and it's also similar to our current circumstances. So, this is a really highly simplified illustration, but I think it's nice because it highlights a couple of key things. In order to quantify the relevance of an observation, we need to be able to measure its distance from average, so that will be informativeness, we need to measure its distance from today, so similarity, and then when we measure those distances, we need to consider multiple attributes all at once. 
So, here we're only considering, for simplicity, two attributes, so we can plot this and visualise it easily. But recall from earlier that the data set that we're working with actually has seven attributes or characteristics that we're considering of our historical observations. And just more generally, any data set that you're looking at could include any number of attributes. So, conceptually, this is how we measure relevance. To look at it in a more mathematically precise way, what we do is, we rely upon a really handy statistic called the Mahalanobis distance, and for those of you who are familiar with our research, you may recognise this from other work that we've done with measuring turbulence, our recession likelihood indicator, so this tends to pop up in our research. The Mahalanobis distance is really well-suited for what we're trying to do, because what it does is, it measures the distance between two data points, or two observations, that are characterised by many attributes, so it's exactly what we want to do. And when it measures that distance, it considers not only the distance of each attribute in isolation, but also the distance of their co-occurrence. So, for example, if we think again of our two attributes, so wage growth and spending growth, the Mahalanobis distance between two observations based on these attributes would capture the difference in their two wage growths, the difference in their two spending growths and then also the difference in their combinations of wage and spending growth. So not only each attribute in isolation, but also their combination, their co-occurrence. 
Then it does all of this, summarises it in a single number, and when it does, it accounts for the typical variations and co-occurrences of these attributes. As a bit of an interesting side story in the spirit of some of the stories Dave told, this statistic was actually discovered in the 1920s by a statistician named Mahalanobis, and the background, which is why we have the skull image here, is that Mahalanobis developed the measure while analysing human skulls, so it was in an anthropological context, he was studying people with different mixed parentages, and so he developed this measure as a way of taking two different skulls that are characterised by various dimensions, such as nasal length, and, I don't know, skull height, and used that to compare the two skulls. So, since then, the Mahalanobis distance has been applied in a range of fields, it's used in medical studies, and, as I mentioned before, we've applied it a number of times in the work that we do at State Street Associates, so it's a very handy statistic, and we've become big fans of the Mahalanobis distance. So, back to relevance, the key takeaway here is that we use the Mahalanobis distance to precisely measure the relevance of historical observations to today, and we do that by measuring an observation's distance from average, which again indicates its informativeness, as well as its distance from today, which indicates its similarity, and then together these two things combined indicate that observation's overall relevance. So, that's step one, that's how we measure relevance. Now, let's look at how we actually use it for generating projections. And so here we are returning to our historical data that we saw earlier from Dave, and recall our task is to predict upcoming inflation using this data. 
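A minimal sketch of these measurements, under stated assumptions: informativeness is taken as the squared Mahalanobis distance from average, similarity as the negative of half the squared Mahalanobis distance from today, and relevance as similarity plus half the informativeness of both observations. The talk only says relevance combines the two pieces, so the one-half scaling, the function names, and the dot coordinates below are all illustrative assumptions.

```python
import numpy as np


def informativeness(x, mean, cov_inv):
    """Squared Mahalanobis distance of x from the average."""
    d = x - mean
    return float(d @ cov_inv @ d)


def similarity(x_i, x_t, cov_inv):
    """Negative half of the squared Mahalanobis distance between x_i and x_t."""
    d = x_i - x_t
    return float(-0.5 * d @ cov_inv @ d)


def relevance(x_i, x_t, mean, cov_inv):
    """Similarity to today plus half the informativeness of both points (assumed scaling)."""
    return similarity(x_i, x_t, cov_inv) + 0.5 * (
        informativeness(x_i, mean, cov_inv) + informativeness(x_t, mean, cov_inv)
    )


# Hypothetical standardised (wage growth, spending growth) pairs.
mean = np.array([0.0, 0.0])
cov_inv = np.linalg.inv(np.array([[1.0, 0.5], [0.5, 1.0]]))
today = np.array([1.5, 1.5])
dots = {
    "orange": np.array([-2.0, -1.5]),  # far from average, far from today
    "grey": np.array([0.2, 0.1]),      # close to average
    "green": np.array([2.0, 1.8]),     # far from average, close to today
}

scores = {name: relevance(x, today, mean, cov_inv) for name, x in dots.items()}
print(scores)
```

With this scaling, relevance simplifies to the cross-product of the two observations' deviations from average, weighted by the inverse covariance matrix, which is why the green dot scores highest and the orange dot comes out negative.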
So, we just established that predicting is all about relevance, so the first thing we want to do is calculate the relevance of each historical observation, based on its set of attributes and comparing them to today and to average. These are the values that you see in that first column with the orange border, the orange highlighting, so these are relevance for every historical observation, or experience, in our sample, and then the second column that we've highlighted is the inflation outcome for each historical observation, again, we saw this earlier. And the reason we highlight these two columns, relevance and outcomes, is because this is what we're going to use to form our predictions, and specifically, our prediction for upcoming inflation equals the relevance-weighted average of past inflation outcomes. So, it's like taking the relevance column, multiplying it times your inflation outcome column, and then taking an average of those products, and that is our prediction. So, just to help visualise that process a little bit more, this chart shows in the grey bars relevance for all of the historical observations in our sample, so the previous page was just looking at a handful of observations, and here we're showing relevance for every period in our sample. It also shows inflation outcomes in dark dots for every period in our sample. And so, again, to generate our prediction for inflation, we multiply each observation's relevance times its inflation outcome, so grey bar times its corresponding dot, and then the average of these relevance-weighted outcomes across all of our observations is going to equal our prediction for upcoming inflation. So, in this particular example, which again is just an illustration, the result would say that the prediction for inflation is six and a half percent over the next year. 
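The relevance-weighted average can be sketched as follows, on simulated data rather than the speakers' inflation sample. The sketch assumes relevance is the inverse-covariance-weighted cross-product of deviations from average, that the weighted sum is divided by N - 1, and that the average outcome is added back; these details are assumptions, not taken from the slides. For comparison it also fits an ordinary least-squares regression with an intercept.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 60 past periods, 3 attributes, and the outcome that followed.
X = rng.normal(size=(60, 3))
y = 2.0 + X @ np.array([0.8, -0.3, 0.5]) + rng.normal(scale=0.5, size=60)
x_today = np.array([1.0, 0.5, -0.2])

n = len(y)
x_bar = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))  # sample covariance, N - 1 divisor

# Relevance of each past period to today (one value per row of X).
relevance = (X - x_bar) @ cov_inv @ (x_today - x_bar)

# Prediction: average outcome plus the relevance-weighted outcome deviations.
prediction = y.mean() + (relevance @ (y - y.mean())) / (n - 1)

# Ordinary least squares with an intercept, for comparison.
design = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
ols_prediction = np.concatenate([[1.0], x_today]) @ beta

print(prediction, ols_prediction)
```

Because the relevance values sum to zero across the sample, it makes no difference whether the outcomes are centred before weighting, and under these assumptions the two printed numbers agree up to floating-point error.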
So, don't anchor too much to this prediction, it's just an example to show how the process works. So, what's really important is that this approach to forming a prediction is not arbitrary. The prediction that you're looking at here, which again, we produced as a relevance-weighted average of past outcomes, is identical to the projection from a full-sample linear regression, so to go back to Ava from earlier, if we were to use her approach of fitting betas to our attributes, multiplying today's attributes times those betas and summing those to come up with a prediction, this is the exact same prediction that you see here. So this equivalence is really important because, again, what it means is that our definition of relevance and how we use it for forecasting is not arbitrary, it follows from this equivalence with linear regression, and to Dave's earlier comparison of Ava and Ben, this is where we start to reconcile these two approaches. Like Ben, we're forming our prediction through this lens of looking at past experiences and what happened following those. But we're doing it in a statistically rigorous way that aligns with Ava's regression. So this is how we're reconciling those two approaches. Another important thing about this equivalence is that it reveals a really interesting feature of regression analysis. Just to return to our chart from earlier, notice here that a lot of the relevance values are negative. Imagine again now our process where we take a relevance value, multiply it times an outcome and that, on average, is our projection. So, what this reveals is that linear regression (again, that approach is equivalent to a full-sample regression) assumes that what happened following relevant periods in the past will recur. 
What happened following non-relevant periods will also recur, but in the opposite direction, so it's flipping the sign of those past experiences and assuming that's what is going to happen going forward. So, this raises the question, does this make sense? Should we be relying on non-relevant observations as much as we do relevant observations, and is this a bit of perverse logic almost, to assume that what happened following extremely non-relevant periods will happen again, but just in the opposite direction? So, whether or not this makes sense is an empirical question. It's actually going to depend on whether there is asymmetry in the relationships that we're modelling, but we do have a number of applications where we found evidence that focusing on relevant observations is more useful, and produces a better quality prediction than focusing on non-relevant observations, and so we call this approach partial sample regression. The idea is that we calculate relevance and then we censor our sample to exclude a number of non-relevant observations and just focus on the most relevant ones before we come up with our average prediction from those. And so you can see, again, working through this example, that when we do that, we actually get a meaningfully different prediction for inflation, at seven point six percent compared to six and a half earlier. So, this is another key takeaway, which is that focusing on a subset of relevant observations can actually improve the quality of our predictions, and we don't have any results here, but we've tested this again in a number of applications, such as predicting the [unclear word] correlation, the US presidential elections, and other things, and we do find improvement from the partial sample approach, so this is a really powerful benefit of using relevance to form predictions. 
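The censoring step can be sketched like this. It is one plausible reading of partial sample regression rather than the speakers' exact formula: keep the most relevant fraction of past periods, take outcome deviations from the full-sample average, and scale by the retained count minus one. The function name, the `keep_fraction` parameter, and the simulated data are hypothetical.

```python
import numpy as np


def partial_sample_prediction(X, y, x_today, keep_fraction=0.2):
    """Relevance-weighted prediction using only the most relevant past periods."""
    x_bar = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    relevance = (X - x_bar) @ cov_inv @ (x_today - x_bar)

    n_keep = max(2, int(round(keep_fraction * len(y))))
    keep = np.argsort(relevance)[-n_keep:]  # indices of the most relevant rows

    # Assumption: deviations are measured from the full-sample average outcome
    # and scaled by (n_keep - 1), mirroring the full-sample formula.
    weighted = relevance[keep] @ (y[keep] - y.mean())
    return y.mean() + weighted / (n_keep - 1), keep


# Hypothetical data, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = 2.0 + X @ np.array([0.8, -0.3, 0.5]) + rng.normal(scale=0.5, size=60)
x_today = np.array([1.0, 0.5, -0.2])

partial_pred, kept = partial_sample_prediction(X, y, x_today, keep_fraction=0.2)
full_pred, _ = partial_sample_prediction(X, y, x_today, keep_fraction=1.0)
print(partial_pred, full_pred, len(kept))
```

Setting `keep_fraction=1.0` retains every period and recovers the full-sample regression prediction, so the fraction acts as a dial between ordinary regression and a narrower focus on relevant history.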
Now, another benefit of using relevance is that we can actually quantify the quality of, and therefore our confidence in, a single prediction, and we refer to this as a prediction's fit. What it does is, it indicates the alignment between relevance and outcomes across observations, and the greater the alignment, the greater the quality of the prediction. A simple way to think about this is, fit looks at all the observations that underlie the prediction, and remember each observation has a relevance value and it has an outcome, and fit asks the question, do similarly relevant observations experience similar outcomes? And if they do, then the fit is high, and we should be relatively confident in the predictions formed from those observations. So it would be like surveying a bunch of experts on a topic: if they all give you very similar answers, you'd be much more confident in their average answer than if all of these experts gave you opposing, or vastly different, answers. So, in that analogy, the relevant observations are your experts, their outcomes are their answers, so the more aligned those answers and expertise are, the more confident you'll be in that prediction. Another key point to emphasise here is that fit actually varies by the prediction task, so what this means is, I could use the same historical sample, such as what we looked at earlier, to predict outcomes for two different observations, and those two predictions may have different qualities or fit, they will have different qualities or fit, even though they're relying on the same data. And that's because relevance depends on the circumstances that you're predicting, so different circumstances, different relevance, effectively different precedent in history, and therefore different potential fits in terms of how confident your data is in, I guess, forming a prediction and an outlook based on what's happening today. 
So, this notion of having a measure of fit or quality for a single prediction actually doesn't really exist in classical statistics, it's another key benefit of our relevance approach. However, it does tie out to the R-squared, so for those of you who know, the R-squared is a traditional way of evaluating a model's overall quality, so not an individual prediction, but on average how well that model does at predicting, and so fit actually aligns quite directly with that concept, so for a full-sample regression the average fit over all the predictions equals that model's R-squared. So, returning now to our... our inflation example, this compares the predictions depending on whether we form them from the 20 percent most relevant observations, the full sample of observations, or the 20 percent least relevant observations, and again what you'll see is, not only do their actual point predictions vary, which is that top row, but also their fit varies as well. So, here we would conclude that we're more confident in the subset of most relevant observations and the prediction that came from that. So, this can be very useful... I'm about to run over here, so as you can imagine, this can be really useful for actually scaling your bets, so the idea would be, if you know that you're relatively more confident in one prediction than another, which you quantify with fit, then you can use that to scale your bets accordingly. 
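One way to make "alignment between relevance and outcomes" concrete is the squared correlation between the relevance values and the outcomes of the observations behind a prediction. That formalisation is an assumption here, since the talk describes fit only in words, and the function name and numbers are hypothetical.

```python
import numpy as np


def prediction_fit(relevance, outcomes):
    """Squared correlation between relevance values and outcomes (assumed definition)."""
    rho = np.corrcoef(relevance, outcomes)[0, 1]
    return float(rho ** 2)


# Hypothetical relevance values and outcomes for five past periods.
relevance = np.array([2.0, 1.0, 0.0, -1.0, -2.0])
aligned = np.array([6.0, 5.0, 4.0, 3.0, 2.0])    # outcomes move with relevance
scattered = np.array([4.0, 2.0, 6.0, 3.0, 5.0])  # little relation to relevance

fit_aligned = prediction_fit(relevance, aligned)
fit_scattered = prediction_fit(relevance, scattered)
print(fit_aligned, fit_scattered)
```

Under this assumed definition, the first set of outcomes lines up with relevance and scores a fit near one, while the second scores close to zero, which matches the experts analogy: agreement among the relevant observations is what earns confidence.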
So, I think with that, we have a summary slide here which goes over all of the key points that we just went through, but I know I'm over, so what I'll do is, I'll turn it over to Dwyfor now and we're happy to answer some questions.

Dave and Megan, thank you very much indeed, so, moving on in the available time that we have in this session to the Q and A, and once again, don't forget to put your questions through the Q and A chat box. Thank you once again David and Megan, alas, we only have around five minutes or so for the Q and A session, before we have to move on to the next session. But we do have a number of questions that have come through here, and I'm going to choose... I guess a couple of these questions that I feel are probably most relevant in terms of what you've provided to us over the last half an hour or so. The first one, and I'm sure you've done some work on this given your backgrounds, but... how does this analysis offer, maybe direct or indirect, insights into optimal asset allocation strategies? David, maybe if you want to start off with that one.

Yeah, just briefly, I think the asset allocation process has two components, you have to predict the inputs that go into the process, you have to predict expected returns and risks and correlations for the assets, and then you have to optimally combine them to diversify as best you can, so that whole prediction process is central to the exercise... 
and we found, as Meg mentioned briefly, that predicting something as time-varying as the correlation between stocks and bonds, which is quite important, can be done much better by using a predictive model like this, where the relationships are complex enough and conditional on the circumstances that we currently face, we want to draw upon those relevant sets of experience to do it, so in that entire first phase of asset allocation, we can improve the quality of those predictions that then go into the optimisation models.

There are two questions that have come through here that are variations on a theme actually, one is asking how do we incorporate non-quantitative inputs that have not been previously observed, in other words, I guess an input that has no real benchmark? And there's also a second question, which is something very similar: for data that may be more limited in the historical timeline, e.g. crypto, how would the methodology determine its relevance, or would it be more difficult to determine its relevance, and hence the estimate, given that again, it's a very limited time series of data to incorporate.

Yeah, I guess I can start... maybe I'll actually start with the second question, I mean, as with any quantitative model, this does rely on data, so of course if there's limited data, then it is a bit of a limitation to the methodology, and perhaps even more so to some extent because... not only do we need data to measure relevance, but then, at least in the case of partial sample regression, we're also then actually filtering our sample even more to focus on a subset, so there is a bit of a trade-off. So, of course, you do need data, and then also when you think about partial sample regression there's a bit of a trade-off in terms of... 
to the extent that you're reducing your sample even more, you want to be careful you're not introducing more noise, and at the same time you're focusing on more relevant observations, so there's just a bit of a balancing act. In terms of incorporating, I think was the question, more qualitative inputs, I don't know if this is a direct answer to that question, but you can actually combine this with more qualitative approaches, so think about, for example, event studies, right? So, normally, if you're trying to predict the path of a recession, you might look back at past recessions, and model what happened following those, and use judgment in identifying the subsample of recessions, so you're using judgment there. This is kind of similar to that, but I guess what I'm saying is, you could also still use judgment to identify past periods that you think are relevant to what you're considering today, and then maybe just use relevance to weigh the average outcomes that happened following that. So I don't think the two are completely... there's an area in which they overlap where you can actually use some judgment to identify past periods that you think you should be incorporating. So hopefully that helps answer.

We've got time for one more, and actually a part of me is reluctant to ask this question because I don't think it's the sort of question you should be asking people at the time of night that you're presenting this presentation, given that Dave and Megan actually have the short straw here in terms of the really late night Boston slot. But in the very short time that we have left, this is a really technical question, and if you don't want to answer it fully now, maybe it's one for one of you to consider going forward, and to return to at a later date, but here's the question, anyway. 
Whichever one of you wants to take this, is the embedded assumption of a constant covariance matrix in the Mahalanobis calculation an issue when applying to financial market data, which are arguably non-stationary? So, quite a technical question there, if either of you wants to attempt that one.

Yeah, what we're doing is extrapolating from the same assumptions that linear regression already makes, which would include that constant covariance, and we're relaxing it to say maybe things are not symmetrical, and maybe they depend on circumstances today, so we're relaxing an important thing there, which is that the non-relevant may just be less effective to predict than the relevant subset, and that's very powerful. You could make the model more complicated, in fact, these ideas lead into principles of machine learning, which gets very, very complicated, and there's a spectrum of complexity that you might want to put into a model like this, and what we're proposing is, I think, somewhere in between a linear regression, which is quite basic, and machine learning models, which are quite powerful, but also introduce risks of their own. So, that's the simple answer, that's the quick answer. But it's a great question.

Okay. It is a good question actually, and I guess it speaks a lot to the type of work that David and Megan are doing, and their teams have been doing for a number of years, with respect to looking at a lot of statistical work around portfolio and risk management, so we're going to have to wrap it up there, time has once again beaten us, but as I said, it's a very late hour in Boston, so very appreciative of both David and Megan for taking the late slot on this particular session, but thank you again David and Megan, for that insight. Again, as per all the sessions, we'd really appreciate your feedback, so please remember to rate the speakers at the end of each session.