The Future of Prediction
Mark Kritzman: Great. Thank you, Michael. Nice introduction. And I should point out, as Michael alluded to, that this presentation is based on a book that David and Megan and I wrote that was just published this year, as well as an article that was just accepted to the Journal of Financial Data Science, which is also co-authored by Megan. So it is about what we hope will be a widely adopted new approach to making forecasts. And it's based on the notion of relevance. This is something that has a very precise mathematical meaning. It's something that's theoretically justified, and you'll see why that's the case. And it's also mathematically unified, and you'll see why that's also the case. And unlike linear regression analysis, it deals with complexities like asymmetry and nonlinearities, but in a way that is more transparent, more adaptive, and more theoretically justified, or less arbitrary, than many machine learning algorithms. So the way to think about it is that it's an alternative to linear regression analysis, because it can deal with asymmetry, but it's also an alternative to machine learning algorithms. So I can't read what's down there, but I think I have it memorized, having given this talk all over the world many, many times in the last several months. I wanted to give you just the historical foundation that paved the path to this innovation. Obviously, we can't list all of the important innovations, but these are the ones that we felt were most important. The first is when Abraham de Moivre, back in 1733, discovered the formula for the normal distribution; that was obviously a big deal. Then Carl Gauss came up with the method of ordinary least squares. And following that, Pierre-Simon Laplace came up with the central limit theorem. Am I getting this right so far? Because I can't see. Okay, good. And then following Laplace, we have probably the most interesting character out there, either him or Shannon, I would say, which is Francis Galton, a Brit, Michael, who came up with regression to the mean by getting his friends to grow peas for him, and they came up with the first approximation of the correlation. Then his protégé Karl Pearson formalized correlation, and his, I would say, rival Ronald Fisher came up with analysis of variance. Now you might say, well, that underpins all of classical statistics, and you would be right, it does. But where we part company is by drawing upon two further innovations. One is the discovery of the Mahalanobis distance by an Indian mathematician who actually discovered it for the purpose of analyzing human skulls, of all things. And Dave is going to go into great detail about how this formula works. It's a really cool formula. And then finally, probably the greatest innovation of the 20th century, which is information theory. That comes from Claude Shannon. So again, where we differ from all of the other approaches is that our approach relies on these two more recent innovations.
Mark Kritzman: So I think it's probably helpful to just provide some context by talking about what people did in the past and what they're doing today, and in that way you can see how our approach builds upon that. I feel like, having just listened to Robin, I should be using the Harvard Business School case study approach and asking you questions. I won't do that, but I'll do it rhetorically. What do you think motivated people to try to predict outcomes from data in the first place? This goes back hundreds of years, and it had two purposes. One was to predict the movement of heavenly bodies, stars, planets, comets, for the purpose of navigation. Remember, back then, well, you don't remember, but you've read that back then people moved around on boats and they needed a way of figuring out where they were going. And the other purpose was to provide guidance to gamblers, and not just people who went to casinos; you can think of gambling more broadly to include insurance brokers, for example. Although a lot of these great mathematicians back in the day did hire themselves out as consultants to people who went to casinos. Now, it turns out that the rules that govern the motion of heavenly bodies and the roll of dice are very stable, simple rules. But today... Let me skip this linear regression slide.
Mark Kritzman: You all know what it is, but let me just move past this. Today, we deal with more complexity. For example, asymmetry. Right? So imagine we have a scatterplot like this. You might think this is really quite extreme, and it does seem extreme, but consider trying to forecast the stock-bond correlation. When interest rates are high, bonds offer a competitive return relative to stocks, at least on a risk-adjusted basis. So they don't really need to be negatively correlated with stocks, because they merit inclusion in the portfolio just based on their return. So it's likely that there would be a positive correlation between stocks and bonds in high interest rate regimes, and that in fact has been the case for the most part. When interest rates are low, however, bonds do not offer a competitive return, and therefore the only reason somebody would hold them in their portfolio is to protect against an equity sell-off. So in that case, they must have a negative correlation, and that has been by and large the case. So this is something that linear regression analysis can't deal with. And this just shows you that if you have that kind of situation, it would give you the same prediction for all of the outcomes. So this is where I wanted to get to. Today, as social scientists, we deal with a lot of complexity that arises from social dynamics, right?
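As a rough illustration of that point, here is a small simulated sketch (made-up numbers, not the data behind the slide): an outcome whose relationship with a predictor flips sign between two regimes. A single least-squares line fit to the pooled data ends up with a slope near zero, so it effectively issues the same prediction for every value of the predictor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical predictor, e.g., the level of interest rates (centered)
x = rng.uniform(-2, 2, 500)

# The outcome's relationship with x flips sign across regimes:
# positive when x > 0, negative when x < 0 (a stylized version of the
# stock-bond correlation story above)
y = np.where(x > 0, 1.0, -1.0) * x + rng.normal(0, 0.2, 500)

# Fit one straight line to the pooled data
slope, intercept = np.polyfit(x, y, 1)
print(round(slope, 3), round(intercept, 3))
# The slope is close to zero: the line gives roughly the same
# prediction for every x, which is exactly the failure described above.
```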
Mark Kritzman: And this elegant but rather limited regression model just doesn't work. The emergent approach today is machine learning. I guess a lot of you probably do this, and you've certainly all heard about it, but this is the dominant approach in the face of complexity. And for our purposes, it's useful to stratify machine learning algorithms into two types. One would be model based algorithms, which for all intents and purposes are just enhancements to linear regression analysis, and then model free algorithms. And these model free algorithms sort of serve as a bridge to what we're going to describe, or what Dave is going to describe. So examples of model based algorithms are lasso regression, which is a way of selecting variables, tree based algorithms, and neural networks, and examples of model free algorithms are nearest neighbors and Gaussian kernels. Now, what I want you to keep in mind is that Gaussian kernels are used to select observations and lasso regressions are used to select variables. And if you think about it, the choice of the variables should depend on the observations you're using, right? You shouldn't expect the same set of variables to be equally effective across all samples of observations, and you shouldn't expect a given sample of observations to be equally useful across many different combinations of variables. They depend on each other. But the approach today is to treat those two problems independently. We ignore this codependence, right? We use lasso to select variables and kernels to select observations.
Mark Kritzman: What we're going to show you is how our relevance based approach simultaneously selects observations and variables, thereby offering an alternative, and not just an alternative, but a theoretically justified alternative, to both of these approaches. So model based algorithms all rely on an iterative process, which is first to specify a decision rule, then to calibrate the rule, and then finally to test the rule until you're satisfied with the results, and you can construct these models to be extraordinarily flexible. Will and I were at a conference last week where someone talking about machine learning algorithms mentioned neural networks that had 1.7 trillion parameters in them. So to me that implies an awful lot of flexibility. But despite that, they're rigid in one really important way: they do not adapt to new circumstances. If you get a new situation, you have to go through that entire iterative process of re-specifying the model, recalibrating it, and retesting it. So that's, I think, a pretty significant limitation. Now, model free algorithms, which, as I mentioned, are referred to as kernel regressions, focus on the selection of observations instead of variables, and they form their predictions as weighted averages of prior outcomes. So we're not estimating coefficients; we're just taking an average of prior outcomes. And what's nice about these models is that the weights that are used to form the prediction are revised with each new prediction task. So that's why this is a bridge to what Dave is going to describe in some detail for you.
Mark Kritzman: But the key difference is that the weights that are used in relevance based prediction are theoretically justified, and you'll see why that's the case, whereas the machine learning algorithms lack that theoretical core. So we're now going to talk about relevance based prediction. And you see what might strike you as an unusual icon, but it makes sense, because as I mentioned to you, the Mahalanobis distance, which is really integral to our new prediction system, was based on the analysis of skulls. And it turns out that information theory, which was given to us by Claude Shannon, well, Claude Shannon worked for the phone company. Actually, he worked for Bell Labs, so it's still the phone company, but it's a pretty elite part of the phone company. So anyway, information theory comes from Shannon's work to figure out how to move messages quickly and accurately across great distances, and he did this all mathematically. And we draw upon some of the same insights that he came up with in our prediction system. So before Dave gets into it, I just want to set the stage here: what are the criteria that we're using to define a good prediction system? There are three. These are what we value; others may place greater weight on other criteria, but these are the criteria that we think are important. First, the prediction system should be transparent, because this promotes intuition and it facilitates interpretation.
Mark Kritzman: It should be flexible, by which we mean it should adapt to new circumstances automatically. And it should be non-arbitrary: it should be theoretically justified and mathematically unified. Neither linear regression analysis nor machine learning satisfies all these criteria. So those are our three principles. We also have three key tenets of this new relevance based approach to prediction. First is relevance, and this identifies the optimal subset of observations from which to form a prediction in a way that's non-arbitrary and mathematically precise. So each prediction task has its own unique set of observations; it also has its own unique set of variables. Then there's this notion of fit. Fit enables one to assess the unique reliability of each individual prediction task separately from the overall quality of the prediction model. We typically assess the quality of a prediction based on the model's R squared. Well, R squared is just an average of lots of good predictions and lots of bad predictions. What fit does is give you a much more nuanced, prediction-specific measure of how much reliability you should attach to a prediction. And then finally, situational learning. This enables one to identify the uniquely optimal combination of observations and variables from which to form a prediction. So I'm just setting the stage here. I'll turn it over to Dave, who is the main act for this presentation.
David Turkington: All right. Let's get into how relevance based prediction actually works. And I have to warn you, there is some math involved, but don't worry, these are very friendly formulas, I promise. So relevance based prediction is a more intuitive way to think about using things that happened in the past to predict the future. And in fact, recent research in neuroscience, the latest interpretation of how the human brain actually works, says that we instinctively rely on experiences from the past to extrapolate what we predict for the future. So we are scouring our memory all the time for circumstances relevant to what's happening today, and we think that what happened in those circumstances in the past is very likely what we'll see going forward. So how do we translate this into a rigorous, data-driven approach that uses that intuitive principle? Well, like the model free methods that Mark described, we're going to use history as our guide. So our prediction for the future, which you can call y hat at time t, which is today, is very literally a weighted average of what actually happened in experiences of the past, y sub i. So in any prediction model, as in a regular regression, we have X and we have Y, and they each have jobs. Y is the thing we want to predict; X is the stuff that we use to predict. So the job of Y is to tell us what actually happened before in the thing we care about.
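In symbols, the prediction being described is something like the following (a sketch of the idea as stated here, with w_i denoting the weight placed on past observation i):

\hat{y}_t \;=\; \sum_{i=1}^{N} w_i \, y_i , \qquad \sum_{i=1}^{N} w_i = 1 .

Setting every w_i = 1/N recovers the plain historical average; relevance, introduced below, is what lets the weights differ across past experiences.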
David Turkington: The job of X is to tell us how much attention we should pay to each past experience. What should the weight be on those past experiences? So it's a very simple principle, but we need to do it in a precise way. Now, you can imagine, if you had no ability to discriminate between the more relevant and less relevant experiences of the past given the circumstances now, you would just equally weight everything that happened before. So you would have one over n as your weight for everything, and you would end up with a very boring but not crazy prediction: the average of everything that happened in the past is what we're going to predict for the future. But we want to do much better than that, and we can do it by defining this notion of statistical relevance. We can overweight past experiences that are more relevant to today and underweight those that are less relevant. How do we do it? Relevance has two key components, and they make sense when you think about it. One is similarity to current conditions, and the other is informativeness. What does that mean? Let's put it in the context of an example. Suppose you're trying to predict how stocks will perform in a recession. Well, it's natural to think back to past recessions: how did stocks do then? We're going to look for circumstances that are similar to today.
David Turkington: Or maybe you want to forecast how traffic is going to evolve during a snowstorm. Well, you naturally recall past snowstorms, and you might think, well, what was happening with the traffic? So similarity tells us to find past experiences that relate to the circumstances we're dealing with right now. The second component is informativeness. And the key insight here is that the more unusual, more extreme things that have happened in the past are the more important ones to focus on now. Intuitively, more extreme things reside more prominently in our memory; we tend to recall those. And it's a good idea to focus on things that are more dramatic, because those contain more information. There is a fundamental principle of information theory, extremely profound and important, developed by Claude Shannon in the 1940s, which says that information is the inverse of probability. In other words, things that are rare contain much more information; we need to pay more attention to them. Now, we want to measure the similarity and informativeness of every historical experience within the context of today. So for the historical experiences, the attributes that describe them are X. We have a vector, let's say, of many variables that describe circumstances; it could be a bunch of economic variables, for example. That's what we're calling x sub i, and then today's circumstances are x sub t, and we want to compare them.
David Turkington: So similarity is how close that collection of circumstances from some past period is to the collection of circumstances we see today, and we measure it with something called a Mahalanobis distance. That is a measure of how distant those two vectors are from each other, but it accounts for the typical variation in those variables and the covariation, the correlations, between those attributes. Mahalanobis, back in India in the 1920s and thirties, figured this out. As Mark mentioned, he was studying human skulls and he had to compare them. And you can intuitively recognize that if you're taking measurements of a human skull, the variation matters. A one centimeter difference in the size of a nose is much more meaningful than a one centimeter difference in the size of a head. And furthermore, bigger heads tend to have bigger noses, if you're talking about skulls. So we need to account for all these complexities. The negative of the distance is a measure of similarity. Informativeness is how far something that happened before was from average conditions: how unusual is it? And once again, we use the Mahalanobis distance. So the formula here is actually quite elegant. It's just the difference between these two vectors, piece by piece, how far the variables are from average, let's say, multiplied by the inverse of their covariance matrix, and that's the part that accounts for the variation and the correlation of the data. When you multiply again by the difference, you get a single number.
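To pin down the quantities being described, here is a minimal sketch in code. It follows the verbal description: similarity is the negative Mahalanobis distance between a past observation x_i and today's circumstances x_t, informativeness is the Mahalanobis distance from average conditions, and relevance adds the two together. The one-half scaling and the way the pieces are summed reflect our reading of the published formulation, so treat them as assumptions rather than the definitive recipe.

```python
import numpy as np

def mahalanobis_sq(a, b, inv_cov):
    """Squared Mahalanobis distance between vectors a and b."""
    d = a - b
    return float(d @ inv_cov @ d)

def relevance(X, x_t):
    """Relevance of each row of X (past circumstances) to x_t (today).

    relevance = similarity to today + informativeness, where similarity
    is the negative Mahalanobis distance to x_t and informativeness
    measures distance from average conditions (for both the past
    observation and today's circumstances).
    """
    x_bar = X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))  # N-1 in the denominator
    sim = np.array([-0.5 * mahalanobis_sq(x_i, x_t, inv_cov) for x_i in X])
    info = np.array([0.5 * mahalanobis_sq(x_i, x_bar, inv_cov) for x_i in X])
    info_t = 0.5 * mahalanobis_sq(x_t, x_bar, inv_cov)
    return sim + info + info_t
```

A large relevance value flags a past period that is both close to today and far from average, which is the combination described next.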
David Turkington: So this is an extremely important, informative number that tells you how similar to today and how different from average something from the past is, and all else equal, that's what we care about: similar to today, different from average. So here's something to explain how the Mahalanobis distance really works. On the left hand side, you see that two points equidistant from the center of the orange distribution are actually equal in how surprising they are; they have the same Mahalanobis distance. That's because these variables are uncorrelated. Okay, now look on the right. If the variables are correlated, then two points equally distant from the center are not equally surprising or equally informative. Point D is much more surprising and much more informative; it lies far outside the typical behavior for these variables, so we need to pay a lot of attention to it. Here's another example. Suppose that our current circumstances, measured in two variables, it could be growth and inflation, for example, are that we have positive growth and positive inflation. Now, how much attention should we place on these two different historical experiences, one which has more moderate values of each and one that has more extreme values of each? Well, they're both equally similar to current circumstances, so we should pay attention to both, but we should pay more attention to point B, which is more extreme. The intuition is that if something dramatic happens in the economy, there's a good chance that the thing we're trying to predict, whatever it is, like stock prices, is affected by that dramatic move.
David Turkington: If we get a point close to the center, on the other hand, whatever happens in stocks is probably just noise. So we have to pay attention to extremeness, unusualness, informativeness. And just to emphasize the point that relevance is by no means arbitrary: it follows from the central limit theorem, which means that we should see normal distributions a lot of the time, at least to a first approximation, which is a good place to start, and from information theory, which says that in fact the normal distribution is intimately related to this Mahalanobis distance. I'm not going to go into detail here, but if there are any connoisseurs of the normal distribution in the audience, which I suspect there are, then you'll recognize that the Mahalanobis distance lies inside of it: the probability of something decays with the Mahalanobis distance. So information theory tells you that, literally, the information in a point is the Mahalanobis distance. Okay, so what's the punch line here? We form this prediction as a weighted average of what actually happened in the past, where the weights depend on statistical relevance, and something pretty amazing happens: we get the same exact prediction as we would get from a linear regression model. Okay, that might sound like a disappointing conclusion, because we all know about linear regression models, but there's a big benefit here. First of all, this shows it's not at all arbitrary to look at relevance in this way and prediction in this way, because linear regression makes a lot of sense.
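To see the equivalence concretely, here is a small numerical check, a sketch built on the relevance definition above. The 1/(N-1) scaling in the relevance-weighted average is our reading of the published result, so treat the exact constants as assumptions; the point is simply that a relevance-weighted average of past outcomes, using every observation, reproduces the prediction of a linear regression with an intercept.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 200, 3
X = rng.normal(size=(N, K))                               # past circumstances
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=N)   # past outcomes
x_t = rng.normal(size=K)                                  # today's circumstances

# Ordinary least squares prediction (with an intercept)
X1 = np.column_stack([np.ones(N), X])
beta = np.linalg.lstsq(X1, y, rcond=None)[0]
ols_pred = np.concatenate(([1.0], x_t)) @ beta

# Relevance of every past observation to today (as sketched earlier)
x_bar = X.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
def d2(a, b):  # squared Mahalanobis distance
    return float((a - b) @ inv_cov @ (a - b))
r = np.array([-0.5 * d2(xi, x_t) + 0.5 * d2(xi, x_bar) + 0.5 * d2(x_t, x_bar)
              for xi in X])

# Relevance-weighted prediction using all observations
rel_pred = y.mean() + r @ (y - y.mean()) / (N - 1)

print(round(ols_pred, 6), round(rel_pred, 6))  # the two numbers agree
```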
David Turkington: But more importantly, we get a ton of extra intuition here. We have the transparency of seeing how every historical experience contributes to the prediction that we're making today. You can't see that; it's hidden in the linear regression when you focus on betas. And perhaps even more importantly, we can understand the inner workings of a linear regression model and how it does something that's almost kind of silly. What linear regression does is pay as much attention to the least relevant experiences as it does to the most relevant experiences; it just predicts the opposite of what happened in the least relevant times. So, for example, if we go to an economist and ask, how do you predict stocks during a recession, and the answer is, well, I'm going to study the two historical periods that are as different from a recession as possible, extreme growth booms, and I'm just going to predict the opposite, would you have confidence in that logic? Or if you go to your doctor and ask, how is this medication going to affect me, and she says, well, I studied two patients who are the complete opposite of you, different characteristics, different age, sex, health conditions, everything, and I'm just going to predict the opposite. Well, no one reasons this way, because it doesn't usually work. This is what I would call an extremely heroic extrapolation from opposite conditions, and it's usually not a good idea.
David Turkington: That's the assumption of linearity. So what you might want to do instead is use this notion of relevance but just focus on a subset of the most relevant experiences, and then form your prediction based only on those. But the question arises: is that a good idea? Do we expect that prediction to actually be better? Well, the typical answer is that we can't know until after we make a ton of actual predictions out of sample and evaluate the model after the fact. But with the transparency of knowing how each historical observation contributes to a prediction, we can actually tell in advance what confidence we should assign to each bet. And we do it using something called fit. So the idea here is that we have all these historical experiences where we know what happened to the Y variable and we know how relevant they are, and those are our weights. Well, in the most important, most relevant experiences, did the outcomes agree? If they did, on average, we should have a lot of confidence. It's like going to a bunch of friends and asking them to predict something like the price of a bottle of wine. If your friends who know the most about wine, who are the most relevant, all agree with each other, then that prediction carries high confidence. But if you ask your friends about some obscure French wine and the people who know the most disagree violently, then you're going to be really uncertain about whatever prediction you get.
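One rough way to operationalize this for a single prediction task, going by the "correlation of relevance and outcomes" description in this talk rather than the exact published formula, is to measure how strongly the relevance weights line up with the past outcomes; treat the specific form below as an illustrative assumption.

```python
import numpy as np

def fit_measure(r, y):
    """Fit for one prediction task: the squared correlation between the
    relevance weights r and the past outcomes y. A high value means the
    most relevant experiences 'agree', so today's prediction deserves more
    confidence. (Illustrative; the published definition of fit may differ
    in its exact scaling.)"""
    return float(np.corrcoef(r, y)[0, 1] ** 2)
```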
David Turkington: So you can look at this as the pairwise alignment of what actually happened during the most relevant periods: are those things aligned? If so, then you have a good prediction today. Another way to look at it is, what's the correlation of relevance and outcomes? And you can only look at that if you know what the relevance weights are. So the point to emphasize here is that this is the confidence in a specific prediction task; it's not the confidence of a model on average. If you take the average of this, you actually get the R squared. But as Mark said, some predictions are very high quality and some are very low quality. This just shows what I said: on average, if you take a weighted average of fits, you get the traditional R squared. This also shows you that this measure is not arbitrary; it converges to what we already look at, but it gives you much more insight into the inner workings of which predictions are good and which are bad. Now, this brings us to situational learning. And the idea here is, as Mark said, to use this measure of fit as a guide to select variables and select subsets of observations for every prediction we make. And we might want to use different variables at some times than at others.
David Turkington: And we might want to use different degrees of focus at some times than at others: sometimes we might want to include everything, sometimes only a tiny fraction of the most relevant data. And there is an inherent tradeoff: when you focus your attention on just a small set of data, you have fewer observations and more potential noise, but you also might have a tighter fit in that relevant subset. It turns out that our measure of fit inherently weighs this tradeoff. It penalizes the small-sample aspect of focusing narrowly on past experiences, but it captures the benefit that often comes when you focus narrowly: you actually get a much clearer relationship. So going back to the setup from earlier: a lasso regression, and there are other techniques, is a way of selecting variables, but it says you should always include certain ones and always ignore others. We're saying that sometimes you should pay attention to a variable and sometimes you should switch to a different one. Furthermore, something like kernel regression says we should always pay attention to a narrow set of the most similar experiences. We're saying sometimes you should do that and sometimes you should cast your net broadly. So how does this work? We ran a simulation of a regime switching process, which is something that is very common in financial data. When regimes switch abruptly, you might all of a sudden need to switch your focus.
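Here is a minimal sketch of the kind of regime-switching data-generating process being described. The parameters, two regimes with 80% and 60% persistence, predictors with mean three and variance one, and coefficients of one and two, follow the setup described next, but the code itself is illustrative rather than the authors' actual simulation; for brevity the predictors are drawn independently even though the talk describes them as correlated.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 1000

# Two-state regime process: regime 0 is 80% persistent, regime 1 is 60%
stay = {0: 0.80, 1: 0.60}
regime = np.zeros(T, dtype=int)
for t in range(1, T):
    prev = regime[t - 1]
    regime[t] = prev if rng.random() < stay[prev] else 1 - prev

# Two blocks of two predictors each, mean 3 and variance 1
X = rng.normal(loc=3.0, scale=1.0, size=(T, 4))

# In regime 0 the first block drives y with coefficients 1 and 2;
# in regime 1 the second block takes over and the first is pure noise
betas = {0: np.array([1.0, 2.0, 0.0, 0.0]),
         1: np.array([0.0, 0.0, 1.0, 2.0])}
y = np.array([X[t] @ betas[regime[t]] for t in range(T)]) + rng.normal(size=T)
```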
David Turkington: So we simulated something with full knowledge of how it actually works, and then we saw whether our technique could detect what was going on. The way we did it was with two regimes: one of them is very persistent, 80% persistent, and the other is 60% persistent. And we have two groups of variables, two sets of X variables that are correlated, that have a high average of three and a variance of one. So they're giving you a strong signal, and they generate Y. There are betas of one and two that say, in the first regime, these X variables produce the outcome that we want to predict. Then the whole situation flips: in the second regime, the variables that were inactive before become the important ones. They determine Y with coefficients of one and two, and the original variables contribute pure noise. So we do this simulation and we have a whole out of sample test, and we ask: learning from the training data, the experience, can you pick the right variables to focus on at the right times and outperform a simple linear regression? And it works. This is one example, not the only example, but one example of where this technique is really powerful. What happened was that our predictions were about 90% correlated with the actual outcomes that came from the regime model, versus 50% or so for a simple linear regression. Oh, sorry, how do I go back?
Mark Kritzman: That one, the red one.
David Turkington: Okay. Thank you. And furthermore, we can measure the fit, and we found that higher fit actually corresponds to lower prediction error out of sample. So we know when to be confident, and that confidence is warranted, right? There's a negative correlation between fit and actual errors. Now, the most interesting thing to me is just to look at what the model is actually thinking, because this is so transparent. The gray bars show when the model didn't have conviction one way or the other; it used all four variables and used the full size of the data. That's the height of the bars. The light blue bars are when the model decided to focus on the first two variables. It thought those were the ones that probably determine the outcomes for Y, and most of the time it was right. And you'll see that a lot of the time it didn't include 100% of the observations; it focused on about 50%, which is about the frequency of regime one. Now what about regime two? Sometimes it was clear that you should ignore the first two variables and focus on the others, and the dark bars show that. That regime doesn't happen as often, so you need a narrower set to find relevant experiences, and you find that about 20% of the data was retained in those instances. And below, in orange, you see the fit. So sometimes you're confident, sometimes you're not. The R squared of a model is the average of those; you need to know the difference between when you have a good prediction and when you have a bad prediction. So let's try to summarize here. We have these three key principles: transparency, flexibility, and being non-arbitrary. Linear regression is not as transparent as it seems, because you don't know how each observation contributes to your prediction, which is very important, and you don't know how much confidence you should have in today's prediction as opposed to tomorrow's.
David Turkington: It's not flexible; that's the main problem. It can only account for linear models, and obviously so many of the challenges we face today are complex and non-linear. Linear regression shines in being non-arbitrary, and in the extreme, if you include all the data points, we agree with linear regression, but with more transparency. Machine learning has a shortcoming in transparency; that is a big complaint, that you don't know why it's giving you the answer that it gives you. It's very flexible, but not in the important sense that it adapts to new, unforeseen circumstances or abrupt regime shifts. It's also hard to escape the fact that machine learning is arbitrary; we're only guided by empirical efficacy and not by some core principles, in most cases. So where do we come out on this? Well, we feel that relevance based prediction is extremely transparent. The way people think, the way people talk and defend their qualitative predictions, is by appealing to similarity to past events, and that's exactly what you get out of this approach. Furthermore, you can see transparently the fit of each individual prediction and this tradeoff between noise and degree of focus on subsets of the data. It's very flexible; as we showed, it adapts to circumstances in many ways. And it's not arbitrary; it's underpinned by information theory and by this cohesion with traditional linear regression in the extreme, which we can then relax. So with that, we would love to open it up, get some questions, get some discussion. Thank you very much.
Michael Metcalfe: Can I just say thank you for living up to my prediction? I think it might be the only prediction I've got right this year. So that's great. Okay, so I've got one question on the pad, but I'm very happy to take the first one from the audience, if there is one. We seem to have one here. Yeah, bottom right?
Speaker4: Yeah. So I have this question, since we are discussing relevance based and similarity based approaches. I've heard of the Markov chain, right? It says that to predict the future, you don't need to know the past; all you need to know is the current state. So is that true? How is it that I haven't seen you mention Markov chains in this whole thing? The second question is about time series: how effective is this for time series analysis? And third, can you show a confusion matrix for your approach, for your methodology?
David Turkington: Yeah. So, starting at the end, there are definitely a lot of metrics we can show. We're simplifying this example, but yes, we could look at a lot more metrics of correct predictions and when it fails, and do that comparison. To the broader point, if I understood the thrust at the beginning of your question, that if you know the current state you can predict forward, that would depend on having some robust model of how the system works. So if you have a model, and we're making this distinction between model based predictions and model free predictions, if you have a model and it's correct, then that's great. I think a lot of the time the system is so complicated that we don't really know what the true model is. And so relying on past experience to interpolate, or to try to approximate, what's going to go on is a good approach. So I think the reality is that a lot of the time we don't know the correct model, so it makes sense to take an empirical approach. Mark, do you want to add to that?
Mark Kritzman: Just the point that this approach doesn't have any temporal dependency; it works as well cross-sectionally, actually, as it does with time series data.
Michael Metcalfe: Um, I've got a couple of questions here, but again, nothing more from the floor? Okay, let me do this, actually. So this one, I think, came through about halfway through the talk, and it's actually the exact same question we got in Melbourne about three weeks ago: how did you get this far without mentioning Bayes? So we had the same question.
David Turkington: Yeah, Bayes, Bayesian. Do you want to address that, or no? So, yeah, obviously, you know, with Bayesian reasoning you have to have some kind of a prior that comes from something other than the data. Maybe it comes from different data, maybe it comes from some theoretical basis, and then you update your predictions or views of what's going to happen based on the data that you see. You know, spiritually, I think what we're talking about, learning from data in a principled way, is related. But we're starting from a different baseline, I think, and we're trying to go back to first principles with it. So I'm not sure yet if there's a direct relationship with Bayesian models, and whether you could derive this from a Bayesian perspective; I think there probably is a tie there. But, you know, we're trying to boil this down to something very simple based on information theory and how we should go about extrapolating from the past. Bayesian is obviously a whole other framework of thinking, and as we've shown here, I think we've already taken a very different perspective than how regression is classically used and interpreted. To me, a Bayesian approach is yet another lens, another perspective on the problem. So does it reconcile, does it unite with this? I suspect that in some sense it does, but it's not something we've fully fleshed out.
Mark Kritzman: Yeah, I think I would just say that they don't necessarily contradict one another.
Michael Metcalfe: Right, Right. Okay, So last one in that case. How does this relate to generalized method of moments? Would this outperform a regime switching model if it were estimated with simulation data?
Mark Kritzman: So as I understand generalized method of moments, it's an approach for coming up with coefficients that are beyond the reach of ordinary least squares. And our approach doesn't even purport to come up with coefficients; there's no model there. It's simply taking a weighted average of past outcomes. The main difference with kernel regression is that the weights here are derived in this mathematically cohesive way, based on some pretty defensible theories. So I guess the short answer is there's no connection.
Michael Metcalfe: Okay. Well, look, on that note, please join me in thanking both Mark and Dave.
State Street LIVE: Research Retreat offers a wide range of academic expertise and timely market insights.
Mark Kritzman, Chief Executive Officer, Windham Capital Management, founding academic partner of State Street Associates and faculty member at MIT Sloan School of Management, and David Turkington, Head of State Street Associates, describe a new approach to forming predictions based on the concept of statistical relevance. This new approach addresses complexities that are beyond the capacity of linear regression analysis in a way that is theoretically grounded and mathematically unified, unlike many of the heuristics that have been proposed to enhance linear regression analysis. Additionally, this presentation compares relevance-based predictions to machine learning algorithms, with a focus on how these two approaches might complement one another.