Insights

Value from data: Which variables matter?

Value from data: which variables matter?

As the race to design sophisticated data analytics continues, we show why relevance-based prediction offers an ideal way to measure the importance of an input variable to a prediction.

February 2025

T-statistics act as a hallmark for rigor by pinpointing the effect of a single variable and distinguishing signal from noise. However, they have significant limitations:

  1. T-stats do not capture “shared” information
  2. T-stats are not prediction-specific
  3. T-stats only consider linear relationships

In our recent paper, we introduce an alternative method, called Relevance-Based Importance (RBI), which measures the importance of every variable to the reliability of every individual prediction. RBI recognizes that it is almost never the case that a variable is always important, or that it is never important. Rather, it's more likely that variables are sometimes important, depending on the circumstance. We show that RBI brings the virtues of t-statistics, but also adapts to each unique situation, making it robust to complexities where t-stats fall short.


Key highlights

Figure 1 variable importance (rbi) by prediction out of sample q1 2005 q4 2023

We have written before about Relevance-Based Prediction (RBP), an extremely transparent and interpretable technique that forms predictions as weighted averages of the past, where the weights carefully consider which variables and experiences to use for each task. It turns out that this RBP “thought process” also gives a measure of how important every variable is to the reliability of every individual prediction task, which we call Relevance-Based Importance (RBI).

It is almost never the case that a variable is always important, or that it is never important – though this is what a summary t-statistic might lead us to believe. More likely, variables are sometimes important, and sometimes not, depending on the circumstance. Consider how inflation suddenly arose as a major factor in the aftermath of COVID-19 pandemic following decades of status quo. Or how rain on a summer day makes the temperature irrelevant when we decide whether to go to the beach. RBI reveals not only the importance of a variable given current circumstances, but also its signal to noise contribution. Thus, RBI brings the virtues of t-statistics but also adapts to each unique situation.

When there is no complexity, an average of RBI across all prediction tasks delivers the same information as a t-statistic. But when variables are correlated, relationships shift, and we care about explaining individual predictions, simple t-stats fall short and they can be misleading.

Knowing when and how a variable matters makes data more useful, and more interesting too.
 

Share

Stay updated

Please send me State Street’s latest Insights.