The InterModel Vigorish (IMV)
What is the IMV?
When two models both predict the same binary outcome, how much better is one than the other — and does that difference actually matter?
Standard metrics like R², AUC, and the F₁ score can answer versions of this question, but they have a shared limitation: their values depend on the baseline difficulty of the prediction problem. An improvement of 0.03 in AUC means something very different when predicting a rare event (prevalence = 2%) versus a common one (prevalence = 50%). This makes it hard to compare model improvements across different datasets or outcomes.
The InterModel Vigorish (IMV) is designed to fix this. It is a metric for quantifying the change in predictive accuracy between two models — a baseline and an enhanced prediction — in a way that is portable (comparable across outcomes with different prevalences) and intuitive (grounded in a concrete physical analogy).
The Weighted Coin Analogy
The IMV is built on an analogy to weighted coins.
Any predictive system that assigns probabilities to binary outcomes can be mapped to an equivalent weighted coin — a physical object whose bias exactly matches the average uncertainty in those predictions. When you have two such systems, you can ask: by how much does the enhanced model’s coin outperform the baseline model’s coin in a single-blind bet?
More formally, the IMV is the expected proportional winnings from betting according to the enhanced model’s probabilities when the baseline model sets the odds. A positive IMV means the enhanced model provides genuine predictive value beyond the baseline; a negative IMV signals overfitting or model misspecification.
This framing has a key consequence: because the baseline model defines the bet, the IMV is always a statement about relative improvement, not absolute accuracy. The same absolute improvement in log-likelihood will yield a larger IMV when the baseline outcome is highly uncertain (prevalence near 0.5) than when it is already predictable — which is exactly the right behavior.
Try it yourself
Two weighted coins are flipped 20 times each. You see the outcomes but not the true weights. Enter your best guesses for each coin’s probability of heads — these define your enhanced model. The baseline model uses only the overall average (treating both coins identically). The IMV then measures how much your guesses improve on that baseline.
#| '!! shinylive warning !!': |
#| shinylive does not work in self-contained HTML documents.
#| Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 680
library(shiny)
# ── IMV helpers ──────────────────────────────────────────────────────────────
get_coin_weight <- function(avg_ll, sigma = 1e-6) {
target <- log(avg_ll)
# The entropy H(w) = w*log(w) + (1-w)*log(1-w) has a minimum of log(0.5) at w=0.5.
# If target is below this floor (with small tolerance for floating point),
# there is no solution — predictions are worse than a fair coin.
tol <- 1e-9
if (target < log(0.5) - tol) stop("implausible")
# Boundary case: target is at or very near the floor — coin weight is 0.5
if (target <= log(0.5) + tol) return(0.5)
f <- function(w) w * log(w) + (1 - w) * log(1 - w) - target
uniroot(f, lower = 0.5 + sigma, upper = 1 - sigma)$root
}
avg_ll <- function(y, p, sigma = 1e-4) {
p <- pmin(pmax(p, sigma), 1 - sigma)
exp(mean(y * log(p) + (1 - y) * log(1 - p)))
}
imv_coins <- function(y, p_baseline, p_enhanced) {
ll0 <- avg_ll(y, p_baseline)
ll1 <- avg_ll(y, p_enhanced)
w0 <- tryCatch(get_coin_weight(ll0), error = function(e) NA)
w1 <- tryCatch(get_coin_weight(ll1), error = function(e) NA)
list(w0 = w0, w1 = w1, imv = if (!is.na(w0) && !is.na(w1)) (w1 - w0) / w0 else NA)
}
# ── UI ───────────────────────────────────────────────────────────────────────
ui <- fluidPage(
tags$head(tags$style(HTML("
body { font-family: 'Georgia', serif; background: #fafaf8; }
.well { background: #f0ede6; border: none; border-radius: 8px; }
.flip-row { font-size: 1.3em; letter-spacing: 3px; margin: 4px 0; }
.coin-label { font-weight: bold; font-size: 1.05em; margin-top: 10px; }
.result-box { background: #fff; border: 1px solid #ddd; border-radius: 8px;
padding: 16px 20px; margin-top: 12px; }
.result-box h4 { margin-top: 0; }
.imv-positive { color: #2a7d2e; font-weight: bold; }
.imv-negative { color: #b33a1e; font-weight: bold; }
.imv-zero { color: #666; font-weight: bold; }
.reveal-box { background: #f5f0e8; border-left: 4px solid #8b6914;
padding: 12px 16px; border-radius: 4px; margin-top: 10px; }
.error-box { background: #fff5f5; border: 1px solid #f5c6cb; border-radius: 8px;
padding: 16px 20px; margin-top: 12px; }
.error-box h4 { margin-top: 0; color: #b33a1e; }
hr.thin { border-top: 1px solid #ddd; margin: 14px 0; }
.explainer-inner { background: #fafafa; border: 1px solid #e8e8e8;
border-radius: 6px; padding: 14px 16px; font-size: 0.88em; }
.stat-grid { display: grid; grid-template-columns: 1fr 1fr; gap: 8px; margin: 10px 0; }
.stat-card { background: #f0ede6; border-radius: 6px; padding: 8px 10px; }
.stat-label { font-size: 0.78em; color: #666; margin-bottom: 2px; }
.stat-val { font-size: 1.1em; font-weight: bold; }
.val-ok { color: #2a7d2e; }
.val-fail { color: #b33a1e; }
.curve-wrap { position: relative; width: 100%; height: 200px; margin: 10px 0 4px; }
"))),
titlePanel(NULL),
sidebarLayout(
sidebarPanel(width = 4,
actionButton("new_game", "🎲 New coins", class = "btn-primary btn-block",
style = "margin-bottom:14px;"),
tags$div(class = "coin-label", "Coin 1 outcomes:"),
uiOutput("flips1_ui"),
tags$div(class = "coin-label", style = "margin-top:10px;", "Coin 2 outcomes:"),
uiOutput("flips2_ui"),
tags$hr(class = "thin"),
tags$p("Enter your probability guesses:"),
sliderInput("g1", "Your guess for Coin 1 (p₁):",
min = 0.01, max = 0.99, value = 0.5, step = 0.01),
sliderInput("g2", "Your guess for Coin 2 (p₂):",
min = 0.01, max = 0.99, value = 0.5, step = 0.01),
actionButton("submit", "⚖️ Compute IMV", class = "btn-success btn-block",
style = "margin-top:6px;")
),
mainPanel(width = 8,
uiOutput("results_ui")
)
)
)
# ── Server ───────────────────────────────────────────────────────────────────
server <- function(input, output, session) {
# Reactive game state
game <- reactiveValues(
p1 = NULL, p2 = NULL,
flips1 = NULL, flips2 = NULL,
submitted = FALSE
)
# Generate new coins on button press (and on startup)
observeEvent(input$new_game, {
game$p1 <- runif(1, 0.3, 0.85)
game$p2 <- runif(1, 0.3, 0.85)
game$flips1 <- rbinom(20, 1, game$p1)
game$flips2 <- rbinom(20, 1, game$p2)
game$submitted <- FALSE
updateSliderInput(session, "g1", value = 0.5)
updateSliderInput(session, "g2", value = 0.5)
}, ignoreNULL = FALSE) # run on startup too
# Show flip sequences
fmt_flips <- function(flips) {
paste(ifelse(flips == 1, "H", "T"), collapse = " ")
}
output$flips1_ui <- renderUI({
req(game$flips1)
heads <- sum(game$flips1)
tags$div(
tags$div(class = "flip-row", fmt_flips(game$flips1)),
tags$small(style = "color:#555;",
sprintf("(%d heads, %d tails out of 20)", heads, 20 - heads))
)
})
output$flips2_ui <- renderUI({
req(game$flips2)
heads <- sum(game$flips2)
tags$div(
tags$div(class = "flip-row", fmt_flips(game$flips2)),
tags$small(style = "color:#555;",
sprintf("(%d heads, %d tails out of 20)", heads, 20 - heads))
)
})
# On submit: compute and display results
observeEvent(input$submit, {
game$submitted <- TRUE
})
output$results_ui <- renderUI({
if (!game$submitted) {
return(tags$div(
style = "color:#888; margin-top:30px; font-style:italic;",
"Adjust your probability guesses using the sliders, then click",
tags$strong("Compute IMV"), "to see your results."
))
}
req(game$flips1, game$flips2)
# Build outcome and prediction vectors over both coins combined
y_all <- c(game$flips1, game$flips2)
coin_id <- c(rep(1, 20), rep(2, 20))
# Baseline: same probability for every flip = overall mean
p_base <- mean(y_all)
p_baseline_vec <- rep(p_base, 40)
# Enhanced: user's guesses per coin
p_enhanced_vec <- ifelse(coin_id == 1, input$g1, input$g2)
res <- imv_coins(y_all, p_baseline_vec, p_enhanced_vec)
# Check which model(s) failed
baseline_failed <- is.na(res$w0)
enhanced_failed <- is.na(res$w1)
if (baseline_failed || enhanced_failed) {
# avg_ll() returns exp(mean log-likelihood), so log() recovers mean log-likelihood
ll_enhanced <- log(avg_ll(y_all, p_enhanced_vec))
ll_baseline <- log(avg_ll(y_all, p_baseline_vec))
log2_floor <- log(0.5)
gap_enh <- ll_enhanced - log2_floor
gap_base <- ll_baseline - log2_floor
failed_model <- if (baseline_failed && enhanced_failed) {
"both the baseline and your guesses are"
} else if (baseline_failed) {
"the baseline model is"
} else {
"your guesses are"
}
# Pass values into JS via data attributes on a hidden div
return(tagList(
tags$div(class = "error-box",
tags$h4("\u26a0\ufe0f Predictions too far from the data"),
tags$p(
"The IMV cannot be computed because ", failed_model,
" so inconsistent with the observed flips that they perform",
" worse than a fair coin (p\u00a0=\u00a00.5). In the weighted-coin",
" framework, no valid coin weight exists for predictions this poor."
),
tags$p("Try adjusting your guesses to be closer to the observed",
" proportion of heads for each coin."),
tags$div(class = "reveal-box",
tags$strong("Observed proportions:"),
tags$br(),
sprintf("Coin 1: %.2f heads (your guess: %.2f)",
mean(game$flips1), input$g1),
tags$br(),
sprintf("Coin 2: %.2f heads (your guess: %.2f)",
mean(game$flips2), input$g2)
),
tags$div(class = "explainer-inner", style = "margin-top: 14px;",
tags$p(
"The IMV maps every set of predictions to an equivalent weighted coin",
" via the bijection: find w \u2208 (0.5, 1) such that its Bernoulli entropy",
" H(w)\u00a0=\u00a0w\u00a0log\u00a0w\u00a0+\u00a0(1\u2212w)\u00a0log(1\u2212w)",
" equals your average log-likelihood \u2113\u0304.",
" But H(w) has a floor at H(0.5)\u00a0=\u00a0\u2212log\u00a02\u00a0\u2248\u00a0\u22120.693 —",
" the entropy of a perfectly fair coin.",
" If your \u2113\u0304 falls below this floor, the equation has no solution.",
" Your predictions are so confidently wrong they carry",
" less information than a fair coin flip."
),
tags$div(class = "stat-grid",
tags$div(class = "stat-card",
tags$div(class = "stat-label", "Your avg log-likelihood \u2113\u0304"),
tags$div(class = paste("stat-val", if (enhanced_failed) "val-fail" else "val-ok"),
sprintf("%.4f", ll_enhanced))
),
tags$div(class = "stat-card",
tags$div(class = "stat-label", "Feasibility floor \u2212log\u00a02"),
tags$div(class = "stat-val", "\u22120.6931")
),
tags$div(class = "stat-card",
tags$div(class = "stat-label", "Gap to floor"),
tags$div(class = paste("stat-val", if (enhanced_failed) "val-fail" else "val-ok"),
sprintf("%+.4f", gap_enh))
),
tags$div(class = "stat-card",
tags$div(class = "stat-label", "Baseline \u2113\u0304"),
tags$div(class = paste("stat-val", if (baseline_failed) "val-fail" else "val-ok"),
sprintf("%.4f", ll_baseline))
)
),
tags$p(style = "font-size:0.85em; color:#555; margin: 8px 0 4px;",
"The curve below shows H(w) — the Bernoulli entropy as a function of coin weight w.",
" The solid black line is the feasibility floor \u2212log\u00a02.",
" The red dashed line is your average log-likelihood \u2113\u0304.",
" A coin weight exists only when the red line meets or crosses the curve."),
tags$div(class = "curve-wrap",
tags$script(src = "https://cdnjs.cloudflare.com/ajax/libs/Chart.js/4.4.1/chart.umd.js"),
tags$canvas(id = "entropy-curve",
`aria-label` = "Entropy curve showing feasibility floor and current log-likelihood")
),
# Inline script — polls until Chart.js ready then draws
tags$script(HTML(sprintf("
(function() {
var LOG2 = Math.log(0.5);
var ll_enh = %f;
var ll_base = %f;
var failed_enh = %s;
var failed_base = %s;
function entropy(w) {
return w * Math.log(w) + (1 - w) * Math.log(1 - w);
}
function coinWeight(ll) {
if (ll < LOG2) return null;
var f = function(w) { return entropy(w) - ll; };
var lo = 0.5, hi = 1 - 1e-9;
for (var i = 0; i < 60; i++) {
var mid = (lo + hi) / 2;
if (f(mid) < 0) hi = mid; else lo = mid;
}
return (lo + hi) / 2;
}
function drawChart() {
var canvas = document.getElementById('entropy-curve');
if (!canvas || typeof Chart === 'undefined') {
setTimeout(drawChart, 50);
return;
}
var wPts = [], hPts = [];
for (var i = 0; i <= 200; i++) {
var w = 0.5 + (i / 200) * 0.499;
wPts.push(w.toFixed(3));
hPts.push(entropy(w));
}
var yMin = Math.min(-2.6, ll_enh - 0.3, ll_base - 0.1);
var datasets = [
{ label: 'Floor: \u2212log 2',
data: wPts.map(function() { return LOG2; }),
borderColor: '#111', borderWidth: 2, borderDash: [],
pointRadius: 0, fill: false },
{ label: 'H(w) \u2014 Bernoulli entropy curve', data: hPts,
borderColor: '#185fa5', borderWidth: 2,
pointRadius: 0, tension: 0.3, fill: false },
{ label: 'Your \u2113\u0304 (avg log-likelihood)',
data: wPts.map(function() { return ll_enh; }),
borderColor: '#a32d2d',
borderWidth: 2.5, borderDash: [6,4], pointRadius: 0, fill: false }
];
var w_enh = coinWeight(ll_enh);
if (w_enh !== null) {
datasets.push({
label: '\u03c9', data: [{ x: w_enh.toFixed(3), y: entropy(w_enh) }],
borderColor: 'transparent', backgroundColor: '#2a7d2e',
pointRadius: 7, showLine: false
});
}
new Chart(canvas.getContext('2d'), {
type: 'line',
data: { labels: wPts, datasets: datasets },
options: {
responsive: true, maintainAspectRatio: false, animation: false,
plugins: { legend: { display: false }, tooltip: { enabled: false } },
scales: {
x: { title: { display: true, text: 'Coin weight w', font: { size: 11 } },
ticks: { maxTicksLimit: 6,
callback: function(v,i) { return i%%40===0 ? parseFloat(wPts[i]).toFixed(2) : ''; } },
grid: { color: 'rgba(0,0,0,0.06)' } },
y: { title: { display: true, text: 'H(w)', font: { size: 11 } },
min: yMin, max: 0.05,
ticks: { callback: function(v) { return v.toFixed(1); } },
grid: { color: 'rgba(0,0,0,0.06)' } }
}
}
});
}
drawChart();
})();
", ll_enhanced, ll_baseline,
tolower(as.character(enhanced_failed)),
tolower(as.character(baseline_failed))
)))
) # explainer-inner
) # error-box
)) # tagList
} # if failed
imv_val <- res$imv
imv_class <- if (imv_val > 0.005) "imv-positive" else
if (imv_val < -0.005) "imv-negative" else "imv-zero"
imv_interp <- if (imv_val > 0.005)
"Your guesses improve on the baseline — you captured something real about the two coins."
else if (imv_val < -0.005)
"Your guesses perform worse than just using the overall average. The baseline beats you here."
else
"Your guesses and the baseline perform about the same — not much signal picked up."
tagList(
tags$div(class = "result-box",
tags$h4("📊 Results"),
tags$p(tags$strong("Baseline model:"),
sprintf("treats both coins the same (p = %.3f, implied coin weight = %.4f)",
p_base, res$w0)),
tags$p(tags$strong("Your model:"),
sprintf("Coin 1 = %.2f, Coin 2 = %.2f → implied coin weight = %.4f",
input$g1, input$g2, res$w1)),
tags$hr(class = "thin"),
tags$p(
tags$strong("IMV = "),
tags$span(class = imv_class,
sprintf("%.4f", imv_val))
),
tags$p(style = "color:#444; font-style:italic;", imv_interp),
tags$hr(class = "thin"),
tags$div(class = "reveal-box",
tags$strong("🔍 True coin weights:"),
tags$br(),
sprintf("Coin 1: p₁ = %.3f (you guessed %.2f, sample proportion = %.2f)",
game$p1, input$g1, mean(game$flips1)),
tags$br(),
sprintf("Coin 2: p₂ = %.3f (you guessed %.2f, sample proportion = %.2f)",
game$p2, input$g2, mean(game$flips2))
)
)
)
})
}
shinyApp(ui, server)
The IMV Structure
Every IMV calculation involves exactly two components: a baseline model and an enhanced model. The baseline is the simpler prediction — often just the outcome prevalence, or a model without a key predictor. The enhanced model is the one being evaluated. Both are translated into equivalent weighted coins, and the IMV is the proportional gain from betting with the enhanced coin when the baseline coin sets the odds.
The IMV is then \((w_1 - w_0) / w_0\) — the proportional gain. A positive value means the enhanced model’s coin is heavier (more informative); a negative value means the enhanced model is actually worse than the baseline, typically a sign of overfitting.
Why not R² or AUC?
The portability of the IMV comes from the fact that it is always measuring the same thing: reduction in outcome uncertainty. When the outcome prevalence is near 50%, there is substantial uncertainty in the outcome and a good model can yield a large IMV. When prevalence is extreme — near 0% or 100% — there is almost no uncertainty left to reduce, so even a perfect model yields a small IMV. This is the right behavior: the IMV is small when there is little to predict.
Standard metrics like R² and AUC do not behave this way, and their deviations from this pattern can be misleading. To see the difference, consider the logistic regression model:
\[\Pr(y=1) = \frac{1}{1 + \exp(-(b_0 + b_1 x))}\]
Here \(x\) is a predictor, \(b_1\) controls the strength of the relationship (the signal), and \(b_0\) is the intercept. Changing \(b_0\) shifts the outcome prevalence without changing how informative \(x\) is — the signal is the same, only the baseline uncertainty changes.
The simulation below fixes \(b_1\) and sweeps \(b_0\) from −3 to 3, computing IMV, McFadden’s R², and AUC at each point. The baseline model is the prevalence alone; the enhanced model is the fitted logistic regression. Each metric is averaged over 20 replications (N = 2,000 each) per \(b_0\) value. The x-axis shows both \(b_0\) and the implied prevalence.
#| '!! shinylive warning !!': |
#| shinylive does not work in self-contained HTML documents.
#| Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 480
library(shiny)
sigmoid <- function(x) 1 / (1 + exp(-x))
get_coin_weight <- function(avg_ll_geom, sigma = 1e-6) {
target <- log(avg_ll_geom)
tol <- 1e-9
if (target < log(0.5) - tol) return(NA)
if (target <= log(0.5) + tol) return(0.5)
f <- function(w) w * log(w) + (1 - w) * log(1 - w) - target
tryCatch(uniroot(f, lower = 0.5 + sigma, upper = 1 - sigma)$root,
error = function(e) NA)
}
avg_ll_geom <- function(y, p, sigma = 1e-4) {
p <- pmin(pmax(p, sigma), 1 - sigma)
exp(mean(y * log(p) + (1 - y) * log(1 - p)))
}
compute_one <- function(b0, b1, N = 2000, n_rep = 20) {
imv_v <- r2_v <- auc_v <- numeric(n_rep)
for (i in seq_len(n_rep)) {
x <- rnorm(N); p_true <- sigmoid(b0 + b1 * x)
y <- rbinom(N, 1, p_true)
x2 <- rnorm(N); p2_true <- sigmoid(b0 + b1 * x2)
y2 <- rbinom(N, 1, p2_true)
fit <- glm(y ~ x, family = binomial)
p_hat <- sigmoid(coef(fit)[1] + coef(fit)[2] * x2)
prev <- mean(y)
# IMV
w0 <- get_coin_weight(avg_ll_geom(y2, rep(prev, N)))
w1 <- get_coin_weight(avg_ll_geom(y2, p_hat))
imv_v[i] <- if (!is.na(w0) && !is.na(w1) && w0 > 0) (w1 - w0) / w0 else NA
# McFadden R2
ll_null <- sum(dbinom(y2, 1, prev, log = TRUE))
ll_full <- sum(dbinom(y2, 1, p_hat, log = TRUE))
r2_v[i] <- 1 - ll_full / ll_null
# AUC
pos <- p_hat[y2 == 1]; neg <- p_hat[y2 == 0]
auc_v[i] <- mean(outer(pos, neg, ">")) + 0.5 * mean(outer(pos, neg, "=="))
}
c(imv = mean(imv_v, na.rm = TRUE),
r2 = mean(r2_v, na.rm = TRUE),
auc = mean(auc_v, na.rm = TRUE),
prev = sigmoid(b0)) # prevalence at x=0 (mean of x dist)
}
ui <- fluidPage(
tags$head(tags$style(HTML("
body { font-family: Georgia, serif; background: #fafaf8; }
.well { background: #f0ede6; border: none; border-radius: 8px; }
"))),
sidebarLayout(
sidebarPanel(width = 3,
sliderInput("b1", HTML("Slope b<sub>1</sub><br><small>(signal strength — held fixed)</small>"),
min = 0.3, max = 2.0, value = 1.0, step = 0.1),
tags$hr(),
actionButton("run", "▶ Run simulation", class = "btn-primary btn-block"),
tags$p(style = "font-size:0.8em; color:#888; margin-top:8px;",
"Sweeps b₀ from −3 to 3 in steps of 0.25.",
" Each point averages 20 replications (N = 2,000).",
" Takes ~20 seconds.")
),
mainPanel(width = 9,
plotOutput("sweep_plot", height = "340px")
)
)
)
server <- function(input, output, session) {
sweep <- reactiveVal(NULL)
observeEvent(input$run, {
b1 <- input$b1
b0_seq <- seq(-3, 3, by = 0.25)
withProgress(message = "Sweeping b₀...", value = 0, {
res <- t(sapply(seq_along(b0_seq), function(i) {
incProgress(1/length(b0_seq),
detail = sprintf("b₀ = %.2f (prevalence ≈ %.0f%%)",
b0_seq[i], sigmoid(b0_seq[i]) * 100))
compute_one(b0_seq[i], b1)
}))
sweep(data.frame(b0 = b0_seq, res))
})
})
output$sweep_plot <- renderPlot({
req(sweep())
d <- sweep()
# x-axis labels: b0 on top, prevalence on bottom
b0_ticks <- seq(-3, 3, by = 1)
prev_ticks <- round(sigmoid(b0_ticks) * 100)
par(mfrow = c(1, 3), mar = c(5, 4, 6, 1), family = "serif")
plot_panel <- function(y, ylab, col, main) {
plot(d$b0, y, type = "l", lwd = 2.5, col = col,
xlab = "", ylab = ylab, main = main,
xaxt = "n", cex.main = 1.05, cex.lab = 0.95,
panel.first = abline(h = mean(y, na.rm=TRUE),
lty = 3, col = "gray70"))
abline(h = 0, col = "gray80")
# bottom axis: b0 values
axis(1, at = b0_ticks, labels = b0_ticks, cex.axis = 0.85)
mtext(expression(b[0]), side = 1, line = 2.2, cex = 0.85)
# top axis: prevalence
axis(3, at = b0_ticks,
labels = paste0(prev_ticks, "%"), cex.axis = 0.8, col.axis = "gray40")
mtext("Prevalence", side = 3, line = 3.5, cex = 0.8, col = "gray40")
}
plot_panel(d$imv, "IMV", "#2a7d2e", paste0("IMV (b₁ = ", input$b1, ")"))
plot_panel(d$r2, "McFadden R²", "#b33a1e", paste0("McFadden R² (b₁ = ", input$b1, ")"))
plot_panel(d$auc, "AUC", "#185fa5", paste0("AUC (b₁ = ", input$b1, ")"))
}, res = 96)
}
shinyApp(ui, server)
All three metrics vary with prevalence — but for different reasons and with different implications:
- IMV peaks near 50% prevalence and tapers toward zero at the extremes. This reflects the structure of outcome uncertainty: when the outcome is nearly certain (prevalence near 0% or 100%), there is very little uncertainty to reduce and even an excellent model earns a small IMV. This is the desired behavior — the IMV is measuring something real.
- McFadden R² also peaks near 50% but collapses much more sharply at extreme prevalences, understating model quality in a way that is hard to interpret. At very low or high prevalence it can suggest a model is nearly worthless even when it is predicting as well as the data allow.
- AUC is more stable across prevalences but can remain high at extreme prevalences for a misleading reason: when nearly all outcomes are the same value, ranking predictions correctly is easy even for an uninformative model.
The dotted horizontal line in each panel marks the metric’s mean across the sweep. The key comparison is not the shape of any single curve but what happens when you try to compare two datasets with different prevalences: an IMV of 0.05 means the same thing regardless of the dataset’s prevalence; an R² of 0.05 does not.
The interactive examples below are organized into two groups. If you are new to the IMV, we recommend working through the binary prediction examples first.
Binary Prediction
These three examples all use logistic regression and are designed to build intuition progressively.
IRT Examples
These examples apply the IMV to item response theory models, where the quantity of interest is how well an IRT model predicts item responses relative to a simpler baseline.
Computational Examples
These examples show how to use the imv R package to compute IMV values in real-life data analysis scenarios. The package is available on CRAN and can be installed with install.packages("imv"). Each example uses a publicly available dataset and walks through model fitting, IMV computation, and interpretation of results.
How to Interpret IMV Values
Because the IMV is defined relative to a baseline, its absolute magnitude depends on the comparison being made. Some reference points from published work:
- IMV ≈ 0: The enhanced model offers no predictive improvement over the baseline. This can indicate either that the extra predictors are uninformative or, in small samples, that the model is overfitting.
- IMV > 0, small (e.g., 0.001–0.01): Modest but potentially meaningful improvement. Typical range for comparing closely related IRT models (e.g., 2PL vs. 3PL, or 1PL vs. 2PL on well-behaved items).
- IMV > 0, moderate (e.g., 0.01–0.05): Substantively meaningful improvement. Typical range when a new predictor explains a real portion of variance.
- IMV < 0: The enhanced model performs worse than the baseline on new data. A diagnostic flag for overfitting or model misspecification.
These benchmarks are discussed in more depth, with simulation-based reference distributions, in the Psychometrika paper below.
References
Contact & Computing Resources
Additional code and computing resources are available on GitHub.