IRT versus Cognitive Diagnostic Models
Cognitive Diagnostic Models (CDMs) classify respondents into discrete latent attribute profiles — binary mastery/non-mastery states for each of \(K\) skills — and predict item responses from those profiles. Item Response Theory (IRT) instead places respondents on a continuous latent trait \(\theta\). When the true data-generating process is a CDM, which model predicts better?
The answer depends on the correlation among attributes. When skills are independent, a respondent’s profile on each attribute carries unique information that a unidimensional score cannot capture, and the CDM has a genuine advantage. As skills become more correlated, the attribute profiles increasingly collapse toward a single dimension, and IRT becomes competitive.
Simulation design. Data are generated from a DINA model (Deterministic Input, Noisy And-gate) with \(K\) skills and \(J\) items structured according to a Q-matrix. Slip and guess parameters are sampled independently for each item from Uniform(0, 0.3). Skills are drawn from a multivariate normal and thresholded to binary, with the inter-skill correlation \(\rho\) controlling how tightly the skills co-vary. At \(\rho = 0\) the skills are independent; at \(\rho = 0.7\) they are strongly correlated. The simulation sweeps across a grid of evenly-spaced \(\rho\) values between these extremes — more grid points produce a smoother curve but take longer to run.
Evaluation scheme. To ensure all IMV comparisons are out-of-sample, three independent datasets are used:
- Training set (\(N\) persons): used to fit the 2PL IRT model and estimate item parameters (discrimination and difficulty).
- Test set (200 persons): used to estimate person-level latent variables — IRT \(\hat\theta\) via MLE and CDM attribute profiles via MAP — applying the item parameters from step 1.
- Evaluation set (200 persons, same attribute profiles as the test set, independent response draws): all IMV quantities are scored here. Because the evaluation responses are independent of the data used in steps 1 and 2, MAP and MLE cannot inflate their own scores by fitting to the evaluation outcomes.
The three IMV quantities reported are:
- IMV(IRT, CDM) — positive values mean CDM predicts better; negative means IRT predicts better
- IMV(IRT, True) — oracle gap for IRT: how far fitted IRT predictions are from the true generating probabilities
- IMV(CDM, True) — oracle gap for CDM: how far MAP-based CDM predictions are from the true probabilities
Both oracle gaps should be non-negative: no fitted model should systematically beat the true probabilities on held-out data.
Based on work by Ma, Liu, Kanopka, Ma, & Domingue (2025).
#| '!! shinylive warning !!': |
#| shinylive does not work in self-contained HTML documents.
#| Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 1100
webr::install("ltm", repos = c("https://cran.r-universe.dev", "https://repo.r-wasm.org"))
webr::install("mvtnorm", repos = c("https://repo.r-wasm.org"))
library(shiny)
library(ltm)
library(mvtnorm)
# ---- IMV ----------------------------------------------------------------
imv.binary <- function(y, p1, p2, sigma = 1e-4) {
p1 <- pmin(pmax(p1, sigma), 1 - sigma)
p2 <- pmin(pmax(p2, sigma), 1 - sigma)
ll <- function(x, p) exp(sum(log(p) * x + log(1 - p) * (1 - x)) / length(x))
coins <- function(a) nlminb(0.5, function(p, a) abs(p*log(p) + (1-p)*log(1-p) - log(a)),
lower = 1e-4, upper = 1 - 1e-4, a = a)$par
c1 <- coins(ll(y, p1)); c2 <- coins(ll(y, p2))
(c2 - c1) / c1
}
# ---- Build simulation parameters from a Q matrix ------------------------
make_params <- function(Q) {
K <- ncol(Q)
J <- nrow(Q)
slip <- runif(J, 0, 0.3)
guess <- runif(J, 0, 0.3)
all_profiles <- as.matrix(expand.grid(rep(list(0:1), K)))
n_prof <- nrow(all_profiles)
pp <- matrix(NA_real_, n_prof, J)
for (p in seq_len(n_prof))
for (j in seq_len(J)) {
req <- Q[j, ] == 1
eta <- if (!any(req)) 1 else prod(all_profiles[p, req])
pp[p, j] <- eta * (1 - slip[j]) + (1 - eta) * guess[j]
}
list(K = K, J = J, slip = slip, guess = guess, Q = Q,
all_profiles = all_profiles,
log_pp = log(pmax(pp, 1e-8)),
log_1mpp = log(pmax(1 - pp, 1e-8)),
pp = pp)
}
# ---- Random Q-matrix generator ------------------------------------------
gen_random_Q <- function(K, J, density) {
for (i in seq_len(200)) {
Q <- matrix(rbinom(J * K, 1L, density), nrow = J, ncol = K)
if (all(rowSums(Q) > 0) && all(colSums(Q) > 0)) return(Q)
}
# Fallback: guarantee one item per skill, fill remaining rows randomly
Q <- matrix(0L, J, K)
perm <- sample(K)
for (k in seq_len(K)) Q[k, perm[k]] <- 1L
if (J > K) {
for (j in (K + 1):J) {
q <- rbinom(K, 1L, density)
if (sum(q) == 0L) q[sample(K, 1L)] <- 1L
Q[j, ] <- q
}
}
Q
}
# ---- Custom Q-matrix parser ---------------------------------------------
parse_Q <- function(text) {
lines <- strsplit(trimws(text), "\n")[[1]]
lines <- trimws(lines)
lines <- lines[nchar(lines) > 0]
if (length(lines) == 0) return(list(Q = NULL, error = "Empty input."))
rows <- lapply(lines, function(l) {
parts <- strsplit(l, "[,[:space:]]+")[[1]]
parts <- parts[nchar(parts) > 0]
suppressWarnings(as.integer(parts))
})
lens <- lengths(rows)
if (length(unique(lens)) != 1)
return(list(Q = NULL,
error = sprintf("Rows have unequal lengths (%s).", paste(lens, collapse = ", "))))
if (unique(lens) < 1)
return(list(Q = NULL, error = "Could not parse any values."))
Q <- do.call(rbind, rows)
if (anyNA(Q))
return(list(Q = NULL, error = "Non-integer values found."))
if (!all(Q %in% c(0L, 1L)))
return(list(Q = NULL, error = "Matrix must contain only 0s and 1s."))
if (any(rowSums(Q) == 0))
return(list(Q = NULL, error = "Some rows are all zeros — each item must require at least one skill."))
if (any(colSums(Q) == 0))
return(list(Q = NULL, error = "Some columns are all zeros — each skill must appear in at least one item."))
if (nrow(Q) < 4)
return(list(Q = NULL, error = "Need at least 4 items (rows) for IRT estimation."))
list(Q = Q, error = NULL)
}
# ---- Simulation helpers -------------------------------------------------
gen_attrs <- function(N, rho, K) {
Sigma <- matrix(rho, K, K); diag(Sigma) <- 1
Z <- rmvnorm(N, sigma = Sigma)
(Z > 0) * 1L
}
dina_prob <- function(alpha, pr) {
N <- nrow(alpha)
probs <- matrix(NA_real_, N, pr$J)
for (j in seq_len(pr$J)) {
req <- pr$Q[j, ] == 1
eta <- if (!any(req)) rep(1, N) else apply(alpha[, req, drop = FALSE], 1, prod)
probs[, j] <- eta * (1 - pr$slip[j]) + (1 - eta) * pr$guess[j]
}
probs
}
map_dina <- function(y_mat, pr) {
N <- nrow(y_mat)
y_t <- t(y_mat)
alpha_hat <- matrix(NA_integer_, N, pr$K)
for (i in seq_len(N)) {
ll <- pr$log_pp %*% y_t[, i] + pr$log_1mpp %*% (1L - y_t[, i])
alpha_hat[i, ] <- pr$all_profiles[which.max(ll), ]
}
alpha_hat
}
get_theta_mle <- function(yi, a_vec, b_vec) {
nll <- function(theta) {
p <- pmin(pmax(1 / (1 + exp(-a_vec * (theta - b_vec))), 1e-8), 1 - 1e-8)
-sum(yi * log(p) + (1 - yi) * log(1 - p))
}
tryCatch(optimize(nll, interval = c(-5, 5))$minimum, error = function(e) 0)
}
# ---- One simulation at a given rho --------------------------------------
sim_one <- function(N_train, rho, pr) {
N_test <- 200L
a_tr <- gen_attrs(N_train, rho, pr$K)
y_tr <- matrix(rbinom(N_train * pr$J, 1L, dina_prob(a_tr, pr)), N_train, pr$J)
fit_irt <- tryCatch(
ltm(as.data.frame(y_tr) ~ z1, control = list(iter.em = 30, iter.qN = 50)),
error = function(e) NULL
)
if (is.null(fit_irt)) return(NULL)
ipar <- coef(fit_irt)
b_vec <- ipar[, "Dffclt"]
a_vec <- ipar[, "Dscrmn"]
a_te <- gen_attrs(N_test, rho, pr$K)
p_true <- dina_prob(a_te, pr)
y_te <- matrix(rbinom(N_test * pr$J, 1L, p_true), N_test, pr$J)
theta_te <- apply(y_te, 1, get_theta_mle, a_vec = a_vec, b_vec = b_vec)
p_irt <- matrix(NA_real_, N_test, pr$J)
for (j in seq_len(pr$J))
p_irt[, j] <- 1 / (1 + exp(-a_vec[j] * (theta_te - b_vec[j])))
p_cdm <- dina_prob(map_dina(y_te, pr), pr)
y_eval <- matrix(rbinom(N_test * pr$J, 1L, p_true), N_test, pr$J)
y_v <- as.vector(y_eval)
pirt_v <- as.vector(p_irt)
pcdm_v <- as.vector(p_cdm)
ptru_v <- as.vector(p_true)
c(
irt_cdm = imv.binary(y_v, pirt_v, pcdm_v),
irt_true = imv.binary(y_v, pirt_v, ptru_v),
cdm_true = imv.binary(y_v, pcdm_v, ptru_v)
)
}
# ---- Default custom Q text (K=3, J=9, identity structure) ---------------
default_qmat <- paste(
"1 0 0 0 0", "0 1 0 0 0", "0 0 1 0 0", "0 0 0 1 0", "0 0 0 0 1",
"1 0 0 0 0", "0 1 0 0 0", "0 0 1 0 0", "0 0 0 1 0", "0 0 0 0 1",
"1 1 0 0 0", "1 0 1 0 0", "1 0 0 1 0", "1 0 0 0 1", "0 1 1 0 0",
"0 1 0 1 0", "0 1 0 0 1", "0 0 1 1 0", "0 0 1 0 1", "0 0 0 1 1",
"1 1 1 0 0", "1 1 0 1 0", "1 1 0 0 1", "1 0 1 1 0", "1 0 1 0 1",
"1 0 0 1 1", "0 1 1 1 0", "0 1 1 0 1", "0 1 0 1 1", "0 0 1 1 1",
sep = "\n"
)
# ---- UI -----------------------------------------------------------------
ui <- fluidPage(
titlePanel("IRT vs CDM: IMV as a Function of Skill Correlation"),
sidebarLayout(
sidebarPanel(
radioButtons("qmat_mode", "Q-matrix specification:",
choices = c("Random (density)" = "random", "Custom (paste)" = "custom"),
selected = "random", inline = TRUE),
conditionalPanel(condition = "input.qmat_mode == 'random'",
sliderInput("K", "Number of skills (K):", min = 2, max = 6, value = 3, step = 1),
sliderInput("J", "Number of items (J):", min = 4, max = 45, value = 9, step = 1),
sliderInput("density", "Q-matrix density:",
min = 0.1, max = 0.9, value = 0.33, step = 0.05),
helpText("Density = probability each item loads on each skill.")
),
conditionalPanel(condition = "input.qmat_mode == 'custom'",
textAreaInput("qmat_text",
HTML("Q-matrix: rows = items, cols = skills<br/>(space or comma separated, 0s and 1s only)"),
value = default_qmat,
rows = 8, width = "100%")
),
verbatimTextOutput("qmat_status"),
hr(),
selectInput("N_train", "Training sample size (N):",
choices = c("250" = 250, "500" = 500, "1000" = 1000),
selected = 500),
sliderInput("n_rho", "Skill correlation (ρ) grid points:",
min = 5, max = 50, value = 10, step = 5),
helpText("Simulation is run at this many evenly-spaced inter-skill correlation values from ρ = 0 to ρ = 0.7."),
numericInput("seed", "Random seed:", value = 42, min = 1, max = 99999, step = 1),
helpText("Seed controls Q-matrix generation, item slip/guess draws (each from Uniform(0, 0.3)), and all simulation draws."),
actionButton("run", "Run Simulation", class = "btn-primary"),
hr(),
fluidRow(
column(6, numericInput("ymin", "Y-axis min:", value = -0.2, step = 0.05)),
column(6, numericInput("ymax", "Y-axis max:", value = 0.5, step = 0.05))
),
helpText(
"First run installs ltm (~30s). Run time scales with J, N, and the number of ρ values."
)
),
mainPanel(
plotOutput("imvPlot", height = "520px"),
verbatimTextOutput("usedQ"),
verbatimTextOutput("resultTable")
)
)
)
# ---- Server -------------------------------------------------------------
server <- function(input, output, session) {
output$qmat_status <- renderPrint({
if (input$qmat_mode == "random") {
K <- as.integer(input$K)
J <- as.integer(input$J)
d <- input$density
if (J < K) {
cat(sprintf("Error: J (%d) must be ≥ K (%d).\n", J, K))
} else {
p_multi <- 1 - (1 - d)^K - K * d * (1 - d)^(K - 1)
cat(sprintf("Q: %d items × %d skills | density: %.2f\n", J, K, d))
cat(sprintf("Expected multi-skill items: %.1f of %d (%.0f%%)\n",
p_multi * J, J, 100 * p_multi))
}
} else {
qr <- parse_Q(input$qmat_text)
if (!is.null(qr$error)) {
cat("Error:", qr$error, "\n")
} else {
Q <- qr$Q
cat(sprintf("Q: %d items × %d skills\n", nrow(Q), ncol(Q)))
cat(sprintf("Density: %.2f | Multi-skill items: %d of %d\n",
mean(Q), sum(rowSums(Q) > 1), nrow(Q)))
}
}
})
results <- eventReactive(input$run, {
set.seed(as.integer(input$seed))
if (input$qmat_mode == "random") {
K <- as.integer(input$K)
J <- as.integer(input$J)
if (J < K) {
showNotification(sprintf("J (%d) must be ≥ K (%d).", J, K), type = "error")
return(NULL)
}
Q <- gen_random_Q(K, J, input$density)
} else {
qr <- parse_Q(input$qmat_text)
if (!is.null(qr$error)) {
showNotification(paste("Q-matrix error:", qr$error), type = "error")
return(NULL)
}
Q <- qr$Q
}
pr <- make_params(Q)
N_train <- as.integer(input$N_train)
rho_vals <- seq(0, 0.7, length.out = as.integer(input$n_rho))
out <- matrix(NA_real_, length(rho_vals), 3,
dimnames = list(NULL, c("irt_cdm", "irt_true", "cdm_true")))
withProgress(message = "Simulating…", value = 0, {
for (i in seq_along(rho_vals)) {
incProgress(1 / length(rho_vals),
detail = sprintf("ρ = %.2f (%d of %d)", rho_vals[i], i, length(rho_vals)))
res <- sim_one(N_train, rho_vals[i], pr)
if (!is.null(res)) out[i, ] <- res
}
})
list(rho = rho_vals, imv = out, K = pr$K, J = pr$J, Q = pr$Q, N_train = N_train)
})
output$imvPlot <- renderPlot({
req(results())
r <- results()
rho <- r$rho
imv <- r$imv
ylim <- c(input$ymin, input$ymax)
plot(rho, imv[, "irt_cdm"],
type = "n",
xlab = expression(paste("Skill inter-correlation (", rho, ")")),
ylab = "IMV",
main = sprintf("IMV vs Skill Correlation (N = %d, K = %d, J = %d)",
r$N_train, r$K, r$J),
ylim = ylim, las = 1)
grid(lty = 3, col = "grey85")
abline(h = 0, lty = 2, col = "grey40")
sm <- function(y) {
ok <- !is.na(y)
if (sum(ok) < 4) return(rep(NA, length(y)))
predict(loess(y[ok] ~ rho[ok], span = 0.75), newdata = rho)
}
lines(rho, sm(imv[, "irt_true"]), col = "steelblue", lwd = 2, lty = 2)
lines(rho, sm(imv[, "cdm_true"]), col = "tomato", lwd = 2, lty = 2)
lines(rho, sm(imv[, "irt_cdm"]), col = "black", lwd = 2.5)
points(rho, imv[, "irt_true"], col = "steelblue", pch = 16, cex = 0.9)
points(rho, imv[, "cdm_true"], col = "tomato", pch = 16, cex = 0.9)
points(rho, imv[, "irt_cdm"], col = "black", pch = 16, cex = 1.1)
legend("topright", bty = "n",
legend = c("IMV(IRT, CDM) [main]",
"IMV(IRT, True) [oracle IRT]",
"IMV(CDM, True) [oracle CDM]"),
col = c("black", "steelblue", "tomato"),
lwd = c(2.5, 2, 2), lty = c(1, 2, 2), pch = 16,
cex = 0.9)
})
output$usedQ <- renderPrint({
req(results())
r <- results()
cat(sprintf("Q-matrix used (%d items × %d skills):\n", r$J, r$K))
Q <- r$Q
colnames(Q) <- paste0("s", seq_len(r$K))
rownames(Q) <- paste0("i", seq_len(r$J))
print(Q)
})
output$resultTable <- renderPrint({
req(results())
r <- results()
df <- data.frame(
rho = round(r$rho, 2),
IRT_CDM = round(r$imv[, "irt_cdm"], 4),
IRT_True = round(r$imv[, "irt_true"], 4),
CDM_True = round(r$imv[, "cdm_true"], 4)
)
print(df, row.names = FALSE)
})
}
shinyApp(ui = ui, server = server)
What you should see:
- At low \(\rho\) (near 0): skills are independent, so each person’s profile across attributes carries unique information that a unidimensional IRT score cannot capture. IMV(IRT, CDM) is positive and the oracle gap for IRT (blue) is larger than for CDM (red) — IRT is further from the truth.
- As \(\rho\) increases toward 0.7: the attribute profiles increasingly collapse to a single dimension. IRT, which estimates exactly that dimension, becomes more competitive. IMV(IRT, CDM) falls toward zero or below, and the oracle gaps converge.
- The oracle CDM gap (red) is generally smaller than the oracle IRT gap (blue) when the true model is DINA — the CDM is better calibrated to the data-generating process. This advantage shrinks as \(\rho\) grows.
- With more skills (\(K=4\)), the CDM has more structure to capture, and the advantage over IRT at low \(\rho\) may be more pronounced.