Multidimensional IRT and the Cost of Dimensionality Misspecification

When item responses are generated by a multidimensional IRT model but scored using a unidimensional model, how much predictive accuracy is lost? The answer depends on two things: how correlated the latent dimensions are, and how items load on those dimensions.

This simulation generates data from a compensatory 2D-2PL model,

\[P(Y_{ij}=1\mid\theta_{1i},\theta_{2i}) = \frac{1}{1+\exp\!\bigl(-(a_{1j}\,\theta_{1i}+a_{2j}\,\theta_{2i}+d_j)\bigr)},\]

and fits a unidimensional 2PL to recover person scores. The dimensions \((\theta_1,\theta_2)\) are drawn from a bivariate standard normal with inter-dimension correlation \(\rho\), which is varied across the x-axis.

Loading balance (M). Each item has one randomly assigned dominant dimension. Base loadings \(a_{1,j}\) and \(a_{2,j}\) are drawn independently from Uniform(0.5, 1.5). If dimension 1 is dominant, the item’s loadings are \((a_{1,j},\; M \times a_{2,j})\); if dimension 2 is dominant, they are \((M \times a_{1,j},\; a_{2,j})\).

\(M=0\): between-item multidimensionality — each item loads on exactly one dimension; the test is a mixture of two unidimensional subtests.
\(M=1\): within-item multidimensionality — every item loads on both dimensions, with independently drawn loadings for the two dimensions.

Evaluation scheme. Item parameters are drawn once per run and held fixed across the \(\rho\) sweep. Three independent datasets are used at each \(\rho\):

Training (\(N\) persons): fit the 1D-2PL and estimate item parameters.
Test (200 persons): estimate each person’s \(\hat\theta\) (MLE) using the fitted item parameters; compute 1D predicted probabilities.
Evaluation (200 persons, same attribute profiles, independent responses): score IMV(1D, True) here to prevent the MLE from inflating its own score.

The y-axis, IMV(1D-2PL, True), measures how far the fitted unidimensional predictions are from the true 2D generating probabilities. It is always \(\geq 0\) in expectation; larger values indicate greater cost from ignoring the second dimension.

Note on relationship to Domingue et al. (2024, Psychometrika). This simulation measures the oracle gap — IMV(1D-2PL, True) — how far fitted unidimensional predictions fall from the true multidimensional generating probabilities. The Psychometrika paper instead reports IMV(2PL, 2F-2PL): a head-to-head comparison of a fitted unidimensional 2PL against a fitted two-factor 2PL on empirical data (Figure 4). These address related but distinct questions. The oracle gap asks how much information does the 1D model lose relative to perfect knowledge of the 2D truth? The paper’s comparison asks how much does estimating a second factor recover in practice? The latter is bounded by estimation noise and sample size in ways the oracle is not.

#| '!! shinylive warning !!': |
#|   shinylive does not work in self-contained HTML documents.
#|   Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 940

webr::install("ltm",     repos = c("https://cran.r-universe.dev", "https://repo.r-wasm.org"))
webr::install("mvtnorm", repos = c("https://repo.r-wasm.org"))

library(shiny)
library(ltm)
library(mvtnorm)

# ---- IMV ----------------------------------------------------------------
imv.binary <- function(y, p1, p2, sigma = 1e-4) {
  p1 <- pmin(pmax(p1, sigma), 1 - sigma)
  p2 <- pmin(pmax(p2, sigma), 1 - sigma)
  ll <- function(x, p) exp(sum(log(p) * x + log(1 - p) * (1 - x)) / length(x))
  coins <- function(a) nlminb(0.5, function(p, a) abs(p*log(p) + (1-p)*log(1-p) - log(a)),
                              lower = 1e-4, upper = 1 - 1e-4, a = a)$par
  c1 <- coins(ll(y, p1)); c2 <- coins(ll(y, p2))
  (c2 - c1) / c1
}

# ---- Item parameters (drawn once per run) -------------------------------
make_item_params <- function(J, M) {
  a1_base  <- runif(J, 0.5, 1.5)
  a2_base  <- runif(J, 0.5, 1.5)
  dominant <- sample(c(1L, 2L), J, replace = TRUE)
  a1 <- ifelse(dominant == 1L, a1_base, M * a1_base)
  a2 <- ifelse(dominant == 2L, a2_base, M * a2_base)
  d  <- rnorm(J, 0, 0.5)
  list(J = J, a1 = a1, a2 = a2, d = d)
}

# ---- True 2D-2PL probabilities ------------------------------------------
true_prob_2d <- function(theta1, theta2, ipar) {
  N <- length(theta1)
  p <- matrix(NA_real_, N, ipar$J)
  for (j in seq_len(ipar$J))
    p[, j] <- 1 / (1 + exp(-(ipar$a1[j]*theta1 + ipar$a2[j]*theta2 + ipar$d[j])))
  p
}

# ---- Person generation --------------------------------------------------
gen_persons <- function(N, rho) {
  Sigma <- matrix(c(1, rho, rho, 1), 2, 2)
  Z <- rmvnorm(N, sigma = Sigma)
  list(theta1 = Z[, 1], theta2 = Z[, 2])
}

# ---- 1D MLE theta -------------------------------------------------------
get_theta_mle <- function(yi, a_vec, b_vec) {
  nll <- function(theta) {
    p <- pmin(pmax(1 / (1 + exp(-a_vec * (theta - b_vec))), 1e-8), 1 - 1e-8)
    -sum(yi * log(p) + (1 - yi) * log(1 - p))
  }
  tryCatch(optimize(nll, interval = c(-5, 5))$minimum, error = function(e) 0)
}

# ---- One simulation at a given rho --------------------------------------
sim_one <- function(N_train, rho, ipar) {
  J      <- ipar$J
  N_test <- 200L

  # Step 1: training data — fit 1D-2PL
  pr_tr <- gen_persons(N_train, rho)
  p_tr  <- true_prob_2d(pr_tr$theta1, pr_tr$theta2, ipar)
  y_tr  <- matrix(rbinom(N_train * J, 1L, p_tr), N_train, J)

  fit <- tryCatch(
    ltm(as.data.frame(y_tr) ~ z1, control = list(iter.em = 30, iter.qN = 50)),
    error = function(e) NULL
  )
  if (is.null(fit)) return(NA_real_)

  ipar1d <- coef(fit)
  b_vec  <- ipar1d[, "Dffclt"]
  a_vec  <- ipar1d[, "Dscrmn"]

  # Step 2: test data — estimate 1D theta per person
  pr_te  <- gen_persons(N_test, rho)
  p_true <- true_prob_2d(pr_te$theta1, pr_te$theta2, ipar)
  y_te   <- matrix(rbinom(N_test * J, 1L, p_true), N_test, J)

  theta_hat <- apply(y_te, 1, get_theta_mle, a_vec = a_vec, b_vec = b_vec)
  p_1d <- matrix(NA_real_, N_test, J)
  for (j in seq_len(J))
    p_1d[, j] <- 1 / (1 + exp(-a_vec[j] * (theta_hat - b_vec[j])))

  # Step 3: independent evaluation draw
  y_eval <- matrix(rbinom(N_test * J, 1L, p_true), N_test, J)

  imv.binary(as.vector(y_eval), as.vector(p_1d), as.vector(p_true))
}

# ---- UI -----------------------------------------------------------------
ui <- fluidPage(
  titlePanel("Multidimensional IRT: Cost of Unidimensional Scoring"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("M", "Loading balance (M):",
                  min = 0, max = 1, value = 0, step = 0.1),
      helpText(
        "M = 0: between-item (each item loads on one dimension only).",
        "M = 1: within-item (both dimensions equally weighted per item)."
      ),
      selectInput("J", "Number of items (J):",
                  choices = c("10" = 10, "20" = 20, "30" = 30),
                  selected = 20),
      selectInput("N_train", "Training sample size (N):",
                  choices = c("250" = 250, "500" = 500, "1000" = 1000),
                  selected = 500),
      sliderInput("n_rho", "Number of ρ values:",
                  min = 5, max = 50, value = 10, step = 5),
      actionButton("run", "Run Simulation", class = "btn-primary"),
      hr(),
      fluidRow(
        column(6, numericInput("ymin", "Y-axis min:", value = -0.2, step = 0.05)),
        column(6, numericInput("ymax", "Y-axis max:", value = 0.5, step = 0.05))
      ),
      helpText(
        "First run installs ltm (~30s). Each ρ value fits a 2PL IRT model; ",
        "run time scales with the number of ρ values. More points clarify ",
        "the trend but 50 values may take 10+ minutes."
      )
    ),
    mainPanel(
      plotOutput("imvPlot", height = "500px"),
      verbatimTextOutput("resultTable")
    )
  )
)

# ---- Server -------------------------------------------------------------
server <- function(input, output, session) {

  results <- eventReactive(input$run, {
    J        <- as.integer(input$J)
    M        <- input$M
    N_train  <- as.integer(input$N_train)
    rho_vals <- seq(0, 0.9, length.out = as.integer(input$n_rho))

    # Draw item parameters once for the whole sweep
    ipar <- make_item_params(J, M)
    imv_vals <- numeric(length(rho_vals))

    withProgress(message = "Simulating…", value = 0, {
      for (i in seq_along(rho_vals)) {
        incProgress(1 / length(rho_vals),
                    detail = sprintf("ρ = %.2f (%d of %d)", rho_vals[i], i, length(rho_vals)))
        imv_vals[i] <- sim_one(N_train, rho_vals[i], ipar)
      }
    })

    list(rho = rho_vals, imv = imv_vals, M = M, N = N_train, J = J)
  })

  output$imvPlot <- renderPlot({
    req(results())
    r   <- results()
    rho <- r$rho
    imv <- r$imv

    ylim <- c(input$ymin, input$ymax)

    plot(rho, imv,
         type = "n",
         xlab = expression(paste("Inter-dimension correlation (", rho, ")")),
         ylab = "IMV(1D-2PL, True)",
         main = sprintf(
           "Cost of Unidimensional Scoring  (N = %d, J = %d, M = %.1f)",
           r$N, r$J, r$M),
         ylim = ylim, las = 1)
    grid(lty = 3, col = "grey85")
    abline(h = 0, lty = 2, col = "grey40")

    ok <- !is.na(imv)
    if (sum(ok) >= 4) {
      sm <- predict(loess(imv[ok] ~ rho[ok], span = 0.75), newdata = rho)
      lines(rho, sm, col = "steelblue", lwd = 2.5)
    }
    points(rho, imv, col = "steelblue", pch = 16, cex = 1.1)
  })

  output$resultTable <- renderPrint({
    req(results())
    r <- results()
    df <- data.frame(rho = round(r$rho, 2), IMV_1D_True = round(r$imv, 4))
    print(df, row.names = FALSE)
  })
}

shinyApp(ui = ui, server = server)

What you should see:

IMV(1D-2PL, True) decreases as \(\rho\) increases. When the two dimensions are highly correlated, they are nearly interchangeable, and the unidimensional model captures most of the predictive signal. As \(\rho \to 1\), the cost of ignoring the second dimension vanishes.
M modulates the effect. At \(M=0\) (between-item), each item loads on exactly one dimension; the 1D model must bridge two independent subtests, so the cost of ignoring the second dimension is largest — and decreases most sharply as \(\rho\) increases. At \(M=1\) (within-item), every item loads on both dimensions with similar magnitudes, so the 1D sum-score closely approximates the true 2D probabilities and IMV is small and nearly flat across \(\rho\).
Larger \(N\) gives a smoother, more stable curve but does not shift it — sample size affects estimation quality, not the underlying cost of misspecification.

← Back to Home