The InterModel Vigorish (IMV)

The collapse of the thresholded IMV

The IMV for binary outcomes is well-established, but many measurement instruments use ordered polytomous responses — rating scales, Likert items, graded assessments. The thresholded IMV (denoted \(\omega_t\)) extends the IMV framework to these multi-category items by applying the binary IMV at each response threshold and combining the results. A natural question is whether this extension is coherent: does \(\omega_t\) reduce to the standard binary IMV when the polytomous item effectively becomes binary?
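In the implementation used below, \(\omega_t\) is a prevalence-weighted average of binary IMVs, one per cumulative threshold. Writing \(\omega_k\) for the binary IMV of the dichotomization \(x \le k\) versus \(x > k\), and \(\pi_k\) for the prevalence of category \(k\), the combination is

\[
\omega_t = \frac{\sum_{k=0}^{K-2} \pi_k\, \omega_k}{\sum_{k=0}^{K-2} \pi_k}.
\]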

The answer is yes, and this page demonstrates it. We consider the graded response model (GRM) with \(K=3\) ordered categories and two threshold parameters \(b_0 < b_1\). As the second threshold \(b_1\) decreases toward \(b_0\), the middle category disappears and the item collapses to a binary response. The thresholded IMV tracks this collapse and converges to the binary IMV based on the corresponding dichotomization — confirming that \(\omega_t\) is a coherent generalization.
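To see why the middle category disappears, it helps to write out the GRM category probabilities directly. The following is a minimal sketch (separate from the app code below; the helper name `grm_probs` is ours, not part of the app):

```r
# GRM category probabilities for a single person with ability theta,
# discrimination a, and thresholds b0 < b1 (K = 3 categories).
grm_probs <- function(theta, a, b0, b1) {
  p_ge1 <- 1 / (1 + exp(-a * (theta - b0)))  # P(X >= 1)
  p_ge2 <- 1 / (1 + exp(-a * (theta - b1)))  # P(X >= 2)
  c(p0 = 1 - p_ge1, p1 = p_ge1 - p_ge2, p2 = p_ge2)
}

grm_probs(0, 1, 0, 1)  # b1 > b0: middle category has positive mass
grm_probs(0, 1, 0, 0)  # b1 = b0: p1 = 0, the item is effectively binary
```

Because \(P(X = 1) = P^*(b_0) - P^*(b_1)\), setting \(b_1 = b_0\) makes the two cumulative curves coincide and the middle category's probability is exactly zero for every ability level.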

The plot shows three IMV series as a function of the second threshold \(b_1\):

  • \(\omega_t\) (black): the thresholded IMV for the full 3-category GRM item, comparing IRT-based predictions against the category prevalence baseline
  • \(\omega_{x'}\) (red): the binary IMV for the \(x > 0\) dichotomization (collapsing categories 1 and 2 together)
  • \(\omega_{x''}\) (blue): the binary IMV for the \(x > 1\) dichotomization (collapsing categories 0 and 1 together)

As \(b_1 \to b_0\) (left side of the plot), all three series converge — the polytomous and binary IMVs become equivalent, as they should when the item is effectively binary.

Notes:

  • We consider here only a single item and treat abilities as known.
  • The IMVs are comparisons of the category prevalences (baseline) against IRT-based predictions (enhanced model).
  • At the far left of the panel, all of the IMV values converge to a common value — this collapse is the key theoretical property of \(\omega_t\).

library(shiny)

generate_grm_data <- function(th, a, b) {
  # --- 1. Input Validation and Setup ---
  # Ensure b is sorted, which is a requirement for the GRM
  if (is.unsorted(b)) {
    warning("Threshold parameters 'b' should be sorted for the GRM. Sorting them now.")
    b <- sort(b)
  }
  
  n <- length(th)
  num_categories <- length(b) + 1
  
  # --- 2. Calculate Cumulative Probabilities ---
  # P*(k) = P(Response >= k | th), for k = 1, ..., K-1
  # This uses the 2-Parameter Logistic (2PL) model formula for each threshold.
  # The `outer` function efficiently calculates this for all combinations of th and b,
  # resulting in an n x (K-1) matrix.
  p_star <- outer(th, b, function(theta, b_k) {
    1 / (1 + exp(-a * (theta - b_k)))
  })
  
  # Ensure p_star is a matrix, even if n=1 or length(b)=1
  if (!is.matrix(p_star)) {
    p_star <- matrix(p_star, nrow = n)
  }
  
  # --- 3. Calculate Category Probabilities ---
  # P(Response = k) is calculated from the cumulative probabilities.
  # To do this efficiently, we augment the p_star matrix.
  # P(Response >= 0) is always 1.
  # P(Response >= K) is always 0.
  p_star_augmented <- cbind(1, p_star, 0)
  
  # Category prob P(k) = P*(k) - P*(k+1). This vectorized subtraction
  # calculates all category probabilities at once.
  # The result is an n x K matrix of probabilities.
  prob_matrix <- p_star_augmented[, 1:num_categories] - p_star_augmented[, 2:(num_categories + 1)]
  
  # Due to floating point precision, very small negative numbers can occur.
  # We set them to 0 and re-normalize each row to sum to 1.
  prob_matrix[prob_matrix < 0] <- 0
  prob_matrix <- prob_matrix / rowSums(prob_matrix)

  # --- 4. Generate Responses ---
  # For each person, sample a response from their specific probability distribution.
  responses <- numeric(n)
  categories <- 0:(num_categories - 1)
  for (i in 1:n) {
    responses[i] <- sample(categories, size = 1, prob = prob_matrix[i, ])
  }
  
  return(responses)
}

estimate_grm <- function(x, th) {
  # Determine the number of categories K from the data
  # Assumes responses are 0, 1, ..., K-1
  K <- max(x, na.rm = TRUE) + 1
  
  if (K < 2) {
    stop("GRM requires at least 2 response categories.")
  }

  # Negative log-likelihood function for a generalized GRM
  nll_grm <- function(par) {
    # par[1] is log(a), par[2:K] are the b parameters
    a <- exp(par[1])
    b <- par[-1] # All parameters except the first

    # Constraint: b parameters must be ordered
    # diff(b) computes b[i] - b[i-1]. We need all these to be > 0.
    if (length(b) > 1 && any(diff(b) <= 0)) {
      return(1e10) # Return a large value to penalize invalid parameter sets
    }

    # --- Vectorized Calculation of Probabilities ---

    # 1. Calculate cumulative probabilities (P_star)
    # This matrix will have one row per person and one column per threshold (K-1)
    # outer() creates a matrix of all (th_i - b_j) combinations
    logits <- -a * outer(th, b, FUN = "-")
    p_star_matrix <- 1 / (1 + exp(logits))

    # 2. Calculate category probabilities (P)
    # We create a matrix with K columns for P(X=0), P(X=1), ..., P(X=K-1)
    cat_probs <- matrix(NA, nrow = length(th), ncol = K)

    # Prob for the first category (0)
    cat_probs[, 1] <- 1 - p_star_matrix[, 1]

    # Probs for middle categories (1 to K-2)
    if (K > 2) {
      for (k in 2:(K - 1)) {
        cat_probs[, k] <- p_star_matrix[, k - 1] - p_star_matrix[, k]
      }
    }
    
    # Prob for the last category (K-1)
    cat_probs[, K] <- p_star_matrix[, K - 1]

    # Ensure probabilities are not exactly zero to avoid log(0)
    cat_probs <- pmax(cat_probs, 1e-10)

    # 3. Calculate log-likelihood using matrix indexing
    # This is an efficient way to get the probability for each person's actual response
    # It avoids loops or complex sums.
    # We create a 2-column matrix to index `cat_probs`: 
    #   - 1st col: row number (person index)
    #   - 2nd col: column number (response category + 1, for 1-based indexing)
    idx <- cbind(seq_along(x), x + 1)
    
    # Extract the probabilities corresponding to the observed responses
    observed_probs <- cat_probs[idx]
    
    # Sum the log-probabilities and return the negative
    ll <- sum(log(observed_probs))
    
    return(-ll)
  }

  # Generate reasonable starting values for optimization
  # log(a)=0 => a=1. b's are spread out between -1 and 1.
  # Total parameters = 1 (for a) + (K-1) (for b's) = K
  if (K == 2) {
    # For a 2PL model, there's only one b
    start_vals <- c(0, 0) 
  } else {
    start_vals <- c(0, seq(-1, 1, length.out = K - 1))
  }
  
  # Optimize to find the parameters that minimize the negative log-likelihood
  fit <- optim(par = start_vals, fn = nll_grm, method = "BFGS")

  # Extract and name the final parameters
  a_hat <- exp(fit$par[1])
  b_hats <- fit$par[-1]
  names(b_hats) <- paste0("b", 0:(K - 2))

  return(list(a = a_hat, b = b_hats))
}

predict_grm <- function(th, a, b) {
  # --- Input Validation ---
  if (a <= 0) {
    stop("Discrimination parameter 'a' must be positive.")
  }
  if (length(b) > 1 && any(diff(b) <= 0)) {
    stop("Threshold parameters 'b' must be in strictly increasing order.")
  }

  # Number of categories K is determined by the number of thresholds
  K <- length(b) + 1
  
  # --- Vectorized Calculation ---

  # 1. Calculate cumulative probabilities (P*)
  # Creates a matrix where element [i, j] is P*(theta_i, b_j)
  logits <- -a * outer(th, b, FUN = "-")
  p_star_matrix <- 1 / (1 + exp(logits))

  # 2. Calculate category probabilities (P)
  # Initialize a matrix to hold the results
  cat_probs <- matrix(NA, nrow = length(th), ncol = K)

  # Probability for the first category (0)
  # P(X=0) = 1 - P*(b_1)
  cat_probs[, 1] <- 1 - p_star_matrix[, 1]

  # Probabilities for middle categories (if they exist)
  # P(X=k) = P*(b_k) - P*(b_{k+1})
  if (K > 2) {
    for (k in 2:(K - 1)) {
      # The k-th column of cat_probs is for response k-1.
      # It uses the (k-1)th and k-th columns of p_star_matrix.
      cat_probs[, k] <- p_star_matrix[, k - 1] - p_star_matrix[, k]
    }
  }

  # Probability for the last category (K-1)
  # P(X=K-1) = P*(b_{K-1})
  cat_probs[, K] <- p_star_matrix[, K - 1]
  
  # Name the columns for clarity
  colnames(cat_probs) <- paste0("p", 0:(K - 1))
  
  return(as.data.frame(cat_probs))
}

generate_pcm_data <- function(th, a, d) {
  # --- 1. Setup ---
  n <- length(th)
  num_categories <- length(d) + 1
  categories <- 0:(num_categories - 1)
  
  # --- 2. Calculate Numerators of the Probabilities ---
  # The exponent for category k > 0 is: a * (k*th - sum(d_1, ..., d_k))
  # We will calculate these for all persons and categories in a vectorized way.
  
  # Create a matrix to hold the unnormalized log-probabilities (the exponents)
  # It has n rows (persons) and K columns (categories)
  log_p_numerators <- matrix(0, nrow = n, ncol = num_categories)
  
  # The cumulative sum of d parameters is needed for the formula.
  # d_cumsum will be [d_1, d_1+d_2, d_1+d_2+d_3, ...]
  d_cumsum <- cumsum(d)
  
  # For categories k = 1, 2, ..., K-1
  for (k in 1:(num_categories - 1)) {
    # The exponent is a * (k*theta - sum of d's up to k)
    # The column index is k+1 because R is 1-based and our categories are 0-based
    log_p_numerators[, k + 1] <- a * (k * th - d_cumsum[k])
  }
  
  # The exponents are now calculated. Exponentiate to get the numerators.
  p_numerators <- exp(log_p_numerators)
  
  # --- 3. Normalize to get Final Probabilities ---
  
  # The denominator for each person is the sum of their numerators across categories
  denominator <- rowSums(p_numerators)
  
  # Divide each person's numerators by their denominator to get the probability matrix.
  # R's recycling handles this row-wise division correctly.
  prob_matrix <- p_numerators / denominator
  
  # --- 4. Generate Responses ---
  # For each person, sample one response from their specific probability distribution.
  responses <- numeric(n)
  for (i in 1:n) {
    responses[i] <- sample(categories, size = 1, prob = prob_matrix[i, ])
  }
  
  return(responses)
}

estimate_pcm <- function(x, th) {
  # Determine the number of categories K from the data
  K <- max(x, na.rm = TRUE) + 1
  n <- length(th)
  
  if (K < 2) {
    stop("PCM requires at least 2 response categories.")
  }

  # Negative log-likelihood function for a generalized PCM
  nll_pcm <- function(par) {
    # par[1] is a, par[2:K] are the d_1, ..., d_{K-1} parameters
    a <- par[1]
    d <- par[-1]

    # --- Vectorized Calculation of Probabilities ---
    
    # 1. Calculate the exponents for the numerators
    # The exponent for category k is: a * (k*th - sum(d_1, ..., d_k))
    log_p_numerators <- matrix(0, nrow = n, ncol = K)
    d_cumsum <- cumsum(d)
    
    for (k in 1:(K - 1)) {
      # Column k+1 corresponds to response category k
      log_p_numerators[, k + 1] <- a * (k * th - d_cumsum[k])
    }

    # 2. Calculate the full numerators and the denominator
    p_numerators <- exp(log_p_numerators)
    denominator <- rowSums(p_numerators)
    
    # 3. Calculate the final category probabilities
    # Use pmax to avoid division by zero if all numerators are zero (highly unlikely)
    prob_matrix <- p_numerators / pmax(denominator, 1e-10)

    # 4. Calculate log-likelihood using efficient matrix indexing
    # This extracts the probability corresponding to each person's observed response
    idx <- cbind(seq_along(x), x + 1)
    observed_probs <- prob_matrix[idx]
    
    # Use pmax to avoid log(0) for stability
    ll <- sum(log(pmax(observed_probs, 1e-10)))
    
    return(-ll)
  }

  # --- Optimization ---
  
  # Generate sensible starting values for optimization.
  # Unlike estimate_grm, par[1] here is a itself (not log(a)), so start at a = 1.
  # Total parameters = 1 (for a) + (K-1) (for d's) = K
  if (K == 2) {
    # For the 2PL-equivalent case there is only one d
    start_vals <- c(1, 0)
  } else {
    # Spread initial d values around 0
    start_vals <- c(1, seq(-0.5, 0.5, length.out = K - 1))
  }

  # Optimize to find parameters that minimize the negative log-likelihood
  fit <- optim(par = start_vals, fn = nll_pcm, method = "BFGS")

  # --- Extract and Return Final Parameters ---
  a_hat <- fit$par[1]
  d_hats <- fit$par[-1]
  names(d_hats) <- paste0("d", 1:(K - 1))

  return(list(a = a_hat, d = d_hats))
}

predict_pcm <- function(th, a, d) {
  # --- 1. Setup ---
  n <- length(th)
  num_categories <- length(d) + 1
  
  # --- 2. Calculate Numerators of the Probabilities ---
  # The exponent for category k > 0 is: a * (k*th - sum(d_1, ..., d_k))
  
  # Create a matrix to hold the unnormalized log-probabilities (the exponents)
  log_p_numerators <- matrix(0, nrow = n, ncol = num_categories)
  
  # Pre-calculate the cumulative sum of d parameters for efficiency
  d_cumsum <- cumsum(d)
  
  # Loop over categories k = 1, 2, ..., K-1
  for (k in 1:(num_categories - 1)) {
    # The exponent is a * (k*theta - sum of d's up to k)
    # The column index is k+1 (for response k)
    log_p_numerators[, k + 1] <- a * (k * th - d_cumsum[k])
  }
  
  # Exponentiate to get the actual numerators
  p_numerators <- exp(log_p_numerators)
  
  # --- 3. Normalize to get Final Probabilities ---
  
  # The denominator for each person is the sum of their numerators
  denominator <- rowSums(p_numerators)
  
  # Divide each numerator by the corresponding denominator
  # R's recycling handles this row-wise division correctly
  prob_matrix <- p_numerators / denominator
  
  # --- 4. Format and Return Output ---
  
  # Name the columns for clarity (p0, p1, p2, ...)
  colnames(prob_matrix) <- paste0("p", 0:(num_categories - 1))
  
  return(as.data.frame(prob_matrix))
}


# Binary IMV: convert each model's geometric-mean likelihood into the weight
# of an entropy-matched coin, then report the relative gain of model 2 over model 1.
imv.binary <- function(y, p1, p2, sigma = 1e-4) {
    # Bound predictions away from 0 and 1 for numerical stability
    p1 <- pmin(pmax(p1, sigma), 1 - sigma)
    p2 <- pmin(pmax(p2, sigma), 1 - sigma)

    # Geometric mean of the Bernoulli likelihood
    ll <- function(x, p) {
        z <- log(p) * x + log(1 - p) * (1 - x)
        exp(sum(z) / length(x))
    }
    loglik1 <- ll(y, p1)
    loglik2 <- ll(y, p2)

    # Find the coin weight whose entropy matches the geometric-mean likelihood
    getcoins <- function(a) {
        f <- function(p, a) abs(p * log(p) + (1 - p) * log(1 - p) - log(a))
        nlminb(0.5, f, lower = 0.001, upper = 0.999, a = a)$par
    }
    c1 <- getcoins(loglik1)
    c2 <- getcoins(loglik2)

    # IMV: relative improvement of the enhanced coin over the baseline coin
    (c2 - c1) / c1
}

# Cell-wise IMV (omega_c): binary IMVs over all ordered pairs of categories,
# weighted by pair sample sizes and category prevalences.
imv_c <- function(y, pctt.tab, p1, p2) {
    nn <- length(pctt.tab)
    om <- numeric()
    iis <- 0:(nn - 1)
    for (ii in iis) {
        ns <- om.tmp <- numeric()
        jjs <- iis[-match(ii, iis)]
        for (jj in jjs) {
            y2 <- y[y$resp %in% c(ii, jj), ]
            resp <- ifelse(y2$resp == ii, 1, 0)
            ## IRT probabilities for categories ii and jj
            p1.ii <- y2[[paste(p1, ii, sep = '')]]
            p1.jj <- y2[[paste(p1, jj, sep = '')]]
            p2.ii <- y2[[paste(p2, ii, sep = '')]]
            p2.jj <- y2[[paste(p2, jj, sep = '')]]
            ## Condition on the response being ii or jj
            z <- data.frame(resp = resp,
                            p1 = p1.ii / (p1.ii + p1.jj),
                            p2 = p2.ii / (p2.ii + p2.jj))
            j0 <- as.character(jj)
            om.tmp[j0] <- imv.binary(z$resp, p1 = z$p1, p2 = z$p2)
            ns[as.character(jj)] <- nrow(z)
        }
        om[ii + 1] <- sum(om.tmp * ns) / sum(ns)
    }
    sum(om * pctt.tab) / sum(pctt.tab)
}

# Thresholded IMV (omega_t): binary IMV at each cumulative threshold
# (resp <= ii versus resp > ii), averaged with prevalence weights.
imv_t <- function(y, pctt.tab, p1, p2) {
    nn <- length(pctt.tab)
    om <- numeric()
    for (ii in 0:(nn - 2)) {
        resp <- ifelse(y$resp <= ii, 1, 0)
        ## IRT probabilities of responding at or below category ii
        pr1 <- rowSums(y[, paste(p1, 0:ii, sep = ''), drop = FALSE])
        pr2 <- rowSums(y[, paste(p2, 0:ii, sep = ''), drop = FALSE])
        z <- data.frame(resp = resp, p1 = pr1, p2 = pr2)
        om[ii + 1] <- imv.binary(z$resp, p1 = z$p1, p2 = z$p2)
    }
    ## Prevalence weights over the first K-1 categories
    pctt.tab <- pctt.tab[1:(nn - 1)] / (1 - pctt.tab[nn])
    sum(om * pctt.tab) / sum(pctt.tab)
}

estimate_2pl <- function(x_binary, th) {
  # x_binary: binary response vector (0, 1)
  # th: known ability vector
  
  # Negative log-likelihood for 2PL
  nll_2pl <- function(par) {
    a <- par[1]
    b <- par[2]
    
    # Probability of correct response
    p <- 1 / (1 + exp(-a * (th - b)))
    p <- pmax(pmin(p, 1 - 1e-10), 1e-10)  # Bound probabilities
    
    # Log-likelihood
    ll <- sum(x_binary * log(p) + (1 - x_binary) * log(1 - p))
    
    return(-ll)
  }
  
  # Optimize, starting from a = 1, b = 0 (a = 0 is a degenerate start:
  # every probability is 0.5 and the likelihood is flat in b)
  fit <- optim(par = c(1, 0), fn = nll_2pl, method = "BFGS")
  
  a_hat <- fit$par[1]
  b_hat <- fit$par[2]
  
  return(list(a = a_hat, b = b_hat))
}

predict_2pl <- function(th, a, b) {
  # Returns probability of response = 1
  p <- 1 / (1 + exp(-a * (th - b)))
  return(p)
}


##########################################################

# ============================================================================
# FUNCTION 6: Run single simulation iteration
# ============================================================================
run_single_simulation <- function(th, a, b0, b1) {
  
  # Step 1: Generate initial data x
  x <- generate_grm_data(th, a, b=c(b0, b1))
  
  # Step 2: Fit models to x
  # Fit GRM
  grm_fit <- estimate_grm(x, th)
  
  # Dichotomize and fit 2PLs
  x_prime <- as.numeric(x > 0)
  x_double <- as.numeric(x > 1)
  
  fit_2pl_prime <- estimate_2pl(x_prime, th)
  fit_2pl_double <- estimate_2pl(x_double, th)
  
  # Step 3: Generate new test data x2
  x2 <- generate_grm_data(th, a, b=c(b0, b1))
  
  # Step 4: Get predictions for x2
  # Predictions from GRM (for polytomous response)
  pred_grm <- predict_grm(th, grm_fit$a, grm_fit$b)
  
  # Step 5: Calculate omega for dichotomizations
  # For x' dichotomization (x > 0)
  x2_prime <- as.numeric(x2 > 0)
  p_prime_2pl <- predict_2pl(th, fit_2pl_prime$a, fit_2pl_prime$b)
  
  z_prime <- data.frame(
    resp = x2_prime,
    p1 = mean(x_prime),
    p2 = p_prime_2pl
  )
  omega_x_prime <- imv.binary(z_prime$resp, p1 = z_prime$p1, p2 = z_prime$p2)
  
  # For x'' dichotomization (x > 1)
  x2_double <- as.numeric(x2 > 1)
  p_double_2pl <- predict_2pl(th, fit_2pl_double$a, fit_2pl_double$b)
  
  z_double <- data.frame(
    resp = x2_double,
    p1 = mean(x_double),
    p2 = p_double_2pl
  )
  omega_x_double <- imv.binary(z_double$resp, p1 = z_double$p1, p2 = z_double$p2)
  
  # Step 6: Calculate omega_c and omega_t
  # Reconstruct polytomous predictions from the two 2PLs
  
  # Create the full prediction data frame for imv_c and imv_t:
  # p1* columns hold the training-data prevalences (baseline model),
  # p2* columns hold the fitted GRM predictions (enhanced model)
  y <- data.frame(resp = x2)
  for (i in 1:ncol(pred_grm)) y[[paste('p1', i - 1, sep = '')]] <- mean(x == i - 1)
  for (i in 1:ncol(pred_grm)) y[[paste('p2', i - 1, sep = '')]] <- pred_grm[, i]

    
  # Calculate frequency table
  pctt.tab <- table(factor(x2, levels = 0:2))
  pctt.tab <- as.numeric(pctt.tab)/length(x2)
  
  omega_c <- imv_c(y, pctt.tab, p1 = "p1", p2 = "p2")
  omega_t <- imv_t(y, pctt.tab, p1 = "p1", p2 = "p2")
  
  # Return results
  return(data.frame(
    b1_true = b1,
    omega_x_prime = omega_x_prime,
    omega_x_double = omega_x_double,
    omega_c = omega_c,
    omega_t = omega_t
  ))
}

run_simulation_app <- function(n_theta, n_sim, seed, b0, mu) {
  set.seed(seed)
  
  # Fixed parameters
  a <- 1
  
  # Generate theta values (abilities)
  th <- rnorm(n_theta, mean = mu)
  
  # Sample b1 values (must be greater than b0)
  b1_values <- sort(runif(n_sim, b0 + 0.1, b0 + 2))
  
  # Run simulations
  results <- list()
  
  for(i in 1:n_sim) {
    results[[i]] <- run_single_simulation(th, a, b0, b1_values[i])
  }
  
  # Combine results
  results_df <- do.call(rbind, results)
  results_df$b0 <- b0
  return(results_df)
}

ui <- fluidPage(
  titlePanel("GRM vs 2PL Dichotomization: Interactive IMV Analysis"),
  
  sidebarLayout(
    sidebarPanel(
      sliderInput("b0",
                  "First Threshold (b0):",
                  min = -2,
                  max = 1,
                  value = 0,
                  step = 0.1),
      
      sliderInput("mu",
                  "Mean of Theta Distribution:",
                  min = -2,
                  max = 2,
                  value = 0,
                  step = 0.1),
      
      numericInput("n_theta",
                   "Sample Size:",
                   value = 1000,
                   min = 100,
                   max = 5000),
      
      numericInput("n_sim",
                   "Number of b1 values to simulate:",
                   value = 50,
                   min = 10,
                   max = 100),
      
      numericInput("seed",
                   "Random Seed:",
                   value = 123),
      
      actionButton("run", "Run Simulation", class = "btn-primary"),
      
      hr(),
      
      helpText("This app simulates a 3-category GRM item across multiple b1 values and compares:",
               tags$ul(
                 tags$li("GRM model (omega_t - black line)"),
                 tags$li("2PL on x>0 dichotomization (omega_x_prime - red line)"),
                 tags$li("2PL on x>1 dichotomization (omega_x_double - blue line)")
               ),
               tags$p("The x-axis shows the second threshold (b1), which varies from b0+0.1 to b0+2."),
               tags$p("The vertical dashed line shows the first threshold (b0)."))
    ),
    
    mainPanel(
      plotOutput("imvPlot", height = "600px"),
      
      hr(),
      
      h4("Summary Statistics"),
      tableOutput("summaryTable")
    )
  )
)

server <- function(input, output, session) {
  
  # Reactive value to store simulation results
  simulation_results <- reactiveVal(NULL)
  
  # Run simulation when button is clicked
  observeEvent(input$run, {
    
    withProgress(message = 'Running simulation...', value = 0, {
      
      incProgress(0.2, detail = paste("Simulating", input$n_sim, "different b1 values..."))
      
      results <- run_simulation_app(
        n_theta = input$n_theta,
        n_sim = input$n_sim,
        seed = input$seed,
        b0 = input$b0,
        mu = input$mu
      )
      
      simulation_results(results)
      
      incProgress(1, detail = "Complete!")
    })
  })
  
  # Generate plot
  output$imvPlot <- renderPlot({
    
    results <- simulation_results()
    req(results)
    
    # Helper function for smoothing
    pf <- function(x, y, ...) {
      m <- loess(y ~ x)
      lines(x, predict(m), ...)
    }
    
    # Create base plot
    plot(NULL, 
         xlim = range(results$b1_true),
         ylim = c(0, max(c(results$omega_t, results$omega_x_prime, results$omega_x_double)) * 1.1),
         xlab = expression('Second threshold' ~ (b[1])),
         ylab = 'IMV (prevalence baseline vs IRT predictions)',
         main = 'Collapse of the thresholded IMV as categories merge',
         cex.lab = 1.2,
         cex.main = 1.3)
    
    # Add grid
    grid(col = "gray90")
    
    # Plot smoothed lines
    pf(results$b1_true, results$omega_t, lwd = 3, col = 'black')
    pf(results$b1_true, results$omega_x_prime, col = 'red', lwd = 3)
    pf(results$b1_true, results$omega_x_double, col = 'blue', lwd = 3)
    
    # Add points
    points(results$b1_true, results$omega_t, pch = 19, cex = 0.8, col = 'black')
    points(results$b1_true, results$omega_x_prime, pch = 19, cex = 0.8, col = 'red')
    points(results$b1_true, results$omega_x_double, pch = 19, cex = 0.8, col = 'blue')
    
    # Add vertical line at b0
    abline(v = unique(results$b0), lty = 2, lwd = 2, col = "gray40")
    
    # Add legend
    legend("topleft",
           bty = 'n',
           lwd = 3,
           col = c("black", "red", "blue"),
           legend = c(expression(omega[t] ~ "(thresholded IMV, 3-category GRM)"),
                      expression(omega[x*minute] ~ "(binary IMV: x > 0)"),
                      expression(omega[x*second] ~ "(binary IMV: x > 1)")),
           cex = 1.1)
    
    # Add note about vertical line
    mtext(bquote("Vertical dashed line: first threshold" ~ b[0] == .(input$b0) ~ "— convergence occurs as" ~ b[1] %->% b[0]), 
          side = 3, line = 0.5, cex = 0.9, col = "gray40")
  })
  
  # Display summary table
  output$summaryTable <- renderTable({
    results <- simulation_results()
    req(results)
    
    data.frame(
      Metric = c("b0 (Fixed)", 
                 "b1 Range",
                 "Mean Omega_t (GRM)",
                 "Mean Omega_x_prime (2PL: x>0)",
                 "Mean Omega_x_double (2PL: x>1)",
                 "Number of Simulations"),
      Value = c(
        sprintf("%.3f", unique(results$b0)),
        sprintf("%.3f - %.3f", min(results$b1_true), max(results$b1_true)),
        sprintf("%.4f", mean(results$omega_t)),
        sprintf("%.4f", mean(results$omega_x_prime)),
        sprintf("%.4f", mean(results$omega_x_double)),
        as.character(nrow(results))
      )
    )
  }, striped = TRUE, hover = TRUE)
}

shinyApp(ui = ui, server = server)

What you should see:

  • As \(b_1\) moves rightward (further from \(b_0\)), the three categories become more distinct and the thresholded IMV \(\omega_t\) grows — the GRM gains more predictive value over the prevalence baseline.
  • As \(b_1\) approaches \(b_0\) (left side of the plot), all three curves converge to the same value. This is the collapse property: a 3-category item with two identical thresholds is indistinguishable from a binary item, and \(\omega_t\) reflects this exactly.
  • The relative positions of \(\omega_{x'}\) and \(\omega_{x''}\) depend on the distribution of abilities relative to the thresholds and can be explored by adjusting the mean of the theta distribution.
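The limiting case can also be checked directly, without the app: when the two thresholds coincide, simulated responses occupy only the extreme categories, so the two dichotomizations (and hence their binary IMVs) are identical. A quick sketch, with hypothetical values \(a = 1\), \(b_0 = b_1 = 0\):

```r
set.seed(1)
th <- rnorm(1000)
p <- 1 / (1 + exp(-(th - 0)))   # with b1 = b0 = 0 the GRM reduces to a 2PL
x <- 2 * rbinom(1000, 1, p)     # responses are only 0 or 2; category 1 has zero mass
table(x)
all((x > 0) == (x > 1))         # TRUE: the two dichotomizations coincide
```

Since `imv.binary` depends only on the response vector and the predictions, identical dichotomizations yield identical binary IMVs, which is exactly the convergence the plot displays.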
