The InterModel Vigorish (IMV)
  • Home
  • Simulation Examples
    • Logistic regression and the intercept
    • Logistic regression and the Oracle
    • Logistic regression and the Overfit
    • 2PL versus 3PL predictions
    • The collapse of the thresholded IMV
    • Implied probabilities and the IMV
    • IRT versus Cognitive Diagnostic Models
    • Multidimensional IRT and unidimensional scoring
  • Computational Examples
    • Logistic regression (glm)
    • Mixed-effects logistic regression (glmer)
    • Unidimensional IRT
    • Multidimensional IRT

Logistic Regression and the Intercept

In this example, we consider the logistic regression model \[\Pr(y=1)=\frac{1}{1+\text{exp}(-(b_0+b_1 x))}.\] We generate \(y\) based on a sample of 5000 \(x\) values and consider the IMV associated with predicting new \(y\) outcomes (based on new \(x\) observations) for different values of \(b_0\) and \(b_1\).

The baseline model uses only the overall proportion of 1s — i.e., \(\text{mean}(y)\) — as a flat prediction. The enhanced model uses the fitted logistic regression predictions \(\hat{p}\). The IMV measures how much better the regression predictions are compared to simply using the prevalence.

The key question this example addresses: does the intercept \(b_0\) matter for the IMV, even when the slope \(b_1\) is held fixed? Adjust \(b_1\) with the slider and run the simulation to find out.

#| '!! shinylive warning !!': |
#|   shinylive does not work in self-contained HTML documents.
#|   Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 900

library(shiny)

imv.binary<-function(y, p1, p2, sigma=1e-4) {
    p1<-ifelse(p1<sigma,sigma,p1)
    p2<-ifelse(p2<sigma,sigma,p2)
    p1<-ifelse(p1>1-sigma,1-sigma,p1)
    p2<-ifelse(p2>1-sigma,1-sigma,p2)
    
    ll<-function(x,p) {
        z<-log(p)*x+log(1-p)*(1-x)
        z<-sum(z)/length(x)
        exp(z)
    }    
    loglik1<-ll(y,p1)
    loglik2<-ll(y,p2)
    getcoins<-function(a) {
        f<-function(p,a) abs(p*log(p)+(1-p)*log(1-p)-log(a))
        nlminb(.5,f,lower=0.001,upper=.999,a=a)$par
    }
    c1<-getcoins(loglik1)
    c2<-getcoins(loglik2)
    ew<-function(p1,p0) (p1-p0)/p0
    imv<-ew(c2,c1)
    imv
}

ui <- fluidPage(
  titlePanel("Logistic Regression: IMV vs b0"),

  sidebarLayout(
    sidebarPanel(
      sliderInput("b1",
                  "Slope (b1):",
                  min = -2,
                  max = 2,
                  value = 1,
                  step = 0.1),

      actionButton("run", "Run Simulation", class = "btn-primary"),

      hr(),
      fluidRow(
        column(6, numericInput("ymin", "Y-axis min:", value = -0.2, step = 0.05)),
        column(6, numericInput("ymax", "Y-axis max:", value =  0.5, step = 0.05))
      ),
      helpText("Adjust b1 and click 'Run Simulation' to see how IMV varies with b0.")
    ),
    
    mainPanel(
      plotOutput("omPlot", height = "600px")
    )
  )
)

server <- function(input, output, session) {
  
  sim_results <- eventReactive(input$run, {
    withProgress(message = 'Running simulation...', value = 0, {

      b1 <- input$b1
      b0vals <- seq(-2, 2, length.out = 100)
      om <- numeric()

      for (i in seq_along(b0vals)) {
        b0 <- b0vals[i]

        pr <- function(x, b0, b1) 1/(1 + exp(-(b0 + b1*x)))

        x <- rnorm(5000)
        p <- pr(x, b0 = b0, b1 = b1)
        y <- rbinom(length(x), 1, p)
        df <- data.frame(x = x, y = y)
        m <- glm(y ~ x, df, family = 'binomial')

        ##oos data
        x2 <- rnorm(1000)
        p2 <- pr(x2, b0 = b0, b1 = b1)
        y2 <- rbinom(length(x2), 1, p2)
        p.oos <- predict(m, data.frame(x = x2), type = 'response')
        om[as.character(b0)] <- imv.binary(y2, mean(y), p.oos)

        incProgress(1/length(b0vals))
      }

      list(b0vals = b0vals, om = om)
    })
  })
  
  output$omPlot <- renderPlot({
    req(sim_results())
    
    results <- sim_results()
    
    # Fit loess
    loess_fit <- loess(results$om ~ results$b0vals, span = 0.5)
    loess_pred <- predict(loess_fit)

    # Plot points
    plot(results$b0vals, results$om,
         xlab = expression(b[0]),
         ylab = "IMV(prevalence, fitted model)",
         ylim=c(input$ymin, input$ymax),
         main = paste0("IMV vs b0 (b1 = ", input$b1, ")"),
         pch = 19,
         col = "steelblue")

    # Add loess smooth
    lines(results$b0vals, loess_pred, col = "darkred", lwd = 2)
    
    # Add reference line and grid
    abline(h = 0, lty = 2, col = "red")
    grid()
  })
}

shinyApp(ui = ui, server = server)

What you should see:

  • The IMV peaks near \(b_0=0\) (balanced classes) and falls off as \(|b_0|\) grows — when the intercept pushes the outcome prevalence toward 0 or 1, the baseline (mean \(y\)) already captures most of the outcome, leaving little for the regression to improve on.
  • The curve is symmetric around \(b_0=0\): a large positive intercept and a large negative intercept produce the same IMV because both create an extreme prevalence with the same predictability.
  • As \(b_1\) increases (stronger slope), the peak IMV at \(b_0=0\) rises and the curve stays elevated over a wider range of \(b_0\) values — a stronger predictor can partially overcome the headwind of extreme prevalence.

← Back to Home