Final Four Analytics Challenge 2026

NCAA Tournament Prediction

A two-stage machine learning pipeline that predicts which teams make the NCAA tournament and their seeding — using an 8-model stacked ensemble with 50 engineered features from historical team performance data.

0.966 Selection AUC · 50 Engineered Features · 8 Base Models · 451 Teams Predicted

Project Overview

Two prediction tasks solved with a unified ML pipeline

1. Selection Model

Binary classification: will a team be selected for the NCAA tournament? Uses team stats, NET rankings, strength of schedule, and quadrant records to predict tournament inclusion with an AUC of 0.966.

Classification · AUC = 0.966

2. Seed Model

Regression: for selected teams, predict their overall seed (1–68). Minimizes mean absolute error on seed placement using the same feature set with regression-tuned ensemble learners.

Regression · MAE Optimized

3. Stacked Ensemble

Both tasks use an 8-model stacked ensemble with a meta-learner. Base models generate out-of-fold predictions that feed into LogisticRegression (selection) and Ridge (seed) meta-learners for final output.

Meta-Learning · 10-Fold CV

End-to-End Pipeline

1. Raw Data: 2,701 teams × 15 cols
2. W-L Parsing: 8 record columns
3. Feature Engineering: 30 → 50 features
4. Imputation: SimpleImputer (median)
5. 8-Model Ensemble: 10-fold stacked CV
6. Meta-Learner: LogReg + Ridge
7. Submission: 451 predictions
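The imputation step in the pipeline above uses scikit-learn's SimpleImputer with the median strategy; a minimal sketch on hypothetical data (the values here are illustrative, e.g. a missing PrevNET for a first-year team):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical feature matrix: [NET Rank, PrevNET], with one missing PrevNET
X = np.array([[10.0, np.nan],
              [40.0, 35.0],
              [120.0, 110.0]])

imputer = SimpleImputer(strategy="median")
X_imputed = imputer.fit_transform(X)
# The NaN is replaced by the column median of the observed values (72.5)
```

Median imputation is robust to the heavy-tailed distributions typical of ranking features, where a mean would be pulled toward weak teams.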

Data Overview

Historical NCAA team performance data from multiple seasons

2,701 Training Records: teams across multiple seasons
451 Test Records: teams to predict for 2026
15 Raw Columns: expanded to 50 features

Key Raw Features

| Feature | Type | Description |
|---|---|---|
| NET Rank | Numeric | NCAA Evaluation Tool ranking (1 = best) |
| PrevNET | Numeric | Previous season NET rank (momentum signal) |
| AvgOppNETRank | Numeric | Average NET rank of opponents faced |
| NETSOS | Numeric | NET Strength of Schedule |
| NETNonConfSOS | Numeric | Non-conference strength of schedule |
| WL | String | Overall win-loss record (e.g. "23-7") |
| Conf. Record | String | Conference win-loss record |
| RoadWL | String | Road game win-loss record |
| Quadrant 1–4 | String | Win-loss record by opponent quality tier |

Training Set: Tournament Selection Distribution

Feature Engineering

From 15 raw columns to 50 predictive features

Base Features (30)

Win-loss strings parsed into wins, losses, and win percentage for all 8 record columns (WL, Conference, Non-Conference, Road, Quadrant 1–4). Combined with 6 NET-based numeric features.

WL_W · WL_WinPct · Quadrant1_WinPct · NET Rank · NETSOS · RoadWL_WinPct

Deep Features (20)

Derived signals capturing momentum, conference strength, quality of wins, and composite metrics that aren’t directly available in the raw data.

conf_tier · is_power6 · net_momentum · quality_wins · bad_losses · q1_q2_combined_pct · sos_composite · net_pct_change · sos_norm
Feature Engineering Pipeline
def engineer_features(df):
    d = df.copy()
    # Parse W-L strings into numeric wins, losses, win%
    for col in wl_cols:
        wins, losses = zip(*d[col].apply(parse_wl))
        d[f"{col}_W"]      = wins
        d[f"{col}_L"]      = losses
        games              = d[f"{col}_W"] + d[f"{col}_L"]
        # 0-game records become NaN here and are handled by the imputation step
        d[f"{col}_WinPct"] = d[f"{col}_W"] / games.where(games > 0)

    # Conference tier encoding
    d["conf_tier"]   = d["Conference"].map(conf_rankings)
    d["is_power6"]   = (d["conf_tier"] <= 6).astype(int)

    # NET momentum: improvement from previous year
    d["net_momentum"]  = d["PrevNET"] - d["NET Rank"]
    d["net_improved"]  = (d["net_momentum"] > 0).astype(int)

    # Quality metrics
    d["quality_wins"]  = d["Quadrant1_W"] + d["Quadrant2_W"]
    d["bad_losses"]    = d["Quadrant3_L"] + d["Quadrant4_L"]
    return d
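The parse_wl helper referenced in the pipeline above is not shown; a minimal sketch, assuming records are simple "W-L" strings like "23-7" (this exact implementation is an assumption):

```python
import pandas as pd

def parse_wl(record):
    # Hypothetical helper: split a "W-L" string like "23-7" into ints.
    # Missing records yield (None, None) so they can be imputed downstream.
    if pd.isna(record):
        return None, None
    w, l = str(record).split("-")
    return int(w), int(l)

parse_wl("23-7")  # (23, 7)
```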

Top 15 Features by Importance (Selection Model)

Model Architecture

8-model stacked ensemble with meta-learner

HistGradientBoosting

Handles missing values natively, fast training. Primary base learner for both tasks.

Classifier + Regressor

GradientBoosting

Traditional gradient boosting with tree-based splits. Strong on structured tabular data.

Classifier + Regressor

ExtraTrees

Extremely randomized trees. Adds diversity to the ensemble via random split selection.

Classifier + Regressor

RandomForest

Bagged decision trees. Reduces variance and provides robust probability calibration.

Classifier + Regressor

Stacking Architecture

Base Layer (8 models)
HGB-Clf · GB-Clf · ET-Clf · RF-Clf · HGB-Reg · GB-Reg · ET-Reg · RF-Reg
↓ Out-of-fold predictions (10-fold CV) ↓
Meta-Learner Layer
LogisticRegression (Selection) · Ridge Regression (Seed)
↓ Final predictions ↓
Output
Selected? (0/1) · Seed (1–68)
Stacking Cross-Validation
# 10-fold stacked ensemble: base models generate OOF predictions
# that become features for the meta-learner
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

for fold, (train_idx, val_idx) in enumerate(skf.split(X, y_sel)):
    X_tr, X_val = X[train_idx], X[val_idx]

    for name, model in base_classifiers.items():
        clf = clone(model).fit(X_tr, y_sel[train_idx])
        oof_sel[name][val_idx] = clf.predict_proba(X_val)[:, 1]

    # Seed regressors train only on tournament-selected teams in this fold
    sel_mask = y_sel[train_idx] == 1
    for name, model in base_regressors.items():
        reg = clone(model).fit(X_tr[sel_mask], y_seed[train_idx][sel_mask])
        oof_seed[name][val_idx] = reg.predict(X_val)

# Meta-learner fits on stacked OOF predictions
meta_X = np.column_stack([oof_sel[n] for n in base_classifiers])
meta_clf = LogisticRegression().fit(meta_X, y_sel)

PR-Curve Threshold Calibration

Instead of the default 0.5 cutoff, the optimal threshold was selected by maximizing F1 score on the Precision-Recall curve. The calibrated threshold of 0.6531 selected 70 teams (vs. 76 with default), improving precision while maintaining recall.
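The calibration step described above can be sketched with scikit-learn's precision_recall_curve; the data here is synthetic, so the resulting threshold is illustrative, not the 0.6531 reported:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Hypothetical OOF selection probabilities and true labels
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 500)
y_prob = np.clip(y_true * 0.6 + rng.random(500) * 0.5, 0, 1)

prec, rec, thresh = precision_recall_curve(y_true, y_prob)
f1 = 2 * prec * rec / (prec + rec + 1e-12)
# prec/rec have one more entry than thresh; drop the last point, which
# corresponds to no threshold at all
best_threshold = thresh[np.argmax(f1[:-1])]
selected = (y_prob >= best_threshold).sum()
```

Maximizing F1 on the PR curve trades a few recalled teams for precision, which is why the calibrated cutoff yields a team count closer to the real 68-bid field than the default 0.5 does.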

Results

Model performance and submission comparison

0.966 Selection AUC (stacked ensemble)
0.653 Optimal Threshold (PR-curve calibrated)
70 Teams Selected (out of 451 candidates)
68 Actual Bids (NCAA tournament slots)

Predicted Seed Distribution

Old vs. Improved Submission

Selection Probability Distribution

Key Insights

What the model reveals about tournament selection

NET Rank is King

NET Rank is by far the most important feature for both selection and seeding. Teams ranked in the top 45 by NET are almost guaranteed a tournament bid. The committee heavily weights this metric above all others.

Quadrant 1 Wins Matter

Quadrant 1 win percentage is the second most impactful feature. Beating highly-ranked opponents on the road or neutral courts provides the strongest signal of tournament worthiness beyond NET rank alone.

Momentum Signals

NET momentum (improvement from previous season) and conference record win percentage add predictive power beyond static rankings. Teams trending upward are more likely to earn at-large bids.

Threshold Matters

Moving from the default 0.5 threshold to a PR-curve optimized 0.653 reduced the number of predicted selections from 76 to 70 — closer to the actual 68 tournament bids — while improving precision significantly.

Technical Takeaways

  • Stacking > Single models: The 8-model ensemble with meta-learner consistently outperformed any individual model in cross-validation.
  • Deep feature engineering paid off: Adding 20 derived features (momentum, quality wins, conference tiers) lifted AUC from ~0.94 to 0.966.
  • PR-curve calibration: Optimizing the threshold on precision-recall (not ROC) produced more realistic team counts matching the actual 68-team bracket.
  • W-L parsing was essential: Converting 8 string-format W-L columns into 24 numeric features (wins, losses, win%) was the single biggest feature engineering win.