Final Four Analytics Challenge 2026

NCAA Tournament Prediction

A two-stage machine learning pipeline that predicts which teams make the NCAA tournament and their seeding — using an 8-model stacked ensemble with 50 engineered features from historical team performance data.

0.966 Selection AUC · 50 Engineered Features · 8 Base Models · 451 Teams Predicted

Project Overview

Two prediction tasks solved with a unified ML pipeline

1. Selection Model

Binary classification: will a team be selected for the NCAA tournament? Uses team stats, NET rankings, strength of schedule, and quadrant records to predict tournament inclusion with an AUC of 0.966.

Classification · AUC = 0.966

2. Seed Model

Regression: for selected teams, predict their overall seed (1–68). Minimizes mean absolute error on seed placement using the same feature set with regression-tuned ensemble learners.

Regression · MAE Optimized

3. Stacked Ensemble

Both tasks use an 8-model stacked ensemble with a meta-learner. Base models generate out-of-fold predictions that feed into LogisticRegression (selection) and Ridge (seed) meta-learners for final output.

Meta-Learning · 10-Fold CV

End-to-End Pipeline

1. Raw Data: 2,701 teams × 15 cols
2. W-L Parsing: 8 record columns
3. Feature Engineering: 30 → 50 features
4. Imputation: SimpleImputer (median)
5. 8-Model Ensemble: 10-fold stacked CV
6. Meta-Learner: LogReg + Ridge
7. Submission: 451 predictions
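The imputation step in the pipeline above uses scikit-learn's SimpleImputer with the median strategy; a minimal sketch on hypothetical data (the values here are illustrative, e.g. a missing PrevNET for a first-year team):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical feature matrix: [NET Rank, PrevNET], with one missing PrevNET
X = np.array([[10.0, np.nan],
              [40.0, 35.0],
              [120.0, 110.0]])

imputer = SimpleImputer(strategy="median")
X_imputed = imputer.fit_transform(X)
# The NaN is replaced by the column median of the observed values (72.5)
```

Median imputation is robust to the heavy-tailed distributions typical of ranking features, where a mean would be pulled toward weak teams.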

Data Overview

Historical NCAA team performance data from multiple seasons

2,701 Training Records: teams across multiple seasons
451 Test Records: teams to predict for 2026
15 Raw Columns: expanded to 50 features

Key Raw Features

| Feature | Type | Description |
|---|---|---|
| NET Rank | Numeric | NCAA Evaluation Tool ranking (1 = best) |
| PrevNET | Numeric | Previous season NET rank (momentum signal) |
| AvgOppNETRank | Numeric | Average NET rank of opponents faced |
| NETSOS | Numeric | NET Strength of Schedule |
| NETNonConfSOS | Numeric | Non-conference strength of schedule |
| WL | String | Overall win-loss record (e.g. "23-7") |
| Conf. Record | String | Conference win-loss record |
| RoadWL | String | Road game win-loss record |
| Quadrant 1–4 | String | Win-loss record by opponent quality tier |

Training Set: Tournament Selection Distribution

Feature Engineering

From 15 raw columns to 50 predictive features

Base Features (30)

Win-loss strings parsed into wins, losses, and win percentage for all 8 record columns (WL, Conference, Non-Conference, Road, Quadrant 1–4). Combined with 6 NET-based numeric features.

WL_W · WL_WinPct · Quadrant1_WinPct · NET Rank · NETSOS · RoadWL_WinPct

Deep Features (20)

Derived signals capturing momentum, conference strength, quality of wins, and composite metrics that aren’t directly available in the raw data.

conf_tier · is_power6 · net_momentum · quality_wins · bad_losses · q1_q2_combined_pct · sos_composite · net_pct_change · sos_norm
Feature Engineering Pipeline
def engineer_features(df):
    d = df.copy()
    # Parse W-L strings into numeric wins, losses, win%
    for col in wl_cols:
        wins, losses = zip(*d[col].apply(parse_wl))
        d[f"{col}_W"]      = wins
        d[f"{col}_L"]      = losses
        games              = d[f"{col}_W"] + d[f"{col}_L"]
        # 0-game records become NaN here and are handled by the imputation step
        d[f"{col}_WinPct"] = d[f"{col}_W"] / games.where(games > 0)

    # Conference tier encoding
    d["conf_tier"]   = d["Conference"].map(conf_rankings)
    d["is_power6"]   = (d["conf_tier"] <= 6).astype(int)

    # NET momentum: improvement from previous year
    d["net_momentum"]  = d["PrevNET"] - d["NET Rank"]
    d["net_improved"]  = (d["net_momentum"] > 0).astype(int)

    # Quality metrics
    d["quality_wins"]  = d["Quadrant1_W"] + d["Quadrant2_W"]
    d["bad_losses"]    = d["Quadrant3_L"] + d["Quadrant4_L"]
    return d
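The parse_wl helper referenced in the pipeline above is not shown; a minimal sketch, assuming records are simple "W-L" strings like "23-7" (this exact implementation is an assumption):

```python
import pandas as pd

def parse_wl(record):
    # Hypothetical helper: split a "W-L" string like "23-7" into ints.
    # Missing records yield (None, None) so they can be imputed downstream.
    if pd.isna(record):
        return None, None
    w, l = str(record).split("-")
    return int(w), int(l)

parse_wl("23-7")  # (23, 7)
```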

Top 15 Features by Importance (Selection Model)

Model Architecture

8-model stacked ensemble with meta-learner

HistGradientBoosting

Handles missing values natively, fast training. Primary base learner for both tasks.

Classifier + Regressor

GradientBoosting

Traditional gradient boosting with tree-based splits. Strong on structured tabular data.

Classifier + Regressor

ExtraTrees

Extremely randomized trees. Adds diversity to the ensemble via random split selection.

Classifier + Regressor

RandomForest

Bagged decision trees. Reduces variance and provides robust probability calibration.

Classifier + Regressor

Stacking Architecture

Base Layer (8 models)
HGB-Clf · GB-Clf · ET-Clf · RF-Clf · HGB-Reg · GB-Reg · ET-Reg · RF-Reg
↓ Out-of-fold predictions (10-fold CV) ↓
Meta-Learner Layer
LogisticRegression (Selection) · Ridge Regression (Seed)
↓ Final predictions ↓
Output
Selected? (0/1) · Seed (1–68)
Stacking Cross-Validation
# 10-fold stacked ensemble: base models generate OOF predictions
# that become features for the meta-learner
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

for fold, (train_idx, val_idx) in enumerate(skf.split(X, y_sel)):
    X_tr, X_val = X[train_idx], X[val_idx]

    for name, model in base_classifiers.items():
        clf = clone(model).fit(X_tr, y_sel[train_idx])
        oof_sel[name][val_idx] = clf.predict_proba(X_val)[:, 1]

    # Seed regressors train only on tournament-selected teams in this fold
    sel_mask = y_sel[train_idx] == 1
    for name, model in base_regressors.items():
        reg = clone(model).fit(X_tr[sel_mask], y_seed[train_idx][sel_mask])
        oof_seed[name][val_idx] = reg.predict(X_val)

# Meta-learner fits on stacked OOF predictions
meta_X = np.column_stack([oof_sel[n] for n in base_classifiers])
meta_clf = LogisticRegression().fit(meta_X, y_sel)

PR-Curve Threshold Calibration

Instead of the default 0.5 cutoff, the optimal threshold was selected by maximizing F1 score on the Precision-Recall curve. The calibrated threshold of 0.6531 selected 70 teams (vs. 76 with default), improving precision while maintaining recall.
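The calibration step described above can be sketched with scikit-learn's precision_recall_curve; the data here is synthetic, so the resulting threshold is illustrative, not the 0.6531 reported:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Hypothetical OOF selection probabilities and true labels
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 500)
y_prob = np.clip(y_true * 0.6 + rng.random(500) * 0.5, 0, 1)

prec, rec, thresh = precision_recall_curve(y_true, y_prob)
f1 = 2 * prec * rec / (prec + rec + 1e-12)
# prec/rec have one more entry than thresh; drop the last point, which
# corresponds to no threshold at all
best_threshold = thresh[np.argmax(f1[:-1])]
selected = (y_prob >= best_threshold).sum()
```

Maximizing F1 on the PR curve trades a few recalled teams for precision, which is why the calibrated cutoff yields a team count closer to the real 68-bid field than the default 0.5 does.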

Results

Model performance and submission comparison

0.966 Selection AUC (stacked ensemble)
0.653 Optimal Threshold (PR-curve calibrated)
70 Teams Selected (out of 451 candidates)
68 Actual Bids (NCAA tournament slots)

Predicted Seed Distribution

Old vs. Improved Submission

Selection Probability Distribution

Key Insights

What the model reveals about tournament selection

NET Rank is King

NET Rank is by far the most important feature for both selection and seeding. Teams ranked in the top 45 by NET are almost guaranteed a tournament bid. The committee heavily weights this metric above all others.

Quadrant 1 Wins Matter

Quadrant 1 win percentage is the second most impactful feature. Beating highly-ranked opponents on the road or neutral courts provides the strongest signal of tournament worthiness beyond NET rank alone.

Momentum Signals

NET momentum (improvement from previous season) and conference record win percentage add predictive power beyond static rankings. Teams trending upward are more likely to earn at-large bids.

Threshold Matters

Moving from the default 0.5 threshold to a PR-curve optimized 0.653 reduced the number of predicted selections from 76 to 70 — closer to the actual 68 tournament bids — while improving precision significantly.

Technical Takeaways

  • Stacking > Single models: The 8-model ensemble with meta-learner consistently outperformed any individual model in cross-validation.
  • Deep feature engineering paid off: Adding 20 derived features (momentum, quality wins, conference tiers) lifted AUC from ~0.94 to 0.966.
  • PR-curve calibration: Optimizing the threshold on precision-recall (not ROC) produced more realistic team counts matching the actual 68-team bracket.
  • W-L parsing was essential: Converting 8 string-format W-L columns into 24 numeric features (wins, losses, win%) was the single biggest feature engineering win.