A two-stage machine learning pipeline that predicts which teams make the NCAA tournament and where they are seeded, using an 8-model stacked ensemble built on 50 features engineered from historical team performance data.
Two prediction tasks solved with a unified ML pipeline
- **Selection** (classification, AUC = 0.966): will a team be selected for the NCAA tournament? Uses team stats, NET rankings, strength of schedule, and quadrant records to predict tournament inclusion.
- **Seeding** (regression, MAE-optimized): for selected teams, predict the overall seed (1–68). Minimizes mean absolute error on seed placement using the same feature set with regression-tuned ensemble learners.
- **Stacking** (meta-learning, 10-fold CV): both tasks use an 8-model stacked ensemble with a meta-learner. Base models generate out-of-fold predictions that feed into a LogisticRegression (selection) and a Ridge (seed) meta-learner for the final output.

Historical NCAA team performance data from multiple seasons:
| Feature | Type | Description |
|---|---|---|
| NET Rank | Numeric | NCAA Evaluation Tool ranking (1 = best) |
| PrevNET | Numeric | Previous season NET rank (momentum signal) |
| AvgOppNETRank | Numeric | Average NET rank of opponents faced |
| NETSOS | Numeric | NET Strength of Schedule |
| NETNonConfSOS | Numeric | Non-conference strength of schedule |
| WL | String | Overall win-loss record (e.g. "23-7") |
| Conf.Record | String | Conference win-loss record |
| RoadWL | String | Road game win-loss record |
| Quadrant 1–4 | String | Win-loss by opponent quality tier |
From 15 raw columns to 50 predictive features
Win-loss strings parsed into wins, losses, and win percentage for all 8 record columns (WL, Conference, Non-Conference, Road, Quadrant 1–4). Combined with 6 NET-based numeric features.
Derived signals capturing momentum, conference strength, quality of wins, and composite metrics that aren’t directly available in the raw data.
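The W-L parsing step can be sketched as a small helper (a minimal sketch; the project's actual `parse_wl` is not shown and may differ, e.g. in how it handles missing records):

```python
def parse_wl(record):
    """Split a win-loss string like "23-7" into integer (wins, losses)."""
    wins, losses = str(record).split("-")
    return int(wins), int(losses)
```

Applied per record column, `parse_wl("23-7")` yields `(23, 7)`, from which the win percentage follows directly.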
```python
def engineer_features(df):
    d = df.copy()
    # Parse W-L strings (e.g. "23-7") into numeric wins, losses, win%
    for col in wl_cols:
        wins, losses = zip(*d[col].apply(parse_wl))
        d[f"{col}_W"] = wins
        d[f"{col}_L"] = losses
        d[f"{col}_WinPct"] = d[f"{col}_W"] / (d[f"{col}_W"] + d[f"{col}_L"])
    # Conference tier encoding (lower tier number = stronger conference)
    d["conf_tier"] = d["Conference"].map(conf_rankings)
    d["is_power6"] = (d["conf_tier"] <= 6).astype(int)
    # NET momentum: improvement from the previous season (positive = moved up)
    d["net_momentum"] = d["PrevNET"] - d["NET Rank"]
    d["net_improved"] = (d["net_momentum"] > 0).astype(int)
    # Quality metrics: wins over strong opponents, losses to weak ones
    d["quality_wins"] = d["Quadrant1_W"] + d["Quadrant2_W"]
    d["bad_losses"] = d["Quadrant3_L"] + d["Quadrant4_L"]
    return d
```
8-model stacked ensemble with meta-learner
Four model families, each contributing a classifier and a regressor (8 base models total):

- Handles missing values natively, fast training; primary base learner for both tasks (classifier + regressor).
- Traditional gradient boosting with tree-based splits; strong on structured tabular data (classifier + regressor).
- Extremely randomized trees; adds diversity to the ensemble via random split selection (classifier + regressor).
- Bagged decision trees; reduces variance and provides robust probability calibration (classifier + regressor).

```python
# 10-fold stacked ensemble: base models generate OOF predictions
# that become features for the meta-learner
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y_sel)):
    X_tr, X_val = X[train_idx], X[val_idx]
    for name, model in base_classifiers.items():
        clf = clone(model).fit(X_tr, y_sel[train_idx])
        oof_sel[name][val_idx] = clf.predict_proba(X_val)[:, 1]
    # Seed regressors train only on teams selected for the tournament
    sel_mask = y_sel[train_idx] == 1
    for name, model in base_regressors.items():
        reg = clone(model).fit(X_tr[sel_mask], y_seed[train_idx][sel_mask])
        oof_seed[name][val_idx] = reg.predict(X_val)

# Meta-learner fits on stacked OOF predictions
meta_X = np.column_stack([oof_sel[n] for n in base_classifiers])
meta_clf = LogisticRegression().fit(meta_X, y_sel)
```
Instead of the default 0.5 cutoff, the selection threshold was chosen by maximizing F1 score along the precision-recall curve. The calibrated threshold of 0.6531 selected 70 teams (vs. 76 at the default), improving precision while maintaining recall.
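The threshold search can be sketched with scikit-learn's `precision_recall_curve` (a minimal sketch; the 0.6531 value itself comes from the project's own validation data):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def best_f1_threshold(y_true, y_prob):
    """Return the probability cutoff that maximizes F1 along the PR curve."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_prob)
    # precision/recall carry one extra trailing point with no threshold,
    # so drop the last entry before computing F1 per threshold
    f1 = 2 * precision[:-1] * recall[:-1] / (precision[:-1] + recall[:-1] + 1e-12)
    return thresholds[np.argmax(f1)]
```

With perfectly separable scores the cutoff lands at the lowest positive-class score, where precision and recall are both 1.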
Model performance and submission comparison
What the model reveals about tournament selection
NET Rank is by far the most important feature for both selection and seeding. Teams ranked in the top 45 by NET are almost guaranteed a tournament bid. The committee heavily weights this metric above all others.
Quadrant 1 win percentage is the second most impactful feature. Beating highly ranked opponents on the road or on neutral courts provides the strongest signal of tournament worthiness beyond NET rank alone.
NET momentum (improvement from previous season) and conference record win percentage add predictive power beyond static rankings. Teams trending upward are more likely to earn at-large bids.
Moving from the default 0.5 threshold to a PR-curve optimized 0.653 reduced the number of predicted selections from 76 to 70 — closer to the actual 68 tournament bids — while improving precision significantly.