Hi, I'm

Daniel Kang

Data Analyst | Product Analytics, Pricing Optimization, and Machine Learning

I analyze product and operational data to drive pricing decisions, improve conversion, and optimize efficiency. I've built shipping-cost optimization models, end-to-end ML pipelines, and analytics frameworks that translate directly into business impact.

My focus is on solving high-leverage problems where data informs strategy and drives measurable outcomes.

About Me

I’m a graduate student at Purdue University (MS in Business Analytics, Aug 2026) who cares about one thing: what should we do next?

At Jambo Club I designed geo-split experiments that cut CPC by 27% and built survival models that doubled retention. At Mayacrew I ran cohort analyses that lifted conversion 15% and automated B2B dashboards that saved 40% of reporting time. At Purdue Data Mine I built KPI dashboards and forecasting models for corporate partners.

I work across the full analytics lifecycle — structuring messy questions, pulling and cleaning data, building models, and delivering recommendations that stakeholders act on.

7+ Projects Completed
3.85 Undergraduate GPA
2026 MS Graduation

Experience

Aug 2025 – Dec 2025

Graduate Data Science Researcher — The Data Mine, Purdue University

Built an interactive Plotly dashboard centralizing corporate KPIs, backlog metrics, and a reorder-likelihood signal flagging high-risk and high-opportunity products. Identified repeatable inventory and demand patterns at the SKU level using time-series analysis and clustering, improving sales predictability and guiding replenishment strategy.

May 2025 – Aug 2025

Business Analyst — Jambo Club, Seoul

Owned marketing performance and retention analytics across paid acquisition and in-product funnels. Designed geo-split experiments and multivariate bid tests, applying regression-based lift analysis to reduce CPC by 27% while maintaining conversion volume. Built retention curves and survival analyses that identified critical churn moments, guiding features that increased retention from 22% to 53%.

May 2024 – Aug 2024

Data Analyst — Mayacrew, Seoul

Assessed acquisition effectiveness and segment economics through funnel and cohort analyses, delivering strategic recommendations that increased digital conversion rates by 15%. Built Tableau and Excel dashboards for B2B engagement KPIs, reducing manual reporting time by 40% and shifting budget toward the highest-ROI B2B segments.

Jan 2024 – May 2024

Analytics Researcher — Applied Materials Inc., Purdue University

Identified seasonal and macro-driven demand patterns in semiconductor equipment utilization, linking utilization shifts to customer CapEx behavior. Developed an ensemble forecasting model combining Random Forest and XGBoost, improving utilization prediction accuracy by 13%.

Jul 2019 – Feb 2022

Deputy Team Leader & Instructor — Pom Education, Seoul

Contributed to more than 500% growth in student enrollment over two years through academic consultations and trial classes. Led and mentored a team of instructors, coordinating lesson planning, sharing best practices, and monitoring instructional quality.

Education

Master of Science

Business Analytics & Information Management

Purdue University, Daniels School of Business

West Lafayette, IN · Expected August 2026

Relevant Coursework: Business Analytics, Financial Analytics, Spreadsheet Optimization, Data Mining

Bachelor of Science

Business Analytics & Information Management

Purdue University, Daniels School of Business

West Lafayette, IN · December 2024 · GPA: 3.85/4.0

Dean’s List: Fall 2022, Spring 2023, Fall 2023, Spring 2024 · Graduated with Distinction

Projects

Each project follows Problem → Approach → Results → Business Insight

Corporate KPI Dashboard

Reorder Likelihood Dashboard for a Seed Distributor

Problem: Sales and operations teams relied on gut feel to decide which products to restock. There was no shared view of backlog health or early demand signals, leading to frequent stockouts on high-value SKUs.

  • Data integration: Consolidated point-of-sale, backlog, and inventory data into a single interactive Plotly dashboard with executive KPI cards
  • Signal design: Created a reorder-likelihood score by clustering SKU purchase histories to surface repeat-buy and seasonal patterns
  • Self-serve views: Built drill-downs by region, customer tier, and product family so sales, ops, and finance could act on the same numbers
  • Surfaced the top 20% of at-risk SKUs roughly four weeks earlier than the prior spreadsheet-based process
  • Dashboard adopted by three cross-functional teams for weekly replenishment and inventory meetings
  • Replaced fragmented one-off reports with a single source of truth for backlog and reorder risk

Demand signals already lived in historical order data — the gap was visibility and alignment, not collection. A scoped dashboard turned tribal knowledge into a repeatable planning rhythm and reduced reactive firefighting on high-value SKUs.

Python · Plotly · Time Series Clustering · Pandas
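The clustering-based reorder score can be sketched roughly like this. This is a minimal illustration on synthetic per-SKU features; order frequency, recency, and seasonal amplitude here are hypothetical stand-ins for the features derived from the real POS history:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic per-SKU features (hypothetical: the real pipeline derived these
# from point-of-sale order history): orders per quarter, days since last
# order, and a seasonality amplitude from a 12-month decomposition.
n_skus = 200
X = np.column_stack([
    rng.poisson(6, n_skus),       # order frequency
    rng.exponential(30, n_skus),  # recency (days since last order)
    rng.uniform(0, 1, n_skus),    # seasonal amplitude
])

# Cluster SKUs into behavioral groups (e.g. repeat-buy, seasonal, sporadic).
Xs = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Xs)

# Illustrative score: frequent, recently ordered SKUs rank highest.
score = (X[:, 0] / X[:, 0].max()) * (1 - X[:, 1] / X[:, 1].max())
```

In the dashboard, the cluster label gives the qualitative pattern (repeat-buy vs. seasonal) while the score ranks SKUs for the weekly replenishment review.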

Paid Acquisition & Retention Analysis at Jambo Club

Problem: The company was scaling ad spend without knowing which channels actually drove incremental sign-ups. At the same time, only 22% of new users were still active after 30 days.

  • Incrementality testing: Designed geo-split experiments to isolate true ad lift; ran multivariate bid tests to find efficient spend levels without sacrificing conversion volume
  • Retention diagnostics: Built survival curves that revealed Day 3 and Day 7 as the critical churn windows
  • Cross-functional execution: Partnered with product to ship re-engagement nudges timed to those drop-off points
  • Reduced cost-per-click by 27% with no loss in conversion volume
  • D30 retention increased from 22% to 53%
  • Created a repeatable experimentation playbook for future bid and creative tests

Churn concentrated in two narrow windows, not a slow leak — so timely, targeted nudges beat broad blast campaigns. Rigorous geo tests gave finance confidence to reallocate budget toward channels with provable incremental lift.

A/B Testing · Survival Analysis · SQL · Causal Inference · Retention
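The survival curves behind the churn-window finding follow the standard Kaplan-Meier estimator, S(t) = Π (1 − d_i / n_i). A minimal pure-NumPy sketch on a toy cohort (the durations below are illustrative, not Jambo Club data):

```python
import numpy as np

def kaplan_meier(durations, observed):
    """Kaplan-Meier survival estimate: S(t) = prod over t_i <= t of (1 - d_i/n_i)."""
    durations = np.asarray(durations, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    times = np.unique(durations[observed])  # event (churn) times only
    surv, s = [], 1.0
    for t in times:
        at_risk = np.sum(durations >= t)               # n_i: users still at risk
        events = np.sum((durations == t) & observed)   # d_i: churn events at t
        s *= 1.0 - events / at_risk
        surv.append(s)
    return times, np.array(surv)

# Toy cohort (hypothetical): days until churn; day-30 survivors are censored.
durations = [2, 3, 3, 3, 7, 7, 7, 7, 10, 30, 30, 30]
observed  = [1, 1, 1, 1, 1, 1, 1, 1, 1,  0,  0,  0]
times, surv = kaplan_meier(durations, observed)
# Sharp drops in surv at day 3 and day 7 mark the churn windows worth targeting.
```

Plotting these step curves per cohort is what surfaced Day 3 and Day 7 as the narrow drop-off windows; the re-engagement nudges were timed to land just before them.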

NCAA Tournament Prediction

NCAA Tournament Selection & Seed Prediction

Problem: Sports media and bracket analysts rely on subjective rankings to predict which 68 teams make the NCAA tournament and how they are seeded. Can a data-driven pipeline match or beat that intuition using only regular-season performance?

  • Feature engineering: Built 50 features for team strength, strength-of-schedule, and conference competitiveness
  • Two-stage design: Stage 1 classifies tournament selection (in/out); Stage 2 predicts seed for teams predicted “in”
  • Ensemble + calibration: Stacked eight base models with a meta-learner; tuned the decision threshold with precision-recall curves to reduce false negatives on bubble teams
  • Selection model achieved AUC 0.966, ranking tournament-worthy teams with very high fidelity
  • Seed predictions landed within about two seed lines of the actual placement for in-bracket teams
  • Demonstrated a full production-style ML workflow: features, stacking, and threshold tuning in one pipeline

Schedule quality and late-season momentum mattered more than raw win totals — mirroring how selection committees weight “who you beat” and how you finish. Treating selection vs. seeding as separate decisions avoided a single model conflating two different objectives and improved both outputs.

Stacked Ensembles · scikit-learn · Python · Feature Engineering · PR-Curve Calibration
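A compressed sketch of the two-stage design on synthetic features, with a single gradient-boosted model standing in for the eight-model stack; the 0.35 threshold is illustrative, where the real one came from precision-recall tuning:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Synthetic team features (hypothetical stand-ins for the ~50 engineered
# features: strength rating, SOS, conference quality, late-season form).
X = rng.normal(size=(600, 4))
strength = X @ np.array([1.2, 0.8, 0.5, 0.9])
in_field = (strength > np.quantile(strength, 0.7)).astype(int)  # top ~30% make it
seed = np.clip(np.round(9 - 2 * strength), 1, 16)               # stronger -> lower seed

X_tr, X_te, in_tr, in_te, seed_tr, seed_te = train_test_split(
    X, in_field, seed, random_state=0)

# Stage 1: selection (in/out).
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, in_tr)
pred_in = clf.predict_proba(X_te)[:, 1] >= 0.35  # lowered to protect bubble teams

# Stage 2: seed prediction, trained only on teams that made the field and
# applied only to teams stage 1 predicts "in".
reg = GradientBoostingRegressor(random_state=0).fit(
    X_tr[in_tr == 1], seed_tr[in_tr == 1])
pred_seed = np.clip(np.round(reg.predict(X_te[pred_in])), 1, 16)
```

Keeping the two stages separate means the selection threshold can be tuned for recall on bubble teams without distorting the seed regression.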

Capacity Planning Forecast for Applied Materials

Problem: Applied Materials planned equipment capacity using trailing averages. When customer demand shifted — driven by seasonal cycles and CapEx timing — utilization forecasts lagged behind, leading to over- or under-provisioning.

  • Signal discovery: Decomposed utilization series to isolate seasonality and linked swings to customer CapEx cycles
  • Forecasting: Replaced the moving-average baseline with a Random Forest + XGBoost ensemble and scored accuracy on held-out quarters
  • Delivery: Packaged forecasts into operational dashboards with drill-downs by region and equipment type for planning leads
  • 13% improvement in utilization prediction accuracy vs. the incumbent baseline
  • Surfaced which drivers (macro vs. seasonal vs. account-level) explained the largest forecast errors
  • Outputs structured for quarterly capacity conversations rather than ad-hoc spreadsheet updates

Customer CapEx timing was a stronger leading indicator of utilization than the equipment’s own trailing trend alone. Feeding that signal into forecasts shifted planning from reactive backfill to forward-looking allocation.

Random Forest · XGBoost · Python · Forecasting · Dashboards
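The ensemble-vs-baseline comparison can be sketched on synthetic quarterly data. scikit-learn's GradientBoostingRegressor stands in for XGBoost here, and the seasonality and lagged-CapEx drivers are simulated:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

rng = np.random.default_rng(2)

# Synthetic quarterly utilization driven by seasonality and a lagged CapEx
# signal (hypothetical stand-ins for the real drivers).
n = 80
quarter = np.arange(n) % 4
capex_lag = rng.normal(size=n)
util = (0.6 + 0.1 * np.sin(2 * np.pi * quarter / 4)
        + 0.15 * capex_lag + rng.normal(0, 0.03, n))

X = np.column_stack([quarter, capex_lag])
split = 64  # hold out the most recent 16 quarters; no shuffling for time series
X_tr, X_te, y_tr, y_te = X[:split], X[split:], util[:split], util[split:]

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
gb = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)  # XGBoost stand-in
ensemble = 0.5 * rf.predict(X_te) + 0.5 * gb.predict(X_te)

# Incumbent baseline: trailing average of the last four training quarters.
baseline = np.full_like(y_te, y_tr[-4:].mean())
mae_model = np.mean(np.abs(ensemble - y_te))
mae_base = np.mean(np.abs(baseline - y_te))
```

The trailing-average baseline ignores the CapEx signal entirely, which is exactly why it lagged demand shifts; the ensemble picks that signal up.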
3rd / 53 Teams

Bankruptcy Prediction Dashboard

Bankruptcy Risk Scoring from Financial Statements

Problem: Lenders and investors need to flag companies at risk of bankruptcy before it happens. The challenge: financial data is messy — 64 ratios with over 40% missing values — and bankrupt firms represent less than 5% of observations.

  • Data quality: Winsorized extreme outliers and applied KNN imputation to preserve relationships between ratios
  • Modeling: Trained XGBoost with class-weight tuning for severe label imbalance
  • Evaluation: Used stratified cross-validation and optimized for AUC (ranking quality) rather than a single hard cutoff
  • Mean AUC 0.907 across folds in a graduate predictive analytics competition
  • Placed 3rd out of 53 teams
  • Delivered a continuous risk score suitable for credit review workflows instead of a single yes/no flag

A score beats a blunt label: risk teams can set thresholds to match portfolio policy and expected loss appetite. Thoughtful imputation and feature integrity moved the needle more than chasing marginal gains from model complexity alone.

XGBoost · Imbalanced Classification · Python · KNN Imputation · Cross-Validation
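The imputation-plus-imbalance pipeline can be sketched on synthetic data. A class-weighted random forest stands in for XGBoost's scale_pos_weight tuning; the dimensions and missingness rates are illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import KNNImputer
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(3)

# Synthetic financial ratios: rare positives (~3% "bankrupt") and 40% of
# values missing (hypothetical scale; the real dataset had 64 ratios).
n, p = 1000, 10
X = rng.normal(size=(n, p))
y = (X[:, 0] + X[:, 1] + rng.normal(0, 1, n) > 3.3).astype(int)
X[rng.random((n, p)) < 0.4] = np.nan

# Winsorize extremes, then KNN-impute so correlated ratios inform each
# other's fills instead of collapsing to column means.
lo, hi = np.nanpercentile(X, [1, 99], axis=0)
X = np.clip(X, lo, hi)
X_imp = KNNImputer(n_neighbors=5).fit_transform(X)

# class_weight="balanced" stands in for XGBoost's scale_pos_weight here.
clf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                             random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
aucs = cross_val_score(clf, X_imp, y, cv=cv, scoring="roc_auc")
```

Scoring on AUC across stratified folds evaluates the ranking quality of the continuous risk score, which is what a credit review workflow consumes, rather than any one hard cutoff.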

How I Approach Problems

1. Start with the business context

I focus on understanding the real-world problem before jumping into data. Domain knowledge is critical for asking the right questions and avoiding misleading conclusions.

2. Translate problems into measurable metrics

I define clear success metrics that align with business goals, whether it’s revenue, conversion, cost, or efficiency.

3. Use data and models as tools, not answers

I apply statistical methods and machine learning to analyze patterns, but I treat models as tools to support decisions, not replace them.

4. Validate with context and judgment

In an era where AI can generate instant outputs, the ability to evaluate results critically and apply domain knowledge is increasingly important. I focus on interpreting results in a way that makes sense for the business.

5. Deliver actionable decisions

My goal is not just analysis, but clear recommendations that stakeholders can act on to improve outcomes.

Skills

Analytics & Modeling

Statistical Modeling · Regression · Classification · Forecasting · Survival Analysis · Clustering · XGBoost · scikit-learn

Data & SQL

SQL · Python · Pandas · ETL Pipelines · Data Cleaning · Excel

Visualization

Tableau · Power BI · Plotly · Matplotlib · Chart.js

Experimentation

A/B Testing · Geo-Split Experiments · Causal Inference · Lift Analysis · Incrementality Testing

Get in Touch

I'm always open to new opportunities and collaborations