Micah Jona

micahjona99@gmail.com | GitHub: mjona99 | Resume (PDF)

About Me

I'm a senior computer science and statistics double major at the University of Wisconsin-Madison. After graduation, I hope to join an MLB team's baseball operations department as either an analyst or developer. Most of my programming experience is in Java and R, but I also have varying degrees of proficiency in C, JavaScript, SQL, and Python. I love playing pick-up games (basketball/ultimate frisbee) with friends, watching the Cubs (most of the time), and working on baseball coding and visualization projects.

Projects

How Tired Do Starting Pitchers Get During a Game? (PDF)

Analyzed in-game fatigue, first for all pitchers and then for starting pitchers specifically, as measured by pitch velocity

  • Showed that velocity generally declined throughout the game

  • Showed that pitchers recapture some of their previously lost velocity during inning breaks

  • Showed that velocity increased for the last four pitches of an inning

Predicting Pitch Outcomes in Major League Baseball (PDF)

Analyzed pitch outcomes for the entire 2019 MLB season

  • Experimented with k-nearest neighbors (kNN), scikit-learn's GradientBoostingClassifier (GBM), XGBoost, LightGBM, and neural network (NN) models

  • Objective was to predict one of three classes for each pitch: strike, ball, or contact

  • XGBoost performed the best with an accuracy of ~74%

Visualizations

All visualization work is done in R

Sankey Networks of Pitch Sequences - App

  • How this is useful:

    • Can be used as a scouting report to look at pitch type sequence tendencies

  • How I created it:

    • Created a pitch sequence for each at-bat in a given pitcher's season

    • Turned the sequences into a data frame that paired each pitch with the next pitch. Ex: (0-0 FF -> 1-0 SI), (1-0 SI -> 2-0 FF)

    • Used that data frame to create a Sankey network
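The pairing step above can be sketched roughly as follows. This is a minimal illustration in Python rather than the R used for the app; the function name `transition_counts`, the tuple layout, and the sample at-bats are assumptions made for the example, not the original code.

```python
from collections import Counter


def transition_counts(at_bats):
    """Count (pitch -> next pitch) links across at-bats.

    Each at-bat is a list of (count, pitch_type) tuples in the order
    thrown, e.g. [("0-0", "FF"), ("1-0", "SI"), ("2-0", "FF")].
    """
    links = Counter()
    for pitches in at_bats:
        # Pair every pitch with the one that follows it in the same at-bat.
        for current, following in zip(pitches, pitches[1:]):
            source = f"{current[0]} {current[1]}"
            target = f"{following[0]} {following[1]}"
            links[(source, target)] += 1
    return links


# Made-up sample data for illustration only.
at_bats = [
    [("0-0", "FF"), ("1-0", "SI"), ("2-0", "FF")],
    [("0-0", "FF"), ("1-0", "SI")],
]
links = transition_counts(at_bats)
```

Each key is a source/target pair and each value is how often that link occurred, which is the shape a Sankey plotting library typically expects for its links.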

Quality of Contact Plus (QC+) Validation

  • How this is useful:

    • Validating a new metric

    • Shows that the average positioning of defenders coincides with 'cool' zones

  • How I created it:

    • Cleaned dataset of ~120,000 batted balls, removing all instances that were deemed to be 'No Nulls'

    • Ran an XGBoost multi-class classification using a multi:softprob objective that returned the probability of each batted ball belonging to one of five classes: out, single, double, triple, or home run

    • Applied weighted on-base average (wOBA) linear weights averaged over the past six seasons to each class probability to get a weighted likelihood value

    • Took the weighted likelihood value and scaled it to an average of 100, similar to other popular metrics (wRC+, OPS+, DRC+, etc.)
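The last two steps can be sketched as below. This is a hedged illustration in Python, not the original R code: the weights in `WEIGHTS` are rounded placeholders rather than the actual six-season wOBA linear weights, and scaling by dividing by the sample mean and multiplying by 100 is one assumed way to reach an average of 100.

```python
# Placeholder linear weights for illustration; NOT the actual wOBA values.
WEIGHTS = {"out": 0.0, "single": 0.9, "double": 1.25, "triple": 1.6, "home_run": 2.0}


def weighted_likelihood(probs):
    """Combine class probabilities for one batted ball into a single value."""
    return sum(probs[outcome] * weight for outcome, weight in WEIGHTS.items())


def scale_to_100(values):
    """Rescale values so that their mean is 100 (assumed scaling method)."""
    mean = sum(values) / len(values)
    return [100 * value / mean for value in values]


# Two made-up batted balls: a certain out and a certain home run.
batted_balls = [
    {"out": 1.0, "single": 0.0, "double": 0.0, "triple": 0.0, "home_run": 0.0},
    {"out": 0.0, "single": 0.0, "double": 0.0, "triple": 0.0, "home_run": 1.0},
]
raw = [weighted_likelihood(probs) for probs in batted_balls]
qc_plus = scale_to_100(raw)
```

In practice the probabilities would come from the classifier's per-class output for each batted ball rather than being hand-written.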

Kyle Hendricks' Pitch Sequences - By Batter Handedness

  • How this is useful:

    • Can be used as a scouting report to look at pitch type sequences

    • Ex: Can see that Hendricks mainly throws curveballs to lefties as a first pitch

  • How I created it:

    • Scraped all of Hendricks' pitches from the 2019 season via baseballsavant.com

    • Created a ggplot with geom_tile() to plot each pitch as a rectangle and then applied a facet_grid() to separate by batter handedness

Kyle Hendricks' Pitch Sequences - By Batter

  • How this is useful:

    • Can be used as a sort of game summary to see how each hitter was pitched against

    • Ex: Can see that Hendricks attacked Myers with a majority of sinkers

  • How I created it:

    • Scraped all of Hendricks' pitches from the 2019 season via baseballsavant.com

    • Created a ggplot with geom_tile() to plot each pitch as a rectangle and then applied a facet_grid() to separate by batter

Willson Contreras' Rolling Called Strike Percentage

  • How this is useful:

    • Shows Contreras' framing throughout the 2019 season (did he improve as Theo Epstein said?)

    • Can be used to see if changes being implemented by either player or coach are actually working

  • How I created it:

    • Limited data to only called balls and strikes in the 'Shadow Zone'

    • Calculated the rolling average as the share of called strikes among the last 50 called pitches in the 'Shadow Zone'
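A minimal sketch of the rolling calculation, written in Python rather than the R used for the plot. The function name `rolling_strike_rate` is hypothetical, and the choice to report a value only once a full window is available is an assumption.

```python
from collections import deque


def rolling_strike_rate(calls, window=50):
    """Return the called-strike rate over each full trailing window.

    calls: iterable of booleans in chronological order,
    True for a called strike and False for a called ball.
    """
    recent = deque(maxlen=window)
    strikes = 0
    rates = []
    for is_strike in calls:
        if len(recent) == window:
            # The oldest call is about to drop out of the window.
            strikes -= recent[0]
        recent.append(is_strike)
        strikes += is_strike
        if len(recent) == window:
            rates.append(strikes / window)
    return rates


# Tiny made-up example with a window of 3 instead of 50.
rates = rolling_strike_rate([True, True, False, False], window=3)
```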

Pitch Velocity as a Function of Pitch Number

  • How this is useful:

    • Shows a game-wide trend of velocity diminishing as pitches increase

    • Shows how few pitchers in today's game reach a high number of pitches

  • How I created it:

    • Downloaded every pitch in the Statcast era (2015-2019)

    • Found the average velocity of each pitch in the game (1st, 2nd, etc.)

    • Found what percentage of pitchers throw (1, 2, etc.) pitches in a game
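The two aggregations above can be sketched as follows, in Python rather than the R used for the plot. The function name `velocity_by_pitch_number` and the sample outings are assumptions, and "percentage of pitchers" is read here as the share of outings that reach a given pitch number.

```python
from collections import defaultdict


def velocity_by_pitch_number(outings):
    """Average velocity and share of outings for each pitch number.

    outings: list of lists, one per pitcher-game, holding pitch
    velocities (mph) in the order thrown.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for velocities in outings:
        for pitch_number, velocity in enumerate(velocities, start=1):
            totals[pitch_number] += velocity
            counts[pitch_number] += 1
    average = {n: totals[n] / counts[n] for n in counts}
    # Percentage of outings that lasted at least n pitches.
    reach_pct = {n: 100 * counts[n] / len(outings) for n in counts}
    return average, reach_pct


# Made-up sample: one three-pitch outing and one two-pitch outing.
average, reach_pct = velocity_by_pitch_number([[94.0, 93.0, 92.0], [92.0, 91.0]])
```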

Starting Pitcher Velocity by Pitch Type as a Function of Pitch Number

  • How this is useful:

    • Shows a 'warming up' effect for starting pitchers

    • Shows how velocity seems to recover after inning breaks

    • Can show how each pitch type's velocity is affected as the number of pitches thrown increases

  • How I created it:

    • Downloaded every pitch in the Statcast era (2015-2019)

    • Found the average velocity of each pitch in the game (1st, 2nd, etc.) separated by pitch type

Kyle Hendricks 2019 Count Breakdown by Pitch Type

  • How this is useful:

    • Can be used as a scouting report to look at pitch type tendencies by count

  • How I created it:

    • Scraped all of Hendricks' pitches from the 2019 season via baseballsavant.com

    • Used facet_grid on a ggplot object separated by ball and strike combinations

Kyle Hendricks 2019 Count Breakdown by Pitch Type and Batter Stance

  • How this is useful:

    • Can be used as a scouting report to look at pitch type tendencies by count and batter stance

    • Ex: Can see that Hendricks rarely throws curveballs when behind in the count

  • How I created it:

    • Scraped all of Hendricks' pitches from the 2019 season via baseballsavant.com

    • Separated by pitch type

    • Used facet_grid on a ggplot object separated by ball and strike combinations

Called Strike Percentage vs Average

  • How this is useful:

    • Shows a catcher's strengths and weaknesses in framing, the most important aspect of catcher defense

    • Helps front office and coaches evaluate a catcher's framing ability

  • How I created it:

    • Limited data to only called balls and strikes in the 'Shadow Zone'

    • Separated pitches into eight distinct zones

    • Calculated strike percentage in each zone for player and league

Detailed 'Shadow Zone' Pitch Calls

  • How this is useful:

    • Shows at a macro level how this catcher's called pitches break down

  • How I created it:

    • Limited data to only called strikes and balls in the 'Shadow Zone'

    • Labeled each remaining pitch as either: a ball in the strike zone, ball out of the strike zone, called strike in the zone, or called strike out of the zone
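The labeling step can be sketched as below, in Python rather than the R used for the plot. The function name `label_call`, the string values for the call, and the assumption that each pitch already carries an in-zone flag are all illustrative.

```python
def label_call(call, in_zone):
    """Assign one of the four labels to a called pitch.

    call: "ball" or "called_strike" (assumed encoding).
    in_zone: True if the pitch location was inside the strike zone.
    """
    if call not in ("ball", "called_strike"):
        raise ValueError(f"unexpected call: {call!r}")
    if call == "ball":
        return "ball in zone" if in_zone else "ball out of zone"
    return "called strike in zone" if in_zone else "called strike out of zone"


# Made-up pitches for illustration only.
labels = [label_call(call, in_zone) for call, in_zone in [
    ("ball", True),
    ("ball", False),
    ("called_strike", True),
    ("called_strike", False),
]]
```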

Gaining and Losing Called Strikes

  • How this is useful:

    • Shows at a granular level where the catcher succeeds and fails at framing borderline pitches

    • Can lead coaches/front office to offer advice on where this catcher needs to improve his framing

  • How I created it:

    • Limited data to only called balls and strikes in the 'Shadow Zone'

    • Then limited the data further to only called strikes outside of the strike zone and balls inside the strike zone

Game Pitch Chart (Interactive)

  • How this is useful:

    • Quick velocity summary of the game for the pitcher and coaching staff

  • How I created it:

    • Labeled each pitch of the game

    • Plotted velocity for each pitch divided by pitch type

    • Converted the ggplot to a plotly object to make the graph interactive

  • Part of a game summary R Shiny app (in development)

3D Release Point Plot (Interactive)

  • How this is useful:

    • Can help the opposing team identify differences in release point in order to pick up the pitch type from release

    • Helps the pitching coach/front office identify flaws in release point and try to make release points as similar as possible across pitch types

  • How I created it:

    • Made a 3D plot of the pitcher's release point height, side, and extension

    • Color coded by pitch type

  • Part of a game summary R Shiny app (in development)