Micah Jona
micahjona99@gmail.com | GitHub: mjona99 | Resume (PDF)
About Me
I'm a senior computer science and statistics double major at the University of Wisconsin-Madison. After graduation, I hope to join an MLB team's baseball operations department as either an analyst or a developer. Most of my programming experience is in Java and R, but I also have varying degrees of proficiency in C, JavaScript, SQL, and Python. I love playing pick-up games (basketball/ultimate frisbee) with friends, watching the Cubs (most of the time), and working on baseball coding and visualization projects.
Projects
How Tired Do Starting Pitchers Get During a Game? (PDF)
Analyzed in-game fatigue, first for all pitchers and then for starting pitchers alone, using pitch velocity as the measure
Showed that velocity generally declined throughout the game
Showed that pitchers partially recover during inning breaks, regaining some of the velocity they lost earlier
Showed that velocity increased for the last four pitches of an inning
Predicting Pitch Outcomes in Major League Baseball (PDF)
Analyzed pitch outcomes for the entire 2019 MLB season
Experimented with k-nearest neighbors (kNN), scikit-learn's GradientBoostingClassifier (GBM), XGBoost, LightGBM, and neural network (NN) models
Objective was to predict one of three classes for each pitch: strike, ball, or contact
XGBoost performed the best with an accuracy of ~74%
Visualizations
All visualization work is done in R
Sankey Networks of Pitch Sequences - App
How this is useful:
Can be used as a scouting report to look at pitch type sequence tendencies
How I created it:
Created a pitch sequence for each of a given pitcher's at-bats in a season
Turned the sequences into a data frame pairing each pitch (count and pitch type) with the next pitch. Ex: (0-0 FF -> 1-0 SI), (1-0 SI -> 2-0 FF)
Used that data frame to create a Sankey network
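The pairing step above can be sketched in Python (the app itself was built in R). This is a minimal, hypothetical version: it assumes each at-bat arrives as an ordered list of (count, pitch type) tuples and counts how often each pitch leads to each next pitch, which yields the source/target/value rows a Sankey link table usually needs. The function name `transition_links` is illustrative, not from the project.

```python
from collections import Counter

def transition_links(at_bats):
    """Count every (pitch -> next pitch) pair across a pitcher's at-bats.

    Each node is labeled "count pitch_type" (e.g. "0-0 FF"), so the same
    pitch type thrown in different counts becomes a different node.
    """
    links = Counter()
    for pitches in at_bats:
        labels = [f"{count} {ptype}" for count, ptype in pitches]
        # zip pairs each pitch with the one that follows it in the same at-bat
        for source, target in zip(labels, labels[1:]):
            links[(source, target)] += 1
    return links

# Hypothetical toy data: two at-bats, pitches listed in the order thrown.
at_bats = [
    [("0-0", "FF"), ("1-0", "SI"), ("2-0", "FF")],
    [("0-0", "FF"), ("1-0", "SI")],
]
for (source, target), value in transition_links(at_bats).items():
    print(source, "->", target, value)
```

Labeling nodes by count as well as pitch type keeps the flow moving left to right through the at-bat instead of collapsing into loops between pitch types.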
Quality of Contact Plus (QC+) Validation
How this is useful:
Validates a new batted-ball metric
Shows that the average positioning of defenders coincides with 'cool' zones
How I created it:
Cleaned a dataset of ~120,000 batted balls, removing all instances that were deemed to be 'No Nulls'
Ran an XGBoost multi-class classification with the multi:softprob objective, which returned the probability of each batted ball belonging to one of five classes: out, single, double, triple, or home run
Applied weighted on-base average (wOBA) linear weights, averaged over the past six seasons, to each class probability to get a weighted likelihood value
Took the weighted likelihood value and scaled it to an average of 100, similar to other popular metrics (wRC+, OPS+, DRC+, etc.)
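A minimal Python sketch of the last two steps, assuming the XGBoost model has already produced a probability for each of the five classes. The numbers in `WEIGHTS` are illustrative placeholders in the general range of published wOBA weights, not the six-season averages used for QC+, and the function names are hypothetical.

```python
# Illustrative placeholder weights; the project used wOBA linear weights
# averaged over six seasons, which are not reproduced here.
WEIGHTS = {"out": 0.0, "single": 0.9, "double": 1.25, "triple": 1.6, "home_run": 2.0}

def weighted_likelihood(probs):
    """wOBA-weighted value of one batted ball: sum of p_k * w_k over classes."""
    return sum(probs[outcome] * weight for outcome, weight in WEIGHTS.items())

def qc_plus(all_probs):
    """Rescale weighted likelihoods so the sample average equals 100."""
    values = [weighted_likelihood(p) for p in all_probs]
    sample_mean = sum(values) / len(values)
    return [100 * v / sample_mean for v in values]

# Two extreme batted balls: a certain out and a certain home run.
certain_out = {"out": 1.0, "single": 0.0, "double": 0.0, "triple": 0.0, "home_run": 0.0}
certain_hr = {"out": 0.0, "single": 0.0, "double": 0.0, "triple": 0.0, "home_run": 1.0}
print(qc_plus([certain_out, certain_hr]))  # [0.0, 200.0]
```

Here the sample mean is 1.0, so the out scales to 0 and the home run to 200. If QC+ is reported per hitter, one would presumably average each hitter's values before scaling; the source does not say at which level the scaling was done.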
Kyle Hendricks' Pitch Sequences - By Batter Handedness
How this is useful:
Can be used as a scouting report to look at pitch type sequences
Ex: Can see that Hendricks mainly throws curveballs to lefties as a first pitch
How I created it:
Scraped all of Hendricks' pitches from the 2019 season via baseballsavant.com
Created a ggplot with geom_tile() to plot each pitch as a rectangle and then applied a facet_grid() to separate by batter handedness
Kyle Hendricks' Pitch Sequences - By Batter
How this is useful:
Can be used as a sort of game summary to see how each hitter was pitched
Ex: Can see that Hendricks attacked Myers with a majority of sinkers
How I created it:
Scraped all of Hendricks' pitches from the 2019 season via baseballsavant.com
Created a ggplot with geom_tile() to plot each pitch as a rectangle and then applied a facet_grid() to separate by batter
Willson Contreras' Rolling Called Strike Percentage
How this is useful:
Shows Contreras' framing throughout the 2019 season (did he improve as Theo Epstein said?)
Can be used to check whether changes implemented by the player or coaches are actually working
How I created it:
Limited data to only called balls and strikes in the 'Shadow Zone'
Calculated a rolling called strike percentage as the share of strikes among the last 50 called pitches in the 'Shadow Zone'
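The rolling window can be sketched in Python (the chart itself was made in R). It assumes the Shadow Zone calls are already in chronological order as booleans (True for a called strike) and keeps a running strike count over a fixed-size window, so each new pitch is handled in constant time. The 50-pitch default matches the window described above; the function name is hypothetical.

```python
from collections import deque

def rolling_strike_pct(calls, window=50):
    """Called strike percentage over the last `window` called pitches.

    `calls` is a chronological sequence of booleans for Shadow Zone taken
    pitches (True = called strike, False = called ball). Output begins once
    `window` pitches are available.
    """
    recent = deque(maxlen=window)
    strikes = 0
    out = []
    for is_strike in calls:
        if len(recent) == window:
            strikes -= recent[0]  # the oldest pitch is about to drop out
        recent.append(is_strike)  # deque(maxlen=...) discards the oldest entry
        strikes += is_strike
        if len(recent) == window:
            out.append(100 * strikes / window)
    return out

# Hypothetical toy data: 50 straight strikes followed by 50 straight balls.
pct = rolling_strike_pct([True] * 50 + [False] * 50)
print(pct[0], pct[25], pct[-1])  # 100.0 50.0 0.0
```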
Pitch Velocity as a Function of Pitch Number
How this is useful:
Shows a league-wide trend of velocity declining as pitch count increases
Shows how few pitchers in today's game reach high pitch counts
How I created it:
Downloaded every pitch in the Statcast era (2015-2019)
Found the average velocity at each pitch number in a game (1st pitch, 2nd pitch, etc.)
Found the percentage of pitchers who reach each pitch number (1, 2, etc.) in a game
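A Python sketch of these two summaries (the original work was done in R). It treats each pitcher-game as one appearance, given as a list of velocities in pitch order; that unit of analysis and the function names are assumptions, not details from the source.

```python
from collections import defaultdict

def velocity_by_pitch_number(appearances):
    """Average velocity (mph) at each pitch number, starting from 1."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for velos in appearances:
        for number, velo in enumerate(velos, start=1):
            totals[number] += velo
            counts[number] += 1
    return {n: totals[n] / counts[n] for n in sorted(totals)}

def share_reaching(appearances):
    """Percentage of appearances that reach each pitch number."""
    total = len(appearances)
    longest = max(len(v) for v in appearances)
    return {n: 100 * sum(len(v) >= n for v in appearances) / total
            for n in range(1, longest + 1)}

# Hypothetical toy data: one 3-pitch appearance and one 2-pitch appearance.
appearances = [[95.0, 94.0, 93.0], [93.0, 92.0]]
print(velocity_by_pitch_number(appearances))  # {1: 94.0, 2: 93.0, 3: 93.0}
print(share_reaching(appearances))            # {1: 100.0, 2: 100.0, 3: 50.0}
```

The second summary doubles as a sample-size check: averages at deep pitch counts rest on far fewer appearances. Keying the totals on (pitch type, pitch number) instead of pitch number alone would give a per-pitch-type breakdown.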
Starting Pitcher Velocity by Pitch Type as a Function of Pitch Number
How this is useful:
Shows a 'warming up' effect for starting pitchers
Shows how velocity seems to recover after inning breaks
Shows how the velocity of each pitch type is affected as the number of pitches thrown increases
How I created it:
Downloaded every pitch in the Statcast era (2015-2019)
Found the average velocity at each pitch number in a game (1st, 2nd, etc.), separated by pitch type
Kyle Hendricks 2019 Count Breakdown by Pitch Type
How this is useful:
Can be used as a scouting report to look at pitch type tendencies by count
How I created it:
Scraped all of Hendricks' pitches from the 2019 season via baseballsavant.com
Used facet_grid() on a ggplot object to separate pitches by ball-strike count
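The table behind each facet can be sketched in Python (the plots themselves were made with ggplot2 in R). This hypothetical helper assumes each pitch arrives as a (balls, strikes, pitch type) tuple and returns the pitch-type mix within every count; adding batter stance to the grouping key would give a stance-split version.

```python
from collections import Counter, defaultdict

def pitch_mix_by_count(pitches):
    """Percentage of each pitch type within every ball-strike count.

    `pitches` is an iterable of (balls, strikes, pitch_type) tuples.
    Returns {"balls-strikes": {pitch_type: percent}}.
    """
    by_count = defaultdict(Counter)
    for balls, strikes, ptype in pitches:
        by_count[f"{balls}-{strikes}"][ptype] += 1
    mix = {}
    for count, types in by_count.items():
        total = sum(types.values())
        mix[count] = {t: 100 * n / total for t, n in types.items()}
    return mix

# Hypothetical toy data, not Hendricks' actual 2019 pitches.
pitches = [(0, 0, "SI"), (0, 0, "CU"), (0, 0, "SI"), (0, 0, "SI"), (3, 1, "CH")]
mix = pitch_mix_by_count(pitches)
print(mix["0-0"])  # {'SI': 75.0, 'CU': 25.0}
```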
Kyle Hendricks 2019 Count Breakdown by Pitch Type and Batter Stance
How this is useful:
Can be used as a scouting report to look at pitch type tendencies by count and batter stance
Ex: Can see that Hendricks rarely throws curveballs when behind in the count
How I created it:
Scraped all of Hendricks' pitches from the 2019 season via baseballsavant.com
Separated by pitch type
Used facet_grid() on a ggplot object to separate pitches by ball-strike count
Called Strike Percentage vs Average
How this is useful:
Shows a catcher's strengths and weaknesses in framing, the most important aspect of catcher defense
Helps front office and coaches evaluate a catcher's framing ability
How I created it:
Limited data to only called balls and strikes in the 'Shadow Zone'
Separated pitches into eight distinct zones
Calculated the called strike percentage in each zone for the player and for the league
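A Python sketch of the per-zone comparison (the chart itself was made in R). It assumes each Shadow Zone taken pitch is a (catcher, zone, is_strike) tuple. The zone labels below are placeholders because the eight zones are not defined here, and this version counts the catcher's own pitches in the league baseline, which is a design choice rather than something stated in the source.

```python
from collections import defaultdict

def strike_pct_by_zone(calls, catcher):
    """Called strike % per zone for one catcher versus the whole league.

    Returns {zone: (player_pct, league_pct, player_minus_league)} for every
    zone in which the catcher received at least one called pitch.
    """
    league = defaultdict(lambda: [0, 0])  # zone -> [strikes, called pitches]
    player = defaultdict(lambda: [0, 0])
    for name, zone, is_strike in calls:
        league[zone][0] += is_strike
        league[zone][1] += 1
        if name == catcher:
            player[zone][0] += is_strike
            player[zone][1] += 1
    result = {}
    for zone, (p_strikes, p_total) in player.items():
        l_strikes, l_total = league[zone]
        p_pct = 100 * p_strikes / p_total
        l_pct = 100 * l_strikes / l_total
        result[zone] = (p_pct, l_pct, p_pct - l_pct)
    return result

# Hypothetical toy data with placeholder catcher names and zone labels.
calls = [
    ("catcher_a", "low", True), ("catcher_a", "low", False),
    ("catcher_b", "low", False), ("catcher_b", "low", False),
    ("catcher_a", "high", True), ("catcher_b", "high", True),
]
print(strike_pct_by_zone(calls, "catcher_a"))
# {'low': (50.0, 25.0, 25.0), 'high': (100.0, 100.0, 0.0)}
```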
Detailed 'Shadow Zone' Pitch Calls
How this is useful:
Shows at a macro level how this catcher's called pitches break down
How I created it:
Limited data to only called strikes and balls in the 'Shadow Zone'
Labeled each remaining pitch as one of: a ball in the strike zone, a ball out of the strike zone, a called strike in the zone, or a called strike out of the zone
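A rough Python version of this labeling (the chart itself was made in R). The location arguments mirror the `plate_x`, `plate_z`, `sz_top`, and `sz_bot` columns in Baseball Savant's Statcast export (in feet), but the in-zone rule is a simplification assumed for illustration: it checks only the ball's center against a 17-inch-wide plate and the batter's listed zone height, ignoring the ball's radius. The function names are hypothetical.

```python
# Home plate is 17 inches wide; Statcast locations are in feet from the plate's center.
HALF_PLATE_FT = 17 / 2 / 12

def in_rulebook_zone(plate_x, plate_z, sz_bot, sz_top):
    """Simplified in-zone test using the center of the ball only."""
    return abs(plate_x) <= HALF_PLATE_FT and sz_bot <= plate_z <= sz_top

def label_call(is_strike, plate_x, plate_z, sz_bot, sz_top):
    """Assign one of the four call/location labels to a taken pitch."""
    in_zone = in_rulebook_zone(plate_x, plate_z, sz_bot, sz_top)
    if is_strike:
        return "called strike in zone" if in_zone else "called strike out of zone"
    return "ball in zone" if in_zone else "ball out of zone"

# Hypothetical pitch: a called strike about 10.8 inches off center, outside the plate.
print(label_call(True, 0.9, 2.5, 1.5, 3.5))  # called strike out of zone
```

Keeping only the "called strike out of zone" and "ball in zone" labels gives the subset of calls where the catcher gained or lost a strike.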
Gaining and Losing Called Strikes
How this is useful:
Shows at a granular level where the catcher succeeds and where he fails at framing borderline pitches
Can help coaches/front office advise this catcher on where his framing needs to improve
How I created it:
Limited data to only called balls and strikes in the 'Shadow Zone'
Then limited the data further to only called strikes outside of the strike zone and balls inside the strike zone
Game Pitch Chart (Interactive)
How this is useful:
Quick velocity summary of the game for the pitcher and coaching staff
How I created it:
Labeled each pitch of the game
Plotted velocity for each pitch divided by pitch type
Converted the ggplot to a plotly chart to make the graph interactive
Part of a game summary R Shiny app (in development)
3D Release Point Plot (Interactive)
How this is useful:
Can help an opposing team identify differences in release point in order to pick up the pitch type at release
Helps the pitching coach/front office identify flaws in release point and try to make release points as similar as possible across pitch types
How I created it:
Made a 3D plot of the pitcher's release point height, side, and extension
Color coded by pitch type
Part of a game summary R Shiny app (in development)