The simulator’s development was inspired primarily by the pioneering sabermetric research of Bill James, in particular his Log5 formula for predicting the outcomes of batter-pitcher matchups.
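For readers unfamiliar with it, the commonly cited textbook form of Log5 for a batter-pitcher matchup can be sketched as follows. This is only an illustration of the basic formula, not the simulator's own code; the function name `log5_matchup` and its parameter names are assumptions made for this example.

```python
def log5_matchup(batter_rate: float, pitcher_rate: float, league_rate: float) -> float:
    """Textbook Log5 estimate of an event's probability in one matchup.

    batter_rate  -- how often the batter produces the event (e.g. hits per at-bat)
    pitcher_rate -- how often the pitcher allows the same event
    league_rate  -- league-wide rate of the event, used as the baseline

    All names here are hypothetical and chosen for illustration only.
    """
    for name, rate in (("batter_rate", batter_rate),
                       ("pitcher_rate", pitcher_rate),
                       ("league_rate", league_rate)):
        # Rates of exactly 0 or 1 would cause a division by zero below.
        if not 0.0 < rate < 1.0:
            raise ValueError(f"{name} must be strictly between 0 and 1")

    # Weight of the event occurring, relative to the league baseline.
    occurs = batter_rate * pitcher_rate / league_rate
    # Weight of the event not occurring, relative to the league baseline.
    does_not_occur = (1 - batter_rate) * (1 - pitcher_rate) / (1 - league_rate)
    return occurs / (occurs + does_not_occur)


# A .300 hitter facing a pitcher who allows a .250 average in a .260 league.
estimate = log5_matchup(0.300, 0.250, 0.260)
print(round(estimate, 3))
```

One property of this form is that a pitcher who is exactly league average leaves the batter's own rate unchanged, which is the behavior one would expect from a baseline-relative estimate.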
However, the simulator uses a complex variation of Log5 that adjusts probabilities for batter and pitcher splits and for situations such as home-road, left-right, and base-runner configurations (e.g., runners in scoring position) in order to produce the most reliable predictions. Of these, left-right splits figure most prominently.
Home run numbers from the Dead Ball Era are processed by an additional set of routines to account for the fact that parks from that era often had either no fences or outfield dimensions that were obscenely large by modern standards.
Defensive probabilities take into account the dimensions of the home team’s park and the ground-fly tendencies of its pitchers. Broad application of Log5 requires a tremendous volume of data. For this reason, NSB maintains its own proprietary database schemas rather than using those created and maintained by Sean Lahman.
Split stats and situational stats are culled from play-by-play files published by Retrosheet. As of August 2020, we have splits for all player records dating back to 1901.