A Mathematical Analysis of Catcher/Pitcher Encryption Schemes

A Mathematical Analysis of Catcher/Pitcher Encryption Schemes fmlancast, samird, [email protected] May 16th, 2018 1 Introduction For those unfamiliar with the game of baseball, there are a few important actors that play a role in every single play of the game. The pitcher throws the baseball to his teammate, the catcher, while a member of the opposing team, the batter, attempts to hit the thrown ball. Pitchers use a variety of different types of pitches in an attempt to prevent the batter from hitting the ball. There are many different types of pitches, each of which moves with different speeds, curves, slides, or dips. The four main pitches (and the ones we will refer to in this paper) are • The fastball - thrown straight and fast • The curveball - thrown with spin so that it dives or curves • The slider - thrown with spin so that it tails laterally • The changeup - thrown slowly Before throwing a pitch, the pitcher and catcher need to agree on the pitch to be thrown. Usually, the catcher will indicate to the pitcher which pitch he thinks should be thrown using a series of hand signs. The pitcher will simply shake or nod his head in (dis)agreement. The standard mapping used by catchers to indicate pitches is one finger for fastball, two for curveball, three for slider, and four for a changeup. Obviously the catcher cannot simply tell the pitcher out loud which pitch to throw, because then the batter will gain a competitive advantage in that he knows which pitch to expect. Instead, the catcher will use hand signs, as seen in Figure 1 below, to indicate which pitch to throw. Now that we have explained one of baseball's basic constructions, we will move on to explain how this is related to cryptography and frame this problem as a cryptography one. There is a specific scenario in the game of baseball in which the above scheme for communicating pitches between the pitcher and catcher is insufficient. If a batter successfully reaches second base, he has a clear 1 Figure 1: Catcher giving sign for fastball view of the catcher giving signs to the pitcher. Given that the standard mapping (1=fastball, 2=curveball, 3=slider, 4=changeup) is known by everybody, he can then relay the signs to the batter. Standard ways of doing this include placing his hands on his knees, shuffling his feet, touching his helmet, etc. These details are less relevant, but the main point is that the pitcher/catcher communication scheme is no longer secure. To mitigate this, the pitcher and catcher must modify their communication scheme such that neither the current batter, or the batter on second base can determine which pitch will be thrown in advance. They typically do this using an encryption-like scheme that is designed to hide the value of the pitch from the batter on second base. There are a number of such schemes, most of which use a sequence of hand signs from the catcher to the pitcher. When we discuss these sequences, we will use the term index to refer to the location of a number in the sequence. For example, the index of the number 3 in the sequence 21431 is 4. In order to be consistent and to bound our calculations, we restricted the number of signs in a sequence for all of the encryption schemes to 5. It will be helpful to think about the original value of the pitch (1, 2, 3, or 4) as the plaintext message, and the sequence of signs as the ciphertext. To be clear, the schemes used by the pitcher and catcher to securely communicate pitch values are not encryption schemes in the traditional sense. They do not use mathematical keys to decrypt, instead the keys are simply knowledge of the encryption algorithm, which allows the pitcher to invert the encoding to retrieve the original pitch value. In order to be usable in an actual baseball game by the pitcher and catcher, these schemes must have a few characteristics: 1. They must be non-deterministic, since the catcher will be encrypting the same plaintext many times 2. They must be simple to \decrypt" by the pitcher, since he only has a 2 couple of seconds to invert the encryption and decide if he wants to throw that pitch We can formalize this problem as a Chosen Plaintext Attack security analysis on the encryption-like schemes used by a pitcher and catcher to communicate pitch information. We consider two different scenarios: 1. Consider the adversary to be any member of the opposing team on second base. Let n represent the number of times an adversary reaches second base. If we assume the average at-bat is approximately 5 pitches, the adversary will get to see 5n (ciphertext, plaintext) pairs to use to crack the scheme. 2. Consider an adversary watching recorded replays of a team's games. If the team uses the same scheme from game to game (which teams do), the adversary has essentially a limitless pool of (ciphertext, plaintext) pairs to use to crack the scheme. In the above scenarios, the adversary sees both the encryption sequence gener- ated by the catcher, and the decrypted plaintext in the form of the pitch that gets thrown. In this paper, we will do a mathematical analysis on the relative hiding capabilities of 9 of these schemes. We will then combine our analysis with a scoring of each of the schemes using some homegrown metrics such as usability and (in)consistency between at-bats. These terms will be given more context later. To help compare and rank the different schemes, we will also present a neural network that is trained to make predictions about pitch values given ciphertext inputs. Using what we learn from this analysis, we will develop our own pitch-encoding scheme and present that at the end. Before we dive into the details of the metrics, there are a few more defi- nitions that will make the following analysis much easier to digest for a reader without baseball-specific knowledge. First, the basic unit of a pitcher/catcher face-off is an at-bat. Each batter takes one at-bat at a time against the pitcher. Within an at-bat, we refer to the count, which indicates the state of the at-bat by tallying the number of balls and strikes the pitcher has thrown. A pitcher can throw up to 4 balls or 3 strikes in an at-bat, whichever comes first. 2 Defining the Metrics There are many aspects to each scheme that can make it more or less secure. From being easy for the pitcher and catcher to understand to being tough for the opposing team to crack, there are many factors and scenarios to consider. We have come up with five metrics to help quantify the security and practicality of each scheme • Usability - whether the pitcher can instantaneously determine the pitch from the catcher's signs with minimal thinking 3 • Hardness - how many at-bats an adversary needs to see to be able to accurately predict the pitch • Inconsistency - the variation of the scheme between pitches • Encodablility - the number ways a given pitch can be encoded • Decodability - the number combinations a scheme has if the adversary knows the scheme (number of variations of the specific rule based on pa- rameters that are set beforehand by catcher and pitcher) 2.1 Usability Some schemes require little to no thinking from the pitcher to decrypt the pitch from the catcher. The easier the transfer of knowledge is, the less likely it is that a miscommunication or misinterpretation will occur. For example, if the pitch is always the first sign, the pitcher simply has to look for that first sign and can ignore the rest of the scheme. On the other hand, a scheme like counting the number of 1s in a series of signs requires the pitcher to see all the signs and perform a calculation on that value. Thus, the key difference in usability is whether or not the pitcher has to observe all the signs to decrypt the code or if he simply needs to wait for one sign of the sequence. We say a scheme has usability 0 if it does the former and usability 1 if it does that latter. 2.2 Hardness The longer it takes to crack a scheme, the harder that scheme is by our definition. To objectively obtain a measure of how hard a scheme is to break, we built a MLP Classifier (Multi-layer Perceptron) neural network and observe the number of at-bats necessary in the training set for the model to learn to accurately predict a pitch from a given sequence. We built a generator function for each scheme that randomly picked sequences of length 5, and identified what the correct pitch should have been if each of those sequences was shown to a pitcher. We ensured that the probability that the sequence would correspond to a given pitch was equal across all four potential pitches. These sequences of signs, each appended by the number of outs, pitch count (balls and strikes), jersey number and whether the batter is left or right-handed at the time the sequence was given, constituted an input vector to the model. The output label for each input vector was simply the pitch that this sequence corresponded to (the pitch that should have been thrown). We trained each scheme's model on varying training set sizes, and deemed a scheme to be "cracked" when the neural net could consistently achieve over 80% accuracy on the test set.

Load more