Stochastic Signal Analysis

What does the word "stochastic" mean?

The word stochastic is often used as an erudite (or what my mother would call uppity) synonym for the word random. But there are formal and informal rules for when to use stochastic rather than random.

Formally: Random describes the signal, while stochastic describes the type of analysis. A grammatically precise title for this class would be: Stochastic Analysis of Random Signals. We didn't do that, because people who don't appreciate the formal distinction between random and stochastic find the title redundant.

Informally: We use the term "random" when we are working with a random number, or a random vector (a finite group of numbers). We are not careful to say whether it is the number or the analysis that is random. We use the term "stochastic" when dealing with a random discrete-time signal (a countably infinite group of random numbers) or a continuous-time signal (an uncountably infinite group of random numbers). We apply the word stochastic to both the signal itself and the analysis.

Known, Unknown, Partially Known and Random Numbers

We say we know the value of variable x if we write something like x=4. If there is an expression that a sufficiently smart person, or calculator, can solve, we still say x is known. For example, x is known when x = sqrt(88)*cos(2^456.789), even if you don't know how to find the value without the help of a calculator.

If we know nothing at all about x, then we say it is unknown. For example if I ask you the distance between two points, with no other information, you have no idea what number to use.

Sometimes we have partial information about x, but don't know its exact value. We may know that x is confined to a particular range of the number line, as in 4 > x > 3, or x = sin(theta) (which is the same as -1 ≤ x ≤ 1). Or we may know that x is one of a few different possible values, as in x^2 = 4, which means x is -2 or 2, but we don't know which. We may also know that x is an imaginary number, because x^2 < 0. Or we may know that x is an integer, or rational, or irrational, or transcendental, etc. There are many different ways we can describe partial information about a variable.

The variable x may be called a random variable. This means it is partially known, and the partial information we have fits a very specific set of rules. These rules give us an idea not only about the range of values x may take on, but also which of those values it prefers over others. The rules that describe this preference are known as a measure.
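
As a minimal sketch (not something worked out in these notes), a probability measure on a finite set of outcomes can be written as a table of non-negative weights that sum to 1; the measure of any event is the sum of the weights of the outcomes in it. The fair-die example below is purely illustrative.

```python
# A hypothetical probability measure for one fair six-sided die:
# each outcome gets weight 1/6, and the weights sum to 1.
from fractions import Fraction

measure = {face: Fraction(1, 6) for face in range(1, 7)}
assert sum(measure.values()) == 1

def prob(event, measure):
    """Probability of an event (a set of outcomes) under the measure."""
    return sum(measure[outcome] for outcome in event)

print(prob({2, 4, 6}, measure))   # chance of an even face: 1/2
```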

Non-Stochastic Analysis of Random (or other partially known) Variables

You don't have to use stochastic analysis to handle partially known variables, or the particular class of partially known variables called random variables. One non-stochastic method is worst-case analysis.

Example. There is an iconic suspension bridge in San Francisco, California, known as the Golden Gate Bridge. It was completed in 1937, and designed to carry 6 lanes of vehicle traffic. The load on the bridge varies with the number of vehicles, the weight of each vehicle, wind conditions, and earth movement.

The designers wanted to make sure the bridge would never fail, so they designed the structure to hold the densest vehicles allowed on federal highways, packed bumper-to-bumper in all 6 lanes of the entire span of the bridge. They also assumed this would occur when there were gale force winds blowing, and a major earthquake was underway on a nearby geologic fault. They estimated that in this situation, the loading on the bridge deck could reach about 4 kN/m^2 (about 85 lbs per square foot). On a typical day, the bridge does not come anywhere close to experiencing this high of a load.

On May 24th of 1987, there was a celebration on the bridge, as it had been 50 years since its completion. The bridge was shut down to vehicle traffic, and roughly 250 thousand pedestrians were allowed to walk on the bridge.

This is the heaviest the bridge had ever been loaded (the center of the span drooped over 2 meters from its unloaded height). The load approached 3 kN/m^2, or 75% of its maximum design load.

While 250 thousand were allowed on the bridge, another 500 thousand were turned away. While these half million people were probably disappointed, notice that if they had been allowed on the bridge, it might have catastrophically failed.

Random or Stochastic Analysis

The worst case analysis described above is not always useful. Suppose there is an electronic system that can cause dramatic damage if it is accessed by an unauthorized individual. I plan to protect it by asking a user to enter a password with N characters. If the user gets the password wrong, they will have to wait one minute before trying again. How large does N have to be, so that I'm confident unauthorized users (who have never been told the password) cannot access the system?

Worst case analysis says that no matter what you choose for N, the system is not secure. Even if N=100, it's possible an unauthorized user will correctly type all N characters on the first attempt. Or if they don't get it on the first attempt, they can eventually guess the correct sequence.

This concept is sometimes called the Infinite Monkey Theorem.

It has roots that go back at least as far as the ancient Greek philosopher Aristotle. It has been rephrased and updated many times over the ages, with one version being: If a monkey is set in front of a typewriter and randomly hits keys for an infinite period of time, he will eventually type the complete works of William Shakespeare.

I prefer an updated version that may have first shown up in 1996 in a speech by Robert Wilensky: We've heard that a million monkeys at a million keyboards could produce the complete works of Shakespeare; now, thanks to the Internet, we know that is not true.

But, thanks to the Wikipedia page dedicated to the Infinite Monkey Theorem, I know the following:

In 2002, lecturers and students from the University of Plymouth MediaLab Arts course used a £2,000 grant from the Arts Council to study the literary output of real monkeys. They left a computer keyboard in the enclosure of six Celebes crested macaques in Paignton Zoo in Devon, for a month, with a radio link to broadcast the results on a website.

Not only did the monkeys produce nothing but five total pages largely consisting of the letter 'S', the lead male began striking the keyboard with a stone, and other monkeys followed by soiling it.

Mike Phillips, director of the university's Institute of Digital Arts and Technology (i-DAT), said that the artist-funded project was primarily performance art, and they had learned "an awful lot" from it.

Getting away from monkeys, and back to passwords, let's look at how long it would take someone to try every possible combination of 100 characters.

If you can choose from 26 lower case, 26 upper case, 10 digits and 10 special characters, you have 26+26+10+10=72 choices for each character. That means there are 72*72 ways to pick two characters, 72*72*72 ways to pick three characters, and 72^100 ways to pick 100 characters. That is over 10^185 possible combinations.

I find it difficult to put large numbers like this into perspective. Let's compare it to the age of the universe. Scientists currently estimate the Big Bang that was the origin of the universe occurred roughly 4×10^17 seconds ago (about 13.8 billion years). So if someone started guessing passwords when the universe was young, to get through all 100-character combinations by today, they would have to guess roughly 10^168 combinations per second.
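
A quick back-of-the-envelope check of these figures, as a sketch (the 72-symbol alphabet, the 100-character length, and an age of the universe of roughly 4.35×10^17 seconds are the assumptions being plugged in):

```python
# Back-of-the-envelope check of the password counting argument.
# Assumptions: 72 possible symbols per character, a 100-character password,
# and an age of the universe of roughly 4.35e17 seconds (~13.8 billion years).
from math import log10

symbols = 26 + 26 + 10 + 10            # lower case, upper case, digits, specials
combinations = symbols ** 100          # every possible 100-character password
age_of_universe_s = 4.35e17

print(f"72^100 is about 10^{log10(combinations):.1f}")                                  # 10^185.7
print(f"guesses per second needed: about 10^{log10(combinations / age_of_universe_s):.1f}")  # 10^168.1
```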

This may incorrectly lead you to believe that a 100 character password is, for all practical purposes, impossible to break. For a set of 100 randomly selected characters, this is pretty much true with current technology. However, human factors can dramatically reduce its security.

If people are allowed to select their own password, they are inclined to use words. Assume someone has a vocabulary of about 10,000 words, each about 5 characters long. Then 100 characters is 20 words, and there are roughly 10,000^20 = 10^80 possible passwords. The problem of cracking the password just got easier by a factor of roughly 10^105. Take into account that many people pick a string of 20 words which form a grammatically correct sentence, and the password becomes even less secure.
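
The same kind of check works for the word-based estimate (again just a sketch, using the 10,000-word vocabulary and 20-word length assumed above):

```python
# Check of the word-based password estimate against the character-based one.
# Assumptions: a 10,000-word vocabulary and passwords built from 20 words.
from math import log10

word_passwords = 10_000 ** 20     # about 10^80 word-based passwords
char_passwords = 72 ** 100        # from the earlier calculation

print(f"word-based passwords: 10^{log10(word_passwords):.0f}")                          # 10^80
print(f"easier by a factor of about 10^{log10(char_passwords) - log10(word_passwords):.1f}")  # 10^105.7
```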

Simple counting is often insufficient in situations like this. We need a way to quantify which words, and which sequences of words, people are likely to pick. We then need to see how likely it is that a very intelligent and motivated hacker could guess that string of words in a reasonably short period of time. This is where stochastic analysis comes in.

By the way - quantum computing has the potential to guess more than one character string at a time. This could make even a 100 character password insecure - but that's beyond the scope of this course.

Probability

To do stochastic analysis, we need to characterize what we know about random numbers, or stochastic signals, in a very specific way, with many restrictions on the format of the information. This is known as a probability measure. The rules for how we can use a probability measure to make predictions are called probability theory.

Probability vs. Statistics

While probability and statistics are often taught together, they answer different questions. Probability is a strictly mathematical endeavor, like algebra or geometry. Statistics is more of an art form, where personal preference can influence the results as much as mathematical calculations.

In probability theory, we start with a set of assumptions. We make no attempt to determine if the assumptions are reasonable or unreasonable. We then use tools like algebra, geometry, calculus, etc. to make predictions based on our assumptions.

Statisticians compare different sets of assumptions to observations from the physical world. They try to convince us which of the assumptions are the best fit.

Anyone who studies statistics should be aware of a famous quote often attributed to Mark Twain, but which probably originated elsewhere: There are three types of lies: lies, damned lies, and statistics. It may have been motivated by earlier comments about how experts called to testify in legal cases are known to give false testimony, such as: A well-known judge said witnesses fall into one of three categories: simple liars, damned liars and experts.

To illustrate the difference between probability and statistics, consider the following example.

Example: In the game of craps, a player rolls two 6-sided dice, and sums the numbers on the top faces. This produces an integer from 2 to 12 (since each die has the numbers 1 through 6).

To play a game of craps, a player must first hand X dollars to the house. The player is then allowed to toss the dice until they either win or lose. They may toss the dice once, or do so many times (see below). If the player ultimately wins, the player gets back their original X dollars, plus another X dollars. This is known as an even-money bet. If the player ultimately loses, the house does not return the X dollars the player paid to play.

There are different ways people can win or lose in craps. The most fundamental is called a "pass bet". A player wins if on their first throw of the dice, they come up 7 or 11. The player loses if on the first throw the dice come up 2, 3 or 12 (which is called craps).

If the first roll produces something other than 2, 3, 7, 11, or 12, that number becomes the "point". The player continues to toss the dice until they either have a 7, or the point repeats. If the 7 shows first, the player loses. If the point shows before the 7, the player wins.

It is common to assume that on each throw, each die is equally likely to show the numbers 1, 2, 3, 4, 5 or 6. It is also common to assume that the value on any one die has nothing to do with the other die thrown at the same time, and has nothing to do with the values that die has shown on previous throws. If this assumption is true, we say the dice are "fair".

Probability can tell us the chance a player can win at a game of craps when using fair dice. The chance of winning is about 49.29%.
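
As a sketch of where that number comes from (the conditioning argument below is not worked out in the notes, but follows directly from the rules just described): the player wins outright on 7 or 11, loses outright on 2, 3 or 12, and otherwise wins if the point repeats before a 7.

```python
# Exact pass-bet win probability for fair dice, by conditioning on the first roll.
from fractions import Fraction

# Probability of each possible sum with two fair six-sided dice.
p_sum = {s: Fraction(sum(1 for a in range(1, 7) for b in range(1, 7) if a + b == s), 36)
         for s in range(2, 13)}

p_win = p_sum[7] + p_sum[11]                    # win outright on the first roll
for point in (4, 5, 6, 8, 9, 10):               # otherwise a point is established
    # chance the point repeats before a 7 shows
    p_win += p_sum[point] * p_sum[point] / (p_sum[point] + p_sum[7])

print(p_win, float(p_win))                      # 244/495, about 0.4929
```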

Probability cannot tell us if the assumption that we are using fair dice is accurate. For that, we need statistics.

Example: If a gambler plays multiple games in sequence, their money will typically fluctuate up and down. The trick is to quit while they are ahead (have more money than they started with). Some people use algorithms, called betting schemes, to try to ensure they will always win some money, and will know when to quit.

One of these schemes is called a martingale, and is summarized below:

Step 1: You have now won the game N times (initially N=0), and have N dollars more than you started with. It is OK to quit. If you want to earn one more dollar, go to step 2. But once you get to step 2, keep playing until you are directed to return here, to step 1.

Step 2: Bet $1 on any even-money game. The chances that you win the game are irrelevant, provided the chance is greater than zero.

Step 3: If you win, go to step 1. If you lose, play again, but up your bet to $2.

Step 4: If you win, go to step 1. If you lose, play again, but up your bet to $4.

Step 5: If you win, go to step 1. If you lose, play again, but up your bet to $8.

Step 6+: As soon as you win, go back to step 1. Each time you lose, double your bet, and play again.

Notice that whenever you win, you will get back all of the funds you have bet since leaving step 1, plus one dollar. Some people see this as a foolproof way to make money.
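
A quick check of that claim (just a verification sketch): after k straight losses the cumulative amount bet is 1 + 2 + ... + 2^(k-1) = 2^k - 1, and the next even-money bet of 2^k dollars, when it finally wins, pays back exactly that amount plus one dollar.

```python
# Verify that a martingale cycle always nets exactly $1 when the win finally comes.
for k in range(8):
    lost_so_far = sum(2 ** i for i in range(k))   # 1 + 2 + ... + 2**(k-1) = 2**k - 1
    winning_bet = 2 ** k                          # the bet that finally wins (even money)
    print(k, winning_bet - lost_so_far)           # always prints 1
```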

There is a problem with this approach. It only guarantees you make money if you have sufficient funds to cover your bets, and the house will accept the bet. In practice, the house will always have an upper limit on the bet they will accept. Even in Las Vegas, you can't get the house to accept a bet of a trillion dollars (for that, you need to go to the US Congress). Once the house refuses your bet, you have lost - and lost big.

This is one example of what is known as the gambler's ruin. If the chance of winning an even-money game is less than 50%, and a player plays long enough, they will always lose all their money. That's true no matter how much money they have to start with, and no matter what betting strategy they use. It is why casinos can survive, and why they usually have no windows and no clocks.

Despite this drawback, a martingale can be an entertaining betting strategy. Probability theory will allow us to answer questions such as this: I am willing to risk losing up to $1,024 at a craps table. I will quit playing craps if I win $10. If the craps dice are fair, what are the chances I quit because I have won $10, and what are the chances I quit because I have exhausted the $1,024 I came with, using a martingale betting strategy?
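
One way to attack that question is with a simulation, sketched below under explicit assumptions (each wager is an even-money pass bet that wins with probability 244/495, the bankroll starts at $1,024, and a session ends either $10 ahead or when the next doubled bet can no longer be covered); this is an illustration, not a worked solution from the notes.

```python
# Monte Carlo sketch of the martingale question above.
import random

P_WIN = 244 / 495   # pass-bet win probability with fair dice (about 0.4929)

def martingale_session(bankroll=1024, target_profit=10, p=P_WIN):
    """Return True if we walk away target_profit ahead, False if we go broke."""
    goal = bankroll + target_profit
    while 0 < bankroll < goal:
        bet = 1
        while True:                     # one martingale cycle
            if bet > bankroll:          # cannot cover the next bet: ruined
                return False
            if random.random() < p:     # win: recover all losses plus $1
                bankroll += bet
                break
            bankroll -= bet             # lose: double the bet and play again
            bet *= 2
    return bankroll >= goal

trials = 100_000
ahead = sum(martingale_session() for _ in range(trials))
print(f"quit $10 ahead in about {ahead / trials:.1%} of simulated sessions")
```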
