Random Walk Interpretation of Server Security and Convergence to the Binomial Distribution
Introduction/context
In this assignment we'll model the scenario of a weekly security status of a server as a stochastic process. Each week the server may either remain secure or be breached by one of several attackers. We associate a score of +1 to secure weeks and −1 to breached weeks and study the cumulative score over n weeks.
By simulating many trajectories (with m independent runs), we show that this process can be interpreted as a random walk and that the empirical distribution of the final scores converges to a binomial distribution as n and m grow.
Model definition
There are some key points that must be considered, in order to keep the smiulation realistic. In order:
We consider
-
In each week, the server is breached with probability p and remains secure with probability
-
We assume independence between weeks.
Define the weekly score Yi:
We can treat this model as it is a random walk with steps +1 and -1.
Link between random walk and binomial distribution
In our model, each week represents one step of a discrete random walk.
The server either remains secure (step = +1) or gets breached (step = −1).
After n weeks, the cumulative score Sn is the sum of all steps:
where each Yi is an independent random variable taking values +1 and −1 with probabilities
1−p and p respectively.
The process Sn can therefore be viewed as a simple random walk starting at zero, moving one unit up or down each week depending on whether the server resists or is breached.
Mapping to the Binomial Distribution
Let X be the number of secure weeks during the n-week period.
Since each week is independent and has probability 1−p of being secure, X follows a Binomial distribution: X∼Binomial(n,1−p)
If the server was secure X times and breached n−X times, the total cumulative score can be expressed as:
Consequently, every possible total score Sn corresponds to a specific value of X, and their probabilities are directly related:
where s ∈
{-n,-n,+2,...,n-2,n}.
This formulation shows that the total score distribution, even if it's put in another context, is simply a binomial distribution re-scaled and shifted to the range [−n, n].
This link makes the random walk intuition clear: each secure week moves the trajectory one step upward, each breach moves it downward. Over time, the number of trajectories reaching a given total score follows the same pattern as the binomial probabilities. When n and m (the number of simulations) increase, the empirical frequencies of scores converge to these theoretical values according to the Law of Large Numbers. The next step, is to smiulate the actual scenario, based on this theory.
The test
In this section we implement a simulation of the random walk process described above.
The objective is to generate multiple independent trajectories of the server's cumulative security score, then compare the empirical frequencies of the final scores with the theoretical binomial probabilities.
The simulation assigns a score of +1 for a secure week and −1 for a breach, and repeats the process for many runs (trajectories). As the number of simulations m increases, the empirical results are expected to converge to the theoretical distribution.
The code defines a simulation that runs m independent random walks of length n, each representing one possible history of the server's weekly security outcomes. For each possible total score s, we compute both the empirical probability (from simulation) and the theoretical probability from the binomial model.
As usual, we implemented the code using JavaScript, here's a snippet of the code, from the part where the random walk it's calculated:
On the other hand, this loop computes the theoretical probabilities of each possible total score Sn:
Empirical results
To verify the model, we executed the simulation using the following parameters:
meaning that we simulated 10 000 servers, each observed for 20 weeks, with a 30% probability of being breached in any given week. For each trajectory, the total cumulative score was computed, and its frequency was compared with the theoretical binomial probability.
As we can see from the results above, the empirical and theoretical probabilities are extremely close across the entire range of possible scores. Minor discrepancies are due to randomness and finite sample size, but the shape of the distribution and its symmetry clearly mirror the binomial model.
-
The most likely total scores are around +6 to +10, corresponding to a majority of secure weeks (since p=0.3 gives a 70% chance of security each week).
-
Scores near −20 or +20 are rare, representing extreme cases where the server is always breached or always secure.
-
The histogram of empirical scores (not shown here) forms a smooth, bell-shaped curve centered near the theoretical expectation:
Increasing m (number of simulations) would make the empirical frequencies converge even more tightly to the theoretical values, in accordance with the Law of Large Numbers.
So, the test confirms that the cumulative score Sn behaves exactly like a random walk with step probabilities1−p and p. It also proves that the distribution of final scores is identical (after rescaling) to the binomial distribution for the number of secure weeks and that the convergence between simulated and theoretical results provides empirical evidence for the probabilistic model.
Concluding, this test validates the correctness of both the simulation logic and the theoretical formulation. Even with a moderate number of trajectories, the alignment between empirical and binomial probabilities demonstrates that the random walk representation of server security is both accurate and stable.
Nessun commento:
Posta un commento