Simulation is the process of designing a model of a real system and conducting experiments with that model to understand the behaviour of the system or evaluate strategies for its operation.
It imitates the operations of real-world processes or systems over time, using a computer to generate artificial history.
โ Use Simulation When:
- The real system does not yet exist (designing a new hospital, airport)
- Experimenting with the real system would be too dangerous, costly, or disruptive
- The system is too complex for analytical (mathematical) solutions
- You want to compress or expand time (test 10 years of operation in minutes)
- You need to observe system behaviour under different what-if scenarios
โ Do NOT Use Simulation When:
- A simple analytical or mathematical solution already exists
- Common sense or direct experimentation is cheaper and faster
- The problem is straightforward and well-understood
- Resources (time, money, expertise) are insufficient to build a valid model
Benefits (pick 2)
- Handles complexity โ Simulation can model systems too complex for closed-form mathematical solutions (e.g., queues with non-Poisson arrivals)
- Risk-free experimentation โ Test policies and scenarios without disrupting the real system or incurring real costs
- Time flexibility โ Can slow down or speed up time; observe rare events that formulae can't easily represent
- Visual & intuitive โ Results are easier for non-mathematicians to understand and communicate
Limitations (pick 1)
- Not exact โ Results are estimates with statistical error; requires many runs for accuracy
- Time and cost โ Building and validating a good simulation model is expensive and time-consuming
- Garbage in, garbage out โ A poorly built model produces misleading results
- Requires expertise โ Need statistical knowledge to design and interpret simulation experiments correctly
| Problem Area | Example | Solution Method |
|---|---|---|
| Banking / Queuing | Customer wait times at bank tellers | Queuing simulation (single/multi-server) |
| Manufacturing | Production line throughput, bottleneck analysis | Discrete-event simulation |
| Inventory / Supply Chain | Optimizing stock levels, lead times | Monte Carlo simulation |
| Transportation | Traffic flow, bus scheduling, route optimization | Agent-based / DES models |
| Healthcare | Hospital capacity planning, emergency room flow | Discrete-event simulation |
| Finance / Investment | Portfolio risk analysis, project NPV | Monte Carlo simulation |
| Military / Emergency | Evacuation planning, disaster response | Agent-based simulation |
- Problem Formulation โ Define the problem clearly; set objectives and scope of the study
- Data Collection & Analysis โ Gather real-world data; identify probability distributions that fit the data
- Model Building โ Construct the simulation model representing the real system
- Verification โ Check that the model works as intended (does the code do what you designed?)
- Validation โ Check that the model accurately represents the real system (does the model match reality?)
- Experimental Design โ Define what scenarios to run, number of replications, run length
- Run Simulation & Collect Results โ Execute the model and gather output statistics
- Analysis & Interpretation โ Analyse results using statistical methods; draw conclusions
- Documentation & Implementation โ Report findings; implement recommendations in the real system
Discrete Systems
State variables change at specific, countable points in time (events).
Examples:
- Bank queue: changes when customer arrives or leaves
- Traffic light: changes state at specific times
- Manufacturing: machine breaks down at a specific moment
Continuous Systems
State variables change continuously over time.
Examples:
- Water tank: water level changes continuously
- Chemical reaction: concentration changes over time
- Population growth: changes continuously
Deterministic
No randomness โ given the same inputs, you always get the same output.
Example: A machine that always takes exactly 5 minutes to produce one unit.
Used when variability is negligible or unimportant.
Stochastic
Includes randomness โ outputs vary even with the same inputs.
Example: Customer service time follows an exponential distribution โ each customer takes a different random time.
More realistic for most real-world systems.
Static
Represents a system at a single point in time; time does not play a role.
Example: Monte Carlo simulation to estimate the value of ฯ.
Dynamic
Represents a system as it evolves over time.
Example: Queuing simulation tracking customers arriving, waiting, and departing over hours.
State variables describe the current condition of the system at any point in time:
- Number of customers in the queue โ how many are waiting
- Server status โ busy (1) or idle (0)
- Number of customers in the system โ queue + those being served
- Time of last event โ when the last arrival or departure occurred
Events in a Single-Server Model
- Arrival event โ a new customer joins the system
- Departure/Service completion event โ server finishes with a customer; next customer (if any) begins service
- End-of-simulation event โ the stopping condition is reached
Verification
Ensuring the simulation model correctly implements the conceptual model.
"Did we build the model right?"
Methods: Code review, trace through the model step-by-step, check with known simple cases.
Validation
Ensuring the simulation model is an accurate representation of the real system.
"Did we build the right model?"
Methods: Compare output to historical data, statistical hypothesis tests, expert review.
- Large solution space โ there may be millions of possible combinations of input variables to explore
- Noisy responses โ simulation output is stochastic; the same input can give different outputs in different runs
- Computational cost โ each simulation run takes time; evaluating all options is infeasible
- Local vs global optima โ algorithms may get stuck in a local optimum and miss the global best
- Warm-up period โ initial transient effects bias results if not handled correctly
A queuing simulation tracks customers arriving, waiting, being served, and leaving. The key metrics to compute are:
| Metric | Formula |
|---|---|
| Average Waiting Time | Total waiting time รท Total customers |
| Probability of Waiting (P_wait) | Customers who waited > 0 รท Total customers |
| Probability of Idle Server (P_idle) | Total idle time รท Total simulation time |
| Average Time in System | Total time in system รท Total customers |
Given: inter-arrival times and service times for each customer.
- Arrival Time = Previous arrival time + Inter-arrival time (Customer 1 arrives at time 0)
- Time Service Begins = MAX(Arrival Time, Previous Service End Time)
- Wait in Queue = Time Service Begins โ Arrival Time
- Time Service Ends = Time Service Begins + Service Time
- Time in System = Time Service Ends โ Arrival Time
- Idle Time of Server = MAX(0, Arrival Time โ Previous Service End Time) [only if server was free when customer arrived]
From the 20-customer simulation table given in the exam paper, here's how to compute the three required metrics:
Step 1: Extract the totals from the table
| Column | Total (Sum) |
|---|---|
| Time Customer Waits in Queue | 0+0+0+3+0+0+0+4+6+8+8+10+10+8+6+3+6+7+6+1 = 86 minutes |
| Idle Time of Server | 0+4+0+0+0+0+4+0+0+0+0+0+0+0+0+0+0+0+0+0 = 8 minutes |
| Customers who waited (>0 minutes) | Customers 4,8,9,10,11,12,13,14,15,16,17,18,19,20 = 14 customers |
Step 2: Calculate the metrics
When arrival rate ฮป and service rate ฮผ are given (not a simulation table), use these formulae:
Let ฯ = ฮป/ฮผ (traffic intensity / server utilisation)
| Metric | Formula |
|---|---|
| Prob. server is idle (Pโ) | Pโ = 1 โ ฯ = 1 โ ฮป/ฮผ |
| Prob. customer receives immediate service | = Pโ = 1 โ ฯ |
| Avg number in system (L) | L = ฯ / (1 โ ฯ) |
| Avg time in system (W) | W = 1 / (ฮผ โ ฮป) |
| Avg number in queue (Lq) | Lq = ฯยฒ / (1 โ ฯ) |
| Avg wait in queue (Wq) | Wq = ฮป / [ฮผ(ฮผ โ ฮป)] |
ฮป = 3 customers/hour, ฮผ = 4 customers/hour, ฯ = 3/4 = 0.75
- Proportion idle: Pโ = 1 โ 0.75 = 0.25 (25%)
- Prob. immediate service: = 0.25 (same as above)
- Avg customers in system: L = 0.75/(1โ0.75) = 0.75/0.25 = 3 customers
- Avg time in system: W = 1/(4โ3) = 1/1 = 1 hour
Given IAT and ST sequences, build the simulation table the same way as the bank example:
IAT: 0 1 1 6 3 7 5 2 4 1 โ Customer 1 arrives at t=0, Customer 2 at t=0+1=1, etc.
Once the table is built, compute:
- Average waiting time = Sum of all queue wait times / 8 customers
- Idle time of server = Sum of all server idle time entries
- Average service time = Sum of all service times / 8 customers
Monte Carlo simulation uses random numbers to sample from probability distributions and simulate the behaviour of a system many times, then averages the results.
Steps:
- Identify the uncertain variables and their probability distributions
- Assign random number ranges to each outcome (cumulative distribution)
- Use random numbers to look up simulated values
- Calculate the outcome for each trial
- Average the results across all trials
Advantages
- Can handle complex, interdependent variables
- Provides a range of possible outcomes (risk profile)
- Easy to understand and explain
- Flexible โ works with any distribution
Disadvantages
- Only as good as the input distributions (GIGO)
- Requires many trials for accurate results
- Computationally intensive
- Results are approximate, not exact
Convert a probability table into cumulative probabilities, then map ranges of random numbers to each value:
| Value | Prob | Cumulative Prob | RN Range |
|---|---|---|---|
| 45 | 0.03 | 0.03 | 00โ02 |
| 46 | 0.05 | 0.08 | 03โ07 |
| 47 | 0.07 | 0.15 | 08โ14 |
| 48 | 0.10 | 0.25 | 15โ24 |
| 49 | 0.15 | 0.40 | 25โ39 |
| 50 | 0.20 | 0.60 | 40โ59 |
| 51 | 0.15 | 0.75 | 60โ74 |
| 52 | 0.10 | 0.85 | 75โ84 |
| 53 | 0.07 | 0.92 | 85โ91 |
| 54 | 0.05 | 0.97 | 92โ96 |
| 55 | 0.03 | 1.00 | 97โ99 |
Train capacity = 51 trucks. Shortfall = max(0, 51 โ production). Excess = max(0, production โ 51).
Random numbers: 05, 34, 78, 56, 45, 90, 04, 58, 92, 39
| Day | RN | Production | Shortfall | Excess |
|---|---|---|---|---|
| 1 | 05 | 46 (RN 03โ07) | 5 | 0 |
| 2 | 34 | 49 (RN 25โ39) | 2 | 0 |
| 3 | 78 | 52 (RN 75โ84) | 0 | 1 |
| 4 | 56 | 50 (RN 40โ59) | 1 | 0 |
| 5 | 45 | 50 (RN 40โ59) | 1 | 0 |
| 6 | 90 | 53 (RN 85โ91) | 0 | 2 |
| 7 | 04 | 46 (RN 03โ07) | 5 | 0 |
| 8 | 58 | 50 (RN 40โ59) | 1 | 0 |
| 9 | 92 | 54 (RN 92โ96) | 0 | 3 |
| 10 | 39 | 49 (RN 25โ39) | 2 | 0 |
| Total | 17 | 6 |
Avg shortfall: 17/10 = 1.7 trucks/day | Avg excess: 6/10 = 0.6 trucks/day
Formula: Profit = (Selling Price โ Variable Cost) ร Demand โ Investment
Each trial: use random numbers to look up a selling price, a variable cost, and a demand. Calculate profit for that trial. Average across all 10 trials.
- Uniformly distributed on [0,1]
- Statistically independent โ knowing one number tells you nothing about the next
- Long period โ the sequence should not repeat for a very long time
- Reproducible โ same seed gives same sequence (useful for debugging)
- Fast to generate computationally
Period: The length of the sequence before it repeats.
Where: Xโ = seed, a = multiplier, c = increment, m = modulus
To get U โ [0,1]: divide by m: U = X / m
When c = 0, it's the Multiplicative Congruential Generator:
Xโ = (19 ร 63) mod 100 = 1197 mod 100 = 97 Xโ = (19 ร 97) mod 100 = 1843 mod 100 = 43 Xโ = (19 ร 43) mod 100 = 817 mod 100 = 17 Xโ = (19 ร 17) mod 100 = 323 mod 100 = 23 Xโ = (19 ร 23) mod 100 = 437 mod 100 = 37
So Xโ = 37, giving Uโ = 0.37
Skip-Ahead Formula
This lets you jump n steps ahead without computing all intermediate values.
Combines three multiplicative generators for a very long period (~8ร10ยนยฒ):
Parameters: aโ=157, mโ=32363 | aโ=146, mโ=31727 | aโ=142, mโ=31657
Xโ,โ = (157 ร 100) mod 32363 = 15700 mod 32363 = 15700 Xโ,โ = (146 ร 300) mod 31727 = 43800 mod 31727 = 12073 Xโ,โ = (142 ร 500) mod 31657 = 71000 mod 31657 = 7686 Yโ = (15700 - 12073 + 7686) mod 32362 = 11313 mod 32362 = 11313 Uโ = 11313 / 32363 โ 0.3496
Repeat for iterations 2โ5 using Xโ,โ, Xโ,โ, Xโ,โ as new seeds.
Proposed by John von Neumann. To generate n-digit numbers:
- Start with an n-digit seed
- Square it to get a 2n-digit number
- Extract the middle n digits as the next number
- Repeat
Fibonacci Generator
# R implementation
fibonacci_rng <- function(n, x0) {
x <- numeric(n + 2)
x[1] <- x0[1]; x[2] <- x0[2] # need 2 seeds
for (i in 3:(n+2)) x[i] <- (x[i-1] + x[i-2]) %% 1
return(x[3:(n+2)])
}
Mitchell-Moore Generator
# R implementation
mitchell_moore <- function(n, x0) { # x0 must be length 55
x <- c(x0, numeric(n))
for (i in 56:(55+n)) x[i] <- (x[i-24] + x[i-55]) %% 1
return(x[56:(55+n)])
}
To transform U ~ Uniform[0,1] to Uniform[a, b]:
Examples:
- Uniform[โ11, 17]: X = โ11 + (17 โ (โ11)) ร U = โ11 + 28U
- Uniform[โ148, 1128]: X = โ148 + 1276U
- Uniform[โ160, 200]: X = โ160 + 360U
The most common technique. If F(x) is the CDF, set F(x) = U (uniform random number) and solve for x:
Works perfectly when Fโปยน has a closed form.
Exponential Distribution (mean = ฮฒ)
Generate 5 values using U: 0.12, 0.45, 0.78, 0.33, 0.91 (example) X = -3.7 ร ln(U) Xโ = -3.7 ร ln(0.12) = -3.7 ร (-2.12) = 7.85 days Xโ = -3.7 ร ln(0.45) = -3.7 ร (-0.80) = 2.95 days ...
Uniform Distribution on [a, b]
Normal Distribution (mean=ฮผ, variance=ฯยฒ)
Use Box-Muller or standard normal table. In R: rnorm(n, mean=ฮผ, sd=ฯ)
Generate 5 values: look up Z from standard normal table using U, then X = 33 + 2Z
Weibull Distribution (shape ฮฑ, scale ฮฒ)
For the fluorescent bulb: P(T > t) = e^(โ(t/1000)โต). Set U = e^(โ(T/1000)โต), solve for T:
Discrete Distribution (Empirical)
Build a CDF table. For each U, find the x value where F(x) first exceeds U.
x: 1, 2, 4, 6, 8, 10 with f(x): 0.10, 0.20, 0.25, 0.20, 0.15, 0.10
| x | f(x) | F(x) | RN Range |
|---|---|---|---|
| 1 | 0.10 | 0.10 | 0.00โ0.10 |
| 2 | 0.20 | 0.30 | 0.10โ0.30 |
| 4 | 0.25 | 0.55 | 0.30โ0.55 |
| 6 | 0.20 | 0.75 | 0.55โ0.75 |
| 8 | 0.15 | 0.90 | 0.75โ0.90 |
| 10 | 0.10 | 1.00 | 0.90โ1.00 |
U=0.05 โ x=1; U=0.45 โ x=4; U=0.62 โ x=6; U=0.90 โ x=10
Given: F(x) = x(x+1)(2x+1) / [n(n+1)(2n+1)] for x = 1, 2, โฆ, n. With n=4:
| x | Numerator x(x+1)(2x+1) | F(x) |
|---|---|---|
| 1 | 1ร2ร3 = 6 | 6/180 = 0.033 |
| 2 | 2ร3ร5 = 30 | 30/180 = 0.167 |
| 3 | 3ร4ร7 = 84 | 84/180 = 0.467 |
| 4 | 4ร5ร9 = 180 | 180/180 = 1.000 |
Find x such that F(xโ1) < U โค F(x):
- Rโ = 0.83 โ F(3)=0.467 < 0.83 โค F(4)=1.000 โ x = 4
- Rโ = 0.24 โ F(2)=0.167 < 0.24 โค F(3)=0.467 โ x = 3
- Rโ = 0.57 โ F(3)=0.467 < 0.57 โค F(4)=1.000 โ x = 4
Range (a, b), mode at c. True mean = (a + b + c) / 3
For range (1,10), mode at 4: True mean = (1+10+4)/3 = 5
Beta Distribution
In R: rbeta(n, shape1=ฮฒโ, shape2=ฮฒโ) generates values on [0,1]. To transform to [L, H]: X = L + (HโL) ร U_beta
# Exponential (mean = 3.7) rexp(5, rate = 1/3.7) # Normal (mean=33, variance=4, so sd=2) rnorm(5, mean=33, sd=2) # Gamma (shape=3, scale=1) rgamma(10000, shape=3, scale=1) # Beta rbeta(10, shape1=1.47, shape2=2.16) # Weibull (shape=5, scale=1000) rweibull(n, shape=5, scale=1000) # Chi-squared (k degrees of freedom) rchisq(n, df=k) # or: sum of k squared standard normals # Binomial P(X = 25000) where X ~ Bin(50000, 0.5) # Method 1: dbinom(25000, size=50000, prob=0.5) # Method 2 (using sum, log, exp only): n <- 50000; k <- 25000; p <- 0.5 log_prob <- sum(log(1:n)) - sum(log(1:k)) - sum(log(1:(n-k))) + k*log(p) + (n-k)*log(1-p) exp(log_prob)
Fixed-Time Increment (Fixed-Step)
The simulation clock advances by a fixed small amount (ฮt) at each step. Check if any events occurred in that interval.
โ Simple to implement
โ Good for continuous-time systems
โ Wasteful if events are rare (many empty time steps)
โ May miss events between steps if ฮt too large
Next-Event Time Advance
The clock jumps directly to the time of the next event. Maintains a future event list (FEL) sorted by event time.
โ Efficient โ no wasted steps
โ Standard for discrete-event simulation
โ More complex to implement
โ Requires maintaining and sorting the event list
The simulation table tracks: Clock time, Server status, Queue length, Future events.
Given: IAT = 1, 5, 6, 3, 8 and ST = 3, 5, 4, 1, 5
| Clock | Event | Server | Queue | Next Arrival | Next Departure |
|---|---|---|---|---|---|
| 0 | Start | Idle | 0 | 1 | โ |
| 1 | C1 arrives | Busy | 0 | 6 | 4 |
| 4 | C1 departs | Idle | 0 | 6 | โ |
| 6 | C2 arrives | Busy | 0 | 12 | 11 |
| 11 | C2 departs | Idle | 0 | 12 | โ |
| 12 | C3 arrives | Busy | 0 | 15 | 16 |
| 15 | C4 arrives | Busy | 1 | 23 | 16 |
Two probability tables: arrivals per day and unloading rate per day. Build RN ranges for each, then simulate day by day tracking:
- Barges arriving each day
- Barges unloaded each day (limited by unloading rate and barges available)
- Barges delayed (queue carrying over to next day)
Metrics: Avg barges delayed = Total delayed / Days; Avg arrivals/day; Avg offloaded/day.
When arrival rates change over time (e.g., rush hours), the arrival rate ฮป(t) is estimated from data:
Example: 8:00โ10:00 has counts 22, 24, 20, 28 across 4 days โ avg = 94/4 = 23.5 arrivals โ ฮป = 23.5/2 = 11.75 arrivals/hour
When data shows autocorrelation (successive observations are related), use time-series models.
AR(1) โ Autoregressive Model
Where ฯ is the autocorrelation coefficient and ฮต_n ~ Normal(0, ฯยฒ(1โฯยฒ)). Works well for continuous, normally distributed data.
EAR(1) โ Exponential Autoregressive Model
Better for count data (integers, non-negative). Uses a mixture approach with a Bernoulli random variable.
To choose between them: Look at a histogram of the data. If bell-shaped/symmetric โ AR(1). If skewed/count-like โ EAR(1).
Hotel patrons data (20, 14, 21, 19, 14, 18, 21, 25, 27, 26, 22, 18, 13, 18, 18, 18, 25, 23, 20, 21) appears roughly normal โ AR(1) likely fits better.
# R code for AR(1) x <- c(20,14,21,19,14,18,21,25,27,26,22,18,13,18,18,18,25,23,20,21) fit <- arima(x, order=c(1,0,0)) print(fit) hist(x, main="Hotel Patrons")
Test if simulation output is consistent with real system behaviour.
Data: 7 replications: 18.9, 22.0, 19.4, 22.1, 19.8, 21.9, 20.2. Real-world average: ฮผโ = 22.5
Hโ: ฮผ = 22.5 (model is valid) vs Hโ: ฮผ โ 22.5
n = 7 xฬ = (18.9+22.0+19.4+22.1+19.8+21.9+20.2)/7 = 144.3/7 = 20.614 sยฒ = ฮฃ(xแตข โ xฬ)ยฒ / (nโ1) Deviations: -1.714, 1.386, -1.214, 1.486, -0.814, 1.286, -0.414 Squared: 2.939, 1.921, 1.474, 2.209, 0.663, 1.654, 0.171 Sum = 11.031 โ sยฒ = 11.031/6 = 1.839 โ s = 1.356 t = (xฬ โ ฮผโ) / (s/โn) = (20.614 โ 22.5) / (1.356/โ7) = โ1.886 / 0.5124 = โ3.68 Critical value: tโ.โโโ ,โ = 2.447 (two-tailed, ฮฑ=0.05)
|t| = 3.68 > 2.447 โ Reject Hโ: the simulation output is NOT consistent with the real system. The model needs revision.
Data: 0.740, 1.28, 1.46, 2.36, 0.354, 0.750, 0.912, 4.44, 0.114, 3.08, 3.24, 1.10, 1.59, 1.47, 1.17, 1.27, 9.12, 11.5, 2.42, 1.77
- Compute summary statistics: mean, variance, min, max
- Plot histogram to observe shape (skewed right โ try Exponential or Gamma)
- Estimate parameters: For Exponential, mean โ 1/ฮป so ฮปฬ = 1/xฬ
- Goodness-of-fit test: Chi-square or K-S test to confirm fit
# R code for input modelling
data <- c(0.740,1.28,1.46,2.36,0.354,0.750,0.912,4.44,0.114,
3.08,3.24,1.10,1.59,1.47,1.17,1.27,9.12,11.5,2.42,1.77)
mean(data); var(data); sd(data)
hist(data, probability=TRUE)
# If variance โ meanยฒ, try exponential
# If right-skewed, try gamma or lognormal
library(MASS)
fitdistr(data, "exponential")
fitdistr(data, "gamma")
Where Oแตข = observed frequency, Eแตข = expected frequency under the hypothesised distribution.
Degrees of freedom = k โ 1 โ (number of estimated parameters)
- For Poisson with unknown mean: df = k โ 2 (estimated ฮป from data)
- For Poisson with specified mean: df = k โ 1
Reject Hโ if ฯยฒ > ฯยฒ_{ฮฑ, df} from table.
Given: IAT = 0,1,1,6,3,7,5,2 and ST = 4,2,5,4,1,5,4,1 for 8 customers (Customer 1 at t=0)
| Cust | IAT | Arrival | Svc Time | Svc Begins | Wait | Svc Ends | In System | Server Idle |
|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 4 | 0 | 0 | 4 | 4 | 0 |
| 2 | 1 | 1 | 2 | 4 | 3 | 6 | 5 | 0 |
| 3 | 1 | 2 | 5 | 6 | 4 | 11 | 9 | 0 |
| 4 | 6 | 8 | 4 | 11 | 3 | 15 | 7 | 0 |
| 5 | 3 | 11 | 1 | 15 | 4 | 16 | 5 | 0 |
| 6 | 7 | 18 | 5 | 18 | 0 | 23 | 5 | 2 |
| 7 | 5 | 23 | 4 | 23 | 0 | 27 | 4 | 0 |
| 8 | 2 | 25 | 1 | 27 | 2 | 28 | 3 | 0 |
Total wait: 0+3+4+3+4+0+0+2 = 16 | Avg wait: 16/8 = 2.0 min
Total idle: 2 min | Sim ends at: t=28
P(wait): 5/8 = 0.625 | P(idle): 2/28 โ 0.071
Avg svc time: (4+2+5+4+1+5+4+1)/8 = 26/8 = 3.25 min
Formula: X = โ11 + 28U
Given U ~ Uniform[0,1], generate X ~ Uniform[โ11, 17]:
- U = 0.25 โ X = โ11 + 28(0.25) = โ11 + 7 = โ4
- U = 0.80 โ X = โ11 + 28(0.80) = โ11 + 22.4 = 11.4
Set U = F(x) = xโด/16, solve for x:
True mean: E[X] = โซโยฒ x ร f(x)dx where f(x) = xยณ/4
Generate sample values using Uโ, Uโ, ... โ Xแตข = 2Uแตข^(0.25). Sample mean should โ 1.6.
simulate_die <- function() {
rolls <- 0
while (TRUE) {
rolls <- rolls + 1
if (sample(1:6, 1) == 6) return(rolls)
}
}
# Run it
simulate_die()
# Many runs to get average
mean(replicate(10000, simulate_die())) # Should be โ 6
Data: injuries per month over 100 months. Test if Poisson distributed.
data <- c(rep(0,35), rep(1,40), rep(2,13), rep(3,6), rep(4,4), rep(5,1), rep(6,1)) lambda_hat <- mean(data) # = (0ร35+1ร40+2ร13+3ร6+4ร4+5ร1+6ร1)/100 = 100/100 = 1.0 # Expected frequencies (n=100, Poisson with ฮป=1) E <- 100 * dpois(0:6, lambda=lambda_hat) # Observed O <- c(35, 40, 13, 6, 4, 1, 1) # Merge last cells if E < 5 # chi-square chi_sq <- sum((O - E)^2 / E) df <- length(O) - 1 - 1 # minus 1 for estimated lambda qchisq(0.95, df) # critical value
Queuing Metrics
| Metric | Formula |
|---|---|
| Avg Wait Time | ฮฃWait / n |
| P(wait > 0) | Customers who waited / n |
| P(server idle) | Total idle time / Total sim time |
| Server utilisation ฯ | ฮป/ฮผ |
| P(server idle) M/M/1 | 1 โ ฯ |
| L (avg in system) | ฯ/(1โฯ) |
| W (avg time in system) | 1/(ฮผโฮป) |
Random Number Generation
| Method | Formula |
|---|---|
| LCG | X_{n+1} = (aX_n + c) mod m |
| Multiplicative CG | X_{n+1} = aX_n mod m (c=0) |
| Skip-ahead | X_{i+n} = (a^n mod m ร X_i) mod m |
| Transform [0,1]โ[a,b] | X = a + (bโa)U |
Random Variate Generation
| Distribution | Generator |
|---|---|
| Exponential(ฮฒ) | X = โฮฒ ln(U) |
| Uniform[a,b] | X = a + (bโa)U |
| Weibull(ฮฑ,ฮฒ) | X = ฮฒ[โln(U)]^(1/ฮฑ) |
| Triangular (Uโคc*) | X = a + โ[U(bโa)(cโa)] |
| Triangular (U>c*) | X = b โ โ[(1โU)(bโa)(bโc)] |
| Normal | X = ฮผ + ฯฮฆโปยน(U) |
Statistical Tests
| Test | Formula |
|---|---|
| t-test (validate sim) | t = (xฬ โ ฮผโ)/(s/โn); df=nโ1 |
| Chi-square GOF | ฯยฒ = ฮฃ(OโE)ยฒ/E; df=kโ1โp |
- Q1 (Compulsory): 2 benefits + 1 limitation of simulation; 4 real-world applications; steps in simulation; queuing table (compute avg wait, P_wait, P_idle)
- Monte Carlo Production Simulation: Build RN ranges from probability table; simulate 10 days; compute shortfalls/excesses
- Random Number Generation: LCG computation (Xโ, a, c, m given); L'Ecuyer combined generator; transforming U[0,1] to U[a,b]
- Random Variate Generation: CDF inversion; exponential; discrete CDF table
- R Code: Simulation of distributions (gamma, chi-square, exponential, die roll)
- AR(1)/EAR(1): Fitting to hotel patron data; deciding which fits better from histogram
- Validation t-test: job shop replication data vs. ฮผโ = 22.5
| Term | Definition |
|---|---|
| Simulation | Imitation of a real-world system over time using a model |
| Seed | The initial value Xโ used to start a pseudo-random number generator |
| Period | The length of a pseudo-random sequence before it begins to repeat |
| Verification | Checking the model is coded correctly ("built it right") |
| Validation | Checking the model represents reality ("built the right model") |
| Stochastic | A model incorporating randomness |
| Deterministic | A model with no randomness; same inputs โ same outputs always |
| Discrete system | State changes at countable points in time (events) |
| Continuous system | State changes continuously over time |
| FCFS | First Come First Served โ the standard queue discipline |
| Traffic intensity ฯ | ฯ = ฮป/ฮผ; must be <1 for a stable queue |
| Monte Carlo | Simulation technique using random sampling to estimate outcomes |
- Handles complexity โ Simulation can model systems too complex for closed-form solutions (e.g., queues with non-Poisson arrivals, multiple interacting variables).
- Risk-free experimentation โ Test policies, designs, and what-if scenarios without disrupting the real system or incurring real costs.
- Results are estimates, not exact โ Simulation output includes statistical noise; requires many replications and careful analysis to draw valid conclusions, unlike formulae which give exact answers.
- Banking queues โ Simulate customer arrivals and service times to determine optimal number of tellers and expected wait times.
- Manufacturing โ Model a production line with machine breakdowns to identify bottlenecks and optimise throughput.
- Inventory management โ Use Monte Carlo simulation to determine reorder points and quantities under uncertain demand.
- Hospital emergency department โ Simulate patient arrivals, triage, and treatment to plan staff schedules and bed capacity.
- Problem Formulation โ Define objectives, scope, and boundaries of the study.
- Data Collection & Analysis โ Gather real-world data; identify probability distributions.
- Model Building โ Construct the simulation model (conceptual then computational).
- Verification โ Ensure the code correctly implements the conceptual model ("Built it right?").
- Validation โ Ensure the model accurately represents reality ("Built the right model?").
- Experimental Design โ Define scenarios, number of replications, run length, warm-up period.
- Run Simulation โ Execute the model and collect output statistics.
- Analysis & Interpretation โ Use statistical methods (confidence intervals, hypothesis tests) to interpret results.
- Documentation & Implementation โ Report findings and implement recommendations.
Discrete: State variables change at specific, countable points in time (events). Example: a bank queue โ the number of customers changes only when someone arrives or departs.
Continuous: State variables change continuously over time. Example: water level in a tank โ it rises/falls smoothly as water flows in or out.
Deterministic: No randomness involved. Given the same inputs, you always get exactly the same output. Example: a machine that always takes exactly 5 minutes per unit.
Stochastic: Incorporates random variables. Outputs vary between runs even with the same inputs. Example: customer service times following an exponential distribution โ each customer takes a different random time.
State variables:
- Server status (busy or idle)
- Number of customers in the queue
- Number of customers in the system (queue + being served)
- Time of the last event
Events:
- Arrival event โ a new customer enters the system
- Departure event โ server completes service; next customer begins or server goes idle
- End-of-simulation event โ stopping condition is reached
- Large solution space โ Millions of combinations of input parameters may need evaluation, making exhaustive search impractical.
- Noisy/stochastic responses โ The same inputs produce different outputs across runs due to randomness, making it hard to distinguish true improvements from random variation.
- Computational cost โ Each simulation run is time-consuming; evaluating enough candidate solutions for reliable optimisation requires substantial compute resources.
| Cust | IAT | Arrival | ST | Svc Begins | Wait | Svc Ends | In System | Idle |
|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 4 | 0 | 0 | 4 | 4 | 0 |
| 2 | 1 | 1 | 2 | 4 | 3 | 6 | 5 | 0 |
| 3 | 1 | 2 | 5 | 6 | 4 | 11 | 9 | 0 |
| 4 | 6 | 8 | 4 | 11 | 3 | 15 | 7 | 0 |
| 5 | 3 | 11 | 1 | 15 | 4 | 16 | 5 | 0 |
| 6 | 7 | 18 | 5 | 18 | 0 | 23 | 5 | 2 |
| 7 | 5 | 23 | 4 | 23 | 0 | 27 | 4 | 0 |
| 8 | 2 | 25 | 1 | 27 | 2 | 28 | 3 | 0 |
(i) Avg waiting time = (0+3+4+3+4+0+0+2)/8 = 16/8 = 2.0 minutes
(ii) P(wait) = 5 customers waited / 8 total = 0.625 (62.5%)
(iii) P(idle) = Total idle time / Simulation end = 2/28 = 0.071 (7.1%)
ฯ = ฮป/ฮผ = 3/4 = 0.75
(i) Proportion idle = Pโ = 1 โ ฯ = 1 โ 0.75 = 0.25 (25%)
(ii) P(immediate service) = Pโ = 0.25 (same as proportion idle)
(iii) Avg in system (L) = ฯ/(1โฯ) = 0.75/0.25 = 3 customers
(iv) Avg time in system (W) = 1/(ฮผโฮป) = 1/(4โ3) = 1 hour
Step 1: Build RN ranges from the cumulative probability table (see Tab 4, Card 2).
| Day | RN | Production | Shortfall | Excess |
|---|---|---|---|---|
| 1 | 05 | 46 | 5 | 0 |
| 2 | 34 | 49 | 2 | 0 |
| 3 | 78 | 52 | 0 | 1 |
| 4 | 56 | 50 | 1 | 0 |
| 5 | 45 | 50 | 1 | 0 |
| 6 | 90 | 53 | 0 | 2 |
| 7 | 04 | 46 | 5 | 0 |
| 8 | 58 | 50 | 1 | 0 |
| 9 | 92 | 54 | 0 | 3 |
| 10 | 39 | 49 | 2 | 0 |
(i) Avg shortfall = 17/10 = 1.7 trucks/day
(ii) Avg excess = 6/10 = 0.6 trucks/day
Monte Carlo simulation uses random numbers to sample from probability distributions repeatedly, then aggregates results to estimate system behaviour. Each trial draws random values for uncertain inputs, computes the outcome, and the average across many trials gives the estimated result.
Advantages: (1) Can handle complex, interdependent variables that have no analytical solution. (2) Provides a full range of possible outcomes (risk profile), not just a single point estimate.
Disadvantages: (1) Results are only as good as the input distributions โ garbage in, garbage out. (2) Requires many trials for accurate results, making it computationally intensive.
Xโ = (19 ร 63) mod 100 = 1197 mod 100 = 97 โ Uโ = 0.97 Xโ = (19 ร 97) mod 100 = 1843 mod 100 = 43 โ Uโ = 0.43 Xโ = (19 ร 43) mod 100 = 817 mod 100 = 17 โ Uโ = 0.17 Xโ = (19 ร 17) mod 100 = 323 mod 100 = 23 โ Uโ = 0.23 Xโ = (19 ร 23) mod 100 = 437 mod 100 = 37 โ Uโ = 0.37
Xโ,โ = (157 ร 100) mod 32363 = 15700 Xโ,โ = (146 ร 300) mod 31727 = 12073 Xโ,โ = (142 ร 500) mod 31657 = 7686 Yโ = (15700 โ 12073 + 7686) mod 32362 = 11313 Uโ = 11313 / 32363 โ 0.3496
- Uniformly distributed on [0, 1] โ each value equally likely.
- Statistically independent โ knowing one number tells you nothing about the next.
- Long period โ the sequence should not repeat for a very long time.
- Reproducible โ same seed produces the same sequence (essential for debugging).
- Fast to generate โ computationally efficient for large-scale simulations.
U = 0.25 โ X = โ11 + 28(0.25) = โ11 + 7 = โ4
U = 0.80 โ X = โ11 + 28(0.80) = โ11 + 22.4 = 11.4
U = 0.12 โ X = โ3.7 ร ln(0.12) = โ3.7 ร (โ2.120) = 7.85 days
U = 0.45 โ X = โ3.7 ร ln(0.45) = โ3.7 ร (โ0.799) = 2.95 days
U = 0.78 โ X = โ3.7 ร ln(0.78) = โ3.7 ร (โ0.248) = 0.92 days
Set U = F(x) = xโด/16, solve for x:
True mean: f(x) = F'(x) = 4xยณ/16 = xยณ/4
U = 0.20 โค c* = 0.333:
U = 0.60 > c* = 0.333:
Fixed-Time Increment: Clock advances by a fixed ฮt each step. โ Simple to implement; good for continuous systems. โ Wasteful if events are rare (many empty steps).
Next-Event Time Advance: Clock jumps directly to the next scheduled event. โ Efficient โ no wasted time steps. โ More complex; requires maintaining and sorting a future event list (FEL).
| Clock | Event | Server | Queue | Next Arr | Next Dep |
|---|---|---|---|---|---|
| 0 | Start | Idle | 0 | 1 | โ |
| 1 | C1 arrives | Busy | 0 | 6 | 4 |
| 4 | C1 departs | Idle | 0 | 6 | โ |
| 6 | C2 arrives | Busy | 0 | 12 | 11 |
| 11 | C2 departs | Idle | 0 | 12 | โ |
| 12 | C3 arrives | Busy | 0 | 15 | 16 |
| 15 | C4 arrives | Busy | 1 | 23 | 16 |
| 16 | C3 departs, C4 starts | Busy | 0 | 23 | 17 |
| 17 | C4 departs | Idle | 0 | 23 | โ |
| 23 | C5 arrives | Busy | 0 | โ | 28 |
| 28 | C5 departs | Idle | 0 | โ | โ |
Hโ: ฮผ = 22.5 (model is valid) vs Hโ: ฮผ โ 22.5 (two-tailed)
n = 7 xฬ = 144.3/7 = 20.614 Deviationsยฒ: 2.939, 1.921, 1.474, 2.209, 0.663, 1.654, 0.171 ฮฃ = 11.031 โ sยฒ = 11.031/6 = 1.839 โ s = 1.356 t = (20.614 โ 22.5) / (1.356/โ7) = โ1.886 / 0.5124 = โ3.68 Critical value: tโ.โโโ ,โ = 2.447
|t| = 3.68 > 2.447 โ Reject Hโ. The simulation output is NOT consistent with the real system. The model needs revision.
AR(1) (Autoregressive): X_n = ฮผ + ฯ(X_{n-1} โ ฮผ) + ฮต_n. Best for continuous, normally distributed data.
EAR(1) (Exponential Autoregressive): Uses a Bernoulli mixture. Best for count data, skewed, non-negative data.
Decision: Plot a histogram. If bell-shaped/symmetric โ AR(1). If skewed right/count-like โ EAR(1).
The Chi-Square test compares observed frequencies (O) with expected frequencies (E) under a hypothesised distribution.
Degrees of freedom = k โ 1 โ p, where k = number of categories (after merging cells with E < 5) and p = number of parameters estimated from data.
Reject Hโ if ฯยฒ > ฯยฒ_{ฮฑ, df} from the chi-square table.
simulate_die <- function() {
rolls <- 0
while (TRUE) {
rolls <- rolls + 1
if (sample(1:6, 1) == 6) return(rolls)
}
}
# Run and average
mean(replicate(10000, simulate_die())) # Should be โ 6
The expected value is 6 (geometric distribution with p = 1/6 has mean 1/p = 6).
x <- rgamma(10000, shape=3, scale=1) hist(x, breaks=50, probability=TRUE, main="Gamma(3,1)") mean(x) # Should be โ 3 (shape ร scale) var(x) # Should be โ 3 (shape ร scaleยฒ)
# Method (a): Direct
dbinom(25000, size=50000, prob=0.5)
# Method (b): Logarithmic (avoids overflow)
n <- 50000; k <- 25000; p <- 0.5
log_prob <- sum(log(1:n)) - sum(log(1:k)) -
sum(log(1:(n-k))) + k*log(p) + (n-k)*log(1-p)
exp(log_prob)