## Introduction

This blog post was mostly written while waiting at Gate B1 at Sofia International Airport (SOF).

I decided to return to Dublin from Sofia with Austrian Airlines, which includes a connection at Vienna airport with a layover of only 50 minutes.

I don’t like layovers this short: I could easily miss my next flight and end up in the limbo of searching for alternative flights.

So, to amuse myself, I asked: **What is the chance of me making it to the next flight?**

## Situation modelling

Let’s define the situation:

My flight from Sofia is scheduled to depart for Vienna at 13:40 (GMT+2). My next flight, from Vienna to Frankfurt, takes off at 15:10 (GMT+1). In between, I have to get to the gate on time.

So, what are my chances?

To answer my main question, I needed to find answers to the following questions:

- What is the average delay of the flight from Sofia to Vienna?
- How much time is needed to get from my arrival gate to the departure gate of my next flight?
- How often does the flight from Vienna to Frankfurt get delayed (and for how long)?

Once we have these answers, we can calculate the total time it will take me to reach the gate of my next flight, that is, my time of arrival at the gate.

We will then calculate when my next flight might take off by taking its delay into account.

By subtracting my arrival time at the gate from the departure time, we get a time difference: this is **the outcome** we are interested in.

We are interested in values greater than 0, because they mean I will make it on time. A value below 0 means I’m late.

To model this problem, I first checked a website called flightstats.com. There, I found the average delay and roughly what the distribution of delays looks like. A Gaussian distribution is a poor description of these delays: the most common scenario is a flight arriving more or less on time, and while longer delays are possible, they are much less frequent.

We will use the (negative) exponential probability distribution, as the data from FlightStats.com roughly follows its shape. This distribution takes only one parameter, called *rate*, which is equal to **1/mean**. You can read more about it on Wikipedia. One feature of the exponential distribution that is particularly useful here is that it cannot produce negative values, as it is supported on the interval [0, ∞). A flight cannot have a negative delay time, right?

FlightStats.com tells us that the average delay (mean) is 35 minutes. Alright, so now we know our rate parameter will be 1/35.
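To get a feel for what a rate of 1/35 implies, here is a minimal sketch in R. It assumes the delays really are exponential with a mean of 35 minutes (that is my modelling assumption, not something FlightStats states), and the seed and the 60-minute threshold are arbitrary choices for illustration.

```r
# Assumption: Sofia -> Vienna delays ~ Exponential(rate = 1/35), in minutes
mean_delay <- 35
rate <- 1 / mean_delay

set.seed(1)                        # arbitrary seed, only for reproducibility
delays <- rexp(100000, rate)       # 100,000 simulated delays

sample_mean <- mean(delays)        # should land close to 35
min_delay <- min(delays)           # exponential samples are never negative

# Probability of a delay longer than one hour: exact value vs. simulation
p_over_hour_exact <- pexp(60, rate, lower.tail = FALSE)   # equals exp(-60/35)
p_over_hour_sim <- mean(delays > 60)

# Median delay: 35 * log(2), noticeably below the mean
median_delay <- qexp(0.5, rate)
```

The median is well below the mean, which matches the picture of most flights being close to on time with a tail of long delays.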

How do we proceed? We would like to simulate many different outcomes so we know what to expect. The idea is to run many trials with different outcomes and combine them all to get an idea of what our **outcome distribution** looks like.

## Monte Carlo to the rescue!

With Monte Carlo simulation, we will draw 100,000 values from each distribution we use (delay times, terminal transfer time, etc.) and combine them to get 100,000 different outcomes. I chose 100,000 arbitrarily; you can choose another sample size. The bigger the sample, the better the approximation gets.
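The claim about sample size can be checked on a quantity with a known answer. The sketch below estimates the probability that an exponential delay with mean 35 exceeds 35 minutes, which is exactly exp(-1), using three sample sizes. The sample sizes and the seed are arbitrary choices of mine.

```r
set.seed(42)                       # arbitrary seed, only for reproducibility
rate <- 1 / 35
exact <- exp(-1)                   # P(delay > 35) for an exponential with mean 35

sample_sizes <- c(100, 10000, 1000000)

# Monte Carlo estimate of the same probability for each sample size
estimates <- sapply(sample_sizes, function(n) mean(rexp(n, rate) > 35))
abs_errors <- abs(estimates - exact)

# Theoretical standard error of the estimate: shrinks like 1 / sqrt(n)
std_errors <- sqrt(exact * (1 - exact) / sample_sizes)

data.frame(n = sample_sizes, estimate = estimates,
           abs_error = abs_errors, std_error = std_errors)
```

The standard error falls with the square root of the sample size, so going from 100 to 1,000,000 samples buys roughly a hundredfold improvement in precision.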

When can we expect our first flight to arrive? Let’s combine flight departure time, sampled delay times and flight duration time.

**Milestone 1: We can calculate 100,000 possible outcomes of arrival time by summing departure time and 100,000 sampled values from exponential distribution of delay time, and adding flight duration.**

According to the Vienna airport website, a transfer takes approximately 20 minutes. Alright, let’s assume that the time needed to move from gate A to gate B is normally distributed with a mean of 20 and a standard deviation of 5 minutes.
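Unlike the exponential distribution, a normal distribution can in principle produce negative values. A quick check, assuming the N(20, 5) transfer time above, suggests this is not a practical concern here; the 30-minute threshold is just an illustrative choice.

```r
transfer_mean <- 20   # minutes, from the airport website
transfer_sd <- 5      # minutes, my assumption

# Probability that the model produces an impossible (negative) transfer time
p_negative <- pnorm(0, transfer_mean, transfer_sd)

# Probability that the transfer takes longer than 30 minutes (two sd above the mean)
p_over_30 <- pnorm(30, transfer_mean, transfer_sd, lower.tail = FALSE)

# Range that covers the middle 95% of simulated transfer times
middle_95 <- qnorm(c(0.025, 0.975), transfer_mean, transfer_sd)
```

A negative transfer time would need a draw four standard deviations below the mean, so it practically never happens in 100,000 samples.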

**Milestone 2: We can calculate expected time of arrival at the gate of my next flight by summing values from milestone 1 with sampled values of time needed to get from arrival to departure gate.**

Before we move onto the next step, we have to take into consideration two time corrections:

- **Timezone difference** – Vienna is GMT+1 and Sofia is GMT+2, so we will subtract one hour
- **Gate closing time** – gates close 20 minutes before departure, so we will add 20 minutes to the total time to adjust

**Milestone 3: Total time to get to the gate has been corrected with timezone difference and gate closing time adjustment**
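Before adding any randomness, it helps to see how much slack the schedule leaves if nothing is delayed and the transfer takes exactly its mean. This is a small sketch of that arithmetic; the variable names are mine, and the 100-minute flight duration is the value used in the R code further down.

```r
# All times in minutes since midnight
departure_sofia  <- 13 * 60 + 40   # 13:40 Sofia time (GMT+2)
flight_minutes   <- 100            # flight duration, as used in the R code in this post
timezone_shift   <- 60             # Sofia is one hour ahead of Vienna
transfer_mean    <- 20             # average walk between gates
gate_closing     <- 20             # gate closes 20 minutes before departure
departure_vienna <- 15 * 60 + 10   # 15:10 Vienna time (GMT+1)

# Effective "time at the gate", already including the gate closing adjustment
at_gate <- departure_sofia + flight_minutes - timezone_shift +
  transfer_mean + gate_closing

# Slack with zero delays: positive means I make it
slack <- departure_vienna - at_gate
slack   # 10
```

So even in the no-delay scenario there are only 10 minutes to spare, which is why the delay of the first flight matters so much.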

Let’s also check how often the flight from Vienna to Frankfurt is delayed, as this gives me bonus time to make it on time! 🙂

On average, it’s delayed by 9 minutes, so we will again model this delay with an exponential distribution, this time with a rate parameter of 1/9.

**Milestone 4: We calculate 100,000 possible outcomes of Vienna to Frankfurt flight departure time by adding sampled delay times to scheduled departure time**

Our last step is to subtract the sampled total time to get to the gate from the sampled departure times of my next flight.

**Analysis outcome: distribution of sampled values telling us what’s the difference between departure time and time needed to get to the gate on time**

Now, let’s combine everything together.

For this purpose, we will use R, a programming language for statistical computing.

We will model everything using minutes and absolute time.

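    # Sofia -> Vienna: scheduled departure at 13:40 (GMT+2), in minutes since midnight
    sofiaViennaDepartureTime = 13 * 60 + 40
    sofiaViennaDelayTime = rexp(100000, 1/35)    # delay, mean 35 minutes
    sofiaViennaFlightTime = 100                  # flight duration in minutes
    sofiaViennaArrivalTime = sofiaViennaDepartureTime + sofiaViennaDelayTime + sofiaViennaFlightTime

    # Convert arrival time from Sofia time (GMT+2) to Vienna time (GMT+1)
    timezoneCorrection = 60
    sofiaViennaArrivalTime = sofiaViennaArrivalTime - timezoneCorrection

    # Transfer between gates, plus the gate closing 20 minutes before departure
    viennaAirportTransfer = rnorm(100000, 20, 5)
    gateClosingCorrection = 20
    totalTimeToFlight = sofiaViennaArrivalTime + viennaAirportTransfer + gateClosingCorrection

    # Vienna -> Frankfurt: scheduled departure at 15:10 (GMT+1), delay with mean 9 minutes
    viennaFrankfurtDepartureTime = 15 * 60 + 10
    viennaFrankfurtDelay = rexp(100000, 1/9)
    viennaFrankfurtTotalTime = viennaFrankfurtDepartureTime + viennaFrankfurtDelay

    # Positive values mean I reach the gate in time
    result.set = viennaFrankfurtTotalTime - totalTimeToFlight
    sum(result.set > 0) / 100000
    # [1] 0.3963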

After adding and subtracting all the different random variable samples, we get our final result, stored in `result.set`. To calculate my probability, I just need to count the number of values greater than 0 and divide it by the total number of values, which is 100,000.

So, I have a 0.3963 chance of getting to the gate in time to catch my next flight, or, in other words, I should catch my next flight roughly 4 out of 10 times!
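Since the samples are random, the estimate will differ slightly from run to run. A rough sketch of how much: the code below wraps the same model in a function (the name `simulate_connection` and the seed are my own choices), fixes the seed so the run is repeatable, and attaches a standard error and an approximate 95% interval to the estimate.

```r
# Same model as above, wrapped in a function; returns the time margin in minutes
simulate_connection <- function(n = 100000, mean_delay_in = 35, mean_delay_out = 9) {
  arrival <- 13 * 60 + 40 + rexp(n, 1 / mean_delay_in) + 100 - 60
  at_gate <- arrival + rnorm(n, 20, 5) + 20
  departure <- 15 * 60 + 10 + rexp(n, 1 / mean_delay_out)
  departure - at_gate
}

set.seed(2024)                     # arbitrary seed, only for reproducibility
margin <- simulate_connection()

p_hat <- mean(margin > 0)          # estimated probability of making the flight

# Standard error of a proportion and a normal-approximation 95% interval
std_error <- sqrt(p_hat * (1 - p_hat) / length(margin))
ci_95 <- p_hat + c(-1.96, 1.96) * std_error
```

With 100,000 samples the standard error is well below half a percentage point, so the "4 out of 10" reading does not depend on the particular run. The uncertainty that matters comes from the model assumptions, not from the simulation noise.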

If we plot the resulting values as a histogram, it would look something like this:
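A histogram like this can be drawn with base R. The sketch below regenerates the simulated values so it runs on its own; the seed, the number of bins, the title and the axis label are my own choices.

```r
set.seed(123)                      # arbitrary seed, only for reproducibility
n <- 100000

# Same model as in the post, written compactly
totalTimeToFlight <- 13 * 60 + 40 + rexp(n, 1/35) + 100 - 60 +
  rnorm(n, 20, 5) + 20
viennaFrankfurtTotalTime <- 15 * 60 + 10 + rexp(n, 1/9)
result.set <- viennaFrankfurtTotalTime - totalTimeToFlight

# Histogram of the time margin; everything right of the red line is a caught flight
h <- hist(result.set, breaks = 100,
          main = "Time margin at the gate",
          xlab = "Minutes to spare (negative = missed flight)")
abline(v = 0, col = "red", lwd = 2)
```

I would expect a peak near zero with a long tail on the negative side, since the Sofia delay (mean 35 minutes) works against me far more than the Frankfurt delay (mean 9 minutes) works for me.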

## Conclusion

Is this model accurate? Probably not. There are two main reasons. First of all, I made a lot of assumptions regarding probability distributions: maybe the exponential and normal distributions I used don’t fit this data so well.

Also, there are only 74 observations of delay time for the flight from Sofia to Vienna, so maybe our sample mean is biased and a single outlier is heavily affecting it. With a larger sample, the mean delay might be smaller, giving the distribution a different shape and resulting in more positive outcomes.
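To see how sensitive the result is to that mean, here is a rough sketch that reruns the simulation for a few alternative mean delays. The values 25 and 45 are hypothetical, picked only to bracket the observed 35 minutes, and the helper name `catch_probability` is mine. The standard error line assumes the delays are truly exponential, in which case the standard deviation equals the mean.

```r
# Probability of catching the connection for a given mean Sofia -> Vienna delay
catch_probability <- function(mean_delay_in, n = 100000) {
  at_gate <- 13 * 60 + 40 + rexp(n, 1 / mean_delay_in) + 100 - 60 +
    rnorm(n, 20, 5) + 20
  departure <- 15 * 60 + 10 + rexp(n, 1 / 9)
  mean(departure > at_gate)
}

# Rough uncertainty of a mean estimated from 74 exponential observations
n_obs <- 74
mean_se <- 35 / sqrt(n_obs)        # about 4 minutes

set.seed(7)                        # arbitrary seed, only for reproducibility
candidate_means <- c(25, 35, 45)   # hypothetical alternatives around the observed 35
p_catch <- sapply(candidate_means, catch_probability)

data.frame(mean_delay = candidate_means, p_catch = p_catch)
```

The probability moves noticeably as the mean delay changes, so a better estimate of the mean would be the first thing to improve in this model.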

Was this a nice brain exercise? It most definitely was and it helped me kill some time at the airport! 🙂

I hope you enjoyed this post and learned something new from it. As always, feedback is highly appreciated and I would love to hear your opinion!

For your information, I made it to my next flight and published this post from Frankfurt International Airport! 🙂

Now, off to my flight to Dublin!