DMA Linking Concepts
In DMA, the 3 key concepts can actually be connected. This is not as hard as it sounds: it just means that you use the output from one concept as the input to another. So let’s first review the 3 key concepts, along with their inputs, outputs, and purposes:
- Regression: Takes in raw data points and returns a mathematical function that gives a predicted average, along with a standard deviation (the standard error) around that prediction. Its purpose is to find linear relationships between data.
- Simulation: Takes in a set of probabilities or rules and returns simulated data, usually summarized by an average and standard deviation. Its purpose is to predict outcomes, particularly for problems that are too difficult to compute mathematically.
- Optimization: Takes in a mathematical function, constraints, and decision variables, and returns the optimal outcome and, sometimes, a sensitivity report. Its purpose is to find the best decision given the constraints.
With the above, we can pretty easily see how they chain together. In particular:
- Regression to Simulation: Regression returns a math function plus a standard error, which together describe a normal distribution around each prediction. We can use that distribution to simulate variables using NORM.INV.
- Regression to Optimization: The output of a regression can be used to formulate an objective function or even constraints! Imagine I don’t want to have more than 100 people in my store. I use regression to predict how many people will be in the store, so I can use the regression output in my constraints section.
- Simulation to Optimization: The output of simulation gives us a lot of random data points. We can throw these data points into optimization and ask it to minimize or maximize the standard deviation, or even the number of instances where we’re profitable.
Case Examples
Regression to Simulation
Case: Oakland As
In this case, we had data to predict the number of attendees at a baseball game based on “Roddy’s Predictions.” Our goal was to choose between two contract terms and pick the one that costs us the least.
- First, we run a regression using Roddy’s Predictions to predict attendance at home games.
- Using the prediction formula, and Roddy’s next predictions, we can create a normal distribution where the mean is the prediction and the standard deviation is the standard error of regression.
- Next, we simulate lots of possible attendance numbers using NORM.INV based on the normal distribution we created.
- Then, we calculate the cost of each contract for each simulated attendance value, and look at the average and standard deviation of the costs associated with each contract.
- Finally, using this, we can choose a contract based on its average cost and our tolerance for risk.
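The steps above can be sketched in Python. This is only an illustration under stated assumptions: the attendance numbers, the next forecast, and both contract formulas are made up (the case's real data and contract terms are not reproduced here), and `NormalDist.inv_cdf` stands in for NORM.INV.

```python
import random
from statistics import NormalDist, mean, stdev


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, standard_error)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Standard error of the regression: sqrt(SSE / (n - 2)).
    return a, b, (sse / (n - 2)) ** 0.5


# Hypothetical data (thousands of fans): past forecasts vs. actual attendance.
forecasts = [20, 25, 30, 35, 40, 45]
actuals = [22, 24, 33, 34, 43, 44]

a, b, std_err = fit_line(forecasts, actuals)
next_forecast = 38                  # hypothetical next prediction
center = a + b * next_forecast      # mean of the normal distribution

# NormalDist.inv_cdf(p) is the counterpart of NORM.INV(p, mean, sd).
rng = random.Random(0)
dist = NormalDist(mu=center, sigma=std_err)
attendance = [dist.inv_cdf(rng.uniform(1e-9, 1 - 1e-9)) for _ in range(10_000)]

# Hypothetical contracts (costs in $ thousands): a flat fee vs. a per-fan fee.
cost_flat = [400.0 for _ in attendance]
cost_per_fan = [50.0 + 9.0 * fans for fans in attendance]

for name, costs in [("flat", cost_flat), ("per-fan", cost_per_fan)]:
    print(f"{name}: mean={mean(costs):.1f}, sd={stdev(costs):.1f}")
```

Comparing the two printed lines is the last step of the case: one contract may be cheaper on average while the other has less spread, and the choice depends on how much risk we are willing to take.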
Regression to Optimization
Case: 4Star
In this case, we had various demand numbers for different prices. Our goal was to optimize the price of the tires.
- First, we run a regression analysis to predict demand from price.
- Next, we use the formula from the regression to calculate demand as a function of a price variable.
- Finally, setting a constraint on the total number of tires we can sell, we optimize for price!
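Here is a minimal sketch of that chain. The price and demand numbers, the capacity of 600 tires, and the choice of revenue as the objective are all assumptions for illustration, and a simple grid search over candidate prices stands in for a spreadsheet solver.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b


def best_price(a, b, capacity, candidate_prices):
    """Grid search: maximize price * demand subject to demand <= capacity."""
    best = None
    for price in candidate_prices:
        demand = max(0.0, a + b * price)    # regression formula, floored at 0
        if demand > capacity:               # constraint: cannot sell more tires
            continue
        revenue = price * demand
        if best is None or revenue > best[1]:
            best = (price, revenue, demand)
    return best


# Hypothetical price/demand observations.
prices = [60, 70, 80, 90, 100, 110]
demand = [905, 790, 710, 595, 505, 395]

a, b = fit_line(prices, demand)
price, revenue, tires = best_price(a, b, capacity=600,
                                   candidate_prices=range(1, 151))
print(f"demand = {a:.1f} + ({b:.2f}) * price")
print(f"best price = {price}, tires sold = {tires:.0f}, revenue = {revenue:.0f}")
```

The regression output shows up in two places here: inside the objective (revenue depends on predicted demand) and inside the constraint (predicted demand must stay within capacity).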
Simulation to Optimization
Case: Professor Selects a Portfolio of Chinese Stocks
In this case, we’re given a set of historical stock returns. Our goal is to maximize our profit and minimize our risk.
- First, we simulate lots of possible annual stock returns using the methods outlined in the case.
- Next, we lock those values so they don’t change between runs, and set up the weightings of each stock in our portfolio as the decision variables.
- We can then calculate the returns of each year, and using those, graph the mean and standard deviation across all our simulated stock returns.
- Finally, we use optimization to maximize our returns while setting a constraint on standard deviation (risk).
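A rough sketch of this flow follows. The case's own simulation method is not reproduced; instead, three hypothetical stocks are drawn independently from normal distributions with made-up means and standard deviations, the risk cap of 0.10 is invented, and a coarse grid search over weights stands in for the solver.

```python
import random
from statistics import NormalDist, mean, stdev

# Hypothetical (mean, sd) of annual return for three stocks, drawn independently.
STOCKS = {"A": (0.06, 0.05), "B": (0.10, 0.15), "C": (0.14, 0.30)}
RISK_CAP = 0.10      # hypothetical limit on portfolio standard deviation
N_YEARS = 1000

# Simulate once with a fixed seed, then reuse the same draws for every
# candidate portfolio. This mirrors "locking" the simulated values.
rng = random.Random(1)
scenarios = [
    [NormalDist(mu, sd).inv_cdf(rng.uniform(1e-9, 1 - 1e-9))
     for mu, sd in STOCKS.values()]
    for _ in range(N_YEARS)
]


def portfolio_stats(weights):
    """Mean and standard deviation of portfolio return across all scenarios."""
    returns = [sum(w * r for w, r in zip(weights, year)) for year in scenarios]
    return mean(returns), stdev(returns)


# Grid search over weights in steps of 10% that sum to 100%.
best = None
for i in range(11):
    for j in range(11 - i):
        weights = (i / 10, j / 10, (10 - i - j) / 10)
        avg, risk = portfolio_stats(weights)
        if risk <= RISK_CAP and (best is None or avg > best[1]):
            best = (weights, avg, risk)

weights, avg, risk = best
print(f"weights={weights}, mean return={avg:.3f}, sd={risk:.3f}")
```

Locking matters because every candidate set of weights has to be judged against the same simulated years. If the draws changed between candidates, the optimizer would be chasing noise.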
A Few Extra Notes
To quell any remaining worries, let’s talk about the remaining 3 possible ways to combine things (Simulation to Regression, plus Optimization to either Regression or Simulation) and why they don’t make any sense.
- Simulation to Regression: Yes, simulation gives us a lot of data and we can pass it to regression… but if we simulated it ourselves, then we already know the relationship, because we set the rules that generated the data. There’s no new knowledge regression can tell us!
- Optimization to Anything: Because we can only run optimization once and it’s a manual process, we can’t really pipe the output into anything else.
What About All Three?
Fine! I’ll do all 3, omg stfu. Here’s a decently simple example:
I own an ice-cream cart. I want to decide how many ice-cream bars I should buy before I head out tomorrow. I have historical data depicting the demand for ice-cream as well as the temperature that day. I also have the forecast temperature for tomorrow. How much ice-cream should I buy to maximize profit while minimizing the risk of going out of stock?
- First, we should see if there’s a relationship between temperature and ice-cream demand. For example, we can run a regression analysis on historical temperature vs demand.
- Next, we can predict the demand for the next day using our regression output, which also gives us a standard error.
- Using the predicted value and the standard error, we can use NORM.INV to simulate lots of possible data points for demand tomorrow.
- Next, for a given number of ice-cream bars bought, we can calculate our profit in each simulated scenario, as well as whether or not we went out of stock.
- Finally, we can run optimization. Maybe we set up an extra constraint that we want to keep the probability of going out of stock below 5%. Then we can optimize for how many ice-cream bars to buy to maximize profit!
There! We started from data, used regression to get predictions, used the predictions to simulate data points, and used that data to optimize for our profit and minimize risk!
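The whole ice-cream chain can be sketched end to end. Everything numeric here is an assumption for illustration: the temperature and sales history, the forecast of 29 degrees, the price of 3 and cost of 1 per bar, the idea that unsold bars are worth nothing, and the candidate range of 0 to 200 bars. `NormalDist.inv_cdf` plays the role of NORM.INV, and a grid search plays the role of the solver.

```python
import random
from statistics import NormalDist, mean


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, standard_error)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, (sse / (n - 2)) ** 0.5


# Hypothetical history: daily high temperature (Celsius) and bars sold.
temps = [18, 21, 24, 27, 30, 33]
sold = [40, 52, 61, 75, 83, 98]
PRICE, COST = 3.0, 1.0      # hypothetical sale price and unit cost per bar
MAX_STOCKOUT = 0.05         # keep the chance of running out at or below 5%

# Steps 1 and 2: regression, then a prediction with its standard error.
a, b, std_err = fit_line(temps, sold)
center = a + b * 29         # hypothetical forecast temperature for tomorrow

# Step 3: simulation via inverse-CDF draws, like NORM.INV(RAND(), mean, sd).
rng = random.Random(42)
dist = NormalDist(center, std_err)
demands = [max(0.0, dist.inv_cdf(rng.uniform(1e-9, 1 - 1e-9)))
           for _ in range(2000)]


def evaluate(quantity):
    """Average profit and stock-out frequency for one order quantity."""
    profits = [PRICE * min(d, quantity) - COST * quantity for d in demands]
    stockouts = sum(d > quantity for d in demands)
    return sum(profits) / len(demands), stockouts / len(demands)


# Step 4: optimization by grid search under the stock-out constraint.
feasible = [q for q in range(0, 201) if evaluate(q)[1] <= MAX_STOCKOUT]
best_q = max(feasible, key=lambda q: evaluate(q)[0])
profit, stockout = evaluate(best_q)
print(f"buy {best_q} bars: avg profit={profit:.2f}, stock-out rate={stockout:.3f}")
```

With these made-up numbers the stock-out constraint is what drives the answer: profit alone would favor buying fewer bars, but the 5% rule pushes the order quantity above the predicted demand.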