Competing Function Model Validation

In simple terms: This topic is about picking the best type of equation (like a line or a curve) to describe a set of data points, and then using a special graph called a residual plot to prove it's the right choice.

Why this matters

Imagine your friend Maya starts a small business selling custom-painted sneakers online. The first week, she sells 3 pairs. The next week, 5. Then 8, 14, and 24. She's thrilled, but she needs to predict future sales to know how many plain white sneakers to order.

Is her sales growth a straight line? Or is it curving upwards, getting faster and faster? If she models it as a simple line, she might not order enough inventory and miss out on sales. If she assumes it's exploding faster than it is, she'll be stuck with boxes of unsold shoes in her garage in Dallas.

Choosing the right kind of function to model her sales is crucial. In this lesson, we'll learn how to be a data detective. We'll explore how to pick the best model for a set of data and, most importantly, how to prove our choice is the right one using a powerful tool called a residual plot.

Concept overview

flowchart TD
    A[Start with a data set] --> B{Choose potential models};
    B --> C[Linear Model];
    B --> D[Quadratic Model];
    B --> E[Exponential Model];
    C --> F[Generate Residual Plot];
    D --> F;
    E --> F;
    F --> G{Does the plot show a pattern?};
    G -- Yes --> H[Poor fit. Try another model.];
    H --> B;
    G -- No --> I[Good fit.];
    I --> J[Consider context of the problem];
    J --> K[Select and justify the best model];

Core explanation

When you first look at a set of data points on a graph, you might have a gut feeling about what kind of function would fit it best. Does it look like a line? A parabola? An exponential curve? That intuition is a great starting point, but the AP exam will ask you to justify your choice with evidence.

Let's meet our three main modeling functions:

Linear Models (y = mx + b): These are your go-to for data that shows a constant rate of change. For every step you take in x, the y value changes by about the same amount. Think of a car on cruise control on a highway in Kansas.
Quadratic Models (y = ax² + bx + c): These are perfect for data that increases and then decreases, or vice-versa, like the path of a basketball shot by Jordan. The rate of change is not constant; it's changing in a linear fashion.
Exponential Models (y = ab^x): These describe data where the rate of change is multiplicative. The y value gets multiplied by the same factor for each step in x. This is the classic model for things like population growth or, as we'll see, viral online trends.
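The three signatures above can be checked numerically. The sketch below is only an illustration: the helper names and the three sample functions are assumptions made up for this example, not part of the lesson's data. It shows that linear data has constant first differences, quadratic data has constant second differences, and exponential data has a constant ratio between consecutive values.

```python
def first_differences(values):
    """Change between consecutive values (constant for linear data)."""
    return [b - a for a, b in zip(values, values[1:])]


def successive_ratios(values):
    """Factor between consecutive values (constant for exponential data)."""
    return [b / a for a, b in zip(values, values[1:])]


xs = [0, 1, 2, 3, 4]

# Hypothetical sample functions, one of each type (not the lesson's data).
linear = [2 * x + 1 for x in xs]       # y = 2x + 1
quadratic = [x**2 + 1 for x in xs]     # y = x^2 + 1
exponential = [3 * 2**x for x in xs]   # y = 3 * 2^x

print(first_differences(linear))                        # [2, 2, 2, 2]
print(first_differences(first_differences(quadratic)))  # [2, 2, 2]
print(successive_ratios(exponential))                   # [2.0, 2.0, 2.0, 2.0]
```

Real data is noisy, so expect these values to be roughly constant rather than exactly equal.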

Finding the "Error" with Residuals

Let's go back to Maya's sneaker business. Her sales data is {(1, 3), (2, 5), (3, 8), (4, 14), (5, 24)}, where x is the week and y is the number of pairs sold.

Comparing linear and exponential models for Maya's sales data.

Your graphing calculator can run a "regression" to find the best-fit equation of each type.

Linear Regression: y = 5.1x - 4.5
Exponential Regression: y ≈ 1.7583(1.6801)^x (coefficients rounded to four decimal places)

Which one is better? We need to check the error for each model. In statistics, we call this error a residual.

The formula is simple: Residual = Actual Value (y) - Predicted Value (ŷ)

The "actual" value is the real data point Maya collected. The "predicted" value (we call it ŷ, pronounced "y-hat") is what the model equation spits out for a given x. A positive residual means the model underestimated the actual value; a negative residual means it overestimated.

Let's calculate the residual for Week 4 (x=4) for both models. The actual sales were 14 pairs.

Linear model predictions and residuals for Maya's sales data.

Linear Model Prediction
ŷ = 5.1(4) - 4.5 = 20.4 - 4.5 = 15.9
- Linear Residual: 14 - 15.9 = -1.9
Exponential Model Prediction
ŷ ≈ 1.7583(1.6801)⁴ ≈ 1.7583(7.9678) ≈ 14.01
- Exponential Residual: 14 - 14.01 = -0.01

The exponential model's prediction was closer to the actual value for this week. Its residual is smaller. But to truly validate a model, we need to look at the residuals for all the data points at once.
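To look at every data point at once, the whole calculation can be scripted. This is a minimal sketch, not the calculator's actual routine: the function names are made up for illustration, and it assumes the exponential regression is found by fitting a line to (x, ln y), which is a common approach but an assumption here. Because the coefficients are recomputed from the data, the printed values may differ slightly in rounding from figures quoted in the text.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x


def fit_exponential(xs, ys):
    """Fit y = a * base**x by fitting a line to (x, ln y). Assumes every y > 0."""
    slope, intercept = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(intercept), math.exp(slope)


def residuals(xs, ys, predict):
    """Residual = actual - predicted, one value per data point."""
    return [y - predict(x) for x, y in zip(xs, ys)]


weeks = [1, 2, 3, 4, 5]
sales = [3, 5, 8, 14, 24]

m, b = fit_line(weeks, sales)
a, base = fit_exponential(weeks, sales)

linear_res = residuals(weeks, sales, lambda x: m * x + b)
exp_res = residuals(weeks, sales, lambda x: a * base**x)

print(f"linear:      y = {m:.2f}x + ({b:.2f})")
print(f"exponential: y = {a:.2f} * {base:.2f}^x")
for week, lr, er in zip(weeks, linear_res, exp_res):
    print(f"week {week}: linear residual {lr:+.2f}, exponential residual {er:+.2f}")
```

Comparing the two residual columns row by row is the numerical version of comparing the two residual plots.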

The Power of the Residual Plot

This is where the magic happens. A residual plot is a scatter plot of our residuals. For each original x value, we plot its corresponding residual on the y-axis.

The one and only rule for a good model is: The residual plot must show no obvious pattern. It should look like a random shotgun blast of points scattered above and below the x-axis (where the residual is 0).

Let's look at the plots for Maya's data.

A 2x2 grid of plots. Top-left: scatter plot with a linear fit. Top-right: the linear model's residual plot, showing a U-shape. Bottom-left: scatter plot with an exponential fit. Bottom-right: the exponential model's residual plot, showing random scatter.

Linear Model's Residual Plot (Top Right)
Look closely. You can see a distinct U-shaped, parabolic pattern. The residuals start positive, go negative, then become positive again.
Exponential Model's Residual Plot (Bottom Right)
This looks like a mess! The points are randomly scattered around the horizontal line y=0.

The random scatter of the exponential residual plot tells us that its errors are... well, random. They aren't predictable. This means the exponential function has captured the underlying trend of the data very well. For Maya's business, the exponential model is the clear winner.
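If no graphing tool is handy, a rough text version of a residual plot is enough to spot a pattern. The sketch below is an illustration only: the function name is made up, the plot is turned sideways (each row is one x value, and the residual is drawn left or right of a zero line instead of below or above it), and both residual lists are hypothetical numbers chosen to show the two shapes, not Maya's actual residuals.

```python
def text_residual_plot(xs, residuals, half_width=10):
    """Draw one row per data point: '|' is the residual = 0 line, '*' is the residual."""
    largest = max(abs(r) for r in residuals) or 1.0  # avoid dividing by zero
    rows = []
    for x, r in zip(xs, residuals):
        offset = round(r / largest * half_width)  # columns left (-) or right (+) of zero
        cells = [" "] * (2 * half_width + 1)
        cells[half_width] = "|"
        cells[half_width + offset] = "*"
        rows.append(f"x={x} " + "".join(cells).rstrip())
    return rows


xs = [1, 2, 3, 4, 5]
u_shaped = [2.0, -1.0, -2.0, -1.0, 2.0]   # hypothetical: curved pattern, poor fit
scattered = [0.3, -0.4, 0.1, 0.4, -0.2]   # hypothetical: no clear pattern, good fit

print("Patterned residuals:")
print("\n".join(text_residual_plot(xs, u_shaped)))
print("Scattered residuals:")
print("\n".join(text_residual_plot(xs, scattered)))
```

In the first plot the stars trace a curve from one side of the zero line to the other and back; in the second they jump around with no obvious shape.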

Context is King

Finally, always think about the context. Does the model make sense? For business sales that are "going viral," exponential growth is a very reasonable assumption. For modeling the height of a thrown baseball, a quadratic model (a parabola) makes physical sense. If the data showed the amount of gas left in a car's tank as miles are driven, a linear model would be the most logical choice. Sometimes, the story behind the data gives you a huge clue.

And what about those over- or underestimates? If Maya were using her model to budget for expenses, she might prefer a model that underestimates her sales, so she doesn't plan spending around revenue that never arrives. If she's promising delivery times to customers, she might want a model that underestimates her production speed to give herself a buffer. The "best" model can sometimes depend on the decisions you're trying to make.
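In terms of residuals, the sign tells you which kind of error you have: since Residual = Actual - Predicted, a positive residual is an underestimate and a negative residual is an overestimate. Here is a small sketch of that rule; the function name and the two forecast values are hypothetical, chosen only for illustration.

```python
def describe_prediction(actual, predicted):
    """Label a prediction by the sign of its residual (actual - predicted)."""
    residual = actual - predicted
    if residual > 0:
        return "underestimate"  # the model predicted less than what happened
    if residual < 0:
        return "overestimate"   # the model predicted more than what happened
    return "exact"


# Week 4 of Maya's data: 14 pairs actually sold.
# The forecasts 12 and 16 are hypothetical values, not output from a fitted model.
print(describe_prediction(14, 12))  # underestimate
print(describe_prediction(14, 16))  # overestimate
```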

Worked examples

Example 1

Choosing Between Linear and Exponential

Problem: A biologist is tracking the area covered by a specific type of algae in a pond in Seattle. The data is recorded weekly: {(1, 3), (2, 5), (3, 8), (4, 14), (5, 24)}, where x is the week and y is the area in square meters. Determine whether a linear or exponential model is a better fit and justify your answer using residuals.

Solution:

1. Identify the Goal
We need to compare a linear model and an exponential model for the given data and use residual plots to prove which is better. This is the exact scenario from our core explanation.
2. Find the Regression Models
Using a graphing calculator:
- Linear Regression: y = 5.1x - 4.5
- Exponential Regression: y ≈ 1.7583(1.6801)^x (coefficients rounded to four decimal places)
3. Analyze the Residuals
To justify our choice, we must analyze the residuals. While a calculator will generate the full residual plot for you, let's calculate a few by hand to understand the process. The residual is Actual - Predicted.

Week (x)	Actual Area (y)	Linear Predicted (ŷ)	Linear Residual	Exp. Predicted (ŷ)	Exp. Residual
1	3	0.6	2.4	2.95	0.05
2	5	5.7	-0.7	4.96	0.04
3	8	10.8	-2.8	8.34	-0.34

4. Interpret the Residual Plots
- If we were to plot all the linear residuals, we would see a clear U-shaped pattern. The residuals are 2.4, -0.7, -2.8, -1.9, 3.0. They start high, dip low, and come back up. This pattern is a dead giveaway that the linear model is a poor fit.
- The exponential residuals are approximately 0.05, 0.04, -0.34, -0.01, 0.46. They are much smaller in magnitude and don't show an obvious pattern. They hover around zero.
5. Conclusion
The exponential model, y ≈ 1.7583(1.6801)^x, is a much better fit for the algae growth data.
- Justification: The residual plot for the exponential model shows a random scatter of points with no discernible pattern, indicating that the model captures the underlying trend of the data well. In contrast, the residual plot for the linear model shows a clear parabolic pattern, indicating it is a poor fit.
- Common Mistake Alert: A student might just look at the r or R² value on their calculator. While a higher R² value often corresponds to a better fit, the AP exam specifically requires you to use the residual plot as the primary justification. Don't just state the R² value; talk about the pattern (or lack thereof) in the residual plot.

Example 2

Identifying a Quadratic Fit

Problem: A group of students in Chicago launches a model rocket. They record its height at different times. The data is {(1, 43), (2, 78), (3, 101), (4, 110), (5, 107), (6, 88)}, where x is time in seconds and y is height in feet. Which model type—linear, quadratic, or exponential—is most appropriate?

Solution:

1. Consider the Context
The problem describes the height of a rocket. We know from physics (and from throwing any object in the air) that its path will be an up-and-down arc. This strongly suggests a quadratic model (a parabola opening downwards).
2. Examine the Data
The y values increase and then decrease (43, 78, 101, 110, 107, 88). This is the classic signature of a quadratic function. A linear model (always increasing or always decreasing) and an exponential model (also always increasing or always decreasing) cannot capture that turnaround.
3. Confirm with Residuals (Conceptual)
- If we fit a linear model, the data points would be below the line at the beginning and end, and above the line in the middle. The residuals would be negative, then positive, then negative again, so the residual plot would have a clear, sad-face parabolic shape (∩). This pattern tells us the linear model is wrong.
- If we fit a quadratic model, the curve would closely follow the path of the data points. The residual plot would show a random scatter of points around y=0, confirming it's a good fit.
4. Conclusion
A quadratic model is the most appropriate.
- Justification: The context of projectile motion suggests a parabolic path, which is modeled by a quadratic function. Furthermore, the data itself shows the height increasing and then decreasing. A residual plot for a quadratic regression would show no discernible pattern, validating this choice, while a linear model's residual plot would show a clear parabolic pattern, invalidating it.
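Both pieces of evidence can be checked with a few lines of code. This is a sketch under stated assumptions: the helper names are made up for illustration, and the line is fitted with ordinary least squares. It computes the first and second differences of the rocket heights, then the sign of each residual from a linear fit.

```python
def differences(values):
    """Change between consecutive values."""
    return [b - a for a, b in zip(values, values[1:])]


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x


times = [1, 2, 3, 4, 5, 6]
heights = [43, 78, 101, 110, 107, 88]

first = differences(heights)
second = differences(first)
print(first)   # [35, 23, 9, -3, -19]
print(second)  # [-12, -14, -12, -16]

# Sign of each residual (actual - predicted) for the linear fit.
m, b = fit_line(times, heights)
signs = "".join("+" if h - (m * t + b) > 0 else "-" for t, h in zip(times, heights))
print(signs)   # -++++-
```

The second differences are roughly constant, which points to a quadratic model, and the negative, positive, negative run of residual signs is the ∩ pattern that rules out the linear model.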

Try it yourself

Problem 1: A new coffee shop in Boston tracks its number of daily customers over the first six days: {(1, 50), (2, 61), (3, 73), (4, 84), (5, 96), (6, 107)}.

Your task
Run a linear and an exponential regression on this data. Which model is a better fit?
Hint
Calculate the differences between consecutive y values. Is the change roughly constant (additive) or is it growing? Then, imagine what the residual plot for the worse model would look like. Would it have a pattern?

Problem 2: Carlos is saving for a new gaming computer. His savings are {(1, $50), (2, $110), (3, $180), (4, $260), (5, $350)}.

Your task
Determine if a linear or quadratic model better represents his savings pattern.
Hint
Look at the "rate of change of the rate of change." The first differences are +60, +70, +80, +90. Since the differences are increasing linearly, what does that imply about the original function? Justify your choice by describing the expected residual plot for the best model.

Coffee shop customer data with linear and exponential fits.

Carlos's savings data with linear and quadratic fits.

TL;DR

In simple terms, this topic is about picking the best type of equation (like a line or a curve) to describe a set of data points, and then using a special graph called a residual plot to prove it's the right choice.

Key terms

AP Precalculus, Function models, Model validation, Residual plot, Linear regression, Exponential regression, Quadratic regression, Data analysis, Residuals, Best fit

You can now…

2.6.A: Construct linear, quadratic, and exponential models based on a data set.
2.6.B: Validate a model constructed from a data set.

Essential knowledge (exam-tested)

2.6.A.1: Two variables in a data set that demonstrate a slightly changing rate of change can be modeled by linear, quadratic, and exponential function models.
2.6.A.2: Models can be compared based on contextual clues and applicability to determine which model is most appropriate.
2.6.B.1: A model is justified as appropriate for a data set if the graph of the residuals of a regression, the residual plot, appear without pattern.
2.6.B.2: The difference between the predicted and actual values is the error in the model. Depending on the data set and context, it may be more appropriate to have an underestimate or overestimate for any given interval.

Read what Saavi narrates

Hey everyone, it's Saavi. Let's talk about being a data detective.

Imagine your friend starts a business selling custom sneakers. Sales are taking off! Week one, she sells 3 pairs. Week two, 5. Then 8, 14, and 24. She needs to know how many shoes to order for the future. Is her business growing in a straight line, or is it curving upwards, exponentially? Picking the wrong model could cost her thousands of dollars.

This is what today's lesson is all about: when you have data from the real world, how do you choose the best function to model it? We'll look at linear, quadratic, and exponential functions. But the real secret weapon here is something called a 'residual'.

A residual is just the difference between the actual data... what really happened... and the predicted value from your model. Residual equals Actual minus Predicted.

Let's take our sneaker example. We can create a linear model and an exponential model. For the linear model, the residual plot shows a clear U-shape. But for the exponential model, the residual plot looks like a random mess of dots.

And here's the most important takeaway, the number one thing students get wrong: that random mess is exactly what we want to see! A pattern in the residual plot is a bad sign. It means your model is systematically flawed. The random plot tells you the model's errors are just... random noise. It means you've found a great fit.

So, when you're asked to justify your choice on the AP exam, don't just say the graph looks right. You need to state that the residual plot shows no discernible pattern. That is your evidence.

You've got this. It's a powerful tool, and once you get the hang of looking for that random scatter, you'll be able to validate models with confidence. Keep practicing!
