Competing Function Model Validation

In simple terms: This topic is about picking the best type of equation (like a line or a curve) to describe a set of data points, and then using a special graph called a residual plot to justify that choice.

Why this matters

Imagine your friend Maya starts a small business selling custom-painted sneakers online. The first week, she sells 3 pairs. The next week, 5. Then 8, 14, and 24. She's thrilled, but she needs to predict future sales to know how many plain white sneakers to order.

Is her sales growth a straight line? Or is it curving upwards, getting faster and faster? If she models it as a simple line, she might not order enough inventory and miss out on sales. If she assumes it's exploding faster than it is, she'll be stuck with boxes of unsold shoes in her garage in Dallas.

Choosing the right kind of function to model her sales is crucial. In this lesson, we'll learn how to be a data detective. We'll explore how to pick the best model for a set of data and, most importantly, how to prove our choice is the right one using a powerful tool called a residual plot.

Concept overview

flowchart TD
    A[Start with a data set] --> B{Choose potential models};
    B --> C[Linear Model];
    B --> D[Quadratic Model];
    B --> E[Exponential Model];
    C --> F[Generate Residual Plot];
    D --> F;
    E --> F;
    F --> G{Does the plot show a pattern?};
    G -- Yes --> H[Poor fit. Try another model.];
    H --> B;
    G -- No --> I[Good fit.];
    I --> J[Consider context of the problem];
    J --> K[Select and justify the best model];
This diagram shows a flowchart for validating a function model. It starts with a data set, branches to choosing a linear, quadratic, or exponential model, and then generating a residual plot. A decision node asks if the plot has a pattern; if yes, it's a poor fit and loops back, if no, it's a good fit, leading to a final step of considering context to select the best model.

Core explanation

When you first look at a set of data points on a graph, you might have a gut feeling about what kind of function would fit it best. Does it look like a line? A parabola? An exponential curve? That intuition is a great starting point, but the AP exam will ask you to justify your choice with evidence.

Let's meet our three main modeling functions:

  • Linear Models (y = mx + b): These are your go-to for data that shows a constant rate of change. For every step you take in x, the y value changes by about the same amount. Think of a car on cruise control on a highway in Kansas.
  • Quadratic Models (y = ax² + bx + c): These suit data whose rate of change is not constant but itself changes by about the same amount at each step. The classic case is data that increases and then decreases, or vice versa, like the path of a basketball shot by Jordan, but steadily accelerating growth fits this description too.
  • Exponential Models (y = a·b^x): These describe data where the rate of change is multiplicative. The y value gets multiplied by the same factor for each step in x. This is the classic model for things like population growth or, as we'll see, viral online trends.
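
The three descriptions above suggest a quick check you can run before any regression. Here is a minimal sketch, assuming the x values are evenly spaced; the helper names are made up for this illustration and are not calculator or library functions. Roughly constant first differences point to a linear model, roughly constant second differences to a quadratic, and roughly constant ratios to an exponential.

```python
# Quick diagnostics for evenly spaced data (helper names are illustrative).

def first_differences(ys):
    """Change in y between consecutive points (roughly constant => linear)."""
    return [b - a for a, b in zip(ys, ys[1:])]

def second_differences(ys):
    """Differences of the first differences (roughly constant => quadratic)."""
    return first_differences(first_differences(ys))

def ratios(ys):
    """Factor between consecutive y values (roughly constant => exponential)."""
    return [b / a for a, b in zip(ys, ys[1:])]

# Maya's weekly sneaker sales from the introduction.
sales = [3, 5, 8, 14, 24]

print("first differences: ", first_differences(sales))
print("second differences:", second_differences(sales))
print("ratios:            ", [round(r, 2) for r in ratios(sales)])
```

For Maya's sales, neither set of differences settles down, but the ratios all stay between 1.6 and 1.75. That is a hint, not a justification: the residual plot still has to back it up.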

Finding the "Error" with Residuals

Let's go back to Maya's sneaker business. Her sales data is {(1, 3), (2, 5), (3, 8), (4, 14), (5, 24)}, where x is the week and y is the number of pairs sold.

Comparing linear and exponential models for Maya's sales data.

Your graphing calculator can run a "regression" to find the best-fit equation of each type.

  • Linear Regression: y = 5.1x - 4.5
  • Exponential Regression: y = 1.76(1.68)^x (coefficients rounded to two decimal places)

Which one is better? We need to check the error for each model. In statistics, we call this error a residual.

The formula is simple: Residual = Actual Value (y) - Predicted Value (ŷ)

The "actual" value is the real data point Maya collected. The "predicted" value (we call it ŷ, pronounced "y-hat") is what the model equation spits out for a given x.

Let's calculate the residual for Week 4 (x=4) for both models. The actual sales were 14 pairs.

Linear model predictions and residuals for Maya's sales data.
  • Linear Model Prediction
    ŷ = 5.1(4) - 4.5 = 20.4 - 4.5 = 15.9
    • Linear Residual: 14 - 15.9 = -1.9
  • Exponential Model Prediction
    ŷ = 1.76(1.68)⁴ ≈ 1.76(7.9659) ≈ 14.02
    • Exponential Residual: 14 - 14.02 = -0.02

The exponential model's prediction was closer to the actual value for this week: its residual is smaller in magnitude. But one data point proves very little. To truly validate a model, we need to look at the residuals for all the data points at once.
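
If you want to see what the calculator is doing, the Week 4 comparison can be sketched in a few lines of Python. This is a rough sketch, not your calculator's exact routine: `fit_line` and `fit_exponential` are hypothetical helper names, and the exponential fit assumes the common approach of fitting a line to (x, ln y), so the last digits may differ slightly from what your calculator reports.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar

def fit_exponential(xs, ys):
    """Fit y = a * base**x via a line through (x, ln y). Requires every y > 0."""
    slope, intercept = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(intercept), math.exp(slope)

weeks = [1, 2, 3, 4, 5]
sales = [3, 5, 8, 14, 24]

m, b = fit_line(weeks, sales)
a, base = fit_exponential(weeks, sales)

# Residual = actual - predicted, evaluated at week 4.
x, actual = 4, 14
linear_residual = actual - (m * x + b)
exp_residual = actual - (a * base ** x)

print(f"linear:      y = {m:.2f}x + ({b:.2f}), week 4 residual = {linear_residual:.2f}")
print(f"exponential: y = {a:.2f}({base:.2f})^x, week 4 residual = {exp_residual:.2f}")
```

Because the code keeps every decimal place of the coefficients, its residuals can differ in the last digit from a hand calculation that uses the rounded equation.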

The Power of the Residual Plot

This is where the magic happens. A residual plot is a scatter plot of our residuals. For each original x value, we plot its corresponding residual on the y-axis.

The one and only rule for a good model is: The residual plot must show no obvious pattern. It should look like a random shotgun blast of points scattered above and below the x-axis (where the residual is 0).

Let's look at the plots for Maya's data.

A 2x2 grid of plots. Top-left: scatter plot with a linear fit. Top-right: the linear model's residual plot, showing a U-shape. Bottom-left: scatter plot with an exponential fit. Bottom-right: the exponential model's residual plot, showing random scatter.

  • Linear Model's Residual Plot (Top Right)
    Look closely. You can see a distinct U-shaped, parabolic pattern. The residuals start positive, go negative, then become positive again.
  • Exponential Model's Residual Plot (Bottom Right)
    This looks like a mess! The points are randomly scattered around the horizontal line y=0.

The random scatter of the exponential residual plot tells us that its errors are... well, random. They aren't predictable. This means the exponential function has captured the underlying trend of the data very well. For Maya's business, the exponential model is the clear winner.
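
A residual plot is just a picture of a list of numbers, so it can help to look at the lists themselves. The sketch below computes every residual for both models; the helper names are illustrative, and the exponential fit again assumes a least-squares line through (x, ln y), which may not match your calculator digit for digit.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = m*x + b."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar

def residuals(xs, ys, predict):
    """Actual minus predicted for every data point."""
    return [y - predict(x) for x, y in zip(xs, ys)]

weeks = [1, 2, 3, 4, 5]
sales = [3, 5, 8, 14, 24]

m, b = fit_line(weeks, sales)
# Exponential fit through the logs of the y values (an assumption of this sketch).
k, c = fit_line(weeks, [math.log(y) for y in sales])

linear_res = residuals(weeks, sales, lambda x: m * x + b)
exp_res = residuals(weeks, sales, lambda x: math.exp(c) * math.exp(k) ** x)

for name, res in (("linear", linear_res), ("exponential", exp_res)):
    rounded = [round(r, 2) for r in res]
    largest = max(abs(r) for r in res)
    print(f"{name:12s} residuals: {rounded}  largest miss: {largest:.2f}")
```

On this data the linear residuals run positive, then negative, then positive, with misses of up to 3 pairs, while every exponential residual stays under half a pair.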

Context is King

Finally, always think about the context. Does the model make sense? For business sales that are "going viral," exponential growth is a very reasonable assumption. For modeling the height of a thrown baseball, a quadratic model (a parabola) makes physical sense. If the data showed the amount of gas left in a car's tank as miles are driven, a linear model would be the most logical choice. Sometimes, the story behind the data gives you a huge clue.

And what about over- and underestimates? A negative residual means the model overestimated the actual value; a positive residual means it underestimated. If Maya were using her model to budget for expenses, she might prefer a model that overestimates her sales so she has a conservative budget. If she's promising delivery times to customers, she might want a model that underestimates her production speed to give herself a buffer. The "best" model can sometimes depend on the decisions you're trying to make.

Worked examples

Example 1

Choosing Between Linear and Exponential

Problem: A biologist is tracking the area covered by a specific type of algae in a pond in Seattle. The data is recorded weekly: {(1, 3), (2, 5), (3, 8), (4, 14), (5, 24)}, where x is the week and y is the area in square meters. Determine whether a linear or exponential model is a better fit and justify your answer using residuals.

Solution:

  1. Identify the Goal
    We need to compare a linear model and an exponential model for the given data and use residual plots to show which is better. This is the exact scenario from our core explanation.
  2. Find the Regression Models
    Using a graphing calculator:
    • Linear Regression: y = 5.1x - 4.5
    • Exponential Regression: y = 1.76(1.68)^x
  3. Analyze the Residuals
    To justify our choice, we must analyze the residuals. While a calculator will generate the full residual plot for you, let's calculate a few by hand to understand the process. The residual is Actual - Predicted.
    | Week (x) | Actual Area (y) | Linear Predicted (ŷ) | Linear Residual | Exp. Predicted (ŷ) | Exp. Residual |
    |---|---|---|---|---|---|
    | 1 | 3 | 0.6 | 2.4 | 2.96 | 0.04 |
    | 2 | 5 | 5.7 | -0.7 | 4.97 | 0.03 |
    | 3 | 8 | 10.8 | -2.8 | 8.35 | -0.35 |
  4. Interpret the Residual Plots
    • If we were to plot all the linear residuals, we would see a clear U-shaped pattern. The residuals are 2.4, -0.7, -2.8, -1.9, 3.0. They start high, dip low, and come back up. This pattern is a dead giveaway that the linear model is a poor fit.
    • The exponential residuals are 0.04, 0.03, -0.35, -0.02, 0.45. They are much smaller in magnitude (all under 0.5) and don't trace an obvious curve. They stay close to zero.
  5. Conclusion
    The exponential model, y = 1.76(1.68)^x, is a much better fit for the algae growth data.
    • Justification
      The residual plot for the exponential model shows a random scatter of points with no discernible pattern, indicating that the model captures the underlying trend of the data well. In contrast, the residual plot for the linear model shows a clear parabolic pattern, indicating it is a poor fit.
    • Common Mistake Alert
      A student might just look at the r or r² value on their calculator. While a higher value often corresponds to a better fit, the AP exam specifically requires you to use the residual plot as the primary justification. Don't just state the value; talk about the pattern (or lack thereof) in the residual plot.
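
To see why r² alone can mislead, here is a small sketch that computes it for the algae data. The function name `r_squared` is illustrative. For the exponential model the value is computed on (x, ln y), which assumes the log-transformed fit; your calculator may report it differently.

```python
import math

def r_squared(xs, ys):
    """Square of the correlation coefficient r between xs and ys."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

weeks = [1, 2, 3, 4, 5]
area = [3, 5, 8, 14, 24]

r2_linear = r_squared(weeks, area)
# Exponential model: correlation between x and ln(y) (assumes a log-transformed fit).
r2_exponential = r_squared(weeks, [math.log(y) for y in area])

print(f"linear r^2:      {r2_linear:.3f}")
print(f"exponential r^2: {r2_exponential:.3f}")
```

The linear r² comes out near 0.91, which could pass for a decent fit if that number were all you checked. The U-shaped residual pattern is what actually exposes the problem.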
Example 2

Identifying a Quadratic Fit

Problem: A group of students in Chicago launches a model rocket. They record its height at different times. The data is {(1, 43), (2, 78), (3, 101), (4, 110), (5, 107), (6, 88)}, where x is time in seconds and y is height in feet. Which model type—linear, quadratic, or exponential—is most appropriate?

Solution:

  1. Consider the Context
    The problem describes the height of a rocket. We know from physics (and from throwing any object in the air) that its height will rise and then fall in an arc. This strongly suggests a quadratic model (a parabola opening downwards).
  2. Examine the Data
    The y values increase and then decrease (43, 78, 101, 110, 107, 88). This is the classic signature of a quadratic function. A linear model and an exponential model are each either always increasing or always decreasing, so neither can fit this data.
  3. Confirm with Residuals (Conceptual)
    • If we fit a linear model, the data points would be below the line at the beginning and end, and above the line in the middle. The residuals would therefore be negative, then positive, then negative, giving the residual plot a clear, sad-face parabolic shape (∩). This pattern tells us the linear model is wrong.
    • If we fit a quadratic model, the curve would closely follow the path of the data points. The residual plot would show a random scatter of points around y=0, confirming it's a good fit.
  4. Conclusion
    A quadratic model is the most appropriate.
    • Justification: The context of projectile motion suggests a parabolic path, which is modeled by a quadratic function. Furthermore, the data itself shows the height increasing and then decreasing. A residual plot for a quadratic regression would show no discernible pattern, validating this choice, while a linear model's residual plot would show a clear parabolic pattern, invalidating it.
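
The conceptual argument above can be checked numerically. The following is a minimal sketch, assuming an ordinary least-squares fit solved through the normal equations; `fit_polynomial` and `residuals` are hypothetical helpers written for this example, not calculator functions.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ..., c_degree] via the normal equations."""
    size = degree + 1
    # Augmented matrix: row i holds sum(x^(i+j)) for each j, then sum(x^i * y).
    rows = [
        [sum(x ** (i + j) for x in xs) for j in range(size)]
        + [sum((x ** i) * y for x, y in zip(xs, ys))]
        for i in range(size)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(size):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [rows[i][size] / rows[i][i] for i in range(size)]

def residuals(xs, ys, coeffs):
    """Actual minus predicted, where predicted = c0 + c1*x + c2*x^2 + ..."""
    return [y - sum(c * x ** k for k, c in enumerate(coeffs)) for x, y in zip(xs, ys)]

times = [1, 2, 3, 4, 5, 6]
heights = [43, 78, 101, 110, 107, 88]

linear_res = residuals(times, heights, fit_polynomial(times, heights, 1))
quad_coeffs = fit_polynomial(times, heights, 2)
quad_res = residuals(times, heights, quad_coeffs)

print("linear residuals:   ", [round(r, 1) for r in linear_res])
print("quadratic residuals:", [round(r, 1) for r in quad_res])
```

The linear residuals are large and negative at both ends and positive in the middle, which is the sad-face shape. The quadratic residuals all stay within about one foot of zero.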

Try it yourself

Problem 1: A new coffee shop in Boston tracks its number of daily customers over the first six days: {(1, 50), (2, 61), (3, 73), (4, 84), (5, 96), (6, 107)}.

  • Your task
    Run a linear and an exponential regression on this data. Which model is a better fit?
  • Hint
    Calculate the differences between consecutive y values. Is the change roughly constant (additive) or is it growing? Then, imagine what the residual plot for the worse model would look like. Would it have a pattern?

Problem 2: Carlos is saving for a new gaming computer. His savings are {(1, $50), (2, $110), (3, $180), (4, $260), (5, $350)}.

  • Your task
    Determine if a linear or quadratic model better represents his savings pattern.
  • Hint
    Look at the "rate of change of the rate of change." The first differences are +60, +70, +80, +90. Since the differences are increasing linearly, what does that imply about the original function? Justify your choice by describing the expected residual plot for the best model.
Coffee shop customer data with linear and exponential fits.
Carlos's savings data with linear and quadratic fits.