Introduction to Using Data Sets

In simple terms: a data set is a group of related facts, and this lesson is about making a step-by-step plan to go through those facts one by one to answer a question.

Why this matters

Imagine your school's soccer team is running a fundraiser. You're selling team-branded water bottles, and you have a simple notebook where you've jotted down the number of bottles sold each day for the past two weeks: 15, 22, 18, 25, 30, 12, 9, 17, 20, 24, 28, 31, 19, 21.

The soccer team's daily sales data set.

The coach comes to you and asks, "Great job! What was our single best sales day? And what was our average for the two weeks?"

Suddenly, that simple list of numbers isn't just a list anymore. It's a data set. To answer the coach's questions, you can't just stare at the whole list at once. You need a process. You need a way to go through it systematically.

That's exactly what we're learning today: how to think about collections of data and create a clear, logical plan—an algorithm—to pull useful information out of them.

Concept overview

flowchart TD
    A[Start] --> B[Initialize `max_so_far` to the first item];
    B --> C{For each remaining item in the list...};
    C --> D{Is current item > `max_so_far`?};
    D -- Yes --> E[Update `max_so_far` to current item];
    D -- No --> F[Move to next item];
    E --> F;
    C -- Done with all items --> G[End: `max_so_far` holds the maximum value];
    F --> C;

Core explanation

Welcome to the world of data collections! It might sound intimidating, but the core idea is something you do every day.

What is a Data Set?

A data set is just a collection of related pieces of information. That's it. (EK 4.2.A.1)

Think about:

The contacts in your phone: a data set of names and numbers.
Your favorite Spotify playlist: a data set of songs.
A grocery list: a data set of items you need to buy.

In computer science, we often work with data sets of numbers. For example, a list of final exam scores for a class: [95, 81, 76, 99, 88, 72]. Each number is a piece of data, and together they form a data set.

A data set of final exam scores.
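The lesson stays at the planning stage, but because this material leads into arrays and ArrayLists, here is a minimal Java sketch of how the exam-score data set might be stored. The class name ExamScores and the method scoresAsList are illustrative assumptions, not part of the AP framework.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class: one data set stored two ways.
public class ExamScores {
    // A fixed-size array holding the six exam scores.
    static final int[] SCORES_ARRAY = {95, 81, 76, 99, 88, 72};

    // Hypothetical helper: copies the same scores into a resizable ArrayList.
    static List<Integer> scoresAsList() {
        List<Integer> scores = new ArrayList<>();
        for (int score : SCORES_ARRAY) {
            scores.add(score); // copy each value, one at a time
        }
        return scores;
    }

    public static void main(String[] args) {
        // Visit each piece of data one at a time.
        for (int score : SCORES_ARRAY) {
            System.out.println(score);
        }
        System.out.println(scoresAsList()); // prints [95, 81, 76, 99, 88, 72]
    }
}
```

Either container holds the same related pieces of information; the enhanced for loop visits them one at a time.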

The Goal: Answering Questions with Data

We don't collect data just for fun. We use it to solve problems or answer questions. (EK 4.2.A.2)

Looking at our list of scores [95, 81, 76, 99, 88, 72], we could ask:

What is the highest score? (99)
What is the lowest score? (72)
How many students passed? (Assuming a pass is >= 65, all 6 of them)
What is the average score?

To answer these, we need to manipulate and analyze the data. And that brings us to the single most important concept in this lesson.

The "One at a Time" Principle

When we process a data set, we almost always do it one item at a time.

Imagine you're a cashier at Target. A customer, Carlos, comes up with a full cart. You don't try to look at all the items at once and guess the total. That would be impossible! Instead, you follow a simple algorithm:

Pick up one item.
Scan its barcode.
Add its price to a running total.
Set the item aside.
Repeat until the cart is empty.
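The cashier's routine is a running-total loop. Below is a minimal Java sketch, assuming prices are stored as whole cents so the sum stays exact; the cart values and the names CheckoutTotal and totalCents are hypothetical.

```java
// Hypothetical class: a cashier's running total, one item at a time.
public class CheckoutTotal {
    // Assumes prices are in cents (an illustrative choice) so addition is exact.
    static int totalCents(int[] pricesInCents) {
        int runningTotal = 0;               // start the running total at zero
        for (int price : pricesInCents) {   // pick up one item at a time
            runningTotal += price;          // add its price to the running total
        }
        return runningTotal;                // the cart is empty: the total is final
    }

    public static void main(String[] args) {
        int[] cart = {499, 1250, 325};        // hypothetical prices: $4.99, $12.50, $3.25
        System.out.println(totalCents(cart)); // prints 2074
    }
}
```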

Computers work the same way. They aren't magical; they're just incredibly fast at following simple, repetitive instructions. When we give a computer a data set, it iterates (or loops) through it, looking at just one value at a time to perform a calculation. (EK 4.2.A.2)

Planning Your Algorithm with a Diagram

Before you write a single line of code, you should have a plan. For data sets, one of the best ways to plan is to use a simple table or chart. This helps you visualize the "one at a time" process. (EK 4.2.A.3)

Let's make a plan to find the highest score in our list: [95, 81, 76, 99, 88, 72].

Our algorithm in plain English would be:

Create a variable, let's call it highest_score_so_far, and set it to the first score in the list.
Go through the rest of the scores, one by one.
For each score, compare it to highest_score_so_far.
If the current score is higher, update highest_score_so_far to this new score.
If it's not higher, do nothing and move to the next score.
Once you've checked all the scores, highest_score_so_far will hold the answer.
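Here is one way the plain-English plan could translate into Java, as a sketch assuming the scores are in a non-empty int array. The name highestScoreSoFar is the camelCase Java spelling of highest_score_so_far; the class and method names are hypothetical.

```java
// Hypothetical class: finds the maximum by examining one score at a time.
public class HighestScore {
    // Assumes scores has at least one element.
    static int highest(int[] scores) {
        int highestScoreSoFar = scores[0];        // start with the first score
        for (int i = 1; i < scores.length; i++) { // go through the rest, one by one
            if (scores[i] > highestScoreSoFar) {  // compare to the best so far
                highestScoreSoFar = scores[i];    // a new highest score was found
            }
            // otherwise do nothing and move to the next score
        }
        return highestScoreSoFar;                 // holds the answer after the loop
    }

    public static void main(String[] args) {
        int[] scores = {95, 81, 76, 99, 88, 72};
        System.out.println(highest(scores)); // prints 99
    }
}
```

Starting from the first item rather than from 0 means the method also works if every value is negative.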

Let's trace this with a table, which is a great way to represent our plan. (LO 4.2.A)

Current Score Being Examined	`highest_score_so_far` (before check)	Is `Current Score` > `highest_score_so_far`?	`highest_score_so_far` (after check)
(Start)	(Initialized to first item: 95)	-	95
81	95	No (81 is not > 95)	95
76	95	No (76 is not > 95)	95
99	95	Yes (99 is > 95)	99
88	99	No (88 is not > 99)	99
72	99	No (72 is not > 99)	99

This table is our algorithm represented visually. It forces us to think one step at a time. By building this plan, we've defined a clear, repeatable process that a computer can follow perfectly. This skill—translating a question into a step-by-step process—is the foundation of everything we'll do with arrays and ArrayLists.

Tracing the algorithm to find the highest score.
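To see that the trace table and the algorithm are the same thing, this sketch (hypothetical names; assumes a non-empty array) runs the same loop but records one row per examined score instead of only returning the answer. The columns follow the table: current score, value before the check, the comparison result, and value after the check.

```java
// Hypothetical class: produces the trace table as text.
public class HighestScoreTrace {
    // Builds one row per examined score. Assumes scores has at least one element.
    static String trace(int[] scores) {
        StringBuilder rows = new StringBuilder();
        int highestScoreSoFar = scores[0];                 // initialize to the first item
        rows.append("(Start) | - | - | ").append(highestScoreSoFar).append('\n');
        for (int i = 1; i < scores.length; i++) {
            int before = highestScoreSoFar;                // value before the check
            boolean isHigher = scores[i] > before;         // the comparison column
            if (isHigher) {
                highestScoreSoFar = scores[i];             // update only on "Yes"
            }
            rows.append(scores[i]).append(" | ")
                .append(before).append(" | ")
                .append(isHigher ? "Yes" : "No").append(" | ")
                .append(highestScoreSoFar).append('\n');   // value after the check
        }
        return rows.toString();
    }

    public static void main(String[] args) {
        System.out.print(trace(new int[] {95, 81, 76, 99, 88, 72}));
    }
}
```

For the exam scores, the row for 99 comes out as "99 | 95 | Yes | 99", matching the table row by row.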

Worked examples

Let's solidify this with a couple of practical examples. The key is to define the problem, then build a step-by-step plan.

Example 1

Calculating an Average

Problem: You're given the number of minutes Maya spent on her coding homework each night last week: [45, 60, 0, 90, 75, 55, 30]. Calculate the average number of minutes she spent per day.

Solution Walkthrough:

1. Identify the Goal
We need to calculate an average. The formula for an average is Total Sum / Number of Items. This tells us we need two things from the data set: the sum of all values and the count of all values.
2. Plan the Algorithm
We'll process the list one item at a time.
- Initialize a variable total_minutes to 0.
- Initialize a variable day_count to 0. (Alternatively, skip this counter and use the known size of the list, 7.)
- Iterate through the list [45, 60, 0, 90, 75, 55, 30].
- For each number, add it to total_minutes and add 1 to day_count.
- After the loop is finished, divide total_minutes by day_count (which will be 7).
3. Trace the Plan

Current Item	`total_minutes` (before add)	`total_minutes` (after add)
(Start)	0	0
45	0	45
60	45	105
0	105	105
90	105	195
75	195	270
55	270	325
30	325	355

Final Calculation: The loop is done. total_minutes is 355. The number of items is 7. Average = 355 / 7 ≈ 50.71 minutes.
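A Java sketch of Maya's average, assuming the minutes are whole numbers in an int array with at least one entry (class and method names are hypothetical). One Java-specific detail matters here: dividing two ints discards the remainder, so 355 / 7 would evaluate to 50. Casting the total to double before dividing keeps the fractional part.

```java
// Hypothetical class: sum first, count as you go, divide once at the end.
public class HomeworkAverage {
    // Assumes minutes has at least one element (otherwise the result is NaN).
    static double average(int[] minutes) {
        int totalMinutes = 0;  // running sum starts at zero
        int dayCount = 0;      // counter starts at zero
        for (int m : minutes) {
            totalMinutes += m; // add this night's minutes
            dayCount++;        // one more night counted
        }
        // Cast to double so the division is not truncated by integer arithmetic.
        return (double) totalMinutes / dayCount;
    }

    public static void main(String[] args) {
        int[] maya = {45, 60, 0, 90, 75, 55, 30};
        System.out.println(average(maya)); // about 50.71
    }
}
```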

Example 2

Counting Items that Meet a Condition

Problem: A list represents the point values of prizes at a school carnival: [5, 20, 100, 10, 20, 50, 5, 100]. How many prizes are "big ticket" items, worth 50 points or more?

Solution Walkthrough:

1. Identify the Goal
We aren't summing or averaging. We are counting how many items meet a specific condition (value >= 50).
2. Plan the Algorithm
- Initialize a counter variable, big_ticket_count, to 0. Like an empty bucket, it starts with nothing and gains one each time we find a match.
- Iterate through the list [5, 20, 100, 10, 20, 50, 5, 100], one item at a time.
- For each item, ask a question: "Is this value greater than or equal to 50?"
- If the answer is yes, add 1 to big_ticket_count.
- If the answer is no, do nothing.
- After checking all items, big_ticket_count will hold our answer.
3. Trace the Plan

Current Item	Is `Item >= 50`?	`big_ticket_count`
(Start)	-	0
5	No	0
20	No	0
100	Yes	1
10	No	1
20	No	1
50	Yes	2
5	No	2
100	Yes	3

Final Result: After checking the whole list, the final value of big_ticket_count is 3. There are 3 "big ticket" items.
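The counting plan as a Java sketch. The class and method names are hypothetical; the 50-point threshold comes from the problem.

```java
// Hypothetical class: counts items that meet a condition.
public class BigTicketCounter {
    // Counts prize values worth 50 points or more.
    static int countBigTickets(int[] prizePoints) {
        int bigTicketCount = 0;          // the counter starts at zero
        for (int points : prizePoints) { // examine one prize at a time
            if (points >= 50) {          // the condition from the problem
                bigTicketCount++;        // yes: add 1
            }                            // no: do nothing
        }
        return bigTicketCount;
    }

    public static void main(String[] args) {
        int[] prizes = {5, 20, 100, 10, 20, 50, 5, 100};
        System.out.println(countBigTickets(prizes)); // prints 3
    }
}
```

Note that >= (not >) is what makes the 50-point prize count.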

Tracing the average calculation for Maya's homework times.

Avoid calculating the average inside the loop: sum first, then divide once after the loop ends.
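One way this mistake can show up in Java: if you divide each value by the count inside the loop using int arithmetic, every division throws away its remainder and the errors add up. This sketch, with hypothetical method names, compares that pitfall against summing first and dividing once. For Maya's data, the pitfall gives 47 instead of about 50.71.

```java
// Hypothetical class: contrasts dividing inside the loop with dividing once.
public class AverageInsideLoopPitfall {
    // Pitfall: int division inside the loop truncates every term.
    static int averageDividingInsideLoop(int[] values) {
        int result = 0;
        for (int v : values) {
            result += v / values.length; // remainder lost on each item
        }
        return result;
    }

    // Recommended: sum first, then divide once after the loop.
    // Assumes values has at least one element.
    static double averageSumThenDivide(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return (double) total / values.length;
    }

    public static void main(String[] args) {
        int[] maya = {45, 60, 0, 90, 75, 55, 30};
        System.out.println(averageDividingInsideLoop(maya)); // prints 47
        System.out.println(averageSumThenDivide(maya));      // about 50.71
    }
}
```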

Try it yourself

Time to put these ideas into practice. Don't write code—just think through the algorithm and trace it on paper.

Problem 1: Finding the Lowest Bid

You're helping plan a school event and have collected bids from several catering companies for a pasta dinner. The bids are: [$15.50, $12.00, $18.00, $14.25, $12.50]. Your goal is to find the lowest bid.

Your Task
Describe the algorithm in plain English. Then, create a trace table similar to the "find the highest score" example to prove your algorithm works.
Hint
This is the mirror image of finding the maximum. What variable will you need? What will you initialize it to? What question will you ask at each step?

Problem 2: Counting Rainy Days

You have a data set representing daily rainfall in inches for Seattle over 10 days: [0.0, 0.5, 0.2, 0.0, 0.0, 1.1, 0.8, 0.1, 0.0, 0.3]. You want to know on how many days it actually rained.

Your Task
Describe the algorithm to count the number of days with rainfall greater than 0.0.
Hint
What variable do you need to keep track of your count? What is the condition that causes you to increment that counter?

Tracing the algorithm to find the lowest bid.

Tracing the algorithm to count rainy days.

Key terms

AP Computer Science A, Data set, Algorithm, Data processing, AP CSA Unit 4, Programming logic, Data analysis, Iteration, AP prep, Computer science basics

You can now…

4.2.A: Represent patterns and algorithms that involve data sets found in everyday life using written language or diagrams.

Essential knowledge (exam-tested)

4.2.A.1: A data set is a collection of specific pieces of information or data.
4.2.A.2: Data sets can be manipulated and analyzed to solve a problem or answer a question. When analyzing data sets, values within the set are accessed and utilized one at a time and then processed according to the desired outcome.
4.2.A.3: Data can be represented in a diagram by using a chart or table. This visual can be used to plan the algorithm that will be used to manipulate the data.

Read what Saavi narrates

Hi everyone, it's Saavi. Let's talk about something we all do without even thinking about it: working with lists of information.

Imagine your school's soccer team is running a fundraiser, and you've jotted down the number of water bottles sold each day: 15, 22, 18, and so on. That list of numbers is a data set. Now, if the coach asks for the best sales day, you need a process to find the answer. You can't just guess.

That's our main idea today. We're learning how to take a collection of information, a data set, and create a step-by-step plan, an algorithm, to get answers from it.

The most important rule is to process the data one item at a time. Think of a cashier scanning your groceries. They scan one item, add its price to the total, and move to the next. They don't try to add it all up at once. We need to think like that cashier.

Let's try a quick example. Say we have the daily high temperatures for a week in Boston: 55, 58, 52, 61, and 60 degrees. We want the average temperature.

First, we need a plan. To get an average, we need the total sum of the temperatures. So, our plan is to go through the list, one day at a time, and add the temperature to a running total.

We'll start with a variable, let's call it `total_temp`, and set it to zero. This is a crucial step. A really common mistake is forgetting to initialize your variables. If you try to add to a variable that has no starting value, there's nothing to add to. In Java, the compiler won't even let you use a local variable before you give it a value. So, always start your sums at zero.

Okay, `total_temp` is zero. First day is 55. Add it. Our total is now 55. Next is 58. Add it to 55... our total becomes 113. Next is 52. Add it... total is 165. Then 61... total is 226. Finally, 60... our grand total is 286.

Now that we've gone through the *entire* list, we can do the final step. We divide our total of 286 by the number of days, which was 5. That gives us an average of 57.2 degrees.

See how we trusted the process? We focused on one simple step—adding the current number—and repeated it. That's the heart of working with data sets. Keep practicing this way of thinking, and you'll be ready for anything the exam throws at you. You've got this.
