Introduction to Using Data Sets

In simple terms: a data set is a group of related facts. This lesson is about making a step-by-step plan to go through those facts one by one to answer a question.

Why this matters

Imagine your school's soccer team is running a fundraiser. You're selling team-branded water bottles, and you have a simple notebook where you've jotted down the number of bottles sold each day for the past two weeks: 15, 22, 18, 25, 30, 12, 9, 17, 20, 24, 28, 31, 19, 21.

The soccer team's daily sales data set.

The coach comes to you and asks, "Great job! What was our single best sales day? And what was our average for the two weeks?"

Suddenly, that simple list of numbers isn't just a list anymore. It's a data set. To answer the coach's questions, you can't just stare at the whole list at once. You need a process. You need a way to go through it systematically.

That's exactly what we're learning today: how to think about collections of data and create a clear, logical plan—an algorithm—to pull useful information out of them.

Concept overview

flowchart TD
    A[Start] --> B[Initialize `max_so_far` to the first item];
    B --> C{For each remaining item in the list...};
    C --> D{Is current item > `max_so_far`?};
    D -- Yes --> E[Update `max_so_far` to current item];
    D -- No --> F[Move to next item];
    E --> F;
    C -- Done with all items --> G[End: `max_so_far` holds the maximum value];
    F --> C;
This flowchart diagram illustrates an algorithm to find the maximum value in a data set. It shows the steps of initializing a variable, looping through each item, comparing it to the current maximum, and updating the maximum if a larger value is found. The process repeats until all items are checked.
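
The flowchart can be sketched in a few lines of Python. This is one possible rendering, not the only one: the function name `find_max` and the empty-list check are assumptions added for illustration, while `max_so_far` comes from the flowchart and the sales figures come from the fundraiser notebook above.

```python
def find_max(items):
    """Return the largest value in a non-empty list, following the flowchart."""
    if not items:
        # Assumption: the flowchart does not cover an empty list, so we refuse it.
        raise ValueError("find_max needs at least one item")

    max_so_far = items[0]          # Initialize max_so_far to the first item
    for item in items[1:]:         # For each remaining item in the list...
        if item > max_so_far:      # Is current item > max_so_far?
            max_so_far = item      # Yes: update max_so_far to current item
        # No: nothing to update, move to the next item
    return max_so_far              # End: max_so_far holds the maximum value


daily_sales = [15, 22, 18, 25, 30, 12, 9, 17, 20, 24, 28, 31, 19, 21]
best_day = find_max(daily_sales)
print(best_day)  # 31
```

Applied to the two weeks of bottle sales, this answers the coach's first question: the best single day was 31 bottles.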

Core explanation

Welcome to the world of data collections! It might sound intimidating, but the core idea is something you do every day.

What is a Data Set?

A data set is just a collection of related pieces of information. That's it. (EK 4.2.A.1)

Think about:

  • The contacts in your phone: a data set of names and numbers.
  • Your favorite Spotify playlist: a data set of songs.
  • A grocery list: a data set of items you need to buy.

In computer science, we often work with data sets of numbers. For example, a list of final exam scores for a class: [95, 81, 76, 99, 88, 72]. Each number is a piece of data, and together they form a data set.

A data set of final exam scores.
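
In a program, a data set like this is usually stored in a single list-like structure. A minimal sketch in Python, used here only for illustration; the variable names are made up for this example.

```python
# The final exam scores from the text, stored as one list.
final_exam_scores = [95, 81, 76, 99, 88, 72]

number_of_scores = len(final_exam_scores)   # how many pieces of data: 6
first_score = final_exam_scores[0]          # positions start at 0, so this is 95
last_score = final_exam_scores[-1]          # the final item, 72

print(number_of_scores, first_score, last_score)  # 6 95 72
```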

The Goal: Answering Questions with Data

We don't collect data just for fun. We use it to solve problems or answer questions. (EK 4.2.A.2)

Looking at our list of scores [95, 81, 76, 99, 88, 72], we could ask:

  • What is the highest score? (99)
  • What is the lowest score? (72)
  • How many students passed? (Assuming a pass is >= 65, all 6 of them)
  • What is the average score?
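
As a quick check on these answers, here is a sketch that uses Python's built-in `max`, `min`, `sum`, and `len`. The built-ins hide the looping, so treat this as a way to verify results rather than as the plan itself. The constant `PASSING_SCORE` is a name chosen for this example; its value, 65, is the assumption stated in the list above.

```python
scores = [95, 81, 76, 99, 88, 72]
PASSING_SCORE = 65  # assumption from the text: a pass is >= 65

highest = max(scores)
lowest = min(scores)
passed = sum(1 for score in scores if score >= PASSING_SCORE)
average = sum(scores) / len(scores)

print(highest, lowest, passed)  # 99 72 6
print(round(average, 2))        # 85.17
```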

To answer these, we need to manipulate and analyze the data. And that brings us to the single most important concept in this lesson.

The "One at a Time" Principle

When we process a data set, we almost always do it one item at a time.

Imagine you're a cashier at Target. A customer, Carlos, comes up with a full cart. You don't try to look at all the items at once and guess the total. That would be impossible! Instead, you follow a simple algorithm:

  1. Pick up one item.
  2. Scan its barcode.
  3. Add its price to a running total.
  4. Set the item aside.
  5. Repeat until the cart is empty.

Computers work the same way. They aren't magical; they're just incredibly fast at following simple, repetitive instructions. When we give a computer a data set, it iterates (or loops) through it, looking at just one value at a time to perform a calculation. (EK 4.2.A.2)
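
The cashier's routine maps directly onto a loop. The sketch below is illustrative only: the cart contents, the prices, and the name `checkout_total` are all hypothetical, and prices are stored in cents so the sum stays exact.

```python
# Hypothetical cart: (item name, price in cents).
cart = [("notebook", 349), ("pens", 275), ("water bottle", 1200)]


def checkout_total(cart_items):
    """Add up prices one item at a time, like scanning at a register."""
    running_total = 0
    for name, price_cents in cart_items:   # pick up one item and "scan" it
        running_total += price_cents       # add its price to the running total
    return running_total                   # the cart is empty: total is final


total_cents = checkout_total(cart)
print(f"${total_cents / 100:.2f}")  # $18.24
```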

Planning Your Algorithm with a Diagram

Before you write a single line of code, you should have a plan. For data sets, one of the best ways to plan is to use a simple table or chart. This helps you visualize the "one at a time" process. (EK 4.2.A.3)

Let's make a plan to find the highest score in our list: [95, 81, 76, 99, 88, 72].

Our algorithm in plain English would be:

  1. Create a variable, let's call it highest_score_so_far, and set it to the first score in the list.
  2. Go through the rest of the scores, one by one.
  3. For each score, compare it to highest_score_so_far.
  4. If the current score is higher, update highest_score_so_far to this new score.
  5. If it's not higher, do nothing and move to the next score.
  6. Once you've checked all the scores, highest_score_so_far will hold the answer.

Let's trace this with a table, which is a great way to represent our plan. (LO 4.2.A)

| Current Score Being Examined | highest_score_so_far (before check) | Is Current Score > highest_score_so_far? | highest_score_so_far (after check) |
|---|---|---|---|
| (Start) | (Initialized to first item: 95) | - | 95 |
| 81 | 95 | No (81 is not > 95) | 95 |
| 76 | 95 | No (76 is not > 95) | 95 |
| 99 | 95 | Yes (99 is > 95) | 99 |
| 88 | 99 | No (88 is not > 99) | 99 |
| 72 | 99 | No (72 is not > 99) | 99 |

This table is our algorithm represented visually. It forces us to think one step at a time. By building this plan, we've defined a clear, repeatable process that a computer can follow perfectly. This skill—translating a question into a step-by-step process—is the foundation of everything we'll do with arrays and ArrayLists.

Tracing the algorithm to find the highest score.
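
A trace table can also be produced by a program, which is a handy way to check a hand-drawn one. The sketch below assumes a non-empty list and uses a made-up function name, `trace_highest_score`; each row it records holds the current score, the value of `highest_score_so_far` before the check, the result of the comparison, and the value after the check.

```python
def trace_highest_score(scores):
    """Return (rows, answer), where each row is one line of the trace table."""
    highest_score_so_far = scores[0]       # step 1: start with the first score
    rows = []
    for current_score in scores[1:]:       # step 2: go through the rest
        before = highest_score_so_far
        is_higher = current_score > highest_score_so_far   # step 3: compare
        if is_higher:
            highest_score_so_far = current_score           # step 4: update
        rows.append((current_score, before, is_higher, highest_score_so_far))
    return rows, highest_score_so_far      # step 6: the answer


rows, answer = trace_highest_score([95, 81, 76, 99, 88, 72])
for row in rows:
    print(row)
print(answer)  # 99
```

Only the row for 99 records `True`, which matches the single "Yes" in the trace.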

Worked examples

Let's solidify this with a couple of practical examples. The key is to define the problem, then build a step-by-step plan.

Example 1

Calculating an Average

Problem: You're given the number of minutes Maya spent on her coding homework each night last week: [45, 60, 0, 90, 75, 55, 30]. Calculate the average number of minutes she spent per day.

Solution Walkthrough:

  1. Identify the Goal
    We need to calculate an average. The formula for an average is Total Sum / Number of Items. This tells us we need two things from the data set: the sum of all values and the count of all values.
  2. Plan the Algorithm
    We'll process the list one item at a time.
    • Initialize a variable total_minutes to 0.
    • Initialize a variable day_count to 0. (Or skip this and use the known size, 7.)
    • Iterate through the list [45, 60, 0, 90, 75, 55, 30].
    • For each number, add it to total_minutes (and add 1 to day_count if you are counting the items yourself).
    • After the loop is finished, divide total_minutes by the number of items (7).
  3. Trace the Plan

    | Current Item | total_minutes (before add) | total_minutes (after add) |
    |---|---|---|
    | (Start) | 0 | 0 |
    | 45 | 0 | 45 |
    | 60 | 45 | 105 |
    | 0 | 105 | 105 |
    | 90 | 105 | 195 |
    | 75 | 195 | 270 |
    | 55 | 270 | 325 |
    | 30 | 325 | 355 |

  4. Final Calculation: The loop is done. total_minutes is 355. The number of items is 7. Average = 355 / 7 ≈ 50.71 minutes.

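
The plan for Example 1 could be written out as follows. This is a sketch under stated assumptions: the function name `average_minutes` is invented for illustration, the list is assumed to be non-empty, and the count is tracked inside the loop rather than hard-coded as 7.

```python
def average_minutes(minutes_per_night):
    """Sum the values one at a time, then divide once after the loop."""
    total_minutes = 0
    day_count = 0
    for minutes in minutes_per_night:
        total_minutes += minutes   # running sum, as in the trace table
        day_count += 1             # count each item as it is visited
    # Assumption: the list is not empty, so day_count is at least 1.
    return total_minutes / day_count


maya_minutes = [45, 60, 0, 90, 75, 55, 30]
print(round(average_minutes(maya_minutes), 2))  # 50.71
```
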
Example 2

Counting Items that Meet a Condition

Problem: A list represents the point values of prizes at a school carnival: [5, 20, 100, 10, 20, 50, 5, 100]. How many prizes are "big ticket" items, worth 50 points or more?

Solution Walkthrough:

  1. Identify the Goal
    We aren't summing or averaging. We are counting how many items meet a specific condition (value >= 50).
  2. Plan the Algorithm
    • Initialize a counter variable, big_ticket_count, to 0. This is our bucket.
    • Iterate through the list [5, 20, 100, 10, 20, 50, 5, 100], one item at a time.
    • For each item, ask a question: "Is this value greater than or equal to 50?"
    • If the answer is yes, add 1 to big_ticket_count.
    • If the answer is no, do nothing.
    • After checking all items, big_ticket_count will hold our answer.
  3. Trace the Plan

    | Current Item | Is Item >= 50? | big_ticket_count |
    |---|---|---|
    | (Start) | - | 0 |
    | 5 | No | 0 |
    | 20 | No | 0 |
    | 100 | Yes | 1 |
    | 10 | No | 1 |
    | 20 | No | 1 |
    | 50 | Yes | 2 |
    | 5 | No | 2 |
    | 100 | Yes | 3 |

  4. Final Result: After checking the whole list, the final value of big_ticket_count is 3. There are 3 "big ticket" items.

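
The counting plan for Example 2 can be sketched the same way. The function name `count_big_ticket` and the `threshold` parameter are assumptions made for this example; the cutoff of 50 points comes from the problem statement.

```python
def count_big_ticket(prize_points, threshold=50):
    """Count how many values are greater than or equal to the threshold."""
    big_ticket_count = 0
    for points in prize_points:
        if points >= threshold:      # "Is this value >= 50?"
            big_ticket_count += 1    # yes: add 1 to the counter
        # no: do nothing
    return big_ticket_count


prizes = [5, 20, 100, 10, 20, 50, 5, 100]
print(count_big_ticket(prizes))  # 3
```

Note that a prize worth exactly 50 points is counted, because the condition is "50 points or more".
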
Tip for Example 1: avoid calculating the average inside the loop; sum first, then divide.

Try it yourself

Time to put these ideas into practice. Don't write code—just think through the algorithm and trace it on paper.

Problem 1: Finding the Lowest Bid

You're helping plan a school event and have collected bids from several catering companies for a pasta dinner. The bids are: [$15.50, $12.00, $18.00, $14.25, $12.50]. Your goal is to find the lowest bid.

  • Your Task
    Describe the algorithm in plain English. Then, create a trace table similar to the "find the highest score" example to prove your algorithm works.
  • Hint
    This is the mirror image of finding the maximum. What variable will you need? What will you initialize it to? What question will you ask at each step?

Problem 2: Counting Rainy Days

You have a data set representing daily rainfall in inches for Seattle over 10 days: [0.0, 0.5, 0.2, 0.0, 0.0, 1.1, 0.8, 0.1, 0.0, 0.3]. You want to know on how many days it actually rained.

  • Your Task
    Describe the algorithm to count the number of days with rainfall greater than 0.0.
  • Hint
    What variable do you need to keep track of your count? What is the condition that causes you to increment that counter?