Discover more from The Python Coding Stack • by Stephen Gruppetta
Pay As You Go • Generate Data Using Generators (Data Structure Categories #7)
Generators • Part 7 of the Data Structure Categories Series
It's often useful to think of data structures as storage units with objects inside them. A list could be a shelf with numbered boxes in a row. And you can picture a dictionary as a locker system where each locker has a label.
In these examples, the items are stored in these storage units. However, you could also have a structure that creates the objects as and when they're needed. You can imagine a 3D printer that creates the item you need when you need it rather than storing it in a unit.
A generator doesn't store any of its data. Instead, it creates each item when it's needed. If this sounds familiar, it's because you read something similar in the previous article in this series about iterators. A generator is an iterator. We'll see what makes a generator a generator in the rest of this article.
The Data Structure Categories Series
We've reached the final article in this series. You can read the previous ones by following the links in this overview:
Generators, Generators, and Generators
When we use the term "generators", we normally refer to generator iterators. However, we could also be referring to generator functions or generator expressions.
Confused? Let's start clearing some of that confusion. And the best way to do so is to look at some examples.
Let's start with generator expressions.
In fact, let's not start with generator expressions. Let's start with list comprehensions instead:
The expression within the square brackets creates a list containing all the numbers the expression represents. The result is the list [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]. The list numbers contains all the elements within it.
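A list comprehension that gives this result could look like the sketch below. The exact expression is an assumption based on the output described above and on the later description of "the doubles of the numbers" up to 9:

```python
# Assumed reconstruction: double each number from 0 to 9
numbers = [number * 2 for number in range(10)]

print(numbers)
# [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```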
Let's replace the square brackets with parentheses—they're the round brackets. How much of a difference can this small change in the type of brackets make? [Spoiler alert: a lot]
The first line is similar to a list comprehension. But beware, it's not a tuple comprehension, even though round brackets replace the square brackets. After all, it's not the round brackets that make a tuple.
This is a generator expression. It creates a generator iterator, which is assigned to numbers_gen. The object created, the generator iterator, doesn't store any data. Nor does it reference data stored elsewhere. Instead, it will generate the data when it's needed.
Note that generator iterators are iterators. As you read in the previous article in this series, one way of getting the next item from an iterator is to use the built-in function next().
And you can keep calling next() until you run out of values to generate:
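Here's a sketch of how this could play out. It assumes the generator expression uses the same doubling expression as the list comprehension and the name numbers_gen mentioned above:

```python
# Parentheses instead of square brackets: a generator expression
numbers_gen = (number * 2 for number in range(10))
print(numbers_gen)  # a generator object, not a tuple

# Each call to next() generates one value
first = next(numbers_gen)
second = next(numbers_gen)
print(first, second)  # 0 2

# Use up the remaining eight values
remaining = [next(numbers_gen) for _ in range(8)]
print(remaining)  # [4, 6, 8, 10, 12, 14, 16, 18]

# There's nothing left to generate now
try:
    next(numbers_gen)
except StopIteration:
    print("No values left: StopIteration raised")
```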
We'll talk about generator iterators in more detail later.
Note for the pedants: I'm using the term "storing data" for lists and other data structures that contain data. However, these structures don't store the data within them. Instead, they hold references to objects stored elsewhere. However, this point is not too relevant in the context of this article. Therefore, for the sake of this article, I'll continue to use the term "store" to differentiate between data structures like lists, which hold data, and generator iterators, which do not.
Let's look at another way of creating generator iterators. You can define the following function:
This function definition is similar to a standard function definition but uses the yield keyword instead of return. A function definition that contains a yield statement is a generator function. A generator function also creates a generator iterator. Let's try this out:
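A minimal generator function along these lines might look like this. The function name get_some_numbers() and the value 42 appear elsewhere in this article, but the printed phrase in this first version is an assumption:

```python
def get_some_numbers():
    print("I was here!")  # assumed phrase for this first example
    yield 42

# Calling the function doesn't run its body.
# It creates a generator iterator instead.
numbers = get_some_numbers()  # nothing is printed at this point

print(type(numbers).__name__)  # generator
```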
numbers refers to a generator object. Incidentally, the generic terms "generator object" and "generator" normally refer to a generator iterator.
When you create a generator object using a generator function, the function is paused at the beginning. Each time you call next(), the function will execute all the lines up to and including the yield statement. Then it will pause again:
The print() function was executed, and the value 42 was returned by next(). The generator numbers is currently in a paused state waiting for the following next() call. However, when this occurs, there's nothing left in the function:
This raises a StopIteration exception. You've seen this StopIteration earlier in this article and when you read about iterators in the previous article in this series.
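The two next() calls described above can be sketched together. This assumes the same single-yield generator function, with an assumed printed phrase:

```python
def get_some_numbers():
    print("I was here!")  # assumed phrase
    yield 42

numbers = get_some_numbers()

# First call: runs up to and including the yield, then pauses
value = next(numbers)  # prints "I was here!"
print(value)  # 42

# Second call: the function resumes, finds nothing left, and ends
exhausted = False
try:
    next(numbers)
except StopIteration:
    exhausted = True
    print("StopIteration raised")
```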
The following example will shed a bit more light on the process:
When you create the generator object using the generator function, the function is paused at the start. Here are the steps that occur when you start calling next():
The first time you call next(), the function starts from the beginning and prints out "I was here!". It also gives back the value 42. The function pauses just after it yields this number and waits for the next time it's needed.
The second time you call next(), the function resumes from where it left off earlier. It prints the second phrase, "I already told you: I was here", and gives back the number in the second yield statement, 84. The function pauses at this point.
The third time you call next(), the function carries on and prints the third phrase and gives back the value in the third yield statement.
Finally, the fourth call to next() raises StopIteration since the generator function has reached the end.
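The steps above can be sketched as follows. The first two phrases and the values 42 and 84 come from the description, but the third phrase and the third value are hypothetical placeholders:

```python
def get_some_numbers():
    print("I was here!")
    yield 42
    print("I already told you: I was here")
    yield 84
    print("Stop asking!")  # hypothetical third phrase
    yield 126  # hypothetical third value

numbers = get_some_numbers()  # paused at the start, nothing printed

first = next(numbers)   # prints the first phrase, pauses after yield 42
second = next(numbers)  # prints the second phrase, pauses after yield 84
third = next(numbers)   # prints the third phrase, pauses after the last yield
print(first, second, third)  # 42 84 126

# Fourth call: the function reaches its end
try:
    next(numbers)
except StopIteration:
    print("StopIteration raised")
```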
Earlier in this article, you used a generator expression to make a generator that creates the doubles of the numbers from 0 to 9. You can replicate this generator using a generator function:
You create the generator numbers from the generator function get_some_numbers(). The first time you call next(), the for loop in the generator function starts iterating. However, the function will pause each time there's a yield statement. The following calls to next() resume the for loop and yield the following number.
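Here's one way this generator function could be written. The loop variable name and the use of range(10) are assumptions chosen to match the earlier generator expression:

```python
def get_some_numbers():
    for number in range(10):
        # The function pauses here each time and resumes
        # from this point on the following next() call
        yield number * 2

numbers = get_some_numbers()

print(next(numbers))  # 0
print(next(numbers))  # 2
print(next(numbers))  # 4
```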
You can also create a new generator iterator using the same generator function. Note that in the code so far, you consumed the first three numbers of the first generator numbers before you create the second generator:
The two generators, numbers and numbers_again, are independent of each other even though they're created from the same generator function.
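A short sketch shows this independence. It assumes the loop-based version of get_some_numbers() and that three values of numbers have already been consumed:

```python
def get_some_numbers():
    for number in range(10):
        yield number * 2

numbers = get_some_numbers()
consumed = [next(numbers) for _ in range(3)]
print(consumed)  # [0, 2, 4]

# A second generator from the same function starts from scratch
numbers_again = get_some_numbers()
first_again = next(numbers_again)
print(first_again)  # 0

# The first generator carries on from where it was paused
next_original = next(numbers)
print(next_original)  # 6
```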
And you can also consume the generator using a for loop:
Note that you had already used up the first few values of numbers. Therefore, the loop resumed from the next available value.
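As a sketch, assuming the same generator function and that exactly three values were used up before the loop:

```python
def get_some_numbers():
    for number in range(10):
        yield number * 2

numbers = get_some_numbers()

# Use up the first three values: 0, 2, and 4
for _ in range(3):
    next(numbers)

# The for loop picks up from the next available value
rest = []
for number in numbers:
    print(number)
    rest.append(number)

print(rest)  # [6, 8, 10, 12, 14, 16, 18]
```

The for loop handles the StopIteration for you, so the loop simply ends when the generator runs out of values.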
We've seen three distinct uses of the term "generator". Generator expressions and generator functions create generator iterators. Often, you'll see the term "generator" or "generator object" used to refer to the generator iterator.
I left the section about generator iterators for last since I've already introduced these and discussed them in the previous sections.
A generator iterator is an iterator, as the name implies. It doesn't hold any of its data, but generates values when they're needed. It operates on a "pay as you go" basis. You don't need to invest in creating the items and storing them before you need them, as you do in a list or other structures that store data.
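You can get a rough feel for the "pay as you go" idea by comparing object sizes. This is only an illustrative sketch: sys.getsizeof() reports the size of the list object and its references, not the integers themselves, and the exact numbers depend on your Python version and platform:

```python
import sys

# The list creates and holds all one million items up front
numbers_list = [number * 2 for number in range(1_000_000)]

# The generator creates nothing until you ask for a value
numbers_gen = (number * 2 for number in range(1_000_000))

list_size = sys.getsizeof(numbers_list)
gen_size = sys.getsizeof(numbers_gen)

print(list_size)
print(gen_size)
print(gen_size < list_size)  # True
```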
Like all iterators, generator objects are iterable. You can use them in for loops and wherever you need iterables. However, generators don't have a size, and they're not containers.
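One way to check these categories is with the abstract base classes in collections.abc. This is a sketch using a generator expression like the one from earlier in the article:

```python
from collections.abc import Container, Iterable, Iterator, Sized

numbers_gen = (number * 2 for number in range(10))

print(isinstance(numbers_gen, Iterable))   # True
print(isinstance(numbers_gen, Iterator))   # True
print(isinstance(numbers_gen, Sized))      # False
print(isinstance(numbers_gen, Container))  # False

# Since a generator isn't sized, len() doesn't work on it
has_len = True
try:
    len(numbers_gen)
except TypeError as error:
    has_len = False
    print(error)
```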
The term “generator” comes from the Latin generare, which means “to produce”. The root "gene-" has older origins and means "give birth". Therefore, a generator produces or gives birth to an item, one at a time!
This brings us to the end of this seven-part series on Data Structure Categories. Here's the diagram I presented in previous articles showing the hierarchy of the categories I covered.
In the diagram, you can observe the three categories that sit at the top of the hierarchy: iterable, container, and sized. Many data types you’re familiar with belong to all three of these.
You can also see the branch containing iterators and generators as quite separate from the others since they don't contain the data they represent. This makes them more lightweight and memory-efficient.
When you code, you deal with data types all the time. However, what matters most often is not the data type itself, but the properties you want to use. By considering the categories of these data types—whether they're iterable or sized, say—rather than just the types themselves, you can focus on the properties that are crucial when choosing the right data type.
Code in this article uses Python 3.11