The Python Coding Stack • by Stephen Gruppetta

Pay As You Go • Generate Data Using Generators (Data Structure Categories #7)

Generators • Part 7 of the Data Structure Categories Series

Stephen Gruppetta
Aug 31, 2023
It's often useful to think of data structures as storage units with objects inside them. A list could be a shelf with numbered boxes in a row. And you can picture a dictionary as a locker system where each locker has a label.

In these examples, the items are stored in these storage units. However, you could also have a structure that creates the objects as and when they're needed. You can imagine a 3D printer that creates the item you need when you need it rather than storing it in a unit.

A generator doesn't store any of its data. Instead, it creates each item when it's needed. If this sounds familiar, it's because you read something similar in the previous article in this series about iterators. A generator is an iterator. We'll see what makes a generator a generator in the rest of this article.




The Data Structure Categories Series

We've reached the final article in this series. You can read the previous ones by following the links in this overview:

  • Iterable

  • Sequence

  • Mapping

  • Container

  • Collection

  • Iterator

  • Generator (this article)

Generators, Generators, and Generators

When we use the term "generators", we normally refer to generator iterators. However, we could also be referring to generator functions or generator expressions.

Confused? Let's start clearing some of that confusion. And the best way to do so is to look at some examples.

Generator expressions

Let's start with generator expressions.

In fact, let's not start with generator expressions. Let's start with list comprehensions instead:
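The code listing for this step isn't reproduced in this text, so here's a minimal sketch of a list comprehension that gives the result quoted in the text. The loop variable name `number` is my assumption:

```python
# A list comprehension: every value is created straight away and kept in the list
numbers = [number * 2 for number in range(10)]

print(numbers)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```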

The expression within the square brackets creates a list containing all the numbers the expression represents. The result is the list [0, 2, 4, 6, 8, 10, 12, 14, 16, 18], and every one of these ten elements is stored in the list numbers.

Let's replace the square brackets with parentheses—they're the round brackets. How much of a difference can this small change in the type of brackets make? [Spoiler alert: a lot]
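Here's a sketch of what that change could look like, assuming the same expression as the list comprehension and the name numbers_gen used in the text:

```python
# Same expression, but with parentheses instead of square brackets
numbers_gen = (number * 2 for number in range(10))

# No values have been created yet. The object is a generator, not a list or tuple
print(type(numbers_gen).__name__)  # generator
```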

The first line is similar to a list comprehension. But beware, it's not a tuple comprehension, even though round brackets replace the square brackets. After all, it's not the round brackets that make a tuple.

This is a generator expression. It creates a generator iterator, which is assigned to numbers_gen. The object created, the generator iterator, doesn't store any data. Nor does it reference data stored elsewhere. Instead, it will generate the data when it's needed.

Note that generator iterators are iterators. As you read in the previous article in this series, one way of getting the next item from an iterator is to use the built-in function next():
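A short sketch of this, recreating numbers_gen so the example runs on its own:

```python
numbers_gen = (number * 2 for number in range(10))

# Each call to next() generates one value on demand
first = next(numbers_gen)
second = next(numbers_gen)

print(first)   # 0
print(second)  # 2
```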

And you can keep calling next() until you run out of values to generate:
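One way to see this for yourself is the sketch below. It catches the StopIteration exception that next() raises once the generator is exhausted, so the script doesn't end with a traceback. The list `values` is only there to keep track of what was generated:

```python
numbers_gen = (number * 2 for number in range(10))

values = []
while True:
    try:
        values.append(next(numbers_gen))
    except StopIteration:
        # Raised by next() when there are no more values to generate
        print("No more values")
        break

print(values)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```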

We'll talk about generator iterators in more detail later.

Note for the pedants: I'm using the term "storing data" for lists and other data structures that contain data. However, these structures don't store the data within them. Instead, they hold references to objects stored elsewhere. However, this point is not too relevant in the context of this article. Therefore, for the sake of this article, I'll continue to use the term "store" to differentiate between data structures like lists, which hold data, and generator iterators, which do not.

Generator functions

Let's look at another way of creating generator iterators. You can define the following function:
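The function definition itself isn't included in this text. The sketch below is a stand-in based on the description that follows: it calls print() and yields 42. The function name `show_and_yield` and the printed message are placeholders of mine, not the original ones:

```python
def show_and_yield():  # hypothetical name
    print("About to yield a value")  # placeholder message
    yield 42  # 'yield' instead of 'return' makes this a generator function
```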

This function definition is similar to a standard function definition but uses the yield keyword instead of return. A function definition that contains a yield statement is a generator function. A generator function also creates a generator iterator. Let's try this out:
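Trying it out could look like this, again using the placeholder function `show_and_yield` (my name, not the original one). Note that calling the function doesn't print anything, since none of the function body runs yet:

```python
def show_and_yield():  # hypothetical name
    print("About to yield a value")  # placeholder message
    yield 42

# Calling a generator function returns a generator iterator
# without running any of the function's code
numbers = show_and_yield()

print(type(numbers).__name__)  # generator
```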

The name numbers refers to a generator object. Incidentally, the generic terms "generator object" and "generator" normally refer to a generator iterator.

When you create a generator object using a generator function, the function is paused at the beginning. Each time you call next(), the function will execute all the lines up to and including the next yield statement. Then it will pause again:
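A sketch of the first next() call, with the same placeholder function name and message as assumptions:

```python
def show_and_yield():  # hypothetical name
    print("About to yield a value")  # placeholder message
    yield 42

numbers = show_and_yield()

# The function body runs now: the message is printed,
# then the function pauses at the yield and hands back 42
value = next(numbers)
print(value)  # 42
```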

The print() function was executed, and next() returned the value 42. The generator numbers is currently in a paused state, waiting for the following next() call. However, when this occurs, there's nothing left in the function:
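Here's a sketch of that second call. The function name and message are placeholders, and the try/except block is my addition so that the example finishes cleanly instead of stopping with a traceback:

```python
def show_and_yield():  # hypothetical name
    print("About to yield a value")  # placeholder message
    yield 42

numbers = show_and_yield()
next(numbers)  # first call: prints the message and yields 42

exhausted = False
try:
    next(numbers)  # second call: the function runs to its end
except StopIteration:
    exhausted = True
    print("StopIteration raised")
```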

This raises a StopIteration exception. You've seen this StopIteration earlier in this article and when you read about iterators in the previous article in this series.

The following example will shed a bit more light on the process:
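The example's code isn't reproduced in this text, so the sketch below rebuilds it from the description that follows. The first two phrases and the values 42 and 84 come from the text. The function name `three_steps`, the third phrase, and the third value are placeholders I made up:

```python
def three_steps():  # hypothetical name
    print("I was here!")
    yield 42
    print("I already told you: I was here")
    yield 84
    print("Third phrase goes here")  # placeholder: original phrase unknown
    yield 126  # placeholder: original value unknown

numbers = three_steps()

results = []
for _ in range(3):
    # Each call resumes the function and runs it up to the next yield
    results.append(next(numbers))

try:
    next(numbers)  # fourth call: nothing left to yield
except StopIteration:
    results.append("StopIteration")

print(results)
```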

When you create the generator object using the generator function, the function is paused at the start. Here are the steps that occur when you start calling next():

  1. The first time you call next(), the function starts from the beginning and prints out "I was here!". It also gives back the value 42. The function pauses just after it yields this number and waits for the next time it's needed.

  2. The second time you call next(), the function resumes from where it left off earlier. It prints the second phrase, "I already told you: I was here", and gives back the number in the second yield statement, 84. The function pauses at this point.

  3. The third time you call next(), the function carries on and prints the third phrase and gives back the value in the third yield statement.

  4. Finally, the fourth call to next() raises a StopIteration since the generator function has reached the end.

Earlier in this article, you used a generator expression to make a generator that creates the doubles of the numbers from 0 to 9. You can replicate this generator using a generator function:
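A sketch of this generator function, using the name get_some_numbers() from the text. The loop body is my reconstruction, assuming it mirrors the earlier generator expression:

```python
def get_some_numbers():
    for number in range(10):
        # The function pauses here each time and resumes on the next call to next()
        yield number * 2

numbers = get_some_numbers()

print(next(numbers))  # 0
print(next(numbers))  # 2
print(next(numbers))  # 4
```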

You create the generator numbers from the generator function get_some_numbers(). The first time you call next(), the for loop in the generator function starts iterating. However, the function will pause each time it reaches the yield statement. The following calls to next() resume the for loop and yield the following number.

You can also create a new generator iterator using the same generator function. Note that in the code so far, you consumed the first three numbers of the first generator numbers before creating the second generator:
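Here's a sketch that shows the two generators side by side. The body of get_some_numbers() is my reconstruction, and the variable names `from_second` and `from_first` are only there for illustration:

```python
def get_some_numbers():
    for number in range(10):
        yield number * 2

numbers = get_some_numbers()
for _ in range(3):
    next(numbers)  # consume 0, 2, and 4 from the first generator

# A second generator from the same function starts from the beginning
numbers_again = get_some_numbers()

from_second = next(numbers_again)
from_first = next(numbers)

print(from_second)  # 0
print(from_first)   # 6
```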

The two generators, numbers and numbers_again, are independent of each other even though they're created from the same generator function.

And you can also consume the generator using a for loop:
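A sketch of the for loop, assuming the first three values of numbers were already consumed with next(). The list `remaining` is my addition to collect what the loop sees:

```python
def get_some_numbers():
    for number in range(10):
        yield number * 2

numbers = get_some_numbers()
for _ in range(3):
    next(numbers)  # consume 0, 2, and 4

remaining = []
for number in numbers:
    # The loop picks up where the generator was paused
    remaining.append(number)

print(remaining)  # [6, 8, 10, 12, 14, 16, 18]
```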

Note that you had already used up the first few values of numbers. Therefore, the loop resumed from the next available value.

Generator iterators

We've seen three distinct uses of the term "generator". Generator expressions and generator functions create generator iterators. Often, you'll see the term "generator" or "generator object" used to refer to the generator iterator.

I left the section about generator iterators for last since I've already introduced these and discussed them in the previous sections.

A generator iterator is an iterator, as the name implies. It doesn't hold any of its data, but generates values when they're needed. It operates on a "pay as you go" basis. You don't need to invest in creating the items and storing them before you need them, as you do in a list or other structures that store data.
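You can get a rough feel for the "pay as you go" idea with sys.getsizeof(). This is only a sketch: getsizeof() reports the size of the list object and its references, not the integers themselves, and the exact numbers depend on your Python version and platform. The one million items are an arbitrary choice of mine:

```python
import sys

# The list creates and keeps one million items up front
numbers_list = [number * 2 for number in range(1_000_000)]

# The generator creates nothing until you ask for a value
numbers_gen = (number * 2 for number in range(1_000_000))

list_size = sys.getsizeof(numbers_list)
gen_size = sys.getsizeof(numbers_gen)

print(f"list: {list_size} bytes")
print(f"generator: {gen_size} bytes")
```

The generator's size stays the same no matter how many values it can produce, whereas the list grows with the number of items.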

Like all iterators, generator objects are iterable. You can use them in for loops and wherever you need iterables. However, generators don't have a size, and they're not containers.
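One way to check these properties is with the abstract base classes in collections.abc, as in the sketch below. The dictionary `checks` is only a convenient way of mine to show the results together:

```python
from collections.abc import Container, Iterable, Iterator, Sized

numbers_gen = (number * 2 for number in range(10))

checks = {
    "Iterable": isinstance(numbers_gen, Iterable),    # True
    "Iterator": isinstance(numbers_gen, Iterator),    # True
    "Sized": isinstance(numbers_gen, Sized),          # False
    "Container": isinstance(numbers_gen, Container),  # False
}
print(checks)

# A generator has no size, so len() raises a TypeError
has_len = True
try:
    len(numbers_gen)
except TypeError:
    has_len = False

print(has_len)  # False
```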


Etymology Corner

The term “generator” comes from the Latin generare, which means “to produce”. The root "gene-" has older origins and means "give birth". Therefore, a generator produces or gives birth to an item, one at a time!


This brings us to the end of this seven-part series on Data Structure Categories. Here's the diagram I presented in previous articles showing the hierarchy of the categories I covered.

In the diagram, you can observe the three categories that sit at the top of the hierarchy: iterable, container, and sized. Many data types you’re familiar with belong to all three of these.

You can also see the branch containing iterators and generators as quite separate from the others since they don't contain the data they represent. This makes them more lightweight and memory-efficient.

When you code, you deal with data types all the time. However, what matters most often is not the data type itself, but the properties you want to use. By considering the categories of these data types—whether they're iterable or sized, say—rather than just the types themselves, you can focus on the properties that are crucial when choosing the right data type.

Code in this article uses Python 3.11




Stop Stack

#27

  • Recently published articles on The Python Coding Stack:

    • Clearing The Deque—Tidying My Daughter's Soft Toys • A Python Picture Story Exploring Python's deque data structure through a picture story. [It's pronounced "deck"]

    • The Final Year at Hogwarts School of Codecraft and Algorithmancy (Harry Potter OOP Series #7) Year 7 at Hogwarts School of Codecraft and Algorithmancy • Class methods and static methods

    • Tap, Tap, Tap on The Tiny turtle Typewriter. A mini-post using functools.partial() and lambda functions to hack keybindings in Python's turtle module

    • Python Quirks? Party Tricks? Peculiarities Revealed… (Paid article) Three "weird" Python behaviours that aren't weird at all

    • The Mayor of Py Town's Local Experiment: A Global Disaster. Why variables within functions are local

  • Recently published articles on Breaking the Rules, my other substack about narrative technical writing:

    • Frame It • Part 2 (Ep. 9). Why and when to use story-framing

    • The Rhythm of Your Words (Ep. 8). Can you control your audience's pace and rhythm when they read your article?

    • A Near-Perfect Picture (Ep. 7). Sampling theory for technical article-writing • Conceptual resolution

    • The Wrong Picture (Ep. 6). How I messed up • When an analogy doesn't work

    • The Broom and the Door Frame (Ep. 5). How the brain deals with stories

  • Stats on the Stack

    • Age: 4 months, 2 weeks, and 5 days old

    • Number of articles: 27

    • Subscribers: 957

  • Each article is the result of years of experience and many hours of work. Hope you enjoy each one and find them useful. If you're in a position to do so, you can support this Substack further with a paid subscription. In addition to supporting this work, you'll get access to the full archive of articles and some paid-only articles.

© 2023 Stephen Gruppetta