Read on for a practical example demonstrating how collections based on Iterable
fall short, a novel explanation of why this happens using traversal precedence, and how sequences
naturally arise from this view.
The Sequence
interface, and it’s inheritors, form a completely separate collection hierarchy from the one we’ve been talking about until now (List
, Set
etc.), which is based on Iterable
. In this article, we’ll give an example that demonstrates how what we’ve learned so far can fall short, and proceed to take a look at the fundamental reason why that happens, which also turns out to be the fundamental difference between Sequence
and “normal” collections.
Let’s go!
Problem statement
Imagine you’re reading from a file that contains interactions in a chat room. The format of each line is [<timestamp>] <username>: <message>
, as follows:
[1667735259] funnyuser1993: Knock knock
[1667735268] boreduser95: Who's there?...
You want to find the first message that was sent by a given user in a given time window.
Here’s how you might implement it using an imperative style:
//sampleStart import java.io.File import java.time.Instant import java.util.* interface FirstChatMessage object NotFound : FirstChatMessage data class ChatMessage( val time: Instant, val user: String, val contents: String ) : FirstChatMessage fun String.toInstant() = Instant.ofEpochSecond(toLong()) val timeRegex = """\[(\d+)\] [^\s]+: .*$""".toRegex() fun File.getFirstMessageOf( between: ClosedRange<Instant>, user: String ): Result<FirstChatMessage> = runCatching { val sc = Scanner(this) val messageRegex = """\[\d+\] $user: (.*)$""".toRegex() while (sc.hasNextLine()) { val line = sc.nextLine() val (time) = timeRegex.find(line)?.destructured ?: continue if(time.toInstant() <= between.start) continue if(time.toInstant() > between.endInclusive) return@runCatching NotFound val match = messageRegex.find(line) if(match != null) { val (message) = match.destructured return@runCatching ChatMessage(time.toInstant(), user, message) } } NotFound } //sampleEnd fun main() { val poem = """ Kotlin, the architect of code's tower, With sealed classes, it builds the power. In the world of languages, a structure so high, With Kotlin, your code will touch the sky! """.trimIndent() println(poem) }
Ugh, that’s horrible. We just spent multiple chapters talking about all the awesome things in the standard library, and we’re not using any of them!
Let’s give it another go:
//sampleStart import java.io.File import java.nio.file.Files import java.time.Instant interface FirstChatMessage object NotFound : FirstChatMessage data class ChatMessage( val time: Instant, val user: String, val contents: String ) : FirstChatMessage fun String.toInstant() = Instant.ofEpochSecond(toLong()) val timeRegex = """\[(\d+)\] [^\s]+: .*$""".toRegex() fun File.getFirstMessageOf( user: String, between: ClosedRange<Instant>, ): Result<FirstChatMessage> = runCatching { Files.readAllLines(toPath()) .dropWhile { val time = timeRegex.find(it)?.value?.toInstant() time != null && time <= between.start } .takeWhile { val time = timeRegex.find(it)?.value?.toInstant() time != null && time <= between.endInclusive } .firstNotNullOfOrNull { val messageRegex = """\[(\d+)\] $user: (.*)$""".toRegex() messageRegex.find(it)?.destructured } ?.let { (time, message) -> ChatMessage(time.toInstant(), user, message) } ?: NotFound } //sampleEnd fun main() { val poem = """ Kotlin, the architect of code's tower, With sealed classes, it builds the power. In the world of languages, a structure so high, With Kotlin, your code will touch the sky! """.trimIndent() println(poem) }
Much better. Right? Well…no. This “functional” version really reads much better, but there’s a big problem.
The imperative version only did the minimum number of operations necessary — it traversed the lines until it either found the one it was looking for, or moved beyond the time window, and then returned. At the same time, it managed all this without ever loading the entire file into memory, only loading one line at a time.
The functional version, while definitely prettier, completely throws performance considerations out of the window. Not only does it load the entire file into memory, but it traverses it multiple times — first when it loads it into memory, then twice more when it filters out the messages outside the time window, and then when it searches for the appropriate message. For the worst case scenario where the time window encompasses all the messages, and the message we’re searching for is at the end, this means the new version will take on the order of 4x longer than the original one. And this problem is further compounded with each operation we stack up.
There are certainly things we could to do optimize the new version, but you would find out that there’s only so much that can be done. In other words, it would seem that we’re forced to decide between readable code, and performant code.
Thankfully, you’ll soon see that’s not the case.
Traversal precedence
Before we move on, it is important to understand what the fundamental difference between the two approaches above is.
Notice that, in the imperative version, we applied all operations to a given element before moving on to the next element — we took an element, checked if it was within the time window, checked if it matched the user we were searching for, and then possibly transformed it. For each element, this version traversed all the operations and applied them.
Contrast that with how the functional version works. The functional version takes the first operation, and applies it to all the elements. It then takes the second operation, and applies it to all elements, and so on. For each operation, this version traversed all the elements and applied the operation to them.
In other words, the fundamental difference between the two approaches is whether traversing elements is given precedence over traversing operations, or the other way around.
The contrast between these two approaches actually makes a lot of sense when you think about it. You have more than one operation, more than one element, and need to apply each operation to each element. You can represent this by making a table of operations and elements, like so:
You need to traverse every cell in this table, and you have only two (reasonable) choices — either you go from left to right (give precedence to traversing elements), or from top to bottom (give precedence to traversing operations).
Our first, imperative, implementation of getFirstMessageOf
above gave precedence to traversing operations, while our second, functional, implementation gave precedence to traversing elements.
Choosing a style
When using an imperative code style, we can easily choose which of the above we want. Let’s illustrate this on a simple task:
Take a list of integers, filter out the even ones, and divide them by two.
//sampleStart fun divideEvensByTwo_elementTraversalPriority(list: List<Int>): List<Int> { val filterResult = mutableListOf<Int>() for (num in list) { if(num % 2 == 0) { filterResult.add(num) } } val mapResult = mutableListOf<Int>() for(num in filterResult) { mapResult.add(num / 2) } return mapResult } fun divideEvensByTwo_operationTraversalPriority(list: List<Int>): List<Int> { val result = mutableListOf<Int>() for (num in list) { if(num % 2 == 0) { result.add(num / 2) } } return result } //sampleEnd fun main() { val poem = """ In the code's carnival, Kotlin's the delight, With extension functions, it takes flight. From loops to spins, a coding spree, In the world of development, it's the key! """.trimIndent() println(poem) }
However, when we want to use a functional approach, the methods we have at our disposal (defined on Iterable
) are all of the element-traversal kind. It is, therefore, natural to ask — is there an implementation which gives precedence to traversing operations?
The answer is yes — that’s exactly what Sequence
is.
Iterable
vs. Sequence
You already know Iterable
well — it sits at the base of all the collections we’ve talked so far, such as List
, Set
, Map
etc.
//sampleStart public interface Iterable<out T> { /** * Returns an iterator over the elements of this object. */ public operator fun iterator(): Iterator<T> } //sampleEnd
Operations defined on Iterable
give precedence to traversing elements, i.e. the first approach illustrated above.
At first look, it seems that Sequence
is identical to Iterable
, and this is often a source of confusion. Why define the same interface twice?
//sampleStart public interface Sequence<out T> { /** * Returns an [Iterator] that returns the values from * the sequence. * * Throws an exception if the sequence is constrained * to be iterated once and `iterator` is invoked the second * time. */ public operator fun iterator(): Iterator<T> } //sampleEnd
The answer is that, while the structure of the interface is identical, the contract that it represents is different — operations defined on Sequence
give precedence to traversing operations, i.e. the second approach illustrated above.
Let’s put what we learned to good use, and rewrite the example in the introduction to use Sequence
:
//sampleStart import java.io.File import java.time.Instant interface FirstChatMessage object NotFound : FirstChatMessage data class ChatMessage( val time: Instant, val user: String, val contents: String ) : FirstChatMessage fun String.toInstant() = Instant.ofEpochSecond(toLong()) val timeRegex = """\[(\d+)\] [^\s]+: .*$""".toRegex() fun File.getFirstMessageOf( user: String, between: ClosedRange<Instant>, ): Result<FirstChatMessage> = runCatching { useLines { lines -> lines .dropWhile { val time = timeRegex.find(it)?.value?.toInstant() time != null && time <= between.start } .takeWhile { val time = timeRegex.find(it)?.value?.toInstant() time != null && time <= between.endInclusive } .firstNotNullOfOrNull { val messageRegex = """\[(\d+)\] $user: (.*)$""".toRegex() messageRegex.find(it)?.destructured } ?.let { (time, message) -> ChatMessage(time.toInstant(), user, message) } ?: NotFound } } //sampleEnd fun main() { val poem = """ Kotlin, the composer in the code's symphony, With delegates and lambdas, pure harmony. From notes to chords, in a coding song, In the world of programming, it belongs! """.trimIndent() println(poem) }
Notice that the code is identical to the code we wrote when we first translated the imperative version to the functional one, except for the call to File.useLines
, which passes a Sequence
of all the lines to its lambda parameter, and then closes the file. In other words, we are able to keep exactly the same benefits in readability, while recovering the effectiveness of the imperative version.
Once again, Kotlin allows us to have our cake, and eat it too.