Longest Collatz Sequence
The following iterative sequence is defined for the set of positive integers:
\(n \to n/2\) (\(n\) is even)
\(n \to 3n + 1\) (\(n\) is odd)
Using the rule above and starting with \(13\), we generate the following sequence: \begin{align} 13 \to 40 \to 20 \to 10 \to 5 \to 16 \to 8 \to 4 \to 2 \to 1. \end{align}
It can be seen that this sequence (starting at \(13\) and finishing at \(1\)) contains \(10\) terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at \(1\).
Which starting number, under one million, produces the longest chain?
NOTE: Once the chain starts the terms are allowed to go above one million.
Solution
Let’s make a brute-force solution and see how it works.
We can add one optimization that cuts the execution time in half. Since the first step from an even number \(2n\) goes straight to \(n\), collatzLength(2n) is always greater than collatzLength(n) by exactly 1.
So every starting number in the lower half of the range is beaten by its double in the upper half, and we only need to check the upper half to find the maximum length. Here’s the code.
import scala.annotation.tailrec

object Euler014 extends EulerApp {
  // For every n below 500_000 the chain for 2n is one term longer,
  // so the maximum must lie in the upper half of the range.
  override def execute(): Any = (500_000 until 1_000_000).maxBy(collatzLength)

  private def collatzLength(n: Int): Int = {
    // n is widened to Long because intermediate terms can exceed Int range
    @tailrec
    def collatzLength(n: Long, currentLength: Int): Int = n match {
      case 1                   => currentLength + 1
      case n if n % 2 == 0     => collatzLength(n / 2, currentLength + 1)
      case n                   => collatzLength(n * 3 + 1, currentLength + 1)
    }
    collatzLength(n, 0)
  }
}
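As a quick sanity check (a throwaway snippet, not part of the solution), we can confirm both the example from the problem statement and the doubling property:

// Throwaway check: 13 produces a 10-term chain, and doubling a number adds
// exactly one term, since the first step from 2n is 2n / 2 = n.
def length(n: Long, acc: Int = 1): Int =
  if (n == 1) acc
  else if (n % 2 == 0) length(n / 2, acc + 1)
  else length(n * 3 + 1, acc + 1)

assert(length(13) == 10)
assert(length(26) == length(13) + 1)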
My first thought when seeing this problem was to use dynamic programming. After all, we keep hitting the same sub-sequences (such as 16 -> 8 -> 4 -> 2 -> 1) over and over.
I implemented it using a map while keeping it immutable, but it wasn’t a great idea: the code ran much slower. In fact, even with a mutable map it was around 50% slower than the brute-force version.
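For reference, here’s a minimal sketch of what that immutable-map memoization could look like (a reconstruction under my assumptions, not the original code):

// Each call returns the chain length together with the updated cache.
def collatzLength(n: Long, cache: Map[Long, Int]): (Int, Map[Long, Int]) =
  cache.get(n) match {
    case Some(length) => (length, cache)
    case None =>
      val next = if (n % 2 == 0) n / 2 else n * 3 + 1
      val (length, updated) = collatzLength(next, cache)
      (length + 1, updated + (n -> (length + 1)))
  }

// The cache has to be threaded through every single call:
val (answer, _, _) = (1L until 1_000_000L).foldLeft((0L, 0, Map(1L -> 1))) {
  case ((bestN, bestLength, cache), n) =>
    val (length, updated) = collatzLength(n, cache)
    if (length > bestLength) (n, length, updated) else (bestN, bestLength, updated)
}

All that tuple-threading and per-step map allocation is exactly where the slowdown comes from.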
Often it’s worth starting out from a basic solution. This way there’s less room for mistakes, and we get a baseline result that we can improve on.
Mutable solution
Let’s think for a while about how we can improve the brute-force solution.
As I’ve mentioned, some operations are done multiple times. We can memoize them in an array-based cache for fast lookups. Why an array? Because a map would be much slower, and we know the maximum size up front.
In fact, we don’t have to cache all the values. You can try it out yourself by lowering cacheLimit to 100_000.
It should be slower, but still faster than the brute-force solution.
import scala.annotation.tailrec

object Euler014M extends EulerApp {
  override def execute(): Any = (1 until 1_000_000).maxBy(collatzLength)

  private val cacheLimit = 1_000_000
  private val cache = new Array[Int](cacheLimit)
  cache(1) = 1 // the chain starting at 1 contains a single term

  private def collatzLength(base: Int): Int = {
    @tailrec
    def compute(n: Long, length: Int): Int = {
      if (n < cacheLimit && cache(n.toInt) != 0) {
        // cache hit: the length of the rest of the chain is already known
        length + cache(n.toInt)
      } else if (n % 2 == 0) {
        compute(n / 2, length + 1)
      } else {
        // for odd n, 3n + 1 is always even, so take two steps at once
        compute((n * 3 + 1) / 2, length + 2)
      }
    }
    val result = compute(base, 0)
    if (base < cacheLimit) cache(base) = result
    result
  }
}
As you can see, we also got rid of the optimization of starting from 500_000. It turns out that version runs slightly slower than starting from the beginning: the cache gets a lot of misses, because only the starting numbers are cached, and we don’t backtrack through the chain to fill in the intermediate values.
We could try to do that, but is it worth increasing the complexity of the solution even further when it already runs in milliseconds? Sometimes it is, but here I’ll pass.
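For the curious, such backfilling might look roughly like this (an unbenchmarked sketch of a drop-in replacement for collatzLength inside Euler014M; it uses single steps for simplicity):

// Sketch: collect the values visited on the way down, then walk the chain
// backwards, assigning each value its distance from the first cached term.
private def collatzLengthBackfilled(base: Int): Int = {
  val visited = scala.collection.mutable.ArrayBuffer.empty[Long]
  var n: Long = base
  while (n >= cacheLimit || cache(n.toInt) == 0) {
    visited += n
    n = if (n % 2 == 0) n / 2 else n * 3 + 1
  }
  var length = cache(n.toInt) // length from the first cached value onwards
  for (v <- visited.reverseIterator) {
    length += 1
    if (v < cacheLimit) cache(v.toInt) = length
  }
  length
}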
Epilogue
If this weren’t a computational problem but a business-oriented one, the basic immutable solution would be preferred in almost all cases, due to how simple and safe it is.
The mutable solution has many more places where errors can be made. For example, the ‘n.toInt’ is only safe because we check right before it that n is below the cache limit, which prevents a silent overflow.
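To make that hazard concrete (an illustrative snippet, not part of the solution): some chains starting under one million climb past Int.MaxValue, and with plain Int arithmetic the 3n + 1 step would silently wrap around:

val n: Int = 1_000_000_001  // stand-in for a chain term above Int.MaxValue / 3
println(n * 3 + 1)          // -1294967292: wrapped around, the chain is corrupted
println(n.toLong * 3 + 1)   // 3000000004: the correct next term

That’s why compute works on Long internally, even though every starting number fits in an Int.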