A few signposts for your Scala Journey

As we have blogged about previously, we have been using Scala for much of our backend work for the last 2-3 years. On balance we think that it has worked well and continues to work well for the types of systems that we build.

Of course in that period of time, we have discovered, as many others before us travelling on the road from journeyman to master have likewise discovered, that Scala has its fair share of hidden complexity, pitfalls and hazards, in part due to its rich heritage that borrows from many languages and programming paradigms.

In fact, for me personally, I feel that with Scala, more than other languages I have learned in the past, one never arrives at mastery but is in a constant state of becoming (to paraphrase Dylan).

In this blog I will mention just a few "betcha-didn't-know" aspects of Scala that I have encountered on the long and winding road (we love our musical references as Football Radar Engineers).

These aren't necessarily the most advanced features of the language (no fancy type-level or implicit stuff here) but they can crop up in daily usage and it can be hard to find suitable documentation explaining what is happening.

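So here I'll cover three of them: what for-comprehensions really de-sugar to, why final case class is not redundant, and a gotcha with by-name parameters.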

I hope you will find these little signposts useful on your journey to Scala mastery.

Hidden depths (or deceptions) of for-comprehensions

When you are starting out with Scala you tend to use the for-comprehension syntax quite a lot because it looks vaguely like the for looping constructs you have likely encountered in other languages.

After some time you find out that for-comprehensions are really syntactic sugar for chained calls to the methods flatMap and map available on Scala collections.
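As a rough illustration of that de-sugaring (the lists and names below are made up for the example), a two-generator for-comprehension and a hand-written flatMap/map chain produce the same result:

```scala
object ForSugarSketch {
  // Hypothetical inputs, chosen only to make the result easy to follow.
  val xs = List(1, 2)
  val ys = List(10, 20)

  // The sugared form: two generators and a yield.
  val viaFor: List[Int] =
    for {
      x <- xs
      y <- ys
    } yield x + y

  // Roughly what the compiler produces: every generator except the last
  // becomes a flatMap, and the last one becomes a map.
  val viaCombinators: List[Int] =
    xs.flatMap(x => ys.map(y => x + y))
}
```

Both values should come out as List(11, 21, 12, 22).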

At that point you probably first decide to stick with for-comprehensions because flatMap and map are scary sounding things. But then once you get over your initial misgivings you tend to mainly reach for flatMap and map, and only reach for the for-comprehension when you have long chains of flatMaps and maps and you want to make the flow more readable.

Well at least that describes a bit of my journey and I am merely guessing that it is similar for others.

But recently I learned something new about for-comprehensions that has made me start using them more and once again favouring them because they tend to be more readable. Specifically I learned that they can cover the functionality of one of my favourite work-horses, collect.

Consider the following contrived example where we have an instance of List[Try[Int]] and we want to keep only the Success values. Then, for each Success that we have kept, we want to double the Int value that it contains. Finally, we want to be left with an instance of List[Int].

Recall that the scala.util.Try[T] type has two subtypes Success[T] and Failure[T] for representing, respectively, whether a computation has successfully completed and returned a value or whether it failed with an exception.

Let us try this in the REPL:

scala> import scala.util.{Try, Success}  
import scala.util.{Try, Success}

scala> val tries: List[Try[Int]] = Try("1".toInt) :: Try("2".toInt) :: Try("3".toInt) :: Try("four".toInt) :: Nil  
tries: List[scala.util.Try[Int]] = List(Success(1), Success(2), Success(3), Failure(java.lang.NumberFormatException: For input string: "four"))  

A very typical attempt, using the basic collection combinators to express our desired logic, might look like the following:

scala> :paste  
// Entering paste mode (ctrl-D to finish)

tries  
  .filter { t: Try[Int] => t.isSuccess }
  .map { t: Try[Int] => t.get * 2 }

// Exiting paste mode, now interpreting.

res0: List[Int] = List(2, 4, 6)  

Sorry, of course the obligatory one-liner:

scala> tries filter (_.isSuccess) map (_.get * 2)  
res1: List[Int] = List(2, 4, 6)  

But, whenever you have code where a filter is followed by a map (or vice versa), we can immediately think of replacing the two combinators with a single use of collect, which then saves us from traversing the list twice (and in this case has the added bonus of helping us avoid calling the .get method on Try[Int] that might potentially throw an exception):

scala> tries collect { case Success(x) => x * 2 }  
res2: List[Int] = List(2, 4, 6)  

Recall that collect is defined on List (and other collection classes) as follows:

def collect[B](pf: PartialFunction[A, B]): List[B]

This indicates that collect takes a PartialFunction as a parameter (the case Success(x) => x * 2 part in the above listing), which you might in turn recall is a function defined for only some values of its input type.

So basically collect acts like map except it only applies the partial function to the values for which it is defined (in the above example it is defined only for Success values so it is applied to only those values in the list).
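To make the "only where it is defined" part concrete, here is a small sketch that gives the partial function a name (doubleSuccess is just a label chosen for this example) and asks it directly which inputs it accepts:

```scala
import scala.util.{Success, Try}

object CollectSketch {
  val tries: List[Try[Int]] =
    List(Try("1".toInt), Try("2".toInt), Try("3".toInt), Try("four".toInt))

  // The same partial function as in the listing above, given a name
  // so that we can inspect it.
  val doubleSuccess: PartialFunction[Try[Int], Int] = {
    case Success(x) => x * 2
  }

  // isDefinedAt reports which inputs the partial function accepts.
  val definedFor: List[Boolean] = tries.map(doubleSuccess.isDefinedAt)

  // collect applies the function only to those inputs.
  val doubled: List[Int] = tries.collect(doubleSuccess)
}
```

Here definedFor is List(true, true, true, false) and doubled is List(2, 4, 6): the Failure is skipped rather than causing a MatchError.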

Well, it turns out that we can use a for-comprehension to achieve the same thing -- i.e. safely traverse the list and extract the elements that match the Success pattern that we want -- just as efficiently as with a collect:

scala> for { Success(x) <- tries } yield x * 2  
res3: List[Int] = List(2, 4, 6)  

This works because as I recently discovered, when there is a pattern-match on the left side of a for-comprehension generator -- i.e. Success(x) in the above listing -- what the compiler does is to expand the pattern-matching into a withFilter (a lazy version of filter). So the above translates to:

scala> :paste  
// Entering paste mode (ctrl-D to finish)

tries withFilter {  
  case Success(x) => true
  case _ => false
} map {
  case Success(x) => x * 2
}

// Exiting paste mode, now interpreting.

res4: List[Int] = List(2, 4, 6)  

That of course looks a lot like the very first attempt we started with above, but the difference is that since withFilter is lazy, we don't have to worry about traversing the list twice.

If you consult the Scala library API docs you will see the description of withFilter indicates that whereas filter *eagerly* creates a new collection, withFilter "only restricts the domain of subsequent map, flatMap, foreach, and withFilter operations."
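One way to see the difference for yourself is to log the order in which the predicate and the mapping function are called. The sketch below is only an illustration: the trace helper and the "p"/"f" labels are invented for this example, and the ordering described is what I would expect from the standard List implementation.

```scala
import scala.collection.mutable.ListBuffer

object WithFilterSketch {
  // Runs a pipeline and records the order in which the predicate ("p")
  // and the mapping function ("f") are applied to each element.
  private def trace(
      run: (Int => Boolean, Int => Int) => List[Int]
  ): (List[Int], List[String]) = {
    val log = ListBuffer.empty[String]
    val isEven: Int => Boolean = n => { log += s"p($n)"; n % 2 == 0 }
    val double: Int => Int = n => { log += s"f($n)"; n * 2 }
    val result = run(isEven, double)
    (result, log.toList)
  }

  val numbers = List(1, 2, 3, 4)

  // filter builds an intermediate list first, then map walks that list.
  val (eagerResult, eagerLog) = trace((p, f) => numbers.filter(p).map(f))

  // withFilter defers the test, so map checks and transforms in one pass.
  val (lazyResult, lazyLog) = trace((p, f) => numbers.withFilter(p).map(f))
}
```

Both results are List(4, 8). With filter, the log shows every predicate call before any mapping call; with withFilter, the two are interleaved element by element, which is the single traversal we were after.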

But what I like is that the for-comprehension sugar hides all of that from you.

This corrects a misunderstanding I had about for-comprehensions. I used to think that the compiler always translated the clauses in a for-comprehension directly to map or flatMap. So in my mind for { Success(x) <- tries } yield x * 2 would have been de-sugared to...:

tries map { case Success(x) => x * 2 }  

...which of course will fail at runtime because our Try("four".toInt) was a Failure and not a Success:

scala> tries map { case Success(x) => x * 2 }  
<console>:10: warning: match may not be exhaustive.  
It would fail on the following input: Failure(_)  
              tries map { case Success(x) => x * 2 }
                        ^
scala.MatchError: Failure(java.lang.NumberFormatException: For input string: "four") (of class scala.util.Failure)  

Now based on that you would be justified if you complained that for-comprehensions are a bit deceptive. However, in this case I will take the glass-half-full tack and conclude that the for-comprehension has revealed that it has hidden depths!

final case class is not redundant after all

After some time, as you get more comfortable with the language, you might start to explore some of the more popular open-source Scala libraries to start to get a feel for best-practices and language idioms, and perhaps just satisfy your curiosity about how some of the constructs you use daily are implemented.

If you do this, it will not be long before you encounter final case class. In fact, if you click through to the source of Success from our previous example, you will see that it is defined as final case class Success(...).

final of course is a modifier from Java, and if you consult the Java documentation, you will see that "a class that is declared final cannot be subclassed" (See: The Java Tutorials -- Writing Final Classes and Methods).

For me this was a bit confusing because as I understood it, you couldn't extend a case class anyway so there was no way to create a subclass.

scala> case class Point(x: Int, y: Int)  
defined class Point

scala> case class ThreeDPoint(x: Int, y: Int, z: Int) extends Point(x, y)  
<console>:9: error: case class ThreeDPoint has case ancestor Point, but case-to-case inheritance is prohibited. To overcome this limitation, use extractors to pattern match on non-leaf nodes.  
       case class ThreeDPoint(x: Int, y: Int, z: Int) extends Point(x, y)

So surely the final is redundant, no? And all of those Scala experts, including those behind the final case class lint in Scala WartRemover, must just enjoy typing unnecessary characters. Well, to misappropriate another (obscure) musical reference, fifty million Scala developers can't be wrong.

And of course they aren't.

First, I once again had to correct one of my common misunderstandings. If you look again at the above error message, it says "case-to-case inheritance is prohibited", meaning that the following is legitimate:

scala> class ThreeDPoint(x: Int, y: Int, z: Int) extends Point(x, y)  
defined class ThreeDPoint  

However, if you declare Point as final then you get the following, whether ThreeDPoint is another case class or not:

scala> final case class Point(x: Int, y: Int)  
defined class Point

scala> class ThreeDPoint(x: Int, y: Int, z: Int) extends Point(x, y)  
<console>:9: error: illegal inheritance from final class Point  
       class ThreeDPoint(x: Int, y: Int, z: Int) extends Point(x, y)

You cannot even mix in a trait when trying to instantiate Point with the new keyword:

scala> trait ThirdDimension { val z: Int }  
defined trait ThirdDimension

scala> new Point(1, 2) with ThirdDimension { val z = 3 }  
<console>:11: error: illegal inheritance from final class Point  
              new Point(1, 2) with ThirdDimension { val z = 3 }

On the surface there doesn't seem much...err...point to this other than nailing your representations shut. However, with a little more digging, it seems that one of the real pay-offs of using final case class is that the compiler can use its knowledge that the type cannot be extended to help weed out certain types of errors.

For example, imagine this (only slightly) contrived example where you previously implemented a function distance for calculating the distance between two points, where a point was represented as a sequence of Ints but now you are sensibly refactoring it to a Point case class. So you change the signature of distance:

scala> :paste  
// Entering paste mode (ctrl-D to finish)

def distance(a: Point, b: Point): Double = (a, b) match {  
  case (Seq(ax: Int, ay: Int), Seq(bx: Int, by: Int)) =>
    math.sqrt(math.pow((bx - ax), 2) + math.pow((by - ay), 2))
}

// Exiting paste mode, now interpreting.

<console>:9: error: scrutinee is incompatible with pattern type;  
 found   : Seq[A]
 required: Point
       def distance(a: Point, b: Point): Double = (a, b) match { case (Seq(ax: Int, ay: Int), Seq(bx: Int, by: Int)) =>

The compiler blocks this buggy change because, once Point is declared final, it knows that a Point can be nothing other than a Point. That makes it mandatory to change the pattern-match in the distance function from a Seq to a Point.

Without Point being declared final you could do part of the refactoring and leave the remaining buggy implementation (although being good TDD'ers presumably you have some tests that would catch this regression!):

scala> :paste  
// Entering paste mode (ctrl-D to finish)

def distance(a: Point, b: Point): Double = (a, b) match {  
  case (Seq(ax: Int, ay: Int), Seq(bx: Int, by: Int)) =>
    math.sqrt(math.pow((bx - ax), 2) + math.pow((by - ay), 2))
}

// Exiting paste mode, now interpreting.

distance: (a: Point, b: Point)Double

scala> distance(Point(-2, -3), Point(-4, 4))  
scala.MatchError: (Point(-2,-3),Point(-4,4)) (of class scala.Tuple2)  

The compiler cannot offer any help in the case when Point isn't final because it cannot rule out the scenario where distance is called with an instance of Point that mixes in the Seq[A] trait.
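For completeness, the finished refactoring that the compiler error nudges us towards might look something like the sketch below. This is my guess at the intended end state rather than code from a real project; it simply destructures the two Points instead of treating them as Seqs.

```scala
object DistanceSketch {
  final case class Point(x: Int, y: Int)

  // Match on the Point extractor directly, so the pattern and the
  // parameter types agree and the compiler is happy.
  def distance(a: Point, b: Point): Double = (a, b) match {
    case (Point(ax, ay), Point(bx, by)) =>
      math.sqrt(math.pow(bx - ax, 2) + math.pow(by - ay, 2))
  }
}
```

Calling DistanceSketch.distance(Point(-2, -3), Point(-4, 4)) now returns the square root of 53 (about 7.28) instead of throwing a MatchError.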

Admittedly this is perhaps a bit "edge-casey" but it seems there is in fact a bit of method to the final case class madness. And I am definitely intrigued to see if there are more cases where the compiler can help prevent certain bugs because of the added type restriction.

By-name parameters should come with a warning

This final signpost indicates a real gotcha that I think is not immediately obvious. And because it isn't obvious and has the capacity to yield subtle bugs, I am surprised that some usages of by-name parameters do not either:

  • generate a warning when the -Xlint compiler flag is used
  • or warrant a lint on tools like WartRemover
  • or receive well-documented treatment in introductory books on the language (e.g. the common "gotcha" that I will describe below does not seem to be mentioned in Programming In Scala, which is meant as the definitive guide to the language).

In the Control Abstraction chapter of Programming In Scala, by-name parameters (not to be confused with the other Scala feature of named parameters) are introduced as a feature that allows us to mimic so-called non-strict evaluation semantics.

In a nutshell, non-strict evaluation means that the arguments passed to a function are not evaluated before we execute the function (as is the default in Scala and most modern languages, which have strict evaluation) but are evaluated each time they are referenced by name inside the body of the function.
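A tiny sketch of the difference, using a counter to observe when the argument expression actually runs (expensive, strictIgnore and byNameIgnore are hypothetical names invented for this illustration):

```scala
object EvaluationSketch {
  var evaluations = 0

  // A hypothetical argument expression with a visible side effect.
  def expensive(): Int = { evaluations += 1; 42 }

  // Strict parameter: the argument is evaluated before the body runs,
  // even though the body never uses it.
  def strictIgnore(x: Int): String = "done"

  // By-name parameter: the argument is evaluated only if and when the
  // body refers to it, which this body never does.
  def byNameIgnore(x: => Int): String = "done"

  val strictResult: String = strictIgnore(expensive())
  val afterStrict: Int = evaluations // the argument was evaluated once

  val byNameResult: String = byNameIgnore(expensive())
  val afterByName: Int = evaluations // unchanged: never evaluated
}
```

The counter goes to 1 after the strict call and stays at 1 after the by-name call, because the by-name argument is never referenced.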

You might not have realised it, but already in the first example in this post you encountered non-strict evaluation with the Try(...) construct for creating the instances of Try[Int].

Recall that Try(...) is really shorthand for calling Try.apply(...) on the Try companion object.

If you take a look at the implementation of Try.apply() you will see it is defined as:

def apply[T](r: => T): Try[T] =  
    try Success(r) catch {
      case NonFatal(e) => Failure(e)
    }

The => T in the method parameter list is what gives the method its "non-strictness" and can be contrasted with the form (r: T) that we use to define typical "strictly" evaluated function parameters.

Try.apply(...) must be defined this way because the expression being evaluated may throw an exception, and we want a chance to catch that exception within the body of Try.apply(...) so that we can materialise it as a Failure.

If the expression passed to Try.apply(...) was evaluated strictly the exception would be thrown up the call-chain before we got a chance to wrap it up in a Failure. Consider this bad re-write of Try.apply(...) called TryBadStrict.apply(...):

scala> import scala.util.{Try, Success, Failure}  
import scala.util.{Try, Success, Failure}

scala> import scala.util.control.NonFatal  
import scala.util.control.NonFatal

scala> :paste  
// Entering paste mode (ctrl-D to finish)

object TryBadStrict {  
  def apply[T](r: T): Try[T] =
    try Success(r) catch {
      case NonFatal(e) => Failure(e)
    }
}

// Exiting paste mode, now interpreting.

defined object TryBadStrict

scala> val tries: List[Try[Int]] = TryBadStrict("1".toInt) :: TryBadStrict("2".toInt) :: TryBadStrict("3".toInt) :: TryBadStrict("four".toInt) :: Nil  
java.lang.NumberFormatException: For input string: "four"  

We are not even able to create our val tries: List[Try[Int]] because the expression "four".toInt has thrown before we got a chance to wrap it in a Failure.

Another typical usage of the by-name feature is to implement a small control abstraction to time blocks of code. The following example shows one (subtly wrong) implementation of a method for timing a block of code that returns a Future (e.g. we might use it to time an external service call):

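scala> import scala.concurrent.{ExecutionContext, Future}
import scala.concurrent.{ExecutionContext, Future}

scala> def timeFutureSubtlyWrong[A](block: => Future[A])(implicit ec: ExecutionContext): Future[A] = {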
     |   val t0 = System.currentTimeMillis()
     |   block onComplete { _ =>
     |     val t1 = System.currentTimeMillis()
     |     println(s"Elapsed time: ${t1 - t0} ms")
     |   }
     |   block
     | }
timeFutureSubtlyWrong: [A](block: => scala.concurrent.Future[A])(implicit ec: scala.concurrent.ExecutionContext)scala.concurrent.Future[A]  

When we invoke this method we get the following:

scala> import ExecutionContext.Implicits.global  
import ExecutionContext.Implicits.global

scala> timeFutureSubtlyWrong(Future { println("Calling external service"); 1 + 2 })  
Calling external service  
Calling external service  
Elapsed time: 1 ms  
res0: scala.concurrent.Future[Int] = scala.concurrent.impl.Promise$DefaultPromise@59696551  

Note that "Calling external service" is printed twice indicating that the block passed to timeFutureSubtlyWrong was evaluated twice -- once for each time it is referred to by name in timeFutureSubtlyWrong.

Depending on what the block of code is doing this may not necessarily be the end of the world -- even though it is most certainly not intended. But it is not too difficult to imagine other scenarios where this is more of a serious bug -- e.g. mistakenly calling out twice to some payment service!!

The correct implementation (or at least the one that is most likely intended) is to cache the result of the block in a val within the function and then refer to that val when you need the result of the block. For example:

scala> def timeFuture[A](block: => Future[A])(implicit ec: ExecutionContext): Future[A] = {  
     |   val t0 = System.currentTimeMillis()
     |   val invokedBlock = block
     |   invokedBlock onComplete { _ =>
     |     val t1 = System.currentTimeMillis()
     |     println(s"Elapsed time: ${t1 - t0} ms")
     |   }
     |   invokedBlock
     | }
timeFuture: [A](block: => scala.concurrent.Future[A])(implicit ec: scala.concurrent.ExecutionContext)scala.concurrent.Future[A]

scala> timeFuture(Future { println("Calling external service"); 1 + 2 })  
Calling external service  
Elapsed time: 0 ms  
res1: scala.concurrent.Future[Int] = scala.concurrent.impl.Promise$DefaultPromise@5e0ec41f  

And voila! -- we only have one occurrence of "Calling external service"
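If you would rather check this with a test than by eyeballing console output, something along the following lines could work. It is only a sketch: callService is a made-up stand-in for the external service, the two wrappers mirror the shape of the functions above with the timing stripped out, and the counter is bumped when the block is evaluated so that the count does not depend on thread scheduling.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ByNameFutureSketch {
  // Hypothetical stand-in for the external service; counts its invocations.
  val calls = new AtomicInteger(0)
  def callService()(implicit ec: ExecutionContext): Future[Int] = {
    calls.incrementAndGet()
    Future(1 + 2)
  }

  // Refers to `block` twice, so the by-name argument is evaluated twice.
  def evaluatesTwice[A](block: => Future[A])(implicit ec: ExecutionContext): Future[A] = {
    block.onComplete(_ => ())
    block
  }

  // Evaluates `block` once and reuses the resulting Future.
  def evaluatesOnce[A](block: => Future[A])(implicit ec: ExecutionContext): Future[A] = {
    val invokedBlock = block
    invokedBlock.onComplete(_ => ())
    invokedBlock
  }

  // Returns how many service calls each variant triggered.
  def demo(): (Int, Int) = {
    import ExecutionContext.Implicits.global
    val before = calls.get()
    Await.result(evaluatesTwice(callService()), 5.seconds)
    val wrongCalls = calls.get() - before
    Await.result(evaluatesOnce(callService()), 5.seconds)
    val rightCalls = calls.get() - before - wrongCalls
    (wrongCalls, rightCalls)
  }
}
```

ByNameFutureSketch.demo() should return (2, 1): two service calls for the subtly wrong shape and one for the corrected one.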

I will finish by saying that I hope none of this has put you off the language. Scala is a rich, expressive language that we have had great success with as a team and that I have personally enjoyed more than any other language I have learned in recent years. It is just helpful to be aware of some of its quirks and hidden complexities.

Often these are the kinds of quirks that once you know about them you think "well of course, how else should it work!".

But sometimes these things are not immediately obvious, and if reading these signposts along your Scala journey saves you from a bug or helps you write safer code, then writing this post will have been worth it for me.