
Scala Guidelines

Note: This is a PR’able document and is not set in stone. If you think changes are needed or have questions, please feel free to open a PR against it.

Scala Development Standards

What follows are the standards we use for writing good Scala code. These are considered requirements, but they can be bent if you provide justification in your pull request. Regardless, these standards are the rubric by which we determine “good Scala code.” Pull requests that neither follow these standards nor offer sufficient justification for the deviation will not be merged. If sufficiently large edge cases arise, open an issue or pull request on this document. These standards will certainly change over time as we strive to write the best code possible.

As we implement these standards, we will see improved code and long-term gains. However, the impact they will have on us in the short term needs to be noted and accounted for.

First and foremost, many pull requests will take more time to merge in the immediate future. We will be accounting for coding standards we have not dealt with in the past, all of us will be learning new paradigms, and the review process will require more back-and-forth. This change is expected and is an important part of the learning process, but we need to account for it when estimating our work.

We also must be willing to be wrong. Code that would have looked fine in the past will be rejected based on these standards. We must be willing to both call out and be called out on code that does not pass muster. As this change slows us down, we need to make up some of that loss by being direct with our feedback and by not being offended by direct feedback. This is a team effort to bring up the level of our services and to become better programmers.

General Standards

1. Write referentially transparent functions.

What are referentially transparent functions? Put simply, referentially transparent functions are functions without observable side effects. For more detail, see: Functional Programming for All and What Purity Is and Isn’t.
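One informal way to check for referential transparency is the substitution test: replacing a call with its result should not change what the program does. The sketch below is only an illustration of that test. `Pricing` and `Counter` are hypothetical names invented for this example.

```scala
// Referentially transparent: the result depends only on the arguments,
// so any call can be replaced by its value.
object Pricing {
  def withTax(cents: Long, percent: Long): Long =
    cents + cents * percent / 100
}

// Not referentially transparent: each call mutates hidden state,
// so the same call with the same argument can yield a different result.
final class Counter {
  private var total = 0

  def addToTotal(n: Int): Int = {
    total += n
    total
  }
}
```

With `Counter`, computing `c.addToTotal(1)` once and reusing the value is not the same as calling it twice, which is exactly the kind of surprise that local reasoning is meant to rule out.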

When should I use them? Always, but this raises questions about how to deal with I/O, to name one example. In pure FP in Scala, this almost always means using the scalaz or fs2 libraries’ Task monad. Monads are a means of describing computation that must be sequenced in some way, and may interact with the outside world. The description is an algebraic structure, which can be composed with other algebraic structures into a larger one, which is then interpreted once, in your program’s main function, colloquially referred to as “the end of the world” (i.e. the last thing before the program exits).

Why should I use them? To support local reasoning. The goal of functional programming is to be able to look at an expression, see what arguments it takes, and see what result it yields, without having to worry about ambient state, interaction with the outside world, etc. An expression may interact with the outside world when it’s interpreted, but that should only be relevant in terms of how it uses its arguments or produces its result, and it must follow some simple laws to ensure the expression is well-behaved with respect to other expressions.

The point of local reasoning, in turn, is to be able to accurately assess code’s correctness before it even runs, reducing (but not eliminating) the testing burden, eliminating false “this is fine” beliefs due to inaccurate understanding of execution context, etc.

How do I do it? See: Functional Programming for All, Functional Programming in Scala, Learning scalaz, and Herding Cats (scalaz and Cats are Scala FP libraries with much overlap; we’ll likely move to Cats when its ecosystem is more mature).

This is a big topic, and will be central to our commitment to ongoing education, internally and externally.

2. Create algebras to describe business logic and not describe implementation.

What does this mean? “An algebra” is just a fancy way of saying “a description of data types and the operations on them as a data structure.” Conceptually, it’s very similar to writing a trait. In fact, you could write a trait and then refactor it to an algebra easily. An algebra is expressed as a sealed trait with a type variable. The operations follow as case classes: each is named after an operation you want to support, has that operation’s arguments as its fields, and extends the base trait parameterized by the operation’s return type.

From nothing more than this, you can derive a “free Monad” as a one-liner and define smart constructors for the Monad, one line per operation. You can then write a program using all of the functions on Monads, Monoids, Applicatives, Functors, and Semigroups, because any Monad is all of those things. You just can’t run the program until you have an interpreter for it. Your interpreter will most likely interpret your free Monad into the Task monad, which is discussed in point 9 below.
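As a rough sketch of the shape described above, here is a small key-value algebra with one possible interpreter. It uses only the standard library, so it deliberately leaves out the free Monad derivation and the smart constructors, which need scalaz. `KVStoreOp`, `Put`, `Get`, `Delete`, and `InMemory` are hypothetical names chosen for illustration.

```scala
// The algebra: one case class per operation, with the operation's arguments
// as fields, extending the base trait parameterized by the result type.
sealed trait KVStoreOp[A]
final case class Put(key: String, value: String) extends KVStoreOp[Unit]
final case class Get(key: String) extends KVStoreOp[Option[String]]
final case class Delete(key: String) extends KVStoreOp[Unit]

// One possible interpreter: pure and in-memory, which is handy for tests.
// A production interpreter would more likely target Task.
object InMemory {
  type Store = Map[String, String]

  // How each operation changes the store.
  def nextStore[A](op: KVStoreOp[A], store: Store): Store = op match {
    case Put(k, v) => store.updated(k, v)
    case Get(_)    => store
    case Delete(k) => store - k
  }

  // What each operation returns; the result type follows from the operation.
  def result[A](op: KVStoreOp[A], store: Store): A = op match {
    case Put(_, _) => ()
    case Get(k)    => store.get(k)
    case Delete(_) => ()
  }
}
```

The algebra itself says nothing about where values are stored. Swapping `InMemory` for a database-backed interpreter would not require touching the operations or the code that builds programs out of them.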

“Functions Operating On Types + Types + Properties/Laws/Business Logic = Algebra” - Functional and Reactive Domain Modeling

When should I do this? Whenever you can identify a discrete “concern,” with its own entities and operations on them, that might involve I/O, manipulating state, and/or concurrency. If it makes you think “I should write a DSL for this,” it’s a great candidate.

How do I do this? Simply model your data accordingly. For more information, see:

3. Do not use .isInstanceOf, .asInstanceOf, null, or throw Exception. Do not use .head or .get.

What does this mean? Depending on the runtime type of values is almost guaranteed to break type safety and limits generality (“parametricity”) and extensibility (by only checking for today’s types, not tomorrow’s). .head and .get throw exceptions when the value in question is empty. Use .headOption, .fold, .getOrElse, and other functions that work with empty or non-existent values instead.

When do I do this? Always.
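A minimal sketch of the total alternatives to .head and .get mentioned above, using only the standard library. `SafeAccess` and its method names are hypothetical.

```scala
object SafeAccess {
  // Instead of xs.head, which throws on an empty list.
  def firstOrElse(xs: List[Int], default: Int): Int =
    xs.headOption.getOrElse(default)

  // Instead of name.get, which throws on None: handle both cases with fold.
  def greeting(name: Option[String]): String =
    name.fold("Hello, guest")(n => s"Hello, $n")
}
```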

4. Application Errors should be values.

What does this mean? Errors that can happen during the computation of your program should be values. We don’t use Exceptions or Throwables as control flow, but handle regular errors through our functions. We often use scalaz’s \/ to represent failures/errors in our applications. Generally, the left of the \/ will be a context-dependent type that extends RuntimeException so the disjunction can represent application or infrastructure failure with equal facility and without excessive nesting.

When do I do this? Any time your function could fail. Combined with point 3, this means “all functions must be total,” which means that, given an argument, they must always return a value. Since most interesting functions can fail, this means we need a means of representing “success or failure,” hence “disjunction,” or “\/.” The error type is on the left by convention. This observation generalizes: any type that can represent two or more alternative types will do. The bigger question is how you expect client code to deal with the result, which may suggest using scalaz’s Validation rather than \/; the difference is outside the scope of this document, and the two are easily interchangeable anyway. Please consult with your lead for details.

How do I do this? When in doubt, return an Error \/ Something, where Error extends Throwable with NoStackTrace (scala.util.control.NoStackTrace). Create a sealed trait extending RuntimeException, and extend that for your specific error types.
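The following sketch shows the general shape, under one loud assumption: it uses the standard library’s Either as a stand-in for scalaz’s \/ so that it has no dependencies. `AppError`, `UserNotFound`, `InvalidAge`, and `Users` are hypothetical names, and the in-memory map stands in for a real data source.

```scala
import scala.util.control.NoStackTrace

// A sealed error hierarchy: still a RuntimeException, but without the cost
// of filling in a stack trace, and exhaustively matchable by callers.
sealed trait AppError extends RuntimeException with NoStackTrace
final case class UserNotFound(id: Long) extends AppError
final case class InvalidAge(age: Int) extends AppError

object Users {
  // Hypothetical data; a real implementation would query a store.
  private val ages: Map[Long, Int] = Map(1L -> 36, 2L -> -5)

  // Total function: a missing user is a Left value, not a thrown exception.
  def findAge(id: Long): Either[AppError, Int] =
    ages.get(id).toRight(UserNotFound(id))

  def validate(age: Int): Either[AppError, Int] =
    if (age >= 0) Right(age) else Left(InvalidAge(age))

  // Failures short-circuit through flatMap; nothing is thrown.
  def validAge(id: Long): Either[AppError, Int] =
    findAge(id).flatMap(validate)
}
```

Because the error type is sealed, client code that pattern matches on the Left gets a compiler warning when a new error case is added and not handled.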

5. Avoid stringly-typed programs. Use types to your advantage.

What does this mean? Stringly-typed programs are programs that use the contents of a String to manage control of the program. When you have a String as your type, you open yourself up to accidentally passing in the wrong thing. To avoid this we should use types, such as case classes, to let the compiler catch this for us. Also see refined.

When should I do this? Always. Keep an eye on any Strings in your code and ensure they do not leave you open to invalid input outside the type system.

How do I do this? case class Password(s: String) extends AnyVal

Another option would be to use the “safe constructor pattern” with sealed abstract case class documented below.
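A minimal sketch of what the “safe constructor pattern” can look like. `Email` and `fromString` are hypothetical names, and the validation rule is deliberately simplistic and only there to show where validation goes.

```scala
// `sealed abstract` suppresses the generated `apply` and `copy`, and prevents
// subclassing outside this file, so callers cannot bypass validation.
sealed abstract case class Email(value: String)

object Email {
  // The only way to obtain an Email. Invalid input yields None instead of
  // an Email holding a bad value.
  def fromString(raw: String): Option[Email] = {
    val trimmed = raw.trim
    if (trimmed.nonEmpty && trimmed.contains("@")) Some(new Email(trimmed) {})
    else None
  }
}
```

Any function that accepts an `Email` can then rely on the value having passed validation, without re-checking the underlying String.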

6. Decouple Algebra from the Implementation while publishing ‘what to do’ and hiding the ‘how to do it’.

What does this mean? The interfaces/contracts that we publish should show what they’re going to do, but the ‘how we do it’ can be hidden behind these interfaces. The caller of the function doesn’t need to know how it works since they only care about the interface that was shown to them. This allows implementations of these interfaces to change without affecting the calling code, and lets us modularize our code base around these interfaces and how they interact.

7. Prefer type classes for constraints to use the least powerful abstraction. Abstract early, evaluate late.

What does this mean? Take this example:

def call[F[_] : Monad, A, B](x: String): F[B \/ A] = ???

The implementation of the call can be any type that has a monad instance. This is easier to test, as you can make it go to Either, Option, or any other type with a monad instance, so we don’t have to worry about some type like Future when testing. Use the least powerful abstraction, as this allows for optimizations that are otherwise impossible and allows for greater code reuse. For example, Applicative is not sequential. Monad enforces sequential computation, and thus needs the previous computation’s result to start the next function. This means that, in some cases, a Monad constraint will be less efficient than an Applicative one.

What are Type Classes? Type classes are a way of providing functionality to some type A, similar to a Java Interface. They allow us to add functionality to existing types without changing their definition or extending them into a new type. This is known as ad-hoc polymorphism. For more information see:

When should I do this? Any time you’re going to use a type constructor, such as Future, push it out to the farthest edge possible. Use type classes within your interfaces to allow for easier testing and changes. For more information, see: The Worst Thing in our Code

How do I do this?
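One way this can look, sketched with a hand-rolled `Monad` type class so the example needs nothing beyond the standard library. In real code the type class would come from scalaz; `UserRepo`, `Greeter`, and the `Option` instance here are hypothetical and exist only for illustration.

```scala
// A hand-rolled stand-in for a Monad type class (assumption: stdlib only).
trait Monad[F[_]] {
  def pure[A](a: A): F[A]
  def flatMap[A, B](fa: F[A])(f: A => F[B]): F[B]
  def map[A, B](fa: F[A])(f: A => B): F[B] = flatMap(fa)(a => pure(f(a)))
}

object Monad {
  // Summoner: Monad[F] looks up the implicit instance for F.
  def apply[F[_]](implicit M: Monad[F]): Monad[F] = M

  // Instance for an existing type, added without modifying Option itself.
  implicit val optionMonad: Monad[Option] = new Monad[Option] {
    def pure[A](a: A): Option[A] = Some(a)
    def flatMap[A, B](fa: Option[A])(f: A => Option[B]): Option[B] = fa.flatMap(f)
  }
}

// The interface abstracts over the effect type F instead of fixing Future.
trait UserRepo[F[_]] {
  def findName(id: Long): F[String]
}

object Greeter {
  // Business logic only requires that F has a Monad instance.
  def greet[F[_]: Monad](repo: UserRepo[F], id: Long): F[String] =
    Monad[F].map(repo.findName(id))(name => s"Hello, $name")
}
```

In a test, `F` can be `Option` with an in-memory `UserRepo[Option]`, while production code picks a concrete effect type at the edge of the program.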

8. Use Kleisli/ReaderT for “dependency injection.”

What does this mean? Very often, code—especially actual business logic implementations—will need access to something more than just the arguments that are passed into the function. Common examples include a database connection pool, an HTTP client for consuming third-party REST APIs, etc. Kleisli captures the notion of a function with the signature A => M[B] for any type-constructor M, and provides typeclass instances based on available typeclass instances for M, as well as various additional functions based on available typeclass instances for M (for example, if M has a Monad instance, Kleisli[M, A, B] which models A => M[B], provides the so-called “fish operator,” >=>, for expressing ((A => M[B]) >=> (B => M[C])): A => M[C]). So if A is the “additional required context,” a function can return a Kleisli instead of just an M[B] to reflect the fact, while retaining all the power of the typeclass instances available for M, desirable compositional properties, etc.

ReaderT in scalaz is just a type alias for Kleisli, and Reader is ReaderT specialized to Id, i.e. Kleisli[Id, _, _]. Reader is handy for those occasions where you have an otherwise pure function that needs some “global” context available to it.

When should I use them? When you need some “global” context in order to actually implement your logic. Realistically, this will mean “a lot of the time.”

How do I use them? Something like this:

type ApplicationK[A] = Kleisli[Task, Config, A]

def getUser(userId: Long): ApplicationK[Option[User]] = kleisli { cfg =>
  // Interpolate the method's userId parameter into the query
  sql"select first, last, age, sex from user where id = $userId".query[User].option.transact(cfg.xa)
}

This expresses that getUser takes a userId and eventually returns an Option[User], but needs a Config to actually do so. In this example, the Config contains a Doobie Transactor[Task] that mediates access to the database. Calling this function, then, returns a Kleisli that, when run with a Config value, will return a Task[Option[User]] which, when run, will (fail or) return an Option[User].
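To show the composition side of the same idea without any library, here is a hand-rolled `Reader`, i.e. the special case where the result is not wrapped in an effect. This is a sketch, not scalaz’s implementation. The two-field `Config`, the `Urls` object, and the `example.test` host are all hypothetical.

```scala
// A hand-rolled stand-in for Reader: a function R => A with map and flatMap.
final case class Reader[R, A](run: R => A) {
  def map[B](f: A => B): Reader[R, B] =
    Reader(r => f(run(r)))

  // Both steps receive the same context r; callers never pass it explicitly.
  def flatMap[B](f: A => Reader[R, B]): Reader[R, B] =
    Reader(r => f(run(r)).run(r))
}

// Hypothetical context for this example only.
final case class Config(baseUrl: String, apiKey: String)

object Urls {
  def userUrl(id: Long): Reader[Config, String] =
    Reader(cfg => s"${cfg.baseUrl}/users/$id")

  def signed(url: String): Reader[Config, String] =
    Reader(cfg => s"$url?key=${cfg.apiKey}")

  // Composed without mentioning Config at all; it is supplied once, at run.
  def signedUserUrl(id: Long): Reader[Config, String] =
    for {
      url <- userUrl(id)
      s   <- signed(url)
    } yield s
}
```

The context is provided exactly once, by the caller of `run`, for example `Urls.signedUserUrl(42L).run(Config("https://api.example.test", "k"))`.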

9. Use effect capturing structures for referential transparency

What is Task? Task

A Task is a description of a computation that may also have effects: do I/O, change the value of a variable, do things concurrently, fail… in other words, it’s a Monad. Because A => Task[B] captures the notion of “take an A and eventually return a B,” Task is often considered a more principled alternative to Future. However, the use of Task need not imply concurrency. For example, Task.delay(log.debug(s"The value of foo is $foo.")) is a great (and common) way to describe the process of logging that statement (and obviously must be done in a context where foo is in scope), but it does not consume a thread in order to do so. Note that just Task(log.debug(...)) does consume a thread, for no good reason in this case.
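The “description of a computation” idea can be sketched with a tiny hand-rolled type. `MiniTask` below is a hypothetical, single-threaded stand-in and not scalaz’s Task: it has no error handling, no concurrency, and is not stack-safe. It only shows that building a value performs no effect until it is run.

```scala
// A description of a computation: it only stores a thunk.
final class MiniTask[A](thunk: () => A) {
  // Running is the only place where the effect actually happens.
  def unsafeRun(): A = thunk()

  def map[B](f: A => B): MiniTask[B] =
    new MiniTask(() => f(thunk()))

  def flatMap[B](f: A => MiniTask[B]): MiniTask[B] =
    new MiniTask(() => f(thunk()).unsafeRun())
}

object MiniTask {
  // In the spirit of Task.delay: capture the expression without evaluating it.
  def delay[A](a: => A): MiniTask[A] = new MiniTask(() => a)
}

object Demo {
  // `once` is an ordinary value, so it can be reused freely; the program
  // below describes logging the message twice.
  def logTwice(log: String => Unit, msg: String): MiniTask[Unit] = {
    val once = MiniTask.delay(log(msg))
    for {
      _ <- once
      _ <- once
    } yield ()
  }
}
```

Calling `Demo.logTwice` logs nothing. The two log calls happen only when the resulting value is run, which is what makes the function itself referentially transparent.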

There’s much, much more to Task. In particular, Task.async is spectacularly useful for wrapping callback-based multithreaded Java APIs and making them nicely monadic. Please see the link above for more details.

When should I use them? When you need to express an effect outside the context of an algebra/free Monad that you’re defining, and/or for concurrency. From a design perspective, what you define a free Monad for and what you just put in Task is likely the most important decision you’ll make. When in doubt, put it in Task. If you determine later that this code really expresses a domain concern and deserves its own little free Monad DSL, refactoring into such a free Monad with an interpreter that interprets into Task is reasonably straightforward.

How do I use them? See the above link. For integrating with Future-oriented code, see also Delorean.

See also our talk on using Tasks.

10. Package Structure should be 1:1 with directory structure

The package structure of projects should follow the directory structure. For example: sources in com.banno.projectName should be in src/main/scala/com/banno/projectName, com.banno.projectName.repository should be in src/main/scala/com/banno/projectName/repository, and com.banno.projectName.repository.interpreter should be in src/main/scala/com/banno/projectName/repository/interpreter.

New Project Standards

New projects should conform to our general standards. In addition, code should be organized into modules as described in the book Functional and Reactive Domain Modeling. Free Monads can be used for creating DSLs, separating the construction of the program from the actual running of the side-effects, and being able to switch interpretations at run-time. Free Monads are an efficient way to describe your business logic without supposing how it will be executed.

Old Project Standards

Old projects are in a tough spot when it comes to moving to a more FP-oriented architecture, but we can start by following the General Standards above. Move pieces slowly to conform to the standards, and eventually these projects will be constructed like our newer ones.