First steps in Scala for beginning programmers, Part 3

Topics: conditional execution with if-else blocks and matching

Preface

This is part 3 of tutorials for first-time programmers getting into Scala. Other posts are on this blog, and you can get links to those and other resources on the links page of the Computational Linguistics course I’m creating these for.

Conditionals

Variables come and variables go, and they take on different values depending on the input. We typically need to enact different behaviors conditioned on those values. For example, let’s simulate a bartender in Austin who must make sure that he doesn’t give alcohol to individuals under 21 years of age.

scala> def serveBeer (customerAge: Int) = if (customerAge >= 21) println("beer") else println("water")
serveBeer: (customerAge: Int)Unit

scala> serveBeer(23)
beer

scala> serveBeer(19)
water

What we’ve done here is a standard use of conditionals to produce one action or another — in this case just printing one message or another. The expression in the if (…) is a Boolean value, either true or false. You can see this by just doing the inequality directly:

scala> 19 >= 21
res7: Boolean = false

And these expressions can be combined according to the standard rules for conjunction and disjunction of Booleans. Conjunction is indicated with && and disjunction with ||.

scala> 19 >= 21 || 5 > 2
res8: Boolean = true

scala> 19 >= 21 && 5 > 2
res9: Boolean = false
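To see these operators at work inside functions, here is a minimal sketch that stays with the bartender. The names canOrderBeer, getsDiscount, hasId and isStudent are made up for illustration and are not part of the original example.

```scala
// Hypothetical rule: beer only for customers who are old enough AND carry ID.
def canOrderBeer(customerAge: Int, hasId: Boolean) = customerAge >= 21 && hasId

// Hypothetical rule: a discount for customers who are 65 or older OR are students.
def getsDiscount(customerAge: Int, isStudent: Boolean) = customerAge >= 65 || isStudent
```

With these definitions, canOrderBeer(23, false) is false because only one side of the && holds, while getsDiscount(20, true) is true because one side of the || is enough.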

To check equality, use ==.

scala> 42 == 42
res10: Boolean = true

scala> "the" == "the"
res11: Boolean = true

scala> 3.14 == 6.28
res12: Boolean = false

scala> 2*3.14 == 6.28
res13: Boolean = true

scala> "there" == "the" + "re"
res14: Boolean = true

The equality operator == is different from the assignment operator =, and you’ll get an error if you attempt to use = for equality tests.

scala> 5 = 5
<console>:1: error: ';' expected but '=' found.
5 = 5
^

scala> x = 5
<console>:10: error: not found: value x
val synthvar$0 = x
^
<console>:7: error: not found: value x
x = 5
^

The first example fails outright because we cannot assign a value to a constant like 5. In the second example, the error complains about not finding a value x. Assignment with = is a valid construct, but only if a var variable x has been previously defined.

scala> var x = 0
x: Int = 0

scala> x = 5
x: Int = 5

Recall that with var variables, it is possible to assign them a new value. However, it is actually not necessary to use vars much of the time, and there are many advantages to sticking with vals. I’ll be helping you think in these terms as we go along. For now, try to ignore the fact that vars exist in the language!

Back to conditionals. First, here are more comparison operators:

x == y   (x is equal to y)
x != y    (x does not equal y)
x > y     (x is larger than y)
x < y     (x is less than y)
x >= y   (x is equal to y, or larger than y)
x <= y   (x is equal to y, or less than y)

These operators work on any type that has a natural ordering, including Strings.

scala> "armadillo" < "bear"
res25: Boolean = true

scala> "armadillo" < "Bear"
res26: Boolean = false

scala> "Armadillo" < "Bear"
res27: Boolean = true

Clearly, this isn’t the usual alphabetic ordering you are used to. Instead it is based on the numeric character codes (ASCII, for these letters), in which every uppercase letter comes before every lowercase letter.
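If you want a comparison that ignores case, one common approach is to lowercase both strings before comparing them. The sketch below assumes that is the behavior you want; the function name comesBefore is made up for illustration.

```scala
// Hypothetical helper: compare two strings while ignoring upper/lower case.
def comesBefore(a: String, b: String): Boolean =
  a.toLowerCase < b.toLowerCase

// Plain comparison: false, because 'a' (code 97) sorts after 'B' (code 66).
val plain = "armadillo" < "Bear"

// Case-insensitive comparison: true, because "armadillo" < "bear".
val ignoringCase = comesBefore("armadillo", "Bear")
```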

A very beautiful and useful thing about conditionals in Scala is that they return a value. So, the following is a valid way to set the values of the variables x and y.

scala> val x = if (true) 1 else 0
x: Int = 1

scala> val y = if (false) 1 else 0
y: Int = 0

Not so impressive here, but let’s return to the bartender, and rather than the serveBeer function printing a String, we can have it return a String representing a beverage, “beer” in the case of a 21+ year old and “water” otherwise.

scala> def serveBeer (customerAge: Int) = if (customerAge >= 21) "beer" else "water"
serveBeer: (customerAge: Int)java.lang.String

scala> serveBeer(42)
res21: java.lang.String = beer

scala> serveBeer(20)
res22: java.lang.String = water

Notice how the first serveBeer function returned Unit but this one returns a String. Unit means that no value is returned — in general this is to be discouraged for reasons we won’t get into here. Regardless of that, the general pattern of conditional assignment shown above is something you’ll be using a lot.
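As a small illustration of why a returned value is handy, the sketch below repeats the String-returning serveBeer and then uses its result as part of a larger expression. The variable name order is an assumption made for this example.

```scala
// Same definition as above: returns the beverage instead of printing it.
def serveBeer(customerAge: Int) = if (customerAge >= 21) "beer" else "water"

// Because a String comes back, it can be stored or combined with other values.
val order = "One " + serveBeer(42) + ", please."   // "One beer, please."
```

A version that only prints gives you nothing to build on like this.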

Conditionals can also have more than just the single if and else.  For example, let’s say that the bartender simply serves age appropriate drinks to each customer, and that 21+ get beer, teenagers get soda and little kids should get juice.

scala> def serveDrink (customerAge: Int) = {
|     if (customerAge >= 21) "beer"
|     else if (customerAge >= 13) "soda"
|     else "juice"
| }
serveDrink: (customerAge: Int)java.lang.String

scala> serveDrink(42)
res35: java.lang.String = beer

scala> serveDrink(16)
res36: java.lang.String = soda

scala> serveDrink(6)
res37: java.lang.String = juice

And of course, the Boolean expressions in any of the ifs or else ifs can be complex conjunctions and disjunctions of smaller expressions. Let’s consider a computational linguistics oriented example now that can take advantage of that, and which we will continue to build on in later tutorials.

Everybody (hopefully) knows what a part-of-speech is. (If not, go check out Grammar Rock on YouTube.) In computational linguistics, we tend to use very detailed tagsets that go far beyond “noun”, “verb”, “adjective” and so on. For example, the tagset from the Penn Treebank uses NN for singular nouns (table), NNS for plural nouns (tables), NNP for singular proper noun (John), and NNPS for plural proper noun (Vikings).

Here’s an annotated sentence with postags from the first sentence of the Wall Street Journal portion of the Penn Treebank, in the format word/postag.

The/DT index/NN of/IN the/DT 100/CD largest/JJS Nasdaq/NNP financial/JJ stocks/NNS rose/VBD modestly/RB as/IN well/RB ./.

We’ll see how to process these en masse shortly, but for now, let’s build a function that turns single tags like “NNP” into “NN” and “JJS” into “JJ”, using conditionals. We’ll let all the other postags stay as they are.

We’ll start with a suboptimal solution, and then refine it. The first thing you might try is to create a case for every full form tag and output its corresponding shortened tag.

scala> def shortenPos (tag: String) = {
|     if (tag == "NN") "NN"
|     else if (tag == "NNS") "NN"
|     else if (tag == "NNP") "NN"
|     else if (tag == "NNPS") "NN"
|     else if (tag == "JJ") "JJ"
|     else if (tag == "JJR") "JJ"
|     else if (tag == "JJS") "JJ"
|     else tag
| }
shortenPos: (tag: String)java.lang.String

scala> shortenPos("NNP")
res47: java.lang.String = NN

scala> shortenPos("JJS")
res48: java.lang.String = JJ

So, it’s doing the job, but there is a lot of redundancy — in particular, the return value is the same for many cases. We can use disjunctions to deal with this.

def shortenPos2 (tag: String) = {
  if (tag == "NN" || tag == "NNS" || tag == "NNP" || tag == "NNPS") "NN"
  else if (tag == "JJ" || tag == "JJR" || tag == "JJS") "JJ"
  else tag
}

These are logically equivalent.
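A quick way to convince yourself that two such definitions agree is to run both over the tags you care about. The sketch below repeats both functions so that it is self-contained. The names tags and allAgree are made up for illustration, and it uses List and forall, which have not been introduced in this tutorial, so treat it as a preview rather than something you need to understand fully yet.

```scala
// First version: one case per tag.
def shortenPos(tag: String) = {
  if (tag == "NN") "NN"
  else if (tag == "NNS") "NN"
  else if (tag == "NNP") "NN"
  else if (tag == "NNPS") "NN"
  else if (tag == "JJ") "JJ"
  else if (tag == "JJR") "JJ"
  else if (tag == "JJS") "JJ"
  else tag
}

// Second version: cases with the same result merged using ||.
def shortenPos2(tag: String) = {
  if (tag == "NN" || tag == "NNS" || tag == "NNP" || tag == "NNPS") "NN"
  else if (tag == "JJ" || tag == "JJR" || tag == "JJS") "JJ"
  else tag
}

// A hand-picked sample of tags, including two that should pass through unchanged.
val tags = List("NN", "NNS", "NNP", "NNPS", "JJ", "JJR", "JJS", "VBD", "DT")

// true only if both functions give the same answer for every tag in the list.
val allAgree = tags.forall(t => shortenPos(t) == shortenPos2(t))
```

A check like this is also a cheap way to catch a mistyped tag in one of the disjunctions.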

There is an easier way of doing this, using properties of Strings. Here, the startsWith method is very useful.

scala> "NNP".startsWith("NN")
res51: Boolean = true

scala> "NNP".startsWith("VB")
res52: Boolean = false

We can use this to simplify the postag shortening function.

def shortenPos3 (tag: String) = {
  if (tag.startsWith("NN")) "NN"
  else if (tag.startsWith("JJ")) "JJ"
  else tag
}

This makes it very easy to add an additional condition that collapses all of the verb tags to “VB”. (Left as an exercise.)

A final note on conditional assignments: they can return anything you like. For example, here is a (very) simple (and very imperfect) English stemmer that returns the stem and suffix as a pair.

scala> def splitWord (word: String) = {
|     if (word.endsWith("ing")) (word.slice(0,word.length-3), "ing")
|     else if (word.endsWith("ed")) (word.slice(0,word.length-2), "ed")
|     else if (word.endsWith("er")) (word.slice(0,word.length-2), "er")
|     else if (word.endsWith("s")) (word.slice(0,word.length-1), "s")
|     else (word,"")
| }
splitWord: (word: String)(String, java.lang.String)

scala> splitWord("walked")
res10: (String, java.lang.String) = (walk,ed)

scala> splitWord("walking")
res11: (String, java.lang.String) = (walk,ing)

scala> splitWord("booking")
res12: (String, java.lang.String) = (book,ing)

scala> splitWord("baking")
res13: (String, java.lang.String) = (bak,ing)

If we want to work with the stem and suffix directly as variables, we can assign them straight away.

scala> val (stem, suffix) = splitWord("walked")
stem: String = walk
suffix: java.lang.String = ed

Matching

Scala provides another very powerful way to encode conditional execution called matching. Match blocks have much in common with if-else blocks, but come with some nice extra features. We’ll go back to the postag shortener, starting with a full listing of the tags and what to do in each case, like our first attempt with if-else.

def shortenPosMatch (tag: String) = tag match {
  case "NN" => "NN"
  case "NNS" => "NN"
  case "NNP" => "NN"
  case "NNPS" => "NN"
  case "JJ" => "JJ"
  case "JJR" => "JJ"
  case "JJS" => "JJ"
  case _ => tag
}

scala> shortenPosMatch("JJR")
res14: java.lang.String = JJ

Note that the last case, with the underscore “_” is the default action to take, similar to the “else” at the end of an if-else block.

Compare this to the if-else function shortenPos from before, which had lots of repetition in its definition of the form “else if (tag == “. Match statements allow you to do the same thing, but much more concisely and arguably, much more clearly. Of course, we can shorten this up.

def shortenPosMatch2 (tag: String) = tag match {
  case "NN" | "NNS" | "NNP" | "NNPS" => "NN"
  case "JJ" | "JJR" | "JJS" => "JJ"
  case _ => tag
}

This is quite a bit more readable than the if-else shortenPos2 defined earlier.

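In addition to readability, match statements provide some logical protection. For example, if you accidentally have two cases that overlap, so that the later one can never be reached, the compiler will flag it. In the Scala version used here that is an error; newer versions may report it as a warning instead.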


scala> def shortenPosMatchOops (tag: String) = tag match {
|   case "NN" | "NNS" | "NNP" | "NNPS" => "NN"
|   case "JJ" | "JJR" | "JJS" => "JJ"
|   case "NN" => "oops"
|   case _ => tag
| }
<console>:10: error: unreachable code
case "NN" => "oops"

This is an obvious example, but with more complex match options, it can save you from bugs!

We cannot put a startsWith call directly into a case pattern the way we used it in the if-else shortenPos3, although a case can carry an if condition (a guard) that calls it. We can also use regular expressions very nicely with match statements, which we’ll get to in a later tutorial.
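One way to combine startsWith with a match block is a pattern guard, that is, an if condition attached to a case. The following is a sketch along those lines; the function name shortenPosMatch3 and the variable name t are chosen here for illustration.

```scala
// Each case binds the tag to a new variable t and then tests it with a guard.
def shortenPosMatch3(tag: String) = tag match {
  case t if t.startsWith("NN") => "NN"
  case t if t.startsWith("JJ") => "JJ"
  case _ => tag   // everything else is left as it is
}
```

For a check this simple, the if-else version is arguably just as clear. Guards tend to pay off when they are mixed with other kinds of patterns in the same match block.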

Where match statements really shine is that they can match on much more than just the value of simple variables like Strings and Ints. One use of matches is to check the types of the input to a function that can take a supertype of many types. Recall that Any is the supertype of all types; if we have the following function that takes an argument with any type, we can use matching to inspect what the type of the argument is and behave differently accordingly.

scala> def multitypeMatch (x: Any) = x match {
|    case i: Int => "an Int: " + i*i
|    case d: Double => "a Double: " + d/2
|    case b: Boolean => "a Boolean: " + !b
|    case s: String => "a String: " + s.length
|    case (p1: String, p2: Int) => "a Tuple[String, Int]: " + p2*p2 + p1.length
|    case (p1: Any, p2: Any) => "a Tuple[Any, Any]: (" + p1 + "," + p2 + ")"
|    case _ => "some other type " + x
| }
multitypeMatch: (x: Any)java.lang.String

scala> multitypeMatch(true)
res4: java.lang.String = a Boolean: false

scala> multitypeMatch(3)
res5: java.lang.String = an Int: 9

scala> multitypeMatch((1,3))
res6: java.lang.String = a Tuple[Any, Any]: (1,3)

scala> multitypeMatch(("hi",3))
res7: java.lang.String = a Tuple[String, Int]: 92

So, for example, if it is an Int, we can do things like multiplication, if it is a Boolean we can negate it (with !), and so on. In the case statement, we provide a new variable that will have the type that is matched, and then after the arrow =>, we can use that variable in a type safe manner. Later we’ll see how to create classes (and in particular case classes), where this sort of matching based function is used regularly.

In the meantime, here’s an example of a simple addition function that allows one to enter a String or Int to specify its arguments. For example, the behavior we desire is this:

scala> add(1,3)
res4: Int = 4

scala> add("one",3)
res5: Int = 4

scala> add(1,"three")
res6: Int = 4

scala> add("one","three")
res7: Int = 4

Let’s assume that we only handle the spelled out versions of 1 through 5, and that any string we cannot handle (e.g. “six” and “aardvark”) is considered to be 0. Then the following two functions using matches handle it.

def convertToInt (x: String) = x match {
  case "one" => 1
  case "two" => 2
  case "three" => 3
  case "four" => 4
  case "five" => 5
  case _ => 0
}

def add (x: Any, y: Any) = (x,y) match {
  case (x: Int, y: Int) => x + y
  case (x: String, y: Int) => convertToInt(x) + y
  case (x: Int, y: String) => x + convertToInt(y)
  case (x: String, y: String) => convertToInt(x) + convertToInt(y)
  case _ => 0
}

Like if-else blocks, matches can return whatever type you like, including Tuples, Lists and more.
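As a small sketch of a match that returns a Tuple, the function below pairs each shortened tag with a plain-English label. The function name describeTag and the labels are made up for illustration.

```scala
// Hypothetical helper: returns a (shortened tag, description) pair.
def describeTag(tag: String): (String, String) = tag match {
  case "NN" | "NNS" | "NNP" | "NNPS" => ("NN", "noun")
  case "JJ" | "JJR" | "JJS" => ("JJ", "adjective")
  case _ => (tag, "other")
}

// The pair can be unpacked straight into two variables.
val (coarse, description) = describeTag("NNS")   // coarse is "NN", description is "noun"
```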

Match blocks are used in many other useful contexts that we’ll come to later. In the meantime, it is also worth pointing out that matching is actually used in variable assignment. We’ve seen it already with Tuples, but it can be done with Lists and other types.

scala> val (x,y) = (1,2)
x: Int = 1
y: Int = 2

scala> val colors = List("blue","red","yellow")
colors: List[java.lang.String] = List(blue, red, yellow)

scala> val List(color1, color2, color3) = colors
color1: java.lang.String = blue
color2: java.lang.String = red
color3: java.lang.String = yellow
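One caveat: an assignment like the List one above assumes the list has exactly three elements. If it has a different length, the assignment fails at runtime with a MatchError. When the length is not known in advance, a match block with a default case is a safer sketch of the same idea. The function name describeColors is made up for illustration.

```scala
// Handles lists of any length instead of assuming exactly three elements.
def describeColors(colors: List[String]): String = colors match {
  // Exactly three elements; the underscores ignore the second and third.
  case List(first, _, _) => "three colors, starting with " + first
  // The empty list.
  case List() => "no colors"
  // Any other length.
  case _ => "some other number of colors: " + colors.length
}
```

For example, describeColors(List("blue", "red")) falls through to the last case rather than crashing.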

This is especially useful in the case of the args Array that comes from the command line when creating a script with Scala. For example, consider a program that is run as follows.

$ scala nextYear.scala John 35
Next year John will be 36 years old.

Here’s how we can do it. (Save the next two lines as nextYear.scala and try it out.)

val Array(name, age) = args
println("Next year " + name + " will be " + (age.toInt + 1) + " years old.")

Notice that we had to do age.toInt. That is because age itself is a String, not an Int.
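The two-line script assumes it is given exactly two arguments; with any other number, the Array assignment fails with a MatchError. A slightly more defensive sketch puts the same pattern inside a match block. The function name nextYearMessage and the usage text are assumptions made for this example.

```scala
// Builds the message from command-line style arguments, or a usage hint
// when the number of arguments is not exactly two.
def nextYearMessage(args: Array[String]): String = args match {
  case Array(name, age) =>
    // age is a String, so it still needs toInt; this throws an exception
    // if age is not a valid whole number.
    "Next year " + name + " will be " + (age.toInt + 1) + " years old."
  case _ =>
    "Usage: scala nextYear.scala <name> <age>"
}
```

In a script, you would then finish with println(nextYearMessage(args)).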

Conditional execution with if-else blocks and match blocks is a powerful part of building complex behaviors into your programs that you’ll see and use frequently!

Copyright 2011 Jason Baldridge

The text of this tutorial is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike License. Attribution may be provided by linking to http://www.jasonbaldridge.com and to this original tutorial.

Suggestions, improvements, extensions and bug fixes welcome — please email Jason at jasonbaldridge@gmail.com or provide a comment to this post.

6 comments
  1. Justin said:

    In the definition of shortenPos, there are two identical conditions:
    if (tag == “NNS”) “NN”
    else if (tag == “NNS”) “NN”
    I suspect the first condition should be:
    if (tag == “NN”) “NN”
    though in this (and later functions), this condition seems unnecessary, given:
    else tag

    In the definition of shortenPos2, there are two identical disjuncts in the antecedent of the conditional:
    tag == “NNP” || tag == “NNP”
    I believe the latter should be:
    tag == “NNPS”

    Or maybe I’ve misunderstood something?

    • Thanks for noticing that — it’s fixed.

      And yeah, the else tag thing makes it unnecessary, but this is just a wee example to show if-else blocks. :)

  2. I wouldn’t agree with the assertion:

    “We cannot use the startsWith method the same way we did with the if-else shortenPosMatch3. ”

    def shortenPosMatch3 (tag: String) = tag match {
      case x if x.startsWith("NN") => "NN"
      case x if x.startsWith("JJ") => "JJ"
      case _ => tag
    }

    works fine. Having said that I don’t know how to get rid of the ‘x if x.’ which does seem rather redundant in this case.

    There are some situations where conditionals in match statements work well, so I think it’s worth clarifying.
