First steps in Scala for beginning programmers, Part 7

Topics: Maps, Sets, groupBy, Options, flatten, flatMap

Preface

This is part 7 of tutorials for first-time programmers getting into Scala. Other posts are on this blog, and you can get links to those and other resources on the links page of the Computational Linguistics course I’m creating these for.

Lists (and other sequence data structures, like Ranges and Arrays) allow you to group collections of objects in an ordered manner: you can access elements of a list by indexing their position in the list, or iterate over the list elements, one by one, using for expressions and sequence functions like map, filter, reduce and fold. Another important kind of data structure is the associative array, which you’ll come to know in Scala as a Map. (Yes, this has the unfortunate ambiguity with the map function, but their use will be quite clear from context.) Maps allow you to store a collection of key-value pairs and to access the values by the keys associated with them, rather than via an index (as with a List).

Example cases where you could use a Map:

  • Associating English words with their German translations
  • Associating each word with its count in a given text
  • Associating each word with its possible parts-of-speech

You’ll see concrete examples of each of these in this post.

Creating Maps and accessing their elements

Maps are quite intuitive to grasp. Here’s an example with a few English words and their German translations. One easy way of creating a Map is by passing in a list of pairs, where the first element of each pair defines a key and the second defines a corresponding value.

scala> val engToDeu = Map(("dog","Hund"), ("cat","Katze"), ("rhinoceros","Nashorn"))
engToDeu: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map(dog -> Hund, cat -> Katze, rhinoceros -> Nashorn)

Notice that the Map entries are of the form key -> value. We may then retrieve the German translation for dog by providing the key “dog” to the Map we created.

scala> engToDeu("dog")
res0: java.lang.String = Hund

Think for a moment what you would have to do to accomplish this with Lists. You’d need two Lists, one for each language, and they’d need to be aligned so that each element in one list corresponded to its translation in the other list.

scala> val engWords = List("dog","cat","rhinoceros")
engWords: List[java.lang.String] = List(dog, cat, rhinoceros)

scala> val deuWords = List("Hund","Katze","Nashorn")
deuWords: List[java.lang.String] = List(Hund, Katze, Nashorn)

Then, to find the translation of cat, you would have to find the index of cat in engWords, and then look up that index in deuWords.

scala> engWords.indexOf("cat")
res2: Int = 1

scala> deuWords(engWords.indexOf("cat"))
res3: java.lang.String = Katze

This is actually quite inefficient, since indexOf has to scan the list from the start to find the word, as well as having other problems. Maps are the right thing for what we want here, and they do the job of retrieving values for keys quite efficiently.

It turns out that we can take two lists that are aligned in this way and construct a Map very easily. Recall that zipping two lists together creates one list of pairs, where each pair gives the elements that shared the same index.

scala> engWords.zip(deuWords)
res4: List[(java.lang.String, java.lang.String)] = List((dog,Hund), (cat,Katze), (rhinoceros,Nashorn))

By calling the toMap method on such a List of pairs, we get a Map.

scala> engWords.zip(deuWords).toMap
res5: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map(dog -> Hund, cat -> Katze, rhinoceros -> Nashorn)

Note that even though the REPL is showing the order of the key-value pairs to be the same as the original list we constructed the map from, there is no inherent order to the elements of a Map.

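You can add elements to a Map to create a new Map using the + operator and an arrow -> between each key and value pair. Note that the pairs must be wrapped in parentheses. Without them, as in the first example below, Scala reads the expression as (engToDeu + "owl") -> "Eule", which turns the Map into a String and gives back a (String, String) tuple rather than a Map. The second example shows the correct form.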


scala> engToDeu + "owl" -> "Eule"
res6: (java.lang.String, java.lang.String) = (Map(dog -> Hund, cat -> Katze, rhinoceros -> Nashorn)owl,Eule)

scala> engToDeu + ("owl" -> "Eule", "hippopotamus" -> "Nilpferd")
res7: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map(rhinoceros -> Nashorn, dog -> Hund, owl -> Eule, hippopotamus -> Nilpferd, cat -> Katze)

You can add one Map to another using the ++ operator.


scala> val newEntries = Map(("hippopotamus", "Nilpferd"),("owl","Eule"))
newEntries: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map(hippopotamus -> Nilpferd, owl -> Eule)

scala> val expandedEngToDeu = engToDeu ++ newEntries
expandedEngToDeu: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map(rhinoceros -> Nashorn, dog -> Hund, owl -> Eule, hippopotamus -> Nilpferd, cat -> Katze)

You can do the same by passing in a List of tuples to the ++ operator.


scala> engToDeu ++ List(("hippopotamus", "Nilpferd"),("owl","Eule"))
res8: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map(rhinoceros -> Nashorn, dog -> Hund, owl -> Eule, hippopotamus -> Nilpferd, cat -> Katze)

And you can remove a key from a Map with the - operator.


scala> engToDeu - "dog"
res9: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map(cat -> Katze, rhinoceros -> Nashorn)

See the Map API for more examples of such functions. Note: throughout this post, I’m sticking to immutable Maps — if you are looking at any other tutorials and are wondering why certain methods from those aren’t working here, they may have been using mutable Maps, which we’ll discuss later.
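Because these Maps are immutable, +, ++ and - never change the Map they are applied to; each one gives back a new Map. Here is a small sketch of that behavior (the names withOwl, withoutDog and formalDog are made up for illustration).

```scala
// A small immutable Map, as in the examples above.
val engToDeu = Map("dog" -> "Hund", "cat" -> "Katze", "rhinoceros" -> "Nashorn")

// Each operation builds a new Map; engToDeu itself is never modified.
val withOwl = engToDeu + ("owl" -> "Eule")
val withoutDog = engToDeu - "dog"

// Adding a key that is already present replaces its value in the new Map.
val formalDog = engToDeu + ("dog" -> "Haushund")

val originalSize = engToDeu.size    // still 3
val originalDog = engToDeu("dog")   // still "Hund"
```

If you want to keep the result of one of these operations, give it a name with val, as done here.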

If we ask for the value associated with a key that doesn’t exist in the Map, we get an error.

scala> engToDeu("bird")
java.util.NoSuchElementException: key not found: bird
at scala.collection.MapLike$class.default(MapLike.scala:224)
(etc.)

You can check for whether a key is in the Map using the contains method.

scala> engToDeu.contains("bird")
res10: Boolean = false

scala> engToDeu.contains("dog")
res11: Boolean = true

Let’s say you had a list of English words and wanted to look up their corresponding German words, and you want to protect yourself against the NoSuchElementException. One way to do this is to filter the words using contains, and then map the remaining ones through engToDeu.

scala> val wordsToTranslate = List("dog","bird","cat","armadillo")
wordsToTranslate: List[java.lang.String] = List(dog, bird, cat, armadillo)

scala> wordsToTranslate.filter(x=>engToDeu.contains(x)).map(x=>engToDeu(x))
res12: List[java.lang.String] = List(Hund, Katze)

This is a useful way of safely applying a Map to a list of items. However, we’ll see a better way to deal with missing values later on, using Options.

If there is a sensible default value for any key you might try with your map, you can use the getOrElse method. You provide the key as the first argument, and then the default value as the second.


scala> engToDeu.getOrElse("dog","???")
res1: java.lang.String = Hund

scala> engToDeu.getOrElse("armadillo","???")
res2: java.lang.String = ???

It is quite common to use getOrElse with a default of 0 for Maps that contain statistics, such as word counts (see below), where the absence of a key naturally indicates that it has, e.g., a count of zero.
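As a quick sketch of that idea, here is getOrElse with a default of 0 on a tiny, made-up Map of word counts (the name wordCounts and its contents are invented for this example).

```scala
// Hypothetical word counts for a tiny text.
val wordCounts = Map("wood" -> 4, "chuck" -> 3, "would" -> 1)

// A word that was never seen simply gets a count of 0.
val seenCount = wordCounts.getOrElse("wood", 0)      // 4
val unseenCount = wordCounts.getOrElse("beaver", 0)  // 0

// This makes it safe to total the counts for any list of words.
val total = List("wood", "beaver", "chuck").map(w => wordCounts.getOrElse(w, 0)).sum  // 7
```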

If you have a consistent default value for any keys that aren’t in the Map, you can set it by using the withDefault method.


scala> val engToDeu = Map(("dog","Hund"), ("cat","Katze"), ("rhinoceros","Nashorn")).withDefault(x => "???")
engToDeu: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map(dog -> Hund, cat -> Katze, rhinoceros -> Nashorn)

scala> engToDeu("armadillo")
res3: java.lang.String = ???

Now you can ask for values in the usual manner, without needing to use getOrElse and providing the default every time.
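One thing worth checking for yourself is that the default only affects direct lookups: it is not stored in the Map, so contains and size still reflect the real entries. The sketch below also shows that the function given to withDefault receives the missing key (echoUnknown is a made-up name for illustration).

```scala
val engToDeu = Map("dog" -> "Hund", "cat" -> "Katze").withDefault(x => "???")

val known = engToDeu("dog")           // "Hund"
val unknown = engToDeu("armadillo")   // "???" (the default)

// The default is not stored in the Map: the key is still absent.
val hasArmadillo = engToDeu.contains("armadillo")   // false
val numEntries = engToDeu.size                      // 2

// The default function receives the missing key, so it can make use of it.
val echoUnknown = Map("dog" -> "Hund").withDefault(x => "<" + x + ">")
val echoed = echoUnknown("armadillo")   // "<armadillo>"
```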

Keys and values in Maps

You may have observed that Scala tells you more than that you have just created a Map. Like List, Map is a parameterized type, which means that it is a generic way of collecting a bunch of objects of particular types together. Above we saw an instance of a Map[String, String] (leaving off the java.lang part to make it clearer). The first String indicates that the keys are strings and the second that values are Strings. Basically, any type can be used in either position (warning: you should avoid using mutable data structures as keys unless you know what you are doing). Here are some examples (try to ignore the scala.collection.immutable and java.lang parts and just focus on the Map[X,Y] signatures we get).

scala> Map((10,"ten"), (100,"one hundred"))
res0: scala.collection.immutable.Map[Int,java.lang.String] = Map(10 -> ten, 100 -> one hundred)

scala> Map(("a",1),("b",2))
res1: scala.collection.immutable.Map[java.lang.String,Int] = Map(a -> 1, b -> 2)

scala> Map((1,3.14), (2,6.28))
res2: scala.collection.immutable.Map[Int,Double] = Map(1 -> 3.14, 2 -> 6.28)

scala> Map((("pi",1),3.14), (("tau",2),6.28))
res3: scala.collection.immutable.Map[(java.lang.String, Int),Double] = Map((pi,1) -> 3.14, (tau,2) -> 6.28)

scala> Map(("the",List("Determiner")),("book",List("Verb","Noun")),("off",List("Preposition","Verb")))
res4: scala.collection.immutable.Map[java.lang.String,List[java.lang.String]] = Map(the -> List(Determiner), book -> List(Verb, Noun), off -> List(Preposition, Verb))

The last two examples show some very useful aspects of key and value types that allow you to use more complex keys and values. The former uses a (String, Int) pair as a key, with signature Map[(String, Int), Double], and the latter uses a List[String] as the value, with signature Map[String, List[String]]. So you can bundle together several types using tuples and you can use parameterized data structures to parameterize another data structure.
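Looking things up in such Maps works the same way as before. Here is a brief sketch with cut-down versions of the last two Maps (the names constants and tagDict are made up for illustration).

```scala
// A tuple as the key: look it up by passing the whole tuple.
val constants = Map(("pi", 1) -> 3.14, ("tau", 2) -> 6.28)
val tau = constants(("tau", 2))                        // 6.28

// A List as the value: the lookup gives back a List you can work with as usual.
val tagDict = Map("the" -> List("Determiner"), "book" -> List("Verb", "Noun"))
val bookTags = tagDict("book")                         // List(Verb, Noun)
val bookCanBeVerb = tagDict("book").contains("Verb")   // true
val numTheTags = tagDict("the").length                 // 1
```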

A simple translation task

Here is a mini German/English dictionary as a Map.

scala> val miniDictionary = Map(("befreit","liberated"),("baeche","brooks"),("eise","ice"),("sind","are"),("strom","river"),("und","and"),("vom","from"))
miniDictionary: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map(und -> and, eise -> ice, sind -> are, befreit -> liberated, strom -> river, vom -> from, baeche -> brooks)

We can provide a (very bad) translation of the German sentence “vom eise befreit sind strom und baeche” using this dictionary: we simply split the German sentence and then map over its elements, looking up each word in the dictionary.

scala> val example = "vom eise befreit sind strom und baeche"
example: java.lang.String = vom eise befreit sind strom und baeche

scala> example.split(" ").map(deuWord => miniDictionary(deuWord)).mkString(" ")
res0: String = from ice liberated are river and brooks

Okay, not quite “from the ice they are freed, the stream and brook” but then again it’s pretty much the dumbest machine translation approach available…

A danger of course is that we will have words that aren’t in the dictionary, leading to an exception.

scala> val example2 = "vom eise befreit sind strom und schiffe"
example2: java.lang.String = vom eise befreit sind strom und schiffe

scala> example2.split(" ").map(deuWord => miniDictionary(deuWord)).mkString(" ")
java.util.NoSuchElementException: key not found: schiffe

We’ll return to this below.

Creating Maps from Lists using groupBy

We frequently have data stored in a particular data structure and would like to work with it using another data structure that organizes the data points in some other manner. Here, we’ll look at how to convert a List into Map using the groupBy method in order to do some useful processing for working with parts-of-speech. We’ll also see the Set data structure along the way.

We’ll start with a very basic example of what groupBy does. Given a list of number tokens, we can obtain a Map from the number types to all of the tokens of each number.

scala> val numbers = List(1,4,5,1,6,5,2,8,1,9,2,1)
numbers: List[Int] = List(1, 4, 5, 1, 6, 5, 2, 8, 1, 9, 2, 1)

scala> numbers.groupBy(x=>x)
res19: scala.collection.immutable.Map[Int,List[Int]] = Map(5 -> List(5, 5), 1 -> List(1, 1, 1, 1), 6 -> List(6), 9 -> List(9), 2 -> List(2, 2), 8 -> List(8), 4 -> List(4))

As you can see from the result, groupBy took the anonymous function x=>x, grouped all of the elements of the List that have the same value of x, and then created a Map from each x to the group containing its tokens. So, we get 2 mapping to a List containing 2s, and so on. This probably seems a bit weird, but it is incredibly useful when we consider Lists that have more interesting elements in them. To see this, let’s go back to the part-of-speech tagging example from Part 4 of these tutorials. Say we have a sentence that is tagged with parts of speech, such as the following (made up) example that ensures some tag ambiguities.

in the dark , a tall man saw the saw that he needed to man to cut the dark tree .

The parts-of-speech could be annotated as follows (with lots of simplifications, and apologies for any offense caused to anyone’s linguistic sensitivities).

in/Prep the/Det dark/Noun ,/Punc a/Det tall/Adjective man/Noun saw/Verb the/Det saw/Noun that/Pronoun he/Pronoun needed/Verb to/Prep man/Verb to/Prep cut/Verb the/Det dark/Adjective tree/Noun ./Punc

See Part 4 for detailed explanation of how the following expression turns a string like this into a List of tuples.

scala> val tagged = "in/Prep the/Det dark/Noun ,/Punc a/Det tall/Adjective man/Noun saw/Verb the/Det saw/Noun that/Pronoun he/Pronoun needed/Verb to/Prep man/Verb to/Prep cut/Verb the/Det dark/Adjective tree/Noun ./Punc".split(" ").toList.map(x => x.split("/")).map(x => (x(0), x(1)))
tagged: List[(java.lang.String, java.lang.String)] = List((in,Prep), (the,Det), (dark,Noun), (,,Punc), (a,Det), (tall,Adjective), (man,Noun), (saw,Verb), (the,Det), (saw,Noun), (that,Pronoun), (he,Pronoun), (needed,Verb), (to,Prep), (man,Verb), (to,Prep), (cut,Verb), (the,Det), (dark,Adjective), (tree,Noun), (.,Punc))

Now, let’s use groupBy in various ways on this. The first thing we might be interested in is seeing which parts of speech each word is associated with.

scala> val groupedTagged = tagged.groupBy(x => x._1)
groupedTagged: scala.collection.immutable.Map[java.lang.String,List[(java.lang.String, java.lang.String)]] = Map(in -> List((in,Prep)), needed -> List((needed,Verb)), . -> List((.,Punc)), cut -> List((cut,Verb)), saw -> List((saw,Verb), (saw,Noun)), a -> List((a,Det)), man -> List((man,Noun), (man,Verb)), that -> List((that,Pronoun)), dark -> List((dark,Noun), (dark,Adjective)), to -> List((to,Prep), (to,Prep)), , -> List((,,Punc)), tall -> List((tall,Adjective)), he -> List((he,Pronoun)), tree -> List((tree,Noun)), the -> List((the,Det), (the,Det), (the,Det)))

So, now you see that the keys in the Map constructed by groupBy are the words and the values are the groups of the original elements. You can then see that the anonymous function x => x._1 provided to groupBy does two things: it specifies the part of the input elements that will group different items together and it specifies that that part of the input defines the key space.

However, we don’t quite have what we want, which is to have the set of parts of speech associated with each word. Instead we have a List of tuples, e.g.:

scala> groupedTagged("saw")
res21: List[(java.lang.String, java.lang.String)] = List((saw,Verb), (saw,Noun))

Focussing on just this for a moment, we can map this and produce a List with just the parts-of-speech, and then turn that List into a Set with the toSet method in order to get just the unique parts-of-speech.

scala> groupedTagged("saw").map(x=>x._2)
res24: List[java.lang.String] = List(Verb, Noun)

scala> groupedTagged("saw").map(x=>x._2).toSet
res25: scala.collection.immutable.Set[java.lang.String] = Set(Verb, Noun)

Converting the List to a Set didn’t do much here, but consider “the”, which has multiple tokens with the same part-of-speech.

scala> groupedTagged("the")
res26: List[(java.lang.String, java.lang.String)] = List((the,Det), (the,Det), (the,Det))

scala> groupedTagged("the").map(x=>x._2)
res27: List[java.lang.String] = List(Det, Det, Det)

scala> groupedTagged("the").map(x=>x._2).toSet
res28: scala.collection.immutable.Set[java.lang.String] = Set(Det)

Sets are yet another of the useful data structures you have to work with, along with Maps and Lists. They work just like you would expect Sets to: they contain a collection of unique, unordered elements, and they allow you to see whether an element is in the set, whether one set is a subset of another, iterate over their elements, etc.
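Here is a short sketch of those Set operations, using tag Sets like the ones we just built (the variable names are made up for illustration).

```scala
val sawTags = Set("Verb", "Noun")

// Duplicates vanish and order is irrelevant when comparing Sets.
val manTags = List("Noun", "Verb", "Noun").toSet
val sameTags = sawTags == manTags                  // true

// Membership: use contains, or simply apply the Set to the element.
val sawIsVerb = sawTags.contains("Verb")           // true
val sawIsDet = sawTags("Det")                      // false

// Subset test, union and intersection.
val nounOnly = Set("Noun").subsetOf(sawTags)       // true
val allTags = sawTags ++ Set("Det")                // contains Verb, Noun and Det
val shared = sawTags & Set("Noun", "Adjective")    // Set(Noun)
```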

Now, back to getting from the word/tag pairs to a mapping from words to possible tags for each word. The keys we got from tagged.groupBy(x => x._1) are what we want, but we want to transform the values from Lists of word/tag tokens to Sets of tags, which we can do with the mapValues method on Maps.

scala> val wordsToTags = tagged.groupBy(x => x._1).mapValues(listOfWordTagPairs => listOfWordTagPairs.map(wordTagPair => wordTagPair._2).toSet)
wordsToTags: scala.collection.immutable.Map[java.lang.String,scala.collection.immutable.Set[java.lang.String]] = Map(in -> Set(Prep), needed -> Set(Verb), . -> Set(Punc), cut -> Set(Verb), saw -> Set(Verb, Noun), a -> Set(Det), man -> Set(Noun, Verb), that -> Set(Pronoun), dark -> Set(Noun, Adjective), to -> Set(Prep), , -> Set(Punc), tall -> Set(Adjective), he -> Set(Pronoun), tree -> Set(Noun), the -> Set(Det))

The bit inside the mapValues(…) part will have some readers scrunching up their eyes, but you just need to look at the line where we got res28 above: if you understood that, then you just need to realize we are doing exactly the same thing, but now in the context of mapping over the values rather than dealing with a single value. Now you know how to map over values that you are mapping over.
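If it helps, the same transformation can be spelled out in two named steps, with an ordinary map over the key-value pairs standing in for mapValues. This is only a sketch on a cut-down version of tagged (the shorter list is for illustration). The spelling with map is also handy in Scala 2.13 and later, where, as far as I know, mapValues on a Map is deprecated in favor of .view.mapValues(...).toMap.

```scala
// A cut-down version of the tagged sentence, for illustration.
val tagged = List(("the","Det"), ("saw","Verb"), ("the","Det"), ("saw","Noun"), ("man","Noun"))

// Step 1: group the pairs by word.
val grouped = tagged.groupBy(x => x._1)

// Step 2: keep each key, and turn each List of (word, tag) pairs into a Set of tags.
val wordsToTags = grouped.map { case (word, pairs) => (word, pairs.map(p => p._2).toSet) }

val sawTags = wordsToTags("saw")   // Set(Verb, Noun)
val theTags = wordsToTags("the")   // Set(Det)
```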

Now that we have wordsToTags in hand, we can easily query it to see whether various words have various tags.

scala> wordsToTags("man")("Noun")
res8: Boolean = true

scala> wordsToTags("man")("Det")
res9: Boolean = false

scala> wordsToTags("man")("Verb")
res10: Boolean = true

scala> wordsToTags("saw")("Verb")
res11: Boolean = true

This is an example of how data structures within data structures (here Sets within a Map) are quite useful. (Exercise: think about what a tree is for a moment and how you might implement it using Lists.)

There are a variety of things you can do in computational linguistics with Maps from words to their parts-of-speech. A simple example is to compute the average number of tags per word type.

scala> val avgTagsPerType = wordsToTags.values.map(x=>x.size).sum/wordsToTags.size.toDouble
avgTagsPerType: Double = 1.2

If it isn’t clear to you what is going on here, tease it apart in your own REPL!
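As a starting point, here is one way to tease it apart, shown on a smaller, made-up wordsToTags so the numbers are easy to check by hand.

```scala
// A smaller stand-in for wordsToTags, for illustration.
val wordsToTags = Map(
  "saw" -> Set("Verb", "Noun"),
  "man" -> Set("Noun", "Verb"),
  "the" -> Set("Det"),
  "tree" -> Set("Noun"),
  "tall" -> Set("Adjective"))

// Number of tags for each word type (values is an Iterable, so repeated sizes are kept).
val tagCounts = wordsToTags.values.map(x => x.size)

// Total number of (word type, tag) combinations: 2 + 2 + 1 + 1 + 1.
val totalTags = tagCounts.sum                        // 7

// Number of word types.
val numTypes = wordsToTags.size                      // 5

// toDouble forces floating-point division; dividing two Ints would truncate.
val avgTagsPerType = totalTags / numTypes.toDouble   // 1.4
val truncated = totalTags / numTypes                 // 1
```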

We can turn our word/tag pairs the other way to find out which words go with each part-of-speech. The only thing we need to do is groupBy on the second element of each pair, and then map the List values to their first element and get a Set from those.

scala> val tagsToWords = tagged.groupBy(x => x._2).mapValues(listOfWordTagPairs => listOfWordTagPairs.map(wordTagPair => wordTagPair._1).toSet)
tagsToWords: scala.collection.immutable.Map[java.lang.String,scala.collection.immutable.Set[java.lang.String]] = Map(Prep -> Set(in, to), Det -> Set(the, a), Noun -> Set(dark, man, saw, tree), Pronoun -> Set(that, he), Verb -> Set(saw, needed, man, cut), Punc -> Set(,, .), Adjective -> Set(tall, dark))

This basic paradigm is a powerful one for flipping between different data structures depending on what our needs are. It also demonstrates several important concepts with working with Lists, Maps and Sets. The next section shows a simple application of this idea for counting words in a text.

Counting words

A common task in computational linguistics is to calculate word statistics, and the most basic of those is to count the number of tokens of each word type in a particular text. The most common way to store and access those counts is in a Map, but how do you create such a Map from a given text? If we look at a text as a list of strings, then the groupBy paradigm we did above gives us exactly what we need — in fact it is even simpler than the word/tag manipulations done above.

The example text we’ll use is the tongue-twister about woodchucks.

scala> val woodchuck = "how much wood could a woodchuck chuck if a woodchuck could chuck wood ? as much wood as a woodchuck would , if a woodchuck could chuck wood ."
woodchuck: java.lang.String = how much wood could a woodchuck chuck if a woodchuck could chuck wood ? as much wood as a woodchuck would , if a woodchuck could chuck wood .

Given this, here’s how we can compute the number of occurrences of each word type. First we groupBy on the elements. Though a list of strings isn’t as interesting as having a list of Tuples as we had with words and tags, it still produces a useful result: we now have a unique set of keys corresponding to the types of elements found in the Array, and there is a corresponding value to each one that is the Array of tokens of that type.

scala> woodchuck.split(" ").groupBy(x=>x)
res29: scala.collection.immutable.Map[java.lang.String,Array[java.lang.String]] = Map(woodchuck -> Array(woodchuck, woodchuck, woodchuck, woodchuck), chuck -> Array(chuck, chuck, chuck), . -> Array(.), would -> Array(would), if -> Array(if, if), a -> Array(a, a, a, a), as -> Array(as, as), , -> Array(,), how -> Array(how), much -> Array(much, much), wood -> Array(wood, wood, wood, wood), ? -> Array(?), could -> Array(could, could, could))

And, we want to do something much simpler than what we did with the part-of-speech example: we just need to count the length of each Array, since each one contains every token of the corresponding word type. The function passed to mapValues is thus quite a bit simpler than the ones given in the previous section.

scala> val counts = woodchuck.split(" ").groupBy(x=>x).mapValues(x=>x.length)
counts: scala.collection.immutable.Map[java.lang.String,Int] = Map(woodchuck -> 4, chuck -> 3, . -> 1, would -> 1, if -> 2, a -> 4, as -> 2, , -> 1, how -> 1, much -> 2, wood -> 4, ? -> 1, could -> 3)

With counts, we can now access the frequencies of any of the words that were in the text.

scala> counts("woodchuck")
res5: Int = 4

scala> counts("could")
res6: Int = 3

Easy!  Of course, we normally want to build word counts for texts that are longer and are stored in a file rather than explicitly added to Scala code. The next tutorial will demonstrate how to do that.
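In the meantime, one possible way to make the counting steps reusable is to wrap them in a function. This is just a sketch: countWords is a name made up here, it assumes the text is already separated by single spaces, and it uses map over the key-value pairs in place of mapValues.

```scala
// Hypothetical helper: count the tokens of each word type in a space-separated text.
def countWords(text: String): Map[String, Int] =
  text.split(" ").groupBy(x => x).map { case (word, tokens) => (word, tokens.length) }

val counts = countWords("a woodchuck could chuck wood if a woodchuck could")

val aCount = counts("a")                          // 2
val chuckCount = counts("chuck")                  // 1
val beaverCount = counts.getOrElse("beaver", 0)   // 0, since "beaver" never occurs
```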

Iterating over the keys and values in a Map

The material above shows some useful aspects of Maps, but of course there is much more you can do with them, often requiring iterating through the key-value pairs in the Map. We’ll use the counts Map created above for demonstrating this.

You can access just the keys, or just the values.

scala> counts.keys
res0: Iterable[java.lang.String] = Set(woodchuck, chuck, ., would, if, a, as, ,, how, much, wood, ?, could)

scala> counts.values
res1: Iterable[Int] = MapLike(4, 3, 1, 1, 2, 4, 2, 1, 1, 2, 4, 1, 3)

Notice that these are both Iterable data structures, so we can do all of the usual mapping, filtering, and so on, that we have already done with lists. (You may convert them to Lists if you like using toList, of course.)
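Here are a few such operations as a sketch, on a cut-down counts Map (the smaller Map is for illustration). Note the toList before mapping over the keys: the keys behave like a Set, so mapping them directly could collapse equal results into one.

```scala
val counts = Map("woodchuck" -> 4, "chuck" -> 3, "would" -> 1, "if" -> 2)

// Total number of tokens: add up all of the values.
val numTokens = counts.values.sum      // 10

// The highest count of any word.
val maxCount = counts.values.max       // 4

// Words occurring more than once, sorted so the order is predictable.
val frequent = counts.keys.filter(k => counts(k) > 1).toList.sorted   // List(chuck, if, woodchuck)

// Lengths of the keys; both "chuck" and "would" have length 5, and both are kept.
val keyLengths = counts.keys.toList.map(k => k.length).sorted         // List(2, 5, 5, 9)
```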

You can print out all of the key -> value pairs in the Map in a number of ways. One is to use a for expression.

scala> for ((k,v) <- counts) println(k + " -> " + v)
woodchuck -> 4
chuck -> 3
. -> 1
would -> 1
if -> 2
a -> 4
as -> 2
, -> 1
how -> 1
much -> 2
wood -> 4
? -> 1
could -> 3

And here are other ways to achieve the same result (output omitted since it is the same).

for (k <- counts.keys) println(k + " -> " + counts(k))
counts.map(kvPair => kvPair._1 + " -> " + kvPair._2).foreach(println)
counts.keys.map(k => k + " -> " + counts(k)).foreach(println)
counts.foreach { case(k,v) => println(k + " -> " + v) }
counts.foreach(kvPair => println(kvPair._1 + " -> " + kvPair._2))

And so on. Basically, you are able to step through the Map one key-value pair at a time, or you can grab the set of keys and then step through those and access the values from the map. Which form you use depends on what you need: for example, foreach and a plain for expression are used only for their effects (such as printing) and don’t return a useful value, whereas map expressions (and for expressions that use yield) return a new collection. Why would you want a value back? Well, as an example, consider grouping all words that have occurred the same number of times.

scala> val countsToWords = counts.keys.toList.map(k => (counts(k),k)).groupBy(x=>x._1).mapValues(x=>x.map(y=>y._2))
countsToWords: scala.collection.immutable.Map[Int,List[java.lang.String]] = Map(3 -> List(chuck, could), 4 -> List(woodchuck, a, wood), 1 -> List(., would, ,, how, ?), 2 -> List(if, as, much))

We go from a Map to a Set of its keys to a List of those keys to a List of Tuples of the values and the keys to a Map from the values of the original Map to such Tuples, and then we map the values of the new map to just contain the words (the original keys). (That’s a mouthful, so try each step in the REPL to see what is going on in detail.)
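For reference, here are those steps one at a time, as a sketch on a cut-down counts Map (the intermediate names are made up, and the last step uses map over the key-value pairs, which has the same effect as the mapValues call above).

```scala
val counts = Map("chuck" -> 3, "could" -> 3, "would" -> 1, "if" -> 2)

// Map -> List of its keys.
val words = counts.keys.toList

// List of keys -> List of (count, word) Tuples.
val countWordPairs = words.map(k => (counts(k), k))

// Group the Tuples by their count.
val grouped = countWordPairs.groupBy(x => x._1)

// Keep only the words in each group.
val countsToWords = grouped.map { case (count, pairs) => (count, pairs.map(y => y._2)) }

val threes = countsToWords(3).sorted   // List(chuck, could)
val ones = countsToWords(1)            // List(would)
```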

Now we can output countsToWords sorted in descending numerical order by count, and then by alphabetical order by word within each count.

scala> countsToWords.keys.toList.sorted.reverse.foreach(x => println(x + ": " + countsToWords(x).sorted.mkString(",")))
4: a,wood,woodchuck
3: chuck,could
2: as,if,much
1: ,,.,?,how,would

Options and flatMapping for dealing with missing keys

I pointed out toward the start of this tutorial that we run into trouble if we ask for a key that doesn’t exist in a Map. Let’s go back to the engToDeu Map we began with.

scala> val engToDeu = Map(("dog","Hund"), ("cat","Katze"), ("rhinoceros","Nashorn"))
engToDeu: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map(dog -> Hund, cat -> Katze, rhinoceros -> Nashorn)

scala> engToDeu("dog")
res0: java.lang.String = Hund

scala> engToDeu("bird")
java.util.NoSuchElementException: key not found: bird

There is another way of accessing the elements of a Map, using the get method.

scala> engToDeu.get("dog")
res2: Option[java.lang.String] = Some(Hund)

scala> engToDeu.get("bird")
res3: Option[java.lang.String] = None

Now, the return value is an Option[String]. An Option is either a Some that contains a value or a None, which means there is no value. If you want to get the value out of a Some, you use the get method on Options.

scala> val dogTrans = engToDeu.get("dog")
dogTrans: Option[java.lang.String] = Some(Hund)

scala> dogTrans.get
res4: java.lang.String = Hund

If you just use get on a Map to obtain an Option and then immediately call get on the Option, we get the same behavior we had before.

scala> engToDeu.get("dog").get
res6: java.lang.String = Hund

scala> engToDeu.get("bird").get
java.util.NoSuchElementException: None.get

So, at this point, you are probably thinking that this sounds like a waste of time that is just making things more complex. Wait! It actually is tremendously useful because of pattern matching and the way many methods on sequences work.

First, here is how you can write a protected form of translating the words in a list without getting an exception.

scala> wordsToTranslate.foreach { x => engToDeu.get(x) match {
|   case Some(y) => println(x + " -> " + y)
|   case None =>
| }}
dog -> Hund
cat -> Katze
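The same kind of match can also produce a value instead of printing. As a sketch, here is a made-up helper called translate that returns a placeholder for unknown words.

```scala
val engToDeu = Map("dog" -> "Hund", "cat" -> "Katze", "rhinoceros" -> "Nashorn")

// Hypothetical helper: return the translation, or a placeholder when there is none.
def translate(word: String): String = engToDeu.get(word) match {
  case Some(deuWord) => deuWord
  case None => "???"
}

val translations = List("dog", "bird", "cat").map(w => translate(w))
// List(Hund, ???, Katze)
```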

I know… this probably still isn’t convincing — it still looks more involved than the conditional we used (far) above to check whether engToDeu contained a given key (at least for this particular example). Hold on… because now we are just about ready for things to get simpler, and learn some useful things about Lists in doing so.

First, you should know about a great method on Lists called flatten. If you have a List of Lists of Strings, you can use flatten to get a single List of Strings. Consider the following example, in which we flatten a List of Lists of Strings and make a single String out of the result with mkString. Notice that the empty List in the third spot of the main List just disappears when we flatten it.

scala> val sentences = List(List("Here","is","sentence","one","."),List("The","third","sentence","is","empty","!"),List(),List("Lastly",",","we","have","a","final","sentence","."))
sentences: List[List[java.lang.String]] = List(List(Here, is, sentence, one, .), List(The, third, sentence, is, empty, !), List(), List(Lastly, ,, we, have, a, final, sentence, .))

scala> sentences.flatten
res0: List[java.lang.String] = List(Here, is, sentence, one, ., The, third, sentence, is, empty, !, Lastly, ,, we, have, a, final, sentence, .)

scala> sentences.flatten.mkString(" ")
res1: String = Here is sentence one . The third sentence is empty ! Lastly , we have a final sentence .

Flattening in general is pretty useful in its own right. Where it comes into play with Option values is that Options can be thought of as Lists: Somes are like one-element Lists and Nones are like empty Lists. So, when you have a List of Options, the flatten method gives you the value in each Some and any Nones just drop away.

scala> wordsToTranslate.map(x => engToDeu.get(x))
res12: List[Option[java.lang.String]] = List(Some(Hund), None, Some(Katze), None)

scala> wordsToTranslate.map(x => engToDeu.get(x)).flatten
res13: List[java.lang.String] = List(Hund, Katze)

This is such a generally useful paradigm that there is a function flatMap which does exactly this.

scala> wordsToTranslate.flatMap(x => engToDeu.get(x))
res14: List[java.lang.String] = List(Hund, Katze)
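To make the comparison between Options and Lists concrete, here is a small sketch showing that an Option converts to a List of zero or one elements, and that flatMap gives the same result as map followed by flatten on a List of Lists too (all names here are made up for illustration).

```scala
// An Option converts to a List with zero or one element.
val found: Option[String] = Some("Hund")
val missing: Option[String] = None
val oneElement = found.toList      // List(Hund)
val noElements = missing.toList    // List()

// flatMap is map followed by flatten, for Lists of Lists as well as Lists of Options.
val phrases = List("vom eise", "befreit sind", "strom und baeche")
val nested = phrases.map(p => p.split(" ").toList)   // a List of Lists
val viaFlatten = nested.flatten
val viaFlatMap = phrases.flatMap(p => p.split(" ").toList)
val same = viaFlatten == viaFlatMap                  // true
```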

So, returning to the translation example above, we can now safely skip on by “schiffe” without fuss.

scala> example2.split(" ").flatMap(deuWord => miniDictionary.get(deuWord)).mkString(" ")
res15: String = from ice liberated are river and

Whether this is the desired behavior in this particular case is another question (e.g. you really should be doing some special unknown word handling). Nonetheless, you’ll find that flatMap is quite handy in general for this sort of pattern, in which a list of elements is used to retrieve values from a Map that will be missing some of those values.
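As one very simple form of unknown word handling, you could keep the untranslated word and mark it, using getOrElse with a default built from the word itself. This is only a sketch; the square-bracket marking is an arbitrary choice made for this example.

```scala
val miniDictionary = Map("befreit" -> "liberated", "eise" -> "ice", "sind" -> "are",
  "strom" -> "river", "und" -> "and", "vom" -> "from")

val example2 = "vom eise befreit sind strom und schiffe"

// Unknown words are kept, wrapped in brackets, rather than silently dropped.
val marked = example2.split(" ").map(deuWord => miniDictionary.getOrElse(deuWord, "[" + deuWord + "]")).mkString(" ")
// from ice liberated are river and [schiffe]
```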

An example of the further use of Options and flatMap is that you also may create functions that return Options and are thus amenable to flatMapping. Consider a function that squares only odd numbers and throws evens away (note: the % operator is the modulo operator that finds the remainder of division of one number by another — try it in the REPL).


scala> def squareOddNumber (x: Int) = if (x % 2 != 0) Some(x*x) else None
squareOddNumber: (x: Int)Option[Int]

If you map over the numbers 1 to 10, you’ll see the Somes and Nones, and if you flatMap it, you get exactly the desired result of the squares of all the odd numbers without any pollution from the evens.

scala> (1 to 10).toList.map(x=>squareOddNumber(x))
res16: List[Option[Int]] = List(Some(1), None, Some(9), None, Some(25), None, Some(49), None, Some(81), None)

scala> (1 to 10).toList.flatMap(x=>squareOddNumber(x))
res17: List[Int] = List(1, 9, 25, 49, 81)
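Another function of this kind, sketched here under the made-up name parseInt, turns a String into an Int when it can and gives None when it cannot. With flatMap, the unparseable Strings just drop away.

```scala
// Hypothetical helper: parse a String as an Int, giving None when it is not a number.
def parseInt(s: String): Option[Int] =
  try { Some(s.toInt) } catch { case e: NumberFormatException => None }

val parsed = List("1", "two", "3", "").flatMap(s => parseInt(s))   // List(1, 3)
```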

This turns out to be amazingly useful and common, so much so that the expression “just flatMap that shit” has become a common refrain among Scala programmers. Scala programmers even write scripts to remind them to do it. :)

Copyright 2011 Jason Baldridge

The text of this tutorial is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike License. Attribution may be provided by linking to http://www.jasonbaldridge.com and to this original tutorial.

Suggestions, improvements, extensions and bug fixes welcome — please email Jason at jasonbaldridge@gmail.com or provide a comment to this post.
