First steps in Scala for beginning programmers, Part 11

Topics: SBT, scalabha, packages, build systems

Preface

This is part 11 of tutorials for first-time programmers getting into Scala. Other posts are on this blog, and you can get links to those and other resources on the links page of the Computational Linguistics course I’m creating these for.

This tutorial gives an introduction to building Scala applications using SBT (the Simple Build Tool). This will be done in the context of the Scalabha package, which I created primarily for my Introduction to Computational Linguistics class. Scalabha includes supporting code for some basic natural language processing tasks; most relevant at the moment is the code that supports the part-of-speech tagging homework for the class.

The previous tutorial showed how Scala code can be compiled with scalac and then run with scala. One problem we ended up with is that there were generated class files littering the working directory. Another thing we did not discuss is how a large system can be created in a modular way that organizes code and classes. For example, you might want to have code in different directories generate classes that can be used by one another. You may also want to incorporate classes from other libraries into your own code. The solutions we’ll discuss to address these needs and more are build systems and packages.

Note: The tutorial assumes you are using some version of Unix. If you are on Windows, you should consider using Cygwin, or you could dual boot your computer.

Note: In this tutorial, I’ll assume you are using a simple text editor to modify files. However, note that the general setup you are working with here can be used from more powerful Integrated Development Environments (IDEs) like Eclipse, IntelliJ, and NetBeans.

Setting up Scalabha

We’ll work with SBT, which is perhaps the most popular build tool for Scala.  The Scalabha toolkit mentioned earlier uses SBT (version 0.11.0), so we’ll discuss SBT in the Scalabha context.

The first thing you need to do is download Scalabha v0.1.1. Next, unzip the file, change to the directory it unpacked to, and list the directory contents.

$ unzip scalabha-0.1.1-src.zip
Archive:  scalabha-0.1.1-src.zip
<lots of output>
$ cd scalabha-0.1.1
$ ls
CHANGES.txt README      build.sbt   project
LICENSE     bin         data        src

Briefly, these contents are:

  • README: A text file describing how to install Scalabha on your machine.
  • LICENSE: A text file giving the license, which is the Apache Software License 2.0.
  • CHANGES.txt: A text file describing the modifications made for each version (not much so far).
  • build.sbt: A text file that contains instructions for SBT regarding how to build Scalabha
  • bin: A directory that contains the scalabha script, which will be used to run applications developed within the Scalabha build system and also to run SBT itself. It also contains sbt-launch-0.11.0.jar, which is a bottled up package of SBT’s classes that will allow us to use SBT very easily. There are some other files that are Perl scripts that are relevant for a research project and aren’t important here.
  • data: A directory containing part-of-speech tagged data for English and Czech that forms the basis for the fourth homework of my Introduction to Computational Linguistics course this semester.
  • project: A directory containing a single file “plugins.sbt” which tells SBT to use the Assembly plugin. More on this later.
  • src: The most important directory of all — it contains the source code of the Scalabha system, and is where you’ll be adding some code as you work with SBT.

At this point you should read the README and get Scalabha set up on your computer, including building the system from source. In this tutorial, I will give some extra details on using SBT and code development with it, complementing and extending the brief information given in the README.

Note that I will refer to the environment variable SCALABHA_DIR below. As specified in the README, you should set this variable’s value to be where you unpacked Scalabha. For example, for me this directory is ~/devel/scalabha.

Tip: so that you don’t have to set your environment variables every time you open a new shell, you can set them in your ~/.profile (Mac, Cygwin) or ~/.bash_aliases (Ubuntu) file. For example, this is what I have in the profile files on my machines.

export SCALABHA_DIR=$HOME/devel/scalabha
export PATH=$PATH:$SCALABHA_DIR/bin
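
If the scalabha command is not found later on, one of these two variables is a likely culprit. As a rough sketch (EnvCheck and its problem method are made-up names for illustration and are not part of Scalabha), the following Scala program checks that SCALABHA_DIR is set and that its bin directory appears on the PATH. It compares strings exactly, so it assumes SCALABHA_DIR has no trailing slash.

```scala
import java.io.File

// Hypothetical helper, not part of Scalabha: checks the two variables set above.
object EnvCheck {

  // Returns a description of the first problem found, or None if all looks fine.
  def problem(env: Map[String, String]): Option[String] = {
    env.get("SCALABHA_DIR") match {
      case None => Some("SCALABHA_DIR is not set")
      case Some(dir) =>
        val binDir = dir + File.separator + "bin"
        val pathEntries = env.getOrElse("PATH", "").split(File.pathSeparator).toList
        if (pathEntries.contains(binDir)) None
        else Some(binDir + " is not on the PATH")
    }
  }

  def main(args: Array[String]): Unit = {
    // sys.env holds the environment variables of the current process.
    println(problem(sys.env).getOrElse("Environment looks fine."))
  }
}
```

Passing the environment in as a Map, rather than reading sys.env inside the method, makes it easy to try the check on made-up values.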

SBT: The Simple Build Tool

This is not a tutorial about setting up a project to use SBT — it is simply about how to use a project that is already set up for SBT. So, if you are looking for resources about learning SBT, what you’ll mainly find are resources to help programmers configure SBT for their project. These will likely confuse you (the Simple Build Tool is not so simple any more, when it comes to configuration). Using it is straightforward, but the kind of know-how that experienced coders have with using something like SBT is what you probably won’t find much help on. Here, I intend to give the basics so that you have a better starting point for doing more with SBT.

First off, there is a bit of sleight of hand with Scalabha that could be confusing. Rather than having users install SBT themselves, I have put the jar file for SBT in the bin directory of Scalabha; then, the scalabha executable (in that same directory) can pick that up and use it to run SBT. (My students and I have set up a number of Scala/Java projects in this way, including Fogbow, Junto, Textgrounder, and Updown.) The scalabha executable has a number of execution targets (more on this later), and one of these is “build”. When you call scalabha’s build target, it invokes SBT and drops you into the SBT interface.

Do the following, in your SCALABHA_DIR.

$ scalabha build
[info] Loading project definition from /Users/jbaldrid/devel/scalabha/project
[info] Set current project to Scalabha (in build file:/Users/jbaldrid/devel/scalabha/)
>

You could have achieved the same by downloading SBT and running it according to the instructions for SBT, but this setup saves you that trouble and ensures that you get the right version of SBT. It is worth pointing out so that you don’t think that Scalabha is SBT: SBT is entirely independent of Scalabha.

If you have had any trouble with the Scalabha setup, you can create an issue on the Scalabha Bitbucket site. That just means that I’ll get a notice that you had some problems and can hopefully help you out. And, it is possible that someone else will have had the same problem, in which case you might find your answer there. Most of the problems with this sort of setup are due to confusions about environment variables and unfamiliarity with command line tools.

Compiling with SBT

Let’s actually do something with SBT now. If you successfully got through the README, you will have already done what is next, but I’ll give some more details about what is going on.

Because you may have run some SBT actions already as part of doing the README, start out by running the “clean” action so that we’re on the same page.

> clean
[success] Total time: 0 s, completed Oct 26, 2011 10:18:08 AM

Then, run the “compile” action.

> compile
[info] Updating {file:/Users/jbaldrid/devel/scalabha/}default-86efd0...
[info] Done updating.
[info] Compiling 13 Scala sources to /Users/jbaldrid/devel/scalabha/target/classes...
[success] Total time: 9 s, completed Oct 26, 2011 10:18:19 AM

In another shell (which means another command line window), go to SCALABHA_DIR and list the contents of the directory. You’ll see that two new directories have been created, lib_managed and target. The first is where other libraries have been downloaded from the internet and placed into the Scalabha project space so that they can be easily used; don’t worry about this for the time being. The second is where the compiled class files have gone. To see some example class files, do the following.

$ ls target/classes/opennlp/scalabha/postag/
BaselineTagger$$anonfun$tag$1.class
BaselineTagger.class
EnglishTagInfo$$anonfun$zipWithTag$1$1.class
<... many more class files ...>
RuleBasedTagger$$anonfun$tag$2.class
RuleBasedTagger$$anonfun$tagWord$1.class
RuleBasedTagger.class

These were generated from the following source files.

$ ls src/main/scala/opennlp/scalabha/postag/
HmmTagger.scala PosTagger.scala

Open up PosTagger.scala in a text editor and look at it. You’ll see the class and object definitions that were the sources for the generated class files in the target/classes directory. Basically, SBT has conveniently handled the separation of source and compiled class files so that we don’t have the class files littering our work space.

How does SBT know where the source files are? Simple: it is configured to look at src/main/scala and compile every .scala file it finds under that directory. In just a bit, you’ll start adding your own Scala files and be able to compile and run them as part of the Scalabha build system.
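
To make "every .scala file under that directory" concrete, here is a minimal sketch that walks a directory tree and collects the Scala sources it finds. It only illustrates the idea; FindSources is a made-up name, and SBT's own source discovery is more involved than this.

```scala
import java.io.File

// Illustration only: this is not how SBT itself is implemented.
object FindSources {

  // Recursively collects every file whose name ends in ".scala" under dir.
  def scalaFiles(dir: File): List[File] = {
    // listFiles returns null for a missing directory, hence the Option wrapper.
    val entries = Option(dir.listFiles).map(_.toList).getOrElse(Nil)
    entries.flatMap { f =>
      if (f.isDirectory) scalaFiles(f)
      else if (f.getName.endsWith(".scala")) List(f)
      else Nil
    }
  }

  def main(args: Array[String]): Unit = {
    // Run from SCALABHA_DIR so that the relative path resolves.
    scalaFiles(new File("src/main/scala")).foreach(println)
  }
}
```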

Next, at the SBT prompt, invoke the “package” action.

> package
[info] Updating {file:/Users/jbaldrid/devel/scalabha/}default-86efd0...
[info] Done updating.
[info] Packaging /Users/jbaldrid/devel/scalabha/target/scalabha-0.1.1.jar ...
[info] Done packaging.
[success] Total time: 0 s, completed Oct 26, 2011 10:19:02 AM

In the shell prompt that we used to list files previously, list the contents of the target directory.

$ ls target/
cache              classes            scalabha-0.1.1.jar streams

You have just created scalabha-0.1.1.jar, a bottled up version of the Scalabha code that others could use in their own libraries. The extension “jar” stands for Java Archive, and it is basically just a zipped up collection of a bunch of class files.
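
Because a jar is essentially a zip archive, you can look inside one from Scala using the standard java.util.jar.JarFile class. The sketch below (ListJar is a made-up name for illustration) prints the names of the entries in a jar. If no path is given, it assumes it is being run from SCALABHA_DIR and looks at the jar that was just built.

```scala
import java.util.jar.JarFile
import scala.collection.mutable.ListBuffer

object ListJar {

  // Names of all entries (class files, manifest, and so on) in the given jar.
  def entryNames(jarPath: String): List[String] = {
    val jar = new JarFile(jarPath)
    try {
      val entries = jar.entries()
      val names = ListBuffer[String]()
      while (entries.hasMoreElements) names += entries.nextElement().getName
      names.toList
    } finally jar.close()
  }

  def main(args: Array[String]): Unit = {
    // Assumed default: the jar produced by the package action above.
    val path = if (args.nonEmpty) args(0) else "target/scalabha-0.1.1.jar"
    entryNames(path).foreach(println)
  }
}
```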

Scalabha itself uses a number of supporting libraries produced by others. To see the jars that are used as supporting libraries by Scalabha, do the following.

$ ls lib_managed/jars/*/*/*.jar
lib_managed/jars/jline/jline/jline-0.9.94.jar
lib_managed/jars/junit/junit/junit-3.8.1.jar
lib_managed/jars/org.apache.commons/commons-lang3/commons-lang3-3.0.1.jar
lib_managed/jars/org.clapper/argot_2.9.1/argot_2.9.1-0.3.5.jar
lib_managed/jars/org.clapper/grizzled-scala_2.9.1/grizzled-scala_2.9.1-1.0.8.jar
lib_managed/jars/org.scalatest/scalatest_2.9.0/scalatest_2.9.0-1.6.1.jar

Of course, you may still be wondering what it means to “use a library” in your code. More on this after we talk about packages and actually start writing some code ourselves.

Packages

Projects with a lot of code are generally organized into a package that has a set of sub-packages for parts of the code base that work closely together. At a high level, a package is simply a way to ensure that we have unique fully qualified names for classes. For example, there is a class called Range in the Apache Commons Lang library and in the core Scala library. If you want to use both of these classes in the same piece of code, there is an obvious problem of a name conflict. Fortunately, they are contained within packages that allow us to refer to them uniquely.

  • Range in the Apache Commons Lang library is org.apache.commons.lang3.Range
  • Range in Scala is scala.collection.immutable.Range

So, when we do need to use them together, we are still able to do so without conflict. You’ve actually already seen some package names before, for example with java.lang.String and the distinction between scala.collection.mutable.Map and scala.collection.immutable.Map.
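
As a small sketch of this, the following program uses both Map classes side by side. Both live in the standard Scala library, so nothing extra is needed to try it; TwoMaps and build are names invented for this example.

```scala
import scala.collection.{immutable, mutable}

object TwoMaps {

  // Both classes are called Map; the package prefix says which one is meant.
  def build(): (immutable.Map[String, Int], mutable.Map[String, Int]) = {
    val fixed = immutable.Map("coffee" -> 1)
    val growing = mutable.Map("coffee" -> 1)
    growing("tea") = 2 // only the mutable map can be updated in place
    (fixed, growing)
  }

  def main(args: Array[String]): Unit = {
    val (fixed, growing) = build()
    println(fixed)
    println(growing)
  }
}
```

Importing the two packages rather than the classes themselves keeps each use short (immutable.Map, mutable.Map) while still making it clear which class is intended.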

To see the packages and classes in Scalabha, run the “doc” action in SBT.

> doc
[info] Generating API documentation for main sources...
model contains 35 documentable templates
[info] API documentation generation successful.
[success] Total time: 7 s, completed Oct 26, 2011 10:22:23 AM

Now, point your browser to the file target/api/index.html. Note: this means doing “open file” and then going to your SCALABHA_DIR and then to target, then to api, and then selecting index.html. You can then browse the packages and classes in Scalabha. For example, look at HmmTagger, which is in the package opennlp.scalabha.postag, and you’ll see some of the fields and functions that are made available by that class.

But, you may still be wondering: how do I use these packages and classes in my code anyway? We do so via import statements. We’ll explore this by creating our own source code and compiling it.

Creating and compiling new code in SBT

First, we’ll write a simple hello world application within Scalabha that uses a package name. Get set up for this by running the following commands.

$ cd $SCALABHA_DIR
$ cd src/main/scala/opennlp/
$ mkdir bcomposes

Next, using a text editor, create the file Hello.scala in the src/main/scala/opennlp/bcomposes directory with the following contents.

package opennlp.bcomposes

object Hello {
  def main (args: Array[String]) = println("Hello, world!")
}

This is just like the hello world object from the previous tutorial, but now it has the additional package specification that indicates that its fully qualified name is opennlp.bcomposes.Hello.

Because the source code for Hello.scala is in a sub-directory of the src/main/scala directory, we can now compile this file using SBT. Make sure to save Hello.scala, and then go back to your SBT prompt and type “compile”.

> compile
[info] Compiling 1 Scala source to /Users/jbaldrid/devel/scalabha/target/classes...
[success] Total time: 1 s, completed Oct 26, 2011 10:35:15 AM

Notice that it compiled just one Scala source: SBT has already compiled the other source files in Scalabha, so it only had to compile the new one that you just saved.

Having successfully created and compiled the opennlp.bcomposes.Hello object, we can now run it. The scalabha executable provides a “run” target that allows you to run any of the code you’ve produced in the Scalabha build setup. In your shell, type the following.

$ scalabha run opennlp.bcomposes.Hello
Hello, world!

There is actually a bunch of stuff going on under the hood that ensures that your new class is included in the CLASSPATH and can be used in this manner (see bin/scalabha for details). This will simplify things for you considerably. To make a long story short, getting the CLASSPATH appropriately set is one of the main points of confusion for new developers; this way you can keep on moving without having to worry about what is essentially a plumbing problem.
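
To get a feel for what a classpath is, here is a small sketch that prints the entries of the classpath the JVM was started with, one per line. ShowClasspath is a hypothetical name, and what it prints depends on how the program is launched; it is not what bin/scalabha itself does.

```scala
import java.io.File

object ShowClasspath {

  // Splits a classpath string into its individual directories and jars.
  def entries(classpath: String): List[String] =
    classpath.split(File.pathSeparator).toList.filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    // java.class.path is the standard JVM system property holding the classpath.
    entries(System.getProperty("java.class.path")).foreach(println)
  }
}
```

Each entry is either a directory of class files (like target/classes) or a jar, and the JVM searches them in order when it needs to load a class.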

Now, let’s say you want to change the definition of the Hello object to also print out an additional message that is supplied on the command line. Modify the main method to look like this.

def main (args: Array[String]) {
  println("Hello, world!")
  println(args(0))
}

Now save it, and try running it.

$ scalabha run opennlp.bcomposes.Hello Goodbye
Hello, world!

Oops, it didn’t work?! I’ve just forced you directly into a common point of confusion for students who are switching from scripting to compiling: you must recompile your code before any changes to it take effect. So, invoke compile in SBT, and then try that command again.

$ scalabha run opennlp.bcomposes.Hello Goodbye
Hello, world!
Goodbye
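
Note that this version of main assumes at least one argument is given: run without one, args(0) fails with an ArrayIndexOutOfBoundsException. One way to guard against that is sketched below. SafeHello and lines are names made up for this example, and the package line is left out for brevity.

```scala
object SafeHello {

  // Builds the lines to print: the greeting, plus the first argument if there is one.
  def lines(args: Array[String]): List[String] =
    "Hello, world!" :: args.headOption.toList

  def main(args: Array[String]): Unit = lines(args).foreach(println)
}
```

Here headOption returns None for an empty array, so the second line is simply skipped instead of causing an error.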

To see what happens when you produce a syntax error in your Scala code, go back to Hello.scala and change the first print statement in the main method so that it is missing the last quote:

println("Hello, world!)

Now go back to SBT and compile again to see the love letter you get from the Scala compiler.

[info] Compiling 1 Scala source to /Users/jbaldrid/devel/scalabha/target/classes...
[error] /Users/jbaldrid/devel/scalabha/src/main/scala/opennlp/bcomposes/Hello.scala:5: unclosed string literal
[error]     println("Hello, world!)
[error]             ^
[error] /Users/jbaldrid/devel/scalabha/src/main/scala/opennlp/bcomposes/Hello.scala:7: ')' expected but '}' found.
[error]   }
[error]   ^
[error] two errors found
[error] {file:/Users/jbaldrid/devel/scalabha/}default-86efd0/compile:compile: Compilation failed
[error] Total time: 0 s, completed Oct 26, 2011 11:02:07 AM

The compile attempt failed, and you must go back and fix it. But don’t do that yet. There’s a handy aspect of SBT in this write-save-compile loop that saves you time and effort: SBT allows triggered execution of actions, which means that SBT can automatically perform an action whenever the files it depends on change. The compile action depends on the source code, so it can monitor the file system and automatically recompile any time a file is saved. To do this, you simply add ~ in front of the action.

Before fixing the error, type ~compile into SBT. You’ll see the same error message as before, but don’t worry about that. The last line of output from SBT will say:

1. Waiting for source changes... (press enter to interrupt)

Now go to Hello.scala again, add the quote back in, and save the file. This triggers the compile action in SBT, so you’ll see it automatically compile, with a success message.

[info] Compiling 1 Scala source to /Users/jbaldrid/devel/scalabha/target/classes...
[success] Total time: 0 s, completed Oct 26, 2011 11:02:49 AM
2. Waiting for source changes... (press enter to interrupt)

This is a nice way to see if your code is compiling as you work on it, with very little effort. Every time you save the file, it will let you know if there are problems. And, you’ll also be able to use the scalabha run target and know that you are using the latest compiled version when you do so.
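
If you are curious how a tool can notice that a file was saved, one simple approach is to compare modification times between two points in time. The sketch below illustrates that idea only; ChangeWatch is a made-up name, and this is not a description of how SBT itself detects changes.

```scala
import java.io.File

// Illustration of the idea only; SBT's own change detection is not shown here.
object ChangeWatch {

  // Records the last-modified time of each file, keyed by its path.
  def snapshot(files: List[File]): Map[String, Long] =
    files.map(f => f.getPath -> f.lastModified).toMap

  // Paths that are new, or whose timestamp differs, between two snapshots.
  def changed(before: Map[String, Long], after: Map[String, Long]): Set[String] =
    after.collect { case (path, time) if before.get(path) != Some(time) => path }.toSet
}
```

A watcher built on this would take a snapshot, wait a moment, take another, and rerun the action whenever changed returns a non-empty set.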

As you develop your code in this way, you can invoke the “doc” action in SBT, then reload the index.html page in your browser, and it will show you the updated documentation for the things you’ve created. Try it now and look at the opennlp.bcomposes package that you’ve now created.

Creating code that uses existing packages

Now we can come back to using code from existing packages. In the past (if you’ve gone through all of these tutorials), you’ve seen statements like import scala.io.Source. That came from the standard Scala library, so it is always available to any Scala program. However, you can also use classes developed by others in a similar manner, provided your CLASSPATH is set up such that they are available. That is exactly what SBT does for you: all of the classes that are defined in the src/main/scala sub-directories are ready for your use.

As an example, save the following code as src/main/scala/opennlp/bcomposes/TreeTest.scala. It constructs a standard phrase structure tree for the sentence “I like coffee.”

package opennlp.bcomposes

import opennlp.scalabha.model.{Node,Value}

object TreeTest {

  def main (args: Array[String]) {
    val leaf1 = Value("I")
    val leaf2 = Value("like")
    val leaf3 = Value("coffee")
    val subjNpNode = Node("NP", List(leaf1))
    val verbNode = Node("V", List(leaf2))
    val objNpNode = Node("NP", List(leaf3))
    val vpNode = Node("VP", List(verbNode, objNpNode))
    val sentenceNode = Node("S", List(subjNpNode, vpNode))

    println("Printing the full tree:\n" + sentenceNode)
    println("\nPrinting the children of the VP node:\n" + vpNode.children)

    println("\nPrinting the yield of the full tree:\n" + sentenceNode.getTokens.mkString(" "))
    println("\nPrinting the yield of the VP node:\n" + vpNode.getTokens.mkString(" "))
  }

}

There are a few things to note here. The import statement at the top is what tells Scala the fully qualified package names for the classes Node and Value. You could have equivalently written it less concisely as follows.

import opennlp.scalabha.model.Node
import opennlp.scalabha.model.Value

Or, you could have left out the import statement and written the fully qualified names everywhere, e.g.:

val leaf1 = opennlp.scalabha.model.Value("I")

Second, Node and Value are case classes. We’ll discuss this more later, but for now, all you need to know is that to create an object of the Node or Value classes, it isn’t necessary to use the “new” keyword.

Third, the print statements are using the Scalabha API (Application Programming Interface) to do useful things with the objects, such as printing out the tree they describe, printing the yield of the nodes (the words that they cover), and so on. The scaladoc you looked at before for Scalabha shows you these functions, so go have a look if you haven’t already.
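
To see what case classes give you without depending on Scalabha, here is a self-contained sketch of a tiny tree type. Tree, Leaf, Branch, and tokens are hypothetical stand-ins invented for this example; Scalabha's own Node and Value classes may well be defined differently.

```scala
// Hypothetical stand-ins for illustration; not Scalabha's actual classes.
sealed trait Tree { def tokens: List[String] }

case class Leaf(word: String) extends Tree {
  def tokens: List[String] = List(word)
}

case class Branch(label: String, children: List[Tree]) extends Tree {
  // The yield of a branch is the yield of its children, left to right.
  def tokens: List[String] = children.flatMap(_.tokens)
}

object CaseClassDemo {
  def main(args: Array[String]): Unit = {
    // No "new" needed: each case class comes with a factory method.
    val vp = Branch("VP", List(
      Branch("V", List(Leaf("like"))),
      Branch("NP", List(Leaf("coffee")))))
    println(vp) // case classes also get a readable toString for free
    println(vp.tokens.mkString(" "))
  }
}
```

Case classes also get structural equality, so Leaf("I") == Leaf("I") is true even though the two objects were created separately.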

Note that if you left triggered compilation on, SBT will have compiled TreeTest.scala automatically. Otherwise, make sure to compile it yourself. Then, run it.

$ scalabha run opennlp.bcomposes.TreeTest
Printing the full tree:
Node(S,List(Node(NP,List(Value(I))), Node(VP,List(Node(V,List(Value(like))), Node(NP,List(Value(coffee)))))))

Printing the children of the VP node:
List(Node(V,List(Value(like))), Node(NP,List(Value(coffee))))

Printing the yield of the full tree:
I like coffee

Printing the yield of the VP node:
like coffee

Make and use your own package

Importing classes in this manner lets you build on existing code wherever you need it. Any class in Scalabha or in the libraries that are included with it will be available to you, including any classes you define. As an example, do the following.

$ cd $SCALABHA_DIR/src/main/scala/opennlp/bcomposes
$ mkdir person
$ mkdir music

Now save the Person class from the previous tutorial as Person.scala in the person directory. Here’s the code again (note the addition of the package statement).

package opennlp.bcomposes.person

class Person (
  val firstName: String,
  val lastName: String,
  val age: Int,
  val occupation: String
) {

  def fullName: String = firstName + " " + lastName

  def greet (formal: Boolean): String = {
    if (formal)
      "Hello, my name is " + fullName + ". I'm a " + occupation + "."
    else
      "Hi, I'm " + firstName + "!"
  }

}

Now save the following as RadioheadGreeting.scala in the music directory.

package opennlp.bcomposes.music

import opennlp.bcomposes.person.Person

object RadioheadGreeting {

  def main (args: Array[String]) {
    val thomYorke = new Person("Thom", "Yorke", 43, "musician")
    val johnnyGreenwood = new Person("Johnny", "Greenwood", 39, "musician")
    val colinGreenwood = new Person("Colin", "Greenwood", 41, "musician")
    val edObrien = new Person("Ed", "O'Brien", 42, "musician")
    val philSelway = new Person("Phil", "Selway", 44, "musician")
    val radiohead = List(thomYorke, johnnyGreenwood, colinGreenwood, edObrien, philSelway)
    radiohead.foreach(bandmember => println(bandmember.greet(false)))
  }

}

When we did the compilation tutorial previously, Person.scala and RadioheadGreeting.scala were in the same directory, which allowed the latter to know about the Person class. Now that they are in separate packages, the Person class must be explicitly imported; once you’ve done so, you can code with Person objects just as you did before.

Finally, to run it, we now must specify the fully qualified package name for RadioheadGreeting.

$ scalabha run opennlp.bcomposes.music.RadioheadGreeting
Hi, I'm Thom!
Hi, I'm Johnny!
Hi, I'm Colin!
Hi, I'm Ed!
Hi, I'm Phil!

A note on package names and their relation to directories

Package names are made unique by certain conventions that generally ensure you won’t get clashes. For example, we are using opennlp.scalabha and opennlp.bcomposes, which I happen to know are unique. Quite often these names will include full internet domains, in reverse, like org.apache.commons and com.cloudera.crunch. By convention, we put the source files that are in packages (and subpackages) in directory structures that reflect the names. So, for example, opennlp.bcomposes.music.RadioheadGreeting is in the directory src/main/scala/opennlp/bcomposes/music. However, it is worth noting that this is not a hard constraint with Scala (as it is with Java).
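
The convention is mechanical enough to write down as a one-line function. The sketch below assumes the src/main/scala layout used in this tutorial; SourcePath and forName are names invented for the example.

```scala
object SourcePath {

  // Conventional location of the source file for a fully qualified name,
  // assuming the src/main/scala layout used in this tutorial.
  def forName(fullyQualifiedName: String): String =
    "src/main/scala/" + fullyQualifiedName.replace('.', '/') + ".scala"

  def main(args: Array[String]): Unit =
    // prints src/main/scala/opennlp/bcomposes/music/RadioheadGreeting.scala
    println(forName("opennlp.bcomposes.music.RadioheadGreeting"))
}
```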

There is a great deal more to using a build system, but this is where I must end this discussion, hoping it is enough to get the core concepts across and to make it possible for my students to do the homework on part-of-speech tagging and make use of the opennlp.scalabha.postag package!

Copyright 2011 Jason Baldridge

The text of this tutorial is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike License. Attribution may be provided by linking to http://www.jasonbaldridge.com and to this original tutorial.

Suggestions, improvements, extensions and bug fixes welcome — please email Jason at jasonbaldridge@gmail.com or provide a comment to this post.

 
