Using Twitter4j with Scala to perform user actions

Topics: twitter,twitter4j,word clouds

Introduction

My previous post showed how to use Twitter4j in Scala to access Twitter streams. This post shows how to control a Twitter user’s actions using Twitter4j. The primary purpose of this functionality is perhaps to create interfaces for Twitter like TweetDeck, but it can also be used to create bots that take automated actions on Twitter (one bot I’m playing around with is @tshrdlu, using the code in this tutorial and the code in the tshrdlu repository).

This post will only cover a small portion of the things you can do, but they are some of the more common things and I include a couple of simple but interesting use cases. Once you have these things in place, it is straightforward to figure out how to use the Twitter4j API docs (and Stack Overflow) to do the rest.

Getting set up: code and authorization

Rather than having the reader build the code up while going through the tutorial, I’ve set up the code in the repository twitter4j-tutorial. The version needed for this tutorial is v0.2.0. You can download a tarball of that version, which may be easier to work with if the repository has changed since this tutorial was written. Check out or download that code now. The main file of interest is:

  • src/main/scala/TwitterUser.scala

This tutorial is mainly a walkthrough of that file in blog form, with some additional pointers and explanations here and there.

You also need to set up the authorization details. See the “Setting up authorization” section of the previous post to do this if you haven’t already.

READ THE FOLLOWING

IMPORTANT: for this tutorial you must set the permissions for your application to be “Read and Write”. This does NOT mean to use ‘chmod’. It means going to the Twitter developers application site, signing in with your Twitter account, clicking on “Settings” and setting the permissions to read and write.

OKAY, THANKS FOR PAYING ATTENTION

In the previous tutorial, authorization details were put into code. This time, we’ll use a twitter4j.properties file. This is easy: just add a file with that name to the twitter4j-tutorial directory with the following contents, substituting your details as appropriate.

oauth.consumerKey=[your consumer key here]
oauth.consumerSecret=[your consumer secret here]
oauth.accessToken=[your access token here]
oauth.accessTokenSecret=[your access token secret here]
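A missing or empty key in this file is an easy mistake to make, so here is a minimal sketch of a sanity check you could run on the file's contents before starting. It uses only java.util.Properties from the standard library; the object name PropsCheck and the method missingKeys are invented for illustration and are not part of Twitter4j or the tutorial repository.

```scala
import java.io.StringReader
import java.util.Properties

// Hypothetical helper (not part of Twitter4j): reports which of the four
// OAuth keys are absent or blank in the text of a twitter4j.properties file.
object PropsCheck {
  val required = Seq(
    "oauth.consumerKey",
    "oauth.consumerSecret",
    "oauth.accessToken",
    "oauth.accessTokenSecret")

  def missingKeys(contents: String): Seq[String] = {
    val props = new Properties()
    props.load(new StringReader(contents))
    // getProperty returns null for an absent key; Option(...) turns that into None,
    // and forall is true for None as well as for a blank value.
    required.filter(key => Option(props.getProperty(key)).forall(_.trim.isEmpty))
  }
}
```

Passing it the text of your file should give back an empty sequence when all four entries are filled in.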

Rate limits and a note of caution

Unlike streaming access to Twitter, performing user actions via the API is subject to rate limits. Once you hit your limit, Twitter will refuse your requests (Twitter4j surfaces this as an exception) until a period of time has passed (usually 15 minutes). Twitter does this to limit bad bots and to preserve its computational resources. For more information on rate limits, see Twitter’s page about rate limiting.

I’ll discuss how to manage rate limits later in the post, but I mention them up front in case you exceed them while messing around with things early on.

A word of caution is also in order: since you are going to be able to take actions automatically, like following users, posting a status, and retweeting, you could end up doing many of these actions in rapid succession. This will (a) use up your rate limit very quickly, (b) probably not be interesting behavior, and (c) could get your account suspended. Make sure to follow the rules, especially those on following users.

If you are going to mess around quite a bit with actual posting, you may also want to consider creating an account that is not your primary Twitter account so that you don’t annoy your actual followers. (Suggestion: see the paragraph on “Create account” in part one of project phase one of my Applied NLP course for tips on how to add multiple accounts with the same Gmail address.)

Basic interactions: searching, timelines, posting

All of the examples below are implemented as objects with main methods that do something using a twitter4j.Twitter object. To make it so we don’t have to call the TwitterFactory repeatedly, we first define a trait that gets a Twitter instance set up and ready to use.

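import twitter4j._

// Mix this in to get a ready-to-use Twitter client (credentials come from twitter4j.properties).
trait TwitterInstance {
  val twitter = new TwitterFactory().getInstance
}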

By extending this trait, our objects can access the twitter object conveniently.

As a first simple example, we can search for tweets that match a query by using the search method. The following object takes a query string given on the command line, searches for tweets using that query, and prints them.

object QuerySearch extends TwitterInstance {

  def main(args: Array[String]) {
    val statuses = twitter.search(new Query(args(0))).getTweets
    statuses.foreach(status => println(status.getText + "\n"))
  }

}

Note that this uses a Query object, whereas a TwitterStream needs a FilterQuery. Also, for this to work, we must have the following import available:

import collection.JavaConversions._

This ensures that we can use the java.util.List returned by the getTweets method (of twitter4j.QueryResult) as if it were a Scala collection with the method foreach (and map, filter, etc). This is done via implicit conversions that make working with Java libraries far nicer than it would be otherwise.
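To illustrate what the conversion buys you, here is a minimal sketch that needs no Twitter connection. A plain java.util.List of made-up strings stands in for the result of getTweets, and the sketch uses the explicit .asScala conversion from scala.collection.JavaConverters rather than the implicit one, an alternative that makes the conversion visible at the call site. The object name and sample strings are invented for illustration.

```scala
import scala.collection.JavaConverters._

object JavaListDemo {
  // Stand-in for the java.util.List that getTweets returns.
  def texts: java.util.List[String] =
    java.util.Arrays.asList("first tweet", "second tweet", "third")

  // Once converted, the usual Scala collection methods are available.
  def longTexts: Seq[String] =
    texts.asScala.filter(_.length > 5).map(_.toUpperCase).toSeq
}
```

Here longTexts keeps only the strings longer than five characters and upper-cases them, exactly as you would with a native Scala collection.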

To run this, go to the twitter4j-tutorial directory, and do the following (some example output shown):

$ ./build
> run-main bcomposes.twitter.QuerySearch scala
[info] Running bcomposes.twitter.QuerySearch scala
E' avvilente non sentirsi all'altezza di qualcosa o qualcuno, se non si possiede quella scala interiore sulla quale l'autostima può issarsi

Scala workshop will run with ECOOP, July 2nd in Montpellier, South of France. Call for papers is out. http://t.co/3WS6tHQyiF

#scala http://t.co/JwNrzXTwm8 Even two of them in #cologne #germany . #thumbsup

RT @MILLIB2DAL: @djcameo Birthday bash 30th march @ Scala nightclub 100 artists including myself make sur u reach its gonna be #Legendary

@kot_2010 I think it's the same case with Scala: with macros it will tend to "outsource" things to macro libs, keeping a small lang core.

RT @waxzce: #scala hiring or job ? go there : http://t.co/NeEjoqwqwT

@esten That's not only a front-end problem. Scala devs should use scalaz.Equal and === for type safe equality. /cc @sharonw

<...more...>

[success] Total time: 1 s, completed Feb 26, 2013 1:54:44 PM

You might see some extra communications from SBT, which will probably need to download dependencies and compile the code. For the rest of the examples below, you can run them in a similar manner, substituting the right object name and providing any necessary arguments.

There are various timelines available for each user, including the home timeline, mentions timeline, and user timeline. They are accessible through the methods of twitter4j.api.TimelinesResources. For example, the following object shows the most recent statuses on the authenticating user’s home timeline (that is, the tweets by people the user follows).

object GetHomeTimeline extends TwitterInstance {

  def main(args: Array[String]) {
    val num = if (args.length == 1) args(0).toInt else 10
    val statuses = twitter.getHomeTimeline.take(num)
    statuses.foreach(status => println(status.getText + "\n"))
  }

}

The number of tweets to show can be given as a command-line argument; if it is omitted, ten are shown.

You can also update the status of the authenticating user from the command line using the following object. Calling it will post to the authenticating user’s account (so only do it if you are comfortable with the command-line argument you give it going onto your timeline).

object UpdateStatus extends TwitterInstance {
  def main(args: Array[String]) {
    twitter.updateStatus(new StatusUpdate(args(0)))
  }
}

There are plenty of other useful methods that you can use to interact with Twitter, and if you have successfully run the above three, you should be able to look at the Twitter4j javadocs and start using them. Some examples doing more interesting things are given below.

Replying to tweets written to you

The following object goes through the most recent tweets that have mentioned the authenticating user, and replies “OK.” to them. It includes the author of the original tweet and any other entities that were mentioned in it.

object ReplyOK extends TwitterInstance {

  def main(args: Array[String]) {
    val num = if (args.length == 1) args(0).toInt else 10
    val userName = twitter.getScreenName
    val statuses = twitter.getMentionsTimeline.take(num)
    statuses.foreach { status => {
      val statusAuthor = status.getUser.getScreenName
      val mentionedEntities = status.getUserMentionEntities.map(_.getScreenName).toList
      val participants = (statusAuthor :: mentionedEntities).toSet - userName
      val text = participants.map(p=>"@"+p).mkString(" ") + " OK."
      val reply = new StatusUpdate(text).inReplyToStatusId(status.getId)
      println("Replying: " + text)
      twitter.updateStatus(reply)
    }}
  }

}

This should be mostly self-explanatory, but there are a couple of things to note. First, you can find all the entities that have been mentioned (via @-mentions) in the tweet via the method getUserMentionEntities of the twitter4j.Status class. The code ensures that the author of the original tweet (who isn’t necessarily mentioned in it) is included as a participant for the response, and it removes the authenticating user. So, if the message “@tshrdlu What do you think of @tshrdlc?” is sent from @jasonbaldridge, the response will be “@jasonbaldridge @tshrdlc OK.” Note that the screen names returned by Twitter4j do not include the @ symbol, so it must be added in the text of the reply.
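The text-building step can be pulled out into a pure function and tried without posting anything. The sketch below is one way to do that; the object name ReplyText and its build method are invented for illustration. It uses distinct rather than a Set so that the order of the participants is predictable, which is a small departure from the code above.

```scala
object ReplyText {
  // Builds the reply text from plain screen names (without the @ symbol).
  def build(author: String, mentioned: List[String], self: String, message: String): String = {
    // Author first, then everyone mentioned, without duplicates and without ourselves.
    val participants = (author :: mentioned).distinct.filterNot(_ == self)
    (participants.map("@" + _) :+ message).mkString(" ")
  }
}
```

For the example in the text, ReplyText.build("jasonbaldridge", List("tshrdlu", "tshrdlc"), "tshrdlu", "OK.") gives “@jasonbaldridge @tshrdlc OK.”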

Second, notice that StatusUpdate objects can be created by chaining methods that add more information to them, e.g. inReplyToStatusId and location, which incrementally build up the StatusUpdate object that actually gets posted. (This is a common Java strategy that basically helps get around the fact that constructor parameters in Java can neither be specified by name nor have defaults, as they can in Scala.)
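To make the contrast concrete, here is a small sketch of the two styles side by side. Both classes are hypothetical and are not part of Twitter4j: ChainedUpdate imitates the Java chaining style, where each method sets a field and returns the object itself, and NamedUpdate shows the same information expressed with Scala's named and default parameters.

```scala
// Hypothetical Java-style class: each chaining method mutates a field and returns this.
class ChainedUpdate(val text: String) {
  var replyTo: Long = -1L
  var place: Option[String] = None
  def inReplyTo(id: Long): ChainedUpdate = { replyTo = id; this }
  def at(p: String): ChainedUpdate = { place = Some(p); this }
}

// Scala alternative: defaults in the constructor, overridden by name where needed.
case class NamedUpdate(text: String, replyTo: Long = -1L, place: Option[String] = None)

object UpdateStyles {
  val chained = new ChainedUpdate("OK.").inReplyTo(42L).at("Austin")
  val named = NamedUpdate("OK.", place = Some("Austin")) // replyTo keeps its default
}
```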

Checking and managing rate limit information

None of the above code makes many requests to Twitter, so there was little danger of exceeding rate limits. These limits combine time and number of requests: you get a certain number of requests per time window per authenticating user (350 per hour under version 1.0 of the API; version 1.1 uses 15-minute windows with limits that vary by request type). Because of these limits, you should consider accessing tweets, timelines, and such using the streaming methods when you can.

Every response you get from Twitter comes back as a sub-class of twitter4j.TwitterResponse, which not only gives you what you want (like a QueryResult) but also gives you information about your connection to Twitter. For rate limit information, you can use the getRateLimitStatus method, which can then inform you about the number of requests you can still make and the time until your limit resets.

The trait RateChecker below has a function checkAndWait that, when given a TwitterResponse object, checks whether the rate limit has been exceeded and waits if it has. When the rate is exceeded, it finds out how much time remains until the rate limit is reset and makes the thread sleep until that time (plus 10 seconds) has passed.

trait RateChecker {

  def checkAndWait(response: TwitterResponse, verbose: Boolean = false) {
    val rateLimitStatus = response.getRateLimitStatus
    if (verbose) println("RLS: " + rateLimitStatus)

    if (rateLimitStatus != null && rateLimitStatus.getRemaining == 0) {
      println("*** You hit your rate limit. ***")
      val waitTime = rateLimitStatus.getSecondsUntilReset + 10
      println("Waiting " + waitTime + " seconds ( " + waitTime/60.0 + " minutes) for rate limit reset.")
      Thread.sleep(waitTime*1000)
    }
  }

}

Managing rate limits is actually more complex than this. For example, this strategy ignores the fact that different request types have different limits, but it keeps things simple. This is surely not an optimal solution, but it does the trick for present purposes.

Note also that you can directly ask for rate limit information from the twitter4j.Twitter instance itself, using the getRateLimitStatus method. Unlike the results for the same method on a TwitterResponse, this gives a Map from various request types to the current rate limit statuses for each one. In a real application, you’d want to control each of these different limits at a more fine-grained level using this information.
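As a rough sketch of what finer-grained control might look like, the following keeps the waiting logic separate from Twitter4j so it can be tried offline. The case class Limit is a hypothetical stand-in for the two values read from a rate limit status (requests remaining and seconds until reset), and the request-type strings used as map keys in the usage note are illustrative only.

```scala
// Hypothetical stand-in for the two rate limit values this sketch needs.
case class Limit(remaining: Int, secondsUntilReset: Int)

object EndpointLimits {
  // Milliseconds to sleep before calling the given request type:
  // 0 if calls remain or the request type is not in the map.
  def waitMillis(limits: Map[String, Limit], endpoint: String, paddingSeconds: Int = 10): Long =
    limits.get(endpoint) match {
      case Some(l) if l.remaining <= 0 =>
        (math.max(l.secondsUntilReset, 0) + paddingSeconds) * 1000L
      case _ => 0L
    }
}
```

With a map such as Map("/users/show/:id" -> Limit(0, 120), "/search/tweets" -> Limit(150, 600)), only the first request type would require a wait, and the search requests could continue in the meantime.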

Not all of the methods of Twitter4j classes actually hit the Twitter API. To see whether a given method does, look at its Javadoc: if its description says “This method calls http://api.twitter.com/1.1/some/method.json”, then it does hit the API. Otherwise, it doesn’t and you don’t need to guard it.

Examples using the checkAndWait function are given below.

Creating a word cloud from followers’ descriptions

Here’s a more interesting task: given a Twitter user, compute the counts of the words in the descriptions given in the bios of their followers and build a word cloud from them. The following code does this, outputting the resulting counts to a file, the contents of which can be pasted into Wordle’s advanced word cloud input.

object DescribeFollowers extends TwitterInstance with RateChecker {

  def main(args: Array[String]) {
    val screenName = args(0)
    val maxUsers = if (args.length==2) args(1).toInt else 500
    val followerIds = twitter.getFollowersIDs(screenName,-1).getIDs

    val descriptions = followerIds.take(maxUsers).flatMap { id => {
      val user = twitter.showUser(id)
      checkAndWait(user)
      if (user.isProtected) None else Some(user.getDescription)
    }}

    val tword = """(?i)[a-z#@]+""".r.pattern
    val words = descriptions.flatMap(_.toLowerCase.split("\\s+"))
    val filtered = words.filter(_.length > 3).filter(tword.matcher(_).matches)
    val counts = filtered.groupBy(x=>x).mapValues(_.length)
    val rankedCounts = counts.toSeq.sortBy(- _._2)

    import java.io._
    val wordcountFile = "/tmp/follower_wordcount.txt"
    val writer = new BufferedWriter(new FileWriter(wordcountFile))
    for ((w,c) <- rankedCounts)
      writer.write(w+":"+c+"\n")
    writer.flush
    writer.close
  }

}

The thing to consider is that if you are pointing this at a person with several hundred followers, you will exceed the rate limit. The call to getFollowersIDs is a single hit, and then each call to showUser is a hit. Because the showUser calls come in rapid succession, we check the rate limit status after each one using checkAndWait (which is available because we mixed in the RateChecker trait) and it waits for the limit to reset as previously discussed, keeping us from exceeding the rate limit and getting an exception from Twitter.

The number of users returned by getFollowersIDs is at most 5000. If you run this on a user who has more followers, followers beyond 5000 won’t be considered. If you want to tackle such a user, you’ll need to use the cursor, which is the integer provided as the argument to getFollowersIDs (-1 asks for the first page). Make multiple calls, each time passing the value returned by getNextCursor on the previous result as the cursor for the next call, until there are no more pages.
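Cursor-based paging can be sketched as a small loop. To keep the sketch runnable without a Twitter connection, the page fetch is passed in as a function, and the case class Page is a hypothetical stand-in for the two parts of a result that matter here: the IDs on the page and the cursor for the next page. It assumes Twitter's convention that -1 requests the first page and a next cursor of 0 means there are no more pages.

```scala
import scala.annotation.tailrec

// Hypothetical page type: the IDs on one page plus the cursor for the next page.
case class Page(ids: Seq[Long], nextCursor: Long)

object CursorPaging {
  // Fetches pages starting at cursor -1 until the next cursor is 0
  // or at least maxIds IDs have been collected.
  def collectIds(fetch: Long => Page, maxIds: Int): Seq[Long] = {
    @tailrec
    def loop(cursor: Long, acc: Vector[Long]): Vector[Long] = {
      val page = fetch(cursor)
      val all = acc ++ page.ids
      if (page.nextCursor == 0L || all.size >= maxIds) all.take(maxIds)
      else loop(page.nextCursor, all)
    }
    loop(-1L, Vector.empty)
  }
}
```

In real use, the fetch function would wrap a call to getFollowersIDs with the given cursor, run checkAndWait on the result, and build a Page from its getIDs and getNextCursor values.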

Most of the rest of the code is just standard Scala stuff for getting the word counts and outputting them to a file. Note that a small effort is made to drop tokens containing non-alphabetic characters (while allowing # and @) and to filter out short words.
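The counting step can be tried on its own with made-up descriptions. The sketch below repeats the same filtering (lowercase, at least four characters, only letters plus # and @) in a pure function; the object name WordCounts is invented for illustration, and ties are broken alphabetically so that the result is deterministic, which the code above does not do.

```scala
object WordCounts {
  private val Word = """[a-z#@]+""".r.pattern

  // Returns (word, count) pairs, most frequent first, ties in alphabetical order.
  def ranked(descriptions: Seq[String], minLength: Int = 4): Seq[(String, Int)] = {
    val words = descriptions.flatMap(_.toLowerCase.split("\\s+"))
    val kept = words.filter(w => w.length >= minLength && Word.matcher(w).matches)
    kept.groupBy(identity)
      .map { case (w, ws) => (w, ws.size) }
      .toSeq
      .sortBy { case (w, c) => (-c, w) }
  }
}
```

Note that a token like “scala,” is dropped entirely because of the trailing comma, so punctuation attached to a word costs you that occurrence.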

As an example of the output, when put into Wordle, here is the word cloud for my followers.

[Image: word cloud built from the descriptions of @jasonbaldridge’s followers]

This looks about right for me—completely expected in fact—but it is still cool that it comes out of my followers’ self descriptions. One could start thinking of some fun algorithms for exploiting this kind of representation of a user to look into how well different users align or don’t align with their followers, or to look for clusters of different types of followers, etc.

Retweeting automatically

Tired of actually reading those tweets in your timeline and retweeting some of them? The following code gets the accounts the authenticating user follows, grabs twenty of them, filters them to get interesting ones, and then takes up to 10 of the remaining ones and retweets their most recent statuses (provided they aren’t replies to someone else).

object RetweetFriends extends TwitterInstance with RateChecker {

  def main(args: Array[String]) {
    val friendIds = twitter.getFriendsIDs(-1).getIDs
    val friends = friendIds.take(20).map { id => {
      val user = twitter.showUser(id)
      checkAndWait(user)
      user
    }}

    val filtered = friends.filter(admissable)
    val ranked = filtered.map(f => (f.getFollowersCount, f)).sortBy(- _._1).map(_._2)

    ranked.take(10).foreach { friend => {
      val status = friend.getStatus
      if (status!=null && status.getInReplyToStatusId == -1) {
        println("\nRetweeting " + friend.getName + ":\n" + status.getText)
        twitter.retweetStatus(status.getId)
        Thread.sleep(30000)
      }
    }}
  }

  def admissable(user: User) = {
    val ratio = user.getFollowersCount.toDouble/user.getFriendsCount
    user.getFriendsCount < 1000 && ratio > 0.5
  }

}

The getFriendsIDs method is used to get the users that the authenticating user is following (but who do not necessarily follow the authenticating user, despite the use of the word “friend”). We again take care with the rate limiting on gathering the users. We filter these users, keeping those who follow fewer than 1000 users and have a follower/friend ratio greater than 0.5, in a simple attempt to filter out some less interesting (or spammy) accounts. The remaining users are then ranked according to their number of followers (most first). Finally, we take (up to) 10 of these (the take method returns 3 things if you ask for 10 but there are just 3), look at their most recent status, and if it is not null and isn’t a reply to someone, we retweet it. After each retweet, we wait for 30 seconds so that anyone following our account doesn’t get an avalanche of retweets.
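The filter-and-rank step is easy to experiment with offline. In the sketch below, the case class Account is a hypothetical stand-in for the three pieces of user information the code uses (name, follower count, friend count), so you can try different thresholds on made-up numbers before pointing the real thing at your account.

```scala
// Hypothetical stand-in for the user fields used by the filter.
case class Account(name: String, followers: Int, friends: Int)

object FriendFilter {
  // Same test as in the code above: follows fewer than 1000 users
  // and has a follower/friend ratio above 0.5.
  def admissible(a: Account): Boolean = {
    val ratio = a.followers.toDouble / a.friends
    a.friends < 1000 && ratio > 0.5
  }

  // Keep admissible accounts, most-followed first, at most n of them.
  def pick(accounts: Seq[Account], n: Int): Seq[Account] =
    accounts.filter(admissible).sortBy(-_.followers).take(n)
}
```

For example, an account with 10 followers and 900 friends fails the ratio test, and one with 5000 followers and 2000 friends fails the friend-count test, however popular it is.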

Conclusion

This post and the related code should provide enough to get a decent feel for working with Twitter4j, including the necessary setup and some of the methods you need to start creating applications with it in Scala. See project phase three of my Applied NLP course for exercises and code that take this further to do interesting things for automated bots, including mixing streaming access and user access to get more complex behaviors.
