Why Care Block:
Oh man, things are about to get real juicy – this is the final reveal of the Twitter API and I couldn’t think of a better time to do this. After all, we’ve made a complete mess out of Stata with our unruly bitwise operators, unending computer security algorithms, and un-um… unfortunate time-based C-plugins. Well, my Stata friends, this is how to do anything with Stata and the Twitter API.
Let’s post! Seriously, let’s tweet about things using Stata. This method can be used not only to tweet, but also to send direct messages, retweet things, and like (favorite) others’ tweets. If you’re like me, you don’t really use Twitter that well and I’m sure my 40 followers (I rounded up) would agree. But wouldn’t it be great to find a way to reach thousands of Twitter followers based off of hashtags or messages or even emoji? If the answer is no, fine, that’s my best attempt at convincing you but still feel free to read on.
Let’s bring together everything we need to send a Tweet out on the internet:
Easy way to turn readable stuff into unreadable stuff for humans but it’s not so bad for computers. This takes all alphanumeric characters plus the plus and the “/” (which equals 64 characters to choose from), converts 8-bit chunks into 6-bit chunks which end up receiving the specific character. Sometimes equal signs are added, but you already know that.
See post on Base64 Encoding
Bitwise Operators and Security Methods
We’ve got bits dressed to the nines: shifting operators, rotating operators, logical operators, overflow methods, and all kinds of methods to translate from base 2 to base 16 to base 10.
These feed into our HMAC-SHA1 algorithm, which was awful to develop… this seriously has to go in the “lots of effort, little reward” category.
See post on Bitwise Operators and HMAC-SHA1
We needed time - I mean we all do – in seconds since 1970 based off of Greenwich Time. We could have just accounted for our respective time zones (unless you’re located in GMT+0 of course), but why not use this as a great learning opportunity?
See post on Plugins
Once? None? Scone? What’s a nonce? Simply put, it’s a random 32 character word containing all alphanumeric symbols. Twitter’s example shows this being Base64 encoded, but all it has to be is a random string. vqib2ve89dOnTusESAS26Ceu9TcUES2i – see? Easy, just make sure you change the characters each time.
Based on the UTF-8 URL Encoding standards, we need to replace certain characters that would otherwise cause complications in a URL. Don’t worry though, we’ve got a function for that. Just know that valid URL characters are all letters and numbers as well as the dash, the period, the underscore, and the tilde.
Send Out the Tweets
Twitter has some fabulous documentation for their API so it’s fairly easy to find the method that you want to do. These processes can be generalized and used across many of these methods so we are by no means limited to just tweeting. For tweets (hereon referred to as status updates to be consistent with Twitter’s documentation), we use the base or “resource” URL:
Remember, status updates can only be 140 characters and Twitter will actually shorten URLs for you, but still keep in mind your character limits!
All statuses must be percent encoded, so let’s go ahead and create our status local; we’ll also store our base URL while we’re at it.
local b_url "https://api.twitter.com/1.1/statuses/update.json" local status = "Sent out this tweet using #Stata and the #Twitter #API – http://www.wmatsuoka.com/stata/stata-and-the-twitter-api-part-ii tells you how!" mata: st_local("status", percentencode("`status'"))
We then take all of our application information: consumer key, consumer secret, access token, and access secret and store these as locals as well.
local cons_key "ZAPp9dO37PnzCofN2Nm8n8kye" local cons_sec "kfbtARFpBIdb515iaS48kYZjWhLoIdbEiAINDVX0c3W3e0fgWe" local accs_key "1234567890-YFklDWGuvSIGLYPMnAfOZgLgLsMXjKIHqaIr1F5" local accs_sec "LYvHWfTS6LXjtDPVXchs6dXUG52l41j4HmicYwjr8aStw"
For this section, we will be creating all of our necessary authentication variables. The first step to creating our secret signing key is to make a string that contains our consumer secret key, concatenated with an ampersand, concatenated with our access secret key. We’ll also include all the necessary locals we talked about earlier.
local s_key = "`cons_sec'&`accs_sec'" mata: st_local("nonce", gen_nonce_32()) local sig_meth = "HMAC-SHA1" // If you don't make a plugin, just make sure you do seconds since 1970 // probably using c(current_date) and c(current_time) - [your time zone] plugin call st_utm local ts = (clock(substr("`utm_time'", 4, .), "MDhmsY") - clock("1970", "Y"))/1000
Now it’s time to create our signature. We start by percent encoding our base URL. For our signature string, we include the following categories:
local sig = "oauth_consumer_key=`cons_key'&oauth_nonce=`nonce'&oauth_signature_method=`sig_meth'&oauth_timestamp=`ts'&oauth_token=`accs_tok'&oauth_version=1.0&status=`status'"
Then, percent encode the signature string and base URL:
mata: st_local("pe", percentencode("`b_url'")) mata: st_local("pe_sig", percentencode("`sig'"))
This next step is why we spent an ungodly amount of time on bits and hashes! We need to transform our signature string into a Base64 encoding of the HMAC-SHA1 hash result with our message being the percent-encoded signature base hashed with our secret key. Sorry if that's a mouthful, but we can see what that translates to below.
local sig_base = "POST&`pe'&" mata: x=sha1toascii(hmac_sha1("`s_key'", "`sig_base'`pe_sig'")) mata: st_local("sig", percentencode(encode64(x)))
Finally, we’re almost done with Twitter and can move on to other more important things! First let’s make sure to post our Tweet because, after all, that's what we're here for.
!curl -k --request "POST" "`b_url'" --data "status=`status'" --header "Authorization: OAuth oauth_consumer_key="`cons_key'", oauth_nonce="`nonce'", oauth_signature="`sig'", oauth_signature_method="`sig_meth'", oauth_timestamp="`ts'", oauth_token="`accs_tok'", oauth_version="1.0"" --verbose
Now we can marvel at how much time we spent learning about something 99% of us don’t really care about. This is for you, 1% - this is all for you.
PS: Want to add emoji? Go to this site, find the URL Escape Code sections, and add the result to your status string!
Ex: %F0%9F%93%88 = Chart with Upward Trend
Stata Plugins: the C-Sweet
I thought I’d take a quick break from the latest series on building the Twitter API Library to bring you a message sponsored by Stata Plugins! Okay, we’re not officially sponsored by Stata and, legally speaking, we’re not sponsored by anyone. Technicalities. Maybe someday you’ll wish to plug your brand [here], but even I think that’s highly unlikely – so let’s just get right to plugins.
Do you know C? I wouldn’t say I know it very well, but that sure as hell shouldn’t stop us – this is a great learning opportunity! If you know the language, please feel free to skip to the next paragraph. Still here? Excellent! That means you’re like me and aren’t too concerned with lower level programming languages (that’s code for a language that’s a lot more explicit – it doesn’t look like English anymore). After all, a statement such as:
regress price mpg
is much easier to understand than this great (albeit extremely technical) Stata Blog post showing us what’s actually going on behind the scenes so to speak. Now, I actually have some training in Java, but honestly I have to tribute my smooth transition into C to Mata. Mata is a wonderful extension to Stata and I highly recommend picking it up if you haven’t already. It can be difficult to transition to Mata if you simply use Stata and there seems to be a lack of literature pertaining to learning Mata. I’ll point you to UCLA’s code fragment on Mata or Bill Gould’s Mata Matters/Mata the Missing Manual resources if you wish to learn more. The observation is: Mata prepares you for C because it’s extremely similar to the C language and a great transitional language from Stata to C. What’s the point though, right? Well, Mata enables you to be quicker and do complicated things directly from your do-file. Learning C, while sometimes unnecessary if your job is strictly to do quick analysis, will make it so that you can blow your colleagues away by showing them how your new program reduced the return time of a command from .8 to .6 seconds. Sorry again colleagues.
Troubleshooting is difficult and I hope to have a video available soon, but not today. So what’s a practical application for plugins? Directly related to the Twitter API posts, we need Stata to provide us with the time. “That’s dumb” you’d say. Just use - c(current_time) -. Fine, but now we need time in UTC which by the way is combination of the acronyms Coordinate Universal Time (CUT) and the French Temps Universel Coordonné (TUC). So how about:
clock(c(current_time) , "hms") – (TimeOffsetfromGMT)*60*60*1000
Close! But we’re also extremely lazy and want the computer to figure out which time zone we’re in (regardless of whether we have internet connection or not) and it needs to work in batch mode! Okay okay, enough with the requests! Basically, the question is how can we take the computer’s internal UTC time information? That’s where Stata plugins come in handy.
At a high level, what we’re going to do is tell Stata: “Hey Stata, go ahead and talk to the computer. Ask it what the current UTC time is and place the result in a local macro that we can call later. Thanks!”
Because I have a 64-bit system, it seems that the simple things tend to get a little complicated. I compile the file in Visual Studio (x64) and I also have the 64-bit version of Stata installed. Here’s the main file st_utc.c which should be compiled with the two necessary Stata files stplugin.c and stplugin.h. In other words, it worked for me but unfortunately I give no guarantees that it will work seamlessly on your end. Please edit as necessary.
So what’s going on here? Well, the beauty of this file is that we can load whatever libraries we need – here we included the time library so that we can access time variables directly from our stata_call routine. Note how we declare our error handling variable to be of type ST_retcode. We are able to call SF_display which prints the contents of a variable or string to the Stata console and SF_macro_save which allows us to access the contents of the variable buf (our time string) directly from the Stata local `utc_time’ thanks to the stplugin.h header.
After compiling st_utc.dll, you can either rename it st_utc.plugin and place it in your personal ado directory for easy access, or keep it as a dll and specify the location of the file using the option using() as shown below:
program st_utc, plugin using(st_utc.dll) plugin call st_utc
And it’s that easy! Or so it seems. There are a lot of technical errors that you have to keep in mind from operating systems, to Stata versions (look to older plugin interface versions with older versions of Stata), to whether you really want to spend the time debugging your compiled code. It’s well worth getting it to work at least once! The important take away here is that any Stata user can greatly extend the capabilities of Stata; we haven’t even talked about javacall yet. So the next time a non-Stata user tries to assert their software package dominance by saying: “oh, Stata couldn’t do “x” so I had to use some otheR package”, bet them $20 they’re wrong. Most of the time, they don’t know what they’re talking about and you’ll have $20 to spend on some sweet Stata swag (seriously, we’re not sponsored by Stata).
Will Matsuoka is the creator of W=M/Stata - he likes creativity and simplicity, taking pictures of food, competition, and anything that can be analyzed.