OAuth
Why Care Block:
Oh man, things are about to get real juicy – this is the final reveal of the Twitter API and I couldn’t think of a better time to do this. After all, we’ve made a complete mess out of Stata with our unruly bitwise operators, unending computer security algorithms, and un-um… unfortunate time-based C-plugins. Well, my Stata friends, this is how to do anything with Stata and the Twitter API.
Let’s post! Seriously, let’s tweet about things using Stata. This method can be used not only to tweet, but also to send direct messages, retweet things, and like (favorite) others’ tweets. If you’re like me, you don’t really use Twitter that well and I’m sure my 40 followers (I rounded up) would agree. But wouldn’t it be great to find a way to reach thousands of Twitter followers based off of hashtags or messages or even emoji? If the answer is no, fine, that’s my best attempt at convincing you but still feel free to read on. Let’s bring together everything we need to send a Tweet out on the internet:
Base64 Encoding
An easy way to turn readable stuff into stuff that is unreadable for humans but not so bad for computers. Base64 uses the 62 alphanumeric characters plus the “+” and the “/” (64 characters to choose from) and regroups 8-bit bytes into 6-bit chunks, each of which is mapped to one of those characters. Sometimes equal signs are added as padding, but you already know that.
See post on Base64 Encoding
Bitwise Operators and Security Methods
We’ve got bits dressed to the nines: shifting operators, rotating operators, logical operators, overflow methods, and all kinds of methods to translate from base 2 to base 16 to base 10.
These feed into our HMAC-SHA1 algorithm, which was awful to develop… this seriously has to go in the “lots of effort, little reward” category.
See post on Bitwise Operators and HMAC-SHA1
Stata Plugins
We needed time - I mean we all do – in seconds since 1970 based off of Greenwich Time. We could have just accounted for our respective time zones (unless you’re located in GMT+0 of course), but why not use this as a great learning opportunity?
See post on Plugins
Nonce
Once? None? Scone? What’s a nonce? Simply put, it’s a random 32-character string of alphanumeric characters that is used only once. Twitter’s example shows this being Base64 encoded, but all it has to be is a random string. vqib2ve89dOnTusESAS26Ceu9TcUES2i – see? Easy, just make sure you generate a fresh one for each request.
Percent Encoding
What%27s%20percent%20encoding%3F
Based on the UTF-8 URL Encoding standards, we need to replace certain characters that would otherwise cause complications in a URL. Don’t worry though, we’ve got a function for that. Just know that the only characters left unencoded are letters and numbers as well as the dash, the period, the underscore, and the tilde; everything else becomes a percent sign followed by the hexadecimal value of each UTF-8 byte.
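For anyone who wants to check the nonce and percent-encoding ingredients outside Stata, here is a minimal Python sketch. The names `gen_nonce` and `percent_encode` are made up for this illustration; they are not the Mata functions from the post, just stand-ins built on Python's standard library.

```python
import secrets
import string
from urllib.parse import quote

# Letters and digits only, matching the "alphanumeric" description above.
ALPHANUMERIC = string.ascii_letters + string.digits


def gen_nonce(length: int = 32) -> str:
    """Return a random alphanumeric string; call it again for every request."""
    return "".join(secrets.choice(ALPHANUMERIC) for _ in range(length))


def percent_encode(text: str) -> str:
    """Encode everything except letters, digits, '-', '.', '_' and '~'."""
    # safe="" stops quote() from leaving '/' unencoded, which is its default.
    return quote(text, safe="")


print(gen_nonce())
print(percent_encode("What's percent encoding?"))
```

The second line printed should match the encoded heading question above, which makes for a quick sanity check of whatever encoder you end up using in Stata.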
Send Out the Tweets
Twitter has some fabulous documentation for their API so it’s fairly easy to find the method that you want to use. These processes can be generalized and used across many of these methods so we are by no means limited to just tweeting. For tweets (hereafter referred to as status updates to be consistent with Twitter’s documentation), we use the base or “resource” URL:
https://api.twitter.com/1.1/statuses/update.json

Remember, status updates can only be 140 characters and Twitter will actually shorten URLs for you, but still keep in mind your character limits! All statuses must be percent encoded, so let’s go ahead and create our status local; we’ll also store our base URL while we’re at it.

    local b_url "https://api.twitter.com/1.1/statuses/update.json"
    local status = "Sent out this tweet using #Stata and the #Twitter #API – http://www.wmatsuoka.com/stata/stata-and-the-twitter-api-part-ii tells you how!"
    mata: st_local("status", percentencode("`status'"))

We then take all of our application information: consumer key, consumer secret, access token, and access secret and store these as locals as well.

    local cons_key "ZAPp9dO37PnzCofN2Nm8n8kye"
    local cons_sec "kfbtARFpBIdb515iaS48kYZjWhLoIdbEiAINDVX0c3W3e0fgWe"
    local accs_key "1234567890-YFklDWGuvSIGLYPMnAfOZgLgLsMXjKIHqaIr1F5"
    local accs_sec "LYvHWfTS6LXjtDPVXchs6dXUG52l41j4HmicYwjr8aStw"

For this section, we will be creating all of our necessary authentication variables. The first step to creating our secret signing key is to make a string that contains our consumer secret key, concatenated with an ampersand, concatenated with our access secret key. We’ll also include all the necessary locals we talked about earlier.

    local s_key = "`cons_sec'&`accs_sec'"
    mata: st_local("nonce", gen_nonce_32())
    local sig_meth = "HMAC-SHA1"
    // If you don't make a plugin, just make sure you do seconds since 1970
    // probably using c(current_date) and c(current_time) - [your time zone]
    plugin call st_utm
    local ts = (clock(substr("`utm_time'", 4, .), "MDhmsY") - clock("1970", "Y"))/1000
Now it’s time to create our signature. We start by percent encoding our base URL. For our signature string, we include the following categories:
    local sig = "oauth_consumer_key=`cons_key'&oauth_nonce=`nonce'&oauth_signature_method=`sig_meth'&oauth_timestamp=`ts'&oauth_token=`accs_key'&oauth_version=1.0&status=`status'"

Note that oauth_token takes the access token, which we stored in the local accs_key. Then, percent encode the signature string and base URL:

    mata: st_local("pe", percentencode("`b_url'"))
    mata: st_local("pe_sig", percentencode("`sig'"))
This next step is why we spent an ungodly amount of time on bits and hashes! We need to transform our signature string into a Base64 encoding of the HMAC-SHA1 hash result with our message being the percent-encoded signature base hashed with our secret key. Sorry if that's a mouthful, but we can see what that translates to below.
    local sig_base = "POST&`pe'&"
    mata: x=sha1toascii(hmac_sha1("`s_key'", "`sig_base'`pe_sig'"))
    mata: st_local("sig", percentencode(encode64(x)))

Finally, we’re almost done with Twitter and can move on to other more important things! First let’s make sure to post our Tweet because, after all, that's what we're here for. As before, oauth_token is the access token stored in the local accs_key.

    !curl -k --request "POST" "`b_url'" --data "status=`status'" --header "Authorization: OAuth oauth_consumer_key="`cons_key'", oauth_nonce="`nonce'", oauth_signature="`sig'", oauth_signature_method="`sig_meth'", oauth_timestamp="`ts'", oauth_token="`accs_key'", oauth_version="1.0"" --verbose

Now we can marvel at how much time we spent learning about something 99% of us don’t really care about. This is for you, 1% - this is all for you.
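If you'd like to sanity-check a signature outside Stata, the signing steps can be sketched in Python with the standard library. This is only a rough cross-check under a few assumptions: the function names `pe`, `signature_base_string`, and `sign` are invented for this example, the credentials are the fake ones from this post, and the sketch percent-encodes the two secrets before joining them (which changes nothing for purely alphanumeric secrets like these).

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import quote


def pe(text: str) -> str:
    """Percent-encode all but letters, digits, '-', '.', '_' and '~'."""
    return quote(text, safe="")


def signature_base_string(method: str, url: str, params: dict) -> str:
    # Encode each key and value, sort the pairs, join them as key=value&...
    pairs = sorted((pe(k), pe(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in pairs)
    # METHOD & encoded URL & encoded parameter string.
    return "&".join([method.upper(), pe(url), pe(param_string)])


def sign(base_string: str, consumer_secret: str, token_secret: str) -> str:
    # Signing key: consumer secret, an ampersand, then the access secret.
    key = f"{pe(consumer_secret)}&{pe(token_secret)}"
    digest = hmac.new(key.encode("utf-8"), base_string.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


# Fake credentials from the post; the nonce and status are placeholders.
params = {
    "oauth_consumer_key": "ZAPp9dO37PnzCofN2Nm8n8kye",
    "oauth_nonce": "vqib2ve89dOnTusESAS26Ceu9TcUES2i",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_timestamp": str(int(time.time())),  # seconds since 1970, UTC
    "oauth_token": "1234567890-YFklDWGuvSIGLYPMnAfOZgLgLsMXjKIHqaIr1F5",
    "oauth_version": "1.0",
    "status": "Hello from a sketch",
}
base = signature_base_string(
    "POST", "https://api.twitter.com/1.1/statuses/update.json", params)
signature = sign(base,
                 "kfbtARFpBIdb515iaS48kYZjWhLoIdbEiAINDVX0c3W3e0fgWe",
                 "LYvHWfTS6LXjtDPVXchs6dXUG52l41j4HmicYwjr8aStw")
print(pe(signature))  # percent-encoded, ready for the Authorization header
```

Feeding the same nonce, timestamp, and status into both the Stata code and this sketch should produce the same signature; if they differ, the percent-encoding step is usually the first place to look.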
PS: Want to add emoji? Go to this site, find the URL Escape Code sections, and add the result to your status string! Ex: %F0%9F%93%88 = Chart with Upward Trend
Application-Only Authentication
Why Care Block:
This is it – the culmination of all of our hard work and it’s broken up into two posts. The first (this one) shows us how to get Twitter data, and the second shows us how to post things on Twitter using Stata and our API library.
This post is brought to you thanks to an openness and willingness to share information. In today’s day and age, it’s ridiculous to try and hide information or impede the transfer of knowledge to others. I recently experienced first-hand the negative result openness can have on those who wish to obfuscate information. I guess these certain individuals are incapable of adapting, sharing, and participating in the global community – perhaps it’s not their fault if they feel threatened, but this does not excuse what I believe to be an original sin when it comes to technology. Sure, it’s difficult to produce new things and sometimes even more difficult to scrap projects you’re emotionally attached to, but to abandon creativity and sequester change is the antithesis of this post. I’m sure many of you have dealt with an issue similar to this in your line of work; please don’t give up the good fight. If this sounds familiar, well, this post is dedicated to you.

Just like the last Fitbit API post, all corresponding variables are color-coded for your convenience. The first thing we need to do is set up an application with the Twitter API. Once you enter all the necessary information, go to the “Keys and Access Tokens” tab to get your Consumer Key and Consumer Secret. FYI, all information provided below is fake, sorry to disappoint.
Below the Application Settings section is the “Your Access Token” section which contains the Access Token and Access Token Secret which we will use to get our basic tweet data and search for other interesting things.
Just like our Fitbit API, we need to first get an Authorization Token by sending a cURL request that takes the Base64 encoded consumer key and consumer secret. First let’s set up our variables:
    local cons_key "ZAPp9dO37PnzCofN2Nm8n8kye"
    local cons_sec "kfbtARFpBIdb515iaS48kYZjWhLoIdbEiAINDVX0c3W3e0fgWe"
    local accs_key "1234567890-YFklDWGuvSIGLYPMnAfOZgLgLsMXjKIHqaIr1F5"
    local accs_sec "LYvHWfTS6LXjtDPVXchs6dXUG52l41j4HmicYwjr8aStw"
Note: we will not be using the access key or secret until the next post
And create our encoded consumer information string (recall encode64 is available through this post):

    encode64 "`cons_key':`cons_sec'"
    local client64 = r(base64)

Now we simply send our request to Twitter:

    !curl -k -i -H "Content-Type: application/x-www-form-urlencoded" -H "Authorization: Basic `client64'" --request POST --data "grant_type=client_credentials" https://api.twitter.com/oauth2/token --output "token.json"

Be sure to parse the access token from our outputted token.json file (perhaps this would be a good place to enter error handling methods); I stored the value in the global macro $token. Finally, send one more request that contains the query you’re looking for:

    !curl -k -i -H "Authorization: Bearer $token" --request GET "https://api.twitter.com/1.1/statuses/user_timeline.json?count=100&screen_name=_belenchavez" --output "output.json"

In this case, we were able to pull the 100 most recent tweets from @_belenchavez’s timeline using the statuses API. For more information about what you can do with the application only authentication, check out Twitter’s documentation here. Otherwise, take a look at this frequency cloud of my commonly Tweeted/Re-Tweeted hashtags of 2016 thanks to wordclouds.com. It even looks somewhat like the Twitter logo!

Why Care Block?
I applaud all of you who have made it this far, but unfortunately things are going to get super boring! Well, besides applause I’d also like to take a sentence to thank you for reading this content: whether you’re reading this out of boredom, coercion, to make fun of it, or you just have a general interest, I’d like to extend my gratitude.
Switching back to the good stuff, today we’re talking about hash functions. Think hash browns *delicious* - they’re sliced, diced, and fried to a golden brown; drenched or drizzled with hot sauce or ketchup (I prefer Cholula and Tabasco); and they resemble nothing like the raw potato they once were. Now, take your hash browns and try to reconstruct the original potato. “No!” you’d say, “That’s really hard.” Keep in mind this post is about security, written by someone who is neither a cryptographer nor a security expert, so anticipate a lot more potato metaphors because that’s about all I know about security.

This post is really about making the Twitter API work within Stata without any outside plugins or packages. To do so, we must find a way to recreate the HMAC-SHA1 algorithm, which is quite difficult without the toolset we designed previously, so make sure you’ve gone over the bitwise operators. Once we’re on the same page, we can start getting into this hash function in order to send requests using Twitter’s API so that you can retrieve your beautiful, crispy hash brown data in return.

The Fuss
HMAC-SHA1 is the type of procedure we’re trying to reproduce from the Wikipedia article available here. I didn’t care much for it. It didn’t explain what it is, what it was really doing, or why we should care at a simple level so that someone less savvy (like me) could understand just what the fuss is all about. Both HMAC and SHA-1 are procedures. HMAC stands for Hash-based Message Authentication Code and is the procedure we use to combine a message with a secret key, whereas SHA-1 stands for Secure Hash Algorithm 1, which was designed by the NSA and is used to boil a message of any length down to a fixed-length (160-bit) hash string. Together, they make sure that the integrity of the data hasn’t been compromised and that you’re not sending your raw secret key over the internet…series of tubes. The nice thing about the SHA-1 algorithm is that very slight changes, say replacing a single character in the message, dramatically change the resulting hash so that it’s difficult to crack (though apparently not secure enough for today’s standards). Remember, HMAC is the main procedure to combine a secret key with the message and SHA-1 is just the type of hash function implemented; but you probably don’t care that much and just want to see the code. So let’s get started!
The SHA-1
SHA-1 has a lot of destructive procedures, a lot of breaking bits into smaller bits, and a lot of the previously developed bitwise functions. Let’s see a quick example of this in action.
Say I have the string phrase "The Ore-Ida brand is a syllabic abbreviation of Oregon and Idaho" and I want to run the SHA-1 function on it.

    mata: sha1("The Ore-Ida brand is a syllabic abbreviation of Oregon and Idaho")

The result is (hex):

    156a5e19b6301e43794afc5e5aff0584e25bfbe7

In Base64:

    FWpeGbYwHkN5SvxeWv8FhOJb++c=
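As a cross-check outside Stata, the same hash can be computed with Python's standard `hashlib`. This is a sketch for comparison only, not the Mata routine from this post; `sha1_hex` and `hex_to_base64` are helper names invented here. Note that the Base64 form encodes the 20 raw digest bytes, not the 40-character hex text.

```python
import base64
import hashlib

PHRASE = "The Ore-Ida brand is a syllabic abbreviation of Oregon and Idaho"


def sha1_hex(text: str) -> str:
    """SHA-1 digest of the UTF-8 bytes of text, as 40 hex characters."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def hex_to_base64(hex_digest: str) -> str:
    """Base64 of the raw bytes that the hex digest represents."""
    return base64.b64encode(bytes.fromhex(hex_digest)).decode("ascii")


digest = sha1_hex(PHRASE)
print(digest)                 # compare with the hex value reported above
print(hex_to_base64(digest))  # compare with the Base64 value reported above
```

If your own Mata implementation disagrees with `hashlib` on a short input such as "abc", the message padding and the 32-bit overflow handling are good places to start debugging.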
Good luck figuring out the original string from the Base64 encoding. Now remember, this is not a post about theory or reasoning behind the SHA-1 procedure; this post is about making it work. Therefore, the advanced Stata user who wishes to replicate/improve this code might find this next section interesting. Here are some of my observations regarding repackaging the SHA-1 function.
The HMAC
Compared to SHA-1, the HMAC procedure is a walk in the potato fields… it’s easy as potatoes… I’m running out of references here. Did you know that the potato was the first vegetable to be grown in space? If astronauts could do that with potatoes, we can certainly make HMAC-SHA1 work with Stata. There’s really just a three step process at play. The HMAC procedure takes two inputs: a key and a message.
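The post's own three steps aren't spelled out here, so as a stand-in, this Python sketch follows the standard HMAC construction from RFC 2104: pad the key to the hash's block size, hash the inner-padded key with the message, then hash the outer-padded key with that result. The function name `hmac_sha1_hex` is an invention of this example and is not the Mata `hmac_sha1` function.

```python
import hashlib

BLOCK_SIZE = 64  # SHA-1 works on 64-byte blocks


def hmac_sha1_hex(key: str, message: str) -> str:
    key_bytes = key.encode("utf-8")
    msg_bytes = message.encode("utf-8")

    # Step 1: shrink an over-long key by hashing it, then zero-pad to one block.
    if len(key_bytes) > BLOCK_SIZE:
        key_bytes = hashlib.sha1(key_bytes).digest()
    key_bytes = key_bytes.ljust(BLOCK_SIZE, b"\x00")

    # Step 2: inner hash of (key XOR 0x36) followed by the message.
    inner_pad = bytes(b ^ 0x36 for b in key_bytes)
    inner = hashlib.sha1(inner_pad + msg_bytes).digest()

    # Step 3: outer hash of (key XOR 0x5c) followed by the inner digest.
    outer_pad = bytes(b ^ 0x5C for b in key_bytes)
    return hashlib.sha1(outer_pad + inner).hexdigest()


print(hmac_sha1_hex("Secret Key", "Message to be sent"))
print(hmac_sha1_hex("Secret Key", "Message to be Sent"))
```

The XOR with 0x36 and 0x5c is where the bitwise operators earn their keep; in Python it is a one-liner, which is a fair measure of how much work the Stata version has to do by hand.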
The Conclusion
Well, this post (and the previous one) has been filled with a little more technical jargon than I planned. I like to keep these posts fun and poignant, but this material is for the dedicated and serves as a reference for those who want to expand Stata’s capabilities (even if it takes a little longer than expected). This new Mata function allows Stata to perform the HMAC-SHA1 procedure which is vital for enabling Twitter requests through Stata so that the entire process can be contained within one do-file. Here’s an example of the function in action:
    . mata: hmac_sha1("Secret Key", "Message to be sent")
      d5052c13e868ea7c932be9279752e9e67c8195bd

    . mata: hmac_sha1("Secret Key", "Message to be Sent")
      f67f5f90132583de85abf0d61fed2a2144be1f04

You can see how the examples show that slight changes in the message dramatically change the output. Feel free to download the process below. All subroutines are included for your convenience. Good luck!
Why Care Block: Happy New Year! We're starting 2016 with a bang, including this three part series on acquiring Fitbit data through Fitbit’s API services. 2015 was a great year and marked the start of this website, and I can guarantee that there will be much more content to come. See the new "Why Care Block"? It’s inspired by a Holiday conversation with family and gives a brief synopsis of why we’re even doing this in the first place.

With that, let's talk about the topic of the day: Base64 encoding. Base64 encoding exists so that arbitrary bytes can travel as plain text (it is not encryption, so don't count on it for privacy). That's about all I want to say about Base64 encoding. Let's just look at an example; they’re a lot cooler. Say for instance we have the phrase: “This is kinda boring...” By converting every character to ASCII bytes, writing each byte in base 2, regrouping the 8-bit bytes into 6-bit chunks (padding with zeros where needed), converting each chunk back to base 10, and using those numbers (they fall between 0 and 63) to look up the corresponding character of the alphabet plus two other characters, we finally arrive at the following encoded string: "VGhpcyBpcyBraW5kYSBib3JpbmcuLi4=" While they mean the same thing, I think the first phrase was a little more accurate. We can encode this second phrase a second time and it gives us: "VkdocGN5QnBjeUJyYVc1a1lTQmliM0pwYm1jdUxpND0=" Now you can easily send encoded messages to your friend (not plural – I can't imagine that many people would tolerate this type of behavior).

I’ve put together a Stata package containing these routines; however, no help file is available upon the first release. Here is a brief overview of the new commands, encode64 and decode64, at work. We first called encode64 on the string "client_id:123456789" to encode this value in Base64. The result shows under the command, but is also callable through the macro r(base64).
We create a local macro that contains this encoded value ("Y2xpZW50X2lkOjEyMzQ1Njc4OQ==") and decode this message to get back to our original string. It's a lot less fun than my old Nintendo 64, but trust me, the results will be extremely rewarding! Feel free to download the ado file below.
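To make the 8-bit to 6-bit regrouping concrete, here is a small Python sketch that walks through the same steps by hand and can be compared against the examples in this post. It is not the Stata `encode64` command; `encode64_sketch` is a name made up for this illustration, and the sketch favors readability over speed by building a string of bits.

```python
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")


def encode64_sketch(text: str) -> str:
    data = text.encode("utf-8")
    # Write every byte as 8 binary digits and glue them together.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Pad with zeros so the bit string splits evenly into 6-bit chunks.
    bits += "0" * (-len(bits) % 6)
    # Each 6-bit chunk is a number from 0 to 63: look it up in the alphabet.
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Add '=' so the output length is a multiple of four characters.
    return "".join(chars) + "=" * (-len(data) % 3)


once = encode64_sketch("This is kinda boring...")
print(once)
print(encode64_sketch(once))
print(encode64_sketch("client_id:123456789"))
```

The three printed lines correspond to the single-encoded phrase, the double-encoded phrase, and the client_id example. The phrase is typed with three plain periods here; a typographic ellipsis character would encode differently.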
Author
Will Matsuoka is the creator of W=M/Stata - he likes creativity and simplicity, taking pictures of food, competition, and anything that can be analyzed.