William Matsuoka

W=M/Stata

Welcome

HMAC-SHA1 in Stata

2/10/2016

Why Care Block?
Security and the Twitter API – You need this if you want Twitter to work with Stata
I applaud all of you who have made it this far, but unfortunately things are going to get super boring!  Well, besides applause I’d also like to take a sentence to thank you for reading this content: whether you’re reading this out of boredom, coercion, to make fun of it, or you just have a general interest, I’d like to extend my gratitude.

Switching back to the good stuff, today we’re talking about hash functions.  Think hash browns *delicious*: they’re sliced, diced, and fried to a golden brown; drenched or drizzled with hot sauce or ketchup (I prefer Cholula and Tabasco); and they look nothing like the raw potato they once were.  Now, take your hash browns and try to reconstruct the original potato.  “No!” you’d say, “That’s really hard.”  That is the idea behind a hash function: easy to compute in one direction, practically impossible to reverse.  Keep in mind that this post about security comes from someone who is neither a cryptographer nor a security expert, so anticipate a lot more potato metaphors, because that’s about all I know about security.

This post is really about making the Twitter API work within Stata without any outside plugins or packages.  To do so, we must recreate the HMAC-SHA1 algorithm, which is quite difficult without the toolset we designed previously, so make sure you’ve gone over the bitwise operators.  Once we’re on the same page, we can dig into this hash function so that you can send requests to Twitter’s API and retrieve your beautiful, crispy hash brown data in return.

The Fuss

HMAC-SHA1 is the procedure we’re trying to reproduce, following the Wikipedia article on it.  I didn’t care much for that article.  It didn’t explain, at a simple level, what HMAC-SHA1 is, what it is really doing, or why we should care, so that someone less savvy (like me) could understand what the fuss is all about.  Both HMAC and SHA-1 are procedures.  HMAC stands for Hash-based Message Authentication Code and is the procedure we use to combine a message with a secret key, whereas SHA-1 stands for Secure Hash Algorithm 1, which was designed by the NSA and turns a message of any length into a fixed-length hash (160 bits, or 40 hex characters), working through the message in 512-bit blocks.  Together, they let the receiver check that the data hasn’t been tampered with and that it came from someone holding the key, all without you sending your raw secret key over the internet… series of tubes.  The nice thing about SHA-1 is that a very slight change, say replacing a single character in the message, dramatically changes the resulting hash, which makes it difficult to crack (though apparently SHA-1 is not secure enough by today’s standards).  Remember, HMAC is the main procedure that combines a secret key with the message, and SHA-1 is just the type of hash function it uses; but you probably don’t care that much and just want to see the code.  So let’s get started!
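To make the two SHA-1 properties above concrete (a fixed-length output, and a big change from a one-letter edit), here is a short sketch in Python rather than Mata. It uses the standard library’s hashlib so it runs without any of the Stata bitwise helpers; bits_changed is an illustrative helper name, not something from the downloadable do-file.

```python
import hashlib

def bits_changed(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Fixed-length output: a 6-byte input and a 70,000-byte input both give 20 bytes.
short = hashlib.sha1(b"potato").digest()
big = hashlib.sha1(b"potato " * 10_000).digest()
print(len(short), len(big))  # 20 20

# Changing a single character flips a large share of the 160 output bits.
one = hashlib.sha1(b"Message to be sent").digest()
two = hashlib.sha1(b"Message to be Sent").digest()
print(bits_changed(one, two), "of 160 bits differ")
```

For a good hash, roughly half of the bits are expected to flip on any small change, which is why the hex strings look unrelated.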

The SHA-1

SHA-1 has a lot of destructive procedures, a lot of breaking bits into smaller bits, and a lot of the previously developed bitwise functions.  Let’s see a quick example of this in action.
Say I have the string "The Ore-Ida brand is a syllabic abbreviation of Oregon and Idaho" and I want to run the SHA-1 function on it:
mata: sha1("The Ore-Ida brand is a syllabic abbreviation of Oregon and Idaho") 
The result (hex) is: 156a5e19b6301e43794afc5e5aff0584e25bfbe7
In Base64: FWpeGbYwHkN5SvxeWv8FhOJb++c=
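One detail worth spelling out: the Base64 string encodes the 20 raw digest bytes, not the 40-character hex text. The minimal Python sketch below uses hashlib and base64 as stand-ins for the Mata routines and prints both encodings of the phrase’s digest, which can be compared with the Mata output above. The last line converts the published hex digest directly.

```python
import base64
import hashlib

phrase = "The Ore-Ida brand is a syllabic abbreviation of Oregon and Idaho"
digest = hashlib.sha1(phrase.encode("utf-8")).digest()  # 20 raw bytes

hex_form = digest.hex()                              # 40 hex characters
b64_form = base64.b64encode(digest).decode("ascii")  # 28 characters, ending in "="
print(hex_form)
print(b64_form)

# Base64 re-encodes raw bytes, so turn the hex digest back into bytes first.
published_hex = "156a5e19b6301e43794afc5e5aff0584e25bfbe7"
print(base64.b64encode(bytes.fromhex(published_hex)).decode("ascii"))  # FWpeGbYwHkN5SvxeWv8FhOJb++c=
```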
​
Good luck figuring out the original string from the Base64 encoding.  Now remember, this is not a post about the theory or reasoning behind the SHA-1 procedure; this post is about making it work.  Therefore, the advanced Stata user who wishes to replicate or improve this code might find the next section interesting.  Here are some of my observations from repackaging the SHA-1 function:
  • Mata is incredibly fast, but this implementation is much slower than an equivalent written in C, because every bit is stored as a string character and converted back to bytes.
  • All 32-bit words are subject to integer overflow: if adding two words gives 2^32 or more, the result acts like an odometer, rolling over to zero and counting up again (in other words, addition is modulo 2^32).
  • While the Wikipedia article refers to “big-endian” integers a lot, the linked article has a lot of information we might not care about.  For our purposes, “big-endian” means the most significant bits come first, so a number is written at a fixed width with zeros padded on the left.
    Say we needed the number 55 as a “big-endian” 16-bit integer.  Well, 55 is 110111 in base 2 and, since that is only 6 digits, we’d need to put 10 zeros in front of it: 0000000000110111.
  • Before hashing, the message is padded: append a single 1 bit, then zeros until the length is 64 bits short of a multiple of 512, then the original message length in bits as a 64-bit big-endian integer.  If this final padded message is longer than 512 bits, break it into as many 512-bit chunks as necessary.
  • Initialize the five “h” variables once (this should be a given).  Each time a chunk goes through the chunk-processing loop, overflow-add its results to the previous “h” values, and repeat until you run out of chunks.  The final “h” values, written one after another, are the hash.
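The notes above map onto a from-scratch SHA-1 fairly directly. The sketch below is written in Python, not Mata, and uses native integers with `&`, `^` and `<<` in place of string-based bitwise helpers, so it illustrates the algorithm rather than reproducing the do-file; the names add32, rotl, sha1_pad and sha1 are hypothetical. It shows the odometer-style 32-bit addition, big-endian packing through `struct`, the 512-bit chunking, and the running “h” variables.

```python
import hashlib
import struct

MASK32 = 0xFFFFFFFF  # every SHA-1 word is 32 bits

def add32(*words: int) -> int:
    """Odometer (overflow) addition: keep only the low 32 bits of the sum."""
    return sum(words) & MASK32

def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def sha1_pad(message: bytes) -> bytes:
    """Append a 1 bit, zeros up to 448 mod 512 bits, then the 64-bit big-endian bit length."""
    bit_len = 8 * len(message)
    padded = message + b"\x80"  # 0x80 = a 1 bit followed by seven 0 bits
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    return padded + struct.pack(">Q", bit_len)  # ">" = big-endian, left-padded with zeros

def sha1(message: bytes) -> bytes:
    # Initialize the five "h" variables once.
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    padded = sha1_pad(message)
    for start in range(0, len(padded), 64):  # one 512-bit (64-byte) chunk at a time
        w = list(struct.unpack(">16I", padded[start:start + 64]))
        for i in range(16, 80):  # extend 16 words to 80
            w.append(rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1))
        a, b, c, d, e = h
        for i in range(80):
            if i < 20:
                f, k = (b & c) | (~b & d), 0x5A827999
            elif i < 40:
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif i < 60:
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6
            temp = add32(rotl(a, 5), f, e, k, w[i])
            a, b, c, d, e = temp, a, rotl(b, 30), c, d
        # Overflow-add this chunk's result onto the running "h" values.
        h = [add32(x, y) for x, y in zip(h, (a, b, c, d, e))]
    return struct.pack(">5I", *h)  # concatenate the five words, big-endian
```

For example, `sha1(b"abc").hex()` returns the standard test vector a9993e364706816aba3e25717850c26c9cd0d89d, and comparing `sha1(data)` with `hashlib.sha1(data).digest()` over a range of message lengths exercises both the one-chunk and two-chunk padding cases.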

The HMAC

Compared to SHA-1, the HMAC procedure is a walk in the potato fields… it’s easy as potatoes… I’m running out of references here.  Did you know that the potato was the first vegetable to be grown in space?  If astronauts could do that with potatoes, we can certainly make HMAC-SHA1 work with Stata.  There’s really just a three-step process at play.  The HMAC procedure takes two inputs: a key and a message.

  1. If the key is longer than the block size (64 bytes for SHA-1), run the SHA-1 function on the key.  Then, if the key is shorter than 64 bytes, right-pad it (append to the end) with 0x00 bytes (-char(0)-) until it is 64 bytes long; if it’s exactly 64 bytes, just keep the key.
  2. Create the outer key pad (64 * “\”, byte 0x5c) and the inner key pad (64 * “6”, byte 0x36), convert these to their base 2 representation, and apply the exclusive or (XOR) bitwise operator to each key pad and the key.
  3. Concatenate the XORed inner key pad with your message (inner key pad first) and run the SHA-1 function on the result.  Then concatenate the XORed outer key pad with the raw bytes (not the hex text) of that first SHA-1 result (outer key pad first) and run the SHA-1 function again.  Now we’re technically done!
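The three steps translate almost line for line into a short Python sketch. To stay self-contained it uses `hashlib.sha1` for the hashing rather than a hand-written SHA-1, and `hmac_sha1` here is a Python stand-in that borrows the Mata function’s name, not the do-file’s actual code. Note that the key is right-padded with zero bytes and the inner digest is concatenated as raw bytes.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-1 block size in bytes

def hmac_sha1(key: bytes, message: bytes) -> bytes:
    # Step 1: hash keys longer than a block, then right-pad with 0x00 to 64 bytes.
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha1(key).digest()
    key = key.ljust(BLOCK_SIZE, b"\x00")
    # Step 2: XOR the key with the inner pad (0x36, "6") and outer pad (0x5c, "\").
    inner_key = bytes(k ^ 0x36 for k in key)
    outer_key = bytes(k ^ 0x5C for k in key)
    # Step 3: inner hash first, then hash the outer key followed by the raw inner digest.
    inner = hashlib.sha1(inner_key + message).digest()
    return hashlib.sha1(outer_key + inner).digest()

# Python's hmac module implements the same construction, so the two should agree.
assert hmac_sha1(b"Secret Key", b"Message to be sent") == hmac.new(
    b"Secret Key", b"Message to be sent", hashlib.sha1).digest()
```

Trying keys longer than 64 bytes is a quick way to exercise the hashing branch of step 1.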

The Conclusion

Well, this post (and the previous one) has been filled with a little more technical jargon than I planned.  I like to keep these posts fun and poignant, but this material is for the dedicated and serves as a reference for those who want to expand Stata’s capabilities (even if it takes a little longer than expected).  This new Mata function allows Stata to perform the HMAC-SHA1 procedure, which is vital for enabling Twitter requests through Stata so that the entire process can be contained within one do-file.  Here’s an example of the function in action:
. mata: hmac_sha1("Secret Key", "Message to be sent")
  d5052c13e868ea7c932be9279752e9e67c8195bd
. mata: hmac_sha1("Secret Key", "Message to be Sent")
  f67f5f90132583de85abf0d61fed2a2144be1f04
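For readers who want to cross-check the Mata output outside Stata, Python’s standard hmac module computes HMAC-SHA1 directly. In this minimal sketch `hmac_sha1_hex` is a hypothetical helper name, and the strings are assumed to be UTF-8 (equivalent to plain ASCII for these examples). The printed values have not been checked against the Mata output here, so a mismatch is a reason to investigate encoding or padding, not proof of a bug.

```python
import hashlib
import hmac

def hmac_sha1_hex(key: str, message: str) -> str:
    """HMAC-SHA1 of two strings (encoded as UTF-8), as 40 lowercase hex characters."""
    return hmac.new(key.encode("utf-8"), message.encode("utf-8"), hashlib.sha1).hexdigest()

lower = hmac_sha1_hex("Secret Key", "Message to be sent")
upper = hmac_sha1_hex("Secret Key", "Message to be Sent")
print(lower)
print(upper)
print(lower != upper)  # True: one changed letter gives a different tag
```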

As the examples show, changing a single character (“sent” versus “Sent”) dramatically changes the output.  Feel free to download the code below.  All subroutines are included for your convenience.  Good luck!
hmac-sha1.do (do-file, 6 KB)


    Author

    Will Matsuoka is the creator of W=M/Stata - he likes creativity and simplicity, taking pictures of food, competition, and anything that can be analyzed.

