Category: Binary

Stata and the Twitter API (Part II)

3/4/2016

OAuth

Why Care Block:
You can Tweet things in an automatic fashion directly from Stata

Oh man, things are about to get real juicy – this is the final reveal of the Twitter API and I couldn’t think of a better time to do this. After all, we’ve made a complete mess out of Stata with our unruly bitwise operators, unending computer security algorithms, and un-um… unfortunate time-based C-plugins. Well, my Stata friends, this is how to do anything with Stata and the Twitter API.

Let’s post! Seriously, let’s tweet about things using Stata. This method can be used not only to tweet, but also to send direct messages, retweet things, and like (favorite) others’ tweets. If you’re like me, you don’t really use Twitter that well and I’m sure my 40 followers (I rounded up) would agree. But wouldn’t it be great to find a way to reach thousands of Twitter followers based off of hashtags or messages or even emoji? If the answer is no, fine, that’s my best attempt at convincing you but still feel free to read on.

Let’s bring together everything we need to send a Tweet out on the internet:

Stata: Check
cURL: Check
Our Library: Check

And what does our library now consist of? (FYI: there’s some new stuff here)

Base64 Encoding

An easy way to turn text that humans can read into text they can't, while keeping it simple for computers. The Base64 alphabet is every alphanumeric character plus “+” and “/” (64 characters in all). The encoder regroups the input's 8-bit bytes into 6-bit chunks and maps each chunk's value (0 to 63) to the matching character. Equal signs pad the output when the input length isn't a multiple of three bytes, but you already know that.
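To make the 8-bit to 6-bit regrouping concrete, here is a minimal Mata sketch. The name b64_sketch() is made up for illustration and is not the encode64() function from the library; it assumes plain byte strings and favors readability over speed.

```stata
mata:
// Hypothetical helper for illustration only (not the library's encode64())
string scalar b64_sketch(string scalar s)
{
        string scalar    alphabet, bits, out
        string rowvector b
        real scalar      i, n, pad

        alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
        if (s == "") return("")

        // one 8-character bit string per input byte
        b = inbase(2, ascii(s))
        b = "0" :* (8 :- strlen(b)) :+ b
        bits = ""
        for (i = 1; i <= cols(b); i++) bits = bits + b[i]

        // number of "=" signs needed, then zero-fill to whole 6-bit chunks
        pad  = mod(3 - mod(strlen(s), 3), 3)
        n    = ceil(strlen(bits) / 6)
        bits = bits + "0" * (6*n - strlen(bits))

        // each 6-bit chunk (0 to 63) picks one character of the alphabet
        out = ""
        for (i = 1; i <= n; i++) {
                out = out + substr(alphabet, frombase(2, substr(bits, 6*i - 5, 6)) + 1, 1)
        }
        return(out + "=" * pad)
}

b64_sketch("Man")       // TWFu
b64_sketch("Ma")        // TWE=
end
```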

See post on Base64 Encoding

Bitwise Operators and Security Methods

We’ve got bits dressed to the nines: shifting operators, rotating operators, logical operators, overflow methods, and all kinds of methods to translate from base 2 to base 16 to base 10.

These feed into our HMAC-SHA1 algorithm, which was awful to develop… this seriously has to go in the “lots of effort, little reward” category.

See post on Bitwise Operators and HMAC-SHA1

Stata Plugins

We needed time - I mean we all do – in seconds since 1970 based off of Greenwich Time. We could have just accounted for our respective time zones (unless you’re located in GMT+0 of course), but why not use this as a great learning opportunity?

See post on Plugins

Nonce

Once? None? Scone? What’s a nonce? Simply put, it’s a random 32-character string made up of alphanumeric characters. Twitter’s example shows this being Base64 encoded, but all it has to be is a random string. vqib2ve89dOnTusESAS26Ceu9TcUES2i – see? Easy, just make sure you generate a new one for every request.
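One way to build such a string in Mata is sketched below. The function name nonce_sketch() is an assumption for illustration; the library's own gen_nonce_32() may well be written differently.

```stata
mata:
// Hypothetical helper for illustration (the library ships gen_nonce_32())
string scalar nonce_sketch(real scalar len)
{
        string scalar  pool, out
        real rowvector idx
        real scalar    i

        pool = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

        // random positions between 1 and 62, one per character
        idx = floor(runiform(1, len) :* strlen(pool)) :+ 1

        out = ""
        for (i = 1; i <= len; i++) out = out + substr(pool, idx[i], 1)
        return(out)
}

nonce_sketch(32)
end
```

Because it draws from runiform(), the result depends on the current random-number seed, so avoid resetting the seed to the same value before every request.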

Percent Encoding

What%27s%20percent%20encoding%3F

Based on the URL encoding standard (percent-encoded UTF-8 bytes), we need to replace certain characters that would otherwise cause complications in a URL. Don’t worry though, we’ve got a function for that. Just know that the only characters left untouched are letters and numbers as well as the dash, the period, the underscore, and the tilde.
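The rule can be sketched in a few lines of Mata. The name pe_sketch() is hypothetical and stands in for the library's percentencode(); it simply walks the string byte by byte and replaces anything outside the safe set with "%" plus the byte's two-digit hex code.

```stata
mata:
// Hypothetical helper for illustration (the library ships percentencode())
string scalar pe_sketch(string scalar s)
{
        string scalar  ok, out, c, hex
        real rowvector a
        real scalar    i

        ok  = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
        a   = ascii(s)
        out = ""
        for (i = 1; i <= cols(a); i++) {
                c = char(a[i])
                if (strpos(ok, c)) out = out + c
                else {
                        // two-digit, upper-case hex code of the byte
                        hex = strupper(inbase(16, a[i]))
                        if (strlen(hex) < 2) hex = "0" + hex
                        out = out + "%" + hex
                }
        }
        return(out)
}

pe_sketch("What's percent encoding?")   // What%27s%20percent%20encoding%3F
end
```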

twitter-programs-lib.do (do-file, 10 kb)

Send Out the Tweets

Twitter has some fabulous documentation for their API so it’s fairly easy to find the method that you want to use. These processes can be generalized and used across many of these methods so we are by no means limited to just tweeting. For tweets (hereafter referred to as status updates to be consistent with Twitter’s documentation), we use the base or “resource” URL:

https://api.twitter.com/1.1/statuses/update.json

Remember, status updates can only be 140 characters and Twitter will actually shorten URLs for you, but still keep in mind your character limits!

All statuses must be percent encoded, so let’s go ahead and create our status local; we’ll also store our base URL while we’re at it.

local b_url "https://api.twitter.com/1.1/statuses/update.json"
local status = "Sent out this tweet using #Stata and the #Twitter #API – http://www.wmatsuoka.com/stata/stata-and-the-twitter-api-part-ii tells you how!" 
mata: st_local("status", percentencode("`status'"))

We then take all of our application information: consumer key, consumer secret, access token, and access secret and store these as locals as well.

local cons_key "ZAPp9dO37PnzCofN2Nm8n8kye"
local cons_sec "kfbtARFpBIdb515iaS48kYZjWhLoIdbEiAINDVX0c3W3e0fgWe"
local accs_tok "1234567890-YFklDWGuvSIGLYPMnAfOZgLgLsMXjKIHqaIr1F5"
local accs_sec "LYvHWfTS6LXjtDPVXchs6dXUG52l41j4HmicYwjr8aStw"

For this section, we will be creating all of our necessary authentication variables. The first step to creating our secret signing key is to make a string that contains our consumer secret key, concatenated with an ampersand, concatenated with our access secret key. We’ll also include all the necessary locals we talked about earlier.

local s_key    = "`cons_sec'&`accs_sec'"
mata: st_local("nonce", gen_nonce_32())
local sig_meth = "HMAC-SHA1"
// If you don't make a plugin, just make sure you do seconds since 1970
// probably using c(current_date) and c(current_time) - [your time zone]
plugin call st_utm
local ts = (clock(substr("`utm_time'", 4, .), "MDhmsY") - clock("1970", "Y"))/1000
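If you skip the plugin, the approach described in the comments above can be sketched as follows. The tz_offset local is an assumption you must set yourself (hours relative to GMT, -8 here purely as an example), and the sketch ignores daylight saving time.

```stata
* Sketch only: seconds since 1970 without the plugin
* tz_offset is hypothetical; set it to your own offset from GMT in hours
local tz_offset = -8
local ts = (clock("`c(current_date)' `c(current_time)'", "DMYhms") ///
        - clock("01jan1970 00:00:00", "DMYhms"))/1000 - `tz_offset'*3600
display "`ts'"
```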

Now it’s time to create our signature. We start by percent encoding our base URL. For our signature string, we include the following categories:

oauth_consumer_key - `cons_key'
oauth_nonce - `nonce'
oauth_signature_method - "HMAC-SHA1"
oauth_timestamp - `ts'
oauth_token - `accs_tok'
oauth_version - "1.0"
status - Whatever we want our status to be (percent encoded of course)

Note how they’re in alphabetical order by category – it is very important that you organize your string in this order. This is how it looks:

local sig = "oauth_consumer_key=`cons_key'&oauth_nonce=`nonce'&oauth_signature_method=`sig_meth'&oauth_timestamp=`ts'&oauth_token=`accs_tok'&oauth_version=1.0&status=`status'"

Then, percent encode the signature string and base URL:

mata: st_local("pe", percentencode("`b_url'"))
mata: st_local("pe_sig", percentencode("`sig'"))

This next step is why we spent an ungodly amount of time on bits and hashes! We need to transform our signature string into a Base64 encoding of the HMAC-SHA1 hash result with our message being the percent-encoded signature base hashed with our secret key. Sorry if that's a mouthful, but we can see what that translates to below.

local sig_base = "POST&`pe'&"
mata: x=sha1toascii(hmac_sha1("`s_key'", "`sig_base'`pe_sig'"))
mata: st_local("sig", percentencode(encode64(x)))

Finally, we’re almost done with Twitter and can move on to other more important things! First let’s make sure to post our Tweet because, after all, that's what we're here for.

!curl -k --request "POST" "`b_url'" --data "status=`status'" --header "Authorization: OAuth oauth_consumer_key="`cons_key'", oauth_nonce="`nonce'", oauth_signature="`sig'", oauth_signature_method="`sig_meth'", oauth_timestamp="`ts'", oauth_token="`accs_tok'", oauth_version="1.0"" --verbose

Now we can marvel at how much time we spent learning about something 99% of us don’t really care about. This is for you, 1% - this is all for you.

Sent out this tweet using #Stata and the #Twitter #API – https://t.co/iX3kBd2VFu tells you how!
— William Matsuoka (@WilliamMatsuoka) March 4, 2016

PS: Want to add emoji? Go to this site, find the URL Escape Code sections, and add the result to your status string!

Ex: %F0%9F%93%88 = Chart with Upward Trend


Building an API Library

2/5/2016


Bitwise Operators (aka Bits, Bits, Bits Part II)

Did you know that Mata supports bitwise operators? Well, it actually doesn't – in the typical sense. But that won't stop us from making it work. You see, Mata can handle data extremely well, and with a little finesse, can be forced to do things it wasn't really made to do. Yes it's going to be slow, and yes it's probably not very useful to the average user, but let me try to convince you how great using Mata really is!

For those who don't know, Mata is a lower level language than Stata – many of Stata's complex functions are actually written in Mata because it's really quite fast. Mata mimics a lot of C's syntax, but also simplifies things so you don't feel like you have to explicitly declare everything. In previous posts we've exploited the power of the - inbase() - function and we will make ample use of that today.

Say we have a text file containing the word "Chunk". While we see a word, the computer sees numbers which correspond to each letter – otherwise known as ASCII. Mata's - ascii() - function can help us find this representation:

This is a simple way of converting our text into numbers, but how about into bytes? Typically, bytes are displayed in base 16:
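The original screenshots are not reproduced here, so the following is a minimal sketch of the two calls described so far, with the values they return shown as comments:

```stata
mata:
ascii("Chunk")                  // 67  104  117  110  107
inbase(16, ascii("Chunk"))      // 43  68  75  6e  6b
end
```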

But now we want to see each bit. Remember, I'm not a computer scientist, so you can trust me when I say this really isn't all that bad for those of you who haven't been exposed to this stuff. Just know that each of these bytes contains 8 bits. Each bit can either be on or off (1 or 0) which means that there’s a total possible bit combination of 2^8 = 256 per byte. Let's look at what Stata shows us when we look at everything in base 2:
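A sketch of the base 2 call, assuming the string used was "chunk" with the characters 0 and 1 appended:

```stata
mata:
// the last two elements are the characters "0" and "1":
// codes 48 and 49, which come back as 110000 and 110001
inbase(2, ascii("chunk01"))
end
```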

Notice, I added a zero and a one at the end of the text string "chunk" for illustrative purposes. Why are these values not 0 and 1 respectively? That’s because the digits are themselves ASCII characters (codes 48 and 49). We can get the actual values zero and one by using the - char() - function. For fun, we'll look at the values two and three as well.

Well, because there are technically 8 bits per byte, we need to pad each output with zeros so that the total length is 8 bits. For example: the value "3" can be written as "11" in base 2, but is the same as "00000011" so that we can imagine all 8 bits. We can easily accomplish this in matrix form.
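A minimal sketch of that padding step follows. As a simplification it starts from the numbers 0 through 3 directly rather than going through char() and ascii():

```stata
mata:
x = inbase(2, (0, 1, 2, 3))             // "0", "1", "10", "11"
"0" :* (8 :- strlen(x)) :+ x            // "00000000", "00000001", "00000010", "00000011"
end
```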

Notice the use of the colon operator? It's by far one of my favorite operators (not that I have that many) because it performs the same operation on each element of the matrix, which makes the overall statement extremely succinct! The statement above just says: "Give me some zeros, exactly 8 minus however many digits we had, and append the original value to the end so that every element has exactly 8 digits."

We should probably make this into a function, since we’ll use it a lot. So let's make the size of the padding an input as well.

mata:
string matrix padbit(string matrix x, real scalar padnum)
{
        string matrix y
        y = "0" :* (padnum :- strlen(x)) :+ x
        return (y)
}
end
mata: padbit(chunk, 8)

Logical Operators

And, Or, Not, Xor – These are easy to implement. You can think of these functions as truth tables if you want, but really, you're just comparing each bit to the respective bit of another byte.

For example: I have two bytes (they need to be of the same length): 00111100 and 10101010. If I were to use the And operator, for the result to be true (1), the bit in Byte 1 and the corresponding bit in Byte 2 must both be 1. The following illustration helps demonstrate this idea:

The Or operator implies that either bit could be a 1 for the output to be true:

Not works with only one byte and says to flip all the bits (make all 1's 0's and all 0's 1's):

Finally, Xor (the exclusive or) states that Byte 1 and Byte 2 must have opposite values to be true. So if the first bit of Byte 1 is a zero, the first bit of Byte 2 has to be the opposite for the outcome to be true:
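Since the illustrations are not reproduced here, the four operators can be sketched on the same two bytes. The function names below are hypothetical and are not the ones in the downloadable file; they work on strings of "0" and "1" characters of equal length.

```stata
mata:
// Hypothetical helpers for illustration only
string scalar bitop_sketch(string scalar a, string scalar b, string scalar op)
{
        real rowvector x, y, r

        // turn the characters "0"/"1" (codes 48/49) into the numbers 0/1
        x = ascii(a) :- 48
        y = ascii(b) :- 48

        if      (op == "and") r = x :* y                // both bits are 1
        else if (op == "or")  r = (x + y) :> 0          // at least one bit is 1
        else                  r = (x :!= y)             // xor: the bits differ
        return(char(r :+ 48))
}

string scalar bitnot_sketch(string scalar a)
{
        // flip every bit
        return(char((1 :- (ascii(a) :- 48)) :+ 48))
}

bitop_sketch("00111100", "10101010", "and")     // 00101000
bitop_sketch("00111100", "10101010", "or")      // 10111110
bitop_sketch("00111100", "10101010", "xor")     // 10010110
bitnot_sketch("00111100")                       // 11000011
end
```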

Had enough logic yet? I certainly have... so I won’t go into any of the functions anymore, but I’ll make these available for you to download so you won't have to worry about the logistics:

bitwise-mata-functions.do (do-file, 2 kb)

Shifts and Rotations

The second thing we need when dealing with bitwise operations is a way to move the bits. There are two main ways to shuffle bits around: the shift and the rotation. While they both move the position of the bits, a shift actually destroys the bits pushed off the end and fills the vacated positions with zeros, whereas a rotation moves those bits around to the opposite end of the byte. Therefore, we can think of the shift as a way to destroy information and a rotate as a way of hiding information. Let’s see this in action:

Note: this animation was made using - twoway scatteri -
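As a static stand-in, here is a minimal sketch of a left shift and a left rotation on a bit string. The function names are hypothetical, and the sketch assumes n is between 0 and the length of the string.

```stata
mata:
// Hypothetical helpers for illustration only
string scalar shl_sketch(string scalar b, real scalar n)
{
        // drop the n leftmost bits and fill the right side with zeros
        return(substr(b, n + 1, .) + "0" * n)
}

string scalar rotl_sketch(string scalar b, real scalar n)
{
        // move the n leftmost bits around to the right side
        return(substr(b, n + 1, .) + substr(b, 1, n))
}

shl_sketch("10110001", 3)       // 10001000 (the leading 101 is gone)
rotl_sketch("10110001", 3)      // 10001101 (the leading 101 moved to the end)
end
```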

It's interesting to follow, though not all that interesting if it doesn't have a practical application. If you're like "honestly, I don’t know nothing about bit functions", I'll say - well I've got the one for you. How about a way to easily encrypt your files so that you can hide your sensitive stuff in plain sight? First find a file and read it in:

// Read in File to Copy
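fh  = fopen("Building an API Library.docx", "r")
fseek(fh, 0, 1)         // jump to the end of the file
eof = ftell(fh)         // the position there is the file size in bytes
fseek(fh, 0, -1)        // rewind to the start
x = fread(fh, eof)
fclose(fh)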

Convert the file characters to the ASCII decimal value then convert this number into the base 2 representation (don't forget to pad the bits). From here, we can use our bitwise operations before converting this number back to base 10 and finally back to characters:

// Run Bitwise Not
y = padbit(inbase(2, ascii(x)), 8)
for (i=1; i<=cols(y); i++) {
        y[i] = bitnot(y[i])
}
y = char(frombase(2, y))

Lastly, write your newly altered file using Mata. This is obviously very basic, but a great way to hamper nosy coworkers.

// Write the Results to File
fh = fopen("Copy.docx", "w")
fwrite(fh, y)
fclose(fh)

But really, we're doing this with the ultimate goal of gaining access to Twitter data using Twitter's API. This is just step one of a three-part series and if you're lost so far, there's plenty of time to see the motivation for this post in the next step. I promise it gets more interesting, yet I hope I've shown some of Mata's potential and how you can easily bring these concepts into your own do-files.


Just One More Binary Thing (I Promise)

10/26/2015


On writing binary files in Stata/Mata

As a supplement to my most recent posts, I decided to put together a quick guide on writing and reading binary files. Stata has a great manual for this – however, I struggled to see how this works in Mata. I spent a good afternoon scouring Google, Stata Journals, and Statalist to no avail. Little did I know, I was looking in the wrong places and for the wrong commands. It wasn’t until I broke out my old Stata 12 Mata Reference Manual that I realized the solution lies not with fopen() alone, but with bufio() (and yes, bufio() is referenced in fopen() - always check your references).

We start by making up a fake type of file called a wmm file. This file always begins with the hex representation ff00, which we know just means 255-0 in decimal or 11111111 00000000 in binary. The next 20 characters spell out “Will M Matsuoka File” followed by a single byte containing the byte order or 00000001 in binary. From there, the next four bytes contain the location of our data as we put a huge buffer of zeros before any meaningful data. It makes sense to skip all of these zeros if we know we don’t need to ever use them. After these zeros, we’ll store the value of pi and end the file with ffff.

The file looks like this:

Stata's File Command

tempname fh

file open `fh' using testfile.wmm, replace write binary

file write `fh' %1bu (255) %1bu (0)
file write `fh' %20s "Will M Matsuoka File"

file set `fh' byteorder 1
file write `fh' %1bu (1)

* offset 200
file write `fh' %4bu (200)

forvalues i = 1/200 {
        file write `fh' %1bs (0)
}

file write `fh' %8z (c(pi))
file write `fh' %2bu (255) %2bu (255)

file close `fh'

The only thing I feel I need to note here is the binary option under file open. Other than that, take note that we’re setting the byteorder to 1. This is a good solution to writing binary files; however, since most of my functions are in Mata, we might as well figure out how to do this in Mata as well.

Mata's File Command

If you didn’t know, Mata’s fopen() command is very similar to Stata’s file commands, with a few slight differences that we won't touch on here. Just know, it's pretty awesome and don't forget the mata: command!

mata:
fh = fopen("testfile-fwrite.wmm", "w")

fwrite(fh, char((255, 0)))
fwrite(fh, "Will M Matsuoka File")

// We know that the byte order must be 1
fwrite(fh, char(1))

fwrite(fh, char(0)+char(0)+char(0)+char(200))
fwrite(fh, char(0)*200)

fwrite(fh, char(64) + char(9) + char(33) + char(251) + 
        char(84) + char(68) + char(45) + char(24))

fwrite(fh, char((0,255))*2)
fclose(fh)
end

I personally like the aesthetic of this syntax; it’s clean, neat, and relatively simple. The only problem is its ability to handle numbers that span more than one byte. In short, it doesn’t handle them at all: fwrite() only writes strings, so we’d have to build more functions to convert numbers to bytes ourselves (especially when it comes to storing doubles), which is why Mata also has a full suite of buffered I/O commands. It’s a little more complicated, but well worth it. After all, we cheated in converting pi to its 8-byte double representation by copying the bytes that the previous command wrote. This is not a good practice.
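For what it's worth, those eight bytes don't have to be copied by hand. A minimal sketch, assuming the buffered I/O function bufput() and a byte order of 1 as in the rest of this post:

```stata
mata:
C = bufio()
bufbyteorder(C, 1)

// an 8-byte buffer of zeros that bufput() overwrites in place
B = char(0) * 8
bufput(C, B, 0, "%8z", pi())

ascii(B)        // 64  9  33  251  84  68  45  24
end
```

The result matches the char() values typed into the fwrite() call above.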

Mata's Buffered I/O Command

Let's get right to it:

mata:
fh = fopen("testfile3-bufio.wmm", "w")

C = bufio()
bufbyteorder(C, 1)

fbufput(C, fh, "%1bu", (255, 0))
fbufput(C, fh, "%20s", "Will M Matsuoka File")

// We know that the byte order must be 1
fbufput(C, fh, "%1bu", bufbyteorder(C))

fbufput(C, fh, "%4bu", 200)
fbufput(C, fh, "%1bu", J(1, 200, 0))

fbufput(C, fh, "%8z", pi())

fbufput(C, fh, "%2bu", (255, 255))
fclose(fh)
end

The one distinction here is the use of the bufio() function. It creates a column vector holding the byte order and Stata’s version, which lets us use the same range of binary formats that are available in Stata’s file write command.

Reading the Files Back

Now that we’ve written three files, which (in theory) should be identical, let’s create a Mata function that reads the contents stored in these files. Note: it should return the value of pi in all three cases. As it turns out, they all do.

mata:
void read_wmm(string scalar filename)
{
        fh = fopen(filename, "r")
        C = bufio()

        fbufget(C, fh, "%1bu", 2)
        if (fbufget(C, fh, "%20s")!="Will M Matsuoka File") {
                errprintf("Not a proper wmm file")
                fclose(fh)
                exit(610)
        }
        
        bufbyteorder(C, fbufget(C, fh, "%1bu"))

        offset = fbufget(C, fh, "%4bu")
        fseek(fh, offset, 0)

        fbufget(C, fh, "%8z")

        fclose(fh)
}

read_wmm("testfile-fwrite.wmm")
read_wmm("testfile.wmm")
read_wmm("testfile3-bufio.wmm")
end

And there you have it, a bunch of different ways to do the same thing. While I enjoy using Mata’s file handling commands for their simplicity, they do get a little cumbersome when writing integers longer than 1 byte at a time. Time to start making your own secret file formats and mining data from others.


Bits, Bits, Bits

10/5/2015


This post is one of my white whales – the problem that has eluded me for far too long and drove me to the edge of insanity. I’m talking about writing binary. Now, let me be clear when I say that I am far from a computer scientist: I don’t think in base 16, I don’t dream in assembly code, I don’t limit my outcomes to zeros and ones. I do, however, digest material before writing about it, seek creative and efficient solutions to problems, and do my best to share this information with others (that’s where you come in).

First, let’s create a fake dataset with help from Nick Cox’s egenmore ssc command:

set obs 300
forvalues i = 1/200 {
        gen x`i' = round(runiform()*50*_n)
}

gen id = _n
reshape long x, i(id) j(vars)
egen count = xtile(x), nq(30)
keep id vars count

Today we will be making a bitmap of a map for Fitbit activities by writing bits of binned colors in a binary file. Alliteration aside, this post pulls from various sources and is intended to cover a great deal of topics that might be foreign to the average Stata user – in other words, hold on tight! It’s going to be a bumpy bitmap ride as we cover three major topics: (1) Color Theory, (2) Bitmap Structures, and (3) Writing Binary Files using Stata.

Color Theory

In creating gradients, it was recommended to me to use a Hue, Saturation, Value (HSV) linear interpolation rather than the Red, Green, Blue (RGB) interpolation because it looks more “natural.” I will not argue this point, as I know nothing about it. For me, I know that if I play with the sliders in Photoshop, it automatically changes the numbers and I never have to think about what it’s actually doing in the conversion. In order to convert from RGB to HSV and vice versa, I used the equations provided here – to learn about what’s going on, Wikipedia has a great article on the HSV cones!
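For reference, the standard HSV to RGB conversion can be sketched in Mata as below. This is the textbook formula, not necessarily the code in bitmap.do, and the function name is hypothetical. It assumes hue in degrees and saturation and value between 0 and 1.

```stata
mata:
// Hypothetical helper: textbook HSV -> RGB, returning values from 0 to 255
real rowvector hsv2rgb_sketch(real scalar h, real scalar s, real scalar v)
{
        real scalar    c, x, m, k
        real rowvector rgb

        c = v * s                               // chroma
        k = mod(h, 360) / 60                    // which sixth of the color wheel
        x = c * (1 - abs(mod(k, 2) - 1))        // second-largest component
        m = v - c                               // amount added to every channel

        if      (k < 1) rgb = (c, x, 0)
        else if (k < 2) rgb = (x, c, 0)
        else if (k < 3) rgb = (0, c, x)
        else if (k < 4) rgb = (0, x, c)
        else if (k < 5) rgb = (x, 0, c)
        else            rgb = (c, 0, x)

        return(round((rgb :+ m) :* 255))
}

hsv2rgb_sketch(0, 1, 1)         // 255  0  0
hsv2rgb_sketch(60, 1, 1)        // 255  255  0
hsv2rgb_sketch(240, 1, 1)       // 0  0  255
end
```

An HSV gradient then amounts to stepping h (and, if needed, s and v) linearly between two colors and converting each step to RGB.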

Bitmap Structures

Uncompressed bitmaps are fairly easy to understand once you get the hang of them – they’re even easier to reverse engineer. In MS Paint, simply create a two pixel 24-bit bitmap and save the resulting picture. It’s a work of art, I know.

From here, run a hexdump on the file in Stata (see below image).

For simplicity’s sake, the only things we'll need to change are the numbers of rows and columns in the bitmap header.

As for the body, there are three rules we need to keep in mind here:

The picture starts in the lower left-hand corner and reads left to right, bottom row first. If you forget to do this, your picture will be vertically flipped.
The bytes are written in the opposite order for colors: instead of writing in RGB, you need to write in BGR.
The number of bytes in each row needs to be divisible by four. You need to add zeros as a buffer at the end of each row until it is.
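The third rule boils down to a little modular arithmetic. A minimal sketch, assuming 24-bit pixels (3 bytes each) and a hypothetical function name:

```stata
mata:
// Hypothetical helper: zero bytes to append to each row of a 24-bit bitmap
real scalar bmp_rowpad_sketch(real scalar width)
{
        return(mod(4 - mod(3 * width, 4), 4))
}

bmp_rowpad_sketch(2)    // 2  (6 bytes of color + 2 zeros = 8)
bmp_rowpad_sketch(3)    // 3  (9 bytes of color + 3 zeros = 12)
bmp_rowpad_sketch(200)  // 0  (600 is already divisible by four)
end
```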

The final step is to make sure that the rows and columns are written in the correct byte order. I did not find an efficient way to do this step, so input is always appreciated. We can see that the box labeled “Columns” is pretty straightforward when it’s a small number under 256 pixels. What happens when it reaches 256 pixels or more? The number no longer fits in one byte, and we have to write its bytes in reverse order (lowest byte first)! For example: we will use Mata’s inbase() command to convert a theoretical picture's width of 500 pixels to base 16. The result?

. mata: inbase(16, 500)
  1f4

Let’s add a zero in front of that to get 01f4 as our width. Once again, we must reverse this order; therefore, our real values to write are f4 and 01. Converting these values using frombase() yields 244 and 1 respectively. These are the bytes we’ll end up writing in the next section.
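The same two bytes can also be reached with plain arithmetic instead of string handling. A minimal sketch with a hypothetical function name (not the bitmap_rowcol() from bitmap.do), valid for sizes from 0 to 65535:

```stata
mata:
// Hypothetical helper: low byte first, then high byte
real rowvector le_bytes_sketch(real scalar n)
{
        return((mod(n, 256), floor(n / 256)))
}

le_bytes_sketch(500)    // 244  1
le_bytes_sketch(300)    // 44   1
end
```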

Writing Binary

First, please visit Phil Ender’s website on writing and reading binary data as it heavily influenced this section, and also because Phil is a great guy.

My code broke out this task in two sections: writing the header and writing the body:

The Header
The header is fairly straightforward: just copy the values from the hexdump from before (converting from base 16 to base 10), or get them by reading in the file byte by byte.

file open myfile using testgrad.bmp, write replace binary
file write myfile %1b (66) %1b (77) %1b (70) %1b (0) %1b (0) %1b (0)
file write myfile %1b (0) %1b (0) %1b (0) %1b (0) %1b (54)
file write myfile %1b (0) %1b (0) %1b (0) %1b (40)
file write myfile %1b (0) %1b (0) %1b (0)

mata: bitmap_rowcol(bitmap_size(200), bitmap_size(300))
file write myfile %1bu (`c1') %1bu (`c2') %1bu (0) %1bu (0)
file write myfile %1bu (`r1') %1bu (`r2') %1bu (0) %1bu (0)

file write myfile %1b (1) %1b (0) %1b (24) 
file write myfile %1b (0) %1b (0) %1b (0) %1b (0) %1b (0) %1b (16)
forvalues i = 1/19 {
        file write myfile %1b (0)
}

Notice the order of the column and row variables. I create these values using the separate function bitmap_rowcol() to deal with the problem mentioned earlier. Think of `c1' as 244 and `c2' as 1, and `r1' as 44 (2c in hex) and `r2' as 1 for a width of 500 pixels and a height of 300 pixels with rules according to our previous analysis.

The Body
From there, we call bitmap_body in Mata and close our file (cols is 200 in my example):

mata: bitmap_body(${cols}, X, buff)
file close myfile

This is all great, but I’m not going to lie, it means absolutely nothing to me without seeing the final result. So here it is:

HSV Gradient with a 5 px Gaussian Blur

RGB Gradient with a 5 px Gaussian Blur

This can now produce textures for our Fitbit elevation map in the next series quickly, and effectively, all within Stata. The entire file is available here for those who are willing to put up with some messy code:

bitmap.do (do-file, 4 kb)

