William Matsuoka

W=M/Stata

Welcome

Building an API Library

2/5/2016


Bitwise Operators (aka Bits, Bits, Bits Part II)

Did you know that Mata supports bitwise operators?  Well, it actually doesn't – in the typical sense.  But that won't stop us from making it work.  You see, Mata can handle data extremely well, and with a little finesse, can be forced to do things it wasn't really made to do.  Yes it's going to be slow, and yes it's probably not very useful to the average user, but let me try to convince you how great using Mata really is!

For those who don't know, Mata is a lower-level language than Stata – many of Stata's complex functions are actually written in Mata because it's really quite fast.  Mata mimics a lot of C's syntax, but also simplifies things so you don't feel like you have to explicitly declare everything.  In previous posts we've exploited the power of the - inbase() - function, and we will make ample use of it today.

Say we have a text file containing the word "Chunk".  While we see a word, the computer sees the numbers that correspond to each letter under the ASCII character encoding.  Mata's - ascii() - function can help us find this representation:
Picture
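As a minimal sketch of this call (the variable name word is only a name chosen for illustration):

```mata
mata:
// ascii() maps each character of a string to its ASCII code
word = "Chunk"
ascii(word)     // row vector: 67, 104, 117, 110, 107
end
```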
This is a simple way of converting our text into numbers, but how about into bytes?  Typically, bytes are displayed in base 16:
Picture
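A rough sketch of the base 16 conversion, assuming the codes are stored in a variable I'm calling codes:

```mata
mata:
codes = ascii("Chunk")
// inbase() returns one string per element, here each byte written in base 16
inbase(16, codes)
// frombase() reverses the conversion and gives back 67, 104, 117, 110, 107
frombase(16, inbase(16, codes))
end
```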
But now we want to see each bit.  Remember, I'm not a computer scientist, so you can trust me when I say this really isn't all that bad for those of you who haven't been exposed to this stuff.  Just know that each of these bytes contains 8 bits.  Each bit can either be on or off (1 or 0), which means that there are 2^8 = 256 possible bit combinations per byte.  Let's look at what Stata shows us when we look at everything in base 2:
Picture
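Something along these lines produces the base 2 view; the string "chunk01" and the name bits are assumptions made for this sketch:

```mata
mata:
// Each ASCII code written in base 2; inbase() does not print leading zeros
bits = inbase(2, ascii("chunk01"))
bits
// The last two elements come from the characters "0" and "1" (codes 48 and 49)
bits[6]     // "110000"
bits[7]     // "110001"
end
```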
Notice, I added a zero and a one at the end of the text string "chunk" for illustrative purposes.  Why are these values not 0 and 1 respectively?  That's because the digits are themselves ASCII characters (codes 48 and 49).  We can get the values of zero and one by using the - char() - function.  For fun, we'll look at values two and three as well.
Picture
Well, because there are technically 8 bits per byte, we need to pad each output with zeros so that the total length is 8 bits.  For example: the value "3" can be written as "11" in base 2, but it is the same as "00000011", which lets us picture all 8 bits.  We can easily accomplish this in matrix form.
Picture
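A sketch of the padding step on the values zero through three, with small and padded as illustrative names:

```mata
mata:
// Base 2 strings for 0, 1, 2, 3: "0", "1", "10", "11"
small = inbase(2, (0, 1, 2, 3))
// Prepend (8 - length) zeros to every element at once
padded = "0" :* (8 :- strlen(small)) :+ small
padded      // "00000000", "00000001", "00000010", "00000011"
end
```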
Notice the use of the colon operator?  It's by far one of my favorite operators (not that I have that many) because it does the same operation on each element of the matrix, which makes the overall statement extremely succinct!  The statement above just says: "Give me some zeros, exactly 8 minus however many digits we had, and append the original string to the end to make sure every element has exactly 8 digits."

We should probably make this into a function, since we’ll use it a lot.  So let's make the size of the padding an input as well.
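mata:
// Left-pad every binary string in x with zeros to a width of padnum
string matrix padbit(string matrix x, real scalar padnum)
{
        string matrix y
        // "0" repeated (padnum - length) times, followed by the original string
        y = "0" :* (padnum :- strlen(x)) :+ x
        return (y)
}
end
mata: padbit(chunk, 8)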


Logical Operators

And, Or, Not, Xor – These are easy to implement.  You can think of these functions as truth tables if you want, but really, you're just comparing each bit to the respective bit of another byte. 

For example: I have two bytes (they need to be of the same length): 00111100 and 10101010.  If I were to use the And operator, for a bit of the result to be true (1), the bit in Byte 1 and the corresponding bit in Byte 2 must both be 1.  The following illustration helps demonstrate this idea:
Picture
With the Or operator, the output bit is true if either bit (or both) is a 1:
Picture
Not works with only one byte and says to flip all the bits (make all 1's 0's and all 0's 1's):
Picture
Finally, Xor (the exclusive or) states that Byte 1 and Byte 2 must have opposite values to be true.  So if the first bit of Byte 1 is a zero, the first bit of Byte 2 has to be the opposite for the outcome to be true:
Picture
Had enough logic yet?  I certainly have... so I won’t go into any of the functions anymore, but I’ll make these available for you to download so you won't have to worry about the logistics:
Download File: bitwise-mata-functions.do (File Type: do, File Size: 2 kb)
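For readers who want to see the idea without the download, here is a minimal sketch of the four operators on padded bit strings.  The names sk_and, sk_or, sk_xor, sk_not and the two helpers are hypothetical and are not the contents of the do-file; they only assume both inputs are strings of 0s and 1s with the same length.

```mata
mata:
// Helper (hypothetical): "00111100" -> (0,0,1,1,1,1,0,0)
real rowvector sk_bits2vec(string scalar b)
{
        real rowvector v
        real scalar i
        v = J(1, strlen(b), 0)
        for (i = 1; i <= strlen(b); i++) {
                v[i] = strtoreal(substr(b, i, 1))
        }
        return(v)
}

// Helper (hypothetical): (0,0,1,1,1,1,0,0) -> "00111100"
string scalar sk_vec2bits(real rowvector v)
{
        string scalar b
        real scalar i
        b = ""
        for (i = 1; i <= cols(v); i++) {
                b = b + strofreal(v[i])
        }
        return(b)
}

// And: 1 only where both bits are 1
string scalar sk_and(string scalar a, string scalar b)
{
        return(sk_vec2bits(sk_bits2vec(a) :& sk_bits2vec(b)))
}

// Or: 1 where at least one bit is 1
string scalar sk_or(string scalar a, string scalar b)
{
        return(sk_vec2bits(sk_bits2vec(a) :| sk_bits2vec(b)))
}

// Xor: 1 where the two bits differ
string scalar sk_xor(string scalar a, string scalar b)
{
        return(sk_vec2bits(sk_bits2vec(a) :!= sk_bits2vec(b)))
}

// Not: flip every bit
string scalar sk_not(string scalar a)
{
        return(sk_vec2bits(1 :- sk_bits2vec(a)))
}

sk_and("00111100", "10101010")     // "00101000"
sk_or("00111100", "10101010")      // "10111110"
sk_xor("00111100", "10101010")     // "10010110"
sk_not("00111100")                 // "11000011"
end
```

Working on strings keeps the sketch in line with the padded output of - inbase() -, at the cost of speed.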


Shifts and Rotations

The second thing we need to handle when dealing with bitwise operations is a way to move the bits.  There are two main ways to shuffle bits around: the shift and the rotation.  While they both move the position of the bits, a shift actually destroys the bits that fall off one end and fills the vacated positions with zeros, whereas a rotation moves the bits that fall off to the opposite end of the byte.  Therefore, we can think of the shift as a way to destroy information and the rotation as a way of hiding information.  Let's see this in action:
Picture
Note: this animation was made using - twoway scatteri -
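The difference can be sketched on bit strings as follows.  The function names sk_shl and sk_rotl are hypothetical, only the leftward direction is shown, and n is assumed to be a non-negative integer.

```mata
mata:
// Left shift by n: the n leftmost bits are lost, zeros fill in on the right
string scalar sk_shl(string scalar b, real scalar n)
{
        real scalar len
        len = strlen(b)
        if (n >= len) return("0" :* len)
        return(substr(b, n + 1, len - n) + ("0" :* n))
}

// Left rotation by n: bits that fall off the left reappear on the right
string scalar sk_rotl(string scalar b, real scalar n)
{
        real scalar len, k
        len = strlen(b)
        k = mod(n, len)
        return(substr(b, k + 1, len - k) + substr(b, 1, k))
}

sk_shl("10110001", 2)      // "11000100": the leading "10" is gone
sk_rotl("10110001", 2)     // "11000110": the leading "10" moved to the end
end
```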

It's interesting to follow, though not all that interesting if it doesn't have a practical application.  If you're like "honestly, I don't know nothing about bit functions", I'll say: well, I've got the one for you.  How about a way to easily encrypt your files so that you can hide your sensitive stuff in plain sight?  First find a file and read it in:
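// Read in File to Copy
fh  = fopen("Building an API Library.docx", "r")
fseek(fh, 0, 1)      // move to the end of the file
eof = ftell(fh)      // the position at the end is the file size in bytes
fseek(fh, 0, -1)     // move back to the start
x = fread(fh, eof)
fclose(fh)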

Convert the file characters to their ASCII decimal values, then convert these numbers into their base 2 representation (don't forget to pad the bits).  From here, we can use our bitwise operations before converting the results back to base 10 and finally back to characters:
// Run Bitwise Not
y = padbit(inbase(2, ascii(x)), 8)
for (i=1; i<=cols(y); i++) {
        y[i] = bitnot(y[i])
}
y = char(frombase(2, y))

Lastly, write your newly altered file using Mata.  This is obviously very basic, but a great way to hamper nosy coworkers.
// Write the Results to File
fh = fopen("Copy.docx", "w")
fwrite(fh, y)
fclose(fh)
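Getting the original back works the same way, because Not is its own inverse.  A small sketch on a string instead of a file, relying on the fact that flipping all 8 bits of a byte value b gives 255 - b (the names original, flipped, and restored are illustrative):

```mata
mata:
original = "Chunk"
// Bitwise Not of an 8-bit value b is 255 - b
flipped  = 255 :- ascii(original)
// Applying Not a second time returns the original codes
restored = char(255 :- flipped)
restored == original       // 1 (true)
end
```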

But really, we're doing this with the ultimate goal of gaining access to Twitter data using Twitter's API.  This is just step one of a three-part series, and if you're lost so far, there's plenty of time to see the motivation for this post in the next step.  I promise it gets more interesting, yet I hope I've shown some of Mata's potential and how you can easily bring these concepts into your own do-files.

    Author

    Will Matsuoka is the creator of W=M/Stata - he likes creativity and simplicity, taking pictures of food, competition, and anything that can be analyzed.
