William Matsuoka
  • Introduction
  • Topics
  • Stata
  • Teaching
    • ECON 641L
    • ECON 640L >
      • Econ-Data
  • Blog
  • About

W=M/Stata

Welcome

Just One More Binary Thing (I Promise)

10/26/2015

1 Comment

 

On writing binary files in Stata/Mata

As a supplement to my most recent posts, I decided to put together a quick guide on writing and reading binary files.  Stata has a great manual for this – however, I struggled to see how this works in Mata.  I spent a good afternoon scouring Google, Stata Journals, and Statalist to no avail.  Little did I know, I was looking in the wrong places and for the wrong commands.  It wasn’t until I broke out my old Mata 12 Mata Reference guide that I realized the solutions lie not with fopen(), but with bufio() (and yes, bufio() is referenced in fopen() - always check your references)

We start by making up a fake type of file called a wmm file.  This file always begins with the hex representation ff00, which we know just means 255-0 in decimal or 11111111 00000000 in binary.  The next 20 characters spell out “Will M Matsuoka File” followed by a single byte containing the byte order or 00000001 in binary.  From there, the next four bytes contains the location of our data as we put a huge buffer of zeros before any meaningful data.  It makes sense to skip all of these zeros if we know we don’t need to ever use them.  After these zeros, we’ll store the value of pi and end the files with ffff. 

The file looks like this:

Stata's File Command

tempname fh

file open `fh' using testfile.wmm, replace write binary

file write `fh' %1bu (255) %1bu (0)
file write `fh' %20s "Will M Matsuoka File"

file set `fh' byteorder 1
file write `fh' %1bu (1)

* offset 200
file write `fh' %4bu (200)

forvalues i = 1/200 {
        file write `fh' %1bs (0)
}

file write `fh' %8z (c(pi))
file write `fh' %2bu (255) %2bu (255)

file close `fh'

​The only thing I feel I need to note here is the binary option under file open.  Other than that, take note that we’re setting the byteorder to 1.  This is a good solution to writing binary files; however, since most of my functions are in Mata, we might as well figure out how to do this in Mata as well.

Mata's File Command

If you didn’t know, Mata’s fopen() command is very similar to Stata’s file commands, with a few slight differences that we won't touch on here.  Just know, it's pretty awesome and don't forget the mata: command!
mata:
fh = fopen("testfile-fwrite.wmm", "w")

fwrite(fh, char((255, 0)))
fwrite(fh, "Will M Matsuoka File")

// We know that the byte order must be 1
fwrite(fh, char(1))

fwrite(fh, char(0)+char(0)+char(0)+char(200))
fwrite(fh, char(0)*200)

fwrite(fh, char(64) + char(9) + char(33) + char(251) + 
        char(84) + char(68) + char(45) + char(24))

fwrite(fh, char((0,255))*2)
fclose(fh)
end

​I personally like the aesthetic of this syntax; it’s clean, neat, and relatively simple.  The only problem is its ability to handle bytes.  In short, it doesn’t do it at all.  We’d have to build some more functions in order to accomplish this task (especially when it comes to storing double floating points) which is why Mata also has a full suite of buffered I/O commands.  It’s a little more complicated, but well worth it.  After all, we cheated in converting pi to a double floating storage point by using what we wrote in the previous command.  This is not a good practice.

Mata's Buffered I/O Command

Let's get right to it
mata:
fh = fopen("testfile3-bufio.wmm", "w")

C = bufio()
bufbyteorder(C, 1)

fbufput(C, fh, "%1bu", (255, 0))
fbufput(C, fh, "%20s", "Will M Matsuoka File")

// We know that the byte order must be 1
fbufput(C, fh, "%1bu", bufbyteorder(C))

fbufput(C, fh, "%4bu", 200)
fbufput(C, fh, "%1bu", J(1, 200, 0))

fbufput(C, fh, "%8z", pi())

fbufput(C, fh, "%2bu", (255, 255))
fclose(fh)
end

​The one distinction here is the use of the bufio() function.  It creates a column vector containing the information of the byte order and Stata’s version, but allows us to use a range of binary formats available to use in Stata’s file write commands.

Reading the Files Back

​Now that we’ve written three files, which (in theory) should be identical, let’s create a Mata function that reads the contents stored in these files.  Note: it should return the value of pi in all three cases.  As it turns out, they all do.
mata:
void read_wmm(string scalar filename)
{
        fh = fopen(filename, "r")
        C = bufio()

        fbufget(C, fh, "%1bu", 2)
        if (fbufget(C, fh, "%20s")!="Will M Matsuoka File") {
                errprintf("Not a proper wmm file")
                fclose(fh)
                exit(610)
        }
        
        bufbyteorder(C, fbufget(C, fh, "%1bu"))

        offset = fbufget(C, fh, "%4bu")
        fseek(fh, offset, 0)

        fbufget(C, fh, "%8z")

        fclose(fh)
}

read_wmm("testfile-fwrite.wmm")
read_wmm("testfile.wmm")
read_wmm("testfile3-bufio.wmm")
end

​​And there you have it, a bunch of different ways to do the same thing.  While I enjoy using Mata’s file handling commands for its simplicity, it does get a little cumbersome when writing integers longer than 1 byte at a time.  Time to start making your own secret file formats and mining data from others.
1 Comment
vidmate.onl link
5/6/2022 07:12:06 am

nks for ssaxharing the article, and more importantly, your personal experi ence mindfully using our emotions as data about our inner state and knowing when it’s better to de-escalate by taking a time out are great tools. Appreciate you reading and sharing your story since I can certainly relate and I think others can to

Reply



Leave a Reply.

    Author

    Will Matsuoka is the creator of W=M/Stata - he likes creativity and simplicity, taking pictures of food, competition, and anything that can be analyzed.

    For more information about this site, check out the teaser above!

    Archives

    July 2016
    June 2016
    March 2016
    February 2016
    January 2016
    December 2015
    November 2015
    October 2015
    September 2015

    Categories

    All
    3ds Max
    Adobe
    API
    Base16
    Base2
    Base64
    Binary
    Bitmap
    Color
    Crawldir
    Email
    Encryption
    Excel
    Exif
    File
    Fileread
    Filewrite
    Fitbit
    Formulas
    Gcmap
    GIMP
    GIS
    Google
    History
    JavaScript
    Location
    Maps
    Mata
    Music
    NFL
    Numtobase26
    Parsing
    Pictures
    Plugins
    Privacy
    Putexcel
    Summary
    Taylor Swift
    Twitter
    Vbscript
    Work
    Xlsx
    XML

    RSS Feed

Proudly powered by Weebly
  • Introduction
  • Topics
  • Stata
  • Teaching
    • ECON 641L
    • ECON 640L >
      • Econ-Data
  • Blog
  • About