On writing binary files in Stata/Mata
As a supplement to my most recent posts, I decided to put together a quick guide on writing and reading binary files. Stata has a great manual entry for this; however, I struggled to see how it works in Mata. I spent a good afternoon scouring Google, the Stata Journal, and Statalist to no avail. Little did I know, I was looking in the wrong places and for the wrong commands. It wasn’t until I broke out my old Stata 12 Mata Reference guide that I realized the solution lies not with fopen(), but with bufio() (and yes, bufio() is referenced in the fopen() entry, so always check your references).
We start by making up a fake file type called a wmm file. This file always begins with the hex bytes ff00, which are just 255 and 0 in decimal or 11111111 00000000 in binary. The next 20 characters spell out “Will M Matsuoka File”, followed by a single byte containing the byte order, here 1 (00000001 in binary). The next four bytes contain the offset to our data, since we put a huge buffer of zeros before any meaningful data; it makes sense to skip all of these zeros if we know we never need them. After the zeros, we store the value of pi and end the file with a four-byte end marker, 00ff 00ff (the value 255 written twice as a 2-byte integer). The file looks like this:

    bytes 0-1      ff 00 (magic number)
    bytes 2-21     "Will M Matsuoka File"
    byte 22        byte order (1)
    bytes 23-26    offset to the data (200)
    bytes 27-226   200 zero bytes
    bytes 227-234  pi as an 8-byte double
    bytes 235-238  00 ff 00 ff (end marker)

Stata's File Command

    tempname fh
    file open `fh' using testfile.wmm, replace write binary
    * magic number and signature
    file write `fh' %1bu (255) %1bu (0)
    file write `fh' %20s "Will M Matsuoka File"
    * byte order 1 = high byte first; record it in the file as well
    file set `fh' byteorder 1
    file write `fh' %1bu (1)
    * offset 200
    file write `fh' %4bu (200)
    forvalues i = 1/200 {
        file write `fh' %1bs (0)
    }
    * payload and end marker
    file write `fh' %8z (c(pi))
    file write `fh' %2bu (255) %2bu (255)
    file close `fh'

The only thing I feel I need to note here is the binary option of file open. Other than that, take note that we’re setting the byte order to 1. This is a good way to write binary files; however, since most of my functions are in Mata, we might as well figure out how to do this in Mata too.

Mata's File Command
If you didn’t know, Mata’s fopen() family of functions is very similar to Stata’s file commands, with a few slight differences that we won't touch on here. Just know that it's pretty awesome, and don't forget to enter Mata with the mata: command first!
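As a first taste of fopen(), here is a minimal sketch that reads back the 27-byte header of the testfile.wmm written by the Stata commands above and prints it piece by piece. It assumes the file is in the current working directory; the variable names header and b are only for illustration.

```mata
mata:
// Read the 27-byte header of the file written by Stata's file commands
fh = fopen("testfile.wmm", "r")
header = fread(fh, 27)
fclose(fh)

b = ascii(header)        // one numeric value per byte
b[(1..2)]                // magic number: 255, 0
char(b[(3..22)])         // signature: Will M Matsuoka File
b[23]                    // byte order: 1
b[(24..27)]              // 4-byte offset, high byte first: 0, 0, 0, 200
end
```

If any of these pieces looks wrong, the problem is in the header rather than in the data that follows it.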
    mata:
    // note: fopen() with "w" expects that the file does not already exist
    fh = fopen("testfile-fwrite.wmm", "w")
    fwrite(fh, char((255, 0)))
    fwrite(fh, "Will M Matsuoka File")
    // We know that the byte order must be 1
    fwrite(fh, char(1))
    // offset 200 as four bytes, high byte first
    fwrite(fh, char(0)+char(0)+char(0)+char(200))
    fwrite(fh, char(0)*200)
    // pi as an 8-byte double, high byte first
    fwrite(fh, char(64) + char(9) + char(33) + char(251) + char(84) + char(68) + char(45) + char(24))
    fwrite(fh, char((0,255))*2)
    fclose(fh)
    end

I personally like the aesthetic of this syntax; it’s clean, neat, and relatively simple. The only problem is how it handles numeric data. In short, it doesn’t: fwrite() writes strings, so every number has to be broken into bytes by hand. We’d have to build some more functions to accomplish this (especially when it comes to storing double-precision floating-point values), which is why Mata also has a full suite of buffered I/O functions. They're a little more complicated, but well worth it. After all, we cheated when storing pi: the eight bytes above were copied from the file that Stata's file write produced in the previous section. This is not good practice.

Mata's Buffered I/O Command
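Before the full file example, a small in-memory sketch addresses the cheat mentioned above: the eight bytes used for pi do not have to be copied from another file. bufput() is the in-memory counterpart of fbufput(); this assumes, as I recall from the manual, that buffer offsets start at 0 and that the buffer must be pre-allocated. The name B is only for illustration.

```mata
mata:
C = bufio()
bufbyteorder(C, 1)              // 1 = high byte first, as in the wmm header
B = char(0)*8                   // pre-allocate an 8-byte buffer
bufput(C, B, 0, "%8z", pi())    // store pi as a double at offset 0
ascii(B)                        // 64 9 33 251 84 68 45 24
end
```

These are the same eight values passed to char() in the fwrite() version, which is hex 40 09 21 fb 54 44 2d 18.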
Let's get right to it:
    mata:
    fh = fopen("testfile3-bufio.wmm", "w")
    C = bufio()
    bufbyteorder(C, 1)
    fbufput(C, fh, "%1bu", (255, 0))
    fbufput(C, fh, "%20s", "Will M Matsuoka File")
    // We know that the byte order must be 1
    fbufput(C, fh, "%1bu", bufbyteorder(C))
    fbufput(C, fh, "%4bu", 200)
    fbufput(C, fh, "%1bu", J(1, 200, 0))
    fbufput(C, fh, "%8z", pi())
    fbufput(C, fh, "%2bu", (255, 255))
    fclose(fh)
    end

The one distinction here is the use of the bufio() function. It creates a column vector containing the byte order and Stata’s version, and in return lets us use the same range of binary formats available in Stata’s file write command.

Reading the Files Back
Now that we’ve written three files, which (in theory) should be identical, let’s create a Mata function that reads the contents stored in these files. Note: it should display the value of pi in all three cases. As it turns out, it does.
    mata:
    void read_wmm(string scalar filename)
    {
        fh = fopen(filename, "r")
        C = bufio()
        // read the two magic bytes (255, 0)
        fbufget(C, fh, "%1bu", 2)
        if (fbufget(C, fh, "%20s")!="Will M Matsuoka File") {
            errprintf("Not a proper wmm file\n")
            fclose(fh)
            exit(610)
        }
        // the next byte tells us the byte order for everything that follows
        bufbyteorder(C, fbufget(C, fh, "%1bu"))
        offset = fbufget(C, fh, "%4bu")
        // whence 0: skip offset bytes forward from the current position
        fseek(fh, offset, 0)
        // the payload: pi as an 8-byte double
        fbufget(C, fh, "%8z")
        fclose(fh)
    }

    read_wmm("testfile-fwrite.wmm")
    read_wmm("testfile.wmm")
    read_wmm("testfile3-bufio.wmm")
    end

And there you have it: a bunch of different ways to do the same thing. While I enjoy using Mata’s file handling functions for their simplicity, they do get a little cumbersome when writing integers longer than 1 byte at a time. Time to start making your own secret file formats and mining data from others.
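For the multi-byte integer case, a small helper can take some of the tedium out of fwrite(). This is a minimal sketch rather than a polished routine: be_uint() is a hypothetical name, not a built-in Mata function, and it assumes x is a non-negative integer that fits in nbytes bytes.

```mata
mata:
// be_uint() is a hypothetical helper, not a built-in Mata function.
// It returns x as a string of nbytes bytes, most significant byte first.
string scalar be_uint(real scalar x, real scalar nbytes)
{
    real scalar   i
    string scalar s

    s = ""
    for (i = nbytes - 1; i >= 0; i--) {
        // pick out the byte worth 256^i and append it
        s = s + char(mod(floor(x / 256^i), 256))
    }
    return(s)
}

ascii(be_uint(200, 4))      // 0 0 0 200
ascii(be_uint(258, 2))      // 1 2
end
```

With this in place, fwrite(fh, be_uint(200, 4)) could stand in for the hand-built char(0)+char(0)+char(0)+char(200) in the fwrite() version. Doubles are a different story, and the buffered I/O functions remain the easier route for them.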
Author
Will Matsuoka is the creator of W=M/Stata. He likes creativity and simplicity, taking pictures of food, competition, and anything that can be analyzed.
Archives
July 2016