Coercion. It’s the topic of the day, and a short topic at that. Coercion is the leading driver for Excel usage for most Stata users. Workbook linkage is usually a pain – the practice of linking one cell from a workbook to a different cell in another workbook – and can turn a relatively simple task into a nightmare usually ending the job with a headache and a stiff drink.
However, with a few simple Mata commands, the crisscross of equal signs, dollar signs, and formulas can turn a repetitive task into a few simple copy-and-paste commands. There is also less room for error. For fun, we'll do this entire thing in Mata. First, we need to decide how many observations/variables we need. Let’s look at our current workbook and target workbook.
So we need 5 columns and 7 rows, with columns from D to H and rows from 3 to 9 in our BookB set. Write it:
mata: st_addobs(7) st_addvar("str255", "v" :+ strofreal(1..5))
Starting with a blank dataset, add seven observations. From there, we create 5 variables (v1 v2 v3 v4 v5) that are all of type str255. This is a placeholder value for long equations. The second part of this command “v” :+ strofreal(1..5) is my favorite part about Mata. The colon operator (:+) is an element-wise operator that is super helpful. No more Matrix conformability errors (well maybe a few). It creates the matrix: ("v1", "v2", "v3", "v4", "v5") without much work and allows for any number of variables! Sidenote – changing (1..5) to (1..500000) took about 3 seconds to create 500,000 v#’s. Next, let’s set up our workbook name: bookname = "'C:\Users\wmmatsuo\Desktop\[BookB.xlsx]Linked'!" Specify the location of the workbook with the file name in square brackets and enclose the sheet name in apostrophes. End that expression with an exclamation point! Since we know we want D through H, and cells 3-9, we can write the following: cols = numtobase26(4..8) rows = strofreal(3::9) numtobase26() turns 4 through 8 into D through H. For more information see my previous post on its inverse. The rows variable is a column vector containing values 3 through 9. Now it’s time to build our expression: expr = "=" :+ bookname :+ "$" :+ J(7, 1, cols) :+ "$" :+ rows We put an equals sign in front of everything so that Excel will know we’re calling an equation. Follow that with the workbook name. If you didn’t know, dollar signs in excel will lock your formulas so they don’t move when you copy formulas from one cell to another. We’ll put those in as well. The next part, J(7, 1, cols), takes our column letters and essentially repeats it seven rows down. Thanks to the colon operator, we just have to add another dollar sign to all elements of the matrix and our row numbers. Since the new cols matrix contains seven rows, and our row vector contains seven rows, it knows to repeat the row values for all columns. Let’s just put that matrix back into Stata and compress our variables. st_sstore(., ("v" :+ strofreal(1..5)), expr) stata("compress") end And voilà, we have a set of equations ready to copy into Excel. One quick tip: set your copy preferences so that when you copy the formulas, you won’t also copy variable names by going to edit->preferences->data editor-> uncheck “Include variable names on copy to clipboard”. Now just copy the equations directly from your Stata browse window into Excel, and enjoy those sweet results! In other news, this post built upon the putexcel linking discoveries found over at belenchavez.com - who recently had her Fitbit stolen! Follow her journey to get it back on her website or on Twitter - hopefully she'll be able to coerce the thief into returning the device.
2 Comments
Did you know that Mata has a hidden command called numtobase26? Given its usefulness with putexcel, I can’t possibly see why it’s hidden! Perhaps Stata doesn’t want us using their highly-tuned statistical analysis software as a glorified spreadsheet beautifier. Nevertheless, you may be forced to automate a god-awful excel report in order to free up time to utilize Stata to the fullest!
This is where we start to depart from our beloved putexcel, and shift slightly towards xl(). Both commands tend to serve the same purpose, but much like Stata vs. Mata, putexcel is meant for the layman, and Mata is meant to make things fast while being slightly less readable. Gross oversimplification aside, putexcel favors the traditional spreadsheet alphabetical columns names combined with row numbers A1, A2, A3 whereas xl() generally takes the r1, c1 form. Say you have a dataset with 50 variables: v1, v2, v3, …, v50 and you wish to rename the variables so that they match the rows of excel. We could try and loop through the alphabet c(ALPHA), but once you get to Z, things start to get tricky. For me, a much simpler solution is to use numtobase26. For those unfamiliar with Mata, you can make a vector of sequential numbers with ease: mata: (1..50)
We have a row vector with 50 elements that we can now convert to our excel column names using:
mata: numtobase26((1..50)) A B C D E F G H I J K L M N O P Q R S T U V W X Y Z AA AB AC AD AE AF AG AH AI AJ AK AL AM AN AO AP AQ AR AS AT AU AV AW AX
We just need to store it as a local so we can use it easily in Stata. (Also concatenate the elements)
mata: st_local("alphavars", invtokens(numtobase26(1..50))) ren * (`alphavars'), dry You can imagine how this can greatly simplify creating putexcel functions that require you to reference columns by their excel name using a simple foreach var of varlist * combined with the levelsof technique described in Part I. Codebook
What happens if you want to go the other way? Say you have the column BC12 but you wish to convert this to 55, 12 so you can use it in one if xl()’s many functions. Unfortunately, I could not find an easy solution, so I made a rough fix that should be similar to how putexcel handles its cell expressions. Feel free to use the numofbase26() Mata function if you ever find yourself in this predicament.
real matrix numofbase26(string matrix base) { real matrix output, pwr, b real scalar i, j, k, l base = strupper(base) output = J(rows(base), cols(base), .) for (i=1; i<=rows(base); i++) { for (j=1; j<=cols(base); j++) { if (strlen(base[i,j]) == 1) output[i,j] = ascii(base[i,j]) - 64 else { l = strlen(base[i,j]) b = pwr = J(1, l, .) for (k=1; k<=l; k++) { b[1,k] = ascii(substr(base[i,j], k, 1)) - 64 pwr[1, k] = l - k } output[i,j] = rowsum(b :* (26:^pwr)) } } } return(output) } Happy Automating! I promise the next post will contain less technical stuff and more pictures. |
AuthorWill Matsuoka is the creator of W=M/Stata - he likes creativity and simplicity, taking pictures of food, competition, and anything that can be analyzed. Archives
July 2016
Categories
All
|