Extracting Location Data from Pictures
You’re being tracked – in the world we live in today, almost everything you do has some sort of digital trace. I recently went to True North Tavern in North Park (San Diego) to mourn yet another 49er defeat – and to my surprise, they had a bouncer at the door checking IDs. I’ll admit, I am very lax in giving out my ID to anyone who asks for it, but on this occasion I did hesitate to provide identification a little as I saw a big black machine right by the door. Luckily, I made it through after a quick glance but as it turns out, a few friends were not so lucky. Besides visually checking ID’s the bouncers sometimes scan driver’s licenses into these black machines, which stores the patron’s driver’s license information including the photo! Today, we will explore why this is a very bad practice as we transition to talking about Exif data.
Exif, or exchangeable image file format, is a way to store information on your images or audio file. If you have an iPhone, you will probably notice that you can actually view your images on a map, and it shows you where and when you took the picture. This is really neat, especially if you’re a photo editor and need to match light or camera settings in Photoshop without having access to the physical camera. However, this is also incredibly dangerous. Here is a quick look at the properties of a photo I took with my iPhone 5.
You can see everything, including the exact location of where I took the picture. Now, I’m sure this is not all that shocking, but it can be a little nerve wracking when you post pictures online knowing that someone can easily trace back where you were with a click of a button. Luckily, many social media websites such as Facebook and Twitter are aware of this and will strip your picture’s Exif data for you.
Exif data are stored in the header of your picture. What the heck does that mean? Every picture contains a header, and most new cameras and cell phones will actually write this data in the header. Your computer then reads this information and is able to tell your location, camera settings, and time stamps of the photo. One section actually contains all of the Exif tags – it’s really just a collection of information. In order to read this information, we have to read in the picture file which is a series of zeros and ones. Yup, we’re using Mata’s bufio() functions to read in all of this information. Bufio() allows us to write not only string characters, but also binary integers into files without using Stata’s file command. I will bore you with the details in a subsequent post, but for now, I’d like to introduce my program – exiflatlon – which strips JPG, TIF, and WAV file formats of latitude and longitude information. It can be used in two ways:
Let’s try out the directory specification.
In the event there is no exif information, exiflaton will provide you with a warning and will replace those latitude values and longitude values with zeros in the resulting dataset. It only takes milliseconds to rip the locations off each photo – and this is valuable information, especially if you don’t want the wrong type of people (or really anyone) tracking you. While I only use Stata for good, there are likely people who use it for bad purposes. Only joking, Stata users are never malicious, but there are plenty of other programmers who are and who have more efficient tools at their disposal for such (and maybe creepier) purposes. In today’s age, we need to be careful about safeguarding our data because there are no moral boundaries when it comes to taking data. Companies love information, and will not hesitate to steal this information. After all, hiding the fact that “by entering this premise, you consent to us taking all your information from your ID” is in no way consent when it’s not clearly stated. I’m looking at you, True North Tavern...
Some of the best ideas are those that happen right outside of your comfort zone. Unfortunately, they can also lie outside of your current means as well. Not to worry though, because with a little luck and a lot of perseverance, we can eventually erode away at the mountainous tasks that lie ahead. Like a running stream we can… well, perhaps that’s quite enough nature metaphors, but I think you get the picture.
I’m personally extremely excited to reveal this next set of blog posts because it ties together so many different concepts. This means that it will be broken out into several different parts. However, the complexity also comes at a cost. I thought I would be done by now – the struggle is real!
While I continue to do my research, I’ve added a quick snippet of what’s to be expected:
Alright, so it’s not that exciting. Yet. Remember what I said about current means? 3D rendering takes quite a long time, and my current computer can’t handle it that well which means I’ll be upgrading (or borrowing) resources. I’d like to point out the silver lining within this video though – this is actual data for elevation and location. Stata wrote this. Stata also wrote these mountains below.
With some help from an excellent source, Stata is able to bring in Fitbit data with ease.
Finally, here’s a panorama picture from inside my pocket. It plays a big role in the posts to come...
Will Matsuoka is the creator of W=M/Stata - he likes creativity and simplicity, taking pictures of food, competition, and anything that can be analyzed.