One very good consequence of the Facebook/Cambridge Analytica story is that a lot of people are discovering the surprisingly large amount of data that Facebook holds on them. The BBC’s Rory Cellan-Jones was “somewhat shocked” to see what it had on him. And The Verge has a good piece on the subject with particular reference to Android phones.
In essence, Facebook always asks for quite a lot of data when you install its apps, and people seem to be too quick to offer that data when it comes to installing those apps. Only now are they discovering what they’re sharing.
“Yes, yes. Just install and let me get onto Facebook,” seems to be the default thought process.
Now I’m not going to pretend that I’ve always been slavishly careful about those permissions myself, but I certainly wanted to see what Facebook holds on me. So I went to the Facebook Settings page and clicked on the Download A Copy link at the bottom.
Facebook first has to prepare the data, crunching it into a Zip file for you. You need to re-enter your password to begin the process, and Facebook promises to email you when the link is ready.
Based on others’ experiences, I thought it might take a while to compile, but in fact it took just 16 minutes. That’s fast considering the volume of data and the number of users who are perhaps also doing this right now. You have to re-enter your password a second time, and then the file downloads.
I’ve been on Facebook since 2007, and I thought that this could be a big file. In the end it was just over 1.1GB. I’ve uploaded a lot of photos to the service in the past, but particularly in the early years of Facebook, they heavily down-sampled those pictures. (Another reminder that you shouldn’t use Facebook as your only photo backup.)
Anyway, the file extracts easily enough and Facebook has built a fairly intuitive HTML interface for you to examine your data offline.
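If you’d rather get a rough idea of where the bulk of the download sits before clicking through the pages, a few lines of Python will do it. This is only a minimal sketch: it assumes nothing about how Facebook names its folders and simply adds up the uncompressed file sizes under each top-level folder of whatever Zip file you point it at. The function names and the filename in the usage note are my own placeholders.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def summarize_archive(zip_path):
    """Return {top-level folder: total uncompressed bytes} for a Zip file."""
    totals = defaultdict(int)
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data of their own
            parts = PurePosixPath(info.filename).parts
            # Files sitting at the root of the archive are grouped under "(root)".
            top = parts[0] if len(parts) > 1 else "(root)"
            totals[top] += info.file_size
    return dict(totals)


def print_summary(totals):
    """Print folders from largest to smallest, with sizes in megabytes."""
    for name, size in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:<20} {size / 1_000_000:10.2f} MB")
```

Calling `print_summary(summarize_archive("facebook-export.zip"))`, with the placeholder filename swapped for your own download, lists the folders from largest to smallest. I’d expect photos to account for most of the size, but that’s a guess rather than something the sketch assumes.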
My profile data is an interesting place to start. Facebook seems to have detected a single family relationship. Relatively few of my family are on Facebook, but some of those who are were not picked up here as family members. If they don’t have the same surname it might not be obvious to an algorithm.
The interests section is very odd, and not very accurate. When Facebook first started, you just had empty text boxes to fill out. I wrote a general stream of consciousness about music, TV, movies and so on. At various points Facebook has tried to clean that up a little, isolating artists and titles, and linking them to official accounts or lists that it has.
But despite prompts to help them (and help me!), I never really played ball. So there is one novel listed in books, which I think I was probably reading at the time. There is one TV series – one that I absolutely do not recommend. Movies are a little more populated, but with films I may have referenced directly on the service rather than anything else. And music is very limited. Facebook really doesn’t know much about my media consumption.
In general, Facebook would learn a lot more about my media choices if they scanned through this blog!
Otherwise, most of the rest is either groups or people I’ve taken an interest in. I would say that they’ve used Instagram heavily for the latter.
Probably the most contentious area is the list of contacts. And for me, that’s a moment in time, when I did at one point let Facebook into my phone or Gmail account. The list of contacts is old, and while many of those email addresses and phone numbers still work, they’re cast in aspic. Over the years I’ve had any number of phones, and if and when I install a Facebook app, I never give permission for it to see my contacts.
My Timeline is as you would expect – everything I’ve written on Facebook. I link my Twitter account to Facebook, because I’m far more active there. All those Tweets are also captured here. But nothing I wouldn’t expect Facebook to have.
As I mentioned above, I’ve uploaded a number of photos to Facebook over the years. They tend to be more social photos than anything, and Facebook was an easy way to share with friends and work colleagues. Latterly, anything that I’ve cross-posted from Instagram shows up. [Update: A friend – on Facebook – noted that captions for photos are not included]
There are only a limited number of videos, again social, and no surprises.
Messages lists all my Facebook message and Messenger interactions. I loathe Messenger and don’t ever have it permanently installed (on occasion I’ve installed it for a short but necessary period of time, and I uninstall it immediately thereafter). Nonetheless, again there were no surprises.
The data supplied by Facebook on “Pokes” (Remember them?) was incomplete. I only had one poke listed!
Security lists a variety of things including devices, and even IP addresses from which I’ve accessed Facebook.
The final two key pieces of note were Applications and Ads. I recently cleared out the list of applications that I allow Facebook links to. It’s always worth doing this on a regular basis. I know precisely which apps are currently linked, and there is a good reason for each of them. There are only five.
Ads are broken into three parts. There’s the list of topics that Facebook thinks you’re interested in. This is a curious mix of very broad things (“Music”) and very narrow things (“Dan Martin (cyclist)”). It’s reasonably fair, although I don’t really have a particular interest in Citroen, nor Motor Sports or Auto racing. And I’ve no idea why “BBC Radio Solent” is one of a handful of radio stations listed as being of interest to me [Update: I worked out that a former work colleague of mine works there now, and I’ve liked some of their activities]. They do at least list my current employer! My previous employer is not listed. It’s possible that this list is dynamically updated and pruned accordingly.
Ads History claims to list all the ads I’ve clicked on. They only have two listed – both this year – and one without a named advertiser. This is clearly missing data. While I do recall clicking the one named advertiser, and although I rarely click advertisements, I have clicked others in the past. Incredibly, I once actually bought something on the basis of a Facebook ad! Extraordinary, I know.
Finally, perhaps most worrying for me, is a list of “Advertisers with your contact info.” Most of the list is made up of KLM subsidiaries. I once entered a KLM competition on Facebook, and must have agreed they could use my data. I rarely participate in competitions that require much data access for this very reason. Uber, Airbnb, Deliveroo and eBay Canada seem to have my details. But there are a whole bunch of seemingly related “Crowdfunding” companies who have my data. I’ve no idea how they got it, and more importantly, I’ve no idea how to remove it from them. In general it’s quite a contained list.
Notably, Facebook does not have a list of my outgoing or incoming calls, and it’s not had access to any SMS messages I’ve sent. I’ve never given permission, and never wanted to use one of its products as my default SMS app.
The most sensitive data is my list of contacts. But that data is old and is not being updated since the Facebook app on my current phone does not have permission.
As I’ve said repeatedly on this blog, I’ve never found Facebook the most trustworthy company. But on the other hand, there aren’t any surprises to me from what Facebook has in my data.
I think that there are some incomplete aspects of it. I’ve clearly clicked on more ads than Facebook is admitting – but perhaps they delete that data after a period? Less importantly, the list of Pokes was incomplete. I mention that only as it suggests that this might not be a truly complete picture of my Facebook activity.
But I also know that if I carried out the same process for Google, it would be a lot larger. Google has all my email. It has all my contacts. It stores documents, photos and videos for me. I use its browsers multiple times per day. It knows what YouTube videos I watch. It knows what music I listen to. I’ve had phones running its software for years. They know where I go.
In all of those respects, it’s potentially a much scarier proposition.
And yet, I do have more trust in Google than I do in Facebook. Perhaps that’s misplaced? Perhaps not. But in general terms, I think people are clearer in their knowledge of how their Google data is used.
Auditing who knows what about you is important, and we should all be doing this on a regular basis. It’ll be a much bigger job, but looking at my Google data might be worth doing too…
It’s probably worth highlighting a few things that you don’t get from this data.
- Likes – Given that a key part of the Cambridge Analytica story is about trying to determine OCEAN psychographic measures from Facebook likes, a record of comments and pages I’ve “liked” is data that’s relevant but not here.
- Facebook Pixel data – Facebook Pixel is the technology that Facebook uses to determine where else users go. While that could be websites that simply allow you to comment via your Facebook login, it might equally be websites that you never realised had installed the pixel. In effect, when you visit such a site, Facebook knows about it. It gives them some of the data that Google collates about you via its ad networks.
- Geographic data – Facebook loves to know where you are. I mostly have this turned off, but couldn’t definitively say that this has always been the case. While Google has its Timeline History that tells you where you’ve been, there doesn’t seem to be an equivalent for Facebook’s location data. Incidentally, if you’ve never explored that Google data, I’d urge you to. You’ll be delighted, scared or possibly both. (Note to crime drama and fiction writers: Nobody ever uses this, although I understand it potentially increases the difficulty in plotting your story as mobile phones in general have.)
- WhatsApp or Instagram data – I’ve noted that some of my Instagram information does seem to have fed through to parent company Facebook’s data. But that doesn’t seem to be the case for WhatsApp. Within the EU, Facebook has been limited quite significantly in how much data it shares. The UK’s Information Commissioner made that very point again recently. But it’s worth noting nonetheless.