Examining My Facebook Downloads

One very good consequence of the Facebook/Cambridge Analytica story is that a lot of people are discovering the surprisingly large amount of data that Facebook holds on them. The BBC’s Rory Cellan-Jones was “somewhat shocked” to see what it had on him. And The Verge has a good piece on the subject with particular reference to Android phones.

In essence, Facebook always asks for quite a lot of data when you install its apps, and people seem to be too quick to offer that data when it comes to installing those apps. Only now are they discovering what they’re sharing.

“Yes, yes. Just install and let me get onto Facebook,” seems to be the default thought process.

Now I’m not going to pretend that I’ve always been slavishly careful about those permissions myself, but I certainly wanted to see what Facebook holds on me. So I went to the Facebook Settings page and clicked on the Download A Copy link at the bottom.

Facebook first has to prepare the data, crunching it into a Zip file for you. You need to re-enter your password to begin the process, and Facebook promises to email you when the link is ready.

Based on others’ experiences, I thought it might take a while to compile, but in fact it took just 16 minutes. Fast considering the volume of data and the number of users who are perhaps also doing this right now. You have to re-enter your password a second time, and then the file downloads.

I’ve been on Facebook since 2007, and I thought that this could be a big file. In the end it was just over 1.1GB. I’ve uploaded a lot of photos to the service in the past, but particularly in the early years of Facebook, they heavily down-sampled those pictures. (Another reminder that you shouldn’t use Facebook as your only photo backup.)

Anyway, the file extracts easily enough and Facebook has built a fairly intuitive html interface for you to examine your data offline.
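If you’d rather get a feel for what’s in the Zip before clicking through the html pages, a few lines of Python will total up the contents by top-level folder. This is only a rough sketch: the function name `summarise_archive` is my own invention, and it makes no assumptions about which folders Facebook actually puts in the download, it simply reports whatever it finds.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarise_archive(zip_path):
    """Return the total uncompressed bytes per top-level folder in a Zip file."""
    totals = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data of their own
            parts = PurePosixPath(info.filename).parts
            # Files sitting in the root of the archive are grouped together.
            top = parts[0] if len(parts) > 1 else "(top level)"
            totals[top] += info.file_size
    return totals


def print_summary(zip_path):
    """Print the folders in the archive, largest first, with sizes in MB."""
    for folder, size in summarise_archive(zip_path).most_common():
        print(f"{folder:<20} {size / 1_000_000:10.1f} MB")
```

Calling `print_summary()` with the path to your own download should make it obvious where the bulk of the data sits. In my case I’d expect that to be photos.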

My profile data is an interesting place to start. Facebook seems to have detected a single family relationship. While relatively few of my family are on Facebook, some of those who are, were not picked up here as family members. If they don’t have the same surname it might not be obvious to an algorithm.

The interests section is very odd, and not very accurate. When Facebook first started, you just had empty text boxes to fill out. I wrote a general stream of consciousness about music, TV, movies and so on. At various points Facebook has tried to clean that up a little, isolating artists and titles, and linking them to official accounts or lists that it has.

But despite prompts to help them (and help me!), I never really played ball. So there is one novel listed in books, which I think I was probably reading at the time. There is one TV series – one that I absolutely do not recommend. Movies are a little more populated, but with films I may have referenced directly on the service rather than anything else. And music is very limited. Facebook really doesn’t know much about my media consumption.

In general, Facebook would learn a lot more about my media choices if they scanned through this blog!

Otherwise, most of the rest is either groups or people I’ve taken an interest in. I would say that they’ve used Instagram heavily for the latter.

Probably the most contentious area is the list of contacts. And for me, that’s a moment in time, when I did at one point let Facebook into my phone or Gmail account. The list of contacts is old, and while many of those email addresses and phone numbers still work, they’re cast in aspic. Over the years I’ve had any number of phones, and if and when I install a Facebook app, I never give permission for it to see my contacts.

My Timeline is as you would expect – everything I’ve written on Facebook. I link my Twitter account to Facebook, because I’m far more active there. All those Tweets are also captured here. But nothing I wouldn’t expect Facebook to have.

As I mentioned above, I’ve uploaded a number of photos to Facebook over the years. They tend to be more social photos than anything, and Facebook was an easy way to share with friends and work colleagues. Latterly, anything that I’ve cross-posted from Instagram shows up. [Update: A friend – on Facebook – noted that captions for photos are not included]

There are only a limited number of videos, again social, and no surprises.

Messages lists all my Facebook message and Messenger interactions. I loathe Messenger and don’t ever have it permanently installed (On occasion I’ve installed it for a short, but necessary period of time. I uninstall it immediately thereafter). Nonetheless, again there were no surprises.

The data supplied by Facebook on “Pokes” (Remember them?) was incomplete. I only had one poke listed!

Security lists a variety of things including devices, and even IP addresses from which I’ve accessed Facebook.

The final two key pieces of note were Applications and Ads. I recently cleared out the list of applications that I allow Facebook links to. It’s always worth doing this on a regular basis. I know precisely which apps are currently linked, and there is a good reason for each of them. There are only five.

Ads are broken into three parts. There’s the list of topics that Facebook thinks you’re interested in. This is a curious mix of very broad things (“Music”) and very narrow things (“Dan Martin (cyclist)”). It’s reasonably fair, although I don’t really have a particular interest in Citroen, nor Motor Sports or Auto racing. And I’ve no idea why “BBC Radio Solent” is one of a handful of radio stations listed as being of interest to me [Update: I worked out that a former work colleague of mine works there now, and I’ve liked some of their activities]. They do at least list my current employer! My previous employer is not listed. It’s possible that this list is dynamically updated and pruned accordingly.

Ads History claims to list all the ads I’ve clicked on. They only have two listed – both this year – and one without a named advertiser. This is clearly missing data. While I do recall clicking the one named advertiser, and although I rarely click advertisements, I have clicked others in the past. Incredibly, I once actually bought something on the basis of a Facebook ad! Extraordinary, I know.

Finally, perhaps most worrying for me, is a list of “Advertisers with your contact info.” Most of the list is made up of KLM subsidiaries. I once entered a KLM competition on Facebook, and must have agreed they could use my data. I rarely participate in competitions that require much data access for this very reason. Uber, Airbnb, Deliveroo and eBay Canada seem to have my details. But there are a whole bunch of seemingly related “Crowdfunding” companies who have my data. I’ve no idea how they got it, and more importantly, I’ve no idea how to remove it from them. In general it’s quite a contained list.

Notably, Facebook does not have a list of my outgoing or incoming calls, and it’s not had access to any SMS messages I’ve sent. I’ve never given permission, and never wanted to use one of its products as my default SMS app.

The most sensitive data is my list of contacts. But that data is old and is not being updated since the Facebook app on my current phone does not have permission.

As I’ve said repeatedly on this blog, I’ve never found Facebook the most trustworthy company. But on the other hand, there aren’t any surprises to me from what Facebook has in my data.

I think that there are some incomplete aspects of it. I’ve clearly clicked on more ads than Facebook is admitting – but perhaps they delete that data after a period? Less importantly, the list of Pokes was incomplete. I mention that only as it suggests that this might not be a truly complete picture of my Facebook activity.

But I also know that if I carried out the same process for Google, it would be a lot larger. Google has all my email. It has all my contacts. It stores documents, photos and videos for me. I use its browsers multiple times per day. It knows what YouTube videos I watch. It knows what music I listen to. I’ve had phones running its software for years. They know where I go.

In all of those respects, it’s potentially a much scarier proposition.

And yet, I do have more trust in Google than I do in Facebook. Perhaps that’s misplaced? Perhaps not. But in general terms, I think people are clearer in their knowledge of how their Google data is used.

Auditing who knows what about you is important, and we should all be doing this on a regular basis. It’ll be a much bigger job, but looking at my Google data might be worth doing too…


It’s probably worth highlighting a few things that you don’t get from this data.

  • Likes – Given that a key part of the Cambridge Analytica story is about trying to determine OCEAN psychographic measures from Facebook likes, a record of comments and pages I’ve “liked” is data that’s relevant but not here.
  • Facebook Pixel data – Facebook Pixel is the technology that Facebook uses to determine where else users go. While that could be websites that simply allow you to comment via your Facebook login, it might equally be websites that you never realised had installed the pixel. In effect, when you visit such a site, Facebook knows about it. It gives them some of the data that Google collates about you via its ad networks.
  • Geographic data – Facebook loves to know where you are. I mostly have this turned off, but couldn’t definitively say that this has always been the case. While Google has its Timeline History that tells you where you’ve been, there doesn’t seem to be an equivalent for Facebook’s location data. Incidentally, if you’ve never explored that Google data, I’d urge you to. You’ll be delighted, scared or possibly both. (Note to crime drama and fiction writers: Nobody ever uses this, although I understand it potentially increases the difficulty in plotting your story as mobile phones in general have.)
  • WhatsApp or Instagram data – I’ve noted that some of my Instagram information does seem to have fed through to parent company Facebook’s data. But that doesn’t seem to be the case for WhatsApp. Within the EU, Facebook has been limited quite significantly in how much data it shares. The UK’s Information Commissioner made that very point again recently. But it’s worth noting nonetheless.

View from the Shard

View from the Shard-7

To the 34th floor of The Shard and the Shangri La Hotel where a friend is having a party. A great night but a challenge to get good photos from. I knew that the internal lights of the glass would cause horrible reflections inside, so to try to combat it, I arrived with a piece of black felt into which I’d cut a hole that wrapped around the lens of my RX100. The idea was to block out light and reflections. But triple glazed windows basically defeated many of my plans. The photo below is a good example:

View from the Shard-1

I like the photo a lot, but there’s a great big bit of table reflecting in the lower part of it. Short of turning all the lights off, there’s not much I can do. In the end, go with it, and try to work around the problem.

View from the Shard-2

View from the Shard-4

View from the Shard-5

The Sky Garden

Sky Garden-27

The Sky Garden is that rarest of things – a view of London from atop a skyscraper that is actually free of charge to visit. It sits on top of 20 Fenchurch Street, aka the Walkie Talkie.

I believe that there was some quid pro quo, with the building’s owners being allowed to make the floors bigger as the building went upwards in return for providing free public entry. That’s the good news. The bad news is that to get up you have to book a free ticket in advance, and at time of writing they are completely sold out. The official website suggests keeping an eye on it, or following their Twitter feed. The other way, of course, is to book into the bar or one of the two restaurants.

I won’t bother repeating what others have already said, least of all Diamond Geezer who has written a very good blog on his visit(s).

When I first booked my tickets, I noticed something about “professional” photographic gear not being welcome. And certainly no tripods. Just to be on the safe side, I went with my RX100 M3 point and shoot, although plenty of others had DSLRs. I saw one woman using filters on her camera too – unusual to see when you don’t have a tripod.

From a photographic perspective, the biggest challenge is internal light reflections. The only outside bit is a balcony that was closed off when I visited. It too has high glass, but you could probably hold your camera over that. Added to which, it had been raining on and off, and that left raindrops on one side of the building’s glass.

What I will say is that it really wasn’t very crowded. They seem to limit the numbers quite heavily, and free tickets being free, you imagine that a number of people didn’t show up. The slowest part of getting through the airport-style security for me was the fact that many people were relying on the ticket barcodes on their phone (they email a PDF for tickets). This can be a bit fiddly, particularly with multi-page PDFs for several members of the party. I brought a print-out.

Once in, nobody is going to kick you out. My start time was 3.45pm, and I knew sunset was an hour later. To be honest, if you’re not going to have a drink at the bar, then there’s not a great deal to do. It’s all fully enclosed, and you walk around, take selfies (everyone apart from me, I would conservatively say), and then leave again. I hung around to wait for it to get darker.

So there you go. Keep an eye on the website. Ordinarily they say you have to book at least three days in advance, but I fear that it’s going to be harder than that to get in. With the Shard over the river costing £25, this is a bargain. Even the distant Monument below costs £4 – although climbing that is more of an achievement than hitting “35” in a lift.

Sky Garden-31

Sky Garden-37

Sky Garden-18

Sky Garden-14

Sky Garden-8

Sky Garden-33

More photos on Flickr.

On a Canal in a Canoe – Secret Adventures

Secret Adventures Canoe at Night-18

It being the middle of January, and therefore getting quite cold, what better way of spending a Monday evening could there be than paddling a canoe around London?

This was an organised trip via “Secret Adventures”, an internet Meetup group. We set off from Moo Canoes’ base in Limehouse and headed up Limehouse Cut in the direction of Stratford. We passed 3 Mills Studios and continued up the Lee Navigation, ignoring turn-offs that are still closed due to post-Olympic development, until we reached the lock just adjacent to the back of the Olympic stadium near Fish Island. From there we continued a little further up to Crate Brewery & Pizzeria.

It was good fun, and not too hard on the upper body! The well organised event did a good job pairing people up for the boats and ensuring we had the basics before hitting the water. I did manage to fairly soak my legs however – something to do with being 6’2″ and not being able to canoe with my legs flat. And although I kept the camera dry, I fear the Lee Navigation must now have a Lowe Pro camera case (thankfully otherwise empty) to add to its disturbingly large collection of junk.

The pictures I took tended to look better on the back of my camera than they did on a 23″ monitor. Using ISO 6400 quite a lot, a certain amount of noise reduction has needed to be applied.

Anyway, as well as these photos, there are more over on Flickr.

Highly recommended!

Secret Adventures Canoe at Night-19

Secret Adventures Canoe at Night-5

Secret Adventures Canoe at Night-9

Secret Adventures Canoe at Night-13

Cyclo Cross World Cup – Milton Keynes 2014 – Part 1

CXWC Milton Keynes 2014 Women-19

Over the weekend, it was the Milton Keynes round of the UCI’s Cyclo Cross World Cup. It’s the first time a round of this event has taken place outside mainland Europe. So definitely worth a 30-minute train ride up to see it! The event took place on a specially built course in Campbell Park, the park to the north of the town centre.

Here’s what you need to know – although the weekend’s events took place in the dry, it had certainly been raining there recently. This was serious mud.

The park is naturally hilly, and the course went up and down parts of it, with the expected sections where you have to carry your bike. But some flat sections were also so muddy, you had to carry your bike.

You can read more about the races here, but needless to say, I took a lot of photos. You can see some of the photos from the women’s race here, and the rest on Flickr. The men’s race is to follow.

CXWC Milton Keynes 2014 Women-23

CXWC Milton Keynes 2014 Women-24

CXWC Milton Keynes 2014 Women-7

CXWC Milton Keynes 2014 Women-5

CXWC Milton Keynes 2014 Women-40

CXWC Milton Keynes 2014 Women-51

CXWC Milton Keynes 2014 Women-49

Aldwych on Exposure

I’m continuing to play around with the best way of displaying photos online, and a new kid on the block is Exposure. They use high resolution photos that fully utilise screen real estate to make pleasingly simple photo layouts.

Interestingly, although they’re free for your first three sets of photos, you need to pay after that. That’s because, as they explain, they wanted a business proposition from the outset, and didn’t want to get customers and then try to work out how to monetise them. In any case, advertising would really spoil the experience.

I’ve now put a couple of sets onto the service. My latest is a collection of photos taken at the now closed Aldwych tube station on The Strand.

Previously, I put some photos of Suffolk on the service.

I don’t see myself abandoning Flickr any time too soon. And while Google+ is doing a lot with photos, I’m not ready to fully utilise that either. Exposure recently added the ability to use custom domains. If that extended to sub-domains (it may – I’ve not checked) then it might be a solution to my currently underwhelming photography page.

I would like more control over how Flickr lets you embed photos externally though. For example, the photo above is an iframe, and while it lets visitors easily see a fullscreen version of the photo (if they’re on the site itself – they may not get the fullscreen option in a reader like Feedly), there is nothing to stop the viewer disappearing off into my full Flickr photoset. While in and of itself, that mightn’t be a bad thing, I’m generally using a photo to illustrate some writing.

Currently you can still embed the “old” way if you revert to the older style Flickr. I’m not as hung up on Flickr as some are, and while they now seem to have a tendency to launch products before they’re ready, I’m pleased to see Yahoo finally developing the platform.

I do shortly face the dilemma of whether I continue on my grandfathered subscription plan, or whether I stop paying and start to see ads. I tend to think that I use Flickr enough that I don’t want advertising though. I’ll have to wait and see.