How we built a system that houses some of the world's most sensitive data

15 July 2020 Karl Anderson

Security

Understanding the data security challenge

When Safelink launched in 2010, our first customers were offshore wealth management firms. They shared a concern about data security that bordered on paranoia, and dramatic exposés such as the "Panama Papers" in 2016 were, broadly speaking, what they were trying to avoid.

We spoke with these businesses to learn about their needs and experiences, and heard about a range of approaches that weren't working very well. PGP plugins for email were secure but cumbersome, and were abandoned after the first couple of exchanges.

Faxes were used to communicate with private clients in despotic nations, on the assumption that these were less likely to be intercepted and, therefore, less likely to result in kidnappings or extortion bids.

Some relationships could only be conducted by telephone.

The default position was to not share very much at all, and certainly not online. There just wasn't a simple way to share legal and financial information that offered the security and comfort that people wanted.

Creating a new breed of encryption security

We recognised that we needed to create a level of security that was unprecedented for an online communications service, without creating the sort of hassles that had made earlier approaches unsustainable.

If we were to provide a new solution, we knew it needed to be secure in a way that was transparent to users, and provider-managed. This was at a time well before anybody knew what a cloud was.

The solution we built allowed our clients to upload and access information in the way they expected, while underneath, the system quietly applied encryption and key management techniques that segmented and encrypted information using ephemeral keys.

Symmetric keys are generated securely as they're needed and are used to encrypt whatever data needs encrypting and do whatever processing work is needed at the time.

Enforcing segmentation

Instead of storing these keys, we use asymmetric encryption to create an encrypted copy for each user that's authorised to read the data, such that only those users can obtain that key. Then, the original symmetric key is thrown away.

When an authorised user signs in later, their device is able to provide enough information to decrypt the keys and then the documents, but without anybody being aware of what's happening beneath the waterline.
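The key handling described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Safelink's implementation. It wraps a random 256-bit data key for each authorised user using hashed ElGamal built from the Python standard library. The group parameters `P` and `G` are toy values that are not a secure choice, all function names are hypothetical, and the step that encrypts the document itself is omitted. A production system would use a vetted cryptographic library and standard algorithms.

```python
import hashlib
import secrets

# Toy group parameters, for illustration only. NOT a secure choice.
P = 2**521 - 1  # a Mersenne prime
G = 3


def new_keypair():
    """Return (private, public) for one user. Hypothetical helper."""
    private = secrets.randbelow(P - 3) + 2
    return private, pow(G, private, P)


def _pad(shared):
    # Hash the shared secret down to a 32-byte pad.
    return hashlib.sha256(shared.to_bytes(66, "big")).digest()


def wrap_key(data_key, public):
    """Encrypt a 32-byte data key for the holder of the matching private key."""
    r = secrets.randbelow(P - 3) + 2  # fresh randomness for every wrap
    pad = _pad(pow(public, r, P))
    return pow(G, r, P), bytes(a ^ b for a, b in zip(data_key, pad))


def unwrap_key(envelope, private):
    """Recover the data key from one user's wrapped copy."""
    c1, wrapped = envelope
    pad = _pad(pow(c1, private, P))
    return bytes(a ^ b for a, b in zip(wrapped, pad))


def share(authorised_public_keys):
    """Create a data key, wrap it once per authorised user, keep no plain copy."""
    data_key = secrets.token_bytes(32)
    # ... encrypt the document with data_key here (omitted) ...
    envelopes = {
        user: wrap_key(data_key, public)
        for user, public in authorised_public_keys.items()
    }
    del data_key  # drops our reference; only the wrapped copies are returned
    return envelopes
```

Each authorised user can later recover the same data key with `unwrap_key(envelopes[user], private)`. Anyone without one of the matching private keys, the provider included, holds only the wrapped copies.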

A benefit of the segmentation and use of separate encryption keys is that we can be certain that information for one of our client's relationships could never cross over with that of another, even in the case of subsequent programming errors or data theft.

We didn't stop at document contents, either; document names, metadata, folder names, and any other fields that could conceivably contain sensitive information are encrypted in the same way.

We don’t retain the keys

A bigger benefit was that, by not storing the encryption keys ourselves, even we, as the provider of the Safelink managed service, were unable to access the documents or details that our clients were sharing. It also meant that if a disk or server were ever removed from a data center, nothing on it would be readable.

Not retaining any keys was a big differentiating factor, and as far as we know, it still sets us apart today.

Nearly every cloud and on-premises enterprise information system encrypts data, and that's great. Some of these systems go as far as using separate keys for each document. But if the keys are shared between clients, or worse, across the whole platform, or if the per-document keys are decryptable by the provider at will, then the provider ultimately has access to that information, as might inquisitive employees, or any outsider who compromises their access controls.

Even the use of client-provided keys, hardware security modules (HSMs), or client-provided HSMs doesn't help if the provider's system can programmatically retrieve those keys at any time.

Oddly, that's not even the hard part. The encryption and key management itself is relatively straightforward, to the point of being quite elegant, as it's been encapsulated in a layer that other features build upon. We've built simple things like shared calendars, and bigger things like document review and a full workflow engine and it all benefits from the foundation of isolated encryption.

Consequences of encrypting data while not having custodianship of the keys

The hard part is that, with our approach, any processing that we need to apply to documents needs to happen in tightly locked-down execution contexts, using small and heavily audited fragments of code, in short and finite time windows before the relevant keys are thrown away.

That affects the way we write, review, test, and run our code - though only in a good way.

However, features that would normally be easy to build, like full-text indexing of document names and content, and even just sorting documents by name, are an order of magnitude harder to achieve.

Instead of being able to throw words into a standard search engine, which would ultimately store document contents on disk in a retrievable form, we index cryptographic hashes of words and phrases, and have built a complete proprietary search engine around it. Using full-disk or partition encryption is not enough, as that counts as a shared key that would violate the segmentation of data that we maintain.
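To make the idea of indexing hashes rather than words concrete, here is a minimal sketch. It is not Safelink's proprietary engine. It assumes a keyed hash (HMAC-SHA256) with a separate index key per data segment, on the reasoning that plain unkeyed hashes of common words could be guessed by hashing a dictionary. The class name, method names, and key handling are all hypothetical, and "phrases" are limited to adjacent word pairs.

```python
import hashlib
import hmac
import re
from collections import defaultdict


def _tokens(text):
    # Lower-case and split into simple alphanumeric words.
    return re.findall(r"[a-z0-9]+", text.lower())


def _blind(index_key, term):
    # Keyed hash: without index_key, the stored value cannot be recomputed.
    return hmac.new(index_key, term.encode("utf-8"), hashlib.sha256).hexdigest()


class BlindIndex:
    """Maps HMACs of words and two-word phrases to document ids (hypothetical)."""

    def __init__(self):
        self._postings = defaultdict(set)  # only hashes and ids are stored

    def add(self, index_key, doc_id, text):
        words = _tokens(text)
        phrases = [" ".join(pair) for pair in zip(words, words[1:])]
        for term in words + phrases:
            self._postings[_blind(index_key, term)].add(doc_id)

    def search(self, index_key, query):
        words = _tokens(query)
        if not words:
            return set()
        if len(words) == 1:
            terms = words
        else:
            # Longer queries are approximated by intersecting adjacent pairs.
            terms = [" ".join(pair) for pair in zip(words, words[1:])]
        hits = [self._postings.get(_blind(index_key, t), set()) for t in terms]
        return set.intersection(*hits)
```

Because every stored entry depends on the index key, a query made with a different segment's key matches nothing, and the index never contains a readable word.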

We can't use standard database indexes to implement features like "sort documents by name", because the names themselves are encrypted. For that we have another proprietary mechanism that calculates and stores relative orderings without ever storing actual names in any database.
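One way to picture storing relative orderings is sketched below. It is an assumed approach, not a description of Safelink's proprietary mechanism. While an authorised session has the names decrypted in memory, it computes a numeric position for each document, and only those positions would be stored. The function names and the use of fractions are illustrative choices.

```python
from bisect import bisect_left
from fractions import Fraction


def _sort_key(name):
    # Case-insensitive ordering of decrypted names.
    return name.casefold()


def initial_positions(decrypted_names):
    """decrypted_names maps doc_id to its decrypted name (held in memory only).

    Returns doc_id -> position; only this mapping would be persisted.
    """
    ordered = sorted(decrypted_names, key=lambda d: _sort_key(decrypted_names[d]))
    return {doc_id: Fraction(i + 1) for i, doc_id in enumerate(ordered)}


def position_for_new(name, decrypted_names, positions):
    """Pick a position for a new name that falls between its two neighbours."""
    ordered = sorted(positions, key=positions.get)
    keys = [_sort_key(decrypted_names[d]) for d in ordered]
    i = bisect_left(keys, _sort_key(name))
    lo = positions[ordered[i - 1]] if i > 0 else Fraction(0)
    hi = positions[ordered[i]] if i < len(ordered) else lo + 2
    return (lo + hi) / 2  # midpoint, so existing positions never change
```

A "sort by name" query then becomes an ordinary sort on the stored position column. The positions reveal the order of the documents, which is inherent to the feature, but not the names themselves.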

It's also challenging to support a system when your support team has no ability to see what the customer is seeing. So is helping groups of people who have all lost their passwords at the same time, which renders their information unrecoverable even when we have many backups of the encrypted data.

These are the inevitable consequences of encrypting data while not having custodianship of the keys, but we've developed good solutions for these too, involving people, processes, guidance, and various technical fail-safes and fallbacks that our customers can control.

Where to from here?

We believe this approach to encryption has proven itself and will continue to serve our needs. When our customers faced new requirements such as those of GDPR, the encryption and isolation Safelink provided were well ahead of what was called for. In its lifetime, our technology has been upgraded to support new ciphers, 256-bit keys, integrations with single sign-on (SSO) services, and two generations of APIs, and it should work well as we migrate to new database technologies.

The fact that we can't accidentally divulge customer data (because we can't access it ourselves), and that even a flaw in our access control mechanisms couldn't lead to a breach, continues to be of great comfort to us and to the customers that trust us with their data.

Editor's note: This post was originally published in April 2021 and was updated in January 2023 for accuracy and comprehensiveness.

Karl Anderson

Founder at Safelink