At the intersection of efficiency and privacy

This year’s iDash competition was all about identifying Covid variants while maintaining patient confidentiality. Dr. Mor Weiss and her colleagues came up with a solution that uses homomorphic encryption and snatched third place.

Last month, cryptography expert Dr. Mor Weiss learned that she had won third place in the 2021 iDash competition. The competition has taken place annually since 2014 and revolves around privacy-preserving solutions for medical and biological problems, particularly ones based on Fully Homomorphic Encryption (FHE). Each year, participants are presented with several challenges. The winning solutions must solve the problem while maintaining the privacy of all parties involved. “This year’s iDash competition was about identifying Covid variants while protecting the confidentiality of the medical data collected from participants,” says Dr. Weiss, who participated in the competition alongside Dr. Haim Shaul (IBM), Ben Galili (Technion), Prof. Adi Akavia (University of Haifa), and Prof. Zohar Yakhini (Reichman University and Technion).

“This year’s scenario was about a large, central entity (a large hospital, a government, the WHO) that collected a significant number of samples of different Covid variants and used Machine Learning techniques to train a model that, given a new sample, can determine with high likelihood which variant it contains,” explains Dr. Weiss. “Now, a small HMO gets a patient sample (that is, the virus’s DNA sequence) and wants to send that sample to the central entity to find out which Covid variant the patient has. The problem is that sending out this information exposes sensitive medical data. Some regulations go as far as forbidding such medical information from leaving the country, which rules out using a medical entity abroad. This year’s iDash challenge was to come up with a way to use the services of the central entity without allowing it to learn anything about the patient’s sample. We used homomorphic encryption, which enables computation over encrypted messages as if they were not encrypted.

“How does it work? Encryption usually takes a message like ‘Hello world!’ and ‘translates’ it into gibberish, a random-looking sequence of characters, for example, AH6038^$#^VNJKGsldgjh. You generally can’t compute over the gibberish. In this scenario, if I want to look for a certain DNA sequence in the sample to determine the variant, I can’t do that on the encrypted sample, because the sequence has been turned into gibberish. And that’s the amazing thing about homomorphic encryption: even though the encrypted sample looks like gibberish, it still allows us to perform computations (like searching for a specific string, or any other computation, really).

“That said, homomorphic encryption has one main disadvantage: it ‘translates’ simple computations into heavy ones. Things that would have taken milliseconds on a standard PC all of a sudden take ages. So the key challenge in using homomorphic encryption is expressing the computation as simply as possible, so that it runs on the encrypted sample at nearly the same speed as it would on an unencrypted sample. What does ‘simple’ mean? Try calculating in your head, without using a calculator, the product of two large numbers, say 99 × 53. It’s a difficult task, but if we rewrite the expression as 100 × 53 − 53, that is, 5300 − 53 = 5247, we have turned a complex task into a much simpler one.

“That was this year’s iDash challenge, and our solution won third place. We designed an entire system that includes software for the client end (like the HMO from our previous example) and the server end (the central entity). The client encrypts its patients’ samples and sends them to the server; the server performs a homomorphic computation over the encrypted samples. Throughout the process, the samples remain encrypted, as do all the data computed from them. Once processing is done, the server sends the encrypted result (i.e., the variant type) back to the client, who can then decrypt it. Our solution achieved near-perfect precision (99.8%–100% correct variant classification) with excellent runtimes (a few milliseconds for the client, less than one millisecond for the server).”
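For readers who want to see the client/server flow in code, here is a minimal sketch. It is not the team's system and makes several assumptions: it uses a toy Paillier scheme, which is only additively homomorphic rather than fully homomorphic, with primes far too small to be secure; the "model" is a made-up linear score with non-negative integer weights; and every name (keygen, server_score, the feature and weight values) is hypothetical. It only illustrates the idea that the server computes on ciphertexts it cannot read, and the client alone decrypts the result.

```python
import math
import secrets

# Toy Paillier cryptosystem (additively homomorphic only, NOT the FHE scheme
# used in the competition). The primes are hypothetical demo values and are
# far too small to offer any security.
P, Q = 1009, 1013


def keygen(p=P, q=Q):
    """Client side: return the public modulus n and the private key."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Client side: encrypt an integer 0 <= m < n under the public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random value in 1..n-1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Client side: recover the plaintext from ciphertext c."""
    lam, mu = priv
    n2 = n * n
    return ((pow(c, lam, n2) - 1) // n) * mu % n


def server_score(n, enc_features, weights):
    """Server side: sees only ciphertexts and its own plaintext weights.

    Multiplying ciphertexts adds the hidden plaintexts, and raising a
    ciphertext to the power w multiplies its hidden plaintext by w, so the
    result is an encryption of sum(w_i * x_i).
    """
    n2 = n * n
    acc = 1  # the value 1 is a (trivial) encryption of 0
    for c, w in zip(enc_features, weights):
        acc = acc * pow(c, w, n2) % n2
    return acc


# Made-up data: the client's features and the server's model weights.
n, priv = keygen()
features = [3, 0, 5]
weights = [2, 7, 4]

enc_features = [encrypt(n, x) for x in features]      # client encrypts
enc_result = server_score(n, enc_features, weights)   # server computes blindly
result = decrypt(n, priv, enc_result)                 # client decrypts
print(result)  # 26, i.e. 3*2 + 0*7 + 5*4
```

A real deployment would rely on a vetted homomorphic encryption library and a far richer classifier; the sketch keeps only the shape of the protocol described in the interview.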

Last Updated Date : 16/12/2021