Dr. Or Sheffet is Developing Algorithms that Protect our Privacy
In this technological age, personal and sensitive information about each and every one of us is collected in large databases: hospitals hold our medical records, banks and credit card companies hold our financial history, and government offices and the bureau of statistics keep our ID numbers and residential addresses. Due to the sensitive nature of such information, maintaining its privacy is imperative. At the same time, analyzing this data is enormously valuable, to researchers and policymakers alike.
So how do we reconcile these two conflicting goals? This is the main field of study of Dr. Or Sheffet (39), who joined the Faculty of Engineering and the Center for Research in Cyber Security this academic year. Sheffet, a theoretician specializing in differential privacy – a course he is teaching this semester – has joined the faculty after attaining his BSc from the Hebrew University, an MSc from the Weizmann Institute, a PhD from Carnegie Mellon University, postdoctoral fellowships at Berkeley and at Harvard, and a faculty position at the University of Alberta in Canada. “I’ve always designed algorithms,” admits Sheffet. “Recently, they’re mostly algorithms that learn how to analyze data while protecting privacy.”
Over the years, plenty of ideas and heuristic methods were proposed for solving this issue. The most common is ‘anonymization’: removing the fields generally referred to as ‘personal identifiers’ (names, addresses, ID numbers, etc.) from the data. “Such heuristics sound good at first,” says Sheffet, “but they come without any formal guarantee, and numerous studies have repeatedly shown that they don’t guarantee privacy.” Thus, in 2006, Dwork, McSherry, Nissim & Smith decided to tackle the subject with rigor. They asked what the term ‘privacy-preserving algorithm’ should mean, and what properties such an algorithm should satisfy. “For example, one of the desired properties of a privacy-preserving algorithm is that privacy should be maintained even after composition. Suppose today I analyze data using privacy-preserving algorithm A, and tomorrow I use privacy-preserving algorithm B; there shouldn’t be any way to combine the outputs of the two algorithms and suddenly reveal some individual’s personal details,” explains Sheffet. “This is just one of several properties that such a privacy-preserving algorithm must have. What Dwork et al. did was to semantically define the term ‘privacy-preserving algorithm’ by putting forth a definition that quantifies the privacy loss of the algorithm. These differentially private algorithms work by adding random noise during their computations, noise that masks the possible effect of a single individual’s data on the result of the computation; and so, the outputs of such algorithms do not allow me to infer whether any particular person was present in, or absent from, the data. This is the notion of ‘differential privacy’: the algorithm uses random noise to maintain privacy.”
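The noise addition Sheffet describes corresponds, in its simplest form, to the Laplace mechanism introduced in the 2006 Dwork, McSherry, Nissim & Smith paper. The sketch below (illustrative only; the function names are my own, not Sheffet’s code) privately answers a counting query, so that the output barely depends on whether any one person’s record is in the data:

```python
import math
import random

def laplace_noise(scale):
    # Sample from a Laplace(0, scale) distribution via the inverse CDF.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(records, predicate, epsilon):
    # A single person's presence changes a count by at most 1 (the query's
    # "sensitivity"), so Laplace noise of scale 1/epsilon masks whether
    # any particular individual is in the data.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Two "neighboring" databases: identical except for one person's record.
db_with = [25, 31, 47, 52, 60]      # ages, including one 60-year-old
db_without = [25, 31, 47, 52]
over_50 = lambda age: age > 50

# The noisy answers on the two databases are statistically close, so the
# output does not reveal whether the 60-year-old's record was present.
print(dp_count(db_with, over_50, epsilon=1.0))
print(dp_count(db_without, over_50, epsilon=1.0))
```

The composition property Sheffet mentions also holds here: by the basic composition theorem, releasing one count with privacy loss ε₁ and another with ε₂ incurs total privacy loss at most ε₁ + ε₂, so the combined outputs still cannot pin down any individual.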
Since 2006, theoreticians and practitioners specializing in differential privacy have been revisiting methods of data analysis, adding random noise to various analyses in order to make them privacy-preserving. “The tension between the accuracy of the analysis and the magnitude of the noise we add to maintain privacy is always present,” says Sheffet, “but the beauty of this method is that you can quantify this tension and say, ‘with this-and-that magnitude of noise you allow for so-and-so privacy loss, and the level of accuracy you lose is such-and-such’.” Nowadays, Sheffet brings this fascinating subject to the Faculty and is looking for students to join him. “I’m looking for strong students with a high level of mathematical understanding, and in particular students who are good with probabilistic analysis. I don’t have ‘sexy’ robots or graphical demos that I can use to draw people into the field – all I do is math. But if you love the field, you’re more than welcome to join me.”
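The quantification Sheffet alludes to can be made concrete for the simplest case, a counting query (a standard textbook calculation, not his own example): under ε-differential privacy, the Laplace mechanism adds noise of scale 1/ε, whose standard deviation is √2/ε, so halving the permitted privacy loss ε doubles the typical error of the answer.

```python
import math

# For a counting query (one person changes the answer by at most 1),
# the Laplace mechanism achieves epsilon-differential privacy with
# noise of scale 1/epsilon; that noise has standard deviation
# sqrt(2)/epsilon. Smaller epsilon = stronger privacy, larger error.
for epsilon in (0.1, 0.5, 1.0, 2.0):
    typical_error = math.sqrt(2) / epsilon
    print(f"privacy loss epsilon={epsilon}: typical error ~ {typical_error:.2f}")
```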