Efficient Privacy-Preserving General Edit Distance and Beyond

Ruiyu Zhu, Indiana University, Email: [email protected]
Yan Huang, Indiana University, Email: [email protected]

Abstract—Edit distance is an important non-linear metric that has many applications ranging from matching patient genomes to text-based intrusion detection. Depending on the application, related string-comparison metrics, such as weighted edit distance, Needleman-Wunsch distance, longest common subsequences, and heaviest common subsequences, can usually fit better than the basic edit distance. When these metrics need to be calculated on sensitive input strings supplied by mutually distrustful parties, it is more desirable but also more challenging to compute them in privacy-preserving ways. In this paper, we propose efficient secure computation protocols for private edit distance as well as several generalized applications including weighted edit distance (with potentially content-dependent weights), longest common subsequence, and heaviest common subsequence. Our protocols run 20+ times faster and use an order of magnitude less bandwidth than their best previous counterparts.

... secure computation protocols such as weighted edit distance, Needleman-Wunsch, longest common subsequence (LCS), and heaviest common subsequence (HCS), using all existing applicable optimizations including fixed-key hardware AES [12], [15], Half-Gate garbling [13], and the free-XOR technique [11]. We report the performance of these protocols in the “Best Prior” row of Table I, as well as in the performance charts of Figures 4 and 5 in Section V-A, and use them as baselines to evaluate our new approach. Note that our baseline performance numbers are already much better than those of any generic protocols we can find in the literature, simply because we have, for the first time, applied the most recent optimizations (such as Half-Gates, efficient AES-NI-based garbling, and highly customized