タイトル:Deterministic Document Exchange Protocols, and Almost Optimal Binary Codes for Edit Errors 著者名: Kuan Cheng; Zhengzhong Jin; Xin Li; Ke Wu 雑誌名:Journal of the ACM 号数:- DOI:https://doi.org/10.1145/3561046 Abstract:We study two basic problems regarding edit errors, document exchange and error correcting codes. Here, two parties try to exchange two strings with length roughly n and edit distance at most k, or one party tries to send a string of length n to another party through a channel that can introduce at most k edit errors. The goal is to use the least amount of communication or redundancy possible. Both problems have been extensively studied for decades, and in this paper we focus on deterministic document exchange protocols and binary codes for insertions and deletions (insdel codes). It is known that for small k (e. g., k ≤ n/4), in both problems the optimal communication or redundancy size is Θ(klognk). In particular, this implies the existence of binary codes that can correct ε fraction of edit errors with rate 1−Θ(εlog(1ε)). However, known constructions are far from achieving these bounds. In this paper we significantly improve previous results on both problems. For document exchange, we give an efficient deterministic protocol with communication complexity O(klog2nk). This significantly improves the previous best known deterministic protocol, which has communication complexity O(k2 + klog 2n) [4]. For binary insdel codes, we obtain the following results: (1) An explicit binary insdel code with redundancy O(klog2nk). In particular this implies an explicit family of binary insdel codes that can correct ε fraction of insertions and deletions with rate 1−O(εlog2(1ε))=1−O˜(ε). This significantly improves the previous best known result which only achieves rate 1−O˜(ε√) [15], [14], and is optimal up to a log(1ε) factor. (2) An explicit binary insdel code with redundancy O(klog n). This significantly improves the previous best known result of [6], which only works for constant k and has redundancy O(k2log klog n); and that of [4], which has redundancy O(k2 + klog 2n). Our code has optimal redundancy for k ≤ n1 − α, any constant 0 < α < 1. This is the first explicit construction of binary insdel codes that has optimal redundancy for a wide range of error parameters k. In obtaining our results we introduce several new techniques. Most notably, we introduce the notion of ε-self matching hash functions and ε -synchronization hash functions. We believe our techniques can have further applications in the literature.
タイトル:Constrained Channel Capacity for DNA-based Data Storage Systems 著者名:Kaixin Fan; Huaming Wu; Zihui Yan 雑誌名:IEEE Communications Letters 号数:- DOI:10.1109/LCOMM.2022.3212200 Abstract:Deoxyribonucleic acid (DNA)-based data storage has grown rapidly due to its advantages with the increase in infrequently large amounts of data. However, when the maximum homopolymer runlength (RLL) of the DNA strand is large and the GC-content is either too high or too low, the DNA synthesis and sequencing processes are prone to substitution, deletion and insertion errors. To reduce errors in DNA synthesis and sequencing, we require that the DNA storage channel satisfies both k -RLL and strong-( l ,δ)-locally-GC-balanced constraints, where the former refers to the maximum homopolymer runlength in each sequence is at most k , and the latter refers to the number of G and C of every length-( l ’ ≥ l) subsequence is bounded between [ l ’/2-δ, l ’ /2+δ]. This constrained channel allows DNA data storage system to be less prone to errors during synthesis and sequencing and improves the success rate of Polymerase Chain Reaction (PCR) amplification. We propose a method to calculate the channel capacity. In particular, we provide a relationship between the 4-ary constrained channel capacity and the 2-ary constrained channel capacity, which makes it simpler to calculate the 4-ary constrained channel capacity.
タイトル:Generalization of Fibonacci Codes to the Non-Binary Case 著者名:Shmuel T. Klein; Tamar C. Serebro; Dana Shapira 雑誌名:IEEE Access 号数:- DOI:10.1109/ACCESS.2022.3214820 Abstract:Motivated by expected technological developments in which the basic unit of storage might possibly be d -ary elements, with d > 2, and not just bits, we extend the traditional Fibonacci code to non-binary codes of higher order, and prove their theoretical properties, as follows: (1) these codes are fixed in advance , and therefore do not need to be generated for each new probability distribution, yielding very simple and fast encoding and decoding procedures; (2) the codes are prefix-free : no codeword in the code is the prefix of any other codeword; (3) the codes are complete : if one adjoins any other d -ary string as an additional codeword, the obtained extended set of codewords is not uniquely decipherable anymore, and is therefore not useful as it might lead to ambiguities; and (4) the codes provide robustness against decoding errors: the number of lost codewords in case of an error will be limited. An error is defined as a d -ary digit changing its value, or an insertion of an extraneous d -ary digit, or an erroneous deletion of a d -ary digit. We provide experimental results on the compression performance, illustrating that the compression efficiency of non-binary Fibonacci codes is very close to the savings achieved by the corresponding non-binary Huffman coding of the same order, while providing simplicity and robustness.