削除符号論文メモ:2022/10/21

タイトル:Deterministic Document Exchange Protocols, and Almost Optimal 
Binary Codes for Edit Errors
著者名: Kuan Cheng; Zhengzhong Jin; Xin Li; Ke Wu
雑誌名:Journal of the ACM
号数:-
DOI:https://doi.org/10.1145/3561046

Abstract:We study two basic problems regarding edit errors, document 
exchange and error correcting codes. Here, two parties try to exchange 
two strings with length roughly n and edit distance at most k, or one 
party tries to send a string of length n to another party through a 
channel that can introduce at most k edit errors. The goal is to use the 
least amount of communication or redundancy possible. Both problems have 
been extensively studied for decades, and in this paper we focus on 
deterministic document exchange protocols and binary codes for 
insertions and deletions (insdel codes). It is known that for small k (e.
g., k ≤ n/4), in both problems the optimal communication or redundancy 
size is Θ(klognk). In particular, this implies the existence of binary 
codes that can correct ε fraction of edit errors with rate 1−Θ(εlog(1ε)).
 However, known constructions are far from achieving these bounds.

In this paper we significantly improve previous results on both problems.
 For document exchange, we give an efficient deterministic protocol with 
communication complexity O(klog2nk). This significantly improves the 
previous best known deterministic protocol, which has communication 
complexity O(k2 + klog 2n) [4]. For binary insdel codes, we obtain the 
following results:

(1)

An explicit binary insdel code with redundancy O(klog2nk). In particular 
this implies an explicit family of binary insdel codes that can correct 
ε fraction of insertions and deletions with rate 1−O(εlog2(1ε))=1−O˜(ε). 
This significantly improves the previous best known result which only 
achieves rate 1−O˜(ε√) [15], [14], and is optimal up to a log(1ε) factor.

(2)

An explicit binary insdel code with redundancy O(klog n). This 
significantly improves the previous best known result of [6], which only 
works for constant k and has redundancy O(k2log klog n); and that of [4],
 which has redundancy O(k2 + klog 2n). Our code has optimal redundancy 
for k ≤ n1 − α, any constant 0 < α < 1. This is the first explicit 
construction of binary insdel codes that has optimal redundancy for a 
wide range of error parameters k.

In obtaining our results we introduce several new techniques. Most 
notably, we introduce the notion of ε-self matching hash functions and ε
-synchronization hash functions. We believe our techniques can have 
further applications in the literature.

 

タイトル:Constrained Channel Capacity for DNA-based Data Storage 
Systems
著者名:Kaixin Fan; Huaming Wu; Zihui Yan
雑誌名:IEEE Communications Letters
号数:-
DOI:10.1109/LCOMM.2022.3212200

Abstract:Deoxyribonucleic acid (DNA)-based data storage has grown 
rapidly due to its advantages with the increase in infrequently large 
amounts of data. However, when the maximum homopolymer runlength (RLL) 
of the DNA strand is large and the GC-content is either too high or too 
low, the DNA synthesis and sequencing processes are prone to 
substitution, deletion and insertion errors. To reduce errors in DNA 
synthesis and sequencing, we require that the DNA storage channel 
satisfies both k -RLL and strong-( l ,δ)-locally-GC-balanced constraints,
 where the former refers to the maximum homopolymer runlength in each 
sequence is at most k , and the latter refers to the number of G and C 
of every length-( l ’ ≥ l) subsequence is bounded between [ l ’/2-δ, l ’
/2+δ]. This constrained channel allows DNA data storage system to be 
less prone to errors during synthesis and sequencing and improves the 
success rate of Polymerase Chain Reaction (PCR) amplification. We 
propose a method to calculate the channel capacity. In particular, we 
provide a relationship between the 4-ary constrained channel capacity 
and the 2-ary constrained channel capacity, which makes it simpler to 
calculate the 4-ary constrained channel capacity.

 

タイトル:Generalization of Fibonacci Codes to the Non-Binary Case
著者名:Shmuel T. Klein; Tamar C. Serebro; Dana Shapira
雑誌名:IEEE Access
号数:-
DOI:10.1109/ACCESS.2022.3214820

Abstract:Motivated by expected technological developments in which the 
basic unit of storage might possibly be d -ary elements, with d > 2, and 
not just bits, we extend the traditional Fibonacci code to non-binary 
codes of higher order, and prove their theoretical properties, as 
follows: (1) these codes are fixed in advance , and therefore do not 
need to be generated for each new probability distribution, yielding 
very simple and fast encoding and decoding procedures; (2) the codes are 
prefix-free : no codeword in the code is the prefix of any other 
codeword; (3) the codes are complete : if one adjoins any other d -ary 
string as an additional codeword, the obtained extended set of codewords 
is not uniquely decipherable anymore, and is therefore not useful as it 
might lead to ambiguities; and (4) the codes provide robustness against 
decoding errors: the number of lost codewords in case of an error will 
be limited. An error is defined as a d -ary digit changing its value, or 
an insertion of an extraneous d -ary digit, or an erroneous deletion of 
a d -ary digit. We provide experimental results on the compression 
performance, illustrating that the compression efficiency of non-binary 
Fibonacci codes is very close to the savings achieved by the 
corresponding non-binary Huffman coding of the same order, while 
providing simplicity and robustness.