削除符号論文メモ:2022/11/25

 

タイトル: Error Probability Bounds for Coded-Index DNA Storage Systems
著者名: Nir Weinberger
雑誌名: IEEE Transactions on Information Theory
号数: Volume: 68, Issue: 11
DOI: 10.1109/TIT.2022.3176917

Abstract: The DNA storage channel is considered, in which a codeword is comprised of M unordered DNA molecules. At reading time, N molecules are sampled with replacement, and then each molecule is sequenced. A coded-index concatenated-coding scheme is considered, in which the m th molecule of the codeword is restricted to a subset of all possible molecules (an inner code), which is unique for each m . The decoder has low-complexity, and is based on first decoding each molecule separately (the inner code), and then decoding the sequence of molecules (an outer code). Only mild assumptions are made on the sequencing channel, in the form of the existence of an inner code and decoder with vanishing error. 
The error probability of a random code as well as an expurgated code is analyzed and shown to decay exponentially with N . This establishes the importance of increasing the coverage depth N/M in order to obtain low error probability.

 

 

タイトル: Correcting Deletions With Multiple Reads
著者名: Johan Chrisnata; Han Mao Kiah; Eitan Yaakobi
雑誌名: IEEE Transactions on Information Theory
号数: Volume: 68, Issue: 11
DOI: 10.1109/TIT.2022.3184868

Abstract: The sequence reconstruction problem, introduced by Levenshtein in 2001, considers a communication scenario where the sender transmits a codeword from some codebook and the receiver obtains multiple noisy reads of the codeword. Motivated by modern storage devices, we introduced a variant of the problem where the number of noisy reads N is fixed. Of significance, for the single-deletion channel, using log_2log_2(n)+O(1) redundant bits, we designed a reconstruction code of length n that reconstructs codewords from two distinct noisy reads (Cai et al. , 2021). In this work, we show that log_2log_2(n)−O(1) 
redundant bits are necessary for such reconstruction codes, thereby, demonstrating the optimality of the construction. Furthermore, we show that these reconstruction codes can be used in t -deletion channels (with t⩾2) to uniquely reconstruct codewords from n^(t−1)/(t−1)!+O(n^(t−2)) distinct noisy reads. For the two-deletion channel, using higher order VT syndromes and certain runlength constraints, we designed the class of higher order constrained shifted VT code with 2log_2(n)+o(log_2(n)) redundancy bits that can reconstruct any codeword from any N⩾5 of its length- (n−2) subsequences.

 

 

タイトル: On the Reverse-Complement String-Duplication System
著者名: Eyar Ben-Tolila; Moshe Schwartz
雑誌名: IEEE Transactions on Information Theory
号数: Volume: 68, Issue: 11
DOI: 10.1109/TIT.2022.3182873

Abstract: Motivated by DNA storage in living organisms, and by known biological mutation processes, we study the reverse-complement string-duplication system. We fully classify the conditions under which the system has full expressiveness, for all alphabets and all fixed duplication lengths. We then focus on binary systems with duplication length 2 and prove that they have full capacity, yet surprisingly, have zero entropy-rate. Finally, by using binary single burst-insertion correcting codes, we construct codes that correct a single reverse-
complement duplication of odd length, over any alphabet. The redundancy (in bits) of the constructed code does not depend on the alphabet size.

 

タイトル: Weight Enumerators and Cardinalities for Number-Theoretic Codes
著者名: Takayuki Nozaki
雑誌名: IEEE Transactions on Information Theory
号数: Volume: 68, Issue: 11
DOI: 10.1109/TIT.2022.3184776

Abstract: The number-theoretic code is a class of codes defined by single or multiple congruences. These codes are mainly used for correcting insertion and deletion errors, and for correcting asymmetric errors. This paper presents a formula for a generalization of the complete weight enumerator for the number-theoretic codes. This formula allows us to derive the weight enumerators and cardinalities for the number-theoretic codes. As a special case, this paper provides the Hamming weight enumerators and cardinalities of the non-binary Tenengolts’ codes, correcting single insertion or deletion. Moreover, we show that the formula deduces the MacWilliams identity for the linear codes over the ring of integers modulo r.

まる飲み

家の庭で鳥が立ち止まっていた。

良く見ると、魚を銜えている。

食べたいのに、魚が大きくて身動きができなくなっているようだ。

魚は尾びれをピクピクと動かしている。

しばしの格闘の末、ついに、魚の向きが変わった。

魚の体が、鳥のくちばしと並行に。

しかし、魚の体が大きくて、くちばしから大きくはみ出している。

 

ついに、まる飲みに成功。

魚の体が大きすぎたのか、鳥の様子がおかしい。

体を立てている。

良く見ると、くちばしから尾びれがチラチラと飛び出しているのが見える。

このまましばらくの間、鳥はじっと止まっていた。

僕は家を出る時間になったため、観察を終わりにした。

 

翌日、おそらく同じ鳥が庭に現れた。ちゃんと消化できたのだろう。


削除符号論文メモ:2022/10/21

タイトル:Deterministic Document Exchange Protocols, and Almost Optimal 
Binary Codes for Edit Errors
著者名: Kuan Cheng; Zhengzhong Jin; Xin Li; Ke Wu
雑誌名:Journal of the ACM
号数:-
DOI:https://doi.org/10.1145/3561046

Abstract:We study two basic problems regarding edit errors, document 
exchange and error correcting codes. Here, two parties try to exchange 
two strings with length roughly n and edit distance at most k, or one 
party tries to send a string of length n to another party through a 
channel that can introduce at most k edit errors. The goal is to use the 
least amount of communication or redundancy possible. Both problems have 
been extensively studied for decades, and in this paper we focus on 
deterministic document exchange protocols and binary codes for 
insertions and deletions (insdel codes). It is known that for small k (e.
g., k ≤ n/4), in both problems the optimal communication or redundancy 
size is Θ(klognk). In particular, this implies the existence of binary 
codes that can correct ε fraction of edit errors with rate 1−Θ(εlog(1ε)).
 However, known constructions are far from achieving these bounds.

In this paper we significantly improve previous results on both problems.
 For document exchange, we give an efficient deterministic protocol with 
communication complexity O(klog2nk). This significantly improves the 
previous best known deterministic protocol, which has communication 
complexity O(k2 + klog 2n) [4]. For binary insdel codes, we obtain the 
following results:

(1)

An explicit binary insdel code with redundancy O(klog2nk). In particular 
this implies an explicit family of binary insdel codes that can correct 
ε fraction of insertions and deletions with rate 1−O(εlog2(1ε))=1−O˜(ε). 
This significantly improves the previous best known result which only 
achieves rate 1−O˜(ε√) [15], [14], and is optimal up to a log(1ε) factor.

(2)

An explicit binary insdel code with redundancy O(klog n). This 
significantly improves the previous best known result of [6], which only 
works for constant k and has redundancy O(k2log klog n); and that of [4],
 which has redundancy O(k2 + klog 2n). Our code has optimal redundancy 
for k ≤ n1 − α, any constant 0 < α < 1. This is the first explicit 
construction of binary insdel codes that has optimal redundancy for a 
wide range of error parameters k.

In obtaining our results we introduce several new techniques. Most 
notably, we introduce the notion of ε-self matching hash functions and ε
-synchronization hash functions. We believe our techniques can have 
further applications in the literature.

 

タイトル:Constrained Channel Capacity for DNA-based Data Storage 
Systems
著者名:Kaixin Fan; Huaming Wu; Zihui Yan
雑誌名:IEEE Communications Letters
号数:-
DOI:10.1109/LCOMM.2022.3212200

Abstract:Deoxyribonucleic acid (DNA)-based data storage has grown 
rapidly due to its advantages with the increase in infrequently large 
amounts of data. However, when the maximum homopolymer runlength (RLL) 
of the DNA strand is large and the GC-content is either too high or too 
low, the DNA synthesis and sequencing processes are prone to 
substitution, deletion and insertion errors. To reduce errors in DNA 
synthesis and sequencing, we require that the DNA storage channel 
satisfies both k -RLL and strong-( l ,δ)-locally-GC-balanced constraints,
 where the former refers to the maximum homopolymer runlength in each 
sequence is at most k , and the latter refers to the number of G and C 
of every length-( l ’ ≥ l) subsequence is bounded between [ l ’/2-δ, l ’
/2+δ]. This constrained channel allows DNA data storage system to be 
less prone to errors during synthesis and sequencing and improves the 
success rate of Polymerase Chain Reaction (PCR) amplification. We 
propose a method to calculate the channel capacity. In particular, we 
provide a relationship between the 4-ary constrained channel capacity 
and the 2-ary constrained channel capacity, which makes it simpler to 
calculate the 4-ary constrained channel capacity.

 

タイトル:Generalization of Fibonacci Codes to the Non-Binary Case
著者名:Shmuel T. Klein; Tamar C. Serebro; Dana Shapira
雑誌名:IEEE Access
号数:-
DOI:10.1109/ACCESS.2022.3214820

Abstract:Motivated by expected technological developments in which the 
basic unit of storage might possibly be d -ary elements, with d > 2, and 
not just bits, we extend the traditional Fibonacci code to non-binary 
codes of higher order, and prove their theoretical properties, as 
follows: (1) these codes are fixed in advance , and therefore do not 
need to be generated for each new probability distribution, yielding 
very simple and fast encoding and decoding procedures; (2) the codes are 
prefix-free : no codeword in the code is the prefix of any other 
codeword; (3) the codes are complete : if one adjoins any other d -ary 
string as an additional codeword, the obtained extended set of codewords 
is not uniquely decipherable anymore, and is therefore not useful as it 
might lead to ambiguities; and (4) the codes provide robustness against 
decoding errors: the number of lost codewords in case of an error will 
be limited. An error is defined as a d -ary digit changing its value, or 
an insertion of an extraneous d -ary digit, or an erroneous deletion of 
a d -ary digit. We provide experimental results on the compression 
performance, illustrating that the compression efficiency of non-binary 
Fibonacci codes is very close to the savings achieved by the 
corresponding non-binary Huffman coding of the same order, while 
providing simplicity and robustness.