タイトル:Deterministic Document Exchange Protocols, and Almost Optimal
Binary Codes for Edit Errors
著者名: Kuan Cheng; Zhengzhong Jin; Xin Li; Ke Wu
雑誌名:Journal of the ACM
号数:-
DOI:https://doi.org/10.1145/3561046
Abstract:We study two basic problems regarding edit errors, document
exchange and error correcting codes. Here, two parties try to exchange
two strings with length roughly n and edit distance at most k, or one
party tries to send a string of length n to another party through a
channel that can introduce at most k edit errors. The goal is to use the
least amount of communication or redundancy possible. Both problems have
been extensively studied for decades, and in this paper we focus on
deterministic document exchange protocols and binary codes for
insertions and deletions (insdel codes). It is known that for small k (e.
g., k ≤ n/4), in both problems the optimal communication or redundancy
size is Θ(klognk). In particular, this implies the existence of binary
codes that can correct ε fraction of edit errors with rate 1−Θ(εlog(1ε)).
However, known constructions are far from achieving these bounds.
In this paper we significantly improve previous results on both problems.
For document exchange, we give an efficient deterministic protocol with
communication complexity O(klog2nk). This significantly improves the
previous best known deterministic protocol, which has communication
complexity O(k2 + klog 2n) [4]. For binary insdel codes, we obtain the
following results:
(1)
An explicit binary insdel code with redundancy O(klog2nk). In particular
this implies an explicit family of binary insdel codes that can correct
ε fraction of insertions and deletions with rate 1−O(εlog2(1ε))=1−O˜(ε).
This significantly improves the previous best known result which only
achieves rate 1−O˜(ε√) [15], [14], and is optimal up to a log(1ε) factor.
(2)
An explicit binary insdel code with redundancy O(klog n). This
significantly improves the previous best known result of [6], which only
works for constant k and has redundancy O(k2log klog n); and that of [4],
which has redundancy O(k2 + klog 2n). Our code has optimal redundancy
for k ≤ n1 − α, any constant 0 < α < 1. This is the first explicit
construction of binary insdel codes that has optimal redundancy for a
wide range of error parameters k.
In obtaining our results we introduce several new techniques. Most
notably, we introduce the notion of ε-self matching hash functions and ε
-synchronization hash functions. We believe our techniques can have
further applications in the literature.
タイトル:Constrained Channel Capacity for DNA-based Data Storage
Systems
著者名:Kaixin Fan; Huaming Wu; Zihui Yan
雑誌名:IEEE Communications Letters
号数:-
DOI:10.1109/LCOMM.2022.3212200
Abstract:Deoxyribonucleic acid (DNA)-based data storage has grown
rapidly due to its advantages with the increase in infrequently large
amounts of data. However, when the maximum homopolymer runlength (RLL)
of the DNA strand is large and the GC-content is either too high or too
low, the DNA synthesis and sequencing processes are prone to
substitution, deletion and insertion errors. To reduce errors in DNA
synthesis and sequencing, we require that the DNA storage channel
satisfies both k -RLL and strong-( l ,δ)-locally-GC-balanced constraints,
where the former refers to the maximum homopolymer runlength in each
sequence is at most k , and the latter refers to the number of G and C
of every length-( l ’ ≥ l) subsequence is bounded between [ l ’/2-δ, l ’
/2+δ]. This constrained channel allows DNA data storage system to be
less prone to errors during synthesis and sequencing and improves the
success rate of Polymerase Chain Reaction (PCR) amplification. We
propose a method to calculate the channel capacity. In particular, we
provide a relationship between the 4-ary constrained channel capacity
and the 2-ary constrained channel capacity, which makes it simpler to
calculate the 4-ary constrained channel capacity.
タイトル:Generalization of Fibonacci Codes to the Non-Binary Case
著者名:Shmuel T. Klein; Tamar C. Serebro; Dana Shapira
雑誌名:IEEE Access
号数:-
DOI:10.1109/ACCESS.2022.3214820
Abstract:Motivated by expected technological developments in which the
basic unit of storage might possibly be d -ary elements, with d > 2, and
not just bits, we extend the traditional Fibonacci code to non-binary
codes of higher order, and prove their theoretical properties, as
follows: (1) these codes are fixed in advance , and therefore do not
need to be generated for each new probability distribution, yielding
very simple and fast encoding and decoding procedures; (2) the codes are
prefix-free : no codeword in the code is the prefix of any other
codeword; (3) the codes are complete : if one adjoins any other d -ary
string as an additional codeword, the obtained extended set of codewords
is not uniquely decipherable anymore, and is therefore not useful as it
might lead to ambiguities; and (4) the codes provide robustness against
decoding errors: the number of lost codewords in case of an error will
be limited. An error is defined as a d -ary digit changing its value, or
an insertion of an extraneous d -ary digit, or an erroneous deletion of
a d -ary digit. We provide experimental results on the compression
performance, illustrating that the compression efficiency of non-binary
Fibonacci codes is very close to the savings achieved by the
corresponding non-binary Huffman coding of the same order, while
providing simplicity and robustness.