削除符号論文メモ:2022/12/06

タイトル:A Model for Optimal Assignment of Non-Uniquely Mapped NGS Reads in DNA Regions of Duplications or Deletions
著者名:Rituparna Sinha; Rajat K. Pal; Rajat K. De
雑誌名:2022 2nd International Conference on Intelligent Technologies (CONIT)
号数:-
DOI:10.1109/CONIT55038.2022.9848131

Abstract:Massively parallel sequencers have enabled genome sequences to be available at a very low cost and price, which opened huge scope on analyzing human genome sequences from different perspectives, thereby the association of diseases with genetic alterations gets further enlightened. However, the sequencing process and alignment of NGS technology based short reads suffer from various sequencing biases which needs to be addressed. In this work, the mappability bias occurring with respect to repeat rich regions of the DNA have been addressed in a novel approach. A model has been designed which considers all non-uniquely mapped reads and performs a pipeline of computations to  allocate the reads to an optimal location, due to which the precise detection of 
breakpoints in the region of duplications and deletions are obtained. In addition, the application of this model for mappability bias correction, prior to the detection of structurally altered regions of the genome, leads to a better sensitivity value.
タイトル:Deletions and Insertions of the Symbol “0” and Asymmetric/
Unidirectional Error Control Codes for the L1 Metric
著者名:Luca G. Tallini; Nawaf Alqwaifly; Bella Bose
雑誌名:IEEE Transactions on Information Theory
号数:-
DOI:10.1109/TIT.2022.3203594

Abstract:This paper gives some theory and efficient design of binary 
block codes capable of controlling the deletions of the symbol “0” (
referred to as 0-deletions) and/or the insertions of the symbol “0” (
referred to as 0-insertions). This problem of controlling 0-deletions 
and/or 0-insertions (referred to as 0-errors) is shown to be equivalent 
to the efficient design of L 1 metric asymmetric error control codes 
over the natural alphabet, IIN. In this way, it is shown that the t 0-
insertion correcting codes are actually capable of controlling much more;
 namely, they can correct t 0-errors, detect ( t + 1) 0-errors and, 
simultaneously, detect all occurrences of only 0-deletions or only 0-
insertions in every received word (briefly, they are t -Symmetric 0-
Error Correcting/( t +1)-Symmetric 0-Error Detecting/All Unidirectional 
0-Error Detecting ( t -Sy0EC/( t + 1)-Sy0ED/AU0ED) codes). From the 
relations with the L 1 distance error control codes, new improved bounds 
are given for the optimal t 0-error correcting codes. Optimal non-
systematic code designs are given. Decoding can be efficiently performed 
by algebraic means using the Extended Euclidean Algorithm (EEA).
タイトル:Bounds and Constructions for Insertion and Deletion Codes
著者名:Shu Liu; Chaoping Xing
雑誌名:IEEE Transactions on Information Theory
号数:-
DOI:10.1109/TIT.2022.3199503

Abstract:Insertion and deletion (insdel for short) codes have recently attracted a lot of attention due to their applications in many interesting fields such as DNA storage, DNA analysis, race-track memory error correction and language processing. The present paper mainly studies limits and constructions of insdel codes. The paper can be divided into two parts. The first part focuses on various bounds, while the second part concentrates on constructions of insdel codes. Although the insdel-metric Singleton bound has been derived before, it is still unknown if there are any nontrivial codes achieving this bound. Our first result shows that any nontrivial insdel codes do not achieve the insdel-metric Singleton bound. The second bound shows that every [n, k] Reed-Solomon code has insdel distance upper bounded by 2n - 4k + 4 and it is known in literature that an [n, k] Reed-Solomon code can have insdel distance 2n - 4k+4 as long as the field size is sufficiently large. The third bound shows a trade-off between insdel distance and code alphabet size for codes achieving the Hamming-metric Singleton bound. In the second part of the paper, we first provide a non-explicit construction of nonlinear codes that can approach the insdel-metric Singleton bound arbitrarily when the code alphabet size is sufficiently large. The second construction gives two-dimensional Reed-Solomon codes of length n and insdel distance 2n - 4 with field size q = O(n5). The non-explicit construction of insdel codes is based on constant-weight L1-codes that are introduced in this paper. We first establish a relation between constant-weight L1-codes and insdel codes. Based on this relation, we construct constant-weight L1-codes with reasonable parameters and subsequently give insdel codes approaching the insdel-metric Singleton bound. Via automorphism group of rational function field, we provide a necessary and sufficient condition under which a two
dimensional Reed-Solomon code of length n has insdel distance 2n - 4. 
Based on this criterion, we present a construction of q-ary two-dimensional Reed-Solomon codes of length n and insdel distance 2n - 4 with q = O(n5). Though this is worse than the current best field size, 
we provide a new angle to look into the problem.

 

タイトル:Beyond Single-Deletion Correcting Codes: Substitutions and 
Transpositions
著者名:Ryan Gabrys; Venkatesan Guruswami; João Ribeiro; Ke Wu
雑誌名:IEEE Transactions on Information Theory
号数:-
DOI:10.1109/TIT.2022.3202856

Abstract:We consider the problem of designing low-redundancy codes in settings where one must correct deletions in conjunction with substitutions or adjacent transpositions; a combination of errors that is usually observed in DNA-based data storage. One of the most basic versions of this problem was settled more than 50 years ago by Levenshtein, who proved that binary Varshamov-Tenengolts codes correct one arbitrary edit error, i.e., one deletion or one substitution, with nearly optimal redundancy. However, this approach fails to extend to many simple and natural variations of the binary single-edit error setting. In this work, we make progress on the code design problem above in three such variations: • We construct linear-time encodable and decodable length- n non-binary codes correcting a single edit error with nearly optimal redundancy log n + O (log log n ), providing an alternative simpler proof of a result by Cai, Chee, Gabrys, Kiah, and Nguyen (IEEE Trans. Inf. Theory 2021). This is achieved by employing what we call weighted VT sketches , a new notion that may be of independent interest. • We show the existence of a binary code correcting one deletion or one adjacent transposition with nearly optimal redundancy log n + O (log log n ). • We construct linear-time encodable and list-decodable binary codes with list-size 2 for one deletion and one substitution with redundancy 4 log n + O (log log n ). 
This matches the Gilbert-Varshamov existential bound up to an O (log log n ) additive term.