タイトル：A Model for Optimal Assignment of Non-Uniquely Mapped NGS Reads in DNA Regions of Duplications or Deletions 著者名：Rituparna Sinha; Rajat K. Pal; Rajat K. De 雑誌名：2022 2nd International Conference on Intelligent Technologies (CONIT) 号数：- DOI:10.1109/CONIT55038.2022.9848131 Abstract：Massively parallel sequencers have enabled genome sequences to be available at a very low cost and price, which opened huge scope on analyzing human genome sequences from different perspectives, thereby the association of diseases with genetic alterations gets further enlightened. However, the sequencing process and alignment of NGS technology based short reads suffer from various sequencing biases which needs to be addressed. In this work, the mappability bias occurring with respect to repeat rich regions of the DNA have been addressed in a novel approach. A model has been designed which considers all non-uniquely mapped reads and performs a pipeline of computations to allocate the reads to an optimal location, due to which the precise detection of breakpoints in the region of duplications and deletions are obtained. In addition, the application of this model for mappability bias correction, prior to the detection of structurally altered regions of the genome, leads to a better sensitivity value.

タイトル：Deletions and Insertions of the Symbol “0” and Asymmetric/ Unidirectional Error Control Codes for the L1 Metric 著者名：Luca G. Tallini; Nawaf Alqwaifly; Bella Bose 雑誌名：IEEE Transactions on Information Theory 号数：- DOI:10.1109/TIT.2022.3203594 Abstract：This paper gives some theory and efficient design of binary block codes capable of controlling the deletions of the symbol “0” ( referred to as 0-deletions) and/or the insertions of the symbol “0” ( referred to as 0-insertions). This problem of controlling 0-deletions and/or 0-insertions (referred to as 0-errors) is shown to be equivalent to the efficient design of L 1 metric asymmetric error control codes over the natural alphabet, IIN. In this way, it is shown that the t 0- insertion correcting codes are actually capable of controlling much more; namely, they can correct t 0-errors, detect ( t + 1) 0-errors and, simultaneously, detect all occurrences of only 0-deletions or only 0- insertions in every received word (briefly, they are t -Symmetric 0- Error Correcting/( t +1)-Symmetric 0-Error Detecting/All Unidirectional 0-Error Detecting ( t -Sy0EC/( t + 1)-Sy0ED/AU0ED) codes). From the relations with the L 1 distance error control codes, new improved bounds are given for the optimal t 0-error correcting codes. Optimal non- systematic code designs are given. Decoding can be efficiently performed by algebraic means using the Extended Euclidean Algorithm (EEA).

タイトル：Bounds and Constructions for Insertion and Deletion Codes 著者名：Shu Liu; Chaoping Xing 雑誌名：IEEE Transactions on Information Theory 号数：- DOI:10.1109/TIT.2022.3199503 Abstract：Insertion and deletion (insdel for short) codes have recently attracted a lot of attention due to their applications in many interesting fields such as DNA storage, DNA analysis, race-track memory error correction and language processing. The present paper mainly studies limits and constructions of insdel codes. The paper can be divided into two parts. The first part focuses on various bounds, while the second part concentrates on constructions of insdel codes. Although the insdel-metric Singleton bound has been derived before, it is still unknown if there are any nontrivial codes achieving this bound. Our first result shows that any nontrivial insdel codes do not achieve the insdel-metric Singleton bound. The second bound shows that every [n, k] Reed-Solomon code has insdel distance upper bounded by 2n - 4k + 4 and it is known in literature that an [n, k] Reed-Solomon code can have insdel distance 2n - 4k+4 as long as the field size is sufficiently large. The third bound shows a trade-off between insdel distance and code alphabet size for codes achieving the Hamming-metric Singleton bound. In the second part of the paper, we first provide a non-explicit construction of nonlinear codes that can approach the insdel-metric Singleton bound arbitrarily when the code alphabet size is sufficiently large. The second construction gives two-dimensional Reed-Solomon codes of length n and insdel distance 2n - 4 with field size q = O(n5). The non-explicit construction of insdel codes is based on constant-weight L1-codes that are introduced in this paper. We first establish a relation between constant-weight L1-codes and insdel codes. Based on this relation, we construct constant-weight L1-codes with reasonable parameters and subsequently give insdel codes approaching the insdel-metric Singleton bound. Via automorphism group of rational function field, we provide a necessary and sufficient condition under which a two dimensional Reed-Solomon code of length n has insdel distance 2n - 4. Based on this criterion, we present a construction of q-ary two-dimensional Reed-Solomon codes of length n and insdel distance 2n - 4 with q = O(n5). Though this is worse than the current best field size, we provide a new angle to look into the problem.

タイトル：Beyond Single-Deletion Correcting Codes: Substitutions and Transpositions 著者名：Ryan Gabrys; Venkatesan Guruswami; João Ribeiro; Ke Wu 雑誌名：IEEE Transactions on Information Theory 号数：- DOI:10.1109/TIT.2022.3202856 Abstract：We consider the problem of designing low-redundancy codes in settings where one must correct deletions in conjunction with substitutions or adjacent transpositions; a combination of errors that is usually observed in DNA-based data storage. One of the most basic versions of this problem was settled more than 50 years ago by Levenshtein, who proved that binary Varshamov-Tenengolts codes correct one arbitrary edit error, i.e., one deletion or one substitution, with nearly optimal redundancy. However, this approach fails to extend to many simple and natural variations of the binary single-edit error setting. In this work, we make progress on the code design problem above in three such variations: • We construct linear-time encodable and decodable length- n non-binary codes correcting a single edit error with nearly optimal redundancy log n + O (log log n ), providing an alternative simpler proof of a result by Cai, Chee, Gabrys, Kiah, and Nguyen (IEEE Trans. Inf. Theory 2021). This is achieved by employing what we call weighted VT sketches , a new notion that may be of independent interest. • We show the existence of a binary code correcting one deletion or one adjacent transposition with nearly optimal redundancy log n + O (log log n ). • We construct linear-time encodable and list-decodable binary codes with list-size 2 for one deletion and one substitution with redundancy 4 log n + O (log log n ). This matches the Gilbert-Varshamov existential bound up to an O (log log n ) additive term.