Random open syncmers

source link: http://lh3.github.io/2022/10/21/random-open-syncmers

21 October 2022

A $k$-long sequence $P$ is a $(k,s)$-open-syncmer, $s\le k$, if $P[1,s]$ is the smallest among all $s$-mers in $P$. Suppose function $\phi$ is a bijective hash function of $k$-long sequences. $P$ is a random $(k,s)$-syncmer if $\phi(P)$ is an open syncmer. Because we often map $k$-mers to integers, $\phi$ can take the form of an invertible integer hash function. In practice, $\phi$ does not have to be a bijection. It can also map a sequence to an integer of a different length or even operate in the bit space (see the miniprot preprint).
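
The definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: DNA is packed at 2 bits per base, the hashed integer is read back as $k$ 2-bit symbols, ties are resolved in favor of the first $s$-mer, and `phi` is a hypothetical stand-in for an invertible integer hash, not the function used in minimap2 or miniprot.

```python
def encode(kmer):
    """Pack an ACGT string into an integer, 2 bits per base, first base in the high bits."""
    x = 0
    for c in kmer:
        x = x << 2 | "ACGT".index(c)
    return x

def is_open_syncmer(x, k, s):
    """True if the first s-mer of the k-mer encoded in x is the smallest of its k-s+1 s-mers."""
    mask = (1 << 2 * s) - 1
    smers = [(x >> 2 * (k - s - i)) & mask for i in range(k - s + 1)]
    return smers[0] == min(smers)  # assumption: a tie counts in favor of the first s-mer

def phi(x, k):
    """Hypothetical invertible hash on 2k-bit integers (an assumed stand-in)."""
    bits = 2 * k
    mask = (1 << bits) - 1
    x = (x * 0x9E3779B97F4A7C15) & mask  # odd multiplier: a bijection modulo 2^bits
    x ^= x >> (bits // 2)                # xor-shift: also a bijection
    x = (x * 0xBF58476D1CE4E5B9) & mask
    x ^= x >> (bits // 2)
    return x

def is_random_open_syncmer(kmer, s):
    """Apply the open-syncmer test to phi(P) instead of P itself."""
    k = len(kmer)
    return is_open_syncmer(phi(encode(kmer), k), k, s)

print(is_open_syncmer(encode("ACGT"), 4, 2))          # plain open syncmer test
print(is_random_open_syncmer("GATTACAGATTACAG", 11))  # random (15,11) variant
```

Because every step of `phi` is a bijection on $2k$-bit integers, distinct $k$-mers never collide. A non-bijective hash would also work in practice, as noted above.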

Because overlapping $k$-mers are not independent of each other, the definition of the original open syncmer often involves one more parameter to improve its quality. Original open syncmers also do not work well with protein sequences, where amino acid frequencies vary. With a good hash function, random open syncmers do not have these problems.

I implemented random open syncmers in minimap2. In comparison to random minimizers of the same density, syncmers lead to better chaining scores but are more repetitive. This is partly because $(k,w)$-minimizers select $k$-mers from windows of length $k+w-1$ and thus, to some extent, behave as if slightly longer $k$-mers were used. Due to the repetitiveness, syncmers slow down minimap2 chaining a lot, similar to the observation made by Shaw and Yu (2022). I tried a few different syncmer configurations and found minimizers and syncmers to be comparable overall. In a practical implementation, it probably does not matter which strategy is used. Nonetheless, in theoretical analysis, random open syncmers are the better choice as they are largely independent of each other under a good hash function.
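
To make "the same density" concrete, the sketch below samples a random DNA sequence with both schemes. It is not the minimap2 implementation. It assumes the usual approximations of $2/(w+1)$ for random minimizers and $1/(k-s+1)$ for open syncmers, so $(15,9)$-minimizers and $(15,11)$-syncmers should both land near 0.2. The hash `phi` is a hypothetical invertible mixer, and all function names are made up for this example.

```python
import random

def phi(x, bits):
    """Hypothetical invertible mixer on `bits`-bit integers (an assumed stand-in hash)."""
    mask = (1 << bits) - 1
    x = (x * 0x9E3779B97F4A7C15) & mask
    x ^= x >> (bits // 2)
    x = (x * 0xBF58476D1CE4E5B9) & mask
    x ^= x >> (bits // 2)
    return x

def hashed_kmers(seq, k):
    """Hash of every k-mer in seq, in order, using 2 bits per base."""
    mask = (1 << 2 * k) - 1
    x, out = 0, []
    for i, c in enumerate(seq):
        x = (x << 2 | "ACGT".index(c)) & mask
        if i >= k - 1:
            out.append(phi(x, 2 * k))
    return out

def minimizer_positions(hashes, w):
    """Positions of k-mers with the smallest hash in some window of w consecutive k-mers."""
    picked = set()
    for i in range(len(hashes) - w + 1):
        window = hashes[i:i + w]
        picked.add(i + window.index(min(window)))  # leftmost minimum in the window
    return picked

def syncmer_positions(hashes, k, s):
    """Positions of k-mers whose hash, read as k 2-bit symbols, starts with its smallest s-mer."""
    mask = (1 << 2 * s) - 1
    picked = set()
    for i, h in enumerate(hashes):
        smers = [(h >> 2 * (k - s - j)) & mask for j in range(k - s + 1)]
        if smers[0] == min(smers):
            picked.add(i)
    return picked

random.seed(1)
seq = "".join(random.choice("ACGT") for _ in range(20000))
k, w, s = 15, 9, 11
hashes = hashed_kmers(seq, k)
mini = minimizer_positions(hashes, w)
sync = syncmer_positions(hashes, k, s)
print("minimizer density:", len(mini) / len(hashes))
print("syncmer density:  ", len(sync) / len(hashes))
```

Note that the syncmer test looks at one $k$-mer at a time, while a minimizer is only defined relative to its neighbors in a window. This is the context-free property that makes syncmers easier to analyze.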

