Performance Improvement of K-mer counting in DNA Sequence using Cache efficient Bloom filter and recursive hash function

Prakasam, Elakkiya; Manoharan, Arun

doi:https://dx.doi.org/10.12785/ijcds/120182

dc.contributor.author	Prakasam, Elakkiya
dc.contributor.author	Manoharan, Arun
dc.date.accessioned	2022-10-30T21:25:38Z
dc.date.available	2022-10-30T21:25:38Z
dc.date.issued	2022-10-30
dc.identifier.issn	2210-142X
dc.identifier.uri	https://journal.uob.edu.bh:443/handle/123456789/4667
dc.description.abstract	K-mer counting (counting the substrings of length k in a DNA sequence) plays an important role in genome assembly, sequence analysis, and error correction of sequence reads. In genomic data sets, k-mers that occur only once take up much of the storage space and are more likely to contain sequencing errors. Error correction therefore plays a significant role in eliminating such uninformative k-mers. The Bloom filter data structure has frequently been used in k-mer counting to determine which k-mers occur at least twice in a DNA sequence data set, owing to its low memory usage and fast querying. The standard Bloom filter used in k-mer counting is not cache efficient, because a single k-mer insertion or query may access locations anywhere in the Bloom filter's memory. In addition, MurmurHash consumes considerable time when hashing the k-mers of the input sequence. In this work, we further improve k-mer counting by adopting a different Bloom filter architecture, the partitioned Bloom filter. The proposed architecture is cache efficient and needs only one memory access per k-mer, instead of the standard Bloom filter's one access per hash function. Hashing the k-mers of the input sequence with the rolling hash of the ntHash function further reduces the hash computation time. The proposed architecture was compared with the standard architecture, and the results showed that the proposed k-mer counter significantly reduced the time to load k-mers into memory and to query them, across different data sets.	en_US
dc.language.iso	en	en_US
dc.publisher	University of Bahrain	en_US
dc.subject	K-mer counting, Bloom filter, Recursive Hash function, Genome Assembly	en_US
dc.title	Performance Improvement of K-mer counting in DNA Sequence using Cache efficient Bloom filter and recursive hash function	en_US
dc.type	Article	en_US
dc.identifier.doi	https://dx.doi.org/10.12785/ijcds/120182
dc.volume	12	en_US
dc.issue	1	en_US
dc.pagestart	1019	en_US
dc.pageend	1027	en_US
dc.contributor.authoraffiliation	School of Electronics Engineering, Vellore Institute of Technology, Vellore, Tamilnadu, India	en_US
dc.source.title	International Journal of Computing and Digital Systems	en_US
dc.abbreviatedsourcetitle	IJCDS	en_US
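The abstract combines two ideas: a Bloom filter laid out so that each k-mer needs only one memory access, and a recursive (rolling) hash in the style of ntHash that updates a k-mer's hash in O(1) as the window slides. The following is a minimal Python sketch of both, not the authors' implementation. Several details are assumptions: the 512-bit (64-byte, cache-line-sized) block, the illustrative per-base seed constants, the way bit positions are derived from one 64-bit hash, forward-strand-only hashing without canonical k-mers, input limited to A/C/G/T, and a simple single-filter "seen once / seen again" rule for keeping repeated k-mers. A blocked layout is one common way to get a single memory access per key. Whether it matches the paper's partitioned layout exactly is also an assumption.

```python
from typing import Iterator, Set, Tuple

MASK64 = (1 << 64) - 1

# Illustrative 64-bit seed per base. Assumption: arbitrary constants chosen
# for this sketch; the ntHash library defines its own seed table.
SEED = {
    "A": 0x3C8BFBB395C60474,
    "C": 0x3193C18562A02B4C,
    "G": 0x20323ED082572324,
    "T": 0x295549F54BE24456,
}


def rol(x: int, r: int) -> int:
    """Rotate a 64-bit value left by r bits."""
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK64


def rolling_hashes(seq: str, k: int) -> Iterator[Tuple[str, int]]:
    """Yield (k-mer, hash) for each window, forward strand only.

    H = rol(SEED[s0], k-1) ^ rol(SEED[s1], k-2) ^ ... ^ SEED[s_{k-1}];
    each slide costs O(1): H' = rol(H, 1) ^ rol(SEED[out], k) ^ SEED[in].
    Assumes seq contains only A, C, G, T (no N handling).
    """
    if len(seq) < k:
        return
    h = 0
    for j in range(k):
        h ^= rol(SEED[seq[j]], k - 1 - j)
    yield seq[:k], h
    for i in range(1, len(seq) - k + 1):
        h = rol(h, 1) ^ rol(SEED[seq[i - 1]], k) ^ SEED[seq[i + k - 1]]
        yield seq[i:i + k], h


class BlockedBloomFilter:
    """Bloom filter that puts all bits of a key inside one 512-bit block.

    512 bits = 64 bytes, a common cache-line size, so in a native
    implementation each insert/query touches a single cache line.
    """

    BLOCK_BITS = 512

    def __init__(self, num_blocks: int, num_hashes: int) -> None:
        self.num_blocks = num_blocks
        self.num_hashes = num_hashes
        self.blocks = [0] * num_blocks  # each entry holds one 512-bit block

    def _locate(self, h: int) -> Tuple[int, int]:
        # Hypothetical layout: low bits pick the block; bits 32-63 drive
        # double hashing within it. The odd step gives distinct positions.
        block = h % self.num_blocks
        h1 = (h >> 32) & 0xFFFF
        h2 = (h >> 48) | 1
        mask = 0
        for i in range(self.num_hashes):
            mask |= 1 << ((h1 + i * h2) % self.BLOCK_BITS)
        return block, mask

    def add(self, h: int) -> bool:
        """Insert h; return True if it was (probably) present already."""
        block, mask = self._locate(h)
        present = (self.blocks[block] & mask) == mask
        self.blocks[block] |= mask
        return present

    def __contains__(self, h: int) -> bool:
        block, mask = self._locate(h)
        return (self.blocks[block] & mask) == mask


def kmers_seen_twice(seq: str, k: int, num_blocks: int = 1024,
                     num_hashes: int = 4) -> Set[str]:
    """Return k-mers that (probably) occur at least twice in seq.

    Assumed scheme: the filter remembers every k-mer seen once;
    a k-mer that hits the filter again is kept as a repeat. Bloom false
    positives can let a few single-occurrence k-mers through.
    """
    seen = BlockedBloomFilter(num_blocks, num_hashes)
    repeated: Set[str] = set()
    for kmer, h in rolling_hashes(seq, k):
        if seen.add(h):
            repeated.add(kmer)
    return repeated
```

For example, `kmers_seen_twice("ACGTACGTTT", 4)` always contains `"ACGT"`. Both occurrences produce the same rolling hash, and a Bloom filter has no false negatives. The cache benefit only shows up in a native version, where `blocks` would be an aligned array of 64-byte blocks. Python integers merely model the layout.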