Metagenomic binning based on unsupervised extreme learning machine
Date
2023
Abstract
Metagenomics studies the genetic information of microbial communities in different contexts. Because metagenomic DNA is usually fragmented and sequenced as short reads, these reads can be assembled into longer sequences called contigs. An important step in the metagenomic analysis pipeline is binning, which corresponds to the classification (supervised) or clustering (unsupervised) of reads or contigs. For unsupervised binning, several machine learning algorithms have been employed that cluster sequences using DNA sequence descriptors such as k-mer frequency and GC content. This paper proposes the use of Unsupervised Extreme Learning Machines (US-ELM) for metagenomic binning. The experiments use three datasets containing different numbers of species and compare the results obtained by US-ELM with those of the k-means and Expectation-Maximization (EM) algorithms. The performance comparison employs metrics widely used for this problem: accuracy, the Rand index, and clustering computation time. The experiments show that US-ELM outperforms the other two clustering methods in accuracy by a wide margin. In terms of computational cost, US-ELM is comparable to k-means, and both algorithms are much faster than EM. The numerical results indicate that US-ELM has promising potential for the metagenomic binning problem.
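The abstract names the sequence descriptors (k-mer frequency and GC content) and one of the evaluation metrics (the Rand index) without defining them. The following is a minimal Python sketch of their standard definitions, not the authors' implementation. The function names, the default k = 4, and the handling of ambiguous bases are assumptions made for illustration; the abstract does not state which k or which preprocessing the paper uses.

```python
from collections import Counter
from itertools import combinations, product


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def kmer_frequencies(seq, k=4):
    """Relative frequency of every possible k-mer over ACGT, in lexicographic order.

    k=4 is an illustrative default (assumption). Windows containing other
    symbols, such as N, are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def rand_index(labels_true, labels_pred):
    """Fraction of item pairs on which the two partitions agree.

    A pair agrees when both partitions put the two items in the same group,
    or both put them in different groups.
    """
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

A feature vector for a contig could then be formed as `kmer_frequencies(contig) + [gc_content(contig)]` and passed to any clustering algorithm. The Rand index does not depend on how clusters are numbered, so it can compare a clustering against reference species labels directly.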
Source
IEEE CHILEAN Conference on Electrical, Electronics Engineering, Information and Communication Technologies (CHILECON), Valdivia, Chile, 1-6
DOI
doi.org/10.1109/CHILECON60335.2023.10418667