Correlation-based feature selection of single cell transcriptomics data from multiple sources

Mitić, Nenad; Malkov, Saša; Maljković Ružičić, Mirjana; Veljković, Aleksandar N.; Čukić, Ivan; Lin, Xin; Lyu, Minjie; Brusić, Vladimir

Please use this identifier to cite or link to this item: https://research.matf.bg.ac.rs/handle/123456789/1407

DC Field	Value	Language
dc.contributor.author	Mitić, Nenad	en_US
dc.contributor.author	Malkov, Saša	en_US
dc.contributor.author	Maljković Ružičić, Mirjana	en_US
dc.contributor.author	Veljković, Aleksandar N.	en_US
dc.contributor.author	Čukić, Ivan	en_US
dc.contributor.author	Lin, Xin	en_US
dc.contributor.author	Lyu, Minjie	en_US
dc.contributor.author	Brusić, Vladimir	en_US
dc.date.accessioned	2025-01-16T07:40:54Z	-
dc.date.available	2025-01-16T07:40:54Z	-
dc.date.issued	2025-01-06	-
dc.identifier.uri	https://research.matf.bg.ac.rs/handle/123456789/1407	-
dc.description.abstract	When applying data mining or machine learning techniques to large and diverse datasets, it is often necessary to construct descriptive and predictive models. Descriptive models are used to discover relationships between the attributes of the data while predictive models identify the characteristics of the data that will be collected in the future. Bioinformatics data is high-dimensional, making it practically impossible to apply the majority of “classical” algorithms for classification and clustering. Even if the algorithms are useful, training with large multidimensional data significantly increases processing time. The algorithms specialized for working with high-dimensional data often cannot process data containing large data sets with several thousand dimensions (features). Dimension reduction methods (such as PCA) do not provide satisfactory results, and also obscure the meaning of the original attributes in the data. For the constructed models to be usable, they must fulfill the requirement of scalability, as the amount of bioinformatics data is increasing rapidly. Furthermore, the significance of individual data features can differ from source to source. This paper describes an attribute selection method for efficient classification of high-dimensional (30,698) transcriptomics data collected from different sources. The proposed method was tested with 22 classification algorithms. The classification results for the selected attribute sets are comparable to the results for the complete attribute set.	en_US
dc.language.iso	en	en_US
dc.publisher	Springer	en_US
dc.relation.ispartof	Journal of Big Data	en_US
dc.subject	Classification	en_US
dc.subject	Feature selection	en_US
dc.subject	High-dimensional data	en_US
dc.subject	Transcriptomics data	en_US
dc.title	Correlation-based feature selection of single cell transcriptomics data from multiple sources	en_US
dc.type	Article	en_US
dc.identifier.doi	10.1186/s40537-024-01051-z	-
dc.identifier.scopus	2-s2.0-85214266065	-
dc.identifier.isi	001390532000001	-
dc.identifier.url	https://api.elsevier.com/content/abstract/scopus_id/85214266065	-
dc.contributor.affiliation	Informatics and Computer Science	en_US
dc.contributor.affiliation	Informatics and Computer Science	en_US
dc.contributor.affiliation	Informatics and Computer Science	en_US
dc.contributor.affiliation	Informatics and Computer Science	en_US
dc.relation.issn	2196-1115	en_US
dc.description.rank	M21a+	en_US
dc.relation.firstpage	Article no. 4	en_US
dc.relation.volume	12	en_US
dc.relation.issue	1	en_US
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.grantfulltext	none	-
item.cerifentitytype	Publications	-
item.fulltext	No Fulltext	-
item.openairetype	Article	-
item.languageiso639-1	en	-
crisitem.author.dept	Informatics and Computer Science	-
crisitem.author.dept	Informatics and Computer Science	-
crisitem.author.dept	Informatics and Computer Science	-
crisitem.author.dept	Informatics and Computer Science	-
crisitem.author.orcid	0000-0002-4385-6322	-
crisitem.author.orcid	0000-0002-4390-9631	-
crisitem.author.orcid	0000-0001-5358-0828	-
Appears in Collections:	Research outputs

Scopus citations: 2 (checked on Feb 8, 2026)
Page views: 7 (checked on Jan 19, 2025)
