A robust approach for improved prediction of E.coli promoter gene sequences: combining feature selection, fuzzy weighted pre-processing and AIRS

Document Type: Original Article

Authors

Selcuk University, Electrical and Electronics Engineering Department, 42035, Konya, TURKEY.

Abstract

In this paper, a hybrid approach that combines feature selection, fuzzy weighted
pre-processing and the Artificial Immune Recognition System (AIRS) is proposed to
predict E.coli promoter gene sequences, in which promoters occur in strings of
nucleotides (each one of A, G, T, or C). The proposed approach comprises three
stages. In the first stage, the dimensionality of the dataset is reduced from 57
attributes to 4 by a feature selection process based on C4.5 decision tree
rules. In the second stage, fuzzy weighted pre-processing is used to weight the
resulting 4-attribute dataset into the interval [0,1]. Finally, the AIRS
classifier, which is inspired by the immune system, is run to predict the E.coli
promoter gene sequences. While the AIRS algorithm alone obtained 53.85% prediction
accuracy on the E.coli promoter gene sequences using a 50-50% training-test split,
the proposed method obtained 90.38% prediction accuracy under the same conditions.
This result suggests that the proposed system is robust and effective for the
prediction of E.coli promoter gene sequences.
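
The first stage can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the authors' procedure: the paper selects attributes from C4.5 decision tree rules, whereas the code below only ranks nucleotide positions by gain ratio, the split criterion that C4.5 uses. The function names (entropy, gain_ratio, top_positions) and the six toy sequences are hypothetical and are not taken from the dataset.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(column, labels):
    """Gain ratio of one categorical attribute with respect to the labels."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after splitting on this attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = entropy(column)
    if split_info == 0:
        return 0.0  # attribute takes a single value, so it cannot split
    return (entropy(labels) - remainder) / split_info


def top_positions(sequences, labels, k):
    """Indices of the k sequence positions with the highest gain ratio."""
    scores = [(gain_ratio([s[i] for s in sequences], labels), i)
              for i in range(len(sequences[0]))]
    ranked = sorted(scores, key=lambda t: (-t[0], t[1]))
    return sorted(i for _, i in ranked[:k])


# Toy data (assumption): '+' marks a promoter, '-' a non-promoter.
sequences = ["ACGTAC", "AGGTCA", "ATCTGA", "TCGAAC", "TGGACA", "TTCAGA"]
labels = ["+", "+", "+", "-", "-", "-"]

print(top_positions(sequences, labels, 2))  # → [0, 3]
```

In this toy data only positions 0 and 3 separate the two classes, so they are the ones kept. On the real dataset the same call with k = 4 would reduce the 57 attributes to 4, although the positions chosen need not match those obtained from the C4.5 rules.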

Keywords