Imbalanced binary dataset of protein features for predicting the cellular localization sites of proteins.
ecoli1
A data frame with 336 instances (77 of which belong to the positive class) and 8 variables:
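The class counts above determine how skewed the dataset is. The sketch below computes the imbalance ratio (IR), using the common convention that IR is the majority-class count divided by the minority-class count. That convention is an assumption here; the description itself only gives the raw counts.

```python
# Class counts from the description: 336 instances, 77 of them positive.
n_total = 336
n_positive = 77
n_negative = n_total - n_positive  # remaining (negative) instances

# Imbalance ratio (assumed convention): majority count / minority count.
imbalance_ratio = n_negative / n_positive
positive_share = n_positive / n_total

print(f"negatives={n_negative}, IR={imbalance_ratio:.2f}, positive share={positive_share:.1%}")
```

With these counts, roughly 3.4 negative instances exist for every positive one. This is why plain accuracy can be misleading on this dataset.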
McGeoch's method for signal sequence recognition. Continuous attribute.
von Heijne's method for signal sequence recognition. Continuous attribute.
von Heijne's Signal Peptidase II consensus sequence score. Discrete attribute.
Presence of charge on N-terminus of predicted lipoproteins. Discrete attribute.
Score of discriminant analysis of the amino acid content of outer membrane and periplasmic proteins. Continuous attribute.
Score of the ALOM membrane spanning region prediction program. Continuous attribute.
Score of the ALOM program after excluding putative cleavable signal regions from the sequence. Continuous attribute.
Class label with two possible values: positive (type im, inner membrane without signal sequence) and negative (all other types).
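The binary label collapses the original multi-class localization labels. The following is a minimal sketch of that relabeling. It assumes rows in the whitespace-separated layout of the UCI `ecoli.data` file: a sequence name, the 7 numeric attributes, and the class label. The helper `binarize_ecoli` and the sample rows are hypothetical and for illustration only. The sample values are made up and do not come from the dataset.

```python
from typing import List, Tuple


def binarize_ecoli(lines: List[str]) -> List[Tuple[List[float], str]]:
    """Hypothetical helper: parse UCI-style ecoli rows and binarize the class.

    Each line is assumed to hold: sequence name, 7 numeric attributes, class label.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        features = [float(x) for x in fields[1:8]]  # drop the sequence name
        label = fields[8]
        # Exact match on "im": imU, imL and imS are separate classes and count as negative.
        rows.append((features, "positive" if label == "im" else "negative"))
    return rows


# Made-up rows for illustration only (not taken from the dataset).
sample = [
    "SEQ_A  0.50 0.40 0.48 0.50 0.40 0.60 0.55 im",
    "SEQ_B  0.30 0.35 0.48 0.50 0.45 0.30 0.35 cp",
    "SEQ_C  0.60 0.55 0.48 0.50 0.50 0.70 0.65 imU",
]
rows = binarize_ecoli(sample)
print([label for _, label in rows])  # ['positive', 'negative', 'negative']
```

The comparison uses equality rather than a prefix test. A check such as `label.startswith("im")` would wrongly mark the related but distinct inner-membrane classes as positive.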
The original dataset is available in the UCI Machine Learning Repository.