Imbalanced binary dataset containing protein traits for predicting their cellular localization sites.

ecoli1

Format

A data frame with 336 instances, 77 of which belong to the positive class, and 8 variables:

Mcg

McGeoch's method for signal sequence recognition. Continuous attribute.

Gvh

Von Heijne's method for signal sequence recognition. Continuous attribute.

Lip

Von Heijne's Signal Peptidase II consensus sequence score. Discrete attribute.

Chg

Presence of charge on N-terminus of predicted lipoproteins. Discrete attribute.

Aac

Score of discriminant analysis of the amino acid content of outer membrane and periplasmic proteins. Continuous attribute.

Alm1

Score of the ALOM membrane spanning region prediction program. Continuous attribute.

Alm2

Score of the ALOM program after excluding putative cleavable signal regions from the sequence. Continuous attribute.

Class

Two possible classes: positive (type im), negative (the rest).
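
The binary labelling and the degree of imbalance can be illustrated with a short calculation. This is a minimal sketch in Python, not part of the package: the helper name to_binary and the toy label list are hypothetical, and the label strings other than im are assumed site codes used only for illustration. The only figures taken from this page are the 336 instances and 77 positives.

```python
def to_binary(site: str) -> str:
    """Map a localization-site label to the binary class (hypothetical helper)."""
    # Positive class is type "im"; every other site is negative.
    return "positive" if site == "im" else "negative"


# Toy labels for illustration only; these are NOT rows from the dataset.
toy_sites = ["cp", "im", "pp", "im", "om"]
toy_classes = [to_binary(s) for s in toy_sites]

# Counts stated in the Format section of this page.
n_total = 336
n_positive = 77

# Everything that is not positive is negative.
n_negative = n_total - n_positive

# Imbalance ratio: majority-class size divided by minority-class size.
imbalance_ratio = n_negative / n_positive

print(toy_classes)
print(f"negative instances: {n_negative}")
print(f"imbalance ratio: {imbalance_ratio:.2f}")
```

Under these counts there are 259 negative instances, which gives an imbalance ratio of roughly 3.36 negatives per positive.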

Source

KEEL Repository.

See also

The original dataset is available in the UCI ML Repository.