An alignment-free measure based on physicochemical properties of amino acids for protein sequence comparison
Sequence comparison is an important topic in bioinformatics. With the exponential growth in the number of biological sequences, traditional alignment-based methods for protein sequence comparison have become limited, and many alignment-free methods have been proposed over the past two decades. In this paper, we consider not only six typical physicochemical properties of amino acids but also their frequency and positional distribution. Each protein sequence is described by a 51-dimensional vector. We obtain a pairwise distance matrix by computing the standardized Euclidean distance between these vectors, on which discriminant analysis and phylogenetic analysis can then be performed. The results on the Influenza A virus and ND5 datasets indicate that our method is accurate and efficient for classifying proteins and inferring the phylogeny of species.
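The distance step can be illustrated with a minimal sketch. It assumes the usual definition of the standardized Euclidean distance, in which each squared component difference is divided by that component's sample variance across all sequences. The function name `standardized_euclidean_matrix` and the small 3-dimensional `features` are hypothetical stand-ins for the 51-dimensional vectors, whose construction is not reproduced here.

```python
import math

def standardized_euclidean_matrix(vectors):
    """Pairwise standardized Euclidean distances between equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    # Assumption: sample variance (n - 1 denominator) of each component,
    # taken across all sequences in the dataset.
    means = [sum(v[k] for v in vectors) / n for k in range(dim)]
    variances = [
        sum((v[k] - means[k]) ** 2 for v in vectors) / (n - 1)
        for k in range(dim)
    ]
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = 0.0
            for k in range(dim):
                # A component that is constant across sequences has zero
                # variance and zero difference, so it is skipped.
                if variances[k] > 0:
                    total += (vectors[i][k] - vectors[j][k]) ** 2 / variances[k]
            dist[i][j] = dist[j][i] = math.sqrt(total)
    return dist

# Hypothetical feature vectors for three sequences (illustration only).
features = [
    [1.0, 10.0, 0.5],
    [2.0, 30.0, 0.5],
    [3.0, 20.0, 0.5],
]
D = standardized_euclidean_matrix(features)
```

The resulting symmetric matrix `D` is the kind of input that distance-based discriminant or tree-building procedures take. Dividing by the variance keeps components measured on large scales from dominating the distance.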