Classical studies show that for many proteins, the information required for specifying the tertiary structure is contained in the amino acid sequence. Here, we attempt to define the sequence rules for specifying a protein fold by computationally creating artificial protein sequences using only statistical information encoded in a multiple sequence alignment and no tertiary structure information. Experimental testing of libraries of artificial WW domain sequences shows that a simple statistical energy function capturing coevolution between amino acid residues is necessary and sufficient to specify sequences that fold into native structures. The artificial proteins show thermodynamic stabilities similar to natural WW domains, and structure determination of one artificial protein shows excellent agreement with the WW fold at atomic resolution. The relative simplicity of the information used for creating sequences suggests a marked reduction to the potential complexity of the protein-folding problem.
ASJC Scopus subject areas