Global pentapeptide statistics are far away from expected distributions

Jarosław Poznański; Jan Topiński; Anna Muszewska; Konrad J. Dębski; Marta Hoffman-Sommer; Krzysztof Pawłowski; Marcin Grynberg

doi:10.1038/s41598-018-33433-8

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

The relationships between polypeptide composition, sequence, structure and function have puzzled biologists ever since the first protein sequences were determined. Here, we study the statistics of occurrence of all possible pentapeptide sequences in known proteins. To compensate for the non-uniform distribution of individual amino acid residues in protein sequences, we investigate separately all possible permutations of every given amino acid composition. For the majority of permutation groups we find that pentapeptide occurrences deviate strongly from the expected binomial distributions, and that the observed distributions are also characterized by high numbers of outlier sequences. An analysis of the identified outliers shows that they often contain known motifs and rare amino acids, suggesting that they represent important functional elements. We further compare the pentapeptide composition of regions known to correspond to protein domains with that of non-domain regions. We find that a substantial number of pentapeptides are strongly favored in protein domains. Finally, we show that over-represented pentapeptides are significantly related to known functional motifs and to predicted ancient structural peptides.
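The grouping step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the overlapping-window counting, and the null model (every distinct permutation of a composition equally likely within its group) are assumptions made here for illustration only.

```python
from collections import Counter, defaultdict
from itertools import permutations


def count_pentapeptides(sequences):
    """Count every overlapping 5-residue window across the given sequences."""
    counts = Counter()
    for seq in sequences:
        # Sequences shorter than 5 residues yield an empty range.
        for i in range(len(seq) - 4):
            counts[seq[i:i + 5]] += 1
    return counts


def group_by_composition(counts):
    """Group pentapeptide counts by composition (the sorted residue letters)."""
    groups = defaultdict(dict)
    for peptide, n in counts.items():
        groups["".join(sorted(peptide))][peptide] = n
    return groups


def expected_per_permutation(composition, group):
    """Expected count per distinct permutation under the assumed null model:
    all permutations of one composition are equally likely."""
    n_perms = len(set(permutations(composition)))
    return sum(group.values()) / n_perms
```

For example, calling `count_pentapeptides` on a list of protein sequences and passing the result to `group_by_composition` gives one dictionary per composition. Comparing each observed count with `expected_per_permutation` then indicates which permutations are over- or under-represented within their group.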

Original language	English (US)
Article number	15178
Journal	Scientific Reports
Volume	8
Issue number	1
DOIs	https://doi.org/10.1038/s41598-018-33433-8
State	Published - Dec 1 2018
Externally published	Yes

ASJC Scopus subject areas

General

Cite this

@article{6d04b7d9487f4f8cae1b9c0e7bbf2ad3,

title = "Global pentapeptide statistics are far away from expected distributions",

abstract = "The relationships between polypeptide composition, sequence, structure and function have been puzzling biologists ever since first protein sequences were determined. Here, we study the statistics of occurrence of all possible pentapeptide sequences in known proteins. To compensate for the non-uniform distribution of individual amino acid residues in protein sequences, we investigate separately all possible permutations of every given amino acid composition. For the majority of permutation groups we find that pentapeptide occurrences deviate strongly from the expected binomial distributions, and that the observed distributions are also characterized by high numbers of outlier sequences. An analysis of identified outliers shows they often contain known motifs and rare amino acids, suggesting that they represent important functional elements. We further compare the pentapeptide composition of regions known to correspond to protein domains with that of non-domain regions. We find that a substantial number of pentapeptides is clearly strongly favored in protein domains. Finally, we show that over-represented pentapeptides are significantly related to known functional motifs and to predicted ancient structural peptides.",

author = "Jaros{\l}aw Pozna{\'n}ski and Jan Topi{\'n}ski and Anna Muszewska and D{\k e}bski, {Konrad J.} and Marta Hoffman-Sommer and Krzysztof Paw{\l}owski and Marcin Grynberg",

note = "Publisher Copyright: {\textcopyright} 2018, The Author(s).",

year = "2018",

month = dec,

day = "1",

doi = "10.1038/s41598-018-33433-8",

language = "English (US)",

volume = "8",

journal = "Scientific Reports",

issn = "2045-2322",

publisher = "Nature Publishing Group",

number = "1",

}

TY - JOUR

T1 - Global pentapeptide statistics are far away from expected distributions

AU - Poznański, Jarosław

AU - Topiński, Jan

AU - Muszewska, Anna

AU - Dębski, Konrad J.

AU - Hoffman-Sommer, Marta

AU - Pawłowski, Krzysztof

AU - Grynberg, Marcin

PY - 2018/12/1

Y1 - 2018/12/1

N2 - The relationships between polypeptide composition, sequence, structure and function have been puzzling biologists ever since first protein sequences were determined. Here, we study the statistics of occurrence of all possible pentapeptide sequences in known proteins. To compensate for the non-uniform distribution of individual amino acid residues in protein sequences, we investigate separately all possible permutations of every given amino acid composition. For the majority of permutation groups we find that pentapeptide occurrences deviate strongly from the expected binomial distributions, and that the observed distributions are also characterized by high numbers of outlier sequences. An analysis of identified outliers shows they often contain known motifs and rare amino acids, suggesting that they represent important functional elements. We further compare the pentapeptide composition of regions known to correspond to protein domains with that of non-domain regions. We find that a substantial number of pentapeptides is clearly strongly favored in protein domains. Finally, we show that over-represented pentapeptides are significantly related to known functional motifs and to predicted ancient structural peptides.

AB - The relationships between polypeptide composition, sequence, structure and function have been puzzling biologists ever since first protein sequences were determined. Here, we study the statistics of occurrence of all possible pentapeptide sequences in known proteins. To compensate for the non-uniform distribution of individual amino acid residues in protein sequences, we investigate separately all possible permutations of every given amino acid composition. For the majority of permutation groups we find that pentapeptide occurrences deviate strongly from the expected binomial distributions, and that the observed distributions are also characterized by high numbers of outlier sequences. An analysis of identified outliers shows they often contain known motifs and rare amino acids, suggesting that they represent important functional elements. We further compare the pentapeptide composition of regions known to correspond to protein domains with that of non-domain regions. We find that a substantial number of pentapeptides is clearly strongly favored in protein domains. Finally, we show that over-represented pentapeptides are significantly related to known functional motifs and to predicted ancient structural peptides.

UR - http://www.scopus.com/inward/record.url?scp=85054775034&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054775034&partnerID=8YFLogxK

U2 - 10.1038/s41598-018-33433-8

DO - 10.1038/s41598-018-33433-8

M3 - Article

C2 - 30310110

AN - SCOPUS:85054775034

SN - 2045-2322

VL - 8

JO - Scientific Reports

JF - Scientific Reports

IS - 1

M1 - 15178

ER -
