Method Used to Identify PREX Sequences

Peroxiredoxin (Prx) classifications were made using the DASP profile search tool. The key residues selected to define the Prx active site were the three conserved residues of the PXXX(T/S)XXC motif found in all peroxiredoxins (Pro, Thr/Ser, and Cys), together with a conserved Trp/Phe residue located ~6 Å from the catalytic cysteine (Trp81 in Salmonella typhimurium AhpC). All residues with at least one atom within 10 Å of the center of geometry of at least one of these key residues were extracted, and the sequence fragments containing these residues were placed in order from N- to C-terminus to form the “active site signature.”
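The extraction step described above can be illustrated with a minimal Python sketch. It is not the DASP implementation: the function names, the input layout (a list of position, one-letter code, and atom coordinates per residue), and the toy coordinates are assumptions made for illustration only.

```python
import math

def center_of_geometry(atoms):
    """Unweighted mean of a residue's atom coordinates."""
    n = len(atoms)
    return tuple(sum(a[i] for a in atoms) / n for i in range(3))

def active_site_signature(residues, key_positions, cutoff=10.0):
    """Return signature fragments ordered from N- to C-terminus.

    residues: list of (position, one_letter_code, [(x, y, z), ...]);
              this layout is a hypothetical one chosen for the sketch.
    key_positions: set of positions of the key active site residues.
    cutoff: distance in angstroms (10 in the text above).
    """
    centers = [center_of_geometry(atoms)
               for pos, _, atoms in residues if pos in key_positions]
    # Keep a residue if any of its atoms lies within the cutoff of the
    # center of geometry of at least one key residue.
    selected = [(pos, aa) for pos, aa, atoms in residues
                if any(math.dist(atom, c) <= cutoff
                       for atom in atoms for c in centers)]
    # Join residues at consecutive positions into fragments.
    fragments, current, prev = [], "", None
    for pos, aa in sorted(selected):
        if prev is not None and pos != prev + 1:
            fragments.append(current)
            current = ""
        current += aa
        prev = pos
    if current:
        fragments.append(current)
    return fragments

# Toy chain with one atom per residue (made-up coordinates).
toy = [
    (1, "A", [(0, 0, 0)]), (2, "P", [(4, 0, 0)]), (3, "G", [(8, 0, 0)]),
    (4, "T", [(12, 0, 0)]), (5, "L", [(16, 0, 0)]), (6, "C", [(20, 0, 0)]),
    (20, "W", [(4, 5, 0)]),  # distant in sequence, close in space
]
print(active_site_signature(toy, {2}))  # ['APGT', 'W']
```

In the toy chain, residue 20 is far from residue 2 in sequence but near it in space, so it forms a separate fragment of the signature.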

Signatures for multiple proteins within a Prx subfamily were combined to generate a subfamily-specific “functional site profile.” These profiles were then used to identify subfamily members in GenBank(nr), according to the previously described method. Each returned sequence is assigned a score (p-value) that reflects the probability of finding, in a random sequence, a match as good as the observed one. The structures used to create each subfamily profile and the numbers of sequences identified from GenBank(nr) can be accessed here.

The results for each subfamily were edited to remove any sequences that did not contain the PXXX(T/S)XXC Prx motif or that were identified with a more significant p-value in another Prx subfamily. This method is described in more detail in Soito et al. and Nelson et al.
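The two editing rules above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the procedure used to build PREX: the hit tuple layout, the function name, and the toy sequences and p-values are hypothetical, and the motif is written as a regular expression in which X matches any residue.

```python
import re
from collections import defaultdict

# PXXX(T/S)XXC, with X standing for any residue.
PRX_MOTIF = re.compile(r"P.{3}[TS].{2}C")

def filter_hits(hits):
    """Apply the two editing rules to profile search hits.

    hits: list of (seq_id, subfamily, p_value, sequence) tuples;
          this layout is an assumption made for the sketch.
    Returns {subfamily: [seq_id, ...]}.
    """
    # Rule 1: drop sequences that lack the Prx motif.
    with_motif = [h for h in hits if PRX_MOTIF.search(h[3])]
    # Most significant (lowest) p-value seen for each sequence.
    best_p = {}
    for seq_id, _, p_value, _ in with_motif:
        best_p[seq_id] = min(p_value, best_p.get(seq_id, p_value))
    # Rule 2: drop a hit if another subfamily gave a more significant p-value.
    kept = defaultdict(list)
    for seq_id, subfamily, p_value, _ in with_motif:
        if p_value <= best_p[seq_id]:
            kept[subfamily].append(seq_id)
    return dict(kept)

# Toy hits with made-up sequences and p-values.
hits = [
    ("s1", "AhpC-Prx1", 1e-20, "MKPLDFTFVCPTE"),
    ("s1", "Prx6", 1e-8, "MKPLDFTFVCPTE"),   # better match elsewhere
    ("s2", "Prx6", 1e-12, "MKALDFTFVAPTE"),  # no motif
    ("s3", "Tpx", 1e-15, "AAPGGGSAACKK"),
]
print(filter_hits(hits))  # {'AhpC-Prx1': ['s1'], 'Tpx': ['s3']}
```

A sequence that ties between two subfamilies is kept in both here, because the text above removes a hit only when another subfamily gives a more significant p-value.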