Charles Darwin proposed the concept of natural selection nearly 150 years ago, yet its genetic basis has yet to be fully understood. After Darwin published his idea, Thomas Morgan’s theory of mutationism and the subsequent theory of neutral evolution suggested that genetic mutation was at the center of evolution, because some genetic mutations do not cause a measurable difference in fitness (Nei 2005). Although these two ideas do not oppose each other, the debate continues, with researchers now focusing on finding selection at the molecular level rather than on establishing an absolute law of evolution (Nielsen 2005). Finding examples of molecular evolution requires testing genetic sequences for positive selection.

Beyond the theoretical implications of discovering positive selection, the study of molecular evolution has gained popularity because of its anticipated medicinal value. Gene variants that leave individuals susceptible to attack by pathogens should be selected against and eventually removed from a population. Therefore, if selection tests can identify these genes, pharmaceutical companies can create specialized drugs to target them (Olson 2002). Similarly, regions of the genome that remain highly variable because heterozygous individuals have an advantage, as with the alleles underlying sickle-cell anemia and Tay-Sachs disease, should also be detected by selection tests (Nielsen 2005). Genes that play important roles in biochemical pathways have also been shown to be under positive selection. A good example is the protocadherin genes, which control the diversity of cell-surface receptors on neurons (Wu 2005).

Although positive selection is highly publicized, it is only one of several types of selection that can occur at the molecular level. Positive selection occurs when a mutation gives an individual an adaptive advantage, leading to a high frequency of copies of that mutation in the population. Negative, or purifying, selection occurs when a mutation results in a disadvantage, so that fewer copies of the mutation survive into each successive generation until the mutation is completely eliminated from the population. Both of these types are called directional selection because variability is not favored (Nielsen 2005). Balancing selection occurs when individuals with variability in their genome have higher fitness (Olson 2002). Finally, a mutation is referred to as being under neutral selection when it produces no advantage or disadvantage to the organism, so that selection does not affect it (Nielsen 2005).

Selection is identified in a DNA sequence by comparing the rate of nonsynonymous mutations per nonsynonymous site (dN) to the rate of synonymous mutations per synonymous site (dS). Synonymous mutations leave the encoded amino acid unchanged, whereas nonsynonymous mutations alter it. The basic statistic for identifying selection is the ratio ω = dN/dS. If ω > 1, positive selection has occurred; if ω = 1, neutral selection has occurred; and if ω < 1, purifying selection has occurred (Nei 2005).
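The decision rule above can be sketched in a few lines of Python. This is only an illustration: the function names, the tolerance used to decide that ω equals 1, and the example rates are all assumptions and do not come from any dataset.

```python
def omega(d_n, d_s):
    """Return omega = dN / dS. dS must be positive for the ratio to be defined."""
    if d_s <= 0:
        raise ValueError("dS must be positive to compute omega")
    return d_n / d_s


def classify(w, tol=1e-9):
    """Map omega onto the three outcomes described in the text.

    tol is an assumed numerical tolerance for treating omega as equal to 1.
    """
    if abs(w - 1.0) <= tol:
        return "neutral"
    return "positive" if w > 1.0 else "purifying"


# Hypothetical rates, for illustration only.
for d_n, d_s in [(0.30, 0.10), (0.05, 0.05), (0.02, 0.20)]:
    w = omega(d_n, d_s)
    print(f"dN={d_n:.2f} dS={d_s:.2f} omega={w:.2f} -> {classify(w)}")
```

In practice ω is estimated with uncertainty, so a point estimate near 1 would not be classified this mechanically; the sketch only restates the thresholds.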

Using a single ω for a whole gene does not account for variation in ω among sites and assumes that ω is equal over the entire phylogenetic tree. As a result, signs of positive selection are lost among the more numerous negatively selected sites (Yang and Nielsen 2002; Zhang et al. 2005). Current models account for variation in ω across sites caused by the position of the mutation within the codon (e.g., a second-position mutation will change the resulting amino acid; Nei 2005), the rate of transitions (T↔C; A↔G), the rate of transversions (T, C ↔ A, G), and codon usage bias (Yang and Bielawski 2000). Furthermore, current models allow ω to differ among lineages in the phylogenetic tree used for analysis instead of assuming that it is equal in all of them (Yang and Nielsen 2002; Zhang et al. 2005).
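To make these distinctions concrete, the following minimal Python sketch classifies a single base change as a transition or a transversion and a single codon change as synonymous or nonsynonymous under the standard genetic code. The function names and the example codons are illustrative assumptions; the sketch does not estimate rates or codon usage bias.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order line up with AMINO.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

PURINES = {"A", "G"}


def substitution_type(a, b):
    """Transition: purine<->purine or pyrimidine<->pyrimidine; otherwise transversion."""
    if a == b:
        return "none"
    return "transition" if (a in PURINES) == (b in PURINES) else "transversion"


def codon_change(c1, c2):
    """Synonymous if both codons encode the same amino acid, else nonsynonymous."""
    if c1 == c2:
        return "none"
    same = CODON_TABLE[c1] == CODON_TABLE[c2]
    return "synonymous" if same else "nonsynonymous"


# Illustrative single-base changes to the leucine codon CTT and the aspartate codon GAT.
for before, after in [("CTT", "CTC"), ("CTT", "CCT"), ("GAT", "GAA")]:
    pos = next(i for i in range(3) if before[i] != after[i])
    print(before, "->", after,
          "| codon position", pos + 1,
          "|", substitution_type(before[pos], after[pos]),
          "|", codon_change(before, after))
```

The first two examples are both T↔C transitions, yet only the second-position change alters the amino acid, which is why the position of a mutation within the codon matters to ω.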




Time Line



My project will test for positive selection in previously published datasets from a lab at the National Institutes of Health. Murphy et al. (2001) amassed a dataset consisting of 16,397 base pairs (bp) from 19 nuclear genes in 42 taxa of placental mammals and 2 marsupial taxa. Eizirik et al. (2001) published a dataset consisting of 15 of the same nuclear genes with an additional 22 taxa, for a total of 9,772 bp across 64 taxa of placental mammals and 2 marsupial taxa (Table 1). Both datasets were created for phylogenetic studies of placental mammals and have never been tested for selection.


                    Murphy et al. (2001)         Eizirik et al. (2001)
No. nuclear genes   19 #                         15 *
No. taxa            42 placental + 2 marsupial   64 placental + 2 marsupial
No. base pairs      16,397                       9,772



Table 1. Summary of molecular datasets.

*Nuclear genes in both sets – ADORA3, ADRB2, APP-3’UTR, ATP7A, BDNF, BMI1-3’UTR, CNR1, CREM-3’UTR, EDG1, PLCB4-3’UTR, PNOC, RAG1, RAG2, TYR, ZFX

#Nuclear genes in Murphy et al. (2001) only – ADRA2B, BRCA1, IRBP, VWF






Before the genes are tested individually for selection, each gene will be edited to ensure that no sequence data are missing from any of the taxa. If a taxon is missing all or a majority of the sequence for a gene, it will be removed entirely from the test of that gene. If only a small portion of the sequence is missing, all sequences will be trimmed to a consistent length.
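The editing rules above could be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure that will actually be used: the 50% threshold stands in for "a majority", the characters treated as missing (?, - and N) are assumed, the alignment is assumed to be in reading frame starting at the first base so that whole codons are dropped together, and the taxon names and sequences are invented.

```python
MISSING = set("?-N")  # assumed symbols for missing or ambiguous data


def missing_fraction(seq):
    """Fraction of positions in seq that carry no usable base."""
    if not seq:
        return 1.0
    return sum(ch in MISSING for ch in seq.upper()) / len(seq)


def clean_alignment(alignment, max_missing=0.5):
    """Drop taxa missing more than max_missing of the gene, then drop every
    codon in which any remaining taxon has missing data."""
    kept = {taxon: seq.upper() for taxon, seq in alignment.items()
            if missing_fraction(seq) <= max_missing}
    if not kept:
        return {}
    length = min(len(seq) for seq in kept.values())
    length -= length % 3  # keep whole codons only
    good = [i for i in range(0, length, 3)
            if all(not (set(seq[i:i + 3]) & MISSING) for seq in kept.values())]
    return {taxon: "".join(seq[i:i + 3] for i in good)
            for taxon, seq in kept.items()}


# Invented toy alignment: "opossum" is missing most of the gene and "mouse"
# is missing only the last codon.
toy = {
    "human":   "ATGGCCAAATTT",
    "mouse":   "ATGGCTAAA---",
    "opossum": "????????ATTT",
}
for taxon, seq in clean_alignment(toy).items():
    print(taxon, seq)
```

Dropping whole codons rather than single columns keeps the reading frame intact, which matters because synonymous and nonsynonymous changes are defined per codon.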

Next, a phylogeny will be produced from the complete dataset using maximum likelihood methods in PAUP. If a taxon has been removed from the edited sequences for a gene, it will also be removed from the phylogeny so that the phylogeny and the sequences match. Once a phylogeny and an edited set of sequences exist for each gene, both will be loaded into PAML (Yang 1997), which implements the likelihood models used to test for selection. Each gene will be tested individually.
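As a rough sketch of this per-gene step, the Python function below writes the text of a control file for codeml, the codon-model program in PAML. The option names are given as they are understood from the PAML documentation and should be checked against the installed version; the file names, the chosen values, and the function itself are hypothetical and are not settings fixed by this proposal.

```python
def codeml_control(seqfile, treefile, outfile, ns_sites=(0, 1, 2)):
    """Return the text of a codeml control file for one gene.

    All values are illustrative assumptions, not recommended settings.
    """
    settings = {
        "seqfile": seqfile,      # edited codon alignment for this gene
        "treefile": treefile,    # matching phylogeny
        "outfile": outfile,      # where codeml writes its results
        "runmode": 0,            # evaluate the user-supplied tree
        "seqtype": 1,            # codon sequences
        "CodonFreq": 2,          # codon frequencies from the F3x4 model
        "model": 0,              # one omega ratio for all branches
        "NSsites": " ".join(str(m) for m in ns_sites),  # site models to fit
        "fix_kappa": 0,          # estimate the transition/transversion ratio
        "kappa": 2,              # starting value for kappa
        "fix_omega": 0,          # estimate omega
        "omega": 0.5,            # starting value for omega
        "cleandata": 1,          # drop sites with gaps or ambiguity codes
    }
    lines = [f"{key:>10} = {value}" for key, value in settings.items()]
    return "\n".join(lines) + "\n"


# Hypothetical file names for a single gene.
print(codeml_control("BRCA1.phy", "BRCA1.tre", "BRCA1.out"))
```

Generating one control file per gene from a shared template keeps the settings identical across genes, so that differences in results reflect the data and not the configuration.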

I anticipate testing about half of the 19 genes in the Murphy et al. (2001) dataset over the course of the semester. Although the results cannot be predicted, it is reasonable to expect to find signs of positive selection by the end of the research period. One gene of particular interest is BRCA1, because positive selection was found when this gene was tested using only primates (Pavlicek et al. 2004). I will prepare a poster of my results to present at the Evolution 2006 meetings in Stony Brook, NY, and at the 2007 Kalman Symposium. The results will also be incorporated into a manuscript to be submitted to an academic journal for publication.



Eizirik E., W. J. Murphy, and S. J. O'Brien. 2001. Molecular dating and biogeography of the early placental mammal radiation. J. Hered. 92:212-219.


Murphy W. J., E. Eizirik, S. J. O'Brien, O. Madsen, M. Scally, C. J. Douady, E. Teeling, O. A. Ryder, M. J. Stanhope, W. W. de Jong, and M. S. Springer. 2001. Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science 294:2348-2351.

Nei M. 2005. Selectionism and neutralism in molecular evolution. Mol. Biol. Evol. 22:2318-2342.

Nielsen R. 2005. Molecular signatures of natural selection. Annu. Rev. Genet. 39:197-218.

Olson S. 2002. Population genetics: seeking the signs of selection. Science 298:1324.

Pavlicek A., V. N. Noskov, N. Kouprina, J. C. Barrett, J. Jurka, and V. Larionov. 2004. Evolution of the tumor suppressor BRCA1 locus in primates: implications for cancer predisposition. Hum. Mol. Genet. 13:2737-2751.

Wu Q. 2005. Comparative genomics and diversifying selection of the clustered vertebrate protocadherin genes. Genetics 169:2179-2188.

Yang Z. H. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:555-556.

Yang Z. H., and R. Nielsen. 2002. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol. Biol. Evol. 19:908-917.

Yang Z. H., and J. P. Bielawski. 2000. Statistical methods for detecting molecular adaptation. Trends Ecol. Evol. 15:496-503.

Zhang J. Z., R. Nielsen, and Z. H. Yang. 2005. Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol. Biol. Evol. 22:2472-2479.