fcontrast
The method is explained in the 1985 paper. It assumes a Brownian motion model. This model was introduced by Edwards and Cavalli-Sforza (1964; Cavalli-Sforza and Edwards, 1967) as an approximation to the evolution of gene frequencies. I have discussed (Felsenstein, 1973b, 1981c, 1985d, 1988b) the difficulties inherent in using it as a model for the evolution of quantitative characters. Chief among these is that the characters do not necessarily evolve independently or at equal rates. This program allows one to evaluate this, if there is independent information on the phylogeny. You can compute the variance of the contrasts for each character, as a measure of the variance accumulating per unit branch length. You can also test covariances of characters.
The statistics that are printed out include the covariances between all pairs of characters, the regressions of each character on each other (column j is regressed on row i), and the correlations between all pairs of characters. In assessing degrees of freedom it is important to realize that each contrast was taken to have expectation zero, which is known because each contrast could as easily have been computed xi-xj instead of xj-xi. Thus there is no loss of a degree of freedom for estimation of a mean. The number of degrees of freedom is thus the same as the number of contrasts, namely one less than the number of species (tips). If you feed these contrasts into a multivariate statistics program, make sure that it knows that each variable has expectation exactly zero.
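As a rough illustration of what is being computed, the following Python sketch derives standardized contrasts for the five-species example data set and tree used in the example on this page, then forms the covariance matrix by dividing the sums of cross-products by the number of contrasts (no mean is subtracted, for the reason given above). The nested-tuple tree encoding and the names `prune` and `covariance` are assumptions made for this example; they are not part of fcontrast.

```python
import math

# Log-transformed character values for the five example species.
TIPS = {
    "Homo":   [4.09434, 4.74493],
    "Pongo":  [3.61092, 3.33220],
    "Macaca": [2.37024, 3.36730],
    "Ateles": [2.02815, 2.89037],
    "Galago": [-1.46968, 2.30259],
}

# The example tree, hand-encoded: an internal node is a pair of
# (child, branch_length) entries and a tip is just its name.
hp = (("Homo", 0.21), ("Pongo", 0.21))
hpm = ((hp, 0.28), ("Macaca", 0.49))
hpma = ((hpm, 0.13), ("Ateles", 0.62))
ROOT = ((hpma, 0.38), ("Galago", 1.00))


def prune(node, contrasts):
    """Return (values, extra_length) for a node and collect its contrasts.

    extra_length is the amount added to the branch above the node to
    account for uncertainty in the reconstructed node values.
    """
    if isinstance(node, str):
        return TIPS[node], 0.0
    (left, len_l), (right, len_r) = node
    x_l, extra_l = prune(left, contrasts)
    x_r, extra_r = prune(right, contrasts)
    v_l = len_l + extra_l
    v_r = len_r + extra_r
    total = v_l + v_r
    # Standardized contrast: difference divided by its standard deviation.
    contrasts.append([(a - b) / math.sqrt(total) for a, b in zip(x_l, x_r)])
    # Node value: average weighted by the inverse of each branch length.
    values = [(a * v_r + b * v_l) / total for a, b in zip(x_l, x_r)]
    return values, v_l * v_r / total


def covariance(contrasts):
    """Sums of cross-products divided by the number of contrasts."""
    n = len(contrasts)
    k = len(contrasts[0])
    return [[sum(c[i] * c[j] for c in contrasts) / n for j in range(k)]
            for i in range(k)]


contrasts = []
prune(ROOT, contrasts)
cov = covariance(contrasts)
corr = cov[0][1] / math.sqrt(cov[0][0] * cov[1][1])
for row in cov:
    print("  ".join(f"{value:8.4f}" for value in row))
print(f"correlation: {corr:.4f}")
```

With five tips there are four contrasts, and the printed matrix should agree closely with the covariance matrix in the example output for this data set.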
   10    5                        number of species, number of characters
Alpha        2                    name of 1st species, # of individuals
 2.01 5.3 1.5 -3.41 0.3           data for individual #1
 1.98 4.3 2.1 -2.98 0.45          data for individual #2
Gammarus     3                    name of 2nd species, # of individuals
 6.57 3.1 2.0 -1.89 0.6           data for individual #1
 7.62 3.4 1.9 -2.01 0.7           data for individual #2
 6.02 3.0 1.9 -2.03 0.6           data for individual #3
...                               (and so on)
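To make the layout above concrete, here is a small Python sketch that reads data in this shape and reports per-species means. It is only an illustration: the function names are hypothetical, the explanatory annotations are assumed not to be present in a real input file, and the sample is cut down to the two species shown above (so its first line says 2 rather than 10).

```python
def parse_within_species(text):
    """Parse the multi-individual layout; return {species: [rows of floats]}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_species, n_chars = (int(tok) for tok in lines[0].split()[:2])
    samples = {}
    pos = 1
    for _ in range(n_species):
        # The last token on a species line is the number of individuals.
        name, count = lines[pos].rsplit(None, 1)
        pos += 1
        rows = []
        for _ in range(int(count)):
            row = [float(tok) for tok in lines[pos].split()]
            if len(row) != n_chars:
                raise ValueError(f"{name}: expected {n_chars} values, got {len(row)}")
            rows.append(row)
            pos += 1
        samples[name.strip()] = rows
    return samples


def species_means(samples):
    """Average each character over the individuals of each species."""
    return {name: [sum(col) / len(rows) for col in zip(*rows)]
            for name, rows in samples.items()}


# Truncated to the two species shown in the text above.
SAMPLE = """\
    2    5
Alpha        2
 2.01 5.3 1.5 -3.41 0.3
 1.98 4.3 2.1 -2.98 0.45
Gammarus     3
 6.57 3.1 2.0 -1.89 0.6
 7.62 3.4 1.9 -2.01 0.7
 6.02 3.0 1.9 -2.03 0.6
"""

samples = parse_within_species(SAMPLE)
means = species_means(samples)
for name, values in means.items():
    print(name, len(samples[name]), [round(v, 4) for v in values])
```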
The covariances, correlations, and regressions for the "additive" (between-species evolutionary variation) and "environmental" (within-species phenotypic variation) components are printed out (the maximum likelihood estimates of each). The program also estimates the within-species phenotypic variation in the case where the between-species evolutionary covariances are forced to be zero. The log-likelihoods of these two cases are compared and a likelihood ratio test (LRT) is carried out. The program prints the result of this test as a chi-square variate, and gives the number of degrees of freedom of the LRT. You have to look up the chi-square value in a table of the chi-square distribution. The A option is available (if the W option is invoked) to allow you to turn off this test.
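Instead of consulting a printed table, the tail probability for the reported chi-square value can be computed directly. The sketch below uses the standard closed-form series for integer degrees of freedom and only the Python standard library. The function name `chi2_sf` is an assumption for this example, and the statistic and degrees of freedom passed to it are placeholders that you would replace with the values read from the program output.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for chi-square with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term = 1.0
        total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail term plus a finite series.
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return tail + total


# Placeholder values: substitute the statistic and df printed by the program.
statistic, degrees_of_freedom = 7.814728, 3
print(f"P = {chi2_sf(statistic, degrees_of_freedom):.4f}")
```

A small P value indicates that the model with between-species covariances fits significantly better than the model in which they are forced to be zero.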
The log-likelihood of the data under the models with and without between-species For the moment the program cannot handle the case where within-species variation is to be taken into account but where only species means are available. (It can handle cases where some species have only one member in their sample).
We hope to fix this soon. We are also on our way to incorporating full-sib, half-sib, or clonal groups within species, so as to do one analysis for within-species genetic and between-species phylogenetic variation.
The data set used as an example below is the example from a paper by Michael Lynch (1990), his characters having been log-transformed. In the case where there is only one specimen per species, Lynch's model is identical to our model of within-species variation (for multiple individuals per species it is not a subcase of his model).
% fcontrast
Continuous character Contrasts
Input file: contrast.dat
Phylip tree file (optional): contrast.tree
Phylip contrast program output file [contrast.fcontrast]:

Output written to file "contrast.fcontrast"

Done.
Continuous character Contrasts
Version: EMBOSS:6.4.0.0

   Standard (Mandatory) qualifiers:
  [-infile]            frequencies File containing one or more sets of data
  [-intreefile]        tree       Phylip tree file (optional)
  [-outfile]           outfile    [*.fcontrast] Phylip contrast program output
                                  file

   Additional (Optional) qualifiers (* if not always prompted):
   -varywithin         boolean    [N] Within-population variation in data
*  -[no]reg            boolean    [Y] Print out correlations and regressions
*  -writecont          boolean    [N] Print out contrasts
*  -[no]nophylo        boolean    [Y] LRT test of no phylogenetic component,
                                  with and without VarA
   -printdata          boolean    [N] Print data at start of run
   -[no]progress       boolean    [Y] Print indications of progress of run

   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-outfile" associated qualifiers
   -odirectory3        string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
   -version            boolean    Report version number and exit
Qualifier | Type | Description | Allowed values | Default |
---|---|---|---|---|
Standard (Mandatory) qualifiers | ||||
[-infile] (Parameter 1) | frequencies | File containing one or more sets of data | Frequency value(s) | |
[-intreefile] (Parameter 2) | tree | Phylip tree file (optional) | Phylogenetic tree | |
[-outfile] (Parameter 3) | outfile | Phylip contrast program output file | Output file | <*>.fcontrast |
Additional (Optional) qualifiers | ||||
-varywithin | boolean | Within-population variation in data | Boolean value Yes/No | No |
-[no]reg | boolean | Print out correlations and regressions | Boolean value Yes/No | Yes |
-writecont | boolean | Print out contrasts | Boolean value Yes/No | No |
-[no]nophylo | boolean | LRT test of no phylogenetic component, with and without VarA | Boolean value Yes/No | Yes |
-printdata | boolean | Print data at start of run | Boolean value Yes/No | No |
-[no]progress | boolean | Print indications of progress of run | Boolean value Yes/No | Yes |
Advanced (Unprompted) qualifiers | ||||
(none) | ||||
Associated qualifiers | ||||
"-outfile" associated outfile qualifiers | ||||
-odirectory3 -odirectory_outfile | string | Output directory | Any string | |
General qualifiers | ||||
-auto | boolean | Turn off prompts | Boolean value Yes/No | N |
-stdout | boolean | Write first file to standard output | Boolean value Yes/No | N |
-filter | boolean | Read first file from standard input, write first file to standard output | Boolean value Yes/No | N |
-options | boolean | Prompt for standard and additional values | Boolean value Yes/No | N |
-debug | boolean | Write debug output to program.dbg | Boolean value Yes/No | N |
-verbose | boolean | Report some/full command line options | Boolean value Yes/No | Y |
-help | boolean | Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose | Boolean value Yes/No | N |
-warning | boolean | Report warnings | Boolean value Yes/No | Y |
-error | boolean | Report errors | Boolean value Yes/No | Y |
-fatal | boolean | Report fatal errors | Boolean value Yes/No | Y |
-die | boolean | Report dying program messages | Boolean value Yes/No | Y |
-version | boolean | Report version number and exit | Boolean value Yes/No | N |
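For scripted use, the qualifiers in the table above can be assembled into a non-interactive command line. The Python sketch below only builds the argument list. The helper name `build_fcontrast_command` is hypothetical, it covers just a few of the listed qualifiers, and it assumes the usual EMBOSS convention (as the `-[no]reg` notation suggests) that a boolean qualifier is switched off with a `no` prefix.

```python
def build_fcontrast_command(infile, treefile, outfile, varywithin=False,
                            reg=True, writecont=False, printdata=False):
    """Return an argument list for a non-interactive fcontrast run."""
    cmd = ["fcontrast",
           "-infile", infile,
           "-intreefile", treefile,
           "-outfile", outfile]
    if varywithin:
        cmd.append("-varywithin")   # default is N
    if not reg:
        cmd.append("-noreg")        # default is Y, so only the negation is emitted
    if writecont:
        cmd.append("-writecont")    # default is N
    if printdata:
        cmd.append("-printdata")    # default is N
    cmd.append("-auto")             # turn off prompts
    return cmd


command = build_fcontrast_command("contrast.dat", "contrast.tree",
                                  "contrast.fcontrast", writecont=True)
print(" ".join(command))
# To run it (requires fcontrast to be installed and on the PATH):
#   import subprocess
#   subprocess.run(command, check=True)
```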
When the gene frequencies data are used in CONTML or GENDIST, this involves the following assumptions:
How these assumptions affect the methods will be seen in my papers on inference of phylogenies from gene frequency and continuous character data (Felsenstein, 1973b, 1981c, 1985c).
The input formats are fairly similar to the discrete-character programs, but with one difference. When CONTML is used in the gene-frequency mode (its usual, default mode), or when GENDIST is used, the first line contains the number of species (or populations) and the number of loci and the options information. There then follows a line which gives the numbers of alleles at each locus, in order. This must be the full number of alleles, not the number of alleles which will be input: i.e. for a two-allele locus the number should be 2, not 1. There then follow the species (population) data, each species beginning on a new line. The first 10 characters are taken as the name, and thereafter the values of the individual characters are read free-format, preceded and separated by blanks. They can go to a new line if desired, though of course not in the middle of a number. Missing data are not allowed, which is an important limitation. In the default configuration, for each locus, the numbers should be the frequencies of all but one allele. The menu option A (All) signals that the frequencies of all alleles are provided in the input data -- the program will then automatically ignore the last of them. So without the A option, for a three-allele locus there should be two numbers, the frequencies of two of the alleles (and of course it must always be the same two!). Here is a typical data set without the A option:
    5    3
2 3 2
Alpha      0.90 0.80 0.10 0.56
Beta       0.72 0.54 0.30 0.20
Gamma      0.38 0.10 0.05 0.98
Delta      0.42 0.40 0.43 0.97
Epsilon    0.10 0.30 0.70 0.62
whereas here is what it would have to look like if the A option were invoked:
    5    3
2 3 2
Alpha      0.90 0.10 0.80 0.10 0.10 0.56 0.44
Beta       0.72 0.28 0.54 0.30 0.16 0.20 0.80
Gamma      0.38 0.62 0.10 0.05 0.85 0.98 0.02
Delta      0.42 0.58 0.40 0.43 0.17 0.97 0.03
Epsilon    0.10 0.90 0.30 0.70 0.00 0.62 0.38
The first line has the number of species (or populations) and the number of loci. The second line has the number of alleles for each of the 3 loci. The species lines have names (filled out to 10 characters with blanks) followed by the gene frequencies of the 2 alleles for the first locus, the 3 alleles for the second locus, and the 2 alleles for the third locus. You can start a new line after any of these allele frequencies, and continue to give the frequencies on that line (without repeating the species name).
If all alleles of a locus are given, it is important to have them add up to 1. Roundoff of the frequencies may cause the program to conclude that the numbers do not sum to 1, and stop with an error message.
While many compilers may be more tolerant, it is probably wise to make sure that each number, including the first, is preceded by a blank, and that there are digits both preceding and following any decimal points.
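The gene-frequency layout and the sum-to-1 requirement described above can be checked before running a program. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, and the tolerance `tol` is an arbitrary choice for this example, not the threshold the programs actually use. It parses a data set given with all alleles, verifies that each locus sums to 1, and drops the last allele of each locus to recover the default layout.

```python
def parse_gene_frequencies(text, all_alleles=False, tol=1e-6):
    """Return {species: [frequencies per locus]} from gene-frequency input."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_species, n_loci = (int(tok) for tok in lines[0].split()[:2])
    n_alleles = [int(tok) for tok in lines[1].split()]
    if len(n_alleles) != n_loci:
        raise ValueError("allele-count line does not match number of loci")
    per_locus = n_alleles if all_alleles else [a - 1 for a in n_alleles]
    needed = sum(per_locus)
    result = {}
    pos = 2
    for _ in range(n_species):
        name = lines[pos][:10].strip()      # first 10 characters are the name
        tokens = lines[pos][10:].split()
        pos += 1
        while len(tokens) < needed:         # values may continue on new lines
            tokens += lines[pos].split()
            pos += 1
        if len(tokens) != needed:
            raise ValueError(f"{name}: expected {needed} values, got {len(tokens)}")
        values = [float(tok) for tok in tokens]
        loci, start = [], 0
        for count in per_locus:
            freqs = values[start:start + count]
            start += count
            if all_alleles and abs(sum(freqs) - 1.0) > tol:
                raise ValueError(f"{name}: locus frequencies sum to {sum(freqs)}")
            loci.append(freqs)
        result[name] = loci
    return result


def drop_last_allele(parsed):
    """Convert all-alleles data to the default layout (last allele omitted)."""
    return {name: [freqs[:-1] for freqs in loci] for name, loci in parsed.items()}


WITH_ALL_ALLELES = "\n".join([
    "    5    3",
    "2 3 2",
    "Alpha      0.90 0.10 0.80 0.10 0.10 0.56 0.44",
    "Beta       0.72 0.28 0.54 0.30 0.16 0.20 0.80",
    "Gamma      0.38 0.62 0.10 0.05 0.85 0.98 0.02",
    "Delta      0.42 0.58 0.40 0.43 0.17 0.97 0.03",
    "Epsilon    0.10 0.90 0.30 0.70 0.00 0.62 0.38",
])

parsed = parse_gene_frequencies(WITH_ALL_ALLELES, all_alleles=True)
reduced = drop_last_allele(parsed)
print(reduced["Alpha"])
```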
CONTML and CONTRAST also treat quantitative characters (the continuous-characters mode in CONTML, which is option C). It is assumed that each character is evolving according to a Brownian motion model, at the same rate, and independently. In reality it is almost always impossible to guarantee this. The issue is discussed at length in my review article in Annual Review of Ecology and Systematics (Felsenstein, 1988a), where I point out the difficulty of transforming the characters so that they are not only genetically independent but have independent selection acting on them. If you are going to use CONTML to model evolution of continuous characters, then you should at least make some attempt to remove genetic correlations between the characters (usually all one can do is remove phenotypic correlations by transforming the characters so that there is no within-population covariance and so that the within-population variances of the characters are equal -- this is equivalent to using Canonical Variates). However, this will only guarantee that one has removed phenotypic covariances between characters. Genetic covariances could only be removed by knowing the coheritabilities of the characters, which would require genetic experiments, and selective covariances (covariances due to covariation of selection pressures) would require knowledge of the sources and extent of selection pressure in all variables.
CONTRAST is a program designed to infer, for a given phylogeny that is provided to the program, the covariation between characters in a data set. Thus we have a program in this set that allows us to take information about the covariation and rates of evolution of characters and make an estimate of the phylogeny (CONTML), and a program that takes an estimate of the phylogeny and infers the variances and covariances of the character changes. But we have no program that infers both the phylogenies and the character covariation from the same data set.
In the quantitative characters mode, a typical small data set would be:
    5    6
Alpha      0.345 0.467 1.213 2.2 -1.2 1.0
Beta       0.457 0.444 1.1 1.987 -0.2 2.678
Gamma      0.6 0.12 0.97 2.3 -0.11 1.54
Delta      0.68 0.203 0.888 2.0 1.67
Epsilon    0.297 0.22 0.90 1.9 1.74
Note that in the latter case, there is no line giving the numbers of alleles at each locus. In this latter case no square-root transformation of the coordinates is done: each is assumed to give directly the position on the Brownian motion scale.
For further discussion of options and modifiable constants in CONTML, GENDIST, and CONTRAST see the documentation files for those programs.
    5    2
Homo       4.09434  4.74493
Pongo      3.61092  3.33220
Macaca     2.37024  3.36730
Ateles     2.02815  2.89037
Galago    -1.46968  2.30259
((((Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);
Covariance matrix
---------- ------

    3.9423    1.7028
    1.7028    1.7062

Regressions (columns on rows)
----------- -------- -- -----

    1.0000    0.4319
    0.9980    1.0000

Correlations
------------

    1.0000    0.6566
    0.6566    1.0000
Program name | Description |
---|---|
econtml | Continuous character Maximum Likelihood method |
econtrast | Continuous character contrasts |
Please report all bugs to the EMBOSS bug team (emboss-bug © emboss.open-bio.org) not to the original author.
Converted (August 2004) to an EMBASSY program by the EMBOSS team.