-
Notifications
You must be signed in to change notification settings - Fork 83
Documentation for LD matrix methods #3416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
10912c0
a2c59d5
51ab754
fd6e6f7
bae4065
ac15ff5
95d3b9e
c1e358a
c86f85e
7e4152c
af8a725
0c0fb29
ece62a9
b5efa32
fa7b3ab
94bcc34
87de30f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -684,14 +684,310 @@ and boolean expressions (e.g., {math}`(x > 0)`) are interpreted as 0/1. | |||||||||||||
| and {math}`v_j` is the covariance of the trait with the j-th covariate. | ||||||||||||||
|
|
||||||||||||||
|
|
||||||||||||||
| (sec_stats_multi_site)= | ||||||||||||||
|
|
||||||||||||||
| ## Multi site statistics | ||||||||||||||
|
|
||||||||||||||
| :::{todo} | ||||||||||||||
| Document statistics that use information about correlation between sites, such as | ||||||||||||||
| LdCalculator (and perhaps reference {ref}`sec_identity`). Note that if we have a general | ||||||||||||||
| framework which has the same calling conventions as the single site stats, | ||||||||||||||
| we can rework the sections above. | ||||||||||||||
| ::: | ||||||||||||||
| (sec_stats_two_locus)= | ||||||||||||||
|
|
||||||||||||||
| ### Two-locus statistics | ||||||||||||||
|
|
||||||||||||||
| The {meth}`~TreeSequence.ld_matrix` method provides an interface to | ||||||||||||||
| a collection of two-locus statistics with predefined summary functions (see | ||||||||||||||
| {ref}`sec_stats_two_locus_summary_functions`) and `site` and `branch` | ||||||||||||||
| {ref}`modes <sec_stats_mode>`. The LD matrix method differs from other | ||||||||||||||
| statistics methods in that it provides a unified API with an argument to | ||||||||||||||
| specify different two-locus summaries of the data. It otherwise behaves | ||||||||||||||
| similarly to most other functions with respect to `sample_sets` and `indexes`. | ||||||||||||||
|
|
||||||||||||||
| Two-locus statistics can be computed using two modes, either `site` or | ||||||||||||||
| `branch`, and these should be interpreted in the same way as these modes in the | ||||||||||||||
| single-site statistics. Within this framework, statistics may be either | ||||||||||||||
| polarised or unpolarised. For statistics that are polarised, we compute | ||||||||||||||
| statistic values for pairs of derived alleles. Unpolarised statistics compute | ||||||||||||||
| statistics over all possible alleles, derived and ancestral, and the result in | ||||||||||||||
| averaged over statistics computed for all pairs of alleles (see weighting | ||||||||||||||
| schemes below). The option for polarisation is not exposed to the user. | ||||||||||||||
| Instead, we implement statistics that are polarised where appropriate. | ||||||||||||||
|
|
||||||||||||||
| (sec_stats_two_locus_site)= | ||||||||||||||
|
|
||||||||||||||
| #### Site | ||||||||||||||
|
|
||||||||||||||
| The `site` mode computes two-locus statistics summarized over alleles between | ||||||||||||||
| all pairs of specified sites. The default behavior, leaving `sites` | ||||||||||||||
| unspecified, will compute a matrix for all pairs of sites, with | ||||||||||||||
| one row and column for each site in the tree sequence (i.e., an n x n | ||||||||||||||
| matrix where n is the number of sites in the tree sequence). We can also | ||||||||||||||
| restrict the output to a subset of sites, either by specifying a single vector | ||||||||||||||
| for both rows and columns or a pair of vectors for the row sites and column | ||||||||||||||
| sites separately. | ||||||||||||||
|
|
||||||||||||||
| The following computes a matrix of the {math}`r^2` measure of linkage | ||||||||||||||
| disequilibrium (LD) computed pairwise between the first 4 sites in the tree | ||||||||||||||
| sequence among all samples. In our computations, row sites are used as the row | ||||||||||||||
| ("left-hand") loci and column sites are used as the column ("right-hand") | ||||||||||||||
| locus, and with a single list of sites specified, we obtain a symmetric square | ||||||||||||||
| matrix. | ||||||||||||||
|
|
||||||||||||||
| ```{code-cell} ipython3 | ||||||||||||||
| ld = ts.ld_matrix(sites=[[0, 1, 2, 3]]) | ||||||||||||||
| print(ld) | ||||||||||||||
| ``` | ||||||||||||||
|
|
||||||||||||||
| The following demonstrates how we can specify the row and column sites | ||||||||||||||
| independently of each other. We're specifying 3 columns and 2 rows, which | ||||||||||||||
| computes a subset of the matrix shown above. | ||||||||||||||
|
|
||||||||||||||
| ```{code-cell} ipython3 | ||||||||||||||
| ld = ts.ld_matrix(sites=[[0, 1], [1, 2, 3]]) | ||||||||||||||
| print(ld) | ||||||||||||||
| ``` | ||||||||||||||
|
|
||||||||||||||
| Because we allow for two-locus statistics to be computed for multi-allelic | ||||||||||||||
| data, we need to be able to combine statistical results from each pair of | ||||||||||||||
| alleles into one summary for a pair of sites. We use two implementations for | ||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||
| combining results from multiple alleles: `hap_weighted` and `total_weighted`. | ||||||||||||||
| These are statistic-specific and not chosen by the user. | ||||||||||||||
|
|
||||||||||||||
| Briefly, consider a pair of sites with {math}`n` alleles at the first locus and | ||||||||||||||
| {math}`m` alleles at the second. Write {math}`f_{i,j}` as the statistic | ||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||
| computed for focal alleles {math}`A_i` and {math}`B_j`, with haplotype weights | ||||||||||||||
| {math}`(A_i B_j, A_i b_j, a_i B_j)`, where {math}`a_i` and {math}`b_j` are the | ||||||||||||||
| collection of alleles that are not the focal alleles {math}`A_i` or | ||||||||||||||
| {math}`B_j`, respectively. Then the weighting schemes are defined as: | ||||||||||||||
|
Comment on lines
+755
to
+758
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
This bit didn't make sense, and I think wasn't used anyhow. |
||||||||||||||
|
|
||||||||||||||
| - `hap_weighted`: {math}`\sum_{i=1}^{n}\sum_{j=1}^{m}p(A_{i}B_{j})f_{ij}`, | ||||||||||||||
| where {math}`p(A_{i}B_{j})` is the frequency of haplotype {math}`A_{i}B_{j}`. | ||||||||||||||
| This method was first introduced in [Karlin | ||||||||||||||
| (1981)](https://doi.org/10.1111/j.1469-1809.1981.tb00308.x) and reviewed in | ||||||||||||||
| [Zhao (2007)](https://doi.org/10.1017/S0016672307008634). | ||||||||||||||
|
|
||||||||||||||
| - `total_weighted`: {math}`\frac{1}{n m}\sum_{i=1}^{n}\sum_{j=1}^{m}f_{ij}`. | ||||||||||||||
| This method assigns equal weight to each of the possible pairs of focal | ||||||||||||||
| alleles at the two sites, taking the arithmetic mean of statistics over | ||||||||||||||
| focal haplotypes. | ||||||||||||||
|
|
||||||||||||||
| Out of all of the available summary functions, only {math}`r^2` uses | ||||||||||||||
| `hap_weighted` normalisation, with the remainder using uniform weighting | ||||||||||||||
| (`total_weighted`). | ||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we say "see Ragsdale (XXXX) for explanation of this choice" or something? |
||||||||||||||
|
|
||||||||||||||
| (sec_stats_two_locus_branch)= | ||||||||||||||
|
|
||||||||||||||
| #### Branch | ||||||||||||||
|
|
||||||||||||||
| The `branch` mode computes expected two-locus statistics between pairs of | ||||||||||||||
| trees, conditioned on the marginal topologies and branch lengths of those | ||||||||||||||
| trees. The trees for which we compute statistics are specified by positions, | ||||||||||||||
| and for a pair of positions we consider all possible haplotypes that could be | ||||||||||||||
| generated by a single mutation occurring at the two trees. | ||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||
|
|
||||||||||||||
| For two trees, one with {math}`n` branches and the other with {math}`m` | ||||||||||||||
| branches, there are {math}`nm` possible pairs of branches that may carry the | ||||||||||||||
| pair of mutations. For each pair, we compute the two-locus statistic, and then | ||||||||||||||
| sum these values weighted by the product of the two branch lengths. Given the | ||||||||||||||
| two mutations occur, this accounts for the relative probability that the two | ||||||||||||||
| mutations fall on any pair of branches. | ||||||||||||||
|
|
||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What do you think about inserting one of the precise interpretations of what's computed? This is still very vague ("accounts for the relative probability").
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Another important thing to mention: which branches? For instance, if there are branches above the root, or not ancestral to any samples, are those included? What about branches ancestral to samples but not to any of the samples in |
||||||||||||||
| The time complexity of this method is quadratic in the number of samples, | ||||||||||||||
| due to the pairwise comparisons of branches from each pair of trees. | ||||||||||||||
| By default, this method computes | ||||||||||||||
| a symmetric matrix for all pairs of trees, with rows and columns representing | ||||||||||||||
| each tree in the tree sequence. Similar to the site method, we can restrict the | ||||||||||||||
| output to a subset of trees, either by specifying a vector of positions or | ||||||||||||||
| a pair of vectors for row and column positions separately. To select a specific | ||||||||||||||
| tree, the specified positions must land in the tree span (`[start, end)`). | ||||||||||||||
|
|
||||||||||||||
| In the following, we compute a matrix of expected {math}`r^2` within and | ||||||||||||||
| between the first 4 trees in the tree sequence. The tree breakpoints are | ||||||||||||||
| a convenient way to specify those first four trees. | ||||||||||||||
|
|
||||||||||||||
| ```{code-cell} ipython3 | ||||||||||||||
| ld = ts.ld_matrix(mode="branch", positions=[ts.breakpoints(as_array=True)[0:4]]) | ||||||||||||||
| print(ld) | ||||||||||||||
| ``` | ||||||||||||||
|
|
||||||||||||||
| Again, we can specify the row and column trees separately. | ||||||||||||||
|
|
||||||||||||||
| ```{code-cell} ipython3 | ||||||||||||||
| breakpoints = ts.breakpoints(as_array=True) | ||||||||||||||
| ld = ts.ld_matrix(mode="branch", positions=[breakpoints[[0]], breakpoints[0:4]]) | ||||||||||||||
| print(ld) | ||||||||||||||
| ``` | ||||||||||||||
|
|
||||||||||||||
| (sec_stats_two_locus_sample_sets)= | ||||||||||||||
|
|
||||||||||||||
| #### Sample Sets | ||||||||||||||
|
|
||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we need to first insert a paragraph explaining what happens without |
||||||||||||||
| The two-locus API provides a mechanism by which to subset the samples under | ||||||||||||||
| consideration, providing the ability to compute a separate LD matrix for each | ||||||||||||||
| sample set or an LD matrix between sample sets. Output dimensions are handled | ||||||||||||||
|
Comment on lines
+823
to
+824
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This isn't quite right, since separate LD matrices are produced for each sample set by default even for non-two-way statistics.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||
| in the same manner as the rest of the stats API (see | ||||||||||||||
| {ref}`sec_stats_output_dimensions`). The two-locus API differs from the rest of | ||||||||||||||
| the stats API in that we handle one-way and two-way statistics in the same | ||||||||||||||
| function call. | ||||||||||||||
|
|
||||||||||||||
| To compute a two-way two-locus statistic, the `index` argument must be | ||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||
| provided. The statistics are selected in the same way (with the `stat` | ||||||||||||||
| argument), but we provide a restricted set of two-way statistics (see | ||||||||||||||
| {ref}`sec_stats_two_locus_summary_functions_two_way`). The dimension-dropping | ||||||||||||||
| rules for the result follow the rest of the tskit stats API in that a single list | ||||||||||||||
| or tuple will produce a single two-dimensional matrix, while list of these | ||||||||||||||
| will produce a three-dimensional array, with the outer dimension of length | ||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||
| equal to the length of the list. | ||||||||||||||
|
|
||||||||||||||
| For example, to compute the {math}`r^2` LD matrix over a subset of samples in | ||||||||||||||
| the tree sequence (such as sample nodes 0 through 7), we would specify the | ||||||||||||||
| samples as follows: | ||||||||||||||
|
|
||||||||||||||
| ``` | ||||||||||||||
| ts = msprime.sim_ancestry( | ||||||||||||||
| 20, | ||||||||||||||
| population_size=10000, | ||||||||||||||
| sequence_length=1000, | ||||||||||||||
| recombination_rate=2e-8, | ||||||||||||||
| random_seed=12) | ||||||||||||||
| ts = msprime.sim_mutations(ts, rate=2e-8, random_seed=12) | ||||||||||||||
|
|
||||||||||||||
| ld = ts.ld_matrix(mode=“site”, sample_sets=range(8)) | ||||||||||||||
| print(ld) | ||||||||||||||
| ``` | ||||||||||||||
|
|
||||||||||||||
| For concreteness, we would expect the following dimensions with the specified | ||||||||||||||
| `sample_sets` and `indexes` arguments. | ||||||||||||||
|
|
||||||||||||||
| ``` | ||||||||||||||
| # one-way | ||||||||||||||
| ts.ld_matrix(sample_sets=None) -> 2 dimensions | ||||||||||||||
| ts.ld_matrix(sample_sets=[0, 1, 2, 3]) -> 2 dimensions | ||||||||||||||
| ts.ld_matrix(sample_sets=[[0, 1, 2, 3]]) -> 3 dimensions | ||||||||||||||
| # two-way | ||||||||||||||
| ts.ld_matrix(sample_sets=[[0, 1, 2, 3], [4, 5, 6, 7]], indexes=(0, 1)) -> 2 dimensions | ||||||||||||||
| ts.ld_matrix(sample_sets=[[0, 1, 2, 3], [4, 5, 6, 7]], indexes=[(0, 1)]) -> 3 dimensions | ||||||||||||||
| ``` | ||||||||||||||
|
|
||||||||||||||
| #### Why are there nan values in the LD matrix? | ||||||||||||||
|
|
||||||||||||||
| For some statistics, it is possible to observe nan entries in the LD matrix, | ||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||
| which can be surprising or numerically impact downstream analyses. A nan entry | ||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||
| may occur when computing a ratio statistic (such as {math}`r` or {math}`r^2`) | ||||||||||||||
| with a denominator of zero, indicating that one or both sites in the pair are | ||||||||||||||
| monomorphic. This can happen for a number of reasons: | ||||||||||||||
|
|
||||||||||||||
| - The mutation models allows for reversible mutations, so a back mutation at | ||||||||||||||
| a site has resulted in a single allele despite multiple mutations in the | ||||||||||||||
| history of the sample. | ||||||||||||||
| - LD is computed for a subsample of individuals, and some sites are not | ||||||||||||||
| variable among the sample nodes in the subsample. | ||||||||||||||
| - A mutation exists above the root of the local tree, so that all samples carry | ||||||||||||||
| the mutation, and one or more sites are not variable. | ||||||||||||||
|
|
||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. perhaps address also |
||||||||||||||
| (sec_stats_two_locus_sample_one_way_stats)= | ||||||||||||||
|
|
||||||||||||||
| ##### One-way Statistics | ||||||||||||||
|
|
||||||||||||||
| One-way statistics are summaries of two loci in a single sample set, using a | ||||||||||||||
| triple of haplotype counts {math}`\{w_{Ab}, w_{aB}, w_{AB}\}` and the size of | ||||||||||||||
| the sample set {math}`n`, where the capitalized and lowercase letters in our notation | ||||||||||||||
| represent alternate alleles. | ||||||||||||||
|
|
||||||||||||||
| (sec_stats_two_locus_sample_two_way_stats)= | ||||||||||||||
|
|
||||||||||||||
| ##### Two-way Statistics | ||||||||||||||
|
|
||||||||||||||
| Two-way statistics are summaries of haplotype counts between two sample sets, | ||||||||||||||
| which operate on the three haplotype counts (as in one-way stats, above) | ||||||||||||||
| computed from each sample set, indexed by `(i, j)`. These statistics take on | ||||||||||||||
| a different meaning from their one-way counterparts. For instance {math}`D_i | ||||||||||||||
| D_j` can be thought of as the covariance between two loci when computed for | ||||||||||||||
|
Comment on lines
+901
to
+902
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||||||||||||||
| a haplotype in a single sample set. | ||||||||||||||
|
|
||||||||||||||
| Only a subset of our summary functions are two-way statistics (see | ||||||||||||||
| {ref}`sec_two_locus_summary_functions_two_way`). Note that the unbiased two-way | ||||||||||||||
| statistics expect non-overlapping sample sets, and we do not make any | ||||||||||||||
| assertions about the sample sets and assume that `i` and `j` represent disjoint | ||||||||||||||
| sets of samples (see also the note in {meth}`~TreeSequence.divergence`). | ||||||||||||||
|
|
||||||||||||||
| (sec_stats_two_locus_summary_functions)= | ||||||||||||||
|
|
||||||||||||||
| #### Summary Functions | ||||||||||||||
|
|
||||||||||||||
| (sec_stats_two_locus_summary_functions_one_way)= | ||||||||||||||
|
|
||||||||||||||
| ##### One-way | ||||||||||||||
|
|
||||||||||||||
| The two-locus summary functions all take haplotype counts and sample set size as | ||||||||||||||
| input. Each of our summary functions has the signature | ||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. All of these functions are in terms of |
||||||||||||||
| {math}`f(w_{Ab}, w_{aB}, w_{AB}, n)`, converting to haplotype frequencies | ||||||||||||||
| {math}`\{p_{Ab}, p_{aB}, p_{AB}\}` where appropriate by dividing by {math}`n`. | ||||||||||||||
|
|
||||||||||||||
| `D` | ||||||||||||||
| : {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = p_{ab} - p_{a}p_{b}` | ||||||||||||||
|
|
||||||||||||||
| This statistic is polarised, as the unpolarised result, which averages over | ||||||||||||||
| allele labelings, is zero. Uses the `total` weighting method. | ||||||||||||||
|
|
||||||||||||||
| `D_prime` | ||||||||||||||
| : {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = \frac{D}{D_{\max}}`, | ||||||||||||||
|
|
||||||||||||||
| where {math} | ||||||||||||||
| \min\{p_A (1-p_B), p_B (1-p_B)\} & \textrm{if }D>=0 \\ | ||||||||||||||
| \min\{p_A p_B, (1-p_B) (1-p_B)\} & \textrm{otherwise} | ||||||||||||||
| \end{cases}``` | ||||||||||||||
|
|
||||||||||||||
| and {math}`D` is defined above. Polarised, `total` weighted. | ||||||||||||||
|
|
||||||||||||||
| `D2` | ||||||||||||||
| : {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = (p_{ab} - p_{a} p_{b})^2` | ||||||||||||||
|
|
||||||||||||||
| Unpolarised, `total` weighted. | ||||||||||||||
|
|
||||||||||||||
| `Dz` | ||||||||||||||
| : {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = D (1 - 2 p_{a})(1 - 2p_{b})`, | ||||||||||||||
|
|
||||||||||||||
| where {math}`D` is defined above. Unpolarised, `total` weighted. | ||||||||||||||
|
|
||||||||||||||
| `pi2` | ||||||||||||||
| : {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = p_{a}p_{b}(1-p_{a})(1-p_{b})` | ||||||||||||||
|
|
||||||||||||||
| Unpolarised, `total` weighted. | ||||||||||||||
|
|
||||||||||||||
| `r` | ||||||||||||||
| : {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = \frac{D}{\sqrt{p_{a}p_{b}(1-p_{a})(1-p_{b})}}`, | ||||||||||||||
|
|
||||||||||||||
| where {math}`D` is defined above. Polarised, `total` weighted. | ||||||||||||||
|
|
||||||||||||||
| `r2` | ||||||||||||||
| : {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = \frac{D^{2}}{p_{a}p_{b}(1-p_{a})(1-p_{b})}`, | ||||||||||||||
|
|
||||||||||||||
| where {math}`D` is defined above. Unpolarised, `haplotype` weighted. | ||||||||||||||
|
|
||||||||||||||
| Unbiased two-locus statistics from the Hill-Robertson (1968) system are | ||||||||||||||
| computed from haplotype counts. Definitions of these unbiased estimators can | ||||||||||||||
| be found in [Ragsdale and Gravel | ||||||||||||||
| (2020)](https://doi.org/10.1093/molbev/msz265). They require at least 4 samples | ||||||||||||||
| to be valid and are specified as `stat="D2_unbiased"`, `"Dz_unbiased"`, or | ||||||||||||||
| `"pi2_unbiased"`. | ||||||||||||||
|
|
||||||||||||||
| (sec_two_locus_summary_functions_two_way)= | ||||||||||||||
|
|
||||||||||||||
| (sec_stats_two_locus_summary_functions_two_way)= | ||||||||||||||
|
|
||||||||||||||
| ##### Two-way | ||||||||||||||
|
|
||||||||||||||
| Two-way statistics are indexed by sample sets {math}`i, j` and compute values | ||||||||||||||
| using haplotype counts within pairs of sample sets. | ||||||||||||||
|
|
||||||||||||||
| `D2` | ||||||||||||||
| : {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = D_i * D_j`, | ||||||||||||||
|
|
||||||||||||||
| where {math}`D` is defined above. | ||||||||||||||
|
|
||||||||||||||
| `r2` | ||||||||||||||
| : {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = \frac{(p_{AB_i} - (p_{A_i} p_{B_i})) (p_{AB_j} - (p_{A_j} p_{B_j}))}{\sqrt{p_{A_i} (1 - p_{A_i}) p_{B_i} (1 - p_{B_i})}\sqrt{p_{A_j} (1 - p_{A_j}) p_{B_j} (1 - p_{B_j})}}` | ||||||||||||||
|
|
||||||||||||||
| And `D2_unbiased`, which can be found in [Ragsdale and Gravel | ||||||||||||||
| (2020)](https://doi.org/10.1093/molbev/msz265). | ||||||||||||||
|
|
||||||||||||||
apragsdale marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||
|
|
||||||||||||||
| (sec_stats_notes)= | ||||||||||||||
|
|
||||||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if this note about polarisation should be down below with the "weighting" schemes. Since isn't polarisation just chosen so that "people get what they expect"? So perhaps bundle those two bits into a "Computational details" section?