diff --git a/docs/stats.md b/docs/stats.md
index 87a2ed5f83..78255089ba 100644
--- a/docs/stats.md
+++ b/docs/stats.md
@@ -684,14 +684,368 @@ and boolean expressions (e.g., {math}`(x > 0)`) are interpreted as 0/1.
   and {math}`v_j` is the covariance of the trait with the j-th covariate.
 
 
+(sec_stats_multi_site)=
+
 ## Multi site statistics
 
-:::{todo}
-Document statistics that use information about correlation between sites, such as
-LdCalculator (and perhaps reference {ref}`sec_identity`). Note that if we have a general
-framework which has the same calling conventions as the single site stats,
-we can rework the sections above.
-:::
+(sec_stats_two_locus)=
+
+### Two-locus statistics
+
+The {meth}`~TreeSequence.ld_matrix` method provides an interface to
+a collection of two-locus statistics with predefined summary functions (see
+{ref}`sec_stats_two_locus_summary_functions`).
+The LD matrix method differs from other
+statistics methods in that it provides a unified API with an argument to
+specify different two-locus summaries of the data. It otherwise behaves
+similarly to most other functions with respect to `sample_sets` and `indexes`.
+
+Two-locus statistics can be computed using two {ref}`modes <sec_stats_mode>`,
+either `site` or `branch`, and these should be interpreted in the same way as
+these modes in the single-site statistics. That is, the `site` mode computes LD
+over observed alleles at pairs of sites, while the `branch` model computes
+expected LD conditioned on pairs of trees.
+
+(sec_stats_two_locus_site)=
+
+#### Site mode
+
+The `"site"` mode computes two-locus statistics summarized over alleles between
+all pairs of specified sites. The default behavior, leaving `sites`
+unspecified, will compute a matrix for all pairs of sites, with one row and
+column for each site in the tree sequence (i.e., an {math}`n \times n` matrix
+where {math}`n` is the number of sites in the tree sequence). We can also
+restrict the output to a subset of sites, either by specifying a single vector
+of site indexes for both rows and columns or a pair of vectors for the row
+sites and column sites separately.
+
+The following computes a matrix of the {math}`r^2` measure of linkage
+disequilibrium (LD) computed pairwise between the first 4 sites in the tree
+sequence among all samples. The `sites` must be given as a list of lists, and
+with a single list of sites specified, we obtain a symmetric square matrix. 
+
+```{code-cell} ipython3
+ld = ts.ld_matrix(sites=[[0, 1, 2, 3]])
+print(ld)
+```
+
+The following demonstrates how we can specify the row and column sites
+independently of each other. We're specifying 2 rows and 3 columns, which
+computes a subset of the matrix shown above.
+
+```{code-cell} ipython3
+ld = ts.ld_matrix(sites=[[1, 2], [1, 2, 3]])
+print(ld)
+```
+
+#### Computational details
+
+Because we allow for two-locus statistics to be computed for multi-allelic
+data, we need to be able to combine statistical results from each pair of
+alleles into one summary for a pair of sites. This does not affect biallelic
+data (and so this section can be skipped on first reading).
+We use two implementations for
+combining results from multiple alleles: `hap_weighted` and `total_weighted`.
+These are statistic-specific and not chosen by the user, with choices motivated
+by [Zhao (2007)](https://doi.org/10.1017/S0016672307008634).
+
+Briefly, consider a pair of sites with {math}`n` alleles at the first locus and
+{math}`m` alleles at the second. (Whether this includes the ancestral allele
+depends on whether the statistic is polarised.) Write {math}`f_{i,j}` as the
+statistic computed for focal alleles {math}`A_i` and {math}`B_j`. Then the
+weighting schemes are defined as:
+
+- `hap_weighted`: {math}`\sum_{i=1}^{n}\sum_{j=1}^{m}p(A_{i}B_{j})f_{ij}`,
+  where {math}`p(A_{i}B_{j})` is the frequency of haplotype {math}`A_{i}B_{j}`.
+  This method was first introduced in [Karlin
+  (1981)](https://doi.org/10.1111/j.1469-1809.1981.tb00308.x) and reviewed in
+  [Zhao (2007)](https://doi.org/10.1017/S0016672307008634).
+
+- `total_weighted`: {math}`\frac{1}{n m}\sum_{i=1}^{n}\sum_{j=1}^{m}f_{ij}`.
+  This method assigns equal weight to each of the possible pairs of focal
+  alleles at the two sites, taking the arithmetic mean of statistics over
+  focal haplotypes.
+
+Out of all of the available summary functions, only {math}`r^2` uses
+`hap_weighted` normalisation, with the remainder using uniform weighting
+(`total_weighted`).
+
+Within this framework, statistics may be either polarised or unpolarised. For
+statistics that are polarised, we compute statistic values for pairs of derived
+alleles. (For this purpose, the "derived" alleles at a site are all alleles
+except that stored as the ``ancestral_state`` for the site.) Unpolarised
+statistics compute statistics over all pairs of alleles, derived and ancestral.
+In either case, the result is averaged over these values, using one of the
+weighting scheme (described below for each statistics). The option for
+polarisation is not exposed to the user, and we list which statistics are
+polarised below.
+
+(sec_stats_two_locus_branch)=
+
+#### Branch mode
+
+The `"branch"` mode computes expected two-locus statistics between pairs of
+trees, conditioned on the marginal topologies and branch lengths of those
+trees. The trees for which we compute statistics are specified by positions,
+and for a pair of positions we consider all possible haplotypes that could be
+generated by a single mutation occurring on each of the two trees.
+
+For two trees, one with {math}`n` branches and the other with {math}`m`
+branches, there are {math}`nm` possible pairs of branches that may carry the
+pair of mutations. For each pair, we compute the two-locus statistic, and then
+sum these values weighted by the product of the two branch lengths. Given that
+the two mutations occur, this accounts for the relative probability that the
+two mutations fall on any pair of branches.
+
+In other words, imagine we place two mutations uniformly, one on each tree, and
+then compute the statistic. The branch mode computes the expected value of the
+statistic over this process, multiplied by the product of the total branch
+lengths of each tree. This weighting accounts for mutational opportunity, so that
+the sum of the branch-mode statistic over all positions in a genomic region,
+multiplied by a mutation rate,  is equal to the expected sum of the two-locus site
+statistic over all mutations falling in that region under an infinite-sites model.
+
+The time complexity of this method is quadratic in the number of samples,
+due to the pairwise comparisons of branches from each pair of trees.
+By default, this method computes
+a symmetric matrix for all pairs of trees, with rows and columns representing
+each tree in the tree sequence. Similar to the site method, we can restrict the
+output to a subset of trees, either by specifying a vector of positions or
+a pair of vectors for row and column positions separately. To select a specific
+tree, the specified positions must land in the tree span (`[start, end)`).
+
+In the following, we compute a matrix of expected {math}`r^2` within and
+between the first 4 trees in the tree sequence. The tree breakpoints are
+a convenient way to specify those first four trees.
+
+```{code-cell} ipython3
+ld = ts.ld_matrix(
+    mode="branch",
+    positions=[ts.breakpoints(as_array=True)[0:4]]
+)
+print(ld)
+```
+
+We note that these values are quite large: as described above, the statistic is
+scaled by the product of the total branch lengths of each pair of trees. To
+compute the expected {math}`r^2` value for a pair of mutations that each land
+uniformly on the pair of trees, we can divide by the product of the total
+branch lengths:
+
+```{code-cell} ipython3
+total_branch_lengths = [tree.total_branch_length for tree in ts.trees()]
+prod_branch_lengths = np.outer(total_branch_lengths, total_branch_lengths)
+print(ld / prod_branch_lengths[0:4, 0:4])
+```
+
+As with the `"site"` mode above, we can specify the row and column trees
+separately.
+
+```{code-cell} ipython3
+breakpoints = ts.breakpoints(as_array=True)
+ld = ts.ld_matrix(
+    mode="branch",
+    positions=[breakpoints[[0]], breakpoints[0:4]]
+)
+print(ld)
+```
+
+(sec_stats_two_locus_sample_sets)=
+
+#### Sample Sets
+
+Without specifying `sample_sets` or `indexes`, the `ld_matrix()` method
+computes statistics over a single sample set that includes all samples in the
+tree sequence. The API allows for the specification of a subset or multiple
+subsets of samples, so that a separate LD matrix can be computed for each. If
+`sample_sets` is specified as a single list of samples, then a single LD matrix
+is returned. A list of lists of samples will return a 3D array containing an LD
+matrix for each list of samples.
+
+Some LD statistics can be computed between sample sets (two-way statistics are
+specified below), in which case `indexes` must be specified that reference the
+indexes of the `sample_sets`, which must be a list of lists of sample nodes.
+This results in an LD matrix computed for each list of indexes. The statistics
+are selected in the same way (with the `stat` argument), and these are limited
+to a handful of statistics (see
+{ref}`sec_stats_two_locus_summary_functions_two_way`). The dimension-dropping
+rules for the result follow the rest of the tskit stats API in that a single
+list or tuple will produce a single two-dimensional matrix, while a list of
+these will produce a three-dimensional array, with the first dimension of
+length equal to the length of the list.
+
+For example, to compute the {math}`r^2` LD matrix over a subset of samples in
+the tree sequence (such as sample nodes 0 through 7), we would specify the
+samples as follows:
+
+```{code-cell} ipython3
+ts = msprime.sim_ancestry(
+    20,
+    population_size=10000,
+    sequence_length=1000,
+    recombination_rate=2e-8,
+    random_seed=12)
+ts = msprime.sim_mutations(ts, rate=2e-8, random_seed=12)
+
+ld = ts.ld_matrix(mode="site", sample_sets=range(8))
+print(ld)
+```
+
+We would get the following dimensions with the specified
+`sample_sets` and `indexes` arguments.
+
+```
+# one-way
+ts.ld_matrix(sample_sets=None) # -> 2 dimensions
+ts.ld_matrix(sample_sets=[0, 1, 2, 3]) # -> 2 dimensions
+ts.ld_matrix(sample_sets=[[0, 1, 2, 3]]) # -> 3 dimensions
+# two-way
+ts.ld_matrix(sample_sets=[[0, 1, 2, 3], [4, 5, 6, 7]], indexes=(0, 1)) # -> 2 dimensions
+ts.ld_matrix(sample_sets=[[0, 1, 2, 3], [4, 5, 6, 7]], indexes=[(0, 1)]) # -> 3 dimensions
+```
+
+#### Why are there `nan` values in the LD matrix?
+
+For some statistics, it is possible to observe `nan` entries in the LD matrix,
+which can be surprising and may numerically impact downstream analyses. A `nan`
+entry occurs if the denominator of a ratio statistic (including {math}`r` and
+{math}`r^2`) is zero, indicating that one or both of the alleles in the pair is
+fixed or absent in the given sample set(s). This can happen for
+a number of reasons:
+
+- Some mutation models allow for reversible mutations, so a back mutation at
+  a site can result in a single allele despite multiple mutations in the
+  history of the sample.
+- LD is computed for a subsample of individuals, and some sites are not
+  variable among the sample nodes in the subsample.
+- A mutation exists above the root of the local tree, so that all samples carry
+  the mutation, and one or more sites are not variable.
+
+The `branch` mode will also return `nan` values for ratio statistics if there
+are branches in either tree on which a mutation would not result in
+a polymorphism within a sample set.
+
+(sec_stats_two_locus_sample_one_way_stats)=
+
+#### One-way Statistics
+
+One-way statistics are summaries of two loci in a single sample set, using
+a triple of haplotype counts {math}`\{n_{AB}, n_{Ab}, n_{aB}\}` and the size of
+the sample set {math}`n`, where the capitalized and lowercase letters in our
+notation represent alternate alleles.
+
+(sec_stats_two_locus_sample_two_way_stats)=
+
+#### Two-way Statistics
+
+Two-way statistics are summaries of haplotype counts between two sample sets,
+which operate on the three haplotype counts (as in one-way stats, above)
+computed from each sample set, indexed by `(i, j)`. These statistics take on
+a different meaning from their one-way counterparts. For instance `stat="D2"`
+over a pair of sample sets computes {math}`D_i D_j`, which is the product of
+the covariance measure of LD within each sample set and is related to the
+covariance of {math}`D` between sample sets.
+
+Only a subset of our summary functions are two-way statistics (see
+{ref}`sec_two_locus_summary_functions_two_way`). Note that the unbiased two-way
+statistics expect non-overlapping sample sets (see [Ragsdale and Gravel
+(2020)](https://doi.org/10.1093/molbev/msz265)), and we do not make any
+assertions about the sample sets and assume that `i` and `j` represent disjoint
+sets of samples (see also the note in {meth}`~TreeSequence.divergence`).
+
+(sec_stats_two_locus_summary_functions)=
+
+#### Summary Functions
+
+(sec_stats_two_locus_summary_functions_one_way)=
+
+##### One-way
+
+The two-locus summary functions all take haplotype counts and sample set size
+as input. Each of our summary functions has the signature
+{math}`f(n_{AB}, n_{Ab}, n_{aB}, n)`, converting to haplotype frequencies
+{math}`\{p_{AB}, p_{Ab}, p_{aB}\}` by dividing by {math}`n`. Below,
+{math}`n_{ab} = n - n_{AB} - n_{Ab} - n_{aB}`, {math}`n_A = n_{AB} + n_{Ab}`
+and {math}`n_B = n_{AB} + n_{aB}`, with frequencies {math}`p` found by dividing
+by {math}`n`.
+
+Our convention is to use {math}`A,B` to denote derived alleles, and {math}`a,b`
+ancestral alleles (or other alleles, if the site is multi-allelic). For
+polarised statistics, we average statistics over all non-ancestral alleles. For
+unpolarised statistics, the labeling is arbitrary as we average over all
+alleles (derived and ancestral).
+
+`D`
+: {math}`f(n_{AB}, n_{Ab}, n_{aB}, n) = p_{AB}p_{ab} - p_{Ab}p_{aB} \, (=p_{AB} - p_A p_B)`
+
+  This statistic is polarised, as the unpolarised result, which averages over
+  allele labelings, is zero. Uses the `total` weighting method.
+
+`D_prime`
+: {math}`f(n_{AB}, n_{Ab}, n_{aB}, n) = \frac{D}{D_{\max}}`,
+
+  where {math}`D_{\max} = \begin{cases}
+            \min\{p_A (1-p_B), p_B (1-p_B)\} & \textrm{if }D>=0 \\
+            \min\{p_A p_B, (1-p_B) (1-p_B)\} & \textrm{if }D<0
+        \end{cases}`
+
+  and {math}`D` is defined above. Polarised, `total` weighted.
+
+`D2`
+: {math}`f(n_{AB}, n_{Ab}, n_{aB}, n) = D^2`
+
+  and {math}`D` is defined above. Unpolarised, `total` weighted.
+
+`Dz`
+:  {math}`f(n_{AB}, n_{Ab}, n_{aB}, n) = D (1 - 2 p_A) (1 - 2 p_B)`,
+
+  where {math}`D` is defined above. Unpolarised, `total` weighted.
+
+`pi2`
+: {math}`f(n_{AB}, n_{Ab}, n_{aB}, n) = p_A (1-p_A) p_B (1-p_B)`
+
+  Unpolarised, `total` weighted.
+
+`r`
+: {math}`f(n_{AB}, n_{Ab}, n_{aB}, n) = \frac{D}{\sqrt{p_A (1-p_A) p_B (1-p_B)}}`,
+
+  where {math}`D` is defined above. Polarised, `total` weighted.
+
+`r2`
+: {math}`f(n_{AB}, n_{Ab}, n_{aB}, n) = \frac{D^{2}}{p_A (1-p_A) p_B (1-p_B))}`,
+
+  where {math}`D` is defined above. Unpolarised, `haplotype` weighted.
+
+Unbiased two-locus statistics from the Hill-Robertson (1968) system are
+computed from haplotype counts. Definitions of these unbiased estimators can
+be found in [Ragsdale and Gravel
+(2020)](https://doi.org/10.1093/molbev/msz265). They require at least 4 samples
+to be valid and are specified as `stat="D2_unbiased"`, `"Dz_unbiased"`, or
+`"pi2_unbiased"`.
+
+(sec_two_locus_summary_functions_two_way)=
+
+(sec_stats_two_locus_summary_functions_two_way)=
+
+##### Two-way
+
+Two-way statistics are indexed by sample sets {math}`i, j` and compute values
+using haplotype counts within pairs of sample sets.
+
+`D2`
+: {math}`f(n_{AB}, n_{Ab}, n_{aB}, n) = D_i D_j`,
+
+  where {math}`D_i` denotes {math}`D` computed within sample set {math}`i`,
+  and {math}`D` is defined above. Unpolarised, `total` weighted.
+
+`r2`
+: {math}`f(n_{AB}, n_{Ab}, n_{aB}, n) = r_i r_j`,
+
+  where {math}`r_i` denotes {math}`r` computed within sample set {math}`i`,
+  and {math}`r` is defined above. Unpolarised, `haplotype` weighted.
+
+And `D2_unbiased`, which can be found in [Ragsdale and Gravel
+(2020)](https://doi.org/10.1093/molbev/msz265).
 
 
 (sec_stats_notes)=
diff --git a/python/CHANGELOG.rst b/python/CHANGELOG.rst
index 916412cc81..bc9b6217da 100644
--- a/python/CHANGELOG.rst
+++ b/python/CHANGELOG.rst
@@ -7,6 +7,12 @@ In development
 - Add ``json+struct`` metadata codec that allows storing binary data using a struct
   schema alongside JSON metadata. (:user:`benjeffery`, :pr:`3306`)
 
+**Features**
+
+- Add ``TreeSequence.ld_matrix`` stats method and documentation, for computing
+  two-locus statistics in site and branch mode.
+  (:user:`lkirk`, :user:`apragsdale`, :pr:`3416`) 
+
 --------------------
 [1.0.2] - 2026-03-06
 --------------------
diff --git a/python/tskit/trees.py b/python/tskit/trees.py
index 45d2da59e0..a370daf17f 100644
--- a/python/tskit/trees.py
+++ b/python/tskit/trees.py
@@ -10930,12 +10930,106 @@ def impute_unknown_mutations_time(
     def ld_matrix(
         self,
         sample_sets=None,
-        sites=None,
-        positions=None,
         mode="site",
         stat="r2",
+        sites=None,
+        positions=None,
         indexes=None,
     ):
+        r"""
+
+        Returns a matrix of the specified two-locus statistic (default
+        :math:`r^2`) computed from sample allelic states or branch lengths.
+        The resulting linkage disequilibrium (LD) matrix represents either the
+        two-locus statistic as computed between all pairs of specified
+        ``sites`` (``"site"`` mode, producing a
+        ``len(sites)``-by-``len(sites)`` sized matrix), or as computed from the
+        branch structures at marginal trees between pairs of trees at all
+        specified ``positions`` (``"branch"`` mode, producing a
+        ``len(positions)``-by-``len(positions)`` sized matrix).
+
+        The sites considered for ``"site"`` mode defaults to all sites (which may
+        result in a very large matrix!), but can be restricted using
+        the ``sites`` argument. Sites must be passed as a list of lists,
+        specifying the ``[row_sites, col_sites]``, resulting in a
+        rectangular matrix, or by specifying a single list of ``[sites]``, in
+        which a square matrix will be produced (see
+        :ref:`sec_stats_two_locus_site` for examples). Here, ``sites``,
+        ``row_sites``, and ``col_sites`` are each lists of site indexes.
+
+        Similarly, in the ``"branch"`` mode, the ``positions`` argument specifies
+        genomic coordinates at which the expectation for the two-locus statistic
+        is computed, given the local tree structure.
+        (See :ref:`sec_stats_two_locus_branch` for explanation of in what sense
+        this is an expectation.) This defaults to computing
+        the LD for each pair of distinct trees (this is equivalent to passing in
+        the leftmost coordinates of each tree's span, since intervals are closed on
+        the left and open on the right). Similar to the site mode, a nested list
+        of row and column positions can be specified separately (resulting in a
+        rectangular matrix) or a single list of a specified positions results
+        in a square matrix (see :ref:`sec_stats_two_locus_branch` for
+        examples). Like ``sites``, the ``positions`` must be specified as a list
+        of lists.
+
+        Some LD statistics are defined for both within a single set of samples
+        and for two sample sets. If the ``indexes`` argument is specified, then
+        ``indexes`` specifies the indexes of the sample sets in the
+        ``sample_sets`` list between which to compute LD. For instance, this
+        results in a 3D array whose ``[k,:,:]``-th slice contains LD values
+        between ``sample_sets[i]`` and ``sample_sets[j]``, where ``(i, j)`` is
+        the ``k``-th element of ``indexes``.
+
+        For more on how the ``indexes`` and ``sample_sets`` interact with the
+        output dimensions, see the :ref:`sec_stats_two_locus_sample_sets`
+        section. Statistics are defined in the
+        :ref:`sec_stats_two_locus_summary_functions_two_way` section.
+
+        **Available Stats** (use ``Stat Name`` in the ``stat`` keyword
+        argument). Statistics marked as "multi sample set" allow
+        (but do not require) computation from two sample sets
+        via the ``indexes`` argument.
+
+        ======================= ========== ================ ==============
+        Stat                     Polarised Multi Sample Set Stat Name
+        ======================= ========== ================ ==============
+        :math:`r^2`              n          y               "r2"
+        :math:`r`                y          n               "r"
+        :math:`D^2`              n          y               "D2"
+        :math:`D`                y          n               "D"
+        :math:`D'`               y          n               "D_prime"
+        :math:`D_z`              n          n               "Dz"
+        :math:`\pi_2`            n          n               "pi2"
+        :math:`\widehat{D^2}`    n          y               "D2_unbiased"
+        :math:`\widehat{D_z}`    n          n               "Dz_unbiased"
+        :math:`\widehat{\pi_2}`  n          n               "pi2_unbiased"
+        ======================= ========== ================ ==============
+
+        :param list sample_sets: A list, or a list of lists of sample node IDs,
+            specifying the groups of nodes to compute the statistic with. Defaults
+            to all samples.
+        :param str mode: A string giving the "type" of the statistic to be
+            computed. Defaults to "site", can be "site" or "branch".
+        :param str stat: A string giving the selected two-locus statistic to
+            compute. Defaults to "r2".
+        :param list sites: A list of lists of sites over which to compute an
+            LD matrix. Can be specified as a list of lists to control the row
+            and column sites. Only available in "site" mode. Specify as
+            ``[row_sites, col_sites]`` or ``[all_sites]``.
+            Defaults to all sites.
+        :param list positions: A list of lists of genomic positions where
+            expected LD is computed based on tree topologies and branch
+            lengths. Only applicable in "branch" mode. Specify as a list of
+            two lists to control the row and column positions, as
+            ``[row_positions, col_positions]``, or ``[all_positions]``.
+            Defaults to the leftmost coordinates of all trees and computes
+            LD between all pairs of trees.
+        :param list indexes: A list of 2-tuples or a single 2-tuple, specifying
+            the indexes of two sample sets over which to compute a two-way LD
+            statistic. Only :math:`r^2`, :math:`D^2`, and :math:`\widehat{D^2}`
+            are implemented for two-way statistics.
+        :return: A 2D or 3D array of LD matrices.
+        :rtype: numpy.ndarray
+        """
         one_way_stats = {
             "D": self._ll_tree_sequence.D_matrix,
             "D2": self._ll_tree_sequence.D2_matrix,