Skip to content

Validity and correction of Cornish-Fisher expansion, adapting Maillard and colleagues' work. #7

@hsuknowledge

Description

@hsuknowledge

This is a series of issues that resurfaced when applying my code to the 10k PBMC dataset.

Least absolute root produces wrong quantile for low genes

BigSur has been using a third-order Cornish-Fisher expansion to approximate the null distribution. We can see they added this back in Aug 2023. I followed their footsteps and added a fourth order by the Handbook (p. 935).

However, the resulting quantile function happened to curve back downward and intersect the abscissa at multiple roots, especially with lowly expressed genes. In BigSurR, this was handled by taking the least absolute root. And so was my implementation.

root <- ifelse( !all(is.na(re.roots)), min(abs(re.roots), na.rm=T), NA)

But we will eventually find some genes with a paradoxical outcome. Here, FAM138A's mcFano is lower than 1, yet its quantile becomes a positive number. Indeed, it has multiple roots, and some are negative that fit the description.

r$> df
   names      gene_means     cv mcFano quantile  p_val  padj_BH
   <chr>           <dbl>  <dbl>  <dbl>    <dbl>  <dbl>    <dbl>
 1 FAM138A     0.0000834  0.563  0.877    0.429  0.334    0.557
julia> roots
[Root((0.429331084949 ± 2.88055213016e-9)_com, :unique),
Root((-0.867559672201 ± 4.98239438596e-9)_com, :unique),
Root((1.8693150774 ± 4.70093475258e-9)_com, :unique),
Root((-2.30779528824 ± 3.63892738164e-9)_com, :unique),
Root((4.84028938378 ± 1.32585293855e-8)_com, :unique)]

We also note this gene has a (mean, sd) of (1, 1.19) for the null distribution of its modified Fano factor. Generally, in a lower expressed gene, the uncertainty in this estimate is higher.

source data of mean and sd of null distribution
   Row │ names       gene_means                 mcFano                  null_mu                null_sd
       │ String      Interval                  Interval               Interval              Interval
───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────
     1 │ FAM138A     8.33611e-5 ± 1.35525e-20   0.876804 ± 6.04228e-12  1.00008 ± 1.15548e-11  1.18915 ± 2.18447e-11
     2 │ AL627309.1  0.00266756 ± 4.33681e-19   0.979526 ± 1.84007e-10  1.00008 ± 3.68553e-10  0.210901 ± 1.24896e-10
     3 │ AL627309.3  0.000583528 ± 1.0842e-19   1.1019 ± 4.08202e-11    1.00008 ± 8.07097e-11  0.44974 ± 5.78869e-11
     4 │ AL627309.5  0.0507669 ± 6.93889e-18    0.969974 ± 3.56315e-9   1.00008 ± 6.87926e-9   0.0512242 ± 6.57085e-10
     5 │ AL627309.4  0.00116706 ± 2.1684e-19    1.06593 ± 9.25633e-11   1.00008 ± 1.61355e-10  0.318249 ± 8.20711e-11
     6 │ AL669831.2  0.000666889 ± 1.0842e-19   0.728413 ± 4.63957e-11  1.00008 ± 9.22331e-11  0.420738 ± 6.19056e-11
     7 │ LINC01409   0.0775258 ± 1.38778e-17    1.04155 ± 5.37956e-9    1.00008 ± 1.03954e-8   0.0427154 ± 8.86665e-10
     8 │ FAM87B      0.000916972 ± 1.0842e-19   0.815408 ± 7.5449e-11   1.00008 ± 1.26798e-10  0.35892 ± 7.26688e-11
     9 │ LINC01128   0.0614371 ± 6.93889e-18    1.28693 ± 4.84178e-9    1.00008 ± 8.29015e-9   0.0471329 ± 7.49545e-10
    10 │ LINC00115   0.011087 ± 1.73472e-18     1.04205 ± 7.96238e-10   1.00008 ± 1.52648e-9   0.104544 ± 2.63956e-10

This suggests that we may not just take the least absolute root as the answer, but their sign and magnitude must also match the deviation of modified Fano factor to make sense.

But actually, this was only a symptom of the problem, and the real cause lies in the validity of the Cornish-Fisher expansion methodology.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions