-
Notifications
You must be signed in to change notification settings - Fork 2
Description
This is a series of issues that resurfaced when applying my code to the 10k PBMC dataset.
Least absolute root produces wrong quantile for low genes
BigSur has been using a third-order Cornish-Fisher expansion to approximate the null distribution. We can see they added this back in Aug 2023. I followed their footsteps and added a fourth order by the Handbook (p. 935).
However, the resulting quantile function happened to curve back downward and intersect the abscissa at multiple roots, especially with lowly expressed genes. In BigSurR, this was handled by taking the least absolute root. And so was my implementation.
BigSurR/R/mcFanoCornishFisher.R
Line 47 in bc499af
| root <- ifelse( !all(is.na(re.roots)), min(abs(re.roots), na.rm=T), NA) |
But we will eventually find some genes with a paradoxical outcome. Here, FAM138A's mcFano is lower than 1, yet its quantile becomes a positive number. Indeed, it has multiple roots, and some are negative that fit the description.
r$> df
names gene_means cv mcFano quantile p_val padj_BH
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 FAM138A 0.0000834 0.563 0.877 0.429 0.334 0.557julia> roots
[Root((0.429331084949 ± 2.88055213016e-9)_com, :unique),
Root((-0.867559672201 ± 4.98239438596e-9)_com, :unique),
Root((1.8693150774 ± 4.70093475258e-9)_com, :unique),
Root((-2.30779528824 ± 3.63892738164e-9)_com, :unique),
Root((4.84028938378 ± 1.32585293855e-8)_com, :unique)]We also note this gene has a (mean, sd) of (1, 1.19) for the null distribution of its modified Fano factor. Generally, in a lower expressed gene, the uncertainty in this estimate is higher.
source data of mean and sd of null distribution
Row │ names gene_means mcFano null_mu null_sd
│ String Interval… Interval… Interval… Interval…
───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ FAM138A 8.33611e-5 ± 1.35525e-20 0.876804 ± 6.04228e-12 1.00008 ± 1.15548e-11 1.18915 ± 2.18447e-11
2 │ AL627309.1 0.00266756 ± 4.33681e-19 0.979526 ± 1.84007e-10 1.00008 ± 3.68553e-10 0.210901 ± 1.24896e-10
3 │ AL627309.3 0.000583528 ± 1.0842e-19 1.1019 ± 4.08202e-11 1.00008 ± 8.07097e-11 0.44974 ± 5.78869e-11
4 │ AL627309.5 0.0507669 ± 6.93889e-18 0.969974 ± 3.56315e-9 1.00008 ± 6.87926e-9 0.0512242 ± 6.57085e-10
5 │ AL627309.4 0.00116706 ± 2.1684e-19 1.06593 ± 9.25633e-11 1.00008 ± 1.61355e-10 0.318249 ± 8.20711e-11
6 │ AL669831.2 0.000666889 ± 1.0842e-19 0.728413 ± 4.63957e-11 1.00008 ± 9.22331e-11 0.420738 ± 6.19056e-11
7 │ LINC01409 0.0775258 ± 1.38778e-17 1.04155 ± 5.37956e-9 1.00008 ± 1.03954e-8 0.0427154 ± 8.86665e-10
8 │ FAM87B 0.000916972 ± 1.0842e-19 0.815408 ± 7.5449e-11 1.00008 ± 1.26798e-10 0.35892 ± 7.26688e-11
9 │ LINC01128 0.0614371 ± 6.93889e-18 1.28693 ± 4.84178e-9 1.00008 ± 8.29015e-9 0.0471329 ± 7.49545e-10
10 │ LINC00115 0.011087 ± 1.73472e-18 1.04205 ± 7.96238e-10 1.00008 ± 1.52648e-9 0.104544 ± 2.63956e-10This suggests that we may not just take the least absolute root as the answer, but their sign and magnitude must also match the deviation of modified Fano factor to make sense.
But actually, this was only a symptom of the problem, and the real cause lies in the validity of the Cornish-Fisher expansion methodology.