inferGenotypeBayesian - Infer a subject-specific genotype using a Bayesian approach
inferGenotypeBayesian infers a subject’s genotype by applying a Bayesian framework
with a Dirichlet prior for the multinomial distribution. Up to four distinct alleles are
allowed in an individual’s genotype. Four likelihood distributions were generated by
empirically fitting three high-coverage genotypes from three individuals
(Laserson and Vigneault et al., 2014). A posterior probability is calculated for the
four most common alleles. The certainty of the highest-probability model is
calculated using a Bayes factor (the likelihood of the most likely model divided by
that of the second-most likely model).
The larger the Bayes factor (K), the greater the certainty in the model.
inferGenotypeBayesian(data, germline_db = NA, novel = NA, v_call = "V_CALL", find_unmutated = TRUE, priors = c(0.6, 0.4, 0.4, 0.35, 0.25, 0.25, 0.25, 0.25, 0.25))
- data: a data.frame containing V allele calls from a single subject. If find_unmutated is TRUE, then the sample IMGT-gapped V(D)J sequence should be provided in a column of data.
- germline_db: named vector of sequences containing the germline sequences named in allele_calls. Only required if find_unmutated is TRUE.
- novel: an optional data.frame of the type novel returned by findNovelAlleles containing germline sequences that will be utilized if find_unmutated is TRUE. See Details.
- v_call: column in data with V allele calls. Default is "V_CALL".
- find_unmutated: if TRUE, use germline_db to find which samples are unmutated. Not needed if allele_calls only represent unmutated samples.
- priors: a numeric vector of priors for the multinomial distribution. The priors vector must contain nine values that define the priors for the heterozygous (two allele), trizygous (three allele), and quadrozygous (four allele) distributions. The first two values of priors define the prior for the heterozygous case, the next three values are for the trizygous case, and the final four values are for the quadrozygous case. Each set of priors should sum to one. Note that each distribution prior is defined internally by a set of four numbers, with the unspecified final values assigned to 0; e.g., the heterozygous case is c(priors[1], priors[2], 0, 0). The prior for the homozygous distribution is fixed at c(1, 0, 0, 0).
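The layout of the priors vector can be illustrated with a short base R sketch. The helper name expand_priors is hypothetical and is not part of the package; it only mirrors the description above and is not the internal implementation.

```r
# Hypothetical helper (not a package function): split the nine-value
# priors vector into the four 4-element priors described above.
expand_priors <- function(priors) {
  stopifnot(is.numeric(priors), length(priors) == 9)
  list(
    homozygous   = c(1, 0, 0, 0),          # fixed, not taken from priors
    heterozygous = c(priors[1:2], 0, 0),   # values 1-2, padded with zeros
    trizygous    = c(priors[3:5], 0),      # values 3-5, padded with a zero
    quadrozygous = priors[6:9]             # values 6-9
  )
}

# Default value of the priors argument from the usage line
default_priors <- c(0.6, 0.4, 0.4, 0.35, 0.25, 0.25, 0.25, 0.25, 0.25)
expanded <- expand_priors(default_priors)

# Each set of priors should sum to one
sums <- vapply(expanded, sum, numeric(1))
print(sums)
```

A check like this before calling inferGenotypeBayesian can catch a custom priors vector whose groups do not each sum to one.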
A data.frame of alleles denoting the genotype of the subject, with the log10
of the likelihood of each model and the log10 of the Bayes factor. The output
contains the following columns:
GENE: The gene name without allele.
ALLELES: Comma-separated list of alleles for the given GENE.
COUNTS: Comma-separated list of observed sequence counts for each corresponding allele in the ALLELES list.
TOTAL: The total count of observed sequences for the given GENE.
NOTE: Any comments on the inference.
KH: log10 likelihood that the GENE is homozygous.
KD: log10 likelihood that the GENE is heterozygous.
KT: log10 likelihood that the GENE is trizygous.
KQ: log10 likelihood that the GENE is quadrozygous.
K_DIFF: log10 ratio of the highest to second-highest zygosity likelihoods.
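Because the likelihoods are reported on the log10 scale, the log10 Bayes factor is the difference between the two largest of KH, KD, KT and KQ. The sketch below recomputes it in base R for three rows copied from the example output at the end of this page. The BEST column and the models lookup are illustrative names, not part of the function's output.

```r
# log10 model likelihoods copied from three rows of the example output
genotype <- data.frame(
  GENE = c("IGHV1-2", "IGHV1-3", "IGHV1-58"),
  KH = c(-1000, 4.20089197988625, -20.3932114156223),
  KD = c(-7.92846809405969, -45.2911957825576, 3.60009261357983),
  KT = c(-139.556367176944, -84.2865868763307, -1.38512929425796),
  KQ = c(-313.583949130729, -128.991761853586, -8.47869574581951),
  stringsAsFactors = FALSE
)

# Illustrative lookup from column name to zygosity model
models <- c(KH = "homozygous", KD = "heterozygous",
            KT = "trizygous", KQ = "quadrozygous")
lik <- as.matrix(genotype[, names(models)])

# Most likely zygosity model per gene (hypothetical BEST column)
genotype$BEST <- unname(models[max.col(lik, ties.method = "first")])

# A ratio of likelihoods becomes a difference on the log10 scale
genotype$K_DIFF <- apply(lik, 1, function(x) {
  top <- sort(x, decreasing = TRUE)
  unname(top[1] - top[2])
})

print(genotype[, c("GENE", "BEST", "K_DIFF")])
```

The recomputed K_DIFF values can be compared against the K_DIFF column of the example output for the same genes.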
Allele calls in which multiple alleles have been assigned to a single
sample sequence are rare among unmutated sequences, but may occur
if nucleotides at certain positions are not available. Calls
containing multiple alleles are treated as belonging to all groups.
If novel is provided, all sequences that are assigned to the same
starting allele as any novel germline allele will have the novel
germline allele appended to their assignment prior to searching for
unmutated sequences.
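One reading of "treated as belonging to all groups" is that an ambiguous call contributes to the count of every allele it lists. The base R sketch below shows that reading on made-up calls. It is an assumption for illustration, not the package's internal code, and the calls vector is hypothetical.

```r
# Hypothetical allele calls for unmutated sequences; the second is ambiguous
calls <- c("IGHV1-2*02", "IGHV1-2*02,IGHV1-2*04", "IGHV1-2*04", "IGHV1-2*02")

# Split each call on commas so every listed allele is counted once per sequence
alleles <- unlist(strsplit(calls, ",", fixed = TRUE))
counts <- table(alleles)
print(counts)
```

Under this reading, the per-allele counts can sum to more than the number of sequences, because the ambiguous sequence is counted for both alleles.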
This method works best with data derived from blood, where a large portion of sequences are expected to be unmutated. Ideally, there should be hundreds of allele calls per gene in the input.
- Laserson U and Vigneault F, et al. High-resolution antibody dynamics of vaccine-induced immune responses. PNAS. 2014 111(13):4928-33.
# Infer IGHV genotype, using only unmutated sequences, including novel alleles
inferGenotypeBayesian(SampleDb, germline_db=GermlineIGHV, novel=SampleNovel, find_unmutated=TRUE)
        GENE     ALLELES         COUNTS TOTAL NOTE                KH
1    IGHV1-2       02,04        664,302   966                  -1000
2    IGHV1-3          01            226   226       4.20089197988625
3    IGHV1-8 01,02_G234T        467,370   837                  -1000
4   IGHV1-18          01           1005  1005      -3.76643736033536
5   IGHV1-24          01            105   105       4.75335701924247
6   IGHV1-46          01            624   624      0.457455409315221
7   IGHV1-58       01,02          23,18    41      -20.3932114156223
8   IGHV1-69 01,04,06,02 515,469,280,15  1279                  -1000
9 IGHV1-69-2          01             31    31       4.16107190423977
                 KD                KT                KQ           K_DIFF
1 -7.92846809405969 -139.556367176944 -313.583949130729 131.627899082884
2 -45.2911957825576 -84.2865868763307 -128.991761853586 49.4920877624439
3 -1.04759115960507 -102.524664723923 -247.193958844361 101.477073564318
4  -223.85293382607             -1000             -1000 220.086496465735
5 -18.2407545518045 -36.3580822723628 -57.1281856909991 22.9941115710469
6 -136.193264784335 -243.861955237939             -1000  136.65072019365
7  3.60009261357983 -1.38512929425796 -8.47869574581951  4.9852219078378
8 -277.291087469703  3.55051520054669 -143.380669247128 146.931184447674
9 -2.62766579768837 -7.97659112471034 -14.1087168959268 6.78873770192814
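The comma-separated ALLELES and COUNTS columns are compact but awkward to filter or plot. As a minimal sketch, the base R code below unpacks two rows copied from the example output into one row per allele. The long data.frame and its ALLELE, COUNT and FRACTION columns are illustrative names, not something the function returns.

```r
# Two rows copied from the example output above
genotype <- data.frame(
  GENE = c("IGHV1-2", "IGHV1-69"),
  ALLELES = c("02,04", "01,04,06,02"),
  COUNTS = c("664,302", "515,469,280,15"),
  TOTAL = c(966, 1279),
  stringsAsFactors = FALSE
)

# Split the comma-separated strings into per-gene vectors
allele_list <- strsplit(genotype$ALLELES, ",", fixed = TRUE)
count_list <- lapply(strsplit(genotype$COUNTS, ",", fixed = TRUE), as.numeric)

# One row per gene/allele pair (illustrative column names)
long <- data.frame(
  GENE = rep(genotype$GENE, lengths(allele_list)),
  ALLELE = unlist(allele_list),
  COUNT = unlist(count_list),
  stringsAsFactors = FALSE
)

# Fraction of the gene's observed sequences assigned to each allele
long$FRACTION <- long$COUNT / rep(genotype$TOTAL, lengths(allele_list))
print(long)
```

In the long form, alleles supported by few sequences, such as IGHV1-69 allele 02 with 15 of 1279 sequences, are easy to spot.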