getPopularMutationCount - Find mutation counts for frequent sequences

Description

getPopularMutationCount determines which sequences occur frequently for each V gene and returns the mutation count of those sequences.

Usage

getPopularMutationCount(data, germline_db, v_call = "V_CALL",
seq = "SEQUENCE_IMGT", gene_min = 0.001, seq_min = 50,
seq_p_of_max = 1/8, full_return = FALSE)

Arguments

data
a data.frame in the Change-O format. See findNovelAlleles for a list of required columns.
germline_db
A named list of IMGT-gapped germline sequences.
v_call
name of the column in data with V allele calls. Default is V_CALL.
seq
name of the column in data with the aligned, IMGT-numbered, V(D)J nucleotide sequence. Default is SEQUENCE_IMGT.
gene_min
The fraction of all unique sequences that a gene must constitute to avoid exclusion. Default is 0.001.
seq_min
The number of copies of a V sequence that must be present to avoid exclusion. Default is 50.
seq_p_of_max
For each gene, the fraction of the most common V sequence’s count that a sequence must meet to avoid exclusion. Default is 1/8.
full_return
If TRUE, will return all data columns and will include sequences with mutation count < 1.
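
The three exclusion thresholds can be easier to follow on a toy table. The following is a minimal base-R sketch of how gene_min, seq_min and seq_p_of_max might combine, based only on the argument descriptions above; it is not the package's implementation. The data frame toy and its columns gene, seq_id and copies are hypothetical names invented for illustration.

```r
# Hypothetical toy table: one row per unique V sequence, with the gene it
# belongs to and how many times it was observed. All values are made up.
toy <- data.frame(
  gene   = c("IGHV1-8", "IGHV1-8", "IGHV1-8", "IGHV3-23", "IGHV3-23", "IGHV4-34"),
  seq_id = c("a", "b", "c", "d", "e", "f"),
  copies = c(1600, 250, 100, 60, 20, 1),
  stringsAsFactors = FALSE
)

# Thresholds set to the defaults shown in Usage.
gene_min     <- 0.001
seq_min      <- 50
seq_p_of_max <- 1/8

# Per-gene totals and per-gene maximum copy count, repeated on every row.
gene_total <- ave(toy$copies, toy$gene, FUN = sum)
gene_max   <- ave(toy$copies, toy$gene, FUN = max)

# Assumed reading of each threshold:
keep_gene <- gene_total >= sum(toy$copies) * gene_min  # gene is not too rare
keep_min  <- toy$copies >= seq_min                     # sequence has enough copies
keep_frac <- toy$copies / gene_max >= seq_p_of_max     # close enough to the gene's top sequence

popular <- toy[keep_gene & keep_min & keep_frac, ]
print(popular)
```

Under these assumptions, sequence f is dropped because its gene is too rare, e because it has fewer than seq_min copies, and c because it falls below one eighth of the most common IGHV1-8 sequence's count; a, b and d remain.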

Value

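A data frame of genes that have a frequent sequence with a mutation count of at least 1, along with that mutation count. If full_return is TRUE, all data columns are returned and sequences with a mutation count below 1 are also included.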

Examples

getPopularMutationCount(SampleDb, SampleGermlineIGHV)
# A tibble: 1 x 2
  V_GENE  MUTATION_COUNT
  <chr>            <int>
1 IGHV1-8              1

See also

getMutatedPositions can be used to find which positions of a set of sequences are mutated.