| Title: | A Driver Discovery Tool for Cancer Whole Genomes |
| Version: | 1.2.1 |
| Description: | A method for finding enrichments of somatic single nucleotide variants (SNVs) and small insertions-deletions (Indels) in functional elements in the human genome. 'ActiveDriverWGS' detects coding and noncoding cancer driver elements using whole genome sequencing data. The method is part of the publication H. Zhu et al. (2020) <doi:10.1016/j.molcel.2019.12.027> "Candidate Cancer Driver Mutations in Distal Regulatory Elements and Long-Range Chromatin Interaction Networks" in Molecular Cell. |
| Depends: | R (≥ 3.5) |
| Imports: | BSgenome, Biostrings, GenomeInfoDb, GenomicRanges, IRanges, S4Vectors |
| License: | GPL-3 |
| Encoding: | UTF-8 |
| LazyData: | true |
| Suggests: | BSgenome.Hsapiens.UCSC.hg19, BSgenome.Hsapiens.UCSC.hg38, BSgenome.Mmusculus.UCSC.mm9, BSgenome.Mmusculus.UCSC.mm10, knitr, testthat, rmarkdown |
| VignetteBuilder: | knitr |
| RoxygenNote: | 7.3.2 |
| NeedsCompilation: | no |
| Packaged: | 2026-03-09 13:24:40 UTC; jreimand |
| Author: | Juri Reimand [aut, cre], Helen Zhu [ctb], Kevin Cheng [ctb] |
| Maintainer: | Juri Reimand <juri.reimand@utoronto.ca> |
| Repository: | CRAN |
| Date/Publication: | 2026-03-13 13:00:08 UTC |
fix_all_results verifies that the results table has the correct format and p-values
Description
fix_all_results verifies that the results table has the correct format and p-values
Usage
.fix_all_results(all_results)
Arguments
all_results |
a data frame containing the following columns
|
Value
the same data frame
This function finds the tri-nucleotide context of mutations
Description
This function finds the tri-nucleotide context of mutations
Usage
.get_3n_context_of_mutations(mutations, this_genome)
Arguments
mutations |
A data frame with the following columns: chr, pos1, pos2, ref, alt, patient
|
this_genome |
The reference genome object of BSgenome, for example BSgenome.Hsapiens.UCSC.hg19::Hsapiens |
Value
A data frame consisting of the same columns as the original mutations data frame and sorted
by SNVs and Indels with an additional column tag which indicates the trinucleotide context of
the mutation
Calculates the number of expected mutations based
Description
Calculates the number of expected mutations based
Usage
.get_obs_exp(hyp, select_positions, dfr, colname)
Arguments
hyp |
hypothesis to be tested |
select_positions |
boolean column which indicates which positions are in the element of interest |
dfr |
a dataframe containing the data to be tested |
colname |
name of the column which indicates the count of mutations in the positions of interest |
Value
a list of observed mutations and expected mutations
Returns significant results
Description
Returns significant results
Usage
.get_signf_results(all_res)
Arguments
all_res |
a data frame containing the following columns
|
Value
the same data frame with three addition columns
- fdr_element
The FDR corrected p-value of the element
- fdr_site
The FDR corrected p-value of the site
- has_site_mutations
A V indicates the presence of site mutations
Makes mutational signatures
Description
Makes mutational signatures
Usage
.make_mut_signatures()
Value
a dataframe with mutational signatures
Splits a BED12 file into separate regions
Description
Splits a BED12 file into separate regions
Usage
.split_coord_fragments_in_BED(i, coords)
Arguments
i |
The i-th row of the coords data frame which needs to be split into separate elements |
coords |
The coords data frame which is the imported BED12 file |
Value
A data frame containing the following columns for a given BED12 identifier
- chr
autosomal chromosomes as chr1 to chr22 and sex chromosomes as chrX and chrY
- start
the start position of the element in base 0 coordinates (BED format)
- end
the end position of the element in base 0 coordinates (BED format)
- id
the element identifier - if the element contains multiple segments such as exons, each segment should be a separate row with the segment coordinates and the element identifier as id. Elements can be coding or noncoding such as exons of protein coding genes or active enhancers.
ADWGS_test executes the statistical test for ActiveDriverWGS
Description
ADWGS_test executes the statistical test for ActiveDriverWGS
Usage
ADWGS_test(
id,
gr_element_coords,
gr_site_coords,
gr_maf,
win_size,
this_genome,
detect_depleted_mutations = FALSE
)
Arguments
id |
A string used to identify the element of interest. |
gr_element_coords |
A GenomicRanges object that describes the elements of interest containing the chromosome, start and end coordinates, and an mcols column corresponding to id |
gr_site_coords |
A GenomicRanges object that describes the sites of interest which reside in the elements of interest containing the chromosome, start and end coordinates, and an mcols column corresponding to id. Examples of sites include transcription factor binding sites in promoter regions or phosphosites in exons of protein coding genes. An empty GenomicRanges object nullifies the requirement for sites to exist. |
gr_maf |
A GenomicRanges object that describes the mutations in the dataset containing the chromosome, start and end coordinates, patient id, and trinucleotide context |
win_size |
An integer indicating the size of the background window in base pairs that is used to establish the expected mutation rate and respective null model. The default is 50000bps |
this_genome |
The reference genome object of BSgenome, for example BSgenome.Hsapiens.UCSC.hg19::Hsapiens |
detect_depleted_mutations |
if TRUE, detect elements with significantly fewer than expected mutations. FALSE by default |
Value
A data frame containing the following columns
- id
A string identifying the element of interest
- pp_element
The p-value of the element
- element_muts_obs
The number of patients with a mutation in the element
- element_muts_exp
The expected number of patients with a mutation in the element with respect to background
- element_enriched
A boolean indicating whether the element is enriched in mutations
- pp_site
The p-value of the site
- site_muts_obs
The number of patients with a mutation in the site
- site_muts_exp
The expected number of patients with a mutation in the site with respect to element
- site_enriched
A boolean indicating whether the site is enriched in mutations
- result_number
A numeric indicator denoting the order in which the results were calculated
- fdr_element
The FDR corrected p-value of the element
- fdr_site
The FDR corrected p-value of the site
- has_site_mutations
A V indicates the presence of site mutations
Examples
library(GenomicRanges)
# Regions
data(cancer_genes)
gr_element_coords = GRanges(seqnames = cancer_genes$chr,
IRanges(start = cancer_genes$start, end = cancer_genes$end),
mcols = cancer_genes$id)
# Sites (NULL)
gr_site_coords = GRanges(c(seqnames=NULL,ranges=NULL,strand=NULL))
# Reference genome
this_genome = BSgenome.Hsapiens.UCSC.hg19::Hsapiens
# Mutations
data(cll_mutations)
cll_mutations = format_muts(cll_mutations, this_genome = this_genome)
gr_maf = GRanges(cll_mutations$chr,
IRanges(cll_mutations$pos1, cll_mutations$pos2),
mcols=cll_mutations[,c("patient", "tag")])
# ADWGS_test
id = "ATM"
result = ADWGS_test(id, gr_element_coords, gr_site_coords, gr_maf,
win_size = 50000, this_genome = this_genome)
ActiveDriverWGS is a driver discovery tool for simple somatic mutations in cancer whole genomes
Description
ActiveDriverWGS is a driver discovery tool for simple somatic mutations in cancer whole genomes
Usage
ActiveDriverWGS(
mutations,
elements,
sites = NULL,
window_size = 50000,
filter_hyper_MB = 30,
recovery.dir = NULL,
mc.cores = 1,
ref_genome = "hg19",
detect_depleted_mutations = FALSE
)
Arguments
mutations |
A data frame containing the following columns: chr, pos1, pos2, ref, alt, patient.
|
elements |
A data frame containing the following columns: chr, start, end, id
|
sites |
A data frame containing the following columns: chr, start, end, id
|
window_size |
An integer indicating the size of the background window in base pairs that is used to establish the expected mutation rate and respective null model. The default is 50000bps |
filter_hyper_MB |
Hyper-mutated samples carry many passenger mutations and dilute the signal of true drivers.
Samples with a rate greater than |
recovery.dir |
The directory for storing recovery files. If the directory does not exist, ActiveDriverWGS will create the directory. If the parameter is unspecified, recovery files will not be saved. As an ActiveDriverWGS query for large datasets may be computationally heavy, specifying a recovery directory will recover previously computed results if a query is interrupted. |
mc.cores |
The number of cores which can be used if multiple cores are available. The default is 1. |
ref_genome |
The reference genome used on the analysis. The default option is "hg19", other options are "hg38", "mm9" and "mm10". |
detect_depleted_mutations |
if TRUE, detect elements with significantly fewer than expected mutations. FALSE by default |
Value
A data frame containing the results of driver discovery containing the following columns: id, pp_element, element_muts_obs, element_muts_exp, element_enriched, pp_site, site_muts_obs, site_muts_exp, site_enriched, fdr_element, fdr_site
- id
A string identifying the element of interest
- pp_element
The p-value of the element
- element_muts_obs
The number of patients with a mutation in the element
- element_muts_exp
The expected number of patients with a mutation in the element with respect to background
- element_enriched
A boolean indicating whether the element is enriched in mutations
- pp_site
The p-value of the site
- site_muts_obs
The number of patients with a mutation in the site
- site_muts_exp
The expected number of patients with a mutation in the site with respect to element
- site_enriched
A boolean indicating whether the site is enriched in mutations
- fdr_element
The FDR corrected p-value of the element
- fdr_site
The FDR corrected p-value of the site
- has_site_mutations
A V indicates the presence of site mutations
Examples
data(cancer_genes)
data(cll_mutations)
some_genes = c("ATM", "MYD88", "NOTCH1", "SF3B1", "XPO1",
"SOCS1", "CNOT3", "DDX3X", "KMT2A", "HIF1A", "APC")
result = ActiveDriverWGS(mutations = cll_mutations,
elements = cancer_genes[cancer_genes$id %in% some_genes,])
post-translational modification sites found in cancer genes
Description
post-translational modification sites found in cancer genes
Usage
data(cancer_gene_sites)
Format
A data frame containing the following columns: chr, start, end, id
- chr
autosomal chromosomes as chr1 to chr22 and sex chromosomes as chrX and chrY
- start
the start position of the site in base 0 coordinates (BED format)
- end
the end position of the site in base 0 coordinates (BED format)
- id
the site identifier - each site should contain only 1 segment and a unique id. If ids are duplicated, each segment of the site will be treated as an individual site. Sites can be coding or noncoding such as phosphosites of protein coding genes in genomic coordinates or transcription factor binding sites of active enhancers.
Source
References
Wadi, Lina, et al. "ActiveDriverDB: human disease mutations and genome variation in post-translational modification sites of proteins." Nucleic Acids Res. (2018): Jan 4;46(D1):D901-D910. (PubMed)
Examples
data(cancer_gene_sites)
data(cll_mutations)
data(cancer_genes)
ActiveDriverWGS(mutations = cll_mutations, elements = cancer_genes, sites = cancer_gene_sites)
cancer_genes
Description
protein coding genes from gencode v.19, cancer genes adapted from the Cancer Gene Census (November, 2018). Genes affected solely by amplifications, deletions and translations were removed.
Usage
data(cancer_genes)
Format
A data frame containing the following columns: chr, start, end, id
- chr
autosomal chromosomes as chr1 to chr22 and sex chromosomes as chrX and chrY
- start
the start position of the element in base 0 coordinates (BED format)
- end
the end position of the element in base 0 coordinates (BED format)
- id
the element identifier - if the element contains multiple segments such as exons, each segment should be a separate row with the segment coordinates and the element identifier as id. Elements can be coding or noncoding such as exons of protein coding genes or active enhancers.
Source
References
Harrow, Jennifer, et al. "GENCODE: the reference human genome annotation for The ENCODE Project." Genome research 22.9 (2012): 1760-1774. (PubMed)
Examples
data(cancer_genes)
data(cll_mutations)
ActiveDriverWGS(mutations = cll_mutations, elements = cancer_genes)
CLL mutations
Description
CLL whole genome simple somatic mutations from Alexandrov et, 2013
Usage
data(cll_mutations)
Format
A data frame containing the following columns: chr, pos1, pos2, ref, alt, patient.
- chr
autosomal chromosomes as chr1 to chr22 and sex chromosomes as chrX and chrY
- pos1
the start position of the mutation in base 1 coordinates
- pos2
the end position of the mutation in base 1 coordinates
- ref
the reference allele as a string containing the bases A, T, C or G
- alt
the alternate allele as a string containing the bases A, T, C or G
- patient
the patient identifier as a string
Source
References
Alexandrov, Ludmil B., et al. "Signatures of mutational processes in human cancer." Nature 500.7463 (2013): 415. (PubMed)
Examples
data(cll_mutations)
data(cancer_genes)
ActiveDriverWGS(mutations = cll_mutations, elements = cancer_genes)
This function filters hypermutated samples and returns the formatted mutations with the appropriate trinucleotide context
Description
This function filters hypermutated samples and returns the formatted mutations with the appropriate trinucleotide context
Usage
format_muts(mutations, this_genome, filter_hyper_MB = NA)
Arguments
mutations |
A data frame with the following columns: chr, pos1, pos2, ref, alt, patient
|
this_genome |
The reference genome object of BSgenome |
filter_hyper_MB |
The number of mutations per megabase for which a sample is considered hypermutated. Hypermutated samples will be removed in further analyses. |
Value
a data frame called mutations which has been formatted with an extra column for trinucleotide context
Examples
data(cll_mutations)
this_genome = BSgenome.Hsapiens.UCSC.hg19::Hsapiens
formatted_mutations = format_muts(cll_mutations[1:10,],
filter_hyper_MB = 30, this_genome = this_genome)
Prepares element coords from a BED12 file
Description
Prepares element coords from a BED12 file
Usage
prepare_elements_from_BED12(fname)
Arguments
fname |
The file name of a BED12 file containing the desired elements. For further documentation on the BED12 format, refer to the UCSC website. |
Value
A data frame containing the following columns to be used as the input element coords to ActiveDriverWGS
- chr
autosomal chromosomes as chr1 to chr22 and sex chromosomes as chrX and chrY
- start
the start position of the element in base 0 coordinates (BED format)
- end
the end position of the element in base 0 coordinates (BED format)
- id
the element identifier - if the element contains multiple segments such as exons, each segment should be a separate row with the segment coordinates and the element identifier as id. Elements can be coding or noncoding such as exons of protein coding genes or active enhancers.
Examples
elements = prepare_elements_from_BED12(system.file("extdata",
"chr17.coding_regions.bed",
package = "ActiveDriverWGS",
mustWork = TRUE))
Prepares element coords from a BED4 file
Description
Prepares element coords from a BED4 file
Usage
prepare_elements_from_BED4(fname)
Arguments
fname |
The file name of a BED4 file containing the desired elements. For further documentation on the BED4 format, refer to the UCSC website. |
Value
A data frame containing the following columns to be used as the input element coords to ActiveDriverWGS
- chr
autosomal chromosomes as chr1 to chr22 and sex chromosomes as chrX and chrY
- start
the start position of the element in base 0 coordinates (BED format)
- end
the end position of the element in base 0 coordinates (BED format)
- id
the element identifier - if the element contains multiple segments such as exons, each segment should be a separate row with the segment coordinates and the element identifier as id. Elements can be coding or noncoding such as exons of protein coding genes or active enhancers.
Examples
elements = prepare_elements_from_BED4(system.file("extdata",
"mini.ptm.bed",
package = "ActiveDriverWGS",
mustWork = TRUE))