| Type: | Package |
| Title: | Explain Gene Expression with Boolean Rules of Chromatin States |
| Version: | 0.1.0 |
| Description: | Infers Boolean rules among cis-regulatory regions using paired chromatin accessibility and gene expression data at bulk and single-cell levels. Links regulatory regions to target genes, providing insights into gene regulation mechanisms. |
| License: | GPL-3 |
| URL: | https://github.com/CompBioIPM/ocrRBBR |
| Depends: | R (≥ 4.4) |
| Imports: | doParallel, dplyr, foreach, GenomeInfoDb, GenomicRanges, glmnet, Matrix, parallel, rtracklayer, S4Vectors, stats, utils |
| Config/testthat/edition: | 3 |
| Encoding: | UTF-8 |
| LazyData: | true |
| LazyDataCompression: | xz |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | no |
| Packaged: | 2026-02-05 18:55:34 UTC; seyed_000 |
| Author: | Seyed Amir Malekpour [aut, cre] |
| Maintainer: | Seyed Amir Malekpour <a.malekpour@ipm.ir> |
| Repository: | CRAN |
| Date/Publication: | 2026-02-09 13:50:07 UTC |
Estimate Effective Sample Size from Single-Cell RNA-seq Data
Description
This function estimates the effective sample size (ESS) of single-cell RNA-seq data by accounting for correlation among cells within the same cell type.
Usage
ESS(rnaseq_data, cell_type, verbose = FALSE)
Arguments
rnaseq_data |
A numeric matrix of RNA-seq expression values.
Rows correspond to genes and columns correspond to cells.
Expression values are assumed to be normalized (e.g., |
cell_type |
A data frame containing cell-type information.
Row names must match the column names of |
verbose |
Logical. If TRUE, progress messages are printed. |
Value
A numeric value representing the effective sample size (ESS), adjusted for within-cell-type correlation.
Human ATAC-seq data
Description
Human ATAC-seq data
Usage
human_atacseq_data
Format
An object of class dgCMatrix with 43 rows and 9834 columns.
Human metadata
Description
Human metadata
Usage
human_meta_data
Format
An object of class matrix (inherits from array) with 9834 rows and 3 columns.
Human peaks GRanges
Description
Human peaks GRanges
Usage
human_peaks_gr
Format
An object of class GRanges of length 108377.
Human RNA-seq data
Description
Human RNA-seq data
Usage
human_rnaseq_data
Format
An object of class dgCMatrix with 2 rows and 9834 columns.
Identifies ATAC-seq peaks surrounding each gene
Description
This function identifies ATAC-seq peaks that are located within a specified window around the transcription start sites (TSS) of genes, and links those peaks to the respective genes.
Usage
link_peaks_to_tss(gtf_file, peaks_gr, gene_list = NA, tss_window = NA)
Arguments
gtf_file |
A character string specifying the path to a GTF file containing gene annotations. |
peaks_gr |
A |
gene_list |
A character vector of gene names to filter for. Only peaks within the window around TSS of these genes will be considered. Default is |
tss_window |
An integer specifying the window size around the TSS. Default is |
Value
A data frame containing the following columns:
peak |
The genomic coordinates of the peaks. |
peak_id |
The unique ID of each peak. |
gene_id |
The gene ID associated with the peak. |
gene_name |
The gene name associated with the peak. |
transcript_id |
The transcript ID associated with the peak. |
gene_type |
The type of the gene (e.g., protein-coding). |
distance |
The distance from the peak center to the TSS. |
Examples
# Load bulk mouse dataset
data(multiome_human_mouse) # This will load atacseq_data, rnaseq_data, peaks_gr
# Path to the GTF file in the package
gtf_file <- system.file("extdata", "gencode.vM25.annotation.sample.gtf", package = "ocrRBBR")
# Example usage for linking peaks to TSS
linked_peaks <- link_peaks_to_tss(
gtf_file = gtf_file,
peaks_gr = mouse_peaks_gr,
gene_list = c("Rag2"),
tss_window = 100000
)
# Filter results for a specific gene
linked_peaks_gene <- linked_peaks[linked_peaks$gene_name == "Rag2", ]
Mouse ATAC-seq data
Description
Mouse ATAC-seq data
Usage
mouse_atacseq_data
Format
An object of class data.frame with 23 rows and 85 columns.
Mouse peaks GRanges
Description
Mouse peaks GRanges
Usage
mouse_peaks_gr
Format
An object of class GRanges of length 512595.
Mouse RNA-seq data
Description
Mouse RNA-seq data
Usage
mouse_rnaseq_data
Format
An object of class data.frame with 2 rows and 85 columns.
Predicts OCR-driven Boolean rules for a gene
Description
This function predicts Boolean rule sets for a given gene using bulk-level multi-omics datasets, including RNA-seq gene expression and ATAC-seq peak signals in the gene's flanking regions, across samples.
Usage
ocrRBBR_bulk(
rnaseq_data,
atacseq_data,
gene_name,
peak_ids,
max_feature = NA,
slope = NA,
num_cores = NA,
verbose = FALSE
)
Arguments
rnaseq_data |
A numeric matrix of RNA-seq expression values. Rows correspond to genes, columns correspond to cell types or samples.
**Note:** |
atacseq_data |
A numeric matrix of ATAC-seq signal intensities. Rows correspond to peaks, columns correspond to cell types or samples. Column names must match those of |
gene_name |
A character string specifying the gene for which to infer Boolean rules. |
peak_ids |
A vector of peak identifiers corresponding to rows in |
max_feature |
An integer specifying the maximum number of input features allowed in a Boolean rule. Default is 3. |
slope |
The slope parameter for the sigmoid activation function. Default is 10. |
num_cores |
Number of parallel workers to use for computation. Adjust according to your system. Default is NA (automatic selection). |
verbose |
Logical. If TRUE, progress messages and a progress bar are shown. Default is FALSE. |
Value
A list containing predicted Boolean rules and associated metrics for the input gene.
Examples
# Load bulk mouse dataset
data(multiome_human_mouse) # loads atacseq_data, rnaseq_data, peaks_gr
# Example usage:
peak_ids <- c(278352, 278362, 278381, 278384)
boolean_rules <- ocrRBBR_bulk(mouse_rnaseq_data, mouse_atacseq_data, "Rag2",
peak_ids = peak_ids, max_feature = 3, slope = 10, num_cores = 1)
Predicts OCR-driven Boolean rules for a gene
Description
This function predicts Boolean rule sets for a given gene using single-cell multi-omics datasets, including RNA-seq gene expression and ATAC-seq peak signals in the gene's flanking regions, across samples.
Usage
ocrRBBR_single_cell(
rnaseq_data,
atacseq_data,
gene_name,
peak_ids,
max_feature = NA,
slope = NA,
num_cores = NA,
ESS = NA,
meta_data,
verbose = FALSE
)
Arguments
rnaseq_data |
A numeric matrix of RNA-seq expression values. Rows correspond to genes, columns
correspond to cells or samples. RNA-seq values are assumed to be normalized using Seurat's
|
atacseq_data |
A numeric matrix of ATAC-seq signal intensities. Rows correspond to peaks,
columns correspond to cells or samples. ATAC-seq counts are assumed to be normalized
by the
|
gene_name |
A character string specifying the gene for which to infer Boolean rules. |
peak_ids |
A vector of peak identifiers corresponding to rows in |
max_feature |
An integer specifying the maximum number of input features allowed in a Boolean rule. Default is 3. |
slope |
The slope parameter for the sigmoid activation function. Default is 10. |
num_cores |
Number of parallel workers to use for computation. Adjust according to your system. Default is NA. |
ESS |
Effective sample size of the single-cell data after accounting for noise and cell-to-cell correlation. |
meta_data |
A numeric matrix or data.frame containing additional per-cell covariates (rows = cells, columns = covariates), such as: **nCount_RNA**: The total number of RNA molecules (unique molecular identifiers, UMIs) detected per cell, calculated as the sum of UMI counts across all genes. **nFeature_RNA**: The number of genes detected per cell (genes with at least one UMI). **Mitochondrial percentage**: The percentage of reads that map to mitochondrial genes, which can be used to assess the quality of the sample. This information is typically stored as columns in the |
verbose |
Logical. If TRUE, progress messages and a progress bar are shown. Default is FALSE. |
Value
A list containing predicted Boolean rules and associated metrics for the input gene.
Examples
# Load single-cell human dataset
data(multiome_human_mouse) # loads atacseq_data, rnaseq_data, peaks_gr, meta_data
# Example usage:
peak_ids <- c(83456, 83458, 83460)
boolean_rules <- ocrRBBR_single_cell(human_rnaseq_data, human_atacseq_data,
"CD74", peak_ids = peak_ids, max_feature = 3, slope = 6,
num_cores = 1, ESS = 261, meta_data = human_meta_data)