Type: Package
Title: Base R Code Formatter
Version: 0.1.0
Description: A minimal R code formatter following base R style conventions. Formats R code with consistent spacing, indentation, and structure.
License: GPL-3
URL: https://github.com/cornball-ai/rformat
BugReports: https://github.com/cornball-ai/rformat/issues
Encoding: UTF-8
Depends: R (>= 4.1.0)
Imports: Rcpp
LinkingTo: Rcpp
Suggests: tinytest, simplermarkdown
VignetteBuilder: simplermarkdown
NeedsCompilation: yes
Packaged: 2026-03-05 02:46:29 UTC; troy
Author: Troy Hernandez [aut, cre], Dirk Eddelbuettel [ctb]
Maintainer: Troy Hernandez <troy@cornball.ai>
Repository: CRAN
Date/Publication: 2026-03-09 16:30:09 UTC

Add Control Braces (AST Version)

Description

Finds bare control flow bodies (if/for/while/repeat without braces) and transforms them according to the specified mode. Modes:

- 'TRUE' / '"single"': Add braces; keep on one line if short enough.
- '"multi"': Add braces; force multi-line.
- '"next_line"': Move a same-line body to the next line (no braces).
- '"same_line"': Move a next-line body to the same line; strip braces around single-statement bodies.

Usage

add_control_braces(terms, mode = "single", indent_str = "    ", line_limit = 80L)

Arguments

terms

Enriched terminal DataFrame.

mode

Control brace mode: 'TRUE' or '"single"' (default), '"multi"', '"next_line"', or '"same_line"'.

indent_str

Indent string (for line width calculations).

line_limit

Maximum line width.

Value

Updated DataFrame.


Compute Display Width of an Output Line

Description

Sums token text widths plus inter-token spaces for a given output line.

Usage

ast_line_width(terms, line_num, indent_str)

Arguments

terms

Enriched terminal DataFrame (sorted by out_line, out_order).

line_num

The output line number.

indent_str

Indent string (e.g., '"    "' for 4 spaces).

Value

Display width of the line.
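
Examples

A minimal sketch of the width calculation described above, not the package's implementation. The 'indent_level' argument and the 'space_before' column are hypothetical stand-ins for whatever indent and spacing state the real DataFrame carries.

```r
# Toy token table for the single output line "f(x, y)".
# 'space_before' is an assumed column marking tokens preceded by one space.
terms <- data.frame(
    out_line = 1L,
    out_text = c("f", "(", "x", ",", "y", ")"),
    space_before = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE),
    stringsAsFactors = FALSE
)

line_width_sketch <- function(terms, line_num, indent_str, indent_level = 0L) {
    rows <- which(terms$out_line == line_num)
    nchar(indent_str) * indent_level +        # leading indentation
        sum(nchar(terms$out_text[rows])) +    # token text
        sum(terms$space_before[rows])         # inter-token spaces
}

w0 <- line_width_sketch(terms, 1L, "    ")                     # no indent
w1 <- line_width_sketch(terms, 1L, "    ", indent_level = 1L)  # one level
```

Here 'w0' counts six token characters plus one space, and 'w1' adds one 4-space indent on top of that.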


Build Line Index

Description

Creates a named list mapping output line numbers to row indices in the terms DataFrame. Avoids repeated 'which(terms$out_line == ln)' scans.

Usage

build_line_index(terms)

Arguments

terms

Enriched terminal DataFrame.

Value

Named list where names are line numbers (as strings) and values are integer vectors of row indices.
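
Examples

The index described above can be approximated in base R with 'split()'. This is a sketch under the assumption that 'terms' has an integer 'out_line' column; the helper names ending in '_sketch' are hypothetical and are not the package's functions.

```r
# Toy stand-in for the enriched terminal DataFrame.
terms <- data.frame(
    out_line = c(1L, 1L, 2L, 4L, 4L, 4L),
    out_text = c("x", "<-", "1", "f", "(", ")"),
    stringsAsFactors = FALSE
)

# One pass over the rows: names are line numbers as strings,
# values are integer vectors of row indices.
build_index_sketch <- function(terms) {
    split(seq_len(nrow(terms)), terms$out_line)
}

# Lookup by name; lines with no tokens yield integer(0).
lookup_sketch <- function(lidx, line_num) {
    rows <- lidx[[as.character(line_num)]]
    if (is.null(rows)) integer(0) else rows
}

lidx <- build_index_sketch(terms)
names(lidx)              # "1" "2" "4"
lookup_sketch(lidx, 4L)  # 4 5 6
lookup_sketch(lidx, 3L)  # integer(0)
```

Each lookup is a single named-list access instead of a full scan of 'terms$out_line'.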


Get Tab-Expanded Line Length

Description

Returns the display width of a line, with tabs expanded to 8-column stops.

Usage

code_width(line)

Arguments

line

A single line of text.

Value

Display width of the line.
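
Examples

A minimal sketch of tab expansion to 8-column stops, assuming every non-tab character occupies exactly one column. The function name is hypothetical and the package's own implementation may differ.

```r
code_width_sketch <- function(line, tab_stop = 8L) {
    chars <- strsplit(line, "", fixed = TRUE)[[1]]
    width <- 0L
    for (ch in chars) {
        if (ch == "\t") {
            # Jump to the next multiple of tab_stop.
            width <- (width %/% tab_stop + 1L) * tab_stop
        } else {
            width <- width + 1L
        }
    }
    width
}

code_width_sketch("x <- 1")  # 6
code_width_sketch("\tx")     # 9
code_width_sketch("ab\tc")   # 9
```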


Convert Tab-Expanded Column to Character Position

Description

R's getParseData() reports columns with tabs expanded to 8-column tab stops. This function converts such a column back to a character position for use with substring().

Usage

col_to_charpos(line, col)

Arguments

line

A single line of text.

col

Tab-expanded column position (1-based).

Value

Character position (1-based) in the string.
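
Examples

The conversion can be sketched by walking the line and tracking the tab-expanded column until the requested column is reached. This is an illustration with a hypothetical function name, assuming one column per non-tab character; it is not the package's implementation.

```r
col_to_charpos_sketch <- function(line, col, tab_stop = 8L) {
    chars <- strsplit(line, "", fixed = TRUE)[[1]]
    width <- 0L
    for (i in seq_along(chars)) {
        if (chars[i] == "\t") {
            width <- (width %/% tab_stop + 1L) * tab_stop
        } else {
            width <- width + 1L
        }
        # The first character whose expanded extent covers 'col' is the answer.
        if (width >= col) {
            return(i)
        }
    }
    length(chars)
}

line <- "\tx <- 1"
pos <- col_to_charpos_sketch(line, 9L)  # column 9 is the character after the tab
substring(line, pos, pos)               # "x"
```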


Collapse Multi-Line Calls (AST Version)

Description

Finds multi-line parenthesized groups (function calls, control flow conditions) that would fit on one line and collapses them by setting all tokens' 'out_line' to the opening line.

Usage

collapse_calls(terms, indent_str, line_limit = 80L)

Arguments

terms

Enriched terminal DataFrame.

indent_str

Indent string.

line_limit

Maximum line length.

Value

Updated DataFrame.


Compute Indent at a Column Position

Description

Walks tokens on a line up to a given column, tracking braces and parens exactly as compute_nesting does. Returns the indent level that a hypothetical continuation line would receive.

Usage

compute_indent_at_col(nesting, line_toks, line_num, break_col)

Arguments

nesting

Result from compute_nesting().

line_toks

Tokens on the line.

line_num

Line number.

break_col

Column position to stop at (inclusive).

Value

Integer indent level.


Compute Nesting Depth Per Line

Description

Shared function used by 'format_tokens' and wrap passes to compute identical depth-based indent levels from the parse tree.

Usage

compute_nesting(terminals, n_lines)

Arguments

terminals

Terminal token data frame from 'getParseData()', ordered by 'line1, col1'.

n_lines

Number of source lines.

Value

Named list with 'line_indent', 'line_end_brace', 'line_end_paren', 'line_end_pab' (all integer vectors of length 'n_lines').
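
Examples

A simplified sketch of depth-based indentation computed from 'getParseData()' terminals. It tracks braces only and returns just a per-line indent vector; the paren and 'pab' bookkeeping listed above is omitted, so this is an illustration of the idea rather than the package's algorithm.

```r
code <- "f <- function(x) {\n    if (x) {\n        1\n    }\n}"
pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
terminals <- pd[pd$terminal, ]
terminals <- terminals[order(terminals$line1, terminals$col1), ]
n_lines <- length(strsplit(code, "\n", fixed = TRUE)[[1]])

line_indent <- integer(n_lines)
seen <- logical(n_lines)
depth <- 0L
for (i in seq_len(nrow(terminals))) {
    tok <- terminals$token[i]
    ln <- terminals$line1[i]
    # A closing brace outdents before its own line is recorded.
    if (tok == "'}'") depth <- depth - 1L
    if (!seen[ln]) {
        line_indent[ln] <- depth
        seen[ln] <- TRUE
    }
    # An opening brace indents everything after it.
    if (tok == "'{'") depth <- depth + 1L
}

line_indent  # 0 1 2 1 0
```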


Enrich Terminal Tokens for AST-Based Formatting

Description

Parses code and returns an enriched terminal-token DataFrame with per-token nesting state and output metadata. This is the foundation of the parse-once architecture: parse once, enrich once, transform the DataFrame through all passes, serialize to text once at the end.

Usage

enrich_terminals(pd, orig_lines)

Arguments

pd

Parse data from 'getParseData()'.

orig_lines

Original source lines (split by newline).

Value

Enriched terminal-token DataFrame with added columns: 'out_line', 'out_order', 'out_text', 'brace_depth', 'paren_depth', 'pab', 'nesting_level'.
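
Examples

The parse-once idea can be sketched in base R: parse, keep the terminal tokens, attach output columns, and serialize at the end. Only three of the added columns are shown, and the serializer below naively puts one space between every pair of tokens, which is an assumption for illustration and not the package's spacing logic.

```r
code <- "x<-1+2\ny <- x"

# Parse once.
pd <- utils::getParseData(parse(text = code, keep.source = TRUE))

# Enrich once: keep terminals in source order and add output metadata.
terms <- pd[pd$terminal, ]
terms <- terms[order(terms$line1, terms$col1), ]
terms$out_line <- terms$line1
terms$out_order <- seq_len(nrow(terms))
terms$out_text <- terms$text

# Transform passes would edit out_line / out_order / out_text here.

# Serialize once: emit tokens per output line.
lines <- vapply(split(terms$out_text, terms$out_line), paste,
                character(1), collapse = " ")
unname(lines)  # "x <- 1 + 2" "y <- x"
```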


Expand Bare If-Else in Function Call Arguments (AST Version)

Description

Finds bare 'if (cond) expr else expr' arguments inside function calls on overlong lines and expands them to braced multi-line form.

Usage

expand_call_if_args(terms, indent_str = "    ", line_limit = 80L)

Arguments

terms

Enriched terminal DataFrame.

indent_str

Indent string.

line_limit

Maximum line width.

Value

Updated DataFrame.


Extract Expression Text from Source Lines

Description

Extract original text for a multi-line expression and re-indent it.

Usage

extract_expr_text(lines, tokens, target_indent)

Arguments

lines

Source code lines.

tokens

Token data frame for the expression.

target_indent

Target indentation string for continuation lines.

Value

Expression text with first line unindented, continuation lines re-indented.


Check if a Body Token Range is a Complete Statement

Description

Returns FALSE if the body has unclosed parens/brackets or ends with an operator that expects a continuation (assignment, binary ops).

Usage

find_bare_body_end(terms, body_start)

Arguments

terms

Enriched terminal DataFrame.

body_start

Integer row index where the bare body begins.

Value

Integer row index of the last token in the bare body.


Find Token Position in Formatted Line Output

Description

Computes the 1-based character position where the token at index 'idx' starts in the output of 'format_line_tokens(tokens)'. This replays the spacing logic to determine the exact output column.

Usage

find_token_pos_in_formatted(tokens, idx)

Arguments

tokens

Data frame of tokens for one line (ordered by col1).

idx

Index into 'tokens' of the target token.

Value

1-based character position of that token in the formatted output.


Fix Else Placement

Description

Ensures 'else' appears on the same line as the closing brace.

Usage

fix_else_placement(code)

Arguments

code

Code string.

Value

Code with corrected else placement.
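
Examples

The sketch below shows why the repair matters: at top level, an 'else' on the line after '}' does not parse. The regular expression is a naive stand-in chosen for illustration (it would also rewrite matching text inside strings or comments) and is not the package's implementation.

```r
code <- "if (x) {\n    1\n}\nelse {\n    2\n}"

# TRUE if the text parses as R code.
parses <- function(txt) {
    !inherits(try(parse(text = txt), silent = TRUE), "try-error")
}

parses(code)   # FALSE: unexpected 'else'

# Join a closing brace and an 'else' that starts the next line.
fixed <- gsub("\\}[ \t]*\n[ \t]*else\\b", "} else", code, perl = TRUE)

parses(fixed)  # TRUE
cat(fixed, sep = "\n")
```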


Format Blank Lines

Description

Normalize blank lines between code blocks.

Usage

format_blank_lines(code)

Arguments

code

Code string.

Value

Code with normalized blank lines.


Format Tokens on a Single Line

Description

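Joins the tokens of one line into formatted text, inserting a space between adjacent tokens wherever the spacing rules call for one.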

Usage

format_line_tokens(tokens, prev_token = NULL, prev_prev_token = NULL)

Arguments

tokens

Data frame of tokens for one line.

prev_token

Optional token to treat as the previous token when formatting a token subset (e.g., suffix after a collapsed call).

prev_prev_token

Optional token before prev_token for unary detection.

Value

Formatted line content (no leading whitespace).


AST-Based Format Pipeline

Description

Single-pass pipeline: parse once, enrich the terminal DataFrame, run all transforms as DataFrame operations, serialize to text once.

Usage

format_pipeline(code, indent, wrap, expand_if, brace_style, line_limit,
                function_space = FALSE, control_braces = FALSE, join_else = TRUE)

Arguments

code

Code string for one top-level expression.

indent

Indent string or integer.

wrap

Continuation style: '"paren"' or '"fixed"'.

expand_if

Whether to expand all inline if-else.

brace_style

'"kr"' or '"allman"'.

line_limit

Maximum line length.

function_space

If TRUE, add a space after 'function' in function definitions ('function (x)' instead of 'function(x)').

control_braces

Control brace mode.

join_else

If TRUE, move else to same line as preceding '}'.

Value

Formatted code string.


Format R Code Using Token-Based Parsing

Description

Internal function to format R code using getParseData tokens. Calculates proper indentation based on nesting depth.

Usage

format_tokens(code, indent = 4L, wrap = "paren", expand_if = FALSE,
              brace_style = "kr", line_limit = 80L, function_space = FALSE,
              control_braces = FALSE, join_else = TRUE)

Arguments

code

Character string of R code.

indent

Integer for spaces (default 4), or character string for literal indent (e.g., '"\t\t"' for vintage R Core style).

wrap

Continuation style: '"paren"' (default) aligns to opening parenthesis, '"fixed"' uses 8-space indent.

expand_if

Expand inline if-else to multi-line (default FALSE).

brace_style

Brace placement: '"kr"' (same line) or '"allman"' (new line).

line_limit

Maximum line length before wrapping (default 80).

function_space

If TRUE, add space before '(' in function definitions.

control_braces

If TRUE, add braces to bare one-line control flow bodies.

join_else

If TRUE, move else to same line as preceding '}'.

Value

Formatted code as character string.


Insert Synthetic Tokens into the DataFrame

Description

Adds new token rows (e.g., for brace insertion). New tokens get unique IDs starting from 'max(existing_id) + 1'.

Usage

insert_tokens(terms, new_rows)

Arguments

terms

Enriched terminal DataFrame.

new_rows

Data frame of new tokens to insert. Must have at minimum: 'token', 'out_text', 'out_line', 'out_order'. Other columns will be filled with defaults.

Value

Updated DataFrame with new rows appended.
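
Examples

A minimal sketch of appending synthetic brace tokens with fresh IDs. The toy columns, the 'NA' defaults, and the fractional 'out_order' values are assumptions made for illustration; the function name is hypothetical.

```r
# Toy stand-in for the enriched DataFrame: tokens of "if (x) y" on one line.
terms <- data.frame(
    id = 1:5,
    token = c("IF", "'('", "SYMBOL", "')'", "SYMBOL"),
    out_text = c("if", "(", "x", ")", "y"),
    out_line = 1L,
    out_order = c(1, 2, 3, 4, 5),
    parent = 0L,
    stringsAsFactors = FALSE
)

insert_tokens_sketch <- function(terms, new_rows) {
    # Fresh IDs continue after the largest existing one.
    new_rows$id <- max(terms$id) + seq_len(nrow(new_rows))
    # Columns the caller did not supply get a default (NA here, an assumption).
    for (col in setdiff(names(terms), names(new_rows))) {
        new_rows[[col]] <- NA
    }
    rbind(terms, new_rows[names(terms)])
}

braces <- data.frame(
    token = c("'{'", "'}'"),
    out_text = c("{", "}"),
    out_line = 1L,
    out_order = c(4.5, 5.5),  # fractional order is an illustrative choice
    stringsAsFactors = FALSE
)

terms <- insert_tokens_sketch(terms, braces)
ordered <- terms[order(terms$out_line, terms$out_order), ]
paste(ordered$out_text, collapse = " ")  # "if ( x ) { y }"
```

The new rows sit at the end of the DataFrame; their position in the output comes only from 'out_line' and 'out_order'.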


Join Else to Preceding Close Brace

Description

AST transform that moves ELSE tokens (and any following tokens on the same line, like 'if' in 'else if') to the same output line as the preceding '}'. Skips if a COMMENT exists between '}' and 'else', or if joining would exceed the line limit.

Usage

join_else_transform(terms, indent_str, line_limit)

Arguments

terms

Enriched terminal DataFrame.

indent_str

Indent string for line width calculation.

line_limit

Maximum line width.

Value

Updated DataFrame.


Look Up Row Indices for a Line

Description

Returns the row indices stored for one output line in an index built by 'build_line_index()'.

Usage

line_index_get(lidx, line_num)

Arguments

lidx

Line index from 'build_line_index()'.

line_num

Output line number.

Value

Integer vector of row indices, or 'integer(0)' if none.


Compute Display Width Using Line Index

Description

Like 'ast_line_width()' but uses a pre-built line index for O(1) lookup.

Usage

line_index_width(terms, lidx, line_num, indent_str)

Arguments

terms

Enriched terminal DataFrame.

lidx

Line index from 'build_line_index()'.

line_num

The output line number.

indent_str

Indent string.

Value

Display width of the line.


Create a Synthetic Token Row

Description

Helper to build a single token row for insertion.

Usage

make_token(token, text, out_line, out_order, parent = 0L)

Arguments

token

Token type string (e.g., '"'{'"', '"'}'"', '"','"')

text

Token text (e.g., '"{"', '"}"', '","')

out_line

Target output line.

out_order

Sort order within the line.

parent

Parent node ID (default 0).

Value

Single-row data frame.


Determine If Space Needed Between Tokens

Description

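Decides whether a space should be emitted between two adjacent tokens. The token before 'prev' is used to tell unary operators from binary ones.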

Usage

needs_space(prev, tok, prev_prev = NULL)

Arguments

prev

Previous token (data frame row).

tok

Current token (data frame row).

prev_prev

Token before prev (data frame row or NULL), for unary detection.

Value

Logical.


Recompute Nesting State After Structural Changes

Description

Re-walks terminals and refreshes 'brace_depth', 'paren_depth', 'pab', and 'nesting_level' columns. Call after brace insertion, token removal, or any structural transform.

Usage

recompute_nesting(terms)

Arguments

terms

Enriched terminal DataFrame.

Value

Updated DataFrame with refreshed nesting columns.


Reformat Function Definitions (AST Version)

Description

Rewrites named function signatures to fit within the line limit. Short signatures go on one line; long ones wrap at commas with paren-aligned or fixed continuation indent. Operates on the DataFrame directly, avoiding the serialize/re-parse cycle that caused idempotency oscillation.

Usage

reformat_function_defs(terms, indent_str = "    ", wrap = "paren",
                       brace_style = "kr", line_limit = 80L,
                       function_space = FALSE)

Arguments

terms

Enriched terminal DataFrame.

indent_str

Indent string (e.g., '"    "' for 4 spaces).

wrap

Continuation style: '"paren"' or '"fixed"'.

brace_style

'"kr"' or '"allman"'.

line_limit

Maximum line length.

function_space

Whether to add a space after 'function' in function definitions ('function (x)' instead of 'function(x)').

Value

Updated DataFrame.


Reformat Inline If-Else Assignments (AST Version)

Description

Finds 'var <- if (cond) true_expr else false_expr' patterns and expands them to braced multi-line form with duplicated assignment:

    if (cond) {
        var <- true_expr
    } else {
        var <- false_expr
    }

Usage

reformat_inline_if(terms, indent_str = "    ", line_limit = 0L)

Arguments

terms

Enriched terminal DataFrame.

indent_str

Indent string.

line_limit

Maximum line width. Use 0 to expand all.

Value

Updated DataFrame.
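
Examples

The two forms are equivalent in plain R, which is what makes the rewrite safe. The snippet below only demonstrates that equivalence with placeholder values; it does not call the package.

```r
cond <- TRUE
true_expr <- "yes"
false_expr <- "no"

# Inline form.
var_inline <- if (cond) true_expr else false_expr

# Expanded form with the assignment duplicated in each branch.
if (cond) {
    var_expanded <- true_expr
} else {
    var_expanded <- false_expr
}

identical(var_inline, var_expanded)  # TRUE
```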


Renumber Output Lines Sequentially

Description

After transforms that insert or remove lines, renumber 'out_line' so values are sequential starting from 1, preserving relative order and gaps for blank lines.

Usage

renumber_lines(terms)

Arguments

terms

Enriched terminal DataFrame.

Value

Updated DataFrame with renumbered 'out_line'.


Restore Truncated String Constant Token Text

Description

'utils::getParseData()' truncates long 'STR_CONST' token text. Reconstruct the original literal from source lines so token-based rewrite passes can round-trip long strings without introducing parse-invalid placeholders.

Usage

restore_truncated_str_const_tokens(terminals, orig_lines)

Arguments

terminals

Terminal token data frame from 'getParseData()'.

orig_lines

Original source lines.

Value

'terminals' with long 'STR_CONST' text restored.
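
Examples

A sketch of the recovery idea for a string that sits on a single line: use the token's column span to cut the literal back out of the source. Multi-line strings and tab-expanded columns need extra handling that is not shown, and this is not the package's implementation.

```r
long_str <- strrep("a", 2000)
code <- paste0('x <- "', long_str, '"')

pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
tok <- pd[pd$token == "STR_CONST", ]

# For very long strings the stored text is a short placeholder,
# not the literal itself.
nchar(tok$text)

# Reconstruct the literal from the original source line.
restored <- substring(code, tok$col1, tok$col2)
nchar(restored)  # 2002: the 2000 characters plus both quotes
```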


Format R Code

Description

Format R code string according to base R style conventions.

Usage

rformat(code, indent = 4L, line_limit = 80L, wrap = "paren",
        brace_style = "kr", control_braces = FALSE, expand_if = FALSE,
        else_same_line = TRUE, function_space = FALSE, join_else = TRUE)

Arguments

code

Character string of R code to format.

indent

Indentation per level: integer for spaces (default 4), or character string for literal indent (e.g., '"\t\t"' for vintage R Core style).

line_limit

Maximum line length before wrapping (default 80).

wrap

Continuation style for long function signatures: '"paren"' (default) aligns to opening parenthesis, '"fixed"' uses 8-space indent.

brace_style

Brace placement for function definitions: '"kr"' (default) puts opening brace on same line as ') {', '"allman"' puts it on a new line.

control_braces

If TRUE, add braces to bare one-line control flow bodies (e.g., 'if (x) y' becomes 'if (x) { y }'). Default FALSE matches R Core source code where 59% of control flow bodies are bare.

expand_if

Expand inline if-else to multi-line (default FALSE).

else_same_line

If TRUE (default), repair top-level '}\nelse' (which is a parse error in R) by joining to '} else' before formatting. When FALSE, unparseable input is returned unchanged with a warning.

function_space

If TRUE, add space before '(' in function definitions: 'function (x)' instead of 'function(x)'. Default FALSE matches 96% of R Core source code.

join_else

If TRUE (default), move 'else' to the same line as the preceding '}': '} else {'. Matches R Core source code where 70% use same-line else. When FALSE, '}\nelse' on separate lines is preserved.

Value

Formatted code as a character string.

Examples

# Basic formatting: spacing around operators
rformat("x<-1+2")

# Add braces to bare control-flow bodies
rformat("if(x>0) y<-1", control_braces = TRUE)

# Expand inline if-else to multi-line
rformat("x <- if (a) b else c", expand_if = TRUE)

# Wrap long function signatures (default: paren-aligned)
long_sig <- paste0(
    "f <- function(alpha, beta, gamma, delta, ",
    "epsilon, zeta, eta) {\n    1\n}")
cat(rformat(long_sig), sep = "\n")

# Wrap with fixed 8-space continuation indent
cat(rformat(long_sig, wrap = "fixed"), sep = "\n")

# Allman brace style
rformat("f <- function(x) { x }", brace_style = "allman")

Format R Files in Directory

Description

Format all R files in a directory.

Usage

rformat_dir(path = ".", recursive = TRUE, dry_run = FALSE, indent = 4L,
            line_limit = 80L, wrap = "paren", brace_style = "kr",
            control_braces = FALSE, expand_if = FALSE, else_same_line = TRUE,
            function_space = FALSE, join_else = TRUE)

Arguments

path

Path to directory.

recursive

If TRUE, process subdirectories.

dry_run

If TRUE, report changes without writing.

indent

Indentation per level: integer for spaces (default 4), or character string for literal indent (e.g., '"\t\t"' for vintage R Core style).

line_limit

Maximum line length before wrapping (default 80).

wrap

Continuation style for long function signatures: '"paren"' (default) aligns to opening parenthesis, '"fixed"' uses 8-space indent.

brace_style

Brace placement for function definitions: '"kr"' (default) puts opening brace on same line as ') {', '"allman"' puts it on a new line.

control_braces

If TRUE, add braces to bare one-line control flow bodies. Default FALSE matches R Core majority style.

expand_if

Expand inline if-else to multi-line (default FALSE).

else_same_line

If TRUE (default), repair top-level '}\nelse' (which is a parse error in R) by joining to '} else' before formatting.

function_space

If TRUE, add space before '(' in function definitions: 'function (x)' instead of 'function(x)'. Default FALSE matches 96% of R Core source code.

join_else

If TRUE (default), move 'else' to the same line as the preceding '}'.

Value

Invisibly returns vector of modified file paths.

Examples

# Format all R files in a directory (dry run)
d <- tempfile()
dir.create(d)
writeLines("x<-1", file.path(d, "test.R"))
rformat_dir(d, dry_run = TRUE)

# Format and overwrite
rformat_dir(d)
unlink(d, recursive = TRUE)

Format R File

Description

Format an R file in place or write to a new file.

Usage

rformat_file(path, output = NULL, dry_run = FALSE, indent = 4L,
             line_limit = 80L, wrap = "paren", brace_style = "kr",
             control_braces = FALSE, expand_if = FALSE, else_same_line = TRUE,
             function_space = FALSE, join_else = TRUE)

Arguments

path

Path to R file.

output

Optional output path. If NULL, overwrites input file.

dry_run

If TRUE, return formatted code without writing.

indent

Indentation per level: integer for spaces (default 4), or character string for literal indent (e.g., '"\t\t"' for vintage R Core style).

line_limit

Maximum line length before wrapping (default 80).

wrap

Continuation style for long function signatures: '"paren"' (default) aligns to opening parenthesis, '"fixed"' uses 8-space indent.

brace_style

Brace placement for function definitions: '"kr"' (default) puts opening brace on same line as ') {', '"allman"' puts it on a new line.

control_braces

If TRUE, add braces to bare one-line control flow bodies. Default FALSE matches R Core majority style.

expand_if

Expand inline if-else to multi-line (default FALSE).

else_same_line

If TRUE (default), repair top-level '}\nelse' (which is a parse error in R) by joining to '} else' before formatting.

function_space

If TRUE, add space before '(' in function definitions: 'function (x)' instead of 'function(x)'. Default FALSE matches 96% of R Core source code.

join_else

If TRUE (default), move 'else' to the same line as the preceding '}'.

Value

Invisibly returns formatted code.

Examples

# Format a file (dry run to see result without writing)
f <- tempfile(fileext = ".R")
writeLines("x<-1+2", f)
rformat_file(f, dry_run = TRUE)

# Format and overwrite
rformat_file(f)
readLines(f)
unlink(f)

Serialize Enriched Tokens to Formatted Code

Description

Converts the enriched terminal DataFrame to a formatted code string. This is the final step: tokens are emitted in '(out_line, out_order)' order with proper indentation and spacing.

Usage

serialize_tokens(terms, indent_str, wrap = "paren", line_limit = 80L)

Arguments

terms

Enriched terminal DataFrame.

indent_str

Indent string (e.g., '"    "' for 4 spaces).

wrap

Continuation style: '"paren"' or '"fixed"'.

line_limit

Maximum line length.

Value

Formatted code string.


Split Code into Top-Level Expressions

Description

Parses code to find top-level expressions, returning a list of chunks. Each chunk is either an expression (code string) or an inter-expression gap (comments, blank lines). Chunks concatenate back to the original.

Usage

split_toplevel(code)

Arguments

code

Character string of R code.

Value

List of 'list(text = "...", is_expr = TRUE/FALSE)' pairs.
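
Examples

One way to locate top-level expressions in base R is through the source references that 'parse()' attaches. The sketch below only finds which lines belong to expressions and which are gaps; it does not build the chunk list described above and is not the package's implementation.

```r
code <- "# header\nx <- 1\n\ny <- function(a) {\n    a\n}\n"
src_lines <- strsplit(code, "\n", fixed = TRUE)[[1]]

exprs <- parse(text = code, keep.source = TRUE)
refs <- attr(exprs, "srcref")

# Elements 1 and 3 of a srcref are the first and last line.
ranges <- t(vapply(refs, function(r) as.integer(r)[c(1, 3)], integer(2)))

# Mark lines covered by an expression; the rest are comment/blank gaps.
in_expr <- logical(length(src_lines))
for (i in seq_len(nrow(ranges))) {
    in_expr[ranges[i, 1]:ranges[i, 2]] <- TRUE
}

data.frame(line = src_lines, is_expr = in_expr)
```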


Compute Indent Level for a Token

Description

Returns the depth-based indent level that should apply to a token's line. For closing tokens ('}', ')', ']'), the indent is one less than the token's own nesting level (they outdent to match their opening counterpart).

Usage

token_indent_level(terms, idx)

Arguments

terms

Enriched terminal DataFrame.

idx

Index of the token (must be first on its line for indent).

Value

Integer indent level.


Wrap Long Function Calls at Commas (AST Version)

Description

Finds single-line function calls on overlong lines and wraps them at commas. Continuation lines get depth-based indentation (or paren-aligned if 'wrap = "paren"').

Usage

wrap_long_calls(terms, indent_str, wrap = "paren", line_limit = 80L)

Arguments

terms

Enriched terminal DataFrame.

indent_str

Indent string.

wrap

Continuation style: '"paren"' or '"fixed"'.

line_limit

Maximum line length.

Value

Updated DataFrame.


Wrap Long Lines at Operators (AST Version)

Description

Finds overlong lines and breaks them after logical operators ('||', '&&', '|', '&'). Continuation lines get depth-based indentation.

Usage

wrap_long_operators(terms, indent_str, line_limit = 80L)

Arguments

terms

Enriched terminal DataFrame.

indent_str

Indent string.

line_limit

Maximum line length.

Value

Updated DataFrame.
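
Examples

A simplified sketch of breaking one overlong line after a logical operator. It picks the last operator that ends within the limit and uses a fixed 4-space continuation indent; the real pass works on the DataFrame and uses depth-based indentation, so treat this as an illustration only.

```r
code <- "ok <- is.numeric(x) && length(x) == 1L && !is.na(x) || is.null(x)"
line_limit <- 45L

pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
terms <- pd[pd$terminal, ]

# Parser token names for '||', '&&', '|', '&'.
ops <- terms[terms$token %in% c("OR2", "AND2", "OR", "AND"), ]

# Last operator whose final column still fits within the limit.
fits <- ops$col2[ops$col2 <= line_limit]
break_col <- max(fits)

first <- substring(code, 1L, break_col)
rest <- trimws(substring(code, break_col + 1L))

cat(first, paste0("    ", rest), sep = "\n")
```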