textpress: A Lightweight and Versatile NLP Toolkit
A toolkit for web scraping, modular NLP pipelines, and text
preparation for large language models. Organized around four core
actions: fetching, reading, processing, and searching. Covers the full
pipeline from raw web data acquisition to structural text processing and
BM25 indexing. Supports multiple retrieval strategies including regex,
dictionary matching, and ranked keyword search. Pipe-friendly with no
heavy dependencies; all outputs are plain data frames or data.tables.
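
To make the "ranked keyword search" idea concrete, here is a minimal base-R sketch of BM25 scoring over a toy corpus. It is an illustration of the general technique only, not textpress's implementation or API: the names `tokenize` and `bm25_scores`, the sample documents, and the parameter defaults `k1 = 1.2` and `b = 0.75` are all assumptions made for this example.

```r
# Toy corpus: one string per document (illustrative data, not from the package).
docs <- c(
  d1 = "the cat sat on the mat",
  d2 = "the dog chased the cat",
  d3 = "stock markets fell sharply today"
)

# Hypothetical helper: lowercase and split on non-letters.
# A stand-in for a real tokenizer.
tokenize <- function(x) {
  tokens <- strsplit(tolower(x), "[^a-z]+")[[1]]
  tokens[nzchar(tokens)]
}

# Hypothetical helper: score every document against a query with BM25.
bm25_scores <- function(docs, query, k1 = 1.2, b = 0.75) {
  toks    <- lapply(docs, tokenize)
  doc_len <- lengths(toks)
  avg_len <- mean(doc_len)
  n_docs  <- length(toks)
  scores  <- setNames(numeric(n_docs), names(docs))

  for (term in unique(tokenize(query))) {
    # Term frequency of this query term in each document.
    tf <- vapply(toks, function(tk) sum(tk == term), numeric(1))
    df <- sum(tf > 0)
    if (df == 0) next  # term absent from the corpus contributes nothing

    # Smoothed inverse document frequency (always non-negative).
    idf <- log(1 + (n_docs - df + 0.5) / (df + 0.5))

    # Saturating term frequency with document-length normalization.
    scores <- scores +
      idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
  }
  sort(scores, decreasing = TRUE)
}

ranked <- bm25_scores(docs, "cat mat")
print(ranked)
```

In this sketch a document that matches both query terms ranks above one that matches only the more common term, and a document matching neither scores zero. Consult the package reference manual for the actual indexing and search functions.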
| Version: | 1.1.1 |
| Depends: | R (≥ 3.5) |
| Imports: | data.table, httr, Matrix, rvest, stringi, stringr, xml2, pbapply, jsonlite, lubridate |
| Suggests: | SnowballC (≥ 0.7.0), DT, dplyr |
| Published: | 2026-03-17 |
| DOI: | 10.32614/CRAN.package.textpress |
| Author: | Jason Timm [aut, cre] |
| Maintainer: | Jason Timm <JaTimm at salud.unm.edu> |
| BugReports: | https://github.com/jaytimm/textpress/issues |
| License: | MIT + file LICENSE |
| URL: | https://github.com/jaytimm/textpress, https://jaytimm.github.io/textpress/ |
| NeedsCompilation: | no |
| Materials: | README, NEWS |
| CRAN checks: | textpress results |
Linking:
Please use the canonical form
https://CRAN.R-project.org/package=textpress
to link to this page.