The {targets} R package user manual

R
Author

Shitao5

Published

2024-02-21

Modified

2024-02-23

Progress

Learning Progress: 26.32%.

1 Introduction

  • Pipeline tools coordinate the pieces of computationally demanding analysis projects. The targets package is a Make-like pipeline tool for statistics and data science in R. The package skips costly runtime for tasks that are already up to date, orchestrates the necessary computation with implicit parallel computing, and abstracts files as R objects. If all the current output matches the current upstream code and data, then the whole pipeline is up to date, and the results are more trustworthy than otherwise.

    流水线工具协调计算密集型分析项目的各个部分。“targets”包是一个类似于 Make 的用于 R 中统计和数据科学的流水线工具。该包跳过那些已经是最新的任务的昂贵运行时,通过隐式并行计算协调必要的计算,并将文件抽象为 R 对象。如果所有当前输出与当前上游代码和数据匹配,那么整个流水线是最新的,结果比其他情况更可信。

  • targets implicitly nudges users toward a clean, function-oriented programming style that fits the intent of the R language and helps practitioners maintain their data analysis projects.

    “targets”隐式地引导用户朝着一种清晰的、面向函数的编程风格前进,符合 R 语言的意图,有助于从业者维护其数据分析项目。

2 Debugging pipelines

  • In targets, several layers of encapsulation and automation separate you from the code you want to debug:

    • The pipeline runs in an external non-interactive callr::r() process where you cannot use the R console.

    • Data management

    • Environment management

    • High-performance computing

    • Built-in error handling

    Although these layers are essential for reproducibility and scale, you will need to cut through them in order to diagnose and solve issues in pipelines.

    在 targets 中,几个封装和自动化的层将您与要调试的代码分开:

    • 管道在外部非交互式 callr::r() 进程中运行,您无法使用 R 控制台。

    • 数据管理

    • 环境管理

    • 高性能计算

    • 内置错误处理

    虽然这些层对于可重现性和扩展性至关重要,但您需要穿过它们才能诊断和解决管道中的问题。本章将解释如何做到这一点。

  • Even if you hit an error, you can still finish the successful parts of the pipeline. The error argument of tar_option_set() and tar_target() tells each target what to do if it hits an error. For example, tar_option_set(error = "null") tells errored targets to return NULL. The output as a whole will not be correct or up to date, but the pipeline will finish so you can look at preliminary results. This is especially helpful with dynamic branching.

3 Functions

Back to top