---
title: "dmtools_intro"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{dmtools_intro}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
## Installation
```{r install, eval = FALSE}
# install.packages("dmtools")  # assuming the package is available on CRAN
library(dmtools)
```
## Overview
dmtools checks datasets exported from an EDC (electronic data capture) system in clinical trials.

Note that the variable names in your dataset should have a postfix (`_V1`) or a prefix (`V1_`), and column names should be unique.

* laboratory - Does the investigator correctly estimate the laboratory analyses?
* dates - Do all dates correspond to the protocol's timeline?
* rename - rename the dataset using CRFs
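
The naming convention above can be illustrated with base R. This is only a sketch: the column names are hypothetical, and the chunk shows what "prefix" and "postfix" mean rather than how dmtools searches for columns internally.

```{r sketch_naming, eval = FALSE}
# Hypothetical column names: two use a prefix (V1_, V2_), one uses a postfix (_V1)
cols <- c("ID", "AGE", "V1_GLUC", "V2_AST", "GLUC_V1")

# Prefix convention: the visit label comes first
with_prefix <- grep("^V1_", cols, value = TRUE)

# Postfix convention: the visit label comes last
with_postfix <- grep("_V1$", cols, value = TRUE)

# Column names should be unique; anyDuplicated() returns 0 when they are
stopifnot(anyDuplicated(cols) == 0)
```

A dataset normally follows one of the two conventions, which is what the `is_post` parameter in the laboratory check describes.
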
## Usage
### laboratory
For the laboratory check, create an Excel table like the one in the example.

* AGELOW - lower age limit, a number; the age must be >= this number
* AGEHIGH - upper age limit, a number; the age must be <= this number; if there is no upper limit, type `Inf`
* SEX - for both sexes, use `|`
* LBTEST - What was the lab test name? (can be any name convenient for you)
* LBORRES\* - What was the result of the lab test?
* LBNRIND\* - How [did/do] the reported values compare within the [reference/normal/expected] range?
* LBORNRLO - What was the lower limit of the reference range for this lab test? The result must be >= this limit
* LBORNRHI - What was the upper limit of the reference range for this lab test? The result must be <= this limit

\* column names without prefix or postfix
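
To make the meaning of the columns concrete, here is a minimal base R sketch of how a reference row could be matched to a participant. The reference values and the helper names `applies_to()` and `in_range()` are hypothetical and are not taken from `labs_refer.xlsx` or from the dmtools source.

```{r sketch_reference, eval = FALSE}
# Hypothetical reference rows; the numbers are illustrative only
refs_example <- data.frame(
  AGELOW = c(18, 18),
  AGEHIGH = c(Inf, 45),
  SEX = c("f|m", "m"),
  LBTEST = c("gluc", "ast"),
  LBORNRLO = c(3.9, 0),
  LBORNRHI = c(5.9, 42),
  stringsAsFactors = FALSE
)

# TRUE for every reference row that applies to a participant of this age and sex
applies_to <- function(refs, age, sex) {
  # SEX is treated as a regular expression, so "f|m" matches both sexes
  sex_ok <- vapply(
    refs$SEX,
    function(p) grepl(paste0("^(", p, ")$"), sex),
    logical(1),
    USE.NAMES = FALSE
  )
  age >= refs$AGELOW & age <= refs$AGEHIGH & sex_ok
}

# TRUE when the result lies inside the closed range [LBORNRLO, LBORNRHI]
in_range <- function(value, low, high) value >= low & value <= high

rows <- applies_to(refs_example, age = 19, sex = "f")
glucose_ok <- in_range(5.5, refs_example$LBORNRLO[1], refs_example$LBORNRHI[1])
```

With these illustrative values, only the first row applies to a 19-year-old female participant, and a result of 5.5 falls inside its range.
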
```{r refer, echo = FALSE, results = 'asis', warning = FALSE, message = FALSE}
library(knitr)
library(dmtools)
library(dplyr)
refs <- system.file("labs_refer.xlsx", package = "dmtools")
refers <- readxl::read_xlsx(refs)
kable(refers, caption = "lab reference ranges")
```
```{r dataset, echo = FALSE, results = 'asis'}
ID <- c("01", "02", "03")
AGE <- c("19", "20", "22")
SEX <- c("f", "m", "m")
V1_GLUC <- c("5.5", "4.1", "9.7")
V1_GLUC_IND <- c("norm", NA, "norm")
V2_AST <- c("30", "48", "31")
V2_AST_IND <- c("norm", "norm", "norm")
df <- data.frame(
  ID, AGE, SEX,
  V1_GLUC, V1_GLUC_IND,
  V2_AST, V2_AST_IND,
  stringsAsFactors = FALSE
)
kable(df, caption = "dataset")
```
```{r lab}
# "norm" and "no" are example values of the estimate variable; use the values from your dataset
# is_post is FALSE because the variable names in the dataset have a prefix (V1_)
refs <- system.file("labs_refer.xlsx", package = "dmtools")
obj_lab <- lab(refs, ID, AGE, SEX, "norm", "no", is_post = FALSE)
obj_lab <- obj_lab %>% check(df)

# ok - analyses with a correct estimate of the result
obj_lab %>% choose_test("ok")

# mis - analyses with an incorrect estimate of the result
obj_lab %>% choose_test("mis")

# skip - analyses with an empty estimate
obj_lab %>% choose_test("skip")

# all analyses
obj_lab %>% get_result()
```
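
The three labels can be thought of as the outcome of comparing the investigator's estimate with the reference range. The sketch below is an assumption about that logic, written in base R for illustration; `classify_estimate()` is a hypothetical helper, not a dmtools function, and the range used to derive `inside` is made up.

```{r sketch_estimate, eval = FALSE}
# Hypothetical helper: compare the estimate with the range check
classify_estimate <- function(in_range, estimate, normal = "norm") {
  said_normal <- estimate == normal
  ifelse(
    is.na(estimate), "skip",                     # no estimate was entered
    ifelse(said_normal == in_range, "ok", "mis") # estimate agrees or disagrees
  )
}

# Glucose values from the example dataset, checked against a made-up range 3.9-5.9
glucose <- c(5.5, 4.1, 9.7)
estimate <- c("norm", NA, "norm")
inside <- glucose >= 3.9 & glucose <= 5.9

labels <- classify_estimate(inside, estimate)
```

Under this assumed range the three participants receive `"ok"`, `"skip"` and `"mis"`: the second has no estimate, and the third was marked normal although the value is outside the range.
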
### dates
For the dates check, create an Excel table like the one in the example.

* MINUS, PLUS, VISITDY - parameters of the timeline
* VISITNUM - clinical encounter number, passed to the selection function, e.g. `contains(num_visit)`
* VISIT - protocol-defined description of a clinical encounter (can be any convenient name)
* STARTDAT - column name of the start date, with postfix or prefix
* STARTVISIT - name of the start date (can be any name convenient for you)
* IS_EQUAL - Boolean (T/F), whether to check that the dates within a visit are equal
* EQUALDAT - column name of the date used in the equality check, with postfix or prefix
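
The timeline parameters can be read as a visit window around a planned day. The base R sketch below assumes that VISITDY is the planned number of days after the start date and that MINUS and PLUS are the allowed deviations in days; the window values are hypothetical and the code is not the dmtools implementation.

```{r sketch_window, eval = FALSE}
# Hypothetical timeline row: planned on day 3, one day early is allowed, late is not
visit_dy <- 3
minus <- 1
plus <- 0

start_date <- as.Date(c("1991-03-13", "1991-03-07", "1991-03-08"))
visit_date <- as.Date(c("1991-03-15", "1991-03-11", "1991-03-10"))

# Days elapsed between the start date and the visit
days_from_start <- as.numeric(visit_date - start_date)

# Inside the window when within [VISITDY - MINUS, VISITDY + PLUS]
in_window <- days_from_start >= visit_dy - minus &
  days_from_start <= visit_dy + plus
window_label <- ifelse(in_window, "ok", "out")

# Equality check within one visit (the idea behind IS_EQUAL and EQUALDAT)
ph_date <- as.Date(c("1991-03-21", "1991-03-16", "1991-03-16"))
bio_date <- as.Date(c("1991-03-23", "1991-03-16", "1991-03-16"))
equal_label <- ifelse(ph_date == bio_date, "ok", "uneq")
```

With this made-up window, the second participant's visit (4 days after the start) is out of the timeline, and the first participant has unequal dates within the third visit.
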
```{r timelines, echo = FALSE, results = 'asis', warning = FALSE, message = FALSE}
dates <- system.file("dates.xlsx", package = "dmtools")
timeline <- readxl::read_xlsx(dates)
kable(timeline, caption = "timeline")
```
```{r dataset_dates, echo = FALSE, results = 'asis'}
id <- c("01", "02", "03")
screen_date_E1 <- c("1991-03-13", "1991-03-07", "1991-03-08")
rand_date_E2 <- c("1991-03-15", "1991-03-11", "1991-03-10")
ph_date_E3 <- c("1991-03-21", "1991-03-16", "1991-03-16")
bio_date_E3 <- c("1991-03-23", "1991-03-16", "1991-03-16")
df <- data.frame(
  id, screen_date_E1, rand_date_E2, ph_date_E3, bio_date_E3,
  stringsAsFactors = FALSE
)
kable(df, caption = "dataset")
```
```{r date}
# use the str_date parameter to set the pattern for finding date columns, default: "DAT"
dates <- system.file("dates.xlsx", package = "dmtools")
obj_date <- date(dates, id, dplyr::contains, dplyr::matches)
obj_date <- obj_date %>% check(df)
# out - dates that fall outside the protocol's timeline
obj_date %>% choose_test("out")
# uneq - dates that are unequal within a visit
obj_date %>% choose_test("uneq")
# ok - correct dates
obj_date %>% choose_test("ok")
# all dates
obj_date %>% get_result()
```
`dplyr::contains` - a function that selects the necessary visit or event, e.g. `dplyr::starts_with` or `dplyr::contains`. It works like `df %>% select(contains("E1"))`. You can also use `dplyr::starts_with`, which works like `df %>% select(starts_with("V1"))`.

`dplyr::matches` - a function that selects the dates within the necessary visit, e.g. `dplyr::matches` or `dplyr::contains`. It works like `visit_one %>% select(contains("DAT"))`; default: `dplyr::contains()`.
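
The two selection steps can be tried directly with dplyr on a small data frame. The data frame `visits` below is a hypothetical cut of the example dataset, and the chunk only demonstrates the selection helpers themselves, not how dmtools calls them.

```{r sketch_select, eval = FALSE}
library(dplyr)

visits <- data.frame(
  id = c("01", "02"),
  screen_date_E1 = c("1991-03-13", "1991-03-07"),
  ph_date_E3 = c("1991-03-21", "1991-03-16"),
  bio_date_E3 = c("1991-03-23", "1991-03-16"),
  stringsAsFactors = FALSE
)

# Step 1: pick the columns of one visit by its label
visit_three <- visits %>% select(contains("E3"))

# Step 2: within that visit, pick the date columns;
# matches() takes a regular expression instead of a fixed string
visit_three_dates <- visit_three %>% select(matches("date"))

# starts_with() suits names that begin with the searched label
screening <- visits %>% select(starts_with("screen"))
```

These helpers ignore case by default, so a pattern such as `"DAT"` also finds lower-case column names like `ph_date_E3`.
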
### rename
`rename_dataset()` renames the dataset using CRFs.
```{r rename, eval = FALSE}
rename_dataset("./crfs", "old_name", "new_name", 2)
```
* "./crfs" - path to the CRFs
* "old_name" - variable with the current names in the dataset, without postfix or prefix
* "new_name" - variable with the new names; names should be unique
* 2 - position of the sheet in the Excel document where dmtools can find "old_name" and "new_name"
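
The idea of renaming through a lookup table can be sketched in base R. Everything here is hypothetical: the lookup table `crf`, the column names, and the assumption that the visit prefix is kept during renaming. It illustrates the mapping from "old_name" to "new_name" and is not the code behind `rename_dataset()`.

```{r sketch_rename, eval = FALSE}
# Hypothetical lookup table, as it might be read from a sheet of a CRF
crf <- data.frame(
  old_name = c("GLUC", "AST"),
  new_name = c("glucose", "ast_level"),
  stringsAsFactors = FALSE
)

# Hypothetical dataset columns with a visit prefix
cols <- c("ID", "V1_GLUC", "V2_AST")

# Replace each old name with its new name and keep the visit prefix
renamed <- cols
for (i in seq_len(nrow(crf))) {
  pattern <- paste0("^(V[0-9]+_)", crf$old_name[i], "$")
  renamed <- sub(pattern, paste0("\\1", crf$new_name[i]), renamed)
}

# The new names must stay unique
stopifnot(anyDuplicated(renamed) == 0)
```

Columns that are not listed in the lookup table, such as `ID`, are left unchanged in this sketch.
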