---
title: "dmtools_intro"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{dmtools_intro}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Installation

```{r install, eval = FALSE}
library(dmtools)
```

## Overview

dmtools checks datasets exported from EDC (electronic data capture) systems in clinical trials. Note that the variable names in your dataset should have a postfix (\_V1) or a prefix (V1\_), and column names should be unique.

* laboratory - Does the investigator correctly estimate the laboratory analyses?
* dates - Do all dates correspond to the protocol's timeline?
* rename - rename the dataset

## Usage

### laboratory

For the laboratory check, you need to create an Excel table like the one in the example.

* AGELOW - lower age limit, a number (age >= AGELOW)
* AGEHIGH - upper age limit, a number (age <= AGEHIGH); if there is none, type Inf
* SEX - for both sexes, use `|`
* LBTEST - What was the lab test name? (can be any name convenient for you)
* LBORRES\* - What was the result of the lab test?
* LBNRIND\* - How [did/do] the reported values compare within the [reference/normal/expected] range?
* LBORNRLO - What was the lower limit of the reference range for this lab test? (inclusive, >=)
* LBORNRHI - What was the upper limit of the reference range for this lab test? (inclusive, <=)

\*column names without prefix or postfix

```{r refer, echo = FALSE, results = 'asis', warning = FALSE, message = FALSE}
library(knitr)
library(dmtools)
library(dplyr)

refs <- system.file("labs_refer.xlsx", package = "dmtools")
refers <- readxl::read_xlsx(refs)
kable(refers, caption = "lab reference ranges")
```

```{r dataset, echo = FALSE, results = 'asis'}
ID <- c("01", "02", "03")
AGE <- c("19", "20", "22")
SEX <- c("f", "m", "m")
V1_GLUC <- c("5.5", "4.1", "9.7")
V1_GLUC_IND <- c("norm", NA, "norm")
V2_AST <- c("30", "48", "31")
V2_AST_IND <- c("norm", "norm", "norm")

df <- data.frame(
  ID, AGE, SEX,
  V1_GLUC, V1_GLUC_IND,
  V2_AST, V2_AST_IND,
  stringsAsFactors = FALSE
)

kable(df, caption = "dataset")
```

```{r lab}
# "norm" and "no" are example values of the investigator's estimate;
# use the values that actually appear in your dataset
# is_post is FALSE because the dataset has a prefix (V1_) in the variable names
refs <- system.file("labs_refer.xlsx", package = "dmtools")
obj_lab <- lab(refs, ID, AGE, SEX, "norm", "no", is_post = FALSE)
obj_lab <- obj_lab %>% check(df)

# ok - analyses with a correct estimate of the result
obj_lab %>% choose_test("ok")

# mis - analyses with an incorrect estimate of the result
obj_lab %>% choose_test("mis")

# skip - analyses with an empty estimate
obj_lab %>% choose_test("skip")

# all analyses
obj_lab %>% get_result()
```

### dates

For the dates check, you need to create an Excel table like the one in the example.

* MINUS, PLUS, VISITDY - parameters of the timeline
* VISITNUM - clinical encounter number, passed to the selection function, e.g.
`contains(num_visit)`
* VISIT - protocol-defined description of a clinical encounter (can be any convenient name)
* STARTDAT - column name of the start date, with postfix or prefix
* STARTVISIT - any name for the start date that is convenient for you
* IS_EQUAL - Boolean (T/F), whether to check that dates within a visit are equal
* EQUALDAT - column name of the date used in the equality check, with postfix or prefix

```{r timelines, echo = FALSE, results = 'asis', warning = FALSE, message = FALSE}
dates <- system.file("dates.xlsx", package = "dmtools")
timeline <- readxl::read_xlsx(dates)
kable(timeline, caption = "timeline")
```

```{r dataset_dates, echo = FALSE, results = 'asis'}
id <- c("01", "02", "03")
screen_date_E1 <- c("1991-03-13", "1991-03-07", "1991-03-08")
rand_date_E2 <- c("1991-03-15", "1991-03-11", "1991-03-10")
ph_date_E3 <- c("1991-03-21", "1991-03-16", "1991-03-16")
bio_date_E3 <- c("1991-03-23", "1991-03-16", "1991-03-16")

df <- data.frame(
  id, screen_date_E1, rand_date_E2, ph_date_E3, bio_date_E3,
  stringsAsFactors = FALSE
)

kable(df, caption = "dataset")
```

```{r date}
# use the str_date parameter to search for date columns, default: "DAT"
dates <- system.file("dates.xlsx", package = "dmtools")
obj_date <- date(dates, id, dplyr::contains, dplyr::matches)
obj_date <- obj_date %>% check(df)

# out - dates that fall outside the protocol's timeline
obj_date %>% choose_test("out")

# uneq - dates that are unequal
obj_date %>% choose_test("uneq")

# ok - correct dates
obj_date %>% choose_test("ok")

# all dates
obj_date %>% get_result()
```

`dplyr::contains` - a function that selects the necessary visit or event, e.g. `dplyr::starts_with` or `dplyr::contains`. It works like `df %>% select(contains("E1"))`. You can also use `dplyr::starts_with`, which works like `df %>% select(starts_with("V1"))`.

`dplyr::matches` - a function that selects the dates within the necessary visit, e.g. `dplyr::matches` or `dplyr::contains`.
It works like `visit_one %>% select(contains("DAT"))`; the default is `dplyr::contains()`.

### rename

A function to rename the dataset using CRFs.

```{r rename, eval = FALSE}
rename_dataset("./crfs", "old_name", "new_name", 2)
```

* "./crfs" - path to the CRFs
* "old_name" - the variable holding the names used in the dataset, without postfix or prefix
* "new_name" - the variable holding the desired names; names should be unique
* 2 - the position of the sheet in the Excel document where dmtools can find "old_name" and "new_name"
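
To make the idea of an "old_name" to "new_name" mapping concrete, here is a minimal base R sketch. It is not how `rename_dataset()` is implemented. The helper `rename_with_postfix()`, the `mapping` table and the toy column names are hypothetical, and the sketch assumes that the visit postfix is kept while only the base name is replaced.

```r
# Hypothetical mapping, standing in for the "old_name" and "new_name"
# columns of a CRF sheet (names are made up for illustration)
mapping <- data.frame(
  old_name = c("GLUC", "AST"),
  new_name = c("glucose", "ast_level"),
  stringsAsFactors = FALSE
)

# Toy dataset whose columns carry a visit postfix (_V1, _V2)
toy <- data.frame(
  ID = c("01", "02"),
  GLUC_V1 = c("5.5", "4.1"),
  AST_V2 = c("30", "48"),
  stringsAsFactors = FALSE
)

# Replace the base name of each column and keep the postfix (assumption)
rename_with_postfix <- function(data, mapping, sep = "_") {
  cols <- names(data)
  for (i in seq_len(nrow(mapping))) {
    # match "<old_name><sep><anything>" and capture the postfix part
    pattern <- paste0("^", mapping$old_name[i], "(", sep, ".+)$")
    cols <- sub(pattern, paste0(mapping$new_name[i], "\\1"), cols)
  }
  names(data) <- cols
  data
}

renamed <- rename_with_postfix(toy, mapping)
names(renamed)
```

Columns that do not appear in the mapping, such as `ID`, are left untouched. The old names are used as regular expressions here, so names containing special characters would need escaping.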