ORFik: Open Reading Frames in Genomics

Package ‘ORFik’

August 22, 2024

Type Package

Title Open Reading Frames in Genomics

Version 1.24.0

Encoding UTF-8

Description

R package for analysis of transcript and translation features through manipulation of sequence data and NGS data such as Ribo-Seq, RNA-Seq, TCP-Seq and CAGE. It is generalized in the sense that any transcript region can be analysed; as the name hints, it was made with investigation of ribosomal patterns over Open Reading Frames (ORFs) as its primary use case. ORFik is extremely fast through use of C++, data.table and GenomicRanges. The package allows reassignment of transcript start sites using CAGE-Seq data, automatic shifting of Ribo-Seq reads, finding of Open Reading Frames for whole genomes, and much more.

biocViews ImmunoOncology, Software, Sequencing, RiboSeq, RNASeq,

FunctionalGenomics, Coverage, Alignment, DataImport

License MIT + file LICENSE

LazyData TRUE

BugReports https://github.com/Roleren/ORFik/issues

URL https://github.com/Roleren/ORFik

Depends R (>= 4.4.0), IRanges (>= 2.17.1), GenomicRanges (>= 1.35.1),

GenomicAlignments (>= 1.19.0)

Imports AnnotationDbi (>= 1.45.0), Biostrings (>= 2.51.1), biomaRt,

biomartr (>= 1.0.7), BiocFileCache, BiocGenerics (>= 0.29.1),

BiocParallel (>= 1.19.0), BSgenome, cowplot (>= 1.0.0), curl,

RCurl, data.table (>= 1.11.8), DESeq2 (>= 1.24.0), downloader,

fst (>= 0.9.2), GenomeInfoDb (>= 1.15.5), GenomicFeatures (>= 1.31.10),

ggplot2 (>= 2.2.1), gridExtra (>= 2.3), httr (>= 1.3.0),

jsonlite, methods (>= 3.6.0), R.utils, Rcpp (>= 1.0.0),

Rsamtools (>= 1.35.0), rtracklayer (>= 1.43.0), stats,

SummarizedExperiment (>= 1.14.0), S4Vectors (>= 0.21.3), tools,

txdbmaker, utils, XML, xml2 (>= 1.2.0), withr

RoxygenNote 7.3.1

Suggests testthat, rmarkdown, knitr, BiocStyle,

BSgenome.Hsapiens.UCSC.hg19

LinkingTo Rcpp

VignetteBuilder knitr

git_url https://git.bioconductor.org/packages/ORFik

git_branch RELEASE_3_19

git_last_commit d2ff2c0

git_last_commit_date 2024-04-30

Repository Bioconductor 3.19

Date/Publication 2024-08-21

Author Haakon Tjeldnes [aut, cre, dtc],

Kornel Labun [aut, cph],

Michal Swirski [ctb],

Katarzyna Chyzynska [ctb, dtc],

Yamila Torres Cleuren [ctb, ths],

Eivind Valen [ths, fnd]

Maintainer Haakon Tjeldnes <[email protected]>

Contents

ORFik-package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

addCdsOnLeaderEnds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

addNewTSSOnLeaders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

alignmentFeatureStatistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

allFeaturesHelper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

appendZeroes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

artificial.orfs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

assignAnnotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

assignFirstExonsStartSite . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

assignLastExonsStopSite . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

assignTSSByCage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

asTX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

bamVarName . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

bamVarNamePicker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

batchNames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

bedToGR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

browseSRA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

cellLineNames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

cellTypeNames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

changePointAnalysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

checkRFP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

checkRNA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

codonSumsPerGroup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

codon_usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

codon_usage_exp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

codon_usage_plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

collapse.by.scores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

collapse.fastq . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

collapseDuplicatedReads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

collapseDuplicatedReads,data.table-method . . . . . . . . . . . . . . . . . . . . . . . . 38

collapseDuplicatedReads,GAlignmentPairs-method . . . . . . . . . . . . . . . . . . . . 39

collapseDuplicatedReads,GAlignments-method . . . . . . . . . . . . . . . . . . . . . . 39

collapseDuplicatedReads,GRanges-method . . . . . . . . . . . . . . . . . . . . . . . . 40

combn.pairs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

computeFeatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

computeFeaturesCage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

conditionNames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

config . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

config.exper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

config.save . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

config_file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

convertLibs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

convertToOneBasedRanges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

convert_bam_to_ofst . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

convert_to_bigWig . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

convert_to_covRle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

convert_to_covRleList . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

convert_to_fstWig . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

correlation.plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

cor_plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

cor_table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

countOverlapsW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

countTable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

countTable_regions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

coverageByTranscriptC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

coverageByTranscriptW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

coverageGroupings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

coverageHeatMap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

coveragePerTiling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

coverageScorings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

coverage_to_dt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

covRle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

covRle-class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

covRleFromGR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

covRleList . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

covRleList-class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

create.experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

defineIsoform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

defineTrailer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

DEG.analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

DEG.plot.static . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

DEG_model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

DEG_model_results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

DEG_model_simple . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

design,experiment-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

detectRibosomeShifts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

detect_ribo_orfs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

disengagementScore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

distToCds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

distToTSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

download.ebi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

download.SRA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

download.SRA.metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

downstreamFromPerGroup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

downstreamN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

downstreamOfPerGroup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

DTEG.analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

DTEG.plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

envExp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

envExp,experiment-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

envExp<- . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

envExp<-,experiment-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

exists.ftp.dir.fast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

exists.ftp.file.fast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

experiment-class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

experiment.colors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

export.bed12 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

export.bedo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

export.bedoc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

export.bigWig . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

export.fstwig . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

export.ofst . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

export.ofst,GAlignmentPairs-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

export.ofst,GAlignments-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

export.ofst,GRanges-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

export.wiggle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

extendLeaders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

extendsTSSexons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

extendTrailers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

extract_run_id . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

f . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136

f,covRle-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136

filepath . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

filterCage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

filterExtremePeakGenes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

filterTranscripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140

filterUORFs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

fimport . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

findFa . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

findFromPath . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

findLibrariesInFolder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

findMapORFs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

findMaxPeaks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

findNewTSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

findNGSPairs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

findORFs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

findORFsFasta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

findPeaksPerGene . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153

findUORFs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154

findUORFs_exp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

find_url_ebi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

find_url_ebi_safe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

firstEndPerGroup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160

firstExonPerGroup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160

firstStartPerGroup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

fix_malformed_gff . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162

flankPerGroup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162

floss . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

footprints.analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164

fpkm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

fpkm_calc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

fractionLength . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167

fractionNames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168

fread.bed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

gcContent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170

geneToSymbol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170

getGAlignments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

getGAlignmentsPairs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

getGenomeAndAnnotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174

getGRanges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178

getGtfPathFromTxdb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178

getNGenesCoverage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

getWeights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

get_bioproject_candidates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180

get_genome_fasta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

get_genome_gtf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183

get_noncoding_rna . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186

get_phix_genome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188

get_silva_rRNA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189

groupGRangesBy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190

groupings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191

gSort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192

hasHits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192

heatMapL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193

heatMapRegion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195

heatMap_single . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197

import.bedo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199

import.bedoc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199

import.fstwig . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200

import.ofst . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201

importGtfFromTxdb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202

inhibitorNames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202

initiationScore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203

insideOutsideORF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204

install.fastp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206

install.sratoolkit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207

is.grl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208

is.gr_or_grl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208

is.ORF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209

is.range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209

isInFrame . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210

isOverlapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

isPeriodic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212

kozakHeatmap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213

kozakSequenceScore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214

kozak_IR_ranking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216

lastExonEndPerGroup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216

lastExonPerGroup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217

lastExonStartPerGroup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218

length,covRle-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218

length,covRleList-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219

lengths,covRle-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219

lengths,covRleList-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220

libFolder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220

libFolder,experiment-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221

libNames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221

libraryTypes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222

list.experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222

list.genomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223

loadRegion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224

loadRegions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225

loadTranscriptType . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227

loadTxdb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227

longestORFs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228

mainNames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229

makeExonRanks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230

makeORFNames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230

makeSummarizedExperimentFromBam . . . . . . . . . . . . . . . . . . . . . . . . . . 231

makeTxdbFromGenome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233

mapToGRanges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234

matchColors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235

matchNaming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235

matchSeqStyle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236

mergeFastq . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236

mergeLibs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237

metadata.autnaming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239

metaWindow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239

model.matrix,experiment-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241

name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242

name,experiment-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242

nrow,experiment-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243

numCodons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243

numExonsPerGroup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244

ofst_merge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244

optimizedTranscriptLengths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245

optimized_txdb_path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246

optimizeReads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247

orfFrameDistributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247

orfID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248

ORFik.template.experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249

ORFik.template.experiment.zf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249

ORFikQC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250

orfScore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252

organism,experiment-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254

outputLibs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255

pasteDir . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257

pcaExperiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258

percentage_to_ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259

plotHelper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259

pmapFromTranscriptF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260

pmapToTranscriptF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261

prettyScoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263

pseudo.transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263

pSitePlot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264

QCfolder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265

QCfolder,experiment-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266

QCplots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266

QCreport . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267

QCstats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269

QCstats.plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270

QC_count_tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270

r . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271

r,covRle-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272

rankOrder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272

read.experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273

readBam . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274

readBigWig . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276

readLengthTable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277

readWidths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277

readWig . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278

reassignTSSbyCage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279

reassignTxDbByCage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281

reduceKeepAttr . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282

regionPerReadLength . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284

remakeTxdbExonIds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285

remove.experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286

remove.file_ext . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287

removeMetaCols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287

removeORFsWithinCDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288

removeORFsWithSameStartAsCDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288

removeORFsWithSameStopAsCDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289

removeORFsWithStartInsideCDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289

removeTxdbExons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290

removeTxdbTranscripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290

rename.SRA.files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291

repNames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291

resFolder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292

resFolder,experiment-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292

restrictTSSByUpstreamLeader . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293

revElementsF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293

reverseMinusStrandPerGroup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294

riboORFs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294

riboORFsFolder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295

RiboQC.plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295

ribosomeReleaseScore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297

ribosomeStallingScore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298

ribo_fft . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299

ribo_fft_plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300

rnaNormalize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301

runIDs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301

runIDs,experiment-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302

save.experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302

savePlot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303

scaledWindowPositions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304

scoreSummarizedExperiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305

seqinfo,covRle-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306

seqinfo,covRleList-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306

seqinfo,experiment-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307

seqlevels,covRle-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307

seqlevels,covRleList-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308

seqlevels,experiment-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308

seqnamesPerGroup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309

shiftFootprints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309

shiftFootprintsByExperiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311

shiftPlots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314

shifts.load . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315

show,covRle-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316

show,covRleList-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316

show,experiment-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317

simpleLibs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317


sortPerGroup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319

splitIn3Tx . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320

stageNames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321

STAR.align.folder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322

STAR.align.single . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326

STAR.allsteps.multiQC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330

STAR.index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331

STAR.install . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333

STAR.multiQC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334

STAR.remove.crashed.genome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334

startCodons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335

startDefinition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336

startRegion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337

startRegionCoverage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338

startRegionString . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339

startSites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340

stopCodons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341

stopDefinition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342

stopRegion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342

stopSites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343

strandBool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344

strandMode,covRle-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345

strandMode,covRleList-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345

strandPerGroup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346

subsetCoverage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346

subsetToFrame . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347

symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348

symbols,experiment-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348

te.plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349

te.table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350

te_rna.plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351

tile1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352

tissueNames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353

TOP.Motif.ecdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354

topMotif . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355

transcriptWindow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356

transcriptWindow1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358

transcriptWindowPer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360

translationalEff . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361

trimming.table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363

trim_detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363

txNames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364

txNamesToGeneNames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365

txSeqsFromFa . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366

uniqueGroups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367

uniqueOrder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367

unlistGrl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368

uORFSearchSpace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369


updateTxdbRanks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370

updateTxdbStartSites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371

upstreamFromPerGroup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371

upstreamOfPerGroup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372

validateExperiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373

validGRL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373

validSeqlevels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374

widthPerGroup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375

windowCoveragePlot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375

windowPerGroup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377

windowPerReadLength . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378

windowPerTranscript . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380

xAxisScaler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381

yAxisScaler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382

Index 383

ORFik-package ORFik for analysis of open reading frames.

Description

Main goals:

1. Finding Open Reading Frames (very fast) in the genome of interest or on a set of transcripts/sequences.

2. Utilities for metaplots of RiboSeq coverage over gene START and STOP codons, making it possible to spot the shift.

3. Shifting functions for the RiboSeq data.

4. Finding new Transcription Start Sites with the use of CageSeq data.

5. Various measurements of gene identity, e.g. FLOSS, coverage, ORFscore and entropy, recreated from many scientific publications.

6. Utility functions to extend GenomicRanges for faster grouping, splitting, tiling etc.

Author(s)

Maintainer: Haakon Tjeldnes <[email protected]> [data contributor]

Authors:

• Kornel Labun <[email protected]> [copyright holder]

Other contributors:

• Michal Swirski <[email protected]> [contributor]

• Katarzyna Chyzynska <[email protected]> [contributor, data contributor]

• Yamila Torres Cleuren <[email protected]> [contributor, thesis advisor]

• Eivind Valen <[email protected]> [thesis advisor, funder]


See Also

Useful links:

• https://github.com/Roleren/ORFik

• Report bugs at https://github.com/Roleren/ORFik/issues

addCdsOnLeaderEnds Extends leaders downstream

Description

When finding uORFs, you often want to allow them to end inside the CDS.

Usage

addCdsOnLeaderEnds(fiveUTRs, cds, onlyFirstExon = FALSE)

Arguments

fiveUTRs The 5’ leader sequences as GRangesList

cds If you want to extend 5’ leaders downstream, to catch uORFs going into the CDS, include it.

onlyFirstExon logical (FALSE). If TRUE, add only the first exon of each CDS; if FALSE, add the whole CDS.

Details

This function is a simple way to do that.

Value

a GRangesList of cds exons added to ends

See Also

Other uorfs: filterUORFs(), removeORFsWithSameStartAsCDS(), removeORFsWithSameStopAsCDS(),

removeORFsWithStartInsideCDS(), removeORFsWithinCDS(), uORFSearchSpace()


addNewTSSOnLeaders Add CAGE max peaks as new transcript start sites for each 5’ leader. (*) strands are not supported, since direction must be known.

Description

Add CAGE max peaks as new transcript start sites for each 5’ leader. (*) strands are not supported, since direction must be known.

Usage

addNewTSSOnLeaders(fiveUTRs, maxPeakPosition, removeUnused, cageMcol)

Arguments

fiveUTRs (GRangesList) The 5’ leaders or full transcript sequences

maxPeakPosition

The max peak for each 5’ leader found by cage

removeUnused logical (FALSE). If FALSE, leaders without CAGE support keep their original annotation (the standard behaviour). If TRUE, leaders without any CAGE support are removed.

cageMcol a logical (FALSE), if TRUE, add a meta column to the returned object with the

raw CAGE counts in support for new TSS.

Value

a GRanges object of first exons

alignmentFeatureStatistics

Create alignment feature statistics

Description

Among other things, how many reads map to mRNA, introns and intergenic regions, plus a check of reads from rRNA and other ncRNAs. The better the annotation / gtf used, the more results you get.

Usage

alignmentFeatureStatistics(df, type = "ofst", BPPARAM = bpparam())


Arguments

df an ORFik experiment

type a character (default: "ofst"), load files in experiment or some precomputed variant, like "ofst" or "pshifted". These are made with ORFik:::convertLibs(), shiftFootprintsByExperiment(), etc. Can also be custom user-made folders inside the experiment's bam folder. It acts in a recursive manner with priority: if you state "pshifted" but it does not exist, it checks "ofst". If there are no .ofst files, it uses "default", which must always exist.

Presets are (folder is relative to default lib folder; some types fall back to other formats if the folder does not exist):

- "default": load the original files for the experiment, usually bam.

- "ofst": load ofst files from the ofst folder, relative to lib folder (falls back to default)

- "pshifted": load ofst, wig or bigwig from the pshifted folder (falls back to ofst, then default)

- "cov": load covRle objects from the cov_RLE folder (fail if not found)

- "covl": load covRleList objects from the cov_RLE_List folder (fail if not found)

- "bed": load bed files from the bed folder (falls back to default)

- Other formats must be loaded directly with fimport

BPPARAM how many cores/threads to use? default: bpparam(). To see number of threads

used, do bpparam()$workers. You can also add a time remaining bar, for a

more detailed pipeline.

Value

a data.table of the statistics
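The fallback priority described for the type argument can be sketched in base R. This is only an illustration of the documented order, not ORFik's code: the function name resolve_type and the idea of passing the available folders as a character vector are assumptions made for the example, and custom user folders are ignored.

```r
# Hypothetical helper: pick the first preset in the documented fallback
# chain that is actually available for an experiment.
resolve_type <- function(type, available) {
  fallback <- list(
    pshifted = c("pshifted", "ofst", "default"),
    ofst     = c("ofst", "default"),
    bed      = c("bed", "default"),
    default  = "default",
    cov      = "cov",    # no fallback: fails if not found
    covl     = "covl"    # no fallback: fails if not found
  )
  candidates <- fallback[[type]]
  if (is.null(candidates)) stop("unknown preset: ", type)
  found <- candidates[candidates %in% available]
  if (length(found) == 0L) stop("no files found for type '", type, "'")
  found[1L]
}

resolve_type("pshifted", available = c("default", "ofst"))  # "ofst"
resolve_type("pshifted", available = "default")             # "default"
```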

allFeaturesHelper Calculate the features in computeFeatures function

Description

Not used directly, calculates all features internally for computeFeatures.

Usage

allFeaturesHelper(

grl,

RFP,

RNA,

tx,

fiveUTRs,

cds,

threeUTRs,

faFile,

riboStart,


riboStop,

sequenceFeatures,

uorfFeatures,

grl.is.sorted,

weight.RFP = 1L,

weight.RNA = 1L,

st = NULL

)

Arguments

grl a GRangesList object, usually of ORFs, but can also be leaders, cds’, 3’ utrs, etc. These are the regions you want to score.

RFP RiboSeq reads as GAlignments, GRanges or GRangesList object

RNA RnaSeq reads as GAlignments, GRanges or GRangesList object

tx a GRangesList of transcripts, normally called from: exonsBy(Gtf, by = "tx", use.names = T). Only add this if you are not including a Gtf file. If you are using CAGE, you do not need to reassign these to the CAGE peaks; it will do it for you.

fiveUTRs fiveUTRs as GRangesList. If you used CAGE data to extend 5’ utrs, remember to input the CAGE-assigned version and not the original!

cds a GRangesList of coding sequences

threeUTRs a GRangesList of transcript 3’ utrs, normally called from: threeUTRsByTranscript(Gtf, use.names = T)

faFile a path to fasta indexed genome, an open FaFile, a BSgenome, or path to ORFik

experiment with valid genome.

riboStart usually 26, the start of the floss interval, see ?floss

riboStop usually 34, the end of the floss interval

sequenceFeatures

a logical, default TRUE, include all sequence features, that is: Kozak, fractionLengths, distORFCDS, isInFrame, isOverlapping and rankInTx. uorfFeatures = FALSE will remove the last 4.

uorfFeatures a logical, default TRUE, include all uORF sequence features, that is: distORFCDS, isInFrame, isOverlapping and rankInTx

grl.is.sorted logical (F), a speed-up: if you know argument grl is sorted, set this to TRUE.

weight.RFP a vector (default: 1L). Can also be the character name of a column in RFP. As in translationalEff(weight = "score"): for GRanges("chr1", 1, "+", score = 5), the score column tells that this alignment region was found 5 times.

weight.RNA Same as weight.RFP but for RNA weights. (default: 1L)

st (NULL), if defined must be: st = startRegion(grl, tx, T, -3, 9)

Value

a data.table with features


appendZeroes Append zero values to data.table

Description

For every position in width max.pos - min.pos + 1, append 0 values in data.table. Needed when

coveragePerTiling was run on coverage window with drop.zero.dt as TRUE and you need to plot 0

positions after a transformation by coverageScorings.

Usage

appendZeroes(dt, max.pos, min.pos = 1L, fractions = unique(dt$fraction))

Arguments

dt a data.table from coveragePerTiling that is normalized by coverageScorings.

max.pos integer, max position of dt

min.pos integer, default 1L. Minimum position of dt

fractions default unique(dt$fraction), will repeat each fraction max.pos - min.pos + 1

times.

Value

a data.table with appended 0 values
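To make the idea concrete, here is a minimal base-R sketch of zero-filling, assuming a table with columns position, fraction and score (these column names and the function name append_zeroes_sketch are assumptions for illustration; this is not ORFik's implementation, which works on data.table objects).

```r
# Hypothetical sketch: add a 0 score for every (fraction, position) pair
# in min.pos..max.pos that is missing from the table.
append_zeroes_sketch <- function(dt, max.pos, min.pos = 1L,
                                 fractions = unique(dt$fraction)) {
  # Every combination that should exist in the output
  full <- expand.grid(position = seq.int(min.pos, max.pos),
                      fraction = fractions,
                      stringsAsFactors = FALSE)
  out <- merge(full, dt, by = c("fraction", "position"), all.x = TRUE)
  out$score[is.na(out$score)] <- 0  # dropped positions get score 0
  out <- out[order(out$fraction, out$position), ]
  rownames(out) <- NULL
  out
}

cov <- data.frame(fraction = c(28L, 28L, 29L),
                  position = c(1L, 3L, 2L),
                  score    = c(0.5, 0.5, 1.0))
filled <- append_zeroes_sketch(cov, max.pos = 3L)
filled  # 6 rows: 3 positions for each of the 2 fractions
```

Each fraction is repeated max.pos - min.pos + 1 times, which matches the description of the fractions argument.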

artificial.orfs Create small artificial orfs from cds

Description

Useful to see if short ORF prediction is dependent on length.

Split the cds first in two, a start part and a stop part. Then say how large the two parts can be and merge them together. It will sample a value in the range given.

Parts will be forced to not overlap and cannot extend outside the original cds.

Usage

artificial.orfs(

cds,

start5 = 1,

end5 = 4,

start3 = -4,

end3 = 0,

bin.if.few = TRUE

)


Arguments

cds a GRangesList of orfs, must have width %% 3 == 0 and length >= 6

start5 integer, default: 1 (start of orf)

end5 integer, default: 4 (max 4 codons from start codon)

start3 integer, default -4 (max 4 codons from stop codon)

end3 integer, default: 0 (end of orf)

bin.if.few logical, default TRUE. Instead of per codon, bin per 2, 3 or 4 codons if you have few samples compared to the number of lengths wanted. If you have 4 cds’ and you want 7 different lengths, which is the standard, it will give you the possible nt lengths 6-12-18-24 instead of the original 6-9-12-15-18-21-24.

If you have more than 30x as many cds as lengths wanted, this is skipped (for default arguments this is: 7*30 = 210 cds).

Details

If the artificial cds length is not divisible by 2, like 3 codons, the second codon will always be from the start region, etc.

Also, if there are many very short original cds, the distribution will be skewed towards smaller artificial cds.

Value

GRangesList of new ORFs (sorted: + strand increasing start, - strand decreasing start)

Examples

txdb <- ORFik.template.experiment()

#cds <- loadRegion(txdb, "cds")

## To get enough CDSs, just replicate them

# cds <- rep(cds, 100)

#artificial.orfs(cds)
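The numbers quoted for bin.if.few can be reproduced with a few lines of base R. This only replays the arithmetic from the argument description for the default settings; the variable names are made up for the example and nothing here calls ORFik.

```r
# Possible artificial ORF lengths in nt for the defaults (2 to 8 codons)
per_codon <- seq(6L, 24L, by = 3L)   # one candidate length per codon step
per_two   <- seq(6L, 24L, by = 6L)   # binned per 2 codons

lengths_wanted    <- length(per_codon)
skip_binning_from <- lengths_wanted * 30L  # the 30x rule from the text

per_codon          # 6 9 12 15 18 21 24
per_two            # 6 12 18 24
skip_binning_from  # 210
```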

assignAnnotations Overlaps GRanges object with provided annotations.

Description

It will return the same list of GRanges, but with metadata columns:

- transcript_id - id of transcripts that overlap with each ORF

- gene_id - id of the gene that this transcript belongs to

- isoform - for protein-coding transcripts, alignment in relation to the cds on the corresponding transcript; for non-coding transcripts, alignment in relation to the transcript.

Usage

assignAnnotations(ORFs, con)


Arguments

ORFs - GRanges or GRangesList object of your ORFs.

con - Path to gtf ﬁle with annotations.

Value

A GRanges object of your ORFs with metadata columns ’gene’, ’transcript’, ’isoform’ and ’biotype’.

assignFirstExonsStartSite

Reassign the start positions of the first exons per group in grl

Description

Per group in GRangesList, assign the most upstream site.

Usage

assignFirstExonsStartSite(

grl,

newStarts,

is.circular = all(isCircular(grl) %in% TRUE)

)

Arguments

grl a GRangesList object

newStarts an integer vector of same length as grl, with new start values (absolute coordinates, not relative)

is.circular logical, default all(isCircular(grl) %in% TRUE), where grl is the ranges checked; that is, FALSE unless all ranges are on circular sequences. If TRUE, allow ranges to extend below position 1 on the chromosome, since circular genomes can have negative coordinates.

Details

Make sure your grl is sorted, since the start of "-" strand objects should be the max end in the group. Use ORFik:::sortPerGroup(grl) to get a sorted grl.

Value

the same GRangesList with new start sites

See Also

Other GRanges: assignLastExonsStopSite(), downstreamFromPerGroup(), downstreamOfPerGroup(),

upstreamFromPerGroup(), upstreamOfPerGroup()
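The strand logic in Details can be illustrated without GenomicRanges. The sketch below assumes exons stored in a plain data.frame with columns group, start, end and strand, already sorted per group (increasing on "+", decreasing on "-"); the function name and the data layout are assumptions for the example, not ORFik's implementation.

```r
# Hypothetical sketch: the transcript "start" is the start coordinate of the
# first exon on "+" strand, but the end coordinate of the first exon on "-".
assign_first_exon_start <- function(exons, newStarts) {
  first <- !duplicated(exons$group)   # first row per group = first exon
  plus  <- exons$strand == "+"
  plus_per_group <- plus[first]       # strand of each group, in group order
  exons$start[first & plus] <- newStarts[plus_per_group]
  exons$end[first & !plus]  <- newStarts[!plus_per_group]
  exons
}

exons <- data.frame(
  group  = c("tx1", "tx1", "tx2", "tx2"),
  start  = c(100L, 200L, 500L, 400L),
  end    = c(150L, 250L, 560L, 450L),
  strand = c("+", "+", "-", "-")
)
res <- assign_first_exon_start(exons, newStarts = c(90L, 580L))
res  # tx1 now starts at 90; the first exon of tx2 now ends at 580
```

The same reasoning, mirrored, applies to assignLastExonsStopSite: the stop is the end of the last exon on "+" and the start of the last exon on "-".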


assignLastExonsStopSite

Reassign the stop positions of the last exons per group

Description

Per group in GRangesList, assign the most downstream site.

Usage

assignLastExonsStopSite(

grl,

newStops,

is.circular = all(isCircular(grl) %in% TRUE)

)

Arguments

grl a GRangesList object

newStops an integer vector of same length as grl, with new stop values (absolute coordinates, not relative)

is.circular logical, default all(isCircular(grl) %in% TRUE), where grl is the ranges checked; that is, FALSE unless all ranges are on circular sequences. If TRUE, allow ranges to extend below position 1 on the chromosome, since circular genomes can have negative coordinates.

Details

Make sure your grl is sorted, since the stop of "-" strand objects should be the min start in the group. Use ORFik:::sortPerGroup(grl) to get a sorted grl.

Value

the same GRangesList with new stop sites

See Also

Other GRanges: assignFirstExonsStartSite(), downstreamFromPerGroup(), downstreamOfPerGroup(),

upstreamFromPerGroup(), upstreamOfPerGroup()


assignTSSByCage Input a txdb and add a 5’ leader for each transcript, that does not have

one.

Description

For every cds in txdb that does not have a 5’ leader: start 1 base upstream of the cds and use CAGE to assign the leader start. All these leaders will consist of a single exon. If you really want exon splicings, you can use exon prediction tools or run sequencing experiments.

Usage

assignTSSByCage(

txdb,

cage,

extension = 1000,

filterValue = 1,

restrictUpstreamToTx = FALSE,

removeUnused = FALSE,

preCleanup = TRUE,

pseudoLength = 1

)

Arguments

txdb a TxDb file, a path to one of: (.gtf, .gff, .gff2, .db or .sqlite) or an ORFik experiment

cage Either a filePath for the CageSeq file as .bed, .bam or .wig, with possible compressions (".gzip", ".gz", ".bgz"), or already loaded CageSeq peak data as GRanges or GAlignments. NOTE: If it is a .bam file, it will add a score column by running: convertToOneBasedRanges(cage, method = "5prime", addScoreColumn = TRUE). The score column is then the number of replicates of each read. If the score column is something else, like read length, set the score column to NULL first.

extension The maximum number of bases upstream of the TSS to search for a CageSeq peak.

filterValue The minimum number of reads on a CAGE position for it to be counted as a possible new TSS (represented in the score column in CageSeq data). If you already filtered, set it to 0.

restrictUpstreamToTx

a logical (FALSE). If TRUE: restrict leaders so they do not extend closer than 5 bases from the closest upstream leader.

removeUnused logical (FALSE). If FALSE, leaders without CAGE support keep their original annotation (the standard behaviour). If TRUE, leaders without any CAGE support are removed.

preCleanup logical (TRUE), if TRUE, remove all reads in region (-5:-1, 1:5) of all original TSS in leaders. This is to keep the original TSS if the CAGE peak is only +/- 5 bases from the original.


pseudoLength a numeric, default 1. Either if no CAGE supports the leader, or if CAGE is set to NULL, add a pseudo length for all the UTRs. Will not extend a leader if it would make it go outside the defined seqlengths of the genome. So this length is not guaranteed for all!

Details

Given a TxDb object, reassign the start site per transcript using max peaks from CageSeq data. A max peak is defined as the new TSS if it is within the boundary of the 5’ leader range, specified by ‘extension‘ in bp. A max peak must also be higher than the minimum CageSeq peak cutoff specified in ‘filterValue‘. The new TSS will then be the position of the CAGE read with the highest read count in the interval. If no CAGE supports a leader, the width will be set to 1 base.
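The peak selection rule can be sketched for a single transcript in base R. This is a simplified illustration under stated assumptions, not ORFik's implementation: the function name pick_tss is hypothetical, the search window covers only the annotated TSS plus extension bases upstream of it, preCleanup is ignored, and a transcript without a qualifying peak simply keeps its annotated TSS.

```r
# Hypothetical sketch: choose the CAGE position with the highest score
# inside the upstream search window, if it passes the filterValue cutoff.
pick_tss <- function(tss, strand, cage_pos, cage_score,
                     extension = 1000, filterValue = 1) {
  # Upstream means lower coordinates on "+" and higher coordinates on "-"
  in_window <- if (strand == "+") {
    cage_pos >= tss - extension & cage_pos <= tss
  } else {
    cage_pos <= tss + extension & cage_pos >= tss
  }
  keep <- in_window & cage_score > filterValue
  if (!any(keep)) return(tss)  # no CAGE support: keep annotated TSS
  cage_pos[keep][which.max(cage_score[keep])]
}

pos   <- c(400, 900, 950, 1200)
score <- c(50, 10, 30, 99)
pick_tss(1000, "+", pos, score)                   # 400
pick_tss(1000, "+", pos, score, extension = 200)  # 950
```

The peak at 1200 has the highest score but lies downstream of the TSS on the "+" strand, so it is never considered.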

Value

a TxDb object of reassigned transcripts

See Also

Other CAGE: reassignTSSbyCage(), reassignTxDbByCage()

Examples

txdbFile <- system.file("extdata", "hg19_knownGene_sample.sqlite",

package = "GenomicFeatures")

cagePath <- system.file("extdata", "cage-seq-heart.bed.bgz",

package = "ORFik")

## Not run:

assignTSSByCage(txdbFile, cagePath)

#Minimum 20 cage tags for new TSS

assignTSSByCage(txdbFile, cagePath, filterValue = 20)

# Create pseudo leaders for the ones without hits

assignTSSByCage(txdbFile, cagePath, pseudoLength = 100)

# Create only pseudo leaders (in example 2 leaders are added)

assignTSSByCage(txdbFile, cage = NULL, pseudoLength = 100)

## End(Not run)

asTX Map genomic to transcript coordinates by reference

Description

Map range coordinates between features in the genome and transcriptome (reference) space.


Usage

asTX(

grl,

reference,

ignore.strand = FALSE,

x.is.sorted = TRUE,

tx.is.sorted = TRUE

)

Arguments

grl a GRangesList of ranges within the reference. grl must have a column called names that gives the grouping for the result

reference a GRangesList of ranges that include and are bigger than or equal to grl, e.g. cds is grl and gene can be reference

ignore.strand When ignore.strand is TRUE, strand is ignored in overlaps operations (i.e., all

strands are considered "+") and the strand in the output is ’*’.

When ignore.strand is FALSE (default) strand in the output is taken from the

transcripts argument. When transcripts is a GRangesList, all inner list elements

of a common list element must have the same strand or an error is thrown.

Mapped position is computed by counting from the transcription start site (TSS)

and is not affected by the value of ignore.strand.

x.is.sorted if x is a GRangesList object, are "-" strand groups pre-sorted in decreasing order

within group, default: TRUE

tx.is.sorted if transcripts is a GRangesList object, are "-" strand groups pre-sorted in decreasing order within group, default: TRUE

Details

Similar to GenomicFeatures’ pmapToTranscripts, but in this version the grl ranges are compared to reference ranges with same name, not by index. And it has a security fix.

Value

a GRangesList in transcript coordinates

See Also

Other ExtendGenomicRanges: coveragePerTiling(), extendLeaders(), extendTrailers(),

reduceKeepAttr(), tile1(), txSeqsFromFa(), windowPerGroup()

Examples

seqname <- c("tx1", "tx2", "tx3")

seqs <- c("ATGGGTATTTATA", "AAAAA", "ATGGGTAATA")

grIn1 <- GRanges(seqnames = "1",

ranges = IRanges(start = c(21, 10), end = c(23, 19)),

strand = "-")


grIn2 <- GRanges(seqnames = "1",

ranges = IRanges(start = c(1), end = c(5)),

strand = "-")

grIn3 <- GRanges(seqnames = "1",

ranges = IRanges(start = c(1010), end = c(1019)),

strand = "-")

grl <- GRangesList(grIn1, grIn2, grIn3)

names(grl) <- seqname

# Find ORFs

test_ranges <- findMapORFs(grl, seqs,

"ATG|TGG|GGG",

"TAA|AAT|ATA",

longestORF = FALSE,

minimumLength = 0)

# Genomic coordinates ORFs

test_ranges

# Transcript coordinate ORFs

asTX(test_ranges, reference = grl)

# seqnames will here be index of transcript it came from
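The coordinate mapping itself can be shown for a single position in base R. The sketch below is an illustration of the genomic-to-transcript arithmetic, not ORFik's implementation; the function name genome_to_tx and the plain-vector exon representation are assumptions for the example. It reuses the exons of tx1 from the example above (21-23 and 10-19 on the "-" strand).

```r
# Hypothetical sketch: map one genomic position to its 1-based position
# in the spliced transcript.
genome_to_tx <- function(pos, exon_start, exon_end, strand) {
  # Order exons 5' to 3' along the transcript
  ord <- if (strand == "-") order(exon_start, decreasing = TRUE) else order(exon_start)
  exon_start <- exon_start[ord]
  exon_end   <- exon_end[ord]
  widths <- exon_end - exon_start + 1L
  offset <- cumsum(c(0L, head(widths, -1L)))  # transcript bases before each exon
  hit <- which(pos >= exon_start & pos <= exon_end)
  if (length(hit) != 1L) return(NA_integer_)  # intronic or outside transcript
  within <- if (strand == "-") exon_end[hit] - pos else pos - exon_start[hit]
  as.integer(offset[hit] + within + 1L)
}

starts <- c(21L, 10L)
ends   <- c(23L, 19L)
genome_to_tx(23L, starts, ends, "-")  # 1, the first base of the transcript
genome_to_tx(10L, starts, ends, "-")  # 13, the last base of the transcript
genome_to_tx(20L, starts, ends, "-")  # NA, position 20 is intronic
```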

bamVarName Get library variable names from ORFik experiment

Description

What will each sample be called given the columns of the experiment? A column is included if

more than 1 unique element value exist in that column.

Usage

bamVarName(

df,

skip.replicate = length(unique(df$rep)) == 1,

skip.condition = length(unique(df$condition)) == 1,

skip.stage = length(unique(df$stage)) == 1,

skip.fraction = length(unique(df$fraction)) == 1,

skip.experiment = !df@expInVarName,

skip.libtype = FALSE,

fraction_prepend_f = TRUE

)

Arguments

df an ORFik experiment

skip.replicate a logical (FALSE), don’t include replicate in variable name.

skip.condition a logical (FALSE), don’t include condition in variable name.

skip.stage a logical (FALSE), don’t include stage in variable name.


skip.fraction a logical (FALSE), don’t include fraction

skip.experiment

a logical (FALSE), don’t include experiment

skip.libtype a logical (FALSE), don’t include libtype

fraction_prepend_f

a logical (TRUE), include "f" in front of fraction, useful for knowing what fraction is.

Value

variable names of libraries (character vector)

See Also

Other ORFik_experiment: ORFik.template.experiment(), ORFik.template.experiment.zf(),

create.experiment(), experiment-class, filepath(), libraryTypes(), organism,experiment-method,

outputLibs(), read.experiment(), save.experiment(), validateExperiments()

Examples

df <- ORFik.template.experiment()

bamVarName(df)

## without libtype

bamVarName(df, skip.libtype = TRUE)

## Without experiment name

bamVarName(df, skip.experiment = TRUE)
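The rule from the Description (a column is included only if it has more than one unique value) can be sketched on a plain data.frame. Everything below is an assumption made for illustration: the function name, the column order, the "_" separator and the "r" prefix for replicates are not taken from this manual, and the experiment name is left out. Only the "f" prefix for fraction follows the fraction_prepend_f description.

```r
# Hypothetical sketch of the naming rule, not ORFik's implementation.
var_names_sketch <- function(df) {
  cols <- c("libtype", "condition", "stage", "fraction", "rep")
  # Keep a column only if it varies between samples
  varies <- vapply(cols, function(col) length(unique(df[[col]])) > 1,
                   logical(1))
  varies["libtype"] <- TRUE  # mirrors skip.libtype = FALSE
  parts <- df[, cols[varies], drop = FALSE]
  if ("rep" %in% names(parts)) parts$rep <- paste0("r", parts$rep)
  if ("fraction" %in% names(parts)) parts$fraction <- paste0("f", parts$fraction)
  do.call(paste, c(unname(as.list(parts)), sep = "_"))
}

samples <- data.frame(libtype   = c("RFP", "RFP", "RNA", "RNA"),
                      condition = c("WT", "Mut", "WT", "Mut"),
                      stage     = "",
                      fraction  = "",
                      rep       = 1)
var_names_sketch(samples)  # "RFP_WT" "RFP_Mut" "RNA_WT" "RNA_Mut"
```

Stage, fraction and replicate are constant in this table, so they drop out of the names, just as the skip.* defaults in the Usage section would make them do.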

bamVarNamePicker Get variable name per filepath in experiment

Description

Get variable name per filepath in experiment

Usage

bamVarNamePicker(

df,

skip.replicate = FALSE,

skip.condition = FALSE,

skip.stage = FALSE,

skip.fraction = FALSE,

skip.experiment = FALSE,

skip.libtype = FALSE,

fraction_prepend_f = TRUE

)


Arguments

df an ORFik experiment

skip.replicate a logical (FALSE), don’t include replicate in variable name.

skip.condition a logical (FALSE), don’t include condition in variable name.

skip.stage a logical (FALSE), don’t include stage in variable name.

skip.fraction a logical (FALSE), don’t include fraction

skip.experiment

a logical (FALSE), don’t include experiment

skip.libtype a logical (FALSE), don’t include libtype

fraction_prepend_f

a logical (TRUE), include "f" in front of fraction, useful for knowing what fraction is.

Value

variable name of library (character vector)

batchNames Get batch name variants

Description

Used to standardize nomenclature for experiments.

Example: for biological samples (batches), batch will become b1

Usage

batchNames()

Value

a data.table with 2 columns, the main name, and all name variants of the main name in second

column as a list.

See Also

Other experiment_naming: cellLineNames(), cellTypeNames(), conditionNames(), fractionNames(),

inhibitorNames(), libNames(), mainNames(), repNames(), stageNames(), tissueNames()


bedToGR Converts bed style data.frame to GRanges

Description

For info on columns, see: https://www.ensembl.org/info/website/upload/bed.html

Usage

bedToGR(x, skip.name = TRUE)

Arguments

x A data.frame from imported bed-file, to convert to GRanges

skip.name default (TRUE), skip name column (column 4)

Value

a GRanges object from bed

See Also

Other utils: convertToOneBasedRanges(), export.bed12(), export.bigWig(), export.fstwig(),

export.wiggle(), fimport(), findFa(), fread.bed(), optimizeReads(), readBam(), readBigWig(),

readWig()
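The key step in any BED conversion is the coordinate convention: BED intervals are 0-based and half-open, while GRanges are 1-based and closed. The base-R sketch below shows that shift on a plain data.frame; it assumes standard BED column order, returns a data.frame rather than a GRanges, and the function name is hypothetical, so it is an illustration of the convention rather than ORFik's code.

```r
# Hypothetical sketch: BED (0-based, half-open) to 1-based closed ranges.
bed_to_ranges_sketch <- function(bed, skip.name = TRUE) {
  stopifnot(ncol(bed) >= 3)
  out <- data.frame(seqnames = bed[[1]],
                    start    = bed[[2]] + 1L,  # 0-based -> 1-based
                    end      = bed[[3]])       # half-open end == closed end
  if (ncol(bed) >= 4 && !skip.name) out$name <- bed[[4]]
  if (ncol(bed) >= 5) out$score <- bed[[5]]
  if (ncol(bed) >= 6) out$strand <- ifelse(bed[[6]] == ".", "*", bed[[6]])
  out
}

bed <- data.frame(chrom = c("chr1", "chr1"),
                  chromStart = c(0L, 99L),
                  chromEnd   = c(10L, 100L),
                  name   = c("a", "b"),
                  score  = c(5L, 1L),
                  strand = c("+", "."))
ranges <- bed_to_ranges_sketch(bed)
ranges  # starts 1 and 100, ends 10 and 100, name column skipped
```

The widths are preserved: end - start + 1 in the output equals chromEnd - chromStart in the input.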

browseSRA Open SRA in browser for specific bioproject

Description

Open SRA in browser for specific bioproject

Usage

browseSRA(x, browser = getOption("browser"))

Arguments

x character, bioproject ID.

browser a non-empty character string giving the name of the program to be used as the HTML browser. It should be in the PATH, or a full path specified. Alternatively, an R function to be called to invoke the browser.

Under Windows NULL is also allowed (and is the default), and implies that the file association mechanism will be used.


Value

invisible(NULL), opens webpage only

See Also

Other sra: download.SRA(), download.SRA.metadata(), download.ebi(), get_bioproject_candidates(),

install.sratoolkit(), rename.SRA.files()

Examples

#browseSRA("PRJNA336542")

# For windows make sure a valid browser is defined:

browser <- getOption("browser")

#browseSRA("PRJNA336542", browser)

cellLineNames Get cell-line name variants

Description

Used to standardize nomenclature for experiments.

Example: THP1 is the main naming, but a variant is THP-1. THP-1 will then be renamed to THP1 (variables in R cannot have - in them).

Usage

cellLineNames(convertToTissue = FALSE)

Arguments

convertToTissue

logical, FALSE. If TRUE, return tissue type. NONE is returned for general

non-differentiated cell lines like 3T3.

Value

a data.table with 2 columns, the main name, and all name variants of the main name in second

column as a list.

See Also

Other experiment_naming: batchNames(), cellTypeNames(), conditionNames(), fractionNames(),

inhibitorNames(), libNames(), mainNames(), repNames(), stageNames(), tissueNames()


cellTypeNames Get cell type name variants

Description

Used to standardize nomenclature for experiments.

Example: 1 is the main naming, but a variant is rep1. rep1 will then be renamed to 1.

Usage

cellTypeNames()

Value

a data.table with 2 columns, the main name, and all name variants of the main name in second

column as a list.

See Also

Other experiment_naming: batchNames(), cellLineNames(), conditionNames(), fractionNames(),

inhibitorNames(), libNames(), mainNames(), repNames(), stageNames(), tissueNames()

changePointAnalysis Get the offset for specific RiboSeq read width

Description

Creates sliding windows of transcript-normalized counts per position and compares the upstream window with the downstream window at each split site. Picks the position with the highest absolute window difference. Checks windows with split sites between positions -17 and -7, where 0 is the TIS. Normally you expect the shift to be around -12 for Ribo-seq; in TCP-seq / RCP-seq it is usually a bit higher, usually because of cross-linking variations.

Usage

changePointAnalysis(

x,

feature = "start",

max.pos = 40L,

interval = seq.int(14L, 24L),

center.pos = 12,

info = NULL,

verbose = FALSE

)


Arguments

x a vector with count per position to analyse, assumes the zero position (TIS) is in the middle + 1 (position 0). By default it has size 60, from -30 to 29, in p-shifting

feature (character) either "start" or "stop"

max.pos integer, default 40L, subset x to go from index 1 to max.pos, if tail is not relevant.

interval integer vector, default seq.int(14L, 24L). The possible shift locations, i.e. the separation points for upstream and downstream windows. By default that is +/- 5 from position -12.

center.pos integer, default 12. Centering position for likely p-site. A first qualified guess to save time. 12 means 12 bases before TIS.

info specify read length if wanted for verbose output.

verbose logical, default FALSE. Report details of change point analysis.

Details

For a visual explanation, see the supplementary data of the ORFik paper. Transcript-normalized means per CDS TIS region: count reads per position, divide that number per position by the total of that transcript, then sum up these numbers per position for all transcripts.

Value

a single numeric offset, -12 would mean p-site is 12 bases upstream
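
The procedure described under Details can be sketched in base R. This is a simplified illustration under stated assumptions, not ORFik's internal implementation: the toy matrix `counts`, the window size `window` and all variable names are hypothetical.

```r
# Toy data: 3 transcripts (rows) x 60 positions (columns), positions -30..29.
positions <- seq.int(-30L, 29L)
depth <- c(1, 5, 50)                          # transcripts with very different coverage
profile <- ifelse(positions >= -12L, 21, 1)   # 5' ends pile up from position -12 onward
counts <- outer(depth, profile)

# Transcript normalization: divide each row by its total, then sum per position.
tx_norm <- colSums(counts / rowSums(counts))

# Score each candidate split site: downstream window minus upstream window.
window <- 5L                                  # hypothetical window size
splits <- seq.int(-17L, -7L)
score <- vapply(splits, function(s) {
  i <- which(positions == s)
  upstream <- sum(tx_norm[(i - window):(i - 1L)])
  downstream <- sum(tx_norm[i:(i + window - 1L)])
  downstream - upstream
}, numeric(1))

# Pick the split with the largest absolute difference.
offset <- splits[which.max(abs(score))]
offset
```

In this toy profile the step sits at position -12, so `offset` is -12. Because each transcript is scaled to sum to 1 first, the highly covered third transcript does not dominate the summed profile.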

See Also

Other pshifting: detectRibosomeShifts(), shiftFootprints(), shiftFootprintsByExperiment(),

shiftPlots(), shifts.load()

checkRFP Helper Function to check valid RFP input

Description

Helper Function to check valid RFP input

Usage

checkRFP(class)

Arguments

class the given class of RFP object

Value

NULL, stop if invalid object

See Also

Other validity: checkRNA(), is.ORF(), is.gr_or_grl(), is.grl(), is.range(), validGRL(),

validSeqlevels()

checkRNA Helper Function to check valid RNA input

Description

Helper Function to check valid RNA input

Usage

checkRNA(class)

Arguments

class the given class of RNA object

Value

NULL, stop if invalid object

See Also

Other validity: checkRFP(), is.ORF(), is.gr_or_grl(), is.grl(), is.range(), validGRL(),

validSeqlevels()

codonSumsPerGroup Get read hits per codon

Description

Helper for the entropy function, normally not used directly. Separates each group into tuples (abstract

codons) and gives the sum for each tuple within each group.

Usage

codonSumsPerGroup(grl, reads, weight = "score", is.sorted = FALSE)

Arguments

grl a GRangesList of 5’ utrs, CDS, transcripts, etc.

reads a GAlignments, GRanges, or precomputed coverage as covRle (one for each

strand) of RiboSeq, RnaSeq etc.

Weights for scoring default to the ’score’ column in ’reads’. Can also be random

access paths to bigWig or fstwig files. Do not use random access for more than a

few genes; beyond that, loading the entire files is usually better. File streaming is still in

beta, so use with care!

weight (default: ’score’), if defined, a character name of a valid meta column in subject.

GRanges("chr1", 1, "+", score = 5) would mean the score column tells that this

alignment region was found 5 times. ORFik ofst, bedoc and .bedo files contain

a score column like this, as do CAGEr CAGE files and many other package

formats. You can also assign a score column manually.

is.sorted logical (FALSE), is grl sorted. That is + strand groups in increasing ranges

(1,2,3), and - strand groups in decreasing ranges (3,2,1)

Details

Example: counts c(1,0,0,1), with reg_len = 2, gives c(1,0) and c(0,1); these are summed and

returned as a data.table. 10 bases will give 3 codons; 1-base codons do not exist.

Value

a data.table with codon sums
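
The tuple sums from the Details example can be reproduced with a few lines of base R. This is only a sketch of the idea for a single group; `codon_sums` is a hypothetical helper, not an ORFik function, and it returns a plain vector rather than a data.table.

```r
# Split a per-base count vector into tuples of reg_len bases and sum each tuple.
codon_sums <- function(counts, reg_len = 3L) {
  n_full <- length(counts) %/% reg_len       # an incomplete trailing tuple is dropped
  if (n_full == 0L) return(numeric(0))
  kept <- counts[seq_len(n_full * reg_len)]
  colSums(matrix(kept, nrow = reg_len))      # one column per tuple
}

pair_sums <- codon_sums(c(1, 0, 0, 1), reg_len = 2L)   # tuples c(1,0) and c(0,1)
n_codons <- length(codon_sums(rep(1, 10)))             # 10 bases, reg_len = 3
```

Here `pair_sums` holds one sum per tuple (1 and 1), and `n_codons` is 3 because the tenth base does not form a full codon.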

codon_usage Codon usage

Description

Per amino acid (AA) / codon, analyse the coverage and compute a multitude of features, for both

A-sites and P-sites (input reads must be P-sites for now). This function takes inspiration from the

codonDT paper and returns, among others, the negative binomial estimates, plus many other features.

Usage

codon_usage(

reads,

cds,

mrna,

faFile,

filter_table,

filter_cds_mod3 = TRUE,

min_counts_cds_filter = max(min(quantile(filter_table, 0.5), 1000), 1000),

with_A_sites = TRUE,

aligned_position = "center",

code = GENETIC_CODE

)

Arguments

reads either a single library (GRanges, GAlignments, GAlignmentPairs), or a list of

libraries returned from outputLibs(df) with p-sites. If list, the list must have

names corresponding to the library names.

cds a GRangesList

mrna a GRangesList

faFile a FaFile from genome

filter_table a matrix / vector of length equal to cds

filter_cds_mod3

logical, default TRUE. Remove all ORFs that are not mod3, this speeds up the

computation a lot, and usually removes malformed ORFs you would not want

anyway.

min_counts_cds_filter

numeric, default: max(min(quantile(filter_table, 0.5), 1000), 1000).

Minimum number of counts from the ’filter_table’ argument.

with_A_sites logical, default TRUE. Not used yet, will also return A site scores.

aligned_position

what positions should be taken to calculate per-codon coverage. By default:

"center", meaning that positions -1,0,1 will be taken. Alternative: "left", then

positions 0,1,2 are taken.

code a named character vector of size 64. Default: GENETIC_CODE. Change if

organism does not use the standard code.

Details

The primary column to use is "mean_txNorm", this is the fair normalized score.
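
The effect of the min_counts_cds_filter default can be checked with a toy vector. The sketch below uses hypothetical counts and assumes that a CDS passes when its count is at least the cutoff; that comparison is an assumption for illustration, not taken from the ORFik source.

```r
# Hypothetical summarized counts per CDS
filter_table <- c(tx1 = 5, tx2 = 20, tx3 = 300, tx4 = 4000, tx5 = 12000)

# Default expression as shown in the Usage section
default_cutoff <- max(min(quantile(filter_table, 0.5), 1000), 1000)

# The Examples lower the cutoff explicitly for small test data
small_cutoff <- 10

# Assumption: a CDS is kept when its count reaches the cutoff
kept_default <- names(filter_table)[filter_table >= default_cutoff]
kept_small <- names(filter_table)[filter_table >= small_cutoff]
```

As written, the inner min() caps the median at 1000 and the outer max() floors it at 1000, so `default_cutoff` evaluates to 1000 whatever the median is. On small example data an explicit lower value, such as the 10 used in the Examples, is therefore needed to keep any CDS.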

Value

a data.table of rows per codon / AA. All values are given per library, per site (A or P), sorted by the

mean_txNorm_percentage column of the first library in the set. The columns are:

• variable (character) Library name

• seq (character) Amino acid:codon

• sum (integer) total counts per seq

• sum_txNorm (integer) total counts per seq normalized per tx

• var (numeric) variance of total counts per seq

• N (integer) total number of codons of that type

• mean_txNorm (numeric) Default use output, the fair codon usage, normalized both for gene

and genome level for codon and read counts

• ...

• alpha (numeric) Dirichlet alpha MOM estimator (imagine mean and variance of probability in

1 value: the lower the value, the higher the variance; the mean is decided by the relative value

between samples)

• sum_txNorm (integer) total counts per seq normalized per tx

• relative_to_max_score (integer) Percentage use of codon

• type (factor(character)) Either "P" or "A"

References

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7196831/

See Also

Other codon: codon_usage_exp(), codon_usage_plot()

Examples

df <- ORFik.template.experiment()[9:10,] # Subset to 2 Ribo-seq libs

## For single library

reads <- fimport(filepath(df[1,], "pshifted"))

cds <- loadRegion(df, "cds", filterTranscripts(df))

mrna <- loadRegion(df, "mrna", names(cds))

filter_table <- assay(countTable(df, type = "summarized")[names(cds)])

faFile <- findFa(df)

res <- codon_usage(reads, cds, mrna, faFile = faFile,

filter_table = filter_table, min_counts_cds_filter = 10)

codon_usage_exp Codon analysis for ORFik experiment

Description

Per amino acid (AA) / codon, analyse the coverage and compute a multitude of features, for both

A-sites and P-sites (input reads must be P-sites for now). This function takes inspiration from the

codonDT paper and returns, among others, the negative binomial estimates, plus many other features.

Usage

codon_usage_exp(

df,

reads,

cds = loadRegion(df, "cds", filterTranscripts(df)),

mrna = loadRegion(df, "mrna", names(cds)),

filter_cds_mod3 = TRUE,

filter_table = assay(countTable(df, type = "summarized")[names(cds)]),

faFile = df@fafile,

min_counts_cds_filter = max(min(quantile(filter_table, 0.5), 1000), 1000),

with_A_sites = TRUE,

code = GENETIC_CODE,

aligned_position = "center"

)

Arguments

df an ORFik experiment

reads either a single library (GRanges, GAlignments, GAlignmentPairs), or a list of

libraries returned from outputLibs(df) with p-sites. If list, the list must have

names corresponding to the library names.

cds a GRangesList, the coding sequences, default: loadRegion(df, "cds", filterTranscripts(df)),

longest isoform per gene.

mrna a GRangesList, the full mRNA sequences (matching by names the cds sequences),

default: loadRegion(df, "mrna", names(cds)).

filter_cds_mod3

logical, default TRUE. Remove all ORFs that are not mod3, this speeds up the

computation a lot, and usually removes malformed ORFs you would not want

anyway.

filter_table a numeric (integer) matrix, where rownames are the names of the full set of

mRNA transcripts. This will be subsetted to the cds subset you use. Then CDSs

are filtered from this table by the ’min_counts_cds_filter’ argument.

faFile FaFile, BSgenome, fasta/index file path or an ORFik experiment. This file is

usually used to find the transcript sequences from some GRangesList.

min_counts_cds_filter

numeric, default: max(min(quantile(filter_table, 0.5), 1000), 1000).

Minimum number of counts from the ’filter_table’ argument.

with_A_sites logical, default TRUE. Not used yet, will also return A site scores.

code a named character vector of size 64. Default: GENETIC_CODE. Change if

organism does not use the standard code.

aligned_position

what positions should be taken to calculate per-codon coverage. By default:

"center", meaning that positions -1,0,1 will be taken. Alternative: "left", then

positions 0,1,2 are taken.

Details

The primary column to use is "mean_txNorm", this is the fair normalized score.
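
The two-step use of filter_table described under Arguments (subset to the CDSs in use, then filter by count) can be sketched with a toy matrix. The data are hypothetical, and summing over libraries before comparing with the cutoff is an assumption made for this illustration only.

```r
# Hypothetical count matrix: rows are all mRNA transcripts, columns are libraries
filter_table <- matrix(c(0, 3, 15, 40,
                         2, 4, 30, 55), ncol = 2,
                       dimnames = list(c("tx1", "tx2", "tx3", "tx4"),
                                       c("lib1", "lib2")))
cds_names <- c("tx2", "tx3", "tx4")     # stands in for names(cds)

# Step 1: subset the table to the CDSs in use
sub_table <- filter_table[cds_names, , drop = FALSE]

# Step 2: keep CDSs reaching the cutoff (assumption: counts summed over libraries)
min_counts_cds_filter <- 10
kept <- rownames(sub_table)[rowSums(sub_table) >= min_counts_cds_filter]
kept
```

Subsetting by name is why the rownames of filter_table must match the transcript names of cds; a name missing from the table would make step 1 fail with a subscript error.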

Value

a data.table of rows per codon / AA. All values are given per library, per site (A or P), sorted by the

mean_txNorm_percentage column of the first library in the set. The columns are:

• variable (character) Library name

• seq (character) Amino acid:codon

• sum (integer) total counts per seq

• sum_txNorm (integer) total counts per seq normalized per tx

• var (numeric) variance of total counts per seq

• N (integer) total number of codons of that type

• mean_txNorm (numeric) Default use output, the fair codon usage, normalized both for gene

and genome level for codon and read counts

• ...

• alpha (numeric) Dirichlet alpha MOM estimator (imagine mean and variance of probability in

1 value: the lower the value, the higher the variance; the mean is decided by the relative value

between samples)

• sum_txNorm (integer) total counts per seq normalized per tx

• relative_to_max_score (integer) Percentage use of codon

• type (factor(character)) Either "P" or "A"

References

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7196831/

See Also

Other codon: codon_usage(), codon_usage_plot()

Examples

df <- ORFik.template.experiment()[9:10,] # Subset to 2 Ribo-seq libs

## For single library

res <- codon_usage_exp(df, fimport(filepath(df[1,], "pshifted")),

min_counts_cds_filter = 10)

# mean_txNorm is the advised scoring column

# codon_usage_plot(res, res$mean_txNorm)

# Default for plot function is the percentage scaled version of mean_txNorm

# codon_usage_plot(res) # This gives check error

## For multiple libs

res2 <- codon_usage_exp(df, outputLibs(df, type = "pshifted", output.mode = "list"),

min_counts_cds_filter = 10)

# codon_usage_plot(res2)

codon_usage_plot Plot codon_usage

Description

Plot codon_usage

Usage

codon_usage_plot(

res,

score_column = res$relative_to_max_score,

ylab = "Ribo-seq library",

legend.position = "none",

limit = c(0, max(score_column)),

midpoint = limit/2,

monospace_font = TRUE

)

Arguments

res a data.table of output from a codon_usage function

score_column numeric, default: res$relative_to_max_score. Which parameter to use as score

column.

ylab character vector, names for libraries to show on Y axis

legend.position

character, default "none", do not display legend.

limit numeric, 2 values for plot color limits. Default: c(0, max(score_column))

midpoint numeric, default: limit/2. midpoint of color limit.

monospace_font logical, default TRUE. Use monospace font. This does not work on all systems (it

requires specific font packages); set to FALSE if it crashes for you.

Value

a ggplot object

See Also

Other codon: codon_usage(), codon_usage_exp()

Examples

df <- ORFik.template.experiment()[9:10,] # Subset to 2 Ribo-seq libs

## For multiple libs

res2 <- codon_usage_exp(df, outputLibs(df, type = "pshifted", output.mode = "list"),

min_counts_cds_filter = 10)

# codon_usage_plot(res2, monospace_font = TRUE) # This gives check error

codon_usage_plot(res2, monospace_font = FALSE) # monospace font looks better

collapse.by.scores Merge reads by sum of existing scores

Description

If you have multiple reads at the same location but with different read lengths, specified in the meta

column "size", this sums up the scores (number of replicates) for all reads at that position.

Usage

collapse.by.scores(x)

Arguments

x a GRanges object

Value

merged GRanges object
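
The merge can be pictured on a plain data.frame. The sketch below is a base R analogue under assumptions (toy data, `aggregate()` instead of the GRanges machinery ORFik uses) and only illustrates how scores are summed across read sizes at one position.

```r
# Toy reads: one row per (position, read size) with a replicate count in score
reads <- data.frame(
  seqnames = "chr1",
  start    = c(1, 1, 2, 2, 3),
  strand   = "+",
  size     = c(28, 29, 28, 30, 29),   # original read lengths
  score    = c(2, 1, 4, 1, 3)         # replicates per (position, size)
)

# Drop the size distinction: sum scores over all sizes at each position
merged <- aggregate(score ~ seqnames + start + strand, data = reads, FUN = sum)
merged
```

Position 1 ends with score 3 (2 + 1), position 2 with score 5 (4 + 1) and position 3 keeps score 3.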

Examples

gr_s1 <- rep(GRanges("chr1", 1:10,"+"), 2)

gr_s2 <- GRanges("chr1", 1:12,"+")

gr2 <- GRanges("chr1", 21:40,"+")

gr <- c(gr_s1, gr_s2, gr2)

res <- convertToOneBasedRanges(gr,

addScoreColumn = TRUE, addSizeColumn = TRUE)

ORFik:::collapse.by.scores(res)

collapse.fastq Very fast fastq/fasta collapser

Description

For each unique read in the file, collapse all copies into one record and state in the fasta header how

many reads of that type existed. This is usually done after trimming and works best for reads shorter

than 50 nt; it is less effective for e.g. 150 bp mRNA-seq reads.

Usage

collapse.fastq(

files,

outdir = file.path(dirname(files[1]), "collapsed"),

header.out.format = "ribotoolkit",

compress = FALSE,

prefix = "collapsed_"

)

Arguments

files paths to fasta / fastq files to collapse. It tries to detect the format per file; if a file does

not have a .fastq, .fastq.gz, .fq or .fq.gz extension, it will be treated as a .fasta file

format.

outdir outdir to save files, default: file.path(dirname(files[1]), "collapsed").

Inside same folder as input files, then create subfolder "collapsed", and add a

prefix of "collapsed_" to the output names in that folder.

header.out.format

character, default "ribotoolkit", else must be "fastx". How the read header of the

output fasta should be formatted: ribotoolkit: ">seq1_x55", sequence 1 has 55

duplicated reads collapsed. fastx: ">1-55", sequence 1 has 55 duplicated reads

collapsed

compress logical, default FALSE

prefix character, default "collapsed_". Prefix to name of output file.

Value

invisible(NULL), files saved to disk in fasta format.
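
The two header formats can be illustrated on a handful of toy reads. This base R sketch is not the collapser itself (which is implemented for speed on whole files); the read vector is hypothetical, and numbering sequences by decreasing abundance is an assumption of the example.

```r
reads <- c("ACGT", "ACGT", "TTGA", "ACGT", "TTGA", "GGCC")  # toy trimmed reads

# Count identical sequences (assumption: most abundant sequence gets id 1)
counts <- sort(table(reads), decreasing = TRUE)
seqs <- names(counts)
n <- as.integer(counts)
id <- seq_along(seqs)

header_ribotoolkit <- paste0(">seq", id, "_x", n)   # e.g. ">seq1_x3"
header_fastx <- paste0(">", id, "-", n)             # e.g. ">1-3"

# Interleave header and sequence lines, as in a fasta file
fasta <- as.vector(rbind(header_ribotoolkit, seqs))
writeLines(fasta)
```

Six input reads collapse to three fasta records, which is where the gain for short, highly duplicated libraries such as Ribo-seq comes from.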

Examples

fastq.folder <- tempdir() # <- Your fastq files

infiles <- dir(fastq.folder, "\\.fastq$", full.names = TRUE) # pattern is a regular expression

# collapse.fastq(infiles)

collapseDuplicatedReads

Collapse duplicated reads

Description

For every GRanges, GAlignments read, with the same: seqname, start, (cigar) / width and strand,

collapse and give a new meta column called "score", which contains the number of duplicates of

that read. If score column already exists, will return input object!

Usage

collapseDuplicatedReads(x, addScoreColumn = TRUE, ...)

Arguments

x a GRanges, GAlignments or GAlignmentPairs object

addScoreColumn logical, default: (TRUE), if FALSE, only collapse and not keep score column of

counts for collapsed reads. Returns directly without collapsing if reuse.score.column

is FALSE and score is already defined.

... alternative arguments for class instances. For example, see: ?'collapseDuplicatedReads,GRanges-method'

Value

a GRanges, GAlignments, GAlignmentPairs or data.table object, same as input

Examples

gr <- rep(GRanges("chr1", 1:10,"+"), 2)

collapseDuplicatedReads(gr)

collapseDuplicatedReads,data.table-method

Collapse duplicated reads

Description

For every GRanges, GAlignments read, with the same: seqname, start, (cigar) / width and strand,

collapse and give a new meta column called "score", which contains the number of duplicates of

that read. If score column already exists, will return input object!

Usage

## S4 method for signature 'data.table'

collapseDuplicatedReads(

x,

addScoreColumn = TRUE,

addSizeColumn = FALSE,

reuse.score.column = TRUE,

keepCigar = FALSE

)

Arguments

x a GRanges, GAlignments or GAlignmentPairs object

addScoreColumn logical, default: (TRUE), if FALSE, only collapse and not keep score column of

counts for collapsed reads. Returns directly without collapsing if reuse.score.column

is FALSE and score is already defined.

addSizeColumn logical (FALSE), if TRUE, add a size column that gives the original

width of each read. Useful if you need original read lengths. This takes care of

soft clips etc. If collapsing reads, each unique range will also be grouped by

size.

reuse.score.column

logical (TRUE), if addScoreColumn is TRUE, and a score column exists, will

sum up the scores to create a new score. If FALSE, will skip old score column

and create new according to number of replicated reads after conversion. If

addScoreColumn is FALSE, this argument is ignored.

keepCigar logical, default FALSE. Keep the cigar information

Value

a GRanges, GAlignments, GAlignmentPairs or data.table object, same as input
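
The difference between the two settings of reuse.score.column can be shown on a plain data.frame. This is a base R analogue with toy data and `aggregate()`, given as a sketch under assumptions rather than the method's actual code.

```r
reads <- data.frame(
  seqnames = "chr1",
  start    = c(10, 10, 10, 25),
  width    = c(29, 29, 29, 30),
  strand   = "+",
  score    = c(4, 1, 2, 5)   # pre-existing score column
)
grouping <- score ~ seqnames + start + width + strand

# reuse.score.column = TRUE: sum the old scores within each duplicate group
reused <- aggregate(grouping, data = reads, FUN = sum)

# reuse.score.column = FALSE: ignore old scores, count the rows instead
recounted <- aggregate(grouping, data = reads, FUN = length)
```

For the three identical reads at start 10, reusing the scores gives 7 (4 + 1 + 2), while recounting gives 3. Reusing is the appropriate choice when the input was already collapsed once.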

Examples

gr <- rep(GRanges("chr1", 1:10,"+"), 2)

collapseDuplicatedReads(gr)

collapseDuplicatedReads,GAlignmentPairs-method

Collapse duplicated reads

Description

For every GRanges, GAlignments read, with the same: seqname, start, (cigar) / width and strand,

collapse and give a new meta column called "score", which contains the number of duplicates of

that read. If score column already exists, will return input object!

Usage

## S4 method for signature 'GAlignmentPairs'

collapseDuplicatedReads(x, addScoreColumn = TRUE)

Arguments

x a GRanges, GAlignments or GAlignmentPairs object

addScoreColumn logical, default: (TRUE), if FALSE, only collapse and not keep score column of

counts for collapsed reads. Returns directly without collapsing if reuse.score.column

is FALSE and score is already defined.

Value

a GRanges, GAlignments, GAlignmentPairs or data.table object, same as input

Examples

gr <- rep(GRanges("chr1", 1:10,"+"), 2)

collapseDuplicatedReads(gr)

collapseDuplicatedReads,GAlignments-method

Collapse duplicated reads

Description

For every GRanges, GAlignments read, with the same: seqname, start, (cigar) / width and strand,

collapse and give a new meta column called "score", which contains the number of duplicates of

that read. If score column already exists, will return input object!

Usage

## S4 method for signature 'GAlignments'

collapseDuplicatedReads(x, addScoreColumn = TRUE, reuse.score.column = TRUE)

Arguments

x a GRanges, GAlignments or GAlignmentPairs object

addScoreColumn logical, default: (TRUE), if FALSE, only collapse and not keep score column of

counts for collapsed reads. Returns directly without collapsing if reuse.score.column

is FALSE and score is already defined.

reuse.score.column

logical (TRUE), if addScoreColumn is TRUE, and a score column exists, will

sum up the scores to create a new score. If FALSE, will skip old score column

and create new according to number of replicated reads after conversion. If

addScoreColumn is FALSE, this argument is ignored.

Value

a GRanges, GAlignments, GAlignmentPairs or data.table object, same as input

Examples

gr <- rep(GRanges("chr1", 1:10,"+"), 2)

collapseDuplicatedReads(gr)

collapseDuplicatedReads,GRanges-method

Collapse duplicated reads

Description

For every GRanges, GAlignments read, with the same: seqname, start, (cigar) / width and strand,

collapse and give a new meta column called "score", which contains the number of duplicates of

that read. If score column already exists, will return input object!

Usage

## S4 method for signature 'GRanges'

collapseDuplicatedReads(

x,

addScoreColumn = TRUE,

addSizeColumn = FALSE,

reuse.score.column = TRUE

)

Arguments

x a GRanges, GAlignments or GAlignmentPairs object

addScoreColumn logical, default: (TRUE), if FALSE, only collapse and not keep score column of

counts for collapsed reads. Returns directly without collapsing if reuse.score.column

is FALSE and score is already defined.

addSizeColumn logical (FALSE), if TRUE, add a size column that gives the original

width of each read. Useful if you need original read lengths. This takes care of

soft clips etc. If collapsing reads, each unique range will also be grouped by

size.

reuse.score.column

logical (TRUE), if addScoreColumn is TRUE, and a score column exists, will

sum up the scores to create a new score. If FALSE, will skip old score column

and create new according to number of replicated reads after conversion. If

addScoreColumn is FALSE, this argument is ignored.

Value

a GRanges, GAlignments, GAlignmentPairs or data.table object, same as input

Examples

gr <- rep(GRanges("chr1", 1:10,"+"), 2)

collapseDuplicatedReads(gr)

combn.pairs Create all unique combinations pairs possible

Description

Given a character vector, get all unique combinations of 2.

Usage

combn.pairs(x)

Arguments

x a character vector; duplicated elements are removed for you.

Value

a list of character vector pairs
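
The same kind of result can be obtained in base R with `utils::combn()`. This is a sketch of the idea on a hypothetical input vector, not necessarily how combn.pairs is implemented.

```r
x <- c("RFP", "RNA", "RFP", "CAGE")   # toy library types, with a duplicate

# Remove duplicates, then list every unordered pair once
pairs <- utils::combn(unique(x), 2, simplify = FALSE)
pairs
```

Three unique values give choose(3, 2) = 3 pairs: RFP and RNA, RFP and CAGE, RNA and CAGE.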

Examples

df <- ORFik.template.experiment()

ORFik:::combn.pairs(df[, "libtype"])

computeFeatures Get all main features in ORFik

Description

If you want to get all the NGS and/or sequence features easily, you can use this function. Each

feature has a link to an article describing its creation and the idea behind it. Look at the functions in

the feature family (in the "see also" section below) to see all of them. For example, if you want to know

what the "te" column is, check out: ?translationalEff.

A short description of each feature is also shown here:

** NGS features ** Unless otherwise stated, the features apply to Ribo-seq.

• countRFP : raw counts of Ribo-seq

• fpkmRFP : FPKM

• fpkmRNA : FPKM of RNA-seq

• te : Translation efficiency Ribo-seq / RNA-seq FPKM

• floss : Fragment length similarity score

• entropyRFP : Positional entropy

• disengagementScores : downstream coverage from ORF

• RRS: Ribosome release score

• RSS: Ribosome stalling score

• ORFScores: Periodicity score, does frame 0 have more reads

• ioScore: inside outside score: coverage ORF / coverage rest of transcript

• startCodonCoverage: Coverage over start codon + 2nt before start codon

• startRegionCoverage: Coverage over codon 2 & 3

• startRegionRelative: Peakness of TIS, startCodonCoverage / startRegionCoverage, 0-n

** Sequence features **

• kozak : Similarity to kozak sequence for organism score, 0-1

• gc : GC percentage, 0-1

• StartCodons : Start codon as a string, "ATG"

• StopCodons : stop codon as a string, "TAA"

• fractionLengths : ORF length compared to transcript, 0-1

** uORF features **

• distORFCDS : Distance from ORF stop site to CDS, -n:n

• inFrameCDS : Is ORF in frame with downstream CDS, T/F

• isOverlappingCds : Is ORF overlapping with downstream CDS, T/F

• rankInTx : ORF with most upstream start codon is 1, 1-n
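
The fpkmRFP, fpkmRNA and te entries in the list above can be sketched with the standard FPKM formula. The helper `fpkm_sketch`, the counts and the library sizes are hypothetical; ORFik's own fpkm() and translationalEff() may differ in details such as pseudocounts or how the library size is determined.

```r
# Standard FPKM: fragments per kilobase of region per million mapped reads
fpkm_sketch <- function(counts, lengths, lib_size) {
  counts * 1e9 / (lengths * lib_size)
}

orf_len    <- c(orf1 = 300, orf2 = 900)   # region lengths in nt
rfp_counts <- c(orf1 = 60,  orf2 = 90)    # Ribo-seq reads per ORF
rna_counts <- c(orf1 = 30,  orf2 = 180)   # RNA-seq reads per ORF

# Hypothetical library sizes (e.g. reads overlapping transcripts only)
fpkm_rfp <- fpkm_sketch(rfp_counts, orf_len, lib_size = 1e6)
fpkm_rna <- fpkm_sketch(rna_counts, orf_len, lib_size = 2e6)

# Translation efficiency as the ratio of the two FPKM values
te <- fpkm_rfp / fpkm_rna
te
```

With these numbers orf1 has a te of 4 and orf2 a te of 1. The choice of `lib_size` matters: restricting it to reads overlapping transcripts changes every FPKM value by the same factor within a library.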

Usage

computeFeatures(

grl,

RFP,

RNA = NULL,

Gtf,

faFile = NULL,

riboStart = 26,

riboStop = 34,

sequenceFeatures = TRUE,

uorfFeatures = TRUE,

grl.is.sorted = FALSE,

weight.RFP = 1L,

weight.RNA = 1L

)

Arguments

grl a GRangesList object with usually ORFs, but can also be either leaders, cds’,

3’ utrs, etc. This is the regions you want to score.

RFP RiboSeq reads as GAlignments , GRanges or GRangesList object

RNA RnaSeq reads as GAlignments , GRanges or GRangesList object

Gtf a TxDb object of a gtf file or path to gtf, gff .sqlite etc.

faFile a path to fasta indexed genome, an open FaFile, a BSgenome, or path to ORFik

experiment with valid genome.

riboStart usually 26, the start of the floss interval, see ?floss

riboStop usually 34, the end of the floss interval

sequenceFeatures

a logical, default TRUE, include all sequence features, that is: Kozak, fractionLengths,

distORFCDS, isInFrame, isOverlapping and rankInTx. uorfFeatures =

FALSE will remove the 4 last.

uorfFeatures a logical, default TRUE, include all uORF sequence features, that is: distORFCDS,

isInFrame, isOverlapping and rankInTx

grl.is.sorted logical (F), a speed up if you know argument grl is sorted, set this to TRUE.

weight.RFP a vector (default: 1L). Can also be the character name of a column in RFP. As in

translationalEff(weight = "score") for: GRanges("chr1", 1, "+", score = 5), would

mean score column tells that this alignment region was found 5 times.

weight.RNA Same as weight.RFP but for RNA weights. (default: 1L)

Details

If you used CAGE-seq to reannotate your leaders, your TxDb object must contain the reassigned

leaders. Use reassignTxDbByCage() to get the txdb.

As a note, the library is reduced to only reads overlapping ’tx’, so the library size in the FPKM

calculation is computed on this subset. This helps remove rRNA and other contaminants.

Also, if you have only unique reads with a weight column giving the number of duplicated

reads, set the weights to make the calculations correct. See getWeights.

Value

a data.table with scores. Each column is one score type; the names of the columns are the names of the

scores, e.g. floss() or fpkm()

See Also

Other features: computeFeaturesCage(), countOverlapsW(), disengagementScore(), distToCds(),

distToTSS(), entropy(), floss(), fpkm(), fpkm_calc(), fractionLength(), initiationScore(),

insideOutsideORF(), isInFrame(), isOverlapping(), kozakSequenceScore(), orfScore(),

rankOrder(), ribosomeReleaseScore(), ribosomeStallingScore(), startRegion(), startRegionCoverage(),

stopRegion(), subsetCoverage(), translationalEff()

Examples

# Here we make an example from scratch

# Usually the ORFs are found in orfik, which makes names for you etc.

gtf <- system.file("extdata/Danio_rerio_sample", "annotations.gtf",

package = "ORFik") ## location of the gtf file

suppressWarnings(txdb <- loadTxdb(gtf))

# use cds' as ORFs for this example

ORFs <- loadRegion(txdb, "cds")

ORFs <- makeORFNames(ORFs) # need ORF names

# make Ribo-seq data,

RFP <- unlistGrl(firstExonPerGroup(ORFs))

computeFeatures(ORFs, RFP, Gtf = txdb)

# For more details see vignettes.

computeFeaturesCage Get all main features in ORFik

Description

If you have a txdb with correctly reassigned transcripts, use: computeFeatures()

Usage

computeFeaturesCage(

grl,

RFP,

RNA = NULL,

Gtf = NULL,

tx = NULL,

fiveUTRs = NULL,

cds = NULL,

threeUTRs = NULL,

faFile = NULL,

riboStart = 26,

riboStop = 34,

sequenceFeatures = TRUE,

uorfFeatures = TRUE,

grl.is.sorted = FALSE,

weight.RFP = 1L,

weight.RNA = 1L

)

Arguments

grl a GRangesList object with usually ORFs, but can also be either leaders, cds’,

3’ utrs, etc. This is the regions you want to score.

RFP RiboSeq reads as GAlignments , GRanges or GRangesList object

RNA RnaSeq reads as GAlignments , GRanges or GRangesList object

Gtf a TxDb object of a gtf file or path to gtf, gff .sqlite etc.

tx a GRangesList of transcripts, normally called from: exonsBy(Gtf, by = "tx",

use.names = T). Only add this if you are not including the Gtf file. If you are using

CAGE, you do not need to reassign these to the CAGE peaks; it will be done for you.

fiveUTRs fiveUTRs as GRangesList. If you used CAGE data to extend 5’ utrs, remember to

input the CAGE-assigned version and not the original!

cds a GRangesList of coding sequences

threeUTRs a GRangesList of transcript 3’ utrs, normally called from: threeUTRsByTran-

script(Gtf, use.names = T)

faFile a path to fasta indexed genome, an open FaFile, a BSgenome, or path to ORFik

experiment with valid genome.

riboStart usually 26, the start of the floss interval, see ?floss

riboStop usually 34, the end of the floss interval

sequenceFeatures

a logical, default TRUE, include all sequence features, that is: Kozak, fractionLengths,

distORFCDS, isInFrame, isOverlapping and rankInTx. uorfFeatures =

FALSE will remove the 4 last.

uorfFeatures a logical, default TRUE, include all uORF sequence features, that is: distORFCDS,

isInFrame, isOverlapping and rankInTx

grl.is.sorted logical (F), a speed up if you know argument grl is sorted, set this to TRUE.

weight.RFP a vector (default: 1L). Can also be the character name of a column in RFP. As in

translationalEff(weight = "score") for: GRanges("chr1", 1, "+", score = 5), would

mean score column tells that this alignment region was found 5 times.

weight.RNA Same as weight.RFP but for RNA weights. (default: 1L)

Details

A specialized version for when you don’t have a correct txdb, for example with CAGE-reassigned

leaders while the txdb is not updated. It is 2x faster for tested data. The point of this function is to

give you the ability to input transcripts etc. directly into the function, and not load them from the txdb.

Each feature has a link to an article describing the feature, try ?floss

Value

a data.table with scores. Each column is one score type; the names of the columns are the names of the

scores, e.g. floss() or fpkm()

See Also

Other features: computeFeatures(), countOverlapsW(), disengagementScore(), distToCds(),

distToTSS(), entropy(), floss(), fpkm(), fpkm_calc(), fractionLength(), initiationScore(),

insideOutsideORF(), isInFrame(), isOverlapping(), kozakSequenceScore(), orfScore(),

rankOrder(), ribosomeReleaseScore(), ribosomeStallingScore(), startRegion(), startRegionCoverage(),

stopRegion(), subsetCoverage(), translationalEff()

Examples

# a small example without cage-seq data:

# we will find ORFs in the 5' utrs

# and then calculate features on them

if (requireNamespace("BSgenome.Hsapiens.UCSC.hg19")) {

library(GenomicFeatures)

# Get the gtf txdb file

txdbFile <- system.file("extdata", "hg19_knownGene_sample.sqlite",

package = "GenomicFeatures")

txdb <- loadDb(txdbFile)

# Extract sequences of fiveUTRs.

fiveUTRs <- fiveUTRsByTranscript(txdb, use.names = TRUE)[1:10]

faFile <- BSgenome.Hsapiens.UCSC.hg19::Hsapiens

tx_seqs <- extractTranscriptSeqs(faFile, fiveUTRs)

# Find all ORFs on those transcripts and get their genomic coordinates

fiveUTR_ORFs <- findMapORFs(fiveUTRs, tx_seqs)

unlistedORFs <- unlistGrl(fiveUTR_ORFs)

# group GRanges by ORFs instead of Transcripts

fiveUTR_ORFs <- groupGRangesBy(unlistedORFs, unlistedORFs$names)

# make some toy ribo seq and rna seq data

starts <- unlistGrl(ORFik:::firstExonPerGroup(fiveUTR_ORFs))

RFP <- promoters(starts, upstream = 0, downstream = 1)

score(RFP) <- rep(29, length(RFP)) # the original read widths

# set RNA seq to duplicate transcripts

RNA <- unlistGrl(exonsBy(txdb, by = "tx", use.names = TRUE))

#ORFik:::computeFeaturesCage(grl = fiveUTR_ORFs, RFP = RFP,

# RNA = RNA, Gtf = txdb, faFile = faFile)

}

# See vignettes for more examples

conditionNames Get condition name variants

Description

Used to standardize nomenclature for experiments.

Example: WT is the main name, but a variant is "control"; "control" will then be renamed to WT.

Usage

conditionNames()

Value

a data.table with 2 columns, the main name, and all name variants of the main name in second

column as a list.
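
How such a main-name / variant table can be used for renaming is sketched below in base R. Only the WT / control pair comes from the Description; the other variants, the KO row and the helper `standardize` are hypothetical, and the real table returned by conditionNames() is a data.table with its own contents.

```r
# Hypothetical variant table: main name plus a list column of accepted variants
name_table <- data.frame(main = c("WT", "KO"))
name_table$variants <- list(c("WT", "control", "wild_type"),
                            c("KO", "knockout"))

# Replace every known variant by its main name; unknown names are left as-is
standardize <- function(x, name_table) {
  for (i in seq_len(nrow(name_table))) {
    hit <- x %in% name_table$variants[[i]]
    x[hit] <- name_table$main[i]
  }
  x
}

renamed <- standardize(c("control", "WT", "knockout", "mutant"), name_table)
renamed
```

Here "control" becomes "WT" and "knockout" becomes "KO", while "mutant" has no entry in the table and is returned unchanged.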

See Also

Other experiment_naming: batchNames(), cellLineNames(), cellTypeNames(), fractionNames(),

inhibitorNames(), libNames(), mainNames(), repNames(), stageNames(), tissueNames()

config Read directory config for ORFik experiments

Description

Defines a folder for:

1. fastq files (raw data)

2. bam files (processed data)

3. references (organism annotation and STAR index)

4. experiments (location to store and load all experiment .csv files)

Update or use another config using the config.save() function.

Usage

config(

file = config_file(old_config_location = old_config_location),

old_config_location = "~/Bio_data/ORFik_config.csv"

)

Arguments

file location of config csv, default: config_file(old_config_location = old_config_location)

old_config_location

path, old config location before BiocFileCache implementation. Will copy this

to cache directory and delete old version. This is done to follow bioc rules on

not writing to user home directory.

Value

a named character vector of length 4, with names "fastq", "bam", "ref" and "exp"

Examples

## Make with default config path

#config()

config.exper Set directories for experiment

Description

Defines a folder for:

1. fastq files (raw_data)

2. bam files (processed data)

3. references (organism annotation and STAR index)

4. Experiment (name of experiment)

Usage

config.exper(experiment, assembly, type, config = ORFik::config())

Arguments

experiment short name of experiment (must be valid as a folder name)

assembly name of organism and assembly (must be valid as a folder name)

type name of sequencing type, Ribo-seq, RNA-seq, CAGE.. Can be more than one.

config a named character vector of length 4 ("fastq", "bam", "ref", "exp"), default: ORFik::config()

Value

named character vector of paths for experiment

Examples

## Save to default config location

#config.exper("Alexaki_Human", "Homo_sapiens_GRCh38_101", c("Ribo-seq", "RNA-seq"))

config.save Save/update directory config for ORFik experiments

Description

Defines a folder for fastq files (raw_data), bam files (processed data) and references (organism

annotation and STAR index)

Usage

config.save(

file = config_file(),

fastq.dir = file.path(base.dir, "raw_data"),

bam.dir = file.path(base.dir, "processed_data"),

reference.dir = file.path(base.dir, "references"),

exp.dir = file.path(base.dir, "ORFik_experiments/"),

base.dir = "~/Bio_data",

conf = data.frame(type = c("fastq", "bam", "ref", "exp"), directory = c(fastq.dir,

bam.dir, reference.dir, exp.dir))

)

Arguments

file location of config csv, default: config_file()

fastq.dir directory where ORFik puts fastq file directories, default: file.path(base.dir, "raw_data"), which is retrieved with: config()["fastq"]

bam.dir directory where ORFik puts bam file directories, default: file.path(base.dir, "processed_data"), which is retrieved with: config()["bam"]

reference.dir directory where ORFik puts reference file directories, default: file.path(base.dir, "references"), which is retrieved with: config()["ref"]

exp.dir directory where ORFik puts experiment csv files, default: file.path(base.dir, "ORFik_experiments/"), which is retrieved with: config()["exp"]

base.dir base directory for all output directories, default: "~/Bio_data"

conf data.frame of complete conf object, default: data.frame(type = c("fastq", "bam", "ref", "exp"), directory = c(fastq.dir, bam.dir, reference.dir, exp.dir))

Value

invisible(NULL), file saved to disk
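
The defaults listed under Usage compose the config table from base.dir. A minimal base R sketch of that composition follows. It only rebuilds the data.frame by hand and does not call config.save() or write any file; the named vector `dirs` is an assumption about how the table is exposed by config(), based on the config()["fastq"] lookups described above.

```r
# Rebuild the default `conf` table from the Usage section by hand.
base.dir <- "~/Bio_data"
conf <- data.frame(
  type = c("fastq", "bam", "ref", "exp"),
  directory = c(file.path(base.dir, "raw_data"),
                file.path(base.dir, "processed_data"),
                file.path(base.dir, "references"),
                file.path(base.dir, "ORFik_experiments/")),
  stringsAsFactors = FALSE
)
# Assumption: a named character vector keyed by `type`, so that
# lookups such as dirs["fastq"] behave like config()["fastq"].
dirs <- setNames(conf$directory, conf$type)
dirs["fastq"]
```

Changing only base.dir moves all four directories at once, which is why the example below overrides just that argument.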

Examples

# Overwrite default config, with new base directory for files

#config.save(base.dir = "/media/Bio_data/") # Output files go here instead

# of ~/Bio_data

## Don't do this, but for understanding, here is how to make a second config

#new_config_path <- config_file(query = "ORFik_config_2")

#config.save(new_config_path, "/media/Bio_data/raw_data/",

# "/media/Bio_data/processed_data", "/media/Bio_data/references/")

config_file Get path for ORFik config in cache

Description

Get path for ORFik config in cache

Usage

config_file(

cache = BiocFileCache::getBFCOption("CACHE"),

query = "ORFik_config",

ask = interactive(),

old_config_location = "~/Bio_data/ORFik_config.csv"

)

Arguments

cache path to bioc cache directory with rname from query argument. Default is: BiocFileCache::getBFCOption("CACHE"). For info, see: BiocFileCache::BiocFileCache()

query default: "ORFik_config". Exact rname of the file in cache.

ask logical, default interactive().

old_config_location

path, old config location before the BiocFileCache implementation. This file will be copied to the cache directory and the old version deleted. This is done to follow Bioconductor rules on not writing to the user home directory.

Value

a file path in cache

Examples

config_file()

# Another config path

config_file(query = "ORFik_config_2")

convertLibs Converted format of NGS libraries

Description

Export as either .ofst, .wig, .bigWig, .bedo (legacy format) or .bedoc (legacy format) files:

Export files as .ofst for fastest load speed into R.

Export files as .wig / .bigWig for use in IGV or other genome browsers.

Libraries already loaded in envExp(df) are used as input if they exist there.

Usage

convertLibs(

df,

out.dir = libFolder(df),

addScoreColumn = TRUE,

addSizeColumn = TRUE,

must.overlap = NULL,

method = "None",

type = "ofst",

input.type = "ofst",

reassign.when.saving = FALSE,

envir = envExp(df),

BPPARAM = bpparam()

)

Arguments

df an ORFik experiment

out.dir optional output directory, default: libFolder(df). If it is NULL, it will just reassign R objects to simplified libraries. Otherwise a final folder is created, specified as: paste0(out.dir, "/", type, "/"). The files will be saved there in the format given by the type argument.

addScoreColumn logical, default TRUE, if FALSE will not add replicate numbers as score column, see ORFik::convertToOneBasedRanges.

addSizeColumn logical, default TRUE, if FALSE will not add size (width) as size column, see ORFik::convertToOneBasedRanges. Does not apply to the GAlignment version of .ofst or to .bedoc, since they contain the original cigar.

must.overlap default (NULL), else a GRanges / GRangesList object, so only reads that overlap (must.overlap) are kept. This is useful when you only need the reads over transcript annotation or subset etc.

method character, default "None", the method to reduce ranges, for more info see convertToOneBasedRanges

type character, output format, default "ofst". Alternatives: "ofst", "bigWig", "wig", "bedo" or "bedoc". A folder with this name, containing the files, is made within out.dir.

input.type character, input type, default "ofst". Remember this function uses the loaded libraries if existing, so this argument is usually ignored. Only used if files do not already exist.

reassign.when.saving

logical, default FALSE. If TRUE, will reassign library to converted form after saving. Ignored when out.dir = NULL.

envir environment to save to, default envExp(df), which defaults to .GlobalEnv, but can be set with envExp(df) <- new.env() etc.

BPPARAM how many cores/threads to use? default: bpparam(). To see number of threads used, do bpparam()$workers. You can also add a time remaining bar, for a more detailed pipeline.

Details

We advise you not to use this directly, as other functions are safer for library type conversions. See family description below. This is mostly used internally in ORFik. It is only advised to use it if large bam files are already loaded in R and conversions are wanted from those.

See export.ofst, export.wiggle, export.bedo and export.bedoc for information on file formats.

If libraries of the experiment are already loaded into the environment (default: .GlobalEnv), it will export using those objects as templates. If they are not in the environment, the .ofst files made from the bam files are loaded (unless you are converting to .ofst, in which case the .bam files are loaded).

Value

NULL (saves files to disk or R .GlobalEnv)

See Also

Other lib_converters: convert_bam_to_ofst(), convert_to_bigWig(), convert_to_covRle(),

convert_to_covRleList()

Examples

df <- ORFik.template.experiment()

#convertLibs(df)

# Keep only 5' ends of reads

#convertLibs(df, method = "5prime")

convertToOneBasedRanges

Convert a GRanges Object to 1 width reads

Description

There are 5 ways of doing this

1. Take 5’ ends, reduce away rest (5prime)

2. Take 3’ ends, reduce away rest (3prime)

3. Tile to 1-mers and include all (tileAll)

4. Take middle point per GRanges (middle)

5. Get original with metacolumns (None)

You can also do multiple at a time; the output is then a GRangesList, where each list group is one operation (5prime is [1], 3prime is [2] etc).

Many other ways to do this have their own functions, like startSites and stopSites etc. To retain information on original width, set addSizeColumn to TRUE. To compress data to 1 GRanges element per unique read, set addScoreColumn to TRUE. This will give you a score column with how many duplicated reads there were in the specified region.
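
The strand logic behind the 5prime and 3prime methods, and the compression that a score column gives, can be sketched in base R. This is only an illustration on a toy data.frame with hypothetical column names, not ORFik's implementation, and it ignores cigar information.

```r
# Toy reads: three identical 28-nt reads on "+", one 30-nt read on "-".
reads <- data.frame(
  start  = c(10, 10, 10, 50),
  end    = c(37, 37, 37, 79),
  strand = c("+", "+", "+", "-"),
  stringsAsFactors = FALSE
)
# 5' end: start on the plus strand, end on the minus strand.
reads$five_prime <- ifelse(reads$strand == "+", reads$start, reads$end)
# 3' end: the opposite end of the read.
reads$three_prime <- ifelse(reads$strand == "+", reads$end, reads$start)
# Original read width (what a size column would hold).
reads$size <- reads$end - reads$start + 1
# Collapse duplicates: one row per unique (5' end, strand, size),
# with score = number of reads collapsed into that row.
collapsed <- aggregate(score ~ five_prime + strand + size,
                       data = transform(reads, score = 1L), FUN = sum)
collapsed
```

Grouping by size as well as position keeps reads of different length apart, which mirrors the effect described for addSizeColumn together with addScoreColumn.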

Usage

convertToOneBasedRanges(

gr,

method = "5prime",

addScoreColumn = FALSE,

addSizeColumn = FALSE,

after.softclips = TRUE,

along.reference = FALSE,

reuse.score.column = TRUE

)

Arguments

gr GRanges, GAlignments or GAlignmentPairs object to reduce.

method character, default "5prime", the method to reduce ranges, see NOTE for more

info.

addScoreColumn logical (FALSE), if TRUE, add a score column that sums up the hits per unique range. This will make each read unique, so that each read occurs only once, and the score column gives the number of collapsed hits. A useful compression. If addSizeColumn is FALSE, it will not differentiate between reads with same start and stop, but different length. If addSizeColumn is FALSE, it will remove it. Collapses after conversion.

addSizeColumn logical (FALSE), if TRUE, add a size column that, for each read, gives the original width of the read. Useful if you need original read lengths. This takes care of soft clips etc. If collapsing reads, each unique range will be grouped also by size.

after.softclips

logical (TRUE), include softclips in width. Does not apply if along.reference is TRUE.

along.reference

logical (FALSE), example: The cigar "26M2I" is by default width 28, but if along.reference is TRUE, it will be 26: the length of the read along the reference. Also "1D20M" will be 21 if along.reference is TRUE. Intronic regions (cigar: N) will be removed. So: "1M200N19M" is 20, not 220.

reuse.score.column

logical (TRUE), if addScoreColumn is TRUE, and a score column exists, will

sum up the scores to create a new score. If FALSE, will skip old score column

and create new according to number of replicated reads after conversion. If

addScoreColumn is FALSE, this argument is ignored.
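
The two width definitions used by along.reference can be illustrated with a small cigar parser in base R. The helper names below are hypothetical and this is a sketch of the arithmetic only, not the code ORFik uses; soft and hard clips are left out of both widths here.

```r
# Sum CIGAR operation lengths for a chosen set of operations.
cigar_width <- function(cigar, ops) {
  tokens <- regmatches(cigar, gregexpr("[0-9]+[MIDNSHP=X]", cigar))[[1]]
  len <- as.integer(sub("[A-Z=]$", "", tokens))
  op  <- sub("^[0-9]+", "", tokens)
  sum(len[op %in% ops])
}
# Width of the read itself: matches and insertions.
read_width <- function(cigar) cigar_width(cigar, c("M", "I", "=", "X"))
# Width along the reference: matches and deletions; introns (N) are skipped.
reference_width <- function(cigar) cigar_width(cigar, c("M", "D", "=", "X"))

read_width("26M2I")           # 28
reference_width("26M2I")      # 26
reference_width("1D20M")      # 21
reference_width("1M200N19M")  # 20
```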

Details

NOTE: For cigar based ranges (GAlignments), the 5’ end is the first non clipped base (neither soft clipped nor hard clipped from 5’). This follows the default of Bioconductor. For the special case of GAlignmentPairs, 5prime will only use the left (first) 5’ end of the read pair and 3prime will use only the right (last) 3’ end of the read pair. tileAll and middle can possibly find points that are not in the reads: let's say the pair is 1-5 and 10-15, then middle is 7, which is not in the read.

Value

Converted GRanges object

See Also

Other utils: bedToGR(), export.bed12(), export.bigWig(), export.fstwig(), export.wiggle(),

fimport(), findFa(), fread.bed(), optimizeReads(), readBam(), readBigWig(), readWig()

Examples

gr <- GRanges("chr1", 1:10,"+")

# 5 prime ends

convertToOneBasedRanges(gr)

# is equal to convertToOneBasedRanges(gr, method = "5prime")

# 3 prime ends

convertToOneBasedRanges(gr, method = "3prime")

# With lengths

convertToOneBasedRanges(gr, addSizeColumn = TRUE)

# With score (# of replicates)

gr <- rep(gr, 2)

convertToOneBasedRanges(gr, addSizeColumn = TRUE, addScoreColumn = TRUE)

convert_bam_to_ofst Convert libraries to ofst

Description

Saved by default in folder "ofst" relative to default libraries of experiment. Speeds up loading of full files compared to bam by large margins.

Usage

convert_bam_to_ofst(

df,

in_files = filepath(df, "default"),

out_dir = file.path(libFolder(df), "ofst"),

verbose = TRUE,

strandMode = rep(0, length(in_files))

)

Arguments

df an ORFik experiment, or NULL is allowed if both in_files and out_dir are specified manually.

in_files paths to input files, default: filepath(df, "default") with bam format files.

out_dir paths to output files, default file.path(libFolder(df), "ofst").

verbose logical, default TRUE, message about library output status.

strandMode numeric, default 0. Only used for paired end bam files. One of (0: strand = *, 1: first read of pair is +, 2: first read of pair is -). See ?strandMode. Note: Sets default to 0 instead of 1, as readGAlignmentPairs uses 1. This is to guarantee hits, but will also make mismatches of overlapping transcripts in opposite directions.

Details

If you want to keep the bam files loaded, or want faster conversion because you already have them loaded, use ORFik::convertLibs instead.

Value

invisible(NULL), files saved to disk

See Also

Other lib_converters: convertLibs(), convert_to_bigWig(), convert_to_covRle(), convert_to_covRleList()

Examples

df <- ORFik.template.experiment.zf()

## Usually do default folder, here we use tmpdir

folder_to_save <- file.path(tempdir(), "ofst")

convert_bam_to_ofst(df, out_dir = folder_to_save)

fimport(file.path(folder_to_save, "ribo-seq.ofst"))

convert_to_bigWig Convert to BigWig

Description

Convert to BigWig

Usage

convert_to_bigWig(

df,

in_files = filepath(df, "pshifted"),

out_dir = file.path(libFolder(df), "bigwig"),

split.by.strand = TRUE,

split.by.readlength = FALSE,

seq_info = seqinfo(df),

weight = "score",

is_pre_collapsed = FALSE,

verbose = TRUE

)

Arguments

df an ORFik experiment, or NULL is allowed if both in_files and out_dir are specified manually.

in_files paths to input files, default pshifted files: filepath(df, "pshifted") in ofst format

out_dir paths to output files, default file.path(libFolder(df), "bigwig").

split.by.strand

logical, default TRUE, split into forward and reverse strand RleList inside covRle object.

split.by.readlength

logical, default FALSE, split into files for each readlength, defined by readWidths(x) for each file.

seq_info Seqinfo object, default seqinfo(df)

weight integer, numeric or single length character. Default "score". Use score column in loaded in_files.

is_pre_collapsed

logical, default FALSE. Have you already collapsed reads with collapse.by.scores, so each position is only in 1 GRanges object with a score column per readlength? Set to TRUE only if you are sure; it will give a speedup.

verbose logical, default TRUE, message about library output status.

Value

invisible(NULL), files saved to disk

See Also

Other lib_converters: convertLibs(), convert_bam_to_ofst(), convert_to_covRle(), convert_to_covRleList()

Examples

df <- ORFik.template.experiment()[10,]

## Usually do default folder, here we use tmpdir

folder_to_save <- file.path(tempdir(), "bigwig")

convert_to_bigWig(df, out_dir = folder_to_save)

fimport(file.path(folder_to_save, c("RFP_Mutant_rep2_forward.bigWig",

"RFP_Mutant_rep2_reverse.bigWig")))

convert_to_covRle Convert libraries to covRle

Description

Saved by default in folder "cov_RLE" relative to default libraries of experiment

Usage

convert_to_covRle(

df,

in_files = filepath(df, "pshifted"),

out_dir = file.path(libFolder(df), "cov_RLE"),

split.by.strand = TRUE,

split.by.readlength = FALSE,

seq_info = seqinfo(df),

weight = "score",

verbose = TRUE

)

Arguments

df an ORFik experiment, or NULL is allowed if both in_files and out_dir are specified manually.

in_files paths to input files, default pshifted files: filepath(df, "pshifted") in ofst format

out_dir paths to output files, default file.path(libFolder(df), "cov_RLE").

split.by.strand

logical, default TRUE, split into forward and reverse strand RleList inside covRle object.

split.by.readlength

logical, default FALSE, split into files for each readlength, defined by readWidths(x) for each file.

seq_info Seqinfo object, default seqinfo(df)

weight integer, numeric or single length character. Default "score". Use score column in loaded in_files.

verbose logical, default TRUE, message about library output status.

Value

invisible(NULL), files saved to disk

See Also

Other lib_converters: convertLibs(), convert_bam_to_ofst(), convert_to_bigWig(), convert_to_covRleList()

Examples

df <- ORFik.template.experiment()[10,]

## Usually do default folder, here we use tmpdir

folder_to_save <- file.path(tempdir(), "cov_RLE")

convert_to_covRle(df, out_dir = folder_to_save)

fimport(file.path(folder_to_save, "RFP_Mutant_rep2.covrds"))

convert_to_covRleList Convert libraries to covRleList objects

Description

Useful to store reads separated by readlength, for much faster coverage calculation. Saved by

default in folder "cov_RLE_List" relative to default libraries of experiment

Usage

convert_to_covRleList(

df,

in_files = filepath(df, "pshifted"),

out_dir = file.path(libFolder(df), "cov_RLE_List"),

out_dir_merged = file.path(libFolder(df), "cov_RLE"),

split.by.strand = TRUE,

seq_info = seqinfo(df),

weight = "score",

verbose = TRUE

)

Arguments

df an ORFik experiment, or NULL is allowed if both in_files and out_dir are specified manually.

in_files paths to input files, default pshifted files: filepath(df, "pshifted") in ofst format

out_dir paths to output files, default file.path(libFolder(df), "cov_RLE_List").

out_dir_merged character vector of paths, default: file.path(libFolder(df), "cov_RLE"). Paths to merged output files. Set to NULL to skip making merged covRle.

split.by.strand

logical, default TRUE, split into forward and reverse strand RleList inside covRle object.

seq_info Seqinfo object, default seqinfo(df)

weight integer, numeric or single length character. Default "score". Use score column in loaded in_files.

verbose logical, default TRUE, message about library output status.

Value

invisible(NULL), files saved to disk

See Also

Other lib_converters: convertLibs(), convert_bam_to_ofst(), convert_to_bigWig(), convert_to_covRle()

Examples

df <- ORFik.template.experiment()[10,]

## Usually do default folder, here we use tmpdir

folder_to_save <- file.path(tempdir(), "cov_RLE_List")

folder_to_save_merged <- file.path(tempdir(), "cov_RLE")

ORFik:::convert_to_covRleList(df, out_dir = folder_to_save,

out_dir_merged = folder_to_save_merged)

fimport(file.path(folder_to_save, "RFP_Mutant_rep2.covrds"))

convert_to_fstWig Convert to fstwig

Description

For now, files are split by chromosome for faster loading. This feature might change in the future!

Usage

convert_to_fstWig(

df,

in_files = filepath(df, "pshifted"),

out_dir = file.path(libFolder(df), "fstwig"),

split.by.strand = TRUE,

split.by.readlength = FALSE,

seq_info = seqinfo(df),

weight = "score",

is_pre_collapsed = FALSE,

verbose = TRUE

)

Arguments

df an ORFik experiment, or NULL is allowed if both in_files and out_dir are specified manually.

in_files paths to input files, default pshifted files: filepath(df, "pshifted") in ofst format

out_dir paths to output files, default file.path(libFolder(df), "fstwig").

split.by.strand

logical, default TRUE, split into forward and reverse strand RleList inside covRle object.

split.by.readlength

logical, default FALSE, split into files for each readlength, defined by readWidths(x) for each file.

seq_info Seqinfo object, default seqinfo(df)

weight integer, numeric or single length character. Default "score". Use score column in loaded in_files.

is_pre_collapsed

logical, default FALSE. Have you already collapsed reads with collapse.by.scores, so each position is only in 1 GRanges object with a score column per readlength? Set to TRUE only if you are sure; it will give a speedup.

verbose logical, default TRUE, message about library output status.

Value

invisible(NULL), files saved to disk

correlation.plots Correlation plots between all samples

Description

Get correlation plot of raw counts and/or log2(count + 1) over selected region in: c("mrna", "leaders", "cds", "trailers")

Note on correlation: Pearson correlation by default, using pairwise observations to fill in NA values for the covariance matrix.

Usage

correlation.plots(

df,

output.dir,

region = "mrna",

type = "fpkm",

height = 400,

width = 400,

size = 0.15,

plot.ext = ".pdf",

complex.correlation.plots = TRUE,

data_for_pairs = countTable(df, region, type = type),

as_gg_list = FALSE,

text_size = 4,

method = c("pearson", "spearman")[1]

)

Arguments

df an ORFik experiment

output.dir directory to save to, named : cor_plot, cor_plot_log2 and/or cor_plot_simple

with either .pdf or .png

region a character (default: mrna), make raw count matrices of whole mrnas or one of

(leaders, cds, trailers)

type which value to use, "fpkm", alternative "counts".

height numeric, default 400 (in mm)

width numeric, default 400 (in mm)

size numeric, size of dots, default 0.15. Deprecated.

plot.ext character, default: ".pdf". Alternatives: ".png" or ".jpg".

complex.correlation.plots

logical, default TRUE. In addition to the simple correlation plot, add two computationally heavy dots + correlation plots. Useful for deeper analysis, but takes longer to run, especially on computers with a low-end GPU. Set to FALSE to skip these.

data_for_pairs a data.table from ORFik::countTable of counts wanted. Default is fpkm of all

mRNA counts over all libraries.

as_gg_list logical, default FALSE. Return as a list of ggplot objects instead of as a grob.

Gives you the ability to modify plots more directly.

text_size size of correlation numbers

method character, correlation method, default "pearson". Alternative: "spearman".

Value

invisible(NULL) / if as_gg_list is TRUE, return a list of raw plots.

cor_plot Get correlation between columns

Description

Get correlation between columns

Usage

cor_plot(

dt_cor,

col = c(low = "blue", high = "red", mid = "white", na.value = "white"),

limit = c(ifelse(min(dt_cor$Cor, na.rm = TRUE) < 0, -1, 0), 1),

midpoint = mean(limit),

label_name = "Pearson\nCorrelation",

text_size = 4,

legend.position = c(0.4, 0.7),

legend.direction = "horizontal"

)

Arguments

dt_cor a data.table, with column Cor

col colors c(low = "blue", high = "red", mid = "white", na.value = "white")

limit default c(-1, 1) if any correlation is negative, otherwise c(0, 1); defined by: c(ifelse(min(dt_cor$Cor, na.rm = TRUE) < 0, -1, 0), 1)

midpoint midpoint of correlation values in label coloring, default mean(limit).

label_name name of correlation method, default "Pearson Correlation" with newline after Pearson.

text_size size of correlation numbers

legend.position

default c(0.4, 0.7), other: "top", "right",..

legend.direction

default "horizontal", or "vertical"

Value

a ggplot (heatmap)
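
The default limit and midpoint expressions from the Usage section can be evaluated on their own to see how the color scale adapts to the data. A small base R sketch, where `cor_limit` is a hypothetical helper that only wraps the default expression:

```r
# Lower limit is -1 only when a negative correlation is present, else 0.
cor_limit <- function(cor_values) {
  c(ifelse(min(cor_values, na.rm = TRUE) < 0, -1, 0), 1)
}
limit_pos <- cor_limit(c(0.91, 0.85, NA, 1))  # all correlations >= 0
limit_neg <- cor_limit(c(0.91, -0.20, 1))     # one negative correlation
# The color midpoint defaults to the mean of the limits.
mean(limit_pos)
mean(limit_neg)
```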

cor_table Get correlation between columns

Description

Get correlation between columns

Usage

cor_table(

dt,

method = c("pearson", "spearman")[1],

upper_triangle = TRUE,

decimals = 2,

melt = TRUE,

na.rm.melt = TRUE

)

Arguments

dt a data.table

method character, correlation method, default "pearson". Alternative: "spearman".

upper_triangle logical, default TRUE. Make lower triangle values NA.

decimals numeric, default 2. How many decimals for correlation

melt logical, default TRUE.

na.rm.melt logical, default TRUE. Remove NA values from melted table.

Value

a data.table with 3 columns, Var1, Var2 and Cor
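
The steps implied by the arguments (correlate, round, blank one triangle, melt) can be sketched with base R alone. This is an approximation for illustration, not the function's source: it uses a data.frame instead of a data.table, and keeping the diagonal is an assumption.

```r
# Toy table: rows are genes, columns are libraries; one missing value.
counts <- data.frame(
  lib1 = c(10, 20, 30, 40, 50),
  lib2 = c(12, 18, 33, NA, 52),
  lib3 = c(50, 35, 31, 22, 9)
)
# Pearson correlation on pairwise complete observations, 2 decimals.
cor_mat <- round(cor(counts, method = "pearson",
                     use = "pairwise.complete.obs"), 2)
# Blank the lower triangle (diagonal kept here, an assumption).
cor_mat[lower.tri(cor_mat)] <- NA
# Melt to long format with columns Var1, Var2 and Cor, dropping NA rows.
cor_long <- as.data.frame(as.table(cor_mat), responseName = "Cor")
cor_long <- cor_long[!is.na(cor_long$Cor), ]
cor_long
```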

countOverlapsW CountOverlaps with weights

Description

Similar to countOverlaps, but takes an optional weight column. This is usually the score column

Usage

countOverlapsW(query, subject, weight = NULL, ...)

Arguments

query IRanges, IRangesList, GRanges, GRangesList object. Usually a transcript region.

subject GRanges, GRangesList, GAlignments, usually reads.

weight (default: NULL), if defined either numeric or character name of valid meta column in subject. If weight is a single numeric, it is used for all. A normal weight is the score column given as weight = "score". GRanges("chr1", 1, "+", score = 5) would mean the score column tells that this alignment region was found 5 times.

... additional arguments passed to countOverlaps/findOverlaps

Value

a named vector of number of overlaps to subject weighted by the ’weight’ column.

See Also

Other features: computeFeatures(), computeFeaturesCage(), disengagementScore(), distToCds(),

distToTSS(), entropy(), floss(), fpkm(), fpkm_calc(), fractionLength(), initiationScore(),

insideOutsideORF(), isInFrame(), isOverlapping(), kozakSequenceScore(), orfScore(),

rankOrder(), ribosomeReleaseScore(), ribosomeStallingScore(), startRegion(), startRegionCoverage(),

stopRegion(), subsetCoverage(), translationalEff()

Examples

gr1 <- GRanges(seqnames="chr1",

ranges=IRanges(start = c(4, 9, 10, 30),

end = c(4, 15, 20, 31)),

strand="+")

gr2 <- GRanges(seqnames="chr1",

ranges=IRanges(start = c(1, 4, 15, 25),

end = c(2, 4, 20, 26)),

strand=c("+"),

score=c(10, 20, 15, 5))

countOverlaps(gr1, gr2)

countOverlapsW(gr1, gr2, weight = "score")
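
What "weighted" means here can be shown without any Bioconductor objects. The base R sketch below reuses the coordinates of gr1 (query) and gr2 (subject) from the example, assumes a single chromosome and strand, and sums the scores of overlapping subject ranges instead of counting them. It illustrates the idea only and is not how countOverlapsW is implemented.

```r
# Same coordinates as gr1 (query) and gr2 (subject) in the example.
q_start <- c(4, 9, 10, 30);  q_end <- c(4, 15, 20, 31)
s_start <- c(1, 4, 15, 25);  s_end <- c(2, 4, 20, 26)
s_score <- c(10, 20, 15, 5)
# Two closed intervals overlap when each starts before the other ends.
hits <- outer(seq_along(q_start), seq_along(s_start),
              function(i, j) q_start[i] <= s_end[j] & s_start[j] <= q_end[i])
# Unweighted: number of overlapping subject ranges per query.
plain <- rowSums(hits)
# Weighted: sum of the scores of the overlapping subject ranges.
weighted <- as.vector((hits + 0) %*% s_score)
plain
weighted
```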

countTable Extract count table directly from experiment

Description

Used to quickly load pre-created read count tables to R.

If df is experiment: Extracts by getting /QC_STATS directory, and searching for region. Requires ORFikQC to have been run on experiment, to get default count tables!

Usage

countTable(

df,

region = "mrna",

type = "count",

collapse = FALSE,

count.folder = "default"

)

Arguments

df an ORFik experiment or path to folder with countTable, use path if not same folder as experiment libraries. Will subset to the count tables specified if df is experiment. If experiment has 4 rows and you subset it to only 2, then only those 2 count tables will be outputted.

region a character vector (default: "mrna"), make raw count matrices of whole mrnas or one of (leaders, cds, trailers).

type character, default: "count" (raw counts matrix). Which object type and normalization do you want? "summarized" (SummarizedExperiment object), "deseq" (DESeq2 experiment, design will be all valid non-unique columns except replicates, change by using DESeq2::design). Normalization alternatives are: "fpkm", "log2fpkm" or "log10fpkm".

collapse a logical/character (default FALSE), if TRUE all samples within the group SAMPLE will be collapsed to one. If "all", all groups will be merged into 1 column called merged_all. Collapse is defined as rowSum(elements_per_group) / ncol(elements_per_group)

count.folder character, default "default" (Use count tables from original bam files stored in "QC_STATS", these are like HTseq count tables). To load your custom count tables from pshifted reads, set to "pshifted" (remember to create the pshifted tables first!). If you have custom ranges, like reads over uORFs stored in a folder called "/uORFs" relative to the bam files, set to "uORFs". Always create these custom count tables with makeSummarizedExperimentFromBam. Always make the location of the folder directly inside the bam file directory!
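
The collapse definition, rowSum(elements_per_group) / ncol(elements_per_group), is a per-group row mean. A base R sketch on a toy matrix follows; the sample names and the `group` vector are made up for illustration, and applying the same formula to the "all" case is an assumption.

```r
# Toy count matrix: two groups (WT, Mutant) with two replicates each.
counts <- matrix(c(10, 14, 3, 5,
                    0,  2, 8, 6),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("tx1", "tx2"),
                                 c("WT_r1", "WT_r2", "Mutant_r1", "Mutant_r2")))
group <- c("WT", "WT", "Mutant", "Mutant")
# Per group: rowSums(elements_per_group) / ncol(elements_per_group).
collapsed <- sapply(unique(group), function(g) {
  m <- counts[, group == g, drop = FALSE]
  rowSums(m) / ncol(m)
})
collapsed
# Analogue of collapse = "all": one column over every library.
merged_all <- rowSums(counts) / ncol(counts)
merged_all
```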

Details

If df is path to folder: Loads the file in that directory with the regex region.rds, where region is what is defined by argument. If multiple exist, see if any start with "countTable_", if so, subset. If loaded as SummarizedExperiment or deseq, the colData will be made from ORFik.experiment information.

Value

a data.table/SummarizedExperiment/DESeq object of columns as counts / normalized counts per

library, column name is name of library. Rownames must be unique for now. Might change.

See Also

Other countTable: countTable_regions()

Examples

# Make experiment

df <- ORFik.template.experiment()

# Make QC report to get counts ++ (not needed for this template)

# ORFikQC(df)

# Get count Table of mrnas

# countTable(df, "mrna")

# Get count Table of cds

# countTable(df, "cds")

# Get count Table of mrnas as fpkm values

# countTable(df, "mrna", type = "fpkm")

# Get count Table of mrnas with collapsed replicates

# countTable(df, "mrna", collapse = TRUE)

# Get count Table of mrnas as summarizedExperiment

# countTable(df, "mrna", type = "summarized")

# Get count Table of mrnas as DESeq2 object,

# for differential expression analysis

# countTable(df, "mrna", type = "deseq")

countTable_regions Make a list of count matrices from experiment

Description

By default will make count tables over mRNA, leaders, cds and trailers for all libraries in experiment.

Usage

countTable_regions(

df,

out.dir = libFolder(df),

longestPerGene = FALSE,

geneOrTxNames = "tx",

regions = c("mrna", "leaders", "cds", "trailers"),

type = "count",

lib.type = "ofst",

weight = "score",

rel.dir = "QC_STATS",

forceRemake = FALSE,

BPPARAM = bpparam()

)

Arguments

df an ORFik experiment

out.dir optional output directory, default: libFolder(df). Will make a folder within this called "QC_STATS" with all results in this directory. Warning: If you assign a non-default path, you will have a hassle loading files later. It is much easier to load count tables, statistics, ++ later with the default. Update resFolder of df instead if needed.

longestPerGene a logical (default FALSE), if FALSE use all transcript isoforms per gene. Ignored if "region" is not a character of either: "mRNA", "tx", "cds", "leaders" or "trailers".

geneOrTxNames a character vector (default "tx"), should row names keep transcript names ("tx") or change to gene names ("gene")

regions a character vector, default: c("mrna", "leaders", "cds", "trailers"), make raw count matrices of whole regions specified. Can also be a custom GRangesList of for example uORFs or a subset of cds etc.

type default: "count" (raw counts matrix), alternative is "fpkm", "log2fpkm" or "log10fpkm"

lib.type a character (default: "ofst"), load files in experiment or some precomputed variant, either "ofst", "bedo", "bedoc" or "pshifted". These are made with ORFik:::convertLibs() or shiftFootprintsByExperiment(). Can also be custom user made folders inside the experiments bam folder.

weight numeric or character, a column to score overlaps by. Default "score", will check for a metacolumn called "score" in libraries. If not found, will not use weights.

rel.dir relative output directory for out.dir, default: "QC_STATS". For pshifted, write "pshifted".

forceRemake logical, default FALSE. If TRUE, will not look for existing count table files.

BPPARAM how many cores/threads to use? default: bpparam()

Value

a list of data.table, 1 data.table per region. The regions will be the names of the list elements.

See Also

Other countTable: countTable()

Examples

##Make experiment

df <- ORFik.template.experiment()

## Create count tables for all default regions

# countTable_regions(df)

## Pshifted reads (first create pshifted libs)

# countTable_regions(df, lib.type = "pshifted", rel.dir = "pshifted")

coverageByTranscriptC coverageByTranscript with coverage input

Description

Extends the function with direct genome coverage input, see coverageByTranscript for original

function.

Usage

coverageByTranscriptC(x, transcripts, ignore.strand = !strandMode(x))

Arguments

x a covRle (one RleList for each strand in object), must have defined and correct seqlengths in its Seqinfo object.

transcripts GRangesList

ignore.strand a logical (default: !strandMode(x))

Value

Integer Rle of coverage, 1 per transcript

coverageByTranscriptW coverageByTranscript with weights

Description

Extends the function with weights, see coverageByTranscript for original function.

Usage

coverageByTranscriptW(

x,

transcripts,

ignore.strand = FALSE,

weight = 1L,

seqinfo.x.is.correct = FALSE

)

Arguments

x reads (GRanges, GAlignments)

transcripts GRangesList

ignore.strand a logical (default: FALSE)

weight a vector (default: 1L), if single number applies for all, else it must be the string name of a defined meta column in "x", that gives number of times a read was found. GRanges("chr1", 1, "+", score = 5) would mean the score column tells that this alignment was found 5 times.

seqinfo.x.is.correct

logical, default FALSE. If you know x has correct seqinfo, then you can save some computation time by setting this to TRUE.

Value

Integer Rle of coverage, 1 per transcript
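
The effect of a weight column on coverage can be sketched in base R on a single toy transcript. Coordinates here are already in transcript space and the column names are hypothetical; the real function works on GRanges / GAlignments and returns Rle objects, so this only shows the arithmetic.

```r
# Toy transcript of length 10 and three collapsed reads with a score column.
tx_length <- 10
reads <- data.frame(start = c(1, 3, 8), end = c(3, 5, 8), score = c(5, 2, 1))
coverage <- numeric(tx_length)
for (i in seq_len(nrow(reads))) {
  idx <- reads$start[i]:reads$end[i]
  # Each read adds its score (times it was found) instead of 1.
  coverage[idx] <- coverage[idx] + reads$score[i]
}
coverage
```

With weight = 1L every read would add 1 per covered position; with a score column a collapsed read counts as many times as it was originally observed.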

coverageGroupings Get grouping for a coverage table in ORFik

Description

Either of two groupings:

GF: Gene, fraction

FGF: Fraction, position, feature

It finds which of these exists, and auto groups

Usage

coverageGroupings(logicals, grouping = "GF")

Arguments

logicals size 2 logical vector, the is.null checks for each column.

grouping which grouping to perform, default "GF" Gene & Fraction grouping. Alternative

"FGF", Fraction & position & feature.

Details

Normally not used directly!

Value

a quote of the grouping to pass to data.table

coverageHeatMap Create a heatmap of coverage

Description

Creates a ggplot representing a heatmap of coverage:

• Rows : Read length (fraction)

• Columns : Position in region

• Index intensity : (color) coverage scoring per index.

The rows of the heat map are given by the fraction column; fractions are usually unique read lengths (standard Illumina gives 76 unique widths, with some minimum cutoff like 15). The color of each cell is given by the score column, by default the zscore of counts. Positions are relative to the point you are plotting around, like +/- relative to TIS or TSS.
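
A z-score of counts can be sketched in base R as follows. The grouping used here (one z-score distribution per read length, across positions) is an assumption made for illustration; see ?coverageScorings for how ORFik actually defines each scoring. The column names mimic a coverage table but the data are made up.

```r
# Toy coverage table: counts per position for two read lengths (fractions).
coverage <- data.frame(
  fraction = rep(c(28, 29), each = 5),
  position = rep(-2:2, times = 2),
  count    = c(1, 2, 9, 2, 1,   0, 4, 4, 4, 8)
)
# z-score within each read length: (count - mean) / sd.
coverage$score <- ave(coverage$count, coverage$fraction,
                      FUN = function(x) (x - mean(x)) / sd(x))
coverage
```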

Usage

coverageHeatMap(

coverage,

output = NULL,

scoring = "zscore",

legendPos = "right",

addFracPlot = FALSE,

xlab = "Position relative to start site",

ylab = "Protected fragment length",

colors = "default",

title = NULL,

increments.y = "auto",

gradient.max = max(coverage$score)

)

Arguments

coverage a data.table, e.g. output of scaledWindowCoverage

output character string (NULL), if set, saves the plot as pdf or png to the path given. If no

format is given, it is saved as pdf.

scoring character vector, default "zscore". Which scoring did you use to create the coverage? One

of zscore, transcriptNormalized, sum, mean, median, ... See ?coverageScorings

for info and more alternatives.

legendPos a character, default "right". Where should the fill legend be? ("top", "bottom",

"right", "left")

addFracPlot Add margin histogram plot on top of heatmap with fractions per positions

xlab the x-axis label, default "Position relative to start site"

ylab the y-axis label, default "Protected fragment length"

colors character vector, default: "default", this gives you: c("white", "yellow2", "yellow3",

"lightblue", "blue", "navy"). Use "high" for higher contrast, or specify

your own colors.

title a character, default NULL (no title), what is the top title of plot?

increments.y increments of y axis, default "auto". Or a numeric value < max position & >

min position.

gradient.max numeric, default: max(coverage$score). What data value should the top color be?

Good to use if you want to compare 2 samples with the same color intensity;

in that case set this value to the max score of the 2 coverage tables.
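When two samples should share one color scale, gradient.max can be set to the larger of the two maxima. A minimal sketch, assuming two hypothetical scored coverage tables cov_a and cov_b (plain data.frames standing in for real output); the coverageHeatMap calls are left commented out:

```r
# Hypothetical scored coverage tables for two samples (illustration only)
cov_a <- data.frame(position = 1:3, fraction = 28L, score = c(0.5, 2.1, 1.0))
cov_b <- data.frame(position = 1:3, fraction = 28L, score = c(0.2, 3.4, 0.7))

# One shared top value, so equal scores get equal color intensity in both plots
shared_max <- max(cov_a$score, cov_b$score)

# coverageHeatMap(cov_a, gradient.max = shared_max)
# coverageHeatMap(cov_b, gradient.max = shared_max)
```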

Details

Colors: Remember if you want to change anything like colors, just return the ggplot object, and

reassign like: obj + scale_color_brewer() etc. Standard colors are:

• 0 reads in whole readlength :gray

• few reads in position :white

• medium reads in position :yellow

• many reads in position :dark blue

Value

a ggplot object of the coverage plot. NULL if output is set; then the plot will only be saved to

the given location.

See Also

Other heatmaps: heatMapL(), heatMapRegion(), heatMap_single()

Other coveragePlot: pSitePlot(), savePlot(), windowCoveragePlot()

Examples

# An ORF

grl <- GRangesList(tx1 = GRanges("1", IRanges(1, 6), "+"))

# Ribo-seq reads

range <- IRanges(c(rep(1, 3), 2, 3, rep(4, 2), 5, 6), width = 1 )

reads <- GRanges("1", range, "+")

reads$size <- c(rep(28, 5), rep(29, 4)) # read size

coverage <- windowPerReadLength(grl, reads = reads, upstream = 0,

downstream = 5)

coverageHeatMap(coverage)

# With top sum bar

coverageHeatMap(coverage, addFracPlot = TRUE)

# See vignette for more examples

coveragePerTiling Get coverage per group

Description

It tiles each GRangesList group to width 1, and finds hits per position.

A range from 1:5 will split into c(1,2,3,4,5) and count hits on each. This is a safer and faster alternative to

coverageByTranscript from GenomicFeatures. It can also return a data.table,

for faster computations.

Usage

coveragePerTiling(

grl,

reads,

is.sorted = FALSE,

keep.names = TRUE,

as.data.table = FALSE,

withFrames = FALSE,

weight = "score",

drop.zero.dt = FALSE,

fraction = NULL

)

Arguments

grl a GRangesList of 5’ utrs, CDS, transcripts, etc.

reads a GAlignments, GRanges, or precomputed coverage as covRle (one for each

strand) of RiboSeq, RnaSeq etc.

Weights for scoring are by default the 'score' column in 'reads'. Can also be random

access paths to bigWig or fstwig files. Do not use random access for more than a

few genes; beyond that, loading the entire files is usually better. File streaming is still in

beta, so use with care!

is.sorted logical (FALSE), is grl sorted. That is + strand groups in increasing ranges

(1,2,3), and - strand groups in decreasing ranges (3,2,1)

keep.names logical (TRUE), keep names or not. If as.data.table is TRUE, names (genes

column) will be a factor column, if FALSE it will be an integer column (index

of gene), so ﬁrst input grl element is 1. Dropping names gives ~ 20 % speedup.

If drop.zero.dt is FALSE, data.table will not return names, will use index (to

avoid memory explosion).

as.data.table a logical (FALSE), return as data.table with 2 columns, position and count.

withFrames a logical (FALSE), only available if as.data.table is TRUE, return the ORF

frame, 1,2,3, where position 1 is 1, 2 is 2 and 4 is 1 etc.

weight (default: 'score'), if defined, a character name of a valid meta column in subject.

GRanges("chr1", 1, "+", score = 5) would mean the score column tells that this

alignment region was found 5 times. ORFik ofst, bedoc and .bedo files contain

a score column like this, as do CAGEr CAGE files and many other package

formats. You can also assign a score column manually.

drop.zero.dt logical FALSE, if TRUE and as.data.table is TRUE, remove all 0 count positions.

This greatly speeds up computation and, most importantly, greatly reduces memory

usage. Will not change any plots, unless 0 positions are used in some sense

(mean, median, zscore coverage will only scale differently).

fraction integer or character, a description column. Useful for grouping multiple outputs

together. If returned as Rle, this is added as: metadata(coverage) <- list(fraction

= fraction). If as.data.table it will be added as an additional column.

Details

NOTE: If reads contains a $score column, it is presumed to be the number of replicates per

read, i.e. the weights for the coverage() function. So delete the score column or set weight to something

else if this is not wanted.
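The effect of a score column can be sketched in base R. This is only an illustration of weighted versus unweighted counting with hypothetical 1-width reads (pos and score are made-up vectors), not the package's internal code:

```r
# Hypothetical 1-width reads: genomic position and replicate count (score)
pos   <- c(10L, 25L, 25L)
score <- c(2L, 5L, 1L)

# Unweighted: each read counts once
unweighted <- table(pos)

# Weighted: each read counts 'score' times
weighted <- tapply(score, pos, sum)

unweighted["25"]
weighted["25"]
```

If the score column does not mean replicate counts in your data, the unweighted result is the one you want.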

Value

a numeric RleList, one numeric-Rle per group with # of hits per position. Or data.table if as.data.table

is TRUE, with column names c("count" [numeric or integer], "genes" [integer], "position" [integer])
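The tiling and frame logic described above can be sketched with base R vectors. This is a simplified illustration for a single + strand transcript and 1-width reads; the variable names are hypothetical, and the frame numbering 1, 2, 3 follows the withFrames description rather than verified package output:

```r
# Exons of one + strand transcript
starts <- c(1L, 10L, 20L)
ends   <- c(5L, 15L, 25L)

# Tile to width 1: every genomic position covered by the transcript
genomic <- unlist(Map(seq, starts, ends))

# Transcript-relative position and frame (1, 2, 3, 1, ...)
position <- seq_along(genomic)
frame    <- (position - 1L) %% 3L + 1L

# Count 1-width reads per tiled position; here a single read at genomic 25
read_pos <- 25L
count <- tabulate(match(read_pos, genomic), nbins = length(genomic))

# Show only positions with hits
data.frame(count, position, frame)[count > 0, ]
```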

See Also

Other ExtendGenomicRanges: asTX(), extendLeaders(), extendTrailers(), reduceKeepAttr(),

tile1(), txSeqsFromFa(), windowPerGroup()

Examples

ORF <- GRanges(seqnames = "1",

ranges = IRanges(start = c(1, 10, 20),

end = c(5, 15, 25)),

strand = "+")

grl <- GRangesList(tx1_1 = ORF)

RFP <- GRanges("1", IRanges(25, 25), "+")

coveragePerTiling(grl, RFP, is.sorted = TRUE)

# now as data.table with frames

coveragePerTiling(grl, RFP, is.sorted = TRUE, as.data.table = TRUE,

withFrames = TRUE)

# With score column (usually replicated reads on that position)

RFP <- GRanges("1", IRanges(25, 25), "+", score = 5)

dt <- coveragePerTiling(grl, RFP, is.sorted = TRUE,

as.data.table = TRUE, withFrames = TRUE)

class(dt$count) # numeric

# With integer score column (faster and less space usage)

RFP <- GRanges("1", IRanges(25, 25), "+", score = 5L)

dt <- coveragePerTiling(grl, RFP, is.sorted = TRUE,

as.data.table = TRUE, withFrames = TRUE)

class(dt$count) # integer

coverageScorings Add a coverage scoring scheme

Description

Different scorings and groupings of a coverage representation.

Usage

coverageScorings(coverage, scoring = "zscore", copy.dt = TRUE)

Arguments

coverage a data.table containing at least columns (count, position), it is possible to have

additionals: (genes, fraction, feature)

scoring a character, one of (zscore, transcriptNormalized, mean, median, sum, log2sum,

log10sum, sumLength, meanPos and frameSum, periodic, NULL). More info in

details

copy.dt logical TRUE, copy object, to avoid overwriting original object. Set to false to

run function using reference to object, a speed up if original object is not needed.

Details

Usually output of metaWindow or scaledWindowPositions is input in this function.

Content of coverage data.table: It must contain the count and position columns.

genes column: If you have multiple windows, the genes column must deﬁne which gene/transcript

grouping the different counts belong to. If there is only a meta window or only 1 gene/transcript,

then this column is not needed.

fraction column: If you have coverage of e.g. RNA-seq and Ribo-seq, or TCP-seq of large and small

subunit, divide into fractions. Like factor(RNA, RFP)

feature column: If the gene group is subdivided into parts, e.g. the gene is a transcript, the feature column

can be c(leader, cds, trailer) etc.

Given a data.table coverage of counts, add a scoring scheme. In the scorings below, "per" means the grouping given; if genes is

defined, the default scorings group per gene.

Scorings:

• zscore ((count - windowMean) / windowSD per)

• transcriptNormalized (sum(count / sum of counts per))

• mean (mean(count per))

• median (median(count per))

• sum (count per)

• log2sum (count per)

• log10sum (count per)

• sumLength (count per) / number of windows

• meanPos (mean per position per gene) used in scaledWindowPositions

• sumPos (sum per position per gene) used in scaledWindowPositions

• frameSum (sum per frame per gene) used in ORFScore

• frameSumPerL (sum per frame per read length)

• frameSumPerLG (sum per frame per read length per gene)

• fracPos (fraction of counts per position per gene)

• periodic (Fourier transform periodicity of meta coverage per fraction)

• NULL (no grouping, return input directly)
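Two of the scorings listed above can be sketched per gene in base R. This is a minimal illustration, assuming the sample standard deviation (sd) is what windowSD refers to, and showing only the per-gene fraction term of the normalization; it does not reproduce the package's data.table code:

```r
# Same counts as in the Examples, split over two genes
cov <- data.frame(count = c(4, 1, 1, 4, 2, 3),
                  position = rep(1:3, 2),
                  genes = rep(c("tx1", "tx2"), each = 3))

# zscore per gene: (count - windowMean) / windowSD
cov$zscore <- ave(cov$count, cov$genes,
                  FUN = function(x) (x - mean(x)) / sd(x))

# Fraction of the gene's counts at each position (count / sum of counts per gene)
cov$frac <- ave(cov$count, cov$genes, FUN = function(x) x / sum(x))

cov
```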

Value

a data.table with new scores (size dependent on score used)

See Also

Other coverage: metaWindow(), regionPerReadLength(), scaledWindowPositions(), windowPerReadLength()

Examples

dt <- data.table::data.table(count = c(4, 1, 1, 4, 2, 3),

position = c(1, 2, 3, 4, 5, 6))

coverageScorings(dt, scoring = "zscore")

# with grouping gene

dt$genes <- c(rep("tx1", 3), rep("tx2", 3))

coverageScorings(dt, scoring = "zscore")

coverage_to_dt Convert coverage RleList to data.table

Description

Convert coverage RleList to data.table

Usage

coverage_to_dt(

coverage,

keep.names = TRUE,

withFrames = FALSE,

weight = "score",

drop.zero.dt = FALSE,

fraction = NULL

)

Arguments

coverage RleList with names

keep.names logical (TRUE), keep names or not. If as.data.table is TRUE, names (genes

column) will be a factor column, if FALSE it will be an integer column (index

of gene), so ﬁrst input grl element is 1. Dropping names gives ~ 20 % speedup.

If drop.zero.dt is FALSE, data.table will not return names, will use index (to

avoid memory explosion).

withFrames a logical (FALSE), only available if as.data.table is TRUE, return the ORF

frame, 1,2,3, where position 1 is 1, 2 is 2 and 4 is 1 etc.

weight (default: 'score'), if defined, a character name of a valid meta column in subject.

GRanges("chr1", 1, "+", score = 5) would mean the score column tells that this

alignment region was found 5 times. ORFik ofst, bedoc and .bedo files contain

a score column like this, as do CAGEr CAGE files and many other package

formats. You can also assign a score column manually.

drop.zero.dt logical FALSE, if TRUE and as.data.table is TRUE, remove all 0 count positions.

This greatly speeds up computation and, most importantly, greatly reduces memory

usage. Will not change any plots, unless 0 positions are used in some sense

(mean, median, zscore coverage will only scale differently).

fraction integer or character, a description column. Useful for grouping multiple outputs

together. If returned as Rle, this is added as: metadata(coverage) <- list(fraction

= fraction). If as.data.table it will be added as an additional column.

Value

a data.table with column names c("count" [numeric or integer], "genes" [integer], "position" [integer])

covRle Coverage RleList for both strands

Description

Coverage RleList for both strands

Usage

covRle(forward = RleList(), reverse = RleList())

Arguments

forward a RleList with deﬁned seqinfo for forward strand counts

reverse a RleList with deﬁned seqinfo for reverse strand counts

Value

a covRle object

See Also

Other covRLE: covRle-class, covRleFromGR(), covRleList, covRleList-class

Examples

covRle()

covRle(RleList(), RleList())

chr_rle <- RleList(chr1 = Rle(c(1,2,3), c(1,2,3)))

covRle(chr_rle, chr_rle)
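The run-length idea behind Rle(c(1,2,3), c(1,2,3)) can be illustrated with base R's rle class, which is analogous to, but not the same class as, the Rle used here. A minimal sketch:

```r
# Run-length encoding: values 1, 2, 3 repeated 1, 2 and 3 times
runs <- structure(list(lengths = c(1L, 2L, 3L), values = c(1, 2, 3)),
                  class = "rle")

# Expand to the per-position coverage vector
per_position <- inverse.rle(runs)
per_position

# Compress again; the run lengths are recovered
rle(per_position)$lengths
```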

covRle-class Coverage Rle for both strands or single

Description

Given a run of coverage(x) where x are reads, this class combines the 2 strands into 1 object

Value

a covRle object

See Also

Other covRLE: covRle, covRleFromGR(), covRleList, covRleList-class

covRleFromGR Convert GRanges to covRle

Description

Convert GRanges to covRle

Usage

covRleFromGR(x, weight = "AUTO", ignore.strand = FALSE)

Arguments

x a GRanges, GAlignments or GAlignmentPairs object. Note that coverage calculation

for GAlignments is slower, so it is usually best to call convertToOneBasedRanges

on the GAlignments object first to speed it up.

weight default "AUTO", pick the 'score' column if it exists, else all are 1L. Can also be a

manually assigned meta column like 'score2' etc.

ignore.strand logical, default FALSE.

Value

covRle object

See Also

Other covRLE: covRle, covRle-class, covRleList, covRleList-class

Examples

seqlengths <- as.integer(c(200, 300))

names(seqlengths) <- c("chr1", "chr2")

gr <- GRanges(seqnames = c("chr1", "chr1", "chr2", "chr2"),

ranges = IRanges(start = c(10, 50, 100, 150), end = c(40, 80, 129, 179)),

strand = c("+", "+", "-", "-"), seqlengths = seqlengths)

cov_both_strands <- covRleFromGR(gr)

cov_both_strands

cov_ignore_strand <- covRleFromGR(gr, ignore.strand = TRUE)

cov_ignore_strand

strandMode(cov_both_strands)

strandMode(cov_ignore_strand)

covRleList Coverage RleList for both strands

Description

Coverage RleList for both strands

Usage

covRleList(list, fraction = names(list))

Arguments

list a list or List of covRle objects of equal length and lengths

fraction character, default names(list). Names to elements of list, can be integers, as

readlengths etc.

Value

a covRleList object

See Also

Other covRLE: covRle, covRle-class, covRleFromGR(), covRleList-class

Examples

covRleList(List(covRle()))

covRleList-class List of covRle

Description

Given a run of coverage(x) where x are reads, a covRle combines the 2 strands into 1 object. This

list can in turn combine several of these into 1 object, with accessor functions and generalizations.

Value

a covRleList object

See Also

Other covRLE: covRle, covRle-class, covRleFromGR(), covRleList

create.experiment Create an ORFik experiment

Description

Create a single R object that stores and controls all results relevant to a specific next generation

sequencing experiment. See the experiment class documentation if you are not sure what an

ORFik experiment is.

It is built from the files in a folder or folders. It makes an experiment table with information per

sample; this object allows you to use the extensive API in ORFik that works on experiments.

Information Auto-detection:

There will be several columns you can fill in when creating the object. If the files have logical names

like (RNA-seq_WT_rep1.bam), it will try to auto-detect the most likely values for the columns, such as

whether it is RNA-seq or Ribo-seq, wild type or mutant, replicate 1 or 2, etc.

You will have to fill in the details that were not auto-detected. The easiest way to fill in the blanks is in

a csv editor like LibreOffice or Excel. You can also remake the experiment and specify the specific

column manually. Remember that each row (sample) must have a unique combination of values.

An extra column called "reverse" is made if there are paired data, like +/- strand wig files.
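The uniqueness requirement on rows can be checked with base R before reading the experiment. A minimal sketch on a hypothetical sample table (the column names mirror the arguments below, but this data.frame is not the actual template layout):

```r
# Hypothetical sample table (not the actual template layout)
samples <- data.frame(libtype   = c("RFP", "RFP", "RNA"),
                      condition = c("WT", "WT", "WT"),
                      rep       = c(1L, 1L, 1L),
                      fraction  = c("", "", ""))

# Index of the first row that repeats an earlier row, 0 if all are unique
anyDuplicated(samples)

# Resolve the clash by filling in a distinguishing value
samples$fraction[2] <- "cyto"
anyDuplicated(samples)
```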

Usage

create.experiment(

dir,

exper,

saveDir = ORFik::config()["exp"],

txdb = "",

fa = "",

organism = "",

assembly = "",

pairedEndBam = FALSE,

viewTemplate = FALSE,

types = c("bam", "bed", "wig", "ofst"),

libtype = "auto",

stage = "auto",

rep = "auto",

condition = "auto",

fraction = "auto",

author = "",

files = findLibrariesInFolder(dir, types, pairedEndBam),

result_folder = NULL,

runIDs = extract_run_id(files)

)

Arguments

dir Which directory / directories to create experiment from, must be a directory

with NGS data from your experiment. Will include all files of the file types specified

by the "types" argument, so do not mix files from other experiments in the same

folder!

exper Short name of experiment. Will be name used to load experiment, and name

shown when running list.experiments

saveDir Directory to save experiment csv file, default: ORFik::config()["exp"], which

has default: "~/Bio_data/ORFik_experiments/". Set to NULL if you don't want

to save it to disc.

txdb A path to a TxDb (preferred) or gff/gtf (not advised, slower) file with transcriptome

annotation for the organism.

fa A path to fasta genome/sequences used for libraries, remember the file must

have a fasta index too.

organism character, default: "" (no organism set), scientific name of organism. Homo

sapiens, Danio rerio, Rattus norvegicus etc. If you have a SRA metadata csv

file, you can set this argument to study$ScientificName[1], where study is the

SRA metadata for all files that were aligned.

assembly character, default: "" (no assembly set). The genome assembly name, like

GRCh38 etc. Useful to add if you want detailed metadata of experiment analysis.

pairedEndBam logical FALSE, else TRUE, or a logical vector of TRUE/FALSE per library that

will be included (run first without it and check what order the files will come in).

1 paired end file, then two single end files, will be c(T, F, F). If you have a SRA metadata

csv file, you can set this argument to study$LibraryLayout == "PAIRED", where

study is the SRA metadata for all files that were aligned.

viewTemplate run View() on template when finished, default (FALSE). Usually gives you a

better view of result than using print().

types Default c("bam", "bed", "wig", "ofst"), which types of libraries to allow as

NGS data.

libtype character, default "auto". Library types, must be length 1 or equal length of

number of libraries. "auto" means ORFik will try to guess from file names.

Example: RFP (Ribo-seq), RNA (RNA-seq), CAGE, SSU (TCP-seq 40S), LSU

(TCP-seq 80S).

stage character, default "auto". Developmental stage, tissue or cell line, must be length

1 or equal length of number of libraries. "auto" means ORFik will try to guess

from file names. Example: HEK293 (Cell line), Sphere (zebrafish stage), ovary

(Tissue).

rep character, default "auto". Replicate numbering, must be length 1 or equal length

of number of libraries. "auto" means ORFik will try to guess from file names.

Example: 1 (rep 1), 2 (rep 2). Insert only numbers here!

condition character, default "auto". Library conditions, must be length 1 or equal length

of number of libraries. "auto" means ORFik will try to guess from file names.

Example: WT (wild type), mutant, etc.

fraction character, default "auto". Fractionation of library, must be length 1 or equal

length of number of libraries. "auto" means ORFik will try to guess from file

names. This column is used to make the experiment unique, if the other columns

are not sufficient. Example: cyto (cytosolic fraction), dmso (dmso treated fraction),

etc.

author character, default "". Main author of experiment, usually last name is enough.

When printing will state "author et al" in info.

files character vector or data.table of library paths in dir. Default: findLibrariesInFolder(dir,

types, pairedEndBam). Do not touch unless you want to do some subsetting,

it will automatically remove files that are not of a file format defined by the 'types'

argument. Note that files are sorted as text, so 10 comes before 2: 1, 2, 10 is

sorted as 1, 10, 2. If you want to fix this, you could update this argument with:

ORFik:::findLibrariesInFolder()[c(1,3,2)] to get the order back to 1, 2, 10 etc.

result_folder character, default NULL. The folder to output analysis results like QC, count

tables etc. By default the libFolder(df) folder is used, the folder of the first library

in the experiment. If you are making a new experiment which is a collection of other

experiments, set this to a new folder, to not contaminate your other experiment

directories.

runIDs character ids, usually SRR, ERR, or DRR identifiers, default is to search for any

of these 3 in the filename by: extract_run_id(files). They are optional.
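The text-sorting issue noted for the files argument (10 before 2) can be sketched in base R. The file names and the "rep" naming pattern below are hypothetical; the point is only to reorder by the numeric part:

```r
# Hypothetical library file names
files <- c("RFP_rep1.bam", "RFP_rep2.bam", "RFP_rep10.bam")

# Character sorting puts rep10 before rep2
sorted_chr <- sort(files, method = "radix")
sorted_chr

# Order by the replicate number instead
rep_num <- as.integer(sub(".*rep([0-9]+).*", "\\1", sorted_chr))
sorted_num <- sorted_chr[order(rep_num)]
sorted_num
```

A vector reordered like this could then be passed as the files argument.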

Value

a data.frame. NOTE: this is not an ORFik experiment, only a template for it!

See Also

Other ORFik_experiment: ORFik.template.experiment(), ORFik.template.experiment.zf(),

bamVarName(), experiment-class, filepath(), libraryTypes(), organism,experiment-method,

outputLibs(), read.experiment(), save.experiment(), validateExperiments()

Examples

# 1. Pick directory

dir <- system.file("extdata/Homo_sapiens_sample", "", package = "ORFik")

# 2. Pick an experiment name

exper <- "ORFik"

# 3. Pick .gff/.gtf location

txdb <- system.file("extdata/Homo_sapiens_sample", "Homo_sapiens_dummy.gtf.db", package = "ORFik")

# 4. Pick fasta genome of organism

fa <- system.file("extdata/Homo_sapiens_sample", "Homo_sapiens_dummy.fasta", package = "ORFik")

# 5. Set organism (optional)

org <- "Homo sapiens"

# Create template, not saved on disc yet:

template <- create.experiment(dir = dir, exper, txdb = txdb,

saveDir = NULL,

fa = fa, organism = org,

viewTemplate = FALSE)

## Now fix non-unique rows: either in LibreOffice, Microsoft Excel, or in R

template$X5[6] <- "heart"

# read experiment (if you set correctly)

df <- read.experiment(template)

# Save with: save.experiment(df, file = "path/to/save/experiment.csv")

## Create and save experiment directly:

## Default location: "~/Bio_data/ORFik_experiments/"

#template <- create.experiment(dir = dir, exper, txdb = txdb,

# fa = fa, organism = org,

# viewTemplate = FALSE)

## Custom location (If you work in a team, use a shared folder)

#template <- create.experiment(dir = dir, exper, txdb = txdb,

# saveDir = "~/MY/CUSTOM/LOCATION",

# fa = fa, organism = org,

# viewTemplate = FALSE)

defineIsoform Overlaps GRanges object with provided annotations.

Description

Overlaps GRanges object with provided annotations.

Usage

defineIsoform(

rel_orf,

tran,

isoform_names = c("perfect_match", "elong_START_match", "trunc_START_match",

"elong_STOP_match", "trunc_STOP_match", "overlap_inside", "overlap_both",

"overlap_upstream", "overlap_downstream", "upstream", "downstram", "none")

)

Arguments

rel_orf - GRanges object of your ORF.

tran - GRanges object of annotation (transcript or cds) that overlapped in some way

rel_orf.

isoform_names - A vector of strings that will be used instead of these defaults:

• 'perfect_match' - start and stop match the tran object strand-wise

• 'elong_START_match' - rel_orf is an extension from the STOP side of the tran

• 'trunc_START_match' - rel_orf is a truncation from the STOP side of the tran

• 'elong_STOP_match' - rel_orf is an extension from the START side of the tran

• 'trunc_STOP_match' - rel_orf is a truncation from the START side of the tran

• 'overlap_inside' - rel_orf is inside the tran object

• 'overlap_both' - rel_orf contains the tran object inside

• 'overlap_upstream' - rel_orf is overlapping the upstream part of the tran

• 'overlap_downstream' - rel_orf is overlapping the downstream part of the tran

• 'upstream' - rel_orf is upstream of the tran

• 'downstream' - rel_orf is downstream of the tran

• 'none' - when none of the above options is true

Value

A string with the isoform defined relative to the transcript.
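The category definitions above can be sketched as a small base R function. This is not the package's implementation: classify_isoform is a hypothetical helper that handles only a single + strand range per input, given as plain start/end integers, and it never returns 'none':

```r
# Hypothetical helper: classify one + strand ORF range against one annotation range
classify_isoform <- function(orf_start, orf_end, tx_start, tx_end) {
  if (orf_end < tx_start) return("upstream")
  if (orf_start > tx_end) return("downstream")
  same_start <- orf_start == tx_start
  same_end   <- orf_end == tx_end
  if (same_start && same_end) return("perfect_match")
  # START matches: the difference is on the STOP side
  if (same_start) {
    return(if (orf_end > tx_end) "elong_START_match" else "trunc_START_match")
  }
  # STOP matches: the difference is on the START side
  if (same_end) {
    return(if (orf_start < tx_start) "elong_STOP_match" else "trunc_STOP_match")
  }
  if (orf_start > tx_start && orf_end < tx_end) return("overlap_inside")
  if (orf_start < tx_start && orf_end > tx_end) return("overlap_both")
  if (orf_start < tx_start) return("overlap_upstream")
  "overlap_downstream"
}

classify_isoform(10, 30, 10, 20)
classify_isoform(12, 18, 10, 20)
```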

defineTrailer Defines trailers for ORF.

Description

Creates a GRanges object as a trailer for ORFranges representing the ORF, maintaining restrictions of

transcriptRanges. Assumes that ORFranges is on the transcriptRanges, and that strands and seqlevels are in

agreement. When lengthOftrailer is larger than the space left on the transcript, all available space

is returned as trailer.

Usage

defineTrailer(ORFranges, transcriptRanges, lengthOftrailer = 200)

Arguments

ORFranges GRanges object of your Open Reading Frame.

transcriptRanges

GRanges object of transcript.

lengthOftrailer

Numeric. Default is 200.

Details

It assumes that ORFranges and transcriptRanges are not sorted when on minus strand. Should be

like: (200, 600) (50, 100)

Value

A GRanges object of trailer.
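The idea of a length-capped trailer can be sketched on plain integer positions. This is a simplified base R illustration for a + strand transcript, using the coordinates from the Examples; trailer_positions is a hypothetical helper and returns positions rather than a GRanges object:

```r
# Transcript exons and ORF end, as in the Examples (+ strand)
tx_pos  <- unlist(Map(seq, c(1L, 10L, 20L, 30L, 40L), c(5L, 15L, 25L, 35L, 45L)))
orf_end <- 25L

# Transcript positions downstream of the ORF
downstream <- tx_pos[tx_pos > orf_end]

# Take at most n positions; head() returns everything when n exceeds what is left
trailer_positions <- function(n) head(downstream, n)

length(trailer_positions(200))
trailer_positions(8)
```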

See Also

Other ORFHelpers: longestORFs(), mapToGRanges(), orfID(), startCodons(), startSites(),

stopCodons(), stopSites(), txNames(), uniqueGroups(), uniqueOrder()

Examples

ORFranges <- GRanges(seqnames = Rle(rep("1", 3)),

ranges = IRanges(start = c(1, 10, 20),

end = c(5, 15, 25)),

strand = "+")

transcriptRanges <- GRanges(seqnames = Rle(rep("1", 5)),

ranges = IRanges(start = c(1, 10, 20, 30, 40),

end = c(5, 15, 25, 35, 45)),

strand = "+")

defineTrailer(ORFranges, transcriptRanges)

DEG.analysis Run differential TE analysis

Description

Expression analysis of 1 dimension, usually between conditions of RNA-seq.

Using the standardized DESeq2 pipeline ﬂow.

Creates a DESeq model (given x is the target.contrast argument) (usually ’condition’ column)

1. RNA-seq model: design = ~ x (differences between the x groups in RNA-seq)

Usage

DEG.analysis(

df,

target.contrast = design[1],

design = ORFik::design(df),

p.value = 0.05,

counts = countTable(df, "mrna", type = "summarized"),

batch.effect = TRUE,

pairs = combn.pairs(unlist(df[, target.contrast]))

)

Arguments

df an experiment of usually RNA-seq.

target.contrast

a character vector, default design[1]. The column in the ORFik experiment that

represent the comparison contrasts. By default: the ﬁrst design factor of the full

experimental design. This is the factor you will do the comparison on. DESeq

will normalize the counts based on the full design, but the log fold change values

will be based on this contrast only. It is usually the ’condition’ column.

design a character vector, default design(df). The full experiment design: which

factors have more than 1 level. Example: the stage column is all HEK293, so it

can not be a design factor. The condition column has 2 possible values, WT

and mutant, so it is a factor of the experiment. The replicates column is not part

of design; that is inserted later by setting batch.effect = TRUE. The library type

'libtype' column can also not be part of the initial design; it is always added inside

the function, after initial setup.

p.value a numeric, default 0.05 in interval (0,1) or "" to not show. Which p-value was used for

the analysis? Will be shown as a caption.

counts a SummarizedExperiment, default: countTable(df, "mrna", type = "summarized"),

all transcripts. Assign a subset if you don't want to analyze all genes. It

is recommended to not subset, to give DESeq2 data for variance analysis.

batch.effect logical, default TRUE. Makes replicate column of the experiment part of the

design.

If you believe you might have batch effects, keep as TRUE. Batch effect usually

means that you have a strong variance between biological replicates. Check

out pcaExperiment and see if replicates cluster together more than the design

factor, to verify if you need to set it to TRUE.

pairs list of character pairs, the experiment contrasts. Default: combn.pairs(unlist(df[,

target.contrast]))

Details

Analysis is done between each possible combination of levels in the target contrast. If the target

contrast is the condition column, with factor levels WT, mut1 and mut2 with 3 replicates each, you

get comparisons of WT vs mut1, WT vs mut2 and mut1 vs mut2.
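The pairwise comparisons can be enumerated with base R's combn. This is only a sketch of the combinatorics; the exact format returned by combn.pairs may differ:

```r
# Levels of the target contrast
levels_contrast <- c("WT", "mut1", "mut2")

# Every unordered pair of levels
pairs <- combn(levels_contrast, 2, simplify = FALSE)
pairs
```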

The respective result categories are defined as: (given a user defined p value, shown here as 0.05):

Significant - p-value adjusted < 0.05 (p-value cutoff decided by the 'p.value' argument)

The LFC values are shrunken by lfcShrink(type = "normal").

Remember that DESeq2 by default can not do global change analysis; it can only find subsets with

changes in LFC!

Value

a data.table with columns: (contrast variable, gene id, regulation status, log fold changes, p.adjust

values, mean counts)

References

doi: 10.1002/cpmb.108

See Also

Other DifferentialExpression: DEG.plot.static(), DEG_model(), DTEG.plot(), te.table(),

te_rna.plot()

Examples

## Simple example (use ORFik template, then use only RNA-seq)

df <- ORFik.template.experiment()

df.rna <- df[df$libtype == "RNA",]

design(df.rna) # The full experimental design

design(df.rna)[1] # Default target contrast

#dt <- DEG.analysis(df.rna)

DEG.plot.static Plot DEG result

Description

Plot setup:

X-axis: mean counts

Y-axis: Log2 fold changes

For explanation of plot, see DEG.analysis

Usage

DEG.plot.static(

dt,

output.dir = NULL,

p.value = 0.05,

plot.title = "",

plot.ext = ".pdf",

width = 6,

height = 6,

dot.size = 0.4,

xlim = "auto",

ylim = "bidir.max",

relative.name = paste0("DEG_plot", plot.ext)

)

Arguments

dt a data.table with the results from DEG.analysis

output.dir a character path, default NULL (no save), or a directory to save the file to. The relative

name of the file is specified by the 'relative.name' argument.

p.value a numeric, default 0.05 in interval (0,1) or "" to not show. Which p-value was used for

the analysis? Will be shown as a caption.

plot.title title for plots, usually name of experiment etc

plot.ext character, default: ".pdf". Alternatives: ".png" or ".jpg".

width numeric, default 6 (in inches)

height numeric, default 6 (in inches)

dot.size numeric, default 0.4, size of point dots in plot.

xlim numeric vector or character preset, default: "auto" (ggplot decides the limit).

Alternative preset: "bidir.max" (equal in both + / - direction, using max value + 0.5

of the meanCounts column in dt). For a numeric vector, specify min and max x

limit, like c(-5, 5)

ylim numeric vector or character preset, default: "bidir.max" (Equal in both + / -

direction, using max value + 0.5 of LFC column in dt). If you want ggplot to

decide limit, set to "auto". For numeric vector, specify min and max y limit: like

c(-10, 10)

relative.name character, default: paste0("DEG_plot", plot.ext). Relative name of the file to

be saved in the folder specified in output.dir. The extension follows plot.ext (".pdf" by default);

set plot.ext to ".png" if you want a png file instead of pdf.
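The "bidir.max" preset described for the axis limits can be sketched as follows. This assumes the limit is taken from the largest absolute value, which is an interpretation of "equal in both + / - direction" and not verified against the package code; lfc is a hypothetical vector:

```r
# Hypothetical log fold changes
lfc <- c(-1.2, 0.3, 2.4)

# Symmetric limit: largest absolute value plus 0.5 padding
lim  <- max(abs(lfc)) + 0.5
ylim <- c(-lim, lim)
ylim
```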

Value

a ggplot object

See Also

Other DifferentialExpression: DEG_model(), DTEG.analysis(), DTEG.plot(), te.table(), te_rna.plot()

Examples

df <- ORFik.template.experiment()

df.rna <- df[df$libtype == "RNA",]

#dt <- DEG.analysis(df.rna)

#Default scaling

#DEG.plot.static(dt)

#Manual scaling

#DEG.plot.static(dt, xlim = c(-2, 2), ylim = c(-2, 2))

DEG_model Get DESeq2 model without running results

Description

This is the preparation step of DESeq2 analysis using ORFik::DEG.analysis. It is exported so that

you can run this step standalone; usually you want to use DEG.analysis directly.

Usage

DEG_model(

df,

target.contrast = design[1],

design = ORFik::design(df),

p.value = 0.05,

counts = countTable(df, "mrna", type = "summarized"),

batch.effect = TRUE

)
