2.E: Reactome & rbioapi


Introduction

Directly quoting from Reactome:

REACTOME is an open-source, open access, manually curated and peer-reviewed pathway database. Our goal is to provide intuitive bioinformatics tools for the visualization, interpretation and analysis of pathway knowledge to support basic and clinical research, genome analysis, modeling, systems biology and education. Founded in 2003, the Reactome project is led by Lincoln Stein of OICR, Peter D’Eustachio of NYULMC, Henning Hermjakob of EMBL-EBI, and Guanming Wu of OHSU.

(source: https://reactome.org/what-is-reactome)

Reactome provides two RESTful API services: Reactome content services and Reactome analysis services. In rbioapi, the naming scheme is that any function belonging to the analysis services starts with rba_reactome_analysis*. Other rba_reactome_* functions without the ‘analysis’ infix correspond to the content services API.
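
The naming scheme can also be checked programmatically. The following is a minimal sketch that splits a character vector of function names into the two groups by prefix. The helper split_reactome_functions() is hypothetical and not part of rbioapi; the sample names are functions mentioned in this article.

```r
# Hypothetical helper (not part of rbioapi): group Reactome-related
# function names by the API service they belong to, using only the prefix.
split_reactome_functions <- function(fun_names) {
  reactome <- grep("^rba_reactome_", fun_names, value = TRUE)
  is_analysis <- grepl("^rba_reactome_analysis", reactome)
  list(analysis = reactome[is_analysis],
       content  = reactome[!is_analysis])
}

# Function names mentioned in this article
example_names <- c("rba_reactome_analysis",
                   "rba_reactome_analysis_pdf",
                   "rba_reactome_query",
                   "rba_reactome_xref")

groups <- split_reactome_functions(example_names)
groups$analysis
groups$content

# With rbioapi attached, the same helper could be applied to its exports:
# split_reactome_functions(ls("package:rbioapi"))
```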

Before reading further, it is a good idea to read the Reactome Data Model page.


Reactome analysis services

This section mostly revolves around the rba_reactome_analysis() function, so we will start with that. As explained in the function’s manual, you have considerable freedom in providing the main input: you can supply an R object (a data frame, matrix, or simple vector), a URL, or a local file path. Note that the type of analysis is decided based on whether your input is 1-dimensional or 2-dimensional. See the manual of rba_reactome_analysis() for details.
rba_reactome_analysis() is the API equivalent of Reactome’s analyse gene list tool. The function’s arguments correspond to what you would choose in the webpage’s wizard.

## 1 We create a simple vector with our genes
genes <- c("p53", "BRCA1", "cdk2", "Q99835", "CDC42", "CDK1", "KIF23", "PLK1", "RAC2", "RACGAP1", "RHOA", "RHOB", "MSL1", "PHF21A", "INSR", "JADE2", "P2RX7", "CCDC101", "PPM1B", "ANAPC16", "CDH8", "HSPA1L", "CUL2", "ZNF302", "CUX1", "CYTH2", "SEC22C", "EIF4E3", "ROBO2", "CXXC1", "LINC01314", "ATP5F1")

## 2 We call Reactome analysis, setting the projection and p-value cutoff explicitly
analyzed <- rba_reactome_analysis(input = genes,
                                  projection = TRUE,
                                  p_value = 0.01)

## 3 As always, we use str() to inspect the results
str(analyzed, 1)
#> List of 8
#>  $ summary            :List of 7
#>  $ expression         :List of 1
#>  $ identifiersNotFound: int 1
#>  $ pathwaysFound      : int 82
#>  $ pathways           :'data.frame': 82 obs. of  19 variables:
#>  $ resourceSummary    :'data.frame': 3 obs. of  3 variables:
#>  $ speciesSummary     :'data.frame': 1 obs. of  5 variables:
#>  $ warnings           : list()

## 4 Note that in the summary element: (analyzed$summary)
### 4.a because we supplied a simple vector, the analysis type was: over-representation
### 4.b You need the token for other rba_reactome_analysis_* functions

## 5 Analysis results are in the pathways data frame (analyzed$pathways)

As mentioned, some of rba_reactome_analysis()’s arguments correspond to the wizard of the analyse gene list tool; other arguments correspond to the contents of the “Filter your results” tab in the results page.
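
As noted earlier, the analysis type depends on the dimensionality of the input. The sketch below illustrates that rule. describe_input() is a hypothetical helper written for this article, the expression values are made up, and the layout assumed for 2-dimensional input (identifiers in the first column, numeric values in the remaining columns) should be checked against the manual of rba_reactome_analysis().

```r
# Hypothetical helper: mimic the rule that 1-dimensional input leads to an
# over-representation analysis and 2-dimensional input to an expression analysis.
describe_input <- function(input) {
  if (is.null(dim(input)) || ncol(input) == 1) {
    "over-representation"
  } else {
    "expression"
  }
}

# 1-dimensional input: a plain vector of identifiers
genes_1d <- c("BRCA1", "CDK1", "PLK1")

# 2-dimensional input: identifiers plus made-up numeric values per sample
genes_2d <- data.frame(id       = c("BRCA1", "CDK1", "PLK1"),
                       sample_1 = c(1.2, 0.4, 2.1),
                       sample_2 = c(0.9, 0.7, 1.8))

describe_input(genes_1d)
describe_input(genes_2d)

# Live call, only attempted in an interactive session with rbioapi installed
if (interactive() && requireNamespace("rbioapi", quietly = TRUE)) {
  analyzed_2d <- rbioapi::rba_reactome_analysis(input = genes_2d)
  str(analyzed_2d$summary, 1)
}
```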

Having the analysis’s token, you can retrieve the analysis results in many formats using rba_reactome_analysis_pdf() and rba_reactome_analysis_download():

# download a full pdf report
rba_reactome_analysis_pdf(token = analyzed$summary$token,
                          species = 9606)
# download the results as a JSON file
rba_reactome_analysis_download(token = analyzed$summary$token,
                               request = "results",
                               save_to = "reactome_results.json")

Your token is only guaranteed to be stored for 7 days. After that, you can upload the JSON file that you downloaded with rba_reactome_analysis_download() and obtain a new token for it:

re_uploaded <- rba_reactome_analysis_import(input = "reactome_results.json")
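
Because the token may expire, it can help to record when an analysis was run and to fall back to re-importing the saved JSON file afterwards. The following is a minimal sketch. token_expired() is a hypothetical helper, not part of rbioapi, and its 7-day default simply mirrors the retention period stated above.

```r
# Hypothetical helper: decide whether a token is past the retention window.
token_expired <- function(created, now = Sys.time(), max_days = 7) {
  as.numeric(difftime(now, created, units = "days")) > max_days
}

# Fixed timestamps (UTC) keep the example reproducible
created <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC")
after_4_days <- token_expired(created,
                              now = as.POSIXct("2024-01-05 00:00:00", tz = "UTC"))
after_9_days <- token_expired(created,
                              now = as.POSIXct("2024-01-10 00:00:00", tz = "UTC"))

# Live usage: re-import the saved results only when the token is too old
if (interactive() && requireNamespace("rbioapi", quietly = TRUE) &&
    file.exists("reactome_results.json") && token_expired(created)) {
  re_uploaded <- rbioapi::rba_reactome_analysis_import(
    input = "reactome_results.json"
  )
}
```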

Please note: Other services supported by rbioapi also provide over-representation analysis tools. Please see the vignette article Do with rbioapi: Over-Representation (Enrichment) Analysis in R (link to the documentation site) for an in-depth review.

See also in Functions’ manuals

Some rbioapi Reactome analysis functions were not covered in this vignette; be sure to check their manuals:

  • rba_reactome_analysis_mapping()

  • rba_reactome_analysis_species()

  • rba_reactome_analysis_token()


Reactome content services

rbioapi functions that correspond to Reactome content services are those starting with rba_reactome_* but without the “_analysis” infix. These functions cover what you can do with objects in the Reactome knowledge-base. In simpler terms, most of them (but not all) correspond to what you can find in the Reactome Pathway Browser and search results (e.g. a pathway, a reaction, a physical entity, etc.).

Retrieve any object from Reactome knowledge-base

Using rba_reactome_query(), you can retrieve any object from the Reactome knowledge-base. Here, an object is roughly anything to which Reactome has assigned an ID. This can range from a person’s entry to proteins, reactions, pathways, species, and many more. You can explore Reactome’s data schema to learn about Reactome knowledge-base objects and their organization. Below are some examples. Note that you are not limited to one ID per query: you can supply a vector of IDs. The only limitation is that when you supply more than one ID, you cannot set enhanced = TRUE.

## 1 query a pathway Entry
pathway <- rba_reactome_query(ids = "R-HSA-109581", enhanced = TRUE)
## 2 As always we use str() to inspect the output's structure
str(pathway, 2)
#> List of 28
#>  $ dbId               : int 109581
#>  $ displayName        : chr "Apoptosis"
#>  $ stId               : chr "R-HSA-109581"
#>  $ stIdVersion        : chr "R-HSA-109581.6"
#>  $ created            :List of 5
#>   ..$ dbId       : int 109608
#>   ..$ displayName: chr "Alnemri, E, Hengartner, Michael, Tschopp, Jürg, Tsujimoto, Yoshihide, Hardwick, JM, 2004-01-16"
#>   ..$ dateTime   : chr "2004-01-16 21:01:51"
#>   ..$ className  : chr "InstanceEdit"
#>   ..$ schemaClass: chr "InstanceEdit"
#>  $ modified           :List of 6
#>   ..$ dbId       : int 11003847
#>   ..$ displayName: chr "Weiser, Joel, 2024-08-30"
#>   ..$ dateTime   : chr "2024-08-30 03:50:04"
#>   ..$ note       : chr "Inserted by org.reactome.orthoinference"
#>   ..$ className  : chr "InstanceEdit"
#>   ..$ schemaClass: chr "InstanceEdit"
#>  $ isInDisease        : logi FALSE
#>  $ isInferred         : logi FALSE
#>  $ name               :List of 1
#>   ..$ : chr "Apoptosis"
#>  $ releaseDate        : chr "2004-09-20"
#>  $ speciesName        : chr "Homo sapiens"
#>  $ authored           :List of 1
#>   ..$ : int 109608
#>  $ edited             :List of 1
#>   ..$ :List of 5
#>  $ figure             :List of 1
#>   ..$ :List of 5
#>  $ goBiologicalProcess:List of 9
#>   ..$ dbId        : int 2273
#>   ..$ displayName : chr "apoptotic process"
#>   ..$ accession   : chr "0006915"
#>   ..$ databaseName: chr "GO"
#>   ..$ definition  : chr "A programmed cell death process which begins when a cell receives an internal (e.g. DNA damage) or external sig"| __truncated__
#>   ..$ name        : chr "apoptotic process"
#>   ..$ url         : chr "https://www.ebi.ac.uk/QuickGO/term/GO:0006915"
#>   ..$ className   : chr "GO_BiologicalProcess"
#>   ..$ schemaClass : chr "GO_BiologicalProcess"
#>  $ literatureReference:List of 7
#>   ..$ :List of 11
#>   ..$ :List of 11
#>   ..$ :List of 11
#>   ..$ :List of 11
#>   ..$ :List of 11
#>   ..$ :List of 11
#>   ..$ :List of 11
#>  $ orthologousEvent   :List of 14
#>   ..$ :List of 15
#>   ..$ :List of 15
#>   ..$ :List of 15
#>   ..$ :List of 15
#>   ..$ :List of 15
#>   ..$ :List of 15
#>   ..$ :List of 15
#>   ..$ :List of 15
#>   ..$ :List of 15
#>   ..$ :List of 15
#>   ..$ :List of 15
#>   ..$ :List of 15
#>   ..$ :List of 15
#>   ..$ :List of 15
#>  $ reviewed           :List of 1
#>   ..$ :List of 5
#>  $ species            :List of 1
#>   ..$ :List of 8
#>  $ summation          :List of 1
#>   ..$ :List of 5
#>  $ reviewStatus       :List of 6
#>   ..$ dbId       : int 9821382
#>   ..$ displayName: chr "five stars"
#>   ..$ definition : chr "externally reviewed"
#>   ..$ name       :List of 1
#>   ..$ className  : chr "ReviewStatus"
#>   ..$ schemaClass: chr "ReviewStatus"
#>  $ updateTrackers     :List of 22
#>   ..$ :List of 6
#>   ..$ :List of 6
#>   ..$ :List of 6
#>   ..$ :List of 6
#>   ..$ :List of 6
#>   ..$ :List of 6
#>   ..$ :List of 6
#>   ..$ :List of 6
#>   ..$ :List of 6
#>   ..$ :List of 6
#>   ..$ :List of 6
#>   ..$ :List of 6
#>   ..$ :List of 6
#>   ..$ :List of 6
#>   ..$ :List of 6
#>   ..$ :List of 6
#>   ..$ :List of 6
#>   ..$ :List of 6
#>   ..$ :List of 6
#>   ..$ :List of 6
#>   ..$ :List of 6
#>   ..$ :List of 6
#>  $ hasDiagram         : logi TRUE
#>  $ hasEHLD            : logi TRUE
#>  $ lastUpdatedDate    : chr "2022-06-09"
#>  $ hasEvent           :List of 4
#>   ..$ :List of 16
#>   ..$ :List of 17
#>   ..$ :List of 17
#>   ..$ :List of 16
#>  $ schemaClass        : chr "Pathway"
#>  $ className          : chr "Pathway"



## 3 You can compare it with the webpage of R-HSA-109581 entry:
# https://reactome.org/content/detail/R-HSA-109581

## 1 query a protein Entry
protein <- rba_reactome_query(ids = 66247, enhanced = TRUE)
## 2 As always we use str() to inspect the output's structure
str(protein, 1)
#> List of 27
#>  $ dbId               : int 66247
#>  $ displayName        : chr "UniProt:P25942-1 CD40"
#>  $ modified           :List of 6
#>  $ databaseName       : chr "UniProt"
#>  $ identifier         : chr "P25942"
#>  $ name               :List of 1
#>  $ otherIdentifier    :List of 108
#>  $ url                : chr "http://purl.uniprot.org/uniprot/P25942-1"
#>  $ crossReference     :List of 29
#>  $ referenceDatabase  :List of 8
#>  $ physicalEntity     :List of 1
#>  $ checksum           : chr "BC8776EC2C4A5680"
#>  $ comment            :List of 1
#>  $ description        :List of 1
#>  $ geneName           :List of 2
#>  $ isSequenceChanged  : logi FALSE
#>  $ keyword            :List of 16
#>  $ secondaryIdentifier:List of 8
#>  $ sequenceLength     : int 277
#>  $ species            : int 48887
#>  $ chain              :List of 2
#>  $ referenceGene      :List of 12
#>  $ referenceTranscript:List of 4
#>  $ variantIdentifier  : chr "P25942-1"
#>  $ isoformParent      :List of 1
#>  $ className          : chr "ReferenceIsoform"
#>  $ schemaClass        : chr "ReferenceIsoform"



## 3 You can compare it with the webpage of R-HSA-202939 entry:
# https://reactome.org/content/detail/R-HSA-202939
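
The restriction on enhanced = TRUE when several IDs are supplied can be handled with a small wrapper. This is a sketch only: resolve_enhanced() and query_reactome() are hypothetical helpers, not rbioapi functions, and combining a stable ID and a database ID in one call is an assumption made for illustration.

```r
# Hypothetical helper: enhanced output is only available for a single ID,
# so switch it off (with a message) when several IDs are supplied.
resolve_enhanced <- function(ids, enhanced = TRUE) {
  if (length(ids) > 1 && isTRUE(enhanced)) {
    message("More than one ID supplied; using enhanced = FALSE.")
    return(FALSE)
  }
  isTRUE(enhanced)
}

# Hypothetical wrapper around rba_reactome_query()
query_reactome <- function(ids, enhanced = TRUE) {
  rbioapi::rba_reactome_query(ids = ids,
                              enhanced = resolve_enhanced(ids, enhanced))
}

single_id <- resolve_enhanced("R-HSA-109581")
many_ids  <- suppressMessages(resolve_enhanced(c("R-HSA-109581", "66247")))

# Live call, only attempted in an interactive session with rbioapi installed
if (interactive() && requireNamespace("rbioapi", quietly = TRUE)) {
  both <- query_reactome(c("R-HSA-109581", "66247"))
  str(both, 1)
}
```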

Find Cross-Reference IDs in Reactome

As you can see, in the second example of rba_reactome_query() above we used Reactome’s dbId 66247 to query the CD40 protein. How did we obtain that ID in the first place? You can use rba_reactome_xref() to map any cross-reference (external) ID to Reactome IDs.

## 1 We supply an HGNC gene symbol to find the corresponding database ID in Reactome
xref_protein <- rba_reactome_xref("CD40")
## 2 As always use str() to inspect the output's structure
str(xref_protein, 1)
#> List of 19
#>  $ dbId               : int 66247
#>  $ displayName        : chr "UniProt:P25942-1 CD40"
#>  $ databaseName       : chr "UniProt"
#>  $ identifier         : chr "P25942"
#>  $ name               :List of 1
#>  $ otherIdentifier    :List of 1
#>  $ url                : chr "http://purl.uniprot.org/uniprot/P25942-1"
#>  $ checksum           : chr "BC8776EC2C4A5680"
#>  $ comment            :List of 1
#>  $ description        :List of 1
#>  $ geneName           :List of 1
#>  $ isSequenceChanged  : logi FALSE
#>  $ keyword            :List of 1
#>  $ secondaryIdentifier:List of 1
#>  $ sequenceLength     : int 277
#>  $ chain              :List of 1
#>  $ variantIdentifier  : chr "P25942-1"
#>  $ className          : chr "ReferenceIsoform"
#>  $ schemaClass        : chr "ReferenceIsoform"
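
The two steps can be chained: take the dbId from the cross-reference result and pass it to rba_reactome_query(). The following is a minimal sketch under the assumption that the result is a list with a dbId element, as in the output above. get_db_id() is a hypothetical helper, and xref_mock reproduces only a few of the fields shown above so that the example runs without network access.

```r
# Mock of part of the xref output shown above (no network access needed)
xref_mock <- list(dbId        = 66247L,
                  displayName = "UniProt:P25942-1 CD40",
                  identifier  = "P25942")

# Hypothetical helper: pull the Reactome database ID out of an xref result
get_db_id <- function(xref_result) {
  stopifnot(is.list(xref_result), !is.null(xref_result$dbId))
  as.integer(xref_result$dbId)
}

get_db_id(xref_mock)

# Live call, only attempted in an interactive session with rbioapi installed
if (interactive() && requireNamespace("rbioapi", quietly = TRUE)) {
  xref_protein <- rbioapi::rba_reactome_xref("CD40")
  protein <- rbioapi::rba_reactome_query(ids = get_db_id(xref_protein),
                                         enhanced = TRUE)
}
```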

Map Cross-Reference IDs to Reactome

While we are on the topic of cross-references, here is another useful resource. Using rba_reactome_mapping(), you can find the Reactome pathways or reactions that include your external ID:

## 1 Again, consider CD40 protein:
xref_mapping <- rba_reactome_mapping(id = "CD40",
                                    resource = "hgnc",
                                    map_to = "pathways")
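
The same call can target reactions instead of pathways. The wrapper below is a sketch: check_map_to() and map_external_id() are hypothetical helpers, and the two accepted map_to values are taken from the sentence above rather than from an exhaustive reading of the function's manual.

```r
# Hypothetical helper: restrict map_to to the two targets named in the text
check_map_to <- function(map_to = c("pathways", "reactions")) {
  match.arg(map_to)
}

# Hypothetical wrapper around rba_reactome_mapping()
map_external_id <- function(id, resource, map_to = "pathways") {
  rbioapi::rba_reactome_mapping(id = id,
                                resource = resource,
                                map_to = check_map_to(map_to))
}

check_map_to()
check_map_to("reactions")

# Live call, only attempted in an interactive session with rbioapi installed
if (interactive() && requireNamespace("rbioapi", quietly = TRUE)) {
  xref_reactions <- map_external_id(id = "CD40",
                                    resource = "hgnc",
                                    map_to = "reactions")
  str(xref_reactions, 1)
}
```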

See also in Functions’ manuals

There are still more rbioapi Reactome content functions that were not covered in this vignette. Here is a brief overview; see the functions’ manuals for detailed guides and examples.

Retrieve Reactome Database information

  • rba_reactome_version(): Return current Reactome version

  • rba_reactome_diseases(): Retrieve a list of diseases annotated in Reactome.

  • rba_reactome_species(): Retrieve a list of species annotated in Reactome.

General Mapping/Querying

  • rba_reactome_query()

  • rba_reactome_mapping()

  • rba_reactome_xref()

Things you can do with Entities

  • rba_reactome_complex_list(): Get a list of complexes that have your molecule in them.

  • rba_reactome_complex_subunits(): Get the list of subunits in your complex

  • rba_reactome_participant_of(): Get a list of Reactome sets and complexes in which your entity (event, molecule, reaction, pathway etc.) is a participant.

  • rba_reactome_entity_other_forms()

Things you can do with Events

  • rba_reactome_event_ancestors()

  • rba_reactome_participants()

  • rba_reactome_pathways_events()

  • rba_reactome_orthology()

  • rba_reactome_event_hierarchy(): Retrieve the full event hierarchy of a species.

Pathways

  • rba_reactome_pathways_low()

  • rba_reactome_pathways_events()

  • rba_reactome_pathways_top()

Interactors

  • rba_reactome_interactors_psicquic()

  • rba_reactome_interactors_static()

People

  • rba_reactome_people_name()

  • rba_reactome_people_id()

Export diagrams and events

  • rba_reactome_exporter_diagram()

  • rba_reactome_exporter_overview()

  • rba_reactome_exporter_reaction()

  • rba_reactome_exporter_event()


How to Cite?

To cite Reactome (Please see https://reactome.org/cite):

  • Marc Gillespie, Bijay Jassal, Ralf Stephan, Marija Milacic, Karen Rothfels, Andrea Senff-Ribeiro, Johannes Griss, Cristoffer Sevilla, Lisa Matthews, Chuqiao Gong, Chuan Deng, Thawfeek Varusai, Eliot Ragueneau, Yusra Haider, Bruce May, Veronica Shamovsky, Joel Weiser, Timothy Brunson, Nasim Sanati, Liam Beckman, Xiang Shao, Antonio Fabregat, Konstantinos Sidiropoulos, Julieth Murillo, Guilherme Viteri, Justin Cook, Solomon Shorser, Gary Bader, Emek Demir, Chris Sander, Robin Haw, Guanming Wu, Lincoln Stein, Henning Hermjakob, Peter D’Eustachio, The reactome pathway knowledgebase 2022, Nucleic Acids Research, 2021, gkab1028, https://doi.org/10.1093/nar/gkab1028
  • Griss J, Viteri G, Sidiropoulos K, Nguyen V, Fabregat A, Hermjakob H. ReactomeGSA - Efficient Multi-Omics Comparative Pathway Analysis. Mol Cell Proteomics. 2020 Sep 9. doi: 10.1074/mcp. PubMed PMID: 32907876.
  • Fabregat A, Korninger F, Viteri G, Sidiropoulos K, Marin-Garcia P, Ping P, Wu G, Stein L, D’Eustachio P, Hermjakob H. Reactome graph database: Efficient access to complex pathway data. PLoS Comput Biol. 2018 Jan 29;14(1):e1005968. doi: 10.1371/journal.pcbi.1005968. eCollection 2018 Jan. PubMed PMID: 29377902.
  • Fabregat A, Sidiropoulos K, Viteri G, Marin-Garcia P, Ping P, Stein L, D’Eustachio P, Hermjakob H. Reactome diagram viewer: data structures and strategies to boost performance. Bioinformatics. 2018 Apr 1;34(7):1208-1214. doi: 10.1093/bioinformatics/btx752. PubMed PMID: 29186351.
  • Fabregat A, Sidiropoulos K, Viteri G, Forner O, Marin-Garcia P, Arnau V, D’Eustachio P, Stein L, Hermjakob H. Reactome pathway analysis: a high-performance in-memory approach. BMC Bioinformatics. 2017 Mar 2;18(1):142. doi: 10.1186/s12859-017-1559-2. PubMed PMID: 28249561.
  • Wu G, Haw R. Functional Interaction Network Construction and Analysis for Disease Discovery. Methods Mol Biol. 2017;1558:235-253. doi: 10.1007/978-1-4939-6783-4_11. PubMed PMID: 28150241

To cite rbioapi:

  • Moosa Rezwani, Ali Akbar Pourfathollah, Farshid Noorbakhsh, rbioapi: user-friendly R interface to biologic web services’ API, Bioinformatics, Volume 38, Issue 10, 15 May 2022, Pages 2952–2953, https://doi.org/10.1093/bioinformatics/btac172

Session info

#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] rbioapi_0.8.0  rmarkdown_2.29
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.37     R6_2.5.1          fastmap_1.2.0     xfun_0.49        
#>  [5] magrittr_2.0.3    maketools_1.3.1   cachem_1.1.0      knitr_1.49       
#>  [9] htmltools_0.5.8.1 buildtools_1.0.0  lifecycle_1.0.4   DT_0.33          
#> [13] cli_3.6.3         sass_0.4.9        jquerylib_0.1.4   compiler_4.4.2   
#> [17] httr_1.4.7        sys_3.4.3         tools_4.4.2       curl_6.0.0       
#> [21] evaluate_1.0.1    bslib_0.8.0       yaml_2.3.10       htmlwidgets_1.6.4
#> [25] jsonlite_1.8.9    rlang_1.1.4       crosstalk_1.2.1