parse BioGRID tab2 format — read_biogrid

BioGRID is a database that curates physical (e.g. protein-protein) and genetic interactions reported by high- and low-throughput studies.

read_biogrid_tab2(fname, taxon)

Arguments

fname	path to a BioGRID .tab2 file (for example, BIOGRID-ORGANISM-<organism>-<release>.tab2.txt)
taxon	NCBI taxonomy ID; only interactions in which both partners belong to this taxon are returned
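To make the taxon rule concrete, here is a minimal base-R sketch of it, assuming the filter compares the taxon_1 and taxon_2 columns against taxon as strings. filter_both_taxon() and the toy rows are hypothetical illustrations, not part of CalCEN, and may differ from what read_biogrid_tab2() does internally.

```r
# Hypothetical helper, not part of CalCEN: keep only interactions whose
# two partners both come from the given NCBI taxon.
filter_both_taxon <- function(interactions, taxon) {
  taxon <- as.character(taxon)
  keep <- as.character(interactions$taxon_1) == taxon &
    as.character(interactions$taxon_2) == taxon
  # which() drops NA comparisons instead of producing all-NA rows
  interactions[which(keep), , drop = FALSE]
}

# Toy input (made-up rows, for illustration only)
toy <- data.frame(
  biogrid_interaction_id = c(1, 2, 3),
  taxon_1 = c(237561, 237561, 559292),
  taxon_2 = c(237561, 559292, 559292)
)
filter_both_taxon(toy, "237561")
```

Only the first toy row survives: the second is a cross-species interaction, so one partner fails the test.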

Value

A tibble::tibble with one row per interaction record and the following columns: biogrid_interaction_id, gene_id_1, gene_id_2, biogrid_id_1, biogrid_id_2, feature_name_1, feature_name_2, gene_symbol_1, gene_symbol_2, synonyms_1, synonyms_2, experimental_system, experimental_system_type, author, pubmed_id, taxon_1, taxon_2, throughput, score, modification, phenotypes, qualifications, tags, source_database.
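A tab2 file is tab-separated text with one header line and 24 fields per interaction. Assuming the columns above are listed in file order, parsing amounts to reading the table and renaming the fields by position. The following is a rough base-R sketch of that step, not the CalCEN implementation: parse_tab2_sketch() is hypothetical, it returns a plain data.frame rather than a tibble, and it assumes BioGRID writes "-" for missing values.

```r
# Column names in tab2 field order (assumed to match the Value section)
biogrid_tab2_columns <- c(
  "biogrid_interaction_id", "gene_id_1", "gene_id_2",
  "biogrid_id_1", "biogrid_id_2", "feature_name_1", "feature_name_2",
  "gene_symbol_1", "gene_symbol_2", "synonyms_1", "synonyms_2",
  "experimental_system", "experimental_system_type", "author",
  "pubmed_id", "taxon_1", "taxon_2", "throughput", "score",
  "modification", "phenotypes", "qualifications", "tags",
  "source_database"
)

# Hypothetical sketch, not CalCEN's implementation: read a BioGRID tab2
# file into a data.frame with snake_case column names.
parse_tab2_sketch <- function(fname) {
  df <- utils::read.delim(
    fname,
    quote = "",               # treat quote characters as literal text
    na.strings = "-",         # assumption: "-" marks a missing value
    colClasses = "character", # keep IDs as strings, no numeric coercion
    check.names = FALSE       # header names are replaced below anyway
  )
  if (ncol(df) != length(biogrid_tab2_columns)) {
    stop("expected ", length(biogrid_tab2_columns),
         " columns, found ", ncol(df))
  }
  names(df) <- biogrid_tab2_columns
  df
}
```

read.delim() disables comment characters by default, so the header line, which begins with "#", is still read as the column header.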

Usage: from the command line, download the by-organism BioGRID archive for the desired release and keep only the file for the desired organism:

    BIOGRID_RELEASE=3.4.161
    ORGANISM_NAME=Candida_albicans_SC5314
    # work in a dedicated directory so the cleanup below cannot delete other data
    mkdir -p raw_data/biogrid
    pushd raw_data/biogrid
    wget https://downloads.thebiogrid.org/Download/BioGRID/Release-Archive/BIOGRID-$BIOGRID_RELEASE/BIOGRID-ORGANISM-$BIOGRID_RELEASE.tab2.zip
    unzip BIOGRID-ORGANISM-$BIOGRID_RELEASE.tab2.zip
    # keep only the chosen organism's file (this also removes the zip archive)
    ls | grep -v -e "$ORGANISM_NAME" | xargs rm
    popd
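As an alternative to the shell commands, the same download can be sketched in R with utils::download.file() and utils::unzip(), extracting only the organism's file instead of deleting the others. The helper names and the default dest_dir are hypothetical; the URL and file-name patterns are copied from the commands and the R call on this page.

```r
# Hypothetical helpers, not part of CalCEN.
biogrid_archive_url <- function(release) {
  paste0(
    "https://downloads.thebiogrid.org/Download/BioGRID/Release-Archive/",
    "BIOGRID-", release, "/BIOGRID-ORGANISM-", release, ".tab2.zip"
  )
}

biogrid_organism_file <- function(release, organism_name) {
  paste0("BIOGRID-ORGANISM-", organism_name, "-", release, ".tab2.txt")
}

# Download the by-organism archive and extract only one organism's file.
# Returns the path of the extracted .tab2.txt file.
fetch_biogrid_organism <- function(release, organism_name,
                                   dest_dir = "raw_data/biogrid") {
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  url <- biogrid_archive_url(release)
  zip_path <- file.path(dest_dir, basename(url))
  # the archive can be large, so allow more than R's default 60 s timeout
  old <- options(timeout = max(600, getOption("timeout")))
  on.exit(options(old), add = TRUE)
  utils::download.file(url, zip_path, mode = "wb")
  target <- biogrid_organism_file(release, organism_name)
  members <- utils::unzip(zip_path, list = TRUE)$Name
  member <- members[basename(members) == target]
  if (length(member) != 1) {
    stop("could not find ", target, " in ", zip_path)
  }
  utils::unzip(zip_path, files = member, exdir = dest_dir, junkpaths = TRUE)
  file.path(dest_dir, target)
}
```

For example, fetch_biogrid_organism("3.4.161", "Candida_albicans_SC5314") would return the same path that is passed as fname below.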

Then, from R, parse the BioGRID data:

    biogrid_release <- "3.4.161"
    organism_name <- "Candida_albicans_SC5314"
    taxon <- "237561"  # NCBI taxonomy ID for Candida albicans SC5314
    biogrid_data <- CalCEN::read_biogrid_tab2(
      fname = paste0(
        "raw_data/biogrid/BIOGRID-ORGANISM-",
        organism_name, "-", biogrid_release, ".tab2.txt"),
      taxon = taxon)
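A tab2 file can list the same pair of genes more than once (for example, once per publication or experimental system) and in either orientation. If a deduplicated, undirected edge list is needed downstream, a minimal sketch might look like the following; unique_gene_pairs() is a hypothetical helper, not part of CalCEN, and it assumes experimental_system_type holds values such as "physical" and "genetic".

```r
# Hypothetical helper, not part of CalCEN: distinct undirected gene pairs,
# optionally restricted to one experimental system type.
unique_gene_pairs <- function(interactions, system_type = NULL) {
  if (!is.null(system_type)) {
    keep <- which(interactions$experimental_system_type == system_type)
    interactions <- interactions[keep, , drop = FALSE]
  }
  a <- as.character(interactions$gene_id_1)
  b <- as.character(interactions$gene_id_2)
  # order each pair so (A, B) and (B, A) collapse to the same key
  pairs <- data.frame(gene_a = pmin(a, b), gene_b = pmax(a, b),
                      stringsAsFactors = FALSE)
  pairs <- pairs[!duplicated(pairs), , drop = FALSE]
  rownames(pairs) <- NULL
  pairs
}
```

For example, unique_gene_pairs(biogrid_data, "physical") would give the distinct physically interacting gene pairs from the data parsed above.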