Contents
InterPro
InterPro is a database of protein families, protein domains and functional sites in which identifiable features found in known proteins can be applied to new protein sequences in order to functionally characterise them. The contents of InterPro consist of diagnostic signatures and the proteins that they significantly match. The signatures consist of models (simple types, such as regular expressions or more complex ones, such as Hidden Markov models) which describe protein families, domains or sites. Models are built from the amino acid sequences of known families or domains and they are subsequently used to search unknown sequences (such as those arising from novel genome sequencing) in order to classify them. Each of the member databases of InterPro contributes towards a different niche, from very high-level, structure-based classifications (SUPERFAMILY and CATH-Gene3D) through to quite specific sub-family classifications (PRINTS and PANTHER). InterPro's intention is to provide a one-stop-shop for protein classification, where all the signatures produced by the different member databases are placed into entries within the InterPro database. Signatures which represent equivalent domains, sites or families are put into the same entry and entries can also be related to one another. Additional information such as a description, consistent names and Gene Ontology (GO) terms are associated with each entry, where possible.
Data contained in InterPro
InterPro contains three main entities: proteins, signatures (also referred to as "methods" or "models") and entries. The proteins in UniProtKB are also the central protein entities in InterPro. Information regarding which signatures significantly match these proteins are calculated as the sequences are released by UniProtKB and these results are made available to the public (see below). The matches of signatures to proteins are what determine how signatures are integrated together into InterPro entries: comparative overlap of matched protein sets and the location of the signatures' matches on the sequences are used as indicators of relatedness. Only signatures deemed to be of sufficient quality are integrated into InterPro. As of version 81.0 (released 21 August 2020) InterPro entries annotated 73.9% of residues found in UniProtKB with another 9.2% annotated by signatures that are pending integration. InterPro also includes data for splice variants and the proteins contained in the UniParc and UniMES databases.
InterPro consortium member databases
The signatures from InterPro come from 13 "member databases", which are listed below.
Data types
InterPro consists of seven types of data provided by different members of the consortium:
InterPro entry types
InterPro entries can be further broken down into five types:
Access
The database is available for text- and sequence-based searches via a webserver, and for download via anonymous FTP. Like other EBI databases, it is in the public domain, since its content can be used "by any individual and for any purpose". InterPro aims to release data to the public every 8 weeks, typically within a day of the UniProtKB release of the same proteins.
InterPro application programming interface (API)
InterPro provides an API for programmatic access to all InterPro entries and their related entries in Json format. There are six main endpoints for the API corresponding to the different InterPro data types: entry, protein, structure, taxonomy, proteome and set.
InterProScan
InterProScan is a software package that allows users to scan sequences against member database signatures. Users can use this signature scanning software to functionally characterize novel nucleotide or protein sequences. InterProScan is frequently used in genome projects in order to obtain a "first-pass" characterisation of the genome of interest. , the public version of InterProScan (v5.x) uses a Java-based architecture. The software package is currently only supported on a 64-bit Linux operating system. InterProScan, along with many other EMBL-EBI bioinformatics tools, can also be accessed programmatically using RESTful and SOAP Web Services APIs.
This article is derived from Wikipedia and licensed under CC BY-SA 4.0. View the original article.
Wikipedia® is a registered trademark of the
Wikimedia Foundation, Inc.
Bliptext is not
affiliated with or endorsed by Wikipedia or the
Wikimedia Foundation.