This is a portable dump of the D2P2 database. All data is provided as compressed TSV files with matching MySQL CREATE TABLE files, so you can add just the tables you need to your own MySQL database. Additionally, an empty SQLite3 database file is provided, ready for the data to be imported into it, so that you can use the database without installing a database server.

To start using the D2P2 data with MySQL:


To start using the D2P2 data with SQLite3:

$ bzip2 -d tables/*.bz2                                       #First decompress the table data (or only the tables you need)
$ sqlite3 d2p2.sqlite3                                        #Next open the provided empty database with the sqlite3 shell
sqlite> -- Tell SQLite that the files are tab delimited
sqlite> .separator "\t"
sqlite> -- Load the tables you are interested in
sqlite> .import tables/protein.tsv protein
sqlite> .import tables/genome.tsv genome
sqlite> .import tables/genome_sequence.tsv genome_sequence
sqlite> .import tables/dis_assignment.tsv dis_assignment
sqlite> .import tables/predictor.tsv predictor

The imports above can also be done in one step from the command line, loading every decompressed file into the table of the same name:

$ for table in tables/*.tsv; do printf '.mode tabs\n.import %s %s\n' "$table" "$(basename "$table" .tsv)" | sqlite3 d2p2.sqlite3; done
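The first step notes that you can decompress only the tables you need. A minimal sketch of that, assuming the compressed files follow the tables/<name>.tsv.bz2 pattern implied by the .import commands above and that sqlite3 and bzip2 are on your PATH; import_tables is a hypothetical helper name, not something shipped with the dump:

```shell
#!/bin/sh
# Sketch: decompress and import only the named tables into an SQLite database.
# Assumption: each table's data lives in tables/<name>.tsv.bz2.
# import_tables is a hypothetical helper, not part of the D2P2 dump.
import_tables() {
    db=$1
    shift
    for t in "$@"; do
        # -d decompress, -k keep the original .bz2, -f overwrite an existing .tsv
        bzip2 -dkf "tables/$t.tsv.bz2" || return 1
        # .mode tabs makes .import split each line on tab characters
        printf '.mode tabs\n.import %s %s\n' "tables/$t.tsv" "$t" | sqlite3 "$db" || return 1
    done
}
```

After defining (or sourcing) the function, call it with the database file followed by the table names, e.g. import_tables d2p2.sqlite3 protein genome dis_assignment. Keeping the .bz2 files means you can rerun it later without re-downloading.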
$ sqlite3 d2p2.sqlite3                                        #Open the database, if it is not already open
sqlite> SELECT protein.seqid, predictor.name,                 --Then do a query to get the data you want from the database
   ...>   dis_assignment.start, dis_assignment.end,
   ...>   substr(genome_sequence.sequence, dis_assignment.start,
   ...>          1 + dis_assignment.end - dis_assignment.start)      --cut out regions of sequence
   ...>     AS seq_region
   ...> FROM protein
   ...> JOIN genome ON protein.genome = genome.genome
   ...>      AND genome.domain IN ('A','B')                                 --only from Archaea and Bacteria
   ...> JOIN dis_assignment ON dis_assignment.protein = protein.protein
   ...>      AND 1 + dis_assignment.end - dis_assignment.start > 35         --only for "long" type disorder (more than 35 residues)
   ...> JOIN predictor ON predictor.predictor = dis_assignment.predictor
   ...> JOIN genome_sequence ON genome_sequence.protein = protein.protein;
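To run a query like this one without the interactive prompt and keep the result as a TSV file, one option is the sqlite3 shell's -header and -separator command-line options. A minimal sketch; run_query and the file names used with it are hypothetical:

```shell
#!/bin/sh
# Sketch: run the SQL in a file against a database and write the result as TSV.
# run_query is a hypothetical helper; choose any query and output file names.
run_query() {
    db=$1
    sql_file=$2
    out=$3
    # -header prints the column names first; the default list mode joins
    # columns with the separator, here a literal tab character
    sqlite3 -header -separator "$(printf '\t')" "$db" < "$sql_file" > "$out"
}
```

For example, save the SELECT statement above (without the sqlite> and ...> prompts) as long_disorder.sql and run run_query d2p2.sqlite3 long_disorder.sql long_disorder.tsv.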