This is a portable dump of the D2P2 database. All data is made available as
compressed TSV files with matching MySQL CREATE TABLE files, so you can add
just the tables you need to your own MySQL database. An empty SQLite3
database file is also provided, ready to have the data imported into it, so
you can use the database without installing a database server.

To start using the D2P2 data with MySQL:

To start using the D2P2 data with SQLite3:

The comments below are annotations. Do not type them after a dot-command
(.separator, .import), because the sqlite3 shell would read them as extra
arguments.

    #First decompress all of the table data, or just those that you need!
    $ bzip2 -d tables/*.bz2

    #Next use the SQLite3 application to open the empty database provided
    $ sqlite3 d2p2.sqlite3

    --Tell SQLite that we are using tab delimited files
    sqlite> .separator "\t"

    --Load the tables you are interested in using
    sqlite> .import tables/protein.tsv protein
    sqlite> .import tables/genome.tsv genome
    sqlite> .import tables/genome_sequence.tsv genome_sequence
    sqlite> .import tables/dis_assignment.tsv dis_assignment
    sqlite> .import tables/predictor.tsv predictor

The above can also be accomplished more automatically with the following
from the command line, which imports every decompressed table into the
table named after its file:

    $ for table in tables/*.tsv; do
    >   printf '.mode tabs\n.import %s %s\n' "$table" "$(basename "$table" .tsv)" | sqlite3 d2p2.sqlite3
    > done

Then do a query to get the data you want from the database. If you used
the loop above, first open the database again with "sqlite3 d2p2.sqlite3".

    sqlite> SELECT protein.seqid, predictor.name,
       ...>        dis_assignment.start, dis_assignment.end,
       ...>        substr(genome_sequence.sequence,start,1 + end - start) --cut out regions of sequence
       ...>          AS seq_region
       ...> FROM protein
       ...> JOIN genome ON protein.genome = genome.genome
       ...>  AND genome.domain IN ('A','B') --only from Archaea and Bacteria
       ...> JOIN dis_assignment ON dis_assignment.protein = protein.protein
       ...>  AND 1 + end - start > 35 --only for "long" type disorder
       ...> JOIN predictor ON predictor.predictor = dis_assignment.predictor
       ...> JOIN genome_sequence ON genome_sequence.protein = protein.protein;
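
The same query can also be run from a script rather than the interactive shell. The following is a minimal sketch using Python's standard sqlite3 module, not part of the D2P2 distribution. It builds a tiny in-memory database so that it runs on its own. The toy schema contains only the columns the query refers to, and the column types, the sample rows, the predictor name "toy_predictor", the domain code 'E' and the function names are all assumptions made for illustration. The real column definitions are in the CREATE TABLE files shipped with the dump.

```python
import sqlite3

# The README's query, with start/end qualified and quoted ("end" is an SQL
# keyword) and the length cutoff turned into a bound parameter.
LONG_DISORDER_QUERY = """
SELECT protein.seqid, predictor.name,
       dis_assignment."start", dis_assignment."end",
       substr(genome_sequence.sequence,
              dis_assignment."start",
              1 + dis_assignment."end" - dis_assignment."start") AS seq_region
FROM protein
JOIN genome ON protein.genome = genome.genome
 AND genome.domain IN ('A', 'B')
JOIN dis_assignment ON dis_assignment.protein = protein.protein
 AND 1 + dis_assignment."end" - dis_assignment."start" > ?
JOIN predictor ON predictor.predictor = dis_assignment.predictor
JOIN genome_sequence ON genome_sequence.protein = protein.protein
"""


def long_disorder_regions(conn, longer_than=35):
    """Return disordered regions longer than `longer_than` residues
    from Archaea ('A') and Bacteria ('B') genomes."""
    return conn.execute(LONG_DISORDER_QUERY, (longer_than,)).fetchall()


def build_toy_database():
    """Create an in-memory stand-in for d2p2.sqlite3 (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE genome (genome TEXT, domain TEXT);
        CREATE TABLE protein (protein INTEGER, genome TEXT, seqid TEXT);
        CREATE TABLE genome_sequence (protein INTEGER, sequence TEXT);
        CREATE TABLE predictor (predictor INTEGER, name TEXT);
        CREATE TABLE dis_assignment (
            protein INTEGER, predictor INTEGER, "start" INTEGER, "end" INTEGER
        );
    """)
    # 'B' is Bacteria per the README; 'E' is an assumed code for a
    # genome that the domain filter should exclude.
    conn.executemany("INSERT INTO genome VALUES (?, ?)",
                     [("gB", "B"), ("gE", "E")])
    conn.executemany("INSERT INTO protein VALUES (?, ?, ?)",
                     [(1, "gB", "P1"), (2, "gE", "P2")])
    conn.executemany("INSERT INTO genome_sequence VALUES (?, ?)",
                     [(1, "A" * 10 + "S" * 40 + "G" * 10), (2, "Q" * 60)])
    conn.execute("INSERT INTO predictor VALUES (1, 'toy_predictor')")
    conn.executemany("INSERT INTO dis_assignment VALUES (?, ?, ?, ?)", [
        (1, 1, 11, 50),  # 40 residues: kept
        (1, 1, 1, 5),    # 5 residues: too short
        (2, 1, 1, 60),   # long, but in the excluded genome
    ])
    return conn


if __name__ == "__main__":
    for row in long_disorder_regions(build_toy_database()):
        print(row)
```

To use this against the real data, replace build_toy_database() with sqlite3.connect("d2p2.sqlite3") after importing the tables as described above. Passing the cutoff as a parameter makes it easy to try thresholds other than 35 without editing the SQL.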