CMB

Online Lectures on Bioinformatics



Database searching


Exercises

Note: To follow the instructions below, you need access to a Unix/Linux-like operating system with a C compiler installed.


Search for homologies with hidden Markov models


Obtain the Swiss-Prot entry of the myb proto-oncogene protein (AC P10242, entry MYB_HUMAN).

Use the amino acid sequence of the myb protein as the query for a BLAST search against a protein database.


  1. Concatenate domains

    We want to obtain an HMM for myb-domains that can be used to search a database. Keep that in mind while screening the hits of the BLAST search. Select some myb-domains and copy the corresponding parts of the sequences to a file in FASTA format (FASTA format is described in the exercises section of Pairwise Alignments).
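
    Once the domains have been copied into a file, a quick check of its headers can catch copy-and-paste mistakes. The following is a minimal sketch; the file name myb_domains.fasta and the function names are assumptions chosen for illustration.

```shell
# Sketch: inspect a FASTA file of selected myb-domains.
# The file name myb_domains.fasta is a hypothetical example.

# Print the header lines (those starting with '>').
fasta_headers() {
    grep '^>' "$1"
}

# Print the number of sequences, i.e. the number of header lines.
fasta_count() {
    grep -c '^>' "$1"
}

# Run only if the file exists.
if [ -f myb_domains.fasta ]; then
    fasta_headers myb_domains.fasta
    fasta_count myb_domains.fasta
fi
```

    The count should match the number of domains you selected from the BLAST hits.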


  2. Multiply align domains

    Download ClustalW (or ClustalX) from the EBI FTP server:
    -> ftp.ebi.ac.uk
    -> directory /pub/software/unix
    -> clustalw.tar.Z

    Decompress and unpack the archive with

    $ gtar xvzf clustalw.tar.Z
    $ cd clustalw1.7

    Compile the source code distribution:

    $ make

    Compute a multiple alignment of the myb-domains with clustalw.
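
    A non-interactive call might look like the sketch below. The file names are hypothetical, and clustalw is assumed to be on your PATH (otherwise call it as ./clustalw from the directory in which it was compiled). The options -infile=, -outfile= and -align are ClustalW command-line options; check the ClustalW help text of your version if they are not accepted.

```shell
# Sketch: align the selected myb-domains with ClustalW.
# Usage: align_domains INPUT.fasta OUTPUT.aln
# Both file names are hypothetical; clustalw is assumed to be on PATH.
align_domains() {
    clustalw -infile="$1" -outfile="$2" -align
}

# Run only if the (hypothetical) input file exists.
if [ -f myb_domains.fasta ]; then
    align_domains myb_domains.fasta myb_domains.aln
fi
```

    Running clustalw without arguments starts its interactive menu instead, which offers the same alignment step.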


  3. HMMER

    Install HMMER 2.1.1

    Download from

    -> ftp.genetics.wustl.edu
    -> directory /pub/eddy/hmmer
    -> hmmer-2.1.1.tar.Z

    Decompress and unpack the archive (see above), then build the source code distribution (see the file INSTALL):

    $ cd hmmer-2.1.1
    $ ./configure
    $ make

    The binaries are now in the directory binaries.


  4. Build HMM of myb-domains

    Build and calibrate an HMM of the myb-domains using hmmbuild and hmmcalibrate.
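
    The two steps can be sketched as follows. HMMER 2 expects the name of the HMM file to write first and the alignment file second. The file names are hypothetical, the alignment is assumed to be in a format that hmmbuild accepts, and hmmbuild and hmmcalibrate are assumed to be on your PATH (for example after adding the binaries directory to it).

```shell
# Sketch: build an HMM from an alignment, then calibrate it.
# Usage: build_hmm ALIGNMENT HMMFILE
# File names are hypothetical; the HMMER 2 programs are assumed to be on PATH.
build_hmm() {
    aln="$1"
    hmm="$2"
    # hmmbuild writes the new HMM file; hmmcalibrate then updates that file.
    hmmbuild "$hmm" "$aln" && hmmcalibrate "$hmm"
}

# Run only if the (hypothetical) alignment file exists.
if [ -f myb_domains.aln ]; then
    build_hmm myb_domains.aln myb_domains.hmm
fi
```

    Calibration is only attempted if hmmbuild succeeds, so a failed build does not leave you calibrating a stale file.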


  5. Search a database for homologues

    We want to search a database that consists of concatenated sequences in FASTA format (FASTA format is described in the exercises section of Pairwise Alignments).
    First create a directory for the database and change into it:

    $ mkdir database
    $ cd database

    Download, for example, the peptides of the Drosophila genome in FASTA format from NCBI (~7.5 MB):

    -> ftp ncbi.nlm.nih.gov (ftp-Server at NCBI)
    -> directory /genbank/genomes/D_melanogaster/Scaffolds/LARGE
    -> download all files ending with '.faa'

    Concatenate the peptides of all scaffolds into one file:

    $ cat *.faa > drosophila.fasta

    Optionally, remove the individual files afterwards:

    $ rm *.faa
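
    Before removing the individual .faa files, it may be worth checking that the concatenated file contains as many sequences as the parts together. The sketch below counts header lines; the function name is an assumption made for illustration. A mismatch can occur, for example, when a file lacks a final newline, so that the last sequence line and the next header end up on one line.

```shell
# Sketch: check that concatenation kept every sequence.
# Usage: same_record_count COMBINED PART...
# Succeeds if COMBINED has as many '>' header lines as all PART files together.
same_record_count() {
    combined="$1"
    shift
    parts=0
    for f in "$@"; do
        n=$(grep -c '^>' "$f")      # headers in this part
        parts=$((parts + n))
    done
    merged=$(grep -c '^>' "$combined")
    echo "parts: $parts, combined: $merged"
    [ "$parts" -eq "$merged" ]
}

# Example (not executed here):
#   same_record_count drosophila.fasta *.faa && rm *.faa
```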

    Use hmmsearch to search the database with the HMM of the myb-domains.
    Use '>' to redirect standard output to a file.
    The command line may look like this (adjust the path if your database directory is not located in your home directory):

    $ hmmsearch myb_domains.hmm ~/database/drosophila.fasta > myb_hmm_drome.log


  6. Iterate search

    Screen the hits, build a new HMM that includes selected hits, and run hmmsearch again.
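
    One round of this iteration could be scripted roughly as below. This is only a sketch: the function name and the file naming scheme (myb_roundN.*) are assumptions, the selected hits are assumed to have been added by hand to the FASTA file of the round, and clustalw, hmmbuild, hmmcalibrate and hmmsearch are assumed to be on your PATH.

```shell
# Sketch: one round of the iteration.
# Usage: refine_round ROUND_NUMBER DATABASE
# Expects myb_roundN.fasta (previous domains plus selected hits) to exist.
# All file names are hypothetical.
refine_round() {
    n="$1"      # round number, keeps the files of each round apart
    db="$2"     # FASTA database to search
    clustalw -infile="myb_round$n.fasta" -outfile="myb_round$n.aln" -align &&
    hmmbuild "myb_round$n.hmm" "myb_round$n.aln" &&
    hmmcalibrate "myb_round$n.hmm" &&
    hmmsearch "myb_round$n.hmm" "$db" > "myb_round$n.log"
}

# Example (not executed here):
#   refine_round 2 ~/database/drosophila.fasta
```

    Keeping a separate set of files per round makes it easier to compare the hit lists and to see when no new hits appear.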


    Comments are very welcome.
    luz@molgen.mpg.de