OrthoFinder Tutorials: Phylogenetic orthology inference for comparative genomics

1. Downloading and running OrthoFinder

Plan for this tutorial

In this tutorial we’re going to download OrthoFinder and check that we can run it on the Example Dataset. After that we’ll be ready for the next tutorial, where we run it on a more interesting set of species. All the steps are done on the command line, so you can copy and paste the commands yourself. If you are not familiar with the command line, there are many online tutorials and reference pages; here is a nice short one that covers the basics: https://www.techspot.com/guides/835-linux-command-line-basics/.

Downloading and running OrthoFinder

There are a number of ways of obtaining OrthoFinder.

For Linux: follow the instructions below.

For Macs: it’s easiest to install OrthoFinder using Bioconda and then follow the instructions below.

For Windows: it’s best to install the Windows Subsystem for Linux and then continue as for Linux below.

To set up Bioconda or the Windows Subsystem for Linux, follow the instructions in “Alternative ways of getting OrthoFinder” and then return to Step 1 of this tutorial.

  1. Create a directory to work in. Open a terminal and run the commands:
    mkdir ~/orthofinder_tutorial
    cd ~/orthofinder_tutorial
    
  2. Download the latest version of OrthoFinder (if you’re using Bioconda you still need to do this step to get the Example Dataset):
    wget https://github.com/davidemms/OrthoFinder/releases/latest/download/OrthoFinder.tar.gz
    

    If you don’t have wget installed, you can try curl:

    curl -L -O https://github.com/davidemms/OrthoFinder/releases/latest/download/OrthoFinder.tar.gz

    Or go to the GitHub releases page and download OrthoFinder: https://github.com/davidemms/OrthoFinder/releases

  3. Extract the package and cd into the OrthoFinder directory:
    tar xzvf OrthoFinder.tar.gz
    cd OrthoFinder/
    
  4. Ask OrthoFinder to print its help text.

    On Linux:

     ./orthofinder -h
    

    Or, if you’ve installed OrthoFinder using Bioconda, run the version it installed in the system path rather than the local copy in this directory:

     orthofinder -h
    

    This will print all the OrthoFinder command line options.

  5. Run OrthoFinder on the Example Dataset (this is a very small dataset, so it should run in a few minutes; normal datasets will take longer):

    Linux:

     ./orthofinder -f ExampleData/
    

    Or, using Bioconda:

     orthofinder -f ExampleData/
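
Steps 1 to 5 can also be collected into a pair of shell functions. This is only a sketch, assuming a Linux shell with tar and either wget or curl available; the names `OF_URL`, `download` and `run_example` are made up for this illustration and are not part of OrthoFinder.

```shell
# Illustrative helpers only; the names below are assumptions, not OrthoFinder commands.
OF_URL="https://github.com/davidemms/OrthoFinder/releases/latest/download/OrthoFinder.tar.gz"

# Download a URL into the current directory, using wget if present, otherwise curl.
download() {
    if command -v wget >/dev/null 2>&1; then
        wget "$1"
    elif command -v curl >/dev/null 2>&1; then
        curl -L -O "$1"
    else
        echo "Neither wget nor curl is installed" >&2
        return 1
    fi
}

# Steps 1-5 of this tutorial, chained with && so a failure stops the sequence.
run_example() {
    mkdir -p ~/orthofinder_tutorial &&
    cd ~/orthofinder_tutorial &&
    download "$OF_URL" &&
    tar xzvf OrthoFinder.tar.gz &&
    cd OrthoFinder/ &&
    ./orthofinder -h &&
    ./orthofinder -f ExampleData/
}
```

After pasting the definitions into a terminal, typing `run_example` runs the steps in order. If you installed OrthoFinder with Bioconda, replace `./orthofinder` with `orthofinder` in the last two lines.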
    

A quick look at the results

When you run OrthoFinder you should get something like this:

~/orthofinder_tutorial$ ./orthofinder -f ExampleData/

OrthoFinder version 2.3.7 Copyright (C) 2014 David Emms

2019-10-23 11:12:56 : Starting OrthoFinder
48 thread(s) for highly parallel tasks (BLAST searches etc.)
1 thread(s) for OrthoFinder algorithm

Checking required programs are installed
----------------------------------------
Test can run "mcl -h" - ok
Test can run "fastme -i /home/emms/orthofinder_tutorial/ExampleDataset/OrthoFinder/Results_Oct23/WorkingDirectory/SimpleTest.phy -o /home/emms/orthofinder_tutorial/ExampleDataset/OrthoFinder/Results_Oct23/WorkingDirectory/SimpleTest.tre" - ok

Dividing up work for BLAST for parallel processing
--------------------------------------------------
2019-10-23 11:12:56 : Creating diamond database 1 of 4
2019-10-23 11:12:56 : Creating diamond database 2 of 4
2019-10-23 11:12:56 : Creating diamond database 3 of 4
2019-10-23 11:12:56 : Creating diamond database 4 of 4

Running diamond all-versus-all
------------------------------
Using 48 thread(s)
2019-10-23 11:12:56 : This may take some time....
2019-10-23 11:13:05 : Done all-versus-all sequence search

Running OrthoFinder algorithm
-----------------------------
2019-10-23 11:13:05 : Initial processing of each species
2019-10-23 11:13:05 : Initial processing of species 0 complete
2019-10-23 11:13:05 : Initial processing of species 1 complete
2019-10-23 11:13:06 : Initial processing of species 2 complete
2019-10-23 11:13:06 : Initial processing of species 3 complete
2019-10-23 11:13:08 : Connected putative homologues
2019-10-23 11:13:08 : Written final scores for species 0 to graph file
2019-10-23 11:13:08 : Written final scores for species 1 to graph file
2019-10-23 11:13:08 : Written final scores for species 2 to graph file
2019-10-23 11:13:09 : Written final scores for species 3 to graph file
2019-10-23 11:13:09 : Ran MCL

Writing orthogroups to file
---------------------------
OrthoFinder assigned 2202 genes (80.6% of total) to 604 orthogroups. Fifty percent of all genes were in orthogroups with 4 or more genes (G50 was 4) and were contained in the largest 281 orthogroups (O50 was 281). There were 269 orthogroups with all species present and 246 of these consisted entirely of single-copy genes.

2019-10-23 11:13:15 : Done orthogroups

Analysing Orthogroups
=====================

Calculating gene distances
--------------------------
2019-10-23 11:13:19 : Done

Inferring gene and species trees
--------------------------------
2019-10-23 11:13:19 : Done 0 of 325
2019-10-23 11:13:19 : Done 100 of 325
2019-10-23 11:13:19 : Done 200 of 325

269 trees had all species present and will be used by STAG to infer the species tree

Best outgroup(s) for species tree
---------------------------------
2019-10-23 11:13:27 : Starting STRIDE
2019-10-23 11:13:28 : Done STRIDE
Observed 2 well-supported, non-terminal duplications. 2 support the best roots and 0 contradict them.
Best outgroups for species tree:
  Mycoplasma_hyopneumoniae
  Mycoplasma_agalactiae, Mycoplasma_hyopneumoniae
  Mycoplasma_agalactiae

WARNING: Multiple potential species tree roots were identified, only one will be analyed.

Reconciling gene trees and species tree
---------------------------------------
Outgroup: Mycoplasma_hyopneumoniae
2019-10-23 11:13:28 : Starting Recon and orthologues
2019-10-23 11:13:28 : Starting OF Orthologues
2019-10-23 11:13:28 : Done 0 of 325
2019-10-23 11:13:29 : Done 100 of 325
2019-10-23 11:13:30 : Done 200 of 325
2019-10-23 11:13:32 : Done 300 of 325
2019-10-23 11:13:32 : Done OF Orthologues
2019-10-23 11:13:32 : Done Recon

Writing results files
=====================
2019-10-23 11:13:32 : Done orthologues

Results:
    /home/emms/orthofinder_tutorial/ExampleDataset/OrthoFinder/Results_Oct23/

CITATION:
 When publishing work that uses OrthoFinder please cite:
 Emms D.M. & Kelly S. (2015), Genome Biology 16:157

 If you use the species tree in your work then please also cite:
 Emms D.M. & Kelly S. (2017), MBE 34(12): 3267-3278
 Emms D.M. & Kelly S. (2018), bioRxiv https://doi.org/10.1101/267914

OrthoFinder creates a directory inside the directory containing your input files and puts all the results there, e.g. ExampleData/OrthoFinder/Results_Oct11. This is what the results directory looks like:

[Figure: contents of the results directory]

If everything worked, you should have a similar-looking results directory. That’s it, we’re done!
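
To check the results directory from the command line, something like the following may help. It is a sketch that assumes the results sit under an `OrthoFinder/` directory inside your input directory, in directories whose names start with `Results_`, as in the example above; `latest_results` is a hypothetical helper name.

```shell
# Print the most recently modified Results_* directory under <input_dir>/OrthoFinder.
# Prints nothing if no such directory exists.
latest_results() {
    ls -td "$1"/OrthoFinder/Results_*/ 2>/dev/null | head -n 1
}

# Example, run from the OrthoFinder directory:
#   ls "$(latest_results ExampleData)"
```

Sorting by modification time rather than by name means the newest run is picked up even when several results directories exist.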

In the next tutorial (Running an example OrthoFinder analysis) we will look at how to prepare and run our own analysis. After that, there’s a tutorial showing how to explore all the results.