This article is part of the supplement: Ninth International Conference on Bioinformatics (InCoB2010): Immunome Research

Open Access Proceedings

Clustering-based identification of clonally-related immunoglobulin gene sequence sets

Zhiliang Chen1, Andrew M Collins2, Yan Wang2 and Bruno A Gaëta1*

Author Affiliations

1 School of Computer Science and Engineering, University of New South Wales, NSW 2052, Australia

2 School of Biotechnology and Biomolecular Sciences, University of New South Wales, NSW 2052, Australia

Immunome Research 2010, 6(Suppl 1):S4  doi:10.1186/1745-7580-6-S1-S4

Published: 27 September 2010

Abstract

Background

Clonal expansion of B lymphocytes, coupled with somatic mutation and antigen selection, allows the mammalian humoral immune system to generate highly specific immunoglobulins (IG), or antibodies, against invading bacteria, viruses and toxins. The availability of high-throughput DNA sequencing methods is providing new avenues for studying this clonal expansion and identifying the factors guiding the generation of antibodies. Identifying groups of rearranged immunoglobulin gene sequences descended from the same rearrangement (clonally-related sets) in very large sets of sequences is facilitated by immunoglobulin gene sequence alignment and partitioning software that can accurately predict the component germline genes, but it has required painstaking visual inspection and analysis of the sequences.

Results

We have developed and implemented an algorithm for identifying sets of clonally-related sequences in large human immunoglobulin heavy chain gene variable region sequence sets. The program processes sequences that have been partitioned using iHMMune-align, and uses pairwise comparisons of CDR3 sequences and similarity in IGHV and IGHJ germline gene assignments to construct a distance matrix. Agglomerative hierarchical clustering is then used to identify likely groups of clonally-related sequences. The program is available for download from http://www.cse.unsw.edu.au/~ihmmune/ClonalRelate/ClonalRelate.zip.
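
The general shape of such a pipeline can be illustrated with a minimal sketch. It is not the ClonalRelate implementation: the abstract does not state the distance formula, the weights, the linkage criterion or the cut-off, so the normalised edit distance on CDR3, the fixed `gene_penalty` for differing IGHV or IGHJ assignments, average linkage and the `threshold` value are all assumptions chosen for illustration. The record fields (`ighv`, `ighj`, `cdr3`) and the function names are likewise hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]


def pair_distance(s, t, gene_penalty=0.5):
    """Assumed distance: CDR3 edit distance normalised by the longer CDR3,
    plus a fixed penalty for each differing germline assignment."""
    longest = max(len(s["cdr3"]), len(t["cdr3"])) or 1
    d = edit_distance(s["cdr3"], t["cdr3"]) / longest
    d += gene_penalty * (s["ighv"] != t["ighv"])
    d += gene_penalty * (s["ighj"] != t["ighj"])
    return d


def cluster_clones(seqs, threshold=0.2):
    """Average-linkage agglomerative clustering over the pairwise distances.
    Merging stops once the closest pair of clusters is farther apart than
    `threshold` (an assumed cut-off). Returns lists of indices into `seqs`."""
    n = len(seqs)
    dist = {(i, j): pair_distance(seqs[i], seqs[j])
            for i, j in combinations(range(n), 2)}
    clusters = [[i] for i in range(n)]

    def linkage(a, b):
        # Mean distance over all cross-cluster pairs of sequences.
        total = sum(dist[min(i, j), max(i, j)] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[x], clusters[y]) > threshold:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]


# Toy input: made-up records, not data from the study.
example = [
    {"ighv": "IGHV3-23", "ighj": "IGHJ4", "cdr3": "GCGAAAGATCTGGGG"},
    {"ighv": "IGHV3-23", "ighj": "IGHJ4", "cdr3": "GCGAAAGATCTGGGA"},
    {"ighv": "IGHV1-69", "ighj": "IGHJ6", "cdr3": "GCGAGAGGGTACTACTAC"},
]
groups = cluster_clones(example)
```

With these toy records, the first two sequences share both germline assignments and differ at a single CDR3 position, so they fall into one group, while the third remains on its own. The naive merge loop recomputes linkages on every pass and is only suitable for small inputs; a real analysis of high-throughput data would need a more efficient clustering routine.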

Conclusions

The method was evaluated on several benchmark datasets and identified clonally-related immunoglobulin gene sequences more accurately and considerably faster than visual inspection by domain experts.