Mining Big Biological Data Sets: Applications of Novel Clustering Methods for Single Cell Genomics, Cancer, and Developmental Studies
The human body consists of approximately 10^15 cells. Although all of these cells share almost identical DNA content, they present a wide range of phenotypic characteristics: neurons, blood cells, and muscle cells have completely different functions and behave very differently from one another. This diversity starts at the very first stages of development, when the fertilized egg, which is a single cell, divides and differentiates to give rise to many different cells that together create a complete organism. Even within a specific tissue there is a high level of heterogeneity. For example, the brain, skin, and kidney in the human body consist of thousands of cells of various types that are continuously regenerating, dividing, differentiating, and communicating with each other using well-controlled gene circuits that are not fully understood. In recent years it has been realized that tumors are also heterogeneous, containing dangerous minority populations capable of re-creating the tumor.
All these different cell phenotypes are characterized, for the most part, by the different sets of genes that each individual cell expresses and that are related to its structure and function. Hence, the repertoire of expressed genes and their levels of expression can be used as a tool for distinguishing between different cell types and states.
Recent technological advances, taking advantage of the next-generation sequencing (NGS) revolution, enable us to measure, by RNA sequencing (RNA-seq), the entire gene expression profile (the “transcriptome”) of tissues and tumors down to single-cell resolution. These technologies are widely used for studying complex biological processes and diseases, since a pathological state is reflected in the cellular transcriptome.
The overwhelming amounts of data that these technologies produce require appropriate analysis tools to extract the most biologically meaningful results. One such tool is cluster analysis, whose task is to infer groups of similar objects. It addresses two fundamental challenges in biology: identifying and characterizing cell sub-populations in heterogeneous tissues and tumors, and identifying sets of related genes that act similarly under different conditions and may share similar functionality.
* The work was carried out towards the PhD degree in the Faculty of Engineering, Bar-Ilan University, under the supervision of Dr. Tomer Kalisky.