Bioinformatics


Bioinformatics is the discipline of extracting meaningful information from large volumes of biological data such as genome, transcriptome, and epigenome data.

When microarray technology was introduced in the mid-1990s, bioinformatics emerged as a new field aimed at handling biological data efficiently, and it has advanced rapidly over the roughly 20 years since.

Today, bioinformatics is essential for checking the quality of the raw data (FASTQ files) produced by next-generation sequencing (NGS) and for refining, processing, and converting that data into meaningful results.

Handling such biological big data requires interdisciplinary work spanning biology, statistics, mathematics, and computing, and many tools have been developed and put to use through the efforts of numerous researchers.
An equally important role of bioinformatics, however, is to understand the characteristics of the data produced and to select the tools suited to each.
When an NGS service is requested, the following basic analysis services are provided.

1. Whole Genome Sequencing

  • Quality inspection of the reads produced
  • Comparison and mapping against the reference sequence of the corresponding species
  • Extraction of variants that differ from the reference sequence of the corresponding species
  • Annotation of variant sequences
  • Report of statistics and major results

2. Whole Exome Sequencing (WES)

  • Quality inspection of the reads produced
  • Comparison and mapping against the reference sequence of the corresponding species
  • Extraction of variants that differ from the reference sequence of the corresponding species
  • Annotation of variant sequences
  • Report of statistics and major results

3. Whole Genome de novo Sequencing

  • Genome size prediction (k-mer analysis)
  • De novo assembly
  • Genome annotation
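
The genome size prediction step (k-mer analysis) listed above can be illustrated with a minimal Python sketch. It assumes the common approximation that genome size is roughly the total number of k-mers divided by the depth at the main peak of the k-mer histogram. The function names, the `min_depth` cutoff, and the use of plain in-memory strings are illustrative assumptions, not the tooling actually used for the service; real data would need a dedicated k-mer counter and heterozygosity-aware modelling.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def estimate_genome_size(kmer_counts, min_depth=2):
    """Estimate genome size as (total k-mers) / (peak k-mer depth).

    min_depth is a hypothetical cutoff that drops low-depth k-mers,
    which in real data are mostly sequencing errors.
    """
    # Histogram: depth -> number of distinct k-mers observed at that depth
    histogram = Counter(kmer_counts.values())
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Peak = depth shared by the most distinct k-mers (ties: higher depth)
    peak_depth = max(usable, key=lambda d: (usable[d], d))
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth
```

On a toy input where a 14-base sequence is read five times in full, `estimate_genome_size(count_kmers(reads, 4))` returns the number of distinct 4-mers in that sequence, which approximates its length for long genomes.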

4. RNA-Sequencing

  • Quality inspection
  • Mapping
  • Estimation of gene expression value
  • Analysis of significant genes
  • Analysis of major functions

5. de novo RNA-Sequencing

  • Quality inspection
  • Sequence Assembly & Annotation
  • Estimation of unigene expression value
  • Analysis of significant unigenes
  • Analysis of major functions

6. Small RNA-Sequencing

  • Quality inspection
  • Mapping
  • Estimation of microRNA expression value
  • Significant miRNA analysis
  • Prediction of significant miRNA target

7. Single Cell RNA-Sequencing

  • FASTQ Quality Control
  • Mapping & Cell Quality Control
  • Clustering & Differential Expression
  • Report

8. Whole Genome Bisulfite Sequencing

  • FASTQ Quality Control
  • Mapping
  • Cytosine extraction
  • Major methylation area analysis and annotation
  • Report

9. ChIP-Sequencing

  • FASTQ Quality Control
  • Mapping
  • ChIP quality inspection
  • Deduction of transcription factor or histone protein binding sites
  • Provision of annotations for each binding site
  • Result report that includes various visualizations

10. Metagenome Sequencing

  • Removal of host genome reads and quality inspection
  • Sequence Assembly
  • Gene prediction & annotation
  • Taxonomy analysis

11. 16S rDNA Metagenome

  • Quality inspection and removal of chimeric reads
  • Production of Operational Taxonomic Units (OTUs)
  • Taxonomy analysis

1. Whole Genome Sequencing

  • CNV: Copy Number Variation
    The human genome normally carries two copies (2n) of each region, one allele inherited from each parent. A copy number variation (CNV) occurs when one allele is lost (1n) or a region is duplicated (3n or more). An analysis of East Asian genomes found about 5,000 CNVs in 3.32% of the reference human genome, which is relatively frequent. Associations with various carcinomas have also been reported frequently. Theragen Etex provides CNV information based on basic genome analysis results.
  • SV: Structural Variation
    In addition to the gene deletions and insertions caused by CNV, some variations change the structure of genes, such as inversions and translocations. Theragen provides information on various SVs based on basic whole-genome analysis results.
  • Loss of Heterozygosity
    The human genome usually shows diversity arising from combinations of the two alleles (2n) inherited from the parents, but in some cases only one copy (1n) remains because a region of one allele has been lost. This phenomenon is referred to as “loss of heterozygosity.” Such loss is found particularly often in cancer tissues, and because it is closely related to cancer development, it needs to be analyzed in disease genomes.
  • Trio Analysis
    About 6,000 to 7,000 rare diseases are currently known, and the cause has been identified for only a tiny fraction of them. Intensive research using WES or WGS aims to identify the responsible genes by looking for variations that are found in patients with a genetic disease but not in unaffected members of their family. To pick out such variations from countless others, the analysis considers the function of each variation, its degree of sequence conservation, and its frequency in the general population. In this way, WES/WGS can provide a basis for the diagnosis and treatment of patients with genetic diseases.
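
The core filter behind trio analysis can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: variants are reduced to (chromosome, position, ref, alt) tuples, both parents are unaffected, only variants absent from both parents are kept, and the `max_freq` threshold of 1% is hypothetical. The `Variant` type and `candidate_variants` function are invented for this example; a real analysis would also weigh genotype, inheritance model, predicted function, and sequence conservation.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def candidate_variants(patient, father, mother, population_freq, max_freq=0.01):
    """Keep patient variants that are absent from both parents and rare.

    population_freq maps a Variant to its allele frequency in the general
    population; variants missing from the map are treated as frequency 0.
    """
    parental = set(father) | set(mother)
    candidates = []
    for variant in patient:
        if variant in parental:
            continue  # also carried by an unaffected parent
        if population_freq.get(variant, 0.0) > max_freq:
            continue  # too common to explain a rare disease
        candidates.append(variant)
    return candidates
```

The surviving variants form a short list for follow-up, not a diagnosis in themselves.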

2. Whole Genome de novo Sequencing

  • Analysis of orthologous gene cluster
    A homology analysis of the protein sequences of phylogenetically related species is conducted to identify groups of genes that appear to be functionally similar.
  • Construction of mitochondrial genome
    This service is available for genome sequencing data from fungal and vertebrate species, and it is carried out by constructing a mitochondrial database.

3. RNA-Sequencing

  • Fusion Gene
    Fusion genes are deduced from reads that span the junction between gene A and gene B. Fusion genes are a major factor in cancer development, and although a variety of tools exist for detecting them efficiently, it is difficult to achieve high accuracy with any single tool. The results of two or more tools are therefore analyzed from several angles before the final fusion genes are deduced.
  • Variant
    Because variations in DNA are carried into RNA by transcription, variations in RNA can be analyzed in the same way as in a genomic analysis. Variations in RNA-Seq data can be analyzed using Samtools.
  • Gene Set Enrichment Analysis (GSEA)
    Functions associated with specific conditions can be predicted using the GSEA tool developed by the Broad Institute. Unlike an analysis of significant genes alone, GSEA analyzes the expression values of all genes, which makes it suitable when few significant genes are identified. The various signature collections provided by the Broad Institute allow broader functional research.
  • Pathway Analysis
    Pathway analysis tests whether the genes that show significant expression changes under specific conditions are involved in particular pathways. For the widely used KEGG pathways, pathway analysis can be provided to researchers who have obtained a license.
  • Heatmap Analysis
    Heatmaps are an effective way to visualize the expression patterns of significant genes. The service provides either one-way hierarchical clustering, which groups genes with similar expression patterns, or two-way hierarchical clustering, which additionally groups samples by similarity, and the resulting clusters are visualized as a heatmap.
  • Time-Series Analysis
    For time-series data, the change in gene expression over each time interval is reported in three groups: increase, no change, and decrease. A gene ontology analysis of the genes in each group is then conducted to help researchers understand the representative function of each group.
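
The three-way grouping used in the time-series analysis can be sketched as follows. This is a minimal Python illustration, assuming that the change between consecutive time points is measured as a log2 fold change and compared with a fixed threshold. The `threshold` and `pseudocount` values and the function names are assumptions made for this example; the service may rely on statistical testing instead of a fixed cutoff.

```python
import math


def classify_change(before, after, threshold=1.0, pseudocount=1.0):
    """Label one interval by its log2 fold change.

    The pseudocount avoids division by zero for unexpressed genes.
    """
    log2_fc = math.log2((after + pseudocount) / (before + pseudocount))
    if log2_fc >= threshold:
        return "increase"
    if log2_fc <= -threshold:
        return "decrease"
    return "no change"


def classify_time_series(expression, threshold=1.0):
    """Map each gene to one label per pair of consecutive time points.

    expression: gene name -> expression values ordered by time.
    """
    return {
        gene: [
            classify_change(a, b, threshold)
            for a, b in zip(values, values[1:])
        ]
        for gene, values in expression.items()
    }
```

Genes that share the same label sequence can then be pooled and passed to a gene ontology analysis.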

4. mRNA-miRNA Integrated Analysis

  • If expression results are available for the target genes of significant miRNAs, an integrated analysis result is provided only for miRNA-gene pairs whose expression levels are correlated in opposite directions.
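
Selecting miRNA-gene pairs whose expression moves in opposite directions can be sketched with a Pearson correlation filter. This is a minimal Python illustration under stated assumptions: expression values are given per sample in the same sample order for miRNAs and genes, predicted targets are supplied as a mapping, and the cutoff `max_r = -0.8` is hypothetical. All function and parameter names are invented for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return cov / (sd_x * sd_y)


def anticorrelated_pairs(mirna_expr, gene_expr, targets, max_r=-0.8):
    """Return (miRNA, gene, r) for target pairs with r <= max_r.

    targets maps each miRNA to its predicted target genes.
    """
    pairs = []
    for mirna, genes in targets.items():
        for gene in genes:
            if mirna not in mirna_expr or gene not in gene_expr:
                continue  # no expression data for this pair
            r = pearson(mirna_expr[mirna], gene_expr[gene])
            if r <= max_r:
                pairs.append((mirna, gene, r))
    return pairs
```

A pair is kept only when the gene's expression falls as the miRNA's expression rises (or the reverse), which matches the expected repressive effect of a miRNA on its target.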

5. Metagenome Advanced Analytics

  • Gene Cluster
    Genes with high sequence similarity are grouped into clusters by taxon through a homology search. From the selected gene clusters, those that have abundant control cases are then sorted out.