Bioinformatics

Bioinformatics is a technology for extracting meaningful information from large volumes of biological data such as genome, transcriptome, and epigenome data.

When microarray technology was introduced in the mid-1990s, bioinformatics emerged as a new field aimed at handling biological data efficiently, and it has advanced rapidly over the roughly 20 years since.

Currently, bioinformatics is essential for processes such as inspecting the quality of the raw data (FASTQ files) produced by next-generation sequencing (NGS), and refining, processing, and converting that data into meaningful results.

To handle such biological big data, interdisciplinary expertise spanning biology, statistics, mathematics, and computing is needed, and a wide range of tools has been developed through the efforts of many researchers.
However, an important role of bioinformatics is also to understand the characteristics of the data produced and to select the tools suited to those characteristics.
When an NGS service is requested, the following basic analysis services are provided.

1. Whole Genome Sequencing

  • Quality inspection of the reads produced
  • Comparison and mapping against the reference sequence of the corresponding species
  • Extraction of variants that differ from the reference sequence of the corresponding species
  • Annotation of variant sequences
  • Report of statistics and major results

2. Whole Exome Sequencing (WES)

  • Quality inspection of the reads produced
  • Comparison and mapping against the reference sequence of the corresponding species
  • Extraction of variants that differ from the reference sequence of the corresponding species
  • Annotation of variant sequences
  • Report of statistics and major results

3. Whole Genome de novo Sequencing

  • Genome size prediction (k-mer analysis)
  • De novo assembly
  • Genome annotation
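
As a rough illustration of the k-mer analysis step, genome size is commonly approximated as the total number of k-mers divided by the depth at the main peak of the k-mer depth histogram. The sketch below assumes error k-mers can be discarded with a simple depth cutoff (`min_depth`, a hypothetical parameter) and does not merge reverse-complement k-mers; it is not the pipeline used for the service.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def estimate_genome_size(kmer_counts, min_depth=2):
    """Estimate genome size as (total k-mers) / (peak k-mer depth).

    k-mers seen fewer than `min_depth` times are treated as sequencing
    errors and ignored (an assumption of this sketch).
    """
    # Histogram: depth -> number of distinct k-mers observed at that depth.
    histogram = Counter(kmer_counts.values())
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # The peak is the depth shared by the largest number of distinct k-mers.
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth
```

For a repeat-free toy genome sequenced end to end several times, the estimate equals the number of distinct k-mers, which approaches the genome length when the genome is much longer than k.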

4. RNA-Sequencing

  • Quality inspection
  • Mapping
  • Estimation of gene expression value
  • Significant gene analysis
  • Analysis of major functions

5. de novo RNA-Sequencing

  • Quality inspection
  • Sequence Assembly & Annotation
  • Estimation of unigene expression value
  • Significant unigene analysis
  • Analysis of major functions

6. Small RNA-Sequencing

  • Quality inspection
  • Mapping
  • Estimation of microRNA expression value
  • Significant miRNA analysis
  • Prediction of targets of significant miRNAs

7. Single Cell RNA-Sequencing

  • FASTQ Quality Control
  • Mapping & Cell Quality Control
  • Clustering & Differential Expression
  • Report

8. Whole Genome Bisulfite Sequencing

  • FASTQ Quality Control
  • Mapping
  • Cytosine extraction
  • Major methylation area analysis and annotation
  • Report

9. ChIP-Sequencing

  • FASTQ Quality Control
  • Mapping
  • ChIP quality inspection
  • Inference of transcription factor or histone protein binding sites
  • Provision of annotations for each binding site
  • Result report that includes various visualizations

10. Metagenome Sequencing

  • Removal of host genome reads and quality inspection
  • Sequence Assembly
  • Gene prediction & annotation
  • Taxonomy analysis

11. 16S rDNA Metagenome

  • Quality inspection and removal of chimeric reads
  • Generation of Operational Taxonomic Units (OTUs)
  • Taxonomy analysis

In addition to the basic analyses above, the following additional analyses are offered.

1. Whole Genome Sequencing

  • CNV (Copy Number Variation)
  • SV (Structural Variation)
  • Loss of Heterozygosity
  • Trio Analysis
  • Insertion DNA site search and CRISPR-Cas9 target site variant detection
    When genome modification (GM) was used to change traits in animals or plants, there was previously no system for verifying possible side effects. To enable such verification, Theragen Bio developed an analysis method that locates the insertion site of a foreign gene. The method checks whether the insertion position damages the function of existing genes and whether the foreign gene can be inserted safely.
    In addition, CRISPR, a technology that uses the sequence of a guide RNA (gRNA) to edit the genome at a user-specified site (on-target genome editing), has been introduced recently. However, side effects caused by editing at similar sites (off-target sites) have been reported. Therefore, to verify CRISPR target sites, Theragen Bio developed a verification system that checks whether editing occurs at the desired site (on-target site) and whether it is absent from unwanted similar sites (off-target sites).
  • GWAS (Genome-Wide Association Study)
    A genome-wide association study (GWAS) is a research method that investigates genetic factors related to diseases and drug response. Recently, it has also been used to study trait-related factors for plant and animal breeding. Based on the premise that genetic polymorphism underlies trait diversity, GWAS analyzes the association between qualitative or quantitative traits and genetic polymorphisms (single-nucleotide polymorphisms, insertion/deletion polymorphisms, etc.).
  • Phylogenetic and Structure Analysis
    Phylogenetic analysis traces evolutionary paths and classifies species by using genetic differences. It relies on non-synonymous mutations, that is, changes in protein sequences that are otherwise well conserved. Theragen Bio extracts non-synonymous mutations, performs multiple alignment, and provides the phylogenetic analysis results calculated from that alignment.
    Structure analysis is an algorithm that infers the structure of a population from large-scale SNP genotype data, and it is suitable for classifying population groups by their genetic differences. Theragen Bio uses large-scale SNP genotype data to classify the genetic characteristics of each population group and provides the results.
  • SNP or InDel Primer Design (excluding repeat and multi-locus)
    Primers are designed so that SNPs, InDels, or other genetic markers discovered through genome analysis can be verified with PCR or Sanger sequencing. However, if a primer is designed in a repeat region or a multi-locus region, unintended regions are amplified and sequenced, and verification cannot be performed. Theragen Bio has built a pipeline that designs primers for a large number of loci and, by testing for repeat regions and multiple loci, excludes primers that cannot be used for verification, so that only high-quality primer designs are provided.
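
The multi-locus check described for primer design can be sketched as follows. This is a minimal illustration, not Theragen Bio's pipeline: it assumes exact string matching against a single reference sequence, whereas a production pipeline would typically allow mismatches and use an alignment tool. The function names are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_sites(primer, genome):
    """Count exact primer matches on both strands, including overlapping ones."""
    total = 0
    # A set avoids double counting when a primer equals its reverse complement.
    for query in {primer, reverse_complement(primer)}:
        start = genome.find(query)
        while start != -1:
            total += 1
            start = genome.find(query, start + 1)
    return total


def filter_unique_primers(primers, genome):
    """Keep only primers that match exactly one site in the genome."""
    return [p for p in primers if count_sites(p, genome) == 1]
```

A primer that falls inside a tandem repeat matches several positions and is dropped, while a primer with a single binding site on either strand is kept.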

2. Whole-Genome De Novo Sequencing

  • Evolution Analysis (orthologous gene cluster, phylogenetic analysis, contraction & expansion)
    When the reference genome of an animal or plant is analyzed and genome assembly and annotation are complete, an evolution analysis is usually performed to check how the reference genome differs from those of other species and which characteristics and related genes stand out in the comparison. Theragen Bio conducts evolution analysis in three steps. In the first step, orthologous gene clusters are analyzed across the reference genomes of different species to identify the gene clusters that are specific to a species and those that are shared with other species. In the second step, phylogenetic analysis is conducted using well-conserved genes, and the genetic distance between species is calculated. In the last step, cluster contraction and expansion are analyzed statistically in silico by using the genetic distance and the number of genes in each cluster.
  • Construction of Mitochondrial Genome
    In eukaryotes, mitochondria exist as independent organelles within the cell and carry their own genome, which consists of DNA like the nuclear genome. Accordingly, DNA sequencing can reveal the structure of the mitochondrial genome, and the mitochondrial genome can be assembled for phylogenetic analysis of species. Theragen Bio performs whole-genome sequencing of the species under analysis and a homology search against a mitochondrial database. The mitochondrial genome is then produced by assembling the sequences that show a high rate of similarity to mitochondrial sequences.
  • Pan-genome Analysis
    To estimate the genetic and evolutionary differences among prokaryotic genomes, orthologous gene clusters are identified across species and strains, and the genomes are characterized by their genetic differences and by gene copy number per function.
  • Polymorphic SSR Search
    Polymorphic SSR Search is an analysis that finds SSR (microsatellite) markers associated with phenotypes in animal and plant genomes. The aim is to locate SSRs whose number of repeated sequence motifs differs between phenotypes. Because NGS can produce DNA sequences in large amounts, it can be used as a tool for developing SSR (microsatellite) markers. Theragen Bio therefore identifies candidate SSR markers in silico after producing DNA sequences, and then designs primers accordingly. Primers that could generate PCR products from regions other than the target repeat are removed. In this way, Theragen Bio identifies candidate SSR markers with a high success rate.
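
The idea behind the polymorphic SSR search can be sketched with a regular expression that finds perfect tandem repeats and a comparison of repeat counts between two samples. This is an illustrative sketch under simplifying assumptions (perfect repeats only, sequences already grouped by locus, hypothetical function names and thresholds), not the in silico method used by Theragen Bio.

```python
import re


def find_ssrs(seq, min_motif=2, max_motif=6, min_repeats=3):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    The lazy quantifier makes the regex prefer the shortest motif at each
    position; a mononucleotide run such as AAAAAA is reported with motif "AA".
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    return [
        (m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
        for m in pattern.finditer(seq)
    ]


def repeat_counts(seq):
    """Map each motif to its largest repeat count in the sequence."""
    counts = {}
    for _, motif, n in find_ssrs(seq):
        counts[motif] = max(n, counts.get(motif, 0))
    return counts


def polymorphic_ssrs(sample_a, sample_b):
    """List (locus, motif, count_a, count_b) where repeat counts differ.

    Each sample is a dict mapping a locus name to its DNA sequence.
    """
    hits = []
    for locus in sorted(sample_a.keys() & sample_b.keys()):
        a = repeat_counts(sample_a[locus])
        b = repeat_counts(sample_b[locus])
        for motif in sorted(a.keys() & b.keys()):
            if a[motif] != b[motif]:
                hits.append((locus, motif, a[motif], b[motif]))
    return hits
```

Loci reported by `polymorphic_ssrs` would be the candidates for primer design; loci with identical repeat counts in both samples are not informative as markers.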

3. Metagenome Sequencing

  • Machine Learning-Based Metagenome Analysis: Association Rule Mining
    The conventional approach of comparing the proportion of each cluster across environmental specimens after a metagenome analysis works only when the phenotypes differ cluster by cluster. By combining a random forest (a machine learning technique) with association rule mining, rules can be built from patterns of strain combinations, and the combination most closely associated with a phenotype can be derived. The method first uses a random forest to select the taxon level that best describes the phenotype. It then uses Fisher’s exact test to select the strains that differ between phenotypes. Finally, it uses CPAR (Classification based on Predictive Association Rules) to learn from the strain combinations and find the optimal one.
  • Report on Probiotics Strains
    This analysis checks for the presence of beneficial probiotics (FDA notice, 19 types). After a metagenome analysis, it determines the amount of each type of probiotic strain present in each specimen. Theragen Bio provides the profiling of probiotic strains in the form of a report.
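
The Fisher’s exact test step mentioned in the association rule mining workflow can be illustrated with a small standard-library sketch. It assumes each strain is summarized as a 2×2 table of presence/absence counts in two phenotype groups, which is an assumption made here for illustration; the random forest and CPAR steps are not shown.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Rows are the two phenotype groups; columns are strain present / absent.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    low, high = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is at least as unlikely as the observed one.
    p_value = sum(
        p for p in map(prob, range(low, high + 1)) if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


def differing_strains(tables, alpha=0.05):
    """Return strains whose presence differs between groups at level alpha."""
    return sorted(
        strain for strain, cells in tables.items()
        if fisher_exact_two_sided(*cells) < alpha
    )
```

With many strains tested at once, a multiple-testing correction would normally be applied before the selected strains are passed on to rule mining.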

4. RNA-Sequencing

  • Fusion Gene
  • Variant
  • Gene Set Enrichment Analysis (GSEA)
  • Pathway Analysis
  • Tissue-Specific Gene Search
    Given more than three samples, significant genes that are expressed specifically in one sample can be predicted. This analysis is well suited to distinguishing genes that are expressed at a specific developmental stage or in a specific tissue from the broader set of differentially expressed genes.
  • Heatmap Analysis
  • Time-Series Analysis

5. mRNA-miRNA Integrated Analysis

6. Predictive Analysis of Neo-Antigens

  • Since Dr. James Patrick Allison and Dr. Tasuku Honjo received the 2018 Nobel Prize in Physiology or Medicine for work that led to immune checkpoint inhibitors, third-generation immuno-oncology agents have been receiving a lot of attention. Checkpoint inhibition is one of various immunotherapy techniques that have been presented. Cancer vaccine treatment, which uses neo-antigens (antigens arising from mutations in the patient’s cancer tissue) to stimulate the patient’s immune system to treat cancer, is also emerging as a hot topic.
  • To predict neo-antigens, WES (Whole Exome Sequencing) and RNA-Seq data are produced by NGS from each patient’s normal tissue and cancer tissue. After the mutations found in the cancer tissue are identified, candidate neo-antigens are selected by checking that each mutation is actually expressed in RNA. From these candidates, the mutations predicted to bind the patient’s HLA with high affinity are selected and provided as the final candidate neo-antigens.
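
The first two stages of this workflow (keeping mutations with RNA support, then enumerating the mutant peptides that would be scored against HLA) can be sketched as below. The record fields, the read-count threshold, and the 9-residue peptide length are assumptions made for illustration; the HLA binding prediction itself requires a dedicated predictor and is not shown.

```python
def mutant_peptides(protein, position, alt_residue, length=9):
    """Return every peptide of `length` that covers a substituted residue.

    `position` is the 0-based index of the residue replaced by `alt_residue`.
    """
    mutated = protein[:position] + alt_residue + protein[position + 1:]
    first = max(0, position - length + 1)
    last = min(position, len(mutated) - length)
    return [mutated[s:s + length] for s in range(first, last + 1)]


def candidate_neoantigens(mutations, min_alt_rna_reads=5, length=9):
    """Keep expressed somatic mutations and list their candidate peptides.

    Each mutation is a dict with the hypothetical keys "gene", "protein",
    "position", "alt", and "rna_alt_reads" (RNA-Seq reads carrying the mutation).
    """
    candidates = {}
    for mut in mutations:
        if mut["rna_alt_reads"] < min_alt_rna_reads:
            continue  # mutation is not supported by RNA expression
        candidates[mut["gene"]] = mutant_peptides(
            mut["protein"], mut["position"], mut["alt"], length
        )
    return candidates
```

Each returned peptide would then be scored against the patient's HLA alleles, and only the predicted strong binders would be reported as final candidates.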