K-mers
   HOME
*



picture info

K-mers
In bioinformatics, ''k''-mers are substrings of length k contained within a biological sequence. Primarily used within the context of computational genomics and sequence analysis, in which ''k''-mers are composed of nucleotides (''i.e''. A, T, G, and C), ''k''-mers are capitalized upon to assemble DNA sequences, improve heterologous gene expression, identify species in metagenomic samples, and create attenuated vaccines. Usually, the term ''k''-mer refers to all of a sequence's subsequences of length k, such that the sequence AGAT would have four monomers (A, G, A, and T), three 2-mers (AG, GA, AT), two 3-mers (AGA and GAT) and one 4-mer (AGAT). More generally, a sequence of length L will have L - k + 1 ''k''-mers and n^ total possible ''k''-mers, where n is number of possible monomers (e.g. four in the case of DNA). Introduction ''k''-mers are simply length k subsequences. For example, all the possible ''k''-mers of a DNA sequence are shown below: A method of visualizin ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

K-mer Diagram
In bioinformatics, ''k''-mers are substrings of length k contained within a biological sequence. Primarily used within the context of computational genomics and sequence analysis, in which ''k''-mers are composed of nucleotides (''i.e''. A, T, G, and C), ''k''-mers are capitalized upon to assemble DNA sequences, improve heterologous gene expression, identify species in metagenomic samples, and create attenuated vaccines. Usually, the term ''k''-mer refers to all of a sequence's subsequences of length k, such that the sequence AGAT would have four monomers (A, G, A, and T), three 2-mers (AG, GA, AT), two 3-mers (AGA and GAT) and one 4-mer (AGAT). More generally, a sequence of length L will have L - k + 1 ''k''-mers and n^ total possible ''k''-mers, where n is number of possible monomers (e.g. four in the case of DNA). Introduction ''k''-mers are simply length k subsequences. For example, all the possible ''k''-mers of a DNA sequence are shown below: A method of visualizin ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




Computational Genomics
Computational genomics refers to the use of computational and statistical analysis to decipher biology from genome sequences and related data, including both DNA and RNA sequence as well as other "post-genomic" data (i.e., experimental data obtained with technologies that require the genome sequence, such as genomic DNA microarrays). These, in combination with computational and statistical approaches to understanding the function of the genes and statistical association analysis, this field is also often referred to as Computational and Statistical Genetics/genomics. As such, computational genomics may be regarded as a subset of bioinformatics and computational biology, but with a focus on using whole genomes (rather than individual genes) to understand the principles of how the DNA of a species controls its biology at the molecular level and beyond. With the current abundance of massive biological datasets, computational studies have become one of the most important means to biologi ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Sequence Assembly
In bioinformatics, sequence assembly refers to aligning and merging fragments from a longer DNA sequence in order to reconstruct the original sequence. This is needed as DNA sequencing technology might not be able to 'read' whole genomes in one go, but rather reads small pieces of between 20 and 30,000 bases, depending on the technology used. Typically, the short fragments (reads) result from shotgun sequencing genomic DNA, or gene transcript ( ESTs). The problem of sequence assembly can be compared to taking many copies of a book, passing each of them through a shredder with a different cutter, and piecing the text of the book back together just by looking at the shredded pieces. Besides the obvious difficulty of this task, there are some extra practical issues: the original may have many repeated paragraphs, and some shreds may be modified during shredding to have typos. Excerpts from another book may also be added in, and some shreds may be completely unrecognizable. Genom ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Binning (metagenomics)
In metagenomics, binning is the process of grouping reads or contigs and assigning them to individual genome. Binning methods can be based on either compositional features or alignment (similarity), or both. Introduction Metagenomic samples can contain reads from a huge number of organisms. For example, in a single gram of soil, there can be up to 18000 different types of organisms, each with its own genome. Metagenomic studies sample DNA from the whole community, and make it available as nucleotide sequences of certain length. In most cases, the incomplete nature of the obtained sequences makes it hard to assemble individual genes, much less recovering the full genomes of each organism. Thus, binning techniques represent a "best effort" to identify reads or contigs within certain genome known as Metagenome Assembled Genome (MAG). Taxonomy of MAGs can be inferred through placement into reference phylogenetic tree using algorithms like GTDB-Tk. The first studies that sampled DN ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Methylation
In the chemical sciences, methylation denotes the addition of a methyl group on a substrate, or the substitution of an atom (or group) by a methyl group. Methylation is a form of alkylation, with a methyl group replacing a hydrogen atom. These terms are commonly used in chemistry, biochemistry, soil science, and the biological sciences. In biological systems, methylation is catalyzed by enzymes; such methylation can be involved in modification of heavy metals, regulation of gene expression, regulation of protein function, and RNA processing. In vitro methylation of tissue samples is also one method for reducing certain histological staining artifacts. The reverse of methylation is demethylation. In biology In biological systems, methylation is accomplished by enzymes. Methylation can modify heavy metals, regulate gene expression, RNA processing and protein function. It has been recognized as a key process underlying epigenetics. Methanogenesis Methanogenesis, the process th ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Deamination
Deamination is the removal of an amino group from a molecule. Enzymes that catalyse this reaction are called deaminases. In the human body, deamination takes place primarily in the liver, however it can also occur in the kidney. In situations of excess protein intake, deamination is used to break down amino acids for energy. The amino group is removed from the amino acid and converted to ammonia. The rest of the amino acid is made up of mostly carbon and hydrogen, and is recycled or oxidized for energy. Ammonia is toxic to the human system, and enzymes convert it to urea or uric acid by addition of carbon dioxide molecules (which is not considered a deamination process) in the urea cycle, which also takes place in the liver. Urea and uric acid can safely diffuse into the blood and then be excreted in urine. Deamination reactions in DNA Cytosine Spontaneous deamination is the hydrolysis reaction of cytosine into uracil, releasing ammonia in the process. This can occur in vitro thr ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Amino Acid
Amino acids are organic compounds that contain both amino and carboxylic acid functional groups. Although hundreds of amino acids exist in nature, by far the most important are the alpha-amino acids, which comprise proteins. Only 22 alpha amino acids appear in the genetic code. Amino acids can be classified according to the locations of the core structural functional groups, as Alpha and beta carbon, alpha- , beta- , gamma- or delta- amino acids; other categories relate to Chemical polarity, polarity, ionization, and side chain group type (aliphatic, Open-chain compound, acyclic, aromatic, containing hydroxyl or sulfur, etc.). In the form of proteins, amino acid '' residues'' form the second-largest component (water being the largest) of human muscles and other tissues. Beyond their role as residues in proteins, amino acids participate in a number of processes such as neurotransmitter transport and biosynthesis. It is thought that they played a key role in enabling life ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

CpG Site
The CpG sites or CG sites are regions of DNA where a cytosine nucleotide is followed by a guanine nucleotide in the linear sequence of bases along its 5' → 3' direction. CpG sites occur with high frequency in genomic regions called CpG islands (or CG islands). Cytosines in CpG dinucleotides can be methylated to form 5-methylcytosines. Enzymes that add a methyl group are called DNA methyltransferases. In mammals, 70% to 80% of CpG cytosines are methylated. Methylating the cytosine within a gene can change its expression, a mechanism that is part of a larger field of science studying gene regulation that is called epigenetics. Methylated cytosines often mutate to thymines. In humans, about 70% of promoters located near the transcription start site of a gene (proximal promoters) contain a CpG island. CpG characteristics Definition ''CpG'' is shorthand for ''5'—C—phosphate—G—3' '', that is, cytosine and guanine separated by only one phosphate group; phosphate ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Single Nucleotide Polymorphism
In genetics, a single-nucleotide polymorphism (SNP ; plural SNPs ) is a germline substitution of a single nucleotide at a specific position in the genome. Although certain definitions require the substitution to be present in a sufficiently large fraction of the population (e.g. 1% or more), many publications do not apply such a frequency threshold. For example, at a specific base position in the human genome, the G nucleotide may appear in most individuals, but in a minority of individuals, the position is occupied by an A. This means that there is a SNP at this specific position, and the two possible nucleotide variations – G or A – are said to be the alleles for this specific position. SNPs pinpoint differences in our susceptibility to a wide range of diseases, for example age-related macular degeneration (a common SNP in the CFH gene is associated with increased risk of the disease) or nonalcoholic fatty liver disease (a SNP in the PNPLA3 gene is associated with incr ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


CG Suppression
CG suppression is a term for the phenomenon that CG dinucleotides are very uncommon in most portions of vertebrate genomes. In adult somatic tissues, cytosine residues may be methylated, and this occurs almost exclusively within a symmetric CpG context. Methylated C residues spontaneously deaminate to form T residues; hence CpG dinucleotides steadily mutate to TpG dinucleotides, which gives rise to the under-representation of CpG dinucleotides in the human genome (they occur at only 21% of the expected frequency). (On the other hand, spontaneous deamination of unmethylated C residues gives rise to U residues, a mutation that is quickly recognized and repaired by the cell). In human and mouse, CGs are the least frequent dinucleotide, making up less than 1% of all dinucleotides. GCs are the second most infrequent, making up more than 4% of all dinucleotides, so CGs are more than fourfold less frequent than all other dinucleotides. See also *CpG island The CpG sites or CG ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Non-coding DNA
Non-coding DNA (ncDNA) sequences are components of an organism's DNA that do not encode protein sequences. Some non-coding DNA is transcribed into functional non-coding RNA molecules (e.g. transfer RNA, microRNA, piRNA, ribosomal RNA, and regulatory RNAs). Other functional regions of the non-coding DNA fraction include regulatory sequences that control gene expression; scaffold attachment regions; origins of DNA replication; centromeres; and telomeres. Some non-coding regions appear to be mostly nonfunctional such as introns, pseudogenes, intergenic DNA, and fragments of transposons and viruses. Fraction of non-coding genomic DNA In bacteria, the coding regions typically take up 88 % of the genome. The remaining 12 % consists largely of non-coding genes and regulatory sequences, which means that almost all of the bacterial genome has a function. The amount of coding DNA in eukaryrotes is usually a much smaller fraction of the genome because eukaryotic genomes contai ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Coding Region
The coding region of a gene, also known as the coding sequence (CDS), is the portion of a gene's DNA or RNA that codes for protein. Studying the length, composition, regulation, splicing, structures, and functions of coding regions compared to non-coding regions over different species and time periods can provide a significant amount of important information regarding gene organization and evolution of prokaryotes and eukaryotes. This can further assist in mapping the human genome and developing gene therapy. Definition Although this term is also sometimes used interchangeably with exon, it is not the exact same thing: the exon is composed of the coding region as well as the 3' and 5' untranslated regions of the RNA, and so therefore, an exon would be partially made up of coding regions. The 3' and 5' untranslated regions of the RNA, which do not code for protein, are termed non-coding regions and are not discussed on this page. There is often confusion between coding region ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]