Uncharacterized protein C1orf131 is a protein that in humans is encoded by the gene ''C1orf131''. The first ortholog of this protein was discovered in humans. Homologs of C1orf131 have since been identified in numerous species through algorithms and bioinformatics, and as a result most proteins in this family are named Uncharacterized protein C1orf131 homolog.
Gene
In humans, ''C1orf131'' is located on the minus strand of chromosome 1 at cytogenetic band 1q42.2, along with 193 other genes. Notably, the gene upstream of ''C1orf131'' is ''GNPAT'', and the gene downstream of ''C1orf131'' is ''TRIM67''. When transcribed in humans, ''C1orf131'' most often forms an mRNA 1458 base pairs long, composed of seven exons. There are at least nine other alternative splice forms in humans that produce proteins. The transcripts range in size from 129 base pairs (2 exons) to 1458 base pairs (7 exons).
Protein
Proteins in the C1orf131 family are between 93 and 450 amino acids long; however, the majority are between 160 and 295 amino acids long. Their molecular weights range from 10.6 to 49.0 kDa, with the majority between 18.6 and 32.7 kDa, and their isoelectric points range from 9.6 to 11.2. Over 30 orthologs from mammals, birds and lizards have been identified as having a poly(A) RNA binding site.
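The relationship between chain length and molecular weight quoted above can be roughly checked with a rule-of-thumb average residue mass. The sketch below assumes about 110 Da per residue, a common approximation that is not taken from the source. The function name `estimate_mass_kda` is hypothetical. Real masses depend on the actual amino acid composition, so the estimates only approximate the reported values.

```python
# Assumption: ~110 Da is a rule-of-thumb average mass per amino acid residue.
AVG_RESIDUE_MASS_DA = 110.0


def estimate_mass_kda(n_residues: int) -> float:
    """Return a rough molecular weight in kDa for a chain of n_residues."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    # Lengths quoted in the text for the C1orf131 family.
    for length in (93, 160, 295, 450):
        print(f"{length} residues: about {estimate_mass_kda(length):.1f} kDa")
```

The estimates land close to the reported 10.6 to 49.0 kDa range, which suggests the quoted lengths and weights are mutually consistent.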
All orthologs in this protein family have a domain of unknown function, DUF4602.
The human protein has been shown to be both phosphorylated and acetylated.
These proteins are rich in lysine, in charged amino acids (D, E, H, K, R), and in basic charged amino acids (H, K, R). The secondary structure of these proteins consists primarily of alpha helices and coils, with a small percentage of beta strands. C1orf131 has been shown to interact with ubiquitin, detected by affinity capture followed by mass spectrometry, and with APP (amyloid beta (A4) precursor protein), detected by reconstituted complex.
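The compositional classes mentioned above (lysine, charged residues D E H K R, basic residues H K R) can be quantified as simple fractions of a sequence. The following is a minimal sketch. The function name `residue_fraction` and the toy sequence are hypothetical illustrations, not the C1orf131 sequence or any published analysis.

```python
# Residue classes as listed in the text (one-letter amino acid codes).
CHARGED = frozenset("DEHKR")
BASIC = frozenset("HKR")
LYSINE = frozenset("K")


def residue_fraction(sequence: str, residues: frozenset) -> float:
    """Return the fraction of positions in `sequence` that fall in `residues`."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(1 for aa in seq if aa in residues) / len(seq)


if __name__ == "__main__":
    toy = "MKKRDEHAKL"  # made-up 10-residue example, NOT a real protein
    print("lysine :", residue_fraction(toy, LYSINE))
    print("charged:", residue_fraction(toy, CHARGED))
    print("basic  :", residue_fraction(toy, BASIC))
```

A high basic fraction relative to the acidic residues (D, E) is consistent with the high isoelectric points reported for this family.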
DUF4602
DUF4602 (PF15375) is generally 120+ amino acids long.
There is typically only one gene per species that contains this DUF domain; however, the domain has been identified in two different proteins in several species. In ''Trichuris suis'', DUF4602 is found in both hypothetical protein M5114_09117 and tRNA pseudouridine synthase D, and in ''Echinococcus granulosus'' it has been found in hypothetical protein EGR 05135 and expressed conserved protein. DUF4602 has been found primarily in eukaryotes; however, it has also been identified in the virus DRHN1 and in the bacteria ''Bacillus sp. UNC41MFS5'', ''Enterococcus faecalis'', and ''Enterococcus faecalis 13-SD-W-01''. In larger C1orf131 orthologs (250+ residues), the DUF domain is typically located in the middle of the protein toward the C-terminus, whereas in smaller orthologs (160-250 residues) it is located near the N-terminus. Larger orthologs also contain regions of low complexity, which could indicate that these proteins are intrinsically disordered proteins.
Evolutionary history
This gene family exists only in eukaryotes. There are no paralogs of this gene; however, there are a few pseudogenes of ''C1orf131''. Thus far they have only been found in orangutans, mouse lemurs, and sloths.
When this gene family is compared with cytochrome C, a slowly evolving gene, and fibrinogen gamma chain, a fast-evolving gene, it is shown to evolve at a faster rate than fibrinogen.