C1orf131

Uncharacterized protein C1orf131 is a protein that in humans is encoded by the gene ''C1orf131''. The first protein of this family was discovered in humans. Subsequently, bioinformatic searches identified homologs of C1orf131 in numerous species, and as a result most proteins in this family are named Uncharacterized protein C1orf131 homolog.


Gene

In humans, ''C1orf131'' is located on the minus strand of chromosome 1, in cytogenetic band 1q42.2, along with 193 other genes. The gene upstream of ''C1orf131'' is ''GNPAT'', and the gene downstream of ''C1orf131'' is ''TRIM67''. When transcribed in humans, ''C1orf131'' most often forms an mRNA 1458 base pairs long, composed of seven exons. At least nine other alternative splice forms in humans produce proteins. The transcripts range in size from 129 base pairs (2 exons) to 1458 base pairs (7 exons).


Protein

In the C1orf131 protein family, the proteins are between 93 and 450 amino acids long; however, the majority are between 160 and 295 amino acids long. They have a molecular weight between 10.6 and 49.0 kDa, with the majority between 18.6 and 32.7 kDa, and an isoelectric point between 9.6 and 11.2. Over 30 orthologs from mammals, birds and lizards have been identified as having a poly(A) RNA binding site. All orthologs in this protein family contain the domain of unknown function DUF4602. The human protein has been shown to be both phosphorylated and acetylated. These proteins are rich in lysine, in charged amino acids (D, E, H, K, R), and in basic amino acids (H, K, R). Their secondary structure consists primarily of alpha helices and coils, with a small percentage of beta strands. C1orf131 has been shown to interact with ubiquitin, detected by affinity capture followed by mass spectrometry, and with APP (amyloid beta (A4) precursor protein), detected in a reconstituted complex.
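
The length, mass, and composition figures above can be sanity-checked with a rough calculation. The sketch below is illustrative only: it assumes the commonly used rule of thumb of about 110 Da per amino acid residue, and the function names and the toy sequence are hypothetical (the sequence is not C1orf131). Under that assumption, 93 and 450 residues come out at about 10.2 and 49.5 kDa, close to the 10.6 to 49.0 kDa range quoted above; exact masses depend on the actual residue composition.

```python
from collections import Counter

# Assumption: ~110 Da is a rule-of-thumb average residue mass,
# not a value taken from the article.
AVG_RESIDUE_MASS_DA = 110.0

CHARGED = set("DEHKR")  # charged residues as grouped in the text
BASIC = set("HKR")      # basic residues as grouped in the text


def approx_mass_kda(length: int) -> float:
    """Estimate protein mass in kDa from the residue count alone."""
    return length * AVG_RESIDUE_MASS_DA / 1000.0


def composition(seq: str) -> dict:
    """Return the fraction of lysine, charged, and basic residues."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    n = len(seq)
    return {
        "lysine": counts["K"] / n,
        "charged": sum(counts[a] for a in CHARGED) / n,
        "basic": sum(counts[a] for a in BASIC) / n,
    }


if __name__ == "__main__":
    # Hypothetical toy sequence for illustration only.
    toy = "MKKRSEDKLHKAAGKRELKK"
    print(approx_mass_kda(93), approx_mass_kda(450))
    print(composition(toy))
```

A full isoelectric point calculation needs per-residue pKa values and is beyond this sketch, but a high fraction of basic residues is consistent with the high isoelectric points reported for this family.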


DUF4602

DUF4602 (PF15375) is generally 120+ amino acids long. There is typically only one gene per species that contains this domain; however, the domain has been identified in two different proteins in several species. In ''Trichuris suis'', DUF4602 is found in both hypothetical protein M5114_09117 and tRNA pseudouridine synthase D, and in ''Echinococcus granulosus'', DUF4602 has been found in hypothetical protein EGR 05135 and expressed conserved protein. DUF4602 has been found primarily in eukaryotes; however, it has also been identified in the virus DRHN1, ''Bacillus sp. UNC41MFS5'', ''Enterococcus faecalis'', and ''Enterococcus faecalis 13-SD-W-01''. In the larger C1orf131 orthologs (250+ residues), the domain is typically located in the middle of the protein, toward the C-terminus; in smaller orthologs (160-250 residues), it is located near the N-terminus. The larger orthologs also contain regions of low complexity, which could indicate that these proteins are intrinsically disordered proteins.


Evolutionary history

This gene family exists only in eukaryotes. There are no paralogs of this gene; however, there are a few pseudogenes of ''C1orf131''. Thus far they have been found only in orangutans, mouse lemurs, and sloths. When this gene family is compared with cytochrome C, a slow-evolving gene, and fibrinogen gamma chain, a fast-evolving gene, it is shown to evolve at a faster rate than fibrinogen.

