Coiled-coil domain containing 47 (CCDC47) is a gene located on human chromosome 17, at locus 17q23.3, that encodes the protein CCDC47. The gene has several aliases, including GK001 and MSTP041. The protein contains coiled-coil domains, the SEEEED superfamily, a domain of unknown function (DUF1682) and a transmembrane domain. The function of the protein is unknown, but it has been proposed that CCDC47 is involved in calcium ion homeostasis and the endoplasmic reticulum overload response.

Gene

The CCDC47 gene is located on the minus strand of human chromosome 17 and contains 13 exon splice sites and 14 distinct introns. After removal of the introns, the mature transcript is 3445 base pairs in length. No evidence for microRNAs or pseudogenes has been found. The gene does not have multiple isoforms; only transcript variant X1 exists.

Figure: Chromosome 17 diagram
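Because the gene lies on the minus strand, its transcript corresponds to the reverse complement of the plus-strand reference sequence, and splicing keeps only the exons. The following is a minimal Python sketch of that bookkeeping, assuming exon coordinates are 1-based and inclusive on the plus-strand reference. The function names and the toy sequence are hypothetical and are not CCDC47 data.

```python
# Complement table for DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def splice_minus_strand(genome: str, exons: list[tuple[int, int]]) -> str:
    """Join exons given as 1-based inclusive (start, end) coordinates on the
    plus-strand reference, then reverse-complement the result so it reads
    5'->3' on the minus strand. Introns (the gaps between exons) are dropped."""
    joined = "".join(genome[start - 1:end] for start, end in sorted(exons))
    return reverse_complement(joined)


# Toy example: two 3-bp exons separated by a 3-bp intron.
print(splice_minus_strand("AAACCCGGGTTT", [(7, 9), (1, 3)]))
```

Sorting the exons first means the caller can list them in any order; the reverse complement is taken only once, after joining, so the exon order on the minus strand comes out correctly.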

Protein

Structure

The protein encoded by CCDC47 is 483 amino acids in length and contains both a signal peptide and a transmembrane domain. It is rich in negatively charged amino acids such as aspartic acid and glutamic acid, giving it an acidic isoelectric point of 4.56. The protein is also rich in methionine. It has a molecular weight of 55.9 kDa, which is conserved across various orthologs. CCDC47 also contains the SEEEED superfamily and domain of unknown function 1682 (DUF1682). The SEEEED superfamily is a short, low-complexity region composed mainly of serine. The family is frequently found in the beta-1 subunit proteins of clathrin adaptor complex 3. The exact function of DUF1682 is unclear, but one member of the family has been described as an adipocyte-specific protein.

Figure: C terminus of CCDC47
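The isoelectric point is the pH at which a protein's net charge is zero. To illustrate how such a value is estimated, here is a minimal Python sketch that sums Henderson-Hasselbalch charges and finds the zero crossing by bisection. The pKa table is an assumed set of approximate values; published tools use different tables, and the CCDC47 sequence is not given here, so this sketch is not claimed to reproduce the value of 4.56.

```python
# Approximate pKa values (an assumed set for illustration; tools differ).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Net charge of a peptide at a given pH via the Henderson-Hasselbalch equation."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = 1  # one free amino terminus
    counts["Cterm"] = 1  # one free carboxyl terminus
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Find the pH in [0, 14] at which the net charge is zero, by bisection.
    Net charge decreases monotonically with pH, so bisection is valid."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid  # still positively charged: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


print(round(isoelectric_point("MDDEEEK"), 2))
```

A sequence dominated by Asp and Glu, like the one described above, yields a pI well below 7, because many side chains are already deprotonated (negative) at mildly acidic pH.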

There are two predicted disulfide bonds in the structure of CCDC47, between cysteines 209 and 214 and between cysteines 215 and 283. The C-terminal portion of the protein is highly charged, and its secondary structure is predicted to be alpha-helical. This region also contains coiled-coil domains, structural motifs in which two to seven alpha helices are coiled together and which take part in various biological processes. Coiled coils typically follow the pattern HxxHCxC, where H is a hydrophobic amino acid, C is a charged amino acid and x is any amino acid. Many sequences following this pattern are found in the C-terminal region of CCDC47, which is also where conservation across orthologs is highest.

Figure: CCDC47 protein
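The HxxHCxC pattern can be scanned for with a regular expression. The sketch below is a minimal Python illustration; the residue classes (`HYDROPHOBIC`, `CHARGED`) are assumptions, since definitions of "hydrophobic" and "charged" vary between tools, and a pattern match alone is not a coiled-coil prediction.

```python
import re

# Assumed residue classes for illustration; other tools use different sets.
HYDROPHOBIC = "AVILMFW"
CHARGED = "DEKR"

# H x x H C x C, where H = hydrophobic, C = charged, x = any residue.
# The zero-width lookahead lets overlapping matches be reported.
HEPTAD = re.compile(
    rf"(?=([{HYDROPHOBIC}]..[{HYDROPHOBIC}][{CHARGED}].[{CHARGED}]))"
)


def find_heptads(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched heptad) for every HxxHCxC match."""
    return [(m.start() + 1, m.group(1)) for m in HEPTAD.finditer(seq.upper())]


print(find_heptads("LKELEKELKELEKE"))
```

Repeated back-to-back matches, as in the toy sequence, are what a run of heptads along a helix looks like; an isolated single match carries much less weight.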

Regulation and translation

CCDC47 is regulated by the promoter GXP43413. The promoter is 819 base pairs in length and is highly conserved in mammals. Conserved binding sites in mammals located on this promoter include those for nuclear respiratory factor 1 (NRF1), cAMP-responsive element binding protein (CREB), the PAR bZIP family and the Sp4 transcription factor. NRF1 encodes a protein that homodimerizes and activates the expression of key metabolic genes. CREB binds to cAMP response elements, thereby increasing or decreasing the transcription of downstream genes, while the PAR bZIP family is involved in the regulation of circadian rhythms.

In the mRNA, translation begins at base pair 337 and ends at 1728. There is a strong stem-loop in the 5' UTR, from bases 289 to 318, which is likely involved in regulation of the mRNA because of its close proximity to the start codon.
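A stem-loop forms when a stretch of RNA is followed, after a short loop, by its own reverse complement. The following minimal Python sketch finds such perfect inverted repeats. It is a simplified stand-in for the thermodynamic folding tools used to predict real stem-loops: it allows only Watson-Crick pairs (no G-U wobble), requires a perfect stem, and does not score stability. The function name and parameter defaults are hypothetical.

```python
# Watson-Crick complements for RNA bases.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def find_hairpins(seq: str, stem: int = 6, min_loop: int = 3, max_loop: int = 10) -> list[tuple[int, int]]:
    """Return 1-based (start, end) spans where a perfect stem of `stem` bases
    is followed, after a loop of min_loop..max_loop bases, by its reverse
    complement. DNA input is converted to RNA first."""
    rna = seq.upper().replace("T", "U")
    n = len(rna)
    hits = []
    for i in range(n - 2 * stem - min_loop + 1):
        left = rna[i:i + stem]
        target = left.translate(_COMPLEMENT)[::-1]  # sequence that would pair with `left`
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop  # start of the 3' half of the stem
            if j + stem > n:
                break
            if rna[j:j + stem] == target:
                hits.append((i + 1, j + stem))
    return hits


print(find_hairpins("GGGAAACCC", stem=3, min_loop=3, max_loop=5))
```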

Cellular distribution

The protein is thought to be translated at the endoplasmic reticulum (ER) and to extend into the cytoplasm of the cell. It is anchored in the ER membrane by the transmembrane domain spanning amino acids 137 to 165. The portion of the protein that extends into the cytosol is predicted to be highly phosphorylated, as its phosphorylation sites are conserved as far back as bony fish orthologs. Research has shown that CCDC47 is expressed in response to ER overload, which makes its close association with the ER significant.
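Transmembrane segments such as the one at residues 137 to 165 are classically flagged by sliding-window hydropathy analysis. The sketch below is a minimal Python illustration of that technique using the Kyte-Doolittle scale. The window of 19 residues and threshold of 1.6 are commonly cited defaults, assumed here; the prediction reported above may have come from a different method, and the toy sequence is hypothetical.

```python
# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each full window; entry k covers residues
    k+1..k+window (1-based). Raises KeyError on non-standard residues."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(values[k:k + window]) / window for k in range(len(values) - window + 1)]


def candidate_tm_windows(seq: str, window: int = 19, threshold: float = 1.6) -> list[tuple[int, int]]:
    """Return 1-based (start, end) of windows whose mean hydropathy exceeds threshold."""
    profile = hydropathy_profile(seq, window)
    return [(k + 1, k + window) for k, score in enumerate(profile) if score > threshold]


toy = "D" * 10 + "L" * 19 + "D" * 10  # hydrophobic core flanked by charged residues
print(candidate_tm_windows(toy))
```

Overlapping windows around a hydrophobic core all pass the threshold, so in practice consecutive hits are merged into one predicted segment.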

Post-translational modification

In addition to the high levels of phosphorylation seen in CCDC47, three sulfation sites are predicted and conserved in mammals, reptiles and birds, but not in fish, amphibians or invertebrates. Five potential sumoylation sites are also predicted and conserved back to the bony fish. The protein is not predicted to be glycosylated, as it is not predicted to extend into the extracellular space.

Expression

Microarray tissue expression patterns from the Gene Expression Omnibus (GEO) were analyzed and showed that CCDC47 appears to be ubiquitously expressed at moderate levels in many different human tissues. Although the protein is ubiquitously expressed, the highest levels of expression are seen in neuronal tissues such as the superior cervical ganglion, the amygdala and the ciliary ganglion. Elevated expression is also seen in the thyroid and in CD34+ cells.

Homology

CCDC47 has no known paralogs based on text-based queries, BLAST and BLAT. The gene has many orthologs, extending back to invertebrates such as C. elegans, and is highly conserved in mammals, with a percent identity greater than 95%. CCDC47 has been sequenced in a wide range of organisms, including mammals, birds, reptiles, amphibians, bony fish and invertebrates. As expected, the percent identity of human CCDC47 to a given ortholog declines with increasing time since divergence. Homologs of CCDC47 are also present in mosquitoes, mushrooms, Arabidopsis and Asian rice. These homologs contain the same DUF1682 found in CCDC47.
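Percent identity figures like those above are computed from a pairwise alignment. The sketch below shows one simple convention in Python: identical residues divided by the number of alignment columns, ignoring gap-gap columns. Conventions differ between tools (some divide by the shorter sequence's length or exclude all gapped columns), so this is an assumed definition, and the input must already be aligned.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns with identical residues. Gaps are '-';
    columns where both sequences have a gap are ignored."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


print(percent_identity("MKV-LE", "MKVALD"))
```

Because the denominator here counts gapped columns, insertions and deletions in distant orthologs lower the identity along with substitutions, consistent with the decline described above.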
