SNED1 (Sushi, Nidogen, and EGF-like Domains 1) is an extracellular matrix (ECM) protein expressed at low levels in a wide range of tissues. The gene encoding SNED1 is located on human chromosome 2 at locus q37.3. The corresponding mRNA was isolated from the spleen and is 6834 bp in length, and the corresponding protein is 1413 amino acids long. The mouse ortholog of SNED1 was cloned in 2004 from the embryonic kidney by Leimeister et al. SNED1 presents domains characteristic of ECM proteins, including an amino-terminal NIDO domain, several calcium-binding EGF-like domains (EGF_CA), a Sushi domain, also known as complement control protein (CCP) domain, and three type III fibronectin (FN3) domains in the carboxy-terminal region.

Gene

Locus

SNED1 is located on the plus strand of chromosome 2 at locus 2q37.3. The RefSeq identification number is NM_001080437.3. The genomic DNA sequence of SNED1 spans 98,159 bp, and the longest spliced mRNA predicted by AceView is 7048 bp long and contains 31 exons. AceView predicts 9 splice variants of SNED1; all of them returned protein structure matches in Phyre2, as discussed under "Tertiary and quaternary structure".

Common aliases

SNED1 is an acronym for Sushi, Nidogen, and EGF-like Domains 1. Obsolete aliases for SNED1 include Snep, SST3, and IRE-BP1.

Homology/evolution

Homologs and phylogeny

SNED1 is highly conserved across vertebrates, including fish, amphibians, reptiles, birds, and mammals. It is unclear whether SNED1 is conserved in invertebrates, although protein domains found in SNED1 are also found in invertebrate proteins. The abundance of cysteine residues, mostly located within EGF-like domains where they form disulfide bonds, appears to be very highly conserved, suggesting that cysteine richness is an important feature of this protein.

Paralogs

SNED1 has several paralogs within the human genome, each of which covers only a small portion of the entire peptide sequence. Genes encoding proteins that share domains (EGF-like, Sushi) with SNED1 include the neurogenic locus notch homolog (NOTCH) proteins, the jagged proteins, eyes shut homolog proteins, the crumbs homolog proteins, delta and notch-like epidermal growth factor receptors, the sushi von Willebrand factor A protein (SVEP1), and slit homolog 3 protein.

Protein

Primary sequence

The protein knowledge database UniProt reports that the full-length SNED1 protein is 1413 amino acids long (UniProt Q8TER0). The full sequence obtained by an NCBI BLAST search can be accessed with the reference ID NP_001073906.1. One presumably important feature of this protein is that it is extraordinarily cysteine rich, with 107 cysteines total, giving an overall cysteine composition of 13.2%.
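A composition figure of this kind can be checked directly from a sequence. The following is a minimal sketch, assuming the sequence is available as a plain one-letter string; the function name `residue_fraction` and the toy peptide are illustrative and are not taken from the source or from any SNED1 database record.

```python
from collections import Counter


def residue_fraction(sequence: str, residue: str = "C") -> float:
    """Return the fraction of positions in `sequence` occupied by `residue`."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    # Counter returns 0 for residues that never occur, so no KeyError here.
    return Counter(seq)[residue.upper()] / len(seq)


# Toy peptide (hypothetical, NOT the SNED1 sequence): 3 cysteines in 10 residues.
toy = "ACDCEFGHCK"
print(f"{residue_fraction(toy):.1%}")  # prints 30.0%
```

Running the same function on the sequence downloaded for NP_001073906.1 would give the cysteine fraction of the full-length protein.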

Domains and motifs

SNED1 is a secreted protein of the extracellular matrix. It contains a signal peptide (amino acids 1-24) that directs the protein to the secretory pathway. Precise predictions of domain boundaries can be obtained using the InterPro domain database or SMART.

The first domain in the annotated sequence, shown in pink, is the NIDO domain, also found in the nidogen-1 protein (formerly known as entactin). Other than SNED1, this domain is shared with only four human proteins: the basement membrane proteins nidogen-1, nidogen-2, and alpha-tectorin; and mucin-4, which has been demonstrated to play a role in promoting pancreatic cancer metastasis.

The second regions of interest, underlined in the annotated sequence, are the calcium-binding EGF-like (EGF_CA) domains. SNED1 contains many of these domains, which are present in a large number of membrane-bound and extracellular proteins. The EGF_CA domains may suggest a "sticky" nature to this protein, as extracellular matrix (ECM) proteins often require calcium cations to form homo- and heterodimeric complexes with other ECM proteins.

The Sushi domain, or complement control protein (CCP) motif, is annotated in green in the figure and has been identified in many proteins involved in the complement system. It is also known as a short consensus repeat (SCR); the Sushi alias is the one from which the protein gets its name.

The fibronectin type III (FN3) domain is annotated in blue, and its presence may suggest that the protein is involved in cell adhesion. SNED1 also contains an RGD and an LDV sequence, motifs that are important in the binding of ECM proteins to integrins, which are transmembrane receptors that mediate cell-ECM interactions.
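Short linear motifs such as RGD and LDV can be located with a plain substring scan. The sketch below is illustrative only: `find_motif` and the toy peptide are hypothetical names and data, and finding a motif in a sequence says nothing by itself about whether it is exposed or functional in the folded protein.

```python
def find_motif(sequence: str, motif: str) -> list[int]:
    """Return 1-based start positions of every occurrence of `motif`,
    including overlapping ones."""
    seq, pat = sequence.upper(), motif.upper()
    positions = []
    start = seq.find(pat)
    while start != -1:
        positions.append(start + 1)       # convert 0-based index to 1-based
        start = seq.find(pat, start + 1)  # step by one to allow overlaps
    return positions


# Toy peptide (hypothetical, NOT the SNED1 sequence).
toy = "MKRGDSPLDVTTRGDA"
for motif in ("RGD", "LDV"):
    print(motif, find_motif(toy, motif))
# prints:
# RGD [3, 13]
# LDV [8]
```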

Post-translational modifications

Thirteen N-glycosylation sites are predicted in the sequence of SNED1, and the presence of N-linked glycosylation has been determined experimentally. SNED1 also has several predicted attachment sites for O-linked glycans and glycosaminoglycans, but these have not yet been validated experimentally. Only a few kinase-dependent phosphorylation sites scored above 0.8 with the NetPhosK program from the ExPASy bioinformatics tool suite. These sites are annotated with yellow highlight in the conceptual translation. All of them are predicted to be phosphorylated by either protein kinase A (PKA) or protein kinase C (PKC). Experimental evidence exists for phosphorylation at 12 residues: 5 serine, 5 threonine, and 2 tyrosine residues.
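The source does not say which tool produced the N-glycosylation predictions. A common first pass, sketched here as an assumption rather than as the method actually used, is to scan for the canonical sequon N-X-S/T, where X is any residue except proline. The function name `n_glyc_sequons` and the toy peptide are hypothetical.

```python
import re

# Canonical N-glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead (?=...) makes matches zero-width, so overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glyc_sequons(sequence: str) -> list[int]:
    """Return 1-based positions of asparagines that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


# Toy peptide (hypothetical, NOT the SNED1 sequence).
toy = "MNGSANPTLNNTS"
print(n_glyc_sequons(toy))  # prints [2, 10, 11]; N6 is skipped because X is proline
```

A sequon is necessary but not sufficient for glycosylation, so positions found this way are candidates that still require experimental confirmation.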

Secondary structure

The amino acid sequence of the longest variant is very cysteine rich, presumably resulting in extensive disulfide bond formation. The beta sheets are annotated as purple text in the conceptual translation and the alpha-helices are annotated as red text. The percentage of intrinsic disorder of processed human SNED1 (residues 25–1413) predicted by IUPred2A is 15.3%. A large proportion of random coil (73%) was predicted in SNED1, together with 26% β-strand and 1% helix, the latter corresponding to a sequence found in the amino-terminal region of SNED1.

Tertiary and quaternary structure

The program Phyre2 was used to construct structure predictions for the conserved domain regions NIDO, CCP, and FN3, as well as for each of the splice variants. The results are consistent with the proposed function of an extracellular "sticky" protein possibly involved in cell-cell adhesion or in clotting. Protein matches found in Phyre2 comprise proteins with functions in clotting, hydrolysis, plasminogen activation, hormone or growth factor activity, protein binding, and cell adhesion, as well as ECM proteins.

Splice variants a, b, and e have >99% structural similarity to the protein neurexin 1-alpha (NRXN1). Neurexins are cell adhesion molecules and often contain EGF-like domains, enhancing junction formation between cells. NRXN1 is also proposed to play a role in angiogenesis. Alpha-neurexins interact with neurexophilins and possibly function in the synaptic junctions of the vertebrate nervous system. Alpha-neurexins often utilize alternate promoters and splice sites, resulting in many different transcripts from one gene, which may be an explanation for this gene's abundance of alternative transcripts. Splice variant d has a 100% structural match to low-density lipoprotein receptor-related protein 4 (LRP4). This protein is involved in SOST-mediated inhibition of bone formation and in inhibition of Wnt signaling, and it plays an important role in the formation of neuromuscular junctions. Splice variants f and g have >99% similarity to fibrillin-1, an ECM protein that is a structural component of calcium-binding microfibrils.

Splice variant i and the conserved domain CCP are >99% structurally similar to t-plasminogen activator (PLAT). PLAT is secreted by vascular endothelial cells and acts as a serine protease that converts plasminogen to plasmin. Plasmin is a fibrinolytic enzyme that aids in the breakdown of blood clots, and plasminogen activators are used clinically for that purpose. The conserved domain NIDO was >99% similar to coagulation factor IX (F9), a secreted coagulation factor that requires activation by other coagulation factors within the clotting cascade. The three consecutive conserved FN3 domains together are 100% similar, with 100% coverage, to anosmin-1, an ECM glycoprotein responsible for normal neural development of the brain, spinal cord and kidney.

Interacting proteins

Computational prediction by several databases, focusing on secreted proteins and membrane proteins, yielded 114 unique interactions predicted by at least one algorithm, including SNED1 auto-interaction. More than half of the predicted protein partners of SNED1 were annotated as membrane proteins in UniProtKB. Forty-seven extracellular proteins were identified as SNED1 binding partners, including 30 core matrisome proteins, 10 matrisome-associated proteins, and 7 secreted proteins. Among the 30 core matrisome proteins are 6 collagens (COL6A3, found in basement membranes and other ECMs; COL7A1; and the fibril-associated collagens with interrupted triple helices (FACITs) COL12A1, COL14A1, COL16A1, and COL20A1, all containing a thrombospondin domain) and a number of ECM glycoproteins: 4 tenascins (TNC, TNN, TNR, and TNXB), fibronectin (FN1), the latent TGFβ-binding protein 2 (LTBP2), and the basement membrane glycoproteins nidogens 1 and 2.

Independently, the STRING database of known and predicted protein interactions was used to identify candidate interacting proteins: somatostatin (SST), somatostatin receptor 2 (SSTR2) as well as a variety of other somatostatin receptors, spermine synthase (SMS), and TMEM132C. All of the somatostatin-related proteins are involved in hormone inhibition. Very little is known about TMEM132C, and all publications related to the protein are large-scale genome screens. The protein expression profile of TMEM132C is very similar to that of SNED1, with protein abundance found in blood plasma, platelets, and liver. All of the candidate interacting proteins described are found in these three locations.

Expression

SNED1 is ubiquitously expressed at low to intermediate levels in adult tissues, so RNA expression profiles do not make clear which cells secrete SNED1 in tissues. Experimental data obtained in mice have shown that the ''Sned1'' promoter is broadly active during embryogenesis, particularly in the limb buds, tail, sclerotome, vertebrae and ribs, lung, kidney, adrenal gland, cerebellum, choroid plexus, and head mesenchyme. The protein expression profiles of SNED1 reported by MOPED (Multi-Omics Profiling Expression Database) and PaxDb (Protein Abundance Database) indicate that the protein is found in blood serum, blood plasma, blood T-lymphocytes, platelets, HEK-293 kidney cells, and liver, and at low levels in the brain.

Transcript variants

The program AceView was used to predict transcript variants, shown in Figure 6. There are 9 spliced forms and 3 unspliced forms. Three of the transcript variants, b, c, and e, contain green regions that represent upstream open reading frames (uORFs), which suggest the presence of regulatory elements within the transcript. All of the spliced transcript variants a-i were analyzed with the Phyre2 server to predict protein structure (see "Tertiary and quaternary structure"). The existence of the splice variants has not yet been validated experimentally.

Promoter

The promoter was predicted and analyzed for transcription factor binding sites using the ElDorado software in the Genomatix software suite. There were alternative promoters downstream of the selected 845 bp promoter.

Transcription factors

The following transcription factors were found with a matrix similarity of 1.00 and the entire binding domain was matched in the ElDorado predicted promoter.

Protein functions and clinical significance

Selected cases in NCBI's GEO Profiles highlight clinically relevant data on SNED1 expression levels in response to certain conditions. In aldosterone-producing adenoma versus control lung tissue, SNED1 expression decreased about 25-fold in the adenoma tissue. In a developmental study on the transition from oligodendrocyte precursors to mature oligodendrocytes, expression decreased almost 100-fold upon differentiation into mature oligodendrocytes. It may be interesting to explore SNED1 expression in clotting disorders or other blood-related diseases. A study published in 2014 demonstrated that SNED1 is a promoter of breast cancer metastasis. The recent generation of a ''Sned1'' knockout mouse model is also shedding light on the multiple roles of SNED1 in development and physiology. The global ''Sned1'' knockout leads to early postnatal lethality and severe craniofacial and skeletal anomalies, indicating that ''Sned1'' is an essential gene.
