Content Migration

Content migration is the process of moving information stored on a given computer information system (IS) to a new system. The IS may be a web content management system (CMS), a digital asset management (DAM) system, or a document management system (DMS). The IS may also be based on flat HTML content, including HTML files, Active Server Pages (ASP), JavaServer Pages (JSP), PHP, or content stored in some type of HTML/JavaScript-based system, and the content can be either static or dynamic.


Business drivers


Reasons to consider migrating content

Content migrations can address a number of issues, including:
* Consolidating multiple CMSs into fewer systems. This allows for more centralized control, better governance of content, and better knowledge management and sharing.
* Reorganizing content after mergers and acquisitions, assimilating as much content as possible from the source systems to give it a unified look and feel.
* Converting content that has grown organically, either in a CMS or as flat HTML, and standardizing its formatting so that standards can be applied for unified branding.
* Simplifying complex upgrade paths from unsupported versions by migrating content to a newer version of the platform.
* Meeting compliance requirements that demand more functionality from the underlying store, such as auditing of content access, improved security, or records management.


Arguments against migrating content

Content migrations entail risks. Some reasons to avoid a migration, such as cost, are obvious, but others are less so. These include corruption in transit and loss of context, particularly for unstructured content, which is typically one of the larger artifacts of a business. There is also the risk that external references are not considered, resulting in broken links to content. The size of the data to be migrated makes the process very resource-intensive (source, destination, and temporary storage, network bandwidth, etc.), which means that auditing the migration could also be complex and requires consistency and traceability.

Another common issue in content migration is the loss of SEO and page rank in search engines. Migrating to another location and adopting new software means that website URLs are likely to change as well, so search engines have to adjust even if they are informed about the process.

In a white paper, Oracle also outlined several issues involving the so-called people perspective. It cited the probability that the people involved in the content migration might not have a thorough grasp of the history, structure, and meaning of the source data, or of the new system, which could lead not only to the loss of information but also to a need for additional resources.

One of the methods that address these risks is the use of metadata. Metadata is employed to describe, access, and manage records, serving as the ultimate means by which the integrity, trustworthiness, and authenticity of a record can be proven. The process could, for instance, adopt a two-track framework where one track deals with the overall content, structure, layout, and vision, while the other is focused on metadata (Sanchez-Alonso, Salvador; Athanasiadis, Ioannis. Metadata and Semantic Research. Berlin: Springer, 2010, p. 28. ISBN 9783642165511).
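One way to make corruption in transit detectable, and to give an audit something traceable to check, is to compare checksums of the files before and after the move. The following is a minimal sketch in Python, assuming the source and migrated content are both available as directory trees with the same relative paths. The function names `build_manifest` and `compare_manifests` are hypothetical and not part of any CMS or migration tool.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def compare_manifests(source: dict[str, str], target: dict[str, str]) -> dict[str, list[str]]:
    """Report files that are missing from the target or whose bytes differ."""
    missing = sorted(p for p in source if p not in target)
    changed = sorted(p for p in source if p in target and source[p] != target[p])
    return {"missing": missing, "changed": changed}
```

A report with empty `missing` and `changed` lists only shows that the bytes arrived intact. It says nothing about loss of context or broken references, which have to be checked separately.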


Approaches

There are many ways to access the content stored in a CMS. Depending on the vendor, a CMS may offer an application programming interface (API), web services, XML exports, or access through the web interface, or the records may have to be rebuilt by writing SQL queries against the underlying database.
# The API requires a developer to read and understand how to interact with the source CMS's API layer, then develop an application that extracts the content and stores it in a database, an XML file, or an Excel file. Once the content is extracted, the developer must read and understand the target CMS's API and develop code to push the content into the new system. The same can be said for web services.
# Most CMSs use a database to store and associate content, so if no API exists the programmer must reverse engineer the table structure. Once the structure is reverse engineered, very complex SQL queries are written to pull all the content from multiple tables into an intermediate table or into a comma-separated values (CSV) or XML file. Once the developer has the files or database, the content must be pushed into the new system through the target CMS's API or web services, as in the API approach.
# XML export creates XML files of the content stored in a CMS, but after the files are exported they need to be altered to fit the schema of the target CMS. This is typically done by a developer who writes code to perform the transformation.
# HTML files, JSP, ASP, PHP, and other application server file formats are the most difficult. The structure of flat HTML files is based on a combination of folder structure, HTML file structure, and image locations. In the early days of content migration, the developer had to use programming languages to parse the HTML files and save the results in structured databases, XML, or CSV. Typically Perl, Java, C++, or C# were used because of their regular expression handling capability. JSP, ASP, PHP, ColdFusion, and other application server technologies usually rely on server-side includes, which simplify development but make it very difficult to migrate content because the content is not assembled until a user requests the page. This makes it very difficult to look at the files and extract the content from the file structure.
# Web scraping allows users to access most of the content directly from the web user interface. Since a web interface is visual (this is the point of a CMS), some web scrapers leverage the UI to extract the content and place it into a structure such as a database, XML, or CSV. CMSs, DAMs, and DMSs generally provide web interfaces, so extracting the content from one or many source sites is basically the same process. In some cases it is possible to push the content into the new CMS using the web interface, but some CMSs use Java applets or ActiveX controls, which are not supported by most web scrapers. In that case, the developer must read and understand the target CMS's API or web services and develop code to push the content into the new system.
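As an illustration of the flat-HTML approach, the sketch below parses one page into a structured record that could then be written to a database, XML, or CSV. It uses Python's standard `html.parser` module rather than the regular expressions mentioned above. The class `PageExtractor`, the function `extract_page`, and the choice of fields are assumptions made for this example, and a real site would need rules specific to its own templates.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collect the title, visible text, links, and image references of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.text_parts = []
        self.links = []
        self.images = []
        self._in_title = False
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def extract_page(html: str) -> dict:
    """Return a structured record for a single HTML document."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "text": " ".join(parser.text_parts),
        "links": parser.links,
        "images": parser.images,
    }
```

This only works on the HTML as stored or as served. For JSP, ASP, or PHP sources, the input would have to be the rendered page, since the files on disk do not contain the assembled content.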


The basic content migration flow

# Obtain an inventory of the content.
# Obtain an inventory of binary content such as images, PDFs, CSS files, Office documents, Flash, and any other binary objects.
# Find any broken links in the content or content resources.
# Determine the menu structure of the content.
# Find the parent/sibling connections of the content so that links to other content and resources are not broken when moving them.
# Extract the resources from the pages and store them in a database or file structure. Store the references in a database or a file.
# Extract the HTML content from the site and store it locally.
# Upload the resources to the new CMS, using either the API or the web interface, and store the new locations in a database or XML.
# Transform the HTML to meet the new CMS's standards and reconnect any resources.
# Upload the transformed content into the new system.

The content strategy on the new site can evolve as brand objectives change and as it becomes clearer how content performs in the new environment. It may be necessary to bring back old content that was not initially migrated, so archive everything that does not make the initial cut.
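The step of reconnecting resources can be sketched as a lookup from old locations to new ones, built while the resources were uploaded. The example below is a simplified illustration in Python. The function name `rewrite_references` and the shape of `url_map` are assumptions, and the regular expression only handles quoted `href` and `src` attributes, so it is not a general HTML rewriter.

```python
import re

# Matches href="..." or src='...' with either quote style.
ATTR_PATTERN = re.compile(r'\b(href|src)=(["\'])(.*?)\2', re.IGNORECASE)


def rewrite_references(html: str, url_map: dict[str, str]) -> tuple[str, list[str]]:
    """Replace old resource locations with new ones.

    Returns the rewritten HTML and the list of references that had no
    entry in url_map, which can be reviewed as possible broken links.
    """
    unmapped = []

    def replace(match: re.Match) -> str:
        attr, quote, old = match.group(1), match.group(2), match.group(3)
        new = url_map.get(old)
        if new is None:
            unmapped.append(old)
            return match.group(0)  # leave the reference untouched
        return f"{attr}={quote}{new}{quote}"

    return ATTR_PATTERN.sub(replace, html), unmapped
```

The unmapped list will also contain external links that were never meant to be migrated, so in practice it would be filtered before being treated as a list of problems.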




External links


No Small Task: Migrating Content to a New CMS