Organizations need a way to archive their content to stay compliant. Here is how Squiz’s composed solutions can help you to archive more with less effort.
What is web archiving?
Organizations archive their content for several reasons, including compliance, litigation, and overall information governance.
Web archiving enables organizations to capture snapshots of their websites at a point in time, a little like the Wayback Machine. This provides an audit trail of what was published and what content was customer-facing at that time. Beyond snapshots, the solution also stores the actual files in WARC (.warc) format, an internationally recognized format for archiving web pages and content.
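For readers curious about what a .warc file holds, here is a minimal Python sketch of a single WARC 1.0 "response" record, built with the standard library only. It illustrates the file format rather than how the Squiz solution is implemented; the helper name `build_warc_record`, the example URL, and the page content are assumptions made for illustration.

```python
import uuid
from datetime import datetime, timezone


def build_warc_record(target_uri: str, http_response: bytes, captured_at: datetime) -> bytes:
    """Return one WARC 1.0 'response' record as bytes (illustrative helper)."""
    warc_date = captured_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",                                   # version line
        "WARC-Type: response",                        # record holds a captured HTTP response
        f"WARC-Target-URI: {target_uri}",             # the page that was captured
        f"WARC-Date: {warc_date}",                    # capture time in UTC
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",  # unique ID for this record
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_response)}",      # size of the record block in bytes
    ]
    header_block = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # A record is the header block, the captured payload, then two CRLFs.
    return header_block + http_response + b"\r\n\r\n"


# Hypothetical captured page: a promotion as it appeared on one date.
body = b"<html><body>Spring promotion: 10% off</body></html>"
http_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n\r\n" + body
)
record = build_warc_record(
    "https://example.com/promo",
    http_response,
    datetime(2024, 3, 1, 9, 30, tzinfo=timezone.utc),
)
print(record.decode("utf-8"))
```

Because each record carries the target URL, the capture time, and the full HTTP response, a viewer can later replay the page as it was served at that moment.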
Why archive in the first place?
Some organizations, such as government institutions and higher education providers, are required by legislation to archive content.
For example, as part of Higher Education consumer law in the UK, students can raise a case or dispute up to seven years after their course has been completed. As a result, all UK universities must capture and preserve this online information to demonstrate compliance with regulations and to have adequate evidence in the event of litigation.
Similar legislation applies in Australia. According to the National Archives of Australia (NAA), activities undertaken on Australian Government websites and intranets are Commonwealth records that must be captured and managed. Content that may require archiving includes attachments, agency websites, and decommissioned websites. Therefore, government agencies and councils must have a web archive solution to remain compliant.
Access your complete digital history
Get a record of major milestones and content published on previous iterations of your website. With web archiving, organizations gain backups not only of the content at a fixed point in time, but also of the website’s structure, context, search history, and results (if the website has search functionality). This provides a reliable, time-stamped digital record.
Future-proof yourself from false claims
As the world becomes increasingly digital, web content is a crucial part of modern business communications. Take a retail or service delivery site, for example. The vendor may offer a promotion with specific terms and conditions to the customer. If, for any reason, a claim or complaint is made against the terms of an old promotion that is no longer advertised, the business needs digital records of that promotion to defend itself in litigation.
Available web archive solutions
Although web archives provide a valuable service, the solutions currently on the market have drawbacks:
- They are often expensive.
- They don’t always give users the flexibility to choose where archived content is stored.
- Most web archive solutions don’t support on-demand archiving. You can only archive the site at a scheduled time.
Why Squiz’s archive solution is the better choice
The Squiz web archive is a composed solution that sits outside your Content Management System (CMS), so it doesn’t clog up your CMS or impact performance. It archives content from any website (not only Matrix CMS), including third-party embedded content, and saves it to your external cloud storage. Once configured, Web Archive is fully managed by Squiz and is available in one location.
Critical capabilities in Squiz’s web archive solution include:
- Archive web pages on a schedule or on demand: define a specific schedule to take page snapshots, or trigger them on demand (e.g. when a page is updated or published).
- Your choice of services: control costs by choosing where archived content is stored, such as Amazon S3 buckets, Google Cloud Storage buckets, or Azure Blob Storage containers, among other options.
- Extendable solution: easily extend your archive library, set up Squiz DXP Search to crawl old archived pages in your history, or create automated notifications when content is archived.
How does the web archive solution work?
When used with Squiz DXP Content Management, a scheduled event or a trigger, such as a page being published, is pushed through to Squiz DXP integrations, which take the snapshots for you. Those snapshots are sent to your preferred cloud storage (e.g. Amazon S3 buckets), and a log is created for you to access.
If you want to go back and see what was published at a point in time, you can click on the link in the log and see the archived version of that webpage.
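The flow described above (trigger, snapshot, storage, log, lookup) can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Squiz's implementation: the class `ArchiveSketch`, its method names, the in-memory dictionary standing in for a cloud bucket, and the example URL are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional


@dataclass
class LogEntry:
    url: str
    captured_at: datetime
    storage_key: str  # where the snapshot lives in the (simulated) bucket


@dataclass
class ArchiveSketch:
    fetch: Callable[[str], bytes]  # hypothetical page-capture function
    storage: Dict[str, bytes] = field(default_factory=dict)  # stand-in for cloud storage
    log: List[LogEntry] = field(default_factory=list)        # stand-in for the archive log

    def on_trigger(self, url: str, when: datetime) -> LogEntry:
        """Handle a schedule tick or a publish event: capture, store, log."""
        snapshot = self.fetch(url)
        # Assumes `when` is in UTC, so the literal Z suffix is accurate.
        key = f"{when.strftime('%Y%m%dT%H%M%SZ')}/{url}"
        self.storage[key] = snapshot
        entry = LogEntry(url, when, key)
        self.log.append(entry)
        return entry

    def as_published(self, url: str, at: datetime) -> Optional[bytes]:
        """Return the latest snapshot of `url` captured at or before `at`."""
        candidates = [e for e in self.log if e.url == url and e.captured_at <= at]
        if not candidates:
            return None
        latest = max(candidates, key=lambda e: e.captured_at)
        return self.storage[latest.storage_key]


# Simulated site whose promotion page changes between two publish events.
url = "https://example.com/promo"
pages = {url: b"10% off"}
archive = ArchiveSketch(fetch=lambda u: pages[u])

archive.on_trigger(url, datetime(2024, 3, 1, tzinfo=timezone.utc))
pages[url] = b"20% off"
archive.on_trigger(url, datetime(2024, 6, 1, tzinfo=timezone.utc))

# What was customer-facing in mid-April?
print(archive.as_published(url, datetime(2024, 4, 15, tzinfo=timezone.utc)))  # b'10% off'
```

Keeping the log separate from the stored snapshots mirrors the description above: the log is a small, browsable index, while the snapshots themselves sit in whichever storage service you choose.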
Web archiving solving real-world challenges
Squiz DXP helps build, manage and connect experiences across sites, portals, and apps - your way. Delivering a web archive solution to market is just another example of how Squiz DXP and its products are supporting lean teams in service-led organizations to overcome their real-world challenges and achieve more with less.
Queensland Health, a Squiz customer and early adopter of our web archiving (.warc) solution, worked with us to develop the proof of concept as they needed to better govern the content on their site. Below is an example of what their archived webpage looks like once it is retrieved using an open-source web archive viewer tool, such as replayweb.page.
Queensland Health accessed its archived website pages by:
- Opening their archive log in Google Sheets.
- Clicking the deep link to the archived file in the Google Sheet.
- Previewing the archived website content as it appeared to the user at that point in time.
Who can use the solution?
Any Squiz DXP customer can use the Squiz web archive solution.
The solution can be purchased on an annual SaaS subscription plan based on usage. There is a one-time implementation fee. For more information, contact your account manager.