National Records of Scotland is today celebrating World Digital Preservation Day.
We caught up with Barbara Fuentes, our web continuity assistant, at West Register House. She told us about her work archiving websites, helping to ensure that digital records are captured and preserved for the future…
NRS has been archiving paper records for centuries, but in recent years public bodies have begun to produce a huge amount of information for an online audience.
This helps the public access the services they need, but it presents a big challenge for NRS: websites are regularly updated and replaced, and data from old sites risks being lost forever.
You can see this if you try to find certain websites from the 1990s and early 2000s: the information is no longer accessible, and what is available is strewn with broken links to pages that no longer exist. It’s an archivist’s nightmare!
If future generations want to understand how the Scottish Government communicated with the public in the 2010s or what health advice the NHS provided, they’ll need online data to get a complete picture.
This is where NRS steps in to help save and store this data, and to make it available online.
Archiving & Preservation
NRS offers two distinct services to public bodies in Scotland. The first is a Web Archive made up of “snapshots” of sites as they appeared at the time of capture. For public bodies that want to use it, we also provide a Web Continuity Service, which automatically redirects users who follow a broken link on a live site to the archived version of that site on the NRS Web Archive.
NRS aims to capture the most detailed versions of these sites that we can. We work in partnership with the Internet Archive, who handle the technical side of the operation. We ask the Internet Archive to capture particular sites, and they use a “crawler”: a software program that works its way through a website, capturing content as it goes.
Most of my job is quality assurance: ensuring that the information gathered by the crawler on a particular date replicates the live site as it was at that time as closely as possible.
Some sites are huge, and the crawl can take up to 10 days to complete. The crawler isn’t infallible, so I check for missing pages, files or sections of websites, and for any other common web archiving issues that affect the quality of the capture.
The thing I like most about my job is planning and working with colleagues in these public bodies.
Communication and cooperation are essential, and we liaise constantly throughout the process. Working together on these large projects can be a real learning experience for them and for us.
First, we need their permission to begin archiving. It can be difficult to convince some public bodies to let us use software to capture their websites, but others are very enthusiastic, as we offer a professional service that can help them deal with complex records management issues.
We then examine the sites to be captured, trying to anticipate which features might cause problems. I’m regularly in touch with these organisations during the archiving process to resolve issues that arise. We’ve also had a few last-minute requests to archive sites that are about to be decommissioned!
We couldn’t carry out our work without dedicated support from colleagues within NRS.
We work alongside our colleagues in the Depositor Liaison Branch, who make first contact with public bodies to archive their paper records. We also rely on the Procurement Team, who help us acquire the resources we need.
Finally, we’re very grateful for the regular help we receive from ICT and other NRS colleagues that allows us to provide this tailored service.
Barbara spoke with Ross Truslove, NRS Communications, in September. Follow NRS on Twitter today, where we’ll be marking World Digital Preservation Day.