Today is World Digital Preservation Day, celebrating work to preserve the world’s digital heritage.  

Most of the records National Records of Scotland (NRS) currently holds are paper documents such as files, books and letters. Modern records, however, are mostly “born digital” – they have been produced in a digital format, such as email, PDFs, spreadsheets and websites.

Preserving digital data creates technical challenges for archivists, who are developing new ways to preserve born-digital records for future generations. Data needs to be kept securely, but it also needs to be flexible, efficient and trustworthy, so that it can be easily accessed and understood many years from now.

Today, Lynn Bruce of the NRS Digital Records Unit tells us about a recent digital archiving project – preserving the NRS staff intranet for future reference.  

—————————————————————————————-

Intranet  

Intranets are a core part of modern organisations’ internal communications. In the past, businesses may have used physical media such as staff newsletters and poster boards to share news with their employees.

In today’s digital age, intranets – which resemble private social network websites for staff – are a common and effective means for sharing news on corporate affairs, appointments, policies and personal news. 

Any technology product can become old, obsolete and in need of replacement, and by 2023 NRS had decided to close our old intranet and migrate to a new one. A project team was assembled to deliver this goal.

Our Digital Records Unit, who archive digital records, were asked to explore how key information could be preserved for future reference. 

The Challenges of Intranet Preservation 

Ideally, we would have liked to create an interactive snapshot of our intranet which preserved the structure of the site as well as the content. This is our goal when we archive websites on the internet.

NRS has run a web archive since 2017. We use specialist web crawling software to create a representative snapshot of selected websites. Users can click through these snapshots to experience the look and feel of the live site. However, the challenges of intranet preservation prevented us from achieving this.

Our old intranet looked like any normal website, but it was in fact a document management and storage system, accessed through a web browser. This meant that it did not have the same clear structure and navigational features as public-facing websites. These are very important for successful web crawling, and without them we were unable to capture the old intranet through the web archive.
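
To illustrate why navigational structure matters, here is a minimal sketch of how a simple crawler discovers pages: it reads a page's markup and collects the links it can follow. The two example pages, the `openFolder` script call and the function names are hypothetical, made up purely for illustration. They are not taken from the old intranet or from our web archiving software, and we are not suggesting this is how either actually works.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, as a simple crawler would."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only ordinary anchor links give a crawler somewhere to go next.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def discover_links(html):
    """Return the links a simple crawler could follow from one page."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Hypothetical public-facing page: navigation is expressed as ordinary links.
public_page = '<nav><a href="/news">News</a> <a href="/policies">Policies</a></nav>'

# Hypothetical document-management page (an assumption for illustration):
# navigation is triggered by scripts, so the markup contains no links.
dms_page = (
    '<div onclick="openFolder(42)">News</div>'
    '<div onclick="openFolder(43)">Policies</div>'
)

print(discover_links(public_page))  # ['/news', '/policies']
print(discover_links(dms_page))     # []
```

In this sketch the first page gives the crawler two pages to visit next, while the second gives it nothing to follow, even though both might look similar to a person using a browser.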

We considered using helpful open source tools such as Conifer, a web archiving tool which is good at capturing interactive sites, and HTTrack, which can download the files which make up a website.

Unfortunately, NRS, like many other organisations, only enabled access to the intranet when users were logged onto the organisation’s network. There are strict security protocols around the network, including restrictions on downloading and running software. These prevented us from testing and deploying open source tools in the short timeframe we had before the old intranet closed.

What We Did 

We therefore reverted to exporting content out of the old intranet and saving it in a format which can be easily preserved: PDFs. These copies would not preserve the structure of the site but they would provide a visual representation of the layout and presentation of content.  

A news item from the NRS intranet in March 2015, focusing on the Declaration of Arbroath.
Extract from news PDF export

We then faced a further challenge.  Because of the intranet’s underlying software, the project team could only export news items, not the whole site. Luckily, news items were one of the key parts of the site that we wanted to preserve!  

Information Governance colleagues then identified other important pages and took screenshots. Like PDF exports, these may not capture the full look and feel of the site but they do give an idea of how key information was presented.  

In all, the project team exported 1.5GB of news items as three PDF files. The team also created OCR (Optical Character Recognition) versions of these files. Because these versions are machine readable, the text within them is searchable.

As part of the migration, the project team also created a lot of metadata about the site, including a list of all pages and page views. This adds valuable context to exported content and hopefully enables future users to understand why certain decisions were taken. 

Conclusions 

Archiving the staff intranet was a collaborative effort between NRS IT, Information Governance and the Digital Records Unit. It required all teams involved to work to tight deadlines, to be adaptable, and to work together to find solutions. 

Archiving had not originally been part of the project and it would not have succeeded without the project team’s flexibility and willingness to engage. This type of partnership working is integral when trying to preserve digital records.  

Although we did not manage to create an interactive snapshot of the intranet, we did capture a record of how information was presented, and preserved the news articles which would have been lost in the transition to the new intranet.  

This project underlines the challenges that intranets can present to digital archiving, and the importance of making sure that the best does not become the enemy of ‘good enough’.  

Lynn Bruce

Digital Records Unit

National Records of Scotland
