Misra D, Seamans J, Thoma GR.
Testing the Scalability of a DSpace-based Archive.
Proc. IS&T Archiving 2008. Bern, Switzerland. June 2008:36-40.
The implementation of production-level, large-scale archives is often based on research prototypes that possess essential functions and characteristics, e.g., storage capacity, ingest, metadata recording, and the ability to migrate to newer formats. However, a key characteristic that is often overlooked is scalability, i.e., the ability of the system to accommodate large numbers of items without compromising performance during ingest, indexing, or access. Here we describe an investigation of archive scalability in a Java-based system, the System for the Preservation of Electronic Resources (SPER), which was built by an R&D team at the U.S. National Library of Medicine to investigate various aspects of digital preservation. SPER uses DSpace as the underlying infrastructure for building and managing the digital archive. To confirm the capability of SPER/DSpace to serve as a large archive, we conducted scalability tests by generating and ingesting data for more than a million items, and studied ingest behavior as a function of archive size. This paper describes the test procedure and environment, the software developed to measure performance during ingest, and the characteristics of the ingested data. We present the results, which confirm that SPER/DSpace scales with acceptable ingest performance as the archive grows to a million items.