Note
This technote is not yet published.
We describe the architectural choices and tradeoffs for the large-scale object storage to be used at the USDF.
1 Introduction
Over the decade-long planned operation of the Rubin telescope, we expect on the order of 50+ PB of raw images and over 600 PB of other data. The challenges and decisions around the storage architecture, deployment, maintenance, and lifecycle are documented here.
2 Data Requirements and Challenges
2.1 Data Volume, Variety, and Access Patterns
- overview of the types of data products and their sizes
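As a rough sense of scale, the volumes quoted in the introduction imply a sustained ingest rate of tens of PB per year. A back-of-envelope sketch (the 10-year survey duration is an assumption here, and both volumes are lower bounds):

```python
# Back-of-envelope ingest rate from the figures in the introduction:
# 50+ PB of raw images and over 600 PB of other data.
RAW_PB = 50        # lower bound, raw images
OTHER_PB = 600     # lower bound, other data products
SURVEY_YEARS = 10  # assumed survey duration

total_pb = RAW_PB + OTHER_PB
annual_pb = total_pb / SURVEY_YEARS
print(f"total >= {total_pb} PB, average ingest >= {annual_pb:.0f} PB/year")
```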
2.2 Challenges and Remediations
- highlight specific things we need to worry about
- high-level possible solutions
3 Architectural Motivation
- repeatable: well documented
- scalable: both management and performance
- robust: tiering
4 Technical Design
4.1 Hardware
- define standard building blocks of storage
- define performance envelopes
- define resilience of solutions
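To make the capacity/resilience tradeoff of a building block concrete, a minimal sketch of usable-capacity arithmetic under erasure coding (the drive count, drive size, and 8+3 coding scheme below are illustrative, not chosen values):

```python
def usable_capacity_tb(drives, drive_tb, data_shards, parity_shards):
    """Usable capacity of an erasure-coded pool of identical drives.

    Efficiency is data/(data+parity); the parity shards are the raw
    capacity traded away for resilience to shard loss.
    """
    efficiency = data_shards / (data_shards + parity_shards)
    return drives * drive_tb * efficiency

# Illustrative: a 102-drive enclosure of 18 TB drives, 8+3 erasure coding
raw_tb = 102 * 18
usable_tb = usable_capacity_tb(102, 18, 8, 3)
print(f"raw {raw_tb} TB -> usable {usable_tb:.0f} TB")
```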
4.2 Software
- overview of software solutions
- pros/cons of Ceph and MinIO
- supportability of solutions
5 Operational Processes
5.1 Deployment
- large amounts of storage added per year
- easy to deploy, consistent and repeatable
5.2 Monitoring
- hardware and software steady state and failure reporting requirements
- environmentals and zoning?
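Both candidate software stacks export metrics in the Prometheus text format, so steady-state and failure reporting can share one pipeline. A minimal sketch of consuming that format (the sample payload and metric names are illustrative, modeled on MinIO's exporter, not captured output):

```python
# Illustrative Prometheus text-format payload; real metrics would be
# scraped from the storage service's metrics endpoint.
sample = """\
# HELP minio_cluster_disk_offline_total Total offline disks
# TYPE minio_cluster_disk_offline_total gauge
minio_cluster_disk_offline_total 2
minio_cluster_disk_online_total 100
"""

def parse_metrics(text):
    """Parse simple Prometheus text-format lines into {name: value}."""
    metrics = {}
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue  # skip HELP/TYPE comments and blank lines
        name, value = line.rsplit(" ", 1)
        metrics[name] = float(value)
    return metrics

m = parse_metrics(sample)
if m.get("minio_cluster_disk_offline_total", 0) > 0:
    print("ALERT: offline disks:", int(m["minio_cluster_disk_offline_total"]))
```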
5.3 Common Tasks
- what does hardware failure look like? disks, servers, racks?
- what are the high level responsibilities and roles required?
5.4 Lifecycle
- what procedures required to replace hardware?
6 Proof of Concept
6.1 Scope and Objectives
- why kubernetes?
- describe why MinIO and direct-csi
6.2 Deployment Experience
- deployment steps and references
6.3 Operational Experience
- performance benchmarking
- simulating failures
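For benchmarking, a small harness that times PUTs against any storage callable keeps the measurement logic separate from the backend under test. A sketch (the in-memory stand-in below exists only so the harness runs anywhere; in the proof of concept the callable would wrap the S3 client's `put_object`):

```python
import time

def measure_put_throughput(put, payload, n=100):
    """Time n PUTs of `payload` via callable put(key, data); return MB/s."""
    start = time.perf_counter()
    for i in range(n):
        put(f"bench/obj-{i}", payload)
    elapsed = time.perf_counter() - start
    return n * len(payload) / elapsed / 1e6

# In-memory stand-in backend so the harness is runnable as-is; swap in a
# wrapper around the real object-store client for actual benchmarking.
store = {}
mbps = measure_put_throughput(
    lambda key, data: store.__setitem__(key, data),
    b"x" * 1_000_000,  # 1 MB objects
    n=50,
)
print(f"{mbps:.0f} MB/s (in-memory stand-in)")
```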
7 Initial Hardware and Software Choices
- Dell XE7100 vs WD Data102 vs Seagate 4U100
- what and why