Benchmark: NEBULA by NASA, Introduction

Project Name: Nebula
Official Home: http://nebula.nasa.gov

About the NEBULA Cloud

NEBULA is a Cloud Computing environment developed at NASA Ames Research Center, integrating a set of open-source components into a seamless, self-service platform. It provides high-capacity computing, storage and network connectivity, and uses a virtualized, scalable approach to achieve cost and energy efficiencies.

The fully integrated nature of the NEBULA components allows rapid development of policy-compliant, secure web applications, encourages code reuse, and improves the coherence of NASA's collaborative web applications. It is used for Education and Public Outreach, for collaboration and public input, and for mission support.

Built from the ground up around principles of transparency and public collaboration, NEBULA is also an open-source project. NEBULA is built on the NEBULA platform.

For more details on the components, technologies, and services of NASA's Cloud, read onwards.

NEBULA Services

The NEBULA platform offers a turnkey Software-as-a-Service experience that can rapidly address the requirements of a large number of projects. However, each component of the NEBULA platform is also available individually; thus, NEBULA can also serve in Platform-as-a-Service or Infrastructure-as-a-Service capacities.

Figure: NEBULA system components (nebula-system-components.png)
  • Virtualization
  • Storage
  • App Framework
  • Integrated Development Environment
  • Enterprise Search

The SaaS functionality of the NEBULA Cloud includes typical moderation workflows, terms of service, and several levels of basic policy compliance, security, and software assurance. Users who want to use the underlying NEBULA components directly will be required to pass the necessary security reviews, content reviews, and legal certifications themselves.

Services: Virtualization

On-demand virtualization

Eucalyptus is an open-source, API-compatible clone of the Amazon Web Services (AWS) cloud platform. It gives NASA researchers, should they require it, the simplest possible route to on-demand computing capacity. All AWS-compatible tools will work "out-of-the-box" or with minor customization. As an added benefit, the virtual server images used by NASA teams within NEBULA can easily be run on EC2 (or other Xen-based virtualization environments) by outside partners, collaborators, or independent researchers.
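To illustrate what API compatibility means in practice, here is a minimal sketch of an EC2-style Query API request aimed at a Eucalyptus front end instead of Amazon. It assumes the front end accepts AWS Signature Version 2 requests. The host name, credentials, API version string, and the function name signed_query_url are hypothetical placeholders, not values taken from NEBULA.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote, urlsplit


def signed_query_url(endpoint, access_key, secret_key, action, timestamp,
                     version="2009-04-04"):
    """Build a Signature Version 2 signed GET URL for an EC2-style Query API.

    The endpoint is the only thing that changes between Amazon and an
    API-compatible clone; the request format itself stays the same.
    """
    parts = urlsplit(endpoint)
    params = {
        "AWSAccessKeyId": access_key,
        "Action": action,
        "SignatureMethod": "HmacSHA256",
        "SignatureVersion": "2",
        "Timestamp": timestamp,
        "Version": version,  # assumed API version, adjust for the real service
    }
    # Canonical query string: parameters sorted by name, RFC 3986 encoded.
    canonical = "&".join(
        f"{quote(name, safe='-_.~')}={quote(value, safe='-_.~')}"
        for name, value in sorted(params.items())
    )
    string_to_sign = "\n".join(
        ["GET", parts.netloc.lower(), parts.path or "/", canonical]
    )
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    signature = quote(base64.b64encode(digest).decode("ascii"), safe="-_.~")
    return f"{endpoint}?{canonical}&Signature={signature}"


if __name__ == "__main__":
    # Hypothetical endpoint and credentials, for illustration only.
    print(signed_query_url(
        "http://euca.example.invalid:8773/services/Eucalyptus",
        "EXAMPLEACCESSKEY", "example-secret-key",
        "DescribeInstances", "2010-01-01T00:00:00Z"))
```

In real use an existing AWS client library would build and send such requests; the sketch only shows that pointing the same request at a different endpoint is the whole change.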

Eucalyptus was originally developed by researchers at the University of California at Santa Barbara. Their paper presenting Eucalyptus is insightful, and provides a lot of background on the concepts and opportunities of IaaS.

Additional information regarding Eucalyptus is available on the Eucalyptus website.

Services: Storage

Super-computer class storage

NEBULA uses the Lustre clustering file system (an open-source project maintained by Sun) to provide highly scalable storage in the hundreds and thousands of terabytes. This is deployed on a cluster of 64-bit storage nodes, allowing nearly unlimited individual file size, and connected to a dedicated 10GigE network. Never before has such research-grade computing been available in a web application platform.

(Additional information about the Lustre file system is available on the Lustre website.)

Data storage has a set of critical characteristics that define the requirements of any storage solution.

  • Reliability
  • Permanence
  • Performance

Evaluating these characteristics in the context of cloud applications, Nebula identifies five classes of storage.

  • Operating System
  • Temporary Storage
  • Computed Data
  • Gold Standard Data
  • Backups

We fulfill the requirements of these classes of storage in three distinct hardware configurations:

Primary Lustre Nodes

These are rack-dense, 2RU servers with 12 SATA drives in a RAID 6 configuration. They are equipped with 10GigE network ports. Our current configuration uses 1TB drives, for non-blocking access to 10TB of usable storage per server. This provides us with 4 CPU cores, and 5TB of usable storage, per rack unit, at non-blocking network speeds.
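The capacity figures above follow from simple RAID 6 arithmetic, sketched below. The function and constant names are illustrative, and the calculation ignores filesystem overhead and the difference between decimal and binary terabytes.

```python
def raid6_usable_tb(drive_count, drive_size_tb):
    """Usable capacity of a RAID 6 array.

    RAID 6 reserves the equivalent of two drives for parity, so the
    usable space is (drive_count - 2) * drive_size.
    """
    if drive_count < 4:
        raise ValueError("RAID 6 needs at least four drives")
    return (drive_count - 2) * drive_size_tb


# Figures taken from the description of a Primary Lustre Node.
DRIVES_PER_SERVER = 12
DRIVE_SIZE_TB = 1
RACK_UNITS_PER_SERVER = 2
CORES_PER_RACK_UNIT = 4

usable_tb = raid6_usable_tb(DRIVES_PER_SERVER, DRIVE_SIZE_TB)
usable_tb_per_rack_unit = usable_tb / RACK_UNITS_PER_SERVER
cores_per_server = CORES_PER_RACK_UNIT * RACK_UNITS_PER_SERVER

if __name__ == "__main__":
    print(f"usable per server:    {usable_tb} TB")
    print(f"usable per rack unit: {usable_tb_per_rack_unit} TB")
    print(f"cores per server:     {cores_per_server}")
```

With 12 drives of 1TB each, two drives' worth of space goes to parity, which leaves the 10TB per server and 5TB per rack unit quoted above.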

(We are currently evaluating the use of even higher-density equipment, such as the Sun xFire servers, which can provide disk density up to 2X that of this configuration. However, network connectivity becomes a bottleneck in these configurations, and they have a much lower ratio of CPU to disk. Additionally, they are substantially more expensive per gigabyte.)

These primary nodes provide large quantities of temporary and computed data storage.

SAN-backed Lustre Nodes

For gold-standard and backup data, we use Fibre Channel-connected LUNs on a multiply-redundant and geographically diverse SAN. This provides us with highly reliable and permanent storage, and removes the connected server as a single point of failure for the storage. However, this comes with a modest performance penalty (our FC connectivity is at 6 gigabit), and a cost per gigabyte that is more than an order of magnitude higher.

Critical Lustre Nodes

For those components of the NEBULA infrastructure where we cannot accept the performance penalty of a SAN connection, and where RAID 6 is insufficiently reliable, we use pairs of Primary Lustre Nodes, configured with RAID 1+0, and mirrored against each other using DRBD in an active-passive pattern. This configuration is used for redundant master database servers and Lustre Metadata Servers.
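The placement of storage classes on hardware described in this section can be summarized as a small lookup table. This is only a sketch of the mapping as stated in the text; the enum and function names are hypothetical, and the Operating System class is left unmapped because the text does not say which configuration holds it.

```python
from enum import Enum


class StorageClass(Enum):
    OPERATING_SYSTEM = "operating system"
    TEMPORARY = "temporary storage"
    COMPUTED = "computed data"
    GOLD_STANDARD = "gold standard data"
    BACKUP = "backups"


class NodeType(Enum):
    PRIMARY = "Primary Lustre Node (RAID 6, 10GigE)"
    SAN_BACKED = "SAN-backed Lustre Node (Fibre Channel LUNs)"
    CRITICAL = "Critical Lustre Node (RAID 1+0 pair mirrored with DRBD)"


# Placement as described in the text. Critical nodes are not tied to a
# storage class; they host master database and Lustre metadata servers.
PLACEMENT = {
    StorageClass.TEMPORARY: NodeType.PRIMARY,
    StorageClass.COMPUTED: NodeType.PRIMARY,
    StorageClass.GOLD_STANDARD: NodeType.SAN_BACKED,
    StorageClass.BACKUP: NodeType.SAN_BACKED,
}


def node_type_for(storage_class):
    """Return the node type for a storage class, or None if unspecified."""
    return PLACEMENT.get(storage_class)


if __name__ == "__main__":
    for storage_class in StorageClass:
        node = node_type_for(storage_class)
        print(f"{storage_class.value}: {node.value if node else 'not specified'}")
```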

Every NEBULA account comes with 100GB of storage in the Primary Lustre filesystem. Additional storage is available; please contact your NEBULA account manager.

Services: Web Application Framework

Web Development Framework

After an extensive trade study, the NEBULA team selected Django, a Python-based web application framework, as the first and primary application environment for the Cloud. NEBULA users have access to an extensive collection of open-source Django "apps", providing features ranging from simple blogs, wikis, and discussion forums, to more advanced collaboration suites, image processing, and more.
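In Django, reusable apps are enabled by listing them in a project's settings module. The fragment below is a minimal sketch of that mechanism; the django.contrib entries are standard Django apps, while "blog", "wiki", and "forum" are hypothetical names standing in for whatever apps a NEBULA project would actually install.

```python
# settings.py (fragment)

# Apps that ship with Django itself.
DJANGO_APPS = [
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.sites",
]

# Hypothetical reusable apps, named here for illustration only.
PROJECT_APPS = [
    "blog",
    "wiki",
    "forum",
]

# Django loads models, templates, and URL handlers from every app listed.
INSTALLED_APPS = DJANGO_APPS + PROJECT_APPS
```

Adding a feature such as a wiki then amounts to installing the app and adding one line to this list, which is what makes a shared collection of apps practical.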

We expect to make the full text of this trade study available shortly - please check back soon.

Additional information on the Django web application framework can be found on the Django Project website.

Services: IDE

Development IDE

Software doesn't write itself. For smaller, simpler web applications, many NEBULA users will be completely satisfied with customizing the look-and-feel of the available NEBULA "pluggables". Most NEBULA users, however, will want to take advantage of our integrated development environment.

The most popular publicly available Cloud Computing platforms (Amazon's AWS, Google App Engine) typically rely on free services (such as GitHub, Google Code, or SourceForge) to support source code revision control, documentation, and bug tracking. Users are left to their own devices for Continuous Integration and automated testing.

Every NEBULA project comes with a dedicated Subversion repository, a Trac+Agilo project management interface (complete with documentation wiki, bug tracking system, and Agile project management solution), and a staging environment using Continuous Integration and Selenium Grid-based automated testing. Test metrics include cyclomatic complexity and site compliance with a variety of government policies, including Section 508, COPPA, and others.
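As an illustration of the cyclomatic complexity metric mentioned above, the sketch below approximates it for Python source by counting branch points in the syntax tree. This is a simplified approximation written for this page, not the tool used in the NEBULA staging environment, and the function name is hypothetical.

```python
import ast

# Node types that each add one independent path through the code.
_BRANCH_NODES = (
    ast.If, ast.For, ast.While, ast.IfExp,
    ast.ExceptHandler, ast.comprehension,
)


def cyclomatic_complexity(source):
    """Approximate cyclomatic complexity of a piece of Python source.

    Starts at 1 (a single straight-line path) and adds one for every
    branch point. Each extra operand of "and"/"or" also counts, since
    short-circuit evaluation creates another path.
    """
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            complexity += len(node.values) - 1
    return complexity


SAMPLE = '''
def classify(values):
    for v in values:
        if v < 0 and v % 2:
            return "negative odd"
    return "other"
'''

if __name__ == "__main__":
    # One base path, plus the loop, the if, and the "and": 4.
    print(cyclomatic_complexity(SAMPLE))
```

A continuous integration job could run a check like this on every commit and flag functions whose score exceeds an agreed threshold.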

Services: Search

Enterprise Search

NEBULA uses the Solr search and indexing engine (built on Lucene and running in Tomcat on the Java JRE), together with the "Solango" Django application, to make all NEBULA content simply and transparently searchable across the entire platform. NEBULA users have implemented tag, author, comment, filename, and even geolocation or date-range searches.
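A tag search combined with a date-range search can be expressed as a single Solr query with filter queries. The sketch below builds such a request URL using standard Solr query parameters (q, fq, rows, wt). The host name and the field names tag and pub_date are assumptions, since the actual NEBULA index schema is not described here, and the function build_solr_query is hypothetical.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, text, tag=None, date_from=None, date_to=None,
                     rows=10):
    """Build a Solr select URL with optional tag and date-range filters.

    Field names "tag" and "pub_date" are assumed for illustration.
    Dates use Solr's ISO 8601 form, e.g. 2009-01-01T00:00:00Z.
    """
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    if tag:
        # Filter queries narrow the result set without affecting scoring.
        params.append(("fq", f'tag:"{tag}"'))
    if date_from or date_to:
        # "*" leaves that side of the range open.
        params.append(
            ("fq", f"pub_date:[{date_from or '*'} TO {date_to or '*'}]")
        )
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical Solr host, for illustration only.
    print(build_solr_query(
        "http://search.example.invalid/solr",
        "mars rover",
        tag="imagery",
        date_from="2009-01-01T00:00:00Z",
        date_to="2009-12-31T23:59:59Z",
    ))
```

The sketch does not escape quotes or other special characters inside the tag value, which a production query builder would need to handle.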

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License