2. Embassy Cloud Version 4

2.1. About Version 4

Embassy v4 is a new implementation of OpenStack, based on the OpenStack Ussuri release. Key differences and benefits of the new Embassy Cloud:

  • EMBL and ELIXIR federated login
  • Octavia Load Balancers (LBaaS)
  • New hardware with GPU support
  • Seamless upgrades to new OpenStack versions
  • Built-in Kubernetes cluster deployment (see Embassy Hosted Kubernetes v3.0 - Magnum)
  • Ceph Storage backend
  • Large pool of public IPs
  • Secret Management

2.2. Service Status

2.3. Requesting Embassy Resources (including Embassy Hosted Kubernetes)

The EBI Resource Usage Portal is the service used internally to request Embassy resources.

As Embassy Hosted Kubernetes v3.0 is now deployed with OpenStack Magnum, it is requested as an “embassy” resource type: users can now deploy their own Kubernetes clusters with Magnum inside an OpenStack project.

The previous “embassy-hosted-kubernetes” resource type is no longer available. It referred to Embassy Hosted Kubernetes v2.0, which is now considered legacy.

Note

Please use How to Size Your Cluster to calculate the quota for your Embassy project if you intend to create Magnum (EHK) Kubernetes clusters.

2.4. Retiring Embassy Projects

All Embassy retirement requests must be submitted via the Resource Usage Portal (RUP).

Please remove all resources within your Embassy project before submitting a retirement request via RUP. These resources include all Kubernetes clusters, load balancers, instances, volumes, snapshots, routers, networks (excluding external networks), security groups, and key pairs.

Please note that no resources or data can be restored after manual deletion or retirement.
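
One way to confirm that a project is empty before submitting the request is to list each resource type named above. The following is a minimal sketch rather than an official tool. It assumes the OpenStack CLI (with the Magnum and Octavia plugins described in section 2.9.2) is installed and that your credentials file has been sourced. The helper names are hypothetical.

```python
import json
import subprocess

# Resource types from the retirement checklist, mapped to the OpenStack CLI
# arguments that list them. "--internal" leaves out the external networks.
LIST_COMMANDS = {
    "Kubernetes clusters": ["coe", "cluster", "list"],
    "load balancers": ["loadbalancer", "list"],
    "instances": ["server", "list"],
    "volumes": ["volume", "list"],
    "snapshots": ["volume", "snapshot", "list"],
    "routers": ["router", "list"],
    "networks": ["network", "list", "--internal"],
    "security groups": ["security", "group", "list"],
    "key pairs": ["keypair", "list"],
}


def build_command(args):
    """Return the full CLI invocation with JSON output enabled."""
    return ["openstack", *args, "-f", "json"]


def leftovers(listings):
    """Return the resource types that still contain at least one item."""
    return [name for name, items in listings.items() if items]


def collect_listings():
    """Run every list command and parse its JSON output (needs credentials)."""
    listings = {}
    for name, args in LIST_COMMANDS.items():
        result = subprocess.run(
            build_command(args), check=True, capture_output=True, text=True
        )
        listings[name] = json.loads(result.stdout)
    return listings
```

Calling leftovers(collect_listings()) returns the resource types that still need cleaning up. The project's default security group normally remains listed, so treat that entry as expected.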

2.6. How to Login

Navigate to https://uk1.embassy.ebi.ac.uk

Select your identity provider (currently EMBL or ELIXIR)

Once you have authenticated for the first time you will automatically be placed in an EBI-Sandbox project. This process creates a user object in Openstack in your chosen federated domain (EMBL or ELIXIR).

At this point, notify Help Desk that you have authenticated to OpenStack: use the service portal if you have an EBI user account, or the external form if you do not. Help Desk will then associate your federated user with the project already created for you (which you or your sponsor will have requested via the EBI Resource Usage Portal).

Note

If you do not have a federated account in EMBL or ELIXIR you will have to apply for an EMBL-EBI collaborator account. Please contact your sponsor/GTL to arrange this as part of your Embassy application process.

2.7. Quick Start

Here is a quick start video demonstrating SSH to a newly created Openstack instance



Warning

In production, you will probably want to restrict SSH access to a range of IPs that you will use to access the instances.

Note

Restricting access with the right Security Group rules gives your instances more protection against attack. It’s important to remember that you are responsible for the security of your instances, and the Internet is a dangerous place.

2.8. Selecting An Appropriate Deployment Zone

We have several availability zones for running different workloads. By default, “Any Availability Zone” is selected; please do not leave this default. You must manually select the zone that matches the hypervisor specification you need, as described below:

Table 2.1 Availability Zones
Name               Use Case         vCPU  Arch   GPU               Memory (GB)  Disk (GB)  Disk Type
nova               Default Generic  64    Intel  No                384          245        SSD
amd                High Memory      128   AMD    No                3800         1500       SSD
gpu (coming soon)  GPU              64    Intel  NVIDIA Tesla M10  384          245        SSD

Warning

Zones “amd” and “gpu” are reserved for use cases agreed with the Cloud Team. Please ask permission before deploying in these zones. Otherwise your instances may be shut down and migrated to make resources available for planned use cases.
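
The zone rules above can be encoded as a small pre-flight check in a deployment script. This is only a sketch: the zone names come from Table 2.1, while the function names and the approved flag are hypothetical. The --availability-zone option is the standard `openstack server create` option for pinning a zone.

```python
# Zones from Table 2.1. "restricted" marks the zones that are reserved for
# use cases agreed with the Cloud Team ("gpu" is listed as coming soon).
ZONES = {
    "nova": {"use_case": "Default Generic", "restricted": False},
    "amd": {"use_case": "High Memory", "restricted": True},
    "gpu": {"use_case": "GPU", "restricted": True},
}


def check_zone(zone, approved=False):
    """Validate an explicitly chosen availability zone and return it.

    `approved` is a hypothetical flag meaning the Cloud Team has agreed
    to the use of a restricted zone.
    """
    if not zone:
        raise ValueError("select a zone; do not rely on 'Any Availability Zone'")
    if zone not in ZONES:
        raise ValueError(f"unknown zone {zone!r}; expected one of {sorted(ZONES)}")
    if ZONES[zone]["restricted"] and not approved:
        raise ValueError(f"zone {zone!r} requires agreement with the Cloud Team")
    return zone


def zone_arguments(zone, approved=False):
    """Return the `openstack server create` arguments that pin the zone."""
    return ["--availability-zone", check_zone(zone, approved)]
```

For example, zone_arguments("nova") yields the two arguments to append to a server create command, while zone_arguments("amd") raises an error until approved=True is passed.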

2.9. Using the OpenStack CLI

You can also use the OpenStack CLI tool to interact directly with your project.

2.9.1. Retrieving Credentials

To download your credentials file, log in to the OpenStack dashboard and click on Identity > Application Credentials > Create Application Credential. Only the Name of the credential needs to be filled in.

Note

For some applications, such as creating Heat stacks, you will need to use unrestricted credentials. This is achieved by ticking the Unrestricted box when creating credentials as above. As per the documentation: “Unrestricted: By default, for security reasons, application credentials are forbidden from being used for creating additional application credentials or keystone trusts. If your application credential needs to be able to perform these actions, check ‘unrestricted’.”

2.9.2. Installing the Client

An example of how to install and use the OpenStack CLI follows:

$ which python3
/usr/bin/python3

$ apt-cache policy virtualenv
virtualenv:
  Installed: 15.0.1+ds-3ubuntu1

$ virtualenv -p /usr/bin/python3 venvpy3

$ source venvpy3/bin/activate

(venvpy3) $ pip install python-openstackclient
(venvpy3) $ pip install python-magnumclient
(venvpy3) $ pip install python-octaviaclient

# Test: source credentials file and get a list of networks in this project
(venvpy3) $ source ~/embassy/developmentrc.sh
(venvpy3) $ openstack network list
+--------------------------------------+---------------+--------------------------------------+
| ID                                   | Name          | Subnets                              |
+--------------------------------------+---------------+--------------------------------------+
| e25c3173-bb5c-4bbc-83a7-f0551099c8cd | ext-net-36    | 3c926da4-b320-4320-8d62-f70e2078a2fd |
| 2d771d9c-f279-498f-8b8a-f5c6d83da6e8 | ext-net       | b5c8ea12-6729-495c-9cfd-8a56557a8bff |
| 7421d53d-6467-4f29-9d4f-e96e8c85ecd8 | ext-net-31    | 69868395-d808-4e48-a10a-79854258aa1e |

[..]

2.9.3. Terraform

If you want to use Terraform, you can do so with the Application Credential you created in the steps above. Define your OpenStack provider by specifying the application credential ID and secret. Make sure no other OpenStack OS_ environment variables are set, to avoid conflicts.

Listing 2.1 init.tf
 terraform {
   required_version = ">= 0.14.0"
   required_providers {
     openstack = {
       source = "terraform-provider-openstack/openstack"
     }
   }
 }

Listing 2.2 providers.tf
 provider "openstack" {
   user_name = "username@ebi.ac.uk"
   application_credential_id = "1521e176a6874bd6a71f407291d7be08"
   application_credential_secret = "my-super-long-app-secret"
   tenant_name = "my-tenancy"
   auth_url = "https://uk1.embassy.ebi.ac.uk:5000"
   region = "RegionOne"
 }
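
Because leftover OS_ variables from a sourced credentials file can conflict with the provider block, it can help to check for them before running Terraform. The sketch below is one possible approach, not part of the Embassy tooling. The function names are hypothetical, and run_terraform assumes the terraform binary is on your PATH.

```python
import os
import subprocess


def stray_openstack_vars(env):
    """Return the names of all OS_* variables present in `env`, sorted."""
    return sorted(name for name in env if name.startswith("OS_"))


def without_openstack_vars(env):
    """Return a copy of `env` with every OS_* variable removed."""
    return {
        name: value for name, value in env.items() if not name.startswith("OS_")
    }


def run_terraform(*args):
    """Run terraform with a cleaned environment (requires terraform on PATH)."""
    return subprocess.run(
        ["terraform", *args],
        env=without_openstack_vars(os.environ),
        check=True,
    )
```

stray_openstack_vars(os.environ) reports what is currently set, and run_terraform("plan") runs the plan without those variables while leaving your shell untouched.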

2.10. Images

An OpenStack Compute cloud is not very useful unless you have Virtual Machine images (or Virtual Appliances).

What is a virtual machine image?

A virtual machine image is a single file which contains a virtual disk that has a bootable operating system installed on it.

The Cloud team do not provide images. Images may be available in Embassy that users have made public. Please do not use these images as you cannot trust the image provenance, and the image and related metadata may be changed under your feet at any time causing issues for your application.

Tenants must deploy and manage their own images to get total control of their application and pipeline dependencies. The simplest way to obtain a virtual machine image that works with OpenStack is to download one that someone else has already created. Check this URL to find publicly available images.

Images must be maintained and updated by tenants to ensure they are secure and pass CVSS standards.

2.11. Storage

2.11.1. Root Disks

Instances need to have storage for root disk space. By default this is Cinder storage in Horizon (‘Create New Volume’).

Note

There is very limited ephemeral (local hypervisor) storage. Please select Cinder storage for instances. If you try to use ephemeral storage you may see the error ‘No valid host was found’ due to a lack of available resources.
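
From the CLI, the equivalent of choosing Cinder storage is to create a bootable volume from an image and then boot the instance from that volume. The sketch below only assembles the two commands so they can be reviewed before running. All resource names in the usage note (image, flavor, network, key pair) are hypothetical placeholders; the options themselves are standard OpenStack CLI options.

```python
import shlex


def volume_create_command(name, image, size_gb):
    """Command that creates a bootable Cinder volume from an image."""
    return [
        "openstack", "volume", "create",
        "--image", image,
        "--size", str(size_gb),
        "--bootable",
        name,
    ]


def server_create_command(name, flavor, volume, network, key_name, zone):
    """Command that boots an instance from an existing Cinder volume."""
    return [
        "openstack", "server", "create",
        "--flavor", flavor,
        "--volume", volume,
        "--network", network,
        "--key-name", key_name,
        "--availability-zone", zone,
        name,
    ]


def as_shell(command):
    """Render a command list as a copy-and-paste shell line."""
    return shlex.join(command)
```

For instance, as_shell(volume_create_command("web-root", "my-image", 20)) gives a line you can paste into a shell, followed by the server create command that references the volume "web-root".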

2.11.2. S3 Object Store

Embassy users can request access to our Object Stores, which provide an S3-compatible Object Storage backend. Typical Object Store use cases include:

  • No requirement for a POSIX filesystem
  • Large datasets
  • Unstructured data
  • Backups
  • Archiving

If you would like to use our Object Stores or simply explore the technology, please send us an email to embassycloud@ebi.ac.uk and we will create an environment for you.

Note

  • Our s3 compatible Object Store is not backed up. Please make sure you don’t use it as your only backup target.

2.11.2.1. S3

In order to use our S3-compatible object store, you can download the AWS Command Line Interface (awscli) from https://aws.amazon.com/cli/. Alternatively, you can use https://github.com/s3tools/s3cmd, another interface written in Python.

Examples of use:

$ export AWS_ACCESS_KEY_ID=yourAccessKeyId
$ export AWS_SECRET_ACCESS_KEY=yourSecretAccessKey
$ export AWS_DEFAULT_REGION=us-east-1

(or)

$ aws configure
#(and follow the steps)

# Create a bucket
$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3 mb s3://testbucket
make_bucket: testbucket

# List buckets
$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3 ls
2021-01-29 15:35:21 testbucket

# Upload file to bucket
$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3 cp helloworld.txt s3://testbucket/
upload: helloworld.txt

# List files within bucket
$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3 ls s3://testbucket/
2021-01-29 15:36:51 13 helloworld.txt

# Upload directory including only jpgs and txts
$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3 cp /tmp/foo/ s3://testbucket/ --recursive --exclude "*" --include "*.jpg" --include "*.txt"

# Generate a temporary url for users to download a given object - default expiration time 3600s
$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3 presign s3://testbucket/myobject
https://uk1s3.embassy.ebi.ac.uk/testbucket/myobject?AWSAccessKeyId=ozB4pHyzrPUjXo1fw57&Signature=pG3xRpKyTuxQq8xatRUusJ6oE%3D&Expires=1526574462

# Obtain the used space and number of objects (result displayed in bytes)
$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3api list-objects --bucket testbucket --output json --query "[sum(Contents[].Size), length(Contents[])]"

# Delete a bucket (use --force if it's not empty)
$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3 rb --force s3://testbucket
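
The query parameters in the sample presign output above (AWSAccessKeyId, Signature, Expires) are those of AWS Signature Version 2 query-string authentication. To illustrate what such a URL contains, here is a minimal standard-library sketch of that signing scheme for a GET request with path-style addressing. It is for understanding only and assumes the endpoint accepts this signature version, as the sample output suggests. In practice, use `aws s3 presign` or an SDK.

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import quote


def presign_v2(endpoint, bucket, key, access_key, secret_key,
               expires_in=3600, now=None):
    """Build a Signature Version 2 presigned GET URL (path-style)."""
    now = int(time.time()) if now is None else int(now)
    expires = now + expires_in  # absolute expiry as a Unix timestamp
    resource = f"/{bucket}/{quote(key)}"
    # StringToSign: verb, Content-MD5, Content-Type, Expires, resource.
    # Content-MD5 and Content-Type are empty for a plain GET.
    string_to_sign = f"GET\n\n\n{expires}\n{resource}"
    digest = hmac.new(
        secret_key.encode(), string_to_sign.encode(), hashlib.sha1
    ).digest()
    signature = quote(base64.b64encode(digest).decode(), safe="")
    return (
        f"{endpoint}{resource}"
        f"?AWSAccessKeyId={quote(access_key, safe='')}"
        f"&Signature={signature}&Expires={expires}"
    )
```

The Expires value is the creation time plus the lifetime in seconds, which is why the default of 3600 gives a link that is valid for one hour.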

2.11.2.2. AWS SDK for Python (boto3)

https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-examples.html

Example code:

import boto3

# Create the low-level functional client
client = boto3.client(
    's3',
    aws_access_key_id='***********',
    aws_secret_access_key='*******************',
    endpoint_url='https://uk1s3.embassy.ebi.ac.uk',
    region_name='us-east-1',
)

# Fetch the list of existing buckets
clientResponse = client.list_buckets()

# Print the bucket names one by one
print('Printing bucket names...')
for bucket in clientResponse['Buckets']:
    print(f'Bucket Name: {bucket["Name"]}')
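
The used-space query shown earlier with the CLI can also be done from Python. S3 list calls are paged (AWS returns at most 1,000 keys per call), so this sketch uses boto3's paginator for list_objects to walk every page. It takes a client such as the one created above; the function name bucket_usage is hypothetical.

```python
def bucket_usage(client, bucket):
    """Return (total_bytes, object_count) for a bucket.

    `client` is a boto3 S3 client; every page of the listing is visited,
    so buckets with many objects are counted in full.
    """
    total_bytes = 0
    object_count = 0
    paginator = client.get_paginator("list_objects")
    for page in paginator.paginate(Bucket=bucket):
        # An empty bucket yields a page without a "Contents" key.
        for obj in page.get("Contents", []):
            total_bytes += obj["Size"]
            object_count += 1
    return total_bytes, object_count
```

A call such as bucket_usage(client, "testbucket") returns the size in bytes together with the number of objects.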

2.11.2.3. S3 Java SDK

If your Java application needs to interact with our S3-compatible Object Store you have two options:

  • Amazon Java SDK: if your application requires portability to Amazon.
  • IBM Cloud Object Storage Java SDK: if you would like to use all the features of our Object Store, use the SDK provided directly by the vendor. Examples and code repositories are linked from the SDK name.

Example code for Amazon Java SDK:

import com.amazonaws.auth.EnvironmentVariableCredentialsProvider;
import com.amazonaws.client.builder.AwsClientBuilder.EndpointConfiguration;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ListObjectsRequest;
import com.amazonaws.services.s3.model.ObjectListing;
import com.amazonaws.services.s3.model.S3ObjectSummary;

public class EBIaws {

  public static void main(String[] args) {

    // Credentials are read from the AWS_ACCESS_KEY_ID and
    // AWS_SECRET_ACCESS_KEY environment variables.
    final AmazonS3 s3 = AmazonS3ClientBuilder.standard()
        .withEndpointConfiguration(
            new EndpointConfiguration("https://uk1s3.embassy.ebi.ac.uk", "us-east-1"))
        .withCredentials(new EnvironmentVariableCredentialsProvider())
        // Path-style access: the bucket goes in the URL path, not the hostname
        .withPathStyleAccessEnabled(true)
        .build();

    // List the objects in the bucket "mybucket"
    System.out.println("Listing objects");
    ObjectListing objectListing = s3.listObjects(new ListObjectsRequest()
        .withBucketName("mybucket"));
    for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
      System.out.println(" - " + objectSummary.getKey() + "  " +
                         "(size = " + objectSummary.getSize() + ")");
    }
    System.out.println();
  }
}

2.11.3. Volume Issues

Occasionally, Cinder volumes get stuck in an unwanted state that prevents deletion.

Users have been granted permission to change the state of a volume to error so that it can be deleted:

openstack volume list --project [project-name] -f json | jq -r '.[]|select(.Status=="error_deleting")|.ID'| xargs -I {} openstack volume set --state error {}
openstack volume list --project [project-name] -f json | jq -r '.[]|select(.Status=="error")|.ID'| xargs -I {} openstack volume delete {}
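
If jq is not available, the same filtering can be done in Python on the JSON produced by `openstack volume list -f json`. This is a sketch with hypothetical function names. It only selects IDs and builds the commands shown above, so you can inspect them before running anything.

```python
import json


def volume_ids_with_status(volume_list_json, status):
    """Return the IDs of volumes whose Status equals `status`."""
    volumes = json.loads(volume_list_json)
    return [volume["ID"] for volume in volumes if volume.get("Status") == status]


def reset_state_commands(volume_ids):
    """Commands that move each volume to the 'error' state."""
    return [
        ["openstack", "volume", "set", "--state", "error", volume_id]
        for volume_id in volume_ids
    ]


def delete_commands(volume_ids):
    """Commands that delete each volume."""
    return [["openstack", "volume", "delete", volume_id] for volume_id in volume_ids]
```

Feed the output of the list command to volume_ids_with_status(..., "error_deleting"), run the reset commands, and then build the delete commands for the volumes now in the error state.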

2.12. EBI Data Access

2.12.1. EBI internal databases

Access to internal DBs in Read/Write or Read Only mode is available through the policy below to Embassy tenants and to Embassy Hosted Kubernetes (EHK) users.

The Embassy Tenant will…

  • Contact the internal EBI database owner (GTL) to gain their explicit permission. If the Embassy tenant is not an EBI staff member, this may be easiest done by asking their EBI GTL sponsor to contact the database owner
  • Request access from the EBI DB team by emailing itsupport@ebi.ac.uk

The Database Team will…

If read-only access is requested

  • Contact the Database Owner (GTL) to establish approval for the new user
  • Give you the connection information (a dedicated database user will be required for the read-only Embassy connection, e.g. embassy_ro)
  • Liaise with the Networking/Security team for the opening of the connection port allowing the database user to connect
  • Link all relevant information back to the RT ticket

If read-write access is requested

  • Apply a hardened security profile to the database, ensuring the following (downtime may be required to implement the increased security):
    • No non-standard plugins or extensions are installed that might allow OS commands to be run via SQL (lib_mysqludf_sys in MySQL, “untrusted” languages in PostgreSQL, etc)
    • The latest OS and DB versions and patches are installed as per TSC guidelines, and schedules applied in order to minimize exposure to known bugs
    • There is an encrypted SSL channel for the database connection credentials
    • There is a dedicated database user for the Embassy write connection (e.g. embassy_rw)

The Database Owner (GTL) will…

  • Consider the implications of this new mode of access to any other stakeholders of the same database (Service Teams/Technical Leads) and be content there is consensus to proceed
  • Ensure the internal EBI database does not contain human or other data that requires controlled access
  • Be aware of the security implications of granting write access to the internal database from a location external to the EBI network and continue to maintain full responsibility for the internal database (Embassy is in a DMZ outside the internal EBI Network and is therefore accessible from the Internet, secured by the tenant admin)
  • Approve access for the new user

The Embassy Tenant Admin will…

  • Deploy and maintain strict security procedures to mitigate risk, including:
    • Ensuring the tenancy has Security Groups applied
    • Arranging access via SSH keys controlled through a bastion host with an active firewall
    • Ensuring SSH keys and database credentials are given only to named users and are not redistributed
    • Closing any reported security vulnerabilities as soon as possible

2.12.2. EGA dataset

Please follow these steps for accessing the EGA dataset in EBI:

  1. Install the following dependencies:

~# yum install maven git fuse fuse-libs

  2. Download and build ega-fuse-client:

$ git clone https://github.com/EGA-archive/ega-fuse-client.git
$ cd ega-fuse-client
$ mvn package

  3. Allow non-root users to specify FUSE mount options. Your /etc/fuse.conf file should look like this:

# mount_max = 1000
user_allow_other

  4. Obtain a bearer token from EGA AAI.
  5. Mount the EGA dataset:

$ cd ega-fuse-client
$ mkdir ./mountpoint
$ java -Xmx8G -jar target/EgaFUSE-1.0-SNAPSHOT.jar -t y0urBe4rerTokEnFroMEGAaaii -m ./mountpoint > /preferred/path/ega-fuse-client-`date +%Y%m%d%H%M%S`.log 2>&1 &
$ ls ./mountpoint/Dataset

Note

Expected download speed is 5-10MB/s depending on the network load.
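
If the mount fails for a non-root user, a common cause is that user_allow_other is still commented out in /etc/fuse.conf. The following sketch, with a hypothetical function name, checks the file contents for an active (uncommented) user_allow_other line.

```python
def user_allow_other_enabled(fuse_conf_text):
    """Return True if an uncommented user_allow_other line is present."""
    for raw_line in fuse_conf_text.splitlines():
        # Drop anything after a '#', then surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if line == "user_allow_other":
            return True
    return False
```

It can be used as user_allow_other_enabled(open("/etc/fuse.conf").read()) before starting the FUSE client.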

2.12.3. FiRe Archive

For private access, Embassy Cloud users can access the FiRe archive in the same way they do within EBI. Use your FiRe credentials and access the same endpoint urls for managing your FiRe objects.

The public access endpoint offers a method for accessing files which are publicly available on transfer services, but without the overhead of crossing WAN borders, since this public access service is hosted internally in the EBI network. The public access endpoint is implemented over HTTP, and the file content will be streamed in the HTTP response if a valid HTTP request is made.

The following endpoints can be employed depending on your location. Embassy Cloud is in the Hemel Hempstead data centre.

Table 2.2 Public Endpoints
Endpoint                                   Protocol  Data centre
http://hh.fire.sdo.ebi.ac.uk/fire/public   http      Hemel Hempstead
https://hh.fire.sdo.ebi.ac.uk/fire/public  https     Hemel Hempstead
http://hx.fire.sdo.ebi.ac.uk/fire/public   http      Hinxton
https://hx.fire.sdo.ebi.ac.uk/fire/public  https     Hinxton

You can check FIRE docs here [2].

[2] Until public DNS is updated, please add this entry to your /etc/hosts: 193.62.197.14 docs.fire.ebi.ac.uk

2.12.3.1. Use Case: Accessing ENA data files

This project maintains a collection of endpoints publicly available in FiRe, so you can access ENA data files through FiRe public endpoints. One example using the curl command to download two datasets:

 # Bandwidth test using FIRE endpoint
 curl -o /dev/null https://hh.fire.sdo.ebi.ac.uk/fire/public/era/fastq/ERR226/002/ERR2262402/ERR2262402_2.fastq.gz ;
 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 100 4645M  100 4645M    0     0  84.2M      0  0:00:55  0:00:55 --:--:-- 85.8M

 # Download, saving each file under its remote name (e.g. ERR2262402_1.fastq.gz)
 curl -O https://hh.fire.sdo.ebi.ac.uk/fire/public/era/fastq/ERR226/002/ERR2262402/ERR2262402_1.fastq.gz
 curl -O https://hh.fire.sdo.ebi.ac.uk/fire/public/era/fastq/ERR226/002/ERR2262402/ERR2262402_2.fastq.gz

You can also “stream” the data, as in the example below, and speed up the process with tools such as GNU parallel.

 # Fastq Convert To Fasta
 $ curl -s https://hh.fire.sdo.ebi.ac.uk/fire/public/era/fastq/ERR226/002/ERR2262402/ERR2262402_2.fastq.gz | gunzip -c  | paste - - - - | sed 's/^@/>/g'| cut -f1-2 | tr '\t' '\n' > my.fasta

 $ head my.fasta
 >ERR2262402.1 1/2
 GGAAAACCTTTGCTTCTCTACAACGCGGATCCTGTCCACGACGCCAACGGAGGATGTTCCGCCTACAAGGACGGAACTCACGACTATTCCGATGAAGTGAAGAACTTCTTCACACTCAGGAATATGTGGTGGGGCTACTAC
 >ERR2262402.2 2/2
 GCTCCCGTCGCCGTCCAATTGATCCTTGACGGGTCACATGCAAATATCTGTGTCTGATATGATATAAAAAACCATCCATGGAGGAACATGAAAATATTAAGTTGCCTCAGATTAAGAGAATACCTTCGAGGATAGTTCTTTTTTCGAAGA
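
The same streaming conversion can be written in Python with only the standard library. This sketch mirrors the shell pipeline above: it reads the gzip-compressed FASTQ stream four lines at a time and keeps the header (with the leading @ replaced by >) and the sequence. The function names are hypothetical, and stream_fasta needs network access to the FiRe endpoint.

```python
import gzip
from urllib.request import urlopen


def fastq_to_fasta(lines):
    """Yield FASTA lines from an iterable of FASTQ text lines.

    Each FASTQ record is four lines: header, sequence, separator, qualities.
    """
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, sequence = record[0], record[1]
            if header.startswith("@"):
                header = ">" + header[1:]
            yield header
            yield sequence
            record = []


def stream_fasta(url):
    """Stream a .fastq.gz file over HTTP and yield FASTA lines."""
    with urlopen(url) as response, gzip.open(response, "rt") as fastq:
        yield from fastq_to_fasta(fastq)
```

Because both functions are generators, the file is converted as it arrives and is never held in memory or written to disk in FASTQ form.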

2.13. Security Best Practices

2.13.1. The Shared Model

EMBL-EBI’s Embassy cloud provides collaborators with a secure virtual infrastructure located close to EMBL-EBI’s public data resources. In the Embassy model, systems administration and security of the cloud is a shared responsibility:

Shared Model

2.13.2. Infrastructure Security

This section explains both Service Provider security measures (EBI’s implementation of the Red Hat OpenStack Platform) and project (tenant) security.

2.13.2.1. Service Provider

We are using Red Hat OpenStack Platform, which is an enterprise implementation of OpenStack with built-in security.

We have based our IaaS security model on this document.

In summary:

  1. SELinux is configured to help protect against the bridging of security domains with API access.
  2. Customized controller firewall rules are in place to restrict external access to API ports only.
  3. DMZ creation for the OSP installation gives logical separation from EBI’s internal infrastructure.
  4. Rate limiting is configured in HAProxy to help prevent denial-of-service attacks.
  5. API endpoints are secured with SSL/TLS.
  6. Keystone tokens are time limited and expire.

2.13.2.2. Tenant

Tenant (Project) security is managed with a combination of strict EBI policy and tools made available to tenants like Security Groups and a requirement to enable firewall protection on internet accessible instances.

Tenants are advised to deploy a firewall in addition to Security Groups. This can be on a per instance basis (iptables), firewall appliance (e.g. PFSense), or bastion node.

2.13.3. Security Groups

Security groups control network access to instances; the default security group allows all outbound access (so instances can access the internet), and all access between instances within the same security group, but no incoming access.

This keeps the instances secure but means you can’t contact them from outside the Embassy cloud. You will need to create a new security group that allows SSH access - this will allow you to connect to your instance via secure shell.

Steps for creating your new Security Group in the Openstack Dashboard:

  • Select Project > Network, then click on Security Groups.
  • Click on Create Security Group. Give it a name e.g: SSH-and-ping and a description.
  • To the right of your newly created Security Group, click on Manage Rules.
  • Click on the Add Rule button.
    • Click on the Rule box, where it says Custom TCP Rule, select your protocol (e.g. SSH)
    • Choose the IP range you would like to allow, and write it in the CIDR field.
    • Click on Add.

After these steps, inbound access will be allowed and you will be able to see the rules that currently exist for this security group, including the new one you created.

Warning

Choosing CIDR 0.0.0.0/0 allows everyone on the Internet to reach your SSH port. We recommend restricting SSH access to the range of IPs that you will use to access the instance. This gives you another layer of protection against attack. It’s important to remember that you are responsible for the security of your instances, and the Internet is a fairly dangerous place.
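
The same rule can be created with the OpenStack CLI. As a guard against accidentally opening SSH to everyone, the sketch below validates the CIDR with Python's ipaddress module and refuses a zero-length prefix before building the command. The function name is hypothetical and the address range in the usage note is a documentation-only example range. The options shown are standard `openstack security group rule create` options.

```python
import ipaddress


def ssh_rule_command(group, cidr):
    """Build the CLI command for an inbound SSH rule limited to `cidr`.

    Raises ValueError for malformed CIDRs and for ranges that cover the
    whole Internet (0.0.0.0/0 or ::/0).
    """
    network = ipaddress.ip_network(cidr, strict=True)
    if network.prefixlen == 0:
        raise ValueError("refusing to open SSH to the whole Internet")
    return [
        "openstack", "security", "group", "rule", "create",
        "--ingress",
        "--protocol", "tcp",
        "--dst-port", "22",
        "--remote-ip", str(network),
        group,
    ]
```

For example, ssh_rule_command("SSH-and-ping", "203.0.113.0/24") returns a command restricted to that range, while passing "0.0.0.0/0" raises an error.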

2.13.4. Protect your VMs

Here are some tips for making your deployments more resilient in OpenStack:

  1. For critical instances, use Cinder for storing the OS and data volumes. This is persistent and has the added advantage of 7 daily snapshots.
  2. If you require high availability you will need more than one instance. Make sure they run on different hosts by using anti-affinity rules.
  3. Use our Embassy Cloud S3 Object Store (native Amazon S3 compatible) for your backup jobs.
  4. Do not rely on snapshots as a backup; they are a convenience tool and are not application aware. You should be able to programmatically redeploy all instances using the API (Heat or similar), as this is the Cloud model. Avoid manual, unscripted instance deployment or configuration.
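
In OpenStack, keeping instances on different hosts is done with a server group whose policy is anti-affinity. Instances are then created with a scheduler hint that names the group. The sketch below assembles both pieces. The function names are hypothetical, and the group ID must be taken from the output of the group creation command.

```python
def server_group_command(name):
    """Command that creates a server group with the anti-affinity policy."""
    return [
        "openstack", "server", "group", "create",
        "--policy", "anti-affinity",
        name,
    ]


def group_hint_arguments(group_id):
    """Arguments to add to `openstack server create` to join the group."""
    return ["--hint", f"group={group_id}"]
```

Each instance created with the same hint is scheduled onto a different hypervisor. If no suitable host is free, the request fails rather than co-locating the instances.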