Published on 00/00/0000
Last updated on 00/00/0000
At Banzai Cloud we provision different frameworks and tools like Spark, Zeppelin and, most recently, Tensorflow, all of which run on our Pipeline PaaS (built on Kubernetes). One of Pipeline's early adopters runs a Tensorflow Training Controller using GPUs on AWS EC2, wired into our CI/CD pipeline, which needs significant parallelization for reading training data. To serve that need, we've introduced support for Amazon Elastic File System (EFS), which will be publicly available in the forthcoming release of Pipeline. Besides Tensorflow, they also use EFS for Spark Streaming checkpointing instead of S3 (note that we don't use HDFS at all).
This post walks you through some problems with EFS on Kubernetes and provides a clearer picture of its benefits, before digging into the Tensorflow and Spark Streaming examples in the next post in this series. So, by the end of this blog:
Smells like cloud lock-in to me - not really, Pipeline/Kubernetes can use minio to unlock you, which we'll discuss in another post.
tl;dr1: Who cares? Pipeline automates all this; maybe you'll get a GitHub star once you're done reading.
tl;dr2: I know all this already, I'll just use the EFS provisioner deployment you guys open sourced. Done. Maybe I'll give you a GitHub star as well.
OK, this first step is not hard. You can provision a Kubernetes cluster with Pipeline with a single REST API call: see the Postman collection we created for that purpose, follow this post, or install and launch Pipeline, either by yourself or by launching a Pipeline control plane on AWS with the following Cloudformation template. Easy, isn't it? What's that, there are plenty of options? Just wait until we get to the hosted service. Once the cluster is up and running, you can use the Cluster Info request from Pipeline's Postman collection to get the necessary info about the cluster. You'll need the following:
- the public IP addresses of the nodes, except the master
- the VPC id, Subnet id and Security Group id for the node cluster's network (VpcId, SubnetId, SecurityGroupId)
The following steps require the AWS CLI. The easiest way to get what we need is to ssh into one of the nodes, since the AWS CLI will already be installed there and ready to use. You'll need the same SSH key you provided for Pipeline.
ssh -i yourPrivateKey ubuntu@[Node-Public-Ip]
If you're using the Pipeline control plane, you need to ssh to the control plane instance first. There, you'll find the ssh key for the rest of the nodes at /opt/pipeline/.ssh/id_rsa. Configure the AWS client with aws configure, specifying the AWS region and credentials.
You'll need a unique ID for the file system; install uuid if necessary:
sudo apt install uuid
Create the FileSystem:
aws efs create-file-system --creation-token $(uuid)
{
    "SizeInBytes": {
        "Value": 0
    },
    "CreationToken": "dfa3efaa-e2f7-11e7-b6r3-1b3492c170e5",
    "Encrypted": false,
    "CreationTime": 1515793944.0,
    "PerformanceMode": "generalPurpose",
    "FileSystemId": "fs-c1f34a18",
    "NumberOfMountTargets": 0,
    "LifeCycleState": "creating",
    "OwnerId": "1234567890"
}
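Instead of copying the ID out of the JSON by hand, the call above could be wrapped so that only the FileSystemId is printed. This is a minimal sketch that relies on the AWS CLI's global --query and --output options; the create_efs function name and the FILE_SYSTEM_ID variable are illustrative and not part of Pipeline.

```shell
# Create the file system and print only its FileSystemId.
# Assumes the AWS CLI is configured and the `uuid` tool is installed.
create_efs() {
  aws efs create-file-system \
    --creation-token "$(uuid)" \
    --query 'FileSystemId' \
    --output text
}

# Usage (creates a billable resource, so it is left commented out):
# FILE_SYSTEM_ID=$(create_efs)
# echo "Created ${FILE_SYSTEM_ID}"
```

Keeping the ID in a variable avoids typos when it is passed to the later mount-target commands.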
Make sure to note the FileSystemId and OwnerId, as you will need them later. Next, create a mount target for the file system in the cluster's subnet:
aws efs create-mount-target \
--file-system-id {FileSystemId} \
--subnet-id {SubnetId} \
--security-groups {SecurityGroupId}
{
    "MountTargetId": "fsmt-5dfa3054",
    "NetworkInterfaceId": "eni-5cfa2372",
    "FileSystemId": "fs-c1f65a08",
    "LifeCycleState": "creating",
    "SubnetId": "subnet-1d11267a",
    "OwnerId": "1234567890",
    "IpAddress": "10.0.100.195"
}
Poll the status of the mount targets until LifeCycleState is "available":
aws efs describe-mount-targets --file-system-id {FileSystemId}
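The polling step can be scripted rather than repeated by hand. The following is a sketch only: the wait_for_mount_targets name, the default attempt count and the sleep interval are assumptions chosen for illustration, and it uses the AWS CLI's --query option to extract the LifeCycleState of each mount target.

```shell
# Poll until every mount target of the file system reports "available".
# Arguments: file system id, max attempts (default 30), seconds between attempts (default 10).
wait_for_mount_targets() {
  fs_id=$1
  attempts=${2:-30}
  interval=${3:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # With --output text the states come back tab-separated on one line.
    states=$(aws efs describe-mount-targets \
      --file-system-id "$fs_id" \
      --query 'MountTargets[*].LifeCycleState' \
      --output text)
    # Succeed only if at least one state was returned and none differs from "available".
    if [ -n "$states" ] && ! printf '%s\n' "$states" | tr '\t' '\n' | grep -qv '^available$'; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}

# Usage:
# wait_for_mount_targets "$FILE_SYSTEM_ID" && echo "mount targets are ready"
```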
In order to mount EFS storage as PersistentVolumes in Kubernetes, deploy the EFS provisioner. The EFS provisioner consists of a container that has access to an AWS EFS resource. To deploy to the Kubernetes cluster directly from your machine, you need to download the Kubernetes cluster config (aka kubeconfig), using the Cluster Config request from our Postman collection. The easiest way to do this is to save it to a local file in your home directory and set the KUBECONFIG environment variable:
export KUBECONFIG=~/.kube/config
Make sure your Amazon images contain the nfs-common package. If not, ssh to all the nodes and install nfs-common with:
sudo apt-get install nfs-common
Download the EFS provisioner manifest:
wget https://raw.githubusercontent.com/banzaicloud/banzai-charts/master/efs-provisioner/efs-provisioner.yaml
Edit efs-provisioner.yaml and replace the values in braces, {FILE_SYSTEM_ID}, {AWS_REGION}, {AWS_ACCESS_KEY_ID} and {AWS_SECRET_ACCESS_KEY}, with yours. Alternatively, instead of specifying AWS credentials, you can set up instance profile roles that allow EFS access. Apply with kubectl:
kubectl apply -f efs-provisioner.yaml
The resulting output should be something like this:
configmap "efs-provisioner" created
clusterrole "efs-provisioner-runner" created
clusterrolebinding "run-efs-provisioner" created
serviceaccount "efs-provisioner" created
deployment "efs-provisioner" created
storageclass "aws-efs" created
persistentvolumeclaim "efs" created
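The manual edit of efs-provisioner.yaml described earlier could also be scripted before applying the manifest. This is a rough sketch using sed; the render_manifest function is a hypothetical helper, and it assumes the four values contain no "|" or "&" characters (which holds for typical AWS ids, regions and keys).

```shell
# Replace the {PLACEHOLDER} tokens in the manifest with real values.
# Reads the template on stdin and writes the rendered manifest to stdout.
# Arguments: file system id, AWS region, access key id, secret access key.
render_manifest() {
  sed \
    -e "s|{FILE_SYSTEM_ID}|$1|g" \
    -e "s|{AWS_REGION}|$2|g" \
    -e "s|{AWS_ACCESS_KEY_ID}|$3|g" \
    -e "s|{AWS_SECRET_ACCESS_KEY}|$4|g"
}

# Usage (values are read from the environment, not hard-coded):
# render_manifest "$FILE_SYSTEM_ID" "$AWS_REGION" \
#   "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY" \
#   < efs-provisioner.yaml | kubectl apply -f -
```

Piping the rendered output straight into kubectl apply -f - keeps the credentials out of a file on disk.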
At this point your EFS PVC should be ready to use:
kubectl get pvc
NAME   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
efs    Bound    pvc-e7f86c81-f7ea-11e7-9914-0223c9890f2a   1Gi        RWX            aws-efs        29s
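If you want to check the claim from a script rather than by reading the table, kubectl's jsonpath output can return just the phase. A small sketch, where pvc_is_bound is an illustrative helper name and kubectl is assumed to be pointed at the cluster via KUBECONFIG:

```shell
# Succeeds (exit status 0) only if the named PVC reports phase "Bound".
pvc_is_bound() {
  phase=$(kubectl get pvc "$1" -o jsonpath='{.status.phase}')
  [ "$phase" = "Bound" ]
}

# Usage:
# pvc_is_bound efs && echo "efs is ready"
```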
Finally, let's see how you can use the PVC you just claimed and mount it to a container.
apiVersion: v1
kind: Pod
metadata:
  name: example-app
spec:
  containers:
  - name: example-app
    image: example_image:v0.1
    volumeMounts:
    - name: efs-pvc
      mountPath: "/efs-volume"
  volumes:
  - name: efs-pvc
    persistentVolumeClaim:
      claimName: efs
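Once the pod is running, one way to confirm that the volume is really mounted and writable is to write a marker file through the mount and read it back. This is only a sketch: check_efs_mount and the .efs-check file name are made up for illustration, and it assumes the container image ships a sh binary.

```shell
# Write a marker file through the pod's EFS mount and read it back.
# Arguments: pod name, mount path inside the container.
check_efs_mount() {
  pod=$1
  mount_path=$2
  kubectl exec "$pod" -- sh -c \
    "echo efs-ok > '$mount_path/.efs-check' && cat '$mount_path/.efs-check'"
}

# Usage, with the pod name and mount path from the manifest above:
# check_efs_mount example-app /efs-volume
```

Because the claim's access mode is RWX, the same check can be run from a second pod that mounts the claim, and it should see the file written by the first.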
At this stage you should be a happy, functional user of EFS. It takes a bit of work to get here but worry not: the next post in this series will show how you can do it with Pipeline, in a process so simple it's short enough to tweet. We will also walk through the benefits of using EFS with Tensorflow and the performance improvements EFS provides to Spark Streaming applications when checkpointing (and the reasons we have for switching to EFS instead of S3 or HDFS).