Cloud Infrastructure Management
Overview
This guide covers core cloud infrastructure management practices for AWS, Azure, and Google Cloud Platform: provisioning resources, controlling costs, securing access, and setting up monitoring and logging.
Quick Reference
- Cloud Providers: AWS, Azure, Google Cloud Platform
- Management Tools: Terraform, Ansible, CloudFormation
- Monitoring: CloudWatch, Azure Monitor, Google Cloud Monitoring (formerly Stackdriver)
- Security: IAM, VPC, Security Groups
1. Cloud Infrastructure Fundamentals
1.1 Cloud Service Models
Understanding different cloud service models and their use cases.
Service Models:
- IaaS (Infrastructure as a Service): Virtual machines, storage, networking
- PaaS (Platform as a Service): Application hosting, databases, middleware
- SaaS (Software as a Service): Complete applications delivered over the internet
- FaaS (Function as a Service): Serverless computing, event-driven functions
1.2 Cloud Deployment Models
Choosing the right deployment model for your infrastructure needs.
Deployment Options:
- Public Cloud: Shared infrastructure, cost-effective, scalable
- Private Cloud: Dedicated infrastructure, enhanced security, control
- Hybrid Cloud: Combination of public and private clouds
- Multi-Cloud: Using multiple cloud providers
2. AWS Infrastructure Management
2.1 AWS Resource Management
Managing AWS resources efficiently and cost-effectively.
AWS CLI Commands:
# Configure AWS CLI
aws configure
# List all EC2 instances
aws ec2 describe-instances --query 'Reservations[*].Instances[*].[InstanceId,State.Name,InstanceType]' --output table
# Create EC2 instance (AMI IDs are region-specific; substitute one valid in your region)
aws ec2 run-instances \
  --image-id ami-0c02fb55956c7d316 \
  --count 1 \
  --instance-type t3.micro \
  --key-name my-key-pair \
  --security-group-ids sg-12345678 \
  --subnet-id subnet-12345678
# List S3 buckets
aws s3 ls
# Create S3 bucket
aws s3 mb s3://my-bucket-name
# Upload file to S3
aws s3 cp local-file.txt s3://my-bucket-name/
# List RDS instances
aws rds describe-db-instances --query 'DBInstances[*].[DBInstanceIdentifier,DBInstanceStatus,Engine]' --output table
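The --query expression in the describe-instances command above picks three fields out of a nested Reservations/Instances structure. As a rough illustration, the same selection can be sketched in Python against a hand-written sample shaped like the command's JSON output. The instance IDs and the summarize_instances helper are hypothetical, and the rows are flattened into a single list for readability.

```python
def summarize_instances(response):
    """Flatten a describe-instances style response into [id, state, type] rows."""
    rows = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            rows.append([
                instance["InstanceId"],
                instance["State"]["Name"],
                instance["InstanceType"],
            ])
    return rows


# Hand-written sample data with hypothetical IDs, not real API output.
sample_response = {
    "Reservations": [
        {
            "Instances": [
                {
                    "InstanceId": "i-0aaa1111bbbb2222c",
                    "State": {"Name": "running"},
                    "InstanceType": "t3.micro",
                }
            ]
        },
        {
            "Instances": [
                {
                    "InstanceId": "i-0ddd3333eeee4444f",
                    "State": {"Name": "stopped"},
                    "InstanceType": "t3.small",
                }
            ]
        },
    ]
}

for row in summarize_instances(sample_response):
    print(row)
```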
2.2 AWS Cost Optimization
Implementing cost optimization strategies for AWS infrastructure.
Cost Optimization Techniques:
- Right-sizing Instances: Match instance types to workload requirements
- Reserved Instances: Commit to a 1-year or 3-year term in exchange for a lower hourly rate than On-Demand
- Spot Instances: Use spare capacity for non-critical workloads
- Auto Scaling: Automatically adjust capacity based on demand
- Storage Optimization: Use appropriate storage classes for data lifecycle
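Whether a Reserved Instance commitment pays off depends on how many hours the instance actually runs. The arithmetic can be sketched as follows. Both hourly rates are hypothetical placeholders rather than real AWS prices, and the model assumes the reserved rate is billed for every hour of the term whether or not the instance is running.

```python
HOURS_PER_YEAR = 24 * 365  # 8760


def yearly_cost(hourly_rate, utilization=1.0):
    """Cost of one instance for a year at the given fraction of hours running."""
    return hourly_rate * HOURS_PER_YEAR * utilization


def break_even_utilization(on_demand_rate, reserved_rate):
    """Fraction of the year an instance must run before reserving is cheaper.

    On-Demand cost scales with hours used; the reserved commitment is
    assumed to be paid for every hour of the term.
    """
    return reserved_rate / on_demand_rate


# Hypothetical rates in USD per hour; look up real prices for your instance type.
ON_DEMAND_RATE = 0.10
RESERVED_RATE = 0.06

savings = yearly_cost(ON_DEMAND_RATE) - yearly_cost(RESERVED_RATE)
break_even = break_even_utilization(ON_DEMAND_RATE, RESERVED_RATE)

print(f"Yearly savings at full utilization: {savings:.2f} USD")
print(f"Break-even utilization: {break_even:.0%}")
```

With these placeholder rates, an instance that runs less than about 60% of the time would be cheaper On-Demand.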
Cost Monitoring Script:
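#!/bin/bash
# aws_cost_monitor.sh
# Build the current month's date range instead of hardcoding it.
# Cost Explorer treats End as exclusive, so End is the first day of next month.
# (Uses GNU date syntax.)
START=$(date +%Y-%m-01)
END=$(date -d "$START +1 month" +%Y-%m-%d)
# Get current month costs
CURRENT_COST=$(aws ce get-cost-and-usage \
  --time-period Start="$START",End="$END" \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --query 'ResultsByTime[0].Total.BlendedCost.Amount' \
  --output text)
# Set cost threshold (in the account's billing currency)
THRESHOLD=1000
# bc handles the decimal comparison that bash arithmetic cannot
if (( $(echo "$CURRENT_COST > $THRESHOLD" | bc -l) )); then
  echo "AWS costs exceeded threshold: $CURRENT_COST" | mail -s "AWS Cost Alert" admin@company.com
fi
echo "Current AWS costs: $CURRENT_COST"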
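The threshold check in the script can also be sketched in Python, which avoids shelling out to bc for decimal comparison. This is a minimal sketch that works on a hand-written sample shaped like the get-cost-and-usage JSON output. The amounts are invented, and month_to_date_cost and exceeds_threshold are hypothetical helper names.

```python
from decimal import Decimal


def month_to_date_cost(response, metric="BlendedCost"):
    """Return the first period's total for the given metric as a Decimal."""
    amount = response["ResultsByTime"][0]["Total"][metric]["Amount"]
    return Decimal(amount)


def exceeds_threshold(response, threshold):
    """True when the reported cost is strictly above the threshold."""
    return month_to_date_cost(response) > Decimal(str(threshold))


# Hand-written sample with invented figures, not real billing data.
sample_response = {
    "ResultsByTime": [
        {
            "TimePeriod": {"Start": "2023-12-01", "End": "2024-01-01"},
            "Total": {"BlendedCost": {"Amount": "1234.56", "Unit": "USD"}},
            "Groups": [],
            "Estimated": True,
        }
    ]
}

if exceeds_threshold(sample_response, 1000):
    print("AWS costs exceeded threshold:", month_to_date_cost(sample_response))
```

Amounts arrive as strings in the response, so parsing them with Decimal keeps the comparison exact.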
3. Azure Infrastructure Management
3.1 Azure Resource Management
Managing Azure resources with the Azure CLI.
Azure CLI Commands:
# Login to Azure
az login
# List resource groups
az group list --output table
# Create resource group
az group create --name myResourceGroup --location eastus
# List virtual machines
az vm list --output table
# Create virtual machine (recent Azure CLI releases use the Ubuntu2204 alias in place of UbuntuLTS)
az vm create \
  --resource-group myResourceGroup \
  --name myVM \
  --image Ubuntu2204 \
  --admin-username azureuser \
  --generate-ssh-keys
# List storage accounts
az storage account list --output table
# Create storage account
az storage account create \
  --name mystorageaccount \
  --resource-group myResourceGroup \
  --location eastus \
  --sku Standard_LRS
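Storage account creation fails if the name breaks Azure's naming rules: 3 to 24 characters, lowercase letters and digits only. A small local pre-check might look like the sketch below. The helper name is hypothetical, and the check cannot tell whether a name is already taken, since names must also be globally unique.

```python
import re

# 3 to 24 characters, lowercase letters and digits only.
_NAME_PATTERN = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name):
    """Check a candidate name against the length and character rules."""
    return _NAME_PATTERN.fullmatch(name) is not None


for candidate in ["mystorageaccount", "My-Storage", "ab"]:
    print(candidate, is_valid_storage_account_name(candidate))
```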
3.2 Azure Monitoring and Management
Implementing monitoring and management solutions for Azure resources.
Azure Monitor Configuration:
# Create log analytics workspace
az monitor log-analytics workspace create \
  --resource-group myResourceGroup \
  --workspace-name myWorkspace
# Enable VM insights
az monitor log-analytics workspace pack enable \
  --resource-group myResourceGroup \
  --workspace-name myWorkspace \
  --name VMInsights
# Create alert rule
az monitor metrics alert create \
  --name "High CPU Usage" \
  --resource-group myResourceGroup \
  --scopes /subscriptions/{subscription-id}/resourceGroups/myResourceGroup/providers/Microsoft.Compute/virtualMachines/myVM \
  --condition "avg Percentage CPU > 80" \
  --description "Alert when CPU usage is high"
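The --scopes argument of the alert rule takes a full resource ID, which is easy to mistype by hand. One way to assemble it is sketched below. The helper name is hypothetical and the all-zero subscription ID is a placeholder for your own.

```python
def vm_resource_id(subscription_id, resource_group, vm_name):
    """Build the resource ID of a virtual machine for use as an alert scope."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        "/providers/Microsoft.Compute/virtualMachines"
        f"/{vm_name}"
    )


# Placeholder subscription ID; substitute your own.
scope = vm_resource_id(
    "00000000-0000-0000-0000-000000000000", "myResourceGroup", "myVM"
)
print(scope)
```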
4. Google Cloud Platform Management
4.1 GCP Resource Management
Managing Google Cloud Platform resources effectively.
gcloud Commands:
# Set default project
gcloud config set project my-project-id
# List compute instances
gcloud compute instances list
# Create compute instance
gcloud compute instances create my-instance \
  --zone=us-central1-a \
  --machine-type=e2-medium \
  --image-family=ubuntu-2004-lts \
  --image-project=ubuntu-os-cloud
# List storage buckets
gsutil ls
# Create storage bucket
gsutil mb gs://my-bucket-name
# List Cloud SQL instances
gcloud sql instances list
# Create Cloud SQL instance
gcloud sql instances create my-instance \
  --database-version=POSTGRES_13 \
  --tier=db-f1-micro \
  --region=us-central1
4.2 GCP Cost Management
Implementing cost management strategies for Google Cloud Platform.
Cost Management Techniques:
- Committed Use Discounts: Commit to a 1-year or 3-year term in exchange for a discounted rate
- Sustained Use Discounts: Automatic discounts for long-running instances
- Spot VMs (the successor to Preemptible VMs): Use spare capacity for fault-tolerant, non-critical workloads
- Resource Quotas: Set limits to prevent unexpected costs
- Budget Alerts: Monitor spending and set up alerts
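A budget alert boils down to comparing spend so far against fractions of a budget. The logic can be sketched as below. The 50%, 90%, and 100% thresholds and the dollar amounts are illustrative assumptions, and crossed_thresholds is a hypothetical helper rather than part of any Google Cloud API.

```python
def crossed_thresholds(spend, budget, thresholds=(0.5, 0.9, 1.0)):
    """Return the threshold fractions that current spend has reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = spend / budget
    return [t for t in sorted(thresholds) if ratio >= t]


# Illustrative figures: 950 spent against a monthly budget of 1000.
for threshold in crossed_thresholds(950, 1000):
    print(f"Spend has reached {threshold:.0%} of the budget")
```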
5. Infrastructure as Code (IaC)
5.1 Terraform Configuration
Using Terraform for infrastructure provisioning and management.
Terraform AWS Configuration:
# main.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-west-2"
}

# VPC
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "main-vpc"
  }
}

# Subnet
resource "aws_subnet" "main" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.1.0/24"
  availability_zone       = "us-west-2a"
  map_public_ip_on_launch = true

  tags = {
    Name = "main-subnet"
  }
}

# Security Group
resource "aws_security_group" "main" {
  name_prefix = "main-sg"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# EC2 Instance
resource "aws_instance" "main" {
  ami                    = "ami-0c02fb55956c7d316"
  instance_type          = "t3.micro"
  subnet_id              = aws_subnet.main.id
  vpc_security_group_ids = [aws_security_group.main.id]

  tags = {
    Name = "main-instance"
  }
}
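The subnet CIDR in the configuration above has to fall inside the VPC CIDR, or the apply fails. A quick way to sanity-check address plans before running Terraform is sketched here with Python's standard ipaddress module. The helper names are hypothetical. The usable-address count subtracts the five addresses AWS reserves in every subnet.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # AWS reserves five addresses in each subnet


def subnet_fits_vpc(vpc_cidr, subnet_cidr):
    """True when the subnet range lies entirely inside the VPC range."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnet = ipaddress.ip_network(subnet_cidr)
    return subnet.subnet_of(vpc)


def usable_addresses(subnet_cidr):
    """Addresses left for instances after AWS's reserved ones."""
    return ipaddress.ip_network(subnet_cidr).num_addresses - AWS_RESERVED_PER_SUBNET


print(subnet_fits_vpc("10.0.0.0/16", "10.0.1.0/24"))
print(usable_addresses("10.0.1.0/24"))
```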
5.2 Ansible Configuration Management
Using Ansible for configuration management and automation.
Ansible Playbook Example:
# playbook.yml
---
- name: Configure web server
  hosts: webservers
  become: yes
  vars:
    nginx_version: "1.18.0"
    app_port: 8080
  tasks:
    - name: Update package cache
      apt:
        update_cache: yes
        cache_valid_time: 3600
    - name: Install nginx
      apt:
        name: nginx
        state: present
    - name: Start and enable nginx
      systemd:
        name: nginx
        state: started
        enabled: yes
    - name: Configure nginx
      template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        backup: yes
      notify: restart nginx
    - name: Install Python dependencies
      pip:
        name:
          - flask
          - gunicorn
        state: present
    - name: Create application directory
      file:
        path: /var/www/app
        state: directory
        owner: www-data
        group: www-data
        mode: '0755'
    - name: Deploy application
      copy:
        src: app.py
        dest: /var/www/app/app.py
        owner: www-data
        group: www-data
        mode: '0644'
  handlers:
    - name: restart nginx
      systemd:
        name: nginx
        state: restarted
6. Cloud Security Management
6.1 Identity and Access Management (IAM)
Implementing proper IAM policies and practices.
AWS IAM Policy Example:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::my-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": "arn:aws:s3:::my-bucket"
    },
    {
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": "us-west-2"
        }
      }
    }
  ]
}
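The policy above combines two Allow statements with a region-based Deny. How the three interact can be illustrated with a deliberately simplified evaluator: an explicit Deny always wins, and anything not explicitly allowed is denied. This is a teaching sketch, not AWS's actual evaluation logic. It handles only the StringNotEquals operator used in this policy, matches wildcards case-sensitively, and ignores everything else a real request involves. All function names are hypothetical.

```python
from fnmatch import fnmatchcase

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": "arn:aws:s3:::my-bucket/*",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::my-bucket",
        },
        {
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"aws:RequestedRegion": "us-west-2"}
            },
        },
    ],
}


def _as_list(value):
    return value if isinstance(value, list) else [value]


def _matches(patterns, value):
    """True when the value matches any wildcard pattern."""
    return any(fnmatchcase(value, pattern) for pattern in _as_list(patterns))


def _condition_holds(condition, context):
    """Evaluate StringNotEquals only; other operators are not modelled."""
    for key, expected in condition.get("StringNotEquals", {}).items():
        if context.get(key) in _as_list(expected):
            return False
    return True


def is_allowed(policy, action, resource, context):
    """Explicit Deny wins; otherwise at least one Allow must match."""
    allowed = False
    for statement in policy["Statement"]:
        if not _matches(statement["Action"], action):
            continue
        if not _matches(statement["Resource"], resource):
            continue
        if not _condition_holds(statement.get("Condition", {}), context):
            continue
        if statement["Effect"] == "Deny":
            return False
        allowed = True
    return allowed


obj = "arn:aws:s3:::my-bucket/report.csv"
print(is_allowed(policy, "s3:GetObject", obj, {"aws:RequestedRegion": "us-west-2"}))
print(is_allowed(policy, "s3:GetObject", obj, {"aws:RequestedRegion": "us-east-1"}))
```

The same object read is permitted in us-west-2 and refused in us-east-1, because the Deny statement applies to every request outside the allowed region.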
6.2 Network Security
Implementing network security best practices in the cloud.
Security Group Configuration:
# Security group for web servers
resource "aws_security_group" "web" {
  name_prefix = "web-sg"
  vpc_id      = aws_vpc.main.id

  # Allow HTTP from anywhere
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Allow HTTPS from anywhere
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Allow SSH only from the private 10.0.0.0/8 range
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/8"]
  }

  # Allow all outbound traffic
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
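Rules like these are easy to audit mechanically. The sketch below flags administrative ports that are reachable from 0.0.0.0/0. The rule dictionaries mirror the ingress blocks by hand rather than being parsed from Terraform, and the choice of ports 22 and 3389 as sensitive is an assumption made for illustration.

```python
import ipaddress


def world_open_ports(ingress_rules, sensitive_ports=(22, 3389)):
    """Return the sensitive ports that any rule exposes to 0.0.0.0/0."""
    exposed = set()
    for rule in ingress_rules:
        open_to_world = any(
            ipaddress.ip_network(cidr).prefixlen == 0
            for cidr in rule["cidr_blocks"]
        )
        if not open_to_world:
            continue
        for port in sensitive_ports:
            if rule["from_port"] <= port <= rule["to_port"]:
                exposed.add(port)
    return sorted(exposed)


# Hand-copied from the web security group above.
web_rules = [
    {"from_port": 80, "to_port": 80, "cidr_blocks": ["0.0.0.0/0"]},
    {"from_port": 443, "to_port": 443, "cidr_blocks": ["0.0.0.0/0"]},
    {"from_port": 22, "to_port": 22, "cidr_blocks": ["10.0.0.0/8"]},
]

# Hand-copied from the security group in section 5.1, where SSH is open to all.
main_rules = [
    {"from_port": 22, "to_port": 22, "cidr_blocks": ["0.0.0.0/0"]},
    {"from_port": 80, "to_port": 80, "cidr_blocks": ["0.0.0.0/0"]},
]

print("web:", world_open_ports(web_rules))
print("main:", world_open_ports(main_rules))
```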
7. Cloud Monitoring and Logging
7.1 CloudWatch Monitoring
Setting up comprehensive monitoring with AWS CloudWatch.
CloudWatch Configuration:
# SNS topic that receives alarm notifications (the topic name is a placeholder)
resource "aws_sns_topic" "alerts" {
  name = "alerts"
}

# CloudWatch alarm for high CPU usage
resource "aws_cloudwatch_metric_alarm" "high_cpu" {
  alarm_name          = "high-cpu-usage"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 300
  statistic           = "Average"
  threshold           = 80
  alarm_description   = "This metric monitors ec2 cpu utilization"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    InstanceId = aws_instance.main.id
  }
}

# CloudWatch log group
resource "aws_cloudwatch_log_group" "app_logs" {
  name              = "/aws/ec2/application"
  retention_in_days = 30
}

# CloudWatch dashboard
resource "aws_cloudwatch_dashboard" "main" {
  dashboard_name = "main-dashboard"

  dashboard_body = jsonencode({
    widgets = [
      {
        type   = "metric"
        x      = 0
        y      = 0
        width  = 12
        height = 6
        properties = {
          metrics = [
            ["AWS/EC2", "CPUUtilization", "InstanceId", aws_instance.main.id],
            [".", "NetworkIn", ".", "."],
            [".", "NetworkOut", ".", "."]
          ]
          period = 300
          stat   = "Average"
          region = "us-west-2"
          title  = "EC2 Instance Metrics"
        }
      }
    ]
  })
}
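The alarm above fires when average CPU stays above 80 for two consecutive 300-second periods. That rule can be illustrated with a simplified model. The sketch ignores CloudWatch's missing-data handling and its datapoints-to-alarm option, and the alarm_state helper and sample values are hypothetical.

```python
def alarm_state(period_averages, threshold=80.0, evaluation_periods=2):
    """Return the alarm state for a list of per-period averages, oldest first."""
    if len(period_averages) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = period_averages[-evaluation_periods:]
    # GreaterThanThreshold: every recent period must be strictly above.
    if all(value > threshold for value in recent):
        return "ALARM"
    return "OK"


# Invented CPU averages, one value per 300-second period.
print(alarm_state([50.0, 85.0, 90.0]))
print(alarm_state([85.0, 90.0, 70.0]))
```

A single spike does not trigger the alarm. Requiring two consecutive breaching periods trades a few minutes of detection delay for fewer false alerts.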
7.2 Centralized Logging
Implementing centralized logging solutions across cloud providers.
ELK Stack Configuration:
# docker-compose.yml for a single-node ELK stack
version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.15.0
    environment:
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ports:
      - "9200:9200"
    volumes:
      - elasticsearch_data:/usr/share/elasticsearch/data
  logstash:
    image: docker.elastic.co/logstash/logstash:7.15.0
    ports:
      - "5044:5044"
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
    depends_on:
      - elasticsearch
  kibana:
    image: docker.elastic.co/kibana/kibana:7.15.0
    ports:
      - "5601:5601"
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    depends_on:
      - elasticsearch
volumes:
  elasticsearch_data:
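Centralized logging works best when applications emit structured records that need no custom parsing downstream. The sketch below shows one way to produce JSON log lines with Python's standard logging module. The field names are an assumption, and since logstash.conf is not shown in this guide, how these lines reach Logstash (for example through a log shipper) is left open.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def build_logger(name="app"):
    """Create a logger that writes JSON lines to standard error."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger


build_logger().info("service started on port %s", 8080)
```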