Google Cloud
Virtual Machines

DE53 – Cloud Infrastructure

Antoine · Arnaud · Benjamin

Agenda

01 – Compute Engine
02 – Common Compute Engine Actions
03 – VM Access & Lifecycle
04 – Compute Options
05 – Compute Pricing
06 – Special Compute Configurations
07 – Images
08 – Disk Options

01

Compute Engine

What is Compute Engine?

Infrastructure as a Service (IaaS)
Run virtual machines on Google's global infrastructure
No upfront investment – pay per second for what you use
Scale to thousands of vCPUs on demand
Per-second billing (1-minute minimum)

Key Features

High-performance and customizable virtual machines
Choice of CPU platform: Intel or AMD
Automatic sustained use discounts
Global load balancing and autoscaling support
Native integration with all Google Cloud services

Available Resources: Compute

1 vCPU = 1 hardware hyper-thread (not a full physical core)
Network throughput: ~2 Gbps per vCPU
Maximum: 200 Gbps at ~176 vCPUs
Predefined machine types or fully custom configurations
Four machine families: general-purpose, compute-optimized, memory-optimized, accelerator-optimized

Available Resources: Storage

Standard Persistent Disk (HDD) – large sequential workloads, low cost
SSD Persistent Disk – low-latency random I/O, databases
Local SSD – physically attached, ephemeral, very high Input / Output Operations Per Second (IOPS)
Cloud Storage – object storage accessed over the network

For storage, four main options exist. Standard HDD persistent disks are backed by magnetic drives; they have the lowest cost and are best for large sequential workloads. SSD persistent disks use solid-state drives for low-latency random I/O, which makes them ideal for databases. Local SSDs are physically attached to the host machine and give the highest possible IOPS, but they are ephemeral: data is lost if the VM is stopped or terminated. Cloud Storage is object storage for unstructured data like images, backups, and datasets, accessed over the network. We'll cover disk options in detail later in the session.

Available Resources: Networking

Virtual Private Cloud (VPC) – your isolated network
Firewall rules – control inbound and outbound traffic
Regional or global load balancing
Throughput: ~2 Gbps/vCPU, max 200 Gbps
Network throughput shared with Disk I/O bandwidth
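
The throughput rule of thumb on this slide can be sketched as a small calculation. This is a rough estimate only: the function name and the simple linear-then-capped model are assumptions made for illustration, and real limits depend on the machine series.

```python
def estimated_throughput_gbps(vcpus: int,
                              per_vcpu_gbps: float = 2.0,
                              cap_gbps: float = 200.0) -> float:
    """Rough upper bound on VM network throughput.

    Assumes the slide's figures: about 2 Gbps per vCPU, capped at 200 Gbps.
    """
    if vcpus < 1:
        raise ValueError("a VM needs at least one vCPU")
    return min(vcpus * per_vcpu_gbps, cap_gbps)


if __name__ == "__main__":
    for count in (1, 4, 16, 176):
        print(count, "vCPUs ->", estimated_throughput_gbps(count), "Gbps")
```

Because disk I/O to persistent disks shares this bandwidth, the number is a ceiling for network and disk traffic combined, not for network traffic alone.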

Tensor Processing Units (TPU)

Google's custom Application-Specific Integrated Circuits (ASICs), designed for machine learning
First deployed internally in 2015, announced publicly in 2016
Accelerates TensorFlow matrix computation workloads
Available via the Cloud TPU service
Separate billing model from standard VMs

02

Common Compute Engine Actions

Instance Metadata & Startup Scripts

Every VM has a metadata server (IP: 169.254.169.254)
Stores key-value pairs: project info, zone, custom values
Startup scripts: run automatically at every boot
Shutdown scripts: run on graceful shutdown
Large scripts can be referenced from Cloud Storage

Every Compute Engine VM can query a metadata server at the fixed link-local address 169.254.169.254. This server exposes both standard metadata (project ID, zone, machine type, service account token) and any custom key-value pairs you define. Startup scripts are executed automatically at boot time. They're commonly used to install software, pull the latest application code, or configure services without baking everything into a custom image. Shutdown scripts run on graceful shutdown, allowing you to flush buffers or deregister from a load balancer before the VM stops. Large scripts can be stored in Cloud Storage and referenced through a metadata URL. This is a powerful pattern for keeping VM images lean while still automating configuration.
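
As an illustration, here is a minimal sketch of a metadata query using only the Python standard library. The helper names are hypothetical. The `Metadata-Flavor: Google` header and the `computeMetadata/v1/` path prefix are what the metadata server expects, and the fetch itself only succeeds when run from inside a VM.

```python
import urllib.request

# Link-local address of the metadata server, as given on the slide.
METADATA_ROOT = "http://169.254.169.254/computeMetadata/v1/"


def build_metadata_request(path: str) -> urllib.request.Request:
    """Build a GET request for one metadata entry, e.g. 'instance/zone'."""
    return urllib.request.Request(
        METADATA_ROOT + path.lstrip("/"),
        headers={"Metadata-Flavor": "Google"},  # required by the server
    )


def fetch_metadata(path: str, timeout: float = 2.0) -> str:
    """Return the value of one metadata entry (works only on a VM)."""
    request = build_metadata_request(path)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Custom values live under instance/attributes/<key>.
    print(build_metadata_request("instance/zone").full_url)
```

A startup script could call `fetch_metadata("instance/attributes/my-key")` to read a custom value; `my-key` is a placeholder name.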

Moving a VM Instance

Within the same region: gcloud compute instances move
Automated: moves VM and its persistent disks together
Cross-region: manual, snapshot → create new disk → new VM
Update DNS, IP references, and load balancer configs after move

Snapshots: Backup & Migration

Take a snapshot of any persistent disk at any time
Stored in Cloud Storage – durable and globally accessible
Incremental: only changed blocks stored after the first snapshot
Use case 1: backup – protect against accidental data loss
Use case 2: migration – restore in a different zone or region

Persistent disk snapshots are one of Compute Engine's most powerful features. You can snapshot a disk at any time, even while the VM is running and the disk is in use. Snapshots are backed by Cloud Storage, making them durable and replicated. They are incremental: the first snapshot captures the entire disk, and subsequent ones store only the blocks that changed since the last snapshot, which makes them fast to create and efficient in storage usage. Snapshots are excellent for backups: if you accidentally delete data, you can restore it. They're also the standard mechanism for migrating data across zones or regions: snapshot here, restore there.

Snapshots: HDD → SSD Migration

What if performance needs have grown since the VM was created?

Snapshot the existing HDD persistent disk
Create a new SSD persistent disk from that snapshot
Attach the new SSD disk to the VM
Result: significant IOPS boost with zero data loss
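
The three steps above could be scripted roughly as follows. This sketch only builds the gcloud command lines and does not run them. All resource names are placeholders, and the flags should be checked against the installed gcloud version before use.

```python
def hdd_to_ssd_plan(vm: str, old_disk: str, new_disk: str,
                    snapshot: str, zone: str) -> list:
    """Return the gcloud commands for an HDD to SSD migration, in order."""
    return [
        # Step 1: snapshot the existing HDD persistent disk.
        ["gcloud", "compute", "disks", "snapshot", old_disk,
         f"--snapshot-names={snapshot}", f"--zone={zone}"],
        # Step 2: create a new SSD persistent disk from that snapshot.
        ["gcloud", "compute", "disks", "create", new_disk,
         f"--source-snapshot={snapshot}", "--type=pd-ssd", f"--zone={zone}"],
        # Step 3: attach the new SSD disk to the VM.
        ["gcloud", "compute", "instances", "attach-disk", vm,
         f"--disk={new_disk}", f"--zone={zone}"],
    ]


if __name__ == "__main__":
    for command in hdd_to_ssd_plan("my-vm", "data-hdd", "data-ssd",
                                   "data-snap", "us-central1-a"):
        print(" ".join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` once the names have been reviewed.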

Persistent Disk Snapshot Features

Incremental storage – very efficient after the first snapshot
Schedulable – automate daily/weekly backups
Restore to a new disk in another zone or region
Deleting one snapshot doesn't destroy the chain
Use to create new bootable disks or entire VM instances

Here are the key properties of snapshots to remember. They are incremental: after the first full snapshot, each subsequent one stores only changed blocks, which keeps storage costs low. You can schedule snapshots automatically on a daily or weekly basis to maintain a rolling backup window. A snapshot can be restored to a new disk in a different zone or region, which is what makes snapshots useful for migration. If you delete a snapshot in the middle of a chain, Google automatically re-chains the surrounding snapshots, so no data is lost. Finally, you can create new bootable persistent disks from snapshots, which means you can also create entirely new VMs from them; this is very useful for deploying multiple identical environments.
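
To make the incremental and re-chaining behavior concrete, here is a toy model. It is not Google's implementation: a disk is simply a dictionary of block number to content, and all function names are invented for illustration.

```python
def restore(chain: list, upto: int) -> dict:
    """Rebuild the disk state as of snapshot number `upto` (1-based count)."""
    state = {}
    for delta in chain[:upto]:
        state.update(delta)
    return state


def take_snapshot(chain: list, disk: dict) -> None:
    """Store only the blocks that differ from the latest restorable state."""
    previous = restore(chain, len(chain))
    delta = {block: data for block, data in disk.items()
             if previous.get(block) != data}
    chain.append(delta)


def delete_snapshot(chain: list, index: int) -> None:
    """Delete one snapshot, folding its blocks into the next one."""
    removed = chain.pop(index)
    if index < len(chain):
        merged = dict(removed)
        merged.update(chain[index])  # newer blocks win over older ones
        chain[index] = merged


if __name__ == "__main__":
    disk = {0: "a", 1: "b", 2: "c"}
    chain = []
    take_snapshot(chain, disk)       # full copy: 3 blocks
    disk[1] = "B"
    take_snapshot(chain, disk)       # incremental: 1 block
    disk[2] = "C"
    take_snapshot(chain, disk)       # incremental: 1 block
    delete_snapshot(chain, 1)        # remove the middle snapshot
    print([len(delta) for delta in chain], restore(chain, len(chain)))
```

The first snapshot holds every block, later ones hold only changes, and deleting the middle one moves its blocks forward so the latest restore is unaffected.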

Resize a Persistent Disk

Disks can be grown while the VM is running
No need to stop the VM or detach the disk
After resizing: also resize the filesystem inside the OS
Disks can only be enlarged – shrinking is not supported
Works for both HDD and SSD persistent disk types

Another powerful feature of persistent disks is live resizing. You can grow a disk while the VM is running and the disk is attached, with no downtime required. However, there are two distinct steps: first, you resize the disk at the Google Cloud infrastructure level (via Console, gcloud, or API); second, you resize the filesystem inside the operating system to claim the new space. On Linux this typically means running resize2fs or xfs_growfs. You can only increase disk size, never decrease it, so plan ahead with a reasonable initial size. This flexibility means you can start small and grow as your data grows, paying only for what you actually need.
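
The two steps can be sketched as a small planner that refuses to shrink a disk and picks the filesystem tool. It only builds command lines. The disk, device, and mount point names are placeholders, and it assumes an unpartitioned data disk (a partitioned disk would need the partition grown first).

```python
def resize_plan(disk: str, zone: str, current_gb: int, new_gb: int,
                filesystem: str, device: str, mount_point: str) -> list:
    """Return [infrastructure command, in-guest command] for growing a disk."""
    if new_gb <= current_gb:
        raise ValueError("persistent disks can only be enlarged")
    grow_filesystem = {
        "ext4": ["sudo", "resize2fs", device],        # takes the block device
        "xfs": ["sudo", "xfs_growfs", mount_point],   # takes the mount point
    }
    if filesystem not in grow_filesystem:
        raise ValueError(f"no resize tool known for {filesystem!r}")
    return [
        # Step 1: grow the disk at the Google Cloud level.
        ["gcloud", "compute", "disks", "resize", disk,
         f"--size={new_gb}GB", f"--zone={zone}"],
        # Step 2: grow the filesystem inside the guest OS.
        grow_filesystem[filesystem],
    ]


if __name__ == "__main__":
    for command in resize_plan("data-disk", "us-central1-a", 100, 200,
                               "ext4", "/dev/sdb", "/mnt/data"):
        print(" ".join(command))
```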

03

VM Access & Lifecycle

Accessing Linux VMs: SSH

Protocol: SSH over TCP port 22
Requires a firewall rule allowing tcp:22 to the VM
Option 1: "SSH" button in Cloud Console (browser terminal)
Option 2: gcloud compute ssh INSTANCE_NAME (handles keys automatically)
Option 3: Any SSH client with your public key uploaded to metadata

To connect to a Linux VM you use SSH (Secure Shell) over TCP port 22. The firewall must allow this traffic. Google provides three convenient access methods. The easiest is the "SSH" button in the Cloud Console: it opens a browser-based terminal and handles all key generation and injection automatically, with no configuration on your part. Second, the gcloud CLI's compute ssh command handles SSH key management for you and works from any terminal. Third, if you prefer your own SSH client like PuTTY or the OpenSSH client, you can manually upload your public key to the instance metadata or the project-wide SSH keys, then connect using your client with the VM's external IP address.

Accessing Windows VMs: RDP

Protocol: RDP over TCP port 3389
Requires a firewall rule allowing tcp:3389
Step 1: Set Windows password in Cloud Console ("Set Windows Password")
Step 2: Connect with any RDP client using the VM's external IP
Cloud Console also offers an in-browser RDP option

For Windows VMs, you use RDP (Remote Desktop Protocol) over TCP port 3389. You need a firewall rule allowing this port. The workflow is: in the Cloud Console, click the arrow next to the "RDP" button and choose "Set Windows Password." Either accept the username Google suggests or enter your own, and the console returns the password. Then open any RDP client (Microsoft Remote Desktop on macOS, the built-in mstsc on Windows, or Remmina on Linux) and connect using the VM's external IP and those credentials. Google also provides an in-browser RDP option using the Chrome Remote Desktop browser extension; like the SSH button, it gives browser-based access without any local client.

VM Lifecycle: Main States

Provisioning: resources being allocated by Google

Staging: resources acquired, operating system booting

Running: VM fully operational and accessible

Stopping: VM shutting down gracefully

Terminated: VM stopped, CPU/RAM freed, disk persists

VM Lifecycle: Additional States

Suspended: VM hibernated, memory saved to disk
Repairing: Google recovering the VM from a hardware failure
Suspended VM: no CPU/memory charges; disk and IP still billed
From Terminated: can restart, delete, change machine type, or change image

Changing VM state from running

Action	Methods	Shutdown script time	State
reset	console, gcloud, API, OS	no	remains running
start	console, gcloud, API	no	terminated ➜ running
reboot	OS: `sudo reboot`	~90 sec	running ➜ running
stop	console, gcloud, API	~90 sec	running ➜ terminated
shutdown	OS: `sudo shutdown`	~90 sec	running ➜ terminated
delete	console, gcloud, API	~90 sec	running ➜ N/A
preemption	automatic	~30 sec	N/A
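
The table above can be read as a small state machine. The sketch below encodes its rows as data; the function and constant names are illustrative, and preemption is kept as a separate constant because the table gives it no state transition.

```python
# action: (shutdown script time in seconds or None, state before, state after)
ACTIONS = {
    "reset": (None, "running", "running"),
    "start": (None, "terminated", "running"),
    "reboot": (90, "running", "running"),
    "stop": (90, "running", "terminated"),
    "shutdown": (90, "running", "terminated"),
    "delete": (90, "running", None),  # None: the VM no longer exists
}

# Approximate time a shutdown script gets on preemption, per the table.
PREEMPTION_SCRIPT_SECONDS = 30


def apply_action(state: str, action: str) -> tuple:
    """Return (new state, approximate shutdown script seconds) for one action."""
    script_seconds, required_state, new_state = ACTIONS[action]
    if state != required_state:
        raise ValueError(f"{action!r} expects a {required_state} VM, got {state}")
    return new_state, script_seconds


if __name__ == "__main__":
    print(apply_action("running", "stop"))      # ('terminated', 90)
    print(apply_action("terminated", "start"))  # ('running', None)
```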

"ACPI Power Off"

Availability Policy

On host maintenance:

Live migrate (default): VM moves transparently, no interruption
Terminate: VM stops during host maintenance event

Automatic restart: auto-restart after hardware failure?
VMs with GPUs and preemptible VMs cannot live-migrate

The availability policy controls VM behavior during Google's scheduled infrastructure maintenance events. "On host maintenance" has two options. Live migrate, the default, moves your running VM transparently to different hardware while it keeps running. You won't notice. This is the right choice for most workloads. Terminate stops the VM during the maintenance event; use this only if your workload cannot tolerate even a brief live migration. "Automatic restart" controls whether Google restarts your VM automatically after a hardware failure or a maintenance event that required termination. Leave this enabled for stateless workloads. Note: VMs with GPU attachments and preemptible VMs cannot live-migrate due to hardware constraints.

OS Patch Management

Managed service to keep VM operating systems up to date
Patch compliance reporting: which VMs are missing patches?
Patch deployment: apply patches across your entire VM fleet
Schedule patch jobs during maintenance windows
Supports both Linux and Windows VMs

OS Patch Management is a Compute Engine service that helps you keep your VMs patched and secure. The compliance reporting feature gives you a dashboard of your entire VM fleet showing which instances are missing patches and how severe those missing patches are. The patch deployment feature lets you run patch jobs across your fleet: you can target all VMs, or filter by labels to patch only specific groups. Patch jobs can be scheduled to run during low-traffic windows to minimize disruption. This service works for both Linux VMs, where it uses the native package manager, and Windows VMs, where it uses Windows Update. For organizations with compliance requirements, this service also provides the audit logs needed to prove that patches were applied.

Billing When a VM is Stopped

Terminated VM: no charge for vCPU and memory
Persistent disks: still charged while VM is stopped
Reserved static external IP: still charged if not in use
Custom images stored: charged for storage
Tip: delete or release unused resources to avoid surprise charges

Here is an important billing nuance. When you stop or terminate a VM, you stop paying for the vCPU and memory, so the compute costs stop. However, any persistent disks attached to the VM continue to be charged because the data is still stored on Google's infrastructure. Similarly, if you have a static external IP address reserved but not attached to a running VM, you are charged for it; Google bills for idle reserved IPs to encourage efficient address space usage. Custom images you've created are also billed as storage. The bottom line: stopping a VM is a great way to save money on idle resources, but always remember to release or delete any attached resources you no longer need to avoid ongoing charges.

04

Compute Options

Three Ways to Create a VM

Google Cloud Console – visual, guided, great for exploration
gcloud CLI – scriptable, repeatable, fast
REST API – programmatic, for custom tooling and Infrastructure as Code

All three expose the same configuration options.

Machine Type Naming Convention

[FAMILY]-[TYPE]-[vCPUs]

n1-standard-4 → N1 family, standard memory ratio, 4 vCPUs, 15 GB RAM
e2-medium → E2 family, medium (2 vCPUs, 4 GB)
c2-standard-8 → C2 family, standard ratio, 8 vCPUs
standard = balanced | highmem = more RAM | highcpu = less RAM
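
The naming convention can be illustrated with a small parser. This is a sketch for the names shown on this slide only: the class and function names are invented, and shared-core names such as e2-medium carry no vCPU count in the name, so that field is left empty for them.

```python
from typing import NamedTuple, Optional


class MachineType(NamedTuple):
    family: str           # e.g. "n1", "e2", "c2"
    kind: str             # e.g. "standard", "highmem", "highcpu", "medium"
    vcpus: Optional[int]  # None when the name does not encode a count


def parse_machine_type(name: str) -> MachineType:
    """Split a predefined machine type name into its parts."""
    parts = name.split("-")
    if len(parts) == 3 and parts[2].isdigit():
        return MachineType(parts[0], parts[1], int(parts[2]))
    if len(parts) == 2:
        return MachineType(parts[0], parts[1], None)
    raise ValueError(f"unrecognized machine type name: {name!r}")


if __name__ == "__main__":
    for example in ("n1-standard-4", "e2-medium", "c2-standard-8"):
        print(example, "->", parse_machine_type(example))
```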

Machine Families: General-Purpose

E2: cost-optimized, shared-core options, best price-performance
N1: flexible, supports GPUs & TPUs, widest OS/feature support
N2: high-performance Intel Cascade Lake / Ice Lake
N2D: AMD EPYC – largest VMs in general-purpose category
Tau T2D: scale-out x86 (AMD EPYC Milan) | T2A: Arm Ampere Altra

The general-purpose family is the most versatile and covers the vast majority of workloads. E2 machines are the most affordable; they're great for small applications, development environments, and microservices. They also have shared-core variants (e2-micro, e2-small) that run on a fraction of a physical core. N1 is the legacy general-purpose series, still widely used because it supports GPUs and TPUs and has the broadest compatibility. N2 uses newer Intel processors and offers meaningfully higher performance than N1 for demanding applications. N2D runs on AMD EPYC and offers the largest VMs in the general-purpose family. Tau T2D is optimized for scale-out x86 workloads, and T2A is Google's first Arm machine type, using Ampere Altra processors; it is a good fit for containerized and cloud-native workloads that benefit from better performance-per-watt.

Machine Families: Compute-Optimized

C2: Intel Cascade Lake – highest single-thread performance
C2D: AMD EPYC Milan – highest performance per core in GCP
H3: Intel Sapphire Rapids – latest HPC-grade hardware
Best for: High Performance Computing, gaming servers, EDA, CPU-bound simulations

The compute-optimized family is designed for workloads where raw CPU performance is the primary concern. C2 machines use Intel Cascade Lake and provide the highest single-thread clock speed available in Compute Engine, critical for latency-sensitive workloads and applications that don't parallelize well. C2D uses AMD EPYC Milan processors and offers extremely high performance-per-core, making it excellent for scientific simulations and HPC jobs. H3 is the newest generation using Intel Sapphire Rapids, targeting high-performance computing at scale. Common use cases: electronic design automation, gaming servers, CPU-intensive batch workloads, and financial modeling. These machines cost more per vCPU than general-purpose, so use them only when you genuinely need the extra CPU muscle.

Machine Families: Memory-Optimized

M1: up to 4 TB of RAM – in-memory databases
M2: up to 12 TB of RAM – largest VM on Google Cloud
M3: latest generation, higher memory bandwidth
Best for: SAP HANA, in-memory analytics, large caches
Very high memory-to-vCPU ratio, premium pricing

The memory-optimized family is for workloads that need enormous amounts of RAM. M1 can go up to 4 terabytes. M2 goes even further, up to 12 terabytes, making it the largest single VM available on Google Cloud. M3 is the latest generation with improved memory bandwidth and latency characteristics. Primary use cases are in-memory databases like SAP HANA, real-time analytics platforms that hold entire datasets in RAM, large-scale caching systems, and any workload where disk I/O would be a bottleneck. These machines are priced at a premium, so they're only the right choice when your workload genuinely requires holding terabytes of data in memory to function.

Machine Families: Accelerator-Optimized

A2: NVIDIA A100 GPUs – ML training & inference at scale
G2: NVIDIA L4 GPUs – efficient inference, video transcoding
A2: high-bandwidth NVLink between GPUs for LLM training
Best for: deep learning, GPU computing, graphics rendering
Cannot live-migrate (GPU hardware constraint)

The accelerator-optimized family is for GPU-powered workloads. A2 machines use NVIDIA A100 GPUs, Google Cloud's most powerful GPU option. Multiple A100s can be connected via NVLink for very high inter-GPU bandwidth, which is essential for training large language models and transformer architectures. G2 machines use NVIDIA L4 GPUs, which are highly efficient and well-suited for inference serving, video transcoding, and virtual workstations. Use cases include deep learning training, AI model inference, GPU-accelerated computing, and 3D rendering. Important: VMs with GPUs cannot use live migration because the GPU must stay on the same physical host, so plan for maintenance event handling accordingly.

Custom Machine Types

No predefined type fits? Create a custom one.
vCPU: choose 1, or any even number
Memory: 1 to 8 GB per vCPU, in multiples of 256 MB
Cost: slightly higher than nearest predefined equivalent
Extended memory: go beyond 8 GB/vCPU, up to 624 GB total

Custom machine types let you specify the exact number of vCPUs and amount of memory you need, rather than being constrained to predefined options. The rules are: vCPU count must be 1 or any even number. Memory must be between 1 and 8 GB per vCPU, in multiples of 256 MB. So if your application needs exactly 6 vCPUs and 22 GB of RAM, you can create that precise configuration. Custom machine types are priced slightly higher than an equivalent predefined type, so always check if a predefined type is close enough before going custom. Extended memory is also available: it lets you exceed the 8 GB/vCPU limit up to a total of 624 GB, for workloads with unusual memory requirements that don't need a full memory-optimized machine.
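
The rules above can be checked mechanically. The sketch below encodes only the limits stated on this slide (actual limits differ between machine series); the function name and the choice to express memory in MB are assumptions for illustration.

```python
def validate_custom_machine(vcpus: int, memory_mb: int,
                            extended_memory: bool = False) -> list:
    """Return a list of rule violations; an empty list means the shape is valid."""
    problems = []
    if vcpus < 1 or (vcpus != 1 and vcpus % 2 != 0):
        problems.append("vCPU count must be 1 or an even number")
    if memory_mb % 256 != 0:
        problems.append("memory must be a multiple of 256 MB")
    if vcpus >= 1:
        per_vcpu_mb = memory_mb / vcpus
        if per_vcpu_mb < 1024:
            problems.append("memory must be at least 1 GB per vCPU")
        if per_vcpu_mb > 8 * 1024 and not extended_memory:
            problems.append("more than 8 GB per vCPU requires extended memory")
    if extended_memory and memory_mb > 624 * 1024:
        problems.append("extended memory is limited to 624 GB in total")
    return problems


if __name__ == "__main__":
    # The example from the notes: 6 vCPUs and 22 GB of RAM.
    print(validate_custom_machine(6, 22 * 1024))
    print(validate_custom_machine(3, 8 * 1024))
```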

05

Compute Pricing

Pricing Fundamentals

Per-second billing with a 1-minute minimum
Resource-based: vCPU and memory priced separately
Price varies by region and machine family
Premium OS images (RHEL, Windows) add licensing charges
Use the Google Cloud Pricing Calculator to estimate costs

Compute Engine uses per-second billing: you pay only for the seconds your VM is running, with a minimum charge of one minute per VM. This is more granular than hourly billing, so short-lived VMs cost less. Pricing is resource-based: vCPU and memory are priced separately, which is what makes custom machine types possible. Prices vary by region and by machine family. If you choose a premium OS image like RHEL, SUSE, or any Windows Server variant, there are additional per-second licensing charges on top of the infrastructure cost. To estimate costs before deploying, use the Google Cloud Pricing Calculator, which models all discount programs and lets you compare scenarios.
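
Per-second billing with a one-minute minimum comes down to a single rule, sketched below. The hourly rate used in the example is a made-up number, not a real price.

```python
def billable_seconds(runtime_seconds: int) -> int:
    """Seconds charged for one VM run: per-second, with a 60-second minimum."""
    if runtime_seconds <= 0:
        return 0  # the VM never ran
    return max(60, runtime_seconds)


def run_cost(runtime_seconds: int, hourly_rate: float) -> float:
    """Cost of one run at a given (hypothetical) hourly rate."""
    return billable_seconds(runtime_seconds) * hourly_rate / 3600


if __name__ == "__main__":
    hypothetical_rate = 0.036  # illustrative price per hour, not a real quote
    for seconds in (30, 60, 61, 3600):
        print(seconds, "s ->", billable_seconds(seconds), "s billed,",
              round(run_cost(seconds, hypothetical_rate), 6))
```

A 30-second run is billed as 60 seconds, while a 61-second run is billed as exactly 61 seconds.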

Preemptible & Spot VMs: Pricing

Up to 91% discount vs. regular on-demand VMs
Trade-off: Google may terminate at any time
Preemptible: max 24-hour runtime
Spot VMs: no maximum runtime – newer model
Best for: batch processing, fault-tolerant, stateless workloads

Preemptible and Spot VMs can give you a discount of up to 91% compared to on-demand pricing. That is a massive saving and worth designing for when possible. The trade-off is that Google can reclaim these VMs at any time to serve other customers. Preemptible VMs have a hard 24-hour limit: they always stop after 24 hours. Spot VMs are the newer model that removes this 24-hour cap, but they can still be preempted at any time. You normally get a 30-second warning before the VM is stopped, although it is not guaranteed. The best use cases are batch jobs where the work can checkpoint and resume, scientific simulations, and any stateless fault-tolerant workload. One trick: use monitoring and load balancers to automatically replace preempted VMs, building a self-healing cheap compute pool.

Committed Use Discounts (CUDs)

Commit to use specific resources for 1 or 3 years
Up to 57% discount for general-purpose machine types
Up to 70% discount for memory-optimized machine types
Commitment is on vCPU and memory, not a specific VM
No upfront payment required

Committed Use Discounts give you the deepest savings in exchange for committing to use a minimum level of compute resources over 1 or 3 years. You can get up to 57% off general-purpose machine types and up to 70% off memory-optimized types. A critical distinction from AWS Reserved Instances: the commitment in Google Cloud is on vCPU and memory resources as abstract quantities, not tied to a specific VM instance or machine type. This gives you flexibility: you can freely start, stop, and change VMs as long as you're consuming the committed resource level. No upfront payment is required; you simply commit to paying the discounted rate for the term, whether you use the resources or not.

Sustained Use Discounts: Effective Rate

50% usage → ~10% effective discount
75% usage → ~20% effective discount
100% usage → 30% effective discount

Sustained Use Discounts: Example
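
The effective rates on the previous slide follow from an incremental schedule in which each additional quarter of the month is billed at a lower rate. The sketch below assumes the four-tier schedule (100%, 80%, 60%, 40% of the base rate) that reproduces those figures; other machine series may use different tiers.

```python
# (fraction of the month covered by the tier, fraction of base rate charged)
TIERS = [(0.25, 1.0), (0.25, 0.8), (0.25, 0.6), (0.25, 0.4)]


def effective_discount(usage_fraction: float) -> float:
    """Effective discount for a VM that runs `usage_fraction` of the month."""
    if not 0 < usage_fraction <= 1:
        raise ValueError("usage_fraction must be in (0, 1]")
    remaining = usage_fraction
    billed = 0.0
    for width, rate in TIERS:
        used = min(remaining, width)
        billed += used * rate   # usage in this tier, at this tier's rate
        remaining -= used
    return 1 - billed / usage_fraction


if __name__ == "__main__":
    for usage in (0.25, 0.5, 0.75, 1.0):
        print(f"{usage:.0%} usage -> {effective_discount(usage):.0%} discount")
```

For a full month the billed amount is 0.25 × (1.0 + 0.8 + 0.6 + 0.4) = 0.70 of the base price, which is the 30% maximum.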

VM Sizing Recommendations

Compute Engine automatically identifies over-provisioned VMs
Recommendations appear 24 hours after VM creation
Suggest downsizing to a smaller, cheaper machine type
Compute Engine also has a free usage tier
Always validate with the Google Cloud Pricing Calculator

To help you right-size your VMs and avoid overpaying, Compute Engine provides automatic VM sizing recommendations. These appear in the Cloud Console approximately 24 hours after you create a VM. If your VM consistently uses significantly less CPU or memory than it's provisioned for, Compute Engine will surface a recommendation suggesting a smaller, cheaper machine type. These can save real money in production environments where over-provisioning is common. Also worth noting: Compute Engine has a free tier. For example, one e2-micro instance per month in specific US regions is free, which is useful for development and testing. For accurate production cost estimates, always model your workload in the Google Cloud Pricing Calculator.

Question 3

What is the maximum discount you can get with Committed Use Discounts on memory-optimized machine types?

A. 30%
B. 57%
C. 70%
D. 91%

Answer: C

70%: Committed Use Discounts on memory-optimized machine types can reach up to 70% off.

A (30%) is the max Sustained Use Discount
B (57%) is the CUD discount for general-purpose types
D (91%) is the Preemptible / Spot VM discount

06

Special Compute Configurations

Preemptible VMs

Up to 91% cheaper than regular on-demand VMs
May be preempted at any time – no charge if within the first minute
Maximum runtime: 24 hours
30-second preemption warning (not guaranteed)
No live migration, no automatic restart

Preemptible VMs are ideal for running batch workloads at a dramatically reduced cost, with a discount of 60 to 91%. The trade-off is that Google can terminate them at any time to reclaim capacity for higher-priority tasks. If a preemptible VM is terminated within the first minute, you're not charged at all. They have a hard 24-hour runtime limit: they always stop after 24 hours. Google sends a 30-second termination notice before preemption, though this isn't guaranteed. There are no live migrations and no automatic restarts. However, you can build resilience externally: use monitoring and load balancers to detect terminated preemptible VMs and automatically spin up new ones, creating a self-healing cheap compute pool. This pattern is ideal for batch processing jobs: if some instances terminate, the job slows but doesn't stop.
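
One way to make a batch job tolerate preemption is to checkpoint progress after every item so that a replacement VM resumes instead of restarting. The sketch below is a generic illustration, not a Compute Engine API: the function name and the `preempt_after` parameter (which only simulates a preemption) are invented, and the checkpoint is a local JSON file that in practice would live on a persistent disk or in Cloud Storage.

```python
import json
import os


def run_batch(items, checkpoint_path, process, preempt_after=None):
    """Process items in order; return True when finished, False if interrupted."""
    done = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as handle:
            done = json.load(handle)["done"]  # resume after the last finished item

    processed_this_run = 0
    for index in range(done, len(items)):
        if preempt_after is not None and processed_this_run >= preempt_after:
            return False  # simulated preemption
        process(items[index])
        processed_this_run += 1
        # Write to a temporary file first so a crash never leaves a half-written checkpoint.
        temp_path = checkpoint_path + ".tmp"
        with open(temp_path, "w") as handle:
            json.dump({"done": index + 1}, handle)
        os.replace(temp_path, checkpoint_path)
    return True


if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "progress.json")
        seen = []
        run_batch(list(range(5)), path, seen.append, preempt_after=2)
        run_batch(list(range(5)), path, seen.append)
        print(seen)
```

The second call picks up at the third item, so every item is processed exactly once across the two runs.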

Spot VMs

The latest evolution of preemptible VMs
Same pricing model, up to 91% discount
No 24-hour maximum runtime (key improvement over preemptible)
Still finite capacity, may not always be available
Capacity easier to get with smaller machine types

Spot VMs are the successor to preemptible VMs. They share the same dramatic pricing discount, up to 91% off, but remove the 24-hour runtime limit. This makes Spot VMs significantly more useful for longer-running fault-tolerant workloads that previously would hit the preemptible wall. Like preemptible VMs, Spot VMs can be terminated by Compute Engine at any time to reclaim capacity. The probability of preemption is generally low but varies by zone and machine type depending on current demand. Spot VMs draw from Google's excess and backup capacity, so they're a finite resource and may not always be available in all zones. For the best availability, use smaller machine types with fewer vCPUs and less memory, where Google has more surplus capacity to offer.

Sole-Tenant Nodes

Physical server dedicated exclusively to your project
All VMs on the node belong to you, no other customers' workloads
Use case: compliance (PCI DSS, HIPAA) requiring physical isolation
BYOL: bring existing Windows/software licenses
Can fill with multiple VM sizes, including custom types

Sole-tenant nodes give you a physical Compute Engine server that is dedicated entirely to your project. On a normal host, VMs from multiple different customers can run on the same physical hardware under separate hypervisor contexts. On a sole-tenant node, all VMs are yours: no other customer's workloads share the hardware. This physical isolation is essential for compliance frameworks like PCI DSS for payment processing or HIPAA for healthcare data. Another key advantage is BYOL (Bring Your Own License). If you have existing Windows Server or SQL Server licenses, you can use them on dedicated hardware with the in-place restart feature, potentially saving significant licensing costs. You can pack the node with VMs of different sizes, including custom machine types.

Shielded VMs

Provides verifiable integrity for VM instances
Secure Boot: only trusted signed bootloaders and kernels
vTPM: virtual Trusted Platform Module for attestation
Integrity monitoring: detects boot-time rootkits
Requires selecting a shielded image

Shielded VMs are part of Google's Shielded Cloud Initiative. They provide verifiable integrity, so you can be confident that the VM hasn't been compromised by boot-level malware or rootkits. The three pillars are: Secure Boot, which ensures only software signed by trusted parties (Google, OEMs, or you) can be used to boot the VM; the virtual Trusted Platform Module (vTPM), which provides hardware-level cryptographic attestation and secure key storage; and Integrity Monitoring, which compares the current boot sequence against a known-good baseline and alerts if something unexpected happens at boot time. To use Shielded VM features you must select a shielded image; most standard public images from Google are shielded.

Confidential VMs

Encrypts data while it's being processed (data in use)
No code changes required – encryption is transparent
Runs on N2D with AMD EPYC "Rome" + AMD Secure Encrypted Virtualization
High memory capacity, high throughput, parallel workload support
Google has no access to the encryption keys

Confidential VMs take VM security a step further: they encrypt data not just at rest and in transit, but also while it's actively being processed in memory. This is achieved using AMD Secure Encrypted Virtualization, which uses cryptographic keys generated and managed by the AMD EPYC processor hardware itself. The VM runs on N2D machine types powered by AMD EPYC Rome processors. The critical guarantee: Google does not have access to the encryption keys, because they are generated inside the CPU and never leave the silicon. This means even Google employees cannot read the VM's memory. Your code needs no changes, since the encryption happens transparently at the hardware level. This is invaluable for industries handling highly sensitive data where even cloud provider trust is a concern.

Question 4

Which special VM type encrypts data while it is being processed in memory?

A. Shielded VM
B. Spot VM
C. Sole-tenant node
D. Confidential VM

Answer: D

Confidential VM: it encrypts data in use via AMD Secure Encrypted Virtualization, and even Google cannot access the keys.

A (Shielded VM) protects boot integrity, not data in memory
B (Spot VM) is purely a pricing model
C (Sole-tenant node) provides physical isolation, not encryption

07

Images

What's in an Image?

Boot loader – initializes hardware, starts the OS
Operating system – kernel, init system, base utilities
File system structure – directory layout, config files
Software – pre-installed packages and tools
Customizations – your application, settings, agents

When you create a VM you choose a boot disk image. An image is essentially a frozen snapshot of a disk that contains everything the VM needs to start and operate. The boot loader initializes the hardware and loads the OS. The operating system is the core: kernel, init system, system libraries. The file system structure provides the directory layout and default configuration files. Pre-installed software might include monitoring agents, security tools, or runtime environments. Customizations are what you add on top: your application code, specific configuration, or compliance tooling. Think of an image as a template from which identical VMs can be instantiated quickly.

Public Base Images

Provided by Google, third-party vendors, and the community
Linux: CentOS, CoreOS, Debian, RHEL(p), SUSE(p), Ubuntu, FreeBSD
Windows: Server 2019(p), 2016(p), 2012-r2(p) + SQL Server pre-installed
Images marked (p) are premium – additional per-second licensing charges
Premium image prices are global – do not vary by region

Google provides a rich library of public base images. For Linux, you have CentOS, CoreOS, Debian, Red Hat Enterprise Linux, SUSE, Ubuntu, openSUSE, and FreeBSD. On the Windows side, there's Windows Server 2019, 2016, and 2012 R2, as well as images with SQL Server 2019 or 2017 pre-installed. Images marked with "(p)" are premium: they carry additional per-second licensing charges on top of the infrastructure costs. RHEL and SUSE are commercial distributions with Red Hat and SUSE support contracts baked in. All Windows Server images are also premium due to Microsoft licensing. Premium image prices are global and uniform; unlike compute infrastructure pricing, they don't vary by region or zone.

Custom Images

Create from an existing VM with pre-installed software
Import from on-premises, workstation, or another cloud
Management: image sharing across projects, image families, deprecation
Import is a no-cost service – just install an agent
Image families always resolve to the latest non-deprecated version

Custom images let you build your own gold images tailored to your organization. You can create a custom image from an existing VM that has your software pre-installed, your agents configured, and your security settings hardened. You can also import virtual disk images from on-premises VMware or Hyper-V environments, from your own workstation, or from other cloud providers; this is a no-cost service that just requires installing a migration agent. Once you have custom images, you can share them across multiple Google Cloud projects in your organization. Image families are a powerful feature: when you reference an image family rather than a specific image version, Compute Engine always uses the latest non-deprecated image in that family, so you can push new versions and all new VMs automatically use the latest without changing deployment scripts.
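
The image family rule (latest non-deprecated image wins) can be modeled in a few lines. This is a conceptual sketch, not the Compute Engine API: the `Image` class, its fields, and the image names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Image:
    name: str
    family: str
    created: int              # hypothetical sortable creation stamp
    deprecated: bool = False


def resolve_family(images, family: str) -> Image:
    """Return the newest image in the family that is not deprecated."""
    candidates = [image for image in images
                  if image.family == family and not image.deprecated]
    if not candidates:
        raise LookupError(f"no usable image in family {family!r}")
    return max(candidates, key=lambda image: image.created)


if __name__ == "__main__":
    catalog = [
        Image("web-v1", "web", created=1),
        Image("web-v2", "web", created=2),
        Image("web-v3", "web", created=3, deprecated=True),  # rolled back
    ]
    print(resolve_family(catalog, "web").name)
```

Deprecating the newest image acts as a rollback: the family immediately resolves to the previous version again.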

Machine images

Scenarios	Machine image	Persistent disk snapshot	Custom image	Instance template
Single disk backup	Yes	Yes	Yes	No
Multiple disk backup	Yes	No	No	No
Differential backup	Yes	Yes	No	No
Instance cloning	Yes	No	Yes	Yes
Base image replication	No	No	Yes	No
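
One way to read the comparison table is as a lookup from scenario to the resources that support it. The sketch below simply encodes the table rows in a dictionary; the `SUPPORT` and `options_for` names are illustrative and not part of any Google Cloud library.

```python
# Columns of the comparison table, in the same order as on the slide.
RESOURCES = ("machine image", "persistent disk snapshot",
             "custom image", "instance template")

# Each row of the table: scenario -> Yes/No per column.
SUPPORT = {
    "single disk backup":     (True,  True,  True,  False),
    "multiple disk backup":   (True,  False, False, False),
    "differential backup":    (True,  True,  False, False),
    "instance cloning":       (True,  False, True,  True),
    "base image replication": (False, False, True,  False),
}


def options_for(scenario: str) -> list:
    """List the resources marked "Yes" for the given scenario."""
    row = SUPPORT[scenario.lower()]
    return [name for name, ok in zip(RESOURCES, row) if ok]


print(options_for("Instance cloning"))
# prints ['machine image', 'custom image', 'instance template']
```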

08

Disk Options

Boot Disk

Every VM gets a single root persistent disk at creation
The chosen image is loaded onto this disk at first boot
Bootable: detach and attach to another VM to boot from it
Durable: survives a VM stop; deleted along with the VM by default
Uncheck "Delete boot disk when instance deleted" to keep it on VM deletion

Persistent Disks: Overview

Network-attached block storage (not physically attached to host)
Durable: survives VM termination
Resizable while running and attached
Attach in read-only mode to multiple VMs simultaneously
Zonal or Regional (synchronous replication across 2 zones)

Persistent disks are the primary storage workhorse for Compute Engine. They are network-attached block storage, not physically connected to the VM host hardware. This separation is what makes them durable: if the host hardware fails, the disk data survives because it lives independently on Google's storage infrastructure. You can resize them dynamically while the VM is running, without any downtime. You can attach a persistent disk in read-only mode to multiple VMs at the same time, which is great for sharing reference datasets or static content without replicating it. Zonal persistent disks are replicated within a single zone. Regional persistent disks are synchronously replicated across two zones in the same region, which gives higher availability for critical databases.

Persistent Disk Types

pd-standard: HDD – large sequential workloads, lowest cost
pd-ssd: SSD – low latency, high IOPS for databases
pd-balanced: SSD – good performance-to-cost balance
pd-extreme: SSD (zonal only) – highest IOPS, user-provisionable
All types: encrypted at rest by default (Google-managed keys)

There are four persistent disk types to choose from. pd-standard is magnetic HDD: cheapest per GB and suitable for large sequential workloads where latency isn't critical, like data pipelines and cold storage. pd-ssd is backed by solid-state drives, with low latency and high IOPS, ideal for transactional databases and high-performance applications. pd-balanced is also SSD but with a balanced IOPS-per-GB ratio: it has the same maximum IOPS as pd-ssd but lower IOPS per gigabyte, making it a cost-effective choice for general-purpose workloads. pd-extreme is zonal-only and designed for high-end database workloads. Uniquely, you provision your desired IOPS value rather than having it scale automatically with disk size. All four types are encrypted at rest by default using Google-managed keys. You can optionally use customer-managed keys via Cloud KMS or customer-supplied keys if your compliance requirements demand it.

Local SSD Disks

Physically attached to the VM's host machine
Each partition: 375 GB; up to 24 partitions per VM = 9 TB
Very high IOPS and throughput, very low latency
Data survives a reset, but lost on stop or terminate
VM-specific – cannot be detached and reattached to another VM

Local SSDs are fundamentally different from persistent disks. They are physically attached to the host server where the VM runs, so there is no network hop. That gives them extremely high IOPS and very low latency, far exceeding any persistent disk type. Each partition is 375 GB, and you can attach up to 24 partitions for a total of 9 TB of local SSD storage per instance. However, they are ephemeral: data survives a VM reboot (reset), but is permanently lost if the VM is stopped or terminated. They are also VM-specific: you cannot detach a local SSD and move it to another VM. Use them for temporary scratch space, high-throughput caching, and buffers where you can regenerate the data if the VM stops.
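The capacity figure is simple arithmetic, sketched below with the two numbers quoted on the slide (375 GB per partition, at most 24 partitions). The function name is illustrative, and the limits are taken from this deck rather than queried from Google Cloud.

```python
PARTITION_GB = 375     # size of one local SSD partition, as quoted on the slide
MAX_PARTITIONS = 24    # maximum partitions per VM, as quoted on the slide


def local_ssd_capacity_gb(partitions: int) -> int:
    """Total local SSD capacity in GB for a given number of partitions."""
    if not 1 <= partitions <= MAX_PARTITIONS:
        raise ValueError(f"partitions must be between 1 and {MAX_PARTITIONS}")
    return partitions * PARTITION_GB


print(local_ssd_capacity_gb(24))  # prints 9000 (GB, i.e. 9 TB)
```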

RAM Disk (tmpfs)

Mount a tmpfs filesystem to store data in RAM
Faster than local SSD; useful when application needs a filesystem API
Extremely volatile: data lost on any restart or shutdown
Requires a larger machine type to have enough RAM for data
Pair with a persistent disk for periodic backups of RAM data

If you need the absolute fastest I/O and are willing to accept total volatility, you can use a RAM disk by mounting a tmpfs filesystem inside the VM. This stores data directly in RAM, faster than any SSD. It's useful when your application expects to interact with a file system (directories, file operations) but the data is purely temporary, like an in-memory computation scratch space or a high-speed cache. Critically, any restart or shutdown of the VM wipes all data in the RAM disk completely. You'll need a machine type with enough RAM to hold both the application's working set and the data you're storing in the RAM disk. If you use a RAM disk for anything semi-important, set up a background process to periodically sync it to a persistent disk.
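
As a rough illustration of the "sync it to a persistent disk" advice, the sketch below copies a directory tree from one location to another. In practice the source would be a tmpfs mount point (created inside the VM with something like `mount -t tmpfs -o size=2g tmpfs /mnt/ram-disk`) and the destination a directory on a persistent disk; both paths are assumptions, so the demo uses temporary directories to stay runnable anywhere. It requires Python 3.8 or later for `dirs_exist_ok`.

```python
import shutil
import tempfile
from pathlib import Path


def sync_ram_disk(ram_dir: Path, backup_dir: Path) -> int:
    """Copy the RAM disk contents into backup_dir.

    Returns the number of files now present in the backup. Files deleted
    from the RAM disk are NOT removed from the backup by this sketch.
    """
    shutil.copytree(ram_dir, backup_dir, dirs_exist_ok=True)
    return sum(1 for p in backup_dir.rglob("*") if p.is_file())


# Demo with temporary directories standing in for the tmpfs mount
# and the persistent-disk directory.
with tempfile.TemporaryDirectory() as tmp:
    ram = Path(tmp) / "ram-disk"
    backup = Path(tmp) / "pd-backup"
    ram.mkdir()
    (ram / "cache.bin").write_bytes(b"scratch data")
    print(sync_ram_disk(ram, backup))  # prints 1
```

Running this function from a timer or a cron job would give the periodic backup; anything written since the last run is still lost on a restart.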

Summary of disk options

	Persistent disk HDD	Persistent disk SSD	Local SSD disk	RAM disk
Data redundancy	Yes	Yes	No	No
Encryption at rest	Yes	Yes	Yes	N/A
Snapshotting	Yes	Yes	No	No
Bootable	Yes	Yes	No	No
Use case	General, bulk file storage	Highly random I/O	High IOPS and low latency	Low latency, tolerates data loss

Maximum Persistent Disks per VM

Shared-core machines: maximum 16 persistent disks
Standard, High-Memory, High-CPU, Memory-optimized, Compute-optimized: maximum 128 disks
Network bandwidth and Disk I/O share the same throughput budget
Heavy disk I/O competes with network egress/ingress

There are limits on how many persistent disks you can attach to a VM, depending on the machine type. Shared-core machines, the micro and small variants, are limited to 16 disks. All other machine types support up to 128 persistent disks, allowing you to create enormous storage capacity on a single host. An important caveat: the ~2 Gbps per vCPU network throughput I mentioned earlier is shared between network traffic and disk I/O. If you're doing heavy concurrent disk reads and writes across many attached disks, that bandwidth competes with your network egress and ingress. Keep this in mind when designing I/O-heavy workloads: you may need to increase the vCPU count to get more aggregate bandwidth.
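The trade-off can be sketched as a back-of-the-envelope calculation. The model below uses only the figures from this deck (16 or 128 disks, roughly 2 Gbps per vCPU) and deliberately ignores the per-VM bandwidth ceiling and per-disk limits, so treat the output as an estimate. All function names are illustrative.

```python
GBPS_PER_VCPU = 2  # approximate per-vCPU throughput quoted in this deck


def max_persistent_disks(shared_core: bool) -> int:
    """Disk attachment limit: 16 for shared-core machines, 128 otherwise."""
    return 16 if shared_core else 128


def shared_bandwidth_gbps(vcpus: int) -> float:
    """Rough throughput budget shared by network traffic and disk I/O."""
    return float(vcpus * GBPS_PER_VCPU)


def network_headroom_gbps(vcpus: int, disk_io_gbps: float) -> float:
    """Bandwidth left for network traffic once disk I/O has taken its share."""
    return max(0.0, shared_bandwidth_gbps(vcpus) - disk_io_gbps)


print(max_persistent_disks(shared_core=True))       # prints 16
print(network_headroom_gbps(8, disk_io_gbps=10))    # prints 6.0
print(network_headroom_gbps(16, disk_io_gbps=10))   # prints 22.0
```

Doubling the vCPU count from 8 to 16 in this simplified model raises the headroom from 6 to 22 Gbps, which is the reasoning behind adding vCPUs for I/O-heavy workloads.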

Cloud Disks vs. Physical Disks

Physical disk: must partition, repartition, and reformat to grow
Cloud persistent disk: simply resize (grow) and extend the filesystem
Redundancy: built in – no RAID arrays needed
Snapshots: built-in service – no extra backup software
Encryption at rest: automatic – or bring your own keys

Let me close the disk section by comparing cloud persistent disks to traditional physical hard drives. With physical disks, growing storage is painful: you have to repartition, potentially reformat, and worry about data migration. With cloud persistent disks, you simply grow the disk and extend the filesystem inside the OS. Redundancy is built into the storage infrastructure, so there is no need for RAID configurations or disk mirroring. Snapshot functionality is a built-in cloud service, with no third-party backup agent required. Encryption at rest is automatic: Google handles it, or you bring your own keys via Cloud KMS. The operational overhead of managing cloud disks is dramatically lower than managing physical ones because Google abstracts away all the hard parts.
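
The "grow the disk, then extend the filesystem" workflow can be sketched as two commands. The helper below only builds and prints them; it does not run anything. It assumes an ext4 filesystem written directly to the device with no partition table, and the disk name, zone, and device path are hypothetical placeholders.

```python
import shlex


def build_resize_commands(disk: str, zone: str, new_size_gb: int, device: str) -> list:
    """Return the two commands as argument lists (dry run, nothing is executed)."""
    # Step 1: grow the persistent disk itself (run from a machine with gcloud).
    grow_disk = ["gcloud", "compute", "disks", "resize", disk,
                 f"--size={new_size_gb}GB", f"--zone={zone}"]
    # Step 2: extend the ext4 filesystem (run inside the VM).
    extend_fs = ["sudo", "resize2fs", device]
    return [grow_disk, extend_fs]


# Hypothetical values, for illustration only.
for cmd in build_resize_commands("data-disk-1", "europe-west1-b", 500, "/dev/sdb"):
    print(shlex.join(cmd))
```

Other filesystems and partitioned disks need different second steps (for example growing the partition first), so check what the VM actually uses before adapting this.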

Quiz

Let's test what you learned!

Question 1

Which statement is true of persistent disks?

A. Persistent disks are encrypted by default.
B. Once created, a persistent disk cannot be resized.
C. Persistent disks are physical hardware devices connected directly to VMs.
D. Persistent disks are always HDDs (magnetic spinning disks).

Answer: A

Persistent disks are encrypted by default.

Question 2

What are sustained use discounts?

A. Automatic discounts for running specific Compute Engine resources for a significant portion of the billing month
B. Discounts you receive by using preemptible VM instances
C. Purchase commitments for specific resources you know you will use
D. Per-second billing that starts after a 1 minute minimum

Answer: A

Automatic discounts that you get for running specific Compute Engine resources for a significant portion of the billing month.

Question 3

Which statement is true of Virtual Machine Instances in Compute Engine?

A. All Compute Engine VMs are single tenancy and do not share CPU hardware.
B. Compute Engine uses VMware to create Virtual Machine Instances.
C. In Compute Engine, a VM is a networked service that simulates the features of a computer.
D. A VM in Compute Engine always maps to a single hardware computer in a rack.

Answer: C

In Compute Engine, a VM is a networked service that simulates the features of a computer.

Key Takeaways

Compute Engine: flexible IaaS, any machine type, any OS, any scale
Right machine family + right pricing model = significant cost optimization
Special configs: preemptible, shielded, confidential, sole-tenant
Images: public, custom, and machine images are powerful deployment templates
Disks: persistent (durable), local SSD (fast), RAM (fastest) – choose wisely

Questions?

That concludes our presentation on Google Cloud Virtual Machines. Here are five key takeaways. First, Compute Engine is a flexible IaaS platform that lets you run any machine type, any OS, at any scale. Second, choosing the right machine family and applying the right pricing model (sustained use discounts, committed use discounts, or preemptible/Spot VMs) can lead to very significant cost savings. Third, special configurations like preemptible VMs, shielded VMs, confidential VMs, and sole-tenant nodes address specific needs around cost, security, and compliance. Fourth, images are powerful deployment templates: public images, custom gold images, and machine images give you flexible options for consistent VM deployment. Fifth, choosing the right disk type (persistent for durability, local SSD for performance, RAM disk for maximum speed) is important for both cost and performance. Thank you all for attending, and we're happy to take questions now.
