Disk Storage on Linux VMs in Azure

Microsoft Azure supports the ability to mount VHDs into Azure Virtual Machines running any of the supported Linux distributions. Additionally, the Azure File Service provides a managed logical file system that can be mounted into Ubuntu distributions using the SMB protocol. This post focuses on mounting VHDs into Azure VMs, but also shows how to mount a file system managed by the Azure File Service.

Page Blobs and VHDs

The Azure Blob Service provides two types of blobs: block blobs and page blobs. A block blob contains a single file intended to be read sequentially from start to finish – for example, an image used in a web page. A page blob provides up to 1TB of random-access storage, with the primary use case being as the backing store for a VHD. The Blob Service supports high availability for blobs by storing three copies of each blob in the local region, with the option of storing an additional three copies in a paired remote region. The local copies are updated synchronously while any remote copies are updated asynchronously with a delay of the order of a few minutes. Reads and writes are fully consistent.

An Azure storage account provides a security boundary for access to blobs. The account is specified by an account name that is unique across the entire storage service. Each storage account can comprise zero or more containers, each of which can contain zero or more blobs. There is an upper limit of 500TB of storage in a single account, and that provides the only limit on the number of containers and blobs. An Azure subscription has a soft limit of 20 storage accounts with a hard limit of 50 storage accounts.

Azure Disks

Azure Virtual Machines provides an IaaS compute feature that supports the creation of VMs with per-minute billing. It supports instance sizes varying from a shared core/512MB A0 instance to a 16 core/112GB A9 instance. It also supports the use of several Windows Server versions as well as various Linux distributions, including Ubuntu, CentOS, SUSE, and Oracle Linux.

Azure Virtual Machines supports the following types of disk:

  • OS disk
  • Temporary disk
  • Data disk

The OS disk comprises a VHD that is attached to the VM. The VHD is stored as a page blob in Azure Storage so is accessed remotely. This use of Blob Storage means that the OS disk is durable and any flushed writes are persisted to Azure Storage. Consequently, there is no loss of data in the event of a failure of the physical server hosting the VM.

The temporary disk is in the chassis of the physical server hosting the VM, and is intended for use as swap space or other scratch purposes where the complete loss of the data would not be an issue. The contents of the temporary disk are not persisted to Azure Storage and are lost in the event of a server-healing induced migration of the VM. Consequently, the temporary disk should not be used to store data which cannot be recreated.

The data disk comprises a VHD that is attached to the VM. The VHD is stored as a page blob in Azure Storage so is accessed remotely. The number of data disks that can be attached to a VM depends on the instance type. The general rule is that two data disks can be attached to the VM for each CPU core in the instance, with the exception of a 16-core A9 for which only 16 disks may be attached.

A basic Azure VM has two disks initially – the OS disk and the temporary disk. Additional data disks can be attached and detached at any time, provided the disk number limits are adhered to. The disks used by Azure Virtual Machines are standard VHDs. These can be moved from one VM to another. They can also be migrated to and from other locations, such as on-premises. A new disk added to a VM must be partitioned and formatted in a manner appropriate to the OS of the VM to which the disk is attached.

Linux Disks

Azure Virtual Machines use the following arrangement of disks:

  • /dev/sda – OS disk
  • /dev/sdb – temporary disk
  • /dev/sdc – 1st data disk
  • /dev/sdr – 16th data disk
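The naming pattern in the list above can be sketched as a small shell function. This is only an illustration: the function name data_disk_device is hypothetical, and it assumes that data disks are enumerated in the order shown, starting at /dev/sdc, which may not hold if disks are attached and detached in a different order.

```shell
# Map the Nth data disk (1-16) to the device name implied by the list above.
# Assumption: disks are enumerated in order starting at /dev/sdc.
data_disk_device() {
    n=$1
    # Reject anything that is not a plain positive integer.
    case "$n" in
        ''|*[!0-9]*) echo "disk number must be an integer" >&2; return 1 ;;
    esac
    if [ "$n" -lt 1 ] || [ "$n" -gt 16 ]; then
        echo "disk number must be between 1 and 16" >&2
        return 1
    fi
    # Device letters for data disks 1 through 16.
    letters="cdefghijklmnopqr"
    letter=$(printf '%s' "$letters" | cut -c "$n")
    printf '/dev/sd%s\n' "$letter"
}

# Example: data_disk_device 2   prints /dev/sdd
```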

Performance

The OS disk and all data disks are durable with the data persisted as VHDs in page blobs hosted by the Azure Storage Service. This is a shared service which has implemented scalability targets to ensure fair access to shared resources. The scalability targets are documented on MSDN.

The scalability target for a single VHD is:

  • Maximum size: 1TB
  • Maximum requests / second: 500
  • Maximum throughput: 60 MB / second

There are additional scalability targets for a storage account. These vary by region and by the level of redundancy for the storage account. For locally redundant storage in US regions, the scalability targets for a storage account are:

  • 20 Gbps ingress
  • 30 Gbps egress

These scalability targets impact the use of VHDs in Azure Virtual Machines, since they affect the scalability of storage on a single VM. These targets indicate that there are performance limits of a single VHD and there are also performance limits on a number of VHDs in a single storage account. If all disks are accessed at their maximum throughput a single storage account could support something like 30-40 VHDs. It is crucial that appropriate testing be done on any data-intensive application that makes heavy use of attached disks to identify any performance problems so that remedial action can be taken.
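The estimate of the number of VHDs a storage account can support follows from simple arithmetic on the targets quoted above. The following sketch uses a hypothetical function, vhds_per_account, and assumes decimal units (1 Gbps = 1000 Mbps) and that every disk runs at its full 60 MB/s target. Real workloads rarely behave that uniformly, so the result is a rough upper bound rather than a guarantee.

```shell
# Estimate how many VHDs running at full throughput fit within an
# account-level bandwidth target.
#   $1 = account target in Gbps
#   $2 = per-disk throughput in MB/s (defaults to the 60 MB/s target)
vhds_per_account() {
    account_gbps=$1
    disk_mbps=${2:-60}
    # Gbps -> Mbps (x1000) -> MB/s (/8), then divide by per-disk MB/s.
    echo $(( account_gbps * 1000 / 8 / disk_mbps ))
}

# Example: vhds_per_account 20   prints 41 (ingress-bound)
# Example: vhds_per_account 30   prints 62 (egress-bound)
```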

The way to increase storage performance with Azure VMs is to increase the number of disks. For example, two 100GB disks have double the throughput of a single 200GB disk. As the number of VMs increases there arises the possibility of hitting the scalability targets for a storage account. This can happen with more than two VMs in a data-intensive application if each VM has 16 data disks working at full throughput. The solution in that event is to use additional storage accounts.
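The scaling described here can be illustrated with a little shell arithmetic. The function names array_iops and array_mbps are hypothetical, and the sketch assumes ideal linear scaling from the per-disk targets of 500 requests per second and 60 MB/s, which an actual workload may not achieve.

```shell
# Per-disk scalability targets quoted earlier in this post.
DISK_IOPS=500
DISK_MBPS=60

# Ideal aggregate requests/second for $1 disks.
array_iops() { echo $(( $1 * DISK_IOPS )); }

# Ideal aggregate throughput in MB/s for $1 disks.
array_mbps() { echo $(( $1 * DISK_MBPS )); }

# Example: array_mbps 2    prints 120 (two disks, double one disk)
# Example: array_iops 16   prints 8000 (a fully loaded 16-disk VM)
```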

Given these various targets it would appear that the optimal solution would be to store each VHD in its own storage account. However, this is unnecessary since the scalability target for a single storage account comfortably exceeds the performance requirements of a fully-loaded VM. Furthermore, doing so adds significantly to the administration overhead and would lead to an early adventure with another scalability target – the hard limit of 50 storage accounts per subscription.

Disk Caching

Azure supports various caching options for data disks:

  • None
  • Read only (write through)
  • Read/Write (write back)

By default, OS disks have read/write caching configured on creation while data disks have no cache configured on creation. The caching options can be specified either when the disk is initially attached to the VM or later. Note that only four disks on each VM can be configured for caching, which limits the utility of caching in larger VMs. It is important to test applications to identify whether caching data disks provides any performance improvements for the application workload.

Trim Support

The VHDs used by OS disks and data disks are persisted as page blobs in Azure Storage. Page blobs are implemented as sparse storage which means that only pages that have actually been written to are stored and billed for. For example, a 1TB page blob which has never been written to occupies no space in Azure Storage and consequently incurs no charges. Azure Storage supports the ability to clear pages no longer needed which means that they are no longer billed.

When a file is deleted in a normal file system the appropriate entries in the file system metadata are removed but the underlying storage is not cleared. With a sparse storage system such as a VHD backed by an Azure page blob this means that when a file is deleted the pages allocated to the file remain written and continue to incur charges.

SSDs are subject to a different phenomenon whereby the memory occupied by a deleted file must be cleared before it can be written to. SSDs support a TRIM capability which file systems can use to clear the memory occupied by deleted files.

TRIM has been implemented in Azure Virtual Machines so that when a file is deleted the space it occupied can be deleted from the underlying page blob. Since this has some performance implications this is a manual process that can be scheduled at a convenient time. TRIM support is a cost optimization not a performance optimization.

Ubuntu 14.04 images in the Azure Gallery support TRIM, which is referred to as discard when listed as a file system option. For example, the following command performs a TRIM operation on the Azure disk mounted on /mnt/data:

# sudo fstrim /mnt/data

TRIM is not provided on CentOS images in the Azure Gallery.
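Since trimming is a manual operation, one option is to wrap it in a small script that can be run from a scheduler such as cron at a quiet time. The following is a minimal sketch, not a tested production script: the function name trim_all and the FSTRIM variable are hypothetical, and it assumes fstrim is on the PATH and that the script runs as root.

```shell
# FSTRIM names the command to run; set FSTRIM=echo for a dry run.
FSTRIM=${FSTRIM:-fstrim}

# Run fstrim on every mount point given as an argument.
trim_all() {
    for mount_point in "$@"; do
        # -v makes fstrim report how many bytes were discarded.
        "$FSTRIM" -v "$mount_point" || echo "fstrim failed for $mount_point" >&2
    done
}

# Example (requires root): trim_all /mnt/data
```

A failure on one mount point is reported on standard error and does not stop the remaining mount points from being trimmed.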

Configuring a Data Disk

Data disks can be attached to an Azure VM in various ways. However it is attached, a data disk is exposed to the VM as a raw SCSI device that must be configured prior to use. As with any Linux system, this entails the following tasks:

  • Create partitions
  • Install file systems on the partitions
  • Mount the partitions into the file system

The following commands are used to perform these operations:

  • fdisk – manage and view the disk partitions
  • lsblk – view the partition and file system topology of disks
  • mkfs – put a file system onto a disk
  • mount – mount a file system
  • umount – unmount a file system

Partition a Disk

fdisk can be used to partition a disk as well as view information about all the disks on the VM.

The disk layout can be displayed using the following command:

# fdisk -l

The data disk located at /dev/sdc can be partitioned using the following command:

# fdisk -c -u /dev/sdc

The -c parameter turns off DOS-compatibility mode while the -u parameter causes partition sizes to be given in sectors instead of blocks. fdisk provides a wizard for which the following responses can be given to create a partition occupying an entire device:

  • n, p, 1, enter (default), enter (default), p, w.

This creates a new partition named /dev/sdc1 that occupies the whole of the /dev/sdc device.
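When several disks need the same layout, the wizard responses can be supplied on standard input rather than typed. The following sketch shows one way to do that. The function name fdisk_answers is hypothetical, and the approach assumes the fdisk prompts appear in exactly the order listed above, which can differ between fdisk versions, so it is worth trying on a scratch disk first.

```shell
# Print the responses listed above, one per line, for feeding to fdisk.
fdisk_answers() {
    # n = new partition, p = primary, 1 = partition number,
    # two empty lines accept the default first and last sectors,
    # p = print the partition table, w = write it and exit.
    printf 'n\np\n1\n\n\np\nw\n'
}

# Example (destructive, requires root):
#   fdisk_answers | fdisk -c -u /dev/sdc
```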

Make and Mount a File System

The mkfs command is used to put a file system on a partition. The file systems supported by the current kernel are listed in the /proc/filesystems file. In the Linux VMs provided in the Azure Gallery, Ubuntu supports ext2, ext3 and ext4 while CentOS supports ext4.

For example, the following command installs the ext4 file system on the /dev/sdc1 partition:

# mkfs -t ext4 -m 1 /dev/sdc1

The -m parameter reserves 1% of the disk for the super-user (down from the default of 5%).

The mount command is used to mount this file system into some mount point on the overall file system of the VM. This mount point is a directory. The following commands create a mount directory – /mnt/data – and then mount a partition containing an ext4 file system into it:

# mkdir /mnt/data
# mount -t ext4 /dev/sdc1 /mnt/data

The mounted file system can now be used like any other file system on the VM. However, it is not automatically remounted when the VM is restarted. This can be achieved by putting an entry in the /etc/fstab file, which specifies the file systems that are to be mounted automatically on reboot. The /etc/fstab entry contains essentially the same information as used in the mount command. The partition to be mounted can be identified in various ways, including its location (e.g., /dev/sdc1) and the UUID that uniquely identifies it. Note that the UUID is unique across VMs, so using it helps avoid name collisions when disks are moved from one VM to another.

The lsblk command can be used to list the device, partition and file system topology, as well as relevant metadata – including the uniquely identifying UUID. For example, the following command lists a tree structure of the file system topology:

# lsblk --fs

The data output by the lsblk command can be configured so that it can be used as the source of data for other commands. For example, the following command outputs only the UUID for the /dev/sdc1 partition:

# lsblk --noheadings --output UUID /dev/sdc1

The --output parameter indicates that only the UUID should be in the output and the --noheadings parameter indicates that there should be no header for the column, so that nothing but the UUID is in the output (i.e., it is convenient for use in scripts).

The /etc/fstab file contains one line for each file system to be mounted, with each line comprising the following (whitespace or tab delimited) entries:

  1. Physical identification of the file system (e.g., UUID, /dev/sdc1)
  2. Mount point for the file system (e.g., /mnt/data)
  3. Type of file system (e.g., ext4)
  4. File system options (e.g., noatime)
  5. Dump designator (set to 0)
  6. File system check indicator (2 – at boot time, check the file system after checking the root file system)
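The six fields can be picked apart with the shell's own word splitting, which uses the same space and tab delimiters as /etc/fstab. The following sketch uses a hypothetical function, describe_fstab_line, to label each field of a single entry. It treats a missing dump or check field as 0, which is the documented default.

```shell
# Label the six fields of one /etc/fstab entry passed as a single argument.
describe_fstab_line() {
    printf '%s\n' "$1" | {
        # read splits on spaces and tabs, like the fstab parser.
        read -r spec mount_point fs_type options dump pass
        printf 'device=%s\nmount=%s\ntype=%s\noptions=%s\ndump=%s\npass=%s\n' \
            "$spec" "$mount_point" "$fs_type" "$options" \
            "${dump:-0}" "${pass:-0}"
    }
}

# Example:
#   describe_fstab_line '/dev/sdc1 /mnt/data ext4 noatime 0 2'
```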

When the VM is booted the file systems listed in /etc/fstab are mounted automatically with the options, etc. specified in the file. The following is an example of an entry using the UUID to identify the partition:

UUID=602d265e-1918-4c29-b13b-7caecda395d8 /mnt/data ext4 defaults,nofail,noatime 0 0

In this example, an ext4 file system is mounted on /mnt/data, with the default mount options supplemented by the nofail and noatime options. nofail means that no error is reported if the physical device is not present at boot time, while noatime turns off the writing of the last read time of a file. Note that noatime implies nodiratime, an option often provided with noatime. The first 0 indicates that the (obsolete) dump program will not dump the file system, while the final 0 indicates that the file system should not be checked at boot time.
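An entry of this form can be generated rather than typed, which avoids transcription errors in the UUID. The following is a minimal sketch with a hypothetical function name, fstab_entry. It hard-codes the same options used in the example above and leaves it to the caller to append the result to /etc/fstab.

```shell
# Print an /etc/fstab entry for a partition identified by UUID.
#   $1 = UUID, $2 = mount point, $3 = file system type (default ext4)
fstab_entry() {
    uuid=$1
    mount_point=$2
    fs_type=${3:-ext4}
    printf 'UUID=%s %s %s defaults,nofail,noatime 0 0\n' \
        "$uuid" "$mount_point" "$fs_type"
}

# Example (requires root, appends to the live fstab):
#   fstab_entry "$(lsblk --noheadings --output UUID /dev/sdc1)" /mnt/data >> /etc/fstab
```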

The df command displays, in human-readable form, the available space on mounted file systems:

$ df -h

RAID Arrays of Multiple Disks

A single disk attached to a VM can be up to 1TB and provide 500 IOPS. If either more space or higher performance is needed then multiple disks must be attached to the VM. Depending on the instance size, between 2 and 16 disks can be attached to the VM. The disks can be treated either as just a bunch of disks (JBOD) or as a RAID device.
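A first estimate of how many disks a workload needs can be made by dividing the required request rate by the per-disk target and rounding up. The following sketch uses a hypothetical function, disks_needed, and assumes performance scales linearly with the number of disks. The result is a starting point for testing, not a sizing guarantee.

```shell
# Number of disks needed to reach a target request rate.
#   $1 = target requests/second
#   $2 = per-disk requests/second (defaults to the 500 target)
disks_needed() {
    target_iops=$1
    per_disk=${2:-500}
    # Integer ceiling division.
    echo $(( (target_iops + per_disk - 1) / per_disk ))
}

# Example: disks_needed 3000   prints 6
```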

A JBOD is managed by replicating the single-disk process for all the disks – /dev/sdc, /dev/sdd, etc. However, using a JBOD to improve application performance only works if the application is able to distribute load (evenly) across all the disks. Furthermore, even if the application can spread the data among all the disks the manner in which the data is accessed can prevent the application gaining the performance benefits of the multiple disks.

mdadm can be used to create a software RAID device from a set of raw devices or partitions. It supports the creation of the following types of RAID arrays:

  • RAID0 (striped)
  • RAID1 (mirroring)
  • RAID4 (striping with dedicated parity)
  • RAID5 (striping with distributed parity)
  • RAID6 (striping with multiple distributed parity blocks)

The various RAID levels other than RAID0 protect against data loss in the event of a failure of a single device. However, this is not important in Azure where the underlying storage system provides high availability for individual VHDs. Consequently, only RAID0 is needed for a disk array in Azure.

In planning the deployment of a data-intensive application to Azure Virtual Machines it is important to test the application to identify the optimal disk layout. If the application has not been developed specifically to be performant with a JBOD, it is likely that a RAID0 disk array provides better performance. Disk caching is turned off by default for data disks, but depending on the application the use of read or read-write caching may improve performance. However, only four disks attached to an Azure VM can have caching enabled which limits the utility of caching with disk arrays.

Configuring a Disk Array

The creation of a RAID0 disk array entails the following tasks:

  1. Install mdadm
  2. Create disk array
  3. Configure the disk array
  4. Create partitions
  5. Install file systems on the partitions
  6. Mount the partitions into the file system

Other than using the disk array name instead of a device name, steps 4 through 6 are the same as for a single disk.

mdadm is installed from the repository appropriate to the distribution.

CentOS:

# sudo yum install mdadm

Ubuntu:

# sudo apt-get install mdadm

The following command creates a RAID0 disk array named data on device /dev/md/data using two raw disk devices /dev/sdc and /dev/sdd:

# mdadm --create /dev/md/data --name=data --chunk=8 --level=0 --raid-devices=2 /dev/sdc /dev/sdd
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md/data started.

The default chunk size for a disk stripe is 512KB. The --chunk parameter in this example specifies that this disk array be created with 8 KiB chunks per disk.

Many examples specify the RAID0 device as /dev/md0 and do not provide a name. This can cause a name collision problem if the disk array is ever moved to a VM which already has a RAID device. Consequently, it is a good practice to take control of the RAID device name by specifying the --name parameter and using a device with the same name under /dev/md. When this is done mdadm automatically creates a link from /dev/md/name to /dev/mdN, where N is typically 127 but could be a sequentially lower number if the VM has more than one disk array. (See the comments by Doug Ledford on this page) Similar links are created for partitions created on the disk array. For example:

$ ls -l /dev/md*
brw-rw----. 1 root disk 9, 127 May 25 02:17 /dev/md127
/dev/md:
total 4
lrwxrwxrwx. 1 root root 8 May 25 02:17 data -> ../md127
-rw-------. 1 root root 59 May 25 02:17 md-device-map

The details of all the arrays on the VM can be viewed as follows:

# mdadm --detail --verbose --scan
ARRAY /dev/md/data level=raid0 num-devices=2 metadata=1.2 name=snowpack-u3:data UUID=2bd91d37:4bcc51cb:14a2913f:7d74dc0a
devices=/dev/sdc,/dev/sdd

In the example, the fully qualified RAID device name is provided as hostname:array-name, i.e., snowpack-u3:data. The UUID uniquely identifies the RAID0 disk array.
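The array UUID can be extracted from this output for use in scripts. The following sketch uses a hypothetical filter function, array_uuid, which reads the scan output on standard input and prints the UUID found on each ARRAY line. It assumes the UUID is written in the lowercase colon-separated form shown above.

```shell
# Read mdadm scan output on standard input and print each array UUID.
array_uuid() {
    # -n with the p flag prints only lines where the substitution matched,
    # so the devices= line is skipped.
    sed -n 's/.*UUID=\([0-9a-f:]*\).*/\1/p'
}

# Example (requires root):
#   mdadm --detail --verbose --scan | array_uuid
```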

When invoked for a specific device, additional detail is provided. For example:

# mdadm --detail /dev/md/data
/dev/md/data:
Version : 1.2
Creation Time : Sun May 25 19:56:35 2014
Raid Level : raid0
Array Size : 104857584 (100.00 GiB 107.37 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Sun May 25 19:56:35 2014
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Chunk Size : 8K
Name : snowpack-u3:data (local to host snowpack-u3)
UUID : 2bd91d37:4bcc51cb:14a2913f:7d74dc0a
Events : 0
Number Major Minor RaidDevice State
0 8 32 0 active sync /dev/sdc
1 8 48 1 active sync /dev/sdd

mdadm uses a config file, mdadm.conf, the location of which varies by distribution.

CentOS: /etc/mdadm.conf

Ubuntu: /etc/mdadm/mdadm.conf

Once a disk array has been created, mdadm can be used to create the configuration file as follows (for Ubuntu):

# mdadm --detail --verbose --scan > /etc/mdadm/mdadm.conf

This file is used by mdadm to control the assembly of the disk array when the system is rebooted or restarted.
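A script that has to work on both distributions can select the configuration file location before writing to it. The following is a minimal sketch using a hypothetical function, mdadm_conf_path, that covers only the two distributions discussed in this post. Reading the distribution name from /etc/os-release, as in the usage comment, is an assumption that holds only on images that provide that file.

```shell
# Print the mdadm.conf location for the named distribution.
mdadm_conf_path() {
    case "$1" in
        ubuntu) echo /etc/mdadm/mdadm.conf ;;
        centos) echo /etc/mdadm.conf ;;
        *) echo "unknown distribution: $1" >&2; return 1 ;;
    esac
}

# Example (requires root; assumes /etc/os-release defines ID):
#   . /etc/os-release
#   mdadm --detail --verbose --scan > "$(mdadm_conf_path "$ID")"
```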

A specified disk array can be stopped as follows:

# mdadm --stop /dev/md/data

Once the disk array has been started, it appears in the list displayed by fdisk, as follows:

# fdisk -l

Disk /dev/md127: 107.4 GB, 107374166016 bytes
2 heads, 4 sectors/track, 26214396 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 8192 bytes / 16384 bytes
Disk identifier: 0x00000000

Additional mdadm Configuration for Ubuntu

These instructions work fine on CentOS. However, some additional steps are needed with Ubuntu to ensure the disk array is assembled correctly on reboot. The mdadm.conf configuration file must be added to the initramfs configuration used during the boot process.

The update-initramfs command is used to update initramfs, as follows:

# update-initramfs -u
update-initramfs: Generating /boot/initrd.img-3.13.0-24-generic

The following command displays the mdadm.conf configuration contained in initramfs so it can be used to verify that the initramfs update is successful:

$ gunzip -c /boot/initrd.img-3.13.0-24-generic | cpio -i --quiet --to-stdout etc/mdadm/mdadm.conf
ARRAY /dev/md/data level=raid0 num-devices=2 metadata=1.2 name=snowpack-u3:data UUID=8be171a8:817ef70d:bbac034e:fe77b402
devices=/dev/sdc,/dev/sdd

This should match the entry in /etc/mdadm/mdadm.conf.

Partition the RAID Device

The remaining process – partitioning the RAID0 device, creating a file system, and mounting the file system – proceeds exactly as for a single disk.

fdisk is used to partition a disk array. For example, create a partition on the /dev/md/data disk array as follows:

# fdisk -c -u /dev/md/data
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0xc0586acc.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won’t be recoverable.
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First sector (2048-209715167, default 2048):
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-209715167, default 209715167):
Using default value 209715167
Command (m for help): p
Disk /dev/md/data: 107.4 GB, 107374166016 bytes
2 heads, 4 sectors/track, 26214396 cylinders, total 209715168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 8192 bytes / 16384 bytes
Disk identifier: 0xc0586acc
Device Boot Start End Blocks Id System
/dev/md/data1 2048 209715167 104856560 83 Linux
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.

Create a File System

mkfs is used to create a file system on a device or partition. For example, the following creates an ext4 file system on the partition /dev/md127p1:

# mkfs -t ext4 /dev/md127p1
mke2fs 1.42.9 (4-Feb-2014)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=2 blocks, Stripe width=4 blocks
6553600 inodes, 26214140 blocks
1310707 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
800 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

There used to be guidance that the file systems on RAID0 disk arrays be created with parameters indicating the size of the chunks. This is no longer necessary with ext4, as indicated by the stride and stripe values in the output – since these are precisely the values that would otherwise have been needed.
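The stride and stripe width reported by mkfs follow directly from the array geometry: the stride is the chunk size divided by the file system block size, and the stripe width is the stride multiplied by the number of data disks. The following sketch uses hypothetical function names to reproduce the values in the output above for 8 KiB chunks, 4 KiB blocks and two disks.

```shell
# Stride in file system blocks.
#   $1 = RAID chunk size in KiB, $2 = file system block size in KiB
ext4_stride() { echo $(( $1 / $2 )); }

# Stripe width in file system blocks for a RAID0 array.
#   $1 = chunk size in KiB, $2 = block size in KiB, $3 = number of disks
ext4_stripe_width() { echo $(( $1 / $2 * $3 )); }

# Example: ext4_stride 8 4           prints 2
# Example: ext4_stripe_width 8 4 2   prints 4
# If they ever had to be set by hand, the values would be passed to mkfs
# as extended options, e.g. -E stride=2,stripe_width=4
```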

Mount a File System

The mount command is used to mount a file system in a specified directory. For example, the following creates a mount point directory, /mnt/data, and then mounts an ext4 file system hosted on /dev/md127p1 into the directory:

# mkdir /mnt/data
# mount -t ext4 /dev/md127p1 /mnt/data

Confirm the file system is mounted, as follows:

# sudo df -l -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 29G 1.1G 27G 4% /
none 4.0K 0 4.0K 0% /sys/fs/cgroup
udev 826M 12K 826M 1% /dev
tmpfs 168M 396K 168M 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 840M 0 840M 0% /run/shm
none 100M 0 100M 0% /run/user
/dev/sdb1 69G 52M 66G 1% /mnt
/dev/md127p1 99G 60M 94G 1% /mnt/data

A /etc/fstab entry must be made to ensure that the file system is mounted when the VM reboots. The entry is the same as for a single disk, with the appropriate device information. The UUID for the RAID partition can be found using lsblk, as in:

# lsblk --output UUID /dev/md127p1
UUID
fb273c6e-93be-4d15-bc11-973f6d20148b

The following is an example of a /etc/fstab entry for the partition on the RAID disk array:

UUID=fb273c6e-93be-4d15-bc11-973f6d20148b /mnt/data ext4 defaults,nofail,noatime 0 0

Note that the entry must be on a single line.

This completes the process of:

  • creating an mdadm RAID0 disk array on two raw disk devices
  • creating a file system on the RAID0 disk array
  • mounting the file system

Azure File Service

Microsoft has released a preview of the Azure File Service, which provides a managed file system that is exposed securely through a service endpoint as an SMB 2.1 share. The Azure File Service is built on the same technology as the other features of the Azure Storage Service. The logical file system can be mounted through the SMB share in an Azure VM and accessed just like any other file system. Furthermore, the SMB share can be mounted simultaneously into different VMs which allows for the sharing of files among different VMs. The Azure File Service therefore provides an alternative way to access durable storage from inside an Azure VM.

The Azure File Service has the following scalability targets:

  • Maximum size of a file share: 5TB
  • Maximum size of a single file: 1TB
  • Throughput (8KB operations): 1000 IOPS
  • Throughput: 60 MB/s per share

The Azure File Service can be used in the Ubuntu images and CentOS 7 images in the Azure Gallery. However, during the preview, the Azure File Service must be managed through PowerShell cmdlets. The cifs-utils package provides an SMB client which can be installed as follows:

Ubuntu:

# apt-get install cifs-utils

CentOS 7:

# yum install cifs-utils

The standard mount command can be used to mount the SMB share into the VM file system. For example, with ACCOUNT_NAME and ACCESS_KEY being the Azure Storage account name and access key the following command mounts a share named SHARE into the specified directory:

# mount -t cifs //ACCOUNT_NAME.file.core.windows.net/SHARE /mnt/DIRECTORY -o vers=2.1,username=ACCOUNT_NAME,password=ACCESS_KEY,dir_mode=0777,file_mode=0777

Note that, similarly to the other file systems, an /etc/fstab entry is needed to ensure that the file system is mounted when the system reboots. For example, the following would be the equivalent (single line) /etc/fstab entry for the above example:

//ACCOUNT_NAME.file.core.windows.net/SHARE /mnt/DIRECTORY cifs vers=2.1,dir_mode=0777,file_mode=0777,username=ACCOUNT_NAME,password=ACCESS_KEY
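Since /etc/fstab is normally readable by every user on the VM, it may be preferable to keep the access key out of it. The cifs mount helper accepts a credentials option naming a file that holds the username and password. The following sketch writes such a file with owner-only permissions. The function name write_cifs_credentials and the path /etc/azure-files.cred are hypothetical, chosen only for illustration.

```shell
# Write a cifs credentials file readable only by its owner.
#   $1 = file path, $2 = storage account name, $3 = access key
write_cifs_credentials() {
    (
        # The restrictive umask applies only inside this subshell.
        umask 077
        printf 'username=%s\npassword=%s\n' "$2" "$3" > "$1"
    )
    chmod 600 "$1"
}

# Example (requires root):
#   write_cifs_credentials /etc/azure-files.cred ACCOUNT_NAME ACCESS_KEY
# The fstab options would then use credentials=/etc/azure-files.cred
# in place of the username= and password= options.
```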

An SMB share named my-share hosted in a storage account named ACCOUNT_NAME is displayed in a file system listing as follows:

# sudo df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 29G 1.2G 27G 5% /
none 4.0K 0 4.0K 0% /sys/fs/cgroup
udev 826M 12K 826M 1% /dev
tmpfs 168M 408K 168M 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 840M 0 840M 0% /run/shm
none 100M 0 100M 0% /run/user
/dev/sdb1 69G 52M 66G 1% /mnt
/dev/md127p1 99G 60M 94G 1% /mnt/data
//ACCOUNT_NAME.file.core.windows.net/my-share 5.0T 3.7M 5.0T 1% /mnt/azure-files

Files

The following files and directories may be of use when working with disks, disk arrays and partitions:

  • /etc/fstab – provides file system configuration
  • /proc/filesystems – lists file systems that can be created on the VM
  • /proc/mdstat – lists raid arrays
  • /proc/mounts – lists mounted file systems
  • /dev/disk/by-uuid – lists disks by UUID
  • /dev/disk/by-label – lists disks by label
  • /dev/disk/by-id – lists disks by IDs

Summary

Azure Virtual Machines supports the ability to attach up to 16 1TB disks to an Azure VM. These disks are VHDs backed by page blobs in the Azure Storage Service. This makes them durable with an existence that transcends the existence of VMs to which they are attached. When multiple disks are attached to a VM they may be treated as a JBOD or combined into a RAID0 disk array. By aggregating the performance of individual disks the latter can provide significantly improved storage performance for data-intensive applications.

The Azure File Service is a preview of a managed service which provides a logical file system exposed through an SMB 2.1 endpoint. This file system can be mounted into Ubuntu (and Windows Server) VMs providing an alternative way to access persistent files. Furthermore, the Azure File Service supports the ability to mount the same logical file system into multiple VMs simultaneously making it easy to share files among several VMs.

In any data-intensive application running on Azure it is crucial that a representative application workload be tested to identify the most appropriate disk topology for the application.  This may involve using a larger number of smaller disks or using several Azure storage accounts.

About Neil Mackenzie

Cloud Solutions Architect. Microsoft

8 Responses to Disk Storage on Linux VMs in Azure

  1. Axl says:

    Good post. Thanks. One error though… The entry to go into fstab should have a double leading forward slash (slant) preceding the source.

    ACCOUNT_NAME.file.core.windows.net/SHARE /mnt/DIRECTORY cifs …

    Should be:

    //ACCOUNT_NAME.file.core.windows.net/SHARE /mnt/DIRECTORY cifs …

  2. DC says:

    How do I add a file service for my storage account in Azure? I can’t seem to create one with the steps/commands you provided. Thanks. I’m adding the storage account into an existing Ubuntu VM instance.

  3. Sascha Gottfried says:

    Very good post. Will add this post to my Azure documentation.

  4. Rob Donovan says:

    Hi,

    Nice article about Azure storage, thanks for that.

    However, I’m a bit confused about what I’m being billed, and regarding TRIM.

    I have a Ubuntu (3.13.0-39-generic) VM, and my vhds is 29.2GB, but I’m only being billed for 23.9GB (from MS billing and I also used this program http://fabriccontroller.net/blog/posts/calculating-how-much-space-a-windows-azure-disk-is-really-using/).

    However, I cant figure out why 23.9GB is used/billed.

    I think the thing that is causing ‘problems’ might be that I mount my own internal file systems with loop devices, but they are to file containers that are sparse file types, so I would expect the data taken up by deleted blocks to be freed (as a df on the root system shows).

    I have ‘discard’ set in fstab on both the root and my loop fs.

    I’ve manually run fstrim on all my loop fs, and it said it reclaimed some space, but it didnt effect the 23.9GB figure.

    I’ve tried to force the root system to clear with fstrim, but I get an error using it on the root, “fstrim: /mnt: FITRIM ioctl failed: Operation not supported”

    Not sure, if I can release this ‘extra’ space used or not, and wondered if you had any thoughts.

    My df, which shows that the root is only 6.4GB used (Because the loop fs are sparse), so I would expect MS to bill for 6.4GB:

    Filesystem Size Used Avail Use% Mounted on
    /dev/sda1 29G 6.4G 22G 23% /
    none 4.0K 0 4.0K 0% /sys/fs/cgroup
    udev 331M 8.0K 331M 1% /dev
    tmpfs 68M 388K 67M 1% /run
    none 5.0M 0 5.0M 0% /run/lock
    none 336M 0 336M 0% /run/shm
    none 100M 0 100M 0% /run/user
    /dev/loop0 969M 140M 763M 16% /home/system/www/site1
    /dev/loop1 969M 4.5M 898M 1% /home/system/mysql/site1
    /dev/loop2 969M 167M 737M 19% /home/system/www/site2
    /dev/loop3 969M 3.7M 899M 1% /home/system/mysql/site2
    /dev/loop4 969M 146M 757M 17% /home/system/www/site3
    /dev/loop5 969M 45M 858M 5% /home/system/mysql/site3
    /dev/loop6 969M 184M 719M 21% /home/system/www/site4
    /dev/loop7 969M 53M 850M 6% /home/system/mysql/site5
    /dev/sdb1 20G 2.1G 17G 12% /mnt

    The actual sparse files for the loop fs:
    -rw------- 1 root root 1048576000 Nov 14 09:57 /home/system/filesys/site1_mysql.img
    -rw------- 1 root root 1048576000 Nov 14 10:05 /home/system/filesys/site1_www.img
    -rw------- 1 root root 1048576000 Nov 14 10:02 /home/system/filesys/site2_mysql.img
    -rw------- 1 root root 1048576000 Nov 14 10:05 /home/system/filesys/site2_www.img
    -rw------- 1 root root 1048576000 Nov 14 09:06 /home/system/filesys/site3_mysql.img
    -rw------- 1 root root 1048576000 Nov 14 10:05 /home/system/filesys/site3_www.img
    -rw------- 1 root root 1048576000 Nov 14 10:05 /home/system/filesys/site4_mysql.img
    -rw------- 1 root root 1048576000 Nov 14 10:05 /home/system/filesys/site4_www.img

    fstab:
    # CLOUD_IMG: This file was created/modified by the Cloud Image build process
    UUID=xxxxxxxxx(removed real id) / ext4 defaults,discard 0 0
    /home/system/filesys/site1_www.img /home/system/www/site1 ext4 defaults,discard,loop 0 0
    /home/system/filesys/site1_mysql.img /home/system/mysql/site1 ext4 defaults,discard,loop 0 0
    /home/system/filesys/site2_www.img /home/system/www/site2 ext4 defaults,discard,loop 0 0
    /home/system/filesys/site2_mysql.img /home/system/mysql/site2 ext4 defaults,discard,loop 0 0
    /home/system/filesys/site3_www.img /home/system/www/site3 ext4 defaults,discard,loop 0 0
    /home/system/filesys/site3_mysql.img /home/system/mysql/site3 ext4 defaults,discard,loop 0 0
    /home/system/filesys/site4_www.img /home/system/www/site4 ext4 defaults,discard,loop 0 0
    /home/system/filesys/site4_mysql.img /home/system/mysql/site4 ext4 defaults,discard,loop 0 0
    /dev/sdb1 /mnt auto defaults,nobootwait,comment=cloudconfig 0 2
    /mnt/swapfile none swap sw 0 0

    Thanks,

    Rob.

    • It could be that TRIM is not supported on the Ubuntu version you are using. Sandrino Di Mattia’s utility is reporting that you are using 23.9GB. Without TRIM support the data used in Azure Storage by a file is not cleared (and hence remains billable) when the file is deleted. If you completely fill a disk and then delete all the files you will still be billed since you are still occupying space in the Storage Service even though you have no files. TRIM clears the space in the Storage Service so that you are no longer billed for it. However, it is also worth remembering that it costs $1.25 a month for a 25GB page blob with LRS so there may be more effective ways to save money.

  5. Ashwin says:

    This is a great article, thanks. I was wondering if it is possible in Azure to create a VHD and have several in Azure access it. In other words I have 4 linux instances and I want each one to access the same drive so I can share data across each one of them. Is this possible?
