# Protected Virtual Machine Firmware

In the context of the [Android Virtualization Framework][AVF], a hypervisor
(_e.g._ [pKVM]) enforces full memory isolation between its virtual machines
(VMs) and the host. As a result, the host is only allowed to access memory that
has been explicitly shared back by a VM. Such _protected VMs_ (“pVMs”) are
therefore able to manipulate secrets without being at risk of an attacker
stealing them by compromising the Android host.

As pVMs are started dynamically by a _virtual machine manager_ (“VMM”) running
as a host process and as pVMs must not trust the host (see [_Why
AVF?_][why-avf]), the virtual machine configured by the VMM can't be trusted
either. Furthermore, even though the isolation mentioned above allows pVMs to
protect their secrets from the host, it does not help with provisioning them
during boot. In particular, the threat model prohibits the host from ever
having access to those secrets, preventing the VMM from passing them to the pVM.

To address these concerns, the hypervisor securely loads the pVM firmware
(“pvmfw”) into the pVM from a protected memory region (which prevents the host
or any pVM from tampering with it) and sets it as the entry point of the virtual
machine. As a result, pvmfw becomes the very first code that gets executed in
the pVM, allowing it to validate the environment and abort the boot sequence if
necessary. This process takes place whenever the VMM places a VM in protected
mode and can’t be prevented by the host.

Given the threat model, pvmfw is not allowed to trust the devices or device
layout provided by the virtual platform it is running on, as those are
configured by the VMM. Instead, it performs all the necessary checks to ensure
that the pVM was set up as expected. For functional purposes, the interface with
the hypervisor, although trusted, is also validated.

Once it has been determined that the platform can be trusted, pvmfw derives
unique secrets for the guest through the [_DICE Chain_][android-dice] (see
[Open Profile for DICE][open-dice]) that can be used to prove the identity of
the pVM to local and remote actors. If any operation or check fails, or in case
of a missing prerequisite, pvmfw aborts the boot process of the pVM,
effectively preventing non-compliant pVMs and/or guests from running.
Otherwise, it hands over the pVM to the guest kernel by jumping to its first
instruction, similarly to a bootloader.

pvmfw currently only supports AArch64.

[AVF]: https://source.android.com/docs/core/virtualization
[why-avf]: https://source.android.com/docs/core/virtualization/whyavf
[android-dice]: https://pigweed.googlesource.com/open-dice/+/refs/heads/main/docs/android.md
[pKVM]: https://source.android.com/docs/core/virtualization/architecture#hypervisor
[open-dice]: https://pigweed.googlesource.com/open-dice/+/refs/heads/main/docs/specification.md

## Integration

### pvmfw Loading

When running pKVM, the physical memory from which the hypervisor loads pvmfw
into guest address space is not initially populated by the hypervisor itself.
Instead, the hypervisor receives a pre-loaded memory region from a trusted pvmfw
loader and only then becomes responsible for protecting it. As a result, the
hypervisor is kept generic (beyond AVF) and small, as it is neither expected nor
required to know how to interpret or obtain the content of that region.

#### Android Bootloader (ABL) Support

Starting in Android T, the `PRODUCT_BUILD_PVMFW_IMAGE` build variable controls
the generation of `pvmfw.img`, a new [ABL partition][ABL-part] containing the
pvmfw binary (sometimes called "`pvmfw.bin`") and following the internal format
of the [`boot`][boot-img] partition, intended to be verified and loaded by ABL
on AVF-compatible devices.
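
Because `pvmfw.img` reuses the boot image layout, locating `pvmfw.bin` only
requires the `kernel_size` field of its header. The Rust sketch below is an
illustration, not ABL code, and rests on stated assumptions: the header is
little-endian and fits in its first page (so the payload starts at `page_size`),
the page size is read from offset 36 for header versions 0 to 2 and fixed to
4096 bytes for versions 3 and 4, and no verification is performed (ABL must
still verify the partition first). The function names are hypothetical.

```rust
/// Boot image magic, shared by all boot image header versions.
const BOOT_MAGIC: &[u8; 8] = b"ANDROID!";
/// From header version 3 onwards, the page size is fixed to 4096 bytes.
const V3_PAGE_SIZE: usize = 4096;

/// Reads a little-endian u32 at `off`, or `None` if out of bounds.
fn read_u32_le(buf: &[u8], off: usize) -> Option<u32> {
    let bytes = buf.get(off..off.checked_add(4)?)?;
    Some(u32::from_le_bytes(bytes.try_into().ok()?))
}

/// Returns the `pvmfw.bin` payload stored in the "kernel" slot of a
/// boot-image-formatted `pvmfw.img` (hypothetical helper, no verification).
fn pvmfw_bin_from_boot_image(img: &[u8]) -> Result<&[u8], &'static str> {
    if img.get(..8) != Some(&BOOT_MAGIC[..]) {
        return Err("bad boot image magic");
    }
    // kernel_size (offset 8) and header_version (offset 40) are common to v0..v4.
    let kernel_size = read_u32_le(img, 8).ok_or("truncated header")? as usize;
    let header_version = read_u32_le(img, 40).ok_or("truncated header")?;
    // v0..v2 store page_size at offset 36; v3 and v4 fix it to 4096 bytes.
    let page_size = if header_version < 3 {
        read_u32_le(img, 36).ok_or("truncated header")? as usize
    } else {
        V3_PAGE_SIZE
    };
    if page_size == 0 {
        return Err("invalid page size");
    }
    // Assumes the header fits in its first page, so the payload follows it.
    let end = page_size.checked_add(kernel_size).ok_or("size overflow")?;
    img.get(page_size..end).ok_or("kernel_size exceeds the image")
}

fn main() {
    // Synthetic v3 image: one 4096-byte header page followed by a 16-byte payload.
    let mut img = vec![0u8; V3_PAGE_SIZE];
    img[..8].copy_from_slice(BOOT_MAGIC);
    img[8..12].copy_from_slice(&16u32.to_le_bytes()); // kernel_size
    img[40..44].copy_from_slice(&3u32.to_le_bytes()); // header_version
    img.extend_from_slice(&[0xaa; 16]);
    match pvmfw_bin_from_boot_image(&img) {
        Ok(bin) => println!("pvmfw.bin: {} bytes", bin.len()),
        Err(e) => eprintln!("error: {e}"),
    }
}
```

Keeping every access to bounds-checked slices means a truncated or malformed
partition yields an error instead of an out-of-bounds read.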

Once ABL has verified the `pvmfw.img` chained static partition, it can use the
contained [`boot.img` header][boot-img] to obtain the size of the `pvmfw.bin`
image (recorded in the `kernel_size` field), as it already does for the kernel
itself. In accordance with the header format, the `kernel_size` bytes of the
partition following the header are the `pvmfw.bin` image.

Note that when it gets executed in the context of a pVM, `pvmfw` expects to have
been loaded at a 4KiB-aligned intermediate physical address (IPA), so if ABL
loads the `pvmfw.bin` image without respecting this alignment, it is the
responsibility of the hypervisor to either reject the image or copy it into
guest address space with the right alignment.

To support pKVM, ABL is expected to describe the region using a reserved memory
device tree node where both address and size have been properly aligned to the
page size used by the hypervisor. This single region must include both the pvmfw
binary image and its configuration data (see below). For example, the following
node describes a region of size `0x40000` at address `0x80000000`:

```
reserved-memory {
    ...
    pkvm_guest_firmware {
        compatible = "linux,pkvm-guest-firmware-memory";
        reg = <0x0 0x80000000 0x40000>;
        no-map;
    };
};
```

[ABL-part]: https://source.android.com/docs/core/architecture/bootloader/partitions
[boot-img]: https://source.android.com/docs/core/architecture/bootloader/boot-image-header

### Configuration Data

As part of the process of loading pvmfw, the loader (typically the Android
Bootloader, "ABL") is expected to pass device-specific pvmfw configuration data
by appending it to the pvmfw binary and including it in the region passed to the
hypervisor. As a result, the hypervisor gives this data the same protection as
pvmfw itself and transparently loads it into guest memory, making it available
to pvmfw at runtime.
This keeps pvmfw device-agnostic, simplifying its adoption and distribution as a
centralized signed binary, while still supporting device-specific details.

The configuration data is read by pvmfw at the next 4KiB boundary after the end
of its loaded binary. Even though pvmfw is position-independent, it is expected
to have been loaded at a 4KiB boundary as well. As a result, the location of the
configuration data is implicitly passed to pvmfw: its offset from the start of
the binary is known at build time.

#### Configuration Data Format

The configuration data is described using the following [header]:

```
+===============================+
|           pvmfw.bin           |
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
|  (Padding to 4KiB alignment)  |
+===============================+ <-- HEAD
|     Magic (= 0x666d7670)      |
+-------------------------------+
|            Version            |
+-------------------------------+
|  Total Size = (TAIL - HEAD)   |
+-------------------------------+
|             Flags             |
+-------------------------------+
|           [Entry 0]           |
|  offset = (FIRST - HEAD)      |
|  size = (FIRST_END - FIRST)   |
+-------------------------------+
|           [Entry 1]           |
|  offset = (SECOND - HEAD)     |
|  size = (SECOND_END - SECOND) |
+-------------------------------+
|           [Entry 2]           | <-- Entry 2 is present since version 1.1
|  offset = (THIRD - HEAD)      |
|  size = (THIRD_END - THIRD)   |
+-------------------------------+
|           [Entry 3]           | <-- Entry 3 is present since version 1.2
|  offset = (FOURTH - HEAD)     |
|  size = (FOURTH_END - FOURTH) |
+-------------------------------+
|              ...              |
+-------------------------------+
|           [Entry n]           |
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
| (Padding to 8-byte alignment) |
+===============================+ <-- FIRST
|   {First blob: DICE chain}    |
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ <-- FIRST_END
| (Padding to 8-byte alignment) |
+===============================+ <-- SECOND
|       {Second blob: DP}       |
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ <-- SECOND_END
| (Padding to 8-byte alignment) |
+===============================+ <-- THIRD
|     {Third blob: VM DTBO}     |
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ <-- THIRD_END
| (Padding to 8-byte alignment) |
+===============================+ <-- FOURTH
| {Fourth blob: VM reference DT}|
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ <-- FOURTH_END
| (Padding to 8-byte alignment) |
+===============================+
|              ...              |
+===============================+ <-- TAIL
```

The version number, written as "`major.minor`", is encoded as follows:

```
((major << 16) | (minor & 0xffff))
```

It defines the format of the header (which may change between major versions),
its size and, in particular, the expected number of appended blobs. Each blob is
referred to by its position in the entry array and may be mandatory or optional
(as defined by this specification); missing entries are denoted by a zero size.
It is therefore not allowed to trim missing optional entries from the end of the
array. The header uses the endianness of the virtual machine.

The header format itself is agnostic of the internal format of the individual
blobs it refers to. In version 1.0, it describes two blobs:

- entry 0 must point to a valid DICE chain handover (see below)
- entry 1 may point to a [DTBO] to be applied to the pVM device tree. See
  [debug policy][debug_policy] for an example.

In version 1.1, a third blob is added.

- entry 2 may point to the VM DTBO, a [DTBO] used for
  [device assignment][device_assignment]. pvmfw will provision assigned devices
  with the VM DTBO.

In version 1.2, a fourth blob is added.

- entry 3, if present, contains the VM reference DT. This defines properties
  that may be included in the device tree passed to a protected VM. pvmfw
  validates that if any of these properties is included in the VM's device
  tree, the property value exactly matches what is in the VM reference DT.

  The bootloader should ensure that the same properties, with the same values,
  are added under the "/avf/reference" node in the host Android device tree.

  This provides a mechanism to allow configuration information to be securely
  passed to the VM via the host. pvmfw does not interpret the content of the VM
  reference DT, nor does it apply it to the VM's device tree; it only ensures
  that, if matching properties are present in the VM device tree, they contain
  the correct values.

  Use-cases of the VM reference DT include:

  - Passing the [public key of the Secretkeeper][secretkeeper_key] HAL
    implementation to each VM.

  - Passing the [vendor hashtree digest][vendor_hashtree_digest] to run
    Microdroid with a verified vendor image.
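
As an illustration of the layout above, the following Rust sketch computes where
the configuration data starts and splits it into its blobs. It is not pvmfw's
implementation (see `src/config.rs` for that) and rests on assumptions not
stated in this document: the header fields and each entry's `offset` and `size`
are 32-bit little-endian values (AArch64 guests being little-endian), and only
versions 1.0 to 1.2 are recognized. The 8-byte alignment check and the
rejection of an empty entry 0 are choices of this sketch; all names are
hypothetical.

```rust
/// Configuration header magic; its little-endian bytes spell "pvmf".
const MAGIC: u32 = 0x666d_7670;
/// Assumed header layout: magic, version, total size, flags (4 x u32).
const HEADER_SIZE: usize = 16;
/// Assumed entry layout: offset and size (2 x u32), both relative to HEAD.
const ENTRY_SIZE: usize = 8;
/// pvmfw.bin is padded so that HEAD starts on a 4KiB boundary.
const PAGE_SIZE: usize = 4096;

/// Offset of HEAD from the start of a pvmfw binary of `bin_len` bytes.
fn config_offset(bin_len: usize) -> usize {
    (bin_len + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE
}

/// Encodes "major.minor" as ((major << 16) | (minor & 0xffff)).
fn encode_version(major: u32, minor: u32) -> u32 {
    (major << 16) | (minor & 0xffff)
}

/// Number of entries expected for a given version (1.0 to 1.2 only).
fn entry_count(version: u32) -> Option<usize> {
    match (version >> 16, version & 0xffff) {
        (1, 0) => Some(2),
        (1, 1) => Some(3),
        (1, 2) => Some(4),
        _ => None, // other versions are not described by this sketch
    }
}

fn read_u32(buf: &[u8], off: usize) -> Result<u32, &'static str> {
    let b = buf.get(off..off + 4).ok_or("truncated configuration data")?;
    Ok(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Splits the configuration data (starting at HEAD) into its blobs; `None`
/// marks a missing entry (zero size). Flags (offset 12) are ignored here.
fn parse_config(cfg: &[u8]) -> Result<Vec<Option<&[u8]>>, &'static str> {
    if read_u32(cfg, 0)? != MAGIC {
        return Err("bad magic");
    }
    let version = read_u32(cfg, 4)?;
    let total_size = read_u32(cfg, 8)? as usize;
    let cfg = cfg.get(..total_size).ok_or("total size exceeds buffer")?;
    let count = entry_count(version).ok_or("unsupported version")?;
    let header_end = HEADER_SIZE + count * ENTRY_SIZE;
    let mut blobs = Vec::with_capacity(count);
    for i in 0..count {
        let at = HEADER_SIZE + i * ENTRY_SIZE;
        let offset = read_u32(cfg, at)? as usize;
        let size = read_u32(cfg, at + 4)? as usize;
        if size == 0 {
            blobs.push(None);
            continue;
        }
        if offset < header_end || offset % 8 != 0 {
            return Err("misplaced blob");
        }
        let end = offset.checked_add(size).ok_or("blob size overflow")?;
        blobs.push(Some(cfg.get(offset..end).ok_or("blob out of bounds")?));
    }
    if blobs[0].is_none() {
        return Err("entry 0 (DICE chain) is mandatory");
    }
    Ok(blobs)
}

fn main() {
    // Hypothetical v1.0 data: 32 bytes of header and entries, then a 4-byte blob.
    let mut cfg = Vec::new();
    for word in [MAGIC, encode_version(1, 0), 40, 0, 32, 4, 0, 0] {
        cfg.extend_from_slice(&word.to_le_bytes());
    }
    cfg.extend_from_slice(&[1, 2, 3, 4, 0, 0, 0, 0]);
    println!("HEAD for a 0x1234-byte pvmfw.bin: {:#x}", config_offset(0x1234));
    for (i, blob) in parse_config(&cfg).expect("valid config").iter().enumerate() {
        println!("entry {i}: {:?}", blob.map(|b| b.len()));
    }
}
```

Deriving the entry count from the version, rather than from the total size, is
what makes trimming trailing empty entries invalid: a shorter array would make
the expected entries read past the header.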

[header]: src/config.rs
[DTBO]: https://android.googlesource.com/platform/external/dtc/+/refs/heads/main/Documentation/dt-object-internal.txt
[debug_policy]: ../docs/debug/README.md#debug-policy
[device_assignment]: ../docs/device_assignment.md
[secretkeeper_key]: https://android.googlesource.com/platform/system/secretkeeper/+/refs/heads/main/README.md#secretkeeper-public-key
[vendor_hashtree_digest]: ../microdroid/README.md#verification-of-vendor-image

#### Virtual Platform DICE Chain Handover

The format of the DICE chain entry mentioned above, compatible with the
[`AndroidDiceHandover`][AndroidDiceHandover] defined by the Open Profile for
DICE reference implementation, is described by the following [CDDL][CDDL]:

```
PvmfwDiceHandover = {
  1 : bstr .size 32,     ; CDI_Attest
  2 : bstr .size 32,     ; CDI_Seal
  3 : DiceCertChain,     ; Android DICE chain
}
```

It contains the _Compound Device Identifiers_ ("CDIs"), used to derive the
next-stage secret, and a certificate chain, intended for pVM attestation. Note
that it differs from the `AndroidDiceHandover` defined by the specification in
that its `DiceCertChain` field is mandatory (while optional in the original).

Devices that fully implement DICE should provide a certificate rooted at the
Unique Device Secret (UDS) in a boot stage preceding the pvmfw loader (typically
ABL), so that the pvmfw loader receives a valid `AndroidDiceHandover` that can
be passed to [`DiceAndroidHandoverMainFlow`][DiceAndroidHandoverMainFlow] along
with the inputs described below.

Otherwise, as an intermediate step towards supporting DICE throughout the
software stack of the device, incomplete implementations may root the DICE chain
at the pvmfw loader, using an arbitrary constant as the initial CDI. The pvmfw
loader can easily do so by:

1. Building an "empty" `AndroidDiceHandover`, containing only constant CDIs,
   using CBOR operations ([example][Trusty-BCC])
1. Passing the resulting `AndroidDiceHandover` to `DiceAndroidHandoverMainFlow`
   as described above

The recommended DICE inputs at this stage are:

- **Code**: hashes of the pvmfw image, the hypervisor (`boot.img`), and other
  target code relevant to the secure execution of pvmfw (_e.g._
  `vendor_boot.img`)
- **Configuration Data**: any extra input relevant to pvmfw security
- **Authority Data**: must cover all the public keys used to sign and verify the
  code contributing to the **Code** input
- **Mode Decision**: set according to the [specification][dice-mode]; in
  particular, it should only be `Normal` if secure boot is being properly
  enforced (_e.g._ locked device in [Android Verified Boot][AVB])
- **Hidden Inputs**: Factory Reset Secret (FRS, stored in tamper-evident
  storage and changed on every factory reset) or a similar value that changes
  as part of the device lifecycle (_e.g._ reset)

The resulting `AndroidDiceHandover` is then used by pvmfw in a similar way to
derive another [DICE layer][Layering], passed to the guest through a
`/reserved-memory` device tree node marked as
[`compatible="google,open-dice"`][dice-dt].
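
To make the first step of the intermediate approach concrete, the following
Rust sketch hand-encodes an "empty" `AndroidDiceHandover` as CBOR: a two-entry
map holding only `CDI_Attest` (key 1) and `CDI_Seal` (key 2), with the optional
DICE chain (key 3) omitted. The constant CDI values and function names are
hypothetical; a real loader would more likely use a CBOR library, as in the
Trusty example, and must still pass the result to `DiceAndroidHandoverMainFlow`,
which this sketch does not do.

```rust
/// CBOR initial byte for a map (major type 5) with 2 key/value pairs.
const CBOR_MAP_2: u8 = 0xa2;
/// CBOR initial byte for a byte string (major type 2) whose length follows in 1 byte.
const CBOR_BSTR_U8_LEN: u8 = 0x58;
/// CDI_Attest and CDI_Seal are both 32-byte values.
const CDI_SIZE: usize = 32;

/// Appends a 32-byte CBOR byte string: header 0x58 0x20, then the bytes.
fn push_bstr32(out: &mut Vec<u8>, bytes: &[u8; CDI_SIZE]) {
    out.push(CBOR_BSTR_U8_LEN);
    out.push(CDI_SIZE as u8);
    out.extend_from_slice(bytes);
}

/// Encodes { 1: CDI_Attest, 2: CDI_Seal } without a DICE chain (key 3),
/// i.e. an "empty" AndroidDiceHandover holding only CDIs.
fn empty_android_dice_handover(cdi_attest: &[u8; CDI_SIZE], cdi_seal: &[u8; CDI_SIZE]) -> Vec<u8> {
    let mut out = Vec::with_capacity(1 + 2 * (1 + 2 + CDI_SIZE));
    out.push(CBOR_MAP_2);
    out.push(0x01); // unsigned integer key 1: CDI_Attest
    push_bstr32(&mut out, cdi_attest);
    out.push(0x02); // unsigned integer key 2: CDI_Seal
    push_bstr32(&mut out, cdi_seal);
    out
}

fn main() {
    // Arbitrary constants standing in for the initial CDIs.
    let handover = empty_android_dice_handover(&[0x11; CDI_SIZE], &[0x22; CDI_SIZE]);
    println!("{} bytes, starting with {:02x?}", handover.len(), &handover[..4]);
}
```

Since the map shape is fixed, the encoding is a constant 71-byte layout, which
is why a loader can produce it with a handful of CBOR operations.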

[AVB]: https://source.android.com/docs/security/features/verifiedboot/boot-flow
[AndroidDiceHandover]: https://pigweed.googlesource.com/open-dice/+/42ae7760023/src/android.c#212
[DiceAndroidHandoverMainFlow]: https://pigweed.googlesource.com/open-dice/+/42ae7760023/src/android.c#221
[CDDL]: https://datatracker.ietf.org/doc/rfc8610
[dice-mode]: https://pigweed.googlesource.com/open-dice/+/refs/heads/main/docs/specification.md#Mode-Value-Details
[dice-dt]: https://www.kernel.org/doc/Documentation/devicetree/bindings/reserved-memory/google%2Copen-dice.yaml
[Layering]: https://pigweed.googlesource.com/open-dice/+/refs/heads/main/docs/specification.md#layering-details
[Trusty-BCC]: https://android.googlesource.com/trusty/lib/+/1696be0a8f3a7103/lib/hwbcc/common/swbcc.c#554

### Platform Requirements

pvmfw is intended to run in a virtualized environment according to the `crosvm`
[memory layout][crosvm-mem] for protected VMs and so it expects to have been
loaded at address `0x7fc0_0000` and uses the 2MiB region at address
`0x7fe0_0000` as scratch memory. It makes use of the virtual PCI bus to obtain a
virtio interface to the host and prints its logs through the 16550 UART (address
`0x3f8`).

At boot, pvmfw discovers the running hypervisor in order to select the
appropriate hypervisor calls to share/unshare memory, mark IPA regions as MMIO,
obtain trusted true entropy, and reboot the virtual machine.
In particular, it makes use of the following hypervisor calls:

- Arm [SMC Calling Convention][smccc] v1.1 or above:

    - `SMCCC_VERSION`
    - Vendor Specific Hypervisor Service Call UID Query

- Arm [Power State Coordination Interface][psci] v1.0 or above:

    - `PSCI_VERSION`
    - `PSCI_FEATURES`
    - `PSCI_SYSTEM_RESET`
    - `PSCI_SYSTEM_SHUTDOWN`

- Arm [True Random Number Generator Firmware Interface][smccc-trng] v1.0:

    - `TRNG_VERSION`
    - `TRNG_FEATURES`
    - `TRNG_RND`

- When running under KVM, the pKVM-specific hypervisor interface must provide:

    - `MEMINFO` (function ID `0xc6000002`)
    - `MEM_SHARE` (function ID `0xc6000003`)
    - `MEM_UNSHARE` (function ID `0xc6000004`)
    - `MMIO_GUARD_INFO` (function ID `0xc6000005`)
    - `MMIO_GUARD_ENROLL` (function ID `0xc6000006`)
    - `MMIO_GUARD_MAP` (function ID `0xc6000007`)
    - `MMIO_GUARD_UNMAP` (function ID `0xc6000008`)

[crosvm-mem]: https://crosvm.dev/book/appendix/memory_layout.html
[psci]: https://developer.arm.com/documentation/den0022
[smccc]: https://developer.arm.com/documentation/den0028
[smccc-trng]: https://developer.arm.com/documentation/den0098

## Booting Protected Virtual Machines

### Boot Protocol

As the hypervisor makes pvmfw the entry point of the VM, the initial values of
the registers it receives are configured by the VMM and are expected to follow
the [Linux ABI], _i.e._:

- x0 = physical address of device tree blob (dtb) in system RAM.
- x1 = 0 (reserved for future use)
- x2 = 0 (reserved for future use)
- x3 = 0 (reserved for future use)

Images to be verified, which have been loaded into guest memory by the VMM prior
to booting the VM, are described to pvmfw using the device tree (x0):

- the kernel in the `/config` DT node, _e.g._

    ```
    / {
        config {
            kernel-address = <0x80200000>;
            kernel-size = <0x1000000>;
        };
    };
    ```

- the (optional) ramdisk in the standard `/chosen` node, _e.g._

    ```
    / {
        chosen {
            linux,initrd-start = <0x82000000>;
            linux,initrd-end = <0x82800000>;
        };
    };
    ```

[Linux ABI]: https://www.kernel.org/doc/Documentation/arm64/booting.txt

### Handover ABI

After verifying the guest kernel, pvmfw boots it using the Linux ABI described
above. It uses the device tree to pass the following:

- a reserved memory node containing the produced DICE chain:

    ```
    / {
        reserved-memory {
            #address-cells = <0x02>;
            #size-cells = <0x02>;
            ranges;
            dice {
                compatible = "google,open-dice";
                no-map;
                reg = <0x0 0x7fe0000>, <0x0 0x1000>;
            };
        };
    };
    ```

- the `/chosen/avf,new-instance` flag, set when pvmfw generated a new secret
  (_i.e._ the pVM instance was booted for the first time). This should be used
  by the next stages to ensure that an attacker isn't trying to force new
  secrets to be generated by one stage, in isolation;

- the `/chosen/avf,strict-boot` flag, which is always set and can be used by
  guests to enable extra validation.

### Guest Image Signing

pvmfw verifies the guest kernel image (loaded by the VMM) by re-using tools and
formats introduced by Android Verified Boot.
In particular, it expects the
kernel region (see `/config/kernel-{address,size}` described above) to contain
an appended VBMeta structure, which can be generated as follows:

```
avbtool add_hash_footer --image <kernel.bin> \
    --partition_name boot \
    --dynamic_partition_size \
    --key $KEY
```

In cases where a ramdisk is required by the guest, pvmfw must also verify it. To
do so, the ramdisk must be covered by a hash descriptor in the VBMeta of the
kernel:

```
cp <initrd.bin> /tmp/
avbtool add_hash_footer --image /tmp/<initrd.bin> \
    --partition_name $INITRD_NAME \
    --dynamic_partition_size \
    --key $KEY
avbtool add_hash_footer --image <kernel.bin> \
    --partition_name boot \
    --dynamic_partition_size \
    --include_descriptor_from_image /tmp/<initrd.bin> \
    --key $KEY
```

Note that the `/tmp/<initrd.bin>` file is only created to temporarily hold the
hash descriptor to be added to the kernel footer and that the unsigned
`<initrd.bin>` should be passed to the VMM when booting a pVM.

The name of the AVB "partition" for the ramdisk (`$INITRD_NAME`) can be used by
the signer to specify whether pvmfw must consider the guest to be debuggable
(`initrd_debug`) or not (`initrd_normal`), which will be reflected in the
certificate of the guest and will affect the secrets being provisioned.

If pVM guest kernels are built and/or packaged using the Android Build system,
we recommend performing the signing described above through an
`avb_add_hash_footer` Soong module (see [how we sign the Microdroid
kernel][soong-udroid]).
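
With `add_hash_footer`, the appended VBMeta structure is found through the
64-byte AVB footer stored at the very end of the image. The Rust sketch below
reads that footer, following the `AvbFooter` layout from libavb (magic `AVBf`,
big-endian fields), and returns the VBMeta blob it points to. It only locates
the data: it checks neither signatures nor hash descriptors, which should be
left to libavb, and the function names are hypothetical.

```rust
/// Size of the AVB footer appended at the very end of an image.
const AVB_FOOTER_SIZE: usize = 64;
const AVB_FOOTER_MAGIC: &[u8; 4] = b"AVBf";

/// Fields of libavb's AvbFooter (the 28 reserved bytes are skipped).
#[derive(Debug)]
struct AvbFooter {
    version_major: u32,
    version_minor: u32,
    original_image_size: u64,
    vbmeta_offset: u64,
    vbmeta_size: u64,
}

fn be_u32(b: &[u8], off: usize) -> u32 {
    u32::from_be_bytes(b[off..off + 4].try_into().unwrap())
}

fn be_u64(b: &[u8], off: usize) -> u64 {
    u64::from_be_bytes(b[off..off + 8].try_into().unwrap())
}

/// Parses the footer from the last 64 bytes of `image`, if present.
fn parse_footer(image: &[u8]) -> Option<AvbFooter> {
    let start = image.len().checked_sub(AVB_FOOTER_SIZE)?;
    let f = &image[start..];
    if &f[..4] != AVB_FOOTER_MAGIC {
        return None;
    }
    Some(AvbFooter {
        version_major: be_u32(f, 4),
        version_minor: be_u32(f, 8),
        original_image_size: be_u64(f, 12),
        vbmeta_offset: be_u64(f, 20),
        vbmeta_size: be_u64(f, 28),
    })
}

/// Returns the VBMeta blob referenced by the footer (not verified!).
fn vbmeta_blob(image: &[u8]) -> Option<&[u8]> {
    let footer = parse_footer(image)?;
    let start = usize::try_from(footer.vbmeta_offset).ok()?;
    let end = start.checked_add(usize::try_from(footer.vbmeta_size).ok()?)?;
    // The VBMeta must end before the footer itself.
    if end > image.len() - AVB_FOOTER_SIZE {
        return None;
    }
    let blob = &image[start..end];
    // A VBMeta image starts with the "AVB0" magic.
    blob.starts_with(b"AVB0").then_some(blob)
}

fn main() {
    // Synthetic image: 100 bytes of "kernel", an 8-byte stand-in VBMeta, then a footer.
    let mut img = vec![0u8; 100];
    img.extend_from_slice(b"AVB0\x01\x02\x03\x04");
    let mut footer = b"AVBf".to_vec();
    footer.extend_from_slice(&1u32.to_be_bytes()); // version_major
    footer.extend_from_slice(&0u32.to_be_bytes()); // version_minor
    // original_image_size, vbmeta_offset, vbmeta_size
    for field in [100u64, 100, 8] {
        footer.extend_from_slice(&field.to_be_bytes());
    }
    footer.resize(AVB_FOOTER_SIZE, 0); // reserved bytes
    img.extend_from_slice(&footer);
    println!("{:?}", parse_footer(&img));
    println!("VBMeta: {} bytes", vbmeta_blob(&img).map_or(0, |b| b.len()));
}
```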

[soong-udroid]: https://cs.android.com/android/platform/superproject/main/+/main:packages/modules/Virtualization/microdroid/Android.bp;l=425;drc=b94a5cf516307c4279f6c16a63803527a8affc6d

## Development

For faster iteration, you can build pvmfw, adb-push it to the device, and use
it directly for a new pVM, without having to flash it to the physical
partition. To do that, the binary image composition performed by ABL described
above must be replicated to produce a single file containing the pvmfw binary
and its configuration data.

As a quick prototyping solution, a valid DICE chain (such as this [test
file][bcc.dat]) can be appended to the `pvmfw.bin` image with `pvmfw-tool`.

```shell
m pvmfw-tool pvmfw_bin
PVMFW_BIN=${ANDROID_PRODUCT_OUT}/system/etc/pvmfw.bin
DICE=${ANDROID_BUILD_TOP}/packages/modules/Virtualization/tests/pvmfw/assets/bcc.dat

pvmfw-tool custom_pvmfw ${PVMFW_BIN} ${DICE}
```

The result can then be pushed to the device. Pointing the system property
`hypervisor.pvmfw.path` to it will cause AVF to use that image as pvmfw:

```shell
adb push custom_pvmfw /data/local/tmp/pvmfw
adb root
adb shell setprop hypervisor.pvmfw.path /data/local/tmp/pvmfw
```

Then run a protected VM, for example:

```shell
adb shell /apex/com.android.virt/bin/vm run-microdroid --protected
```

Note: `adb root` is required to set the system property.

[bcc.dat]: https://cs.android.com/android/platform/superproject/main/+/main:packages/modules/Virtualization/tests/pvmfw/assets/bcc.dat
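
For reference, the composition that `pvmfw-tool custom_pvmfw` replicates can be
sketched in a few lines of Rust. This is a hypothetical stand-in, not the tool's
source: it emits a version 1.0 header (entry 0 holding the DICE chain, entry 1
empty), assumes 32-bit little-endian header fields with `Flags` set to zero, and
pads the last blob to 8 bytes before `TAIL`. Its arguments mirror the
`pvmfw-tool` invocation above: output path, `pvmfw.bin`, then the DICE chain.

```rust
use std::{env, fs, io};

/// Configuration header magic; its little-endian bytes spell "pvmf".
const MAGIC: u32 = 0x666d_7670;
const PAGE_SIZE: usize = 4096;
/// Assumed layout: 4 u32 header fields, then 2 (offset, size) u32 entries.
const HEADER_LEN: usize = 4 * 4 + 2 * 8;

fn align_up(n: usize, align: usize) -> usize {
    (n + align - 1) / align * align
}

/// Returns pvmfw.bin followed by a version 1.0 configuration carrying
/// `dice_chain` as entry 0 and no debug policy (entry 1 has zero size).
fn append_config(pvmfw_bin: &[u8], dice_chain: &[u8]) -> Vec<u8> {
    let mut out = pvmfw_bin.to_vec();
    out.resize(align_up(out.len(), PAGE_SIZE), 0); // pad so HEAD is 4KiB-aligned
    let head = out.len();
    let first = align_up(HEADER_LEN, 8); // FIRST - HEAD
    let total = align_up(first + dice_chain.len(), 8); // TAIL - HEAD
    let version = 1u32 << 16; // "1.0" = (1 << 16) | (0 & 0xffff)
    let words = [MAGIC, version, total as u32, 0, first as u32, dice_chain.len() as u32, 0, 0];
    for word in words {
        out.extend_from_slice(&word.to_le_bytes());
    }
    out.resize(head + first, 0); // padding up to FIRST (none with this header)
    out.extend_from_slice(dice_chain);
    out.resize(head + total, 0); // padding up to TAIL
    out
}

fn main() -> io::Result<()> {
    let args: Vec<String> = env::args().collect();
    if let [_, out, bin, dice] = args.as_slice() {
        fs::write(out, append_config(&fs::read(bin)?, &fs::read(dice)?))?;
    } else {
        eprintln!("usage: append_config <output> <pvmfw.bin> <dice_chain>");
    }
    Ok(())
}
```

The resulting file could then be pushed and selected through
`hypervisor.pvmfw.path` exactly like the `pvmfw-tool` output.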