1# Protected Virtual Machine Firmware
2
3In the context of the [Android Virtualization Framework][AVF], a hypervisor
4(_e.g._ [pKVM]) enforces full memory isolation between its virtual machines
5(VMs) and the host.  As a result, the host is only allowed to access memory that
6has been explicitly shared back by a VM. Such _protected VMs_ (“pVMs”) are
7therefore able to manipulate secrets without being at risk of an attacker
8stealing them by compromising the Android host.
9
10As pVMs are started dynamically by a _virtual machine manager_ (“VMM”) running
11as a host process and as pVMs must not trust the host (see [_Why
12AVF?_][why-avf]), the virtual machine it configures can't be trusted either.
13Furthermore, even though the isolation mentioned above allows pVMs to protect
14their secrets from the host, it does not help with provisioning them during
15boot. In particular, the threat model would prohibit the host from ever having
16access to those secrets, preventing the VMM from passing them to the pVM.
17
18To address these concerns the hypervisor securely loads the pVM firmware
19(“pvmfw”) in the pVM from a protected memory region (this prevents the host or
20any pVM from tampering with it), setting it as the entry point of the virtual
21machine. As a result, pvmfw becomes the very first code that gets executed in
22the pVM, allowing it to validate the environment and abort the boot sequence if
23necessary. This process takes place whenever the VMM places a VM in protected
24mode and can’t be prevented by the host.
25
26Given the threat model, pvmfw is not allowed to trust the devices or device
27layout provided by the virtual platform it is running on as those are configured
28by the VMM. Instead, it performs all the necessary checks to ensure that the pVM
29was set up as expected. For functional purposes, the interface with the
30hypervisor, although trusted, is also validated.
31
32Once it has been determined that the platform can be trusted, pvmfw derives
33unique secrets for the guest through the [_DICE Chain_][android-dice] (see
34[Open Profile for DICE][open-dice]) that can be used to prove the identity of
35the pVM to local and remote actors. If any operation or check fails, or in case
36of a missing prerequisite, pvmfw will abort the boot process of the pVM,
37effectively preventing non-compliant pVMs and/or guests from running.
38Otherwise, it hands over the pVM to the guest kernel by jumping to its first
39instruction, similarly to a bootloader.
40
41pvmfw currently only supports AArch64.
42
43[AVF]: https://source.android.com/docs/core/virtualization
44[why-avf]: https://source.android.com/docs/core/virtualization/whyavf
45[android-dice]: https://pigweed.googlesource.com/open-dice/+/refs/heads/main/docs/android.md
46[pKVM]: https://source.android.com/docs/core/virtualization/architecture#hypervisor
47[open-dice]: https://pigweed.googlesource.com/open-dice/+/refs/heads/main/docs/specification.md
48
49## Integration
50
51### pvmfw Loading
52
53When running pKVM, the physical memory from which the hypervisor loads pvmfw
54into guest address space is not initially populated by the hypervisor itself.
55Instead, it receives a pre-loaded memory region from a trusted pvmfw loader and
56only then becomes responsible for protecting it. As a result, the hypervisor is
57kept generic (beyond AVF) and small as it is not expected (nor necessary) for it
58to know how to interpret or obtain the content of that region.
59
60#### Android Bootloader (ABL) Support
61
62Starting in Android T, the `PRODUCT_BUILD_PVMFW_IMAGE` build variable controls
63the generation of `pvmfw.img`, a new [ABL partition][ABL-part] containing the
64pvmfw binary (sometimes called "`pvmfw.bin`") and following the internal format
65of the [`boot`][boot-img] partition, intended to be verified and loaded by ABL
66on AVF-compatible devices.
67
68Once ABL has verified the `pvmfw.img` chained static partition, the contained
69[`boot.img` header][boot-img] may be used to obtain the size of the `pvmfw.bin`
70image (recorded in the `kernel_size` field), as it already does for the kernel
71itself. In accordance with the header format, the `kernel_size` bytes of the
72partition following the header will be the `pvmfw.bin` image.
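
As a rough illustration of the extraction described above, the following Python
sketch reads `kernel_size` from a boot image header and slices out the payload
that follows it. It assumes a version 3 or later header (an 8-byte `ANDROID!`
magic followed by a little-endian 32-bit `kernel_size`, with the header
occupying one 4096-byte page); older header versions record their own page size
and would need different handling. The function name is hypothetical and this
is not how ABL is implemented.

```python
import struct

BOOT_MAGIC = b"ANDROID!"
# Assumption: header v3+ always occupies a single 4096-byte page.
HEADER_PAGE_SIZE = 4096


def extract_pvmfw_bin(image: bytes) -> bytes:
    """Return the kernel_size bytes following the header (hypothetical helper)."""
    if image[:8] != BOOT_MAGIC:
        raise ValueError("not a boot image: bad magic")
    # kernel_size is the 32-bit little-endian field right after the magic.
    (kernel_size,) = struct.unpack_from("<I", image, 8)
    # header_version sits at byte offset 40 in v3/v4 headers.
    (header_version,) = struct.unpack_from("<I", image, 40)
    if header_version < 3:
        raise ValueError("this sketch only handles header v3 and later")
    end = HEADER_PAGE_SIZE + kernel_size
    if end > len(image):
        raise ValueError("kernel_size exceeds the image length")
    return image[HEADER_PAGE_SIZE:end]
```

The caller would still be responsible for verifying the partition before
trusting any field of the header.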
73
Note that when it gets executed in the context of a pVM, `pvmfw` expects to have
been loaded at a 4KiB-aligned intermediate physical address (IPA). If ABL loads
the `pvmfw.bin` image without respecting this alignment, it is the
responsibility of the hypervisor to either reject the image or copy it into
guest address space with the right alignment.
79
80To support pKVM, ABL is expected to describe the region using a reserved memory
81device tree node where both address and size have been properly aligned to the
82page size used by the hypervisor. This single region must include both the pvmfw
83binary image and its configuration data (see below). For example, the following
node describes a region of size `0x40000` at address `0x80000000`:

```
reserved-memory {
    ...
    pkvm_guest_firmware {
        compatible = "linux,pkvm-guest-firmware-memory";
        reg = <0x0 0x80000000 0x40000>;
        no-map;
    };
};
```
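
The alignment requirement on the region can be checked with simple mask
arithmetic. The sketch below is only an illustration: the helper names are
hypothetical and the page size is passed in explicitly because it depends on
how the hypervisor is configured.

```python
def is_aligned(value: int, alignment: int) -> bool:
    """Return True if value is a multiple of alignment (a power of two)."""
    return (value & (alignment - 1)) == 0


def region_is_valid(base: int, size: int, page_size: int) -> bool:
    """Check that both the base address and the size are page-aligned."""
    return size > 0 and is_aligned(base, page_size) and is_aligned(size, page_size)
```

With the values from the example node, `region_is_valid(0x80000000, 0x40000,
4096)` holds, and it still holds for 64KiB pages.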
95
96[ABL-part]: https://source.android.com/docs/core/architecture/bootloader/partitions
97[boot-img]: https://source.android.com/docs/core/architecture/bootloader/boot-image-header
98
99### Configuration Data
100
101As part of the process of loading pvmfw, the loader (typically the Android
102Bootloader, "ABL") is expected to pass device-specific pvmfw configuration data
103by appending it to the pvmfw binary and including it in the region passed to the
104hypervisor. As a result, the hypervisor will give the same protection to this
105data as it does to pvmfw and will transparently load it in guest memory, making
106it available to pvmfw at runtime. This enables pvmfw to be kept device-agnostic,
107simplifying its adoption and distribution as a centralized signed binary, while
108also being able to support device-specific details.
109
pvmfw reads the configuration data from the next 4KiB boundary after the end of
its loaded binary. Even though pvmfw is position-independent, it also expects
to have been loaded at a 4KiB boundary. As a result, the location of the
configuration data is implicitly passed to pvmfw and known to it at build time.
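
The placement rule can be expressed as a rounding operation. The following
Python sketch is an illustration only; the function names are hypothetical, and
it assumes that a binary ending exactly on a 4KiB boundary is followed
immediately by its configuration data.

```python
PAGE_SIZE = 4096  # 4KiB, as required by the text above


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def config_data_address(load_address: int, binary_size: int) -> int:
    """Address at which pvmfw would look for its configuration data."""
    if load_address % PAGE_SIZE != 0:
        raise ValueError("pvmfw must be loaded at a 4KiB boundary")
    return align_up(load_address + binary_size, PAGE_SIZE)
```

For example, a binary of `0x12345` bytes loaded at `0x7fc00000` would have its
configuration data at `0x7fc13000`.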
115
116#### Configuration Data Format
117
118The configuration data is described using the following [header]:
119
120```
121+===============================+
122|          pvmfw.bin            |
123+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
124|  (Padding to 4KiB alignment)  |
125+===============================+ <-- HEAD
126|      Magic (= 0x666d7670)     |
127+-------------------------------+
128|           Version             |
129+-------------------------------+
130|   Total Size = (TAIL - HEAD)  |
131+-------------------------------+
132|            Flags              |
133+-------------------------------+
134|           [Entry 0]           |
135|  offset = (FIRST - HEAD)      |
136|  size = (FIRST_END - FIRST)   |
137+-------------------------------+
138|           [Entry 1]           |
139|  offset = (SECOND - HEAD)     |
140|  size = (SECOND_END - SECOND) |
141+-------------------------------+
142|           [Entry 2]           | <-- Entry 2 is present since version 1.1
143|  offset = (THIRD - HEAD)      |
144|  size = (THIRD_END - THIRD)   |
145+-------------------------------+
146|           [Entry 3]           | <-- Entry 3 is present since version 1.2
147|  offset = (FOURTH - HEAD)     |
148|  size = (FOURTH_END - FOURTH) |
149+-------------------------------+
150|              ...              |
151+-------------------------------+
152|           [Entry n]           |
153+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
154| (Padding to 8-byte alignment) |
155+===============================+ <-- FIRST
156|   {First blob: DICE chain}    |
157+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ <-- FIRST_END
158| (Padding to 8-byte alignment) |
159+===============================+ <-- SECOND
160|       {Second blob: DP}       |
161+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ <-- SECOND_END
162| (Padding to 8-byte alignment) |
163+===============================+ <-- THIRD
164|     {Third blob: VM DTBO}     |
165+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ <-- THIRD_END
166| (Padding to 8-byte alignment) |
167+===============================+ <-- FOURTH
168| {Fourth blob: VM reference DT}|
169+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+ <-- FOURTH_END
170| (Padding to 8-byte alignment) |
171+===============================+
172|              ...              |
173+===============================+ <-- TAIL
174```
175
Where the version number is encoded using a "`major.minor`" scheme as follows:
177
178```
179((major << 16) | (minor & 0xffff))
180```
181
182and defines the format of the header (which may change between major versions),
183its size and, in particular, the expected number of appended blobs. Each blob is
184referred to by its offset in the entry array and may be mandatory or optional
185(as defined by this specification), where missing entries are denoted by a zero
186size. It is therefore not allowed to trim missing optional entries from the end
187of the array. The header uses the endianness of the virtual machine.
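
The version encoding can be sketched in Python as follows. The helper names are
hypothetical; the arithmetic mirrors the expression given above.

```python
def encode_version(major: int, minor: int) -> int:
    """Pack a major.minor pair into the 32-bit header version value."""
    if not (0 <= major <= 0xFFFF and 0 <= minor <= 0xFFFF):
        raise ValueError("major and minor must each fit in 16 bits")
    return (major << 16) | (minor & 0xFFFF)


def decode_version(version: int) -> tuple:
    """Split the header version value back into (major, minor)."""
    return version >> 16, version & 0xFFFF
```

For instance, version 1.2 is encoded as `0x10002`.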
188
The header format itself is agnostic of the internal format of the individual
blobs it refers to. In version 1.0, it describes two blobs:
191
192- entry 0 must point to a valid DICE chain handover (see below)
193- entry 1 may point to a [DTBO] to be applied to the pVM device tree. See
194  [debug policy][debug_policy] for an example.
195
196In version 1.1, a third blob is added.
197
- entry 2 may point to the VM DTBO, a [DTBO] used for
  [device assignment][device_assignment].
  pvmfw will provision assigned devices with the VM DTBO.
201
202In version 1.2, a fourth blob is added.
203
- entry 3, if present, contains the VM reference DT. This defines properties that
205  may be included in the device tree passed to a protected VM. pvmfw validates
206  that if any of these properties is included in the VM's device tree, the
207  property value exactly matches what is in the VM reference DT.
208
209  The bootloader should ensure that the same properties, with the same values,
210  are added under the "/avf/reference" node in the host Android device tree.
211
  This provides a mechanism to allow configuration information to be securely
  passed to the VM via the host. pvmfw does not interpret the content of the VM
  reference DT, nor does it apply it to the VM's device tree; it only ensures
  that if matching properties are present in the VM device tree they contain the
  correct values.
217
218  Use-cases of VM reference DT include:
219
220  - Passing the [public key of the Secretkeeper][secretkeeper_key] HAL
221    implementation to each VM.
222
223  - Passing the [vendor hashtree digest][vendor_hashtree_digest] to run
224    Microdroid with verified vendor image.
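
To make the header layout more concrete, here is a minimal Python sketch of a
parser for it. It is not the code in `src/config.rs`. It assumes that every
header field and every entry member is a 32-bit little-endian integer (the
diagram does not state field widths) and that the number of entries is 2, 3 and
4 for versions 1.0, 1.1 and 1.2 respectively. The function name is
hypothetical.

```python
import struct

MAGIC = 0x666D7670
# Number of entries expected for each (major, minor) version.
ENTRY_COUNT = {(1, 0): 2, (1, 1): 3, (1, 2): 4}
HEADER_FIELDS_SIZE = 16  # assumption: magic, version, total size, flags as u32


def parse_config(data: bytes) -> list:
    """Return one item per entry: the blob bytes, or None if the size is zero."""
    magic, version, total_size, _flags = struct.unpack_from("<4I", data, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    key = (version >> 16, version & 0xFFFF)
    if key not in ENTRY_COUNT:
        raise ValueError("unsupported version")
    if total_size > len(data):
        raise ValueError("total size exceeds the available data")
    blobs = []
    for index in range(ENTRY_COUNT[key]):
        offset, size = struct.unpack_from("<2I", data, HEADER_FIELDS_SIZE + 8 * index)
        if size == 0:
            blobs.append(None)  # missing entry
            continue
        if offset + size > total_size:
            raise ValueError("entry out of bounds")
        blobs.append(data[offset:offset + size])
    if blobs[0] is None:
        raise ValueError("entry 0 (DICE chain handover) is mandatory")
    return blobs
```

A real parser would also validate alignment and reject overlapping entries;
those checks are omitted here for brevity.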
225
226[header]: src/config.rs
227[DTBO]: https://android.googlesource.com/platform/external/dtc/+/refs/heads/main/Documentation/dt-object-internal.txt
228[debug_policy]: ../docs/debug/README.md#debug-policy
229[device_assignment]: ../docs/device_assignment.md
230[secretkeeper_key]: https://android.googlesource.com/platform/system/secretkeeper/+/refs/heads/main/README.md#secretkeeper-public-key
231[vendor_hashtree_digest]: ../microdroid/README.md#verification-of-vendor-image
232
233#### Virtual Platform DICE Chain Handover
234
235The format of the DICE chain entry mentioned above, compatible with the
236[`AndroidDiceHandover`][AndroidDiceHandover] defined by the Open Profile for
237DICE reference implementation, is described by the following [CDDL][CDDL]:
238```
239PvmfwDiceHandover = {
240  1 : bstr .size 32,     ; CDI_Attest
241  2 : bstr .size 32,     ; CDI_Seal
242  3 : DiceCertChain,     ; Android DICE chain
243}
244```
245
246and contains the _Compound Device Identifiers_ ("CDIs"), used to derive the
247next-stage secret, and a certificate chain, intended for pVM attestation. Note
248that it differs from the `AndroidDiceHandover` defined by the specification in
249that its `DiceCertChain` field is mandatory (while optional in the original).
250
Devices that fully implement DICE should provide a certificate rooted at the
Unique Device Secret (UDS) in a boot stage preceding the pvmfw loader (typically
ABL), so that the loader receives a valid `AndroidDiceHandover` that
can be passed to [`DiceAndroidHandoverMainFlow`][DiceAndroidHandoverMainFlow] along with
the inputs described below.
256
257Otherwise, as an intermediate step towards supporting DICE throughout the
258software stack of the device, incomplete implementations may root the DICE chain
259at the pvmfw loader, using an arbitrary constant as initial CDI. The pvmfw
260loader can easily do so by:
261
1. Building an "empty" `AndroidDiceHandover` that contains only constant CDIs,
   using CBOR operations ([example][Trusty-BCC])
2641. Passing the resulting `AndroidDiceHandover` to `DiceAndroidHandoverMainFlow`
265   as described above
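
The first step amounts to hand-encoding a two-entry CBOR map. The Python sketch
below shows the resulting byte layout. It is an illustration, not the loader's
code: the function names are hypothetical and any CDI values passed to it stand
in for whatever constant a loader would choose.

```python
CDI_SIZE = 32


def cbor_bstr32(payload: bytes) -> bytes:
    """Encode a 32-byte string: major type 2 with a one-byte length (0x58 0x20)."""
    if len(payload) != CDI_SIZE:
        raise ValueError("CDIs are 32 bytes long")
    return bytes([0x58, CDI_SIZE]) + payload


def empty_handover(cdi_attest: bytes, cdi_seal: bytes) -> bytes:
    """Build a handover holding only the two CDIs, without a certificate chain."""
    return (
        b"\xa2"                              # CBOR map with two pairs
        + b"\x01" + cbor_bstr32(cdi_attest)  # key 1: CDI_Attest
        + b"\x02" + cbor_bstr32(cdi_seal)    # key 2: CDI_Seal
    )
```

The result is 71 bytes long: one byte for the map header and 35 bytes for each
key and CDI pair.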
266
267The recommended DICE inputs at this stage are:
268
269- **Code**: hash of the pvmfw image, hypervisor (`boot.img`), and other target
270  code relevant to the secure execution of pvmfw (_e.g._ `vendor_boot.img`)
271- **Configuration Data**: any extra input relevant to pvmfw security
272- **Authority Data**: must cover all the public keys used to sign and verify the
273  code contributing to the **Code** input
274- **Mode Decision**: Set according to the [specification][dice-mode]. In
275  particular, should only be `Normal` if secure boot is being properly enforced
276  (_e.g._ locked device in [Android Verified Boot][AVB])
277- **Hidden Inputs**: Factory Reset Secret (FRS, stored in a tamper evident
278  storage and changes during every factory reset) or similar that changes as
279  part of the device lifecycle (_e.g._ reset)
280
281The resulting `AndroidDiceHandover` is then used by pvmfw in a similar way to
282derive another [DICE layer][Layering], passed to the guest through a
283`/reserved-memory` device tree node marked as
[`compatible="google,open-dice"`][dice-dt].
285
286[AVB]: https://source.android.com/docs/security/features/verifiedboot/boot-flow
287[AndroidDiceHandover]: https://pigweed.googlesource.com/open-dice/+/42ae7760023/src/android.c#212
288[DiceAndroidHandoverMainFlow]: https://pigweed.googlesource.com/open-dice/+/42ae7760023/src/android.c#221
289[CDDL]: https://datatracker.ietf.org/doc/rfc8610
290[dice-mode]: https://pigweed.googlesource.com/open-dice/+/refs/heads/main/docs/specification.md#Mode-Value-Details
291[dice-dt]: https://www.kernel.org/doc/Documentation/devicetree/bindings/reserved-memory/google%2Copen-dice.yaml
292[Layering]: https://pigweed.googlesource.com/open-dice/+/refs/heads/main/docs/specification.md#layering-details
293[Trusty-BCC]: https://android.googlesource.com/trusty/lib/+/1696be0a8f3a7103/lib/hwbcc/common/swbcc.c#554
294
295### Platform Requirements
296
297pvmfw is intended to run in a virtualized environment according to the `crosvm`
298[memory layout][crosvm-mem] for protected VMs and so it expects to have been
299loaded at address `0x7fc0_0000` and uses the 2MiB region at address
300`0x7fe0_0000` as scratch memory. It makes use of the virtual PCI bus to obtain a
301virtio interface to the host and prints its logs through the 16550 UART (address
302`0x3f8`).
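
The addresses above can be related to each other with a little arithmetic. The
Python sketch below only restates the constants from this section; the names
are hypothetical, and the "footprint" it computes assumes that nothing else
sits between the load address and the scratch region.

```python
PVMFW_LOAD_ADDRESS = 0x7FC0_0000
SCRATCH_BASE = 0x7FE0_0000
SCRATCH_SIZE = 2 * 1024 * 1024  # 2MiB
UART_ADDRESS = 0x3F8


def scratch_end() -> int:
    """First address past the scratch region."""
    return SCRATCH_BASE + SCRATCH_SIZE


def max_pvmfw_footprint() -> int:
    """Bytes between the load address and the start of the scratch region."""
    return SCRATCH_BASE - PVMFW_LOAD_ADDRESS
```

Under these assumptions, the scratch region ends at `0x8000_0000` and the space
preceding it is also 2MiB.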
303
304At boot, pvmfw discovers the running hypervisor in order to select the
305appropriate hypervisor calls to share/unshare memory, mark IPA regions as MMIO,
306obtain trusted true entropy, and reboot the virtual machine. In particular, it
307makes use of the following hypervisor calls:
308
309- Arm [SMC Calling Convention][smccc] v1.1 or above:
310
311    - `SMCCC_VERSION`
312    - Vendor Specific Hypervisor Service Call UID Query
313
314- Arm [Power State Coordination Interface][psci] v1.0 or above:
315
316    - `PSCI_VERSION`
317    - `PSCI_FEATURES`
318    - `PSCI_SYSTEM_RESET`
319    - `PSCI_SYSTEM_SHUTDOWN`
320
321- Arm [True Random Number Generator Firmware Interface][smccc-trng] v1.0:
322
323    - `TRNG_VERSION`
324    - `TRNG_FEATURES`
325    - `TRNG_RND`
326
327- When running under KVM, the pKVM-specific hypervisor interface must provide:
328
329    - `MEMINFO` (function ID `0xc6000002`)
330    - `MEM_SHARE` (function ID `0xc6000003`)
331    - `MEM_UNSHARE` (function ID `0xc6000004`)
332    - `MMIO_GUARD_INFO` (function ID `0xc6000005`)
333    - `MMIO_GUARD_ENROLL` (function ID `0xc6000006`)
334    - `MMIO_GUARD_MAP` (function ID `0xc6000007`)
335    - `MMIO_GUARD_UNMAP` (function ID `0xc6000008`)
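
The function IDs listed above follow the SMCCC function identifier layout, which
the following Python sketch decodes. The helper name is hypothetical; the bit
positions are those of the SMC Calling Convention.

```python
def decode_smccc_function_id(fid: int) -> dict:
    """Split a 32-bit SMCCC function ID into its main fields."""
    return {
        "fast_call": bool((fid >> 31) & 1),  # bit 31: fast call
        "smc64": bool((fid >> 30) & 1),      # bit 30: 64-bit calling convention
        "owner": (fid >> 24) & 0x3F,         # bits 29:24: owning entity number
        "function": fid & 0xFFFF,            # bits 15:0: function number
    }
```

Decoding `0xc6000003` (`MEM_SHARE`) yields a 64-bit fast call owned by entity 6,
the range used for vendor-specific hypervisor services, with function number 3.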
336
337[crosvm-mem]: https://crosvm.dev/book/appendix/memory_layout.html
338[psci]: https://developer.arm.com/documentation/den0022
339[smccc]: https://developer.arm.com/documentation/den0028
340[smccc-trng]: https://developer.arm.com/documentation/den0098
341
342## Booting Protected Virtual Machines
343
344### Boot Protocol
345
346As the hypervisor makes pvmfw the entry point of the VM, the initial value of
347the registers it receives is configured by the VMM and is expected to follow the
[Linux ABI], _i.e._:
349
350- x0 = physical address of device tree blob (dtb) in system RAM.
351- x1 = 0 (reserved for future use)
352- x2 = 0 (reserved for future use)
353- x3 = 0 (reserved for future use)
354
355Images to be verified, which have been loaded to guest memory by the VMM prior
356to booting the VM, are described to pvmfw using the device tree (x0):
357
- the kernel in the `/config` DT node _e.g._

    ```
    / {
        config {
            kernel-address = <0x80200000>;
            kernel-size = <0x1000000>;
        };
    };
    ```
368
369- the (optional) ramdisk in the standard `/chosen` node _e.g._
370
371    ```
372    / {
373        chosen {
374            linux,initrd-start = <0x82000000>;
375            linux,initrd-end = <0x82800000>;
376        };
377    };
378    ```
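
Note that the kernel is described by an address and a size while the ramdisk is
described by start and end addresses. The Python sketch below converts both to
ranges and checks that they do not overlap. It illustrates the kind of sanity
check a consumer of these nodes might perform; it is not a description of the
checks pvmfw actually implements, and the function names are hypothetical.

```python
def kernel_range(address: int, size: int) -> range:
    """Range covered by /config/kernel-address and /config/kernel-size."""
    if size <= 0:
        raise ValueError("kernel size must be positive")
    return range(address, address + size)


def ramdisk_range(start: int, end: int) -> range:
    """Range covered by linux,initrd-start (inclusive) and linux,initrd-end (exclusive)."""
    if end <= start:
        raise ValueError("ramdisk end must be after its start")
    return range(start, end)


def overlaps(a: range, b: range) -> bool:
    """True if the two half-open ranges share at least one address."""
    return a.start < b.stop and b.start < a.stop
```

With the example values, the kernel spans `0x80200000` to `0x81200000` and the
ramdisk is `0x800000` bytes long, so the two regions are disjoint.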
379
380[Linux ABI]: https://www.kernel.org/doc/Documentation/arm64/booting.txt
381
382### Handover ABI
383
384After verifying the guest kernel, pvmfw boots it using the Linux ABI described
385above. It uses the device tree to pass the following:
386
387- a reserved memory node containing the produced DICE chain:
388
389    ```
390    / {
391        reserved-memory {
392            #address-cells = <0x02>;
393            #size-cells = <0x02>;
394            ranges;
395            dice {
396                compatible = "google,open-dice";
397                no-map;
398                reg = <0x0 0x7fe0000>, <0x0 0x1000>;
399            };
400        };
401    };
402    ```
403
404- the `/chosen/avf,new-instance` flag, set when pvmfw generated a new secret
405  (_i.e._ the pVM instance was booted for the first time). This should be used
406  by the next stages to ensure that an attacker isn't trying to force new
407  secrets to be generated by one stage, in isolation;
408
- the `/chosen/avf,strict-boot` flag, which is always set and can be used by
  guests to enable extra validation.
411
412### Guest Image Signing
413
pvmfw verifies the guest kernel image (loaded by the VMM) by re-using tools and
formats introduced by Android Verified Boot. In particular, it expects the
416kernel region (see `/config/kernel-{address,size}` described above) to contain
417an appended VBMeta structure, which can be generated as follows:
418
419```
420avbtool add_hash_footer --image <kernel.bin> \
421    --partition_name boot \
422    --dynamic_partition_size \
423    --key $KEY
424```
425
426In cases where a ramdisk is required by the guest, pvmfw must also verify it. To
427do so, it must be covered by a hash descriptor in the VBMeta of the kernel:
428
429```
430cp <initrd.bin> /tmp/
431avbtool add_hash_footer --image /tmp/<initrd.bin> \
432    --partition_name $INITRD_NAME \
433    --dynamic_partition_size \
434    --key $KEY
435avbtool add_hash_footer --image <kernel.bin> \
436    --partition_name boot \
437    --dynamic_partition_size \
438    --include_descriptor_from_image /tmp/<initrd.bin> \
439    --key $KEY
440```
441
442Note that the `/tmp/<initrd.bin>` file is only created to temporarily hold the
443hash descriptor to be added to the kernel footer and that the unsigned
444`<initrd.bin>` should be passed to the VMM when booting a pVM.
445
446The name of the AVB "partition" for the ramdisk (`$INITRD_NAME`) can be used by
447the signer to specify if pvmfw must consider the guest to be debuggable
448(`initrd_debug`) or not (`initrd_normal`), which will be reflected in the
449certificate of the guest and will affect the secrets being provisioned.
450
If pVM guest kernels are built and/or packaged using the Android Build system,
we recommend performing the signing described above through an
`avb_add_hash_footer` Soong module (see [how we sign the Microdroid
kernel][soong-udroid]).
455
456[soong-udroid]: https://cs.android.com/android/platform/superproject/main/+/main:packages/modules/Virtualization/microdroid/Android.bp;l=425;drc=b94a5cf516307c4279f6c16a63803527a8affc6d
457
458## Development
459
460For faster iteration, you can build pvmfw, adb-push it to the device, and use
461it directly for a new pVM, without having to flash it to the physical
462partition. To do that, the binary image composition performed by ABL described
463above must be replicated to produce a single file containing the pvmfw binary
464and its configuration data.
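
To illustrate what that composition involves, here is a minimal Python sketch
producing a version 1.0 image with a single DICE chain blob, following the
layout from the Configuration Data Format section. It assumes 32-bit
little-endian header fields and entry members, which that section does not
state explicitly, and the function names are hypothetical. It is not a
replacement for the project's own tooling.

```python
import struct

PAGE_SIZE = 4096
MAGIC = 0x666D7670
VERSION_1_0 = (1 << 16) | 0


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def compose(pvmfw_bin: bytes, dice_chain: bytes) -> bytes:
    """Append a version 1.0 configuration (DICE chain only) to pvmfw.bin."""
    # Assumption: four u32 header fields plus two (offset, size) u32 pairs.
    header_size = 4 * 4 + 2 * 8
    first = align_up(header_size, 8)       # blobs start on an 8-byte boundary
    total_size = first + len(dice_chain)   # TAIL - HEAD
    header = struct.pack("<4I", MAGIC, VERSION_1_0, total_size, 0)
    # Entry 0 points at the DICE chain; entry 1 is missing (zero size).
    entries = struct.pack("<4I", first, len(dice_chain), 0, 0)
    config = (header + entries).ljust(first, b"\0") + dice_chain
    # Pad the binary so that the configuration starts on a 4KiB boundary.
    padded = pvmfw_bin.ljust(align_up(len(pvmfw_bin), PAGE_SIZE), b"\0")
    return padded + config
```

The output is the padded binary followed by the header, the entry array and the
blob, which matches what pvmfw expects to find after its own image.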
465
466As a quick prototyping solution, a valid DICE chain (such as this [test
467file][bcc.dat]) can be appended to the `pvmfw.bin` image with `pvmfw-tool`.
468
469```shell
470m pvmfw-tool pvmfw_bin
471PVMFW_BIN=${ANDROID_PRODUCT_OUT}/system/etc/pvmfw.bin
472DICE=${ANDROID_BUILD_TOP}/packages/modules/Virtualization/tests/pvmfw/assets/bcc.dat
473
474pvmfw-tool custom_pvmfw ${PVMFW_BIN} ${DICE}
475```
476
477The result can then be pushed to the device. Pointing the system property
478`hypervisor.pvmfw.path` to it will cause AVF to use that image as pvmfw:
479
480```shell
481adb push custom_pvmfw /data/local/tmp/pvmfw
482adb root
483adb shell setprop hypervisor.pvmfw.path /data/local/tmp/pvmfw
484```
485
486Then run a protected VM, for example:
487
488```shell
489adb shell /apex/com.android.virt/bin/vm run-microdroid --protected
490```
491
492Note: `adb root` is required to set the system property.
493
494[bcc.dat]: https://cs.android.com/android/platform/superproject/main/+/main:packages/modules/Virtualization/tests/pvmfw/assets/bcc.dat
495