1ARM Trusted Firmware Design
2===========================
3
4Contents :
5
61.  [Introduction](#1--introduction)
72.  [Cold boot](#2--cold-boot)
83.  [EL3 runtime services framework](#3--el3-runtime-services-framework)
94.  [Power State Coordination Interface](#4--power-state-coordination-interface)
105.  [Secure-EL1 Payloads and Dispatchers](#5--secure-el1-payloads-and-dispatchers)
116.  [Crash Reporting in BL3-1](#6--crash-reporting-in-bl3-1)
127.  [Guidelines for Reset Handlers](#7--guidelines-for-reset-handlers)
138.  [CPU specific operations framework](#8--cpu-specific-operations-framework)
149.  [Memory layout of BL images](#9-memory-layout-of-bl-images)
1510. [Firmware Image Package (FIP)](#10--firmware-image-package-fip)
1611. [Use of coherent memory in Trusted Firmware](#11--use-of-coherent-memory-in-trusted-firmware)
1712. [Code Structure](#12--code-structure)
1813. [References](#13--references)
19
20
211.  Introduction
22----------------
23
24The ARM Trusted Firmware implements a subset of the Trusted Board Boot
25Requirements (TBBR) Platform Design Document (PDD) [1] for ARM reference
26platforms. The TBB sequence starts when the platform is powered on and runs up
27to the stage where it hands-off control to firmware running in the normal
28world in DRAM. This is the cold boot path.
29
30The ARM Trusted Firmware also implements the Power State Coordination Interface
31([PSCI]) PDD [2] as a runtime service. PSCI is the interface from normal world
32software to firmware implementing power management use-cases (for example,
33secondary CPU boot, hotplug and idle). Normal world software can access ARM
34Trusted Firmware runtime services via the ARM SMC (Secure Monitor Call)
35instruction. The SMC instruction must be used as mandated by the [SMC Calling
36Convention PDD][SMCCC] [3].
37
38The ARM Trusted Firmware implements a framework for configuring and managing
39interrupts generated in either security state. The details of the interrupt
40management framework and its design can be found in [ARM Trusted
41Firmware Interrupt Management Design guide][INTRG] [4].
42
432.  Cold boot
44-------------
45
46The cold boot path starts when the platform is physically turned on. One of
47the CPUs released from reset is chosen as the primary CPU, and the remaining
48CPUs are considered secondary CPUs. The primary CPU is chosen through
49platform-specific means. The cold boot path is mainly executed by the primary
50CPU, other than essential CPU initialization executed by all CPUs. The
51secondary CPUs are kept in a safe platform-specific state until the primary
52CPU has performed enough initialization to boot them.
53
54The cold boot path in this implementation of the ARM Trusted Firmware is divided
55into five steps (in order of execution):
56
57*   Boot Loader stage 1 (BL1) _AP Trusted ROM_
58*   Boot Loader stage 2 (BL2) _Trusted Boot Firmware_
59*   Boot Loader stage 3-1 (BL3-1) _EL3 Runtime Firmware_
60*   Boot Loader stage 3-2 (BL3-2) _Secure-EL1 Payload_ (optional)
61*   Boot Loader stage 3-3 (BL3-3) _Non-trusted Firmware_
62
63ARM development platforms (Fixed Virtual Platforms (FVPs) and Juno) implement a
64combination of the following types of memory regions. Each bootloader stage uses
65one or more of these memory regions.
66
67*   Regions accessible from both non-secure and secure states. For example,
68    non-trusted SRAM, ROM and DRAM.
69*   Regions accessible from only the secure state. For example, trusted SRAM and
70    ROM. The FVPs also implement the trusted DRAM which is statically
71    configured. Additionally, the Base FVPs and Juno development platform
72    configure the TrustZone Controller (TZC) to create a region in the DRAM
73    which is accessible only from the secure state.
74
75
76The sections below provide the following details:
77
78*   initialization and execution of the first three stages during cold boot
79*   specification of the BL3-1 entrypoint requirements for use by alternative
80    Trusted Boot Firmware in place of the provided BL1 and BL2
81*   changes in BL3-1 behavior when using the `RESET_TO_BL31` option which
82    allows BL3-1 to run without BL1 and BL2
83
84
85### BL1
86
87This stage begins execution from the platform's reset vector at EL3. The reset
88address is platform dependent but it is usually located in a Trusted ROM area.
89The BL1 data section is copied to trusted SRAM at runtime.
90
91On the ARM FVP port, BL1 code starts execution from the reset vector at address
92`0x00000000` (trusted ROM). The BL1 data section is copied to the start of
93trusted SRAM at address `0x04000000`.
94
95On the Juno ARM development platform port, BL1 code starts execution at
96`0x0BEC0000` (FLASH). The BL1 data section is copied to trusted SRAM at address
97`0x04001000.
98
99The functionality implemented by this stage is as follows.
100
101#### Determination of boot path
102
103Whenever a CPU is released from reset, BL1 needs to distinguish between a warm
104boot and a cold boot. This is done using platform-specific mechanisms (see the
105`platform_get_entrypoint()` function in the [Porting Guide]). In the case of a
106warm boot, a CPU is expected to continue execution from a seperate
107entrypoint. In the case of a cold boot, the secondary CPUs are placed in a safe
108platform-specific state (see the `plat_secondary_cold_boot_setup()` function in
109the [Porting Guide]) while the primary CPU executes the remaining cold boot path
110as described in the following sections.
111
112#### Architectural initialization
113
114BL1 performs minimal architectural initialization as follows.
115
116*   Exception vectors
117
118    BL1 sets up simple exception vectors for both synchronous and asynchronous
119    exceptions. The default behavior upon receiving an exception is to populate
120    a status code in the general purpose register `X0` and call the
121    `plat_report_exception()` function (see the [Porting Guide]). The status
122    code is one of:
123
124        0x0 : Synchronous exception from Current EL with SP_EL0
125        0x1 : IRQ exception from Current EL with SP_EL0
126        0x2 : FIQ exception from Current EL with SP_EL0
127        0x3 : System Error exception from Current EL with SP_EL0
128        0x4 : Synchronous exception from Current EL with SP_ELx
129        0x5 : IRQ exception from Current EL with SP_ELx
130        0x6 : FIQ exception from Current EL with SP_ELx
131        0x7 : System Error exception from Current EL with SP_ELx
132        0x8 : Synchronous exception from Lower EL using aarch64
133        0x9 : IRQ exception from Lower EL using aarch64
134        0xa : FIQ exception from Lower EL using aarch64
135        0xb : System Error exception from Lower EL using aarch64
136        0xc : Synchronous exception from Lower EL using aarch32
137        0xd : IRQ exception from Lower EL using aarch32
138        0xe : FIQ exception from Lower EL using aarch32
139        0xf : System Error exception from Lower EL using aarch32
140
141    The `plat_report_exception()` implementation on the ARM FVP port programs
142    the Versatile Express System LED register in the following format to
143    indicate the occurence of an unexpected exception:
144
145        SYS_LED[0]   - Security state (Secure=0/Non-Secure=1)
146        SYS_LED[2:1] - Exception Level (EL3=0x3, EL2=0x2, EL1=0x1, EL0=0x0)
147        SYS_LED[7:3] - Exception Class (Sync/Async & origin). This is the value
148                       of the status code
149
150    A write to the LED register reflects in the System LEDs (S6LED0..7) in the
151    CLCD window of the FVP.
152
153    BL1 does not expect to receive any exceptions other than the SMC exception.
154    For the latter, BL1 installs a simple stub. The stub expects to receive
155    only a single type of SMC (determined by its function ID in the general
156    purpose register `X0`). This SMC is raised by BL2 to make BL1 pass control
157    to BL3-1 (loaded by BL2) at EL3. Any other SMC leads to an assertion
158    failure.
159
160*   CPU initialization
161
162    BL1 calls the `reset_handler()` function which in turn calls the CPU
163    specific reset handler function (see the section: "CPU specific operations
164    framework").
165
166*   MMU setup
167
168    BL1 sets up EL3 memory translation by creating page tables to cover the
169    first 4GB of physical address space. This covers all the memories and
170    peripherals needed by BL1.
171
172*   Control register setup
173    -   `SCTLR_EL3`. Instruction cache is enabled by setting the `SCTLR_EL3.I`
174        bit. Alignment and stack alignment checking is enabled by setting the
175        `SCTLR_EL3.A` and `SCTLR_EL3.SA` bits. Exception endianness is set to
176        little-endian by clearing the `SCTLR_EL3.EE` bit.
177
178    -  `SCR_EL3`. The register width of the next lower exception level is set to
179        AArch64 by setting the `SCR.RW` bit.
180
181    -   `CPTR_EL3`. Accesses to the `CPACR_EL1` register from EL1 or EL2, or the
182        `CPTR_EL2` register from EL2 are configured to not trap to EL3 by
183        clearing the `CPTR_EL3.TCPAC` bit. Access to the trace functionality is
184        configured not to trap to EL3 by clearing the `CPTR_EL3.TTA` bit.
185        Instructions that access the registers associated with Floating Point
186        and Advanced SIMD execution are configured to not trap to EL3 by
187        clearing the `CPTR_EL3.TFP` bit.
188
189#### Platform initialization
190
191BL1 enables issuing of snoop and DVM (Distributed Virtual Memory) requests from
192the CCI-400 slave interface corresponding to the cluster that includes the
193primary CPU. BL1 also initializes UART0 (PL011 console), which enables access to
194the `printf` family of functions in BL1.
195
196#### BL2 image load and execution
197
198BL1 execution continues as follows:
199
2001.  BL1 determines the amount of free trusted SRAM memory available by
201    calculating the extent of its own data section, which also resides in
202    trusted SRAM. BL1 loads a BL2 raw binary image from platform storage, at a
203    platform-specific base address. If the BL2 image file is not present or if
204    there is not enough free trusted SRAM the following error message is
205    printed:
206
207        "Failed to load boot loader stage 2 (BL2) firmware."
208
209    If the load is successful, BL1 updates the limits of the remaining free
210    trusted SRAM. It also populates information about the amount of trusted
211    SRAM used by the BL2 image. The exact load location of the image is
212    provided as a base address in the platform header. Further description of
213    the memory layout can be found later in this document.
214
2152.  BL1 prints the following string from the primary CPU to indicate successful
216    execution of the BL1 stage:
217
218        "Booting trusted firmware boot loader stage 1"
219
2203.  BL1 passes control to the BL2 image at Secure EL1, starting from its load
221    address.
222
2234.  BL1 also passes information about the amount of trusted SRAM used and
224    available for use. This information is populated at a platform-specific
225    memory address.
226
227
228### BL2
229
230BL1 loads and passes control to BL2 at Secure-EL1. BL2 is linked against and
231loaded at a platform-specific base address (more information can be found later
232in this document). The functionality implemented by BL2 is as follows.
233
234#### Architectural initialization
235
236BL2 performs minimal architectural initialization required for subsequent
237stages of the ARM Trusted Firmware and normal world software. It sets up
238Secure EL1 memory translation by creating page tables to address the first 4GB
239of the physical address space in a similar way to BL1. EL1 and EL0 are given
240access to Floating Point & Advanced SIMD registers by clearing the `CPACR.FPEN`
241bits.
242
243#### Platform initialization
244
245BL2 copies the information regarding the trusted SRAM populated by BL1 using a
246platform-specific mechanism. It calculates the limits of DRAM (main memory)
247to determine whether there is enough space to load the BL3-3 image. A platform
248defined base address is used to specify the load address for the BL3-1 image.
249It also defines the extents of memory available for use by the BL3-2 image.
250BL2 also initializes UART0 (PL011 console), which enables  access to the
251`printf` family of functions in BL2. Platform security is initialized to allow
252access to controlled components. The storage abstraction layer is initialized
253which is used to load further bootloader images.
254
255#### BL3-0 (System Control Processor Firmware) image load
256
257Some systems have a separate System Control Processor (SCP) for power, clock,
258reset and system control. BL2 loads the optional BL3-0 image from platform
259storage into a platform-specific region of secure memory. The subsequent
260handling of BL3-0 is platform specific. For example, on the Juno ARM development
261platform port the image is transferred into SCP memory using the SCPI protocol
262after being loaded in the trusted SRAM memory at address `0x04009000`. The SCP
263executes BL3-0 and signals to the Application Processor (AP) for BL2 execution
264to continue.
265
266#### BL3-1 (EL3 Runtime Firmware) image load
267
268BL2 loads the BL3-1 image from platform storage into a platform-specific address
269in trusted SRAM. If there is not enough memory to load the image or image is
270missing it leads to an assertion failure. If the BL3-1 image loads successfully,
271BL2 updates the amount of trusted SRAM used and available for use by BL3-1.
272This information is populated at a platform-specific memory address.
273
274#### BL3-2 (Secure-EL1 Payload) image load
275
276BL2 loads the optional BL3-2 image from platform storage into a platform-
277specific region of secure memory. The image executes in the secure world. BL2
278relies on BL3-1 to pass control to the BL3-2 image, if present. Hence, BL2
279populates a platform-specific area of memory with the entrypoint/load-address
280of the BL3-2 image. The value of the Saved Processor Status Register (`SPSR`)
281for entry into BL3-2 is not determined by BL2, it is initialized by the
282Secure-EL1 Payload Dispatcher (see later) within BL3-1, which is responsible for
283managing interaction with BL3-2. This information is passed to BL3-1.
284
285#### BL3-3 (Non-trusted Firmware) image load
286
287BL2 loads the BL3-3 image (e.g. UEFI or other test or boot software) from
288platform storage into non-secure memory as defined by the platform.
289
290BL2 relies on BL3-1 to pass control to BL3-3 once secure state initialization is
291complete. Hence, BL2 populates a platform-specific area of memory with the
292entrypoint and Saved Program Status Register (`SPSR`) of the normal world
293software image. The entrypoint is the load address of the BL3-3 image. The
294`SPSR` is determined as specified in Section 5.13 of the [PSCI PDD] [PSCI]. This
295information is passed to BL3-1.
296
297#### BL3-1 (EL3 Runtime Firmware) execution
298
299BL2 execution continues as follows:
300
3011.  BL2 passes control back to BL1 by raising an SMC, providing BL1 with the
302    BL3-1 entrypoint. The exception is handled by the SMC exception handler
303    installed by BL1.
304
3052.  BL1 turns off the MMU and flushes the caches. It clears the
306    `SCTLR_EL3.M/I/C` bits, flushes the data cache to the point of coherency
307    and invalidates the TLBs.
308
3093.  BL1 passes control to BL3-1 at the specified entrypoint at EL3.
310
311
312### BL3-1
313
314The image for this stage is loaded by BL2 and BL1 passes control to BL3-1 at
315EL3. BL3-1 executes solely in trusted SRAM. BL3-1 is linked against and
316loaded at a platform-specific base address (more information can be found later
317in this document). The functionality implemented by BL3-1 is as follows.
318
319#### Architectural initialization
320
321Currently, BL3-1 performs a similar architectural initialization to BL1 as
322far as system register settings are concerned. Since BL1 code resides in ROM,
323architectural initialization in BL3-1 allows override of any previous
324initialization done by BL1. BL3-1 creates page tables to address the first
3254GB of physical address space and initializes the MMU accordingly. It initializes
326a buffer of frequently used pointers, called per-CPU pointer cache, in memory for
327faster access. Currently the per-CPU pointer cache contains only the pointer
328to crash stack. It then replaces the exception vectors populated by BL1 with its
329own. BL3-1 exception vectors implement more elaborate support for
330handling SMCs since this is the only mechanism to access the runtime services
331implemented by BL3-1 (PSCI for example). BL3-1 checks each SMC for validity as
332specified by the [SMC calling convention PDD][SMCCC] before passing control to
333the required SMC handler routine. BL3-1 programs the `CNTFRQ_EL0` register with
334the clock frequency of the system counter, which is provided by the platform.
335
336#### Platform initialization
337
338BL3-1 performs detailed platform initialization, which enables normal world
339software to function correctly. It also retrieves entrypoint information for
340the BL3-3 image loaded by BL2 from the platform defined memory address populated
341by BL2. BL3-1 also initializes UART0 (PL011 console), which enables
342access to the `printf` family of functions in BL3-1.  It enables the system
343level implementation of the generic timer through the memory mapped interface.
344
345* GICv2 initialization:
346
347    -   Enable group0 interrupts in the GIC CPU interface.
348    -   Configure group0 interrupts to be asserted as FIQs.
349    -   Disable the legacy interrupt bypass mechanism.
350    -   Configure the priority mask register to allow interrupts of all
351        priorities to be signaled to the CPU interface.
352    -   Mark SGIs 8-15, the secure physical timer interrupt (#29) and the
353        trusted watchdog interrupt (#56) as group0 (secure).
354    -   Target the trusted watchdog interrupt to CPU0.
355    -   Enable these group0 interrupts in the GIC distributor.
356    -   Configure all other interrupts as group1 (non-secure).
357    -   Enable signaling of group0 interrupts in the GIC distributor.
358
359*   GICv3 initialization:
360
361    If a GICv3 implementation is available in the platform, BL3-1 initializes
362    the GICv3 in GICv2 emulation mode with settings as described for GICv2
363    above.
364
365*   Power management initialization:
366
367    BL3-1 implements a state machine to track CPU and cluster state. The state
368    can be one of `OFF`, `ON_PENDING`, `SUSPEND` or `ON`. All secondary CPUs are
369    initially in the `OFF` state. The cluster that the primary CPU belongs to is
370    `ON`; any other cluster is `OFF`. BL3-1 initializes the data structures that
371    implement the state machine, including the locks that protect them. BL3-1
372    accesses the state of a CPU or cluster immediately after reset and before
373    the data cache is enabled in the warm boot path. It is not currently
374    possible to use 'exclusive' based spinlocks, therefore BL3-1 uses locks
375    based on Lamport's Bakery algorithm instead. BL3-1 allocates these locks in
376    device memory by default.
377
378*   Runtime services initialization:
379
380    The runtime service framework and its initialization is described in the
381    "EL3 runtime services framework" section below.
382
383    Details about the PSCI service are provided in the "Power State Coordination
384    Interface" section below.
385
386*   BL3-2 (Secure-EL1 Payload) image initialization
387
388    If a BL3-2 image is present then there must be a matching Secure-EL1 Payload
389    Dispatcher (SPD) service (see later for details). During initialization
390    that service  must register a function to carry out initialization of BL3-2
391    once the runtime services are fully initialized. BL3-1 invokes such a
392    registered function to initialize BL3-2 before running BL3-3.
393
394    Details on BL3-2 initialization and the SPD's role are described in the
395    "Secure-EL1 Payloads and Dispatchers" section below.
396
397*   BL3-3 (Non-trusted Firmware) execution
398
399    BL3-1 initializes the EL2 or EL1 processor context for normal-world cold
400    boot, ensuring that no secure state information finds its way into the
401    non-secure execution state. BL3-1 uses the entrypoint information provided
402    by BL2 to jump to the Non-trusted firmware image (BL3-3) at the highest
403    available Exception Level (EL2 if available, otherwise EL1).
404
405
406### Using alternative Trusted Boot Firmware in place of BL1 and BL2
407
408Some platforms have existing implementations of Trusted Boot Firmware that
409would like to use ARM Trusted Firmware BL3-1 for the EL3 Runtime Firmware. To
410enable this firmware architecture it is important to provide a fully documented
411and stable interface between the Trusted Boot Firmware and BL3-1.
412
413Future changes to the BL3-1 interface will be done in a backwards compatible
414way, and this enables these firmware components to be independently enhanced/
415updated to develop and exploit new functionality.
416
417#### Required CPU state when calling `bl31_entrypoint()` during cold boot
418
419This function must only be called by the primary CPU, if this is called by any
420other CPU the firmware will abort.
421
422On entry to this function the calling primary CPU must be executing in AArch64
423EL3, little-endian data access, and all interrupt sources masked:
424
425    PSTATE.EL = 3
426    PSTATE.RW = 1
427    PSTATE.DAIF = 0xf
428    SCTLR_EL3.EE = 0
429
430X0 and X1 can be used to pass information from the Trusted Boot Firmware to the
431platform code in BL3-1:
432
433    X0 : Reserved for common Trusted Firmware information
434    X1 : Platform specific information
435
436BL3-1 zero-init sections (e.g. `.bss`) should not contain valid data on entry,
437these will be zero filled prior to invoking platform setup code.
438
439##### Use of the X0 and X1 parameters
440
441The parameters are platform specific and passed from `bl31_entrypoint()` to
442`bl31_early_platform_setup()`. The value of these parameters is never directly
443used by the common BL3-1 code.
444
445The convention is that `X0` conveys information regarding the BL3-1, BL3-2 and
446BL3-3 images from the Trusted Boot firmware and `X1` can be used for other
447platform specific purpose. This convention allows platforms which use ARM
448Trusted Firmware's BL1 and BL2 images to transfer additional platform specific
449information from Secure Boot without conflicting with future evolution of the
450Trusted Firmware using `X0` to pass a `bl31_params` structure.
451
452BL3-1 common and SPD initialization code depends on image and entrypoint
453information about BL3-3 and BL3-2, which is provided via BL3-1 platform APIs.
454This information is required until the start of execution of BL3-3. This
455information can be provided in a platform defined manner, e.g. compiled into
456the platform code in BL3-1, or provided in a platform defined memory location
457by the Trusted Boot firmware, or passed from the Trusted Boot Firmware via the
458Cold boot Initialization parameters. This data may need to be cleaned out of
459the CPU caches if it is provided by an earlier boot stage and then accessed by
460BL3-1 platform code before the caches are enabled.
461
462ARM Trusted Firmware's BL2 implementation passes a `bl31_params` structure in
463`X0` and the FVP port interprets this in the BL3-1 platform code.
464
465##### MMU, Data caches & Coherency
466
467BL3-1 does not depend on the enabled state of the MMU, data caches or
468interconnect coherency on entry to `bl31_entrypoint()`. If these are disabled
469on entry, these should be enabled during `bl31_plat_arch_setup()`.
470
471##### Data structures used in the BL3-1 cold boot interface
472
473These structures are designed to support compatibility and independent
474evolution of the structures and the firmware images. For example, a version of
475BL3-1 that can interpret the BL3-x image information from different versions of
476BL2, a platform that uses an extended entry_point_info structure to convey
477additional register information to BL3-1, or a ELF image loader that can convey
478more details about the firmware images.
479
480To support these scenarios the structures are versioned and sized, which enables
481BL3-1 to detect which information is present and respond appropriately. The
482`param_header` is defined to capture this information:
483
484    typedef struct param_header {
485        uint8_t type;       /* type of the structure */
486        uint8_t version;    /* version of this structure */
487        uint16_t size;      /* size of this structure in bytes */
488        uint32_t attr;      /* attributes: unused bits SBZ */
489    } param_header_t;
490
491The structures using this format are `entry_point_info`, `image_info` and
492`bl31_params`. The code that allocates and populates these structures must set
493the header fields appropriately, and the `SET_PARA_HEAD()` a macro is defined
494to simplify this action.
495
496#### Required CPU state for BL3-1 Warm boot initialization
497
498When requesting a CPU power-on, or suspending a running CPU, ARM Trusted
499Firmware provides the platform power management code with a Warm boot
500initialization entry-point, to be invoked by the CPU immediately after the
501reset handler. On entry to the Warm boot initialization function the calling
502CPU must be in AArch64 EL3, little-endian data access and all interrupt sources
503masked:
504
505    PSTATE.EL = 3
506    PSTATE.RW = 1
507    PSTATE.DAIF = 0xf
508    SCTLR_EL3.EE = 0
509
510The PSCI implementation will initialize the processor state and ensure that the
511platform power management code is then invoked as required to initialize all
512necessary system, cluster and CPU resources.
513
514
515### Using BL3-1 as the CPU reset vector
516
517On some platforms the runtime firmware (BL3-x images) for the application
518processors are loaded by trusted firmware running on a secure system processor
519on the SoC, rather than by BL1 and BL2 running on the primary application
520processor. For this type of SoC it is desirable for the application processor
521to always reset to BL3-1 which eliminates the need for BL1 and BL2.
522
523ARM Trusted Firmware provides a build-time option `RESET_TO_BL31` that includes
524some additional logic in the BL3-1 entrypoint to support this use case.
525
526In this configuration, the platform's Trusted Boot Firmware must ensure that
527BL3-1 is loaded to its runtime address, which must match the CPU's RVBAR reset
528vector address, before the application processor is powered on. Additionally,
529platform software is responsible for loading the other BL3-x images required and
530providing entry point information for them to BL3-1. Loading these images might
531be done by the Trusted Boot Firmware or by platform code in BL3-1.
532
533The ARM FVP port supports the `RESET_TO_BL31` configuration, in which case the
534`bl31.bin` image must be loaded to its run address in Trusted SRAM and all CPU
535reset vectors be changed from the default `0x0` to this run address. See the
536[User Guide] for details of running the FVP models in this way.
537
538This configuration requires some additions and changes in the BL3-1
539functionality:
540
541#### Determination of boot path
542
543In this configuration, BL3-1 uses the same reset framework and code as the one
544described for BL1 above. On a warm boot a CPU is directed to the PSCI
545implementation via a platform defined mechanism. On a cold boot, the platform
546must place any secondary CPUs into a safe state while the primary CPU executes
547a modified BL3-1 initialization, as described below.
548
549#### Architectural initialization
550
551As the first image to execute in this configuration BL3-1 must ensure that
552interconnect coherency is enabled (if required) before enabling the MMU.
553
554#### Platform initialization
555
556In this configuration, when the CPU resets to BL3-1 there are no parameters
557that can be passed in registers by previous boot stages. Instead, the platform
558code in BL3-1 needs to know, or be able to determine, the location of the BL3-2
559(if required) and BL3-3 images and provide this information in response to the
560`bl31_plat_get_next_image_ep_info()` function.
561
562As the first image to execute in this configuration BL3-1 must also ensure that
563any security initialisation, for example programming a TrustZone address space
564controller, is carried out during early platform initialisation.
565
566
5673.  EL3 runtime services framework
568----------------------------------
569
570Software executing in the non-secure state and in the secure state at exception
571levels lower than EL3 will request runtime services using the Secure Monitor
572Call (SMC) instruction. These requests will follow the convention described in
573the SMC Calling Convention PDD ([SMCCC]). The [SMCCC] assigns function
574identifiers to each SMC request and describes how arguments are passed and
575returned.
576
577The EL3 runtime services framework enables the development of services by
578different providers that can be easily integrated into final product firmware.
579The following sections describe the framework which facilitates the
580registration, initialization and use of runtime services in EL3 Runtime
581Firmware (BL3-1).
582
583The design of the runtime services depends heavily on the concepts and
584definitions described in the [SMCCC], in particular SMC Function IDs, Owning
585Entity Numbers (OEN), Fast and Standard calls, and the SMC32 and SMC64 calling
586conventions. Please refer to that document for more detailed explanation of
587these terms.
588
589The following runtime services are expected to be implemented first. They have
590not all been instantiated in the current implementation.
591
5921.  Standard service calls
593
594    This service is for management of the entire system. The Power State
595    Coordination Interface ([PSCI]) is the first set of standard service calls
596    defined by ARM (see PSCI section later).
597
598    NOTE: Currently this service is called PSCI since there are no other
599    defined standard service calls.
600
6012.  Secure-EL1 Payload Dispatcher service
602
603    If a system runs a Trusted OS or other Secure-EL1 Payload (SP) then
604    it also requires a _Secure Monitor_ at EL3 to switch the EL1 processor
605    context between the normal world (EL1/EL2) and trusted world (Secure-EL1).
606    The Secure Monitor will make these world switches in response to SMCs. The
607    [SMCCC] provides for such SMCs with the Trusted OS Call and Trusted
608    Application Call OEN ranges.
609
610    The interface between the EL3 Runtime Firmware and the Secure-EL1 Payload is
611    not defined by the [SMCCC] or any other standard. As a result, each
612    Secure-EL1 Payload requires a specific Secure Monitor that runs as a runtime
613    service - within ARM Trusted Firmware this service is referred to as the
614    Secure-EL1 Payload Dispatcher (SPD).
615
616    ARM Trusted Firmware provides a Test Secure-EL1 Payload (TSP) and its
617    associated Dispatcher (TSPD). Details of SPD design and TSP/TSPD operation
618    are described in the "Secure-EL1 Payloads and Dispatchers" section below.
619
6203.  CPU implementation service
621
622    This service will provide an interface to CPU implementation specific
623    services for a given platform e.g. access to processor errata workarounds.
624    This service is currently unimplemented.
625
626Additional services for ARM Architecture, SiP and OEM calls can be implemented.
627Each implemented service handles a range of SMC function identifiers as
628described in the [SMCCC].
629
630
631### Registration
632
633A runtime service is registered using the `DECLARE_RT_SVC()` macro, specifying
634the name of the service, the range of OENs covered, the type of service and
635initialization and call handler functions. This macro instantiates a `const
636struct rt_svc_desc` for the service with these details (see `runtime_svc.h`).
637This structure is allocated in a special ELF section `rt_svc_descs`, enabling
638the framework to find all service descriptors included into BL3-1.
639
640The specific service for a SMC Function is selected based on the OEN and call
641type of the Function ID, and the framework uses that information in the service
642descriptor to identify the handler for the SMC Call.
643
644The service descriptors do not include information to identify the precise set
645of SMC function identifiers supported by this service implementation, the
646security state from which such calls are valid nor the capability to support
64764-bit and/or 32-bit callers (using SMC32 or SMC64). Responding appropriately
648to these aspects of a SMC call is the responsibility of the service
649implementation, the framework is focused on integration of services from
650different providers and minimizing the time taken by the framework before the
651service handler is invoked.
652
653Details of the parameters, requirements and behavior of the initialization and
654call handling functions are provided in the following sections.
655
656
657### Initialization
658
659`runtime_svc_init()` in `runtime_svc.c` initializes the runtime services
660framework running on the primary CPU during cold boot as part of the BL3-1
661initialization. This happens prior to initializing a Trusted OS and running
662Normal world boot firmware that might in turn use these services.
663Initialization involves validating each of the declared runtime service
664descriptors, calling the service initialization function and populating the
665index used for runtime lookup of the service.
666
667The BL3-1 linker script collects all of the declared service descriptors into a
668single array and defines symbols that allow the framework to locate and traverse
669the array, and determine its size.
670
671The framework does basic validation of each descriptor to halt firmware
672initialization if service declaration errors are detected. The framework does
673not check descriptors for the following error conditions, and may behave in an
674unpredictable manner under such scenarios:
675
6761.  Overlapping OEN ranges
6772.  Multiple descriptors for the same range of OENs and `call_type`
6783.  Incorrect range of owning entity numbers for a given `call_type`
679
680Once validated, the service `init()` callback is invoked. This function carries
681out any essential EL3 initialization before servicing requests. The `init()`
682function is only invoked on the primary CPU during cold boot. If the service
683uses per-CPU data this must either be initialized for all CPUs during this call,
684or be done lazily when a CPU first issues an SMC call to that service. If
685`init()` returns anything other than `0`, this is treated as an initialization
686error and the service is ignored: this does not cause the firmware to halt.
687
688The OEN and call type fields present in the SMC Function ID cover a total of
689128 distinct services, but in practice a single descriptor can cover a range of
690OENs, e.g. SMCs to call a Trusted OS function. To optimize the lookup of a
691service handler, the framework uses an array of 128 indices that map every
692distinct OEN/call-type combination either to one of the declared services or to
693indicate the service is not handled. This `rt_svc_descs_indices[]` array is
694populated for all of the OENs covered by a service after the service `init()`
695function has reported success. So a service that fails to initialize will never
696have it's `handle()` function invoked.
697
698The following figure shows how the `rt_svc_descs_indices[]` index maps the SMC
699Function ID call type and OEN onto a specific service handler in the
700`rt_svc_descs[]` array.
701
702![Image 1](diagrams/rt-svc-descs-layout.png?raw=true)
703
704
705### Handling an SMC
706
707When the EL3 runtime services framework receives a Secure Monitor Call, the SMC
708Function ID is passed in W0 from the lower exception level (as per the
709[SMCCC]). If the calling register width is AArch32, it is invalid to invoke an
710SMC Function which indicates the SMC64 calling convention: such calls are
711ignored and return the Unknown SMC Function Identifier result code `0xFFFFFFFF`
712in R0/X0.
713
714Bit[31] (fast/standard call) and bits[29:24] (owning entity number) of the SMC
715Function ID are combined to index into the `rt_svc_descs_indices[]` array. The
716resulting value might indicate a service that has no handler, in this case the
717framework will also report an Unknown SMC Function ID. Otherwise, the value is
718used as a further index into the `rt_svc_descs[]` array to locate the required
719service and handler.
720
721The service's `handle()` callback is provided with five of the SMC parameters
722directly, the others are saved into memory for retrieval (if needed) by the
723handler. The handler is also provided with an opaque `handle` for use with the
724supporting library for parameter retrieval, setting return values and context
725manipulation; and with `flags` indicating the security state of the caller. The
726framework finally sets up the execution stack for the handler, and invokes the
727services `handle()` function.
728
729On return from the handler the result registers are populated in X0-X3 before
730restoring the stack and CPU state and returning from the original SMC.
731
732
7334.  Power State Coordination Interface
734--------------------------------------
735
736TODO: Provide design walkthrough of PSCI implementation.
737
738The PSCI v1.0 specification categorizes APIs as optional and mandatory. All the
739mandatory APIs in PSCI v1.0 and all the APIs in PSCI v0.2 draft specification
740[Power State Coordination Interface PDD] [PSCI] are implemented. The table lists
741the PSCI v1.0 APIs and their support in generic code.
742
743An API implementation might have a dependency on platform code e.g. CPU_SUSPEND
744requires the platform to export a part of the implementation. Hence the level
745of support of the mandatory APIs depends upon the support exported by the
746platform port as well. The Juno and FVP (all variants) platforms export all the
747required support.
748
749| PSCI v1.0 API         |Supported| Comments                                  |
750|:----------------------|:--------|:------------------------------------------|
751|`PSCI_VERSION`         | Yes     | The version returned is 1.0               |
752|`CPU_SUSPEND`          | Yes*    | The original `power_state` format is used |
753|`CPU_OFF`              | Yes*    |                                           |
754|`CPU_ON`               | Yes*    |                                           |
755|`AFFINITY_INFO`        | Yes     |                                           |
756|`MIGRATE`              | Yes**   |                                           |
757|`MIGRATE_INFO_TYPE`    | Yes**   |                                           |
758|`MIGRATE_INFO_CPU`     | Yes**   |                                           |
759|`SYSTEM_OFF`           | Yes*    |                                           |
760|`SYSTEM_RESET`         | Yes*    |                                           |
761|`PSCI_FEATURES`        | Yes     |                                           |
762|`CPU_FREEZE`           | No      |                                           |
763|`CPU_DEFAULT_SUSPEND`  | No      |                                           |
764|`CPU_HW_STATE`         | No      |                                           |
765|`SYSTEM_SUSPEND`       | Yes*    |                                           |
766|`PSCI_SET_SUSPEND_MODE`| No      |                                           |
767|`PSCI_STAT_RESIDENCY`  | No      |                                           |
768|`PSCI_STAT_COUNT`      | No      |                                           |
769
770*Note : These PSCI APIs require platform power management hooks to be
771registered with the generic PSCI code to be supported.
772
773**Note : These PSCI APIs require appropriate Secure Payload Dispatcher
774hooks to be registered with the generic PSCI code to be supported.
775
776
7775.  Secure-EL1 Payloads and Dispatchers
778---------------------------------------
779
780On a production system that includes a Trusted OS running in Secure-EL1/EL0,
781the Trusted OS is coupled with a companion runtime service in the BL3-1
782firmware. This service is responsible for the initialisation of the Trusted
783OS and all communications with it. The Trusted OS is the BL3-2 stage of the
784boot flow in ARM Trusted Firmware. The firmware will attempt to locate, load
785and execute a BL3-2 image.
786
787ARM Trusted Firmware uses a more general term for the BL3-2 software that runs
788at Secure-EL1 - the _Secure-EL1 Payload_ - as it is not always a Trusted OS.
789
790The ARM Trusted Firmware provides a Test Secure-EL1 Payload (TSP) and a Test
791Secure-EL1 Payload Dispatcher (TSPD) service as an example of how a Trusted OS
792is supported on a production system using the Runtime Services Framework. On
793such a system, the Test BL3-2 image and service are replaced by the Trusted OS
794and its dispatcher service. The ARM Trusted Firmware build system expects that
795the dispatcher will define the build flag `NEED_BL32` to enable it to include
796the BL3-2 in the build either as a binary or to compile from source depending
797on whether the `BL32` build option is specified or not.
798
799The TSP runs in Secure-EL1. It is designed to demonstrate synchronous
800communication with the normal-world software running in EL1/EL2. Communication
801is initiated by the normal-world software
802
803*   either directly through a Fast SMC (as defined in the [SMCCC])
804
805*   or indirectly through a [PSCI] SMC. The [PSCI] implementation in turn
806    informs the TSPD about the requested power management operation. This allows
807    the TSP to prepare for or respond to the power state change
808
809The TSPD service is responsible for.
810
811*   Initializing the TSP
812
813*   Routing requests and responses between the secure and the non-secure
814    states during the two types of communications just described
815
816### Initializing a BL3-2 Image
817
818The Secure-EL1 Payload Dispatcher (SPD) service is responsible for initializing
819the BL3-2 image. It needs access to the information passed by BL2 to BL3-1 to do
820so. This is provided by:
821
822    entry_point_info_t *bl31_plat_get_next_image_ep_info(uint32_t);
823
824which returns a reference to the `entry_point_info` structure corresponding to
825the image which will be run in the specified security state. The SPD uses this
826API to get entry point information for the SECURE image, BL3-2.
827
828In the absence of a BL3-2 image, BL3-1 passes control to the normal world
829bootloader image (BL3-3). When the BL3-2 image is present, it is typical
830that the SPD wants control to be passed to BL3-2 first and then later to BL3-3.
831
832To do this the SPD has to register a BL3-2 initialization function during
833initialization of the SPD service. The BL3-2 initialization function has this
834prototype:
835
836    int32_t init();
837
838and is registered using the `bl31_register_bl32_init()` function.
839
840Trusted Firmware supports two approaches for the SPD to pass control to BL3-2
841before returning through EL3 and running the non-trusted firmware (BL3-3):
842
8431.  In the BL3-2 setup function, use `bl31_set_next_image_type()` to
844    request that the exit from `bl31_main()` is to the BL3-2 entrypoint in
845    Secure-EL1. BL3-1 will exit to BL3-2 using the asynchronous method by
846    calling bl31_prepare_next_image_entry() and el3_exit().
847
848    When the BL3-2 has completed initialization at Secure-EL1, it returns to
849    BL3-1 by issuing an SMC, using a Function ID allocated to the SPD. On
850    receipt of this SMC, the SPD service handler should switch the CPU context
851    from trusted to normal world and use the `bl31_set_next_image_type()` and
852    `bl31_prepare_next_image_entry()` functions to set up the initial return to
853    the normal world firmware BL3-3. On return from the handler the framework
854    will exit to EL2 and run BL3-3.
855
8562.  The BL3-2 setup function registers a initialization function using
857    `bl31_register_bl32_init()` which provides a SPD-defined mechanism to
858    invoke a 'world-switch synchronous call' to Secure-EL1 to run the BL3-2
859    entrypoint.
860    NOTE: The Test SPD service included with the Trusted Firmware provides one
861    implementation of such a mechanism.
862
863    On completion BL3-2 returns control to BL3-1 via a SMC, and on receipt the
864    SPD service handler invokes the synchronous call return mechanism to return
865    to the BL3-2 initialization function. On return from this function,
866    `bl31_main()` will set up the return to the normal world firmware BL3-3 and
867    continue the boot process in the normal world.
868
869
8706.  Crash Reporting in BL3-1
871----------------------------
872
873The BL3-1 implements a scheme for reporting the processor state when an unhandled
874exception is encountered. The reporting mechanism attempts to preserve all the
875register contents and report it via the default serial output. The general purpose
876registers, EL3, Secure EL1 and some EL2 state registers are reported.
877
878A dedicated per-CPU crash stack is maintained by BL3-1 and this is retrieved via
879the per-CPU pointer cache. The implementation attempts to minimise the memory
880required for this feature. The file `crash_reporting.S` contains the
881implementation for crash reporting.
882
883The sample crash output is shown below.
884
885    x0	:0x000000004F00007C
886    x1	:0x0000000007FFFFFF
887    x2	:0x0000000004014D50
888    x3	:0x0000000000000000
889    x4	:0x0000000088007998
890    x5	:0x00000000001343AC
891    x6	:0x0000000000000016
892    x7	:0x00000000000B8A38
893    x8	:0x00000000001343AC
894    x9	:0x00000000000101A8
895    x10	:0x0000000000000002
896    x11	:0x000000000000011C
897    x12	:0x00000000FEFDC644
898    x13	:0x00000000FED93FFC
899    x14	:0x0000000000247950
900    x15	:0x00000000000007A2
901    x16	:0x00000000000007A4
902    x17	:0x0000000000247950
903    x18	:0x0000000000000000
904    x19	:0x00000000FFFFFFFF
905    x20	:0x0000000004014D50
906    x21	:0x000000000400A38C
907    x22	:0x0000000000247950
908    x23	:0x0000000000000010
909    x24	:0x0000000000000024
910    x25	:0x00000000FEFDC868
911    x26	:0x00000000FEFDC86A
912    x27	:0x00000000019EDEDC
913    x28	:0x000000000A7CFDAA
914    x29	:0x0000000004010780
915    x30	:0x000000000400F004
916    scr_el3	:0x0000000000000D3D
917    sctlr_el3	:0x0000000000C8181F
918    cptr_el3	:0x0000000000000000
919    tcr_el3	:0x0000000080803520
920    daif	:0x00000000000003C0
921    mair_el3	:0x00000000000004FF
922    spsr_el3	:0x00000000800003CC
923    elr_el3	:0x000000000400C0CC
924    ttbr0_el3	:0x00000000040172A0
925    esr_el3	:0x0000000096000210
926    sp_el3	:0x0000000004014D50
927    far_el3	:0x000000004F00007C
928    spsr_el1	:0x0000000000000000
929    elr_el1	:0x0000000000000000
930    spsr_abt	:0x0000000000000000
931    spsr_und	:0x0000000000000000
932    spsr_irq	:0x0000000000000000
933    spsr_fiq	:0x0000000000000000
934    sctlr_el1	:0x0000000030C81807
935    actlr_el1	:0x0000000000000000
936    cpacr_el1	:0x0000000000300000
937    csselr_el1	:0x0000000000000002
938    sp_el1	:0x0000000004028800
939    esr_el1	:0x0000000000000000
940    ttbr0_el1	:0x000000000402C200
941    ttbr1_el1	:0x0000000000000000
942    mair_el1	:0x00000000000004FF
943    amair_el1	:0x0000000000000000
944    tcr_el1	:0x0000000000003520
945    tpidr_el1	:0x0000000000000000
946    tpidr_el0	:0x0000000000000000
947    tpidrro_el0	:0x0000000000000000
948    dacr32_el2	:0x0000000000000000
949    ifsr32_el2	:0x0000000000000000
950    par_el1	:0x0000000000000000
951    far_el1	:0x0000000000000000
952    afsr0_el1	:0x0000000000000000
953    afsr1_el1	:0x0000000000000000
954    contextidr_el1	:0x0000000000000000
955    vbar_el1	:0x0000000004027000
956    cntp_ctl_el0	:0x0000000000000000
957    cntp_cval_el0	:0x0000000000000000
958    cntv_ctl_el0	:0x0000000000000000
959    cntv_cval_el0	:0x0000000000000000
960    cntkctl_el1	:0x0000000000000000
961    fpexc32_el2	:0x0000000004000700
962    sp_el0	:0x0000000004010780
963
9647.  Guidelines for Reset Handlers
965---------------------------------
966
967Trusted Firmware implements a framework that allows CPU and platform ports to
968perform actions immediately after a CPU is released from reset in both the cold
969and warm boot paths. This is done by calling the `reset_handler()` function in
970both the BL1 and BL3-1 images. It in turn calls the platform and CPU specific
971reset handling functions.
972
973Details for implementing a CPU specific reset handler can be found in
974Section 8. Details for implementing a platform specific reset handler can be
975found in the [Porting Guide](see the `plat_reset_handler()` function).
976
977When adding functionality to a reset handler, the following points should be
978kept in mind.
979
9801.   The first reset handler in the system exists either in a ROM image
981     (e.g. BL1), or BL3-1 if `RESET_TO_BL31` is true. This may be detected at
982     compile time using the constant `FIRST_RESET_HANDLER_CALL`.
983
9842.   When considering ROM images, it's important to consider non TF-based ROMs
985     and ROMs based on previous versions of the TF code.
986
9873.   If the functionality should be applied to a ROM and there is no possibility
988     of a ROM being used that does not apply the functionality (or equivalent),
989     then the functionality should be applied within a `#if
990     FIRST_RESET_HANDLER_CALL` block.
991
9924.   If the functionality should execute in BL3-1 in order to override or
993     supplement a ROM version of the functionality, then the functionality
994     should be applied in the `#else` part of a `#if FIRST_RESET_HANDLER_CALL`
995     block.
996
9975.   If the functionality should be applied to a ROM but there is a possibility
998     of ROMs being used that do not apply the functionality, then the
999     functionality should be applied outside of a `FIRST_RESET_HANDLER_CALL`
1000     block, so that BL3-1 has an opportunity to apply the functionality instead.
1001     In this case, additional code may be needed to cope with different ROMs
1002     that do or do not apply the functionality.
1003
1004
10058.  CPU specific operations framework
1006-----------------------------
1007
1008Certain aspects of the ARMv8 architecture are implementation defined,
1009that is, certain behaviours are not architecturally defined, but must be defined
1010and documented by individual processor implementations. The ARM Trusted
1011Firmware implements a framework which categorises the common implementation
1012defined behaviours and allows a processor to export its implementation of that
1013behaviour. The categories are:
1014
10151.  Processor specific reset sequence.
1016
10172.  Processor specific power down sequences.
1018
10193.  Processor specific register dumping as a part of crash reporting.
1020
1021Each of the above categories fulfils a different requirement.
1022
10231.  allows any processor specific initialization before the caches and MMU
1024    are turned on, like implementation of errata workarounds, entry into
1025    the intra-cluster coherency domain etc.
1026
10272.  allows each processor to implement the power down sequence mandated in
1028    its Technical Reference Manual (TRM).
1029
10303.  allows a processor to provide additional information to the developer
1031    in the event of a crash, for example Cortex-A53 has registers which
1032    can expose the data cache contents.
1033
1034Please note that only 2. is mandated by the TRM.
1035
1036The CPU specific operations framework scales to accommodate a large number of
1037different CPUs during power down and reset handling. The platform can specify
1038any CPU optimization it wants to enable for each CPU. It can also specify
1039the CPU errata workarounds to be applied for each CPU type during reset
1040handling by defining CPU errata compile time macros. Details on these macros
1041can be found in the [cpu-specific-build-macros.md][CPUBM] file.
1042
1043The CPU specific operations framework depends on the `cpu_ops` structure which
1044needs to be exported for each type of CPU in the platform. It is defined in
1045`include/lib/cpus/aarch64/cpu_macros.S` and has the following fields : `midr`,
1046`reset_func()`, `core_pwr_dwn()`, `cluster_pwr_dwn()` and `cpu_reg_dump()`.
1047
1048The CPU specific files in `lib/cpus` export a `cpu_ops` data structure with
1049suitable handlers for that CPU.  For example, `lib/cpus/cortex_a53.S` exports
1050the `cpu_ops` for Cortex-A53 CPU. According to the platform configuration,
1051these CPU specific files must must be included in the build by the platform
1052makefile. The generic CPU specific operations framework code exists in
1053`lib/cpus/aarch64/cpu_helpers.S`.
1054
1055### CPU specific Reset Handling
1056
1057After a reset, the state of the CPU when it calls generic reset handler is:
1058MMU turned off, both instruction and data caches turned off and not part
1059of any coherency domain.
1060
1061The BL entrypoint code first invokes the `plat_reset_handler()` to allow
1062the platform to perform any system initialization required and any system
1063errata workarounds that needs to be applied. The `get_cpu_ops_ptr()` reads
1064the current CPU midr, finds the matching `cpu_ops` entry in the `cpu_ops`
1065array and returns it. Note that only the part number and implementer fields
1066in midr are used to find the matching `cpu_ops` entry. The `reset_func()` in
1067the returned `cpu_ops` is then invoked which executes the required reset
1068handling for that CPU and also any errata workarounds enabled by the platform.
1069This function must preserve the values of general purpose registers x20 to x29.
1070
1071Refer to Section "Guidelines for Reset Handlers" for general guidelines
1072regarding placement of code in a reset handler.
1073
1074### CPU specific power down sequence
1075
1076During the BL3-1 initialization sequence, the pointer to the matching `cpu_ops`
1077entry is stored in per-CPU data by `init_cpu_ops()` so that it can be quickly
1078retrieved during power down sequences.
1079
1080The PSCI service, upon receiving a power down request, determines the highest
1081affinity level at which to execute power down sequence for a particular CPU and
1082invokes the corresponding 'prepare' power down handler in the CPU specific
1083operations framework. For example, when a CPU executes a power down for affinity
1084level 0, the `prepare_core_pwr_dwn()` retrieves the `cpu_ops` pointer from the
1085per-CPU data and the corresponding `core_pwr_dwn()` is invoked. Similarly when
1086a CPU executes power down at affinity level 1, the `prepare_cluster_pwr_dwn()`
1087retrieves the `cpu_ops` pointer and the corresponding `cluster_pwr_dwn()` is
1088invoked.
1089
1090At runtime the platform hooks for power down are invoked by the PSCI service to
1091perform platform specific operations during a power down sequence, for example
1092turning off CCI coherency during a cluster power down.
1093
1094### CPU specific register reporting during crash
1095
1096If the crash reporting is enabled in BL3-1, when a crash occurs, the crash
1097reporting framework calls `do_cpu_reg_dump` which retrieves the matching
1098`cpu_ops` using `get_cpu_ops_ptr()` function. The `cpu_reg_dump()` in
1099`cpu_ops` is invoked, which then returns the CPU specific register values to
1100be reported and a pointer to the ASCII list of register names in a format
1101expected by the crash reporting framework.
1102
1103
11049. Memory layout of BL images
1105-----------------------------
1106
1107Each bootloader image can be divided in 2 parts:
1108
1109 *    the static contents of the image. These are data actually stored in the
1110      binary on the disk. In the ELF terminology, they are called `PROGBITS`
1111      sections;
1112
1113 *    the run-time contents of the image. These are data that don't occupy any
1114      space in the binary on the disk. The ELF binary just contains some
1115      metadata indicating where these data will be stored at run-time and the
1116      corresponding sections need to be allocated and initialized at run-time.
1117      In the ELF terminology, they are called `NOBITS` sections.
1118
1119All PROGBITS sections are grouped together at the beginning of the image,
1120followed by all NOBITS sections. This is true for all Trusted Firmware images
1121and it is governed by the linker scripts. This ensures that the raw binary
1122images are as small as possible. If a NOBITS section would sneak in between
1123PROGBITS sections then the resulting binary file would contain a bunch of zero
1124bytes at the location of this NOBITS section, making the image unnecessarily
1125bigger. Smaller images allow faster loading from the FIP to the main memory.
1126
1127### Linker scripts and symbols
1128
1129Each bootloader stage image layout is described by its own linker script. The
1130linker scripts export some symbols into the program symbol table. Their values
1131correspond to particular addresses. The trusted firmware code can refer to these
1132symbols to figure out the image memory layout.
1133
1134Linker symbols follow the following naming convention in the trusted firmware.
1135
1136*   `__<SECTION>_START__`
1137
1138    Start address of a given section named `<SECTION>`.
1139
1140*   `__<SECTION>_END__`
1141
1142    End address of a given section named `<SECTION>`. If there is an alignment
1143    constraint on the section's end address then `__<SECTION>_END__` corresponds
1144    to the end address of the section's actual contents, rounded up to the right
1145    boundary. Refer to the value of `__<SECTION>_UNALIGNED_END__`  to know the
1146    actual end address of the section's contents.
1147
1148*   `__<SECTION>_UNALIGNED_END__`
1149
1150    End address of a given section named `<SECTION>` without any padding or
1151    rounding up due to some alignment constraint.
1152
1153*   `__<SECTION>_SIZE__`
1154
1155    Size (in bytes) of a given section named `<SECTION>`. If there is an
1156    alignment constraint on the section's end address then `__<SECTION>_SIZE__`
1157    corresponds to the size of the section's actual contents, rounded up to the
1158    right boundary. In other words, `__<SECTION>_SIZE__ = __<SECTION>_END__ -
1159    _<SECTION>_START__`. Refer to the value of `__<SECTION>_UNALIGNED_SIZE__`
1160    to know the actual size of the section's contents.
1161
1162*   `__<SECTION>_UNALIGNED_SIZE__`
1163
1164    Size (in bytes) of a given section named `<SECTION>` without any padding or
1165    rounding up due to some alignment constraint. In other words,
1166    `__<SECTION>_UNALIGNED_SIZE__ = __<SECTION>_UNALIGNED_END__ -
1167    __<SECTION>_START__`.
1168
1169Some of the linker symbols are mandatory as the trusted firmware code relies on
1170them to be defined. They are listed in the following subsections. Some of them
1171must be provided for each bootloader stage and some are specific to a given
1172bootloader stage.
1173
1174The linker scripts define some extra, optional symbols. They are not actually
1175used by any code but they help in understanding the bootloader images' memory
1176layout as they are easy to spot in the link map files.
1177
1178#### Common linker symbols
1179
1180Early setup code needs to know the extents of the BSS section to zero-initialise
1181it before executing any C code. The following linker symbols are defined for
1182this purpose:
1183
1184* `__BSS_START__` This address must be aligned on a 16-byte boundary.
1185* `__BSS_SIZE__`
1186
1187Similarly, the coherent memory section (if enabled) must be zero-initialised.
1188Also, the MMU setup code needs to know the extents of this section to set the
1189right memory attributes for it. The following linker symbols are defined for
1190this purpose:
1191
1192* `__COHERENT_RAM_START__` This address must be aligned on a page-size boundary.
1193* `__COHERENT_RAM_END__` This address must be aligned on a page-size boundary.
1194* `__COHERENT_RAM_UNALIGNED_SIZE__`
1195
1196#### BL1's linker symbols
1197
1198BL1's early setup code needs to know the extents of the .data section to
1199relocate it from ROM to RAM before executing any C code. The following linker
1200symbols are defined for this purpose:
1201
1202* `__DATA_ROM_START__` This address must be aligned on a 16-byte boundary.
1203* `__DATA_RAM_START__` This address must be aligned on a 16-byte boundary.
1204* `__DATA_SIZE__`
1205
1206BL1's platform setup code needs to know the extents of its read-write data
1207region to figure out its memory layout. The following linker symbols are defined
1208for this purpose:
1209
1210* `__BL1_RAM_START__` This is the start address of BL1 RW data.
1211* `__BL1_RAM_END__` This is the end address of BL1 RW data.
1212
1213#### BL2's, BL3-1's and TSP's linker symbols
1214
1215BL2, BL3-1 and TSP need to know the extents of their read-only section to set
1216the right memory attributes for this memory region in their MMU setup code. The
1217following linker symbols are defined for this purpose:
1218
1219* `__RO_START__`
1220* `__RO_END__`
1221
1222### How to choose the right base addresses for each bootloader stage image
1223
1224There is currently no support for dynamic image loading in the Trusted Firmware.
1225This means that all bootloader images need to be linked against their ultimate
1226runtime locations and the base addresses of each image must be chosen carefully
1227such that images don't overlap each other in an undesired way. As the code
1228grows, the base addresses might need adjustments to cope with the new memory
1229layout.
1230
1231The memory layout is completely specific to the platform and so there is no
1232general recipe for choosing the right base addresses for each bootloader image.
1233However, there are tools to aid in understanding the memory layout. These are
1234the link map files: `build/<platform>/<build-type>/bl<x>/bl<x>.map`, with `<x>`
1235being the stage bootloader. They provide a detailed view of the memory usage of
1236each image. Among other useful information, they provide the end address of
1237each image.
1238
1239* `bl1.map` link map file provides `__BL1_RAM_END__` address.
1240* `bl2.map` link map file provides `__BL2_END__` address.
1241* `bl31.map` link map file provides `__BL31_END__` address.
1242* `bl32.map` link map file provides `__BL32_END__` address.
1243
1244For each bootloader image, the platform code must provide its start address
1245as well as a limit address that it must not overstep. The latter is used in the
1246linker scripts to check that the image doesn't grow past that address. If that
1247happens, the linker will issue a message similar to the following:
1248
1249    aarch64-none-elf-ld: BLx has exceeded its limit.
1250
1251Additionally, if the platform memory layout implies some image overlaying like
1252on FVP, BL3-1 and TSP need to know the limit address that their PROGBITS
1253sections must not overstep. The platform code must provide those.
1254
1255
1256####  Memory layout on ARM FVPs
1257
1258The following list describes the memory layout on the FVP:
1259
1260*   A 4KB page of shared memory is used to store the entrypoint mailboxes
1261    and the parameters passed between bootloaders. The shared memory is located
1262    at the base of the Trusted SRAM. The amount of Trusted SRAM available to
1263    load the bootloader images will be reduced by the size of the shared memory.
1264
1265*   BL1 is originally sitting in the Trusted ROM at address `0x0`. Its
1266    read-write data are relocated at the top of the Trusted SRAM at runtime.
1267
1268*   BL3-1 is loaded at the top of the Trusted SRAM, such that its NOBITS
1269    sections will overwrite BL1 R/W data.
1270
1271*   BL2 is loaded below BL3-1.
1272
1273*   BL3-2 can be loaded in one of the following locations:
1274
1275    *   Trusted SRAM
1276    *   Trusted DRAM
1277    *   Secure region of DRAM (top 16MB of DRAM configured by the TrustZone
1278        controller)
1279
1280When BL3-2 is loaded into Trusted SRAM, its NOBITS sections are allowed to
1281overlay BL2. This memory layout is designed to give the BL3-2 image as much
1282memory as possible when it is loaded into Trusted SRAM.
1283
1284The location of the BL3-2 image will result in different memory maps. This is
1285illustrated in the following diagrams using the TSP as an example.
1286
1287**TSP in Trusted SRAM (default option):**
1288
1289               Trusted SRAM
1290    0x04040000 +----------+  loaded by BL2  ------------------
1291               | BL1 (rw) |  <<<<<<<<<<<<<  |  BL3-1 NOBITS  |
1292               |----------|  <<<<<<<<<<<<<  |----------------|
1293               |          |  <<<<<<<<<<<<<  | BL3-1 PROGBITS |
1294               |----------|                 ------------------
1295               |   BL2    |  <<<<<<<<<<<<<  |  BL3-2 NOBITS  |
1296               |----------|  <<<<<<<<<<<<<  |----------------|
1297               |          |  <<<<<<<<<<<<<  | BL3-2 PROGBITS |
1298    0x04001000 +----------+                 ------------------
1299               |  Shared  |
1300    0x04000000 +----------+
1301
1302               Trusted ROM
1303    0x04000000 +----------+
1304               | BL1 (ro) |
1305    0x00000000 +----------+
1306
1307
1308**TSP in Trusted DRAM:**
1309
1310               Trusted DRAM
1311    0x08000000 +----------+
1312               |  BL3-2   |
1313    0x06000000 +----------+
1314
1315               Trusted SRAM
1316    0x04040000 +----------+  loaded by BL2  ------------------
1317               | BL1 (rw) |  <<<<<<<<<<<<<  |  BL3-1 NOBITS  |
1318               |----------|  <<<<<<<<<<<<<  |----------------|
1319               |          |  <<<<<<<<<<<<<  | BL3-1 PROGBITS |
1320               |----------|                 ------------------
1321               |   BL2    |
1322               |----------|
1323               |          |
1324    0x04001000 +----------+
1325               |  Shared  |
1326    0x04000000 +----------+
1327
1328               Trusted ROM
1329    0x04000000 +----------+
1330               | BL1 (ro) |
1331    0x00000000 +----------+
1332
1333**TSP in the TZC-Secured DRAM:**
1334
1335                   DRAM
1336    0xffffffff +----------+
1337               |  BL3-2   |  (secure)
1338    0xff000000 +----------+
1339               |          |
1340               :          :  (non-secure)
1341               |          |
1342    0x80000000 +----------+
1343
1344               Trusted SRAM
1345    0x04040000 +----------+  loaded by BL2  ------------------
1346               | BL1 (rw) |  <<<<<<<<<<<<<  |  BL3-1 NOBITS  |
1347               |----------|  <<<<<<<<<<<<<  |----------------|
1348               |          |  <<<<<<<<<<<<<  | BL3-1 PROGBITS |
1349               |----------|                 ------------------
1350               |   BL2    |
1351               |----------|
1352               |          |
1353    0x04001000 +----------+
1354               |  Shared  |
1355    0x04000000 +----------+
1356
1357               Trusted ROM
1358    0x04000000 +----------+
1359               | BL1 (ro) |
1360    0x00000000 +----------+
1361
1362Moving the TSP image out of the Trusted SRAM doesn't change the memory layout
1363of the other boot loader images in Trusted SRAM.
1364
1365
1366####  Memory layout on Juno ARM development platform
1367
1368The following list describes the memory layout on Juno:
1369
1370*   Trusted SRAM at 0x04000000 contains the MHU page, BL1 r/w section, BL2
1371    image, BL3-1 image and, optionally, the BL3-2 image.
1372
1373*   The MHU 4 KB page is used as communication channel between SCP and AP. It
1374    also contains the entrypoint mailboxes for the AP. Mailboxes are stored in
1375    the first 128 bytes of the MHU page.
1376
1377*   BL1 resides in flash memory at address `0x0BEC0000`. Its read-write data
1378    section is relocated to the top of the Trusted SRAM at runtime.
1379
1380*   BL3-1 is loaded at the top of the Trusted SRAM, such that its NOBITS
1381    sections will overwrite BL1 R/W data. This implies that BL1 global variables
1382    will remain valid only until execution reaches the BL3-1 entry point during
1383    a cold boot.
1384
1385*   BL2 is loaded below BL3-1.
1386
1387*   BL3-0 is loaded temporarily into the BL3-1 memory region and transfered to
1388    the SCP before being overwritten by BL3-1.
1389
1390*   The BL3-2 image is optional and can be loaded into one of these two
1391    locations: Trusted SRAM (right after the MHU page) or DRAM (14 MB starting
1392    at 0xFF000000 and secured by the TrustZone controller). When loaded into
1393    Trusted SRAM, its NOBITS sections are allowed to overlap BL2.
1394
1395Depending on the location of the BL3-2 image, it will result in different memory
1396maps, illustrated by the following diagrams.
1397
1398**BL3-2 in Trusted SRAM (default option):**
1399
1400                  Flash0
1401    0x0C000000 +----------+
1402               :          :
1403    0x0BED0000 |----------|
1404               | BL1 (ro) |
1405    0x0BEC0000 |----------|
1406               :          :
1407    0x08000000 +----------+                  BL3-1 is loaded
1408                                             after BL3-0 has
1409               Trusted SRAM                  been sent to SCP
1410    0x04040000 +----------+  loaded by BL2  ------------------
1411               | BL1 (rw) |  <<<<<<<<<<<<<  |  BL3-1 NOBITS  |
1412               |----------|  <<<<<<<<<<<<<  |----------------|
1413               |  BL3-0   |  <<<<<<<<<<<<<  | BL3-1 PROGBITS |
1414               |----------|                 ------------------
1415               |   BL2    |  <<<<<<<<<<<<<  |  BL3-2 NOBITS  |
1416               |----------|  <<<<<<<<<<<<<  |----------------|
1417               |          |  <<<<<<<<<<<<<  | BL3-2 PROGBITS |
1418    0x04001000 +----------+                 ------------------
1419               |   MHU    |
1420    0x04000000 +----------+
1421
1422
1423**BL3-2 in the secure region of DRAM:**
1424
1425                   DRAM
1426    0xFFE00000 +----------+
1427               |  BL3-2   |  (secure)
1428    0xFF000000 |----------|
1429               |          |
1430               :          :  (non-secure)
1431               |          |
1432    0x80000000 +----------+
1433
1434                  Flash0
1435    0x0C000000 +----------+
1436               :          :
1437    0x0BED0000 |----------|
1438               | BL1 (ro) |
1439    0x0BEC0000 |----------|
1440               :          :
1441    0x08000000 +----------+                  BL3-1 is loaded
1442                                             after BL3-0 has
1443               Trusted SRAM                  been sent to SCP
1444    0x04040000 +----------+  loaded by BL2  ------------------
1445               | BL1 (rw) |  <<<<<<<<<<<<<  |  BL3-1 NOBITS  |
1446               |----------|  <<<<<<<<<<<<<  |----------------|
1447               |  BL3-0   |  <<<<<<<<<<<<<  | BL3-1 PROGBITS |
1448               |----------|                 ------------------
1449               |   BL2    |
1450               |----------|
1451               |          |
1452    0x04001000 +----------+
1453               |   MHU    |
1454    0x04000000 +----------+
1455
1456Loading the BL3-2 image in DRAM doesn't change the memory layout of the other
1457images in Trusted SRAM.
1458
1459
146010.  Firmware Image Package (FIP)
1461---------------------------------
1462
1463Using a Firmware Image Package (FIP) allows for packing bootloader images (and
1464potentially other payloads) into a single archive that can be loaded by the ARM
1465Trusted Firmware from non-volatile platform storage. A driver to load images
1466from a FIP has been added to the storage layer and allows a package to be read
1467from supported platform storage. A tool to create Firmware Image Packages is
1468also provided and described below.
1469
1470### Firmware Image Package layout
1471
1472The FIP layout consists of a table of contents (ToC) followed by payload data.
1473The ToC itself has a header followed by one or more table entries. The ToC is
1474terminated by an end marker entry. All ToC entries describe some payload data
1475that has been appended to the end of the binary package. With the information
1476provided in the ToC entry the corresponding payload data can be retrieved.
1477
1478    ------------------
1479    | ToC Header     |
1480    |----------------|
1481    | ToC Entry 0    |
1482    |----------------|
1483    | ToC Entry 1    |
1484    |----------------|
1485    | ToC End Marker |
1486    |----------------|
1487    |                |
1488    |     Data 0     |
1489    |                |
1490    |----------------|
1491    |                |
1492    |     Data 1     |
1493    |                |
1494    ------------------
1495
1496The ToC header and entry formats are described in the header file
1497`include/firmware_image_package.h`. This file is used by both the tool and the
1498ARM Trusted firmware.
1499
1500The ToC header has the following fields:
1501    `name`: The name of the ToC. This is currently used to validate the header.
1502    `serial_number`: A non-zero number provided by the creation tool
1503    `flags`: Flags associated with this data. None are yet defined.
1504
1505A ToC entry has the following fields:
1506    `uuid`: All files are referred to by a pre-defined Universally Unique
1507        IDentifier [UUID] . The UUIDs are defined in
1508        `include/firmware_image_package`. The platform translates the requested
1509        image name into the corresponding UUID when accessing the package.
1510    `offset_address`: The offset address at which the corresponding payload data
1511        can be found. The offset is calculated from the ToC base address.
1512    `size`: The size of the corresponding payload data in bytes.
1513    `flags`: Flags associated with this entry. Non are yet defined.
1514
1515### Firmware Image Package creation tool
1516
1517The FIP creation tool can be used to pack specified images into a binary package
1518that can be loaded by the ARM Trusted Firmware from platform storage. The tool
1519currently only supports packing bootloader images. Additional image definitions
1520can be added to the tool as required.
1521
1522The tool can be found in `tools/fip_create`.
1523
1524### Loading from a Firmware Image Package (FIP)
1525
1526The Firmware Image Package (FIP) driver can load images from a binary package on
1527non-volatile platform storage. For the FVPs this is currently NOR FLASH.
1528
1529Bootloader images are loaded according to the platform policy as specified in
1530`plat/<platform>/plat_io_storage.c`. For the FVPs this means the platform will
1531attempt to load images from a Firmware Image Package located at the start of NOR
1532FLASH0.
1533
1534Currently the FVP's policy only allows loading of a known set of images. The
1535platform policy can be modified to allow additional images.
1536
1537
153811. Use of coherent memory in Trusted Firmware
1539----------------------------------------------
1540
1541There might be loss of coherency when physical memory with mismatched
1542shareability, cacheability and memory attributes is accessed by multiple CPUs
1543(refer to section B2.9 of [ARM ARM] for more details). This possibility occurs
1544in Trusted Firmware during power up/down sequences when coherency, MMU and
1545caches are turned on/off incrementally.
1546
1547Trusted Firmware defines coherent memory as a region of memory with Device
1548nGnRE attributes in the translation tables. The translation granule size in
1549Trusted Firmware is 4KB. This is the smallest possible size of the coherent
1550memory region.
1551
1552By default, all data structures which are susceptible to accesses with
1553mismatched attributes from various CPUs are allocated in a coherent memory
1554region (refer to section 2.1 of [Porting Guide]). The coherent memory region
1555accesses are Outer Shareable, non-cacheable and they can be accessed
1556with the Device nGnRE attributes when the MMU is turned on. Hence, at the
1557expense of at least an extra page of memory, Trusted Firmware is able to work
1558around coherency issues due to mismatched memory attributes.
1559
1560The alternative to the above approach is to allocate the susceptible data
1561structures in Normal WriteBack WriteAllocate Inner shareable memory. This
1562approach requires the data structures to be designed so that it is possible to
1563work around the issue of mismatched memory attributes by performing software
1564cache maintenance on them.
1565
1566### Disabling the use of coherent memory in Trusted Firmware
1567
1568It might be desirable to avoid the cost of allocating coherent memory on
1569platforms which are memory constrained. Trusted Firmware enables inclusion of
1570coherent memory in firmware images through the build flag `USE_COHERENT_MEM`.
1571This flag is enabled by default. It can be disabled to choose the second
1572approach described above.
1573
1574The below sections analyze the data structures allocated in the coherent memory
1575region and the changes required to allocate them in normal memory.
1576
1577### PSCI Affinity map nodes
1578
1579The `psci_aff_map` data structure stores the hierarchial node information for
1580each affinity level in the system including the PSCI states associated with them.
1581By default, this data structure is allocated in the coherent memory region in
1582the Trusted Firmware because it can be accessed by multiple CPUs, either with
1583their caches enabled or disabled.
1584
1585	typedef struct aff_map_node {
1586		unsigned long mpidr;
1587		unsigned char ref_count;
1588		unsigned char state;
1589		unsigned char level;
1590	#if USE_COHERENT_MEM
1591		bakery_lock_t lock;
1592	#else
1593		unsigned char aff_map_index;
1594	#endif
1595	} aff_map_node_t;
1596
1597In order to move this data structure to normal memory, the use of each of its
1598fields must be analyzed. Fields like `mpidr` and `level` are only written once
1599during cold boot. Hence removing them from coherent memory involves only doing
1600a clean and invalidate of the cache lines after these fields are written.
1601
1602The fields `state` and `ref_count` can be concurrently accessed by multiple
1603CPUs in different cache states. A Lamport's Bakery lock is used to ensure mutual
1604exlusion to these fields. As a result, it is possible to move these fields out
1605of coherent memory by performing software cache maintenance on them. The field
1606`lock` is the bakery lock data structure when `USE_COHERENT_MEM` is enabled.
1607The `aff_map_index` is used to identify the bakery lock when `USE_COHERENT_MEM`
1608is disabled.
1609
1610### Bakery lock data
1611
1612The bakery lock data structure `bakery_lock_t` is allocated in coherent memory
1613and is accessed by multiple CPUs with mismatched attributes. `bakery_lock_t` is
1614defined as follows:
1615
1616    typedef struct bakery_lock {
1617        int owner;
1618        volatile char entering[BAKERY_LOCK_MAX_CPUS];
1619        volatile unsigned number[BAKERY_LOCK_MAX_CPUS];
1620    } bakery_lock_t;
1621
1622It is a characteristic of Lamport's Bakery algorithm that the volatile per-CPU
1623fields can be read by all CPUs but only written to by the owning CPU.
1624
1625Depending upon the data cache line size, the per-CPU fields of the
1626`bakery_lock_t` structure for multiple CPUs may exist on a single cache line.
1627These per-CPU fields can be read and written during lock contention by multiple
1628CPUs with mismatched memory attributes. Since these fields are a part of the
1629lock implementation, they do not have access to any other locking primitive to
1630safeguard against the resulting coherency issues. As a result, simple software
1631cache maintenance is not enough to allocate them in coherent memory. Consider
1632the following example.
1633
1634CPU0 updates its per-CPU field with data cache enabled. This write updates a
1635local cache line which contains a copy of the fields for other CPUs as well. Now
1636CPU1 updates its per-CPU field of the `bakery_lock_t` structure with data cache
1637disabled. CPU1 then issues a DCIVAC operation to invalidate any stale copies of
1638its field in any other cache line in the system. This operation will invalidate
1639the update made by CPU0 as well.
1640
1641To use bakery locks when `USE_COHERENT_MEM` is disabled, the lock data structure
1642has been redesigned. The changes utilise the characteristic of Lamport's Bakery
1643algorithm mentioned earlier. The per-CPU fields of the new lock structure are
1644aligned such that they are allocated on separate cache lines. The per-CPU data
1645framework in Trusted Firmware is used to achieve this. This enables software to
1646perform software cache maintenance on the lock data structure without running
1647into coherency issues associated with mismatched attributes.
1648
1649The per-CPU data framework enables consolidation of data structures on the
1650fewest cache lines possible. This saves memory as compared to the scenario where
1651each data structure is separately aligned to the cache line boundary to achieve
1652the same effect.
1653
1654The bakery lock data structure `bakery_info_t` is defined for use when
1655`USE_COHERENT_MEM` is disabled as follows:
1656
1657    typedef struct bakery_info {
1658        /*
1659         * The lock_data is a bit-field of 2 members:
1660         * Bit[0]       : choosing. This field is set when the CPU is
1661         *                choosing its bakery number.
1662         * Bits[1 - 15] : number. This is the bakery number allocated.
1663         */
1664         volatile uint16_t lock_data;
1665    } bakery_info_t;
1666
1667The `bakery_info_t` represents a single per-CPU field of one lock and
1668the combination of corresponding `bakery_info_t` structures for all CPUs in the
1669system represents the complete bakery lock. It is embedded in the per-CPU
1670data framework `cpu_data` as shown below:
1671
1672      CPU0 cpu_data
1673    ------------------
1674    | ....           |
1675    |----------------|
1676    | `bakery_info_t`| <-- Lock_0 per-CPU field
1677    |    Lock_0      |     for CPU0
1678    |----------------|
1679    | `bakery_info_t`| <-- Lock_1 per-CPU field
1680    |    Lock_1      |     for CPU0
1681    |----------------|
1682    | ....           |
1683    |----------------|
1684    | `bakery_info_t`| <-- Lock_N per-CPU field
1685    |    Lock_N      |     for CPU0
1686    ------------------
1687
1688
1689      CPU1 cpu_data
1690    ------------------
1691    | ....           |
1692    |----------------|
1693    | `bakery_info_t`| <-- Lock_0 per-CPU field
1694    |    Lock_0      |     for CPU1
1695    |----------------|
1696    | `bakery_info_t`| <-- Lock_1 per-CPU field
1697    |    Lock_1      |     for CPU1
1698    |----------------|
1699    | ....           |
1700    |----------------|
1701    | `bakery_info_t`| <-- Lock_N per-CPU field
1702    |    Lock_N      |     for CPU1
1703    ------------------
1704
1705Consider a system of 2 CPUs with 'N' bakery locks as shown above.  For an
1706operation on Lock_N, the corresponding `bakery_info_t` in both CPU0 and CPU1
1707`cpu_data` need to be fetched and appropriate cache operations need to be
1708performed for each access.
1709
1710For multiple bakery locks, an array of `bakery_info_t` is declared in `cpu_data`
1711and each lock is given an `id` to identify it in the array.
1712
1713### Non Functional Impact of removing coherent memory
1714
1715Removal of the coherent memory region leads to the additional software overhead
1716of performing cache maintenance for the affected data structures. However, since
1717the memory where the data structures are allocated is cacheable, the overhead is
1718mostly mitigated by an increase in performance.
1719
1720There is however a performance impact for bakery locks, due to:
1721*   Additional cache maintenance operations, and
1722*   Multiple cache line reads for each lock operation, since the bakery locks
1723    for each CPU are distributed across different cache lines.
1724
1725The implementation has been optimized to mimimize this additional overhead.
1726Measurements indicate that when bakery locks are allocated in Normal memory, the
1727minimum latency of acquiring a lock is on an average 3-4 micro seconds whereas
1728in Device memory the same is 2 micro seconds. The measurements were done on the
1729Juno ARM development platform.
1730
1731As mentioned earlier, almost a page of memory can be saved by disabling
1732`USE_COHERENT_MEM`. Each platform needs to consider these trade-offs to decide
1733whether coherent memory should be used. If a platform disables
1734`USE_COHERENT_MEM` and needs to use bakery locks in the porting layer, it should
1735reserve memory in `cpu_data` by defining the macro `PLAT_PCPU_DATA_SIZE` (see
1736the [Porting Guide]). Refer to the reference platform code for examples.
1737
1738
173912.  Code Structure
1740-------------------
1741
1742Trusted Firmware code is logically divided between the three boot loader
1743stages mentioned in the previous sections. The code is also divided into the
1744following categories (present as directories in the source code):
1745
1746*   **Architecture specific.** This could be AArch32 or AArch64.
1747*   **Platform specific.** Choice of architecture specific code depends upon
1748    the platform.
1749*   **Common code.** This is platform and architecture agnostic code.
1750*   **Library code.** This code comprises of functionality commonly used by all
1751    other code.
1752*   **Stage specific.** Code specific to a boot stage.
1753*   **Drivers.**
1754*   **Services.** EL3 runtime services, e.g. PSCI or SPD. Specific SPD services
1755    reside in the `services/spd` directory (e.g. `services/spd/tspd`).
1756
1757Each boot loader stage uses code from one or more of the above mentioned
1758categories. Based upon the above, the code layout looks like this:
1759
1760    Directory    Used by BL1?    Used by BL2?    Used by BL3-1?
1761    bl1          Yes             No              No
1762    bl2          No              Yes             No
1763    bl31         No              No              Yes
1764    arch         Yes             Yes             Yes
1765    plat         Yes             Yes             Yes
1766    drivers      Yes             No              Yes
1767    common       Yes             Yes             Yes
1768    lib          Yes             Yes             Yes
1769    services     No              No              Yes
1770
1771The build system provides a non configurable build option IMAGE_BLx for each
1772boot loader stage (where x = BL stage). e.g. for BL1 , IMAGE_BL1 will be
1773defined by the build system. This enables the Trusted Firmware to compile
1774certain code only for specific boot loader stages
1775
1776All assembler files have the `.S` extension. The linker source files for each
1777boot stage have the extension `.ld.S`. These are processed by GCC to create the
1778linker scripts which have the extension `.ld`.
1779
1780FDTs provide a description of the hardware platform and are used by the Linux
1781kernel at boot time. These can be found in the `fdts` directory.
1782
1783
178413.  References
1785---------------
1786
17871.  Trusted Board Boot Requirements CLIENT PDD (ARM DEN 0006B-5). Available
1788    under NDA through your ARM account representative.
1789
17902.  [Power State Coordination Interface PDD (ARM DEN 0022B.b)][PSCI].
1791
17923.  [SMC Calling Convention PDD (ARM DEN 0028A)][SMCCC].
1793
17944.  [ARM Trusted Firmware Interrupt Management Design guide][INTRG].
1795
1796- - - - - - - - - - - - - - - - - - - - - - - - - -
1797
1798_Copyright (c) 2013-2014, ARM Limited and Contributors. All rights reserved._
1799
1800[ARM ARM]:          http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0487a.e/index.html "ARMv8-A Reference Manual (ARM DDI0487A.E)"
1801[PSCI]:             http://infocenter.arm.com/help/topic/com.arm.doc.den0022c/DEN0022C_Power_State_Coordination_Interface.pdf "Power State Coordination Interface PDD (ARM DEN 0022C)"
1802[SMCCC]:            http://infocenter.arm.com/help/topic/com.arm.doc.den0028a/index.html "SMC Calling Convention PDD (ARM DEN 0028A)"
1803[UUID]:             https://tools.ietf.org/rfc/rfc4122.txt "A Universally Unique IDentifier (UUID) URN Namespace"
1804[User Guide]:       ./user-guide.md
1805[Porting Guide]:    ./porting-guide.md
1806[INTRG]:            ./interrupt-framework-design.md
1807[CPUBM]:            ./cpu-specific-build-macros.md.md
1808