1============ 2Region Store 3============ 4The analyzer "Store" represents the contents of memory regions. It is an opaque 5functional data structure stored in each ``ProgramState``; the only class that 6can modify the store is its associated StoreManager. 7 8Currently (Feb. 2013), the only StoreManager implementation being used is 9``RegionStoreManager``. This store records bindings to memory regions using a 10"base region + offset" key. (This allows ``*p`` and ``p[0]`` to map to the same 11location, among other benefits.) 12 13Regions are grouped into "clusters", which roughly correspond to "regions with 14the same base region". This allows certain operations to be more efficient, 15such as invalidation. 16 17Regions that do not have a known offset use a special "symbolic" offset. These 18keys store both the original region, and the "concrete offset region" -- the 19last region whose offset is entirely concrete. (For example, in the expression 20``foo.bar[1][i].baz``, the concrete offset region is the array ``foo.bar[1]``, 21since that has a known offset from the start of the top-level ``foo`` struct.) 22 23 24Binding Invalidation 25-------------------- 26 27Supporting both concrete and symbolic offsets makes things a bit tricky. Here's 28an example: 29 30.. code-block:: cpp 31 32 foo[0] = 0; 33 foo[1] = 1; 34 foo[i] = i; 35 36After the third assignment, nothing can be said about the value of ``foo[0]``, 37because ``foo[i]`` may have overwritten it! Thus, *binding to a region with a 38symbolic offset invalidates the entire concrete offset region.* We know 39``foo[i]`` is somewhere within ``foo``, so we don't have to invalidate 40anything else, but we do have to be conservative about all other bindings within 41``foo``. 42 43Continuing the example: 44 45.. code-block:: cpp 46 47 foo[i] = i; 48 foo[0] = 0; 49 50After this latest assignment, nothing can be said about the value of ``foo[i]``, 51because ``foo[0]`` may have overwritten it! *Binding to a region R with a 52concrete offset invalidates any symbolic offset bindings whose concrete offset 53region is a super-region **or** sub-region of R.* All we know about ``foo[i]`` 54is that it is somewhere within ``foo``, so changing *anything* within ``foo`` 55might change ``foo[i]``, and changing *all* of ``foo`` (or its base region) will 56*definitely* change ``foo[i]``. 57 58This logic could be improved by using the current constraints on ``i``, at the 59cost of speed. The latter case could also be improved by matching region kinds, 60i.e. changing ``foo[0].a`` is unlikely to affect ``foo[i].b``, no matter what 61``i`` is. 62 63For more detail, read through ``RegionStoreManager::removeSubRegionBindings`` in 64RegionStore.cpp. 65 66 67ObjCIvarRegions 68--------------- 69 70Objective-C instance variables require a bit of special handling. Like struct 71fields, they are not base regions, and when their parent object region is 72invalidated, all the instance variables must be invalidated as well. However, 73they have no concrete compile-time offsets (in the modern, "non-fragile" 74runtime), and so cannot easily be represented as an offset from the start of 75the object in the analyzer. Moreover, this means that invalidating a single 76instance variable should *not* invalidate the rest of the object, since unlike 77struct fields or array elements there is no way to perform pointer arithmetic 78to access another instance variable. 79 80Consequently, although the base region of an ObjCIvarRegion is the entire 81object, RegionStore offsets are computed from the start of the instance 82variable. Thus it is not valid to assume that all bindings with non-symbolic 83offsets start from the base region! 84 85 86Region Invalidation 87------------------- 88 89Unlike binding invalidation, region invalidation occurs when the entire 90contents of a region may have changed---say, because it has been passed to a 91function the analyzer can model, like memcpy, or because its address has 92escaped, usually as an argument to an opaque function call. In these cases we 93need to throw away not just all bindings within the region itself, but within 94its entire cluster, since neighboring regions may be accessed via pointer 95arithmetic. 96 97Region invalidation typically does even more than this, however. Because it 98usually represents the complete escape of a region from the analyzer's model, 99its *contents* must also be transitively invalidated. (For example, if a region 100``p`` of type ``int **`` is invalidated, the contents of ``*p`` and ``**p`` may 101have changed as well.) The algorithm that traverses this transitive closure of 102accessible regions is known as ClusterAnalysis, and is also used for finding 103all live bindings in the store (in order to throw away the dead ones). The name 104"ClusterAnalysis" predates the cluster-based organization of bindings, but 105refers to the same concept: during invalidation and liveness analysis, all 106bindings within a cluster must be treated in the same way for a conservative 107model of program behavior. 108 109 110Default Bindings 111---------------- 112 113Most bindings in RegionStore are simple scalar values -- integers and pointers. 114These are known as "Direct" bindings. However, RegionStore supports a second 115type of binding called a "Default" binding. These are used to provide values to 116all the elements of an aggregate type (struct or array) without having to 117explicitly specify a binding for each individual element. 118 119When there is no Direct binding for a particular region, the store manager 120looks at each super-region in turn to see if there is a Default binding. If so, 121this value is used as the value of the original region. The search ends when 122the base region is reached, at which point the RegionStore will pick an 123appropriate default value for the region (usually a symbolic value, but 124sometimes zero, for static data, or "uninitialized", for stack variables). 125 126.. code-block:: cpp 127 128 int manyInts[10]; 129 manyInts[1] = 42; // Creates a Direct binding for manyInts[1]. 130 print(manyInts[1]); // Retrieves the Direct binding for manyInts[1]; 131 print(manyInts[0]); // There is no Direct binding for manyInts[0]. 132 // Is there a Default binding for the entire array? 133 // There is not, but it is a stack variable, so we use 134 // "uninitialized" as the default value (and emit a 135 // diagnostic!). 136 137NOTE: The fact that bindings are stored as a base region plus an offset limits 138the Default Binding strategy, because in C aggregates can contain other 139aggregates. In the current implementation of RegionStore, there is no way to 140distinguish a Default binding for an entire aggregate from a Default binding 141for the sub-aggregate at offset 0. 142 143 144Lazy Bindings (LazyCompoundVal) 145------------------------------- 146 147RegionStore implements an optimization for copying aggregates (structs and 148arrays) called "lazy bindings", implemented using a special SVal called 149LazyCompoundVal. When the store is asked for the "binding" for an entire 150aggregate (i.e. for an lvalue-to-rvalue conversion), it returns a 151LazyCompoundVal instead. When this value is then stored into a variable, it is 152bound as a Default value. This makes copying arrays and structs much cheaper 153than if they had required memberwise access. 154 155Under the hood, a LazyCompoundVal is implemented as a uniqued pair of (region, 156store), representing "the value of the region during this 'snapshot' of the 157store". This has important implications for any sort of liveness or 158reachability analysis, which must take the bindings in the old store into 159account. 160 161Retrieving a value from a lazy binding happens in the same way as any other 162Default binding: since there is no direct binding, the store manager falls back 163to super-regions to look for an appropriate default binding. LazyCompoundVal 164differs from a normal default binding, however, in that it contains several 165different values, instead of one value that will appear several times. Because 166of this, the store manager has to reconstruct the subregion chain on top of the 167LazyCompoundVal region, and look up *that* region in the previous store. 168 169Here's a concrete example: 170 171.. code-block:: cpp 172 173 CGPoint p; 174 p.x = 42; // A Direct binding is made to the FieldRegion 'p.x'. 175 CGPoint p2 = p; // A LazyCompoundVal is created for 'p', along with a 176 // snapshot of the current store state. This value is then 177 // used as a Default binding for the VarRegion 'p2'. 178 return p2.x; // The binding for FieldRegion 'p2.x' is requested. 179 // There is no Direct binding, so we look for a Default 180 // binding to 'p2' and find the LCV. 181 // Because it's a LCV, we look at our requested region 182 // and see that it's the '.x' field. We ask for the value 183 // of 'p.x' within the snapshot, and get back 42. 184