# Decision Forest Code Completion Model

## Decision Forest
A **decision forest** is a collection of many decision trees. A **decision tree** is a full binary tree that provides a quality prediction for an input (code completion item). Internal nodes represent a **binary decision** based on the input data, and leaf nodes represent a prediction.

In order to predict the relevance of a code completion item, we traverse each of the decision trees beginning with their roots until we reach a leaf.

An input (code completion candidate) is characterized as a set of **features**, such as the *type of symbol* or the *number of existing references*.

At every non-leaf node, we evaluate the condition to decide whether to go left or right. The condition compares one **feature** of the input against a constant. The condition can be of two types:
- **if_greater**: Checks whether a numerical feature is **>=** a **threshold**.
- **if_member**: Checks whether the **enum** feature is contained in the **set** defined in the node.

A leaf node contains the value **score**.
To compute an overall **quality** score, we traverse each tree in this way and add up the scores.

## Model Input Format
The input model is represented in JSON format.

### Features
The file **features.json** defines the features available to the model.
It is a JSON list of features. Each feature is of one of the following two kinds.

#### Number
```
{
  "name": "a_numerical_feature",
  "kind": "NUMBER"
}
```
#### Enum
```
{
  "name": "an_enum_feature",
  "kind": "ENUM",
  "enum": "fully::qualified::enum",
  "header": "path/to/HeaderDeclaringEnum.h"
}
```
The field `enum` specifies the fully qualified name of the enum.
The enum can have at most **32** values.

The field `header` specifies the header containing the declaration of the enum.
This header is included by the inference runtime.
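
The feature schema above is small enough to check mechanically. The following is a minimal sketch of such a check, not part of `CompletionModelCodegen.py`; the function name `check_feature` and its error messages are hypothetical. The 32-value limit is not checked here, because the enum values live in the C++ header rather than in the JSON.

```python
import json


def check_feature(feature):
    """Return a list of problems found in one features.json entry.

    Hypothetical helper for illustration; an empty list means the entry
    matches the NUMBER/ENUM schema described above.
    """
    problems = []
    if not feature.get("name"):
        problems.append("missing 'name'")
    kind = feature.get("kind")
    if kind == "NUMBER":
        pass  # NUMBER features need no further fields
    elif kind == "ENUM":
        # ENUM features must name the enum and the header declaring it.
        for field in ("enum", "header"):
            if field not in feature:
                problems.append("ENUM feature missing '%s'" % field)
    else:
        problems.append("unknown kind: %r" % kind)
    return problems


# Example: validate a features list given as JSON text.
features = json.loads("""
[
  {"name": "a_numerical_feature", "kind": "NUMBER"},
  {"name": "an_enum_feature", "kind": "ENUM",
   "enum": "fully::qualified::enum",
   "header": "path/to/HeaderDeclaringEnum.h"}
]
""")
for feature in features:
    assert check_feature(feature) == []
```

A check like this catches a misspelled key (for example `type` instead of `kind`) before the model reaches the code generator.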


### Decision Forest
The file `forest.json` defines the decision forest. It is a JSON list of **DecisionTree**.

A **DecisionTree** is one of **IfGreaterNode**, **IfMemberNode**, or **LeafNode**.
#### IfGreaterNode
```
{
  "operation": "if_greater",
  "feature": "a_numerical_feature",
  "threshold": A real number,
  "then": {A DecisionTree},
  "else": {A DecisionTree}
}
```
#### IfMemberNode
```
{
  "operation": "if_member",
  "feature": "an_enum_feature",
  "set": ["enum_value1", "enum_value2", ...],
  "then": {A DecisionTree},
  "else": {A DecisionTree}
}
```
#### LeafNode
```
{
  "operation": "boost",
  "score": A real number
}
```

## Code Generator for Inference
The implementation of the inference runtime is split across:

### Code generator
The code generator `CompletionModelCodegen.py` takes the `${model}` directory as input and generates the inference library:
- `${output_dir}/{filename}.h`
- `${output_dir}/{filename}.cpp`

Invocation:
```
python3 CompletionModelCodegen.py \
  --model path/to/model/dir \
  --output_dir path/to/output/dir \
  --filename OutputFileName \
  --cpp_class clang::clangd::YourExampleClass
```
### Build System
`CompletionModel.cmake` provides the `gen_decision_forest` function.
A client intending to use the CompletionModel for inference can use this function to trigger the code generator and generate the inference library.
It can then use the generated API by including and depending on this library.

### Generated API for inference
The code generator defines the Example `class` inside the relevant namespaces, as specified in the option `${cpp_class}`.

The members of this generated class comprise all the features mentioned in `features.json`.
Thus this class can represent a code completion candidate that needs to be scored.

The API also provides `float Evaluate(const MyClass&)`, which can be used to score the completion candidate.
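
To illustrate how a `${cpp_class}` value maps to namespaces and a class name, here is a minimal sketch. It is not taken from `CompletionModelCodegen.py`; the helper names `split_cpp_class` and `namespace_wrappers` are hypothetical, and the emitted text only mirrors the `namespace ... {` layout that a generated header would plausibly use.

```python
def split_cpp_class(cpp_class):
    """Split 'a::b::C' (with or without a leading '::') into (['a', 'b'], 'C')."""
    parts = [part for part in cpp_class.split("::") if part]
    if not parts:
        raise ValueError("empty class name")
    return parts[:-1], parts[-1]


def namespace_wrappers(namespaces):
    """Return the opening and closing namespace lines for a header file."""
    opening = "\n".join("namespace %s {" % ns for ns in namespaces)
    # Namespaces are closed in the reverse of the order they were opened.
    closing = "\n".join("} // namespace %s" % ns for ns in reversed(namespaces))
    return opening, closing


namespaces, class_name = split_cpp_class("clang::clangd::YourExampleClass")
opening, closing = namespace_wrappers(namespaces)
print(opening)
print("class %s { /* one member per feature */ };" % class_name)
print(closing)
```

Under these assumptions, `clang::clangd::YourExampleClass` yields the namespaces `clang` and `clangd` and the class name `YourExampleClass`.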


## Example
### model/features.json
```
[
  {
    "name": "ANumber",
    "kind": "NUMBER"
  },
  {
    "name": "AFloat",
    "kind": "NUMBER"
  },
  {
    "name": "ACategorical",
    "kind": "ENUM",
    "enum": "ns1::ns2::TestEnum",
    "header": "model/CategoricalFeature.h"
  }
]
```
### model/forest.json
```
[
  {
    "operation": "if_greater",
    "feature": "ANumber",
    "threshold": 200.0,
    "then": {
      "operation": "if_greater",
      "feature": "AFloat",
      "threshold": -1,
      "then": {
        "operation": "boost",
        "score": 10.0
      },
      "else": {
        "operation": "boost",
        "score": -20.0
      }
    },
    "else": {
      "operation": "if_member",
      "feature": "ACategorical",
      "set": [
        "A",
        "C"
      ],
      "then": {
        "operation": "boost",
        "score": 3.0
      },
      "else": {
        "operation": "boost",
        "score": -4.0
      }
    }
  },
  {
    "operation": "if_member",
    "feature": "ACategorical",
    "set": [
      "A",
      "B"
    ],
    "then": {
      "operation": "boost",
      "score": 5.0
    },
    "else": {
      "operation": "boost",
      "score": -6.0
    }
  }
]
```
### DecisionForestRuntime.h
```
...
namespace ns1 {
namespace ns2 {
namespace test {
class Example {
public:
  void setANumber(float V) { ... }
  void setAFloat(float V) { ... }
  void setACategorical(unsigned V) { ... }

private:
  ...
};

float Evaluate(const Example&);
} // namespace test
} // namespace ns2
} // namespace ns1
```

### CMake Invocation
In order to use the inference runtime, one can use the `gen_decision_forest` function
described in `CompletionModel.cmake`, which invokes `CompletionModelCodegen.py` with the appropriate arguments.

For example, the following invocation reads the model present in `path/to/model` and creates
`${CMAKE_CURRENT_BINARY_DIR}/myfilename.h` and `${CMAKE_CURRENT_BINARY_DIR}/myfilename.cpp`
describing a `class` named `MyClass` in namespace `fully::qualified`.

```
gen_decision_forest(path/to/model
  myfilename
  ::fully::qualified::MyClass)
```
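
To make the scoring rule concrete, the example `model/forest.json` can be evaluated with a small reference interpreter. This is a sketch for illustration only: it walks the JSON structure directly rather than using the generated C++ code, it represents enum values as strings, and it assumes that a true condition selects the `then` branch. The names `evaluate_tree`, `evaluate_forest`, and `leaf` are hypothetical.

```python
def evaluate_tree(node, example):
    """Walk one tree from its root to a leaf and return the leaf's score."""
    op = node["operation"]
    if op == "boost":  # LeafNode: the traversal ends here
        return node["score"]
    value = example[node["feature"]]
    if op == "if_greater":
        # Assumption: a true condition selects the "then" branch.
        taken = value >= node["threshold"]
    elif op == "if_member":
        taken = value in node["set"]
    else:
        raise ValueError("unknown operation: %r" % op)
    return evaluate_tree(node["then"] if taken else node["else"], example)


def evaluate_forest(forest, example):
    """Sum the leaf scores reached in every tree of the forest."""
    return sum(evaluate_tree(tree, example) for tree in forest)


def leaf(score):
    """Shorthand for a LeafNode."""
    return {"operation": "boost", "score": score}


# The two trees from model/forest.json, written compactly.
FOREST = [
    {"operation": "if_greater", "feature": "ANumber", "threshold": 200.0,
     "then": {"operation": "if_greater", "feature": "AFloat", "threshold": -1,
              "then": leaf(10.0), "else": leaf(-20.0)},
     "else": {"operation": "if_member", "feature": "ACategorical",
              "set": ["A", "C"],
              "then": leaf(3.0), "else": leaf(-4.0)}},
    {"operation": "if_member", "feature": "ACategorical", "set": ["A", "B"],
     "then": leaf(5.0), "else": leaf(-6.0)},
]

# First tree: 250 >= 200 and 0 >= -1 give 10.0; second tree: "A" is in the set, 5.0.
print(evaluate_forest(FOREST, {"ANumber": 250, "AFloat": 0, "ACategorical": "A"}))  # 15.0
# First tree: 100 < 200 and "C" is in ["A", "C"], 3.0; second tree: "C" is not in the set, -6.0.
print(evaluate_forest(FOREST, {"ANumber": 100, "AFloat": 0, "ACategorical": "C"}))  # -3.0
```

Each tree contributes exactly one leaf score, so the result is a sum with as many terms as there are trees in the forest.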