1# Eigen Tensors
2
3Tensors are multidimensional arrays of elements. Elements are typically scalars,
4but more complex types such as strings are also supported.
5
6[TOC]
7
8## Tensor Classes
9
10You can manipulate a tensor with one of the following classes.  They all are in
11the namespace ```::Eigen.```
12
13
14### Class Tensor<data_type, rank>
15
16This is the class to use to create a tensor and allocate memory for it.  The
17class is templatized with the tensor datatype, such as float or int, and the
18tensor rank.  The rank is the number of dimensions, for example rank 2 is a
19matrix.
20
21Tensors of this class are resizable.  For example, if you assign a tensor of a
22different size to a Tensor, that tensor is resized to match its new value.
23
24#### Constructor Tensor<data_type, rank>(size0, size1, ...)
25
26Constructor for a Tensor.  The constructor must be passed ```rank``` integers
27indicating the sizes of the instance along each of the the ```rank```
28dimensions.
29
30    // Create a tensor of rank 3 of sizes 2, 3, 4.  This tensor owns
31    // memory to hold 24 floating point values (24 = 2 x 3 x 4).
32    Tensor<float, 3> t_3d(2, 3, 4);
33
34    // Resize t_3d by assigning a tensor of different sizes, but same rank.
35    t_3d = Tensor<float, 3>(3, 4, 3);
36
37#### Constructor Tensor<data_type, rank>(size_array)
38
39Constructor where the sizes for the constructor are specified as an array of
40values instead of an explicitly list of parameters.  The array type to use is
41```Eigen::array<Eigen::Index>```.  The array can be constructed automatically
42from an initializer list.
43
44    // Create a tensor of strings of rank 2 with sizes 5, 7.
45    Tensor<string, 2> t_2d({5, 7});
46
47
48### Class TensorFixedSize<data_type, Sizes<size0, size1, ...>>
49
50Class to use for tensors of fixed size, where the size is known at compile
51time.  Fixed sized tensors can provide very fast computations because all their
52dimensions are known by the compiler.  FixedSize tensors are not resizable.
53
54If the total number of elements in a fixed size tensor is small enough the
55tensor data is held onto the stack and does not cause heap allocation and free.
56
57    // Create a 4 x 3 tensor of floats.
58    TensorFixedSize<float, Sizes<4, 3>> t_4x3;
59
60### Class TensorMap<Tensor<data_type, rank>>
61
62This is the class to use to create a tensor on top of memory allocated and
63owned by another part of your code.  It allows to view any piece of allocated
64memory as a Tensor.  Instances of this class do not own the memory where the
65data are stored.
66
67A TensorMap is not resizable because it does not own the memory where its data
68are stored.
69
70#### Constructor TensorMap<Tensor<data_type, rank>>(data, size0, size1, ...)
71
72Constructor for a Tensor.  The constructor must be passed a pointer to the
73storage for the data, and "rank" size attributes.  The storage has to be
74large enough to hold all the data.
75
76    // Map a tensor of ints on top of stack-allocated storage.
77    int storage[128];  // 2 x 4 x 2 x 8 = 128
78    TensorMap<int, 4> t_4d(storage, 2, 4, 2, 8);
79
80    // The same storage can be viewed as a different tensor.
81    // You can also pass the sizes as an array.
82    TensorMap<int, 2> t_2d(storage, 16, 8);
83
84    // You can also map fixed-size tensors.  Here we get a 1d view of
85    // the 2d fixed-size tensor.
86    Tensor<float, Sizes<4, 5>> t_4x3;
87    TensorMap<float, 1> t_12(t_4x3, 12);
88
89
90#### Class TensorRef
91
92See Assigning to a TensorRef below.
93
94## Accessing Tensor Elements
95
96#### <data_type> tensor(index0, index1...)
97
98Return the element at position ```(index0, index1...)``` in tensor
99```tensor```.  You must pass as many parameters as the rank of ```tensor```.
100The expression can be used as an l-value to set the value of the element at the
101specified position.  The value returned is of the datatype of the tensor.
102
103    // Set the value of the element at position (0, 1, 0);
104    Tensor<float, 3> t_3d(2, 3, 4);
105    t_3d(0, 1, 0) = 12.0f;
106
107    // Initialize all elements to random values.
108    for (int i = 0; i < 2; ++i) {
109      for (int j = 0; j < 3; ++j) {
110        for (int k = 0; k < 4; ++k) {
111          t_3d(i, j, k) = ...some random value...;
112        }
113      }
114    }
115
116    // Print elements of a tensor.
117    for (int i = 0; i < 2; ++i) {
118      LOG(INFO) << t_3d(i, 0, 0);
119    }
120
121
122## TensorLayout
123
124The tensor library supports 2 layouts: ```ColMajor``` (the default) and
125```RowMajor```.  Only the default column major layout is currently fully
126supported, and it is therefore not recommended to attempt to use the row major
127layout at the moment.
128
129The layout of a tensor is optionally specified as part of its type. If not
130specified explicitly column major is assumed.
131
132    Tensor<float, 3, ColMajor> col_major;  // equivalent to Tensor<float, 3>
133    TensorMap<Tensor<float, 3, RowMajor> > row_major(data, ...);
134
135All the arguments to an expression must use the same layout. Attempting to mix
136different layouts will result in a compilation error.
137
138It is possible to change the layout of a tensor or an expression using the
139```swap_layout()``` method.  Note that this will also reverse the order of the
140dimensions.
141
142    Tensor<float, 2, ColMajor> col_major(2, 4);
143    Tensor<float, 2, RowMajor> row_major(2, 4);
144
145    Tensor<float, 2> col_major_result = col_major;  // ok, layouts match
146    Tensor<float, 2> col_major_result = row_major;  // will not compile
147
148    // Simple layout swap
149    col_major_result = row_major.swap_layout();
150    eigen_assert(col_major_result.dimension(0) == 4);
151    eigen_assert(col_major_result.dimension(1) == 2);
152
153    // Swap the layout and preserve the order of the dimensions
154    array<int, 2> shuffle(1, 0);
155    col_major_result = row_major.swap_layout().shuffle(shuffle);
156    eigen_assert(col_major_result.dimension(0) == 2);
157    eigen_assert(col_major_result.dimension(1) == 4);
158
159
160## Tensor Operations
161
162The Eigen Tensor library provides a vast library of operations on Tensors:
163numerical operations such as addition and multiplication, geometry operations
164such as slicing and shuffling, etc.  These operations are available as methods
165of the Tensor classes, and in some cases as operator overloads.  For example
166the following code computes the elementwise addition of two tensors:
167
168    Tensor<float, 3> t1(2, 3, 4);
169    ...set some values in t1...
170    Tensor<float, 3> t2(2, 3, 4);
171    ...set some values in t2...
172    // Set t3 to the element wise sum of t1 and t2
173    Tensor<float, 3> t3 = t1 + t2;
174
175While the code above looks easy enough, it is important to understand that the
176expression ```t1 + t2``` is not actually adding the values of the tensors.  The
177expression instead constructs a "tensor operator" object of the class
178TensorCwiseBinaryOp<scalar_sum>, which has references to the tensors
179```t1``` and ```t2```.  This is a small C++ object that knows how to add
180```t1``` and ```t2```.  It is only when the value of the expression is assigned
181to the tensor ```t3``` that the addition is actually performed.  Technically,
182this happens through the overloading of ```operator=()``` in the Tensor class.
183
184This mechanism for computing tensor expressions allows for lazy evaluation and
185optimizations which are what make the tensor library very fast.
186
187Of course, the tensor operators do nest, and the expression ```t1 + t2 *
1880.3f``` is actually represented with the (approximate) tree of operators:
189
190    TensorCwiseBinaryOp<scalar_sum>(t1, TensorCwiseUnaryOp<scalar_mul>(t2, 0.3f))
191
192
193### Tensor Operations and C++ "auto"
194
195Because Tensor operations create tensor operators, the C++ ```auto``` keyword
196does not have its intuitive meaning.  Consider these 2 lines of code:
197
198    Tensor<float, 3> t3 = t1 + t2;
199    auto t4 = t1 + t2;
200
201In the first line we allocate the tensor ```t3``` and it will contain the
202result of the addition of ```t1``` and ```t2```.  In the second line, ```t4```
203is actually the tree of tensor operators that will compute the addition of
204```t1``` and ```t2```.  In fact, ```t4``` is *not* a tensor and you cannot get
205the values of its elements:
206
207    Tensor<float, 3> t3 = t1 + t2;
208    cout << t3(0, 0, 0);  // OK prints the value of t1(0, 0, 0) + t2(0, 0, 0)
209
210    auto t4 = t1 + t2;
211    cout << t4(0, 0, 0);  // Compilation error!
212
213When you use ```auto``` you do not get a Tensor as a result but instead a
214non-evaluated expression.  So only use ```auto``` to delay evaluation.
215
216Unfortunately, there is no single underlying concrete type for holding
217non-evaluated expressions, hence you have to use auto in the case when you do
218want to hold non-evaluated expressions.
219
220When you need the results of set of tensor computations you have to assign the
221result to a Tensor that will be capable of holding onto them.  This can be
222either a normal Tensor, a fixed size Tensor, or a TensorMap on an existing
223piece of memory.  All the following will work:
224
225    auto t4 = t1 + t2;
226
227    Tensor<float, 3> result = t4;  // Could also be: result(t4);
228    cout << result(0, 0, 0);
229
230    TensorMap<float, 4> result(<a float* with enough space>, <size0>, ...) = t4;
231    cout << result(0, 0, 0);
232
233    TensorFixedSize<float, Sizes<size0, ...>> result = t4;
234    cout << result(0, 0, 0);
235
236Until you need the results, you can keep the operation around, and even reuse
237it for additional operations.  As long as you keep the expression as an
238operation, no computation is performed.
239
240    // One way to compute exp((t1 + t2) * 0.2f);
241    auto t3 = t1 + t2;
242    auto t4 = t3 * 0.2f;
243    auto t5 = t4.exp();
244    Tensor<float, 3> result = t5;
245
246    // Another way, exactly as efficient as the previous one:
247    Tensor<float, 3> result = ((t1 + t2) * 0.2f).exp();
248
249### Controlling When Expression are Evaluated
250
251There are several ways to control when expressions are evaluated:
252
253*   Assignment to a Tensor, TensorFixedSize, or TensorMap.
254*   Use of the eval() method.
255*   Assignment to a TensorRef.
256
257#### Assigning to a Tensor, TensorFixedSize, or TensorMap.
258
259The most common way to evaluate an expression is to assign it to a Tensor.  In
260the example below, the ```auto``` declarations make the intermediate values
261"Operations", not Tensors, and do not cause the expressions to be evaluated.
262The assignment to the Tensor ```result``` causes the evaluation of all the
263operations.
264
265    auto t3 = t1 + t2;             // t3 is an Operation.
266    auto t4 = t3 * 0.2f;           // t4 is an Operation.
267    auto t5 = t4.exp();            // t5 is an Operation.
268    Tensor<float, 3> result = t5;  // The operations are evaluated.
269
270If you know the ranks and sizes of the Operation value you can assign the
271Operation to a TensorFixedSize instead of a Tensor, which is a bit more
272efficient.
273
274    // We know that the result is a 4x4x2 tensor!
275    TensorFixedSize<float, 4, 4, 2> result = t5;
276
277Simiarly, assigning an expression to a TensorMap causes its evaluation.  Like
278tensors of type TensorFixedSize, TensorMaps cannot be resized so they have to
279have the rank and sizes of the expression that are assigned to them.
280
281#### Calling eval().
282
283When you compute large composite expressions, you sometimes want to tell Eigen
284that an intermediate value in the expression tree is worth evaluating ahead of
285time.  This is done by inserting a call to the ```eval()``` method of the
286expression Operation.
287
288    // The previous example could have been written:
289    Tensor<float, 3> result = ((t1 + t2) * 0.2f).exp();
290
291    // If you want to compute (t1 + t2) once ahead of time you can write:
292    Tensor<float, 3> result = ((t1 + t2).eval() * 0.2f).exp();
293
294Semantically, calling ```eval()``` is equivalent to materializing the value of
295the expression in a temporary Tensor of the right size.  The code above in
296effect does:
297
298    // .eval() knows the size!
299    TensorFixedSize<float, 4, 4, 2> tmp = t1 + t2;
300    Tensor<float, 3> result = (tmp * 0.2f).exp();
301
302Note that the return value of ```eval()``` is itself an Operation, so the
303following code does not do what you may think:
304
305    // Here t3 is an evaluation Operation.  t3 has not been evaluated yet.
306    auto t3 = (t1 + t2).eval();
307
308    // You can use t3 in another expression.  Still no evaluation.
309    auto t4 = (t3 * 0.2f).exp();
310
311    // The value is evaluated when you assign the Operation to a Tensor, using
312    // an intermediate tensor to represent t3.x
313    Tensor<float, 3> result = t4;
314
315While in the examples above calling ```eval()``` does not make a difference in
316performance, in other cases it can make a huge difference.  In the expression
317below the ```broadcast()``` expression causes the ```X.maximum()``` expression
318to be evaluated many times:
319
320    Tensor<...> X ...;
321    Tensor<...> Y = ((X - X.maximum(depth_dim).reshape(dims2d).broadcast(bcast))
322                     * beta).exp();
323
324Inserting a call to ```eval()``` between the ```maximum()``` and
325```reshape()``` calls guarantees that maximum() is only computed once and
326greatly speeds-up execution:
327
328    Tensor<...> Y =
329      ((X - X.maximum(depth_dim).eval().reshape(dims2d).broadcast(bcast))
330        * beta).exp();
331
332In the other example below, the tensor ```Y``` is both used in the expression
333and its assignment.  This is an aliasing problem and if the evaluation is not
334done in the right order Y will be updated incrementally during the evaluation
335resulting in bogus results:
336
337     Tensor<...> Y ...;
338     Y = Y / (Y.sum(depth_dim).reshape(dims2d).broadcast(bcast));
339
340Inserting a call to ```eval()``` between the ```sum()``` and ```reshape()```
341expressions ensures that the sum is computed before any updates to ```Y``` are
342done.
343
344     Y = Y / (Y.sum(depth_dim).eval().reshape(dims2d).broadcast(bcast));
345
346Note that an eval around the full right hand side expression is not needed
347because the generated has to compute the i-th value of the right hand side
348before assigning it to the left hand side.
349
350However, if you were assigning the expression value to a shuffle of ```Y```
351then you would need to force an eval for correctness by adding an ```eval()```
352call for the right hand side:
353
354     Y.shuffle(...) =
355        (Y / (Y.sum(depth_dim).eval().reshape(dims2d).broadcast(bcast))).eval();
356
357
358#### Assigning to a TensorRef.
359
360If you need to access only a few elements from the value of an expression you
361can avoid materializing the value in a full tensor by using a TensorRef.
362
363A TensorRef is a small wrapper class for any Eigen Operation.  It provides
364overloads for the ```()``` operator that let you access individual values in
365the expression.  TensorRef is convenient, because the Operation themselves do
366not provide a way to access individual elements.
367
368    // Create a TensorRef for the expression.  The expression is not
369    // evaluated yet.
370    TensorRef<Tensor<float, 3> > ref = ((t1 + t2) * 0.2f).exp();
371
372    // Use "ref" to access individual elements.  The expression is evaluated
373    // on the fly.
374    float at_0 = ref(0, 0, 0);
375    cout << ref(0, 1, 0);
376
377Only use TensorRef when you need a subset of the values of the expression.
378TensorRef only computes the values you access.  However note that if you are
379going to access all the values it will be much faster to materialize the
380results in a Tensor first.
381
382In some cases, if the full Tensor result would be very large, you may save
383memory by accessing it as a TensorRef.  But not always.  So don't count on it.
384
385
386### Controlling How Expressions Are Evaluated
387
388The tensor library provides several implementations of the various operations
389such as contractions and convolutions.  The implementations are optimized for
390different environments: single threaded on CPU, multi threaded on CPU, or on a
391GPU using cuda.  Additional implementations may be added later.
392
393You can choose which implementation to use with the ```device()``` call.  If
394you do not choose an implementation explicitly the default implementation that
395uses a single thread on the CPU is used.
396
397The default implementation has been optimized for recent Intel CPUs, taking
398advantage of SSE, AVX, and FMA instructions.  Work is ongoing to tune the
399library on ARM CPUs.  Note that you need to pass compiler-dependent flags
400to enable the use of SSE, AVX, and other instructions.
401
402For example, the following code adds two tensors using the default
403single-threaded CPU implementation:
404
405    Tensor<float, 2> a(30, 40);
406    Tensor<float, 2> b(30, 40);
407    Tensor<float, 2> c = a + b;
408
409To choose a different implementation you have to insert a ```device()``` call
410before the assignment of the result.  For technical C++ reasons this requires
411that the Tensor for the result be declared on its own.  This means that you
412have to know the size of the result.
413
414    Eigen::Tensor<float, 2> c(30, 40);
415    c.device(...) = a + b;
416
417The call to ```device()``` must be the last call on the left of the operator=.
418
419You must pass to the ```device()``` call an Eigen device object.  There are
420presently three devices you can use: DefaultDevice, ThreadPoolDevice and
421GpuDevice.
422
423
424#### Evaluating With the DefaultDevice
425
426This is exactly the same as not inserting a ```device()``` call.
427
428    DefaultDevice my_device;
429    c.device(my_device) = a + b;
430
431#### Evaluating with a Thread Pool
432
433    // Create the Eigen ThreadPoolDevice.
434    Eigen::ThreadPoolDevice my_device(4 /* number of threads to use */);
435
436    // Now just use the device when evaluating expressions.
437    Eigen::Tensor<float, 2> c(30, 50);
438    c.device(my_device) = a.contract(b, dot_product_dims);
439
440
441#### Evaluating On GPU
442
443This is presently a bit more complicated than just using a thread pool device.
444You need to create a GPU device but you also need to explicitly allocate the
445memory for tensors with cuda.
446
447
448## API Reference
449
450### Datatypes
451
452In the documentation of the tensor methods and Operation we mention datatypes
453that are tensor-type specific:
454
455#### <Tensor-Type>::Dimensions
456
457Acts like an array of ints.  Has an ```int size``` attribute, and can be
458indexed like an array to access individual values.  Used to represent the
459dimensions of a tensor.  See ```dimensions()```.
460
461#### <Tensor-Type>::Index
462
463Acts like an ```int```.  Used for indexing tensors along their dimensions.  See
464```operator()```, ```dimension()```, and ```size()```.
465
466#### <Tensor-Type>::Scalar
467
468Represents the datatype of individual tensor elements.  For example, for a
469```Tensor<float>```, ```Scalar``` is the type ```float```.  See
470```setConstant()```.
471
472#### <Operation>
473
474We use this pseudo type to indicate that a tensor Operation is returned by a
475method.  We indicate in the text the type and dimensions of the tensor that the
476Operation returns after evaluation.
477
478The Operation will have to be evaluated, for example by assigning it to a
479tensor, before you can access the values of the resulting tensor.  You can also
480access the values through a TensorRef.
481
482
483## Built-in Tensor Methods
484
485These are usual C++ methods that act on tensors immediately.  They are not
486Operations which provide delayed evaluation of their results.  Unless specified
487otherwise, all the methods listed below are available on all tensor classes:
488Tensor, TensorFixedSize, and TensorMap.
489
490## Metadata
491
492### int NumDimensions
493
494Constant value indicating the number of dimensions of a Tensor.  This is also
495known as the tensor "rank".
496
497      Eigen::Tensor<float, 2> a(3, 4);
498      cout << "Dims " << a.NumDimensions;
499      => Dims 2
500
501### Dimensions dimensions()
502
503Returns an array-like object representing the dimensions of the tensor.
504The actual type of the dimensions() result is <Tensor-Type>::Dimensions.
505
506    Eigen::Tensor<float, 2> a(3, 4);
507    const Eigen::Tensor<float, 2>::Dimensions& d = a.dimensions();
508    cout << "Dim size: " << d.size << ", dim 0: " << d[0]
509         << ", dim 1: " << d[1];
510    => Dim size: 2, dim 0: 3, dim 1: 4
511
512If you use a C++11 compiler, you can use ```auto``` to simplify the code:
513
514    const auto& d = a.dimensions();
515    cout << "Dim size: " << d.size << ", dim 0: " << d[0]
516         << ", dim 1: " << d[1];
517    => Dim size: 2, dim 0: 3, dim 1: 4
518
519### Index dimension(Index n)
520
521Returns the n-th dimension of the tensor.  The actual type of the
522```dimension()``` result is ```<Tensor-Type>::Index```, but you can
523always use it like an int.
524
525      Eigen::Tensor<float, 2> a(3, 4);
526      int dim1 = a.dimension(1);
527      cout << "Dim 1: " << dim1;
528      => Dim 1: 4
529
530### Index size()
531
532Returns the total number of elements in the tensor.  This is the product of all
533the tensor dimensions.  The actual type of the ```size()``` result is
534```<Tensor-Type>::Index```, but you can always use it like an int.
535
536    Eigen::Tensor<float, 2> a(3, 4);
537    cout << "Size: " << a.size();
538    => Size: 12
539
540
541### Getting Dimensions From An Operation
542
543A few operations provide ```dimensions()``` directly,
544e.g. ```TensorReslicingOp```.  Most operations defer calculating dimensions
545until the operation is being evaluated.  If you need access to the dimensions
546of a deferred operation, you can wrap it in a TensorRef (see Assigning to a
547TensorRef above), which provides ```dimensions()``` and ```dimension()``` as
548above.
549
550TensorRef can also wrap the plain Tensor types, so this is a useful idiom in
551templated contexts where the underlying object could be either a raw Tensor
552or some deferred operation (e.g. a slice of a Tensor).  In this case, the
553template code can wrap the object in a TensorRef and reason about its
554dimensionality while remaining agnostic to the underlying type.
555
556
557## Constructors
558
559### Tensor
560
561Creates a tensor of the specified size. The number of arguments must be equal
562to the rank of the tensor. The content of the tensor is not initialized.
563
564    Eigen::Tensor<float, 2> a(3, 4);
565    cout << "NumRows: " << a.dimension(0) << " NumCols: " << a.dimension(1) << endl;
566    => NumRows: 3 NumCols: 4
567
568### TensorFixedSize
569
570Creates a tensor of the specified size. The number of arguments in the Size<>
571template parameter determines the rank of the tensor. The content of the tensor
572is not initialized.
573
574    Eigen::TensorFixedSize<float, Size<3, 4>> a;
575    cout << "Rank: " << a.rank() << endl;
576    => Rank: 2
577    cout << "NumRows: " << a.dimension(0) << " NumCols: " << a.dimension(1) << endl;
578    => NumRows: 3 NumCols: 4
579
580### TensorMap
581
582Creates a tensor mapping an existing array of data. The data must not be freed
583until the TensorMap is discarded, and the size of the data must be large enough
584to accomodate of the coefficients of the tensor.
585
586    float data[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};
587    Eigen::TensorMap<float, 2> a(data, 3, 4);
588    cout << "NumRows: " << a.dimension(0) << " NumCols: " << a.dimension(1) << endl;
589    => NumRows: 3 NumCols: 4
590    cout << "a(1, 2): " << a(1, 2) << endl;
591    => a(1, 2): 9
592
593
594## Contents Initialization
595
596When a new Tensor or a new TensorFixedSize are created, memory is allocated to
597hold all the tensor elements, but the memory is not initialized.  Similarly,
598when a new TensorMap is created on top of non-initialized memory the memory its
599contents are not initialized.
600
601You can use one of the methods below to initialize the tensor memory.  These
602have an immediate effect on the tensor and return the tensor itself as a
603result.  These are not tensor Operations which delay evaluation.
604
605### <Tensor-Type> setConstant(const Scalar& val)
606
607Sets all elements of the tensor to the constant value ```val```.  ```Scalar```
608is the type of data stored in the tensor.  You can pass any value that is
609convertible to that type.
610
611Returns the tensor itself in case you want to chain another call.
612
613    a.setConstant(12.3f);
614    cout << "Constant: " << endl << a << endl << endl;
615    =>
616    Constant:
617    12.3 12.3 12.3 12.3
618    12.3 12.3 12.3 12.3
619    12.3 12.3 12.3 12.3
620
621Note that ```setConstant()``` can be used on any tensor where the element type
622has a copy constructor and an ```operator=()```:
623
624    Eigen::Tensor<string, 2> a(2, 3);
625    a.setConstant("yolo");
626    cout << "String tensor: " << endl << a << endl << endl;
627    =>
628    String tensor:
629    yolo yolo yolo
630    yolo yolo yolo
631
632
633### <Tensor-Type> setZero()
634
635Fills the tensor with zeros.  Equivalent to ```setConstant(Scalar(0))```.
636Returns the tensor itself in case you want to chain another call.
637
638    a.setZero();
639    cout << "Zeros: " << endl << a << endl << endl;
640    =>
641    Zeros:
642    0 0 0 0
643    0 0 0 0
644    0 0 0 0
645
646
647### <Tensor-Type> setValues({..initializer_list})
648
649Fills the tensor with explicit values specified in a std::initializer_list.
650The type of the initializer list depends on the type and rank of the tensor.
651
652If the tensor has rank N, the initializer list must be nested N times.  The
653most deeply nested lists must contains P scalars of the Tensor type where P is
654the size of the last dimension of the Tensor.
655
656For example, for a ```TensorFixedSize<float, 2, 3>``` the initializer list must
657contains 2 lists of 3 floats each.
658
659```setValues()``` returns the tensor itself in case you want to chain another
660call.
661
662    Eigen::Tensor<float, 2> a(2, 3);
663    a.setValues({{0.0f, 1.0f, 2.0f}, {3.0f, 4.0f, 5.0f}});
664    cout << "a" << endl << a << endl << endl;
665    =>
666    a
667    0 1 2
668    3 4 5
669
670If a list is too short, the corresponding elements of the tensor will not be
671changed.  This is valid at each level of nesting.  For example the following
672code only sets the values of the first row of the tensor.
673
674    Eigen::Tensor<int, 2> a(2, 3);
675    a.setConstant(1000);
676    a.setValues({{10, 20, 30}});
677    cout << "a" << endl << a << endl << endl;
678    =>
679    a
680    10   20   30
681    1000 1000 1000
682
683### <Tensor-Type> setRandom()
684
685Fills the tensor with random values.  Returns the tensor itself in case you
686want to chain another call.
687
688    a.setRandom();
689    cout << "Random: " << endl << a << endl << endl;
690    =>
691    Random:
692      0.680375    0.59688  -0.329554    0.10794
693     -0.211234   0.823295   0.536459 -0.0452059
694      0.566198  -0.604897  -0.444451   0.257742
695
696You can customize ```setRandom()``` by providing your own random number
697generator as a template argument:
698
699    a.setRandom<MyRandomGenerator>();
700
701Here, ```MyRandomGenerator``` must be a struct with the following member
702functions, where Scalar and Index are the same as ```<Tensor-Type>::Scalar```
703and ```<Tensor-Type>::Index```.
704
705See ```struct UniformRandomGenerator``` in TensorFunctors.h for an example.
706
707    // Custom number generator for use with setRandom().
708    struct MyRandomGenerator {
709      // Default and copy constructors. Both are needed
710      MyRandomGenerator() { }
711      MyRandomGenerator(const MyRandomGenerator& ) { }
712
713      // Return a random value to be used.  "element_location" is the
714      // location of the entry to set in the tensor, it can typically
715      // be ignored.
716      Scalar operator()(Eigen::DenseIndex element_location,
717                        Eigen::DenseIndex /*unused*/ = 0) const {
718        return <randomly generated value of type T>;
719      }
720
721      // Same as above but generates several numbers at a time.
722      typename internal::packet_traits<Scalar>::type packetOp(
723          Eigen::DenseIndex packet_location, Eigen::DenseIndex /*unused*/ = 0) const {
724        return <a packet of randomly generated values>;
725      }
726    };
727
728You can also use one of the 2 random number generators that are part of the
729tensor library:
730*   UniformRandomGenerator
731*   NormalRandomGenerator
732
733
734## Data Access
735
736The Tensor, TensorFixedSize, and TensorRef classes provide the following
737accessors to access the tensor coefficients:
738
739    const Scalar& operator()(const array<Index, NumIndices>& indices)
740    const Scalar& operator()(Index firstIndex, IndexTypes... otherIndices)
741    Scalar& operator()(const array<Index, NumIndices>& indices)
742    Scalar& operator()(Index firstIndex, IndexTypes... otherIndices)
743
744The number of indices must be equal to the rank of the tensor. Moreover, these
745accessors are not available on tensor expressions. In order to access the
746values of a tensor expression, the expression must either be evaluated or
747wrapped in a TensorRef.
748
749
750### Scalar* data() and const Scalar* data() const
751
752Returns a pointer to the storage for the tensor.  The pointer is const if the
753tensor was const.  This allows direct access to the data.  The layout of the
754data depends on the tensor layout: RowMajor or ColMajor.
755
756This access is usually only needed for special cases, for example when mixing
757Eigen Tensor code with other libraries.
758
759Scalar is the type of data stored in the tensor.
760
761    Eigen::Tensor<float, 2> a(3, 4);
762    float* a_data = a.data();
763    a_data[0] = 123.45f;
764    cout << "a(0, 0): " << a(0, 0);
765    => a(0, 0): 123.45
766
767
768## Tensor Operations
769
770All the methods documented below return non evaluated tensor ```Operations```.
771These can be chained: you can apply another Tensor Operation to the value
772returned by the method.
773
774The chain of Operation is evaluated lazily, typically when it is assigned to a
775tensor.  See "Controlling when Expression are Evaluated" for more details about
776their evaluation.
777
778### <Operation> constant(const Scalar& val)
779
780Returns a tensor of the same type and dimensions as the original tensor but
781where all elements have the value ```val```.
782
783This is useful, for example, when you want to add or subtract a constant from a
784tensor, or multiply every element of a tensor by a scalar.
785
786    Eigen::Tensor<float, 2> a(2, 3);
787    a.setConstant(1.0f);
788    Eigen::Tensor<float, 2> b = a + a.constant(2.0f);
789    Eigen::Tensor<float, 2> c = b * b.constant(0.2f);
790    cout << "a" << endl << a << endl << endl;
791    cout << "b" << endl << b << endl << endl;
792    cout << "c" << endl << c << endl << endl;
793    =>
794    a
795    1 1 1
796    1 1 1
797
798    b
799    3 3 3
800    3 3 3
801
802    c
803    0.6 0.6 0.6
804    0.6 0.6 0.6
805
806### <Operation> random()
807
808Returns a tensor of the same type and dimensions as the current tensor
809but where all elements have random values.
810
811This is for example useful to add random values to an existing tensor.
812The generation of random values can be customized in the same manner
813as for ```setRandom()```.
814
815    Eigen::Tensor<float, 2> a(2, 3);
816    a.setConstant(1.0f);
817    Eigen::Tensor<float, 2> b = a + a.random();
818    cout << "a" << endl << a << endl << endl;
819    cout << "b" << endl << b << endl << endl;
820    =>
821    a
822    1 1 1
823    1 1 1
824
825    b
826    1.68038   1.5662  1.82329
827    0.788766  1.59688 0.395103
828
829
830## Unary Element Wise Operations
831
832All these operations take a single input tensor as argument and return a tensor
833of the same type and dimensions as the tensor to which they are applied.  The
834requested operations are applied to each element independently.
835
836### <Operation> operator-()
837
838Returns a tensor of the same type and dimensions as the original tensor
839containing the opposite values of the original tensor.
840
841    Eigen::Tensor<float, 2> a(2, 3);
842    a.setConstant(1.0f);
843    Eigen::Tensor<float, 2> b = -a;
844    cout << "a" << endl << a << endl << endl;
845    cout << "b" << endl << b << endl << endl;
846    =>
847    a
848    1 1 1
849    1 1 1
850
851    b
852    -1 -1 -1
853    -1 -1 -1
854
855### <Operation> sqrt()
856
857Returns a tensor of the same type and dimensions as the original tensor
858containing the square roots of the original tensor.
859
860### <Operation> rsqrt()
861
862Returns a tensor of the same type and dimensions as the original tensor
863containing the inverse square roots of the original tensor.
864
865### <Operation> square()
866
867Returns a tensor of the same type and dimensions as the original tensor
868containing the squares of the original tensor values.
869
870### <Operation> inverse()
871
872Returns a tensor of the same type and dimensions as the original tensor
873containing the inverse of the original tensor values.
874
875### <Operation> exp()
876
877Returns a tensor of the same type and dimensions as the original tensor
878containing the exponential of the original tensor.
879
880### <Operation> log()
881
882Returns a tensor of the same type and dimensions as the original tensor
883containing the natural logarithms of the original tensor.
884
885### <Operation> abs()
886
887Returns a tensor of the same type and dimensions as the original tensor
888containing the absolute values of the original tensor.
889
890### <Operation> pow(Scalar exponent)
891
892Returns a tensor of the same type and dimensions as the original tensor
893containing the coefficients of the original tensor to the power of the
894exponent.
895
896The type of the exponent, Scalar, is always the same as the type of the
897tensor coefficients.  For example, only integer exponents can be used in
898conjuntion with tensors of integer values.
899
900You can use cast() to lift this restriction.  For example this computes
901cubic roots of an int Tensor:
902
903    Eigen::Tensor<int, 2> a(2, 3);
904    a.setValues({{0, 1, 8}, {27, 64, 125}});
905    Eigen::Tensor<double, 2> b = a.cast<double>().pow(1.0 / 3.0);
906    cout << "a" << endl << a << endl << endl;
907    cout << "b" << endl << b << endl << endl;
908    =>
909    a
910    0   1   8
911    27  64 125
912
913    b
914    0 1 2
915    3 4 5
916
917### <Operation>  operator * (Scalar scale)
918
919Multiplies all the coefficients of the input tensor by the provided scale.
920
921### <Operation>  cwiseMax(Scalar threshold)
922TODO
923
924### <Operation>  cwiseMin(Scalar threshold)
925TODO
926
927### <Operation>  unaryExpr(const CustomUnaryOp& func)
928TODO
929
930
931## Binary Element Wise Operations
932
933These operations take two input tensors as arguments. The 2 input tensors should
934be of the same type and dimensions. The result is a tensor of the same
935dimensions as the tensors to which they are applied, and unless otherwise
936specified it is also of the same type. The requested operations are applied to
937each pair of elements independently.
938
939### <Operation> operator+(const OtherDerived& other)
940
941Returns a tensor of the same type and dimensions as the input tensors
942containing the coefficient wise sums of the inputs.
943
944### <Operation> operator-(const OtherDerived& other)
945
946Returns a tensor of the same type and dimensions as the input tensors
947containing the coefficient wise differences of the inputs.
948
949### <Operation> operator*(const OtherDerived& other)
950
951Returns a tensor of the same type and dimensions as the input tensors
952containing the coefficient wise products of the inputs.
953
954### <Operation> operator/(const OtherDerived& other)
955
956Returns a tensor of the same type and dimensions as the input tensors
957containing the coefficient wise quotients of the inputs.
958
959This operator is not supported for integer types.
960
961### <Operation> cwiseMax(const OtherDerived& other)
962
963Returns a tensor of the same type and dimensions as the input tensors
964containing the coefficient wise maximums of the inputs.
965
966### <Operation> cwiseMin(const OtherDerived& other)
967
968Returns a tensor of the same type and dimensions as the input tensors
969containing the coefficient wise mimimums of the inputs.
970
971### <Operation> Logical operators
972
973The following logical operators are supported as well:
974
975*   operator&&(const OtherDerived& other)
976*   operator||(const OtherDerived& other)
977*   operator<(const OtherDerived& other)
978*   operator<=(const OtherDerived& other)
979*   operator>(const OtherDerived& other)
980*   operator>=(const OtherDerived& other)
981*   operator==(const OtherDerived& other)
982*   operator!=(const OtherDerived& other)
983
984They all return a tensor of boolean values.
985
986
987## Selection (select(const ThenDerived& thenTensor, const ElseDerived& elseTensor)
988
989Selection is a coefficient-wise ternary operator that is the tensor equivalent
990to the if-then-else operation.
991
992    Tensor<bool, 3> if = ...;
993    Tensor<float, 3> then = ...;
994    Tensor<float, 3> else = ...;
995    Tensor<float, 3> result = if.select(then, else);
996
997The 3 arguments must be of the same dimensions, which will also be the dimension
998of the result.  The 'if' tensor must be of type boolean, the 'then' and the
999'else' tensor must be of the same type, which will also be the type of the
1000result.
1001
1002Each coefficient in the result is equal to the corresponding coefficient in the
1003'then' tensor if the corresponding value in the 'if' tensor is true. If not, the
1004resulting coefficient will come from the 'else' tensor.
1005
1006
1007## Contraction
1008
1009Tensor *contractions* are a generalization of the matrix product to the
1010multidimensional case.
1011
1012    // Create 2 matrices using tensors of rank 2
1013    Eigen::Tensor<int, 2> a(2, 3);
1014    a.setValues({{1, 2, 3}, {6, 5, 4}});
1015    Eigen::Tensor<int, 2> b(3, 2);
1016    a.setValues({{1, 2}, {4, 5}, {5, 6}});
1017
1018    // Compute the traditional matrix product
1019    array<IndexPair<int>, 1> product_dims = { IndexPair(1, 0) };
1020    Eigen::Tensor<int, 2> AB = a.contract(b, product_dims);
1021
1022    // Compute the product of the transpose of the matrices
1023    array<IndexPair<int>, 1> transpose_product_dims = { IndexPair(0, 1) };
1024    Eigen::Tensor<int, 2> AtBt = a.contract(b, transposed_product_dims);
1025
1026
1027## Reduction Operations
1028
1029A *Reduction* operation returns a tensor with fewer dimensions than the
1030original tensor.  The values in the returned tensor are computed by applying a
1031*reduction operator* to slices of values from the original tensor.  You specify
1032the dimensions along which the slices are made.
1033
1034The Eigen Tensor library provides a set of predefined reduction operators such
1035as ```maximum()``` and ```sum()``` and lets you define additional operators by
1036implementing a few methods from a reductor template.
1037
1038### Reduction Dimensions
1039
1040All reduction operations take a single parameter of type
1041```<TensorType>::Dimensions``` which can always be specified as an array of
1042ints.  These are called the "reduction dimensions."  The values are the indices
1043of the dimensions of the input tensor over which the reduction is done.  The
1044parameter can have at most as many element as the rank of the input tensor;
1045each element must be less than the tensor rank, as it indicates one of the
1046dimensions to reduce.
1047
1048Each dimension of the input tensor should occur at most once in the reduction
1049dimensions as the implementation does not remove duplicates.
1050
1051The order of the values in the reduction dimensions does not affect the
1052results, but the code may execute faster if you list the dimensions in
1053increasing order.
1054
1055Example: Reduction along one dimension.
1056
1057    // Create a tensor of 2 dimensions
1058    Eigen::Tensor<int, 2> a(2, 3);
1059    a.setValues({{1, 2, 3}, {6, 5, 4}});
1060    // Reduce it along the second dimension (1)...
1061    Eigen::array<int, 1> dims({1 /* dimension to reduce */});
1062    // ...using the "maximum" operator.
1063    // The result is a tensor with one dimension.  The size of
1064    // that dimension is the same as the first (non-reduced) dimension of a.
1065    Eigen::Tensor<int, 1> b = a.maximum(dims);
1066    cout << "a" << endl << a << endl << endl;
1067    cout << "b" << endl << b << endl << endl;
1068    =>
1069    a
1070    1 2 3
1071    6 5 4
1072
1073    b
1074    3
1075    6
1076
1077Example: Reduction along two dimensions.
1078
1079    Eigen::Tensor<float, 3, Eigen::ColMajor> a(2, 3, 4);
1080    a.setValues({{{0.0f, 1.0f, 2.0f, 3.0f},
1081                  {7.0f, 6.0f, 5.0f, 4.0f},
1082                  {8.0f, 9.0f, 10.0f, 11.0f}},
1083                 {{12.0f, 13.0f, 14.0f, 15.0f},
1084                  {19.0f, 18.0f, 17.0f, 16.0f},
1085                  {20.0f, 21.0f, 22.0f, 23.0f}}});
1086    // The tensor a has 3 dimensions.  We reduce along the
1087    // first 2, resulting in a tensor with a single dimension
1088    // of size 4 (the last dimension of a.)
1089    // Note that we pass the array of reduction dimensions
1090    // directly to the maximum() call.
1091    Eigen::Tensor<float, 1, Eigen::ColMajor> b =
1092        a.maximum(Eigen::array<int, 2>({0, 1}));
1093    cout << "b" << endl << b << endl << endl;
1094    =>
1095    b
1096    20
1097    21
1098    22
1099    23
1100
1101#### Reduction along all dimensions
1102
1103As a special case, if you pass no parameter to a reduction operation the
1104original tensor is reduced along *all* its dimensions.  The result is a
1105scalar, represented as a zero-dimension tensor.
1106
1107    Eigen::Tensor<float, 3> a(2, 3, 4);
1108    a.setValues({{{0.0f, 1.0f, 2.0f, 3.0f},
1109                  {7.0f, 6.0f, 5.0f, 4.0f},
1110                  {8.0f, 9.0f, 10.0f, 11.0f}},
1111                 {{12.0f, 13.0f, 14.0f, 15.0f},
1112                  {19.0f, 18.0f, 17.0f, 16.0f},
1113                  {20.0f, 21.0f, 22.0f, 23.0f}}});
1114    // Reduce along all dimensions using the sum() operator.
1115    Eigen::Tensor<float, 0> b = a.sum();
1116    cout << "b" << endl << b << endl << endl;
1117    =>
1118    b
1119    276
1120
1121
1122### <Operation> sum(const Dimensions& new_dims)
1123### <Operation> sum()
1124
1125Reduce a tensor using the sum() operator.  The resulting values
1126are the sum of the reduced values.
1127
1128### <Operation> mean(const Dimensions& new_dims)
1129### <Operation> mean()
1130
1131Reduce a tensor using the mean() operator.  The resulting values
1132are the mean of the reduced values.
1133
1134### <Operation> maximum(const Dimensions& new_dims)
1135### <Operation> maximum()
1136
1137Reduce a tensor using the maximum() operator.  The resulting values are the
1138largest of the reduced values.
1139
1140### <Operation> minimum(const Dimensions& new_dims)
1141### <Operation> minimum()
1142
1143Reduce a tensor using the minimum() operator.  The resulting values
1144are the smallest of the reduced values.
1145
1146### <Operation> prod(const Dimensions& new_dims)
1147### <Operation> prod()
1148
1149Reduce a tensor using the prod() operator.  The resulting values
1150are the product of the reduced values.
1151
1152### <Operation> all(const Dimensions& new_dims)
1153### <Operation> all()
1154Reduce a tensor using the all() operator.  Casts tensor to bool and then checks
1155whether all elements are true.  Runs through all elements rather than
1156short-circuiting, so may be significantly inefficient.
1157
1158### <Operation> any(const Dimensions& new_dims)
1159### <Operation> any()
1160Reduce a tensor using the any() operator.  Casts tensor to bool and then checks
1161whether any element is true.  Runs through all elements rather than
1162short-circuiting, so may be significantly inefficient.
1163
1164
1165### <Operation> reduce(const Dimensions& new_dims, const Reducer& reducer)
1166
1167Reduce a tensor using a user-defined reduction operator.  See ```SumReducer```
1168in TensorFunctors.h for information on how to implement a reduction operator.
1169
1170
1171## Scan Operations
1172
1173A *Scan* operation returns a tensor with the same dimensions as the original
1174tensor. The operation performs an inclusive scan along the specified
1175axis, which means it computes a running total along the axis for a given
1176reduction operation.
1177If the reduction operation corresponds to summation, then this computes the
1178prefix sum of the tensor along the given axis.
1179
1180Example:
1181dd a comment to this line
1182
1183    // Create a tensor of 2 dimensions
1184    Eigen::Tensor<int, 2> a(2, 3);
1185    a.setValues({{1, 2, 3}, {4, 5, 6}});
1186    // Scan it along the second dimension (1) using summation
1187    Eigen::Tensor<int, 2> b = a.cumsum(1);
1188    // The result is a tensor with the same size as the input
1189    cout << "a" << endl << a << endl << endl;
1190    cout << "b" << endl << b << endl << endl;
1191    =>
1192    a
1193    1 2 3
1194    6 5 4
1195
1196    b
1197    1  3  6
1198    4  9 15
1199
1200### <Operation> cumsum(const Index& axis)
1201
1202Perform a scan by summing consecutive entries.
1203
1204### <Operation> cumprod(const Index& axis)
1205
1206Perform a scan by multiplying consecutive entries.
1207
1208
1209## Convolutions
1210
1211### <Operation> convolve(const Kernel& kernel, const Dimensions& dims)
1212
1213Returns a tensor that is the output of the convolution of the input tensor with the kernel,
1214along the specified dimensions of the input tensor. The dimension size for dimensions of the output tensor
1215which were part of the convolution will be reduced by the formula:
1216output_dim_size = input_dim_size - kernel_dim_size + 1 (requires: input_dim_size >= kernel_dim_size).
1217The dimension sizes for dimensions that were not part of the convolution will remain the same.
1218Performance of the convolution can depend on the length of the stride(s) of the input tensor dimension(s) along which the
1219convolution is computed (the first dimension has the shortest stride for ColMajor, whereas RowMajor's shortest stride is
1220for the last dimension).
1221
1222    // Compute convolution along the second and third dimension.
1223    Tensor<float, 4, DataLayout> input(3, 3, 7, 11);
1224    Tensor<float, 2, DataLayout> kernel(2, 2);
1225    Tensor<float, 4, DataLayout> output(3, 2, 6, 11);
1226    input.setRandom();
1227    kernel.setRandom();
1228
1229    Eigen::array<ptrdiff_t, 2> dims({1, 2});  // Specify second and third dimension for convolution.
1230    output = input.convolve(kernel, dims);
1231
1232    for (int i = 0; i < 3; ++i) {
1233      for (int j = 0; j < 2; ++j) {
1234        for (int k = 0; k < 6; ++k) {
1235          for (int l = 0; l < 11; ++l) {
1236            const float result = output(i,j,k,l);
1237            const float expected = input(i,j+0,k+0,l) * kernel(0,0) +
1238                                   input(i,j+1,k+0,l) * kernel(1,0) +
1239                                   input(i,j+0,k+1,l) * kernel(0,1) +
1240                                   input(i,j+1,k+1,l) * kernel(1,1);
1241            VERIFY_IS_APPROX(result, expected);
1242          }
1243        }
1244      }
1245    }
1246
1247
1248## Geometrical Operations
1249
1250These operations return a Tensor with different dimensions than the original
1251Tensor.  They can be used to access slices of tensors, see them with different
1252dimensions, or pad tensors with additional data.
1253
1254### <Operation> reshape(const Dimensions& new_dims)
1255
1256Returns a view of the input tensor that has been reshaped to the specified
1257new dimensions.  The argument new_dims is an array of Index values.  The
1258rank of the resulting tensor is equal to the number of elements in new_dims.
1259
1260The product of all the sizes in the new dimension array must be equal to
1261the number of elements in the input tensor.
1262
1263    // Increase the rank of the input tensor by introducing a new dimension
1264    // of size 1.
1265    Tensor<float, 2> input(7, 11);
1266    array<int, 3> three_dims{{7, 11, 1}};
1267    Tensor<float, 3> result = input.reshape(three_dims);
1268
1269    // Decrease the rank of the input tensor by merging 2 dimensions;
1270    array<int, 1> one_dim{{7 * 11}};
1271    Tensor<float, 1> result = input.reshape(one_dim);
1272
1273This operation does not move any data in the input tensor, so the resulting
1274contents of a reshaped Tensor depend on the data layout of the original Tensor.
1275
1276For example this is what happens when you ```reshape()``` a 2D ColMajor tensor
1277to one dimension:
1278
1279    Eigen::Tensor<float, 2, Eigen::ColMajor> a(2, 3);
1280    a.setValues({{0.0f, 100.0f, 200.0f}, {300.0f, 400.0f, 500.0f}});
1281    Eigen::array<Eigen::DenseIndex, 1> one_dim({3 * 2});
1282    Eigen::Tensor<float, 1, Eigen::ColMajor> b = a.reshape(one_dim);
1283    cout << "b" << endl << b << endl;
1284    =>
1285    b
1286      0
1287    300
1288    100
1289    400
1290    200
1291    500
1292
1293This is what happens when the 2D Tensor is RowMajor:
1294
1295    Eigen::Tensor<float, 2, Eigen::RowMajor> a(2, 3);
1296    a.setValues({{0.0f, 100.0f, 200.0f}, {300.0f, 400.0f, 500.0f}});
1297    Eigen::array<Eigen::DenseIndex, 1> one_dim({3 * 2});
1298    Eigen::Tensor<float, 1, Eigen::RowMajor> b = a.reshape(one_dim);
1299    cout << "b" << endl << b << endl;
1300    =>
1301    b
1302      0
1303    100
1304    200
1305    300
1306    400
1307    500
1308
1309The reshape operation is a lvalue. In other words, it can be used on the left
1310side of the assignment operator.
1311
1312The previous example can be rewritten as follow:
1313
1314    Eigen::Tensor<float, 2, Eigen::ColMajor> a(2, 3);
1315    a.setValues({{0.0f, 100.0f, 200.0f}, {300.0f, 400.0f, 500.0f}});
1316    Eigen::array<Eigen::DenseIndex, 2> two_dim({2, 3});
1317    Eigen::Tensor<float, 1, Eigen::ColMajor> b;
1318    b.reshape(two_dim) = a;
1319    cout << "b" << endl << b << endl;
1320    =>
1321    b
1322      0
1323    300
1324    100
1325    400
1326    200
1327    500
1328
1329Note that "b" itself was not reshaped but that instead the assignment is done to
1330the reshape view of b.
1331
1332
1333### <Operation> shuffle(const Shuffle& shuffle)
1334
1335Returns a copy of the input tensor whose dimensions have been
1336reordered according to the specified permutation. The argument shuffle
1337is an array of Index values. Its size is the rank of the input
1338tensor. It must contain a permutation of 0, 1, ..., rank - 1. The i-th
1339dimension of the output tensor equals to the size of the shuffle[i]-th
1340dimension of the input tensor. For example:
1341
1342    // Shuffle all dimensions to the left by 1.
1343    Tensor<float, 3> input(20, 30, 50);
1344    // ... set some values in input.
1345    Tensor<float, 3> output = input.shuffle({1, 2, 0})
1346
1347    eigen_assert(output.dimension(0) == 30);
1348    eigen_assert(output.dimension(1) == 50);
1349    eigen_assert(output.dimension(2) == 20);
1350
1351Indices into the output tensor are shuffled accordingly to formulate
1352indices into the input tensor. For example, one can assert in the above
1353code snippet that:
1354
1355    eigen_assert(output(3, 7, 11) == input(11, 3, 7));
1356
1357In general, one can assert that
1358
1359    eigen_assert(output(..., indices[shuffle[i]], ...) ==
1360                 input(..., indices[i], ...))
1361
1362The shuffle operation results in a lvalue, which means that it can be assigned
1363to. In other words, it can be used on the left side of the assignment operator.
1364
1365Let's rewrite the previous example to take advantage of this feature:
1366
1367    // Shuffle all dimensions to the left by 1.
1368    Tensor<float, 3> input(20, 30, 50);
1369    // ... set some values in input.
1370    Tensor<float, 3> output(30, 50, 20);
1371    output.shuffle({2, 0, 1}) = input;
1372
1373
1374### <Operation> stride(const Strides& strides)
1375
1376Returns a view of the input tensor that strides (skips stride-1
1377elements) along each of the dimensions.  The argument strides is an
1378array of Index values.  The dimensions of the resulting tensor are
1379ceil(input_dimensions[i] / strides[i]).
1380
1381For example this is what happens when you ```stride()``` a 2D tensor:
1382
1383    Eigen::Tensor<int, 2> a(4, 3);
1384    a.setValues({{0, 100, 200}, {300, 400, 500}, {600, 700, 800}, {900, 1000, 1100}});
1385    Eigen::array<Eigen::DenseIndex, 2> strides({3, 2});
1386    Eigen::Tensor<int, 2> b = a.stride(strides);
1387    cout << "b" << endl << b << endl;
1388    =>
1389    b
1390       0   200
1391     900  1100
1392
1393It is possible to assign a tensor to a stride:
1394    Tensor<float, 3> input(20, 30, 50);
1395    // ... set some values in input.
1396    Tensor<float, 3> output(40, 90, 200);
1397    output.stride({2, 3, 4}) = input;
1398
1399
1400### <Operation> slice(const StartIndices& offsets, const Sizes& extents)
1401
1402Returns a sub-tensor of the given tensor. For each dimension i, the slice is
1403made of the coefficients stored between offset[i] and offset[i] + extents[i] in
1404the input tensor.
1405
1406    Eigen::Tensor<int, 2> a(4, 3);
1407    a.setValues({{0, 100, 200}, {300, 400, 500},
1408                 {600, 700, 800}, {900, 1000, 1100}});
1409    Eigen::array<int, 2> offsets = {1, 0};
1410    Eigen::array<int, 2> extents = {2, 2};
1411    Eigen::Tensor<int, 1> slice = a.slice(offsets, extents);
1412    cout << "a" << endl << a << endl;
1413    =>
1414    a
1415       0   100   200
1416     300   400   500
1417     600   700   800
1418     900  1000  1100
1419    cout << "slice" << endl << slice << endl;
1420    =>
1421    slice
1422     300   400
1423     600   700
1424
1425
1426### <Operation> chip(const Index offset, const Index dim)
1427
1428A chip is a special kind of slice. It is the subtensor at the given offset in
1429the dimension dim. The returned tensor has one fewer dimension than the input
1430tensor: the dimension dim is removed.
1431
1432For example, a matrix chip would be either a row or a column of the input
1433matrix.
1434
1435    Eigen::Tensor<int, 2> a(4, 3);
1436    a.setValues({{0, 100, 200}, {300, 400, 500},
1437                 {600, 700, 800}, {900, 1000, 1100}});
1438    Eigen::Tensor<int, 1> row_3 = a.chip(2, 0);
1439    Eigen::Tensor<int, 1> col_2 = a.chip(1, 1);
1440    cout << "a" << endl << a << endl;
1441    =>
1442    a
1443       0   100   200
1444     300   400   500
1445     600   700   800
1446     900  1000  1100
1447    cout << "row_3" << endl << row_3 << endl;
1448    =>
1449    row_3
1450       600   700   800
1451    cout << "col_2" << endl << col_2 << endl;
1452    =>
1453    col_2
1454       100   400   700    1000
1455
1456It is possible to assign values to a tensor chip since the chip operation is a
1457lvalue. For example:
1458
1459    Eigen::Tensor<int, 1> a(3);
1460    a.setValues({{100, 200, 300}});
1461    Eigen::Tensor<int, 2> b(2, 3);
1462    b.setZero();
1463    b.chip(0, 0) = a;
1464    cout << "a" << endl << a << endl;
1465    =>
1466    a
1467     100
1468     200
1469     300
1470    cout << "b" << endl << b << endl;
1471    =>
1472    b
1473       100   200   300
1474         0     0     0
1475
1476
1477### <Operation> reverse(const ReverseDimensions& reverse)
1478
1479Returns a view of the input tensor that reverses the order of the coefficients
1480along a subset of the dimensions.  The argument reverse is an array of boolean
1481values that indicates whether or not the order of the coefficients should be
1482reversed along each of the dimensions.  This operation preserves the dimensions
1483of the input tensor.
1484
1485For example this is what happens when you ```reverse()``` the first dimension
1486of a 2D tensor:
1487
1488    Eigen::Tensor<int, 2> a(4, 3);
1489    a.setValues({{0, 100, 200}, {300, 400, 500},
1490                {600, 700, 800}, {900, 1000, 1100}});
1491    Eigen::array<bool, 2> reverse({true, false});
1492    Eigen::Tensor<int, 2> b = a.reverse(reverse);
1493    cout << "a" << endl << a << endl << "b" << endl << b << endl;
1494    =>
1495    a
1496       0   100   200
1497     300   400   500
1498     600   700   800
1499     900  1000  1100
1500    b
1501     900  1000  1100
1502     600   700   800
1503     300   400   500
1504       0   100   200
1505
1506
1507### <Operation> broadcast(const Broadcast& broadcast)
1508
1509Returns a view of the input tensor in which the input is replicated one to many
1510times.
1511The broadcast argument specifies how many copies of the input tensor need to be
1512made in each of the dimensions.
1513
1514    Eigen::Tensor<int, 2> a(2, 3);
1515    a.setValues({{0, 100, 200}, {300, 400, 500}});
1516    Eigen::array<int, 2> bcast({3, 2});
1517    Eigen::Tensor<int, 2> b = a.broadcast(bcast);
1518    cout << "a" << endl << a << endl << "b" << endl << b << endl;
1519    =>
1520    a
1521       0   100   200
1522     300   400   500
1523    b
1524       0   100   200    0   100   200
1525     300   400   500  300   400   500
1526       0   100   200    0   100   200
1527     300   400   500  300   400   500
1528       0   100   200    0   100   200
1529     300   400   500  300   400   500
1530
1531### <Operation> concatenate(const OtherDerived& other, Axis axis)
1532
1533TODO
1534
1535### <Operation>  pad(const PaddingDimensions& padding)
1536
1537Returns a view of the input tensor in which the input is padded with zeros.
1538
1539    Eigen::Tensor<int, 2> a(2, 3);
1540    a.setValues({{0, 100, 200}, {300, 400, 500}});
1541    Eigen::array<pair<int, int>, 2> paddings;
1542    paddings[0] = make_pair(0, 1);
1543    paddings[1] = make_pair(2, 3);
1544    Eigen::Tensor<int, 2> b = a.pad(paddings);
1545    cout << "a" << endl << a << endl << "b" << endl << b << endl;
1546    =>
1547    a
1548       0   100   200
1549     300   400   500
1550    b
1551       0     0     0    0
1552       0     0     0    0
1553       0   100   200    0
1554     300   400   500    0
1555       0     0     0    0
1556       0     0     0    0
1557       0     0     0    0
1558
1559
1560### <Operation>  extract_patches(const PatchDims& patch_dims)
1561
1562Returns a tensor of coefficient patches extracted from the input tensor, where
1563each patch is of dimension specified by 'patch_dims'. The returned tensor has
1564one greater dimension than the input tensor, which is used to index each patch.
1565The patch index in the output tensor depends on the data layout of the input
1566tensor: the patch index is the last dimension ColMajor layout, and the first
1567dimension in RowMajor layout.
1568
1569For example, given the following input tensor:
1570
1571  Eigen::Tensor<float, 2, DataLayout> tensor(3,4);
1572  tensor.setValues({{0.0f, 1.0f, 2.0f, 3.0f},
1573                    {4.0f, 5.0f, 6.0f, 7.0f},
1574                    {8.0f, 9.0f, 10.0f, 11.0f}});
1575
1576  cout << "tensor: " << endl << tensor << endl;
1577=>
1578tensor:
1579 0   1   2   3
1580 4   5   6   7
1581 8   9  10  11
1582
1583Six 2x2 patches can be extracted and indexed using the following code:
1584
1585  Eigen::Tensor<float, 3, DataLayout> patch;
1586  Eigen::array<ptrdiff_t, 2> patch_dims;
1587  patch_dims[0] = 2;
1588  patch_dims[1] = 2;
1589  patch = tensor.extract_patches(patch_dims);
1590  for (int k = 0; k < 6; ++k) {
1591    cout << "patch index: " << k << endl;
1592    for (int i = 0; i < 2; ++i) {
1593      for (int j = 0; j < 2; ++j) {
1594        if (DataLayout == ColMajor) {
1595          cout << patch(i, j, k) << " ";
1596        } else {
1597          cout << patch(k, i, j) << " ";
1598        }
1599      }
1600      cout << endl;
1601    }
1602  }
1603
1604This code results in the following output when the data layout is ColMajor:
1605
1606patch index: 0
16070 1
16084 5
1609patch index: 1
16104 5
16118 9
1612patch index: 2
16131 2
16145 6
1615patch index: 3
16165 6
16179 10
1618patch index: 4
16192 3
16206 7
1621patch index: 5
16226 7
162310 11
1624
1625This code results in the following output when the data layout is RowMajor:
1626(NOTE: the set of patches is the same as in ColMajor, but are indexed differently).
1627
1628patch index: 0
16290 1
16304 5
1631patch index: 1
16321 2
16335 6
1634patch index: 2
16352 3
16366 7
1637patch index: 3
16384 5
16398 9
1640patch index: 4
16415 6
16429 10
1643patch index: 5
16446 7
164510 11
1646
1647### <Operation>  extract_image_patches(const Index patch_rows, const Index patch_cols,
1648                          const Index row_stride, const Index col_stride,
1649                          const PaddingType padding_type)
1650
1651Returns a tensor of coefficient image patches extracted from the input tensor,
1652which is expected to have dimensions ordered as follows (depending on the data
1653layout of the input tensor, and the number of additional dimensions 'N'):
1654
1655*) ColMajor
16561st dimension: channels (of size d)
16572nd dimension: rows (of size r)
16583rd dimension: columns (of size c)
16594th-Nth dimension: time (for video) or batch (for bulk processing).
1660
1661*) RowMajor (reverse order of ColMajor)
16621st-Nth dimension: time (for video) or batch (for bulk processing).
1663N+1'th dimension: columns (of size c)
1664N+2'th dimension: rows (of size r)
1665N+3'th dimension: channels (of size d)
1666
1667The returned tensor has one greater dimension than the input tensor, which is
1668used to index each patch. The patch index in the output tensor depends on the
1669data layout of the input tensor: the patch index is the 4'th dimension in
1670ColMajor layout, and the 4'th from the last dimension in RowMajor layout.
1671
1672For example, given the following input tensor with the following dimension
1673sizes:
1674 *) depth:   2
1675 *) rows:    3
1676 *) columns: 5
1677 *) batch:   7
1678
1679  Tensor<float, 4> tensor(2,3,5,7);
1680  Tensor<float, 4, RowMajor> tensor_row_major = tensor.swap_layout();
1681
16822x2 image patches can be extracted and indexed using the following code:
1683
1684*) 2D patch: ColMajor (patch indexed by second-to-last dimension)
1685  Tensor<float, 5> twod_patch;
1686  twod_patch = tensor.extract_image_patches<2, 2>();
1687  // twod_patch.dimension(0) == 2
1688  // twod_patch.dimension(1) == 2
1689  // twod_patch.dimension(2) == 2
1690  // twod_patch.dimension(3) == 3*5
1691  // twod_patch.dimension(4) == 7
1692
1693*) 2D patch: RowMajor (patch indexed by the second dimension)
1694  Tensor<float, 5, RowMajor> twod_patch_row_major;
1695  twod_patch_row_major = tensor_row_major.extract_image_patches<2, 2>();
1696  // twod_patch_row_major.dimension(0) == 7
1697  // twod_patch_row_major.dimension(1) == 3*5
1698  // twod_patch_row_major.dimension(2) == 2
1699  // twod_patch_row_major.dimension(3) == 2
1700  // twod_patch_row_major.dimension(4) == 2
1701
1702## Special Operations
1703
1704### <Operation> cast<T>()
1705
1706Returns a tensor of type T with the same dimensions as the original tensor.
1707The returned tensor contains the values of the original tensor converted to
1708type T.
1709
1710    Eigen::Tensor<float, 2> a(2, 3);
1711    Eigen::Tensor<int, 2> b = a.cast<int>();
1712
1713This can be useful for example if you need to do element-wise division of
1714Tensors of integers.  This is not currently supported by the Tensor library
1715but you can easily cast the tensors to floats to do the division:
1716
1717    Eigen::Tensor<int, 2> a(2, 3);
1718    a.setValues({{0, 1, 2}, {3, 4, 5}});
1719    Eigen::Tensor<int, 2> b =
1720        (a.cast<float>() / a.constant(2).cast<float>()).cast<int>();
1721    cout << "a" << endl << a << endl << endl;
1722    cout << "b" << endl << b << endl << endl;
1723    =>
1724    a
1725    0 1 2
1726    3 4 5
1727
1728    b
1729    0 0 1
1730    1 2 2
1731
1732
1733### <Operation>     eval()
1734
1735TODO
1736
1737
1738## Representation of scalar values
1739
1740Scalar values are often represented by tensors of size 1 and rank 1. It would be
1741more logical and user friendly to use tensors of rank 0 instead. For example
1742Tensor<T, N>::maximum() currently returns a Tensor<T, 1>. Similarly, the inner
1743product of 2 1d tensors (through contractions) returns a 1d tensor. In the
1744future these operations might be updated to return 0d tensors instead.
1745
1746## Limitations
1747
1748*   The number of tensor dimensions is currently limited to 250 when using a
1749    compiler that supports cxx11. It is limited to only 5 for older compilers.
1750*   The IndexList class requires a cxx11 compliant compiler. You can use an
1751    array of indices instead if you don't have access to a modern compiler.
1752*   On GPUs only floating point values are properly tested and optimized for.
1753*   Complex and integer values are known to be broken on GPUs. If you try to use
1754    them you'll most likely end up triggering a static assertion failure such as
1755    EIGEN_STATIC_ASSERT(packetSize > 1, YOU_MADE_A_PROGRAMMING_MISTAKE)
1756
1757
1758