## General Speed of Built-in Functions

Shader hardware performs a multiplication, an addition or a MAD (multiply-add, a * b + c) in one cycle.

Generally, only exp2, log2, inversesqrt, sin, cos and rcp (reciprocal, 1/x) can be assumed to be implemented in hardware. Other math functions such as sqrt, pow, exp, log and normalize are built from these plus ordinary arithmetic. The hardware operations are called "special functions" and are slower than arithmetic: a common rule of thumb is about 4 cycles each.
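The rule of thumb above can be written as a tiny cost model. This is a sketch under the text's own assumptions (1 cycle per arithmetic op, 4 per special function). The op names such as "mul" are hypothetical labels, and the model ignores latency, dual issue and compiler rewrites. The example uses the pow expansion listed later in this section.

```python
# Rule-of-thumb cost model (assumption): arithmetic ops (mul, add, MAD) cost
# 1 cycle, hardware special functions cost 4 cycles.
ARITH_COST = 1
SPECIAL_COST = 4
SPECIAL_FUNCTIONS = {"exp2", "log2", "inversesqrt", "sin", "cos", "rcp"}


def estimate_cycles(ops):
    """Sum the assumed cost of a list of op names.

    Any name not in SPECIAL_FUNCTIONS is treated as a 1-cycle arithmetic op.
    """
    return sum(SPECIAL_COST if op in SPECIAL_FUNCTIONS else ARITH_COST for op in ops)


# pow(a, b) expands to exp2(log2(a) * b): two special functions and one multiply.
pow_ops = ["log2", "mul", "exp2"]
print(estimate_cycles(pow_ops))  # 9
```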

### SFU (Special Function Unit)

On NVIDIA GPUs, special functions run on a separate Special Function Unit, so they can overlap with ordinary arithmetic and cost almost nothing when mixed in with it. The SFU still runs at 1/4 of the arithmetic rate (1/8 on old Fermi cards), however, so several special functions in a row cost about the same as on more classical hardware.
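Below is a deliberately idealized comparison of the two behaviours described above. It assumes perfect overlap, no dependencies between instructions and no latency; the function names are hypothetical. In the classical model special functions add their cost to the arithmetic. In the SFU model the slower of the two pipelines bounds the total.

```python
def classic_cycles(alu_ops, sfu_ops, sfu_cost=4):
    # Classical model: special functions share the pipeline, so costs add up.
    return alu_ops + sfu_ops * sfu_cost


def sfu_overlap_cycles(alu_ops, sfu_ops, sfu_rate_divisor=4):
    # Idealized SFU model (assumption): the SFU runs alongside the ALU at
    # 1/sfu_rate_divisor of its rate, so the busier unit sets the total.
    # Use sfu_rate_divisor=8 for the 1/8 rate of old Fermi cards.
    return max(alu_ops, sfu_ops * sfu_rate_divisor)


print(classic_cycles(8, 2))                          # 16
print(sfu_overlap_cycles(8, 2))                      # 8: special functions hidden
print(sfu_overlap_cycles(0, 3))                      # 12: a run of special functions
print(sfu_overlap_cycles(8, 2, sfu_rate_divisor=8))  # 16 on a Fermi-like ratio
```

The third line shows the point made above: with nothing to overlap with, three special functions in a row cost the same 12 cycles as in the classical model.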

### Implementation of Built-in Functions

a / b == a * rcp(b)

1./a == rcp(a)

sqrt(a) == a * inversesqrt(a)

pow(a, b) == exp2(log2(a) * b)

exp(a) == exp2(a * constant), where constant = log2(e) ≈ 1.442695

normalize(a) == a * inversesqrt(dot(a,a))

mix(a, b, c) == (b-a) * c + a
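A small Python check that the expansions above are mathematically equivalent to the built-ins. rcp, inversesqrt and exp2 are hypothetical Python stand-ins for the GPU primitives. Python uses double precision, so this checks the algebra, not float32 GPU accuracy. Positive a and b are assumed so that log2 and pow are defined.

```python
import math


# Hypothetical stand-ins for the hardware primitives (double precision here).
def rcp(x):
    return 1.0 / x


def inversesqrt(x):
    return 1.0 / math.sqrt(x)


def exp2(x):
    return 2.0 ** x


LOG2_E = math.log2(math.e)  # the "constant" in exp(a) == exp2(a * constant)


def normalize_expanded(v):
    # normalize(v) == v * inversesqrt(dot(v, v))
    s = inversesqrt(sum(x * x for x in v))
    return [x * s for x in v]


def expansion_pairs(a, b, c):
    """(reference, expansion) pairs that should agree for a, b > 0."""
    return [
        (a / b, a * rcp(b)),
        (math.sqrt(a), a * inversesqrt(a)),
        (math.pow(a, b), exp2(math.log2(a) * b)),
        (math.exp(a), exp2(a * LOG2_E)),
        (a * (1 - c) + b * c, (b - a) * c + a),  # mix(a, b, c)
    ]


for ref, expanded in expansion_pairs(3.0, 2.5, 0.25):
    assert math.isclose(ref, expanded, rel_tol=1e-12)
```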

### Vectors and Matrices

Vectors are collections of scalars, so the cost of every operation on them is multiplied by the number of components.

So vec3 * vec3 is 3x as expensive as float * float.

vec3 * float is as expensive as vec3 * vec3, because the float is multiplied into each of the three components.

Matrix multiplication is not component-wise like vector or scalar multiplication: every output component is a dot product, which makes it much more expensive.

vec2 * mat2 is 4 cycles

vec3 * mat3 is 9 cycles

vec4 * mat4 is 16 cycles

mat2 * mat2 is 8 cycles

mat3 * mat3 is 27 cycles

mat4 * mat4 is 64 cycles!!!!
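The counts above follow from one rule: each output component of a matrix product is a dot product of length n, i.e. 1 multiply + (n - 1) MADs = n one-cycle ops. The sketch below reproduces the listed numbers under that assumption. The function names are hypothetical.

```python
def vector_op_cycles(components):
    # Component-wise ops (vec * vec, vec * float) cost one scalar op per
    # component; in vec3 * float the float is reused for each component.
    return components


def matmul_cycles(rows, inner, cols):
    # Each of the rows * cols outputs is a dot product of length `inner`,
    # i.e. 1 multiply + (inner - 1) MADs = `inner` one-cycle ops.
    return rows * inner * cols


for n in (2, 3, 4):
    vec_mat = matmul_cycles(1, n, n)  # vecN * matN (vector as a 1 x N row)
    mat_mat = matmul_cycles(n, n, n)  # matN * matN
    print(f"vec{n} * mat{n}: {vec_mat}, mat{n} * mat{n}: {mat_mat}")
```

Under this model, transforming a vec4 by two mat4s one after the other costs 16 + 16 = 32 cycles, while multiplying the matrices together first costs 64 + 16 = 80. It is therefore usually cheaper per invocation to apply the matrices to the vector in turn, or to combine them once outside the shader.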

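## Identities

These hold mathematically whenever every log argument and pow base is positive (GLSL leaves log undefined for x <= 0 and pow undefined for x < 0). On the GPU they hold only up to floating-point rounding. They are useful for removing special functions: for example, log(a) + log(b) needs two log calls, while log(a*b) needs one.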

exp(a+b) == exp(a) * exp(b)

pow(pow(a,b),c) == pow(a, b*c)

a / pow(b, c) == a * pow(b,-c)

log(a) + log(b) == log(a*b)

log(a/b) == log(a) - log(b)

log(pow(a,b)) == b * log(a)

log(sqrt(a)) == log(a) * 0.5

cross(a, cross(b, c)) == b * dot(a,c) - c * dot(a,b)
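A quick numerical sanity check of the identities above in Python. Positive a and b are chosen so every log and pow is defined. The cross/dot helpers are hypothetical Python versions of the GLSL built-ins. This confirms the algebra only, not float32 GPU behaviour. The cross-product case uses integer vectors, so it can be compared exactly.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def scalar_identity_pairs(a, b, c):
    """(left side, right side) pairs for the scalar identities, a, b > 0."""
    return [
        (math.exp(a + b), math.exp(a) * math.exp(b)),
        (math.pow(math.pow(a, b), c), math.pow(a, b * c)),
        (a / math.pow(b, c), a * math.pow(b, -c)),
        (math.log(a) + math.log(b), math.log(a * b)),
        (math.log(a / b), math.log(a) - math.log(b)),
        (math.log(math.pow(a, b)), b * math.log(a)),
        (math.log(math.sqrt(a)), math.log(a) * 0.5),
    ]


def bac_cab(a, b, c):
    # cross(a, cross(b, c)) == b * dot(a, c) - c * dot(a, b)
    lhs = cross(a, cross(b, c))
    rhs = [bi * dot(a, c) - ci * dot(a, b) for bi, ci in zip(b, c)]
    return lhs, rhs


for lhs, rhs in scalar_identity_pairs(3.0, 2.5, 0.5):
    assert math.isclose(lhs, rhs, rel_tol=1e-12)

print(bac_cab([1, 2, 3], [4, 5, 6], [7, 8, 10]))  # ([-12, 9, -2], [-12, 9, -2])
```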