The Linear Approximation

The linear approximation amounts essentially to replacing the curve f(x) near a point by a short segment of its tangent line at that point. When we zoom in far enough on the graph of a differentiable function, it looks like a straight line, i.e. the function and its tangent line become indistinguishable.

We start by approximating the increment of a function f when its independent variable changes from x0 to x0 + dx.
Let f: (a, b) → ℝ be differentiable at x0; as the independent variable is incremented from x0 to x0 + dx, the increment of f is:

Δf = f(x0 + dx) − f(x0)

In general, for fixed x0, this increment is not proportional to dx, i.e. it is not linear with respect to dx.

Fig. 1 — Approximation of a function by its tangent line; the differential df is the increment along the tangent, whose slope is tan α = f'(x0).

The increment along the tangent line is equal to the length (with sign!) of the segment QR, which equals dx tan α, that is f'(x0) dx, recalling that f'(x0) = tan α. This increment is proportional to dx, and is known as the differential of f at x0; it is denoted by df(x0):

df(x0) = f'(x0) dx

Thus whereas Δf represents the increment of the “y-coordinate” along the graph of f at x0, df represents the increment along the tangent line to this graph at x0.
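As a first numerical illustration (our own sketch; f = sin and x0 = 1 are arbitrary choices, not from the text), we can compare the two increments:

    # Compare the increment Δf along the graph with the differential
    # df = f'(x0) dx along the tangent line, for f = sin at x0 = 1.
    import math

    x0 = 1.0
    for dx in (0.1, 0.01, 0.001):
        delta_f = math.sin(x0 + dx) - math.sin(x0)  # Δf, along the graph
        df = math.cos(x0) * dx                      # df, along the tangent
        print(dx, delta_f, df)

The two values agree more and more closely as dx shrinks.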

What is the error of the approximation, that is, the difference between Δf and df(x0)? To answer, just recall the definition of the derivative at a point:

f'(x0) = limdx ⟶ 0 [f(x0 + dx) − f(x0)]/dx

We can write the derivative without the limit symbol as:

[f(x0 + dx) − f(x0)]/dx = f'(x0) + ε(dx)

where ε(dx) is a quantity that tends to 0 as dx ⟶ 0, i.e. an infinitesimal quantity as dx ⟶ 0.

By multiplying both sides by dx we have

f(x0 + dx) − f(x0) = f'(x0) dx + dx ε(dx)

or equivalently:

Δf − df(x0) = dx ε(dx)

The quantity dx ε(dx), divided by dx, tends to zero: this means that dx ε(dx) is an infinitesimal of higher order than dx (a numeric check follows below). We now introduce a notation useful in these circumstances.
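As the promised check (continuing our sketch above, again with f = sin and x0 = 1, our own choices), the ratio ε(dx) = (Δf − df)/dx shrinks together with dx:

    # ε(dx) = (Δf − df)/dx should tend to 0 as dx → 0.
    import math

    x0 = 1.0
    for dx in (0.1, 0.01, 0.001, 0.0001):
        delta_f = math.sin(x0 + dx) - math.sin(x0)  # Δf
        df = math.cos(x0) * dx                      # df = f'(x0) dx
        print(dx, (delta_f - df) / dx)              # shrinks proportionally to dx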

Orders of Smallness; the little-o Symbol

We often want to compare two functions as a variable approaches some limit value x0. A useful notation, which we already introduced for asymptotic estimates of sequences, is Landau notation (“big-O” and “little-o”), used to describe the limiting behavior of a function.

Definition 6.1.1.

f(x) = o(g(x)) as x ⟶ x0   if   limx ⟶ x0 f(x)/g(x) = 0

Notation: f is o(g), f ∈ o(g(x)), f ∈ o(g), f = o(g).

The expression f(x) = o(g(x)) is read “f is little-o of g at x0” or “f is of smaller order than g at x0 (as x approaches x0)”.  □

The following expressions are also used.

f(x) = g(x) + o(φ(x)) as x ⟶ x0   which means   limx ⟶ x0 [f(x) − g(x)]/φ(x) = 0

that is, “the difference between f and g is of smaller order than φ as x approaches x0”. For example, for x ⟶ 0, x2 = o(x); x3 = o(x); x3 = o(x2).
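A quick numeric sketch of these examples (our own illustration, not from the text): the ratios x2/x, x3/x and x3/x2 all vanish as x approaches 0.

    # Ratios f(x)/g(x) tending to 0 witness f = o(g) as x → 0.
    for x in (0.1, 0.01, 0.001):
        print(x, x**2 / x, x**3 / x, x**3 / x**2)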

The growth of a function representing the complexity of an algorithm, as its variable increases, can be estimated using big-O notation.

Definition 6.1.2. (big-Oh) For functions f, g: ℝ ⟶ ℝ or f, g: ℕ ⟶ ℝ (sequences of real numbers), g dominates f if there exist constants C and k such that

|f(x)| ≤ C |g(x)|   ∀x > k

Notation: f is O(g), f ∈ O(g(x)), f ∈ O(g), f = O(g).
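For instance (our own example), f(n) = 3n2 + 10n is O(n2): taking C = 4 and k = 10, for every n > 10 we have 10n < n2 and hence 3n2 + 10n ≤ 4n2. A finite sanity check, not a proof:

    # Check |f(n)| ≤ C|g(n)| for n > k, with f(n) = 3n² + 10n, g(n) = n².
    C, k = 4, 10
    assert all(3*n**2 + 10*n <= C * n**2 for n in range(k + 1, 10000))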

Proposition 6.1.3 (Little-Oh is Stronger than Big-Oh). If f(x) = o(g(x)) as x ⟶ x0, then f(x) = O(g(x)) as x ⟶ x0.

The symbol o(..) (or O(..)) does not denote a particular function, but a set of functions, each having the property expressed by Definition 6.1.1. The equality sign in the expression f(x) = o(g(x)) obscures the fact that o(g(x)) is a set, and writing f(x) ∈ o(g(x)) would make much more sense.

Proposition 6.1.4. We have the following properties, given two functions f, g: ℝ ⟶ ℝ:

  1. o(x) ± o(x) = o(x)

  2. for x ⟶ 0, o(x) + o(x2) = o(x).

  3. for x ⟶ +∞, o(x) + o(x2) = o(x2).

  4. o(x) = −o(x)  and  o(−4x2) = o(x2)  (multiplicative coefficients are irrelevant).

  5. xn = o(xm),   x ⟶ 0  ⇐⇒  n > m.

  6. f ⋅ o(g) = o(fg), e.g.

    • x ⋅ o(x2) = o(x3)

    • o(x3)/x = o(x2).

    • o(x2)/x2 = o(1).

  7. o(f) ⋅ o(g) = o(fg), e.g. o(x) o(x2) = o(x3) (a numerical check follows the proof below).

    Proof. (i) o(x) − o(x) means the difference between two sets. Given two sets A, B, then A − B (different from A \ B) is defined by A − B = {a − b | a ∈ A, b ∈ B}, e.g. ℤ − ℤ = ℤ (but ℤ \ ℤ = ∅); the definition of A + B is analogous. It follows directly from the definition of little-o that the sum of two (and hence of any finite number of) little-o functions is a little-o function, and that multiplying a little-o function by a constant yields another little-o function.

    (ii) If f(x) ∈ o(x), then limx ⟶ 0 f(x)/x = 0; and if g(x) ∈ o(x2), then limx ⟶ 0 g(x)/x2 = 0, hence also limx ⟶ 0 g(x)/x = limx ⟶ 0 [g(x)/x2] ⋅ x = 0. It follows that if f(x) + g(x) ∈ o(x) + o(x2), then limx ⟶ 0 [f(x) + g(x)]/x = 0, that is f(x) + g(x) ∈ o(x); thus o(x) + o(x2) ⊆ o(x). The other inclusion o(x) + o(x2) ⊇ o(x) is obvious, hence we have equality.

    (v) In fact limx ⟶ 0 xn/xm = limx ⟶ 0 xn−m = 0 if and only if n − m > 0.  □
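Here is the numerical check promised for property 7 (our own sketch): take f(x) = x2 ∈ o(x) and g(x) = x3 ∈ o(x2) as x ⟶ 0; their product x5 should then lie in o(x3).

    # f(x) = x² ∈ o(x), g(x) = x³ ∈ o(x²); the product should be o(x³),
    # i.e. f(x)·g(x)/x³ → 0 as x → 0.
    for x in (0.1, 0.01, 0.001):
        print(x, (x**2 * x**3) / x**3)  # equals x², tends to 0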

Definition 6.1.5. We say that f(x) and g(x) are equivalent infinitesimals as x ⟶ x0 if

limx ⟶ x0 f(x)/g(x) = 1

In particular, the symbol o(1) indicates an infinitesimal quantity. For example we write f(x) = o(1) for xx0 to indicate that f is infinitesimal, since:

limx ⟶ x0 f(x)/1 = 0

One writes 1 + x + x2 = 1 + x + o(x) = 1 + o(1), where the first “=” means “element of” and the second “subset of” (the implicit limit being x ⟶ 0).

For the functions cos x and sin x the linearization process at x = 0 gives

1 − cos x = o(x)   and   sin x = x + o(x)   as x ⟶ 0

When we studied asymptotic comparison we saw that, for x ⟶ 0,

sin x ~ x  hence  sin x = x + o(x)
1 − cos x ~ x2/2  hence  1 − cos x = o(x)
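Both equivalences are easy to check numerically (our own sketch): the two ratios approach 1 as x ⟶ 0.

    # sin x / x → 1 and (1 − cos x)/(x²/2) → 1 as x → 0.
    import math

    for x in (0.5, 0.1, 0.01):
        print(x, math.sin(x) / x, (1 - math.cos(x)) / (x**2 / 2))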

The asymptotic relation ~ and the little-o notation are indeed equivalent.

Theorem 6.1.6. (Relation between "~" and o()). The following equivalence property holds

for xx0, f(x) ~ g(x)  iff f(x) = g(x) + o(g(x))  □

The derivative with the little-o symbol

The definition of derivative can be written with the little-o symbol as:

f(x0 + dx) − f(x0) = f'(x0) dx + o(dx)   for dx ⟶ 0

This is known as the first-order linear approximation. It can be written more succinctly as

Δf(x0) = df(x0) + o(dx)   for dx ⟶ 0

and can also be expressed as:

Δf(x0) ≈ df(x0)   near x0
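A classic use of the first-order approximation (our own worked example): to estimate √4.1, take f(x) = √x, x0 = 4 and dx = 0.1; then f'(x0) = 1/(2√4) = 1/4, so √4.1 ≈ 2 + 0.1/4 = 2.025.

    # Estimate f(x0 + dx) by f(x0) + f'(x0) dx, with f = sqrt, x0 = 4, dx = 0.1.
    import math

    x0, dx = 4.0, 0.1
    approx = math.sqrt(x0) + dx / (2 * math.sqrt(x0))  # 2 + 0.1/4 = 2.025
    exact = math.sqrt(x0 + dx)                         # ≈ 2.02485
    print(approx, exact, abs(approx - exact))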

The rules for evaluating differentials are analogous to those for derivatives:

d(f±g) = df ± dg

d(fg) = g df + f dg

d(f/g) = (g df − f dg)/g2

Note also that

df/f = d log|f|

which is known as the logarithmic differential of f.
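The product and quotient rules can be checked symbolically; here is a sketch using SymPy (assuming it is available), with f = sin x and g = ex as our own sample functions:

    # Verify d(fg) = g df + f dg and d(f/g) = (g df − f dg)/g² on sample f, g.
    import sympy as sp

    x = sp.symbols('x')
    f, g = sp.sin(x), sp.exp(x)
    assert sp.simplify(sp.diff(f*g, x) - (g*sp.diff(f, x) + f*sp.diff(g, x))) == 0
    assert sp.simplify(sp.diff(f/g, x) - (g*sp.diff(f, x) - f*sp.diff(g, x))/g**2) == 0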

Example 6.1.7. Evaluate the linear approximation of

f(x) = √(1 + x)

for x ⟶ 0. Here x0 = 0, hence x coincides with the increment dx. We have

f'(x) = 1/(2√(1 + x)),   f'(0) = 1/2

hence

√(1 + x) − 1 ≈ x/2

or equivalently

√(1 + x) = 1 + x/2 + o(x),   for x ⟶ 0

More generally, considering the function f(x) = (1 + x)α, we have

f'(x) = α(1 + x)α−1,   f'(0) = α

and we obtain

(1 + x)α − 1 ≈ αx. ■
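A numeric check of this last approximation (our own example, with α = 1/3): the cube root of 1.03 should be close to 1 + 0.03/3 = 1.01.

    # (1 + x)^α ≈ 1 + αx for small x; here α = 1/3, x = 0.03.
    alpha, x = 1/3, 0.03
    approx = 1 + alpha * x
    exact = (1 + x) ** alpha
    print(approx, exact, abs(approx - exact))  # the error is o(x)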
