The Linear Approximation
The linear approximation amounts essentially to replacing the curve y = f(x) near a point by a short segment of its tangent line at that point. When we zoom in far enough on the graph of a differentiable function, it looks like a straight line, i.e. the function and its tangent line become indistinguishable.
We start by approximating the increment of a function f, when its independent variable changes from x0 to x0 + dx.
Let f: (a, b) → ℝ be differentiable at x0; as the independent variable is incremented from x0 to x0 + dx, the increment of f is:
Δf = f(x0 + dx) − f(x0)
In general, for fixed x0, this increment is not proportional to dx, i.e. it is not linear with respect to dx.
The increment along the tangent line is equal to the length (with sign!) of the segment QR, that is, to (tg α) dx = f'(x0) dx, recalling that f'(x0) = tg α. This increment is proportional to dx, is known as the differential of f at x0, and is denoted by df(x0):
df(x0) = f'(x0)dx
Thus whereas Δf represents the increment of the “y-coordinate” along the graph of f at x0, df represents the increment along the tangent line to this graph at x0.
What is the error of the approximation, that is, the difference between Δf and df(x0)? To answer, recall the definition of the derivative at a point:

f'(x0) = limdx ⟶ 0 [f(x0 + dx) − f(x0)]/dx
We can write the derivative without the limit symbol as:

[f(x0 + dx) − f(x0)]/dx = f'(x0) + ε(dx)

where ε(dx) is a quantity that tends to 0 as dx ⟶ 0, that is, an infinitesimal quantity for dx ⟶ 0.
By multiplying both sides by dx we have
f(x0 + dx) − f(x0) = f'(x0) dx + dx ⋅ ε(dx)
or equivalently:
Δf − df(x0) = dx ⋅ ε(dx)
The quantity dx ⋅ ε(dx), divided by dx, tends to zero: this means that dx ⋅ ε(dx) is an infinitesimal of higher order than dx. We now introduce a notation useful in these circumstances.
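As a quick numerical sanity check (a sketch in Python; the choices f(x) = sin x and x0 = 0.5 are arbitrary examples, not from the text), one can verify that the error Δf − df(x0) = dx ⋅ ε(dx), once divided by dx, shrinks as dx ⟶ 0:

```python
import math

def f(x):
    return math.sin(x)

def fprime(x):
    return math.cos(x)

x0 = 0.5
for dx in [1e-1, 1e-2, 1e-3, 1e-4]:
    delta_f = f(x0 + dx) - f(x0)   # increment along the graph
    df = fprime(x0) * dx           # increment along the tangent line
    eps = (delta_f - df) / dx      # this is eps(dx): it shrinks with dx
    print(dx, eps)
```

Each printed ε(dx) is roughly proportional to dx, consistent with dx ⋅ ε(dx) being of higher order than dx.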
Orders of Smallness; the Little-o Symbol
We often want to compare two functions as a variable approaches some limit value x0. A useful notation, which we already introduced for asymptotic estimates of sequences, is the Landau notation (“big-O” and “little-o”), used to describe the limiting behavior of a function.
Definition 6.1.1.
f(x) = o(g(x)) as x ⟶ x0 if

limx ⟶ x0 f(x)/g(x) = 0
Notation: f is o(g), f ∈ o(g(x)), f ∈ o(g), f = o(g).
The expression f(x) = o(g(x)) is read “f is little-o of g at x0” or “f is of smaller order than g as x approaches x0”. □
The following expressions are also used.
f(x) = g(x) + o(φ(x)) as x ⟶ x0, which means limx ⟶ x0 [f(x) − g(x)]/φ(x) = 0,
that is, “the difference between f and g is of smaller order than φ as x approaches x0”. For example, for x ⟶ 0: x2 = o(x); x3 = o(x); x3 = o(x2).
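These examples can be checked numerically (an illustrative sketch; the sample points are chosen arbitrarily): the ratios x2/x and x3/x2 both tend to 0 as x ⟶ 0:

```python
# x**2 = o(x) and x**3 = o(x**2) as x -> 0:
# both ratios below equal x, hence they tend to 0.
for x in [1e-1, 1e-2, 1e-3]:
    print(x, (x**2) / x, (x**3) / x**2)
```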
The big-O notation can be used, for instance, to estimate the growth, as its variable increases, of a function representing the complexity of an algorithm.
Definition 6.1.2. (big-Oh) For functions f,g: ℝ ⟶ ℝ or f,g: ℕ ⟶ ℝ (sequences of real numbers) g dominates f if there exist constants C and k such that
|f(x)| ≤ C |g(x)| ∀x > k
Notation: f is O(g), f ∈ O(g(x)), f ∈ O(g), f = O(g).
Proposition 6.1.3 (Little-o is Stronger than Big-O). If f(x) = o(g(x)) as x ⟶ x0, then f(x) = O(g(x)) as x ⟶ x0.
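Definition 6.1.2 can be illustrated numerically (the functions and constants here are example choices, not from the text): f(x) = 3x2 + 5 is O(x2), witnessed by C = 4 and k = 3, which we spot-check on a sample of points:

```python
# |f(x)| <= C*|g(x)| for all x > k, checked on a range of sample points.
f = lambda x: 3 * x**2 + 5
g = lambda x: x**2
C, k = 4, 3
print(all(abs(f(x)) <= C * abs(g(x)) for x in range(k + 1, 1000)))
```

Of course a finite sample is not a proof; here 3x2 + 5 ≤ 4x2 holds exactly when x2 ≥ 5, so any k ≥ 3 works.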
The symbol o(..) (or O(..)) does not denote a particular function, but a set of functions each one having the property expressed by definition 6.1.1. The equality sign in the definition f(x) = o(g(x)) obscures the fact that o(g(x)) is a set and writing f(x) ∈ o(g(x)) would make much more sense.
Proposition 6.1.4. Given two functions f, g: ℝ ⟶ ℝ, the following properties hold:
o(x) ± o(x) = o(x)
for x ⟶ 0, o(x) + o(x2) = o(x).
for x ⟶ +∞, o(x) + o(x2) = o(x2).
o(x) = −o(x) and o(−4x2) = o(x2) (multiplicative coefficients are irrelevant).
xn = o(xm), x ⟶ 0 ⇐⇒ n > m.
f ⋅ o(g) = o(f ⋅ g), e.g.
x ⋅ o(x2) = o(x3)
o(x3) /x = o(x2).
o(x2) /x2 = o(1).
o(f) ⋅ o(g) = o(f ⋅ g), e.g. o(x) o(x2) = o(x3).
Proof. (i) o(x) − o(x) means the difference between two sets. Given two sets A, B, then A − B (different from A \ B) is defined by A − B = {a − b | a ∈ A, b ∈ B}, e.g. ℤ − ℤ = ℤ (but ℤ \ ℤ = ∅); the definition of A + B is analogous. It follows directly from the definition of little-o that the sum of two (and hence of any finite number of) little-o functions is a little-o function, and that multiplying a little-o function by a constant yields another little-o function.
(ii) If f(x) ∈ o(x), then limx ⟶ 0 f(x)/x = 0, and if g(x) ∈ o(x2), then limx ⟶ 0 g(x)/x2 = 0. From this it follows that if f(x) + g(x) ∈ o(x) + o(x2), then limx ⟶ 0 [f(x) + g(x)]/x = 0, that is, f(x) + g(x) ∈ o(x); thus o(x) + o(x2) ⊆ o(x). The other inclusion, o(x) + o(x2) ⊇ o(x), is obvious (take the second summand to be 0), hence we have equality.
(v) In fact limx ⟶ 0 xn/xm = limx ⟶ 0 xn − m = 0 if and only if n − m > 0. □
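Property (ii) can be illustrated numerically (a sketch with sample functions chosen here, not from the text): take f(x) = x2 ∈ o(x) and g(x) = x3 ∈ o(x2); their sum, divided by x, tends to 0, so f + g ∈ o(x):

```python
f = lambda x: x**2   # f(x)/x = x -> 0, so f is in o(x)
g = lambda x: x**3   # g(x)/x**2 = x -> 0, so g is in o(x**2)
for x in [1e-1, 1e-2, 1e-3]:
    print(x, (f(x) + g(x)) / x)   # ratio tends to 0: f + g is in o(x)
```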
Definition 6.1.5. We say that f(x) and g(x) are equivalent infinitesimals as x ⟶ x0, and write f(x) ~ g(x), if

limx ⟶ x0 f(x)/g(x) = 1
In particular, the symbol o(1) indicates an infinitesimal quantity. For example, we write f(x) = o(1) for x ⟶ x0 to indicate that f is infinitesimal, since:

limx ⟶ x0 f(x)/1 = limx ⟶ x0 f(x) = 0
One writes 1 + x + x2 = 1 + x + o(x) = 1 + o(1), where “=” means first “element of”, then “subset of” (the implicit limit is x ⟶ 0).
For the functions cos x and sin x, the linearization process at x = 0 gives
1 − cos x = o(x) and sin x = x + o(x) as x ⟶ 0
When we studied asymptotic comparison we saw that, for x ⟶ 0,
sin x ~ x, hence sin x = x + o(x);
1 − cos x ~ x2/2, hence 1 − cos x = x2/2 + o(x2), and in particular 1 − cos x = o(x).
The asymptotic relation ~ and the little-o notation are indeed equivalent.
Theorem 6.1.6. (Relation between "~" and o()). The following equivalence property holds
for x ⟶ x0, f(x) ~ g(x) iff f(x) = g(x) + o(g(x)) □
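As a numerical illustration of the theorem (the choices f(x) = sin x, g(x) = x and the sample points are arbitrary examples): sin x/x tends to 1 exactly as (sin x − x)/x tends to 0:

```python
import math

# f ~ g iff f = g + o(g): check both sides of the equivalence numerically.
for x in [1e-1, 1e-2, 1e-3]:
    print(x, math.sin(x) / x, (math.sin(x) - x) / x)
```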
The derivative with the little-o symbol
The definition of derivative can be written with the little-oh symbol as:
f(x0 + dx) − f(x0) = f'(x0) dx + o(dx) for dx ⟶ 0
This is known as the first-order linear approximation. It can be written more succinctly as
Δf(x0) = df(x0) + o(dx) for dx ⟶ 0
and can also be expressed as:
Δf(x0) ≈ df(x0) for small dx
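A small numerical sketch (with f(x) = x3 at x0 = 2 chosen here for illustration) shows Δf(x0) and df(x0) agreeing up to a quantity much smaller than dx:

```python
f = lambda x: x**3
fp = lambda x: 3 * x**2       # derivative of f
x0, dx = 2.0, 1e-3
delta_f = f(x0 + dx) - f(x0)  # exact increment along the graph
df = fp(x0) * dx              # increment along the tangent line
print(delta_f, df, delta_f - df)   # the difference is o(dx)
```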
The rules for evaluating differentials are analogous to those for derivatives:
d(f±g) = df ± dg
d(f ⋅ g) = g df + f dg
d(f/g) = (gdf − fdg)/g2
Note also that

d(log f) = df/f

which is known as the logarithmic differential of f.
Example 6.1.7. Evaluate the linear approximation of

f(x) = √(1 + x)

for x ⟶ 0. Here x0 = 0, hence x coincides with the increment dx. Since

f'(x) = 1/(2√(1 + x)), f'(0) = 1/2,

we have

√(1 + x) − 1 ≈ x/2

or equivalently

√(1 + x) ≈ 1 + x/2.
More generally, considering the function f(x) = (1 + x)α, we have

f'(x) = α(1 + x)α − 1, f'(0) = α,
and we obtain
(1 + x)α − 1 ≈ αx. ■
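As a numerical sketch of this approximation (the values α = 1/2 and x = 0.02 are chosen here for illustration), (1 + x)α ≈ 1 + αx estimates √1.02 as 1.01, close to the true value:

```python
alpha, x = 0.5, 0.02
approx = 1 + alpha * x       # linear approximation: 1 + alpha*x
exact = (1 + x) ** alpha     # the true value of sqrt(1.02)
print(approx, exact, abs(approx - exact))
```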