The Linear Approximation
The linear approximation amounts essentially to replacing the curve y = f(x) near a point by a short segment of its tangent line at that point. When we zoom in far enough on the graph of a differentiable function, it looks like a straight line, i.e. the function and its tangent line become indistinguishable.
We start by approximating the increment of a function f, when its independent variable changes from x0 to x0 + dx.
Let f: (a, b) → ℝ be differentiable at x0; as the independent variable is incremented from x0 to x0 + dx, the increment of f is:
Δf = f(x0 + dx) − f(x0)
In general, for fixed x0, this increment is not proportional to dx, i.e. it is not linear with respect to dx.
The increment along the tangent line is equal to the length (with sign!) of the segment QR, that is, to (tg α) dx = f'(x0) dx, recalling that f'(x0) = tg α. This increment is proportional to dx, is known as the differential of f at x0, and is denoted by df(x0):
df(x0) = f'(x0)dx
Thus whereas Δf represents the increment of the “y-coordinate” along the graph of f at x0, df represents the increment along the tangent line to this graph at x0.
What is the error of the approximation, that is, the difference between Δf and df(x0)? To answer, recall the definition of the derivative at a point:

f'(x0) = limdx ⟶ 0 [f(x0 + dx) − f(x0)]/dx
We can write the derivative without the limit symbol as:

[f(x0 + dx) − f(x0)]/dx = f'(x0) + ε(dx)

where ε(dx) is a quantity that tends to 0 as dx ⟶ 0, that is, an infinitesimal quantity for dx ⟶ 0.
By multiplying both sides by dx we have
f(x0 + dx) − f(x0) = f'(x0) dx + dx ⋅ ε(dx)
or equivalently:
Δf − df(x0) = dx ⋅ ε(dx)
The quantity dx ⋅ ε(dx), divided by dx, tends to zero: this means that dx ⋅ ε(dx) is an infinitesimal of higher order than dx. We now introduce a notation useful in these circumstances.
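As a quick numerical sanity check (a sketch in Python; the choices f(x) = sin x and x0 = 0.5 are arbitrary examples, not from the text), one can verify that the error Δf − df(x0) = dx ⋅ ε(dx), once divided by dx, shrinks as dx ⟶ 0:

```python
import math

def f(x):
    return math.sin(x)

def fprime(x):
    return math.cos(x)

x0 = 0.5
for dx in [1e-1, 1e-2, 1e-3, 1e-4]:
    delta_f = f(x0 + dx) - f(x0)   # increment along the graph
    df = fprime(x0) * dx           # increment along the tangent line
    eps = (delta_f - df) / dx      # this is eps(dx): it shrinks with dx
    print(dx, eps)
```

Each printed ε(dx) is roughly proportional to dx, consistent with dx ⋅ ε(dx) being of higher order than dx.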
Orders of Smallness; the Little-o Symbol
We often want to compare two functions as a variable approaches some limit value x0. A useful notation, which we already introduced for asymptotic estimates of sequences, is the Landau notation (“big-O” and “little-o”), used to describe the limiting behavior of a function.
Definition 6.1.1.
f(x) = o(g(x)) as x ⟶ x0 if

limx ⟶ x0 f(x)/g(x) = 0
Notation: f is o(g), f ∈ o(g(x)), f ∈ o(g), f = o(g).
The expression f(x) = o(g(x)) is read “f is little-o of g at x0” or “f is of smaller order than g as x approaches x0”. □
The following expressions are also used.
f(x) = g(x) + o(φ(x)) as x ⟶ x0, which means limx ⟶ x0 [f(x) − g(x)]/φ(x) = 0,
that is, “the difference between f and g is of smaller order than φ as x approaches x0”. For example, for x ⟶ 0: x2 = o(x); x3 = o(x); x3 = o(x2).
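These examples can be checked numerically (an illustrative sketch; the sample points are chosen arbitrarily): the ratios x2/x and x3/x2 both tend to 0 as x ⟶ 0:

```python
# x**2 = o(x) and x**3 = o(x**2) as x -> 0:
# both ratios below equal x, hence they tend to 0.
for x in [1e-1, 1e-2, 1e-3]:
    print(x, (x**2) / x, (x**3) / x**2)
```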
The big-O notation can be used, for instance, to estimate the growth, as its variable increases, of a function representing the complexity of an algorithm.
Definition 6.1.2. (big-Oh) For functions f,g: ℝ ⟶ ℝ or f,g: ℕ ⟶ ℝ (sequences of real numbers) g dominates f if there exist constants C and k such that
|f(x)| ≤ C |g(x)| ∀x > k
Notation: f is O(g), f ∈ O(g(x)), f ∈ O(g), f = O(g).
Proposition 6.1.3 (Little-o is Stronger than Big-O). If f(x) = o(g(x)) as x ⟶ x0, then f(x) = O(g(x)) as x ⟶ x0.
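Definition 6.1.2 can be illustrated numerically (the functions and constants here are example choices, not from the text): f(x) = 3x2 + 5 is O(x2), witnessed by C = 4 and k = 3, which we spot-check on a sample of points:

```python
# |f(x)| <= C*|g(x)| for all x > k, checked on a range of sample points.
f = lambda x: 3 * x**2 + 5
g = lambda x: x**2
C, k = 4, 3
print(all(abs(f(x)) <= C * abs(g(x)) for x in range(k + 1, 1000)))
```

Of course a finite sample is not a proof; here 3x2 + 5 ≤ 4x2 holds exactly when x2 ≥ 5, so any k ≥ 3 works.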
The symbol o(..) (or O(..)) does not denote a particular function, but a set of functions each one having the property expressed by definition 6.1.1. The equality sign in the definition f(x) = o(g(x)) obscures the fact that o(g(x)) is a set and writing f(x) ∈ o(g(x)) would make much more sense.
Proposition 6.1.4. Given two functions f, g: ℝ ⟶ ℝ, the following properties hold:
o(x) ± o(x) = o(x)
for x ⟶ 0, o(x) + o(x2) = o(x).
for x ⟶ +∞, o(x) + o(x2) = o(x2).
o(x) = −o(x) and o(−4x2) = o(x2) (multiplicative coefficients are irrelevant).
xn = o(xm), x ⟶ 0 ⇐⇒ n > m.
f ⋅ o(g) = o(f ⋅ g), e.g.
x ⋅ o(x2) = o(x3)
o(x3) /x = o(x2).
o(x2) /x2 = o(1).
o(f) ⋅ o(g) = o(f ⋅ g), e.g. o(x) o(x2) = o(x3).
Proof. (i) o(x) − o(x) means the difference between two sets. Given two sets A, B, then A − B (different from A \ B) is defined by A − B = {a − b | a ∈ A, b ∈ B}, e.g. ℤ − ℤ = ℤ (but ℤ \ ℤ = ∅); the definition of A + B is analogous. It follows directly from the definition of little-o that the sum of two (and hence of any finite number of) little-o functions is a little-o function, and that multiplying a little-o function by a constant yields another little-o function.
(ii) If f(x) ∈ o(x), then limx ⟶ 0 f(x)/x = 0, and if g(x) ∈ o(x2), then limx ⟶ 0 g(x)/x2 = 0. From this it follows that if f(x) + g(x) ∈ o(x) + o(x2), then limx ⟶ 0 [f(x) + g(x)]/x = 0, that is, f(x) + g(x) ∈ o(x); thus o(x) + o(x2) ⊆ o(x). The other inclusion, o(x) + o(x2) ⊇ o(x), is obvious (take the second summand to be 0), hence we have equality.
(v) In fact limx ⟶ 0 xn/xm = limx ⟶ 0 xn − m = 0 if and only if n − m > 0. □
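Property (ii) can be illustrated numerically (a sketch with sample functions chosen here, not from the text): take f(x) = x2 ∈ o(x) and g(x) = x3 ∈ o(x2); their sum, divided by x, tends to 0, so f + g ∈ o(x):

```python
f = lambda x: x**2   # f(x)/x = x -> 0, so f is in o(x)
g = lambda x: x**3   # g(x)/x**2 = x -> 0, so g is in o(x**2)
for x in [1e-1, 1e-2, 1e-3]:
    print(x, (f(x) + g(x)) / x)   # ratio tends to 0: f + g is in o(x)
```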
Definition 6.1.5. We say that f(x) and g(x) are equivalent infinitesimals as x ⟶ x0, and write f(x) ~ g(x), if

limx ⟶ x0 f(x)/g(x) = 1
In particular, the symbol o(1) indicates an infinitesimal quantity. For example, we write f(x) = o(1) for x ⟶ x0 to indicate that f is infinitesimal, since:

limx ⟶ x0 f(x)/1 = limx ⟶ x0 f(x) = 0
One writes 1 + x + x2 = 1 + x + o(x) = 1 + o(1), where “=” means first “element of”, then “subset of” (the implicit limit is x ⟶ 0).
For the functions cos x and sin x, the linearization process at x = 0 gives
1 − cos x = o(x) and sin x = x + o(x) as x ⟶ 0
When we studied asymptotic comparison we saw that, for x ⟶ 0,
sin x ~ x, hence sin x = x + o(x);
1 − cos x ~ x2/2, hence 1 − cos x = x2/2 + o(x2), and in particular 1 − cos x = o(x).
The asymptotic relation ~ and the little-o notation are indeed equivalent.
Theorem 6.1.6. (Relation between "~" and o()). The following equivalence property holds
for x ⟶ x0, f(x) ~ g(x) iff f(x) = g(x) + o(g(x)) □
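As a numerical illustration of the theorem (the choices f(x) = sin x, g(x) = x and the sample points are arbitrary examples): sin x/x tends to 1 exactly as (sin x − x)/x tends to 0:

```python
import math

# f ~ g iff f = g + o(g): check both sides of the equivalence numerically.
for x in [1e-1, 1e-2, 1e-3]:
    print(x, math.sin(x) / x, (math.sin(x) - x) / x)
```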
The derivative with the little-o symbol
The definition of derivative can be written with the little-oh symbol as:
f(x0 + dx) − f(x0) = f'(x0) dx + o(dx) for dx ⟶ 0
This is known as the first-order linear approximation. It can be written more succinctly as
Δf(x0) = df(x0) + o(dx) for dx ⟶ 0
and can also be expressed as:
Δf(x0) ≈ df(x0) for small dx
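A small numerical sketch (with f(x) = x3 at x0 = 2 chosen here for illustration) shows Δf(x0) and df(x0) agreeing up to a quantity much smaller than dx:

```python
f = lambda x: x**3
fp = lambda x: 3 * x**2       # derivative of f
x0, dx = 2.0, 1e-3
delta_f = f(x0 + dx) - f(x0)  # exact increment along the graph
df = fp(x0) * dx              # increment along the tangent line
print(delta_f, df, delta_f - df)   # the difference is o(dx)
```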
The rules for evaluating differentials are analogous to those for derivatives:
d(f±g) = df ± dg
d(f ⋅ g) = g df + f dg
d(f/g) = (gdf − fdg)/g2
Note also that

d(log f) = df/f

which is known as the logarithmic differential of f.
Example 6.1.7. Evaluate the linear approximation of

f(x) = √(1 + x)

for x ⟶ 0. Here x0 = 0, hence x coincides with the increment dx. Since

f'(x) = 1/(2√(1 + x)), f'(0) = 1/2,

we have

√(1 + x) − 1 ≈ x/2

or equivalently

√(1 + x) ≈ 1 + x/2.
More generally, considering the function f(x) = (1 + x)α, we have

f'(x) = α(1 + x)α − 1, f'(0) = α,
and we obtain
(1 + x)α − 1 ≈ αx. ■
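As a numerical sketch of this approximation (the values α = 1/2 and x = 0.02 are chosen here for illustration), (1 + x)α ≈ 1 + αx estimates √1.02 as 1.01, close to the true value:

```python
alpha, x = 0.5, 0.02
approx = 1 + alpha * x       # linear approximation: 1 + alpha*x
exact = (1 + x) ** alpha     # the true value of sqrt(1.02)
print(approx, exact, abs(approx - exact))
```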