Gradient and Directional Derivatives: Calculus 3 project by Charles Tang and Hengyuan Zhang
Overview
This project was completed as part of a standard 10-week asynchronous online Calculus 3 course, with optional WebEx sessions, during the Summer 2021 semester at MassBay Community College, Wellesley Hills, MA.
Introduction
Imagine standing on a mountain blindfolded, tasked with reaching the peak. You feel around the surface you are standing on with your foot and realize that one direction climbs more steeply than any other. You take a step in that direction and repeat the process until you reach the peak. Your path might look something like the image below. To the right of the mountain is an image showing what your path might look like if you were to draw level curves along the mountain.
*The vector notation used below is column-vector notation. A column vector is simply the transpose (a "flipped" version) of the corresponding row vector and means exactly the same thing.
For example, the row vector \(<a,b,c>\) would be \(\begin{align} \begin{bmatrix} a\\ b\\ c \end{bmatrix} \end{align}\) in column-vector notation. In more advanced courses, such as linear algebra and advanced calculus, you will see more of the column-vector notation.
What you just performed was an iterative process that uses the gradient of a function. In this example, the function is the height of the mountain.
The Gradient
The gradient of a function of several variables is a vector whose components are the partial derivatives of the function. Because there is one such vector at every point, the gradient is itself a vector field. It is denoted with the nabla symbol, \(\nabla f(x,y)\), or written grad \( f\), for some function \(f\).
Definition of Gradient
Gradient of \(f(x,y)\): Let \(z=f(x, y)\) be a function of two variables x and y such that all partial derivatives exist. The vector \(\nabla f(x,y)\) is called the gradient of \(f\) and is defined as
\(\vec \nabla f(x,y)=\begin{align} \begin{bmatrix} \dfrac{\partial f}{\partial x}\\ \dfrac{\partial f}{\partial y} \end{bmatrix} \end{align}\)
Gradient of \(f(x,y,z)\): Let \(f(x, y, z)\) be a function of three variables x, y, and z such that all partial derivatives exist. The vector \(\nabla f(x,y,z)\) is called the gradient of \(f\) and is defined as
\(\vec \nabla f(x,y,z)=\begin{align} \begin{bmatrix} \dfrac{\partial f}{\partial x}\\ \dfrac{\partial f}{\partial y} \\ \dfrac{\partial f}{\partial z} \end{bmatrix} \end{align}\)
Even Higher Dimensions, Gradient of \(f(x,y,z, \dots)\): The previous definitions can be generalized to higher dimensions. Assume all partial derivatives of the function exist. Then the gradient of \(f\) is defined as
\(\vec \nabla f(x,y,z, \dots)=\begin{align} \begin{bmatrix} \dfrac{\partial f}{\partial x}\\ \dfrac{\partial f}{\partial y} \\ \dfrac{\partial f}{\partial z}\\ \vdots \end{bmatrix} \end{align}\)
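The definitions above can also be approximated numerically. The following is a minimal sketch, not part of the original project: it estimates each partial derivative with a central difference. The helper name `numerical_gradient`, the step size `h`, and the test function `sample` are all assumptions chosen for illustration.

```python
from typing import Callable, List, Sequence


def numerical_gradient(f: Callable[[Sequence[float]], float],
                       point: Sequence[float],
                       h: float = 1e-6) -> List[float]:
    """Approximate each partial derivative of f at `point` by a central difference."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h   # nudge only the i-th variable up
        backward[i] -= h  # and down, holding the others fixed
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def sample(p: Sequence[float]) -> float:
    """Hypothetical test function f(x, y, z) = x^2 + 3y + yz."""
    x, y, z = p
    return x**2 + 3*y + y*z


# By hand, the gradient is (2x, 3 + z, y), which is (2, 6, 2) at (1, 2, 3).
grad_sample = numerical_gradient(sample, [1.0, 2.0, 3.0])
print(grad_sample)
```

Because the loop runs over however many coordinates `point` has, the same function covers the two-variable, three-variable, and higher-dimensional cases.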
At every point in the domain of the function there is a corresponding gradient vector. As long as the function has continuous first-order partial derivatives around that point and the gradient is not the zero vector, the gradient vector is normal to the level curve through the point. The gradient vector also points in the direction of steepest ascent. In the image below, each arrow corresponds to a gradient vector drawn along a level curve of \(a = f(x_1, x_2)\) that points towards the relative maximum. As you can see, each gradient vector is normal to the level curve. This is analogous to our first example of climbing a mountain, where the fastest way up the mountain is to continually climb along the gradient vectors until you reach the peak.
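The climb described above can be sketched as a short loop. This is only an illustration under stated assumptions: the function `hill`, its peak at (1, -2), the starting point, the step size, and the number of steps are all hypothetical choices, not values from the project.

```python
def hill(x: float, y: float) -> float:
    """Hypothetical mountain with a single peak at (1, -2)."""
    return -(x - 1)**2 - (y + 2)**2


def grad_hill(x: float, y: float) -> tuple:
    """Gradient of hill, built by hand from its two partial derivatives."""
    return (-2 * (x - 1), -2 * (y + 2))


def climb(x: float, y: float, step_size: float = 0.1, steps: int = 200) -> tuple:
    """Repeatedly take a small step in the direction of the gradient."""
    for _ in range(steps):
        gx, gy = grad_hill(x, y)
        x += step_size * gx
        y += step_size * gy
    return x, y


peak_x, peak_y = climb(4.0, 3.0)
print(peak_x, peak_y)  # ends very close to the peak at (1, -2)
```

The step size matters: for this particular hill, a value that is too large would overshoot the peak instead of settling on it.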
Example 1 - Calculating Gradient
Given \(z=f(x,y)=2x^3+4y^2+4xy\), find the gradient vector \(\nabla f(x,y)\) at the point \((2, 3)\).
Solution:
First, we need to find the partial derivative with respect to x and the partial derivative with respect to y in order to construct the vector for the gradient.
\(\dfrac{\partial} {\partial x}[f(x,y)]= \dfrac{\partial} {\partial x}[2x^3+4y^2+4xy]\)
\(\dfrac{\partial} {\partial x}[f(x,y)]=6x^2+4y\)
\(\dfrac{\partial} {\partial y}[f(x,y)]= \dfrac{\partial} {\partial y}[2x^3+4y^2+4xy]\)
\(\dfrac{\partial} {\partial y}[f(x,y)]=8y+4x\)
Thus, the gradient vector is \(\nabla f(x,y) = \begin{align} \begin{bmatrix} 6x^2+4y\\ 8y+4x \end{bmatrix} \end{align}\). Evaluated at the point \((x, y) = (2, 3)\), the gradient vector is \(\nabla f(2,3) = \begin{align} \begin{bmatrix} 6\times 2^2+4\times 3\\ 8\times 3+4\times 2 \end{bmatrix} \end{align} = \)\(\begin{align} \begin{bmatrix} 36\\ 32 \end{bmatrix} \end{align} \)
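As a quick sanity check of this result, the partial derivatives found above can be evaluated in code and compared with a central-difference estimate. The step size `h` and the variable names are assumptions made for this sketch.

```python
def f(x, y):
    return 2*x**3 + 4*y**2 + 4*x*y


def grad_f(x, y):
    """Partial derivatives from the solution above."""
    return (6*x**2 + 4*y, 8*y + 4*x)


exact = grad_f(2, 3)

# Central-difference estimate of the same two partial derivatives.
h = 1e-5
approx = ((f(2 + h, 3) - f(2 - h, 3)) / (2*h),
          (f(2, 3 + h) - f(2, 3 - h)) / (2*h))

print(exact)   # (36, 32)
print(approx)  # numerically close to the exact values
```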
Example 2 - Calculating Gradient
Given \(f(x,y,z)=yz\sin(x)+e^{xyz}+4z\) , find the gradient vector \(\nabla f(x,y,z)\).
Solution:
First, we need to find the partial derivative with respect to x, the partial derivative with respect to y, and the partial derivative with respect to z in order to construct the vector for the gradient.
\(\dfrac{\partial f }{\partial x}=\dfrac{\partial }{\partial x} [yz \sin(x) +e^{xyz}+4z] \\ \dfrac{\partial f }{\partial x}=\dfrac{\partial }{\partial x} [yz \sin(x)] +\dfrac{\partial }{\partial x}[ e^{xyz}]+\dfrac{\partial }{\partial x}[4z] \\ \dfrac{\partial f }{\partial x}=yz \cos(x) + yz e^{xyz} \)
\(\dfrac{\partial f }{\partial y}=\dfrac{\partial }{\partial y} [yz \sin(x) +e^{xyz}+4z] \\ \dfrac{\partial f }{\partial y}=\dfrac{\partial }{\partial y} [yz \sin(x)] +\dfrac{\partial }{\partial y}[ e^{xyz}]+\dfrac{\partial }{\partial y}[4z] \\ \dfrac{\partial f }{\partial y}=z \sin(x) + xz e^{xyz} \)
\(\dfrac{\partial f }{\partial z}=\dfrac{\partial }{\partial z} [yz \sin(x) +e^{xyz}+4z] \\ \dfrac{\partial f }{\partial z}=\dfrac{\partial }{\partial z} [yz \sin(x)] +\dfrac{\partial }{\partial z}[ e^{xyz}]+\dfrac{\partial }{\partial z}[4z] \\ \dfrac{\partial f }{\partial z}=y\sin(x)+ xy e^{xyz}+4 \)
Constructing the gradient vector using the partial derivatives gives us:
\( \nabla f(x,y,z)= \begin{align} \begin{bmatrix} yz \cos (x)+yz e^{xyz}\\ z\sin(x) +xz e^{xyz}\\ y\sin(x) +xy e^{xyz}+4 \end{bmatrix} \end{align} \)
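One way to gain confidence in a hand-computed gradient like this one is to compare it with a finite-difference estimate at some point. The sketch below does that. The sample point `(0.5, 1.0, 2.0)` and the step size are arbitrary assumptions, not part of the example.

```python
import math


def f(x, y, z):
    return y*z*math.sin(x) + math.exp(x*y*z) + 4*z


def grad_f(x, y, z):
    """The three partial derivatives derived above."""
    e = math.exp(x*y*z)
    return (y*z*math.cos(x) + y*z*e,
            z*math.sin(x) + x*z*e,
            y*math.sin(x) + x*y*e + 4)


def central_difference_gradient(x, y, z, h=1e-6):
    """Estimate each partial derivative by varying one variable at a time."""
    return ((f(x + h, y, z) - f(x - h, y, z)) / (2*h),
            (f(x, y + h, z) - f(x, y - h, z)) / (2*h),
            (f(x, y, z + h) - f(x, y, z - h)) / (2*h))


point = (0.5, 1.0, 2.0)  # hypothetical sample point
analytic = grad_f(*point)
numeric = central_difference_gradient(*point)
max_error = max(abs(a - n) for a, n in zip(analytic, numeric))
print(analytic)
print(numeric)
print(max_error)  # should be tiny if the derivation is right
```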
Please watch the supplemental video found near the bottom of the wiki page that summarizes what we learned above.
Directional Derivatives
Another question raised by the mountain analogy at the beginning is: how steep is the mountain in a specific direction? We know that the partial derivatives with respect to x and y represent the "steepness" of the mountain in the x and y directions respectively, but we do not yet have a way to find the rate of change in an arbitrary direction. The directional derivative in the direction of the unit vector \(\vec u\) is written with a capital \(D\) and a subscript \(\vec u\), i.e. \( D_{\vec u}\).
Definition of Directional Derivative
Directional Derivative of \(f(x,y)\): Let \(z=f(x, y)\) be a function of two variables x and y, and assume that \(f_x\) and \(f_y\) both exist and that \(f\) is differentiable. Then the directional derivative of \(f\) in the direction of the unit vector \(\vec u = <\cos(\theta), \sin(\theta)>\) is
\(D_{\vec u}~ f(x,y)= \lim_{h \to 0}\dfrac {f(x+h\cos \theta, y+h \sin \theta)-f(x,y)}{h}= \dfrac{\partial f}{\partial x}\cos\theta+ \dfrac{\partial f}{\partial y}\sin\theta\)
Directional Derivative of \(f(x, y, z)\): Let \(w=f(x, y, z)\) be a function of three variables x, y, and z, and assume that \(f_x,~ f_y\) and \(f_z\) exist and that \(f\) is differentiable. Then the directional derivative of \(f\) in the direction of the unit vector \(\vec u = <\cos(\alpha), \cos(\beta), \cos(\gamma)>\) is
\(D_{\vec u}~ f(x,y,z)= \lim_{h \to 0}\dfrac {f(x+h\cos \alpha, y+h \cos \beta,z+h \cos \gamma)-f(x,y,z)}{h}= \dfrac{\partial f}{\partial x}\cos\alpha+ \dfrac{\partial f}{\partial y}\cos\beta+\dfrac{\partial f}{\partial z}\cos\gamma\)
Directional Derivatives in Higher Dimensions: Let \( w = f(x, y, z, \dots)\) be a function of multiple variables \(x, y, z, \dots ,\) and assume that all partial derivatives exist and that \(f\) is differentiable. Then the directional derivative of \(f \) in the direction of the unit vector \(\vec u = <\cos(\alpha), \cos(\beta), \cos(\gamma), \dots>\) is
\(D_{\vec u}~ f(x,y,z, \dots)= \lim_{h \to 0}\dfrac {f(x+h\cos \alpha, y+h \cos \beta,z+h \cos \gamma, \dots )-f(x,y,z, \dots)}{h}\) or \(D_{\vec u}~ f(x,y,z, \dots)= \dfrac{\partial f}{\partial x}\cos\alpha+ \dfrac{\partial f}{\partial y}\cos\beta+\dfrac{\partial f}{\partial z}\cos\gamma+ \dots= \nabla f \cdot \vec u\)
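To see that the limit definition and the partial-derivative formula agree, the difference quotient can be evaluated for shrinking values of `h` and compared with the dot product. The sketch below uses a hypothetical function \(f(x,y)=x^2y\), the point (1, 2), and the angle \(\pi/6\). None of these come from the project.

```python
import math


def f(x, y):
    return x**2 * y  # hypothetical example function


def grad_f(x, y):
    return (2*x*y, x**2)  # partial derivatives computed by hand


x0, y0 = 1.0, 2.0
theta = math.pi / 6
u = (math.cos(theta), math.sin(theta))  # unit vector at angle theta

gx, gy = grad_f(x0, y0)
dot_form = gx*u[0] + gy*u[1]

# The difference quotient from the limit definition, for smaller and smaller h.
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    quotient = (f(x0 + h*u[0], y0 + h*u[1]) - f(x0, y0)) / h
    print(h, quotient)

print("gradient dot u:", dot_form)
```

The printed quotients approach the dot-product value as `h` shrinks, which is what the equality in the definition states.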
The directional derivative of a function is the dot product of the gradient vector of the function and the unit vector in the specified direction. The general formula for the directional derivative is \(D_{\vec u}~ f = \nabla f \cdot \vec u\) for a function \(f\) and a unit vector \(\vec u\). A directional derivative may look like one of the following images, where the direction of the unit vector is along the plane.
Using the directional derivative \(\nabla f \cdot \vec u\) and the geometric form of the dot product, we know that \( \vec \nabla f \cdot \vec{u}= || {\vec \nabla f}|| \cdot ||\vec u ||\cdot \cos (\theta)\), where \(\theta\) is the angle between \(\vec \nabla f\) and \(\vec u\).
Since \(\vec u\) is a unit vector and has a magnitude of 1, we know \( \vec \nabla f \cdot \vec{u}= || {\vec \nabla f}|| \cdot \cos (\theta)\).
A common question is: in which direction is the directional derivative largest (steepest ascent) or smallest (steepest descent)? To answer it, we maximize or minimize the directional derivative \( \vec \nabla f \cdot \vec{u}= || {\vec \nabla f}|| \cdot \cos (\theta)\). To maximize the directional derivative, \(\cos(\theta)\) must equal 1, so \(\vec u\) points in the same direction as the gradient. To minimize the directional derivative, \(\cos(\theta)\) must equal -1, so \(\vec u\) points in the direction opposite the gradient. This proves our previous statement that the gradient points in the direction of steepest ascent and the negative of the gradient points in the direction of steepest descent. This concept is useful in areas like machine learning and artificial intelligence, where models are trained using iterative methods, such as gradient descent, that repeatedly compute a gradient.
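The argument above can be checked by brute force: sweep the unit vector through every whole-degree angle and see where the directional derivative is largest and smallest. This sketch reuses the gradient (36, 32) from the first gradient example. The one-degree resolution and the names are assumptions made for illustration.

```python
import math

grad = (36.0, 32.0)  # gradient of 2x^3 + 4y^2 + 4xy at the point (2, 3)
norm = math.hypot(grad[0], grad[1])


def directional_derivative(angle_deg):
    """Dot product of grad with the unit vector at the given angle."""
    t = math.radians(angle_deg)
    return grad[0]*math.cos(t) + grad[1]*math.sin(t)


angles = range(360)
best = max(angles, key=directional_derivative)   # steepest ascent
worst = min(angles, key=directional_derivative)  # steepest descent
gradient_angle = math.degrees(math.atan2(grad[1], grad[0]))

print(best, worst, gradient_angle)
print(directional_derivative(best), norm)
```

Up to the one-degree resolution of the sweep, the best angle matches the angle of the gradient itself, the worst angle is 180 degrees away, and the largest value is close to the magnitude of the gradient.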
Example 1 - Finding the Directional Derivative
Given \(z=f(x,y)=x^2e^y+3xy^2\) , find the directional derivative in the direction \(\theta=\dfrac{\pi}{3}.\)
Solution:
The directional vector is the unit vector:
\(\vec u= \begin{align} \begin{bmatrix} \cos(\pi/3)\\ \sin(\pi/3) \end{bmatrix} \end{align} =\)\(\begin{align} \begin{bmatrix} \dfrac{1}{2}\\ \dfrac{\sqrt {3}}{2} \end{bmatrix} \end{align} \)
The gradient vector for the function \(f(x, y)\) is:
\(\vec \nabla f(x,y)=\begin{align} \begin{bmatrix} \dfrac{\partial f}{\partial x}\\ \dfrac{\partial f}{\partial y} \end{bmatrix} \end{align}\)=\(\begin{align} \begin{bmatrix} \dfrac{\partial }{\partial x} [x^2e^y+3xy^2]\\ \dfrac{\partial }{\partial y} [x^2e^y+3xy^2] \end{bmatrix} \end{align} = \)\(\begin{align} \begin{bmatrix} 2x e^y +3y^2\\ x^2 e^y+6xy \end{bmatrix} \end{align} \)
The directional derivative is the dot product of the gradient and the unit vector:
\(D_{\vec u}~ f(x,y)= \nabla f(x,y) \cdot \vec u=\)\(\begin{align} \begin{bmatrix} 2x e^y +3y^2\\ x^2 e^y+6xy \end{bmatrix} \end{align} \cdot \)\(\begin{align} \begin{bmatrix} \dfrac{1}{2}\\ \dfrac{\sqrt {3}}{2} \end{bmatrix} \end{align}= \)\(\dfrac{1}{2}(2x e^y +3y^2)+\dfrac{\sqrt{3}}{2}(x^2 e^y+6xy)\)
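Since this answer is a formula in x and y, it can be turned into a function and spot-checked against a difference quotient along \(\vec u\). The evaluation point (1, 0) and the step size below are hypothetical choices for this sketch, not part of the example.

```python
import math


def f(x, y):
    return x**2 * math.exp(y) + 3*x*y**2


u = (0.5, math.sqrt(3) / 2)  # unit vector for theta = pi/3


def directional_derivative(x, y):
    """The closed-form result derived above."""
    return (0.5 * (2*x*math.exp(y) + 3*y**2)
            + (math.sqrt(3) / 2) * (x**2*math.exp(y) + 6*x*y))


def difference_quotient(x, y, h=1e-6):
    """Symmetric difference quotient along u, for a small fixed h."""
    return (f(x + h*u[0], y + h*u[1]) - f(x - h*u[0], y - h*u[1])) / (2*h)


x0, y0 = 1.0, 0.0  # hypothetical evaluation point
exact = directional_derivative(x0, y0)
approx = difference_quotient(x0, y0)
print(exact, approx)
```

At (1, 0) the formula reduces to \(1 + \dfrac{\sqrt{3}}{2}\), and the difference quotient should land very close to that value.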
Example 2 - Finding the Directional Derivative
Given \(z=f(x,y)=\cos(xy)\), find the directional derivative in the direction of the vector \(\vec v=<4,3>\) at the point \((1,\dfrac{\pi}{2})\) .
Solution:
First we need to make our direction vector a unit vector \(\vec u_v\), so we divide it by its magnitude:
\(|| \vec v||=\sqrt {4^2+3^2}=\sqrt{25}=5 \Rightarrow \vec {u_v }=\Big<\dfrac{4}{5},\dfrac{3}{5} \Big>\)
Next we use the formula for the directional derivative: the dot product between the gradient and the unit directional vector. To find the gradient, we need to take the partial derivatives with respect to x and y. The gradient vector for the function \(f(x, y)\) is:
\(\vec \nabla f(x,y)=\begin{align} \begin{bmatrix} \dfrac{\partial f}{\partial x}\\ \dfrac{\partial f}{\partial y} \end{bmatrix} \end{align}\)\(=\begin{align} \begin{bmatrix} \dfrac{\partial }{\partial x} [\cos(xy)]\\ \dfrac{\partial }{\partial y} [\cos(xy)] \end{bmatrix} \end{align} = \)\(\begin{align} \begin{bmatrix} -y\sin(xy)\\ -x\sin(xy) \end{bmatrix} \end{align} \)
The directional derivative is the dot product of the gradient and the unit vector:
\(D_{\vec u}~ f(x,y)= \nabla f(x,y) \cdot \vec u_v=\)\(\begin{align} \begin{bmatrix} -y\sin(xy)\\ -x\sin(xy) \end{bmatrix} \end{align} \)\(\cdot \begin{align} \begin{bmatrix} \dfrac{4}{5}\\ \dfrac{3}{5} \end{bmatrix} \end{align}= \)\(\dfrac{4}{5}(-y\sin(xy))+\dfrac{3}{5}(-x \sin(xy))\)
Evaluating it at the point \((1, \dfrac{\pi}{2})\), we get
\(D_{\vec u}~ f(1,\dfrac{\pi}{2})= \dfrac{4}{5}(-\dfrac{\pi}{2}\sin(\dfrac{\pi}{2}))+ \dfrac{3}{5}(-\sin(\dfrac{\pi}{2}))=-\dfrac{2\pi}{5}-\dfrac{3}{5}=\dfrac{-2\pi-3}{5}\)
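The final value can be confirmed numerically in two ways: by forming the dot product directly, and by approximating the limit definition along \(\vec u_v\). The variable names and the step size `h` are assumptions made for this sketch.

```python
import math


def f(x, y):
    return math.cos(x*y)


# Normalize v = <4, 3> to get the unit direction vector.
v = (4.0, 3.0)
length = math.hypot(v[0], v[1])
u = (v[0] / length, v[1] / length)

x0, y0 = 1.0, math.pi / 2

# Gradient from the solution above, evaluated at the point.
grad = (-y0*math.sin(x0*y0), -x0*math.sin(x0*y0))
via_gradient = grad[0]*u[0] + grad[1]*u[1]

# The closed-form answer from the last line of the solution.
closed_form = (-2*math.pi - 3) / 5

# Symmetric difference quotient along u as an independent check.
h = 1e-6
via_limit = (f(x0 + h*u[0], y0 + h*u[1]) - f(x0 - h*u[0], y0 - h*u[1])) / (2*h)

print(via_gradient, closed_form, via_limit)
```

All three printed numbers should agree to several decimal places, at roughly -1.8566.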
Please watch the supplemental video found near the bottom of the wiki page that summarizes what we learned above.
Properties of the Gradient using Directional Derivatives
Suppose the function \(f(x,y)\) is differentiable at \((x_0,y_0)\) , then:
- If \(\nabla f(x_0,y_0)=0\) , then \(D_{\vec u}~ f(x_0, y_0)=0\) for any unit vector \(\vec{u}\).
- If \(\nabla f(x_0,y_0)\ne 0\), then \(D_{\vec u}~ f(x_0, y_0)\) is maximized when \(\vec{u}\) points in the same direction as \(\nabla f(x_0,y_0)\). The maximum value is \(||\nabla f(x_0,y_0)||\)
- If \(\nabla f(x_0,y_0)\ne 0\), then \(D_{\vec u}~ f(x_0, y_0)\) is minimized when \(\vec{u}\) points in the opposite direction from \(\nabla f(x_0,y_0)\). The minimum value is \(-||\nabla f(x_0,y_0)||\).
These properties follow directly from the formula \( \vec \nabla f \cdot \vec{u}= || {\vec \nabla f}|| \cdot \cos (\theta)\) discussed above.
Supplemental Videos
Directional Derivatives, Gradient
Applications of Gradients and Directional Derivatives
Please watch this video before moving onto the next section to understand how the gradient plays a role in machine learning.
https://youtu.be/IHZwWFHWa-w
Conclusion
Gradients and directional derivatives play a powerful role in math and other fields, especially machine learning. With the formulas and processes shown above, you are now able to calculate the gradient and directional derivatives of multivariate functions. If you are on a mountain whose height is given everywhere by a known function, you can find how steep the mountain is in any direction, as well as the directions of steepest ascent and descent. We hope you found this wiki page about gradients and directional derivatives helpful.
Be sure to check out two other wonderful wiki pages below to learn more topics in Calculus!
Arc-length in Polar Coordinates