Note: This post is an English adaptation of my original Chinese article (URL). Some parts have been modified for clarity, cultural relevance, or to better fit the English-speaking audience.
I was once puzzled by this question, so I’d like to briefly share my current understanding. It may not be entirely correct, but here’s my take:
Without loss of generality, let’s consider a $1$-dimensional space, as the conclusions here can be extended to $n$ dimensions.
First, in the Euler-Lagrange equation, the Lagrangian $\mathcal{L}$ is defined as a multivariable function $\mathcal{L}=f : \mathbb{R}^{2 N+1} \rightarrow \mathbb{R}$, so $\mathcal{L}(q_1, \cdots ,q_N;v_1, \cdots ,v_N;t) \in \mathbb{R}$. Therefore, when we consider expressions like $\dfrac{\partial \mathcal{L}}{\partial q_i}$, $\dfrac{\partial \mathcal{L}}{\partial v_i}$, or $\dfrac{\partial \mathcal{L}}{\partial t}$, we are treating the $q_i$, the $v_i$, and $t$ as $2N+1$ entirely independent variables, much like how we handle the variables $x$, $y$, and $z$ in a function $f(x, y, z)$.
Why is that? Fundamentally, it’s because this is how the mathematical definition of partial derivatives works, even if the variables we differentiate with respect to are interrelated. For example, consider the function $f(x(t), t)$. Clearly, $x$ is a function of $t$, but when you take the partial derivative of $f$ with respect to $t$, the first argument $x(t)$ is held fixed, and when you take the partial derivative of $f$ with respect to $x(t)$, the second argument $t$ is held fixed. This is simply how partial derivatives operate, and you must refer to their formal definition to grasp this. Therefore, when we take partial derivatives of the Lagrangian $\mathcal{L}$, the $q_i$, the $v_i$, and $t$ are treated as independent variables.
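To make the distinction concrete, here is a small numerical sketch. The function `f`, the trajectory `x_of_t`, and the sample point `t0` are hypothetical choices made only for illustration; derivatives are approximated with central differences rather than computed symbolically.

```python
import math

def f(x, t):
    """A two-variable function: x and t are independent input slots."""
    return x**2 * t

def x_of_t(t):
    """A hypothetical trajectory that ties x to t."""
    return math.sin(t)

def central_diff(func, s, h=1e-6):
    """Approximate the derivative of a one-variable function at s."""
    return (func(s + h) - func(s - h)) / (2 * h)

t0 = 0.7
x0 = x_of_t(t0)

# Partial derivative w.r.t. t: the first slot is frozen at x0.
partial_t = central_diff(lambda t: f(x0, t), t0)

# Partial derivative w.r.t. x: the second slot is frozen at t0.
partial_x = central_diff(lambda x: f(x, t0), x0)

# Total derivative: differentiate the one-variable function t -> f(x(t), t).
total_t = central_diff(lambda t: f(x_of_t(t), t), t0)

# Chain rule: partial in t plus partial in x times dx/dt.
chain = partial_t + partial_x * central_diff(x_of_t, t0)

print("partial wrt t:", partial_t)
print("total wrt t:  ", total_t)
print("chain rule:   ", chain)
```

The partial derivative with respect to `t` ignores the fact that `x0` came from `x_of_t`, while the total derivative does not, so the two values differ; the chain-rule sum reproduces the total derivative.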
However, when we consider the derivative $\dfrac{\mathrm{d} \mathcal{L}}{\mathrm{d} t}$, we are effectively treating the Lagrangian $\mathcal{L}$ as a single-variable function $\mathcal{L}=f(t): \mathbb{R} \rightarrow \mathbb{R}$, obtained by evaluating it along a trajectory on which every $q_i$ and $v_i$ is itself a function of $t$. Thus, we need to apply the chain rule to expand it, resulting in:
$$
\displaystyle \frac{\mathrm{d} \mathcal{L}}{\mathrm{d} t}=\frac{\partial \mathcal{L}}{\partial t}+\sum_{i=1}^N \frac{\partial \mathcal{L}}{\partial q_i} \frac{\mathrm{d} q_i}{\mathrm{d} t} +\sum_{i=1}^N \frac{\partial \mathcal{L}}{\partial v_i} \frac{\mathrm{d} v_i}{\mathrm{d} t}
$$
It is important to note that the derivative $\dfrac{\mathrm{d} \mathcal{L}}{\mathrm{d} t}$ referred to here is the total derivative of the Lagrangian $\mathcal{L}$. According to the definition of the total derivative, the function being differentiated should be a single-variable function.
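As a worked illustration, take $N = 1$ and a harmonic-oscillator Lagrangian. This particular Lagrangian and the constants $m$ and $k$ are my own choice of example, not something the argument depends on:

$$
\mathcal{L}(q, v, t) = \frac{1}{2} m v^2 - \frac{1}{2} k q^2
$$

Treating $q$, $v$, and $t$ as independent inputs, the partial derivatives are

$$
\frac{\partial \mathcal{L}}{\partial q} = -k q, \qquad \frac{\partial \mathcal{L}}{\partial v} = m v, \qquad \frac{\partial \mathcal{L}}{\partial t} = 0
$$

Along a trajectory $q(t)$, $v(t)$, the total derivative is

$$
\frac{\mathrm{d} \mathcal{L}}{\mathrm{d} t} = 0 + (-k q) \frac{\mathrm{d} q}{\mathrm{d} t} + (m v) \frac{\mathrm{d} v}{\mathrm{d} t}
$$

which is generally nonzero even though $\dfrac{\partial \mathcal{L}}{\partial t} = 0$.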
Therefore, the essence of solving this question lies in understanding the formal definitions of total derivatives and partial derivatives.
Updated: 2024-09-26
After reading my initial post, a friend of mine mentioned that he still didn’t quite understand why the Lagrangian behaves differently in the Euler-Lagrange equation than it does when we take its total derivative. So here’s an updated version that restates the argument in more formal mathematical language.
I used to be perplexed by this question, and now, after some time, I’d like to share my understanding of it from a purely mathematical perspective (note: this might not be the definitive answer).
Without loss of generality, let’s consider a $1$-dimensional space, as the conclusions here can be extended to $n$ dimensions.
First and foremost, let’s clarify an important point: in the Euler-Lagrange equation, the Lagrangian $\mathcal{L}$ is defined as a multivariate function $\mathcal{L} = f(q_1, \cdots, q_N; v_1, \cdots, v_N; t)$, where $f: \mathbb{R}^{2N+1} \to \mathbb{R}$.
For the partial derivatives $\dfrac{\partial \mathcal{L}}{\partial q_i}$, $\dfrac{\partial \mathcal{L}}{\partial v_i}$, or $\dfrac{\partial \mathcal{L}}{\partial t}$, we treat the $q_i$, the $v_i$, and $t$ as $2N+1$ completely independent variables, much like how we treat $x$, $y$, and $z$ in a function $f(x, y, z)$.
According to the formal definition of partial derivatives (note: set $m = 1$ and $n = 2N+1$ in the diagram, which aligns the function $\mathbf{f}$ as $\mathbb{R}^{2N+1} \to \mathbb{R}$, thereby matching the type of the function Lagrangian $\mathcal{L}$. Let $\mathbf{f}(\mathbf{x}) = \mathcal{L}$, which yields $f_i(\mathbf{x}) = \mathbf{f}(\mathbf{x}) = \mathcal{L}$):
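In the standard textbook form (a reconstruction, assuming $\mathbf{f}$ maps an open set $E \subset \mathbb{R}^n$ into $\mathbb{R}^m$, with components $f_1, \dots, f_m$ and standard basis vectors $\mathbf{e}_1, \dots, \mathbf{e}_n$ of $\mathbb{R}^n$), the partial derivative of $f_i$ with respect to the $j$-th input at $\mathbf{x} \in E$ is

$$
(D_j f_i)(\mathbf{x}) = \lim_{t \to 0} \frac{f_i(\mathbf{x} + t \mathbf{e}_j) - f_i(\mathbf{x})}{t}
$$

provided the limit exists. Note that $t$ here is only the size of the increment along $\mathbf{e}_j$, not the time variable of the Lagrangian.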
This indicates that when calculating partial derivatives, we disregard any relationships between the input variables of the function, as each input of a multivariate function forms an independent dimension. The difference $f_i(\mathbf{x} + t \mathbf{e}_j) -f_i(\mathbf{x})$ that appears in the definition (where $t$ is the increment, not time) changes only the $j$-th input and leaves all the others fixed, because the input variables are orthogonal to each other.
For example, consider the function $f(x(t), t)$. Clearly, $x$ is a function of $t$, but when you take the partial derivative of $f$ with respect to $t$, it won’t affect $x(t)$ at all, and similarly, taking the partial derivative of $f$ with respect to $x(t)$ won’t affect $t$. This is because, within this function, the dimensions formed by $x(t)$ and $t$ are orthogonal and do not influence each other.
Now, regarding the total derivative, its formal definition is:
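In the standard textbook form (a reconstruction, assuming $\mathbf{f}$ maps an open set $E \subset \mathbb{R}^n$ into $\mathbb{R}^m$, $\mathbf{x} \in E$, and $\mathbf{h} \in \mathbb{R}^n$ is a small increment), $\mathbf{f}$ is differentiable at $\mathbf{x}$ when its increment can be written as a linear part plus a remainder $\mathbf{r}(\mathbf{h})$ that vanishes faster than $|\mathbf{h}|$:

$$
\mathbf{f}(\mathbf{x} + \mathbf{h}) - \mathbf{f}(\mathbf{x}) = \mathbf{f}'(\mathbf{x}) \mathbf{h} + \mathbf{r}(\mathbf{h}), \qquad \lim_{\mathbf{h} \to \mathbf{0}} \frac{|\mathbf{r}(\mathbf{h})|}{|\mathbf{h}|} = 0
$$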
where $\mathbf{f}'(\mathbf{x})$ is defined as:
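Under the same textbook assumptions ($\mathbf{f}$ maps an open set of $\mathbb{R}^n$ into $\mathbb{R}^m$; the symbol $A$ is introduced here only for illustration), $\mathbf{f}'(\mathbf{x})$ is the linear transformation $A: \mathbb{R}^n \to \mathbb{R}^m$ that satisfies

$$
\lim_{\mathbf{h} \to \mathbf{0}} \frac{|\mathbf{f}(\mathbf{x} + \mathbf{h}) - \mathbf{f}(\mathbf{x}) - A \mathbf{h}|}{|\mathbf{h}|} = 0
$$

When such an $A$ exists it is unique, and we write $\mathbf{f}'(\mathbf{x}) = A$. For $m = 1$, the matrix of $\mathbf{f}'(\mathbf{x})$ is the row vector of partial derivatives $\begin{bmatrix} (D_1 f)(\mathbf{x}) & \cdots & (D_n f)(\mathbf{x}) \end{bmatrix}$.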
Once these definitions and theorems are in place, consider the following example:
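A standard example of this kind, reconstructed here under the assumption that it is the chain rule along a curve: let $\gamma$ be a differentiable map from an interval $(a, b) \subset \mathbb{R}$ into an open set $E \subset \mathbb{R}^n$, let $f$ be a differentiable real-valued function on $E$, and define $g(t) = f(\gamma(t))$ for $a < t < b$. The chain rule then gives

$$
g'(t) = f'(\gamma(t)) \, \gamma'(t) = \sum_{i=1}^{n} (D_i f)(\gamma(t)) \, \gamma_i'(t)
$$

Here $g$ is an ordinary function $\mathbb{R} \to \mathbb{R}$, so $g'(t)$ is an ordinary derivative, while each $D_i f$ is a partial derivative of the multivariate function $f$.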
Thus, when calculating the total derivative $\dfrac{\mathrm{d}\mathcal{L}}{\mathrm{d}t}$ of the Lagrangian $\mathcal{L}$, the Lagrangian $\mathcal{L}$ is understood as the composition $\mathcal{L} = (f \circ \gamma) : \mathbb{R} \rightarrow \mathbb{R}$, where $f: \mathbb{R}^{2N+1} \to \mathbb{R}$ and $\gamma: \mathbb{R} \to \mathbb{R}^{2N+1}$.
So, if we set $g(t) = \mathcal{L}(t) = (f \circ \gamma)(t)$ and define $\gamma(t) = \begin{bmatrix} q_1(t) & \cdots & q_N(t) & v_1(t) & \cdots & v_N(t) & t \end{bmatrix}^T$, using the chain rule to expand the total derivative of the Lagrangian $\mathcal{L}$, we obtain the following formula:
$$
\displaystyle \frac{\mathrm{d} \mathcal{L}}{\mathrm{d} t}= \sum_{i=1}^{2N+1} (D_i f)(\gamma(t)) \, \gamma_i'(t) = \frac{\partial f}{\partial t}+\sum_{i=1}^N \frac{\partial f}{\partial q_i} \frac{\mathrm{d} q_i}{\mathrm{d} t} +\sum_{i=1}^N \frac{\partial f}{\partial v_i} \frac{\mathrm{d} v_i}{\mathrm{d} t}
$$
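To see where each term comes from, it may help to write out the simplest case $N = 1$ (a sketch that drops the index on $q$ and $v$). The curve is $\gamma(t) = \begin{bmatrix} q(t) & v(t) & t \end{bmatrix}^T$, which has three components, and its last component has derivative $\gamma_3'(t) = 1$:

$$
\frac{\mathrm{d} \mathcal{L}}{\mathrm{d} t} = (D_1 f)(\gamma(t)) \, q'(t) + (D_2 f)(\gamma(t)) \, v'(t) + (D_3 f)(\gamma(t)) \cdot 1 = \frac{\partial f}{\partial q} \frac{\mathrm{d} q}{\mathrm{d} t} + \frac{\partial f}{\partial v} \frac{\mathrm{d} v}{\mathrm{d} t} + \frac{\partial f}{\partial t}
$$

This is why the $\dfrac{\partial f}{\partial t}$ term appears without an extra factor: it is multiplied by $\dfrac{\mathrm{d} t}{\mathrm{d} t} = 1$.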
Therefore, the essence of solving this question lies in understanding the formal definitions of total derivatives and partial derivatives.
Updated: 2024-11-02
Here is an update, as I feel I’ve gained a new understanding of the mathematical concept of total derivative.
Firstly, most textbooks provide the total derivative formula for the Lagrangian $\mathcal{L}$ as:
$$
\frac{\mathrm{d} \mathcal{L}}{\mathrm{d} t}= \frac{\partial \mathcal{L}}{\partial t}+\sum_{i=1}^N \frac{\partial \mathcal{L}}{\partial q_i} \frac{\mathrm{d} q_i}{\mathrm{d} t} +\sum_{i=1}^N \frac{\partial \mathcal{L}}{\partial v_i} \frac{\mathrm{d} v_i}{\mathrm{d} t}
$$
On the other hand, the total derivative formula I wrote for the Lagrangian $\mathcal{L}$ is:
$$
\frac{\mathrm{d} \mathcal{L}}{\mathrm{d} t} = \frac{\partial f}{\partial t}+\sum_{i=1}^N \frac{\partial f}{\partial q_i} \frac{\mathrm{d} q_i}{\mathrm{d} t} +\sum_{i=1}^N \frac{\partial f}{\partial v_i} \frac{\mathrm{d} v_i}{\mathrm{d} t}
$$
So who made a mistake here? Let me analyze.
Textbooks usually give the general formula for the total derivative of any function $f: \mathbb{R}^N \rightarrow \mathbb{R}$ as:
$$
\frac{\mathrm{d} f}{\mathrm{d} t}=\sum_{i=1}^N \frac{\partial f}{\partial x_i} \frac{\mathrm{d} x_i}{\mathrm{d} t}
$$
However, if we look at the LHS of this identity, the function $f$ does not have the type $\mathbb{R} \rightarrow \mathbb{R}$. Why, then, is it acceptable to apply the total derivative $\dfrac{\mathrm{d}}{\mathrm{d} t}$ to it directly?
I personally believe this is a matter of historical notational convention, because:
For a function $f: \mathbb{R}^N \rightarrow \mathbb{R}$, when we write out the total derivative $\dfrac{\mathrm{d} f}{\mathrm{d} t}$, we are actually referring to $\dfrac{\mathrm{d} z}{\mathrm{d} t}$, where $z(t) = (f \circ g)(t)$, with $z: \mathbb{R} \rightarrow \mathbb{R}$, $f: \mathbb{R}^N \rightarrow \mathbb{R}$, $g: \mathbb{R} \rightarrow \mathbb{R}^N$, and $g(t) = \begin{bmatrix} x_1(t) & \cdots & x_N(t) \end{bmatrix}^{T}$.
Thus, the correct formula for the total derivative should be:
$$
\frac{\mathrm{d}z}{\mathrm{d} t}=\sum_{i=1}^N \frac{\partial f}{\partial x_i} \frac{\mathrm{d} x_i}{\mathrm{d} t}
$$
However, comparing this with the original total derivative formula:
$$
\frac{\mathrm{d} f}{\mathrm{d} t}=\sum_{i=1}^N \frac{\partial f}{\partial x_i} \frac{\mathrm{d} x_i}{\mathrm{d} t}
$$
We find that the change is actually minimal: the only difference lies on the LHS. Therefore, by convention, when the ordinary derivative symbol $\dfrac{\mathrm{d}}{\mathrm{d} t}$ is applied to a multivariate function $f: \mathbb{R}^N \rightarrow \mathbb{R}$, as in $\dfrac{\mathrm{d} f}{\mathrm{d} t}$, it actually refers to the total derivative $\dfrac{\mathrm{d}(f \circ g)}{\mathrm{d} t}$. When it is applied to a univariate function $f: \mathbb{R} \rightarrow \mathbb{R}$, it keeps its usual meaning. In this sense, nobody made a mistake: the $\mathcal{L}$ on the LHS of the textbook formula stands for the composition $f \circ \gamma$, while the $\mathcal{L}$ inside its partial derivatives stands for $f$ itself.
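The typing argument can also be checked numerically. In the sketch below, `f`, `g`, and the sample point `t0` are hypothetical examples; `f` accepts a point of $\mathbb{R}^3$, `g` is a curve into $\mathbb{R}^3$, and only the composition `z` is a function of a single real variable that can be differentiated with respect to `t`.

```python
import math

def f(x):
    """f: R^3 -> R. Takes a point (list of 3 floats), not a time."""
    return x[0] * x[1] + math.exp(x[2])

def g(t):
    """g: R -> R^3. A hypothetical curve x(t)."""
    return [math.cos(t), t**2, 0.5 * t]

def z(t):
    """z = f o g: R -> R. The only function here with a d/dt."""
    return f(g(t))

def derivative(func, s, h=1e-6):
    """Central-difference derivative of a one-variable function."""
    return (func(s + h) - func(s - h)) / (2 * h)

def partial(func, x, i, h=1e-6):
    """Central-difference partial derivative in the i-th input of func."""
    up, down = list(x), list(x)
    up[i] += h
    down[i] -= h
    return (func(up) - func(down)) / (2 * h)

t0 = 1.3
point = g(t0)

# LHS: ordinary derivative of the composition z(t).
lhs = derivative(z, t0)

# RHS: sum over i of (partial of f in x_i at g(t0)) * (dx_i/dt at t0).
rhs = sum(
    partial(f, point, i) * derivative(lambda t, i=i: g(t)[i], t0)
    for i in range(len(point))
)

print("dz/dt:         ", lhs)
print("chain-rule sum:", rhs)
```

Calling `derivative(f, t0)` would fail, since `f` expects a point rather than a number; that is the code-level counterpart of saying that $\dfrac{\mathrm{d} f}{\mathrm{d} t}$ is shorthand for $\dfrac{\mathrm{d}(f \circ g)}{\mathrm{d} t}$.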