Author: Louis Liu

  • Intro to Generalized Coordinates


    Note: This post is an English adaptation of my original Chinese article (URL). Some parts have been modified for clarity, cultural relevance, or to better fit the English-speaking audience.

When you first started learning analytical mechanics, were you ever confused about what generalized coordinates really are? Are they just “generalized” versions of Cartesian coordinates? How general can they be?

    In essence, generalized coordinates are a set of parameterized, irreducible, and independent variables that can fully describe every possible state of a mechanical system subject to various constraints. (Here, “irreducible” basically means “minimal,” implying that the number of these coordinates is exactly what is needed to completely describe the system under its constraints. If you have more coordinates than that, the set contains dependent coordinates; if you have fewer, you cannot fully describe the system.)

    So, what’s the difference compared to Cartesian coordinates? It may seem that generalized coordinates only add “parameterization” and “irreducibility” as properties.

    But let’s clarify one point first: both generalized coordinates and Cartesian coordinates can “fully describe the system’s motion under constraints.” Why then do we even need generalized coordinates in analytical mechanics? Is it because Cartesian coordinates are not sufficient? Or are those extra properties — “parameterization” and “irreducibility” — really that crucial? Let’s take an example to answer these questions.

    Assume we have a classical mechanical system with $N$ particles in $D$ dimensions, subject to $M$ constraint equations. (We assume the system is “well-behaved,” meaning all constraints are integrable, independent, etc. We’ll use Newtonian mechanics for our discussion to highlight the difference between generalized and Cartesian coordinates.)

    • Using Cartesian coordinates: we first write down all the force-component equations for each particle, then add these $M$ constraint equations, leading to a total of $N \times D + M$ equations to solve.
    • Using generalized coordinates: we would first apply some parameterization methods, incorporating the known constraint equations to determine a set of generalized coordinates. After that, we only need to write down the force-component equations in terms of these generalized coordinates, which leaves us with $N \times D - M$ equations to solve.
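
    For instance, take a single planar pendulum: $N = 1$, $D = 2$, $M = 1$ (the fixed-length constraint). The Cartesian route gives $1 \times 2 + 1 = 3$ equations (two force components plus the constraint), while the generalized-coordinate route gives $1 \times 2 - 1 = 1$ equation: a single equation of motion for the angle $\theta$.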

    Combined with some past experience of equation-solving, we know that the generalized-coordinate approach is usually better for two obvious reasons:

    1. The total number of equations to solve is reduced.
    2. The constraint equations are effectively “built into” the coordinate variables themselves, which often makes the resulting equations easier to handle (e.g., avoiding complicated coupled equations).

    Essentially, this is because a set of generalized coordinates reveals the degrees of freedom of the system. However, be careful: in cases involving nonintegrable constraints, the number of generalized coordinates does not necessarily equal the system’s degrees of freedom. (For details, here is an excellent article on Zhihu: Link)

    Now, let’s talk about how to determine those irreducible generalized coordinates through parameterization:

    The simplest way is to just see it directly. For instance, with a 2D pendulum constraint, it’s quite straightforward to imagine using a single generalized coordinate $\theta$ to parameterize $x$ and $y$. But this only works for very simple problems in which the choice of generalized coordinates is obvious.
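
    To make that concrete, here is the pendulum parameterization written out (a minimal sketch, taking the pivot at the origin, the rod length $\ell$ fixed, and $\theta$ measured from the downward vertical):

    $$
    x^2 + y^2 = \ell^2 \quad \Longrightarrow \quad x = \ell \sin\theta, \qquad y = -\ell \cos\theta
    $$

    The single constraint eliminates one of the two Cartesian coordinates, leaving exactly one generalized coordinate $\theta$; every $(x, y)$ on the circle is reached by some $\theta$, so the parameterization fully describes the constrained system.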

    When the constraints are more complicated (but still integrable), there is a more systematic and mathematical method: using the Implicit Function Theorem. (For its proof, see this article on Zhihu) Once its conditions are satisfied—which we’ve assumed in our well-behaved system—you can directly conclude the necessary number of irreducible generalized coordinates, i.e., the system’s degrees of freedom. (Again, be reminded: in systems with nonintegrable constraints, the number of generalized coordinates may not equal the degrees of freedom, but we’re excluding such complications here.)
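
    As a minimal sketch of how the theorem is applied, take the same pendulum constraint in implicit form, $f(x, y) = x^2 + y^2 - \ell^2 = 0$. At any point where $\dfrac{\partial f}{\partial y} = 2y \neq 0$, the Implicit Function Theorem guarantees a neighborhood in which $y$ is a function of $x$, namely $y = \pm\sqrt{\ell^2 - x^2}$, so locally one coordinate suffices: the single constraint removes one degree of freedom, exactly as the count $N \times D - M = 1 \times 2 - 1 = 1$ suggests. Note that the condition fails at $y = 0$, where one must instead solve for $x$ in terms of $y$; this is precisely the kind of local caveat the next paragraph discusses.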

    One thing to note is that the result you get from the Implicit Function Theorem is local, not global. Its validity holds in a neighborhood where the system is still well-behaved. For instance, with a single pendulum, applying the theorem at one point is typically enough, since the situation is pretty much the same for the other points — unless the pendulum swings overhead or something similar, in which case we must check if the conditions still hold there.

    Once you’ve determined the number of generalized coordinates $X$ (under your assumed well-behaved conditions), you can go ahead and choose coordinates that best fit the system. Of course, you could naively pick $X$ coordinates out of your Cartesian set and call it a day, but that might not be the most efficient approach. Observe the constraints carefully and try to pick the most convenient generalized coordinates possible!


  • A Mathematical Exploration of the Virial Theorem


    General Virial Theorem:

For a system of $N$ point masses in $3$-dimensional space with masses $\{m_i\}_{i=1}^{N}$, positions $\{\vec{q}_i\}_{i=1}^{N}$ and velocities $\{\vec{v}_i\}_{i=1}^{N}$, if $\left < \dfrac{\mathrm{d}\left ( G(t) \right )}{\mathrm{d} t} \right >_{T} = 0$, then

    $$2 \left < K(t) \right >_{T} + \left < \sum_{i=1}^{N} \left ( \vec{F}_i(t) \cdot \vec{q}_i(t) \right ) \right >_{T} =0$$

    where $N \in \mathbb{N}$, $G(t) = \displaystyle \sum_{i=1}^{N} \left ( m_i \vec{q}_i(t) \cdot \vec{v}_i(t) \right )$, $\displaystyle K(t) = \dfrac{1}{2} \sum_{i=1}^{N} \left ( m_i \|\vec{v}_i(t)\|^2 \right )$, $\displaystyle \left < f(t) \right >_{T} = \frac{1}{T} \int_0^T f(t) \ \mathrm{d} t$

    Proof:

Based on the definition of the virial $G(t)$:

    \[
    G(t) = \sum_{i=1}^{N} \left ( m_i \vec{q}_i(t) \cdot \vec{v}_i(t) \right )
    \]

    Taking the time derivative on both sides, we have:

    $$
    \begin{align}
    \dfrac{\mathrm{d}\left ( G(t) \right )}{\mathrm{d} t} &= \sum_{i=1}^{N} \left ( \dfrac{\mathrm{d} \left ( m_i \vec{q}_i(t) \cdot \vec{v}_i(t) \right )}{\mathrm{d} t} \right ) \\
    &= \sum_{i=1}^{N} \left ( m_i \left ( \dfrac{\mathrm{d} \left ( \vec{q}_i(t)\right )}{\mathrm{d} t} \cdot \vec{v}_i(t) + \vec{q}_i(t) \cdot \dfrac{\mathrm{d} \left ( \vec{v}_i(t)\right )}{\mathrm{d} t} \right ) \right ) \\
    &= \sum_{i=1}^{N} \left ( m_i \left ( \vec{v}_i(t) \cdot \vec{v}_i(t) + \vec{q}_i(t) \cdot \vec{a}_i(t) \right ) \right ) \\
    &= \sum_{i=1}^{N} \left ( m_i \left \| \vec{v}_i(t) \right \|^2 + m_i \vec{a}_i(t) \cdot \vec{q}_i(t) \right ) \\
    &= \sum_{i=1}^{N} \left ( m_i \left \| \vec{v}_i(t) \right \|^2 \right ) + \sum_{i=1}^{N} \left ( m_i \vec{a}_i(t) \cdot \vec{q}_i(t) \right ) \\
    &= 2K(t) + \sum_{i=1}^{N} \left ( \vec{F}_i(t) \cdot \vec{q}_i(t) \right )
    \end{align}
    $$

    $$
    \begin{align}
    \implies \frac{1}{T} \int_0^T \left ( \dfrac{\mathrm{d}\left ( G(t) \right )}{\mathrm{d} t} \right ) \ \mathrm{d} t &= \frac{1}{T} \int_0^T \left ( 2K(t) + \sum_{i=1}^{N} \left ( \vec{F}_i(t) \cdot \vec{q}_i(t) \right ) \right ) \ \mathrm{d} t\\
    \frac{1}{T} \int_0^T \left ( \dfrac{\mathrm{d}\left ( G(t) \right )}{\mathrm{d} t} \right ) \ \mathrm{d} t &= 2 \left ( \frac{1}{T} \int_0^T \left ( K(t) \right ) \ \mathrm{d} t \right )+ \frac{1}{T} \int_0^T \left ( \sum_{i=1}^{N} \left ( \vec{F}_i(t) \cdot \vec{q}_i(t) \right ) \right ) \ \mathrm{d} t\\
    \left < \dfrac{\mathrm{d}\left ( G(t) \right )}{\mathrm{d} t} \right >_T &= 2 \left < K(t) \right >_T + \left < \sum_{i=1}^{N} \left ( \vec{F}_i(t) \cdot \vec{q}_i(t) \right ) \right >_T
    \end{align}
    $$

    If $\left < \dfrac{\mathrm{d}\left ( G(t) \right )}{\mathrm{d} t} \right >_{T} = 0$, then

    $$
    2 \left < K(t) \right >_T + \left < \sum_{i=1}^{N} \left ( \vec{F}_i(t) \cdot \vec{q}_i(t) \right ) \right >_T=0
    $$

    $\Box$

Virial Theorem for a conservative system with a homogeneous potential:

In $3$-dimensional space, for a conservative system of $N$ point masses with masses $\{m_i\}_{i=1}^{N}$, positions $\{\vec{q}_i\}_{i=1}^{N}$ and velocities $\{\vec{v}_i\}_{i=1}^{N}$, if $\left < \dfrac{\mathrm{d}\left ( G(t) \right )}{\mathrm{d} t} \right >_{T} = 0$ and every $U_i(\vec{q}_i(t))$ is homogeneous of degree $\alpha$ (i.e., $\forall \lambda \in \mathbb{R}, \ U_i(\lambda \vec{q}_i(t)) = \lambda^{\alpha} U_i(\vec{q}_i(t))$), then

    $$
    2 \left< K(t) \right>_T -\alpha \left< U(t) \right>_T = 0
    $$

    where $N \in \mathbb{N}$, $\alpha \in \mathbb{R}$, $G(t) = \displaystyle \sum_{i=1}^{N} \left ( m_i \vec{q}_i(t) \cdot \vec{v}_i(t) \right )$, $\displaystyle K(t) = \dfrac{1}{2} \sum_{i=1}^{N} \left ( m_i \|\vec{v}_i(t)\|^2 \right )$, $\displaystyle U(t) = \sum_{i=1}^{N} \left ( U_i(\vec{q}_i(t)) \right )$, $\displaystyle \left < f(t) \right >_{T} = \frac{1}{T} \int_0^T f(t) \ \mathrm{d} t$

    Proof:

Applying the General Virial Theorem and the definition of a conservative system, for which $\vec{F}_i(t) = -\nabla U_i(\vec{q}_i(t))$, we have

    $$
    \begin{align}
    2 \left < K(t) \right >_T + \left < \sum_{i=1}^{N} \left ( \vec{F}_i(t) \cdot \vec{q}_i(t) \right ) \right >_T &= 0 \\ \\
    2 \left< K(t) \right>_T + \left< \sum_{i=1}^{N} \Bigg( \Big( -\nabla U_i(\vec{q}_i(t)) \Big) \cdot \vec{q}_i(t) \Bigg) \right>_T &= 0 \\
    \end{align}
    $$

Applying Euler’s Homogeneous Function Theorem, since every $U_i(\vec{q}_i)$ is homogeneous of degree $\alpha$, we get $\nabla U_i(\vec{q}_i(t)) \cdot \vec{q}_i(t) = \alpha U_i(\vec{q}_i(t))$, and therefore

    $$
    \begin{align}
    2 \left< K(t) \right>_T + \left< \sum_{i=1}^{N} \Bigg( \Big( -\nabla U_i(\vec{q}_i(t)) \Big) \cdot \vec{q}_i(t) \Bigg) \right>_T &= 0 \\ \\
    2 \left< K(t) \right>_T + \left< \sum_{i=1}^{N} \Bigg( -\Big( \nabla U_i(\vec{q}_i(t)) \cdot \vec{q}_i(t) \Big) \Bigg) \right>_T &= 0 \\ \\
    2 \left< K(t) \right>_T + \left< \sum_{i=1}^{N} \Bigg( -\alpha U_i(\vec{q}_i(t)) \Bigg) \right>_T &= 0 \\ \\
    2 \left< K(t) \right>_T -\alpha \left< \sum_{i=1}^{N} \Bigg( U_i(\vec{q}_i(t)) \Bigg) \right>_T &= 0 \\ \\
    2 \left< K(t) \right>_T -\alpha \left< U(t) \right>_T &= 0
    \end{align}
    $$

    $\Box$
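
As a quick numerical sanity check (a minimal sketch, not part of the proof; the specific oscillator and constants are assumptions chosen for illustration), consider a 1-D harmonic oscillator with $U(q) = \frac{1}{2} k q^2$, which is homogeneous of degree $\alpha = 2$, so the theorem predicts $2\left< K \right>_T - 2\left< U \right>_T = 0$ when averaging over one full period:

```python
import numpy as np

# Harmonic oscillator: U(q) = 0.5*k*q**2 is homogeneous of degree alpha = 2,
# so the virial theorem predicts 2<K> - 2<U> = 0 over one period.
m, k, A = 1.0, 4.0, 0.7                # mass, spring constant, amplitude
omega = np.sqrt(k / m)
T = 2 * np.pi / omega                  # one full period

t = np.linspace(0.0, T, 200_001)       # uniform grid, so time average ~ mean
q = A * np.cos(omega * t)              # exact solution with q(0) = A, v(0) = 0
v = -A * omega * np.sin(omega * t)

K_avg = (0.5 * m * v**2).mean()
U_avg = (0.5 * k * q**2).mean()
print(2 * K_avg - 2 * U_avg)           # ~0, up to discretization error
```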


  • Recommendations for Rigorous Classical Mechanics Textbooks


    Note: This post is an English adaptation of my original Chinese article (URL). Some parts have been modified for clarity, cultural relevance, or to better fit the English-speaking audience.

    As a student of pure mathematics, I once found myself on a quest to find extremely rigorous textbooks for studying classical mechanics — something in the Bourbaki style, akin to a “baby Rudin” version of classical mechanics.

Due to the different thinking patterns between mathematics and the natural sciences, I often approach natural science through the lens of formal science. Unfortunately, most of the available textbooks didn’t suit my taste. For example, when it comes to introductory calculus, my personal favorite is Baby Rudin (Walter Rudin’s Principles of Mathematical Analysis). Although it’s difficult to digest, I simply cannot accept textbooks that skip the $\varepsilon$-$\delta$ definition of limits and jump straight to using limits. This used to leave me baffled—questions like “Where does this come from?” and “What justifies this?” plagued my reading. I read painstakingly slowly, even attempting to force myself to accept and understand these imprecise definitions and derivations, but I just couldn’t. At that time, I even began to question my intelligence, feeling as though I must be too slow to comprehend what others grasped with ease.

    That painful confusion lasted until I saw all the related proofs laid out in Baby Rudin. It was then that the fog lifted entirely.

    After this experience, I realized that I can only walk the path of formal sciences, such as pure mathematics and computer science. It’s not that I’m unwilling to learn other subjects — rather, the learning cost for me is too high. My formal science mindset simply does not support my ability to easily accept and understand the common textbooks used in natural science.

    Fortunately, after browsing many forums and reading numerous books, I found a few classical mechanics textbooks that emphasize rigor:

    1. Geometric Mechanics and Symmetry: From Finite to Infinite Dimensions – Darryl D. Holm, Tanya Schmah, and Cristina Stoica
    2. Introduction to Mechanics and Symmetry: A Basic Exposition of Classical Mechanical Systems (2nd Edition) – Jerrold E. Marsden and Tudor S. Ratiu
    3. Mathematical Methods of Classical Mechanics – V. I. Arnold
    4. Mechanics: Volume 1 – L. D. Landau and E. M. Lifshitz

    Among these, I believe the first two are the most suitable for those seeking rigor. The level of formalization in these books, to me, is more than adequate. They are written in a style reminiscent of Bourbaki. The third and fourth books are excellent supplementary materials (though not as rigidly formal, they are still exceptionally good). I highly recommend reading these books together for a more comprehensive understanding of classical mechanics.

However, be aware that these books require a solid foundation in mathematics, including calculus, differential equations, differential geometry, tensor analysis, abstract algebra, and variational calculus. If your mathematical analysis foundation isn’t strong enough, I recommend working through Rudin’s three-part series (Baby Rudin, Papa Rudin, Grandpa Rudin) to build a solid groundwork.

    Additionally, if you’re interested in the history of classical mechanics, you might find A Brief History of Analytical Mechanics by Fengxiang Mei, Huibin Wu, and Yanmin Li (《分析力学史略》) to be a fascinating read. I’ve recently started reading this book myself.


  • Why Aren’t Generalized Coordinates in the Lagrangian Equation Considered Functions of Time?


    Note: This post is an English adaptation of my original Chinese article (URL). Some parts have been modified for clarity, cultural relevance, or to better fit the English-speaking audience.

    I was once puzzled by this issue, so I’d like to briefly share my understanding now. It may not be entirely correct, but here’s my take:

Without loss of generality, let’s consider a $1$-dimensional space, as the conclusions here can be extended to $n$ dimensions.

First, in the Euler-Lagrange equation, the Lagrangian $\mathcal{L}$ is defined as a multivariable function $f : \mathbb{R}^{2 N+1} \rightarrow \mathbb{R}$, written $\mathcal{L}(q_1, \cdots ,q_N;v_1, \cdots ,v_N;t)$. Therefore, when we consider expressions like $\dfrac{\partial \mathcal{L}}{\partial q_i}$, $\dfrac{\partial \mathcal{L}}{\partial v_i}$, or $\dfrac{\partial \mathcal{L}}{\partial t}$, we are treating $q_i$, $v_i$, and $t$ as entirely independent variables, much like how we handle the variables $x$, $y$, and $z$ in a function $f(x, y, z)$.

Why is that? Fundamentally, it’s because this is how the mathematical definition of partial derivatives works, even when the variables being differentiated are interrelated. For example, consider the function $f(x(t), t)$. Clearly, $x$ is a function of $t$, but taking the partial derivative of $f$ with respect to $t$ doesn’t affect $x(t)$ at all, and taking the partial derivative of $f$ with respect to $x(t)$ doesn’t affect $t$. This is simply how partial derivatives operate, and you must refer to the formal definition of partial derivatives to grasp this. Therefore, when we take partial derivatives of the Lagrangian $\mathcal{L}$, the variables $q_i$, $v_i$, and $t$ are treated as independent.

    However, when we consider the derivative $\dfrac{\mathrm{d} \mathcal{L}}{\mathrm{d} t}$, we are effectively treating the Lagrangian $\mathcal{L}$ as a single-variable function $\mathcal{L}=f(t): \mathbb{R} \rightarrow \mathbb{R}$. Thus, we need to apply the chain rule to expand it, resulting in:

    $$
\displaystyle \frac{\mathrm{d} \mathcal{L}}{\mathrm{d} t}=\frac{\partial \mathcal{L}}{\partial t}+\sum_{i=1}^N \frac{\partial \mathcal{L}}{\partial q_i} \frac{\mathrm{d} q_i}{\mathrm{d} t} +\sum_{i=1}^N \frac{\partial \mathcal{L}}{\partial v_i} \frac{\mathrm{d} v_i}{\mathrm{d} t}
    $$

    It is important to note that the derivative $\dfrac{\mathrm{d} \mathcal{L}}{\mathrm{d} t}$ referred to here is the total derivative of the Lagrangian $\mathcal{L}$. According to the definition of the total derivative, the function being differentiated should be a single-variable function.

    Therefore, the essence of solving this question lies in understanding the formal definitions of total derivatives and partial derivatives.
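
To make the distinction concrete, here is a small SymPy sketch using the toy function $\mathcal{L}(q, v, t) = qv - tq^2$ (a hypothetical choice, for illustration only, not a physical Lagrangian):

```python
import sympy as sp

t = sp.symbols('t')
q, v = sp.symbols('q v')          # q, v, t treated as independent variables

L = q * v - t * q**2              # toy L(q, v, t), for illustration only
print(sp.diff(L, t))              # partial derivative w.r.t. t: -q**2

# Total derivative: substitute a trajectory q(t) and v(t) = dq/dt,
# then differentiate the resulting single-variable function of t.
q_t = sp.Function('q')(t)
L_t = L.subs({q: q_t, v: sp.diff(q_t, t)})
print(sp.diff(L_t, t))            # full chain-rule expansion, not just -q**2
```

The first print treats $q$, $v$, $t$ as orthogonal inputs; the second differentiates the composition of $\mathcal{L}$ with a trajectory, which is exactly the chain-rule expansion written above.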

    Updated: 2024-09-26

A friend of mine recently mentioned that, even after reading my initial post, he still didn’t quite understand why the Lagrangian behaves differently in the Euler-Lagrange equation versus when taking its total derivative. So here’s an updated post that reworks the original argument in formal, rigorous mathematical language to explain the reasoning behind it.

    I used to be perplexed by this question, and now, after some time, I’d like to share my understanding of it from a purely mathematical perspective (note: this might not be the definitive answer).

Without loss of generality, let’s consider a $1$-dimensional space, as the conclusions here can be extended to $n$ dimensions.

    First and foremost, let’s clarify an important point: in the Euler-Lagrange equation, the Lagrangian $\mathcal{L}$ is defined as a multivariate function $\mathcal{L} = f(q_1, \cdots, q_N; v_1, \cdots, v_N; t)$, where $f: \mathbb{R}^{2N+1} \to \mathbb{R}$.

    For the partial derivatives $\dfrac{\partial \mathcal{L}}{\partial q_i}$, $\dfrac{\partial \mathcal{L}}{\partial v_i}$, or $\dfrac{\partial \mathcal{L}}{\partial t}$, we treat $q_i$, $v_i$, and $t$ as three completely independent variables, much like how we treat $x$, $y$, and $z$ in a function $f(x, y, z)$.

According to the formal definition of partial derivatives (note: set $m = 1$ and $n = 2N+1$ in the definition below, which makes the function $\mathbf{f}$ a map $\mathbb{R}^{2N+1} \to \mathbb{R}$, thereby matching the type of the Lagrangian $\mathcal{L}$. Let $\mathbf{f}(\mathbf{x}) = \mathcal{L}$, which yields $f_i(\mathbf{x}) = \mathbf{f}(\mathbf{x}) = \mathcal{L}$):

    Referenced from Baby Rudin – 9.16

This indicates that when calculating partial derivatives, we disregard the relationships between the input variables of the function, as each input of a multivariate function forms an independent dimension. Therefore, when forming the difference $f_i(\mathbf{x} + t \mathbf{e}_j) - f_i(\mathbf{x})$ in the definition, we are only looking at the change in a single input, while the input variables remain orthogonal to each other.

    For example, consider the function $f(x(t), t)$. Clearly, $x$ is a function of $t$, but when you take the partial derivative of $f$ with respect to $t$, it won’t affect $x(t)$ at all, and similarly, taking the partial derivative of $f$ with respect to $x(t)$ won’t affect $t$. This is because, within this function, the dimensions formed by $x(t)$ and $t$ are orthogonal and do not influence each other.

    Now, regarding the total derivative, its formal definition is:

    Referenced from Baby Rudin – 9.17

    where $\mathbf{f}'(\mathbf{x})$ is defined as:

    Referenced from Baby Rudin – 9.11

    Once these definitions and theorems are in place, consider the following example:

    Referenced from Baby Rudin – 9.18

    Thus, when calculating the total derivative $\dfrac{d\mathcal{L}}{dt}$ of the Lagrangian $\mathcal{L}$, the Lagrangian $\mathcal{L}$ is defined as $\mathcal{L} = (f \circ \gamma) : \mathbb{R} \rightarrow \mathbb{R}$, where $f: \mathbb{R}^{2N+1} \to \mathbb{R}$ and $\gamma: \mathbb{R} \to \mathbb{R}^{2N+1}$.

So, if we set $g(t) = \mathcal{L}(t) = (f \circ \gamma)(t)$ and define $\gamma(t) = \begin{bmatrix} q_1(t) & \cdots & q_N(t) & v_1(t) & \cdots & v_N(t) & t \end{bmatrix}^T$, then using the chain rule to expand the total derivative of the Lagrangian $\mathcal{L}$, we obtain the following formula:

    $$
\displaystyle \frac{\mathrm{d} \mathcal{L}}{\mathrm{d} t}= \sum_{i=1}^{2N+1} \left ( \ (D_i f ) (\gamma (t)) \ \gamma_i'(t) \ \right ) = \frac{\partial f}{\partial t}+\sum_{i=1}^N \frac{\partial f}{\partial q_i} \frac{\mathrm{d} q_i}{\mathrm{d} t} +\sum_{i=1}^N \frac{\partial f}{\partial v_i} \frac{\mathrm{d} v_i}{\mathrm{d} t}
    $$

    Therefore, the essence of solving this question lies in understanding the formal definitions of total derivatives and partial derivatives.

    Updated: 2024-11-02

    Here is an update, as I feel I’ve gained a new understanding of the mathematical concept of total derivative.

    Firstly, most textbooks provide the total derivative formula for the Lagrangian $\mathcal{L}$ as:

    $$
    \frac{\mathrm{d} \mathcal{L}}{\mathrm{d} t}= \frac{\partial \mathcal{L}}{\partial t}+\sum_{i=1}^N \frac{\partial \mathcal{L}}{\partial q_i} \frac{\mathrm{d} q_i}{\mathrm{d} t} +\sum_{i=1}^N \frac{\partial \mathcal{L}}{\partial v_i} \frac{\mathrm{d} v_i}{\mathrm{d} t}
    $$

    On the other hand, the total derivative formula I wrote for the Lagrangian $\mathcal{L}$ is:

    $$
    \frac{\mathrm{d} \mathcal{L}}{\mathrm{d} t} = \frac{\partial f}{\partial t}+\sum_{i=1}^N \frac{\partial f}{\partial q_i} \frac{\mathrm{d} q_i}{\mathrm{d} t} +\sum_{i=1}^N \frac{\partial f}{\partial v_i} \frac{\mathrm{d} v_i}{\mathrm{d} t}
    $$

So, who made a mistake here? Let me analyze.

    Generally, textbooks provide the general formula for the total derivative of any function $f: \mathbb{R}^N \rightarrow \mathbb{R}$ as:

    $$
    \frac{\mathrm{d} f}{\mathrm{d} t}=\sum_{i=1}^N \frac{\partial f}{\partial x_i} \frac{\mathrm{d} x_i}{\mathrm{d} t}
    $$

In fact, if we observe the LHS of this identity, we find that the type of the function $f$ is not $\mathbb{R} \rightarrow \mathbb{R}$, so why is it acceptable to apply the total derivative formula to that function directly?

I personally believe this is a matter of historical notational convention, because:

For a function $f: \mathbb{R}^N \rightarrow \mathbb{R}$, when we write out the total derivative $\dfrac{\mathrm{d} f}{\mathrm{d} t}$, we are actually referring to $\dfrac{\mathrm{d} z}{\mathrm{d} t}$, where $z(t) = (f \circ g)(t)$, with $z: \mathbb{R} \rightarrow \mathbb{R}$, $f: \mathbb{R}^N \rightarrow \mathbb{R}$, $g: \mathbb{R} \rightarrow \mathbb{R}^N$, and $g(t) = \begin{bmatrix} x_1(t) & \cdots & x_N(t) \end{bmatrix}^{T}$.

    Thus, the correct formula for the total derivative should be:

    $$
    \frac{\mathrm{d}z}{\mathrm{d} t}=\sum_{i=1}^N \frac{\partial f}{\partial x_i} \frac{\mathrm{d} x_i}{\mathrm{d} t}
    $$

    However, comparing this with the original total derivative formula:
    $$
    \frac{\mathrm{d} f}{\mathrm{d} t}=\sum_{i=1}^N \frac{\partial f}{\partial x_i} \frac{\mathrm{d} x_i}{\mathrm{d} t}
    $$

    We find that the change is actually minimal, with the only difference lying on the LHS. Therefore, conventionally, if we see the ordinary derivative symbol $\dfrac{\mathrm{d}}{\mathrm{d} t}$ applied to a multivariate function $f: \mathbb{R}^N \rightarrow \mathbb{R}$, as $\dfrac{\mathrm{d} f}{\mathrm{d} t}$, it means that we are actually referring to the total derivative $\dfrac{\mathrm{d}(f \circ g)}{\mathrm{d} t}$. If we see the ordinary derivative symbol $\dfrac{\mathrm{d}}{\mathrm{d} t}$ applied to a univariate function $f: \mathbb{R} \rightarrow \mathbb{R}$, then it remains unchanged.


  • A Mathematical Exploration of Norton’s Dome and Determinism in Classical Mechanics


    Note: This post is an English adaptation of my original Chinese article (URL). Some parts have been modified for clarity, cultural relevance, or to better fit the English-speaking audience.

    Let’s explore this from a mathematical perspective:

    First, when we say that classical mechanics follows determinism, mathematically, it means the following:

    Within the frame of classical mechanics, the mechanical differential equations of any system must always have a solution, and that solution is unique.

However, it is important to note that this does not imply that these mechanical differential equations must satisfy the sufficient conditions of the Existence and Uniqueness Theorem (for ODEs, this would be the Picard-Lindelöf Theorem or the Cauchy Existence and Uniqueness Theorem, among others; for PDEs, it could be the Cauchy-Kowalevski Theorem or the Lax-Milgram Theorem, among others). Since the sufficient conditions of the Existence and Uniqueness Theorem are not necessary conditions, failing to satisfy them does not prove that a differential equation lacks a unique solution. For instance, some ODEs may not satisfy Lipschitz continuity, yet their solutions may still exist and be unique.

    Now, for the Norton’s Dome problem, its mechanical differential equation is as follows (note, we are only considering the tangential equation, as we are more interested in how the ball moves along the surface of the dome. Here, $\vec{r}(t)$ represents the displacement vector from the apex of the dome to the position of the ball along its surface. Due to the radial symmetry of the geometric model, we can use the scalar $r(t)$ to simplify the calculations):

    $$
\frac{d^2 r(t)}{dt^2} = \sqrt{r(t)}, \quad \left. \frac{dr(t)}{dt} \right|_{t=0} = 0 , \quad r(0) = 0
    $$

Mathematically, we can verify (details omitted here for brevity) that the function on the right-hand side of the differential equation is continuous for $r \ge 0$, which guarantees the existence of at least one solution. However, the function is not Lipschitz continuous at $r = 0$, which means we cannot guarantee the uniqueness of the solution.

Once again, to emphasize: since these conditions are sufficient but not necessary, their failure alone does not prove that the differential equation lacks a unique solution.

    So, let’s try solving it anyway (perhaps there is a unique solution, who knows? Haha). But in the end, we find, unfortunately, that there are infinitely many solutions.

    $$
r(t) = \begin{cases} 0, & \forall \ t<t_0 \\ \dfrac{1}{144}(t - t_0)^4, & \forall \ t \geq t_0 \end{cases}, \quad \text{where } t_0 \in \mathbb{R}
    $$
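
As a quick symbolic sanity check (a minimal SymPy sketch for the $t \geq t_0$ branch): since $r''(t) = \frac{1}{12}(t - t_0)^2 \geq 0$, verifying $(r'')^2 = r$ is equivalent to verifying $r'' = \sqrt{r}$:

```python
import sympy as sp

# Check the t >= t0 branch: r(t) = (t - t0)**4 / 144 should satisfy r'' = sqrt(r).
t, t0 = sp.symbols('t t0', real=True)
r = (t - t0)**4 / 144

r_dd = sp.diff(r, t, 2)            # r'' = (t - t0)**2 / 12, which is >= 0
print(sp.simplify(r_dd**2 - r))    # prints 0, i.e. (r'')**2 = r, so r'' = sqrt(r)
```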

From a physical standpoint, constructing such a perfect system is likely impossible. It’s akin to assuming a perfectly rigid sphere resting on a perfectly flat surface and calculating the pressure distribution: in reality, the sphere must deform at the contact point, for otherwise the contact area would be zero and we would obtain an infinite pressure value.

    Updated: 2024-09-11

    I’ve seen some responses and comments suggesting that Norton’s Dome can be resolved by applying Newton’s 1st Law, leading to the conclusion that classical mechanics does adhere to determinism. I’d like to offer my perspective on this from a mathematical standpoint.

    First, mathematically speaking, Newton’s 2nd Law is essentially a definition of force. I recall reading in a book that if we were to remove the concept of force entirely and rely solely on $\dfrac{\mathrm{d} \vec{p}}{\mathrm{d} t}$, all the results of physics would remain unchanged.

Similarly, Newton’s 1st Law is simply a special case of the 2nd Law where $\vec{F} = 0$. In fact, Newton’s 1st Law cannot hold prior to the 2nd Law, because before we’ve defined force, we cannot even interpret $\vec{F} = 0$. Thus, mathematically, Newton’s 2nd Law is more fundamental. Moreover, Newton’s 1st Law is not a definition but rather a proposition, and since it is a true proposition, it is really a theorem (the title “Law” reflects its importance, not its logical status).

    As such, Newton’s 1st Law cannot solve this issue. The core problem is that, based on the displacement solution $r(t)$, we can derive the acceleration $a(t)$:

    $$
a(t) = \frac{d^2 r(t)}{dt^2} = \begin{cases} 0, & \forall \ t<t_0 \\ \dfrac{1}{12}(t - t_0)^2, & \forall \ t \geq t_0 \end{cases}, \quad \text{where } t_0 \in \mathbb{R}
    $$

    We can further derive the force $F(t)$:

    $$
F(t)=ma(t) = m\frac{d^2 r(t)}{dt^2} = \begin{cases} 0, & \forall \ t<t_0 \\ \dfrac{m}{12}(t - t_0)^2, & \forall \ t \geq t_0 \end{cases}, \quad \text{where } t_0 \in \mathbb{R}, \quad m \in \mathbb{R}^+
    $$

Clearly, for $t \geq t_0$, the force $F(t)$ is not a constant but a variable. Thus, if you wish to apply Newton’s 1st Law, it only holds for $t \le t_0$, since during this time $F(t)=0$, satisfying the conditions of Newton’s 1st Law. However, as soon as $t > t_0$, the force becomes $F(t)\ne0$, and Newton’s 1st Law no longer applies.

    This shows that Newton’s 1st Law alone is insufficient to resolve the Norton’s Dome problem.


  • Why Use L2 Norm Instead of L1 Norm in Loss Functions?


Have you noticed that, in many applications, MSE (Mean Squared Error), RMSE (Root Mean Squared Error) and SSE (Sum of Squared Errors) are often the preferred choices for the loss function? But why is this the case? Why do we favor the L2 norm over the L1 norm, such as the Mean Absolute Error (MAE)?

For a linear regression model, the answer is obvious — the Gauss-Markov Theorem directly implies that minimizing the L2 norm error yields the best linear unbiased estimator. But in practice, not all models we work with are linear regression models…

    Consider the loss function in some machine learning models (typically non-linear), which is often defined as

$$ \text{MSE} = \dfrac{1}{N}\sum_{i=1}^{N} \left( y_i -\hat{y}_i \right)^2 $$

One might argue that the L2 norm error emphasizes larger errors by squaring the residuals, effectively “zooming in” on significant deviations. But if that’s the case, why not use even higher powers, which would penalize large errors more heavily, such as $ \dfrac{1}{N}\displaystyle\sum_{i=1}^{N} \left( y_i -\hat{y}_i \right)^4 $?

Indeed, higher powers would penalize large errors even more. However, the preference for the L2 norm isn’t just about magnifying errors. Let’s delve into it!

Usually, the goal in many statistical models is to find the function $f(\mathbf{x})$ that best describes the relationship between the input $\mathbf{x}$ and the observed data, enabling accurate predictions and generalization to new data.

    To achieve this, we typically use Maximum Likelihood Estimation (MLE), which allows us to estimate the model parameters that make the observed data most probable. Specifically, when we maximize the likelihood function of the errors $\mathbf{\epsilon}$ — the differences between the model’s predictions and the observed data — we are finding the parameters that make these errors most likely under our model.

Why? Because by maximizing the likelihood of these errors, we identify the parameters that most likely generated the observed errors. This approach is rooted in empirical evidence: it makes sense to choose the parameters that make the observed errors the most probable (i.e., that maximize the likelihood function of the observed errors), as we have no reason to prefer less likely errors.

For example, imagine your parents walk into your room five times, and each time they catch you playing computer games instead of doing homework 😂. They might conclude that you’ve been playing computer games all day, even though you actually spent hours doing homework and just happened to take a break at the wrong moments (what a bad excuse btw 😂)… Here, they’re maximizing the likelihood of the “variable” — their assumption that you’re always gaming — because those were the moments they observed, and they don’t think it’s a rare coincidence. In reality, you were just unlucky, but based on the evidence they have, their conclusion is the most probable one by applying MLE.

    So, typically, the statistical model’s goal is to find $\hat{y}$ such that:

    $$
\hat{y} = \arg \left ( \max_{\hat{y}} \Big ( L(\epsilon) \Big ) \right )
    $$

    where

    • $\hat{y}$ is the set of the model’s best expected outputs, where $\hat{y} = \{\hat{y}_1,\hat{y}_2,\cdots, \hat{y}_N\}$
    • $y$ is the set of observed data, where $y = \{y_1, y_2, \cdots, y_N\}$
    • $L(\epsilon)$ is the joint likelihood of all the individual errors, where $L(\epsilon) = L\left (\displaystyle \bigcap^{N}_{i=1} \epsilon_i \right )$
    • $\epsilon_i$ is an individual error, where $\epsilon_i=\hat{y}_i-y_i$

For simplicity and to make the model computationally feasible, we assume every individual error in $\epsilon$ to be statistically independent, so that $L(\epsilon) = \displaystyle \prod^{N}_{i=1} L(\epsilon_i)$, and consequently:

    $$
\hat{y} = \arg \bigg ( \max_{\hat{y}} \Big ( \displaystyle \prod^{N}_{i=1} L(\epsilon_i) \Big ) \bigg )
    $$

Taking the logarithm to simplify the product into a sum (and because the logarithm is a strictly increasing function, i.e., $\forall x_1, x_2 \in \mathbb{R}^{+}, \, x_1 < x_2 \implies \log(x_1) < \log(x_2)$, the maximizer is preserved):

    \begin{align*}
\hat{y} &= \arg \bigg ( \max_{\hat{y}} \bigg ( \log \Big ( \displaystyle \prod^{N}_{i=1} L(\epsilon_i) \Big ) \bigg ) \bigg ) \\
\hat{y} &= \arg \bigg ( \max_{\hat{y}} \bigg ( \sum^{N}_{i=1} \Big ( \log \big ( L(\epsilon_i) \big ) \Big ) \bigg ) \bigg ) \\
    \end{align*}

Here, we assume that every individual error follows a normal distribution, with $L(\epsilon_i) = \dfrac{1}{\sqrt{2\pi \displaystyle\sigma_i^2}} \ \exp\left(-\dfrac{\epsilon_i^2}{2\sigma_i^2}\right)$, that every individual error has mean $0$, and homoscedasticity, i.e., $\sigma_1=\sigma_2=\cdots=\sigma_N=\sigma$. This is justified because each error can be seen as the sum of many small i.i.d. (independent and identically distributed) effects, so applying the Central Limit Theorem implies that every individual error tends toward a normal distribution as the number of such effects approaches infinity.

    \begin{align*}
\hat{y} &= \arg \bigg ( \max_{\hat{y}} \bigg ( \sum^{N}_{i=1} \Big ( \log \big ( L(\epsilon_i) \big ) \Big ) \bigg ) \bigg ) \\
&= \arg \left( \max_{\hat{y}} \left( \sum_{i=1}^{N} \left( -\frac{1}{2} \log(2\pi\sigma_i^2) -\frac{\epsilon_i^2}{2\sigma_i^2} \right) \right) \right) \\
&= \arg \left( \min_{\hat{y}} \left( \sum_{i=1}^{N} \left( \frac{1}{2} \log(2\pi\sigma_i^2) + \frac{\epsilon_i^2}{2\sigma_i^2} \right) \right) \right) \\
&= \arg \left( \min_{\hat{y}} \left( \frac{1}{2} \sum_{i=1}^{N} \log(2\pi\sigma_i^2) + \frac{1}{2} \sum_{i=1}^{N} \frac{\epsilon_i^2}{\sigma_i^2} \right) \right)\\
&= \arg \left( \min_{\hat{y}} \left( \frac{1}{2} \sum_{i=1}^{N} \log(2\pi) + \frac{1}{2} \sum_{i=1}^{N} \log(\sigma_i^2) + \frac{1}{2} \sum_{i=1}^{N} \frac{\epsilon_i^2}{\sigma_i^2} \right) \right) \\
&= \arg \left( \min_{\hat{y}} \left( \frac{N}{2} \log(2\pi) + \frac{N}{2} \log(\sigma^2) + \frac{1}{2\sigma^2} \sum_{i=1}^{N} \epsilon_i^2 \right) \right) \\
&= \arg \left( \min_{\hat{y}} \left( \frac{1}{2\sigma^2} \sum_{i=1}^{N} \epsilon_i^2 \right) \right) \\
&= \arg \left( \min_{\hat{y}} \left( \sum_{i=1}^{N} \epsilon_i^2 \right) \right) \\
&= \arg \left( \min_{\hat{y}} \left( \sum_{i=1}^{N} \left (\hat{y}_i-y_i \right )^2 \right) \right) \\
    \end{align*}

    Given the above derivation, we see that minimizing the sum of squared errors is equivalent to maximizing the likelihood of the errors under the assumption that they follow a normal distribution with mean zero and constant variance. This directly leads to the use of the L2 norm (squared errors) in loss functions such as Mean Squared Error (MSE).

    However, it’s important to note that the L2 norm error may not be the best choice in all cases. Specifically, when the error distribution deviates from normality, the L2 norm’s assumptions break down.

For example, in classification tasks the errors are often not normally distributed, so the L2 norm error might lead to suboptimal results. In classification, errors arise from incorrectly assigned categories rather than continuous deviations, so the data typically follow a Categorical Distribution, and the loss function is more appropriately modeled by Cross-Entropy. Likewise, if the errors follow the Laplace Distribution, then picking the L1 norm error for the loss function would be the better option. (Note: you can derive these results by applying similar mathematical strategies.)
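
To illustrate that last point, here is a small NumPy sketch (a toy setup of my own: fit a single constant prediction $c$ to noisy data). The L2 loss is minimized near the sample mean, while the L1 loss is minimized near the sample median, which is exactly the maximum-likelihood location estimate when the errors follow a Laplace distribution:

```python
import numpy as np

# Toy setup: choose one constant prediction c for Laplace-distributed data y.
rng = np.random.default_rng(0)
y = rng.laplace(loc=3.0, scale=1.0, size=1_001)

c = np.linspace(y.min(), y.max(), 2_001)             # candidate predictions
l2 = ((y[None, :] - c[:, None]) ** 2).sum(axis=1)    # sum of squared errors
l1 = np.abs(y[None, :] - c[:, None]).sum(axis=1)     # sum of absolute errors

print("L2 argmin:", c[l2.argmin()], "~ sample mean:  ", y.mean())
print("L1 argmin:", c[l1.argmin()], "~ sample median:", np.median(y))
```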


  • Intro to Git, Github and VSCode


    Hi, this is Louis Liu. In this tutorial, I will guide you through some basic operations in Git, GitHub, and VSCode. (The Mandarin Translation is on the second page.)

    If you haven’t yet completed the LICENSE signing for the Echo-Land game project under the CruxAbyss game development team, please make sure to do so first. Before signing, you’ll need to familiarize yourself with some fundamental GitHub operations, such as:

    1. What are Git and GitHub
    2. What is a Repository (Repo) in GitHub
    3. What is an Issue in a GitHub Repo
    4. What is a Pull Request (PR) in a GitHub Repo
    5. How to modify, add, or delete files in a GitHub Repo

    Once you have a good understanding of these operations, you’ll be ready to complete the signing process on your own.

    What Are Git and GitHub

    Git is a distributed version control system for code.

    Let’s start by understanding what a “version control system for code” means.

    Essentially, Git acts as a historical recorder for your code. Why do we need this? In real-world development projects (coding), it’s almost impossible to write all the code at once and have the entire project running smoothly. Typically, we modularize the code, breaking down a large task (large codebase) into smaller tasks (smaller pieces of code) that are handled separately, and then gradually integrating these smaller pieces into the larger codebase.

    At this point, Git becomes incredibly important as a historical recorder for your code because it helps you document your progress at various stages. For example, if your manager asks you to build a robot that can automatically send and receive emails, your first step should be to break down this large task: “a robot that can automatically send and receive emails” into two smaller tasks: “a robot that can automatically receive emails” and “a robot that can automatically send emails.” Let’s say you work on the “receive emails” part first. Once completed, wouldn’t it be wise to save your progress? This is where Git comes in handy — you can use Git to record the current state of your code and then move on to the next task, “a robot that can automatically send emails.”

    You might wonder: Why do I need to record this? Can’t I just complete the two tasks separately without using Git?

    Yes, you could, but can you guarantee that while working on the second task, you won’t accidentally modify the code for the first task? For instance, while writing code for the “send emails” functionality, you might suddenly realize that you could optimize the code for the “receive emails” part, so you make some changes. But then you find out that your optimization was entirely wrong, or you accidentally deleted some of your previous code, causing the program to stop running. Now, you want to revert to the state when you just finished writing the “receive emails” functionality to compare and figure out what went wrong. You start using Undo (Ctrl + Z or Command + Z), but find that you can’t get back to that exact state. Now, you’re left staring at your broken code, forced to painstakingly fix it.

But what if you had used Git? It would be incredibly convenient: you could simply use Git to revert to the last recorded state of your code and compare the differences between that version and your current one. That’s why Git is essential in almost all large-scale project developments.

    Now, why is Git called “distributed”?

    Because Git allows multiple developers to collaborate on the same project. For example, you can share your project along with its Git repository with others, and everyone can work on it simultaneously, optimizing code, proposing new features, and more. Git allows for code comparison, branch creation, code merging, and so on.

    GitHub is a platform that integrates Git for hosting code repositories.

    This platform provides a way for developers worldwide to collaborate on various projects, with added functionalities and features.

    That’s why we host our game project files on GitHub: firstly, it has Git to help us record the historical state of all files in the project. Additionally, GitHub’s built-in features like webhooks and code review tools can significantly enhance our game development process.

    What Is a Repository (Repo) in GitHub

    In GitHub, a Repository (often abbreviated as Repo) is essentially a “code repository.” This is where all the files and their history related to a project are stored. For instance, the game project we’ve uploaded to GitHub is currently one of the Repositories in our GitHub account, as shown in the image below:

    Repositories are central to how GitHub works. They act as the home for your project’s files, including source code, documentation, and any other assets related to the project. Within a Repo, you can track changes to your files over time, collaborate with others, and manage different versions of your project.

    What Is an Issue in a GitHub Repo

    Imagine this scenario:

    You’re browsing GitHub, looking for some interesting code projects to try out. Suddenly, you stumble upon a repository (Repo) that contains a Canvas bot that automatically completes your homework. You’re instantly intrigued! You download the Repo to your local machine and follow the instructions in the repository’s README file to learn how to use the bot to do your Canvas assignments.

    (Quick note: What is a README file? It’s essentially a “User Guide” that most repositories include. The reason it’s called “READ ME” is that the code author wants to ensure others understand what the Repo does. So the README contains an introduction to the Repo and details on how to use it.)

    However, as you run the code from the Repo, you encounter an error! You’re confident that you’ve followed every step outlined in the README file. This likely means one of two things: either the author’s code has a problem, or there’s an issue with the README instructions. Either way, the issue lies within the files of the Repo — something is wrong or there’s a bug. Since you’ve discovered this problem and would like the author to fix it (Come on!! This bot could help you with your homework!!), you can go to the GitHub repository’s Issues section to report this problem and provide feedback on the bug. If you have a bit more technical skill, you could even debug the code, identify the exact source of the error or bug, and then report it via an Issue.

    Here’s an example of where you would submit/report an Issue on GitHub:

    On GitHub, here is the “Issues” tab you may click on to report an issue
    Here is an example of a reported issue of a Repo

    It’s important to note that in the Issues section of a GitHub repo, you’re not just limited to reporting bugs. You can also suggest improvements to the code or request the addition of new features. For example, you might be very satisfied with the Canvas bot that automatically does your homework, but you also wish it could help you complete online Canvas exams. Since this feature isn’t currently available in the repository, and you might not have the coding skills to add it yourself, you can submit a feature request in the Repo’s Issues section. This way, you can inform the code author of your desire for this functionality, and they may consider adding it in the future.

Overall, here are the procedures to open an issue in a Repo on GitHub:

    1. Navigate to the Repo: Go to the GitHub repository where you encountered the issue or want to suggest an improvement.
    2. Access the Issues Tab: Click on the “Issues” tab, typically located near the top of the repository page.
    3. Create a New Issue: Click the “New Issue” button to start reporting your problem or suggesting a new feature.
    4. Describe the Issue or Request: Give your issue a clear title and provide a detailed description. If you’re reporting a bug, include information about what you were doing when you encountered the error, any relevant error messages, and steps to reproduce the issue. If you’re suggesting a feature, explain why the feature would be useful and how it could be implemented.
    5. Submit the Issue: Once you’ve filled out the details, click “Submit new issue” to send it to the repository’s maintainers.

    What is a Pull Request (PR) in a GitHub Repo?

    Imagine this scenario:

    You’ve created a bot that automatically completes assignments on Canvas, and you’re really pleased with it. However, you also wish that this bot could help you complete Canvas online exams, but the current repository (repo) doesn’t include this feature yet. However, being a skilled developer, you’re confident that you can add this new functionality to the repo. So, you spend a few days improving the repo you downloaded locally, and you successfully implement the feature that enables the bot to take online exams on Canvas! Feeling generous, you realize that such a tool shouldn’t be kept to yourself — it should benefit all students!

    You decide to upload your improved repo to GitHub. But there’s a catch: you can’t directly modify the original author’s repo on GitHub because you don’t have the necessary permissions. Also, creating a new repo and uploading it wouldn’t be appropriate because your code is based on the original author’s work, where they created the bot that completes assignments on Canvas. Therefore, following international conventions and basic ethical guidelines, you should submit a Pull Request (PR) to the original author’s repo on GitHub. If the original author accepts your PR, your code will be merged into the repo, and your name will be added to the list of contributors for that project.

    But this raises the question: What exactly is a Pull Request (PR)?

    A PR is essentially a request to the original author, asking them to consider integrating your changes into the current repository. It’s as if you’ve written a new poem in the style of the “Tang Poems” and submitted it to the “300 Tang Poems” collection. If the curator of the “300 Tang Poems” accepts your submission, the collection would become “301 Tang Poems,” and you would be recognized as one of the contributors.

OK, so how do you submit a PR? Can you just click the Pull Request button on the repo on GitHub?

    Actually, it’s not that simple. Please read the next section!

    How to Modify, Add, or Delete Files in a GitHub Repository

    Now, let’s simulate a scenario where you’ve made changes to files in a GitHub repository and want to submit a Pull Request (PR) for those changes.

    First, let’s clarify one point: if you’ve downloaded the repository to your local machine and made changes there, you won’t be able to submit a PR directly. Why? Because GitHub requires that the repository you’re modifying must be connected to the same Git repository to submit a PR.

    You might wonder, “Oh, so should I just create a new Git repository for my local changes?”

    Absolutely not! If you create a new Git repository for your local changes, it will be a completely new Git instance, different from the original author’s repository. The history recorded by your Git (version control system) will only include the changes you’ve made, not those made by the original author. This is because your Git only starts recording from the state of the code you initially downloaded (like starting a test from 60% completion without knowing what happened from 0% to 60%). On the other hand, the original author’s Git tracks all their changes, but none of yours (like a test recording only the progress from 0% to 60%).

    A shared Git repository should record the entire process from 0% to 100%, without missing any steps!

    So, how do you merge the changes you’ve made (from 60% to 100%) with the original author’s changes (from 0% to 60%)?

    Answer: You need to start by copying the original repository, including its Git history, to your local machine, and then make your modifications. You should not create a new Git repository!

    But why is it necessary to use the same Git repository to make improvements to a repo? What if I don’t want to use the same Git repository?

    Answer: You can indeed choose not to use the same Git repository, but:

    1. If you intend to make your improved repository public but use a different Git repository, it effectively erases the contributions (from 0% to 60%) of the original author, which is generally considered unethical.
    2. From a collaboration standpoint, this approach can lead to a lack of consistency, leaving other developers unsure which repository to contribute to. For instance, you might improve the code’s readability, while someone else adds a new feature. If you’re not using the same Git repository, the project could split into two separate repositories, preventing the integration of these different improvements and reducing overall development efficiency.

    Remember we mentioned earlier that Git is “distributed”? Git allows multiple developers to collaborate on the same project, but the key is that you must use the same “version control system” or “historical recorder for code” to ensure that everyone is working on the same repository.

    Example: Imagine A and B both borrowed C’s homework to copy, but since C is known as the class underachiever, his homework typically scores only 60%. A and B must be very careful while copying to avoid making the same basic mistakes. After making their own corrections, they both finish the homework. However, since A and B are very generous, they decide to share their corrected versions with the whole class. The problem is, whose version should be shared — A’s or B’s? If one is shared and later D, the top student, wants to improve it further, what happens to the other version? To avoid this issue, A and B decide to compare their work and combine their efforts into one perfect version (similar to merging two repos through a PR).

    This example reflects the principle that when collaborating on a repository, everyone must use the same Git. If A and B both share their versions, it’s like splitting one modified repository into two, as they didn’t use the same Git. This forces the rest of the class to choose between the two, which is inefficient. By combining their efforts, they ensure that others can benefit from a single, improved version — just as using the same Git allows for a unified repository.

    Now that we’ve established the importance of using the same Git when collaborating on a repository, let’s dive into the practical steps.

    Since you need to copy the original repository along with its Git history to your local machine from the start, how do you do this? A normal download won’t include the Git history. Therefore, you need to click “Fork” next to the repository. This will copy both the original repository and its Git history to your GitHub account.

    Let’s go through this step together. Follow along with this tutorial using a sample repository designed for learning GitHub, no need to worry about causing any issues.

    First, click “Fork” on the repository page. See the image below for reference:

    After clicking, you’ll see a new page. Don’t change anything, simply click the green “Create fork” button in the lower right corner:

    Once you’ve clicked, wait a few seconds, and you should see the following page:

    In the image, the information highlighted in the red box indicates that this is the repository you’ve forked (in this case, Deep0Thinking forked the Repo). You can compare it with the original author’s repository to see the differences (in this case, this forked Repo is up to date, so currently no changes between the forked Repo and the original Repo).

    Now that you’ve forked the repository, the next step is to clone it to your local machine. This will allow you to make changes, add new features, or delete files, all while maintaining a connection to the original Git history.

    To clone your forked repository:

    1. Make sure you have “git” installed on your computer, if you don’t have git installed, please check here: https://git-scm.com/downloads

    2. Go to your forked repository on GitHub and click on the “Code” button. You’ll see an option to clone using HTTPS, SSH, or GitHub CLI. Copy the URL provided (by clicking on that icon next to the URL inside the red box):

    3. Open your terminal (on MacOS, you can press command + spacebar, then type in terminal and press return to open your terminal).

    4. In terminal, type the following command:

    git clone <your-copied-URL>

    This command will create a local copy of the repository on your computer.

    5. Navigate into the cloned repository directory:

    cd <repository-name>
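
    (Optional sanity check: run git log --oneline inside that directory. You should see the original author’s commit history rather than an empty log, which confirms that the Git history came along with the clone.)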

Now that your repository is cloned locally, you can start by navigating to the cloned repository directory (your project folder). Once you’ve opened this folder in your preferred text editor (in this case I’ll stick with VSCode to demonstrate), go ahead and write something in it, anything you like, just to create a change.

    Next, you’ll need to save the file locally (on MacOS, you can press command + s to save a file locally), which will register the change you’ve made. Once the change is saved locally, you’ll see a red icon or indicator in VSCode, signaling that there are uncommitted changes for Git in your working directory.

(Here, I made some changes to the `ok.md` file)

    Now, let’s prepare to commit your changes. In the Source Control panel, click on the red-highlighted icon (typically a “+” symbol) next to that ok.md file. This action stages the file, indicating that you want to include this change in your next commit.

    If you’ve modified multiple files and only want to commit some of them, you can selectively stage individual files by clicking the “+” icon next to each file you wish to include. This selective staging is useful when you’ve made changes across different files but want to commit only the changes that are complete and ready for version control.

    After staging your changes, the next step is to commit them. In the message box under “Source Control,” write a clear and concise commit message. This message should briefly describe what changes were made to the files you’ve staged. Note that writing a commit message is mandatory; without it, Git will not allow you to commit your changes.

    It’s essential to develop the habit of writing meaningful commit messages. These messages act as a log for your code’s history, and vague or unclear messages can make it difficult to track the progress of your project or understand the purpose of past changes. Properly written commit messages make it easier to manage your project over time.

    Once you’ve written your message, click the green “Commit & Push” button. This action records your changes in the Git history and pushes them to your remote repository (e.g., on GitHub).
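
For reference, the approximate terminal equivalent of these VSCode actions (assuming the file you changed is `ok.md` and you are on the repository’s default branch) would be:

git add ok.md
git commit -m "your commit message here"
git push

VSCode’s Source Control panel simply runs these Git commands for you under the hood.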

    The “Commit changes” action in Git is simply a way of recording your changes — it’s not the same as creating a Pull Request (PR). A PR is a formal request to merge changes from one repository (a forked one of the original repo) into another (the original repo).

    Commit changes are typically minor and incremental, helping to keep a record of small updates or progress steps (think of it as taking snapshots of your work to prevent data loss). In contrast, a PR is more significant, akin to submitting a complete and finalized version of your work for review and integration into the main project.

    You should use commit changes regularly to keep track of your progress. However, reserve PRs for when you have completed a substantial part of your work that meets specific project goals.

    Now, let’s create a PR to see how it works. After you’ve made and committed several changes that fulfill a certain project objective, click on the icon highlighted in red (usually labeled “Pull Requests”) in GitHub or in your GitHub repo’s interface:

    You’ll be directed to a new page where you can create a PR. Here, you should:

    • Title: Provide a title that briefly summarizes the changes you are proposing to merge.
    • Description: In the description field, add detailed information about the changes. Explain what was modified, why the changes were made, and any other relevant context.
    • Labels: Don’t forget to click on the “Labels” section and select the appropriate label if possible (like “Bug Fixed”, “Typo Fixed”…). This categorizes your PR, making it easier for reviewers to understand the scope and purpose of your changes.

    After filling out these fields, click the “Create pull request” button.
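    (If you prefer the terminal, GitHub’s official gh CLI can create the same PR, assuming you have it installed and authenticated; the title and body below are placeholders:

    gh pr create --title "Brief summary of changes" --body "What was modified and why"

    The web form above is equivalent.)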

    Once the PR is created, it will appear on the PR page of the repository, and the repository’s maintainers will be notified for review:

    The maintainers will review the PR based on its contribution and relevance to the project. If the changes meet the necessary standards, the PR will be merged into the original repository.

    Remember, a PR is essentially a request to merge the code from your forked repository into the original one. If the PR is accepted, both repositories will be synchronized, with the original repository incorporating your changes.

    Additional Notes

    • Commit Messages: Aim to be as descriptive as possible. Instead of writing “fixed bugs,” you might write “fixed null pointer exception in email processing module.” This level of detail will save you and your team time when reviewing past commits.
    • Branching: Before working on a new feature, create a new branch. This keeps your work isolated until it’s ready to be merged (see the example after this list).
    • Merge Conflicts: When working on a team, you might encounter merge conflicts. These occur when changes from different branches conflict with each other. Git will require you to resolve these conflicts manually.
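    For the branching tip above, creating and switching to a new branch is a single command (the branch name here is just an illustration):

    git checkout -b my-new-feature

    On newer versions of Git, git switch -c my-new-feature does the same thing.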

    With these steps, you should now have a solid foundation for using Git, GitHub, and VSCode in your development workflow. Happy coding!!!!



  • How is Diandian (点点) (A cat my family rescued 2 years ago) now?

    How is Diandian (点点) (A cat my family rescued 2 years ago) now?

    Two years ago, my family and I embarked on an unexpected journey that would forever change our lives. It began on a stormy night when we found a small, injured kitten struggling to survive. The experience, filled with challenges and moments of hope, has been one of profound learning and immense love.

    To share our story, I’ve uploaded a detailed video on Bilibili (on 2022-02-19 at 11:08:30), documenting the entire process of how we found, rescued, and nurtured this little life back to health. The video is a raw and honest representation of the events, spoken in Chinese, capturing every step of this emotional journey.

    If you prefer reading over watching, I’ve written a detailed account below, outlining the significant moments and reflections from this experience. Whether you choose to watch the video or read the text, I invite you to join us on this journey of recovery, resilience, and unconditional love.

    The story you’re about to see or read is more than just about saving a kitten; it’s a testament to the power of family, the kindness of the human spirit, and the incredible will to live that exists within all beings, no matter how small!!

    Thank you for taking the time to witness our story. Here are the links to the video on Bilibili and YouTube. For those who prefer reading, please find the complete narrative below.

    On February 7, 2022, my family found this poor little kitten lying inside our garden in a distressing condition during a heavy rainstorm. The kitten was severely injured, barely alive, and in dire need of help. It was a heart-wrenching scene: the kitten was drenched, injured, and had been attacked by a larger cat, presumably to claim territory. Thankfully, my family intervened just in time, scaring off the attacker and bringing the kitten into our house for warmth and safety.

    My parents immediately took action, drying the kitten and providing a warm, comfortable space for it to rest. Despite its weak state, the kitten showed a strong will to live. It was heart-breaking to see it struggle; the injuries were severe, affecting its ability to move and even swallow properly due to a throat injury. Yet, with careful and loving care from my family, the kitten began to show signs of recovery, although the journey was slow and fraught with challenges.

    In the beginning, the kitten could hardly eat due to its injuries, but with patience and tender care, it slowly started to regain strength. My parents went above and beyond, ensuring the kitten was well-fed with suitable milk and gradually introducing soft food. It was a delicate process, given the extent of the kitten’s injuries, including nerve damage and physical trauma.

    Over time, the kitten, which we came to cherish as a symbol of resilience and hope, began to exhibit signs of improvement. It was not just a physical recovery; the kitten brought a new sense of joy and purpose into our home. We were all invested in its recovery, celebrating every small step forward, from the first time it stood on its own to the moment it began to eat without assistance.

    The journey was not without its setbacks. There were moments of doubt and fear, times when we wondered if we were doing enough or if the kitten would fully recover. But through it all, the bond between us and the little fighter grew stronger. The kitten’s will to live and our dedication to its recovery created an unbreakable bond.

    Today, two years later, the kitten is no longer just a kitten but a vibrant, loving member of our family. Its recovery was a miracle, a testament to the power of love, care, and resilience. The kitten, now fully grown, has overcome its traumatic start to life and has become a source of endless joy and laughter for our family:

    This experience taught us invaluable lessons about compassion, the importance of helping those in need, and the incredible strength of even the smallest creature. It was a reminder that every life is precious and worth saving.

    As I write this update, the once fragile kitten is curled up comfortably beside my family, purring contentedly.

    To everyone who supported us through this journey, whether by offering advice, sending well-wishes, or simply keeping us in your thoughts, we extend our deepest gratitude! This story is not just about the survival of a little kitten; it’s a story of hope, love, and the incredible difference a small act of kindness can make.

    Thank you for being part of our journey!!


  • Cracking the Logic Gates Construction Using the Knowledge from Mathematical Logic

    Cracking the Logic Gates Construction Using the Knowledge from Mathematical Logic

    Recently, I’ve started exploring Mathematical Logic, guided by Elliott Mendelson’s “Introduction to Mathematical Logic, 6th Edition”. One fascinating fact I found in the textbook is that “every truth function is generated by a statement form involving the logical connectives in a functionally complete set”. This idea sparked my interest, leading me to connect it with experiences from my past.

    Before diving deeper, let me introduce some prerequisite concepts.

    Formally, in Mathematical Logic:

    • A truth function is defined as $f: \{0, 1\}^n \to \{0, 1\} $, where $0$ represents False (F) and $1$ represents True (T).
    • A propositional variable is a variable taking values in $\{0, 1\}$; equivalently, it is a variable that can be either True or False.
    • Well-formed formulas (WFFs) are expressions defined as follows:
      • A propositional variable is a WFF.
      • If $A$ and $B$ are WFFs, then $\neg A$, $A \wedge B$, and $A \vee B$ are also WFFs.

    Then we proceed to introduce the following theorems:

    $\textbf{Theorem 1}$

    Every truth function is generated by a WFF involving the logical connectives $\neg$ (NOT), $\wedge$ (AND) and $\vee$ (OR). Equivalently, the set $\{\neg, \wedge, \vee\}$ constitutes a functionally complete set of logical connectives.

    $\textbf{Proof}$

    For any truth function $f(x_1,x_2,\cdots,x_n)$ where $x_1,x_2,\cdots,x_n \in \{0, 1\}$, we can represent $f(x_1,x_2,\cdots,x_n)$ in a truth table with $2^n$ rows, since the total number of combinations of $x_1,x_2,\cdots,x_n$ is $2^n$ and each combination produces a corresponding output of $f(x_1,x_2,\cdots,x_n)$.

    For each row in the truth table where the function $f(x_1,x_2,\cdots,x_n)$ outputs $1$, we can construct a conjunction $\wedge$ (AND) that corresponds to that particular combination of inputs.

    More generally, for a given row $i \in \{1,2,\cdots, 2^n\}$, we define a conjunction $C_i$ as follows: $C_i = U_{i,1} \wedge U_{i,2} \wedge \cdots \wedge U_{i,n}$, where for each $j \in \{1,2,\cdots,n\}$, $U_{i,j} = x_j$ if $x_j$ takes the value $1$ in row $i$, and $U_{i,j} = \neg x_j$ if $x_j$ takes the value $0$ in row $i$.

    After setting up these conjunctions, we take the disjunction $\vee$ (OR) of all the $C_i$ corresponding to the rows where the function $f$ outputs $1$; assume there are $m \in \{1,2,\cdots,2^n-1\}$ such rows (the cases $m=0$ and $m=2^n$ are discussed below). This forms a statement in disjunctive normal form (DNF) representing our original truth function $f$. This DNF statement, denoted as $D$, is defined as $D = C_1 \vee C_2 \vee \cdots \vee C_m$, where each $C_k$ (with $k \in \{1,2,\cdots,m\}$) is the conjunction of one of the rows where $f$ outputs $1$ (relabeled consecutively), and $m$ is the number of such rows.

    If $m=0$, in other words, if $f$ always outputs $0$, then $D$ can be defined as a contradiction (for example: $D = x_1 \wedge \neg x_1$). If $m=2^n$, in other words, if $f$ always outputs $1$, then $D$ can be defined as a tautology (for example: $D = x_1 \vee \neg x_1$).

    Accordingly, we have $f(x_1,x_2,\cdots,x_n) \iff D$. Since $D$ is a WFF involving only the logical connectives $\neg$ (NOT), $\wedge$ (AND) and $\vee$ (OR), we conclude that “every truth function is generated by a WFF involving the logical connectives $\neg$ (NOT), $\wedge$ (AND) and $\vee$ (OR).”

    $\Box$

    $\textbf{Example 1}$

    $$\begin{array}{cccc}
    x_1 & x_2 & x_3 & f\left(x_1, x_2, x_3\right) \\
    \hline
    \mathrm{F} & \mathrm{F} & \mathrm{F} & \mathrm{T} \\
    \mathrm{F} & \mathrm{F} & \mathrm{T} & \mathrm{T} \\
    \mathrm{F} & \mathrm{T} & \mathrm{F} & \mathrm{F} \\
    \mathrm{T} & \mathrm{F} & \mathrm{F} & \mathrm{F} \\
    \mathrm{F} & \mathrm{T} & \mathrm{T} & \mathrm{F} \\
    \mathrm{T} & \mathrm{T} & \mathrm{F} & \mathrm{F} \\
    \mathrm{T} & \mathrm{F} & \mathrm{T} & \mathrm{F} \\
    \mathrm{T} & \mathrm{T} & \mathrm{T} & \mathrm{F}
    \end{array}$$

    Then $$f(x_1, x_2, x_3) \iff D = (\neg x_1 \wedge \neg x_2 \wedge \neg x_3) \vee (\neg x_1 \wedge \neg x_2 \wedge x_3)$$

    $\Box$

    Here is the Python program I wrote. Based on $\textbf{Theorem 1}$, this program uses the AND, OR and NOT gates to construct the equivalent DNF of any logic gate (specified by its truth table).
    # python3
    
    def convert_to_dnf(truth_table):
        # Each row of truth_table is [x_1, ..., x_n, f(x_1, ..., x_n)].
        D = []  # collects the conjunctions C_i for the rows where f outputs 1
        for row in truth_table:
            if row[-1] == 1:  # only rows where f outputs 1 contribute to the DNF
                terms = []
                for i, val in enumerate(row[:-1]):
                    if val == 1:
                        terms.append(f"x_{i+1}")   # U_{i,j} = x_j
                    else:
                        terms.append(f"¬x_{i+1}")  # U_{i,j} = ¬x_j
                if terms:
                    C_i = " ∧ ".join(terms)        # the conjunction C_i for this row
                    D.append(C_i)
        if D:
            dnf_structure = f'({") ∨ (".join(D)})'  # D = C_1 ∨ C_2 ∨ ... ∨ C_m
        else:
            dnf_structure = "(x_1 ∧ ¬x_1)"          # m = 0: f is constantly 0, so use a contradiction
    
        return dnf_structure
    
    # Here we use the truth table from Example 1
    truth_table = [
        [0, 0, 0, 1],
        [0, 0, 1, 1],
        [0, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 1, 0],
        [1, 1, 1, 0],
    ]
    
    dnf_gate_structure = convert_to_dnf(truth_table)
    print("DNF:", dnf_gate_structure)

    $\textbf{Theorem 2}$

    The set $\{\downarrow \text{(NOR)} \}$ constitutes a functionally complete set of logical connectives.

    $\textbf{Proof}$

    According to the definition, $x_1 \downarrow x_2 \iff \neg (x_1 \vee x_2)$. By constructing and analyzing the corresponding truth tables, we deduce that $\neg x_1 \iff (x_1 \downarrow x_1)$, $x_1 \wedge x_2 \iff (x_1 \downarrow x_1) \downarrow(x_2 \downarrow x_2)$ and $x_1 \vee x_2 \iff (x_1 \downarrow x_2) \downarrow(x_1 \downarrow x_2)$. Applying $\textbf{Theorem 1}$ together with these substitutions immediately shows that “the set $\{\downarrow \}$ constitutes a functionally complete set of logical connectives.”

    $\Box$
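    As a quick illustration of these substitutions, here is a small Python sketch in the same string-building style as the program above (the helper functions are my own, not from the textbook):

    # python3
    
    def NOR(a, b):
        return f"({a} ↓ {b})"
    
    def NOT(a):
        # ¬a ⟺ a ↓ a
        return NOR(a, a)
    
    def AND(a, b):
        # a ∧ b ⟺ (a ↓ a) ↓ (b ↓ b)
        return NOR(NOR(a, a), NOR(b, b))
    
    def OR(a, b):
        # a ∨ b ⟺ (a ↓ b) ↓ (a ↓ b)
        ab = NOR(a, b)
        return NOR(ab, ab)
    
    # Rebuild ¬x_1 ∧ ¬x_2 using only ↓; this reproduces the repeated
    # inner block of A in Example 2 below.
    print(AND(NOT("x_1"), NOT("x_2")))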

    $\textbf{Example 2}$

    Set

    $$\begin{align}
    A & = \bigg( \Big( \big( (x_1 \downarrow x_1) \downarrow (x_1 \downarrow x_1) \big) \downarrow \big( (x_2 \downarrow x_2) \downarrow (x_2 \downarrow x_2) \big) \Big) \downarrow \Big( \big( (x_1 \downarrow x_1) \downarrow (x_1 \downarrow x_1) \big) \downarrow \big( (x_2 \downarrow x_2) \downarrow (x_2 \downarrow x_2) \big) \Big) \bigg)\\
    B & = \bigg( (x_3 \downarrow x_3) \downarrow (x_3 \downarrow x_3) \bigg) \\
    C & = \bigg( x_3 \downarrow x_3 \bigg)
    \end{align}$$

    Then the function $f(x_1, x_2, x_3)$ in $\textbf{Example 1}$ can be rewritten as

    $$\begin{align}
    &f(x_1, x_2, x_3) \\
    \iff & D = (\neg x_1 \wedge \neg x_2 \wedge \neg x_3) \vee (\neg x_1 \wedge \neg x_2 \wedge x_3) \\
    \iff & D = \big( (x_1 \downarrow x_1) \wedge (x_2 \downarrow x_2) \wedge (x_3 \downarrow x_3) \big) \vee \big( (x_1 \downarrow x_1) \wedge (x_2 \downarrow x_2) \wedge x_3 \big) \\
    \iff & D = \bigg( \Big( \big( (x_1 \downarrow x_1) \downarrow (x_1 \downarrow x_1) \big) \downarrow \big( (x_2 \downarrow x_2) \downarrow (x_2 \downarrow x_2) \big) \Big) \wedge (x_3 \downarrow x_3) \bigg) \\
    & \vee \bigg( \Big( \big( (x_1 \downarrow x_1) \downarrow (x_1 \downarrow x_1) \big) \downarrow \big( (x_2 \downarrow x_2) \downarrow (x_2 \downarrow x_2) \big) \Big) \wedge x_3 \bigg) \\
    \iff & D = \bigg( A \downarrow B \bigg) \vee \bigg( A \downarrow C \bigg) \\
    \iff & D = \Bigg( \bigg( A \downarrow B \bigg) \downarrow \bigg( A \downarrow C \bigg) \Bigg) \downarrow \Bigg( \bigg( A \downarrow B \bigg) \downarrow \bigg( A \downarrow C \bigg) \Bigg)
    \end{align}$$

    $\Box$

    Applying $\textbf{Theorem 2}$ fully unlocks the potential to design any logic gate we desire! This concept took me back to October 2021, when I discovered “Turing Complete”, an amazing game on Steam dedicated to constructing fundamental building blocks of Computer Science (such as logic gates and circuits). I then shared my excitement about this game on Zhihu.com, recommending it as an excellent introduction for those new to logic circuits.

    Below are some related screenshots.

    At that time, I actually found some of the construction problems in that game quite challenging (to me), so I wondered whether it would be possible to build an algorithm that can automatically construct new logic gates through a pattern of systematic combinations. Later on, I was busy with other academic work (I checked my timeline and found that I was preparing for the IB and AP exams during that period, though those exams were all later canceled in Shanghai due to the quarantine, unfortunately…) and completely forgot about this.

    This old idea resurfaced recently as I delved into Mathematical Logic, so I wrote this post…


  • Using Raspberry Pi to Build a US VPN Server for My Family in China

    Using Raspberry Pi to Build a US VPN Server for My Family in China

    Disclaimer: This post is based on my personal experience and is not sponsored or influenced by any platform or service mentioned.

    In this blog post, I’m going to share my experience and detailed process of setting up a Raspberry Pi as a VPN server for my family in China. Due to the “Great Firewall” of China (for a more detailed background on this, please refer to my previous post here), accessing services like Google and ChatGPT is always a big challenge for internet users in China (including my family). This became even more pressing when, around December 2023, all ChatGPT access node IPs used by our ClashX VPN were blocked by OpenAI. This situation led me to seek alternative methods to restore unrestricted internet access for my family.

    To circumvent these restrictions, I decided to set up a VPN server using a Raspberry Pi. I placed the Raspberry Pi at my friend Ethan’s house, whom I thank for his and his wife Lily’s help!

    Why Raspberry Pi? I chose a Raspberry Pi for its compact size, low power consumption, and affordability (the total price of the Raspberry Pi kit I bought was $187.41 after tax).

    (Update, March 14, 2024: I just uploaded a video of the Raspberry Pi 5 (recorded on January 26, 2024) to YouTube so you can see how small it is.)

    Why this VPN? I chose WireGuard over OpenVPN for its simplicity, lighter footprint, and better performance, though both are viable options. I configured the Raspberry Pi through the CLI to maximize performance, since a GUI consumes more resources. So for those looking to optimize their Raspberry Pi for VPN use, I recommend switching the “boot option” in “system config” to “text console” through sudo raspi-config.

    One critical aspect was configuring port forwarding and static routing in NAT to enable communication between the VPN clients and the server. Port forwarding was set to direct traffic from port 51820 (WireGuard’s default) to the Raspberry Pi, enabling external VPN client connections. Meanwhile, port forwarding and static routing were necessary within the home network to guide traffic destined for the VPN network directly to the Raspberry Pi in the router’s NAT settings, ensuring proper internal communication and internet access. Incidentally, I chose CloudFlare as the DNS provider.
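    For readers unfamiliar with WireGuard, the server side of this setup boils down to a config file roughly like the sketch below (a minimal illustration; the subnet, key placeholders, and client address are generic WireGuard conventions, not my exact values):

    # /etc/wireguard/wg0.conf (server side)
    [Interface]
    Address = 10.6.0.1/24            # the server's address inside the VPN subnet
    ListenPort = 51820               # must match the router's port-forwarding rule
    PrivateKey = <server-private-key>
    
    [Peer]
    PublicKey = <client-public-key>
    AllowedIPs = 10.6.0.2/32         # the VPN address assigned to this client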

    And the Deco router in Ethan’s home is able to interact with IPv4 devices via NAT port forwarding, which theoretically eliminates the need for Dynamic DNS (DDNS) for VPN connections, since the combination of the router’s IPv6 address and the specified port number already creates a definitive pathway for initiating communication and accessing the Raspberry Pi remotely.

    However, I heard that IPv6 natively supports better security features, and more ISPs and devices are gradually shifting to IPv6, so I prefer using IPv6 for my Raspberry Pi’s remote SSH access. (In my case, I’m frequently updating and maintaining my custom-designed Discord bot (here) for my game dev team (CruxAbyss), which is hosted on the Raspberry Pi. Although Cloudflare Workers is a popular choice for serverless code hosting (with the first 100,000 requests each day being free!), I prefer the direct control of the Raspberry Pi for frequent updates at the current stage, particularly as the request volume my code generates may exceed that daily number.) In particular, I’m very interested in applying real-time IP address updates for my Raspberry Pi to deal with the dynamic nature of Ethan’s router’s IPv6 address assignments:

    The main challenge was dealing with dynamic IP addresses, since Ethan’s router (a Deco model) uses DHCPv6. Assigning a static IPv6 address to the router itself was not feasible due to the limitations of the router’s model and the potential security risks involved. I also attempted to configure the router to reserve a static IPv6 address for the Raspberry Pi, but found that Ethan’s Deco router could only assign static IPv4 addresses, not IPv6. This made DDNS necessary to keep the server accessible despite changing IP addresses, so I resorted to DDNS to ensure SSH accessibility to my Raspberry Pi server regardless of changes in its IPv6 address.

    Awkwardly, during the setup, I discovered that Ethan’s Deco router did support VPN settings (including VPN servers like OpenVPN) 😂, and even DDNS, but its DDNS was intended for the router itself, not the devices behind it. Therefore, I used a custom CloudFlare-DDNS solution (the repo is here). Also, be sure to initiate the DDNS configuration on the Raspberry Pi over an SSH connection so you can paste in the token, as manual entry can be pretty cumbersome…

    Forgot to mention: to enable SSH on the Raspberry Pi, execute sudo raspi-config, followed by sudo systemctl restart cron and sudo reboot. Initially, I recommend connecting both the Raspberry Pi and the controlling computer to the same network for SSH access. To enable SSH access across different networks later on, you need to adjust the router’s firewall and the Raspberry Pi’s UFW settings to permit SSH and VPN traffic, ensuring port 22 is open for SSH connections. For an SSH connection over IPv6, make sure both the client and server have IPv6 addresses and are configured to accept IPv6 connections. The device’s IPv6 connectivity can be checked using online tests such as here.
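    In UFW terms, the firewall adjustments described above look something like this (a generic sketch of what the paragraph describes, not a dump of my exact rules):

    sudo ufw allow 22/tcp        # SSH
    sudo ufw allow 51820/udp     # WireGuard
    ssh pi@<your-pi-ipv6-address>    # then test SSH over IPv6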

    Once you can access the Raspberry Pi over SSH, make sure to replace “DROP” with “ACCEPT” for the “DEFAULT_FORWARD_POLICY” in the UFW config file; otherwise, traffic coming from the VPN interface (for example, wg0) and destined for the internet won’t be forwarded. This would prevent your VPN clients (like your phone) from accessing external websites or any resource outside the Raspberry Pi itself!
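    (On a typical Debian-based install, including Raspberry Pi OS, that setting lives in /etc/default/ufw, and after the edit the line should read:

    DEFAULT_FORWARD_POLICY="ACCEPT"

    )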

    After configuring the VPN and resolving the networking issues, I recommend running pivpn debug to identify and resolve potential problems. This step is essential to ensure the VPN operates smoothly and securely.

    Note: for CloudFlare users, a significant step was configuring the CloudFlare DDNS script without enabling the “proxied” option in your CloudFlare DNS record settings. This was crucial because the VPN service needs to see direct connection requests from clients (recall the public DNS name you previously entered), not ones masked by CloudFlare’s DDoS protection services. (The underlying mechanism is that CloudFlare’s proxy overlays a masking IP onto your domain; you can verify this by pinging your proxied domain directly from the terminal, which reveals that the IPv6 address returned is not the actual IPv6 address of your domain.) Ensuring the DNS records for the VPN were not proxied allowed direct and secure connections to the Raspberry Pi server.
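    For example, with a hypothetical record name vpn.example.com, you can check from the terminal (ping6 on macOS; ping -6 on most Linux distributions):

    ping6 vpn.example.com    # with "proxied" enabled, this returns a CloudFlare IP rather than your Pi's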

    However, I noticed that when I’m on an external network (outside my home), I can SSH into my Raspberry Pi using both its IPv4 and IPv6 addresses. But when I’m at home, on the same local network as my Raspberry Pi, I can only SSH into it using its IPv4 address; SSH over IPv6 fails on the same local network. I think the most likely cause is that the router’s default settings block or fail to route IPv6 traffic correctly within the local network, as many home routers are configured primarily to handle IPv4 traffic internally.