3 、无监督学习

一、聚类🧀

1、`K-means`算法🧀

随机选择 $K$ 个中心点作为初始质心
将每个点分配给最近的质心形成 $K$ 个聚类
重新计算每个聚类中的所有点坐标平均值
重复直到质心不再变化

Repeat {
    # Assign points to cluster centroids
    for i = 1 to m
        c(i) := index (from 1 to K) of cluster centroid closest to x(i)

    # Move cluster centroids
    for k = 1 to K
        mu_k := average (mean) of points assigned to cluster k
}

2、优化目标🧀

代价函数

\[ J(c^{(1)},...,c^{(m)},\mu_1,...,\mu_K) = \frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - \mu_{c^{(i)}} \right\|^2 \]

\[ \min_{c^{(1)},...,c^{(m)}, \mu_1,...,\mu_K} J(c^{(1)},...,c^{(m)},\mu_1,...,\mu_K) \]

（1）初始化🧀

随机选择 $K$ 个训练样本作为中心点，随机多组然后选择 $J$ 最小的一组，优化减少迭代次数

For i = 1 to 100 {
    Randomly initialize K-means.
    Run K-means. Get c^(1), ..., c^(m), μ_1, μ_2, ..., μ_k
    Compute cost function (distortion) J(c^(1), ..., c^(m), μ_1, μ_2, ..., μ_k)
}
Pick set of clusters that gave lowest cost J

（2）选择聚类个数🧀

肘部法（Elbow Method）:寻找 $J$ 下降速度明显减缓的点，即为肘部，选择 $K$
结合实际选择取舍

3、异常检测🧀

密度估计（Dense Estimation） $$ p(x)<\epsilon \Rightarrow Anomaly $$

\[ p(x) = \prod_{i=1}^{n} p(x_i; \mu_j, \sigma_j^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\sigma_j} \exp\left(-\frac{(x_i - \mu_j)^2}{2\sigma_j^2}\right) \]

实数评估

在开发异常检测系统时，可以将数据集分为训练集、验证集和测试集（或训练集和验证集，如果异常样本极少）。通过在验证集上评估模型性能，可以确定最合适的阈值ε

异常检测 (Anomaly Detection)	监督学习 (Supervised Learning)
非常少的正例 (y = 1)，大量的负例 (y = 0)。	大量的正例和负例。
许多不同类型的异常 (y = 1)。很难从正例中学习异常的样子	足够的正例让算法了解正例的样子，未来的正例可能与训练集中的相似。

4、特征选择🧀

异常检测的错误分析目标：

对于正常样本 $x$，希望 $p(x)≥ε$（即概率较大）。
对于异常样本 $x$，希望 $p(x)<ε$（即概率较小）。

最常见的问题：

对于正常和异常样本，$p(x)$ 的值相当（即两者都较大）。

二、推荐系统🧀

学习所有用户的参数

Note

$r(i,j) = 1$ 如果用户 $j$ 已经对电影 $i$ 进行了评分（否则为 0）
$y^{(i,j)}$ = 用户 $j$ 对电影 $i$ 的评分（如果已定义）
$w^{(j)}, b^{(j)}$ = 用户 $j$ 的参数
$x^{(i)}$ = 电影 $i$ 的特征向量

\[ \min_{w^{(j)}, b^{(j)}} J(w^{(j)}, b^{(j)}) = \frac{1}{2m^{(j)}} \sum_{i:r(i,j)=1} \left( w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2m^{(j)}} \sum_{k=1}^{n} \left( w_k^{(j)} \right)^2 \]

1、协同过滤🧀

学习用户参数 $w^{(1)}, b^{(1)}, \ldots, w^{(n_u)}, b^{(n_u)}$ 的成本函数：

\[ \min_{w^{(1)}, b^{(1)}, \ldots, w^{(n_u)}, b^{(n_u)}} \frac{1}{2} \sum_{j=1}^{n_u} \sum_{i: r(i,j)=1} (w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)})^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} (w_k^{(j)})^2 \]

学习电影特征向量 $x^{(1)}, \ldots, x^{(n_m)}$ 的成本函数：

\[ \min_{x^{(1)}, \ldots, x^{(n_m)}} \frac{1}{2} \sum_{i=1}^{n_m} \sum_{j: r(i,j)=1} (w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)})^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} (x_k^{(i)})^2 \]

两式叠加：

\[ \min_{\substack{w^{(1)}, \ldots, w^{(n_u)} \\ b^{(1)}, \ldots, b^{(n_u)} \\ x^{(1)}, \ldots, x^{(n_m)}}} J(w, b, x) = \frac{1}{2} \sum_{(i,j): r(i,j)=1} \left( w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( w_k^{(j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2 \]

def cofi_cost_func(X, W, b, Y, R, lambda_):
    """
    Returns the cost for the content-based filtering
    Args:
      X (ndarray (num_movies,num_features)): matrix of item features
      W (ndarray (num_users,num_features)) : matrix of user parameters
      b (ndarray (1, num_users)            : vector of user parameters
      Y (ndarray (num_movies,num_users)    : matrix of user ratings of movies
      R (ndarray (num_movies,num_users)    : matrix, where R(i, j) = 1 if the i-th movies was rated by the j-th user
      lambda_ (float): regularization parameter
    Returns:
      J (float) : Cost
    """
    nm, nu = Y.shape
    J = 0
    ### START CODE HERE ###  
    for j in range(nu):
        w = W[j, :]
        b_j = b[0, j]
        for i in range(nm):
            x = X[i, :]
            y = Y[i, j]
            r = R[i, j]
            J += 0.5 * np.square(r * (np.dot(w, x) + b_j - y))
    # Regularization
    J += (lambda_ / 2) * (np.sum(np.square(W)) + np.sum(np.square(X)))
    ### END CODE HERE ### 

    return J

（1）二元标签🧀

1 表示用户在展示某项物品后参与了
0 表示用户在展示某项物品后没有参与

损失函数 $L$ 为：

$$ L\left(f_{w, b, x}\left(x\right), y^{(i, j)}\right) = -y^{(i, j)} \log\left(f_{w, b, x}\left(x\right)\right) - \left(1 - y^{(i, j)}\right) \log\left(1 - f_{w, b, x}\left(x\right)\right) $$ 整体损失函数 $J$ 为：

\[ J(w, b, x) = \sum_{(i, j): r(i, j) = 1} L\left(f_{w, b, x}(x), y^{(i, j)}\right) \]

（2）均值归一化🧀

均值归一化让数据分布更加合理，提高模型的训练效率和效果

\[ origin=\begin{bmatrix} 5 & 5 & 0 & 0 & ? \\ 5 & ? & ? & 0 & ? \\ ? & 4 & 0 & ? & ? \\ 0 & 0 & 5 & 4 & ? \\ 0 & 0 & 5 & 0 & ? \end{bmatrix} \]

\[ \mu = \begin{bmatrix} 2.5 \\ 2.5 \\ 2 \\ 2.25 \\ 1.25 \end{bmatrix} \Rightarrow \begin{bmatrix} 2.5 & 2.5 & -2.5 & -2.5 & ? \\ 2.5 & ? & ? & -2.5 & ? \\ ? & 2 & -2 & ? & ? \\ -2.25 & -2.25 & 2.75 & 1.75 & ? \\ -1.25 & -1.25 & 3.75 & -1.25 & ? \end{bmatrix} \]

（3）实现🧀

# Initialize an optimizer
optimizer = keras.optimizers.Adam(learning_rate=1e-1)

# Set the number of iterations
iterations = 200

# Define the cost function
def cofiCostFuncV(X, W, b, Ynorm, R, num_users, num_movies, lambda):
    # The implementation of the cost function should be here
    pass

# Gradient descent loop
for iter in range(iterations):
    # Use TensorFlow's GradientTape to record the operations used to compute the cost
    with tf.GradientTape() as tape:
        # Compute the cost
        cost_value = cofiCostFuncV(X, W, b, Ynorm, R, num_users, num_movies, lambda)

    # Use the gradient tape to automatically retrieve the gradients of the trainable variables with respect to the loss
    grads = tape.gradient(cost_value, [X, W, b])

    # Run one step of gradient descent by updating the value of the variables to minimize the loss
    optimizer.apply_gradients(zip(grads, [X, W, b]))

2、基于内容的过滤🧀

协同过滤 (Collaborative Filtering)	基于内容过滤 (Content-based Filtering)
基于具有相似评分行为的用户的评分	基于用户和项目特征之间的匹配
对新用户或新项目效果较差	对新项目表现较好，只要有特征数据

\[ J = \sum_{(i,j): r(i,j)=1} \left( \nu_u^{(j)} \cdot \nu_m^{(i)} - y^{(i,j)} \right)^2 + \text{NN regularization term} \]

（1）大目录🧀

Ⅰ. 检索（Retrieval）：

生成一份潜在的项目候选列表。根据用户最近观看的项目、用户常看的类别和流行项目，找出并合并相关的推荐项。

Ⅱ. 排序（Ranking）：

对检索到的候选项目进行评分与排序，使用学习到的模型根据用户偏好进行个性化调整，最终展示给用户最佳的推荐列表。

（2）实现🧀

# Create the user neural network
user_NN = tf.keras.models.Sequential([
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(32)
])

# Create the item neural network
item_NN = tf.keras.models.Sequential([
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(32)
])

# Create the user input and point to the base network
input_user = tf.keras.layers.Input(shape=(num_user_features,))
vu = user_NN(input_user)
vu = tf.linalg.l2_normalize(vu, axis=1)

# Create the item input and point to the base network
input_item = tf.keras.layers.Input(shape=(num_item_features,))
vm = item_NN(input_item)
vm = tf.linalg.l2_normalize(vm, axis=1)

# Measure the similarity of the two vector outputs
output = tf.keras.layers.Dot(axes=1)([vu, vm])

# Specify the inputs and output of the model
model = tf.keras.Model(inputs=[input_user, input_item], outputs=output)

# Specify the cost function
cost_fn = tf.keras.losses.MeanSquaredError()

三、强化学习🧀

奖励函数 → 告诉机器做什么而不是怎么做，根据结果来奖惩

创建这样的映射，而行为一般是比较模糊的，所以监督学习在这种情况下不适用了

\[ 状态\rightarrow 行为 \]

1、return🧀

表示从某一时刻开始，智能体（agent）在未来所能获得的累计奖励（reward）。具体来说，它描述了未来奖励的（可能折扣的）累计和

(1) 无折扣的 Return：如果任务结束于某个时间步 $T$，则在时间步 $t$ 时，智能体的未来 return $G_t$ 定义为：

$$ G_t = r_t + r_{t+1} + r_{t+2} + \dots + r_T $$

其中：

$r_t$：时间步 $t$ 的即时奖励（reward）。
$T$：任务结束时间。

(2) 带折扣因子的 Return：在许多情况下，为了防止累计奖励发散（例如任务时间无限长），引入了折扣因子 $ \gamma $ 配置未来奖励的权重。此时，return $G_t$ 定义为： $$ G_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots = \sum_{k=0}^\infty \gamma^k r_{t+k} $$ 其中：

$ \gamma $：折扣因子， $ 0 \leq \gamma \leq 1 $。
- 当 $ \gamma = 0 $：只关心即时奖励，忽略未来回报。
- 当 $ \gamma \to 1 $：未来奖励的权重接近等于当前奖励，强调长期回报。

2、策略🧀

\[ \pi(s)=a \]

找到每个状态下对应应该做的行为在最大化return（$\pi$→policy）

Tip

马尔可夫决策过程（Markov Decision Process, MDP）未来状态只取决于当下决定

3、状态-动作函数（$Q-function$）🧀

graph TD
    A[Agent] -->|State,Reward| B[Environment/World]
    B -->|Action| A

\[ \begin{aligned} Q(s, a) = return\\ \end{aligned} \]

从状态 s 开始
执行动作 a（一次）
然后之后的行为都是最优的

Note

在状态 $s$ 最好的 $return$ 就是$\max Q(s,a)$

4、$Bellman$ 方程🧀

$s' $：采取动作 $a$ 后得到的状态
$a'$：在状态 $s'$ 下采取的动作
$R(s)$：当前状态的奖励

\[ Q(s, a) = R(s) + γ \times \max Q(s', a') \]

Info

随机环境：

解决：多次随机试验，取平均值，因为可能存在进行了某个行为但是misstep的情况

$$ \text{Expected Return} = \text{Average}(R_1 + \gamma R_2 + \gamma^2 R_3 + \gamma^3 R_4 + \cdots) $$

5、连续状态空间应用🧀

连续状态中不再是一个数字，而是状态表示为一串向量

$$ \begin{aligned} Q(s, a) = R(s) + γ \times \max Q(s', a')\ \end{aligned} $$ $这里认为(s,a)为x,R(s) + γ \times \max Q(s', a')为y，然后进行监督学习$

训练学习函数

# DQN
Initialize neural network randomly as guess of Q(s, a).

Repeat {
  Take actions in the lunar lander. Get (s, a, R(s), s').
  Store 10,000 most recent (s, a, R(s), s') tuples.

  Train neural network:
  Create training set of 10,000 examples using
  x = (s, a) and y = R(s) + γ * max_a' Q(s', a')

  Train Q_new such that Q_new(s, a) ≈ y.
  Set Q = Q_new.
}

6、优化🧀

输出不是一个单元输出最大的可能行为$a$，而是多个单元输出所有的$Q(s,a)$，这样一次推理即可而非多次推理
$\epsilon-贪婪策略$ ：
- 以 0.95 的概率选择使 Q(s,a) 最大化的动作 a，这是一种贪婪策略，因为它倾向于选择当前最优的动作。
- 以 0.05（$\epsilon$）的概率随机选择一个动作 a，这被称为“探索”（Exploration），因为它允许智能体尝试新的动作，因为$Q$ 函数可能初始化有高偏差。
小批量：每次训练不使用全部数据，而是选取一个子集（即小批量），尤其是对于深度Q网络（DQN）和其他基于策略梯度的算法。它控制每次网络参数更新时使用的样本数，通过从存储的经验池（replay buffer）中随机抽取一个小批量样本，用于网络更新，从而实现模型参数的优化
软更新：在软更新中，目标网络的权重会根据一个小的正数因子（通常称为 tau 或 τ）来逐渐接近在线网络的权重。这个过程可以表示为：

\[ \theta_{\text{target}} = \tau \times \theta_{\text{online}} + (1 - \tau) \times \theta_{\text{target old}} \]

3 、无监督学习

一、聚类🧀

1、`K-means`算法🧀

2、优化目标🧀

（1）初始化🧀

（2）选择聚类个数🧀

3、异常检测🧀

4、特征选择🧀

二、推荐系统🧀

1、协同过滤🧀

（1）二元标签🧀

（2）均值归一化🧀

（3）实现🧀

2、基于内容的过滤🧀

（1）大目录🧀

（2）实现🧀

三、强化学习🧀

1、return🧀

2、策略🧀

3、状态-动作函数（\(Q-function\)）🧀

4、\(Bellman\) 方程🧀

5、连续状态空间应用🧀

6、优化🧀

3 、无监督学习

一、聚类🧀

1、K-means算法🧀

2、优化目标🧀

（1）初始化🧀

（2）选择聚类个数🧀

3、异常检测🧀

4、特征选择🧀

二、推荐系统🧀

1、协同过滤🧀

（1）二元标签🧀

（2）均值归一化🧀

（3）实现🧀

2、基于内容的过滤🧀

（1）大目录🧀

（2）实现🧀

三、强化学习🧀

1、return🧀

2、策略🧀

3、状态-动作函数（\(Q-function\)）🧀

4、\(Bellman\) 方程🧀

5、连续状态空间应用🧀

6、优化🧀

1、`K-means`算法🧀