1. Overview
Tip
This post is a quick review of basic machine learning concepts; for details, see the Machine Learning (ML) notes.
I. General Steps
Step 1: find a function \(f\) to fit the data
Step 2: define the \(Loss\) from the training data
Step 3: use gradient descent to search for the optimal solution (a local one); see the sketch after this list
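A minimal sketch of the three steps on a toy linear fit, with hand-coded gradient descent. The synthetic data, learning rate, and step count are all made up for illustration; this is not the homework setup.

```python
import numpy as np

# Step 1: candidate function f(x) = w*x + b  (true parameters below are made up)
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 100)
y_true = 3.0 * x + 1.0 + rng.normal(0, 0.1, 100)

w, b, lr = 0.0, 0.0, 0.1
for step in range(1000):
    y_hat = w * x + b                           # model prediction
    # Step 2: Loss = mean((y_hat - y)^2); its gradients:
    grad_w = 2 * np.mean((y_hat - y_true) * x)
    grad_b = 2 * np.mean(y_hat - y_true)
    # Step 3: gradient descent update
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # should approach 3.0 and 1.0
```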
Note
Hyperparameters: parameters you have to set yourself, e.g. the choice of activation function, batch size, number of epochs, ...
II. Activation Functions
The linear model \(y=\vec w\cdot\vec x+b\) cannot fit complicated patterns; we need a more flexible function.
Use enough activation functions to piece together a piecewise-linear function, then use enough piecewise-linear segments to approximate the original curve:
$$
y = c\cdot\frac{1}{1+e^{-(b+wx_1)}} = c\cdot\mathrm{sigmoid}(b+wx_1)
$$
By adjusting \(w\), \(b\), and \(c\) we can produce different \(\mathrm{sigmoid}\) functions,
and thereby improve the model:
$$
y = b + \sum_i c_i \, \mathrm{sigmoid}\left( b_i + \sum_j w_{ij} x_j \right)
$$
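As a quick numeric illustration of this formula, the sketch below composes three sigmoids in NumPy. All parameter values are arbitrary, chosen only to show the shape of the computation for a single input feature.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.linspace(-3, 3, 7)          # sample inputs
b0 = 0.5                           # global bias b
c = np.array([1.0, -2.0, 1.5])     # heights  c_i
b_i = np.array([-1.0, 0.0, 1.0])   # offsets  b_i
w_i = np.array([5.0, 5.0, 5.0])    # slopes   w_i

# y = b + sum_i c_i * sigmoid(b_i + w_i * x)
y = b0 + sum(ci * sigmoid(bi + wi * x) for ci, bi, wi in zip(c, b_i, w_i))
print(y)
```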
Flatten all the parameters into a column vector \(\theta\), then solve \(\theta^* = \arg\min_{\theta} L\):
\[ \theta = \begin{bmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \\ \vdots \end{bmatrix} \]
\[
g = \begin{bmatrix} \left.\dfrac{\partial L}{\partial \theta_1}\right|_{\theta=\theta^0} \\ \left.\dfrac{\partial L}{\partial \theta_2}\right|_{\theta=\theta^0} \\ \vdots \end{bmatrix} = \nabla L(\theta^0)
\]
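Each iteration then moves the parameters a small step against the gradient, where the learning rate \(\eta\) is itself a hyperparameter:
\[
\theta^1 = \theta^0 - \eta\, g = \theta^0 - \eta\, \nabla L(\theta^0)
\]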
Tip
Split the data into multiple batches; 1 epoch = one pass over all of the batches.
e.g. with N = 10,000 and B = 10, each epoch performs 1,000 updates (see the sketch below).
Increasing the number of layers can significantly reduce the \(Loss\).
In Deep Learning, "Deep" means having many hidden layers, but merely stacking more layers leads to overfitting.
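The batch/epoch arithmetic above can be checked with a toy DataLoader; only N and B match the example, the 8 features are arbitrary.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset with N = 10,000 samples
dataset = TensorDataset(torch.randn(10_000, 8), torch.randn(10_000))
loader = DataLoader(dataset, batch_size=10, shuffle=True)  # B = 10

# 1 epoch = one pass over all batches -> N / B parameter updates
print(len(loader))  # 1000
```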
III. Backpropagation
\[
L(\theta) = \sum_{n=1}^{N} c^n(\theta) \;\Rightarrow\; \frac{\partial L(\theta)}{\partial w} = \sum_{n=1}^{N} \frac{\partial c^n(\theta)}{\partial w}
\]
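A small autograd sketch of this identity: the loss is a sum of per-example terms \(c^n(\theta)\), and `loss.backward()` fills in the summed gradient. The values are arbitrary.

```python
import torch

w = torch.tensor(1.5, requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.0, 4.0, 6.0])

loss = ((w * x - y) ** 2).sum()  # L(θ) = Σ_n c^n(θ)
loss.backward()                  # backpropagation fills w.grad with ∂L/∂w
print(w.grad)                    # equals the sum of per-example gradients: -14.0
```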
🌟 HW01
1. Model
Odd
Found that the model actually performs better without batchnorm and dropout.
```python
import torch.nn as nn

class My_Model(nn.Module):
    def __init__(self, input_dim):
        super(My_Model, self).__init__()
        self.layers = nn.Sequential(
            # First layer
            nn.Linear(input_dim, 32),
            nn.LeakyReLU(),
            # Second layer
            nn.Linear(32, 16),
            nn.LeakyReLU(),
            # Third layer
            nn.Linear(16, 8),
            nn.LeakyReLU(),
            # Output layer
            nn.Linear(8, 1)
        )

    def forward(self, x):
        x = self.layers(x)
        x = x.squeeze(1)  # (B, 1) -> (B)
        return x
```
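A quick shape sanity check; `input_dim = 20` here, which matches the 672 parameters of Linear-1 in the summary below (672 = 32 × (20 + 1)).

```python
import torch

model = My_Model(input_dim=20)
out = model(torch.randn(4, 20))  # a batch of 4 samples
print(out.shape)                 # torch.Size([4])
```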
Architecture
```python
from torchsummary import summary
summary(model, input_size=(input_dim,))
```
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Linear-1 [-1, 32] 672
LeakyReLU-2 [-1, 32] 0
Linear-3 [-1, 16] 528
LeakyReLU-4 [-1, 16] 0
Linear-5 [-1, 8] 136
LeakyReLU-6 [-1, 8] 0
Linear-7 [-1, 1] 9
================================================================
Total params: 1,345
Trainable params: 1,345
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.01
Estimated Total Size (MB): 0.01
----------------------------------------------------------------
2. Feature Selection
```python
from sklearn.feature_selection import SelectKBest, f_regression

def select_feat(train_data, valid_data, test_data, select_all=True):
    '''Selects useful features to perform regression'''
    y_train, y_valid = train_data[:, -1], valid_data[:, -1]
    raw_x_train, raw_x_valid, raw_x_test = train_data[:, :-1], valid_data[:, :-1], test_data

    if select_all:
        feat_idx = list(range(raw_x_train.shape[1]))
    else:
        selector = SelectKBest(score_func=f_regression, k=20)  # pick the k best features
        selector.fit(raw_x_train, y_train)                     # fit the feature selector
        feat_idx = selector.get_support(indices=True)          # indices of selected features

    return raw_x_train[:, feat_idx], raw_x_valid[:, feat_idx], raw_x_test[:, feat_idx], y_train, y_valid
```
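A hypothetical usage sketch with random arrays standing in for the real dataset; the shapes below are made up, and the last column of train/valid plays the role of the regression target.

```python
import numpy as np

train = np.random.rand(100, 31)  # 30 features + 1 target column
valid = np.random.rand(20, 31)
test = np.random.rand(20, 30)    # the test set has no target column

x_tr, x_va, x_te, y_tr, y_va = select_feat(train, valid, test, select_all=False)
print(x_tr.shape)  # (100, 20) -- the k = 20 best features are kept
```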
3. Training Loop
The optimizer and backpropagation are both defined here:
```python
optimizer = torch.optim.Adam(
    model.parameters(),                             # the model's parameters
    lr=config['learning_rate'],                     # learning rate (from the config dict)
    weight_decay=config.get('weight_decay', 1e-4),  # L2 regularization (default 1e-4)
)
```
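For context, a minimal sketch of the loop these lines belong to. The `train_loader` of (x, y) batches and the MSE criterion are assumptions, not shown in the original excerpt.

```python
import torch.nn as nn

criterion = nn.MSELoss()  # assumption: squared-error loss for this regression task

model.train()
for x, y in train_loader:      # assumption: loader yields (features, target) batches
    optimizer.zero_grad()      # clear gradients from the previous step
    pred = model(x)            # forward pass
    loss = criterion(pred, y)  # compute the loss
    loss.backward()            # backpropagation: compute gradients
    optimizer.step()           # gradient descent: update the parameters
```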
4. Hyperparameters
```python
config = {
    'seed': 804286,
    'select_all': False,                # whether to use all features
    'valid_ratio': 0.2,                 # validation_size = train_size * valid_ratio
    'n_epochs': 3000,
    'batch_size': 256,
    'learning_rate': 1e-5,
    'early_stop': 400,                  # stop early if the model stops improving
    'save_path': './models/model.ckpt'  # where to save the model
}
```
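A sketch of the early-stopping rule implied by 'early_stop': stop once the validation loss fails to improve for that many consecutive epochs. Here `random.random()` stands in for a real validation pass.

```python
import random

best_loss, no_improve = float('inf'), 0
for epoch in range(config['n_epochs']):
    valid_loss = random.random()  # stand-in for a real validation pass
    if valid_loss < best_loss:
        best_loss, no_improve = valid_loss, 0
        # here the real loop would save a checkpoint to config['save_path']
    else:
        no_improve += 1
        if no_improve >= config['early_stop']:
            print(f'Early stop at epoch {epoch}')
            break
```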