dssim performance improvement results, part 1

This part covers the changes made to the solver.

According to the earlier profiling results, the solver spends almost all of its time in the `step` method:

 def step(self):
     # this method represents the whole process of a major step
     # it returns a generator, which yields at each minor step
     t = self.t
     h = self.step_size
     y_init = self.state_array
     K = np.empty((self.stage_num + 1, self.state_num), dtype='float')
     K[0] = self.deriv_fun(t, y_init, 0)
     yield
     for s, (a, c) in enumerate(zip(self.A[1:], self.C[1:]), start=1):
         # x = c
         dy = np.dot(K[:s].T, a[:s]) * h
         self.output_fun(t + c * h, y_init + dy, c * h)
         yield
         K[s] = self.deriv_fun(t + c * h, y_init + dy, c * h)
         yield

     y_new = y_init + h * np.dot(K[:-1].T, self.B)
     # TODO: calculate f_new for error check
     self.state_array = y_new
     self.t_prev = t
     self.t = t + h
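
To make the generator protocol concrete, here is a minimal, self-contained sketch that wraps the same `step` body in a small class and drives it to completion. The class name `MiniRK4`, the classic RK4 tableau values, and the test equation dy/dt = -y are assumptions made for illustration; they are not taken from rk.py.

```python
import numpy as np


class MiniRK4:
    """Hypothetical stand-in for the solver in rk.py (classic RK4 tableau)."""

    A = np.array([[0.0, 0.0, 0.0, 0.0],
                  [0.5, 0.0, 0.0, 0.0],
                  [0.0, 0.5, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0]])
    B = np.array([1 / 6, 1 / 3, 1 / 3, 1 / 6])
    C = np.array([0.0, 0.5, 0.5, 1.0])
    stage_num = 4

    def __init__(self, deriv_fun, output_fun, y0, step_size):
        self.deriv_fun = deriv_fun
        self.output_fun = output_fun
        self.state_array = np.asarray(y0, dtype=float)
        self.state_num = self.state_array.size
        self.step_size = step_size
        self.t = 0.0
        self.t_prev = 0.0

    def step(self):
        # One major step; yields once per minor step, as in the original.
        t, h, y_init = self.t, self.step_size, self.state_array
        K = np.empty((self.stage_num + 1, self.state_num), dtype=float)
        K[0] = self.deriv_fun(t, y_init, 0)
        yield
        for s, (a, c) in enumerate(zip(self.A[1:], self.C[1:]), start=1):
            dy = np.dot(K[:s].T, a[:s]) * h
            self.output_fun(t + c * h, y_init + dy, c * h)
            yield
            K[s] = self.deriv_fun(t + c * h, y_init + dy, c * h)
            yield
        # Runs only when the generator is resumed after its last yield.
        self.state_array = y_init + h * np.dot(K[:-1].T, self.B)
        self.t_prev = t
        self.t = t + h


# Integrate dy/dt = -y from t = 0 to t = 1 in ten major steps.
solver = MiniRK4(lambda t, y, dt: -y, lambda t, y, dt: None, [1.0], 0.1)
for _ in range(10):
    # Exhausting the generator performs one full major step.
    yields_per_step = sum(1 for _ in solver.step())
```

Each major step yields `2 * stage_num - 1` times (7 for four stages), and the state is committed only after the final yield, so the caller has to resume the generator until it is exhausted.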

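Roughly, the simulation does the following in each step:
1. All modules run `output`.
2. All modules run `update`.
3. All modules run `solve`.

In `solve`, a solver is created for each module, and solving means running that solver's `step()` method. This is why `step` accounts for such a large share of the runtime.

We previously discussed the temporary branch and the dev branch. The temporary branch does not update during minor steps: discrete modules never call `step`, and call the wrapper class's `_dummy_generator` method instead.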
```
 def _dummy_generator(self):
     # for systems with no need to solve
     # make sure the generator works well
     # TODO: get stage num from engine
     for i in range(2 * self.solver.stage_num - 1):
         yield
```
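The number of yields keeps discrete modules in sync with the continuous ones. However, in the rzip model test the results deviated noticeably, and that deviation comes from not updating during minor steps.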
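
Why the yield counts must match can be illustrated with a small driver that advances several generators in lockstep. This is only a sketch: the names `continuous_step`, `dummy_generator`, and `run_major_step` are hypothetical, and the way dssim actually schedules its generators may differ.

```python
def continuous_step(log, stage_num):
    # Same yield pattern as `step`: one yield after k0, two per later stage.
    log.append("deriv k0")
    yield
    for s in range(1, stage_num):
        log.append(f"output stage {s}")
        yield
        log.append(f"deriv stage {s}")
        yield
    log.append("commit state")  # reached only on the resume after the last yield


def dummy_generator(stage_num):
    # Does no work; only matches the yield count of `continuous_step`.
    for _ in range(2 * stage_num - 1):
        yield


def run_major_step(generators):
    """Advance all generators one minor step at a time until all finish."""
    active = list(generators)
    ticks = 0
    while active:
        still_running = []
        for gen in active:
            try:
                next(gen)
            except StopIteration:
                continue
            still_running.append(gen)
        if still_running and len(still_running) != len(active):
            raise RuntimeError("generators out of sync")
        if still_running:
            ticks += 1
        active = still_running
    return ticks


log = []
ticks = run_major_step([continuous_step(log, 4), dummy_generator(4)])
```

A plain `zip(*generators)` loop would stop as soon as the first generator is exhausted, so later generators would never run the code that follows their last yield. The driver above resumes every generator once more for that reason.
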
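In `step`, every operation is necessary for a continuous module: computing the new state variables, calling `output_fun`, and calling `deriv_fun`. A discrete module, by contrast, only needs `output_fun` to produce its output during minor steps.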

So a `dummy_step` method is added to rk.py: continuous modules still call `step`, while discrete modules call `dummy_step` instead.

 def dummy_step(self):
     # this method will be called by systems without continuous state variables
     t = self.t
     h = self.step_size
     yield
     for c in self.C[1:]:
         self.output_fun(t + c * h, self.state_array, c * h)
         yield
         yield
     self.t = t + h
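
As a quick sanity check, the sketch below counts how often `dummy_step` yields and how often it calls `output_fun`. The class `CountingSolver` and its tableau nodes `C` are assumptions for illustration; only the body of `dummy_step` is copied from the method above.

```python
class CountingSolver:
    """Hypothetical harness that records calls made by dummy_step."""

    C = (0.0, 0.5, 0.5, 1.0)  # assumed classic RK4 nodes
    stage_num = 4

    def __init__(self, step_size):
        self.t = 0.0
        self.step_size = step_size
        self.state_array = ()  # a discrete module has no continuous state
        self.output_times = []

    def output_fun(self, t, y, dt):
        # Record the time of each minor-step output.
        self.output_times.append(t)

    def dummy_step(self):
        t = self.t
        h = self.step_size
        yield
        for c in self.C[1:]:
            self.output_fun(t + c * h, self.state_array, c * h)
            yield
            yield
        self.t = t + h


solver = CountingSolver(step_size=0.1)
yield_count = sum(1 for _ in solver.dummy_step())
output_count = len(solver.output_times)
```

With four stages this gives `2 * stage_num - 1` yields, the same count as `step`, and `stage_num - 1` calls to `output_fun`, with no derivative evaluation or state update.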

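After this change, the rzip test gives the same results as the earlier dev branch, so accuracy is not affected. The scalene profiling run takes about 400 s, roughly 1/3 less time than before. The current profiling results are as follows.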
1. system_proxy.py: % of time = 27.2%
  1. SystemProxy.collect_input_data
  2. SystemProxy.ode_output_func
  3. SystemProxy.solve
2. math_operation/product/system.py: % of time = 13.8%
  1. System.is_scalar
  2. System.output
3. rk.py: % of time = 8.1%
  1. RK4.dummy_step (most of the time)
  2. RK4.step
4. math_operation/sum/system.py: % of time = 6.9%
  1. System.output
5. numpy/core/shape_base.py: % of time = 6.0%
  1. System.atleast_2d
6. base_system.py: % of time = 5.2%
  1. System.set_output_data
  2. System.solve
7. model_runtime.py: % of time = 5.0%
  1. ModelRuntime.solve
Overall, the share of time spent in rk.py has dropped substantially. system_proxy still has considerable room for optimization, so it is the focus of the next step.
