| Reinforcement Learning | 0368-3075-01 |
|---|---|
| Exact Sciences | |

Introduction, Optimal Policy, Planning MDP (Bellman optimality equations), Value iteration,
Policy iteration, Dynamic Programming, Learning MDP (small state spaces), TD Learning,
Model-based learning, Model-free: Q-learning, Policy gradient, Actor-critic,
Learning MDP (large state spaces), Deep Learning, Multi-Armed Bandit,
Inverse RL, POMDP