| Trust region policy optimization J Schulman, S Levine, P Abbeel, M Jordan, P Moritz International conference on machine learning, 1889-1897, 2015 | 1703 | 2015 |
| InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets X Chen, Y Duan, R Houthooft, J Schulman, I Sutskever, P Abbeel Advances in neural information processing systems, 2172-2180, 2016 | 1465 | 2016 |
| Proximal policy optimization algorithms J Schulman, F Wolski, P Dhariwal, A Radford, O Klimov arXiv preprint arXiv:1707.06347, 2017 | 1450 | 2017 |
| OpenAI Gym G Brockman, V Cheung, L Pettersson, J Schneider, J Schulman, J Tang, ... arXiv preprint arXiv:1606.01540, 2016 | 1115 | 2016 |
| Benchmarking deep reinforcement learning for continuous control Y Duan, X Chen, R Houthooft, J Schulman, P Abbeel International Conference on Machine Learning, 1329-1338, 2016 | 649 | 2016 |
| High-dimensional continuous control using generalized advantage estimation J Schulman, P Moritz, S Levine, M Jordan, P Abbeel arXiv preprint arXiv:1506.02438, 2015 | 637 | 2015 |
| Concrete problems in AI safety D Amodei, C Olah, J Steinhardt, P Christiano, J Schulman, D Mané arXiv preprint arXiv:1606.06565, 2016 | 484 | 2016 |
| Theano: A Python framework for fast computation of mathematical expressions R Al-Rfou, G Alain, A Almahairi, C Angermueller, D Bahdanau, N Ballas, ... arXiv preprint arXiv:1605.02688, 2016 | 458 | 2016 |
| OpenAI Baselines P Dhariwal, C Hesse, M Plappert, A Radford, J Schulman, S Sidor, Y Wu | 327 | 2017 |
| Finding Locally Optimal, Collision-Free Trajectories with Sequential Convex Optimization J Schulman, J Ho, AX Lee, I Awwal, H Bradlow, P Abbeel Robotics: science and systems 9 (1), 1-10, 2013 | 308 | 2013 |
| Spike sorting for large, dense electrode arrays C Rossant, SN Kadir, DFM Goodman, J Schulman, MLD Hunter, ... Nature neuroscience 19 (4), 634, 2016 | 290 | 2016 |
| Motion planning with sequential convex optimization and convex collision checking J Schulman, Y Duan, J Ho, A Lee, I Awwal, H Bradlow, J Pan, S Patil, ... The International Journal of Robotics Research 33 (9), 1251-1270, 2014 | 279 | 2014 |
| VIME: Variational information maximizing exploration R Houthooft, X Chen, Y Duan, J Schulman, F De Turck, P Abbeel Advances in Neural Information Processing Systems, 1109-1117, 2016 | 260* | 2016 |
| Variational lossy autoencoder X Chen, DP Kingma, T Salimans, Y Duan, P Dhariwal, J Schulman, ... arXiv preprint arXiv:1611.02731, 2016 | 257 | 2016 |
| RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning Y Duan, J Schulman, X Chen, PL Bartlett, I Sutskever, P Abbeel arXiv preprint arXiv:1611.02779, 2016 | 223 | 2016 |
| On first-order meta-learning algorithms A Nichol, J Achiam, J Schulman arXiv preprint arXiv:1803.02999, 2018 | 198* | 2018 |
| Gradient estimation using stochastic computation graphs J Schulman, N Heess, T Weber, P Abbeel Advances in Neural Information Processing Systems, 3528-3536, 2015 | 178 | 2015 |
| #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning H Tang, R Houthooft, D Foote, A Stooke, X Chen, Y Duan, J Schulman, ... Advances in Neural Information Processing Systems, 2750-2759, 2017 | 154 | 2017 |
| Learning complex dexterous manipulation with deep reinforcement learning and demonstrations A Rajeswaran, V Kumar, A Gupta, G Vezzani, J Schulman, E Todorov, ... arXiv preprint arXiv:1709.10087, 2017 | 117 | 2017 |
| Theano: A Python framework for fast computation of mathematical expressions TTD Team, R Al-Rfou, G Alain, A Almahairi, C Angermueller, D Bahdanau, ... arXiv preprint arXiv:1605.02688, 2016 | 111 | 2016 |