Jimmy Ba | home page

My long-term research goal is to address a computational question: How can we build general problem-solving machines with human-like efficiency and adaptability? In particular, I focus on developing efficient learning algorithms for deep neural networks. My work overlaps with the NeurIPS, ICLR, and ICML research communities, and I am broadly interested in reinforcement learning, natural language processing, and artificial intelligence.

For future students interested in learning algorithms and theory: please apply through the department admissions process.

Short bio: I completed my PhD under the supervision of Geoffrey Hinton. Both my master's (2014) and undergraduate (2011) degrees are from the University of Toronto, where I was advised by Brendan Frey and Ruslan Salakhutdinov. I am a CIFAR AI Chair and was a recipient of the 2016 Facebook Graduate Fellowship in machine learning.

Google Scholar page | Contact me: jba at cs.toronto.edu

Research


High-dimensional asymptotics of feature learning: how one gradient step improves the representation. Ba, J., Erdogdu, M. A., Suzuki, T., Wang, Z., Wu, D., & Yang, G. (2022). arXiv preprint arXiv:2205.01445.

You can't count on luck: why decision transformers fail in stochastic environments. Paster, K., McIlraith, S., & Ba, J. (2022). arXiv preprint arXiv:2205.15967.

Dataset distillation using neural feature regression. Zhou, Y., Nezhadarya, E., & Ba, J. (2022). arXiv preprint arXiv:2206.00719.

Understanding the variance collapse of SVGD in high dimensions. Ba, J., Erdogdu, M. A., Ghassemi, M., Sun, S., Suzuki, T., Wu, D., & Zhang, T. (2021). International conference on learning representations.

Learning domain invariant representations in goal-conditioned block MDPs. Han, B., Zheng, C., Chan, H., Paster, K., Zhang, M., & Ba, J. (2021). Advances in Neural Information Processing Systems, 34, 764-776.

Efficient statistical tests: a neural tangent kernel approach. Jia, S., Nezhadarya, E., Wu, Y., & Ba, J. (2021). International conference on machine learning (pp. 4893–4903). PMLR.

How does a neural network's architecture impact its robustness to noisy labels? Li, J., Zhang, M., Xu, K., Dickerson, J., & Ba, J. (2021). Advances in Neural Information Processing Systems, 34, 9788-9803.

On monotonic linear interpolation of neural network parameters. Lucas, J. R., Bae, J., Zhang, M. R., Fort, S., Zemel, R., & Grosse, R. B. (2021). International conference on machine learning (pp. 7168–7179). PMLR.

Clockwork variational autoencoders. Saxena, V., Ba, J., & Hafner, D. (2021). Advances in Neural Information Processing Systems, 34, 29246-29257.

LIME: learning inductive bias for primitives of mathematical reasoning. Wu, Y., Rabe, M. N., Li, W., Ba, J., Grosse, R. B., & Szegedy, C. (2021). International conference on machine learning (pp. 11251–11262). PMLR.

When does preconditioning help or hurt generalization? Amari, S.-i., Ba, J., Grosse, R., Li, X., Nitanda, A., Suzuki, T., … Xu, J. (2020). International conference on learning representations.

Generalization of two-layer neural networks: an asymptotic viewpoint. Ba, J., Erdogdu, M., Suzuki, T., Wu, D., & Zhang, T. (2020). International conference on learning representations. https://openreview.net/forum?id=H1gBsgBY

A study of gradient variance in deep learning. Faghri, F., Duvenaud, D., Fleet, D. J., & Ba, J. (2020). arXiv preprint arXiv:2007.04532.

Mastering atari with discrete world models. Hafner, D., Lillicrap, T., Norouzi, M., & Ba, J. (2020). International conference on learning representations.

Action and perception as divergence minimization. Hafner, D., Ortega, P. A., Ba, J., Parr, T., Friston, K., & Heess, N. (2020). arXiv preprint arXiv:2009.01791.

Improving transformer optimization through better initialization. Huang, X. S., Perez, F., Ba, J., & Volkovs, M. (2020). International conference on machine learning (pp. 4475–4483). PMLR.

Noisy labels can induce good representations. Li, J., Zhang, M., Xu, K., Dickerson, J. P., & Ba, J. (2020). arXiv preprint arXiv:2012.12896.

Graph generation with energy-based models. Liu, J., Grathwohl, W., Ba, J., & Swersky, K. (2020). ICML Workshop on Graph Representation Learning and Beyond (GRL+).

Evaluating agents without rewards. Matusch, B., Ba, J., & Hafner, D. (2020). arXiv preprint arXiv:2012.11538.

Planning from pixels using inverse dynamics models. Paster, K., McIlraith, S. A., & Ba, J. (2020). International conference on learning representations.

An inductive bias for distances: neural nets that respect the triangle inequality. Pitis, S., Chan, H., Jamali, K., & Ba, J. (2020). International conference on learning representations. https://openreview.net/forum?id=HJeiDpVF

Maximum entropy gain exploration for long horizon multi-goal reinforcement learning. Pitis, S., Chan, H., Zhao, S., Stadie, B., & Ba, J. (2020). International conference on machine learning (pp. 7750–7761). PMLR.

Learning intrinsic rewards as a bi-level optimization problem. Stadie, B., Zhang, L., & Ba, J. (2020). Conference on uncertainty in artificial intelligence (pp. 111–120). PMLR.

On solving minimax optimization locally: a follow-the-ridge approach. Wang, Y., Zhang, G., & Ba, J. (2020). International conference on learning representations. arXiv preprint arXiv:1910.07512.

An empirical study of stochastic gradient descent with structured covariance noise. Wen, Y., Luk, K., Gazeau, M., Zhang, G., Chan, H., & Ba, J. (2020). International conference on artificial intelligence and statistics (pp. 3621–3631). PMLR.

Interplay between optimization and generalization of stochastic gradient descent with covariance noise. Wen, Y., Luk, K., Gazeau, M., Zhang, G., Chan, H., & Ba, J. (2020). International conference on artificial intelligence and statistics.

BatchEnsemble: an alternative approach to efficient ensemble and lifelong learning. Wen, Y., Tran, D., & Ba, J. (2020). International conference on learning representations. https://openreview.net/forum?id=Sklf1yrY

The scattering compositional learner: discovering objects, attributes, relationships in analogical reasoning. Wu, Y., Dong, H., Grosse, R., & Ba, J. (2020). arXiv preprint arXiv:2007.04212.

INT: an inequality benchmark for evaluating generalization in theorem proving. Wu, Y., Jiang, A., Ba, J., & Grosse, R. (2020). International conference on learning representations.

Neural theorem proving on inequality problems. Wu, Y., Jiang, A., Grosse, R., & Ba, J. (2020). Artificial Intelligence and Theorem Proving (AITP 2020).

ACTRCE: augmenting experience via teacher's advice for multi-goal reinforcement learning. Chan, H., Wu, Y., Kiros, J., Fidler, S., & Ba, J. (2019). arXiv preprint arXiv:1902.04546.

Dream to control: learning behaviors by latent imagination. Hafner, D., Lillicrap, T., Ba, J., & Norouzi, M. (2019). International conference on learning representations. arXiv preprint arXiv:1912.01603.

DOM-Q-NET: grounded RL on structured language. Jia, S., Kiros, J., & Ba, J. (2019). International conference on learning representations. arXiv preprint arXiv:1902.07257.

Graph normalizing flows. Liu, J. S., Kumar, A., Ba, J., Kiros, J. R., & Swersky, K. (2019). Advances in neural information processing systems.

ProtoGE: prototype goal encodings for multi-goal reinforcement learning. Pitis, S., Chan, H., & Ba, J. (2019). preprint.

Exploring model-based planning with policy networks. Wang, T., & Ba, J. (2019). International conference on learning representations. arXiv preprint arXiv:1906.08649.

Benchmarking model-based reinforcement learning. Wang, T., Bao, X., Clavera, I., Hoang, J., Wen, Y., Langlois, E., … Ba, J. (2019). arXiv preprint arXiv:1907.02057.

Neural graph evolution: towards efficient automatic robot design. Wang, T., Zhou, Y., Fidler, S., & Ba, J. (2019). International Conference on Learning Representations.

An empirical study of large-batch stochastic gradient descent with structured covariance noise. Wen, Y., Luk, K., Gazeau, M., Zhang, G., Chan, H., & Ba, J. (2019). arXiv preprint arXiv:1902.08234.

Lookahead optimizer: k steps forward, 1 step back. Zhang, M., Lucas, J., Hinton, G. E., & Ba, J. (2019). Advances in neural information processing systems (pp. 9593–9604).
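
A minimal Python sketch of the Lookahead outer loop described in the title (illustrative only, not the paper's implementation); the inner_step function is a placeholder for one update of any inner optimizer such as SGD or Adam:

    def lookahead(slow_weights, inner_step, k=5, alpha=0.5, outer_steps=100):
        # slow_weights: a NumPy array of parameters.
        # Take k fast steps with the inner optimizer, then move the slow
        # weights a fraction alpha toward the fast weights ("1 step back").
        for _ in range(outer_steps):
            fast_weights = slow_weights.copy()
            for _ in range(k):
                fast_weights = inner_step(fast_weights)  # e.g. one SGD/Adam update
            slow_weights = slow_weights + alpha * (fast_weights - slow_weights)
        return slow_weights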

Towards permutation-invariant graph generation. Liu, J., Kumar, A., Ba, J., & Swersky, K. (2018). preprint.

Reversible recurrent neural networks. MacKay, M., Vicol, P., Ba, J., & Grosse, R. B. (2018). Advances in neural information processing systems (pp. 9029–9040).

Kronecker-factored curvature approximations for recurrent neural networks. Martens, J., Ba, J., & Johnson, M. (2018). International conference on learning representations.

On the convergence and robustness of training GANs with regularized optimal transport. Sanjabi, M., Ba, J., Razaviyayn, M., & Lee, J. D. (2018). Advances in neural information processing systems (pp. 7091–7101).

NerveNet: learning structured policy with graph neural networks. Wang, T., Liao, R., Ba, J., & Fidler, S. (2018). International conference on learning representations.

Flipout: efficient pseudo-independent weight perturbations on mini-batches. Wen, Y., Vicol, P., Ba, J., Tran, D., & Grosse, R. (2018). International Conference on Learning Representations.

Distributed second-order optimization using kronecker-factored approximations. Ba, J., Grosse, R., & Martens, J. (2017). International conference on learning representations.

Automated analysis of high-content microscopy data with deep learning. Kraus, O. Z., Grys, B. T., Ba, J., Chong, Y., Frey, B. J., Boone, C., & Andrews, B. J. (2017). Molecular systems biology, 13(4), 924.

Scalable trust-region method for deep reinforcement learning using kronecker-factored approximation. Wu, Y., Mansimov, E., Grosse, R. B., Liao, S., & Ba, J. (2017). Advances in neural information processing systems (pp. 5279–5288).

Using fast weights to attend to the recent past. Ba, J., Hinton, G. E., Mnih, V., Leibo, J. Z., & Ionescu, C. (2016). Advances in neural information processing systems (pp. 4331–4339).

Layer normalization. Ba, J., Kiros, J. R., & Hinton, G. E. (2016). NIPS 2016 Deep Learning Symposium. arXiv preprint arXiv:1607.06450.
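
A minimal NumPy sketch of the operation (normalize each example over its feature dimension, then rescale and shift); illustrative only, not the reference implementation:

    import numpy as np

    def layer_norm(x, gain, bias, eps=1e-5):
        # x: (batch, features); gain and bias are learned per-feature vectors.
        mean = x.mean(axis=-1, keepdims=True)
        var = x.var(axis=-1, keepdims=True)
        x_hat = (x - mean) / np.sqrt(var + eps)  # statistics per example, not per batch
        return gain * x_hat + bias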

Classifying and segmenting microscopy images with deep multiple instance learning. Kraus, O. Z., Ba, J. L., & Frey, B. J. (2016). Bioinformatics, 32(12), i52-i59.

Generating images from captions with attention. Mansimov, E., Parisotto, E., Ba, J., & Salakhutdinov, R. (2016). International conference on learning representations. arXiv preprint arXiv:1511.02793.

Actor-mimic: deep multitask and transfer reinforcement learning. Parisotto, E., Ba, J., & Salakhutdinov, R. (2016). International conference on learning representations. arXiv preprint arXiv:1511.06342.

Multiple object recognition with visual attention. Ba, J., Mnih, V., & Kavukcuoglu, K. (2015). International conference on learning representations.

Learning wake-sleep recurrent attention models. Ba, J., Salakhutdinov, R. R., Grosse, R. B., & Frey, B. J. (2015). Advances in neural information processing systems (pp. 2593–2601).

Predicting deep zero-shot convolutional neural networks using textual descriptions. Ba, J., Swersky, K., & Fidler, S. (2015). Proceedings of the IEEE International Conference on Computer Vision (pp. 4247–4255).

Adam: a method for stochastic optimization. Kingma, D., & Ba, J. (2015). International conference on learning representations.
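
A minimal NumPy sketch of a single Adam step (exponential moving averages of the gradient and squared gradient, with bias correction); illustrative only, not the reference implementation:

    import numpy as np

    def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        # t is the 1-indexed step count, used for bias correction.
        m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
        v = beta2 * v + (1 - beta2) * grad**2       # second-moment estimate
        m_hat = m / (1 - beta1**t)                  # bias-corrected moments
        v_hat = v / (1 - beta2**t)
        param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
        return param, m, v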

Show, attend and tell: neural image caption generation with visual attention. Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhutdinov, R., … Bengio, Y. (2015). International conference on machine learning (pp. 2048–2057).

Do deep nets really need to be deep? Ba, J., & Caruana, R. (2014). Advances in neural information processing systems (pp. 2654–2662).

Adaptive dropout for training deep neural networks. Ba, J., & Frey, B. (2013). Advances in neural information processing systems (pp. 3084–3092).