Four of Mitchell's open problems are closed
It’s pretty rare for people to make explicit lists of open problems in their field. People instead spend most of their time staring at smaller, individual, paper-sized pieces.
Still, in 2006, the great researcher Tom Mitchell listed nine open problems in machine learning. I claim that four of them have since been solved, at least in the loose sense that we went from having no working systems to having practical ones.
- Solved: Can unlabelled data be helpful for supervised learning? (Mikolov 2013, Devlin 2018, Radford 2019; a pre-train-then-fine-tune sketch follows this list);
- Solved: How can we transfer what is learned for one task to improve learning in other related tasks? (Devlin 2018, Alammar 2018, Radford 2019, Raffel 2019, Zhuang 2019);
- What is the relationship between different learning algorithms, and which should be used when? (some progress, but somewhat obviated by the dominance of the Transformer: Hu 2022, Tsai 2019);
- For learners that actively collect their own training data, what is the best strategy? (minor progress; Loshchilov 2015, Katharopoulos 2018, Jiang 2019, Mindermann 2022; see the batch-selection sketch below);
- Solved: To what degree can we have both data privacy and the benefits of data mining? (Bonawitz 2016, Kairouz 2021, McMahan 2022, Banse 2024; see the clipped-and-noised averaging sketch below);
- Can we build never-ending learners? (unsolved; Parisi 2019, Khetarpal 2022, Wang 2024);
- Can machine learning theories and algorithms help explain human learning? (minor progress; Shteingart 2014, Kudithipudi 2022, Parr 2021, Byrnes 2022, Levin 2024);
- Can we design programming languages containing machine learning primitives? (some progress; Weiss 2021, Pyro 2020; see the Pyro sketch below);
- Solved: Will computer perception merge with machine learning? [i.e. multimodal systems] (Dosovitskiy 2021, Radford 2021, Radford 2022, Zia 2024).
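
To make the first two items concrete, here is a minimal sketch of the pattern that closed them: take a model pre-trained on unlabelled text and fine-tune it on a small labelled task. It assumes the Hugging Face `transformers` library and PyTorch; the model name and the two-sentence "dataset" are illustrative, not drawn from the cited papers.

```python
# Minimal pre-train-then-fine-tune sketch (assumes `transformers` and `torch` are installed).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# These weights were learned from unlabelled text via masked-language-model pre-training;
# that is the sense in which unlabelled data helps the supervised task below.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,  # a fresh classification head for the labelled downstream task
)

# A toy labelled batch; in practice this would be a full supervised dataset.
batch = tokenizer(
    ["the film was wonderful", "the film was dreadful"],
    return_tensors="pt",
    padding=True,
)
labels = torch.tensor([1, 0])

# One illustrative fine-tuning step; a real run would loop over batches with an optimizer.
loss = model(**batch, labels=labels).loss
loss.backward()
```

The same pattern, with a different labelled task swapped in at the fine-tuning stage, is what the transfer-learning item refers to.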
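
For the item on learners that collect their own training data, the cited batch-selection work shares one core idea: spend gradient steps on the examples the model currently gets most wrong. Below is a plain-PyTorch rendering of that idea under my own naming, not code from Loshchilov and Hutter, Jiang et al., or Mindermann et al.

```python
import torch
import torch.nn.functional as F

def select_hard_examples(model, inputs, targets, keep_fraction=0.5):
    """Keep the fraction of the batch with the highest per-example loss."""
    with torch.no_grad():
        per_example_loss = F.cross_entropy(model(inputs), targets, reduction="none")
    k = max(1, int(keep_fraction * len(targets)))
    hard_idx = per_example_loss.topk(k).indices
    return inputs[hard_idx], targets[hard_idx]

# Inside a training loop (model, optimizer and data assumed to exist):
#   hard_x, hard_y = select_hard_examples(model, x, y)
#   loss = F.cross_entropy(model(hard_x), hard_y)
#   loss.backward(); optimizer.step(); optimizer.zero_grad()
```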
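
The privacy item was closed by combining federated learning with differential privacy. The server-side mechanism at its core is short: clip each client's update, then add calibrated Gaussian noise to the aggregate. The NumPy caricature below uses made-up constants and is not the algorithm from the McMahan and Thakurta or Kairouz et al. references.

```python
import numpy as np

def dp_federated_average(client_updates, clip_norm=1.0, noise_multiplier=1.1, seed=0):
    """Average client updates with per-client clipping and Gaussian noise."""
    rng = np.random.default_rng(seed)
    clipped = []
    for update in client_updates:
        norm = np.linalg.norm(update)
        clipped.append(update * min(1.0, clip_norm / (norm + 1e-12)))  # bound any one client's influence
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)  # mask any single contribution
    return (total + noise) / len(client_updates)

# Three clients send gradient-like vectors; the server only ever sees the noisy average.
updates = [np.array([0.2, -0.1]), np.array([5.0, 5.0]), np.array([-0.3, 0.4])]
print(dp_federated_average(updates))
```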
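
Finally, the programming-languages item: Pyro exposes `pyro.sample`, `pyro.param` and `pyro.plate` as first-class primitives that mix with ordinary Python control flow. The toy model below simply fits a Normal distribution to data; it uses the documented primitives, but the model itself is mine, not an example from the Pyro docs.

```python
import torch
import pyro
import pyro.distributions as dist

def model(data):
    # pyro.param declares a learnable quantity, pyro.sample a random one;
    # both are language-level primitives rather than library afterthoughts.
    loc = pyro.param("loc", torch.tensor(0.0))
    scale = pyro.param("scale", torch.tensor(1.0),
                       constraint=dist.constraints.positive)
    with pyro.plate("data", len(data)):
        pyro.sample("obs", dist.Normal(loc, scale), obs=data)

# Inference (for example pyro.infer.SVI) would then optimise these parameters
# against observed data.
```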
Bibliography
- Alammar, Jay (2018). “The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning)” https://jalammar.github.io/illustrated-bert/
- Banse, Adrien; Kreischer, Jan and Jürgens, Xavier Oliva i (2024). “Federated Learning with Differential Privacy” https://arxiv.org/abs/2402.02230
- Bonawitz, Keith et al. (2016). “Practical Secure Aggregation for Federated Learning on User-Held Data” https://arxiv.org/abs/1611.04482
- Byrnes, Steven (2022). “My work on Artificial General Intelligence (AGI) safety” https://sjbyrnes.com/agi.html
- Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton and Toutanova, Kristina (2018). “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” https://arxiv.org/abs/1810.04805
- Dosovitskiy, Alexey et al. (2021). “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale” https://arxiv.org/abs/2010.11929
- Hu, Zhiting and Xing, Eric P. (2022). “Toward a ‘Standard Model’ of Machine Learning”, Harvard Data Science Review, 4(4) https://hdsr.mitpress.mit.edu/pub/zkib7xth/release/2
- Jiang, Angela H. et al. (2019). “Accelerating Deep Learning by Focusing on the Biggest Losers” https://arxiv.org/abs/1910.00762
- Kairouz, Peter et al. (2021). “Advances and Open Problems in Federated Learning” https://arxiv.org/abs/1912.04977
- Katharopoulos, Angelos and Fleuret, Francois (2018). “Not All Samples Are Created Equal: Deep Learning with Importance Sampling”, Proceedings of Machine Learning Research, 80, 2525-2534 https://proceedings.mlr.press/v80/katharopoulos18a.html
- Khetarpal, Khimya; Riemer, Matthew; Rish, Irina and Precup, Doina (2022). “Towards Continual Reinforcement Learning: A Review and Perspectives” https://arxiv.org/abs/2012.13490
- Kudithipudi, Dhireesha et al. (2022). “Biological underpinnings for lifelong learning machines”, Nature Machine Intelligence, 4, 196-210 https://www.nature.com/articles/s42256-022-00452-0
- Levin, Michael (2024). “The Space Of Possible Minds”, Noema https://www.noemamag.com/ai-could-be-a-bridge-toward-diverse-intelligence/
- Loshchilov, Ilya and Hutter, Frank (2015). “Online Batch Selection for Faster Training of Neural Networks” https://arxiv.org/abs/1511.06343
- McMahan, Brendan and Thakurta, Abhradeep (2022). “Federated Learning with Formal Differential Privacy Guarantees” https://research.google/blog/federated-learning-with-formal-differential-privacy-guarantees/
- Mikolov, Tomas; Sutskever, Ilya; Chen, Kai; Corrado, Greg and Dean, Jeffrey (2013). “Distributed Representations of Words and Phrases and their Compositionality” https://arxiv.org/abs/1310.4546
- Mindermann, Sören et al. (2022). “Prioritized Training on Points that are Learnable, Worth Learning, and not yet Learnt”, Proceedings of Machine Learning Research, 162, 15630-15649 https://proceedings.mlr.press/v162/mindermann22a.html
- Mitchell, Tom (2006). “The Discipline of Machine Learning”, CMU working paper CMU-ML-06-108 https://www.cs.cmu.edu/~tom/pubs/MachineLearning.pdf
- Parisi, German I.; Kemker, Ronald; Part, Jose L.; Kanan, Christopher and Wermter, Stefan (2019). “Continual lifelong learning with neural networks: A review”, Neural Networks, 113, 54-71 https://www.sciencedirect.com/science/article/pii/S0893608019300231
- Parr, Thomas and Pezzulo, Giovanni (2021). “Understanding, Explanation, and Active Inference”, Frontiers in Systems Neuroscience, 15 https://www.frontiersin.org/journals/systems-neuroscience/articles/10.3389/fnsys.2021.772641/full
- Pyro (2020). “Primitives” https://docs.pyro.ai/en/stable/primitives.html
- Radford, Alec; Wu, Jeffrey; Amodei, Dario; Sutskever, Ilya et al. (2019). “Language Models are Unsupervised Multitask Learners” https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
- Radford, Alec et al. (2021). “Learning Transferable Visual Models From Natural Language Supervision” https://arxiv.org/abs/2103.00020
- Radford, Alec et al. (2022). “Robust Speech Recognition via Large-Scale Weak Supervision” https://arxiv.org/abs/2212.04356
- Raffel, Colin; Shazeer, Noam; Roberts, Adam et al. (2019). “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer” https://arxiv.org/abs/1910.10683
- Shteingart, Hanan and Loewenstein, Yonatan (2014). “Reinforcement learning and human behavior”, Current Opinion in Neurobiology, 25, 93-98 https://www.sciencedirect.com/science/article/abs/pii/S0959438813002286
- Tsai, Yao-Hung Hubert; Bai, Shaojie; Yamada, Makoto; Morency, Louis-Philippe and Salakhutdinov, Ruslan (2019). “Transformer Dissection: A Unified Understanding of Transformer’s Attention via the Lens of Kernel” https://arxiv.org/abs/1908.11775
- Wang, Liyuan; Zhang, Xingxing; Su, Hang and Zhu, Jun (2024). “A Comprehensive Survey of Continual Learning: Theory, Method and Application”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(8), 5362-5383 https://ieeexplore.ieee.org/document/10444954
- Weiss, Gail; Goldberg, Yoav and Yahav, Eran (2021). “Thinking Like Transformers” https://arxiv.org/abs/2106.06981
- Zhuang, Fuzhen et al. (2019). “A Comprehensive Survey on Transfer Learning” https://arxiv.org/abs/1911.02685
- Zia, Tehseen (2024). “Unveiling of Large Multimodal Models: Shaping the Landscape of Language Models in 2024”, Unite.ai https://www.unite.ai/unveiling-of-large-multimodal-models-shaping-the-landscape-of-language-models-in-2024/