Four of Mitchell's open problems are closed
It’s pretty rare for people to make explicit lists of open problems in their field. People instead spend most of their time staring at smaller, individual, paper-sized pieces.
Still, in 2006, the great researcher Tom Mitchell listed nine open problems in machine learning. I claim that four of them have since been solved, at least in the loose sense that we went from having no working systems to having practical ones.
- Solved: Can unlabelled data be helpful for supervised learning? (Mikolov 2013, Devlin 2018, Radford 2019; a pre-train-then-fine-tune sketch follows this list);
- Solved: How can we transfer what is learned for one task to improve learning in other related tasks? (Devlin 2018, Alammar 2018, Radford 2019, Raffel 2019, Zhuang 2019);
- What is the relationship between different learning algorithms, and which should be used when? (some progress, but somewhat obviated by the dominance of the Transformer: Hu 2022, Tsai 2019);
- For learners that actively collect their own training data, what is the best strategy? (minor progress; Loshchilov 2015, Katharopoulos 2018, Jiang 2019, Mindermann 2022; see the batch-selection sketch below);
- Solved: To what degree can we have both data privacy and the benefits of data mining? (Bonawitz 2016, Kairouz 2021, McMahan 2022, Banse 2024; see the clipped-and-noised averaging sketch below);
- Can we build never-ending learners? (unsolved; Parisi 2019, Khetarpal 2022, Wang 2024);
- Can machine learning theories and algorithms help explain human learning? (minor progress; Shteingart 2014, Kudithipudi 2022, Parr 2021, Byrnes 2022, Levin 2024);
- Can we design programming languages containing machine learning primitives? (some progress; Weiss 2021, Pyro 2020; see the Pyro sketch below);
- Solved: Will computer perception merge with machine learning? [i.e. multimodal systems] (Dosovitskiy 2021, Radford 2021, Radford 2022, Zia 2024).
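
To make the first two items concrete, here is a minimal sketch of the pattern that closed them: take a model pre-trained on unlabelled text and fine-tune it on a small labelled task. It assumes the Hugging Face `transformers` library and PyTorch; the model name and the two-sentence "dataset" are illustrative, not drawn from the cited papers.

```python
# Minimal pre-train-then-fine-tune sketch (assumes `transformers` and `torch` are installed).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# These weights were learned from unlabelled text via masked-language-model pre-training;
# that is the sense in which unlabelled data helps the supervised task below.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,  # a fresh classification head for the labelled downstream task
)

# A toy labelled batch; in practice this would be a full supervised dataset.
batch = tokenizer(
    ["the film was wonderful", "the film was dreadful"],
    return_tensors="pt",
    padding=True,
)
labels = torch.tensor([1, 0])

# One illustrative fine-tuning step; a real run would loop over batches with an optimizer.
loss = model(**batch, labels=labels).loss
loss.backward()
```

The same pattern, with a different labelled task swapped in at the fine-tuning stage, is what the transfer-learning item refers to.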
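
For the item on learners that collect their own training data, the cited batch-selection work shares one core idea: spend gradient steps on the examples the model currently gets most wrong. Below is a plain-PyTorch rendering of that idea under my own naming, not code from Loshchilov and Hutter, Jiang et al., or Mindermann et al.

```python
import torch
import torch.nn.functional as F

def select_hard_examples(model, inputs, targets, keep_fraction=0.5):
    """Keep the fraction of the batch with the highest per-example loss."""
    with torch.no_grad():
        per_example_loss = F.cross_entropy(model(inputs), targets, reduction="none")
    k = max(1, int(keep_fraction * len(targets)))
    hard_idx = per_example_loss.topk(k).indices
    return inputs[hard_idx], targets[hard_idx]

# Inside a training loop (model, optimizer and data assumed to exist):
#   hard_x, hard_y = select_hard_examples(model, x, y)
#   loss = F.cross_entropy(model(hard_x), hard_y)
#   loss.backward(); optimizer.step(); optimizer.zero_grad()
```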
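
The privacy item was closed by combining federated learning with differential privacy. The server-side mechanism at its core is short: clip each client's update, then add calibrated Gaussian noise to the aggregate. The NumPy caricature below uses made-up constants and is not the algorithm from the McMahan and Thakurta or Kairouz et al. references.

```python
import numpy as np

def dp_federated_average(client_updates, clip_norm=1.0, noise_multiplier=1.1, seed=0):
    """Average client updates with per-client clipping and Gaussian noise."""
    rng = np.random.default_rng(seed)
    clipped = []
    for update in client_updates:
        norm = np.linalg.norm(update)
        clipped.append(update * min(1.0, clip_norm / (norm + 1e-12)))  # bound any one client's influence
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)  # mask any single contribution
    return (total + noise) / len(client_updates)

# Three clients send gradient-like vectors; the server only ever sees the noisy average.
updates = [np.array([0.2, -0.1]), np.array([5.0, 5.0]), np.array([-0.3, 0.4])]
print(dp_federated_average(updates))
```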
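
Finally, the programming-languages item: Pyro exposes `pyro.sample`, `pyro.param` and `pyro.plate` as first-class primitives that mix with ordinary Python control flow. The toy model below simply fits a Normal distribution to data; it uses the documented primitives, but the model itself is mine, not an example from the Pyro docs.

```python
import torch
import pyro
import pyro.distributions as dist

def model(data):
    # pyro.param declares a learnable quantity, pyro.sample a random one;
    # both are language-level primitives rather than library afterthoughts.
    loc = pyro.param("loc", torch.tensor(0.0))
    scale = pyro.param("scale", torch.tensor(1.0),
                       constraint=dist.constraints.positive)
    with pyro.plate("data", len(data)):
        pyro.sample("obs", dist.Normal(loc, scale), obs=data)

# Inference (for example pyro.infer.SVI) would then optimise these parameters
# against observed data.
```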
Bibliography
- Alammar, Jay (2018). “The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning)” https://jalammar.github.io/illustrated-bert/
- Banse, Adrien; Kreischer, Jan and Jürgens, Xavier Oliva i (2024). “Federated Learning with Differential Privacy” https://arxiv.org/abs/2402.02230
- Bonawitz, Keith et al. (2016). “Practical Secure Aggregation for Federated Learning on User-Held Data” https://arxiv.org/abs/1611.04482
- Byrnes, Steven (2022). “My work on Artificial General Intelligence (AGI) safety” https://sjbyrnes.com/agi.html
- Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton and Toutanova, Kristina (2018). “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” https://arxiv.org/abs/1810.04805
- Dosovitskiy, Alexey et al. (2021). “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale” https://arxiv.org/abs/2010.11929
- Hu, Zhiting and Xing, Eric P. (2022). “Toward a ‘Standard Model’ of Machine Learning”, Harvard Data Science Review, 4(4) https://hdsr.mitpress.mit.edu/pub/zkib7xth/release/2
- Jiang, Angela H. et al. (2019). “Accelerating Deep Learning by Focusing on the Biggest Losers” https://arxiv.org/abs/1910.00762
- Kairouz, Peter et al. (2021). “Advances and Open Problems in Federated Learning” https://arxiv.org/abs/1912.04977
- Katharopoulos, Angelos and Fleuret, Francois (2018). “Not All Samples Are Created Equal: Deep Learning with Importance Sampling”, Proceedings of Machine Learning Research, 80, 2525-2534 https://proceedings.mlr.press/v80/katharopoulos18a.html
- Khetarpal, Khimya; Riemer, Matthew; Rish, Irina and Precup, Doina (2022). “Towards Continual Reinforcement Learning: A Review and Perspectives” https://arxiv.org/abs/2012.13490
- Kudithipudi, Dhireesha et al. (2022). “Biological underpinnings for lifelong learning machines”, Nature Machine Intelligence, 4, 196-210 https://www.nature.com/articles/s42256-022-00452-0
- Levin, Michael (2024). “The Space Of Possible Minds”, Noema https://www.noemamag.com/ai-could-be-a-bridge-toward-diverse-intelligence/
- Loshchilov, Ilya and Hutter, Frank (2015). “Online Batch Selection for Faster Training of Neural Networks” https://arxiv.org/abs/1511.06343
- McMahan, Brendan and Thakurta, Abhradeep (2022). “Federated Learning with Formal Differential Privacy Guarantees” https://research.google/blog/federated-learning-with-formal-differential-privacy-guarantees/
- Mikolov, Tomas; Sutskever, Ilya; Chen, Kai; Corrado, Greg and Dean, Jeffrey (2013). “Distributed Representations of Words and Phrases and their Compositionality” https://arxiv.org/abs/1310.4546
- Mindermann, Sören et al. (2022). “Prioritized Training on Points that are Learnable, Worth Learning, and not yet Learnt”, Proceedings of Machine Learning Research, 162, 15630-15649 https://proceedings.mlr.press/v162/mindermann22a.html
- Mitchell, Tom (2006). “The Discipline of Machine Learning”, CMU working paper CMU-ML-06-108 https://www.cs.cmu.edu/~tom/pubs/MachineLearning.pdf
- Parisi, German I.; Kemker, Ronald; Part, Jose L.; Kanan, Christopher and Wermter, Stefan (2019). “Continual lifelong learning with neural networks: A review”, Neural Networks, 113, 54-71 https://www.sciencedirect.com/science/article/pii/S0893608019300231
- Parr, Thomas and Pezzulo, Giovanni (2021). “Understanding, Explanation, and Active Inference”, Frontiers in Systems Neuroscience, 15 https://www.frontiersin.org/journals/systems-neuroscience/articles/10.3389/fnsys.2021.772641/full
- Pyro (2020). “Primitives” https://docs.pyro.ai/en/stable/primitives.html
- Radford, Alec; Wu, Jeffrey; Amodei, Dario; Sutskever, Ilya et al. (2019). “Language Models are Unsupervised Multitask Learners” https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
- Radford, Alec et al. (2021). “Learning Transferable Visual Models From Natural Language Supervision” https://arxiv.org/abs/2103.00020
- Radford, Alec et al. (2022). “Robust Speech Recognition via Large-Scale Weak Supervision” https://arxiv.org/abs/2212.04356
- Raffel, Colin; Shazeer, Noam; Roberts, Adam et al. (2019). “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer” https://arxiv.org/abs/1910.10683
- Shteingart, Hanan and Loewenstein, Yonatan (2014). “Reinforcement learning and human behavior”, Current Opinion in Neurobiology, 25, 93-98 https://www.sciencedirect.com/science/article/abs/pii/S0959438813002286
- Tsai, Yao-Hung Hubert; Bai, Shaojie; Yamada, Makoto; Morency, Louis-Philippe and Salakhutdinov, Ruslan (2019). “Transformer Dissection: A Unified Understanding of Transformer’s Attention via the Lens of Kernel” https://arxiv.org/abs/1908.11775
- Wang, Liyuan; Zhang, Xingxing; Su, Hang and Zhu, Jun (2024). “A Comprehensive Survey of Continual Learning: Theory, Method and Application”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(8), 5362-5383 https://ieeexplore.ieee.org/document/10444954
- Weiss, Gail; Goldberg, Yoav and Yahav, Eran (2021). “Thinking Like Transformers” https://arxiv.org/abs/2106.06981
- Zhuang, Fuzhen et al. (2019). “A Comprehensive Survey on Transfer Learning” https://arxiv.org/abs/1911.02685
- Zia, Tehseen (2024). “Unveiling of Large Multimodal Models: Shaping the Landscape of Language Models in 2024”, Unite.ai https://www.unite.ai/unveiling-of-large-multimodal-models-shaping-the-landscape-of-language-models-in-2024/