Accelerating AI training with Large Language Model (LLM) innovation

We live in the era of resurgent natural language processing (NLP) and, in particular, of large transformer-based language models (LLMs).

First, NLP models are on the rise, advancing a fascinating field of AI whose applications are shaping the very future of human life and tackling an array of tasks including conversation, text completion, and even coding. This has been fueled by advances in accuracy, scalability, and production readiness. Indeed, a study conducted by John Snow Labs last year found that 60% of technology leaders indicated their NLP budgets had increased by at least 10% compared to 2020, with a third (33%) reporting an increase of at least 30%.

Special attention has been given to LLMs for their key role in reinventing two key branches of AI, language and vision, with NVIDIA NeMo Megatron a superb example, providing an end-to-end framework from data curation to training, inference, and evaluation. LLMs are trained on massive amounts of data and enable learning from text, with applications spanning text summarization, real-time content generation, customer service chatbots, and Q&A for conversational AI interfaces, as summarized below:

Main advantages of LLMs

1) Summarization and paraphrasing (finding a better way of saying it)

– Concrete examples: newsletters, web search engines, media monitoring

2) Classification (is it a request or a question?)

– Real-world examples: supply chain order queries

– Bank queries sent to agents

3) Semantic similarity (has this been answered before?)

– Real-world examples: credit scoring, clinical trial matching

– Modular FAQ scoring for helplines

The size and complexity of LLMs continue to accelerate. Traditionally, training a big language model could be compared to the sport of bodybuilding – you have to eat a lot (big data) and train a lot (big model)! Indeed, deep learning is more dependent on computing power than many other fields because models have more parameters than data points. Other pain points in LLM development include time, expense, the level of deep technical expertise required, distributed infrastructure, and the need for a comprehensive approach. But these challenges are met head-on with NVIDIA NeMo Megatron, upgraded to help train LLMs more efficiently, making them more powerful and more applicable across a range of different scenarios. Let’s explore all the key developments!

Looking back to the pioneering research paper “Attention Is All You Need,” the scale, speed, and trajectory of innovation are clear, with NVIDIA now established as a leading contributor and indeed a pioneer in the field of AI. As someone who has worked in research and development myself, most recently as a principal investigator on a “breakthrough” study of digital transformation, I have experienced first-hand the challenge of training LLMs on large datasets on supercomputers, which can take weeks, and sometimes even months. What if we could speed up training by up to 30% – or about ten days of development time?

NVIDIA’s NeMo Megatron updates enable exactly that, with the ability to distribute training across as many GPUs as you like, reducing both the memory and compute required for training, and making model deployment more accessible and much faster as well. This translates to a 175B-parameter model trained in 24 days instead of 34 using 1,024 NVIDIA A100 Tensor Core GPUs.
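For context, here is the simple arithmetic linking those training times to the “up to 30%, or about ten days” figure quoted above (nothing here beyond the 34-day and 24-day numbers already stated):

```latex
\text{days saved} = 34 - 24 = 10, \qquad
\text{reduction} = \frac{10}{34} \approx 0.294 \approx 30\%
```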

[Figure: Amount of activation memory required]

This development also provides companies with exciting opportunities to offer customized GPT-3 models “as a service”, for example a bespoke, industry-specific LLM. I also believe it can help advance the ability of LLMs to interact with one another – the “waterfall perspective” described here – and to better plan and reason about change through pattern recognition, an issue highlighted in other breakthrough research. So how did this innovation come to fruition? The advance is based on two techniques and a hyperparameter tool:

Sequence parallelism (SP)

Reduces activation memory requirements beyond the usual tensor and pipeline parallelism methods by noticing that the previously unparallelized parts of transformer layers (such as LayerNorm and dropout) are independent along the sequence dimension, so their activations can be split across GPUs rather than replicated on each one – a sketch of the idea follows below.
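To make this a little more concrete, here is a minimal, framework-agnostic sketch in plain Python/NumPy – an illustration of the concept, not NVIDIA’s implementation. Under tensor parallelism alone, operations such as LayerNorm and dropout keep a full copy of their activations on every GPU in the group; because those operations act on each token independently, the activation can instead be sharded along the sequence dimension so each GPU holds only 1/tp_size of it. All sizes and names below are illustrative.

```python
import numpy as np

# Illustrative sizes for a GPT-3-style layer (made up for this example)
seq_len, batch, hidden = 2048, 4, 12288
tp_size = 8  # tensor-parallel group size

# Without sequence parallelism: every GPU in the tensor-parallel group
# keeps the full LayerNorm/dropout activation of shape [seq, batch, hidden].
full_activation = np.zeros((seq_len, batch, hidden), dtype=np.float16)

# With sequence parallelism: the same activation is split along the
# sequence dimension, so each GPU stores only its own 1/tp_size slice.
# LayerNorm and dropout are per-token, so the slices are independent and
# no extra math is needed; what changes in practice is the communication
# pattern around the sharded region.
shards = np.split(full_activation, tp_size, axis=0)
per_gpu_activation = shards[0]  # what a single GPU would actually hold

print(f"full activation per GPU : {full_activation.nbytes / 2**20:8.1f} MiB")
print(f"sharded activation      : {per_gpu_activation.nbytes / 2**20:8.1f} MiB "
      f"({tp_size}x smaller for these layers)")
```

The compute itself is unchanged in this scheme; the saving comes purely from not replicating those activations on every GPU in the tensor-parallel group.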

Selective activation recomputation (SAR) and the hyperparameter tool

Selects activations with high memory requirements but low computational cost to recalculate when memory constraints are too tight, thus avoiding the inefficiency of full activation recomputation (see the sketch below). The hyperparameter tool introduced in NeMo Megatron automatically identifies optimal training and inference configurations, enabling setups with the highest model throughput or lowest latency during inference, and eliminating the time needed to find an optimal design – all without any code changes required.
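To illustrate the “selective” part, here is a hedged sketch in PyTorch using the generic torch.utils.checkpoint utility rather than NeMo Megatron’s internal machinery: only the attention core, whose softmax and dropout activations are large but cheap to recompute, is recomputed in the backward pass, while the activations of the FLOP-heavy linear projections are kept. The module and parameter names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

class SelectiveRecomputeBlock(nn.Module):
    """Toy attention block illustrating selective activation recomputation.

    Only `attention_core` (large softmax/dropout activations, cheap FLOPs)
    is recomputed during backward; the linear projections (expensive FLOPs,
    comparatively small activations) keep their stored activations.
    """

    def __init__(self, hidden: int = 512, heads: int = 8):
        super().__init__()
        self.qkv = nn.Linear(hidden, 3 * hidden)
        self.proj = nn.Linear(hidden, hidden)
        self.heads = heads
        self.hidden = hidden

    def attention_core(self, q, k, v):
        # Memory-hungry but computationally light: scores + softmax + dropout.
        scores = torch.matmul(q, k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
        probs = F.dropout(torch.softmax(scores, dim=-1),
                          p=0.1, training=self.training)
        return torch.matmul(probs, v)

    def forward(self, x):  # x: [batch, seq, hidden]
        b, s, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        shape = (b, s, self.heads, self.hidden // self.heads)
        q, k, v = (t.reshape(shape).transpose(1, 2) for t in (q, k, v))
        # Recompute ONLY the attention core in the backward pass instead of
        # storing its intermediate activations (selective recomputation).
        ctx = checkpoint(self.attention_core, q, k, v, use_reentrant=False)
        ctx = ctx.transpose(1, 2).reshape(b, s, self.hidden)
        return self.proj(ctx)

block = SelectiveRecomputeBlock()
out = block(torch.randn(2, 128, 512, requires_grad=True))
out.sum().backward()  # the attention core re-runs here to rebuild activations
```

The trade-off is a small amount of extra compute in the backward pass in exchange for a large reduction in stored activation memory, which is far cheaper than recomputing the entire layer.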

[Figure: Amount of computation overhead]

And with my personal passion for democratizing access to AI, it’s significant to note that BLOOM – which stands for BigScience Large Open-science Open-access Multilingual Language Model and is the “largest open-science, open-access multilingual language model” – was trained using Megatron-DeepSpeed. This was supported by the work of over 1,000 volunteer researchers under the BigScience project, coordinated by Hugging Face and with funding from the French government, and it already allows the generation of text in 46 languages and 13 programming languages! With Megatron’s performance updates now available, alongside the business benefits, I hope this will also help advance change in the culture of AI development and support the democratization of access to cutting-edge AI for researchers around the world. Exciting times indeed!

Final Thoughts

NeMo Megatron’s advancements will dramatically speed up future computations and improve results, making LLM training and inference both easier and reproducible across a wide range of GPU cluster configurations – I’m excited to see all the new applications that this will bring, and in the meantime the latest technical blog on all the changes can be viewed here. You can also explore Early Access options and freely test the hands-on lab via NVIDIA too!

And finally, don’t miss the opportunity to connect with AI developers and innovators at #GTC22, September 19-22, and help shape the future of artificial intelligence, large language models, computer graphics, accelerated computing and more. You can register for free with my unique code for a chance to win an RTX 3080 Ti – click here!

Thanks for reading – all feedback is welcome, Sally

About the Author

Professor Sally Eaves is a highly experienced CTO, Advanced Technology Professor and Global Digital Transformation Strategy Advisor specializing in the application of emerging technologies, including AI, 5G, cloud, security and IoT disciplines, for business and IT transformation, alongside large-scale social impact.

An international speaker and author, Sally was the first recipient of the Frontier Technology and Social Impact Award, presented at the United Nations, and has been described as the “torchbearer of ethical technology”, founding Aspirational Futures to improve inclusion, diversity and belonging in the technology space and beyond. Sally is also the Chair of Global Cyber Trust at GFCYBER.
