AI Foundations

Content

Why am I writing this?
Foundational Models
Model Development
AI Engineering
Agents and RAG
Post Ramble

Why am I writing this?

It's hard to avoid AI, whether it's our search engines showing AI results or our email agents summarizing emails before we can read the actual message.

Good or bad, we all have our opinions. Regardless, it's happening, and most of us will use it whether we 'like' it or not. Personally, I'm pretty neutral: technology keeps developing and I always try to stay on top of it.

This time though, I feel like I've been using AI without understanding the underlying principles. I'm naturally curious and, to the dismay of some of my colleagues and friends, always question everything. How does that work? Why would we do that? What would it help improve?

I believe that having a basic understanding of anything we use benefits us. How does a computer work? How can planes lift off and fly? Why are we humans susceptible to marketing strategies? Answering these questions helps us understand more about society, helps business and, if nothing else, lets you drop fun facts during conversation.

What about AI as we see and use it today? Whilst I've worked on predictive analytics in the past, I feel like I did not fully understand the underlying principles of how AI works. So I set out to overcome this frustrating issue and landed on a book I'd highly recommend (and have added to our library at work): AI Engineering by Chip Huyen. She is a great resource when it comes to AI, so please do check out her blog and further reading materials on her website: https://huyenchip.com/

The book has (finally) cured me of that itch of not knowing, and I've used this opportunity to solidify my learning by writing it up in this blog.

Foundational Models

Foundational models are pre-trained on vast amounts of data so that they can be used for many different tasks. Think: writing and translation, image analysis, reasoning, etc.

What makes a foundational model other than the vast amount of data going into it?

The two underlying concepts are language models & self-supervision.

Language models can process and predict text. The (text) data going into the language model is broken up into smaller chunks referred to as tokens. A relatively small set of these tokens can then be combined to generate a large number of distinct words.

A token can be a character, part of a word or a whole word. Different models tokenize sentences differently, and can use the tokens either to finish a sentence (autoregressive language model) or to predict missing sections of a sentence (masked language model).

an example of tokenization of a sentence
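To make the three kinds of token concrete, here is a minimal sketch in Python. It is not how any particular model tokenizes text: the function names and the tiny subword vocabulary are made up for illustration, and real tokenizers learn their vocabulary from data.

```python
import re


def word_tokens(text):
    """Split text into lowercase words and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def char_tokens(text):
    """Treat every character (including spaces) as a token."""
    return list(text)


def subword_tokens(word, vocab):
    """Greedy longest-match split of one word into known pieces.

    `vocab` is a hypothetical, hand-picked set of subword pieces.
    Anything not covered falls back to single-character tokens.
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate piece until it is in the vocabulary
        # or only one character is left.
        while end > start + 1 and word[start:end] not in vocab:
            end -= 1
        pieces.append(word[start:end])
        start = end
    return pieces


sentence = "Tokenizers are unbelievable!"
print(word_tokens(sentence))
print(char_tokens("AI"))
print(subword_tokens("unbelievable", {"un", "believ", "able"}))
```

The subword split shows why a modest vocabulary goes a long way: pieces like "un" and "able" can be reused across many different words.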

The suggestions that the language models make are based on the probability of finding words (tokens) in a certain sequence. This chance-based characteristic of the models is what we get to work with; great, exciting and sometimes frustrating.
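The probability idea can be sketched with a toy bigram model, which only looks at the previous token to guess the next one. This is a deliberate simplification (real language models condition on far more context), and the corpus and function names below are invented for the example.

```python
import random
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def next_token_probs(counts, token):
    """Turn the follower counts for `token` into probabilities."""
    followers = counts[token]
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()}


def generate(counts, start, length, seed=0):
    """Sample a continuation one token at a time."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        probs = next_token_probs(counts, out[-1])
        if not probs:  # no known follower, so stop early
            break
        tokens, weights = zip(*probs.items())
        out.append(rng.choices(tokens, weights=weights)[0])
    return out


corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)
print(next_token_probs(model, "the"))  # "cat" is twice as likely as "mat"
print(" ".join(generate(model, "the", 5)))
```

Because the next token is sampled rather than looked up, changing the seed can produce a different sentence from the same starting word, which is the chance-based behaviour described above.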

Self-supervision is what makes language models stand out from other machine learning models. It refers to the model's ability to label the input data itself. In other machine learning models, such as a customer churn model (of all our existing customers, who is likely to leave us?), we need to label our training data to say who did leave us in the past and who didn't.

Language models are capable of using their tokens to label the data themselves, e.g. which token follows another, based on the start and end of a sentence. Data labeling is time consuming and therefore expensive, especially given the vast amount of data available to us. Language models overcome that limitation and can utilize a lot more data, hence Large Language Models, although what counts as 'large' keeps shifting as technology and capabilities keep expanding.
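Here is a small sketch of what "labeling the data itself" can look like: the training examples are cut straight out of the sentence, with no human annotation. The function names and the `[MASK]` placeholder are assumptions for illustration rather than any specific library's format.

```python
def next_token_examples(tokens):
    """Autoregressive style: the label is simply the token that comes next."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def masked_examples(tokens, mask="[MASK]"):
    """Masked style: hide one token at a time; the hidden token is the label."""
    examples = []
    for i, target in enumerate(tokens):
        masked = tokens[:i] + [mask] + tokens[i + 1:]
        examples.append((masked, target))
    return examples


tokens = ["I", "love", "street", "food"]

for context, label in next_token_examples(tokens):
    print(context, "->", label)

for context, label in masked_examples(tokens):
    print(context, "->", label)
```

A four-token sentence already yields three next-token examples and four masked examples, all for free. The churn model, by contrast, needs someone to record who actually left.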

So how did these LLMs evolve into Foundational Models?

Traditionally, AI research focused on training models on one specific data type: text, image, video, audio, etc. This also meant their purpose was specific to one modality. Once models started to get trained on more than one data modality, we gained the option to input a variety of tokens (text and images), which also meant a model could output more than one modality.

Being multimodal doesn't necessarily make a model a Foundational Model, but it allowed models to become much broader in scope, giving us more opportunities to build on top of them for different needs. This and all the above is what makes a model a Foundational Model.

Model Development

To build a foundational model you start with pre-training, which is an incredibly resource-intensive step. As with other machine learning methods, we start from a base set of parameters, and in the LLM case the model is trained for text completion.

Post-training (carried out before the foundational model is released) and finetuning (typically carried out by the AI engineer) are less resource intensive than pre-training. These steps involve optimizing the model for conversation rather than text completion (supervised finetuning) and aligning the model's responses with human preference (reinforcement learning techniques). The model can be great at completing sentences, but end users are after the whole response/conversation.

To do all the above, you need data. Retrieving, generating, curating and annotating data comes as second nature in a data analytics career. For AI it differs slightly: the data is less structured and the end goal isn't necessarily more structure. There is still overlap between the principles, though. It requires:

Quality
Deduplication
Tokenization
Context Retrieval
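Two of the principles above, quality and deduplication, can be sketched in a few lines. This is a minimal illustration under simple assumptions: duplicates are only caught when they match exactly after normalizing case and whitespace, and "quality" is reduced to a made-up minimum word count.

```python
import hashlib
import re


def normalize(text):
    """Collapse whitespace and lowercase so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(docs):
    """Keep the first occurrence of each document, comparing normalized hashes."""
    seen, unique = set(), []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def passes_quality(doc, min_words=3):
    """Hypothetical quality rule: drop documents that are too short."""
    return len(doc.split()) >= min_words


raw_docs = [
    "Sales grew in the second quarter.",
    "sales grew   in the second quarter.",  # same text, different case and spacing
    "Click here",                           # too short to be useful
    "Churn dropped after the pricing change.",
]

clean_docs = [doc for doc in deduplicate(raw_docs) if passes_quality(doc)]
print(clean_docs)
```

Near-duplicates (a sentence with one word changed) slip straight through this check, which is one reason curating data at scale is harder than it looks.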

Knowing the data that went into the training of the model gives us insights about the potential strengths and weaknesses of the model. Will it know enough about the domain I'm trying to work with? Are there any biases that might appear?

Lastly, there is inference optimization, where speed and costs are tackled. Because foundational models generate tokens sequentially (autoregression), you can imagine that receiving a long answer back from the model could take a while. The goal is to lower the compute required to get an output, and several strategies exist to do so. With industries now directly exposed to these costs as they are incurred, this remains an interesting optimization angle.
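A rough back-of-the-envelope model shows why sequential generation makes long answers slow: the total wait is roughly the time until the first token appears plus a fixed cost for every token after that. The numbers below are hypothetical placeholders, not measurements of any real model.

```python
def estimated_latency(output_tokens, time_to_first_token_s, time_per_output_token_s):
    """Rough latency estimate for sequential (autoregressive) generation.

    Assumes every output token costs the same amount of time, which is a
    simplification; real systems vary with load, batching and hardware.
    """
    return time_to_first_token_s + output_tokens * time_per_output_token_s


# Hypothetical figures: 0.5 s before the first token, 0.02 s per token after that.
short_answer = estimated_latency(50, 0.5, 0.02)
long_answer = estimated_latency(500, 0.5, 0.02)

print(f"50 tokens:  about {short_answer:.1f} s")
print(f"500 tokens: about {long_answer:.1f} s")
```

Under these assumed figures, the 500-token answer takes about 10.5 seconds against about 1.5 seconds for the 50-token one, so shaving time off each token pays off most on long responses.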

AI Engineering

AI engineering is the next step in the current AI landscape, where applications are built on top of foundational models.

The results we see today range from AI integrations within existing hardware, software and operating systems to standalone applications/products. That includes, to my detriment, my laptop's keyboard, which now has an AI key replacing my right Ctrl key, but I digress.

Luckily I have remapped many mouse keys before, so I reverted this key back to right Ctrl.

There are several techniques that can enhance a general foundational model from being reasonably capable to highly effective at delivering the desired outcomes.

Having a base understanding of what these techniques involve allowed me to have better conversations, discussions and, to be fair, more trust in applications that are being built.

Selecting the foundational model that will underlie the application consists of evaluating the candidate model(s) against the purpose of the application being developed. It sounds straightforward, but in reality it seems much harder to me than with more traditional machine learning techniques, given the broad scope of what LLMs are capable of. Whilst my knowledge of LLM evaluation feels shallow compared to what's out there, it feels like a very obvious and crucial step that needs continuous testing.
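At its simplest, evaluating candidate models against a purpose can look like the sketch below: a fixed set of test prompts with expected answers, and a score per model. The two "models" here are hard-coded stand-ins (in practice they would call a real model), and exact-match scoring is only one crude choice among many.

```python
def exact_match_score(model, test_cases):
    """Fraction of prompts where the model's answer matches the expected one."""
    hits = sum(
        1
        for prompt, expected in test_cases
        if model(prompt).strip().lower() == expected.strip().lower()
    )
    return hits / len(test_cases)


def rank_models(models, test_cases):
    """Score every candidate and return them best first."""
    scores = {name: exact_match_score(fn, test_cases) for name, fn in models.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical stand-ins for real model calls.
def model_a(prompt):
    answers = {"2 + 2 =": "4", "Capital of France?": "Paris"}
    return answers.get(prompt, "")


def model_b(prompt):
    return "4"  # answers "4" no matter what is asked


test_cases = [("2 + 2 =", "4"), ("Capital of France?", "paris")]
candidates = {"model_a": model_a, "model_b": model_b}

print(rank_models(candidates, test_cases))
```

Keeping the test cases in one place also makes the continuous part practical: the same set can be re-run whenever a model or prompt changes.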

Prompt Engineering is a way to influence how the model operates without changing the weights of the model itself. As an end user, you have likely done this before when using AI. As a developer, it allows you to do the same and get more desirable outcomes and performance.
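As a minimal sketch of what this looks like on the developer side, the function below assembles a prompt from an instruction, a couple of worked examples and the user's input. The "Input:/Output:" layout and the example texts are assumptions chosen for illustration; the model's weights are never touched, only the text it receives.

```python
def build_prompt(instruction, examples, user_input):
    """Combine an instruction, a few worked examples and the new input into one prompt."""
    lines = [instruction, ""]
    for example_input, example_output in examples:
        lines += [f"Input: {example_input}", f"Output: {example_output}", ""]
    # End on an open "Output:" so the model completes the answer.
    lines += [f"Input: {user_input}", "Output:"]
    return "\n".join(lines)


prompt = build_prompt(
    instruction="Classify the sentiment of each review as positive or negative.",
    examples=[
        ("The dashboard loads instantly, love it.", "positive"),
        ("The export keeps failing.", "negative"),
    ],
    user_input="Setup was painless and support was quick.",
)
print(prompt)
```

Swapping the instruction or the examples changes the model's behaviour for this task without any retraining, which is what makes prompts such a cheap first lever to pull.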

The AI interface is the last but crucial piece, and it determines a large part of the user experience. Traditional chat interfaces are, broadly speaking, easily accessible and usable for most users. Embedding these chats alongside other (visual) interfaces provides better feedback loops between the AI and other elements of the application. This mix of AI and user input has to be faster, easier and/or better than separate applications before it can be considered added value over two independent tools. The final user interface can be designed with visual best practices and ease of access in mind. I'd hope to see more of that, but I have yet to experience an intuitive application with a great user interface.

Agents and RAG

Before I wrap this up, I'd like to cover two terms that come up a lot when reading about the development of new applications: AI agents and retrieval augmented generation (RAG).

They are two common extensions of foundational models that provide additional context without changing the model itself. For RAG, see it as the capability to point the model towards external/secondary information (documents, data, images, the internet, etc.) and then use this to create more accurate outputs based on the input request.
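The retrieval part of RAG can be sketched without any model at all: find the documents most relevant to the question and paste them into the prompt. The word-overlap scoring, the sample documents and the prompt wording below are simplifying assumptions; real systems typically use more capable search methods.

```python
import re


def words(text):
    """Lowercased set of words in a piece of text."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, documents, top_k=2):
    """Return up to `top_k` documents sharing the most words with the query."""
    query_words = words(query)
    scored = [(len(query_words & words(doc)), doc) for doc in documents]
    scored = [pair for pair in scored if pair[0] > 0]  # drop unrelated documents
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_rag_prompt(query, documents, top_k=2):
    """Put the retrieved documents in front of the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, top_k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


documents = [
    "Our refund policy allows returns within 30 days.",
    "The office is closed on public holidays.",
    "Refund requests are handled by the support team.",
]

print(build_rag_prompt("How do I get a refund?", documents))
```

The model itself is unchanged; it simply receives a prompt that already contains the two refund-related documents and none of the unrelated one.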

For agents, I see them as a system of RAGs. They operate in an environment, like the Power BI Desktop software or a larger Power BI Service / Fabric environment. Within the environment they have access to tools to carry out a set of actions. In Power BI Desktop, that could be building a DAX calculation (DAX is the language used to build analytical expressions, like a moving average of sales). Within Fabric, it could be building a Notebook with T-SQL to create data tables, as well as generating and publishing a semantic model to a workspace for other people to use.
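The "environment plus tools" idea can be sketched as a lookup table of functions and something that decides which one to call. Everything here is a stand-in: the tools are plain Python functions rather than Power BI or Fabric actions, and `stub_planner` is a keyword check playing the role a model would play in a real agent, which would usually also loop over several steps.

```python
def moving_average(values, window):
    """Average of each run of `window` consecutive values."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def total(values):
    """Sum of all values."""
    return sum(values)


# The tools the agent is allowed to use in this toy environment.
TOOLS = {"moving_average": moving_average, "total": total}


def stub_planner(request):
    """Stand-in for the model: choose a tool and its arguments from the request."""
    if "moving average" in request.lower():
        return "moving_average", {"window": 3}
    return "total", {}


def run_agent(request, sales, planner=stub_planner, tools=TOOLS):
    """Ask the planner for a tool, run it on the data and report what happened."""
    tool_name, arguments = planner(request)
    if tool_name not in tools:
        raise ValueError(f"Unknown tool: {tool_name}")
    result = tools[tool_name](sales, **arguments)
    return {"tool": tool_name, "result": result}


sales = [10, 20, 30, 40]
print(run_agent("Give me the moving average of sales", sales))
print(run_agent("What were total sales?", sales))
```

Restricting the agent to a named set of tools is a design choice worth noticing: it bounds what the agent can do inside the environment, whatever the planner decides.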

These are my fundamental understandings and there is a lot more to consider when implementing either of these techniques, so I recommend reading up on this further if you are interested!

Post Ramble

I've purposely not gone deep into application development, skipped infrastructure completely, and mainly stuck to the (foundational) model to keep it concise. All three have to work together in order to achieve success, and thus all three create opportunity for work. Don't think application development is the only opportunity out there; if anything, I think it's the most volatile one.

On that note, I appreciate you making it all the way to the end of this post and I hope you learned something new.

Author:

Robbin Vernooij
