
Debugging the Dreaded NaN

You are training your latest AI model, anxiously watching as the loss steadily decreases when suddenly — boom! Your logs are flooded with NaNs (Not a Number) — your model is irreparably corrupted and you’re left staring at your screen in despair. To make matters worse, the NaNs don’t appear consistently. Sometimes your model trains just fine; other times, it fails inexplicably. Sometimes it will crash immediately, sometimes after many days of training.

NaNs in Deep Learning workloads are amongst the most frustrating issues to encounter. And because they often appear sporadically — triggered by a specific combination of model state, input data, and stochastic factors — they can be incredibly difficult to reproduce and debug.

Given the considerable cost of training AI models and the potential waste caused by NaN failures, it is recommended to have dedicated tools for capturing and analyzing NaN occurrences. In a previous post, we discussed the challenge of debugging NaNs in a TensorFlow training workload. We proposed an efficient scheme for capturing and reproducing NaNs and shared a sample TensorFlow implementation. In this post, we adopt and demonstrate a similar mechanism for debugging NaNs in PyTorch workloads. The general scheme is as follows:

1. Save a copy of the training input batch.
2. Check the gradients for NaN values. If any appear, save a checkpoint with the current model weights before the model is corrupted. Also save the input batch and, if necessary, the stochastic state.
3. Discontinue the training job.
4. Reproduce and debug the NaN occurrence by loading the saved experiment state.

Although this scheme can be easily implemented in native PyTorch, we will take the opportunity to demonstrate some of the conveniences of PyTorch Lightning — a powerful open-source framework designed to streamline the development of machine learning (ML) models. Built on PyTorch, Lightning abstracts away many of the boilerplate components of an ML experiment, such as training loops, data distribution, logging, and more, enabling developers to focus on the core logic of their models.

To implement our NaN capturing scheme, we will use Lightning’s callback interface — a dedicated structure that enables inserting custom logic at specific points during the flow of execution.

Importantly, please do not view our choice of Lightning or any other tool or technique that we mention as an endorsement of its use. The code that we will share is intended for demonstrative purposes — please do not rely on its correctness or optimality.

Many thanks to Rom Maltser for his contributions to this post.

To implement our NaN capturing solution, we create a NaNCapture Lightning callback. The constructor receives a directory path for storing/loading checkpoints and sets up the NaNCapture state. We also define utilities for checking for NaNs, storing checkpoints, and halting the training job.

import os
import torch
from copy import deepcopy
import lightning.pytorch as pl


class NaNCapture(pl.Callback):

    def __init__(self, dirpath: str):
        # path to checkpoint directory
        self.dirpath = dirpath

        # set to True when a NaN is identified
        self.nan_captured = False

        # stores a copy of the last batch
        self.last_batch = None
        self.batch_idx = None

    @staticmethod
    def contains_nan(tensor):
        return torch.isnan(tensor).any().item()
        # alternatively, check for finite values
        # return not torch.isfinite(tensor).all().item()

    @staticmethod
    def halt_training(trainer):
        trainer.should_stop = True
        # communicate the stop command to all other ranks
        trainer.strategy.reduce_boolean_decision(trainer.should_stop, all=False)

    def save_ckpt(self, trainer):
        os.makedirs(self.dirpath, exist_ok=True)
        # include trainer.global_rank in the filename to avoid conflicts
        filename = f"nan_checkpoint_rank_{trainer.global_rank}.ckpt"
        full_path = os.path.join(self.dirpath, filename)
        print(f"saving ckpt to {full_path}")
        trainer.save_checkpoint(full_path, False)

We begin by implementing the on_train_batch_start hook to store a copy of each input batch. In case of a NaN event, this batch will be stored in the checkpoint.
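A minimal implementation of this hook might look like the following. It mirrors the logic that appears in the full callback listing later in the post (the handling of the stochastic state is added in a later section):

def on_train_batch_start(self, trainer, pl_module, batch, batch_idx):
    if not self.nan_captured:
        # keep a copy of the current input batch so that it can be
        # stored in the checkpoint if a NaN is detected later on
        self.last_batch = deepcopy(batch)
        self.batch_idx = batch_idx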

Callback Function: on_before_optimizer_step.

Next we implement the on_before_optimizer_step hook. Here, we check for NaN entries in all of the gradient tensors. If found, we store a checkpoint with the uncorrupted model weights and halt the training.

Python"> def on_before_optimizer_step(self, trainer, pl_module, optimizer): if not self.nan_captured: # Check if gradients contain NaN grads = [[website] for p in pl_module.parameters() if [website] is not None] all_grads = [website] if self.contains_nan(all_grads): print("nan found") self.save_ckpt(trainer) self.halt_training(trainer).

To enable reproducibility, we include the NaNCapture state in the checkpoint by appending it to the training state dictionary. Lightning provides dedicated utilities for saving and loading a callback state:

def state_dict(self):
    d = {"nan_captured": self.nan_captured}
    if self.nan_captured:
        d["last_batch"] = self.last_batch
    return d

def load_state_dict(self, state_dict):
    self.nan_captured = state_dict.get("nan_captured", False)
    if self.nan_captured:
        self.last_batch = state_dict["last_batch"]

We have described how our NaNCapture callback can be used to store the training state that resulted in a NaN, but how do we reload this state in order to reproduce the issue and debug it? To accomplish this, we leverage Lightning’s dedicated data loading class, LightningDataModule.

DataModule Function: on_before_batch_transfer.

In the code block below, we extend the LightningDataModule class to allow injecting a fixed training input batch. This is achieved by overriding the on_before_batch_transfer hook, as shown below:

from lightning.pytorch import LightningDataModule


class InjectableDataModule(LightningDataModule):

    def __init__(self):
        super().__init__()
        self.cached_batch = None

    def set_custom_batch(self, batch):
        self.cached_batch = batch

    def on_before_batch_transfer(self, batch, dataloader_idx):
        if self.cached_batch:
            return self.cached_batch
        return batch

The final step is modifying the on_train_start hook of our NaNCapture callback to inject the stored training batch into the LightningDataModule.

def on_train_start(self, trainer, pl_module):
    if self.nan_captured:
        datamodule = trainer.datamodule
        datamodule.set_custom_batch(self.last_batch)

In the next section we will demonstrate the end-to-end solution using a toy example.

To test our new callback, we create a resnet50-based image classification model with a loss function deliberately designed to trigger NaN occurrences.

Instead of using the standard CrossEntropy loss, we compute binary_cross_entropy_with_logits for each class independently and divide the result by the number of samples belonging to that class. Inevitably, we will encounter a batch in which one or more classes are missing, leading to a divide-by-zero operation, resulting in NaN values and corrupting the model.
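To see concretely how a missing class produces NaN values rather than an explicit error, consider a minimal illustration: when no samples of a class appear in the batch, both the summed per-class loss and the class count are zero, and zero divided by zero is undefined in floating-point arithmetic.

import torch

# summed BCE loss and sample count for a class absent from the batch
loss = torch.tensor(0.0)
count = torch.tensor(0)

print(loss / count)  # tensor(nan): 0/0 silently produces NaN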

The implementation below follows Lightning’s introductory tutorial.

import lightning.pytorch as pl
import torch
import torchvision
import torch.nn.functional as F

num_classes = 20


# define a lightning module
class ResnetModel(pl.LightningModule):

    def __init__(self):
        """Initializes a new instance of the ResnetModel class."""
        super().__init__()
        self.model = torchvision.models.resnet50(num_classes=num_classes)

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_nb):
        x, y = batch
        outputs = self(x)
        # uncomment for the default loss
        # return F.cross_entropy(outputs, y)

        # calculate binary_cross_entropy for each class individually
        losses = []
        for c in range(num_classes):
            count = torch.count_nonzero(y == c)
            masked = torch.where(y == c, 1., 0.)
            loss = F.binary_cross_entropy_with_logits(
                outputs[..., c],
                masked,
                reduction='sum'
            )
            mean_loss = loss / count  # could result in NaN
            losses.append(mean_loss)
        total_loss = torch.stack(losses).mean()
        return total_loss

    def configure_optimizers(self):
        # optimizer reconstructed from context; the original learning rate was lost
        return torch.optim.SGD(self.parameters(), lr=0.1)

We define a synthetic dataset and encapsulate it in our InjectableDataModule class:

import os
import random
from torch.utils.data import Dataset, DataLoader

batch_size = 128
num_steps = 800


# A dataset with random images and labels
class FakeDataset(Dataset):
    def __len__(self):
        return batch_size * num_steps

    def __getitem__(self, index):
        rand_image = torch.randn([3, 224, 224], dtype=torch.float32)
        label = torch.tensor(random.randint(0, num_classes - 1), dtype=torch.int64)
        return rand_image, label


# define a lightning datamodule
class FakeDataModule(InjectableDataModule):

    def train_dataloader(self):
        dataset = FakeDataset()
        return DataLoader(
            dataset,
            batch_size=batch_size,
            num_workers=os.cpu_count(),
            pin_memory=True
        )

Finally, we initialize a Lightning Trainer with our NaNCapture callback and call trainer.fit with our Lightning module and Lightning DataModule.

import time

if __name__ == "__main__":

    # Initialize a lightning module
    lit_module = ResnetModel()

    # Initialize a DataModule
    mnist_data = FakeDataModule()

    # Train the model
    ckpt_dir = "./ckpt_dir"
    trainer = pl.Trainer(
        max_epochs=1,
        callbacks=[NaNCapture(ckpt_dir)]
    )

    ckpt_path = None
    # check if a nan checkpoint exists
    if os.path.isdir(ckpt_dir):
        dir_contents = [os.path.join(ckpt_dir, f) for f in os.listdir(ckpt_dir)]
        ckpts = [f for f in dir_contents if os.path.isfile(f) and f.endswith('.ckpt')]
        if ckpts:
            ckpt_path = ckpts[0]

    t0 = time.perf_counter()
    trainer.fit(lit_module, mnist_data, ckpt_path=ckpt_path)
    print(f"total runtime: {time.perf_counter() - t0}")

After a number of training steps, a NaN event will occur. At this point a checkpoint is saved with the full training state and the training is halted.

When the script is run again, the exact state that caused the NaN will be reloaded, allowing us to easily reproduce the issue and debug its root cause.
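For reference, a Lightning checkpoint is a regular torch-serialized dictionary, so the captured batch can also be inspected manually. The sketch below is illustrative and assumes the default layout of recent Lightning versions, in which callback state is stored under the "callbacks" key, indexed by the callback class name; the exact keys may differ in your setup.

import torch

# load the checkpoint written by the NaNCapture callback
ckpt = torch.load(
    "ckpt_dir/nan_checkpoint_rank_0.ckpt",
    map_location="cpu",
    weights_only=False,  # the stored batch contains arbitrary Python objects
)

# callback state is stored under the "callbacks" key (layout may vary)
nan_state = ckpt["callbacks"]["NaNCapture"]
print(nan_state["nan_captured"])
images, labels = nan_state["last_batch"]
print(images.shape, labels.shape)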

To assess the impact of our NaNCapture callback on runtime performance, we modified our experiment to use CrossEntropyLoss (to avoid NaNs) and measured the average throughput when running with and without the NaNCapture callback. The experiments were conducted on an NVIDIA L40S GPU with a PyTorch Docker image.

Overhead of NaNCapture Callback (by Author).

For our toy model, the NaNCapture callback adds minimal overhead to the runtime performance, a small price to pay for the valuable debugging capabilities it provides.

Naturally, the actual overhead will depend on the specifics of the model and runtime environment.

The solution we have described so far will succeed in reproducing the training state provided that the model does not include any randomness. However, introducing stochasticity into the model definition is often critical for convergence. A common example of a stochastic layer is dropout (torch.nn.Dropout).

You may find that your NaN event depends on the precise state of randomness when the failure occurred. Consequently, we would like to enhance our NaNCapture callback to capture and restore the random state at the point of failure. The random state is determined by a number of libraries. In the code block below, we attempt to capture the full state of randomness:

import os
import torch
import random
import numpy as np
from copy import deepcopy
import lightning.pytorch as pl


class NaNCapture(pl.Callback):

    def __init__(self, dirpath: str):
        # path to checkpoint directory
        self.dirpath = dirpath

        # set to True when a NaN is identified
        self.nan_captured = False

        # stores a copy of the last batch
        self.last_batch = None
        self.batch_idx = None

        # rng state
        self.rng_state = {
            "torch": None,
            "torch_cuda": None,
            "numpy": None,
            "random": None
        }

    @staticmethod
    def contains_nan(tensor):
        return torch.isnan(tensor).any().item()
        # alternatively, check for finite values
        # return not torch.isfinite(tensor).all().item()

    @staticmethod
    def halt_training(trainer):
        trainer.should_stop = True
        trainer.strategy.reduce_boolean_decision(trainer.should_stop, all=False)

    def save_ckpt(self, trainer):
        os.makedirs(self.dirpath, exist_ok=True)
        # include trainer.global_rank in the filename to avoid conflicts
        filename = f"nan_checkpoint_rank_{trainer.global_rank}.ckpt"
        full_path = os.path.join(self.dirpath, filename)
        print(f"saving ckpt to {full_path}")
        trainer.save_checkpoint(full_path, False)

    def on_train_start(self, trainer, pl_module):
        if self.nan_captured:
            # inject the stored batch
            datamodule = trainer.datamodule
            datamodule.set_custom_batch(self.last_batch)

    def on_train_batch_start(self, trainer, pl_module, batch, batch_idx):
        if self.nan_captured:
            # restore the random state
            torch.random.set_rng_state(self.rng_state["torch"])
            torch.cuda.set_rng_state_all(self.rng_state["torch_cuda"])
            np.random.set_state(self.rng_state["numpy"])
            random.setstate(self.rng_state["random"])
        else:
            # capture the current batch
            self.last_batch = deepcopy(batch)
            self.batch_idx = batch_idx

            # capture the current random state
            self.rng_state["torch"] = torch.random.get_rng_state()
            self.rng_state["torch_cuda"] = torch.cuda.get_rng_state_all()
            self.rng_state["numpy"] = np.random.get_state()
            self.rng_state["random"] = random.getstate()

    def on_before_optimizer_step(self, trainer, pl_module, optimizer):
        if not self.nan_captured:
            # check whether any gradient contains NaN values
            grads = [p.grad.view(-1) for p in pl_module.parameters() if p.grad is not None]
            all_grads = torch.cat(grads)
            if self.contains_nan(all_grads):
                print("nan found")
                self.save_ckpt(trainer)
                self.halt_training(trainer)

    def state_dict(self):
        d = {"nan_captured": self.nan_captured}
        if self.nan_captured:
            d["last_batch"] = self.last_batch
            d["rng_state"] = self.rng_state
        return d

    def load_state_dict(self, state_dict):
        self.nan_captured = state_dict.get("nan_captured", False)
        if self.nan_captured:
            self.last_batch = state_dict["last_batch"]
            self.rng_state = state_dict["rng_state"]

Importantly, setting the random state may not guarantee full reproducibility. The GPU owes its power to its massive parallelism. In some GPU operations, multiple threads may read or write concurrently to the same memory locations, resulting in nondeterminism. PyTorch allows for some control over this via torch.use_deterministic_algorithms, but this may impact runtime performance. Additionally, there is a possibility that the NaN event will not be reproduced once this configuration setting is changed. Please see the PyTorch documentation on reproducibility.
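For completeness, a minimal sketch of how one might opt in to deterministic execution is shown below; note that some operations have no deterministic implementation and will raise an error when this mode is enabled.

import os
import torch

# some CUDA libraries require this workspace setting for deterministic runs
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

# prefer deterministic implementations where they exist
torch.use_deterministic_algorithms(True)

# cuDNN-specific settings
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False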

Encountering NaN failures is one of the most discouraging events that can happen in machine learning development. These errors not only waste valuable computation and development resources, but often indicate fundamental issues in the model architecture or experiment design. Due to their sporadic, sometimes elusive nature, debugging NaN failures can be a nightmare.

This post introduced a proactive approach for capturing and reproducing NaN errors using a dedicated Lightning callback. The solution we shared is a proposal which can be modified and extended for your specific use case.

While this solution may not address every possible NaN scenario, it significantly reduces debugging time when applicable, potentially saving developers countless hours of frustration and wasted effort.


Unraveling Large Language Model Hallucinations

In a YouTube video titled Deep Dive into LLMs like ChatGPT, Andrej Karpathy, former Senior Director of AI at Tesla, discusses the psychology of Large Language Models (LLMs) as emergent cognitive effects of the training pipeline. This article is inspired by his explanation of LLM hallucinations and the information presented in the video.

You might have encountered model hallucinations: instances where LLMs generate incorrect, misleading, or entirely fabricated information that appears plausible. These hallucinations happen because LLMs do not “know” facts in the way humans do; instead, they predict words based on patterns in their training data. Early models released a few years ago struggled significantly with hallucinations. Over time, mitigation strategies have improved the situation, though hallucinations haven’t been fully eliminated.

An illustrative example of LLM hallucinations (Image by Author).

Zyler Vance is a completely fictitious name I came up with. When I input the prompt “Who is Zyler Vance?” into the falcon-7b-instruct model, it generates fabricated information. Zyler Vance is not a character in The Cloverfield Paradox (2018) movie. This model, being an older version, is prone to hallucinations.
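For readers who want to reproduce this kind of probe, a minimal sketch using the Hugging Face transformers pipeline is shown below; the generation settings are arbitrary choices, and the output will vary from run to run.

from transformers import pipeline

# load the instruction-tuned Falcon model from the Hugging Face Hub
# (device_map="auto" requires the accelerate package)
generator = pipeline(
    "text-generation",
    model="tiiuae/falcon-7b-instruct",
    device_map="auto",
)

prompt = "Who is Zyler Vance?"
result = generator(prompt, max_new_tokens=100, do_sample=True)
print(result[0]["generated_text"])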

To understand where these hallucinations originate from, you have to be familiar with the training pipeline. Training LLMs typically involves three major stages:

1. Pretraining
2. Post-training: Supervised Fine-Tuning (SFT)
3. Post-training: Reinforcement Learning with Human Feedback (RLHF)

This is the initial stage of the training for LLMs. During pretraining the model is exposed to a huge quantity of very high-quality and diverse text crawled from the internet. Pretraining helps the model learn general language patterns, grammar, and facts. The output of this training phase is called the base model. It is a token simulator that predicts the next word in a sequence.

To get a sense of what a pretraining dataset might look like, you can examine the FineWeb dataset. The FineWeb dataset is fairly representative of what you might see in an enterprise-grade language model. All the major LLM providers like OpenAI, Google, or Meta will have some equivalent internal dataset similar to FineWeb.

As I mentioned before, the base model is a token simulator. It simply samples internet text documents. We need to turn this base model into an assistant that can answer questions. Therefore, the pretrained model is further refined using a dataset of conversations. These conversation datasets contain hundreds of thousands of multi-turn, often very long conversations covering a diverse breadth of topics.

Illustrative human assistant conversations from InstructGPT distribution.

These conversations come from human labelers. Given a conversational context, human labelers write out the ideal response for an assistant in any situation. Later, we take the base model that was trained on internet documents, substitute its dataset with the dataset of conversations, and continue training the model on this new data. This way, the model adjusts rapidly and learns the statistics of how the assistant responds to queries. At the end of training, the model is able to imitate human-like responses.

OpenAssistant/oasst1 is one of the open-source conversation datasets available on Hugging Face. It is a human-generated and human-annotated assistant-style conversation corpus consisting of 161,443 messages in 35 different languages.

Post-training: Reinforcement Learning with Human Feedback.

Supervised Fine-Tuning makes the model capable. However, even a well-trained model can generate misleading, biased, or unhelpful responses. Therefore, Reinforcement Learning with Human Feedback is required to align it with human expectations.

We start with the assistant model, trained by SFT. For a given prompt we generate multiple model outputs. Human labelers rank or score multiple model outputs based on quality, safety, and alignment with human preferences. We use these data to train a whole separate neural network that we call a reward model.

The reward model imitates human scores. It is a simulator of human preferences. It is a completely separate neural network, probably with a transformer architecture, but it is not a language model in the sense that it generates diverse language. It’s just a scoring model.
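Although the exact training details vary between labs, a reward model of this kind is commonly trained with a pairwise (Bradley-Terry style) objective: given a human-preferred response and a rejected one for the same prompt, the model is pushed to score the preferred response higher. The snippet below is an illustrative sketch of that loss, not any particular lab's implementation.

import torch
import torch.nn.functional as F

def pairwise_reward_loss(reward_chosen: torch.Tensor,
                         reward_rejected: torch.Tensor) -> torch.Tensor:
    # encourage the score of the human-preferred response to exceed
    # the score of the rejected response for the same prompt
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# toy scalar scores produced by a reward model for a batch of response pairs
chosen = torch.tensor([1.2, 0.3, 2.0])
rejected = torch.tensor([0.7, 0.9, -0.5])
print(pairwise_reward_loss(chosen, rejected))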

Now the LLM is fine-tuned using reinforcement learning, where the reward model provides feedback on the quality of the generated outputs. So instead of asking a real human, we’re asking a simulated human for their score of an output. The goal is to maximize the reward signal, which reflects human preferences.

Now that we have a clearer understanding of the training process of large language models, we can continue with our discussion on hallucinations.

Hallucinations originate from the Supervised Fine-Tuning stage of the training pipeline. The following is a specific example of three potential conversations that might appear in the training set.

Examples of human-assistant conversations (Image by Author).

As I have shown earlier, this is what human-assistant conversations look like at training time. These conversations are created by human labelers under strict guidelines. When a labeler writes the correct answer for the assistant in each of these cases, either they know this person or they research them on the internet. They then write an assistant response that has the confident tone of an answer.

At test time, if the model is asked about an individual it has not seen during training, it does not simply respond with an acknowledgment of ignorance. Simply put it does not reply with “Oh, I don’t know”. Instead, the model statistically imitates the training set.

In the training set, the questions in the form “Who is X?” are confidently answered with the correct answer. Therefore at the test time, the model replies with the style of the answer and it gives the statistically most likely guess. So it just makes stuff up that is statistically consistent with the style of the answer in its training set.

Our question now is how to mitigate the hallucinations. It is evident that our dataset should include examples where the correct answer for the assistant is that the model does not know about some particular fact. However, these answers must be produced only in instances where the model actually does not know. So the key question is how do we know what the model knows and what it does not? We need to probe the model to figure that out empirically.

The task is to figure out the boundary of the model’s knowledge. Therefore, we need to interrogate the model to figure out what it knows and doesn’t know. Then we can add examples to the training set for the things that the model doesn’t know. The correct response, in such cases, is that the model does not know them.

An example of a training instance where the model doesn’t know the answer to a particular question.

Let’s take a look at how Meta dealt with hallucinations using this concept for the Llama 3 series of models.

In their 2024 paper, “The Llama 3 Herd of Models”, the Llama team at Meta describe how they developed a knowledge-probing technique to achieve this. Their primary approach involves generating data that aligns model generations with subsets of factual data present in the pre-training data. They describe the following procedure for the data generation process:

1. Extract a data snippet from the pre-training data.
2. Generate a factual question about these snippets (context) by prompting Llama 3.
3. Sample responses from Llama 3 to the question.
4. Score the correctness of the generations using the original context as a reference and Llama 3 as a judge.
5. Score the informativeness of the generations using Llama 3 as a judge.
6. Generate a refusal for responses which are consistently informative and incorrect across the generations, using Llama 3. (p. 27)

A simplified sketch of this probing loop is shown below.
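The sketch below illustrates the spirit of this loop in simplified form. The ask_model and judge_is_correct callables are placeholders for calls to Llama 3 as a generator and as a judge; they are not part of any real API.

from typing import Callable, Dict, List

def probe_model_knowledge(
    snippets: List[str],
    ask_model: Callable[[str], str],                    # placeholder: sample the LLM
    judge_is_correct: Callable[[str, str, str], bool],  # placeholder: LLM-as-judge
    num_samples: int = 4,
) -> List[Dict[str, str]]:
    """Collect refusal-style training examples for facts the model gets wrong."""
    refusal_examples = []
    for snippet in snippets:
        question = ask_model(f"Write a factual question about this text:\n{snippet}")
        answers = [ask_model(question) for _ in range(num_samples)]
        verdicts = [judge_is_correct(question, answer, snippet) for answer in answers]
        # if the model is consistently wrong, teach it to decline instead
        if not any(verdicts):
            refusal_examples.append({
                "prompt": question,
                "response": "I'm not sure I know the answer to that.",
            })
    return refusal_examples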

After that, the data generated from the knowledge probe is used to encourage the model to answer only the questions it knows about, and to refrain from answering questions it is unsure about. Implementing this technique has improved the hallucination issue over time.

We have better mitigation strategies than just saying we do not know. We can provide the LLM with an opportunity to generate factual responses and accurately address the question. What would you do if I asked you a factual question that you don’t have an answer to? You could do some research, search the internet to figure out the answer, and then tell me the answer. We can do the same thing with LLMs.

You can think of the knowledge inside the parameters of the trained neural network as a vague recollection of things that the model has seen during pretraining a long time ago. Knowledge in the model parameters is analogous to something in your memory that you read a month ago. You can remember things that you read continuously over time better than something you read rarely. If you don’t have a good recollection of information that you read, you go and look it up. When you look up information, you are essentially refreshing your working memory with it, allowing you to retrieve and discuss it.

We need some equivalent mechanism that allows the model to refresh its memory or recollection of information. We can achieve this by introducing tools for the model. The model can use a web search tool instead of just replying with “I am sorry, I don’t know the answer”. To achieve this we need to introduce special tokens that mark the start and end of a search query, along with a protocol that defines how the model is allowed to use these tokens. In this mechanism, when the model doesn’t know the answer, it has the option to emit the search-start token instead of replying with “I am sorry, I don’t know the answer”. After that, the model emits the search query followed by the search-end token.

Here when the program that is sampling from the model encounters the special token during inference, it will pause the generation process instead of sampling the next token in the sequence. It will initiate a session with the search engine, input the search query into the search engine, and retrieve all the extracted text from the results. Then it will insert that text inside the context window.

The extracted text from the web search is now within the context window that will be fed into the neural network. Think of the context window as the working memory of the model. The data inside the context window is directly accessible by the model. It is directly fed into the neural network. Therefore it is no longer a vague recollection of information. Now, when sampling new tokens, it can very easily reference the data that has been copy-pasted there. Thus, this is a general overview of how these web search tools function.
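Putting the pieces together, the sampling-loop logic might look roughly like the sketch below. The token strings and the web_search helper are placeholders chosen for illustration; the actual tokens and protocol are defined during training.

SEARCH_START, SEARCH_END = "<SEARCH_START>", "<SEARCH_END>"

def generate_with_search(model, prompt, web_search, max_rounds=3):
    # model: callable that continues a text prompt; web_search: callable
    # that returns extracted text for a query -- both are placeholders
    context = prompt
    for _ in range(max_rounds):
        completion = model(context)
        if SEARCH_START not in completion:
            return completion  # ordinary answer, no tool use needed
        # extract the query the model emitted between the special tokens
        query = completion.split(SEARCH_START, 1)[1].split(SEARCH_END, 1)[0]
        results = web_search(query)
        # paste the retrieved text into the context window ("working memory")
        # and let the model continue generating with the new information
        context = context + completion + f"\n{results}\n"
    return model(context)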

An example of a training instance with special tokens. The […] notation indicates the placeholder for the extracted content.

How can we teach the model to correctly use these tools like web search? Again we accomplish this through training sets. We now need enough data and numerous conversations that demonstrate, by example, how the model should use web search. We need to illustrate with examples aspects such as: “What are the settings where you are using the search? What does it look like? How do you start a search?” Because of the pretraining stage, it possesses a native understanding of what a web search is and what constitutes a good search query. Therefore, if your training set contains several thousand examples, the model will be able to understand clearly how the tool works.

Large language model hallucinations are inherent consequences of the training pipeline, particularly arising from the supervised fine-tuning stage. Since language models are designed to generate statistically probable text, they often produce responses that appear plausible but lack a factual basis.

Early models were significantly prone to hallucinations. However, the problem has improved with the implementation of various mitigation strategies. Knowledge-probing techniques and training the model to use web search tools have proven effective in mitigating the problem. Despite these improvements, completely eliminating hallucinations remains an ongoing challenge. As LLMs continue to evolve, mitigating hallucinations to a large extent is crucial to ensuring their reliability as a trustworthy knowledge base.

If you enjoyed this article, connect with me on X (formerly Twitter) for more insights.


IBM Granite 3.2 adds Enhanced Reasoning to its AI mix

In its latest addition to its Granite family of large language models (LLMs), IBM has unveiled Granite 3.2. This new release focuses on delivering small, efficient, practical artificial intelligence (AI) solutions for businesses.

IBM has continued to update its Granite LLM line at a rapid rate. Its last release, Granite 3.1, appeared at the end of 2024. That version was essentially an incremental update. This new model, however, adds experimental chain-of-thought (CoT) reasoning capabilities to its bag of tricks.


CoT reasoning is an advanced AI technique that enables LLMs to break down complex problems into logical steps. This process is meant to imitate human-like reasoning processes. In theory, this approach significantly enhances an LLM's ability to handle tasks requiring multi-step reasoning, calculation, and decision-making.

In particular, IBM's CoT implementation uses a Thought Preference Optimization (TPO) framework that enhances reasoning across a broad spectrum of instruction-following tasks. Unlike traditional reinforcement-learning approaches focused mainly on logic-driven tasks, TPO allows for improved reasoning performance without sacrificing general task effectiveness. This approach helps mitigate the performance trade-offs commonly seen in other models that specialize in reasoning.

So, what does this advance mean for you and me? IBM explained that if you think about giving an AI chatbot a prompt, a process called "prompt chaining", you get a specific answer. For example, with prompt chaining the question "What color is the sky?", you should get the answer "Blue."

"However, if asked to explain 'Why is the sky blue?' using CoT prompting, the AI would first define what 'blue' means (a primary color), then deduce that the sky appears blue due to the absorption of other colors by the atmosphere. This response demonstrates the AI's ability to construct a logical argument," or the appearance that the LLM is reasoning its way to an answer.


CoT is available in the Granite 8B and 2B versions. Developers can toggle reasoning on or off programmatically. This option enables businesses to optimize computational resources based on task complexity. After all, sometimes you want to know what the sky is like without any scientific details. This approach, IBM asserts, enables the 8B model to rival the performance of much larger models, such as Claude [website] Sonnet and GPT-4o on complex mathematical reasoning tasks.
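As a rough illustration of what toggling the reasoning mode might look like with the Hugging Face transformers library: the model name and the thinking flag below reflect IBM's published examples at the time of writing and may change between releases, so treat this as a sketch rather than a reference.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-3.2-8b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Why is the sky blue?"}]

# the `thinking` flag is passed through to the chat template to toggle
# the chain-of-thought mode on or off (assumed name; check the model card)
input_ids = tokenizer.apply_chat_template(
    messages,
    thinking=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True))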

IBM has also introduced a new two-billion-parameter Vision Language Model (VLM), specifically designed for document-understanding tasks. This development is not, as you might first think, a graphics function. Instead, the VLM is meant to improve Granite's document-understanding abilities. IBM used its open-source Docling toolkit to process 85 million PDFs and generated 26 million synthetic question-answer pairs to enhance the VLM's ability to handle complex document-heavy workflows.
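For context, Docling exposes a simple Python API for converting documents; a minimal sketch, assuming the current open-source Docling package, looks something like this:

from docling.document_converter import DocumentConverter

# convert a local PDF (or a URL) into a structured document representation
converter = DocumentConverter()
result = converter.convert("annual_report.pdf")  # example file name

# export the parsed content, e.g. as Markdown, for downstream Q&A generation
print(result.document.export_to_markdown())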

While other AI companies appear to swerve safety issues, IBM still considers safety a top-of-mind concern. Granite Guardian 3.2, the latest in IBM's suite of AI safety models, offers enhanced risk detection in prompts and responses. This updated version maintains performance while reducing model size by 30%, and introduces a new "verbalized confidence" feature for more nuanced risk assessment.


Businesses may also be interested in Granite's advanced forecasting capabilities. The new TinyTimeMixers (TTM) models with sub-10M parameters can run long-term forecasting up to two years into the future. These models are useful for trend analysis in finance, economics, and supply chain management. These models might not help you assemble your fantasy baseball team roster yet, but give them time.

As before, IBM is the most open-source-friendly AI organization. All Granite 3.2 models are available under the Apache 2.0 license on Hugging Face. Some models are also available on platforms including IBM watsonx.ai, Ollama, Replicate, and LM Studio. This open approach aligns with IBM's strategy to make AI more accessible and cost-effective for enterprises.

As Sriram Raghavan, IBM AI research VP, emphasized: "The next era of AI is about efficiency, integration, and real-world impact -- where enterprises can achieve powerful outcomes without excessive spend on compute."


Market Impact Analysis

Market Growth Trend

2018: 23.1%
2019: 27.8%
2020: 29.2%
2021: 32.4%
2022: 34.2%
2023: 35.2%
2024: 35.6%

Quarterly Growth Rate

Q1 2024: 32.5%
Q2 2024: 34.8%
Q3 2024: 36.2%
Q4 2024: 35.6%

Market Segments and Growth Drivers

Machine Learning: 29% market share, 38.4% growth rate
Computer Vision: 18% market share, 35.7% growth rate
Natural Language Processing: 24% market share, 41.5% growth rate
Robotics: 15% market share, 22.3% growth rate
Other AI Technologies: 14% market share, 31.8% growth rate


Competitive Landscape Analysis

Google AI: 18.3% market share
Microsoft AI: 15.7% market share
IBM Watson: 11.2% market share
Amazon AI: 9.8% market share
OpenAI: 8.4% market share

Future Outlook and Predictions

The AI technology landscape is evolving rapidly, driven by technological advancements, changing threat vectors, and shifting business requirements. Based on current trends and expert analyses, we can anticipate several significant developments across different time horizons:

Year-by-Year Technology Evolution

Based on current trajectory and expert analyses, we can project the following development timeline:

2024: Early adopters begin implementing specialized solutions with measurable results
2025: Industry standards emerge to facilitate broader adoption and integration
2026: Mainstream adoption begins as technical barriers are addressed
2027: Integration with adjacent technologies creates new capabilities
2028: Business models transform as capabilities mature
2029: Technology becomes embedded in core infrastructure and processes
2030: New paradigms emerge as the technology reaches full maturity

Technology Maturity Curve

Different technologies within the ecosystem are at varying stages of maturity, influencing adoption timelines and investment priorities:


Innovation Trigger

  • Generative AI for specialized domains
  • Blockchain for supply chain verification

Peak of Inflated Expectations

  • Digital twins for business processes
  • Quantum-resistant cryptography

Trough of Disillusionment

  • Consumer AR/VR applications
  • General-purpose blockchain

Slope of Enlightenment

  • AI-driven analytics
  • Edge computing

Plateau of Productivity

  • Cloud infrastructure
  • Mobile applications

Technology Evolution Timeline

1-2 Years
  • Improved generative models
  • specialized AI applications
3-5 Years
  • AI-human collaboration systems
  • multimodal AI platforms
5+ Years
  • General AI capabilities
  • AI-driven scientific breakthroughs

Expert Perspectives

Leading experts in the AI tech sector provide diverse perspectives on how the landscape will evolve over the coming years:

"The next frontier is AI systems that can reason across modalities and domains with minimal human guidance."

— AI Researcher

"Organizations that develop effective AI governance frameworks will gain competitive advantage."

— Industry Analyst

"The AI talent gap remains a critical barrier to implementation for most enterprises."

— Chief AI Officer

Areas of Expert Consensus

  • Acceleration of Innovation: The pace of technological evolution will continue to increase
  • Practical Integration: Focus will shift from proof-of-concept to operational deployment
  • Human-Technology Partnership: Most effective implementations will optimize human-machine collaboration
  • Regulatory Influence: Regulatory frameworks will increasingly shape technology development

Short-Term Outlook (1-2 Years)

In the immediate future, organizations will focus on implementing and optimizing currently available technologies to address pressing AI tech challenges:

  • Improved generative models
  • specialized AI applications
  • enhanced AI ethics frameworks

These developments will be characterized by incremental improvements to existing frameworks rather than revolutionary changes, with emphasis on practical deployment and measurable outcomes.

Mid-Term Outlook (3-5 Years)

As technologies mature and organizations adapt, more substantial transformations will emerge in how security is approached and implemented:

  • AI-human collaboration systems
  • multimodal AI platforms
  • democratized AI development

This period will see significant changes in security architecture and operational models, with increasing automation and integration between previously siloed security functions. Organizations will shift from reactive to proactive security postures.

Long-Term Outlook (5+ Years)

Looking further ahead, more fundamental shifts will reshape how cybersecurity is conceptualized and implemented across digital ecosystems:

  • General AI capabilities
  • AI-driven scientific breakthroughs
  • new computing paradigms

These long-term developments will likely require significant technical breakthroughs, new regulatory frameworks, and evolution in how organizations approach security as a fundamental business function rather than a technical discipline.

Key Risk Factors and Uncertainties

Several critical factors could significantly impact the trajectory of AI tech evolution:

Ethical concerns about AI decision-making
Data privacy regulations
Algorithm bias

Organizations should monitor these factors closely and develop contingency strategies to mitigate potential negative impacts on technology implementation timelines.

Alternative Future Scenarios

The evolution of technology can follow different paths depending on various factors including regulatory developments, investment trends, technological breakthroughs, and market adoption. We analyze three potential scenarios:

Optimistic Scenario

Responsible AI driving innovation while minimizing societal disruption

Key Drivers: Supportive regulatory environment, significant research breakthroughs, strong market incentives, and rapid user adoption.

Probability: 25-30%

Base Case Scenario

Incremental adoption with mixed societal impacts and ongoing ethical challenges

Key Drivers: Balanced regulatory approach, steady technological progress, and selective implementation based on clear ROI.

Probability: 50-60%

Conservative Scenario

Technical and ethical barriers creating significant implementation challenges

Key Drivers: Restrictive regulations, technical limitations, implementation challenges, and risk-averse organizational cultures.

Probability: 15-20%

Scenario Comparison Matrix

Factor: Optimistic / Base Case / Conservative
Implementation Timeline: Accelerated / Steady / Delayed
Market Adoption: Widespread / Selective / Limited
Technology Evolution: Rapid / Progressive / Incremental
Regulatory Environment: Supportive / Balanced / Restrictive
Business Impact: Transformative / Significant / Modest

Transformational Impact

Redefinition of knowledge work, automation of creative processes. This evolution will necessitate significant changes in organizational structures, talent development, and strategic planning processes.

The convergence of multiple technological trends—including artificial intelligence, quantum computing, and ubiquitous connectivity—will create both unprecedented security challenges and innovative defensive capabilities.

Implementation Challenges

Ethical concerns, computing resource limitations, talent shortages. Organizations will need to develop comprehensive change management strategies to successfully navigate these transitions.

Regulatory uncertainty, particularly around emerging technologies like AI in security applications, will require flexible security architectures that can adapt to evolving compliance requirements.

Key Innovations to Watch

Multimodal learning, resource-efficient AI, transparent decision systems. Organizations should monitor these developments closely to maintain competitive advantages and effective security postures.

Strategic investments in research partnerships, technology pilots, and talent development will position forward-thinking organizations to leverage these innovations early in their development cycle.

Technical Glossary

Key technical terms and definitions to help understand the technologies discussed in this article.

Understanding the following technical concepts is essential for grasping the full implications of the security threats and defensive measures discussed in this article. These definitions provide context for both technical and non-technical readers.

platform (intermediate): Platforms provide standardized environments that reduce development complexity and enable ecosystem growth through shared functionality and integration capabilities.

API (beginner): APIs serve as the connective tissue in modern software architectures, enabling different applications and services to communicate and share data according to defined protocols and data formats. Example: Cloud service providers like AWS, Google Cloud, and Azure offer extensive APIs that allow organizations to programmatically provision and manage infrastructure and services.

interface (intermediate): Well-designed interfaces abstract underlying complexity while providing clearly defined methods for interaction between different system components.

Other terms used in this article include machine learning, synthetic data, deep learning, large language model, reinforcement learning, neural network, and algorithm.