After a year of blockbuster public-AI launches, many CIOs are discovering that the real competitive edge comes from running large language models inside their own perimeter.
Private LLMs shrink latency, cut per-token costs at scale, and, crucially, keep regulated or proprietary data where auditors expect it.
Open-weight families such as Meta’s Llama 3 and Microsoft’s Phi-3 show that high-quality models can now run on a commodity GPU cluster or even a beefy workstation, while tooling such as NVIDIA NeMo Guardrails layers enterprise-grade safety controls on top.
Add looming rules such as the EU AI Act, and the “bring the model to the data” pattern is fast moving from R&D to roadmap.
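To make the pattern concrete, here is a minimal sketch of fully local inference using Hugging Face Transformers. It assumes a CUDA workstation with roughly 16 GB of VRAM and an approved access request for the gated Llama 3 weights; the prompt and generation settings are illustrative, not a production recipe.

```python
# A minimal sketch of "bring the model to the data": open-weight inference
# that runs entirely on local hardware, with no third-party API call.
# Assumes a CUDA GPU (~16 GB VRAM) and approved access to the gated
# Llama 3 repository on Hugging Face.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the 8B model within workstation VRAM
    device_map="auto",           # place layers across whatever local GPUs are available
)

# Prompts and outputs never leave the machine (contents here are illustrative).
messages = [{"role": "user", "content": "Summarize our internal data-retention policy."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=200, do_sample=False)
# Strip the prompt tokens and print only the newly generated reply.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Because both the weights and the prompts stay on local hardware, the audit boundary that already governs the data also governs the model.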