What is a Large Language Model?

A neural network trained on internet-scale text that produces fluent generative output and powers most of what people call "AI" in 2026 — including on-premises sovereign deployments.

Also known as

LLM, foundation model, language model
Definition

Large Language Model — explained.

A large language model (LLM) is a transformer-architecture neural network with billions to hundreds of billions of parameters, pre-trained on internet-scale text corpora and (typically) instruction-tuned and aligned via reinforcement learning from human feedback or direct preference optimisation. The serious open-weight landscape in 2026 includes Llama 3.x in 8B and 70B parameter sizes, Mistral 7B Instruct, Mixtral 8x22B (a Mixture-of-Experts model), Qwen 2.5, DeepSeek V2 and V3, and Gemma where its licence permits.

Hardware sizing starts from the math: at 2 bytes per parameter, an 8B model in fp16 needs about 16GB of VRAM for weights alone, a 70B about 140GB. Add KV cache (often 20 to 60GB at meaningful concurrency), activation memory, and headroom. Practical answer: two NVIDIA H100 80GB GPUs for low-concurrency interactive serving of a 70B in fp16 (a single card only becomes an option once the weights are quantised), four or more for higher throughput. Quantisation to fp8 or int4 roughly halves or quarters weight memory, but quality varies per task, so re-run an evaluation harness on the quantised model.

Serving is the bottleneck: vLLM is the default for serious multi-user serving (paged attention, continuous batching), TGI is comparable where the Hugging Face stack is in use, and Ollama is excellent for developer ergonomics on small edge boxes.
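The sizing arithmetic above can be sketched in a few lines of Python. This is a rough estimate under stated assumptions, not a capacity-planning tool: the function names and the AttentionShape container are hypothetical, the layer count, KV-head count, and head dimension are assumed values for a 70B-class model with grouped-query attention, and activation memory and headroom are not modelled.

```python
from dataclasses import dataclass

# Bytes stored per parameter for the precisions named above.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}
GB = 1e9  # decimal gigabytes, matching "8B in fp16 is about 16GB"


def weight_memory_gb(params_billion: float, dtype: str = "fp16") -> float:
    """VRAM for the weights alone: parameter count times bytes per parameter."""
    return params_billion * 1e9 * BYTES_PER_PARAM[dtype] / GB


@dataclass
class AttentionShape:
    """Hypothetical container for the three numbers the KV cache depends on."""
    layers: int
    kv_heads: int
    head_dim: int


def kv_cache_gb(shape: AttentionShape, concurrent_requests: int,
                tokens_per_request: int, bytes_per_value: float = 2.0) -> float:
    """KV cache size: one key and one value vector per layer, per KV head, per token."""
    bytes_per_token = 2 * shape.layers * shape.kv_heads * shape.head_dim * bytes_per_value
    return bytes_per_token * concurrent_requests * tokens_per_request / GB


# Assumed shape for a 70B-class model with grouped-query attention.
shape_70b = AttentionShape(layers=80, kv_heads=8, head_dim=128)

print(weight_memory_gb(8))           # 16.0
print(weight_memory_gb(70))          # 140.0
print(weight_memory_gb(70, "int4"))  # 35.0
print(round(kv_cache_gb(shape_70b, concurrent_requests=32, tokens_per_request=4096), 1))  # 42.9
```

With these assumed numbers, 32 concurrent requests of 4,096 tokens each need roughly 43GB of KV cache on top of the 140GB of fp16 weights, which sits inside the 20 to 60GB range quoted above. Swap in the real layer and head counts from the model's published configuration before relying on the result.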

Why it matters

Why operators care about large language models.

Every conversation about enterprise AI in 2026 routes through an LLM choice — open-weight or hosted, single model or multiple, on-premises or vendor cloud. Get the choice wrong and you either lock yourself into a hosted vendor with no residency story, or over-spend on hardware for a workload a 7B model would have handled. Get it right and you deploy a sovereign, bilingual, audit-friendly inference platform that pays back over three to five years against the per-token economics of public-cloud LLM APIs.

What to look for in a vendor

Buyer's checklist

  • Multiple, switchable open-weight models supported on the same platform
  • Transparent VRAM and throughput math at your target concurrency
  • Production inference stack (vLLM, TGI, or equivalent) with batching
  • Quantisation support (fp8, int4) with evaluation harness for quality trade-offs
  • On-prem deployment with weights pinned, signed, and operator-owned
  • Bilingual EN + AR generation quality with appropriate instruction prompts
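The quantisation item in the checklist above comes down to comparing evaluation scores before and after quantisation, task by task. A minimal sketch of that comparison follows. The function name, the task names, the 0.02 tolerance, and every score are made-up placeholders for illustration, not measurements of any model, and the scoring itself is assumed to come from whatever evaluation harness is already in use.

```python
def quantisation_regressions(baseline: dict[str, float],
                             quantised: dict[str, float],
                             max_drop: float = 0.02) -> dict[str, float]:
    """Return the tasks whose score fell by more than max_drop after quantisation."""
    regressions = {}
    for task, base_score in baseline.items():
        if task not in quantised:
            raise KeyError(f"quantised run is missing task {task!r}")
        drop = base_score - quantised[task]
        if drop > max_drop:
            regressions[task] = round(drop, 4)
    return regressions


# Placeholder scores only: invented numbers to show the shape of the check.
fp16_scores = {"summarisation_en": 0.82, "summarisation_ar": 0.79, "extraction": 0.91}
int4_scores = {"summarisation_en": 0.81, "summarisation_ar": 0.73, "extraction": 0.91}

flagged = quantisation_regressions(fp16_scores, int4_scores)
print(flagged)  # only summarisation_ar exceeds the 0.02 tolerance
```

Reporting per task rather than as one averaged score matters because quality loss from quantisation varies per task: an average can hide a regression concentrated in one language or one workload.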
Solutions where large language models apply

Zeour solutions that operate on this layer.

DT Consultation

digital · transformation · consultation

Zeour Digital Transformation Consultation helps companies digitalise their services and operations through three pillars: process automation (workflow engines, RPA, integration platforms that retire repetitive manual work), self-service technologies (customer + employee portals, kiosks, mobile apps, WhatsApp / SMS / IVR channels), and sovereign on-premises AI (open-weight large language models, vision models, voice models, RAG pipelines, and AI-augmented workflows that run entirely on the operator's own hardware, so patient data, customer data, and classified material never leave the perimeter). The service stack is the full path from problem to outcome: consulting (digital-maturity assessment, transformation roadmap, business-case modelling, vendor selection), implementation (the build itself, often delivered in partnership with our Enterprise Development team), AI model deployment (open-weight LLMs, fine-tuning, embedding pipelines, on-prem inference infrastructure, GPU sizing), customisation (tailoring deployed AI and automation to your specific operations: prompts, RAG corpora, workflow templates), and training (role-based curricula for executives, operators, and end users, with operations playbooks, runbooks, and train-the-trainer programmes that make your team self-sufficient). You engage the same team that ships our production AI assistant in MediCare (7-mode OpenAI Responses API, evidence-based prompts, audit-logged interactions).

Enterprise Dev

enterprise · development · services

Zeour Enterprise Development — we design, build, and operate corporate-grade software for organizations that take their software seriously. Custom web platforms, mobile apps, kiosk fleets, embedded/hardware-coupled systems, real-time services, AI-augmented workflows, system integrations (CRM / ERP / HRIS / payment gateways / BI / national health systems / lab analyzers / payment terminals / card readers / GPIO barriers), legacy modernization, cloud migration, on-premise deployments, DevOps + CI/CD, security hardening, and 24/7 support. Every other solution on this site — MediCare Clinic Management, Smart Parking, GLARUS Queue Management, Wayfinding, Digital Signage, Visitor Management, Online Appointment, Self-Service Kiosks, Customer Feedback — is something our team designed, built, and operates today. The same team is available for your bespoke engagement.

MediCare Clinic

medicare · clinic · management · system

Zeour MediCare is the multilingual on-premise clinic and EMR management system for small-to-mid healthcare practices. It covers patients (records, allergies, conditions, medications, body diagrams), appointments + visits with SOAP notes, prescriptions with drug-interaction checks, lab orders + samples + results, billing + payments + invoicing, inventory, expenses, referrals, medical certificates, refill requests, patient communications, telemedicine (WebRTC), an AI clinical assistant (OpenAI-powered with 7 modes), a patient self-service portal, and a full role-based access model across Admin, Doctor, Reception, and Lab Tech roles. It is engineered multilingual, with full RTL support as the production baseline and extensible to any locale, and runs locally on a single server.

Related terms

Adjacent definitions to read next.

On-Premises AI

AI & Models

Open-weight large language models running on the operator's own hardware — no prompt, completion, or embedding ever leaves the perimeter.

Open-Weight LLM

AI & Models

A large language model whose trained parameters (weights) are published openly — runnable on the operator's own hardware without API dependency.

Retrieval-Augmented Generation (RAG)

AI & Models

A pattern where the LLM is given relevant excerpts from a knowledge base at query time — so answers come from authoritative source documents, not the model's memory.

Sovereign Deployment

Sovereign Deployment

Software that runs entirely inside the operator's perimeter — their hardware, their network, their backups, their keys — with no third-party dependency for continued operation.

Air-Gapped Deployment

Sovereign Deployment

A system deployed on a network with no physical or logical connection to the public internet — the strictest form of sovereign deployment.

AI Clinical Assistant

Healthcare & Clinical

A side-pane AI in the EMR that summarises history, drafts notes from voice, suggests differential diagnoses, and flags drug interactions.

Arabic Language Model

AI & Models

An open-weight or fine-tuned LLM that handles Modern Standard Arabic and major dialects with appropriate tokenisation efficiency and right-to-left rendering at the application layer.

Context Window

AI & Models

The maximum amount of text an LLM can process in a single request, measured in tokens — caps how much document context can be fed for RAG and long-form analysis.

Want to discuss large language models for your operation?

Talk to a Zeour engineer.

A 30-minute scoping call to walk through your operational profile and where a large language model actually sits in your stack, followed by a fixed-fee Discovery price by the end of the call.