Model Card Template: Structured Documentation for AI Systems
Structured template for documenting AI model capabilities, limitations, intended use, and evaluation results. Based on the Model Cards framework.
You are deploying or procuring an AI model and need a structured way to document what it does, where it works well, where it fails, and what risks come with using it. A model card is the standard tool for this job.
What is a model card
A model card is a structured document that accompanies an AI model, describing its intended use, capabilities, limitations, evaluation results, and ethical considerations. The concept was introduced by Mitchell et al. (2019) at Google, drawing on precedents in nutrition labels, electronics datasheets, and financial disclosures.
A model card is not marketing material. It is a technical disclosure document designed to help three audiences make informed decisions:
- **Developers** who integrate the model into applications need to understand its capabilities and failure modes
- **Decision-makers** who approve AI deployments need to understand risks, limitations, and compliance implications
- **Affected stakeholders** who are subject to the model's outputs deserve transparency about how the system works
Why teams need model cards
Without structured documentation, knowledge about a model's behavior lives in the heads of the people who built it. When those people change teams, leave the organization, or simply forget, critical information about limitations, failure modes, and design trade-offs is lost.
Regulatory requirements are making model documentation mandatory. The EU AI Act Annex IV requires technical documentation for high-risk AI systems that covers many of the same elements as a model card. The NIST AI Risk Management Framework recommends documented AI system profiles. Organizations that build the documentation habit now will be better prepared for compliance obligations.
Template sections
A model card should include the following eight sections. Each section is described below with guidance on what to include.
Section 1: Model Details
Basic identifying information about the model.
| Field | Description | Example |
|---|---|---|
| Model name | Official name and version | CustomerAssist v2.3 |
| Model type | Architecture or approach | Fine-tuned transformer (based on Llama 3 70B) |
| Developer | Organization or team that built or fine-tuned the model | Acme Corp AI Team |
| Release date | When this version was deployed | 2026-03-15 |
| License | Terms of use | Internal use only |
| Contact | Who to reach for questions | ai-team@acme.example.com |
| Model version history | Previous versions and key changes | v2.2 (2026-01-10): updated training data; v2.1 (2025-11-01): initial deployment |
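For teams that keep model cards machine-readable alongside the prose version, the template maps naturally onto a small data structure. The sketch below is illustrative only: the class and field names are assumptions, not a standard schema, though the fields mirror the table above.

```python
from dataclasses import dataclass, field


@dataclass
class ModelDetails:
    """Section 1 fields, mirroring the Model Details table."""
    name: str                      # e.g. "CustomerAssist v2.3"
    model_type: str                # architecture or approach
    developer: str                 # organization or team that built it
    release_date: str              # ISO date this version was deployed
    license: str                   # terms of use
    contact: str                   # who to reach for questions
    version_history: list[str] = field(default_factory=list)


@dataclass
class ModelCard:
    """Top-level card: one attribute per template section."""
    model_details: ModelDetails
    intended_use: dict[str, str]        # primary use, users, out-of-scope
    factors: dict[str, str]             # relevant vs. evaluated factors
    metrics: dict[str, str]
    evaluation_data: dict[str, str]
    training_data: dict[str, str]
    ethical_considerations: dict[str, str]
    caveats_and_recommendations: dict[str, str]
```

Storing the card as structured data rather than free text makes the completeness checks in the decision checklist below easy to automate.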
Section 2: Intended Use
What the model is for and, equally important, what it is not for.
| Field | Description |
|---|---|
| Primary intended use | The specific task or tasks the model was designed to perform |
| Primary intended users | Who should be using this model (internal teams, end users, other systems) |
| Out-of-scope uses | Tasks or contexts where the model should not be used, even if it appears to work |
Be specific. "General-purpose language understanding" is not a useful intended use statement. "Classifying customer support tickets into 12 predefined categories for routing" is.
Section 3: Factors
Characteristics of the operating environment that affect model performance.
| Field | Description |
|---|---|
| Relevant factors | Groups, instrumentation, or environments that influence performance (demographics, languages, device types, data formats) |
| Evaluation factors | Which of these factors were specifically tested during evaluation |
This section is where you document that the model was tested on English-language inputs only, or that evaluation data came from one geographic region, or that performance varies by user demographic.
Section 4: Metrics
How model performance is measured.
| Field | Description |
|---|---|
| Performance measures | Which metrics are used and why (accuracy, F1, precision, recall, latency, fairness metrics) |
| Decision thresholds | What confidence thresholds are used and how they were chosen |
| Variation approaches | How performance variation across subgroups is measured |
Select metrics that are relevant to the intended use. A classification model needs precision and recall. A generative model needs factual accuracy and harmlessness measures. A model affecting people needs fairness metrics.
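As a concrete illustration of decision thresholds and subgroup variation, the sketch below computes precision and recall per subgroup from scored predictions. It is a minimal plain-Python example; the records and the 0.7 threshold are invented for illustration.

```python
from collections import defaultdict

def subgroup_precision_recall(records, threshold=0.7):
    """Compute precision/recall per subgroup from (score, label, group) records.

    A prediction counts as positive when its score meets the threshold;
    disaggregating by group surfaces performance variation that a single
    aggregate number would hide.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for score, label, group in records:
        predicted = score >= threshold
        if predicted and label:
            counts[group]["tp"] += 1
        elif predicted and not label:
            counts[group]["fp"] += 1
        elif not predicted and label:
            counts[group]["fn"] += 1
    results = {}
    for group, c in counts.items():
        precision = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
        recall = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
        results[group] = {"precision": precision, "recall": recall}
    return results

# Hypothetical evaluation records: (model score, true label, subgroup)
records = [(0.92, True, "en"), (0.55, True, "en"), (0.81, False, "es"),
           (0.75, True, "es"), (0.30, False, "es")]
print(subgroup_precision_recall(records))
```

Reporting these per-group numbers in the Metrics section, rather than a single aggregate, is what makes the disaggregation item in the decision checklist verifiable.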
Section 5: Evaluation Data
Information about the data used to test the model.
| Field | Description |
|---|---|
| Datasets | Names and descriptions of evaluation datasets |
| Motivation | Why these datasets were chosen and what they represent |
| Preprocessing | How evaluation data was cleaned, filtered, or transformed |
Document the gaps. If the evaluation data does not cover a population the model will encounter in production, say so.
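One way to document the gaps systematically is to compare the composition of the evaluation set against a sample of production traffic. The sketch below is a minimal illustration; the category values and the 1% noise floor are invented.

```python
from collections import Counter

def coverage_gaps(eval_categories, production_categories, min_share=0.01):
    """Flag categories seen in production but absent from the evaluation data.

    min_share: production categories below this share are ignored as noise.
    """
    prod = Counter(production_categories)
    eval_counts = Counter(eval_categories)
    total = sum(prod.values())
    gaps = []
    for category, count in prod.items():
        share = count / total
        if share >= min_share and eval_counts[category] == 0:
            gaps.append((category, share))
    return sorted(gaps, key=lambda g: g[1], reverse=True)

# Hypothetical data: the evaluation set never covered "returns" questions
production = ["pricing"] * 50 + ["order_status"] * 30 + ["returns"] * 20
evaluation = ["pricing"] * 40 + ["order_status"] * 40
print(coverage_gaps(evaluation, production))  # [('returns', 0.2)]
```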
Section 6: Training Data
Information about the data used to train the model (to the extent it can be disclosed).
| Field | Description |
|---|---|
| Datasets | Names and descriptions of training datasets, or a summary of data sources |
| Motivation | Why these datasets were chosen |
| Preprocessing | How training data was cleaned, filtered, or transformed |
| Known gaps | Populations, languages, or scenarios underrepresented in training data |
For proprietary or vendor-provided models where you do not have full training data visibility, document what the vendor has disclosed and note the gaps in your knowledge.
Section 7: Ethical Considerations
Risks, harms, and sensitive use cases.
| Field | Description |
|---|---|
| Sensitive use cases | Where the model's errors could cause harm to individuals or groups |
| Known risks | Documented bias, fairness concerns, or failure modes that could cause harm |
| Mitigation strategies | What has been done to address identified risks |
| Unresolved concerns | Risks that have been identified but not yet addressed |
This is the most important section for decision-makers. Be honest about what you know, what you do not know, and what you have chosen to accept.
Section 8: Caveats and Recommendations
Practical guidance for anyone using the model.
| Field | Description |
|---|---|
| Known limitations | Conditions under which the model is expected to perform poorly |
| Deployment recommendations | How to deploy the model safely (monitoring, human oversight, usage limits) |
| Maintenance requirements | How often the model should be re-evaluated, and what triggers a review |
Pre-filled example: Customer service chatbot
Below is a condensed example showing how the template applies to a customer service chatbot. In practice, each section would contain more detail.
| Section | Content |
|---|---|
| **Model Details** | CustomerAssist v2.3. Fine-tuned Llama 3 70B. Released 2026-03-15. Internal deployment only. |
| **Intended Use** | Answer customer questions about product features, pricing, and order status using a verified knowledge base. NOT intended for: medical, legal, or financial advice; handling complaints that require human empathy; any decision with financial consequence to the customer. |
| **Factors** | Evaluated on English-language inputs only. Performance tested across product categories (electronics, clothing, home goods). Not tested on non-English inputs, regional dialects, or accessibility tool interactions. |
| **Metrics** | Factual accuracy: 94.2% on verified knowledge base questions. Hallucination rate: 3.1% on out-of-knowledge-base questions. Response latency: p95 under 2 seconds. Customer satisfaction (post-chat survey): 4.1/5. |
| **Evaluation Data** | 5,000 customer questions sampled from support logs (January to February 2026). Stratified by product category and question type. Does not include adversarial or edge-case inputs. |
| **Training Data** | Fine-tuned on 50,000 verified Q&A pairs from the product knowledge base, 10,000 historical support transcripts (PII redacted), and 2,000 manually written examples for edge cases. Knowledge base covers products sold from 2024 onward; older product questions are out of scope. |
| **Ethical Considerations** | Risk of hallucination on questions outside the knowledge base. Could provide incorrect pricing if the knowledge base is not updated promptly. Customers may not realize they are interacting with an AI system (disclosure is displayed but may be missed). No demographic bias testing has been conducted on response quality. |
| **Caveats** | Accuracy degrades on questions about products not in the knowledge base. Should not be deployed without the retrieval (RAG) pipeline and the escalation-to-human fallback. Re-evaluate after any knowledge base update exceeding 500 entries. |
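Figures like the 94.2% factual accuracy and 3.1% hallucination rate above come from straightforward counting over graded evaluation results. A minimal sketch, with invented record fields and invented grades:

```python
def card_metrics(results):
    """Compute the two headline rates from graded evaluation results.

    Each result is a dict with:
      in_kb:      whether the question is covered by the knowledge base
      correct:    grader judgment for in-KB questions
      fabricated: grader judgment for out-of-KB questions
    """
    in_kb = [r for r in results if r["in_kb"]]
    out_kb = [r for r in results if not r["in_kb"]]
    accuracy = sum(r["correct"] for r in in_kb) / len(in_kb)
    hallucination = sum(r["fabricated"] for r in out_kb) / len(out_kb)
    return {"factual_accuracy": accuracy, "hallucination_rate": hallucination}

# Hypothetical graded results
results = (
    [{"in_kb": True, "correct": True, "fabricated": False}] * 47
    + [{"in_kb": True, "correct": False, "fabricated": False}] * 3
    + [{"in_kb": False, "correct": False, "fabricated": True}] * 2
    + [{"in_kb": False, "correct": False, "fabricated": False}] * 48
)
print(card_metrics(results))
# {'factual_accuracy': 0.94, 'hallucination_rate': 0.04}
```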
When to create and update model cards
| Event | Action |
|---|---|
| New model development | Create the model card during development, before deployment |
| Model procurement from a vendor | Request the vendor's model card; create your own supplementary card documenting your specific deployment context |
| Model version update | Update the model card with new evaluation results, changed capabilities, and any new limitations |
| Change in intended use | Update intended use and out-of-scope sections; re-evaluate and document |
| Incident involving the model | Update ethical considerations and caveats with the incident details and any changes made |
| Annual review | Review and update all sections even if no changes have occurred, to confirm the documentation remains accurate |
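Teams that keep cards in version control sometimes automate the review triggers above. A minimal sketch, using invented field names and the example card's 500-entry knowledge-base threshold:

```python
from datetime import date

def needs_review(card_meta, today=None):
    """Return reasons a model card is due for review, per the events above."""
    today = today or date.today()
    reasons = []
    if card_meta["model_version"] != card_meta["card_version"]:
        reasons.append("model updated since card was last revised")
    if card_meta["kb_entries_changed_since_review"] > 500:
        reasons.append("knowledge base update exceeded 500 entries")
    if (today - card_meta["last_reviewed"]).days > 365:
        reasons.append("annual review overdue")
    return reasons

meta = {"model_version": "2.3", "card_version": "2.3",
        "kb_entries_changed_since_review": 620,
        "last_reviewed": date(2026, 3, 15)}
print(needs_review(meta, today=date(2026, 9, 1)))
# ['knowledge base update exceeded 500 entries']
```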
Decision checklist
Before considering a model card complete, confirm:
- [ ] All eight sections are filled in with specific, verifiable information (not placeholder text)
- [ ] Intended use includes clear out-of-scope uses, not just what the model is for
- [ ] Limitations are described in concrete terms, not vague disclaimers
- [ ] Evaluation results are disaggregated by relevant subgroups where possible
- [ ] Ethical considerations include both known risks and unresolved concerns
- [ ] The card has been reviewed by someone who did not build the model
- [ ] A maintenance schedule is defined (when the card will next be reviewed)
- [ ] The card is stored where all relevant stakeholders can access it
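Several checklist items above lend themselves to a mechanical pre-check before human review. The sketch below assumes the card is stored as a plain dict keyed by section, as in the illustrative schema from Section 1; the placeholder markers are assumptions.

```python
REQUIRED_SECTIONS = [
    "model_details", "intended_use", "factors", "metrics",
    "evaluation_data", "training_data", "ethical_considerations",
    "caveats_and_recommendations",
]
PLACEHOLDER_MARKERS = ("tbd", "todo", "placeholder", "lorem")

def lint_card(card):
    """Return mechanical checklist failures; human review is still required."""
    failures = []
    for section in REQUIRED_SECTIONS:
        text = str(card.get(section, "")).strip()
        if not text:
            failures.append(f"{section}: missing or empty")
        elif any(marker in text.lower() for marker in PLACEHOLDER_MARKERS):
            failures.append(f"{section}: contains placeholder text")
    if "out-of-scope" not in str(card.get("intended_use", "")).lower():
        failures.append("intended_use: no out-of-scope uses documented")
    if not card.get("next_review_date"):
        failures.append("no maintenance schedule (next_review_date) defined")
    return failures

print(lint_card({"model_details": "CustomerAssist v2.3", "metrics": "TBD"}))
# flags the placeholder in metrics plus every missing section
```

A linter like this catches empty and placeholder sections; the judgment items, such as independent review and honest limitation statements, cannot be automated.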
Key takeaways
- A model card is a disclosure document, not a sales pitch. Its value comes from honest documentation of limitations and risks, not from presenting the model in the best light.
- The EU AI Act Annex IV technical documentation requirements overlap significantly with model card content. Building model cards now prepares your organization for regulatory compliance.
- Model cards should be created during development, not after deployment. Retroactive documentation is less complete and less accurate.
- The most useful model cards are specific. "The model may produce inaccurate outputs" tells readers nothing. "The model hallucinates at a rate of 3.1% on out-of-knowledge-base questions, most commonly by fabricating product features" tells them what to watch for.
- A model card is a living document. Update it when the model changes, when the deployment context changes, and when new information about the model's behavior becomes available.
Sources
- [1] Mitchell, M. et al., "Model Cards for Model Reporting" (2019)
- [2] Google, Model Cards
- [3] Hugging Face, Model Card Guidebook
- [4] NIST, AI Risk Management Framework (AI RMF 1.0)
Related
- AI Vendor Due Diligence: 30 questions in five categories for evaluating AI vendors.
- AI Risk Register Template: a structured framework for tracking AI risks across your organization.
- Hallucination Risk: patterns, detection methods, and mitigation strategies.
- Data Leakage Risk: five documented patterns and mitigation strategies for AI data leakage.