
Model Card Template: Structured Documentation for AI Systems

Structured template for documenting AI model capabilities, limitations, intended use, and evaluation results. Based on the Model Cards framework.

By ThinkTech Research | Published April 11, 2026

You are deploying or procuring an AI model and need a structured way to document what it does, where it works well, where it fails, and what risks come with using it. A model card is the standard tool for this job.

What is a model card

A model card is a structured document that accompanies an AI model, describing its intended use, capabilities, limitations, evaluation results, and ethical considerations. The concept was introduced by Mitchell et al. (2019) at Google, drawing on precedents in nutrition labels, electronics datasheets, and financial disclosures.

A model card is not marketing material. It is a technical disclosure document designed to help three audiences make informed decisions:

  • **Developers** who integrate the model into applications need to understand its capabilities and failure modes
  • **Decision-makers** who approve AI deployments need to understand risks, limitations, and compliance implications
  • **Affected stakeholders** who are subject to the model's outputs deserve transparency about how the system works

Why teams need model cards

Without structured documentation, knowledge about a model's behavior lives in the heads of the people who built it. When those people change teams, leave the organization, or simply forget, critical information about limitations, failure modes, and design trade-offs is lost.

Regulatory requirements are making model documentation mandatory. The EU AI Act Annex IV requires technical documentation for high-risk AI systems that covers many of the same elements as a model card. The NIST AI Risk Management Framework recommends documented AI system profiles. Organizations that build the documentation habit now will be better prepared for compliance obligations.

Template sections

A model card should include the following eight sections. Each section is described below with guidance on what to include.

Section 1: Model Details

Basic identifying information about the model.

| Field | Description | Example |
| --- | --- | --- |
| Model name | Official name and version | CustomerAssist v2.3 |
| Model type | Architecture or approach | Fine-tuned transformer (based on Llama 3 70B) |
| Developer | Organization or team that built or fine-tuned the model | Acme Corp AI Team |
| Release date | When this version was deployed | 2026-03-15 |
| License | Terms of use | Internal use only |
| Contact | Who to reach for questions | ai-team@acme.example.com |
| Model version history | Previous versions and key changes | v2.2 (2026-01-10): updated training data; v2.1 (2025-11-01): initial deployment |

Section 2: Intended Use

What the model is for and, equally important, what it is not for.

| Field | Description |
| --- | --- |
| Primary intended use | The specific task or tasks the model was designed to perform |
| Primary intended users | Who should be using this model (internal teams, end users, other systems) |
| Out-of-scope uses | Tasks or contexts where the model should not be used, even if it appears to work |

Be specific. "General-purpose language understanding" is not a useful intended use statement. "Classifying customer support tickets into 12 predefined categories for routing" is.

Section 3: Factors

Characteristics of the operating environment that affect model performance.

| Field | Description |
| --- | --- |
| Relevant factors | Groups, instruments, or environments that influence performance (demographics, languages, device types, data formats) |
| Evaluation factors | Which of these factors were specifically tested during evaluation |

This section is where you document that the model was tested on English-language inputs only, or that evaluation data came from one geographic region, or that performance varies by user demographic.

Section 4: Metrics

How model performance is measured.

| Field | Description |
| --- | --- |
| Performance measures | Which metrics are used and why (accuracy, F1, precision, recall, latency, fairness metrics) |
| Decision thresholds | What confidence thresholds are used and how they were chosen |
| Variation approaches | How performance variation across subgroups is measured |

Select metrics that are relevant to the intended use. A classification model needs precision and recall. A generative model needs factual accuracy and harmlessness measures. A model affecting people needs fairness metrics.
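To make the "variation approaches" row concrete, here is a minimal sketch of computing precision and recall for one class, disaggregated by subgroup, from labeled evaluation records. The record format and subgroup labels are this sketch's own assumptions, not part of the template.

```python
from collections import defaultdict

def precision_recall(pairs):
    """pairs: list of (true, pred) booleans for a single class."""
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def by_subgroup(records):
    """records: list of (true, pred, subgroup) triples.

    Returns {subgroup: (precision, recall)}, so performance variation
    across subgroups can be reported rather than hidden in an average.
    """
    groups = defaultdict(list)
    for t, p, g in records:
        groups[g].append((t, p))
    return {g: precision_recall(pairs) for g, pairs in groups.items()}
```

Reporting the per-subgroup pairs side by side is what makes a disparity visible; a single aggregate number cannot.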

Section 5: Evaluation Data

Information about the data used to test the model.

| Field | Description |
| --- | --- |
| Datasets | Names and descriptions of evaluation datasets |
| Motivation | Why these datasets were chosen and what they represent |
| Preprocessing | How evaluation data was cleaned, filtered, or transformed |

Document the gaps. If the evaluation data does not cover a population the model will encounter in production, say so.
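One mechanical way to surface such gaps is a simple set comparison between what the model sees in production and what the evaluation data covers. A minimal sketch; the category names are invented for illustration:

```python
def coverage_gaps(production_categories, evaluated_categories):
    """Return production categories with no evaluation coverage, sorted."""
    return sorted(set(production_categories) - set(evaluated_categories))

# Hypothetical example: production traffic includes a category
# the evaluation set never touched.
gaps = coverage_gaps(
    ["electronics", "clothing", "furniture"],
    ["electronics", "clothing"],
)
```

Anything this returns belongs verbatim in the evaluation data section as a documented gap.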

Section 6: Training Data

Information about the data used to train the model (to the extent it can be disclosed).

| Field | Description |
| --- | --- |
| Datasets | Names and descriptions of training datasets, or a summary of data sources |
| Motivation | Why these datasets were chosen |
| Preprocessing | How training data was cleaned, filtered, or transformed |
| Known gaps | Populations, languages, or scenarios underrepresented in training data |

For proprietary or vendor-provided models where you do not have full training data visibility, document what the vendor has disclosed and note the gaps in your knowledge.

Section 7: Ethical Considerations

Risks, harms, and sensitive use cases.

| Field | Description |
| --- | --- |
| Sensitive use cases | Where the model's errors could cause harm to individuals or groups |
| Known risks | Documented bias, fairness concerns, or failure modes that could cause harm |
| Mitigation strategies | What has been done to address identified risks |
| Unresolved concerns | Risks that have been identified but not yet addressed |

This is the most important section for decision-makers. Be honest about what you know, what you do not know, and what you have chosen to accept.

Section 8: Caveats and Recommendations

Practical guidance for anyone using the model.

| Field | Description |
| --- | --- |
| Known limitations | Conditions under which the model is expected to perform poorly |
| Deployment recommendations | How to deploy the model safely (monitoring, human oversight, usage limits) |
| Maintenance requirements | How often the model should be re-evaluated, and what triggers a review |
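Teams that keep model cards machine-readable sometimes mirror the template as a typed record. Here is one minimal sketch of the eight sections above as a Python dataclass; the field names and loose `dict` typing are illustrative, not a standard schema.

```python
from dataclasses import dataclass, fields

@dataclass
class ModelCard:
    model_details: dict             # name, version, developer, release date, license, contact
    intended_use: dict              # primary use, intended users, out-of-scope uses
    factors: dict                   # relevant factors, evaluation factors
    metrics: dict                   # performance measures, thresholds, variation approaches
    evaluation_data: dict           # datasets, motivation, preprocessing
    training_data: dict             # datasets, motivation, preprocessing, known gaps
    ethical_considerations: dict    # sensitive uses, risks, mitigations, unresolved concerns
    caveats_and_recommendations: dict  # limitations, deployment and maintenance guidance

    def sections(self) -> list[str]:
        """Return the section names in template order."""
        return [f.name for f in fields(self)]
```

A structured record like this makes it trivial to lint cards for empty sections or to render them to whatever format stakeholders read.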

Pre-filled example: Customer service chatbot

Below is a condensed example showing how the template applies to a customer service chatbot. In practice, each section would contain more detail.

| Section | Content |
| --- | --- |
| **Model Details** | CustomerAssist v2.3. Fine-tuned Llama 3 70B. Released 2026-03-15. Internal deployment only. |
| **Intended Use** | Answer customer questions about product features, pricing, and order status using a verified knowledge base. NOT intended for: medical, legal, or financial advice; handling complaints that require human empathy; any decision with financial consequence to the customer. |
| **Factors** | Evaluated on English-language inputs only. Performance tested across product categories (electronics, clothing, home goods). Not tested on non-English inputs, regional dialects, or accessibility tool interactions. |
| **Metrics** | Factual accuracy: 94.2% on verified knowledge base questions. Hallucination rate: 3.1% on out-of-knowledge-base questions. Response latency: p95 under 2 seconds. Customer satisfaction (post-chat survey): 4.1/5. |
| **Evaluation Data** | 5,000 customer questions sampled from support logs (January to February 2026). Stratified by product category and question type. Does not include adversarial or edge-case inputs. |
| **Training Data** | Fine-tuned on 50,000 verified Q&A pairs from the product knowledge base, 10,000 historical support transcripts (PII redacted), and 2,000 manually written examples for edge cases. Knowledge base covers products sold from 2024 onward; older product questions are out of scope. |
| **Ethical Considerations** | Risk of hallucination on questions outside the knowledge base. Could provide incorrect pricing if the knowledge base is not updated promptly. Customers may not realize they are interacting with an AI system (disclosure is displayed but may be missed). No demographic bias testing has been conducted on response quality. |
| **Caveats** | Accuracy degrades on questions about products not in the knowledge base. Should not be deployed without the RAG retrieval pipeline and the escalation-to-human fallback. Re-evaluate after any knowledge base update exceeding 500 entries. |
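A condensed card like the one above can be kept as a plain mapping and rendered to text on demand. A minimal sketch, assuming an ordered `{section: content}` dict and a heading format of this sketch's own choosing:

```python
def render_card(card: dict) -> str:
    """Render an ordered {section: content} mapping as a plain-text model card."""
    lines = []
    for section, content in card.items():
        lines.append(f"## {section}")   # section heading
        lines.append(content)           # section body
        lines.append("")                # blank line between sections
    return "\n".join(lines)

# Hypothetical two-section example, abbreviated from the table above.
text = render_card({
    "Model Details": "CustomerAssist v2.3. Fine-tuned Llama 3 70B.",
    "Intended Use": "Answer product, pricing, and order-status questions.",
})
```

Keeping the source of truth structured means the same card can feed a wiki page, a compliance export, or an automated completeness check without divergence.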

When to create and update model cards

| Event | Action |
| --- | --- |
| New model development | Create the model card during development, before deployment |
| Model procurement from a vendor | Request the vendor's model card; create your own supplementary card documenting your specific deployment context |
| Model version update | Update the model card with new evaluation results, changed capabilities, and any new limitations |
| Change in intended use | Update intended use and out-of-scope sections; re-evaluate and document |
| Incident involving the model | Update ethical considerations and caveats with the incident details and any changes made |
| Annual review | Review and update all sections even if no changes have occurred, to confirm the documentation remains accurate |
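The update triggers above reduce to a small predicate: a review is due when a year has passed or a triggering event has occurred. A minimal sketch, with the event names being this sketch's own labels:

```python
from datetime import date, timedelta

# Illustrative event labels for the event-driven triggers in the table above.
TRIGGER_EVENTS = {"version_update", "intended_use_change", "incident"}

def review_due(last_review: date, today: date, events: set[str]) -> bool:
    """True if the annual cadence has elapsed or a trigger event occurred."""
    annual = today - last_review >= timedelta(days=365)
    return annual or bool(events & TRIGGER_EVENTS)
```

Wiring a check like this into a scheduled job turns "living document" from an aspiration into an alert.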

Decision checklist

Before considering a model card complete, confirm:

  • [ ] All eight sections are filled in with specific, verifiable information (not placeholder text)
  • [ ] Intended use includes clear out-of-scope uses, not just what the model is for
  • [ ] Limitations are described in concrete terms, not vague disclaimers
  • [ ] Evaluation results are disaggregated by relevant subgroups where possible
  • [ ] Ethical considerations include both known risks and unresolved concerns
  • [ ] The card has been reviewed by someone who did not build the model
  • [ ] A maintenance schedule is defined (when the card will next be reviewed)
  • [ ] The card is stored where all relevant stakeholders can access it
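The first checklist item, at least, can be automated. Here is a minimal sketch of a completeness lint that flags missing sections and obvious placeholder text; the section names and placeholder strings are assumptions of this sketch, not part of the checklist itself:

```python
REQUIRED_SECTIONS = [
    "Model Details", "Intended Use", "Factors", "Metrics",
    "Evaluation Data", "Training Data", "Ethical Considerations",
    "Caveats and Recommendations",
]

# Strings that suggest a section was never actually written.
PLACEHOLDERS = ("TBD", "TODO", "lorem ipsum")

def completeness_issues(card: dict) -> list[str]:
    """Return a list of problems; an empty list means the basic checks pass."""
    issues = []
    for section in REQUIRED_SECTIONS:
        text = card.get(section, "").strip()
        if not text:
            issues.append(f"missing section: {section}")
        elif any(p.lower() in text.lower() for p in PLACEHOLDERS):
            issues.append(f"placeholder text in: {section}")
    return issues
```

A check like this cannot judge whether limitations are concrete or disclaimers are vague; the remaining checklist items still need a human reviewer who did not build the model.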

Key takeaways

  1. A model card is a disclosure document, not a sales pitch. Its value comes from honest documentation of limitations and risks, not from presenting the model in the best light.
  2. The EU AI Act Annex IV technical documentation requirements overlap significantly with model card content. Building model cards now prepares your organization for regulatory compliance.
  3. Model cards should be created during development, not after deployment. Retroactive documentation is less complete and less accurate.
  4. The most useful model cards are specific. "The model may produce inaccurate outputs" tells readers nothing. "The model hallucinates at a rate of 3.1% on out-of-knowledge-base questions, most commonly by fabricating product features" tells them what to watch for.
  5. A model card is a living document. Update it when the model changes, when the deployment context changes, and when new information about the model's behavior becomes available.

