RP-2026-0003 · Research

AICOS: a framework for measuring AI literacy

AI literacy is not a single number. AICOS, the AI Competency Objective Scale, sets out the sub-competencies that constitute it, and how to measure each one without self-report.

Published: May 19, 2026
Reading time: 9 min read
Author: Acta Research

There is a temptation, in the discourse about AI in hiring, to treat AI literacy as a single trait the way typing speed is a single trait. Some people have more of it, and the interview exists to find out how much. That framing has the appeal of operational simplicity. It also has the problem that the trait does not exist.

What exists, instead, is a small bundle of related-but-distinct competencies. Some of them load heavily on one another; others almost not at all. A worker can be highly literate on one and badly under-literate on another, and the composite of "AI literacy" obscures which is which. Measuring the composite without measuring the bundle is the conceptual error this article is about.

The good news is that the literature has spent the better part of five years specifying the bundle. The AI Competency Objective Scale (AICOS) is the most thoroughly validated specification of it at the time of writing.

What AICOS specifies

AICOS, developed by Markus, Carolus & Wienrich (2025), is a scale that decomposes AI literacy into six sub-competencies, each with its own item bank, its own published reliability statistics, and its own construct-validity evidence against AICOS's predecessor (MAILS, the Meta AI Literacy Scale) and against task performance.

The six sub-competencies are:

  • Know and Understand AI. Conceptual fluency with what current AI systems do: their input/output contracts, their training regimes, their failure modes. Closer to literacy in the most literal sense.
  • Use and Apply AI. Operational competence with AI tools as part of a workflow. Includes prompting, iteration, tool selection.
  • Detect AI. The ability to identify AI-generated content, AI-influenced workflows, and AI-mediated decisions when they are not labeled as such.
  • Evaluate and Create AI. The ability to assess AI output for accuracy, bias, and fit-for-purpose, and to produce AI-mediated work that meets a standard.
  • AI Ethics. Working understanding of the ethical surface area of AI use: confidentiality, attribution, bias, downstream impact.
  • Generative AI Literacy. A sub-competency added to AICOS to specifically cover competence with generative systems: prompting, hallucination handling, output validation in the context of LLMs specifically.

The decomposition is not arbitrary. It was developed by factor analysis on a large validation sample, and the six sub-competencies show internal consistency between α = 0.71 and α = 0.89, a range that runs from acceptable to high reliability for a behavioral construct.
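For readers who want to see what an internal-consistency figure like α is computed from, here is a minimal sketch of Cronbach's alpha in Python. The response matrix is invented for illustration and is not AICOS data; the function name `cronbach_alpha` is ours, and whether the AICOS authors computed their reported figures in exactly this way is an assumption.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one sub-scale.

    responses: list of respondents, each a list of k item scores.
    Uses sample variance throughout.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Variance of each item across respondents.
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    # Variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical right/wrong answers (1/0) from five respondents on four items.
demo_responses = [
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]
demo_alpha = cronbach_alpha(demo_responses)
print(round(demo_alpha, 2))
```

Alpha rises when items in a sub-scale move together across respondents, which is why it is reported per sub-competency rather than once for the whole scale.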

[Figure: AICOS radar, six sub-competencies]

Why one number is misleading

A composite "AI literacy" score is the average of six different competencies. The averaging operation discards exactly the information a hiring decision needs.

Take two candidates with the same overall AICOS score of 72. Candidate A is at 88 on Use and Apply, 86 on Generative AI Literacy, 81 on Know and Understand, 64 on Detect AI, 53 on Evaluate and Create, and 60 on AI Ethics. Candidate B is the inverse: 88 on Detect AI, 86 on Evaluate and Create, 81 on AI Ethics, 64 on Use and Apply, 53 on Generative AI Literacy, 60 on Know and Understand.

Candidate A is a fluent AI user who does not catch fabrications and does not flag downstream ethics concerns. Candidate B is a sharp critic of AI output who is operationally slow with the tools. Both have an AICOS of 72.
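The arithmetic behind that example is easy to check. The sketch below uses the scores quoted above and assumes the composite is an unweighted mean of the six axes, which matches the quoted figure of 72 but is our assumption about the weighting; the variable and function names are illustrative only.

```python
# Per-axis scores for the two candidates, as quoted in the text.
candidate_a = {
    "Use and Apply": 88,
    "Generative AI Literacy": 86,
    "Know and Understand": 81,
    "Detect AI": 64,
    "Evaluate and Create": 53,
    "AI Ethics": 60,
}
candidate_b = {
    "Detect AI": 88,
    "Evaluate and Create": 86,
    "AI Ethics": 81,
    "Use and Apply": 64,
    "Generative AI Literacy": 53,
    "Know and Understand": 60,
}


def composite(profile):
    """Unweighted mean across axes (assumed weighting)."""
    return sum(profile.values()) / len(profile)


def axis_gaps(p, q):
    """Absolute per-axis difference between two profiles."""
    return {axis: abs(p[axis] - q[axis]) for axis in p}


gaps = axis_gaps(candidate_a, candidate_b)
print(composite(candidate_a), composite(candidate_b))  # 72.0 72.0
print(max(gaps.values()))  # 33
```

The two composites are identical, while the profiles differ by at least 21 points on every axis and by 33 on two of them. That per-axis spread is the information the average discards.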

For most hiring decisions, these are not interchangeable. A financial-services compliance role would unequivocally prefer Candidate B. A product-marketing role at a startup would unequivocally prefer Candidate A. A senior PM role might prefer a mix neither candidate offers and rule out both.

The composite is a credential. The radar is a decision. Acta surfaces the radar by design.

AI literacy is not a single trait. It is a small bundle of competencies that load weakly on one another. The radar is the decision; the composite is a credential.

The conceptual move, Acta · 2026

What AICOS is not: what Acta adds

AICOS is a validated, objectively scored knowledge-test scale. It measures what a candidate knows about AI and can demonstrate on test items. It is not a behavioral measurement of what a candidate does under task conditions.

That gap is intentional. AICOS was designed as a scale, not as an assessment, and its developers are explicit that behavioral validation against task performance is the next frontier. The scale gives you the conceptual map. It does not give you the test.

Acta's contribution to that gap is to take the AICOS sub-competency map and operationalize it against realistic, demanding work. In practice, that means measuring each sub-competency through a behavioral signal generated under task conditions, rather than through a static item inventory. The mapping is direct:

  • Detect AI is measured by whether the candidate catches the mistakes that matter when the AI gets something wrong.
  • Evaluate and Create AI is measured by the candidate's accept-reject decisions on every AI output, plus the quality of any re-prompts they issue.
  • Use and Apply AI is measured by prompt specificity, iteration efficiency, and tool-appropriateness.
  • AI Ethics is measured by how the candidate handles the ethics-sensitive moments in the work (for example, an AI claim that would not survive a compliance check). This signal also feeds the ethics-override composite, which is reported separately.
  • Know and Understand AI is the one sub-competency Acta inherits directly from the AICOS short form during the ValidationArtifact retest pass, because it is a knowledge claim, not a behavioral claim.
  • Generative AI Literacy is derived from the other signals, leaning mainly on Use and Apply and Detect AI.
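One way to picture the mapping above is as a small lookup table. The sketch below is an illustration only: the class name, the `source` labels, and every signal identifier are hypothetical names invented here, not Acta's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AxisMeasurement:
    axis: str       # AICOS sub-competency
    source: str     # "behavioral", "aicos_short_form", or "derived" (hypothetical labels)
    signals: tuple  # hypothetical signal names, or the axes a derived axis leans on


AXIS_MAP = (
    AxisMeasurement("Detect AI", "behavioral", ("caught_material_ai_errors",)),
    AxisMeasurement("Evaluate and Create AI", "behavioral",
                    ("accept_reject_decisions", "reprompt_quality")),
    AxisMeasurement("Use and Apply AI", "behavioral",
                    ("prompt_specificity", "iteration_efficiency",
                     "tool_appropriateness")),
    AxisMeasurement("AI Ethics", "behavioral",
                    ("ethics_sensitive_moment_handling",)),
    AxisMeasurement("Know and Understand AI", "aicos_short_form",
                    ("short_form_knowledge_items",)),
    AxisMeasurement("Generative AI Literacy", "derived",
                    ("Use and Apply AI", "Detect AI")),
)


def axes_by_source(source):
    """List the axes measured through a given kind of evidence."""
    return [m.axis for m in AXIS_MAP if m.source == source]


print(axes_by_source("behavioral"))
```

Keeping the evidence type explicit per axis makes it visible that four of the six axes rest on behavioral signals, one on a knowledge instrument, and one on a combination of the others.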

The composite Acta reports, the Acta score, draws together how the candidate performed across the sub-competencies in the AICOS map. The radar is the candidate's profile against each axis.

How we validate that the test measures the thing

A scale, on its own, is not enough. A scale that has not been validated against the construct it claims to measure is a list of items. The validation work is what makes the scale a measurement.

AICOS itself was validated through four steps in the original paper: factor analysis to confirm the six-sub-competency structure, internal-consistency testing per sub-scale, test-retest reliability over a two-week interval (r=0.79 across the full scale), and convergent validity against MAILS and against a knowledge test of AI fundamentals.
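A test-retest figure such as r = 0.79 is a correlation between the same people's scores at two time points. The sketch below computes a Pearson correlation from scratch, assuming that is the coefficient being reported; the two score lists are made-up values, not data from the AICOS paper.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between paired score lists x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squared deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary in both sessions")
    return sxy / sqrt(sxx * syy)


# Hypothetical scale scores for six people, two weeks apart.
session_1 = [61, 74, 58, 88, 70, 66]
session_2 = [65, 70, 60, 85, 75, 62]
print(round(pearson_r(session_1, session_2), 2))
```

A high value says that people keep roughly the same rank order between sessions, which is what a stable measurement should show over a short interval.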

An assessment built on top of AICOS has to do the same work for its own predictions. The Acta ValidationArtifact table, described in detail on the methodology page, collects four things on every session that contributes to validation:

  1. AICOS short-form scores for the candidate, gathered at consent. These let us correlate the Acta behavioral signal with the published AICOS instrument across cohorts.
  2. MAILS scores for a sub-sample, used to validate convergent construct validity with the AICOS predecessor scale.
  3. Test-retest data on the small fraction of candidates who run the same scenario twice, which give internal estimates of test-retest reliability at the Acta-score level and at the sub-metric level.
  4. Protected-class self-identification under New York City Local Law 144, supporting four-fifths bias audits and the broader fairness-of-prediction question that any hiring tool now has to answer.

The point of collecting all four is not credentialing. It is that we will be wrong about some claim our scoring makes, and the only way to find out which claim is by running the audits (across cohorts, across scenarios, across time) and then walking the scoring function back until the claim holds. The platform that cannot run those audits cannot improve its scoring without guessing.
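As an illustration of the four-fifths check mentioned in item 4, here is a minimal sketch of an impact-ratio calculation: each group's selection rate divided by the highest group's rate, flagged when the ratio falls below 0.8. The group labels and counts are invented, the function names are ours, and a real audit under Local Law 144 involves category definitions and reporting requirements that this sketch does not cover.

```python
def impact_ratios(selected, assessed):
    """Selection rate of each group divided by the highest group's rate."""
    rates = {group: selected[group] / assessed[group] for group in assessed}
    top_rate = max(rates.values())
    if top_rate == 0:
        raise ValueError("no group has any selections")
    return {group: rate / top_rate for group, rate in rates.items()}


def below_four_fifths(ratios, threshold=0.8):
    """Groups whose impact ratio falls under the four-fifths threshold."""
    return sorted(group for group, ratio in ratios.items() if ratio < threshold)


# Hypothetical counts: candidates assessed and candidates advanced per group.
assessed = {"group_x": 100, "group_y": 100, "group_z": 50}
selected = {"group_x": 50, "group_y": 30, "group_z": 22}

ratios = impact_ratios(selected, assessed)
print(below_four_fifths(ratios))  # ['group_y']
```

In this invented example group_y advances at 30% against a top rate of 50%, an impact ratio of 0.6, so it is flagged; group_z at 44% has a ratio of 0.88 and is not.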

The shape of the answer, six months on

What does an AICOS-anchored Acta score look like in practice?

The most common pattern we see in early data is a fully saturated Use and Apply axis with a substantially weaker Detect AI axis: the candidate who can prompt fluently and iterate fast but does not catch fabricated citations or SOX-materiality misreads. This is the candidate who looks impressive in an unstructured live demo and then produces a confident, well-formatted report that goes to the board with the wrong number.

The next most common pattern, especially among candidates with five-plus years of experience, is strong Detect AI and AI Ethics axes with a weaker Use and Apply axis. This is the candidate who pushes back appropriately, asks for sources, and gets the work right, but slowly. The radar is informative; the composite is misleading.

The least common but most-prized pattern: balanced across all six, with no axis below the 60th percentile. These candidates are rare and the gap between them and the next-most-balanced candidate in a hiring pool is consistently large.

That gap is the thing the AICOS-anchored test is built to find.

What this article does and does not commit to

This article commits to one claim: any test of AI literacy that is not anchored against a published competency framework is asking a question to which the field has not agreed an answer. The Acta scoring rubric is anchored against AICOS not because AICOS is the last word (it explicitly is not) but because anchoring against a published instrument is the only way to talk about construct validity in the first place.

This article does not commit to AICOS being the right framework forever. It is the strongest published framework as of mid-2026. If the field converges on a different scale (and the funding lines suggest several teams are working toward exactly that), the right move will be to re-anchor against the new scale and rerun the validation. The architectural commitment is to a framework. The framework commitment is replaceable.

In the next article, we look at the one composite that does not fit into AICOS cleanly (calibrated trust) and why we report it separately rather than rolling it into the radar.

References

  1. Markus, J., Carolus, A., & Wienrich, C. Objective Measurement of AI Literacy: Development and Validation of the AI Competency Objective Scale (AICOS). arXiv:2503.12921, 2025.
  2. Carolus, A., Koch, M., Straka, S., Latoschik, M. E., & Wienrich, C. MAILS, Meta AI Literacy Scale: Development and testing of an AI literacy questionnaire. Computers in Human Behavior: Artificial Humans, 1, 100014, 2023.
  3. Long, D., & Magerko, B. What is AI Literacy? Competencies and Design Considerations. CHI ’20: Proceedings of the 2020 CHI Conference on Human Factors, 2020.
  4. New York City Local Law 144, Automated Employment Decision Tools. NYC Department of Consumer and Worker Protection, 2023.