Adding a new theory to cvc5

Table of Contents


cvc5 is an SMT solver with support for many theories: integers, reals, bit-vectors, algebraic datatypes, separation logic, sets, bags, and more. This means that is can find satisfying solutions for formulas involving these kinds of objects.

In this post, I explain the process of adding a new theory to cvc5: the theory of prime-order finite fields. This post serves as a record of the patches I made to cvc5 to add this theory, and why I made them. This post is not likely to be very comprehensible to someone who doesn’t know both how to use an SMT solver and a bit about how they are implemented. For a tutorial on how to use SMT solvers, see “Programming Z3”. For information about how they are implemented, the textbook “A Handbook of Satifiability” is a good place to start.

Finite Fields

The theory we’d like to add is that of finite fields. A finite field is a finite set of values with $\times$ and $+$ operations which respect the usual properties: associativity, inverses, commutativity, and distributivity.

Our focus will be on prime fields, which can be represented as integers modulo a fixed prime $p$, written $\mathbb{F}_p$ or $\mathbb{F}$ for brevity.

So, we’d like cvc5 to be able to determine things like this:

Our first task is to identify the sorts, operators, predicates, and constants for this theory:

Concrete Values

We begin by creating a concrete value type for prime field elements. This is just adding a file & class to /util. Roughly, the class looks like this:

class FiniteFieldValue {
  FiniteFieldValue(const Integer& val, const Integer& size);
  const Integer& getValue();
  const Integer& getFieldSize();
  friend bool operator==(const FiniteFieldValue&, const FiniteFieldValue&);
  friend bool operator!=(const FiniteFieldValue&, const FiniteFieldValue&);
  friend bool operator<(const FiniteFieldValue&, const FiniteFieldValue&);
  friend bool operator>(const FiniteFieldValue&, const FiniteFieldValue&);
  friend bool operator<=(const FiniteFieldValue&, const FiniteFieldValue&);
  friend bool operator>=(const FiniteFieldValue&, const FiniteFieldValue&);
  friend FiniteFieldValue operator+(const FiniteFieldValue&, const FiniteFieldValue&);
  friend FiniteFieldValue operator-(const FiniteFieldValue&, const FiniteFieldValue&);
  friend FiniteFieldValue operator-(const FiniteFieldValue&);
  friend FiniteFieldValue operator*(const FiniteFieldValue&, const FiniteFieldValue&);
  friend FiniteFieldValue operator/(const FiniteFieldValue&, const FiniteFieldValue&);
  FiniteFieldValue recip() const;
  // ...

The PR that creates the class is here. In that PR, it’s name is FiniteField, which gets changed to FiniteFieldValue later. Note that Integer is cvc5’s internal multiprecision integer type.

Theory Boilerplate

Next, we add boilerplate that defines a new theory in cvc5. The PR is here.

This boilerplate is organized around a shell-script called the “kinds file”. It declares basic theory and term utilities.

Term utilities include:

Each theory utility is materialized as a class. We’ll discuss them in more detail in a moment.

The kinds file itself is a shell script that uses a number of cvc5-defined shell functions to declare all of the above information. So, essentially the file contains a bunch of pointers to difference C++ classes that implement different parts of the theory. Below, there is a rendering of the file with additional comments (and some omissions).

theory THEORY_FF \
    ::cvc5::internal::theory::ff::TheoryFiniteFields \
typechecker "theory/ff/theory_ff_type_rules.h"
rewriter ::cvc5::internal::theory::ff::TheoryFiniteFieldsRewriter \

# declare a kind (FINITE_FIELD_TYPE) for the field sort itself
  struct \
  FfSize \
  "::cvc5::internal::FfSizeHashFunction" \
  "util/ff_val.h" \
  "finite field type"

# meta-data about the field sort
cardinality FINITE_FIELD_TYPE \
    "::cvc5::internal::theory::ff::FiniteFieldProperties::computeCardinality(%TYPE%)" \

enumerator FINITE_FIELD_TYPE \
    "::cvc5::internal::theory::ff::FiniteFieldEnumerator" \

well-founded FINITE_FIELD_TYPE \
    true \
    "(*cvc5::internal::theory::TypeEnumerator(%TYPE%))" \

# declare a kind (CONST_FINITE_FIELD) for field values
  class \
  FfVal \
  ::cvc5::internal::FfValHashFunction \
  "util/ff_val.h" \
  "a finite-field constant; payload is an instance of the cvc5::internal::FfVal class"

## declare other ff kinds and typing rules
operator FINITE_FIELD_MULT 2: "multiplication of two or more field elements"
operator FINITE_FIELD_NEG 1 "unary negation of a field element"
operator FINITE_FIELD_ADD 2: "addition of two or more field elements"

typerule CONST_FINITE_FIELD ::cvc5::internal::theory::ff::FiniteFieldConstantTypeRule
typerule FINITE_FIELD_MULT ::cvc5::internal::theory::ff::FiniteFieldFixedFieldTypeRule
typerule FINITE_FIELD_NEG ::cvc5::internal::theory::ff::FiniteFieldFixedFieldTypeRule
typerule FINITE_FIELD_ADD ::cvc5::internal::theory::ff::FiniteFieldFixedFieldTypeRule


The Rewriter

A rewriter extends TheoryRewriter and must implement: preRewrite and postRewrite. For now we implement both trivially, returning RewriteResponse(REWRITE_DONE, original_term).

The Solver

The theory solver extends Theory and is considerably more complex. Again, we implement it trivially for now: in void methods we do nothing, and in methods that return something, we return a default/null value.

One set of non-trivial methods is those that interact with the theory’s equality engine. We create members of of type TheoryInferenceManager and TheoryEqNotifyClass , initialize them, and implement the following:

bool TheoryFiniteFields::needsEqualityEngine(EeSetupInfo& esi)
  esi.d_notify = &d_eqNotify;
  esi.d_name = "theory::ff::ee";
  return true;

void TheoryFiniteFields::finishInit()
  Assert(d_equalityEngine != nullptr);


The first method says that we need an equality engine for out theory. The second registers our function symbols with that engines (so it can do congruence closure).

The type-checker

The type checker is a collection of type checking rules. Each rule is a class with a static, public method called computeType. Here is an example: our type checking rule for addition and multiplication nodes:

TypeNode FiniteFieldFixedFieldTypeRule::computeType(NodeManager* nodeManager,
                                                    TNode n,
                                                    bool check)
  TNode::iterator it = n.begin();
  TypeNode t = (*it).getType(check);
  if (check)
    if (t.getKind() != kind::FINITE_FIELD_TYPE)
      throw TypeCheckingExceptionPrivate(n, "expecting finite-field terms");
    TNode::iterator it_end = n.end();
    for (++it; it != it_end; ++it)
      if ((*it).getType(check) != t)
        throw TypeCheckingExceptionPrivate(
            n, "expecting finite-field terms from the same field");
  return t;

The arguments have the following meaning:

A Real Rewriter

Now, it’s time to implement a non-trivial rewriter. The PR is here. A rewriter is responsible for normalizing and/or simplifying the representation of theory terms. Finite field terms contain negations, additions, and multiplications. For example

$$ xy + (-xy) + 7 + (x + 2)(y + 3) + 2 $$

The goals for our rewriter are:

So we might rewrite the above to

$$ (x + 2)(y + 3) + 9 $$

The SMT-wide rewriter rewrites terms in a downward pass, followed by an upward pass. For each term, it allows the theory for that term to specify a rewrite. A theory specifies a rewriter by implementing a subclass of TheoryRewriter:

class TheoryFiniteFieldsRewriter : public TheoryRewriter
  RewriteResponse postRewrite(TNode node) override;
  RewriteResponse preRewrite(TNode node) override;

The downward pass hook is TheoryRewriter::preRewrite. The upward pass hook is TheoryRewriter::postRewrite. On our downward pass, we eliminate negation. On our upward pass, we normalize $n$-ary operators and combine like terms.


Inside cvc5, a statistics system is used to record and review dynamic information about a solve attempt. For example, there are statistics that record how often each theory reports a conflict, reports a lemma, and propagates a literal. There are statistics that count total node allocations and measure total solve time. And many more.

Pre-emptively, we create an object to hold the finite field solver’s statistics, here.

Theory-Internal Reasoning

At some point, you need to implement the internal logic of your theory. For a simple theory, this means taking a conjunction of theory literals, and determining:

In the case of finite-fields, you can map the literals to set of multivariate polynomials $P$ such that a model corresponds to a common root $\vec x$. Each polynomial corresponds to one field literal; if a subset of polynomials has no common roots, then the corresponding subset of literals is an UNSAT core.

Anyways, our algebraic task is to find a common root for $P$, and if not, attempt to identify a root-free subset of $P$. To do this, we build a few pieces of machinery:

  1. root-finding for a univariate polynomial (PR)
  2. common root-finding for a set of multivariate polynomials (PR)
  3. UNSAT cores for when a set of multivariate polynomials has no roots (PR)
  4. An incremental common-root-finder/UNSAT core builder (PR)
    • Here, the input polynomial set $P$ is incremental
      • you can push new polynomials, pop, and ask for a common root/UNSAT core at any time.

This machinery makes up the core logic of our theory. However, we have not yet connected that logic to the Theory interface.

A Sub-Theory Solver

The theory of finite fields is responsible for fields of all sizes. However, each field is independent of the others: there are no conversion operators between fields, and the only predicate (equality) takes arguments of the same field. Thus, we implement the theory of finite fields by implementing a “sub-theory” that handles the field $\mathbb{F}_p$, where $p$ is fixed.

This class is worth discussing in more detail because its interface foreshadows the larger theory. Here is the public interface:

class SubTheory : protected EnvObj, protected context::ContextNotifyObj
  SubTheory(Env& env, FfStatistics* stats, Integer modulus);
  void preRegisterTerm(TNode term);
  void notifyFact(TNode fact);
  void postCheck(Theory::Effort effort);
  bool inConflict() const;
  const std::vector<Node>& conflict() const;
  const std::unordered_map<Node, Node>& model() const;
  // ...

Now, let’s discuss each method:

We implement our subtheory in this PR, with a few fixes in a later PR. Essentially, our implementation waits for FULL effort, and then runs the common root finder from the previous section.

The Theory Solver

The final step is implementing the theory solver itself and modifying cvc5’s APIs and parser to make the the theory solver accessible to cvc5’s users. The PR is here. This is the PR where we change the name of the class that represents finite field values, to the diff is quite messy. A cleaner (but incomplete) diff can be found here: the initial commit of the PR.

A laundry list of changes that we make:

A Final Change

In the foregoing, we developed a number of FF-internal data structures that were incremental. Specifically, we built data structures for incrementally building a set of polynomials, finding their common roots, and blaming the lack of common roots on a subset. However, in our implementation of the sub-theory, we didn’t actually use these data structures very incrementally: we pushed polynomials to them incrementally (as we we passed new FF facts), but we only triggered a search for common roots at FULL effort. Our choice to not use incrementality was guided by experiments that showed it didn’t improve performance. In a later PR, we made the data structures non-incremental, which simplified the codebase significantly and also improved performance a bit. It didn’t affect the interface between the theory of finite fields and the rest of the SMT solver at all.

To give some more detail, before this patch, the core data structure was an incremental Ideal:

class IncrementalIdeal : EnvObj
  IncrementalIdeal(Env& env, CoCoA::ring polyRing);
  // Add new generators
  void pushGenerators(std::vector<CoCoA::RingElem>&& gens);
  // Remove the last batch of generators
  void pop();
  // Is the ideal the whole ring?
  bool idealIsTrivial();
  // For a trivial ideal, return a (sub)list of generator indices that generate it.
  const std::vector<size_t>& trivialCoreIndices();
  // For a trivial ideal, return a (sub)list of generators that generate it.
  std::vector<CoCoA::RingElem> trivialCore();
  // For a non-trivial idea, check whether there is a base-field variety member.
  bool hasSolution();
  // For a non-trivial idea with a base-field variety member, get it.
  const std::vector<CoCoA::RingElem>& solution();
  // ...

After this patch, that data structure was eliminated, and replaced by a single function that attempted to find a zero for a set of polynomials: returning either a zero or a subset of the polynomials that had no zero.