Sophia
A probabilistic prior authorization agent for a US health insurer
The result
Decision times fell 54% (to 2.8 days), appeal overturn rates dropped from 41% to 23%, and the system now delivers a net annual return of $18.9–20.9 million against $3.8 million in operating costs. The initial $6.2 million in build investment was repaid in under 5 months.
The full details
A regional US health insurer handling 1.4 million prior authorization requests annually asked us to address a specific challenge. Prior authorization (PA) is the process by which a doctor must obtain the insurer’s approval before a treatment or medication will be covered. The brief came with specific targets: reduce the average time to a coverage decision by 40%, without increasing the rate at which decisions are overturned on appeal, and without running afoul of the growing body of regulation around AI in insurance.
What we developed was Sophia, a probabilistic triage and decision-support agent that operates between a doctor’s submission and the clinical reviewer responsible for approval. Sophia neither approves nor denies, and this distinction extends beyond regulatory considerations. She reviews the clinical package, builds a calibrated probability distribution of likely outcomes, and directs cases to the appropriate review level based on that distribution’s shape. The quality of a coverage decision depends almost entirely on whether the correct reviewer examines the case at the right time with the right information, yet most approval workflows fall short on all three.
This is the full build story, including the technology choices we made, the ones we rejected, the regulatory constraints that shaped the architecture, and an honest accounting of where the system works well and where it still has sharp edges.
The problem in numbers
The insurer, a Blues Plan covering 2.1 million members across four states, was managing prior authorizations through a workflow that had become stagnant after more than a decade of incremental patching. Doctors submitted requests via a web portal, by fax, or over the phone. A first-level intake team categorized each request by service type, entered it into the approval platform, and routed it to a pool of clinical reviewers—nurses and physicians who assessed whether the requested treatment met the insurer’s coverage criteria.
The issues were structural rather than motivational, which made them both easier to identify and more difficult to resolve. The average time from submission to decision for non-urgent PAs was 6.2 business days, compared to a target of three days. The overturn rate on first-level appeals was 41%, meaning nearly half of all denied cases were reversed upon review by a second reviewer. Additionally, 62% of requests that were ultimately approved had been examined by a clinical reviewer who, in hindsight, did not need to see them, as the clinical evidence already clearly met the published coverage criteria.
That last number is the economic core of the project. If you can reliably identify the 62% of cases that are straightforward approvals and route them through an accelerated pathway, you free up reviewer capacity for the 38% that genuinely need clinical judgment. The arithmetic is simple, but the execution required us to solve several problems that nobody in health insurance technology has solved cleanly.
Architecture overview
Sophia has four layers, each with distinct technology choices and failure modes.
The intake layer manages document ingestion from submission channels, converts all data into a standardized clinical package, and extracts structured data from unstructured sources, which is where the language model operates.
The probabilistic engine takes the structured clinical package and generates a probability distribution over coverage outcomes. This is a Bayesian network (a type of statistical model that illustrates how different pieces of evidence relate to each other and to the final decision), not a language model.
The routing layer interprets the shape of that probability distribution and assigns each case to one of four pathways, each with different turnaround targets and reviewer qualification requirements.
The monitoring and calibration layer continuously compares Sophia’s predicted distributions with observed outcomes (including appeal results) and triggers recalibration when predictions drift from actual outcomes.
Each of these warrants a detailed examination of the technology choices and trade-offs involved.
The intake layer
Document ingestion
Prior authorization submissions arrive in at least five formats, which partly explains why PA processing is costly from the outset. Portal-submitted requests are formatted as structured digital health records using the FHIR standard, a modern data format for exchanging medical information. Faxed submissions come as scanned PDFs, often rotated, occasionally handwritten, and almost always accompanied by clinical notes from the doctor’s records system that have been printed and re-scanned at a resolution suggesting the fax machine is about to go on strike. Phone submissions are transcribed by the intake team into the approval platform, transfers between insurers occur via electronic transaction standards, and a significant number of requests arrive as unstructured email attachments.
We employed Amazon Textract for optical character recognition (converting scanned images into readable text) on faxed documents, with a confidence threshold of 0.87, below which the document is flagged for human review rather than automated processing. The decision to use Textract instead of Google’s equivalent was driven by the insurer’s existing AWS footprint; the real advantage was the pre-existing infrastructure and procurement relationship that sped up licensing, not any significant difference in quality. Both services perform similarly on clean medical documents. The difference shows up on degraded fax images, where Google’s model has a slight edge on handwritten annotations, but since only around 8% of fax submissions are genuinely degraded, that edge did not justify migrating the insurer’s entire cloud infrastructure.
Clinical data extraction
This is the component where the system gains or loses trust. We use Claude Sonnet via the Anthropic API, with a prompt chain that extracts structured clinical elements from the normalized document package.
The extraction targets include diagnosis codes (standardized codes that identify the patient’s condition), procedure codes (standardized codes that specify the treatment the doctor requests), relevant lab values with dates, imaging study results, medication history, prior treatment attempts, and the treating physician’s clinical rationale for the requested service. Each extracted element has an explicit confidence score, and any element with a confidence below 0.82 is flagged for human review rather than passed downstream.
We rejected the idea of a single end-to-end extraction prompt because monolithic prompts tend to accumulate errors in ways that are difficult to diagnose.
The chain operates in three stages, each with a distinct extraction target and failure mode:
The first stage identifies the document type and extracts request metadata (such as the requested service, the requesting doctor, and the patient).
The second stage extracts the clinical evidence, starting with the most structured sources (lab reports, imaging results) and progressing to the least structured (doctor-narrated clinical notes).
The third stage performs a consistency check, looking for contradictions between the extracted elements, such as a diagnosis code that does not match the clinical narrative, a requested procedure unsupported by the documented diagnosis, or lab values that contradict the stated treatment rationale.
The consistency check stage addresses the most dangerous failure mode in automated PA processing—hallucinated clinical facts. A language model that “fills in” a missing lab value because the clinical context suggests a particular value is not extracting data but fabricating it. The process flags any extracted element that lacks a traceable source in the original documents. In production, about 3.4% of clinical extractions trigger a flag at this stage, and 71% of these are genuine documentation gaps rather than extraction errors, indicating that the system highlights real issues rather than creating false ones.
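The traceability flag in the consistency stage is simple to state precisely. Below is a minimal sketch of the idea, with hypothetical names and toy documents; the production check also handles formatting variants, units, and OCR noise rather than relying on exact string matching as this illustration does.

```python
# Hypothetical sketch of the stage-three consistency check: every extracted
# element must be traceable to a literal span in its source document, and
# elements below the 0.82 confidence floor are flagged for human review.
# All names and example data here are illustrative, not production code.
from dataclasses import dataclass

CONFIDENCE_FLOOR = 0.82

@dataclass
class ExtractedElement:
    name: str          # e.g. "hemoglobin_a1c"
    value: str         # extracted value as it should appear in the source
    confidence: float  # model-reported extraction confidence
    source_doc: str    # text of the document it was extracted from

def flag_for_review(elements):
    """Return elements needing human review: low confidence, or no
    verbatim support in the source document (possible hallucination)."""
    flagged = []
    for el in elements:
        if el.confidence < CONFIDENCE_FLOOR:
            flagged.append((el.name, "low_confidence"))
        elif el.value not in el.source_doc:
            flagged.append((el.name, "no_source_span"))
    return flagged

elements = [
    ExtractedElement("a1c", "8.2%", 0.95, "Labs 3/14: A1c 8.2% (H)"),
    ExtractedElement("ldl", "190", 0.95, "Lipid panel pending"),   # unsupported
    ExtractedElement("egfr", "54", 0.79, "eGFR 54 mL/min/1.73m2"), # low confidence
]
print(flag_for_review(elements))  # → [('ldl', 'no_source_span'), ('egfr', 'low_confidence')]
```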
Technology tradeoffs at the intake layer
Why not an open-source model? We evaluated two leading open-source alternatives for the clinical extraction task. Both performed within 5-7% of Claude Sonnet on structured extraction accuracy in our evaluation set. The gap widened on two specific tasks: interpreting handwritten physician notes (where Claude’s vision capabilities provided a 12-point accuracy advantage) and identifying the implied clinical rationale in narrative notes where the physician did not explicitly state why they were requesting the service. In a production healthcare system where extraction errors propagate into coverage decisions, we judged the accuracy gap to be too large to accept for the cost savings.
Why not a dedicated medical AI model? Two reasons, both practical rather than theoretical. First, the leading medical-specific models were not generally available via API for production healthcare applications at the time of development, and building on a model with uncertain availability is a poor basis for a system your client expects to operate for five or more years. Second, our task is extraction and structuring, not answering medical questions. A general-purpose model with strong instruction-following and extraction abilities outperforms a domain-specific model optimized for medical question-answering on our particular task.
Why choose a managed OCR service over creating your own? The build-versus-buy decision mainly relates to the maintenance burden. A customized text-recognition system tailored for medical documents would have required a dedicated team to retrain it whenever a doctor’s office changed its records system or print format. Amazon’s managed service shifts that maintenance burden onto them, and the cost difference ($0.015 per page compared to about $0.003 per page for a self-hosted alternative) is minimal compared to the operational savings.
The probabilistic engine
This is the intellectual center of Sophia, and the element that distinguishes it from dozens of “AI-powered PA” products that are merely rules engines with a language model attached for document parsing.
Why a probability distribution rather than a simple score
The natural approach is to frame this as a classification problem. Take the extracted clinical package, run it through a machine learning model, produce a single number representing the probability of approval, and route it above a threshold while escalating below it.
This approach fails for PA triage because a single number throws away the shape information that routing depends on. Two cases may both show a 72% probability of approval, but if one has a tight distribution (the clinical evidence clearly meets four of five criteria, and the fifth is borderline but well-documented) while the other is broad and bimodal (the evidence either fully supports the request or not at all, depending on how an ambiguous imaging study is interpreted), they need different reviewers with different expertise and different amounts of time.
Our Bayesian network retains the full probability distribution, structured as a directed graph with three node types that directly mirror how coverage decisions are made.
Evidence nodes represent structured clinical elements extracted by the intake layer, each with a probability distribution over potential states (a lab value might be normal, abnormal, or missing; an imaging study might be consistent with the diagnosis, inconsistent, or ambiguous; prior treatments might be attempted, not attempted, or documented as failed).
Criteria nodes denote individual medical-necessity criteria derived from the insurer’s coverage policies that decide whether a service should be approved for a specific diagnosis. Each criterion node has a conditional probability distribution based on evidence input, reflecting the historical link between evidence patterns and reviewers’ decisions for that criterion.
The outcome node consolidates information from all criteria nodes to produce the final probability distribution over the coverage decision.
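To make the three node types concrete, here is a deliberately tiny, hand-rolled illustration with made-up probabilities for a single step-therapy criterion. The production engine runs in PyMC over the full policy graph and models dependence between criteria; this sketch only shows how a criterion node’s conditional probability table turns an evidence distribution into a probability of the criterion being met, and how an outcome node aggregates criteria.

```python
# Evidence node: distribution over states, as produced by the intake layer
prior_therapy = {"failed": 0.7, "not_attempted": 0.2, "missing": 0.1}

# Criterion CPT: P(criterion met | evidence state), in production learned
# from the historical decision data (these numbers are invented)
cpt_step_therapy = {"failed": 0.95, "not_attempted": 0.15, "missing": 0.40}

def p_criterion_met(evidence_dist, cpt):
    # Marginalize the criterion over the evidence states
    return sum(p * cpt[state] for state, p in evidence_dist.items())

def p_approval(criteria_probs):
    # Toy outcome node: approval requires every criterion to be met,
    # treated as independent (the real network models their dependence)
    out = 1.0
    for p in criteria_probs:
        out *= p
    return out

p_step = p_criterion_met(prior_therapy, cpt_step_therapy)
print(round(p_step, 3))               # → 0.735
print(p_approval([p_step, 0.9]))      # ≈ 0.6615, combined with a 2nd criterion
```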
Training and calibration
The network was trained on 847,000 historical prior authorization decisions spanning three years, including 112,000 cases that underwent at least one level of appeal. We trained on the full decision process, not just the initial determination, and that choice shapes both the design and its implications.
Initial reviewer decisions display systematic biases that accumulate over time. Individual reviewers develop characteristic patterns, seasonal workload variations can reduce thoroughness, and certain diagnoses carry implicit biases (mental health PAs have historically been denied at higher rates than comparable medical PAs, a disparity actively targeted by federal and state regulators). By incorporating appeal outcomes, the model learns from cases where the initial decision was wrong, pulling the probability distributions toward calibration against the “correct” outcome rather than the typical one.
Calibration means that the model’s confidence aligns with actual outcomes. If Sophia assigns an 80-85% probability of approval to a set of cases, then 80-85% should be approved (including through appeal). We measure this using a standard metric called expected calibration error, stratified by service type (hospital admissions, outpatient procedures, pharmacy, medical equipment, behavioral health). Our overall calibration error is 0.034 across categories, with the largest gap in behavioral health (0.061), reflecting rapid policy changes in mental health coverage.
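For readers who want the metric pinned down: expected calibration error bins predictions by confidence and averages the gap between predicted and observed approval rates, weighted by bin size. A minimal equal-width-bin version follows (the production monitor computes the same statistic stratified by service category; the bin count here is the conventional default, not our exact configuration).

```python
# Equal-width-bin expected calibration error (ECE).
def expected_calibration_error(probs, outcomes, n_bins=10):
    """probs: predicted approval probabilities; outcomes: 1 if approved
    (including via appeal), else 0."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(p for p, _ in b) / len(b)   # mean predicted probability
        frac_pos = sum(y for _, y in b) / len(b)   # observed approval rate
        ece += (len(b) / n) * abs(avg_conf - frac_pos)
    return ece

# A perfectly calibrated toy sample: 80% of the 0.8-confidence cases approved,
# so the ECE is essentially zero (up to floating-point error)
probs = [0.8] * 10
outcomes = [1] * 8 + [0] * 2
print(expected_calibration_error(probs, outcomes))
```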
The non-representative sample problem
Training on appeal outcomes introduces a bias that we have mitigated but not eliminated, and it is important to be precise about why. Patients and doctors who appeal denied decisions are not a random sample of all denied cases. They tend to be sicker patients, from better-resourced medical practices, or patients with employer-sponsored coverage (who have more sophisticated benefit advocates), and cases involving treatments with strong evidence bases, where the initial denial is most likely to be overturned.
This means that the model’s predictions for cases that look “likely to be appealed” are better calibrated than for cases that look “unlikely to be appealed.” We partly address this using a statistical technique called inverse propensity weighting: we build a separate model to estimate the probability that a denied case would be appealed and weight the appeal outcomes accordingly. Cases with low appeal likelihood but successful appeals receive higher weights in calibration, on the logic that those outcomes carry more information about the true approval probability.
The correction is not perfect because the weighting model itself has uncertainty, and correcting one bias with another model that has its own biases presents a recursive challenge that statistical methods manage well but never entirely resolve. In practice, this correction reduced the calibration gap between the “likely to appeal” and “unlikely to appeal” case groups from 8.7 percentage points to 3.1 percentage points.
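The weighting itself is a one-liner once the appeal-propensity model exists. A sketch with stand-in propensity scores; the clipping constant is illustrative, chosen to cap the variance that a few near-zero propensities would otherwise introduce.

```python
# Inverse propensity weights for appealed-and-resolved denials: weight each
# appeal outcome by 1 / P(appeal), so outcomes from unlikely appealers count
# for more in calibration. The clip value is an illustrative assumption.
def ipw_weights(appeal_propensities, clip=20.0):
    """Clipped inverse-propensity weights for a list of P(appeal) scores."""
    return [min(1.0 / p, clip) for p in appeal_propensities]

# Two overturned denials: one from a well-resourced practice (likely
# appealer), one that was unlikely to be appealed at all
propensities = [0.8, 0.04]
print(ipw_weights(propensities))  # → [1.25, 20.0] (second clipped from 25.0)
```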
Technology choices for the probabilistic engine
We built the Bayesian network using PyMC, a Python-based probabilistic programming framework, with the network structure defined based on the insurer’s coverage policy hierarchy and refined through structural learning on the historical dataset.
Why not a deep learning approach? Two reasons in practice. The first is explainability: the graph structure of a Bayesian network directly mirrors the coverage criteria, so a reviewer can see exactly which pieces of evidence move the probability and interrogate the reasoning. A neural network that outputs a 68% approval probability offers the reviewer no such insight. The second is data sparsity within subgroups: although the overall dataset is large, conditioning on specific treatment types, diagnoses, and evidence patterns shrinks effective sample sizes quickly, and Bayesian methods handle small-sample inference more gracefully than deep learning, which tends to overfit or underfit when data is limited.
The routing layer
Sophia routes each case into one of four pathways based on the shape of the probability distribution, not just its central value.
Auto-route to approval triggers when the probability of meeting coverage criteria exceeds 0.92, the distribution is tightly concentrated (indicating high confidence), and the case does not involve an experimental therapy, off-label drug use, or a service category flagged for mandatory physician review by state regulation. Approximately 34% of all PA requests route through this pathway, compared with the 62% that retrospective analysis identified as straightforward approvals. The gap between 34% and 62% is deliberate conservatism, and it was among the most debated design decisions in the project. We set the threshold to favor precision over volume, accepting that some straightforward cases will still reach a reviewer rather than risk auto-routing a case that should have been examined.
Standard review is the default pathway for cases with moderate approval probability (0.55-0.92) and moderate confidence spread, covering about 41% of volume. These go to the general clinical reviewer pool with a structured brief generated by Sophia that highlights which criteria the case clearly meets, which are borderline, and what further documentation, if any, might resolve the borderline criteria.
Senior clinical review is triggered when the probability distribution is bimodal or highly skewed, indicating that the evidence supports contradictory conclusions depending on clinical interpretation. These cases make up about 18% of the volume, and here the clinical judgment of an experienced reviewer is most crucial. Such cases are assigned to senior reviewers with subspecialty expertise matching the service category, and Sophia’s brief highlights the interpretive disagreement rather than suggesting a specific direction.
Medical director escalation pertains to novel therapies, cases lacking clear coverage policy criteria, requests involving experimental designations, and any case in which Sophia’s extraction confidence falls below minimum thresholds for critical clinical elements. Approximately 7% of cases are routed here, with the medical director’s office treating these as both individual decisions and inputs for policy development (if Sophia continually flags the same type of request as lacking criteria, it indicates the coverage policy needs updating).
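A hypothetical rendering of the four-pathway rules, with the posterior summarized as a mean, a credible-interval width, and a bimodality flag. The real router inspects the full distribution, and the exclusion conditions are richer than two booleans; the interval-width cutoff here is an assumption for illustration.

```python
# Sketch of the four-pathway routing rules over distribution-shape summaries.
# Thresholds 0.92 (auto-route) come from the text; the ci_width cutoff is
# an illustrative stand-in for "tightly concentrated".
def route(p_approve, ci_width, bimodal, experimental=False,
          low_extraction_confidence=False):
    if experimental or low_extraction_confidence:
        return "medical_director"       # novel therapy or shaky extraction
    if bimodal:
        return "senior_review"          # evidence supports opposite readings
    if p_approve > 0.92 and ci_width < 0.10:
        return "auto_route"             # tight, high-probability approvals only
    return "standard_review"            # default pathway with structured brief

print(route(0.96, 0.05, bimodal=False))                     # → auto_route
print(route(0.72, 0.12, bimodal=False))                     # → standard_review
print(route(0.72, 0.30, bimodal=True))                      # → senior_review
print(route(0.96, 0.05, bimodal=False, experimental=True))  # → medical_director
```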
Routing thresholds and the false economy of optimization
We deliberately chose not to optimize routing thresholds to minimize reviewer time. The obvious approach would be to adjust thresholds so that fewer cases reach reviewers, lowering the auto-route threshold and raising the escalation threshold.
We rejected this because the objective function you optimize against encodes your values; minimizing reviewer hours per decision prioritizes efficiency over accuracy at the critical margin. Instead, thresholds were set based on acceptable error rates, which yields a more balanced and defensible system. The auto-route threshold was calibrated so that the estimated false-approval rate (cases auto-approved that a competent reviewer would have denied) remains below 2%, and the escalation threshold was set so that the estimated missed-escalation rate (cases requiring senior review but sent to the general pool) stays under 5%.
This cost the insurer an estimated $1.2 million per year in reviewer hours beyond what the efficiency-optimized thresholds would have produced. We recommended accepting this cost, and the insurer’s chief medical officer agreed, on the grounds that an AI system that auto-approves cases it should not is a regulatory and reputational time bomb.
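To illustrate error-rate-constrained threshold setting rather than efficiency optimization: the sketch below scans candidate auto-route thresholds on a labeled holdout set and returns the lowest one whose estimated false-approval rate stays within tolerance. The function name, the candidate grid, and the toy holdout data are all illustrative.

```python
# Pick the auto-route threshold from an error-rate constraint: the lowest
# candidate whose false-approval rate among auto-routed holdout cases is
# at most max_false_approval (2% in the text).
def pick_threshold(scored_holdout, max_false_approval=0.02):
    """scored_holdout: (p_approve, reviewer_would_approve) pairs."""
    for i in range(50, 100):            # candidate thresholds 0.50 .. 0.99
        t = i / 100
        auto = [ok for p, ok in scored_holdout if p > t]
        if auto and (1 - sum(auto) / len(auto)) <= max_false_approval:
            return t
    return None                          # no threshold satisfies the constraint

# Toy holdout: the 0.97-scored cases are all genuine approvals; the 0.95 and
# 0.80 bands each contain some cases a reviewer would have denied
holdout = ([(0.97, 1)] * 100 + [(0.95, 1)] * 95 + [(0.95, 0)] * 5
           + [(0.80, 1)] * 70 + [(0.80, 0)] * 30)
print(pick_threshold(holdout))  # → 0.95
```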
Regulatory and compliance architecture
Healthcare AI in the US operates within a regulatory environment that is tightening faster than most technology teams realize, and Sophia’s architecture was shaped by that environment from the very first design document.
State-level AI disclosure requirements
California, Illinois, New York, and Colorado have all enacted or are considering legislation requiring insurers to disclose when AI or algorithmic tools are used in coverage decisions. The specific requirements vary, but the common thread is that the member must be informed, the reasoning must be explainable, and a human must be responsible for the final decision.
We designed Sophia around these requirements as strict constraints, not afterthoughts, and three design choices directly stem from the regulatory environment.
First, Sophia never makes a coverage decision but only routes and recommends. The human reviewer is the decision-maker in all cases, including the auto-route pathway (where a supervising clinical reviewer audits a random 10% sample of auto-routed cases daily, with full authority to pull any case back into manual review).
Second, every Sophia recommendation includes a plain-language explanation generated from the Bayesian network’s evidence-to-criteria mapping. The explanation determines whether the clinical evidence strongly supports, partially supports, or does not clearly support the coverage criteria, cites the specific evidence linked to specific criteria, and states the model’s confidence level along with factors influencing uncertainty. This is not a language model generating a rationale after the fact but a deterministic rendering of the actual statistical reasoning, making it auditable and reproducible.
Third, all model outputs, evidence extractions, confidence scores, routing decisions, and reviewer overrides are logged in a tamper-proof audit store (AWS QLDB, a ledger database where records cannot be altered after the fact) with a retention period exceeding the longest statute of limitations for coverage disputes in any of the four operating states (seven years, in New York).
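The second design choice, the deterministic explanation, can be sketched as template rendering over the evidence-to-criteria mapping. The cutoffs and wording below are placeholders; the point is that identical inputs always yield identical text, with no language model in the loop.

```python
# Deterministic explanation rendering: support labels are chosen by fixed
# cutoffs on the criterion probability, and the text cites the specific
# evidence. Cutoffs and phrasing are illustrative placeholders.
SUPPORT_LABELS = [(0.85, "strongly supports"),
                  (0.55, "partially supports"),
                  (0.0,  "does not clearly support")]

def render_explanation(criteria):
    """criteria: list of (criterion_name, evidence_citation, p_met)."""
    lines = []
    for name, citation, p in criteria:
        label = next(lbl for cutoff, lbl in SUPPORT_LABELS if p >= cutoff)
        lines.append(f"The evidence ({citation}) {label} the criterion "
                     f"'{name}' (confidence {p:.0%}).")
    return "\n".join(lines)

print(render_explanation([
    ("documented step therapy", "pharmacy fill history, 2023-2024", 0.93),
    ("imaging consistent with diagnosis", "MRI report 2024-05-02", 0.61),
]))
```

Because the renderer is a pure function of the network’s outputs, the same case replayed from the audit log reproduces the exact explanation the member received, which is what the disclosure laws effectively require.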
Patient data protection
The clinical data flowing through Sophia is protected health information under HIPAA (the federal law governing medical data privacy), which constrains the architecture in specific ways. The probabilistic engine and routing layer run in an isolated network with no public internet access. The language model calls the Anthropic API under a Business Associate Agreement (a legal contract that extends HIPAA obligations to the vendor). Patient identifiers are stripped and replaced with internal tokens before the clinical text is sent for extraction, and the mapping from tokens back to patient identity exists only within the insurer’s secure environment.
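A minimal sketch of the tokenization step, assuming identifier spans have already been detected (production identifier detection is far more thorough than this name-and-date illustration, and the token format here is invented):

```python
# Swap patient identifiers for opaque internal tokens before clinical text
# leaves the secure environment; the token-to-identity map never leaves it.
import uuid

class Tokenizer:
    def __init__(self):
        self._token_to_identity = {}  # stays inside the insurer's environment

    def deidentify(self, text, identifiers):
        for ident in identifiers:
            token = f"PATIENT_{uuid.uuid4().hex[:8]}"
            self._token_to_identity[token] = ident
            text = text.replace(ident, token)
        return text

tok = Tokenizer()
safe = tok.deidentify("Jane Doe, DOB 1961-03-14, reports...",
                      ["Jane Doe", "1961-03-14"])
assert "Jane Doe" not in safe and "1961-03-14" not in safe
print(safe)  # tokens in place of name and date of birth
```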
We evaluated running a locally hosted model to eliminate all external API calls. The accuracy gap between Claude and the best open-source alternatives available at build time (7-12 points on clinical extraction) made this impractical for production. As open-source medical models improve, that calculation will shift; the architecture is designed so the extraction model can be swapped without changing the downstream workflow.
Federal compliance for Medicare Advantage
The insurer operates Medicare Advantage plans (privately run alternatives to government Medicare) in two of its four states, which introduces federal oversight of PA practices. The Centers for Medicare and Medicaid Services (CMS) has become increasingly strict about PA denial rates in these plans, culminating in a final rule requiring them to use evidence-based clinical criteria aligned with standard Medicare coverage policies.
Sophia’s coverage policy mappings are explicitly aligned with the CMS coverage determination databases, with automated flagging when an insurer-specific policy is more restrictive than the corresponding standard Medicare criteria. This does not prevent the insurer from maintaining stricter criteria for commercial (non-Medicare) plans, but it creates a clear audit trail when Medicare Advantage decisions are made using criteria that exceed standard Medicare thresholds, which is exactly what federal auditors look for.
The data infrastructure
The production infrastructure operates on AWS, a decision driven by the insurer’s current cloud setup rather than any specific technical preference.
Incoming submissions are queued and managed through Amazon’s workflow services (SQS for message queuing, Step Functions for multi-stage orchestration, Lambda for individual processing steps). The Bayesian inference engine runs on memory-optimized compute instances in a nightly batch, with the resulting probability tables cached in a high-speed database (DynamoDB) for rapid retrieval during real-time scoring.
The audit log employs AWS QLDB (a ledger database that is append-only and cryptographically verifiable) to record all model inputs, outputs, and routing decisions. This verification method is important because several state AI disclosure laws require the insurer to demonstrate that the explanation provided to the member aligns with the actual reasoning used, and a tamper-proof ledger makes this demonstrable rather than merely claimed.
Monitoring is handled through a custom dashboard pulling data from AWS metrics, with alerts configured for extraction confidence degradation (rolling average confidence dropping below 0.80), calibration drift (calibration error exceeding 0.05 for any service category), and throughput anomalies (daily PA volume deviating more than two standard deviations from the 30-day rolling average, which could indicate a change in doctor behavior or a submission system failure).
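The throughput check is the simplest of the three alerts and makes a compact example. A sketch using the two-standard-deviation rule from the text; the alerting hook itself is omitted, and the sample history is invented.

```python
# Alert when today's PA volume deviates more than two standard deviations
# from the 30-day rolling mean.
from statistics import mean, stdev

def volume_anomaly(history_30d, today):
    mu, sigma = mean(history_30d), stdev(history_30d)
    return abs(today - mu) > 2 * sigma

history = [5800, 5750, 5900, 5820, 5780] * 6   # 30 days of typical volume
print(volume_anomaly(history, 5810))  # → False (normal day)
print(volume_anomaly(history, 4200))  # → True (possible submission outage)
```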
ROI model
The return-on-investment calculation has four components: two that are easy to measure and two that are more difficult.
Direct cost savings from reviewer capacity
The auto-route pathway manages 34% of PA volume without clinical reviewer involvement (beyond the 10% audit sample). With 1.4 million annual PAs, that equates to 476,000 cases per year that are removed from the reviewer queue. At an average fully-loaded reviewer cost of $42 per case (including salary, benefits, training, platform licensing, and supervisory overhead), the gross savings amount to approximately $20 million annually. Deducting the audit sample cost ($2 million) results in net reviewer savings of about $18 million per year.
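Spelled out, the arithmetic from the paragraph above:

```python
# Reviewer-capacity savings, using the figures from the text.
annual_pas = 1_400_000
auto_route_share = 0.34
cost_per_review = 42          # fully loaded, USD per case
audit_cost = 2_000_000        # 10% audit sample, USD per year

auto_routed = annual_pas * auto_route_share       # 476,000 cases/year
gross_savings = auto_routed * cost_per_review     # ≈ $20.0M/year
net_savings = gross_savings - audit_cost          # ≈ $18.0M/year
print(f"{auto_routed:,.0f} cases, net ${net_savings/1e6:.1f}M/year")
# → 476,000 cases, net $18.0M/year
```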
Time-to-decision improvement
The average time from submission to decision fell from 6.2 to 2.8 business days across all PA requests, with auto-routed cases averaging 0.4 business days (same-day once the extraction workflow finishes). The financial value of quicker decisions is indirect but significant. Doctors report higher satisfaction (which affects whether they remain in the insurer’s network and their bargaining power in contract negotiations), members experience fewer treatment delays, and the insurer’s access-to-care quality metrics improve.
We estimate the provider retention and negotiation value at $3-5 million annually, based on the insurer’s actuarial team’s model of the historical relationship between PA turnaround times and doctor contract renewal rates. This is a rough estimate, and we treat it as such.
Appeal rate reduction
The overturn rate on first-level appeals decreased from 41% to 23%, due to two main effects. First, cases that might have been wrongly denied are now directed to reviewers with the appropriate expertise. Second, the structured briefs generated by Sophia highlight documentation gaps before the reviewer makes a decision, enabling them to request additional information rather than deny based on incomplete evidence.
A reduction in overturned appeals results in fewer second-level reviews, fewer external reviews, and fewer cases where the insurer pays for the service after incurring costs in a multi-stage appeals process. We estimate the cost savings from fewer appeals at about $4.7 million annually, calculated by multiplying the reduction in appeal volume by the average cost per appeal, which includes legal, clinical, and administrative expenses.
Regulatory risk reduction
This aspect is the hardest to quantify and is the most important to the insurer’s board. Federal fines for systematic PA violations in Medicare Advantage plans can reach hundreds of millions. In California and New York, state attorneys general have launched investigations into algorithmic denial practices. The reputational damage caused by investigative journalism exposing automated denials is incalculable.
We have not assigned a dollar value to regulatory risk reduction because the probability distribution of potential regulatory actions is too heavy-tailed to determine a meaningful expected value. Instead, we presented it as optionality, which the board grasped immediately. Sophia’s architecture—involving human oversight, auditable reasoning, calibration against appeal outcomes, and deliberately conservative thresholds—provides the insurer with a solid legal position in regulatory proceedings, unlike a black-box automation system.
Total ROI
Hard savings amount to roughly $22.7 million annually ($18M from reviewer capacity, $4.7M from appeal reduction). Soft savings range between $3-5 million each year, stemming from provider satisfaction and network value. The annual operating costs, including AWS infrastructure, Anthropic API, monitoring team, and model recalibration, are approximately $3.8 million. The net ROI is between $18.9 million and $20.9 million per year, representing a roughly 5-5.5x return on operating costs, with the initial build investment of $6.2 million paid back within the first five months of operation.
What we would build differently
Every production system gains lessons that become clear only in hindsight, and three of ours are worth noting.
The extraction workflow should have been modular from the start. We developed the three-stage extraction process as a tightly connected sequence, which means replacing the underlying language model now requires revalidating the entire chain rather than just the specific stage where the model changed. A modular architecture with well-defined interfaces between stages would have made future model upgrades considerably less costly.
The behavioral health calibration gap needed attention earlier. From the beginning, we recognized that mental health PA decisions had different statistical characteristics compared to medical PAs (such as higher baseline denial rates, quicker policy changes driven by parity enforcement, and more subjective clinical criteria). We initially approached this as a calibration issue to be addressed within the general framework, but behavioral health probably required a dedicated Bayesian network with a different structure—one that better reflected the evidence-to-criteria relationships—rather than being a subcategory within a broader network. We are now developing that separate network, and it should have been included in the original scope.
The doctor-facing documentation request system should have been included in the initial scope. Sophia identifies documentation gaps in about 28% of submissions, but these gaps are currently communicated to the doctor via the insurer’s existing portal messaging system, which is slow and has a 34% response rate. A purpose-built, Sophia-generated documentation request that clearly specifies what is missing, why it matters for the decision, and the required format would likely improve response rates and reduce time-to-decision further. This was removed from the initial build due to integration difficulties with the doctor-facing portal, and adding it later has proved more costly than including it from the start.
Eighteen months on
Sophia has been in operation for a year and a half, processing an average of 5,800 PA requests per business day, routing 34% for auto-approval, and maintaining a calibration error of 0.034 across all service categories. The insurer’s PA turnaround time ranks in the top quartile of Blues plans nationally, and the rate of overturned appeals has continued to decline as the calibration data accumulates.
The less quantifiable observation relates to culture, which took us by surprise. Clinical reviewers, initially skeptical that an automated system would grasp clinical nuance, have become Sophia’s most vocal supporters. This is because the system does not aim to replace their judgment but provides them with better cases, improved information, and more time to focus on cases that genuinely require expertise. The medical director’s team uses Sophia’s escalation patterns to identify gaps in coverage policy, feeding into a quarterly policy review cycle that previously did not exist.
The most telling metric is one we did not plan to track. In the first month of production, clinical reviewers overrode Sophia’s routing recommendation 14% of the time. Eighteen months later, the override rate sits at 3.2%, not because the reviewers changed but because Sophia got better, and the reviewers watched her earn their trust one case at a time, across hundreds of thousands of decisions, with her reasoning visible and her mistakes correctable. A production agent in healthcare is not a demo with cherry-picked accuracy numbers or a vendor pitch promising to reshape the approval process. It is a system that does tedious work reliably, explains itself when asked, and gets out of the way when a human needs to take the wheel.
And why call the system Sophia, you may be wondering? We felt a name would humanize the system and ease its integration into the team, given our tendency to anthropomorphize. It was certainly preferable to its working name, “PA Triage Engine v1”. We chose Sophia because it is the name of the daughter of one of our team members, born during the rollout. It is also Greek for “wisdom”, which seemed appropriate.