Anima — in-silico toxicity screen

Two toxicities, one molecule.

Example — troglitazone, an antidiabetic withdrawn in 2000 for liver toxicity. Anima flags the liver, not the heart. (Astemizole, a withdrawn antihistamine, returns the inverse: cardiac-HIGH, liver-LOW.)

anima.screen("troglitazone") · Cc1c(C)c2c(c(C)c1O)CCC(C)(COc1ccc(CC3SC(=O)NC3=O)cc1)O2

hERG · cardiotoxicity

LOW risk

borderline domain · P(blocker) · AUC 0.876

DILI · liver injury

HIGH risk

73%

in-domain · P(hepatotoxic) · AUC 0.75

Structure in, calibrated risk out.

01 · Structure

Featurize

Your SMILES is standardized and featurized with RDKit — Morgan fingerprints, physicochemical descriptors, and structural-alert counts.

02 · Models

Predict

A LightGBM cardiotoxicity model (123,950 matched hERG compounds) and a RandomForest liver model (3,560 human-DILI labels): the architectures that won an exhaustive, honest bake-off.

03 · Output

Judge honestly

A calibrated probability, a risk band, and an applicability-domain flag that reads "off-domain, low confidence" for chemistry unlike the training set. The nearest known compounds are listed too, so you can see the reasoning.
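A minimal sketch of how these three outputs could fit together. This is not Anima's implementation: the band cut-offs (0.3 and 0.6), the MEDIUM band, the similarity thresholds (0.5 and 0.3), and every function name are assumptions made for illustration. Fingerprints are represented as plain sets of on-bit indices.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def risk_band(p: float, low: float = 0.3, high: float = 0.6) -> str:
    """Map a calibrated probability to a band (cut-offs are hypothetical)."""
    if p >= high:
        return "HIGH"
    if p >= low:
        return "MEDIUM"
    return "LOW"


def nearest_compounds(
    query: Set[int], train: Dict[str, Set[int]], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k training compounds most similar to the query."""
    scored = [(name, tanimoto(query, fp)) for name, fp in train.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def domain_flag(best_similarity: float, in_cut: float = 0.5, out_cut: float = 0.3) -> str:
    """Flag the query by its similarity to the closest training compound."""
    if best_similarity >= in_cut:
        return "in-domain"
    if best_similarity >= out_cut:
        return "borderline domain"
    return "off-domain (low confidence)"


# Toy fingerprints, invented for illustration only.
train = {"cmpd_a": {1, 2, 3, 4}, "cmpd_b": {3, 4, 5, 6}, "cmpd_c": {10, 11}}
query = {1, 2, 3, 5}

neighbors = nearest_compounds(query, train)
print(risk_band(0.73))               # HIGH
print(domain_flag(neighbors[0][1]))  # in-domain (best similarity is 3/5 = 0.6)
print(neighbors[0][0])               # cmpd_a
```

Tying the flag to the nearest-neighbor similarity keeps the two outputs consistent: the compounds shown as "nearest known" are the same evidence that decides whether the prediction is in-domain.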

Validated in the open.

Honest isn’t a tagline — it’s the method. Every number here is scaffold-split and error-barred, and a scientific-governance review is on the record correcting our own early mistakes.

Error-barred, not cherry-picked

0.876 hERG ROC-AUC

Scaffold cross-validated on 123,950 matched compounds — at the upper end of published hERG performance, with an applicability-domain flag for chemistry outside the training set.
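A minimal sketch of what scaffold-split cross-validation with error bars involves. It assumes each compound already has a scaffold key (for example a Bemis-Murcko scaffold SMILES, which is an assumption here, not a detail stated on this page). The greedy fold assignment, the function names, and the per-fold scores are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, List, Sequence


def scaffold_folds(scaffolds: Sequence[str], n_folds: int = 5) -> List[List[int]]:
    """Assign compound indices to folds so that no scaffold spans two folds."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    folds: List[List[int]] = [[] for _ in range(n_folds)]
    # Place the largest scaffold groups first, each into the smallest fold so far.
    for scaffold in sorted(groups, key=lambda s: (-len(groups[s]), s)):
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend(groups[scaffold])
    return folds


def summarize(fold_scores: Sequence[float]) -> str:
    """Report the mean and sample standard deviation across folds."""
    return f"{mean(fold_scores):.3f} ± {stdev(fold_scores):.3f}"


# Toy scaffold keys, one per compound.
scaffolds = ["A", "A", "A", "B", "B", "C", "D"]
folds = scaffold_folds(scaffolds, n_folds=2)
print(folds)  # [[0, 1, 2, 6], [3, 4, 5]]

# Made-up per-fold scores, only to show the error-bar format.
print(summarize([0.80, 0.82, 0.81]))  # 0.810 ± 0.010
```

Because whole scaffold groups move together, every held-out fold contains only ring systems the model never saw in training, which is what makes the resulting score a harder and more honest test than a random split.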

A governance review, on record

A three-part review — cheminformatics, toxicology, ML methodology — scrutinized the corpus and methods, and caught an inflated early result (leakage from missing-not-at-random assay coverage) before it shipped.

Why liver is triage, not a verdict

~0.75 DILI honest ceiling

Human liver injury is largely idiosyncratic: it depends on host genetics, dose, and immune response. Across five model families and every data lever, structure alone tops out at an AUC near 0.75. Anima labels each prediction accordingly, so you know which endpoint is a strong signal and which is triage.

The datasets are the point.

Anima ships two versioned, provenance-tracked, openly licensed corpora, each with a dataset card and cross-source quality control. Reuse them, extend them, build on them.

Cardiotox (hERG)

123,950 matched compounds — ChEMBL, TDC, PubChem qHTS, hERGCentral, BindingDB — 98.7% cross-source agreement on the IC50 core.
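A minimal sketch of one way a cross-source agreement figure could be computed. The page does not define the metric, so this is an assumed definition: among compounds labeled by two or more sources, the fraction whose binary labels all match. The record layout, compound keys, and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (compound key, source name, binary blocker label); layout is hypothetical.
Record = Tuple[str, str, int]


def cross_source_agreement(records: Iterable[Record]) -> float:
    """Fraction of multi-source compounds whose labels agree across sources."""
    labels: Dict[str, Dict[str, int]] = defaultdict(dict)
    for key, source, label in records:
        labels[key][source] = label

    # Only compounds seen in at least two sources can agree or disagree.
    multi = [by_source for by_source in labels.values() if len(by_source) >= 2]
    if not multi:
        raise ValueError("no compound appears in two or more sources")

    agreeing = sum(1 for by_source in multi if len(set(by_source.values())) == 1)
    return agreeing / len(multi)


# Toy records, invented for illustration only.
records = [
    ("K1", "ChEMBL", 1), ("K1", "BindingDB", 1),    # agree
    ("K2", "ChEMBL", 0), ("K2", "TDC", 1),          # disagree
    ("K3", "PubChem qHTS", 0),                      # single source, ignored
    ("K4", "ChEMBL", 0), ("K4", "hERGCentral", 0),  # agree
]
print(round(cross_source_agreement(records), 3))  # 0.667
```

Compounds seen in only one source are excluded from the denominator, since they cannot disagree with anything and would otherwise inflate the figure.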

CC-BY-SA · Open

Human DILI

3,560 human liver-injury labels — FDA DILIrank/DILIst, LiverTox, curated literature — with graded severity, harmonized with provenance.

CC-BY-SA · Open

Launch the Screen.

Sign in to run the Screen online — hosted, account-based access is rolling out in early access. Prefer local? Run the real tool today in two commands:

# build the model bundles (once)
python anima/train_models.py

# then open http://127.0.0.1:5000
python anima/app.py

Anima is a non-profit research prototype. This is a triage signal to prioritize testing — not a wet-lab replacement, not medical or clinical advice. Off-domain predictions are flagged low-confidence.