At Lumos, we spin up projects based on your goals and pipeline type, and we remain fully flexible so that we can integrate seamlessly into your current architecture. Below are some of the most common human-expert projects we have run.

Data Structuring/Extraction

Clinicians extract structured facts from charts and encounters with dual review and adjudication; well suited for building training data and gold-standard datasets. Example: Extract line items from a hospital bill, grouped by day. Modalities: Text, image.
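As a rough illustration of the billing example, the sketch below groups extracted line items by service date. The field names and records are hypothetical, not a Lumos schema.

```python
from collections import defaultdict

# Hypothetical line items extracted from a hospital bill; fields are illustrative.
line_items = [
    {"date": "2024-03-01", "description": "ER visit, level 4", "amount": 1250.00},
    {"date": "2024-03-01", "description": "CT scan, head", "amount": 980.00},
    {"date": "2024-03-02", "description": "Observation, per hour", "amount": 410.00},
]

def group_by_day(items):
    """Group extracted line items by service date, as in the billing example."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item["date"]].append(item)
    return dict(grouped)

grouped = group_by_day(line_items)
```

In a dual-review setup, two clinicians would each produce such a structure independently, with disagreements sent to adjudication.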

Data Annotation

Experts annotate the inputs, creating an enriched set of data labels. Example: Annotate pathology reports, including symptoms, diagnosis, medication set, and pathology slides. Modalities: Text, image, DICOM, waveform, multi-modal.

Benchmark / Ground-Truth Generation

Experts author canonical answers and case keys, then lock them as versioned test sets. Example: Create a prompt bank that verifies a model's knowledge of women's health. Modalities: Text, image, audio, multi-modal.
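A versioned, locked test-set entry might look like the sketch below. All field names and the example content are assumptions for illustration only.

```python
# Hypothetical benchmark entry for a women's-health prompt bank; fields are illustrative.
benchmark_item = {
    "id": "wh-0042",
    "prompt": "A patient at 32 weeks gestation reports sudden severe headache.",
    "canonical_answer": "Evaluate for preeclampsia: check blood pressure and urine protein.",
    "version": "v1.2",
    "locked": True,
}

def is_frozen(item):
    """A locked item may no longer be edited; changes require a new version."""
    return item["locked"]
```

Locking plus versioning keeps the test set stable across evaluation runs while still allowing audited updates.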

Model Stumping

Doctors ask questions that lead the model to a factually incorrect result, producing a set of problematic questions on which the model needs to improve. Example: Probe a model's timeline handling for oncology patients. Modalities: Text, image, multi-modal.

Write/Rewrite

Experts provide the correct answer to a set of pre-written prompts. Example: Write correct answers to a given set of prompts. Modalities: Text.

Rubric-Based Evaluations

Clinicians score model outputs against rubrics, with calibration rounds and adjudication. Example: Evaluate a model's clinical reasoning using metrics such as Diagnostic Reasoning, Symptom Recognition Beyond Patient Report, Symptom Severity Assessment, Treatment, and Procedures. Modalities: Text, image, video, multi-modal.
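One way adjudication can be triggered in a rubric workflow is by flagging criteria where two reviewers diverge. The metric names follow the clinical-reasoning example above; the scoring scale and disagreement threshold are assumptions.

```python
# Metric names from the clinical-reasoning rubric example; the 1-5 scale
# and the disagreement threshold are illustrative assumptions.
RUBRIC = [
    "Diagnostic Reasoning",
    "Symptom Recognition Beyond Patient Report",
    "Symptom Severity Assessment",
    "Treatment",
    "Procedures",
]

def needs_adjudication(scores_a, scores_b, threshold=1):
    """Return the criteria where two reviewers' scores differ by more than `threshold`."""
    return [
        metric
        for metric in RUBRIC
        if abs(scores_a[metric] - scores_b[metric]) > threshold
    ]
```

Items with no flagged criteria can be accepted directly; flagged ones go to a third, senior reviewer.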

Prompt-Based Rubric Generation

Experts design rubrics from prompts and adjust them based on model output. Example: Generate a set of rubrics for a psychiatry use case, focused on severe conditions. Modalities: Text.

Feedback Collection

Clinicians are asked to use a model or agent and provide feedback. Example: Using an actual product, create 20 conversations to test a model's knowledge of patient triage, ask the model to schedule a visit, and provide supporting documentation. Review and share feedback on the provided documentation. Modalities: Text, image, multi-modal.

Preference Ranking

Experts are asked to interact with a model or agent and select the best response at each turn. Example: Chat with a copilot agent designed to reduce nurses' workload. At each turn, pick the best option and provide quick feedback, using the supplied tags, on why you picked it. Modalities: Text, image, multi-modal.

Red Teaming

Clinicians craft adversarial prompts and scenarios to expose dangerous behaviors, delivering reproducible failure sets with mitigations. Example: Probe timeline handling using pregnancy, cancer, and mental health conditions. Modalities: Text, image.

Human Trajectories

Capture experts' step-by-step problem solving (plan → tool call → observation → revision) to teach agents safe tool use. Example: ED physician traces: plan the workup, order labs, interpret results, and update the assessment/plan with reasoning at each step. Modalities: Text. 
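A single captured step of such a trajectory might be recorded as below. The field names mirror the plan → tool call → observation → revision loop described above; the clinical content and tool names are hypothetical.

```python
# One step of a hypothetical ED trajectory; field names mirror the
# plan -> tool call -> observation -> revision loop, content is illustrative.
trajectory = [
    {
        "plan": "Rule out acute coronary syndrome",
        "tool_call": {"name": "order_labs", "args": {"panel": "troponin"}},
        "observation": "Troponin elevated at 0.8 ng/mL",
        "revision": "Escalate: serial troponins and cardiology consult",
    },
]

def validate_step(step):
    """Check that a trajectory step records all four required phases."""
    required = ("plan", "tool_call", "observation", "revision")
    return all(phase in step for phase in required)
```

Validating each step at capture time keeps trajectories uniform enough to use directly as agent training data.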

Other Projects

Other projects include creating realistic conversations, step-specific evals, user-in-the-loop preference mimicking, production-log triage, and many others. We treat each project as a unique entity and spin it up quickly based on the desired design. The time to spin up a project on the Lumos platform is approximately three days.