Expert preference signal for a financial-reasoning frontier model

Problem

A leading AI lab wanted to expand its model's capabilities in financial argumentation. That required detailed, domain-specific signal from finance professionals who could evaluate responses through multi-step analysis of complex, hypothetical scenarios, and the signal had to be delivered on a tight deadline.

Solution

Labelbox produced the signal. Through its Alignerr network, it captured judgment from finance professionals with CFA, MBA, Master's, and PhD-in-Finance backgrounds. The work ran as a customized project on the Labelbox platform, with detailed instructions and real-time quality monitoring.

Result

Through evaluation and preference ranking, Labelbox's platform quickly produced high-quality, differentiated signal that let the lab train its LLM on financial tasks. With a repeatable source of expert-graded financial signal, the lab can keep advancing its model's reasoning on complex financial prompts.

A frontier AI lab needed to harden its model on financial reasoning. Labelbox's platform produced expert-graded preference signal from CFA- and PhD-level finance experts.

The challenge

A leading AI lab wanted to improve its model's industry-specific reasoning on finance, specifically its performance, trustworthiness, and accuracy on financial queries. The target capability: give meaningful insights on any public company from a ticker symbol and the latest financial reports, and answer the questions a financial analyst would ask. Producing that signal was hard. The tasks were complex and domain-specific, the deadline was tight, and the lab lacked finance expertise at the scale required.

The approach

Labelbox produced the signal. Through its Alignerr network, which spans industry domains and languages, the platform captured judgment from finance experts with CFA, MBA, Master's, and PhD-in-Finance qualifications, screened from over 50 candidates, with a 24-hour calibration period. Labelbox and the lab developed the task instructions together and built a custom ontology in the platform's text editor: classifications, sub-classifications, and free-text inputs. Against complex, hypothetical prompts, experts rated aspects of the model's outputs on a 1-to-5 scale. They scored hypotheses for probability, importance, and feasibility, and argument quality for conclusiveness and causality. The lab monitored performance and quality metrics throughout, and workflows were adjusted as feedback came in.
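The rating scheme described above can be pictured as a small data structure. What follows is a minimal Python sketch under stated assumptions, not Labelbox's ontology format or API: the names `RUBRIC`, `ExpertRating`, `validate`, and `group_mean` are hypothetical, and only the criteria and the 1-to-5 scale come from the case study.

```python
from dataclasses import dataclass

# Criteria named in the case study. The group keys are illustrative labels.
RUBRIC = {
    "hypothesis": ("probability", "importance", "feasibility"),
    "argument_quality": ("conclusiveness", "causality"),
}
SCALE_MIN, SCALE_MAX = 1, 5


@dataclass
class ExpertRating:
    """One expert's judgment of one model response (hypothetical record shape)."""

    response_id: str
    scores: dict[str, dict[str, int]]  # {group: {criterion: score}}
    comment: str = ""  # stands in for the free-text input

    def validate(self) -> None:
        """Raise ValueError if a group or criterion is missing, extra, or out of range."""
        if set(self.scores) != set(RUBRIC):
            raise ValueError(f"expected groups {sorted(RUBRIC)}, got {sorted(self.scores)}")
        for group, criteria in RUBRIC.items():
            given = self.scores[group]
            if set(given) != set(criteria):
                raise ValueError(f"{group}: expected {criteria}, got {sorted(given)}")
            for criterion, value in given.items():
                if not isinstance(value, int) or not SCALE_MIN <= value <= SCALE_MAX:
                    raise ValueError(f"{group}.{criterion}: {value!r} is not an integer from 1 to 5")

    def group_mean(self, group: str) -> float:
        """Average the criterion scores within one group."""
        values = self.scores[group].values()
        return sum(values) / len(values)


# Usage with made-up scores.
example = ExpertRating(
    response_id="resp-1",
    scores={
        "hypothesis": {"probability": 4, "importance": 3, "feasibility": 5},
        "argument_quality": {"conclusiveness": 2, "causality": 3},
    },
    comment="Causal chain skips the refinancing step.",
)
example.validate()
example.group_mean("hypothesis")  # 4.0
```

Keeping the criteria in one table means a malformed submission fails early, which is one plausible way to support the kind of quality monitoring the project relied on.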

As someone with a PhD in finance, I was intrigued by the opportunity to apply my financial expertise to help train AI models. I've found the work both flexible and intellectually stimulating. While the financial tasks are technically challenging, they have been incredibly rewarding and have provided a welcome mental challenge.

— Shaun C, PhD Finance

The outcome

The lab got high-quality financial signal within its tight timeframe and used it to boost its LLM's performance, accuracy, and reliability. With a repeatable process for expert-graded financial signal, the lab can keep advancing its model on industry-specific reasoning like financial argumentation.

Where this goes

Preference rankings from domain experts serve as reward signal for training. This is how a general model becomes a specialist that financial analysts can trust.
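One common way expert scores become reward signal is by converting them into (preferred, rejected) pairs. The sketch below illustrates that general idea only. It assumes several responses to the same prompt each have an aggregate expert score, which the case study does not state, and the function name `preference_pairs` and its `margin` parameter are hypothetical.

```python
from itertools import combinations


def preference_pairs(scores: dict[str, float], margin: float = 0.0) -> list[tuple[str, str]]:
    """Return (preferred, rejected) pairs for responses to the same prompt.

    A pair is emitted only when the score gap exceeds `margin`, so ties
    and near-ties do not produce a preference.
    """
    pairs = []
    for a, b in combinations(sorted(scores), 2):
        gap = scores[a] - scores[b]
        if gap > margin:
            pairs.append((a, b))
        elif -gap > margin:
            pairs.append((b, a))
    return pairs


# Made-up aggregate scores for three responses to one prompt.
pairs = preference_pairs({"resp_a": 4.0, "resp_b": 2.5, "resp_c": 4.0})
# pairs == [("resp_a", "resp_b"), ("resp_c", "resp_b")]
```

Raising `margin` trades volume for confidence: fewer pairs survive, but each reflects a clearer expert preference.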