Hudson Labs is excited to announce a major breakthrough from our machine learning research team. Over the course of months of R&D and evaluation, we have redefined the way we model red flag importance.
Our models are now better able to understand subtle differences in language and to determine which disclosed language is high risk, medium risk, or low risk.
We’ve decreased our false positive rate (the rate at which our models pick up language that isn’t predictive of downside risk) AND our false negative rate (the rate at which our models miss relevant language).
We are better able to ensure our models continue learning as SEC filing language changes from quarter-to-quarter and year-to-year.
These breakthroughs will have wide-ranging impacts on our product and content. More details can be found below.
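To make the false positive and false negative rates mentioned above concrete, here is a small illustrative sketch (not Hudson Labs' actual evaluation code) of how the two rates are computed from binary predictions, where 1 marks language predictive of downside risk and 0 marks benign language. The sample labels are hypothetical.

```python
def error_rates(y_true, y_pred):
    """Return (false_positive_rate, false_negative_rate)."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    # FPR: benign language the model flags anyway.
    # FNR: risk-relevant language the model misses.
    return fp / negatives, fn / positives

# Hypothetical labels for eight disclosure sentences:
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
fpr, fnr = error_rates(y_true, y_pred)
print(f"FPR = {fpr:.2f}, FNR = {fnr:.2f}")  # FPR = 0.20, FNR = 0.33
```

Improving both rates at once is the hard part: tightening a model to flag less benign boilerplate usually risks missing more genuinely relevant language, and vice versa.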
Hudson Labs uses deep learning based language models to identify and extract high-impact language in SEC filings. Specifically, we train our models to identify language that is predictive of fraud risk and/or earnings quality issues. To learn more about our research process and get a demo, contact us here.
The problem we tackled:
While our current models are smart, they had identifiable failure modes. These failure modes included:
Distinguishing importance and meaning within specific topics: Our models understood some topics better than others. For instance, our models sometimes confused a restatement of an agreement with a financial statement restatement, items with very different risk profiles.
Understanding relative ranking/importance between red flag types and categories: Our models couldn't always identify whether a CFO resignation was more or less important than, say, an accounting policy change.
Understanding recency: Understanding what disclosure has been updated or added to filings is not the same as understanding when the disclosed event happened. Our models now understand that an SEC investigation that happened this year is different from new disclosure about an SEC investigation that took place in 2015.
Tracking and combating topic drift: Our models have, at times, had trouble adapting to new disclosure trends, especially if changes happen rapidly. For instance, the influx of new SPAC-specific boilerplate challenged our models.
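The recency distinction above can be illustrated with a deliberately naive heuristic (Hudson Labs' models learn this from context rather than rules): extract any explicit year from a disclosure sentence and compare it with the filing year. All names here are hypothetical.

```python
import re

def event_recency(sentence: str, filing_year: int) -> str:
    """Naive heuristic: classify a disclosed event as recent or historical."""
    years = [int(y) for y in re.findall(r"\b(?:19|20)\d{2}\b", sentence)]
    if not years:
        return "recent"  # no explicit date: assume a current-period event
    return "recent" if max(years) >= filing_year - 1 else "historical"

print(event_recency("The SEC opened an investigation this quarter.", 2023))
# → recent
print(event_recency("new disclosure about an SEC investigation in 2015.", 2023))
# → historical
```

The point of the example is what the heuristic gets wrong: newly added disclosure text can describe an old event, so a model that only tracks what text is new will overweight stale risks. Understanding recency means reading the event date out of the language itself.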
How our models have changed:
Our models now better understand recency, red flag importance within topic categories and relative importance across topics. We’re also now able to track and address changes in SEC language over time through better continuous learning processes.
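Tracking changes in SEC language over time can be sketched in miniature (again, an illustration rather than Hudson Labs' actual continuous-learning pipeline): count how often a term appears per quarter and flag quarters where its frequency jumps well above the trailing average, as happened with SPAC boilerplate. The counts and thresholds below are hypothetical.

```python
def drift_alerts(counts_by_quarter, window=4, threshold=3.0):
    """Flag quarters whose count exceeds threshold x the trailing mean."""
    alerts = []
    quarters = sorted(counts_by_quarter)
    for i, q in enumerate(quarters):
        if i < window:
            continue  # need a full trailing window first
        trailing = [counts_by_quarter[quarters[j]] for j in range(i - window, i)]
        mean = sum(trailing) / window
        if mean > 0 and counts_by_quarter[q] > threshold * mean:
            alerts.append(q)
    return alerts

# Hypothetical per-quarter counts of "SPAC" mentions across filings:
counts = {"2019Q1": 3, "2019Q2": 4, "2019Q3": 3, "2019Q4": 5,
          "2020Q1": 6, "2020Q2": 40, "2020Q3": 55}
print(drift_alerts(counts))  # → ['2020Q2', '2020Q3']
```

A spike like this signals that new boilerplate is entering the corpus, which is exactly when a model trained on older filings needs retraining or recalibration.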
What this means for users:
Over the next month we will be updating our Risk Scores, red flags and their relative ranking on the company page. This update will be retrospective. Paid customers will be able to request prior versions of Risk Scores.
In the future, textual flags will be segmented and colour-coded as red, orange, and yellow flags, making Risk Scores easier to interpret.
These updates will also facilitate red flag feeds and push notifications. You will be able to sign up to receive email notifications for the highest impact flags by category.
Finally, we are rolling out an offering for quantitative investors to take advantage of the strong performance of these updates. To learn more or sign up for quant-focused product updates, send us a note at firstname.lastname@example.org.
A special thank you:
A special thank you to our Canadian ML development team. Suhas Pai and Xiao Quan lost many hours of sleep solving these thorny problems. They are both rare talents and a credit to Hudson Labs.
Interested in experiencing our new and improved models? Sign up for a trial of our equity research portal here: hudson-labs.com/demo