Transforming Litigation with a Predictive ML-Powered Case Management Solution: A Customer Success Story

A litigation case management company aimed to stand out by predicting case outcomes, a key market differentiator. While they had explored innovative technologies, they still needed help to evolve from basic analytics to impactful machine learning solutions.

In this article, we will guide you through the steps BlueCloud took to help this client build two machine learning models that predict the outcome of cases in the litigation space. We will also examine the performance of both models, showing how advanced case management solutions powered by ML can help companies assess and predict their chances of success.

Challenge: Predicting Case Outcomes in the Litigation Space

Consumers harmed by defective products or pharmaceuticals often seek legal counsel for liability claims. Law firms need advanced solutions to handle these complex cases and assess their likelihood of success. Recognizing the critical role of case management systems, the client aimed to develop a machine learning model that not only predicts case outcomes but also tracks prediction changes throughout the case lifecycle.

BlueCloud partnered with the client to design a cutting-edge case management solution, leveraging machine learning to predict case success rates. These solutions are critical as they help the client drive revenue growth and enhance customer outcomes.

Solution: Building an Advanced Case Management Solution with Machine Learning

To predict case outcomes and prioritize cases with high success potential, BlueCloud built two machine learning models: the pre-intake model and the post-intake model.

Pre-Intake Model: Activated at the initial stages, this model analyzes early data—whether ingested via API or manually—and predicts outcomes, streamlining the process from the start.

Post-Intake Model: Applied after intake, this model refines predictions using additional case data. It supports efficient case management for tasks like document collection, quality control, and review, ensuring optimal resource allocation.

By analyzing historical data, both models identify patterns to predict case outcomes, enabling smarter decision-making and improved case prioritization.

Pre-Intake Model

The aim of the pre-intake model is to predict the outcome of a case. Proven cases are labeled as ‘success’, while canceled and disproven cases are labeled as ‘failure’. A binary classification model was used for this purpose.
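
The article does not show the labeling logic itself; the sketch below illustrates how such a binary target could be derived from case status, assuming a pandas DataFrame with a status column whose values are only illustrative.

```python
import pandas as pd

# Illustrative status values; the client's actual status taxonomy may differ.
SUCCESS_STATUSES = {"proven"}
FAILURE_STATUSES = {"canceled", "disproven"}

def add_outcome_label(cases: pd.DataFrame) -> pd.DataFrame:
    """Keep closed cases and map their status to a binary target (1 = success, 0 = failure)."""
    closed = cases[cases["status"].isin(SUCCESS_STATUSES | FAILURE_STATUSES)].copy()
    closed["outcome"] = closed["status"].isin(SUCCESS_STATUSES).astype(int)
    return closed
```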

Digging into Data  

The historical data, which included various types of internal and external data, was used to train the pre-intake ML model. Data analysis included joining different tables with relevant data, resolving data quality issues, and filtering out cases that could negatively affect the dataset.

When conducting a deep data analysis, we took the case status (outcome), the case category, and the case distribution across companies into consideration to understand the data and identify important patterns.

Data and Feature Engineering

Model performance depends heavily on data quality, making feature engineering essential for selecting relevant aspects of the raw data based on the predictive task and model type. As described above, the pre-intake model predicts the outcome from the features available right after a case is ingested into the system. We built a feature set, including custom features, to help the model learn the relationship between the features and the label.

Creating Dataset  

After analyzing the data and engineering the features, our next step was to build a final dataset that was clean, consistent, and representative of the overall process. Data cleaning steps included grouping rare examples into an "other" category, applying filters, and removing outliers.
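
The exact cleaning rules are internal to the project; the sketch below is a hedged illustration of the kinds of steps described above (rare-category grouping and outlier removal), with column names and thresholds invented for the example.

```python
import pandas as pd

def clean_dataset(df: pd.DataFrame,
                  category_col: str = "case_category",   # illustrative column names
                  numeric_col: str = "days_to_intake",
                  rare_threshold: int = 50) -> pd.DataFrame:
    """Group rare categories into 'other' and drop extreme numeric outliers."""
    out = df.copy()

    # Group categories with too few examples into a single 'other' bucket.
    counts = out[category_col].value_counts()
    rare = counts[counts < rare_threshold].index
    out.loc[out[category_col].isin(rare), category_col] = "other"

    # Drop rows outside the 1st-99th percentile range of a numeric feature.
    lo, hi = out[numeric_col].quantile([0.01, 0.99])
    return out[out[numeric_col].between(lo, hi)]
```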

Data Modelling  

After preparing the dataset, the next step was building the model pipeline. This involved converting all the data to a numerical format, as ML models require. The ML pipeline includes two main components: preprocessing and a classification algorithm. The preprocessing step handles numerical, categorical, and binary data. Numerical features are processed with an imputer to fill missing values and a scaler to standardize them. Categorical data is imputed and then converted into a one-hot encoded format, while binary data remains unchanged. The preprocessed data is then fed into the classification algorithm.
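
The preprocessing described above maps naturally onto a scikit-learn ColumnTransformer; the sketch below is an approximation of such a pipeline, with illustrative feature lists rather than the client's actual features.

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative feature groups; the production model uses project-specific features.
numeric_features = ["claimant_age", "days_since_incident"]
categorical_features = ["case_category", "campaign"]
binary_features = ["has_medical_records"]

preprocess = ColumnTransformer(transformers=[
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric_features),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical_features),
    ("bin", "passthrough", binary_features),  # binary flags are left unchanged
])

# The preprocessed data feeds a classification algorithm (Random Forest shown as one candidate).
model = Pipeline([("preprocess", preprocess),
                  ("classifier", RandomForestClassifier(n_estimators=300, random_state=42))])
```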

For our model, we tested various classification algorithms and found that tree-based models like Random Forest, XGBoost, and LightGBM performed best, as they excel in datasets with conditional relationships. These models split the data into smaller regions based on similar features, allowing them to capture patterns effectively.

To validate the model, we used K-fold cross-validation to identify the best algorithm and parameter combination.  
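
As an illustration of that search, the sketch below compares the three candidate algorithms with stratified K-fold cross-validation; the parameter grids are placeholders, and X and y are assumed to be the already-encoded feature matrix and binary label.

```python
from lightgbm import LGBMClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from xgboost import XGBClassifier

def select_best_model(X, y):
    """Grid-search each candidate with 5-fold CV and return the best-scoring search object."""
    candidates = {
        "random_forest": (RandomForestClassifier(random_state=42),
                          {"n_estimators": [200, 500], "max_depth": [None, 10]}),
        "xgboost": (XGBClassifier(eval_metric="logloss", random_state=42),
                    {"max_depth": [4, 6], "learning_rate": [0.05, 0.1]}),
        "lightgbm": (LGBMClassifier(random_state=42),
                     {"num_leaves": [31, 63], "learning_rate": [0.05, 0.1]}),
    }
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    searches = []
    for name, (estimator, grid) in candidates.items():
        search = GridSearchCV(estimator, grid, scoring="f1", cv=cv, n_jobs=-1)
        search.fit(X, y)
        searches.append(search)
    return max(searches, key=lambda s: s.best_score_)
```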

Building Machine Learning (ML) Pipeline  

We built a sophisticated data pipeline within the Snowflake environment to optimize the pre-intake model training process. This pipeline integrates data from four different tables and incorporates specialized features to create the final dataset for model training. To automate the workflow, we implemented two core Snowflake Tasks: one for training the model and another for scoring the pre-intake data. After the execution of these tasks, all relevant artifacts, training data, and model metadata are securely stored in Snowflake.  
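
The task definitions themselves are not shown in the article; as a rough illustration, Snowflake Tasks like the two described can be created by issuing DDL through the Python connector. The task names, schedule, and stored procedures below are hypothetical.

```python
import snowflake.connector

# Connection details come from the project's secrets management; values here are placeholders.
conn = snowflake.connector.connect(account="<account>", user="<user>", password="<password>",
                                   warehouse="ML_WH", database="LITIGATION", schema="ML")

train_task = """
CREATE OR REPLACE TASK PRE_INTAKE_TRAIN_TASK
  WAREHOUSE = ML_WH
  SCHEDULE = 'USING CRON 0 3 * * MON UTC'   -- illustrative weekly retraining schedule
AS
  CALL TRAIN_PRE_INTAKE_MODEL()             -- hypothetical training stored procedure
"""

score_task = """
CREATE OR REPLACE TASK PRE_INTAKE_SCORE_TASK
  WAREHOUSE = ML_WH
  AFTER PRE_INTAKE_TRAIN_TASK               -- score pre-intake data after training completes
AS
  CALL SCORE_PRE_INTAKE_CASES()             -- hypothetical scoring stored procedure
"""

with conn.cursor() as cur:
    cur.execute(train_task)
    cur.execute(score_task)
    cur.execute("ALTER TASK PRE_INTAKE_SCORE_TASK RESUME")  # resume children before the root task
    cur.execute("ALTER TASK PRE_INTAKE_TRAIN_TASK RESUME")
```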

Our prediction pipeline is designed to utilize the active model artifacts and generate a comprehensive data frame based on the most current data. It selectively generates scores for new or unscored cases, optimizing resource usage and ensuring timely predictions. The pipeline efficiently updates prediction tables, ensuring up-to-date insights.
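
A minimal sketch of that incremental-scoring idea, with hypothetical case_id and score columns: only cases missing from the scores table are passed to the model, and the resulting rows are appended back.

```python
import pandas as pd

def score_new_cases(latest: pd.DataFrame,
                    existing_scores: pd.DataFrame,
                    model) -> pd.DataFrame:
    """Score only cases that do not yet have a prediction and return rows to append."""
    unscored = latest[~latest["case_id"].isin(existing_scores["case_id"])].copy()
    if unscored.empty:
        return pd.DataFrame(columns=["case_id", "success_probability"])
    features = unscored.drop(columns=["case_id"])
    unscored["success_probability"] = model.predict_proba(features)[:, 1]  # probability of 'success'
    return unscored[["case_id", "success_probability"]]
```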

To further streamline development and deployment, we integrated a CI/CD workflow, which enhances operational efficiency, scalability, and accuracy for model training and data-driven decision-making.

Post-Intake Model

The post-intake model predicts case outcomes after the intake stage, using all available data up to the prediction point, such as case details, intake results, number of calls, and documents obtained.

Digging into Data

The model is trained on historical data, which includes past status changes in the database. State-based features and statistical insights from past states are also included to enrich the dataset. The model’s goal is to classify cases as either 'success' or 'failure,' where successful cases continue, and canceled or disproven cases are categorized as failures.

Feature Engineering  

The key difference between the post-intake and pre-intake models is that the post-intake model can predict case outcomes at any state, not just before the intake. The dataset is created with available data at each state, incorporating as many features as possible to provide the model with rich information. For instance, intake-related features like longer-than-expected intake times can indicate a higher likelihood of case failure.

In addition to the intake data, we also included features from earlier states, such as the number of documents obtained or the time spent in each state, to enhance prediction accuracy.

Creating Dataset  

The post-intake model predicts case outcomes at any state, requiring a state-wise dataset. To achieve this, we created a table that tracks state changes and generates summary data for each state. Key information such as phone calls, messages, documents, and events is summarized for each state. This summary is then merged with case data from various tables, including campaigns, claimants, and intake data.
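
Conceptually, this is a group-by over a state-change log joined back to per-state activity. The sketch below assumes hypothetical tables and column names (state_history with one row per state change, activity with one row per call, message, document, or event).

```python
import pandas as pd

def build_state_dataset(state_history: pd.DataFrame, activity: pd.DataFrame) -> pd.DataFrame:
    """Summarize activity per (case, state) and attach the time spent in each state."""
    # Time spent in each state, derived from consecutive state-change timestamps.
    states = state_history.sort_values(["case_id", "entered_at"]).copy()
    states["left_at"] = states.groupby("case_id")["entered_at"].shift(-1)
    states["days_in_state"] = (states["left_at"] - states["entered_at"]).dt.days

    # Per-state counts of phone calls, messages, documents, and events.
    summary = (activity
               .groupby(["case_id", "state", "activity_type"])
               .size()
               .unstack("activity_type", fill_value=0)
               .reset_index())

    # One row per (case, state), ready to be merged with campaign, claimant, and intake data.
    return states.merge(summary, on=["case_id", "state"], how="left")
```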

Data Modelling  

We applied the same steps and processes in the post-intake model that we used in the pre-intake model. For validation, we used custom cross-validation to handle cases that appear in multiple rows of the dataset; this approach prevents the overrepresentation of individual cases and ensures a fair evaluation across cases. Because the data is imbalanced, we evaluated the model with the F1-score, which offers a more comprehensive assessment than accuracy alone; the post-intake model achieved an impressive F1-score of 86%.
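
The article does not spell out the custom cross-validation scheme; one common way to keep every row of a case in the same fold, and so avoid overrepresenting multi-row cases across folds, is grouped K-fold. The sketch below is an approximation of that idea using scikit-learn's GroupKFold, with F1 as the scoring metric.

```python
from sklearn.model_selection import GroupKFold, cross_val_score

def grouped_cv_f1(model, X, y, case_ids, n_splits: int = 5) -> float:
    """Cross-validate with all rows of a case confined to one fold; report the mean F1-score."""
    cv = GroupKFold(n_splits=n_splits)
    scores = cross_val_score(model, X, y, groups=case_ids, cv=cv, scoring="f1")
    return scores.mean()
```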

Building ML Pipeline  

The post-intake model is a significant advancement in our data-driven initiatives, enhancing decision-making through a robust training and deployment pipeline. This model, like the pre-intake version, is built on a comprehensive dataset that integrates diverse data sources and features, all processed and managed in Snowflake.

Key data management components include:

Post-intake training: Stores recent training data for model refinement.

Post-intake test: Holds the latest test data for model evaluation.

Model metadata: Captures essential metadata for model governance.

Post-intake scores model: Records scores for training and test datasets.

The prediction pipeline leverages active model artifacts and preprocessing objects to create detailed data frames for scoring new inputs. Additionally, we have established tables such as POST_INTAKE_LATEST_DATA and POST_INTAKE_SCORES to manage and record post-prediction results.

The model scores only new or relevant data, optimizing computational resources.

Tying it All Together with UI

We built the user interface with three main components: data operations/helpers, main page content, and sidebar content. The UI manages data efficiently without redundant queries.

The main page functions as the landing page, featuring data getters and displaying querying progress.  

The pre-intake page offers a range of filters that update the main page content dynamically. It includes sections for Data, Exploratory Analyses, Model Information, and Model Data, each providing insights into prediction results, model performance, and dataset composition.

The post-intake page enhances the pre-intake features with new functionalities designed for the post-intake phase. It includes an additional sidebar filter for viewing data and plots on case status breakdowns, crucial for the multiple statuses encountered after intake. The Exploratory Analyses section now features a data table displaying predictions and scores for each state in a case's history. This addition allows users to track how the model’s predictions change with different statuses, offering a comprehensive view of each case's progress.
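
The article does not name the UI framework; assuming a Streamlit-style app (a natural fit for the sidebar-plus-main-page layout described above), the overall structure might look roughly like the sketch below, with every data function hypothetical.

```python
import streamlit as st

@st.cache_data  # avoid redundant queries when filters change
def load_predictions(stage: str):
    """Hypothetical data getter; in practice this would query the Snowflake prediction tables."""
    ...

page = st.sidebar.radio("Page", ["Pre-intake", "Post-intake"])
status_filter = st.sidebar.multiselect("Case status",
                                       ["open", "proven", "canceled", "disproven"])

with st.spinner("Querying prediction tables..."):
    data = load_predictions(page)

st.header(f"{page} predictions")
# Sections for Data, Exploratory Analyses, Model Information, and Model Data
# would render filtered views of `data` here.
```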

Turning Data Chaos into Actionable Insights with Snowflake

BlueCloud’s expertise in Snowflake and Data Cloud was pivotal in helping the client unlock insights for advanced case management, smarter decisions, and improved litigation outcomes.

Impact: Data-Driven Decision Making with ML

This advanced case management solution enables the client to prioritize cases with the highest likelihood of success and to review areas for improvement in cases with a lower likelihood of success, raising overall success rates.

Finally, by conducting a cost-benefit analysis and recommending Snowflake over AWS for its superior cost-effectiveness and functionality, BlueCloud helped the client save significant time and reduce costs.

"The machine learning models we have built do not only track cases—they also predict their outcomes allowing law organizations to focus on winning strategies early in the process and shift decision-making from gut instincts to data-driven insights. This has the potential to revolutionize litigation, saving time, money, and safeguarding people’s rights."

Gopal Muppala
Senior Business Analyst, BlueCloud

KPIs