Companies cannot reliably predict which patents are likely to be asserted against them. If they could, they would be better able to quantify and mitigate their own patent infringement risk. We used machine learning methods, informed by legal scholars' understanding of relevant patent traits, to improve on prior attempts to predict litigation. We built primarily on Colleen Chien's "Predicting Patent Litigation." Chien drew on traits from a patent's legal history and developed a prediction method that uses only traits acquired before litigation, not after. She demonstrated that these pre-litigation traits are useful predictors. Evaluating Chien's approach, we determined that her logistic regression model was generalizable (that is, not overfit to her training sample), though it does not perform as well on real-world datasets as her matched-pairs evaluation suggested. We found that year-over-year changes in patenting and litigation will hinder real-world prediction with this approach, but only modestly. By building a much larger dataset of newer patents and selecting machine learning models tailored to the task, we improved on Chien's results. Compared with Chien's logistic regression approach, our random forest model achieved a 7.8% greater area under the precision-recall curve and could allow a company to narrow its patent clearance search to a set of patents up to 34% smaller. We report our results on a random sample of patents using standardized metrics, providing a baseline for future work on predicting patent litigation.
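The two headline metrics can be illustrated with a short sketch, offered under stated assumptions and not as the authors' code. Area under the precision-recall curve is estimated here as average precision, which is one common step-wise estimator. Other interpolations exist, and which one the study used is not stated here. The "clearance set size" is assumed to mean the smallest number of top-ranked patents a company would need to review to capture a chosen share (recall) of the patents that are eventually litigated. The function names `average_precision` and `clearance_set_size` are hypothetical.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Patents are ranked by descending predicted litigation score. Each time a
    litigated patent (label 1) appears, the precision at that rank is added.
    The sum is then divided by the number of litigated patents.
    Ties keep their input order because Python's sort is stable.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_litigated = sum(labels)
    if total_litigated == 0:
        raise ValueError("need at least one litigated patent")
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / total_litigated


def clearance_set_size(
    scores: Sequence[float], labels: Sequence[int], target_recall: float
) -> int:
    """Hypothetical measure: the smallest top-ranked set reaching target_recall.

    Returns how many of the highest-scored patents must be reviewed before
    the fraction of litigated patents captured reaches target_recall.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_litigated = sum(labels)
    if total_litigated == 0:
        raise ValueError("need at least one litigated patent")
    hits = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        hits += label
        if hits / total_litigated >= target_recall:
            return rank
    return len(ranked)


if __name__ == "__main__":
    # Toy data for illustration only; these are not results from the study.
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    labels = [1, 0, 1, 0, 0, 1]
    print(average_precision(scores, labels))        # 13/18, about 0.722
    print(clearance_set_size(scores, labels, 0.6))  # 3
```

Under these assumptions, a model with a better ranking reaches the same recall with a smaller clearance set. This is the sense in which a company's search could be "narrowed." The precision-recall baseline for a random ranking is roughly the share of positive examples in the sample. For that reason, a matched-pairs sample, which is about half litigated patents, tends to make precision look higher than it would on a random sample of patents, where litigated patents are rare.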
