Mithun P. Acharya
Senior Manager of AI / Principal Data Scientist
LexisNexis
Raleigh, NC USA
acharya DOT mithun AT gmail
LinkedIn | Scholar | Patents
I am a Senior Manager of AI and a Principal Data Scientist at LexisNexis (a legal AI leader). My team works in the areas of Natural Language Processing (NLP) and Generative AI with Large Language Models (LLMs). Select products: Lexis+ AI (video; the most powerful Generative AI solution for legal professionals; press release 1, 2, 3), Lexis Connect (AI for corporate legal; press release), Lexis Search Advantage (AI for law firms). To power these products, we train, fine-tune, and deploy several transformer-based small and large language models in production, both closed and open source.
Before Lexis, as a Manager and Lead Principal Scientist, I led the Artificial Intelligence Group for the Carol Data and AI platform at TOTVS Labs USA (a “startup” lab focusing on Computer Vision, NLP, and ERP/Tabular Analytics). My team was responsible for the innovation as well as the implementation of features for products such as Carol ClockIn (facial recognition for employee clock-in), Carol Assistant (AI for conversational experience), and Deep Audit (insurance claims management with neural attention models). These products have a market-leading presence in Latin America.
Prior to TOTVS, I was a Machine Learning Tech Lead and Lead Principal Scientist at ABB (AI for power, robotics, and automation sectors). My innovations, projects, and technical leadership significantly contributed to and influenced the formation of ABB Ability.
My areas of work and interest in AI include NLP, Large Language Models (LLMs), Reinforcement Learning, and Computer Vision.
Selected Works
(For a full list, see Google Scholar and Patents)
Machine Learning and Data Mining
Publications related to Lexis+ AI (press release 1, 2): On Conversational Wrappers for LexisNexis Products with Generative AI. RELX Generative AI Summit
Publications related to Lexis Connect (press release): a) Lexis Connect: Intelligently unifying workflow and research. Transforming how counsel accomplishes work. LexTech b) Intelligent recommendations for legal matter management c) Bots with Semantic Search and Language Understanding. Machine Learning and AI in Search, RELX Search Summit.
Considering neighborhood structural similarity in Non-negative Matrix Factorization (NMF) makes NMF well suited for timeseries anomaly detection. Journal of Machine Learning Research (JMLR).
Deep Audit: Neural attention models on tabular and relational data for insurance claims decision automation. US patent pending. US2022/0156573 A1
ClockIn: Personnel time clock-in with computer vision (facial verification and recognition). US Patent 11,763,605.
Computer vision for managing safety at industrial sites (on smartphones, video cameras, and edge devices). US Patent 10,573,147.
a) Real-time AI powered by edge-deployed digital twins (Edge computing and analytics in synergy with the cloud). b) Managing solar asset performance with connected analytics. ABB Review. Digital Twins and Simulation Edition.
Technologies for decentralized fleet analytics (How can each customer learn and benefit from the wisdom of the entire fleet of customers' data without any customer sharing data with the central cloud to which all customers are connected?). US, etc. patent pending.
Cyber-attack detection with machine learning for networked electrical power system devices. US Patent 12,069,088.
Location-aware analytics for industrial sites. US Patent 10,520,927.
Technologies for solar power system performance model tuning with machine learning. US Patent 11,949,239.
Code Drones (On intelligent and socially active software artifacts that guide their own self-improvement; AI Bots for Software Engineering). See paper here. Visions Track at International Conference on Software Engineering (ICSE Visions). Best paper, runner-up.
Technologies for optimizing power grids through decentralized forecasting. US Patent 11,909,215.
Industrial equipment installation (seamless information model updates for parts replacement; spare parts and inventory management). US Patent 10,331,119.
Technologies for producing training data (using techniques such as Generative Adversarial Networks (GANs)) for identifying degradation of physical components. US/EP patents pending.
Systems and methods for identifying anomalous events for electrical systems. WO2020010291A1
Machine Learning based real-time intrusion detection using processor execution timing information on embedded systems. Workshop at Real-time Systems Symposium (RTSS Workshop).
Data mining and graph analytics techniques for industrial alarm management. A series of patents and filings resulted from this work: US Patent 10,523,495, EP Patent 3 187 950, CN 111 656 418, WO/2016/141007
Diagnosis method and apparatus (analyzing logs from one or more robots for failure root cause clustered in time). US Patent 11,945,116.
Mining API Specifications from source code for improving software reliability. PhD dissertation.
Machine learning for performance monitoring of services. Automated Software Engineering (ASE).
Static API Specification Mining: Exploiting source code model checking. Book Chapter. Mining Software Specifications: Methodologies and Applications.
Mining API error-handling specifications from source code. Fundamental Approaches to Software Engineering (FASE).
Mining API patterns as partial orders from source code: from usage scenarios to specifications. Foundations of Software Engineering (FSE).
Other Areas
Imp: A change impact analysis tool (Visual Studio plugin) for C/C++ that integrates with version control and build systems. Foundations of Software Engineering Tool Demo (FSE Tools)
Oracle-based regression test selection. International Conference on Software Testing (ICST).
Configuration selection using code change impact analysis for regression testing. International Conference on Software Maintenance (ICSM).
Impact analysis of configuration changes for test case selection. International Symposium on Software Reliability Engineering (ISSRE).
Practical change impact analysis based on static program slicing for industrial software systems. Industry Track at International Conference on Software Engineering (ICSE Practice).
Intelligent jamming in wireless networks with applications to 802.11b and other networks. Military Communications Conference (MILCOM). Nominee, Fred W. Ellersick best paper award.
Method for distributing keys for encrypted data transmission in wireless sensor networks. US Patent 7,702,905.
Secure comparison of encrypted data in wireless sensor networks. Wireless Modeling and Optimization Symposium (WiOpt).
Concealed data aggregation for reverse multicast traffic in sensor networks: Encryption, key distribution, and routing adaptation. IEEE Transactions on Mobile Computing. Featured article