Abstract:
Artificial Intelligence (AI) has emerged in recent years as a transformative technology, reshaping conventional problem-solving strategies. Machine Learning (ML) and Deep Learning (DL) approaches are having a major impact on the life sciences and health care. The rapidly growing volume of biochemical data has driven cutting-edge research in health care and drug discovery. Molecular machine learning applies ML techniques specifically to uncover new insights from biochemical data. Biochemical datasets typically store molecular information as text-based sequences in several formats. The Simplified Molecular Input Line Entry System (SMILES) is a compact and efficient line notation for representing molecules, suitable for a wide range of applications. This work briefly introduces the SMILES representation and focuses on the major applications of ML and DL in health care, especially SMILES-based drug discovery. It further uses a sequence-to-sequence architecture built on Recurrent Neural Networks (RNNs) to generate small drug-like molecules, using benchmark datasets. Experimental results show that RNNs based on Long Short-Term Memory (LSTM) units can be trained to encode raw SMILES strings with near-perfect accuracy and to generate structurally similar molecules with little or no feature engineering. A gradient-based optimization strategy applied to the network proved well suited to training the most stable and effective sequence model. Such RNNs can therefore be used in drug discovery tasks including similarity-based virtual screening, lead compound identification, and hit-to-lead optimization.
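An LSTM sequence-to-sequence model that consumes raw SMILES strings "with little or no feature engineering" still needs the strings turned into integer token sequences. The Python sketch below shows one common way to do this: split SMILES into chemically meaningful tokens, build a vocabulary, and produce the encoder input plus the shifted decoder input/target pair used for teacher forcing. The regular expression (which covers only a common subset of SMILES syntax), the special tokens `<pad>`, `<bos>`, `<eos>`, and the helper names `tokenize`, `build_vocab`, `encode` and `decode` are illustrative assumptions, not the preprocessing pipeline used in this work.

```python
import re

# Assumed (simplified) SMILES token pattern, tried left to right:
# bracket atoms such as [NH3+] or [C@@H], two-letter organic atoms (Br, Cl),
# single-letter organic atoms, aromatic atoms, two-digit ring closures (%10),
# single-digit ring closures, then bond, branch and other symbols.
TOKEN_RE = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|%\d{2}|\d|[()=#\-+\\/:~@.*]"
)

# Hypothetical special tokens for padding and sequence boundaries.
PAD, BOS, EOS = "<pad>", "<bos>", "<eos>"


def tokenize(smiles):
    """Split a SMILES string into tokens; reject strings the pattern cannot fully cover."""
    tokens = TOKEN_RE.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable SMILES: {smiles!r}")
    return tokens


def build_vocab(smiles_list):
    """Return (stoi, itos): token-to-index dict and index-to-token list, specials first."""
    symbols = sorted({tok for s in smiles_list for tok in tokenize(s)})
    itos = [PAD, BOS, EOS] + symbols
    stoi = {tok: i for i, tok in enumerate(itos)}
    return stoi, itos


def encode(smiles, stoi, max_len):
    """Return (encoder_input, decoder_input, decoder_target), each padded to max_len."""
    ids = [stoi[tok] for tok in tokenize(smiles)]
    if len(ids) + 1 > max_len:
        raise ValueError("sequence longer than max_len")
    pad = stoi[PAD]

    def padded(seq):
        return seq + [pad] * (max_len - len(seq))

    encoder_input = padded(ids)
    decoder_input = padded([stoi[BOS]] + ids)    # target shifted right by one step
    decoder_target = padded(ids + [stoi[EOS]])   # decoder learns to emit <eos>
    return encoder_input, decoder_input, decoder_target


def decode(ids, itos):
    """Map indices back to a SMILES string, skipping <pad>/<bos> and stopping at <eos>."""
    out = []
    for i in ids:
        tok = itos[i]
        if tok == EOS:
            break
        if tok in (PAD, BOS):
            continue
        out.append(tok)
    return "".join(out)


if __name__ == "__main__":
    data = ["CC(=O)Oc1ccccc1C(=O)O", "Clc1ccc(Br)cc1", "C[NH3+]"]
    stoi, itos = build_vocab(data)
    enc, dec_in, dec_out = encode(data[1], stoi, max_len=20)
    print(tokenize(data[1]))
    print(decode(dec_out, itos))
```

Because `tokenize` refuses any string whose tokens do not reassemble the input, unsupported syntax fails loudly instead of being silently dropped. Feeding the decoder the target sequence shifted right by one position is the standard teacher-forcing arrangement for training an RNN decoder; at generation time the decoder instead consumes its own previous predictions.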