Accelerating Drug Discovery with High Content Screening
The drug discovery and development pathway is notoriously long and costly, with many compounds failing to progress to clinical stages or, even worse, failing during clinical trials. High Content Screening (HCS) enables researchers to identify promising drug candidates, and weed out those likely to fail, early in the process. But what exactly is High Content Screening, and why is it so effective? And how can you leverage it for your research?
What is High Content Screening?
High Content Screening (HCS) integrates automated microscopy with advanced image analysis to provide detailed, quantitative data on cellular phenotypes. Cells are cultured and exposed to drugs or other perturbations. Automated microscopy is then used to acquire high-resolution images, which are analyzed using image analysis software, such as the open-source CellProfiler™. Images undergo segmentation, isolating individual cells and subcellular organelles.
These isolated structures are then subjected to hundreds or thousands of measurements of their size, shape, granularity, and more, a process known as feature extraction. This wealth of information on a cell’s phenotype (the “content” in High Content Screening) is what makes HCS such a powerful tool in drug discovery.
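As a rough illustration of feature extraction, the sketch below computes a handful of per-object measurements from an already segmented image. It assumes segmentation has produced a labeled mask (0 for background, a positive integer per object); the function name `extract_features`, the feature names, and the toy data are hypothetical, and real tools such as CellProfiler compute far more features than shown here.

```python
from collections import defaultdict


def extract_features(labels, intensity):
    """Compute simple per-object features from a labeled mask.

    labels: 2D list of ints, 0 = background, k > 0 = pixels of object k.
    intensity: 2D list of floats with the same shape as labels.
    """
    # Group pixel coordinates by the object they belong to.
    pixels = defaultdict(list)
    for r, row in enumerate(labels):
        for c, label in enumerate(row):
            if label:
                pixels[label].append((r, c))

    features = {}
    for label, coords in pixels.items():
        rows = [r for r, _ in coords]
        cols = [c for _, c in coords]
        height = max(rows) - min(rows) + 1
        width = max(cols) - min(cols) + 1
        area = len(coords)
        features[label] = {
            "area": area,                       # size: pixel count
            "bbox_height": height,              # shape: bounding box
            "bbox_width": width,
            "extent": area / (height * width),  # fraction of bounding box filled
            "mean_intensity": sum(intensity[r][c] for r, c in coords) / area,
        }
    return features


# Toy 3x4 image with two segmented objects (illustrative values only).
labels = [
    [1, 1, 0, 0],
    [1, 1, 0, 2],
    [0, 0, 0, 2],
]
intensity = [
    [10.0, 12.0, 0.0, 0.0],
    [14.0, 16.0, 0.0, 30.0],
    [0.0, 0.0, 0.0, 50.0],
]
features = extract_features(labels, intensity)
for label, values in sorted(features.items()):
    print(label, values)
```

Each object ends up as one row of numbers; stacking these rows across all cells and wells gives the multiparametric table that the later analysis steps operate on.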
HCI, HCA, or HCS—What's the difference?
Although the terms are often used interchangeably, High Content Screening (HCS), High Content Imaging (HCI), and High Content Analysis (HCA) refer to distinct concepts:
- High Content Imaging (HCI): This refers to the imaging technologies used to capture high-resolution cellular images.
- High Content Analysis (HCA): This involves the processing of these images to extract and analyze meaningful data. HCA and HCI can both pertain to low- and high-throughput experiments.
- High Content Screening (HCS): This more generally refers to experiments that combine high content imaging (HCI) and analysis (HCA) with high-throughput approaches.
Morphology Encodes Biology
Collectively, the morphological features extracted from a cell constitute its phenotypic signature, or phenotypic profile. As cellular morphology is closely linked to cellular physiology and function, this profile can offer valuable biological insights. For instance, it can reveal potential toxicity or help elucidate a compound’s mechanism of action, as compounds that induce similar phenotypes are likely to share similar modes of action.
In contrast to traditional target-based approaches, the phenotypic approach provides a more unbiased and holistic assessment of a cell's entire phenotypic response, independent of prior knowledge about pathways and processes. Its significance is underscored by the fact that around 21% of new molecular entities approved by the FDA from 1989 to 2000 either had unknown targets or were believed to function through mechanisms unrelated to specific molecular targets (1).
Challenges in High Content Analysis
Despite its promise, leveraging HCS effectively poses significant challenges. Young et al. (2) highlight two primary challenges that hinder the full utilization of HCS data. Firstly, most features lack an obvious biological meaning, which makes these datasets difficult to interpret. While the meaning of some features, such as DNA content per nucleus, is clear, the biological relevance of others, such as texture features, is not. Secondly, managing and processing the sheer volume of data generated by these experiments demands significant IT infrastructure or specialized cloud-based solutions. Extracting meaningful information from these datasets then requires sophisticated algorithms and data analysis techniques.
Best Practices for High-Content Data Mining
To refine and interpret these massive, multiparametric datasets, several important steps must be followed. The first step, feature selection, is where irrelevant and redundant features are identified and eliminated. Quality control metrics and visual inspection of raw data are then used to ensure data quality, before proceeding to data normalization, transformation, and scaling.
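One simple way to picture feature selection and scaling is sketched below. It assumes that "irrelevant" can be approximated by near-zero variance and "redundant" by a high absolute Pearson correlation with a feature already kept, which is only one of several possible criteria. The thresholds, function names, and well-level values are hypothetical, and the quality control step is omitted.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length lists with non-zero variance."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_features(table, min_sd=1e-6, max_abs_corr=0.95):
    """Return names of features that vary and are not near-duplicates.

    table maps feature name -> list of per-well values.
    """
    # Step 1: drop features that barely vary across wells.
    varying = {n: v for n, v in table.items() if statistics.pstdev(v) > min_sd}
    # Step 2: greedily keep a feature only if it is not highly
    # correlated with any feature that was already kept.
    kept = []
    for name, values in varying.items():
        if all(abs(pearson(values, varying[k])) <= max_abs_corr for k in kept):
            kept.append(name)
    return kept


def zscore(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical per-well feature table (illustrative values only).
table = {
    "nucleus_area": [100.0, 120.0, 90.0, 150.0, 110.0],
    "nucleus_perimeter": [40.0, 44.0, 38.0, 50.0, 42.0],  # tracks area exactly
    "constant_flag": [1.0, 1.0, 1.0, 1.0, 1.0],           # never changes
    "texture_contrast": [0.2, 0.9, 0.5, 0.3, 0.8],
}
kept = select_features(table)
normalized = {name: zscore(table[name]) for name in kept}
print(kept)
```

In this toy table the constant feature and the perimeter feature are dropped, leaving two features on a common scale. In a real screen, normalization is often done relative to control wells on each plate rather than across all wells, so treat the plain z-score here as a simplification.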
The next vital step is dimensionality reduction, where high-dimensional data is compressed into a lower-dimensional space. Common techniques such as Principal Component Analysis (PCA), Common Factor Analysis (CFA), and Uniform Manifold Approximation and Projection (UMAP) can be employed for this purpose, reducing the dataset to a few comprehensive scores that capture the majority of variance. These scores already facilitate initial explorations into phenotypic diversity and distribution.
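To make the idea concrete, here is a minimal PCA sketch using NumPy's singular value decomposition. It assumes the feature matrix has already been selected and scaled, and it uses synthetic data generated from two hidden factors purely for illustration; the function name `pca` and all values are hypothetical, and the sketch does not cover CFA or UMAP.

```python
import numpy as np


def pca(X, n_components):
    """Project rows of X onto the top principal components.

    Returns (scores, explained_variance_ratio) for the kept components.
    """
    Xc = X - X.mean(axis=0)                      # center each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T            # coordinates in the reduced space
    explained = (S ** 2) / np.sum(S ** 2)        # variance share per component
    return scores, explained[:n_components]


# Synthetic data: 50 wells, 10 features driven by 2 hidden factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(50, 10))

scores, explained = pca(X, n_components=2)
print(scores.shape, explained.sum())
```

Because the synthetic features are built from only two underlying factors, two components recover almost all of the variance. Real HCS data is rarely this clean, so the number of components to keep is usually chosen by inspecting the explained variance.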
Subsequently, researchers can perform hit selection. The reduced data can be used to calculate phenotypic distance scores, allowing, for example, the identification of compounds that produce phenotypes similar or dissimilar to that of a known compound of interest. Alternatively, machine learning can be used to predict the likelihood of certain compounds belonging to specific reference classes. Finally, clustering algorithms can group similar data points, revealing patterns and relationships within the dataset, which can provide further insights into mechanisms of action or toxicity profiles.
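A phenotypic distance score can be as simple as the Euclidean distance between reduced profiles. The sketch below ranks compounds by their distance to a reference compound; the choice of Euclidean distance is an assumption (other metrics are also used), and the compound names and score vectors are hypothetical.

```python
import math


def rank_by_distance(profiles, reference):
    """Rank compounds by Euclidean distance to a reference profile.

    profiles maps compound name -> list of reduced scores (e.g. PCA scores).
    Returns (name, distance) pairs, closest first.
    """
    ref = profiles[reference]
    distances = {
        name: math.dist(vector, ref)
        for name, vector in profiles.items()
        if name != reference
    }
    return sorted(distances.items(), key=lambda item: item[1])


# Hypothetical reduced profiles, three scores per compound.
profiles = {
    "reference_drug": [2.0, -1.0, 0.5],
    "compound_A": [2.0, -1.0, 1.5],
    "compound_B": [-1.0, 3.0, 0.5],
    "compound_C": [0.0, 0.0, 0.0],
}
ranking = rank_by_distance(profiles, "reference_drug")
for name, distance in ranking:
    print(f"{name}: {distance:.2f}")
```

Compounds at the top of the ranking produce phenotypes most like the reference and are candidates for a shared mode of action, while those at the bottom are the most dissimilar.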
Analyze Your Own High Content Data with StratoMineR
Too often, the power of this data is limited by the need for advanced coding skills or reliance on data scientists. StratoMineR, part of StratoInsight, walks biologists through a best-practices workflow for multiparametric data mining. The guided workflow allows them to perform advanced data analyses such as feature selection, dimensionality reduction, and clustering with ease. Interactive visualizations enable a deeper exploration of the data, making sophisticated analysis accessible to all.
Curious to see it in action? Request a demo today.
References
1: Overington JP, Al-Lazikani B, Hopkins AL. How many drug targets are there? Nat Rev Drug Discov. 2006 Dec;5(12):993-6. doi: 10.1038/nrd2199. PMID: 17139284. https://pubmed.ncbi.nlm.nih.gov/17139284/
2: Young DW, Bender A, Hoyt J, McWhinnie E, Chirn GW, Tao CY, Tallarico JA, Labow M, Jenkins JL, Mitchison TJ, Feng Y. Integrating high-content screening and ligand-target prediction to identify mechanism of action. Nat Chem Biol. 2008 Jan;4(1):59-68. doi: 10.1038/nchembio.2007.53. Epub 2007 Dec 9. PMID: 18066055. https://pubmed.ncbi.nlm.nih.gov/18066055/