Mark Gerstein

Orcid: 0000-0002-9746-3719

Affiliations:
  • Yale University


According to our database, Mark Gerstein authored at least 137 papers between 1994 and 2024.

Bibliography

2024
A Survey of Generative AI for De Novo Drug Design: New Frontiers in Molecule and Protein Generation.
CoRR, 2024

Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science.
CoRR, 2024

2023
Binding peptide generation for MHC Class I proteins with deep reinforcement learning.
Bioinform., February, 2023

FAVOR: functional annotation of variants online resource and annotator for variation across the human genome.
Nucleic Acids Res., January, 2023

GENCODE: reference annotation for the human and mouse genomes in 2023.
Nucleic Acids Res., January, 2023

Insights from incorporating quantum computing into drug design workflows.
Bioinform., January, 2023

Constructing a full, multiple-layer interactome for SARS-CoV-2 in the context of lung disease: Linking the virus with human genes and microbes.
PLoS Comput. Biol., 2023

Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents.
CoRR, 2023

MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning.
CoRR, 2023

ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks.
CoRR, 2023

Investigating Data Contamination in Modern Benchmarks for Large Language Models.
CoRR, 2023

Improved prediction of ligand-protein binding affinities by meta-modeling.
CoRR, 2023

Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?
CoRR, 2023

BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge.
CoRR, 2023

ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs.
CoRR, 2023

Disentangled Wasserstein Autoencoder for T-Cell Receptor Engineering.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

GersteinLab at MEDIQA-Chat 2023: Clinical Note Summarization from Doctor-Patient Conversations through Fine-tuning and In-context Learning.
Proceedings of the 5th Clinical Natural Language Processing Workshop, 2023

Aligning Factual Consistency for Clinical Studies Summarization through Reinforcement Learning.
Proceedings of the 5th Clinical Natural Language Processing Workshop, 2023

2022
Venus: An efficient virus infection detection and fusion site discovery method using single-cell and bulk RNA-seq data.
PLoS Comput. Biol., October, 2022

Scalable privacy-preserving cancer type prediction with homomorphic encryption.
CoRR, 2022

Higher-Order Generalization Bounds: Learning Deep Probabilistic Programs via PAC-Bayes Objectives.
CoRR, 2022

Privacy-preserving Model Training for Disease Prediction Using Federated Learning with Differential Privacy.
Proceedings of the 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2022

2021
Bayesian structural time series for biomedical sensor data: A flexible modeling framework for evaluating interventions.
PLoS Comput. Biol., 2021

GENCODE 2021.
Nucleic Acids Res., 2021

Forest Fire Clustering: Cluster-oriented Label Propagation Clustering and Monte Carlo Verification Inspired by Forest Fire Dynamics.
CoRR, 2021

Gene Tracer: a smart, interactive, voice-controlled Alexa skill For gene information retrieval and browsing, mutation annotation and network visualization.
Bioinform., 2021

FANCY: fast estimation of privacy risk in functional genomics data.
Bioinform., 2021

DECODE: a Deep-learning framework for Condensing enhancers and refining boundaries with large-scale functional assays.
Bioinform., 2021

Fast and Scalable Private Genotype Imputation Using Machine Learning and Partially Homomorphic Encryption.
IEEE Access, 2021

2020
Predicting changes in protein thermodynamic stability upon point mutation with deep 3D convolutional neural networks.
PLoS Comput. Biol., 2020

Epigenome-based splicing prediction using a recurrent neural network.
PLoS Comput. Biol., 2020

Analyses of non-coding somatic drivers in 2,658 cancer whole genomes.
Nat., 2020

NIMBus: a negative binomial regression based Integrative Method for mutation Burden Analysis.
BMC Bioinform., 2020

DiNeR: a Differential graphical model for analysis of co-regulation Network Rewiring.
BMC Bioinform., 2020

The corrected gene proximity map for analyzing the 3D genome organization using Hi-C data.
BMC Bioinform., 2020

Origins and characterization of variants shared between databases of somatic and germline human mutations.
BMC Bioinform., 2020

Latent-space embedding of expression data identifies gene signatures from sputum samples of asthmatic patients.
BMC Bioinform., 2020

TopicNet: a framework for measuring transcriptional regulatory network change.
Bioinform., 2020

Shaping the nebulous enhancer in the era of high-throughput assays and genome editing.
Briefings Bioinform., 2020

2019
TeXP: Deconvolving the effects of pervasive and autonomous transcription of transposable elements.
PLoS Comput. Biol., 2019

GENCODE reference annotation for the human and mouse genomes.
Nucleic Acids Res., 2019

2018
Multiple-Swarm Ensembles: Improving the Predictive Power and Robustness of Predictive Models and Its Use in Computational Biology.
IEEE ACM Trans. Comput. Biol. Bioinform., 2018

Rank Projection Trees for Multilevel Neural Network Interpretation.
CoRR, 2018

MOAT: efficient detection of highly mutated regions with the Mutations Overburdening Annotations Tool.
Bioinform., 2018

Novel approaches for bioinformatic analysis of salivary RNA sequencing data for development.
Bioinform., 2018

2017
Landscape and variation of novel retroduplications in 26 human populations.
PLoS Comput. Biol., 2017

MrTADFinder: A network modularity based approach to identify topologically associating domains in multiple resolutions.
PLoS Comput. Biol., 2017

HiC-spector: a matrix library for spectral and reproducibility analysis of Hi-C contact maps.
Bioinform., 2017

2016
DREISS: Using State-Space Models to Infer the Dynamics of Gene Expression Driven by External and Internal Regulatory Networks.
PLoS Comput. Biol., 2016

Extending gene ontology in the context of extracellular RNA and vesicle communication.
J. Biomed. Semant., 2016

2015
Loregic: A Method to Characterize the Cooperative Logic of Regulatory Factors.
PLoS Comput. Biol., 2015

VarSim: a high-fidelity simulation and validation framework for high-throughput genome sequencing with cancer applications.
Bioinform., 2015

MetaSV: an accurate and integrative structural-variant caller for next generation sequencing.
Bioinform., 2015

High-order neural networks and kernel methods for peptide-MHC binding prediction.
Bioinform., 2015

2014
Comparative analysis of regulatory information and circuits across distant species.
Nat., 2014

Interpretable Sparse High-Order Boltzmann Machines.
Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, 2014

2013
Interpretation of Genomic Variants Using a Unified Biological Network Approach.
PLoS Comput. Biol., 2013

Identification of yeast cell cycle regulated genes based on genomic features.
BMC Syst. Biol., 2013

Comparative network analysis of gene co-expression networks reveals the conserved and species-specific functions of cell-wall related genes between Arabidopsis and Poplar.
Proceedings of the ACM Conference on Bioinformatics, 2013

2012
VAT: a computational framework to functionally annotate variants in personal genomes within a cloud-computing environment.
Bioinform., 2012

2011
Measuring the Evolutionary Rewiring of Biological Networks.
PLoS Comput. Biol., 2011

Genomics and Privacy: Implications of the New Reality of Closed Data for the Field.
PLoS Comput. Biol., 2011

Construction and Analysis of an Integrated Regulatory Network Derived from High-Throughput Sequencing Data.
PLoS Comput. Biol., 2011

Tiling array data analysis: a multiscale approach using wavelets.
BMC Bioinform., 2011

Predicting protein ligand binding motions with the Conformation Explorer.
BMC Bioinform., 2011

ACT: aggregation and correlation toolbox for analyses of genome tracks.
Bioinform., 2011

RSEQtools: a modular framework to analyze RNA-Seq data using compact, anonymized data summaries.
Bioinform., 2011

TIP: A probabilistic method for identifying transcription factor target genes from ChIP-seq binding profiles.
Bioinform., 2011

AGE: defining breakpoints of genomic structural variants at single-nucleotide resolution, through optimal alignments with gap excision.
Bioinform., 2011

2010
Network Modeling Identifies Molecular Functions Targeted by miR-204 to Suppress Head and Neck Tumor Metastasis.
PLoS Comput. Biol., 2010

Getting Started in Gene Orthology and Functional Analysis.
PLoS Comput. Biol., 2010

Analysis of Combinatorial Regulation: Scaling of Partnerships between Regulators with the Number of Governed Targets.
PLoS Comput. Biol., 2010

3V: cavity, channel and cleft volume calculator and extractor.
Nucleic Acids Res., 2010

Detection of copy number variation from array intensity and sequencing read depth using a stepwise Bayesian model.
BMC Bioinform., 2010

MOTIPS: Automated Motif Analysis for Predicting Targets of Modular Protein Domains.
BMC Bioinform., 2010

Genome-wide sequence-based prediction of peripheral proteins using a novel semi-supervised learning technique.
BMC Bioinform., 2010

Using semantic web rules to reason on an ontology of pseudogenes.
Bioinform., 2010

Human Genome Annotation.
Proceedings of the Bioinformatics Research and Applications, 6th International Symposium, 2010

Hierarchical analysis of regulatory networks and cross-disciplinary comparison with the Linux call graph.
Proceedings of the 2010 IEEE International Workshop on Genomic Signal Processing and Statistics, 2010

Dynamic and static analysis of transcriptional regulatory networks in a hierarchical context.
Proceedings of the 2010 IEEE International Workshop on Genomic Signal Processing and Statistics, 2010

Analysis of molecular networks.
Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology, 2010

2009
Integrated Assessment of Genomic Correlates of Protein Evolutionary Rate.
PLoS Comput. Biol., 2009

Getting Started in Text Mining: Part Two.
PLoS Comput. Biol., 2009

Small RNAs Originated from Pseudogenes: cis- or trans-Acting?
PLoS Comput. Biol., 2009

Integrating Sequencing Technologies in Personal Genomics: Optimal Low Cost Reconstruction of Structural Variants.
PLoS Comput. Biol., 2009

Pseudofam: the pseudogene families database.
Nucleic Acids Res., 2009

Multi-level learning: improving the prediction of protein, domain and residue interactions by allowing information flow between levels.
BMC Bioinform., 2009

Training set expansion: an approach to improving the reconstruction of biological networks from limited and uneven reliable interactions.
Bioinform., 2009

2008
Modeling ChIP Sequencing In Silico with Applications.
PLoS Comput. Biol., 2008

Open Access: Taking Full Advantage of the Content.
PLoS Comput. Biol., 2008

An integrated system for studying residue coevolution in proteins.
Bioinform., 2008

2007
The Importance of Bottlenecks in Protein Networks: Correlation with Gene Essentiality and Expression Dynamics.
PLoS Comput. Biol., 2007

RNAi Development.
PLoS Comput. Biol., 2007

Pseudogene.org: a comprehensive database and comparison platform for pseudogene annotation.
Nucleic Acids Res., 2007

An interdepartmental Ph.D. program in computational biology and bioinformatics: The Yale perspective.
J. Biomed. Informatics, 2007

PARE: A tool for comparing protein abundance and mRNA expression data.
BMC Bioinform., 2007

LinkHub: a Semantic Web system that facilitates cross-database queries and information retrieval in proteomics.
BMC Bioinform., 2007

Publishing perishing? Towards tomorrow's information architecture.
BMC Bioinform., 2007

An efficient pseudomedian filter for tiling microrrays.
BMC Bioinform., 2007

Hinge Atlas: relating protein sequence to sites of structural flexibility.
BMC Bioinform., 2007

FlexOracle: predicting flexible hinges by identification of stable domains.
BMC Bioinform., 2007

Total ancestry measure: quantifying the similarity in tree-like classification, with genomic applications.
Bioinform., 2007

The tYNA platform for comparative interactomics: a web tool for managing, comparing and mining multiple networks.
Bioinform., 2007

Leveraging the structure of the Semantic Web to enhance information retrieval for proteomics.
Bioinform., 2007

Assessing the need for sequence-based normalization in tiling microarray experiments.
Bioinform., 2007

2006
An Integrative Genomic Approach to Uncover Molecular Mechanisms of Prokaryotic Traits.
PLoS Comput. Biol., 2006

The Database of Macromolecular Motions: new features added at the decade mark.
Nucleic Acids Res., 2006

PseudoPipe: an automated pseudogene identification pipeline.
Bioinform., 2006

Predicting interactions in protein networks by completing defective cliques.
Bioinform., 2006

A supervised hidden markov model framework for efficiently segmenting tiling array data in transcriptional and chIP-chip experiments: systematically incorporating validated biological knowledge.
Bioinform., 2006

Helix Interaction Tool (HIT): a web-based tool for analysis of helix-helix interactions in proteins.
Bioinform., 2006

Design Issues in Implementing a Portable Sample Tracking and Analysis Research Support (STARS) System for PCR Based Microarray Research.
Proceedings of the 40th Annual Conference on Information Sciences and Systems, 2006

2005
Editorial.
Nucleic Acids Res., 2005

Case Report: A High Productivity/Low Maintenance Approach to High-performance Computation for Biomedicine: Four Case Studies.
J. Am. Medical Informatics Assoc., 2005

Analysis of Genomic Tiling Microarrays for Transcript Mapping and the Identification of Transcription Factor Binding Sites.
Proceedings of the Advances in Bioinformatics and Computational Biology, 2005

YeastHub: a semantic web use case for integrating data in the life sciences domain.
Proceedings of the Thirteenth International Conference on Intelligent Systems for Molecular Biology 2005, 2005

Protein Interaction Prediction by Integrating Genomic Features and Protein Interaction Network Analysis.
Proceedings of the Data Analysis and Visualization in Genomics and Proteomics, 2005

2004
Fast Optimal Genome Tiling with Applications to Microarray Design and Homology Search.
J. Comput. Biol., 2004

Information assessment on predicting protein-protein interactions.
BMC Bioinform., 2004

Using 3D Hidden Markov Models that explicitly represent spatial coordinates to model and compare protein structures.
BMC Bioinform., 2004

A XML-Based Approach to Integrating Heterogeneous Yeast Genome Data.
Proceedings of the International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences, 2004

2003
ExpressYourself: a modular platform for processing and visualizing microarray data.
Nucleic Acids Res., 2003

MolMovDB: analysis and visualization of conformational change and structural flexibility.
Nucleic Acids Res., 2003

Prediction of regulatory networks: genome-wide identification of transcription factor targets from gene expression data.
Bioinform., 2003

Computational Proteomics: Genome-scale Analysis of Protein Structure, Function, & Evolution (Invited Talk).
Proceedings of the German Conference on Bioinformatics, 2003

2002
Toward a systematic definition of protein function that scales to the genome level: defining function in terms of interactions.
Proc. IEEE, 2002

Calculations of protein volumes: sensitivity analysis and parameter database.
Bioinform., 2002

Analysis of mRNA expression and protein abundance data: an approach for the comparison of the enrichment of features in the cellular population of proteins and transcripts.
Bioinform., 2002


2001
SPINE: an integrated tracking database and data mining approach for identifying feasible targets in high-throughput structural proteomics.
Nucleic Acids Res., 2001

Determining the minimum number of types necessary to represent the sizes of protein atoms.
Bioinform., 2001

An XML Application For Genomic Data Interoperation.
Proceedings of the 2nd IEEE International Symposium on Bioinformatics and Bioengineering, 2001

1998
Measurement of the effectiveness of transitive sequence comparison, through a third 'intermediate' sequence.
Bioinform., 1998

[Invited Lecture] A Structural Census of Genomes: Comparing Bacterial, Eukaryotic, and Archaea Genomes in Terms of Protein Structure.
Proceedings of the German Conference on Bioinformatics, 1998

1996
Using Iterative Dynamic Programming to Obtain Accurate Pairwise and Multiple Alignments of Protein Structures.
Proceedings of the Fourth International Conference on Intelligent Systems for Molecular Biology, 1996

1995
Using a measure of structural variation to define a core for the globins.
Comput. Appl. Biosci., 1995

1994
Finding an Average Core Structure: Application to the Globins.
Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, 1994