ENCODE is an NIH-funded consortium in which some research labs produce large quantities of data and others use computational analysis to discover new functional and regulatory elements in the human genome. The Data Coordination Center (DCC) is the nucleus of the ENCODE project and is therefore essential to the success of the larger project. The DCC moved to Stanford in 2012, at the beginning of the project's third phase. The power of these human genomic analyses cannot be realized until they have been integrated into standardized, reproducible collections of observations that are made available with defined metadata and robust tools.
We focus on applying the meticulous curation skills used for the biomedical literature to the validation of very large, high-quality human genome datasets from the ENCODE project. Our research defines and applies rigorous methods to create these resources and to distribute this information for the discovery of new knowledge. All of these datasets are integrated via their metadata into a system developed by my group and distributed through a public web resource, the ENCODE Portal. Applying this resource requires the development of software methods that are useful to all biomedical researchers, from computational biologists to clinical researchers. Our challenge is to develop tools that give all users access to these data with complete metadata and at the desired resolution. The result of our work will be a truly useful online research resource for the human genome that meets user needs for human single-gene studies and summarizes many types of information, such as regulatory elements and sequence variation.
Funding is provided by the US National Institutes of Health, National Human Genome Research Institute, via grant U24 HG009397.