Google Summer of Code - Ideas List (2016)

Last Updated: January 27, 2017

NOTE: Google has not yet selected the 2017 organizations. However, we will be applying to become a GSOC 2017 Org.

Welcome to GSoC 2016. We have been accepted as an organization, and we look forward to working with you this summer. This year we will be using Slack as the primary medium of communication. Each idea on this page has a mentor assigned, and you can find their email addresses on this page. Contact your mentor with the name of the project you are interested in, and they will invite you to our Slack team. All communication will then happen on Slack. Discuss the project on Slack, and once you are ready to submit your application, use the template below. You must submit your application directly to our organization using the GSoC Program Site.
1. Building Survival Models from Genomic Data with Google TensorFlow.
Mentor: Lee Cooper (Lee dot Cooper at Emory dot edu)
Overview: Genomic medicine generates high-dimensional descriptions of disease containing thousands of features, yet relatively few features are used in clinical practice to predict patient outcomes. We have developed layered neural network models to predict the survival of cancer patients from genomic profiling. This project will work to translate these basic prototypes into the Google TensorFlow framework where different models can be explored more effectively. The student on this project will work to translate the code from Theano to TensorFlow, and to develop interfaces that enable users to train models, identify the best models, and generate interpretations of these models with bioinformatics tools.
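To give applicants a feel for the kind of objective such a model might optimize, here is a minimal sketch of the negative log partial likelihood of a Cox proportional hazards model, written with NumPy. This is an assumption on our part for illustration: the overview above does not state which loss the existing prototypes use, and the function name and arguments are hypothetical. The sketch also assumes there are no tied follow-up times.

```python
import numpy as np

def cox_neg_log_partial_likelihood(risk, time, event):
    """Negative log partial likelihood of a Cox model (assumes no tied times).

    risk  : predicted log-risk score per patient (e.g. the output of a network)
    time  : observed follow-up time per patient
    event : 1 if the event was observed, 0 if the patient was censored
    """
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)

    # Sort patients by decreasing time so that the risk set of patient i
    # (everyone still under observation at time[i]) is the prefix 0..i.
    order = np.argsort(-time)
    risk, event = risk[order], event[order]

    # Running log-sum-exp over each prefix, computed in a numerically stable way.
    log_risk_set = np.logaddexp.accumulate(risk)

    # Only patients with an observed event contribute a term.
    return -np.sum((risk - log_risk_set) * event)

# Two hypothetical patients with equal risk scores, both with observed events.
loss = cox_neg_log_partial_likelihood(risk=[0.0, 0.0], time=[2.0, 1.0], event=[1, 1])
```

In a TensorFlow port, the same computation would be expressed with differentiable tensor operations so that the network producing `risk` can be trained by gradient descent.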
2. Tensor Factorization for Accelerating Convolutional Neural Networks (CNNs) in Google TensorFlow
Mentor: Lee Cooper (Lee dot Cooper at Emory dot edu)
Overview: Several studies have shown that the computation time for complex CNNs can be dramatically reduced using factorization techniques, with little impact on prediction accuracy. This project will work on developing C/C++ code for TensorFlow that will enable users to factorize their networks to accelerate CNN prediction. We will focus on the tensor biclustering approach, and work to implement this in C/C++ for integration into TensorFlow. Students interested in this project should have a basic understanding of neural networks and matrix algebra, and be proficient in Python and C/C++ programming.
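As a rough illustration of why factorization reduces computation, the sketch below replaces one dense weight matrix with two thinner ones using a truncated SVD. Note that this is a simpler, swapped-in technique chosen for brevity; it is not the tensor biclustering approach this project targets, and the layer sizes and function name are hypothetical.

```python
import numpy as np

def low_rank_factorize(weights, rank):
    """Approximate `weights` (out x in) as a @ b, with a: out x rank, b: rank x in."""
    u, s, vt = np.linalg.svd(weights, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # fold the singular values into the left factor
    b = vt[:rank, :]
    return a, b

rng = np.random.default_rng(0)

# Hypothetical layer with 256 outputs and 512 inputs, built to have rank 16
# so that the rank-16 factorization is exact up to rounding error.
w = rng.standard_normal((256, 16)) @ rng.standard_normal((16, 512))
a, b = low_rank_factorize(w, rank=16)

x = rng.standard_normal(512)
y_full = w @ x          # 256 * 512 multiply-adds
y_fact = a @ (b @ x)    # 16 * 512 + 256 * 16 multiply-adds
```

Real network weights are rarely exactly low rank, so in practice the rank controls a trade-off between speed and approximation error.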
3. Information Visualizations using DataScope
Mentor: Ganesh Iyer (ganesh.iyer at
Overview: DataScope is an interactive dashboard system for doing exploratory analysis on large biomedical datasets. Each dashboard is described through a set of four JSON files that describe the data sources, the data, the filters, and the resulting display. DataScope comes with a few basic visualizations, and the system is extensible to support additional ones. The project would involve extending DataScope to support visualizations such as parallel coordinates, choropleth maps, and chord diagrams. Currently the authoring and viewing of a dashboard are two separate processes. The project would also involve giving users the ability to dynamically add or change visualizations at run time.
Requirements: D3, JavaScript, React.
4. Realtime Exploration and Visualization of Massive Multidimensional Data with DataScope Using Nanocubes
Mentor: Ganesh Iyer (ganesh.iyer at
Overview: Nanocubes are based on OLAP data cubes but have modest memory requirements. They can be loaded into main memory and used to perform computations on large datasets for interactive visualizations. DataScope is an interactive dashboard system for doing exploratory analysis on large biomedical datasets. It currently uses Crossfilter for filtering. We’ll explore using data cubes/nanocubes to scale DataScope to work with billion-point datasets. The project would also involve extending DataScope to allow data sources to be added or removed at run time and to work with streaming data.
Requirements: Node.js, Python
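The core idea behind a data cube is to pre-aggregate counts for every combination of dimensions so that a filter query becomes a lookup instead of a scan. The sketch below is a plain in-memory count cube, not the nanocube data structure itself, and the record fields and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def build_cube(records, dimensions):
    """Pre-aggregate record counts for every subset of `dimensions`."""
    dimensions = sorted(dimensions)
    cube = {}
    for k in range(len(dimensions) + 1):
        for dims in combinations(dimensions, k):
            # Key each count by the tuple of values taken on these dimensions.
            cube[dims] = Counter(tuple(r[d] for d in dims) for r in records)
    return cube

def count(cube, **filters):
    """Count records matching equality filters using only the cube."""
    dims = tuple(sorted(filters))
    return cube[dims][tuple(filters[d] for d in dims)]

# Hypothetical records; a real dataset would have far more rows.
records = [
    {"site": "brain", "stage": "II", "sex": "F"},
    {"site": "brain", "stage": "III", "sex": "M"},
    {"site": "lung", "stage": "II", "sex": "F"},
    {"site": "lung", "stage": "II", "sex": "M"},
]
cube = build_cube(records, ["site", "stage", "sex"])
lung_stage_two = count(cube, site="lung", stage="II")
```

A full cube grows exponentially with the number of dimensions, which is the memory problem that more compact structures such as nanocubes are designed to address.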
5. Near Duplicate Detection in Medical Image Archives
Mentor: Pradeeban Kathiravelu ( pkathi2 at emory dot edu )
Overview: MediCurator is a proposed framework to find duplicate entries in multiple heterogeneous medical data sources, in a distributed manner. Medical image archives are huge and consist of structured and hierarchical data, which may be accessed by querying the metadata. MediCurator can naturally be considered a component of an efficient and dynamic ETL process for medical warehouse construction. More emphasis can be given to correctness than to performance, although there will be a trade-off between the two. Distributed execution frameworks such as the Hazelcast and Infinispan In-Memory Data Grids can be leveraged for the distributed execution of the duplicate detection. In addition to duplicate detection across the various medical image archives, the second phase of this project looks into consolidating the data into an integrated database or a data warehouse, eliminating the duplicates from the data sources (SQL and NoSQL databases as well as public medical image sources such as TCIA). Many similarity join algorithms have been proposed in research, and these can be extended to operate in a distributed environment.
Programming Languages/Frameworks: Java
Prerequisites: Java programming skills. Experience in databases and distributed computing.
6. Spatial Extensions to MongoDB
Mentor: Ashish Sharma ( ashish dot sharma at emory dot edu )
Overview: In one of our projects we use MongoDB to manage shapes extracted from very high resolution (50Kx50K) digital pathology images. The resulting data is massive: more than 1 billion shapes from 100K images. When viewing and exploring these images, one would like to exploit MongoDB's 2d geospatial indexing system (2dsphere can't be used because it works in spherical coordinates). However, the 2d index is limited because MongoDB insists that it be the first field in a compound index (2d + image metadata). In this project you will create a custom 2d index that is appended to all documents, and you will extend MongoDB's Java driver so that spatial queries can exploit your new 2d index.
Programming Languages/Frameworks: MongoDB, Java
Prerequisites: Java programming skills and experience with MongoDB. Experience in computational geometry or spatial query processing will be useful but is not a prerequisite.
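One possible way to build such an index, offered here only as an assumption and not as the project's chosen design, is to store a grid-cell key on each document so that it can be indexed as an ordinary field after the image metadata. The sketch below, in Python for brevity, computes a Morton (Z-order) code for the cell containing a point and lists the cells covering a viewport. The cell size and all names are hypothetical.

```python
def interleave_bits(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one Morton (Z-order) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)        # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return code

def cell_key(x, y, cell_size=256):
    """Map pixel coordinates to the Morton code of the enclosing grid cell."""
    return interleave_bits(x // cell_size, y // cell_size)

def cells_in_viewport(x0, y0, x1, y1, cell_size=256):
    """Morton codes of all grid cells overlapping the rectangle (x0, y0)-(x1, y1)."""
    return sorted(
        interleave_bits(cx, cy)
        for cx in range(x0 // cell_size, x1 // cell_size + 1)
        for cy in range(y0 // cell_size, y1 // cell_size + 1)
    )

# Hypothetical document for one shape; "cell" is the appended index field.
doc = {"image_id": "hypothetical-image-1", "x": 30000, "y": 41000}
doc["cell"] = cell_key(doc["x"], doc["y"])

viewport_cells = cells_in_viewport(0, 0, 300, 300)
```

A viewport query could then match on the image identifier first and on the `cell` field with an `$in` list of these codes, followed by an exact geometric check on the returned shapes.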
7. Medical Image Analysis on the Cloud
Mentor: Ravi Madduri (
Overview: Our goal is to create Docker containers for various image analysis algorithms. Docker Compose scripts need to be developed to analyze images and provide results. The project also involves integrating these tools with the Galaxy project. We will create a trusted Docker registry with Docker images and help quantify the costs of running analyses on a commercial cloud.
Prerequisites: Docker, Python, and a high level of comfort in tinkering with Unix. Prior experience with image analysis or Galaxy would be a bonus.
8. Multi-resolution Analysis of Pathology Data with Convolutional Neural Networks (CNNs)
Mentor: Lee Cooper (Lee dot Cooper at Emory dot edu)
Overview: Pathology images contain visual patterns that are used to classify diseases and predict prognosis. Evaluating these patterns requires effective integration of information across multiple resolutions. As part of Google Summer of Code, our lab has developed high-throughput software for analyzing pathology image data with convolutional networks for the classification of objects at a single resolution. This project will extend that work to integrate CNNs that operate at different resolutions to more effectively classify objects and recognize visual patterns that are predictive of patient survival. The student on this project will work with our team to become familiar with our current CNN pathology framework, and to develop CNN architectures that integrate the outputs of two or more image resolutions. Students will work with Google TensorFlow to train these networks on GPU-equipped servers and to create software capable of handling hundreds of gigabytes of data.
9. De-mystifying Medical Imaging Data
Mentor: William Bennett (
Overview: Structural information about the various Information Objects defined in a complex medical imaging standard (DICOM) has been collected and analyzed into a descriptive data structure that can easily be rendered into HTML. These renderings can be modified by selecting a number of parameters in order to make the standard easier to interpret in different contexts. A prototype implementation of this application exists as a web-based application, but it is based on a server-side Perl program, doesn’t incorporate all of the desired rendering parameters, and has a primitive UI. The proposed project is to do the following:
1. Do the rendering on the client side, in JavaScript, based upon a JSON encoding of the data structures.
2. Add some new rendering parameters.
3. Upgrade the UI to modern standards.
4. Package the application for distribution as a single zipped directory, with no need for a server to host the app.
Relevant languages/frameworks: JavaScript
Skills: Rendering documents from complex data structures


Copyright © 2017 Emory University - All Rights Reserved | 201 Dowman Drive, Atlanta, Georgia 30322 USA 404.727.6123