BioRuby & Google Summer of Code 2011

Dear All,
our project is looking for students to participate in GSoC 2011,
thanks to OBF and NESCENT.
Please feel free to forward this message to your university mailing
list, lab or local Ruby group.

Use our mailing list to discuss ideas, and feel free to contact the
mentors or any other member of the development team.
http://lists.open-bio.org/mailman/listinfo/bioruby

March 18-27: Would-be student participants discuss application ideas
with mentoring organizations.
March 28: Student application period opens.
April 8, 19:00 UTC: Student application deadline.

Our proposals
-links-
http://bioruby.open-bio.org/wiki/Google_Summer_of_Code#Proposal_2011
http://informatics.nescent.org/wiki/Phyloinformatics_Summer_of_Code_2011#BioRuby_forester

-text-
Proposal 2011

OBF
Support Next Generation Sequencing (NGS) in BioRuby

Rationale
Processing and analyzing NGS data is challenging for a variety of
reasons, in particular because the data sets are usually very large,
contain a vast amount of information and include a large amount of
unknown data. Furthermore, there are many different approaches to NGS
analysis, and several software tools need to be integrated to produce
reliable results. Since this topic is so important for the BioRuby
community, we started a sub-project, bioruby-ngs, for analyzing NGS
data. The project is at an early stage of development, but notable
results have been obtained quickly. Many topics still need to be
addressed, in particular:
- data and results reporting
- workflow management
- DSL for describing experimental designs
- YALIMS (Yet Another LIMS), a simple web-based LIMS for raw dataset
  processing, with reporting and monitoring
Approach
Due to the open nature of the project, the student will choose which
feature he/she wants to develop and focus on. The student will learn
the basic concepts of NGS data analysis and will work closely with a
mentor to produce a working library that will be integrated into the
BioRuby NGS project.
Difficulty and needed skills
Medium to Hard depending on the topic selected.
The project requires
- Ruby
- Bash programming and knowledge of the Linux environment
- Ruby on Rails 3.x
Mentors
Raoul J.P. Bonnal, Francesco Strozzi
Project overview and updates
[1]
Source code


BioRuby Wrapper for Command-Line Applications

Rationale
The main reason for this project is the need to support different
stand-alone applications that are critical for Next Generation
Sequencing analyses. Binding directly to existing C/C++ source code or
rewriting all the applications is impractical and a waste of
resources. A quick solution is to use the stand-alone applications
directly, integrating them into the BioRuby API. Some work has already
been done in the BioRuby NGS project with this wrapper, but better
support for I/O-intensive processes is required. Following this design
pattern, it will also be possible to improve support for other
bioinformatics suites, like EMBOSS, whose BioRuby support is outdated
at the time of this proposal.
Approach
The student will become familiar with advanced meta-programming
concepts in Ruby and will contribute to the definition of a DSL for
this wrapping library. He/she will also build a parser to
automatically define additional wrappers for the EMBOSS suite,
starting from the ACD configuration files.
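
To give an idea of the kind of DSL meant here, below is a minimal
sketch of how class-level declarations could generate accessors and
build a command line. It is only an illustration under assumptions: the
names CommandWrapper, program, option, to_argv and the "my_aligner"
executable are all hypothetical and are not taken from the bioruby-ngs
wrapper branch.

```ruby
# Hypothetical sketch: none of these names come from bioruby-ngs.
class CommandWrapper
  class << self
    # Set (or read back) the executable name for this wrapper class.
    def program(name = nil)
      @program = name if name
      @program
    end

    # Options declared so far, in declaration order.
    def options
      @options ||= {}
    end

    # Declare a command-line option and generate a reader/writer for it.
    def option(name, flag:, type: :value)
      options[name] = { flag: flag, type: type }
      attr_accessor name
    end
  end

  def initialize(settings = {})
    settings.each { |name, value| public_send("#{name}=", value) }
  end

  # Build the argument vector; nothing is executed here.
  def to_argv
    self.class.options.each_with_object([self.class.program]) do |(name, spec), argv|
      value = public_send(name)
      next if value.nil? || value == false
      argv << spec[:flag]
      argv << value.to_s unless spec[:type] == :switch
    end
  end
end

# A wrapper for a made-up executable, written with the DSL above.
class Aligner < CommandWrapper
  program "my_aligner"
  option :threads,   flag: "-t"
  option :reference, flag: "--ref"
  option :quiet,     flag: "-q", type: :switch
end

aligner = Aligner.new(threads: 4, reference: "genome.fa", quiet: true)
argv = aligner.to_argv
```

The resulting array could then be handed to Ruby's standard library,
for example Open3.capture2(*argv), or to IO.popen when the output is
large and should be streamed rather than held in memory.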
Difficulty and needed skills
Medium. Good Ruby knowledge and experience with meta-programming are
required to achieve the goals.
The project requires
- Ruby 1.9
- Ruby metaprogramming
Mentors
Raoul J.P. Bonnal, Francesco Strozzi
Source code
https://github.com/helios/bioruby-ngs, wrapper branch

Represent bio-objects and related information with images

Rationale
Most of the time, after a bioinformatics analysis, the resulting data
need to be re-processed into a graphical form, since we, as human
beings, are more comfortable accessing results and data visually than
browsing a huge table of interconnected information. Very often it is
also difficult to extract the real biological meaning from raw
datasets. The main idea of this proposal is to define graphical
functions and attach them to BioRuby objects, and consequently to the
results computed by a generic process or pipeline. With this solution,
it would be possible not only to explore the results more naturally
but also to export and integrate the information into a web
environment, for sharing knowledge and results. For example, different
objects storing alignment results could share the same interface and
display their data in a common way. The same is true for other kinds
of objects or computational procedures.
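
As a rough illustration of the "same interface" idea, the sketch below
uses a Ruby mixin: each class only describes its features, and a shared
module turns them into a drawing. Everything here is an assumption made
for the example (the Drawable module, the graphical_features method,
the two result classes and the bare-bones SVG output); a real
implementation would more likely delegate the drawing to Rubyvis and
would need to escape the labels.

```ruby
# Hypothetical mixin: including classes supply #graphical_features,
# an array of hashes with :label, :start and :stop keys.
module Drawable
  BAR_HEIGHT = 10

  # Render every feature as a horizontal bar in a minimal SVG document.
  def to_svg
    rows = graphical_features.each_with_index.map do |feature, index|
      y = index * (BAR_HEIGHT + 5)
      width = feature[:stop] - feature[:start] + 1
      %(<rect x="#{feature[:start]}" y="#{y}" width="#{width}" ) +
        %(height="#{BAR_HEIGHT}"><title>#{feature[:label]}</title></rect>)
    end
    %(<svg xmlns="http://www.w3.org/2000/svg">#{rows.join}</svg>)
  end
end

# Two unrelated result classes, both made up for this example.
class PairwiseHit
  include Drawable

  def initialize(name, from, to)
    @name = name
    @from = from
    @to = to
  end

  def graphical_features
    [{ label: @name, start: @from, stop: @to }]
  end
end

class DomainArchitecture
  include Drawable

  # domains: array of [name, start, stop] triples
  def initialize(domains)
    @domains = domains
  end

  def graphical_features
    @domains.map { |name, start, stop| { label: name, start: start, stop: stop } }
  end
end

hit_svg = PairwiseHit.new("hit1", 10, 59).to_svg
domains_svg = DomainArchitecture.new([["SH3", 5, 60], ["SH2", 80, 170]]).to_svg
```

Both objects answer to_svg in the same way although they store
different data, which is the property a web front end would rely on.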
Approach
The student and the mentor will define together a minimum set of
features that need to be shared by the BioRuby objects and that could
be visualized. Then the student will create a library/module to
implement these graphical features within the BioRuby project. He/she
will gain experience with Rubyvis as the graphical API and with Ruby
on Rails for web visualization.
Difficulty and needed skills
Medium/Hard. The student will need to define a graphical API and
integrate the new code with the existing BioRuby modules. Strong
coding skills will be required to create a clean API with clear
documentation.
The project requires
- Very good knowledge of Ruby (1.9) and design patterns
- Basic concepts of graphics/visualization
- Basic knowledge of Ruby on Rails
Mentors
Raoul J.P. Bonnal, Christian Zmasek

Modular annotation knowledge base for BioRuby

Rationale
Handling data sets from gene expression analysis or real-time PCR
platforms requires accessing the corresponding gene annotations
several times during the measurements. This kind of information is
normally stored in remote databases that provide the required
knowledge and data. Problems arise when the available databases do not
support a specific version of the data of interest or when huge
queries need to be submitted. A BioRuby knowledge base, designed to be
modular and expandable over time, could solve these problems. A good
compromise between performance and portability could be achieved by
using embedded databases and accessing the data through a clean API.
Approach
The student and the mentor will decide which platforms should be
supported, based on their popularity. Then the student will retrieve
the essential annotations and design a simple database schema to hold
all the relevant non-redundant information. The schema will be
flexible enough to allow linking the dataset to external databases or
resources for subsequent analyses. After this phase of discovery and
design, the student will build the database using SQLite and will
write a Ruby library to access the data using the ActiveRecord ORM.
Difficulty and needed skills
Medium. The student will need to define the core data to be included
in the database and how this information will be organized and
accessed by the end user. The Ruby library will be built on
ActiveRecord, but good coding skills will be required to design an
efficient API with clear documentation.
The project requires
- Minimal SQL dialect
- Good knowledge of Ruby
- Experience in querying biological databases
- Experience with annotation data
Mentors
Raoul J.P. Bonnal, Francesco Strozzi

from: http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/timel
--------------

NESCENT

BioRuby forester

Rationale
Forester is a collection of software libraries, mostly written in
Java, for comparative genomics and evolutionary biology research. A
prominent example of a tool based on forester is the phylogenetic tree
explorer Archaeopteryx. Most of forester's use-cases are associated
with the use of evolutionary trees as tools for establishing
(functional) relations between genes or proteins (for example protein
function prediction with RIO) and comparing genome based features
between different species. Therefore, it implements objects
representing evolutionary trees overlaid with biological data from
other sources (e.g. protein domain architectures), as well as
algorithms operating on these, such as the automated inference of
ancestral taxonomies on gene trees, which has proven useful in the
functional interpretation of large gene trees.
Most of these methods are currently only accessible via the
command-line or through the GUI of Archaeopteryx and therefore
difficult or impossible to use from other computer programs or
toolkits (such as BioRuby). Although forester is mostly written in
Java, it also contains components in Ruby ("evoruby"). These implement
operations on multiple sequence alignments (MSAs) that are crucial in
the development of workflows for automated, large scale, phylogenetic
inference, including I/O, and efficient MSA manipulation (such as
deletion of all columns with a gap-portion larger than a given
threshold, removal of short and/or redundant sequences).
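
For readers unfamiliar with these MSA operations, here is a small
plain-Ruby sketch of the three mentioned above. It is not the evoruby
or BioRuby API: the function names are hypothetical, and the alignment
is assumed to be a simple Hash mapping sequence names to aligned
strings of equal length, with "-" as the gap character.

```ruby
# Hypothetical helpers; msa is a Hash of name => aligned sequence.

# Delete every column whose fraction of gaps exceeds max_gap_fraction.
def remove_gappy_columns(msa, max_gap_fraction, gap_char = "-")
  sequences = msa.values
  return msa.dup if sequences.empty?

  kept = (0...sequences.first.length).select do |col|
    gaps = sequences.count { |seq| seq[col] == gap_char }
    gaps.to_f / sequences.size <= max_gap_fraction
  end
  msa.transform_values { |seq| kept.map { |col| seq[col] }.join }
end

# Drop sequences with fewer than min_residues non-gap characters.
def remove_short_sequences(msa, min_residues, gap_char = "-")
  msa.reject do |_name, seq|
    seq.each_char.count { |char| char != gap_char } < min_residues
  end
end

# Keep only the first of any group of identical aligned sequences.
def remove_redundant_sequences(msa)
  seen = {}
  msa.select { |_name, seq| seen[seq] ? false : (seen[seq] = true) }
end

msa = {
  "a" => "AC-GT",
  "b" => "A--GT",
  "c" => "ACTG-",
  "d" => "A--GT"
}

trimmed = remove_gappy_columns(msa, 0.5)   # third column is 75% gaps
unique  = remove_redundant_sequences(trimmed)
long    = remove_short_sequences(unique, 4)
```

In this toy alignment only the third column is removed, "d" is dropped
as a duplicate of "b", and only "a" still has four residues left.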
Approach
The goal would be to develop a framework for accessing forester's
central algorithms and applications from within BioRuby. It is
expected that this project will be implemented in the form of a
BioRuby plugin in order to avoid creating additional dependencies for
the main BioRuby distribution. Full two-way access between the Java
and Ruby languages can be accomplished by using JRuby as the
underlying platform.
Depending on the level of experience and skills of a student, a
project proposal could also include either or both of the following
additional goals.
- BioRuby and the "evoruby" components of forester partially overlap
  in functionality. You could incorporate MSA management functionality
  present in "evoruby" but missing in BioRuby into the BioRuby
  distribution. This would not only make that functionality
  immediately accessible to all BioRuby users, but would also allow a
  larger community of developers to participate in maintenance and
  future development of these components.
- Display gene conversions. This would entail developing a parser for
  GENECONV output and using the newly developed BioRuby-forester link
  to display gene conversions directly within Archaeopteryx.
Challenges
- The student needs to learn two disparate toolkits, BioRuby and forester.
- The project involves two programming languages, Ruby and Java.
- The student needs to understand the BioRuby plugin system.
Involved toolkits or projects
- BioRuby
- BioRuby plugin system
- RubyGems
- JRuby
- forester
Degree of difficulty and needed skills
Expected difficulty: Medium. Proficiency in at least one of the two
involved programming languages, Ruby and Java, is necessary.
Experience/interest in molecular evolution or comparative genomics is
required, and experience with BioRuby or forester will help.
Mentors
Christian Zmasek, Pjotr Prins, Raoul J.P. Bonnal

Regards

--
Ra