Web Tools

Artificial microRNAs

WMD3

WMD3 (Web MicroRNA Designer) is a web tool for the automated design of artificial microRNAs.

Genome Sequences

Arabidopsis thaliana assemblies

As part of the 1001 Genomes Project, we provide the genome sequences of four Arabidopsis thaliana accessions. The sequences were assembled exclusively from Illumina reads, at roughly 50x to 200x genome coverage. For each genome we used two to three sequencing libraries with insert sizes ranging from 200 bp to 5 kb.
The assembly followed a homology-guided strategy that makes use of the Col-0 reference sequence but excludes the centromeres (as defined by Clark et al., Science, 2007).

Resources:

  • BLAST tool to search and download scaffolds
  • Download complete assemblies in FASTA format.

Arabidopsis lyrata promoter database

Promoter sequences, defined as the 1000 bp upstream of the ATG start codon and extracted from the Arabidopsis lyrata annotation (A. lyrata v1.0, April 7, 2008), are available on the A. lyrata project site.
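
Extracting a fixed-length upstream region can be sketched as below. This is a minimal illustration, not the script used to build the database: the function names, the 1-based coordinate convention and the handling of minus-strand genes are our assumptions, and the sequences in the usage note are toy data.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def upstream_of_start_codon(chrom_seq: str, atg_pos: int, strand: str,
                            length: int = 1000) -> str:
    """Return up to `length` bp immediately upstream of a start codon.

    Assumed convention (hypothetical, for illustration only):
    `atg_pos` is the 1-based forward-strand coordinate of the first base
    of the start codon. For a minus-strand gene this is the highest
    coordinate of the codon. The result is always written 5' to 3'
    relative to the gene and is truncated at the sequence ends.
    """
    if strand == "+":
        start = max(0, atg_pos - 1 - length)
        return chrom_seq[start:atg_pos - 1]
    if strand == "-":
        end = min(len(chrom_seq), atg_pos + length)
        return reverse_complement(chrom_seq[atg_pos:end])
    raise ValueError("strand must be '+' or '-'")
```

For example, `upstream_of_start_codon("AACCGATGTTT", 6, "+", length=3)` returns the three bases preceding the ATG, and a gene closer than `length` bp to the start of its scaffold simply yields a shorter sequence.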

SNPs and Polymorphisms

1001 Genomes Webtools

Within the 1001 Genomes Project, we have developed several web-based tools and databases.

Resources:

  • POLYMORPH for 1001 Genomes: A collection of databases and web applications for storage, retrieval, analysis and visualization of polymorphism data.
  • T8T9 is a translator between TAIR8 and TAIR9 Arabidopsis thaliana genome coordinates.
  • Col-0 DB is a catalog of variations and sequencing errors in the reference strain Col-0.
  • GMOD GBrowse: Browse our analysed sequences.
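
The general idea behind translating coordinates between two genome releases, as T8T9 does for TAIR8 and TAIR9, can be sketched with a table of collinear blocks. This is only an assumed approach for illustration: the block format, the `translate` function and the example blocks are hypothetical and do not reflect the actual differences between TAIR8 and TAIR9 or the internals of T8T9.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# One collinear block: (old_start, old_end, new_start), 1-based inclusive.
# Blocks must be sorted by old_start and must not overlap.
Block = Tuple[int, int, int]


def translate(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a position in the old release to the new release.

    Returns None when the position lies outside every block, for
    example in a region that was removed between releases.
    """
    starts = [b[0] for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    old_start, old_end, new_start = blocks[i]
    if pos > old_end:
        return None
    return new_start + (pos - old_start)


# Hypothetical example: 2 bp inserted after old position 100,
# and old positions 201-210 deleted in the new release.
EXAMPLE_BLOCKS: List[Block] = [(1, 100, 1), (101, 200, 103), (211, 300, 203)]
```

With these example blocks, positions in the first block are unchanged, positions in the second block shift by two, and positions 201 to 210 have no counterpart in the new release.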

MSQT

MSQT is a web-based multiple SNP query tool.

Mapping the genetic basis of phenotypic differences between individuals is an essential task in natural variation research, and mapping requires markers that distinguish the genotypes. Several studies aim to describe the sequence variation within a species. A common outcome of those studies is the publication of multiple sequence alignments for multiple loci from several individuals, ecotypes or strains, which contain information about the polymorphisms between them. We set out to make this polymorphism information readily accessible. The MSQT application is easy to install and runs on any BSD, Linux or Mac OS X platform.
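
How polymorphism information can be pulled out of a multiple sequence alignment is sketched below. This is not MSQT code: the function names, the dictionary-based alignment format and the treatment of gaps and `N` as uninformative are assumptions made for illustration, and the sequences in the usage note are invented.

```python
from typing import Dict, List, Tuple

Site = Tuple[int, Dict[str, str]]


def polymorphic_sites(alignment: Dict[str, str], ignore: str = "-N") -> List[Site]:
    """Return (1-based column, {strain: allele}) for every column in which
    at least two different informative alleles are observed."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    sites: List[Site] = []
    for col in range(lengths.pop()):
        alleles = {name: seq[col].upper() for name, seq in alignment.items()}
        informative = {a for a in alleles.values() if a not in ignore}
        if len(informative) > 1:
            sites.append((col + 1, alleles))
    return sites


def markers_between(sites: List[Site], strain_a: str, strain_b: str,
                    ignore: str = "-N") -> List[int]:
    """Columns at which two strains carry different, informative alleles."""
    return [col for col, alleles in sites
            if alleles[strain_a] not in ignore
            and alleles[strain_b] not in ignore
            and alleles[strain_a] != alleles[strain_b]]
```

Given a toy alignment such as `{"Col-0": "ACGTAC", "Ler-1": "ACGAAC", "Cvi-0": "AC-TAT"}`, `polymorphic_sites` reports columns 4 and 6, and `markers_between(sites, "Col-0", "Ler-1")` narrows this to the columns usable as markers for that pair of strains.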

POLYMORPH

POLYMORPH is a collection of databases and web applications for storage, retrieval, analysis and visualization of polymorphism data from whole genome re-sequencing projects. It is optimized for the efficient handling of high volumes of data produced by so-called next generation sequencing technologies (e.g., Illumina/Solexa, Roche/454, Perlegen and ABI/SOLiD).

Genome annotation and polymorphism data are organized by species and sequencing project in a relational database management system (MySQL). In addition to extensive search and retrieval interfaces, various tools have been implemented to make use of sequence and polymorphism data. These include CAPS marker and primer design tools as well as an interface for SNP selection for several genotyping platforms. Based on an existing genome annotation, the effects of polymorphisms on genes, protein domains and splice sites as well as allele frequency information have been pre-computed to guarantee fast analysis of small to large genomic regions. Visualization of polymorphism data and extensive genome annotation is provided by a viewer (GBrowse), which can be accessed easily from within any POLYMORPH application.
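
The core test behind CAPS marker design is whether a SNP creates or abolishes a restriction site, so that the two alleles give different digestion patterns. The sketch below illustrates that test only and is not the POLYMORPH implementation: the function names and the three-enzyme table are assumptions chosen for illustration, and because all three recognition sites are palindromic, checking one strand is sufficient here.

```python
from typing import Dict, List

# Recognition sites of three common enzymes (all palindromic).
ENZYMES: Dict[str, str] = {
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "BamHI": "GGATCC",
}


def site_overlaps(seq: str, snp_index: int, site: str) -> bool:
    """True if `site` occurs in `seq` at a position covering snp_index (0-based)."""
    first = max(0, snp_index - len(site) + 1)
    last = min(snp_index, len(seq) - len(site))
    return any(seq[i:i + len(site)] == site for i in range(first, last + 1))


def caps_candidates(flank5: str, ref: str, alt: str, flank3: str,
                    enzymes: Dict[str, str] = ENZYMES) -> List[str]:
    """Enzymes whose site covers the SNP in exactly one of the two alleles."""
    ref_seq = (flank5 + ref + flank3).upper()
    alt_seq = (flank5 + alt + flank3).upper()
    snp_index = len(flank5)
    return [name for name, site in enzymes.items()
            if site_overlaps(ref_seq, snp_index, site)
            != site_overlaps(alt_seq, snp_index, site)]
```

For instance, `caps_candidates("TTGAA", "T", "C", "TCAA")` reports EcoRI, because only the reference allele completes a GAATTC site. A practical design tool would additionally need to check for other cut sites within the amplicon and to choose primers, which this sketch leaves out.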

POLYMORPH currently provides access to two whole genome polymorphism datasets for the model plant Arabidopsis thaliana, generated with re-sequencing methods from Perlegen and Illumina/Solexa. It is being extended to host data from several ongoing Illumina/Solexa sequencing efforts in other species. Future versions will also include the results of small RNA sequencing projects and analyses of expression data across varieties, taking the natural (sequence) variation within a species into account.

Micro Arrays

AtGenExpress Visualization Tool

Regulatory regions of plant genes tend to be more compact than those of animal genes, but the complement of transcription factors encoded in plant genomes is as large as or larger than that found in animal genomes. Plants therefore provide an opportunity to study how transcriptional programs control multicellular development. We analyzed global gene expression during development of the reference plant Arabidopsis thaliana in samples covering many stages, from embryogenesis to senescence, and diverse organs. Here, we provide a first analysis of this data set, which is part of the AtGenExpress expression atlas. We observed that the expression levels of transcription factor genes and signal transduction components are similar to those of metabolic genes. Examining the expression patterns of large gene families, we found that they are often more similar than would be expected by chance, indicating that many gene families have been co-opted for specific developmental processes.

Arabidopsis thaliana - Tiling Array Express (At-TAX)

At-TAX is a whole genome tiling array resource for developmental expression analysis and transcript identification in Arabidopsis thaliana.
Tiling array hybridization intensities, probe set definitions for Arabidopsis genes and segmentations into exonic and background regions are visualized in a genome browser. Visualizations of gene expression and cross-platform comparisons are available through tileViz.

Databases

Guppy EST and Markers

Our EST and marker databases provide useful tools for genetic mapping and phylogenetic studies of the guppy.

The guppy, Poecilia reticulata, is a well-known model organism for studying inheritance and variation of male ornamental traits as well as adaptation to different river habitats. However, genomic resources for studying this important model were not previously widely available.

With the aim of generating molecular markers for genetic mapping of the guppy, cDNA libraries were constructed from embryos and different adult organs to generate expressed sequence tags (ESTs). About 18,000 ESTs were annotated according to BLASTN and BLASTX results, and the sequence information from the 3' UTRs was exploited to generate PCR primers for re-sequencing of genomic DNA from different wild-type strains. By comparison of EST-linked genomic sequences from at least four different ecotypes, about 1,700 polymorphisms were identified, representing about 400 distinct genes. Two interconnected MySQL databases were built to organize the ESTs and the markers, respectively. A robust phylogeny of the guppy was reconstructed, based on 10 different nuclear genes.
