SIMAP

#1 Post by **rebirther** » 28.05.2006, 12:19

The next client release is under preparation and will provide processor-specific optimizations.

Things will get busy there soon :mrgreen:

#2 Post by **rebirther** » 14.08.2007, 17:28

August 13, 2007
SIMAP provides data for the Gene3D project: SIMAP has begun generating comprehensive monthly datasets of protein similarities and features for the Gene3D project.
The Gene3D project aims to characterize the distribution of structural domains in proteins in nature and to use this information for research into protein evolution and function. In living cells, proteins, encoded by DNA, are the functional units. They act both as catalysts in cell metabolism and as structural units for the structure and organization of the cell. Almost all proteins consist of one or more domains. Domains are nearly independent subsequences of proteins that form a particular topology, referred to as a "fold". It is assumed that there are only a few thousand folds, with about 20 "superfolds" accounting for the great majority of domain structures.

Gene3D's sister database, CATH, combines a suite of computer programs with expert analysis to determine the boundaries of folds in 3D structure data (obtained, for example, by X-ray diffraction of protein crystals) and to classify the folds in a hierarchy based on their structural properties and evolutionary relationships. Gene3D then uses the sequences (proteins consist of chains of amino acids) to build models of the domains, called Hidden Markov Models (HMMs). These models specifically identify those protein sequences that are evolutionarily related to the models' seed domains in CATH. It follows that these proteins form the same three-dimensional structure as the seed domains of the matching model.

Currently there are more than 6000 HMMs in the CATH-Gene3D database. These models are compared against all known protein sequences (more than 7 million) to determine their domain composition. This represents an enormous computational effort and can normally only be accomplished with the help of large computer networks. Based on the comparison of these domain architectures and the direct analysis of domain sequence similarities, it is possible to transfer experimentally obtained findings from the small number of well-characterized proteins to the very large set of proteins derived from DNA sequences (e.g. from the Human Genome Project).
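To get a feel for the scale, here is a rough back-of-the-envelope sketch in Python. The counts are the lower bounds quoted in the news text (6000 HMMs, 7 million sequences); the time per comparison is a purely hypothetical figure chosen for illustration, not a measured value from the project.

```python
# Lower bounds taken from the news text above.
n_models = 6_000          # "more than 6000 HMMs"
n_sequences = 7_000_000   # "more than 7 million" protein sequences

# Every model is compared against every sequence.
comparisons = n_models * n_sequences


def cpu_days(n_models: int, n_sequences: int, seconds_per_comparison: float) -> float:
    """Single-CPU time in days for an all-against-all model/sequence scan."""
    return n_models * n_sequences * seconds_per_comparison / 86_400


# ASSUMPTION: 1 ms per model-sequence comparison (illustrative only).
print(f"{comparisons:,} comparisons")
print(f"{cpu_days(n_models, n_sequences, 0.001):.0f} CPU-days at 1 ms each")
```

Even under this optimistic assumed rate, a single CPU would be busy for more than a year, which is why the work is spread over many machines.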

Furthermore, it is possible to reveal direct functional relationships by identifying subtle evolutionary signals (e.g. through co-evolution); this is, however, only one example of the many applications. Thus many studies based on CATH and Gene3D, as well as on protein structures in general, have made and continue to make a significant contribution to the understanding of diseases and the development of new drugs.

August 13, 2007
Powerful storage server for SIMAP: SUN Microsystems has selected SIMAP as the recipient of an "Academic Excellence Grant". To improve the database platform of the SIMAP project, SUN is supporting the project with the donation of a fully equipped X4500 datacenter server. This dual-Opteron machine has 16 GB of main memory and 48 locally attached SATA hard disks with 500 GB capacity each. Over the past few weeks we have installed and tested the new server and tuned its parameters. The SIMAP MySQL database and binary files are stored on several very fast RAID10 disk arrays. SIMAP is now running in full production mode on this machine, and all of SIMAP's database operations have been accelerated considerably.

The text has been added to our wiki.

#3 Post by **rebirther** » 24.12.2007, 01:08

Project status
Calculation of SIMAP updates is currently running.
The current workunits calculate the similarities of approx. 10 million new sequences from several environmental sequencing projects.

This may take a while ^^

#4 Post by **rebirther** » 31.03.2008, 11:47

The calculation of new SIMAP workunits containing the novel proteins from March 2008 (simap app) and their domains (hmmer app) will start on March 31st in the evening (UTC).

#5 Post by **rebirther** » 29.04.2008, 22:01

April 29, 2008
New workunits: The calculation of similarities and features of the approx. 250,000 new sequences that were imported from the databases PDB, GenBank and Uniprot in April 2008 will start in the evening (UTC) of April 30th.

#6 Post by **rebirther** » 30.05.2008, 18:20

The calculation of new SIMAP workunits containing the novel proteins from May 2008 (simap app) and their domains (hmmer app) will start on June 1st in the evening (UTC).

#7 Post by **rebirther** » 02.06.2008, 19:08

June 1, 2008
Additional new workunits: We have recently imported approx. 400,000 additional new sequences from the database ENSEMBL. The number of workunits will therefore be much larger than announced on May 30th.

#8 Post by **rebirther** » 10.06.2008, 09:37

The current workunits calculate the similarities and domains of approx. 600,000 new sequences from public protein databases (PDB, Uniprot, RefSeq, Ensembl), imported in May 2008.

#9 Post by **rebirther** » 30.06.2008, 13:40

June 30, 2008
New Workunits: The calculation of similarities and features of approx. 300,000 new sequences that were imported from the databases PDB, GenBank and Uniprot in June 2008 will start in the evening (UTC) of July 1st. Several members of the BOINCSIMAP team will be attending a scientific conference in Denmark this week, so we apologize for possible delays in providing new workunits. Additionally, we will test a new version of dynamic quotas, which makes the daily number of results per host dependent on its average turnaround time. All hosts will start from the same daily quota and average turnaround time. We hope this gives the new batches a better start, without the usual overloading of our servers. Please give us feedback and report problems with the daily quota in our message boards.
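The announcement does not say how the dynamic quota is computed. Purely as an illustration of the idea, the following Python sketch scales a shared starting quota inversely with a host's average turnaround time. The class name, the constants, the moving-average update and the clamping limits are all assumptions made for this example and are not taken from the BOINCSIMAP server code.

```python
from dataclasses import dataclass

# All constants are illustrative assumptions, not project settings.
BASE_DAILY_QUOTA = 100        # quota every host starts from
INITIAL_TURNAROUND_H = 24.0   # average turnaround every host starts from (hours)
MIN_QUOTA, MAX_QUOTA = 10, 500
SMOOTHING = 0.2               # weight of the newest result in the running average


@dataclass
class Host:
    avg_turnaround_h: float = INITIAL_TURNAROUND_H

    def report_result(self, turnaround_h: float) -> None:
        """Fold one returned result into an exponential moving average."""
        self.avg_turnaround_h = (
            (1 - SMOOTHING) * self.avg_turnaround_h + SMOOTHING * turnaround_h
        )

    def daily_quota(self) -> int:
        """Scale the base quota inversely with average turnaround, then clamp."""
        scaled = BASE_DAILY_QUOTA * INITIAL_TURNAROUND_H / self.avg_turnaround_h
        return max(MIN_QUOTA, min(MAX_QUOTA, round(scaled)))
```

In this sketch a new host gets the base quota, a host that returns results quickly sees its quota grow, and a slow host sees it shrink, which is one way a batch could start without every host requesting the maximum at once.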

#10 Post by **rebirther** » 28.07.2008, 16:16

July 28, 2008
New Workunits: The calculation of similarities and features of the new sequences that were imported from the databases PDB, RefSeq and Uniprot in July 2008 will start in the evening (UTC) of next Monday, August 4th. Additional workunits will contain sequences from approx. 100 new environmental sequencing projects. With the new calculation period we will limit the number of workunits in progress to a maximum of 30 per CPU, in order to distribute the work more evenly. We hope this limit will work properly. For feedback and problem reports, please use our message boards.
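As a minimal sketch of what such a per-CPU limit means for a single work request: only the figure of 30 comes from the announcement. The function name and its parameters are hypothetical and do not describe the actual scheduler.

```python
MAX_IN_PROGRESS_PER_CPU = 30  # limit quoted in the announcement


def workunits_to_send(n_cpus: int, in_progress: int, requested: int) -> int:
    """Return how many workunits a host may receive without exceeding the cap.

    Hypothetical helper: the host's cap is 30 workunits per CPU, and only the
    remaining headroom below that cap can be handed out.
    """
    cap = MAX_IN_PROGRESS_PER_CPU * n_cpus
    headroom = max(0, cap - in_progress)
    return min(requested, headroom)
```

For example, a 4-CPU host with 100 workunits already in progress that asks for 50 more would receive 20 under this rule, because its cap is 120.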

#11 Post by **rebirther** » 23.08.2008, 20:36

August 23, 2008
New Workunits: One week earlier than expected, we were able to prepare the new workunits for the 2 million environmental sequences from the preprocessing batches. The calculation of these workunits will therefore start in the evening (UTC) of August 24th. Additional workunits have been prepared for the calculation of similarities and features of the 400,000 new sequences that were imported from the databases PDB, GenBank, ENSEMBL and Uniprot in August 2008.

#12 Post by **rebirther** » 15.12.2008, 18:07

December 14, 2008
New Workunits: An additional batch of workunits has been started for the calculation of similarities and features of approx. 1,500,000 new sequences that were imported from metagenomes.

#13 Post by **rebirther** » 28.02.2009, 09:15

New Workunits in the evening (UTC) of February 28th
February 27, 2009
New Workunits: The calculation of similarities and features of approx. 200,000 new sequences that were imported from protein databases into SIMAP in February will start in the evening (UTC) of February 28th.

#14 Post by **rebirther** » 30.05.2009, 06:38

New Workunits in the evening (UTC) of June 01
May 30, 2009
New Workunits: The calculation of similarities and features of approx. 650,000 new sequences that were imported from protein databases into SIMAP in May will start in the evening (UTC) of June 1st.

#15 Post by **rebirther** » 26.08.2009, 12:27

August 24, 2009
New Workunits: Approx. 200,000 new workunits calculate the similarities of domain superfamilies. This dataset has been prepared in co-operation with the Gene3D team and initiates a novel aspect of SIMAP. If the test with the current approx. 200,000 workunits is successful, we will maintain this dataset as part of SIMAP. It will be used to construct new versions of the Gene3D database.

#16 Post by **rebirther** » 30.09.2009, 15:32

New Workunits in the evening (UTC) of September 30
September 30, 2009
New Workunits: The calculation of similarities and features of approx. 430,000 new sequences that were imported from protein databases into SIMAP up to the end of September will start in the evening (UTC) of September 30. We will also calculate a batch of new environmental sequences.

#17 Post by **rebirther** » 30.10.2009, 18:21

New Workunits in the morning (UTC) of October 31 and Win64 simap app
October 30, 2009
New Workunits: The calculation of similarities and features of approx. 400,000 new sequences that were imported from protein databases into SIMAP up to the end of October, and of approx. 5 million sequences from environmental genomes, will start in the morning (UTC) of October 31. We will also test a 64-bit simap application for Windows. This new app will be installed automatically on Windows64 BOINC clients.

#18 Post by **rebirther** » 26.11.2009, 17:02

November 26, 2009
New Workunits: The calculation of similarities and features of approx. 300,000 new sequences that were imported from protein databases into SIMAP in November will start immediately after the currently running calculations. As usual, the January batch will start later, after the New Year holidays, on January 7th.

#19 Post by **rebirther** » 31.01.2010, 08:27

The current workunits calculate the similarities and domains of approx. 200,000 new sequences from public databases imported in January 2010, and the similarities of approx. 2 million environmental sequences. After these calculations are finished, the next workunits are expected on March 1st.

#20 Post by **rebirther** » 31.10.2010, 17:02

BOINCSIMAP will move to a new location onto new hardware
Due to the move of parts of the BOINCSIMAP crew to Vienna in 2010, this project is now a joint project of the Technical University Munich and the University of Vienna.

The recent series of power faults, combined with defects of the UPS devices in our Munich facilities, has motivated us to move BOINCSIMAP immediately to new hardware at the University of Vienna. This hardware is brand new and became available last week. We will now set up the new BOINC servers (BOINC server, download/upload server and database server) and run a first small test in December 2010. If everything works fine, the first full run of the new servers will start around January 7, 2011, after the New Year holidays. We then expect 1 to 2 months of continuous work, as there are the new proteins of three months plus a couple of new metagenomes.

There will be a new URL for the project, but we will also keep the current URL operational for a longer period.

Thanks to all friends of SIMAP for your great work, help and the many discussions in the previous years. We look forward to another productive era starting next year.

Best regards from Vienna
Thomas 27 Oct 2010 20:25:27 UTC