SciLINC - status unknown

WNj
BC-Member
Posts: 148
Joined: 26.05.2006, 13:00
Location: Wendesse

#1 Post by WNj » 10.06.2007, 21:15

Live your life the way you will wish you had lived it when you die.

rebirther
Admin
Posts: 2902
Joined: 19.12.2005, 00:59
Location: Germany

#2 Post by rebirther » 22.06.2007, 16:47

NEWS:
When development of the SciLINC project began, it had four primary goals. Edited for brevity, they were:

1. Increase public access to nationally significant scientific literature.
2. Enhance the usefulness of digitized materials by creating a Web repository of scanned literature, keywords, and online resources with tools for searching and analysis.
3. Create an educational tool for learning about plant life. While the screensaver application is indexing keywords, the participant's computer will display information about plant life within the United States and around the world. The information displayed will describe each plant name or term currently being indexed on the participant's computer, and will include descriptive data, images, maps, and the annotated outlinks for that term.
4. Provide a model for adopting public-resource computing applications within the library community.

Botanicus is doing a wonderful job of meeting goals 1 and 2, including processing data generated by SciLINC. The project has certainly also met goal 4. We have learned much about grid-based, distributed, public-resource computing applications and the BOINC architecture. There are thoughts and plans for analyses down the road that will be much more computationally intensive than the original SciLINC analysis, and we look forward to bringing these projects to you.

While the amount of data that SciLINC has to analyze will increase greatly in the days ahead, it does not appear that increasing the volume of information is going to improve the user experience of running the SciLINC client. It has been suggested that we repackage our data into single files instead of uploading and downloading 50 files per workunit as we currently do. This suggestion has been heeded and implemented. We had planned on doing it before SciLINC was rolled out, but scheduling prevented it and the community discovered the project before we were ready to announce it.

We expect that testing will show the repackaging lessens the load placed upon the core BOINC client software. But it does not change the amount of data being transferred. The truth is that the workunits fly by so rapidly that implementing goal 3 never became realistic.

When development of SciLINC began, the project lead's understanding was that, from a technological and economic standpoint, it makes sense to use public-resource computing in place of an internal grid computing architecture whenever less than a gigabyte of data is required per CPU-day of computation. Using the BOINC framework to transfer the data to clients, SciLINC meets this volume-of-computation guideline. However, our brief experience with the dedicated BOINC community over the last couple of weeks has shown that, to the community, these numbers may differ somewhat.

In its original form SciLINC would have needed to transfer roughly 250MiB of compressed data in order to occupy a modern CPU for a day. This would expand to nearly 660MiB of input data. Then the client would need to upload about 44MiB of results, which would compress to 17MiB. These numbers have only grown as SciLINC has been improved and made more efficient. This is not acceptable to the average BOINC user.

Looking at the numbers from the perspective of someone on dial-up, if they set SciLINC to only 1% of their BOINC time, this would be roughly 15 minutes out of a day. For these 15 minutes they would have needed to download around 2.5MiB of data. This may not be a huge issue for broadband users, but if someone is on dial-up (as we have learned many BOINC fans still are) the transfer time would exceed the computation time.

So, where are we now? Even if the transfer:credit ratios were acceptable to the community, we do not have enough data to realistically occupy hundreds or thousands of BOINC enthusiasts for a lengthy period of time.

As we have already seen on various community boards, a relatively small amount of credit is earned for a comparatively large load on participants' system resources. Any computational and transport-related improvements that have been tested have only resulted in more data needing to be transferred.

As stated above, we are investigating the possibility of performing much more computationally intensive analyses in the months ahead. It is expected that these will be a much better fit for a BOINC project than the current task of text indexing and taxonomic analysis, which has a relatively low mathematical complexity. Because of this, it has been decided that for now all SciLINC computation will be performed internally. When we have something with a better credit-reward ratio (and a nicer screensaver) it will be made available to the community.

Thank you again for your interest and support. We look forward to working with you in the future.

The SciLINC Team

This has been cross-posted to the forums for discussion.
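
For anyone who wants to check the transfer figures in the news post, here is a minimal sketch of the arithmetic. The 250MiB and 17MiB per CPU-day values come from the post; the function names and the link speeds passed in are my own illustrative assumptions, and the calculation ignores protocol overhead, latency, and retries, so treat the results as rough estimates only.

```python
MIB = 1024 * 1024               # bytes per MiB
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted in the SciLINC news post, per CPU-day of computation.
DOWNLOAD_MIB_PER_CPU_DAY = 250.0   # compressed input data
UPLOAD_MIB_PER_CPU_DAY = 17.0      # compressed results


def share_of_day(share):
    """Scale the per-CPU-day figures to a fraction of a day (hypothetical helper).

    Returns (compute_seconds, download_mib, upload_mib).
    """
    return (
        share * SECONDS_PER_DAY,
        share * DOWNLOAD_MIB_PER_CPU_DAY,
        share * UPLOAD_MIB_PER_CPU_DAY,
    )


def transfer_seconds(download_mib, upload_mib, down_bps, up_bps):
    """Idealized transfer time in seconds for the given link speeds in bits/s."""
    bits_per_mib = 8 * MIB
    return download_mib * bits_per_mib / down_bps + upload_mib * bits_per_mib / up_bps


def break_even_bps(share):
    """Symmetric link speed (bits/s) at which transfer time equals compute time.

    Assumes upload and download run at the same speed, one after the other.
    """
    compute_s, down_mib, up_mib = share_of_day(share)
    return (down_mib + up_mib) * 8 * MIB / compute_s


if __name__ == "__main__":
    compute_s, down_mib, up_mib = share_of_day(0.01)  # 1% of a day
    print(f"compute time: {compute_s / 60:.1f} min")
    print(f"download: {down_mib:.2f} MiB, upload: {up_mib:.2f} MiB")
    # 56000 bits/s is an assumed nominal dial-up speed, not a figure from the post.
    t = transfer_seconds(down_mib, up_mib, 56000, 56000)
    print(f"transfer at 56 kbit/s: {t / 60:.1f} min")
    print(f"break-even link speed: {break_even_bps(0.01) / 1000:.1f} kbit/s")
```

Because both the data volume and the compute time scale with the same share, the break-even speed does not depend on the percentage chosen. Under these simplified assumptions it comes out at roughly 26 kbit/s, so whether transfer outlasts computation depends on the effective throughput a given dial-up line actually achieves.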

rebirther
Admin
Posts: 2902
Joined: 19.12.2005, 00:59
Location: Germany

#3 Post by rebirther » 19.07.2007, 13:54

Gone, and already back again:
We expect to release SciLINC back into full production this week. We're in the final stages of testing results from our project and plan to start sending new work units en masse by the end of the week. We will also be accepting new participants at that time. Thanks to all for your continued support as we work out these final few bugs. -Chris, Project Manager

rebirther
Admin
Posts: 2902
Joined: 19.12.2005, 00:59
Location: Germany

#4 Post by rebirther » 22.09.2007, 17:53

It should start up again in about a week; let's wait and see:
Who's there? Despite what you might think, we are. Long story short, the project is not dead, but may change framework to successfully deliver our intended outcomes. As previously mentioned, this summer we ran internal testing and uncovered some fundamental bugs in our taxonomic name matching algorithms. As we worked through those problems, we then had hardware malfunctions. Then, our main developer moved on to another gig. You know, typical problems faced by every development team.

In the meantime, we continued working with the folks at uBio.org, who have also developed taxonomic name matching algorithms (under an umbrella of what they call 'taxonomic intelligence'). Their algorithms are service-based and performant, and initial evaluation has suggested that they are also more accurate than the code we co-opted for SciLINC. Given that, and that text indexing is not a processor-intensive exercise (something we knew going into the project), we're at a decision point: whether to continue this project using BOINC, which may ultimately prove an unsuccessful implementation of the technology for the previous reasons, or to switch to a SOA using uBio.

To be honest, we're torn. We set out to demonstrate BOINC within scientific literature and to build a community of enthusiasts around the subject matter, but the problems listed above have given us pause on direction. I'd be interested to hear your (collective) thoughts.
