<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Science, Reengineered</title>
	<atom:link href="http://sciencereengineered.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://sciencereengineered.com</link>
	<description>Thoughts on accelerating the advancement of science</description>
	<lastBuildDate>Wed, 10 Apr 2013 15:45:49 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='sciencereengineered.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://1.gravatar.com/blavatar/962e838266e9e024b201a5550ef8fb56?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>Science, Reengineered</title>
		<link>http://sciencereengineered.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://sciencereengineered.com/osd.xml" title="Science, Reengineered" />
	<atom:link rel='hub' href='http://sciencereengineered.com/?pushpress=hub'/>
		<item>
		<title>Science in the Clouds</title>
		<link>http://sciencereengineered.com/2012/09/04/science-in-the-clouds/</link>
		<comments>http://sciencereengineered.com/2012/09/04/science-in-the-clouds/#comments</comments>
		<pubDate>Wed, 05 Sep 2012 04:52:37 +0000</pubDate>
		<dc:creator>Michael Kellen</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[AWS]]></category>
		<category><![CDATA[cloud computing]]></category>

		<guid isPermaLink="false">http://sciencereengineered.com/?p=143</guid>
		<description><![CDATA[Recently, I sat down with Jeff Barr on the AWS report to discuss how we&#8217;ve used various Amazon services throughout our architecture while developing Synapse.  In the interview, I discussed how Synapse uses RDS (MySQL) as our back end database, &#8230; <a href="http://sciencereengineered.com/2012/09/04/science-in-the-clouds/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Recently, I sat down with Jeff Barr on The AWS Report to discuss how we&#8217;ve used various Amazon services throughout our architecture while developing Synapse.  In the interview, I discussed how Synapse uses RDS (MySQL) as our back-end database, Elastic Beanstalk to host our service and web hosting tiers, CloudSearch to provide search across all Synapse content, and Simple Workflow to manage distributed scientific workflows (see also our <a href="http://aws.amazon.com/swf/testimonials/swfsagebio/">AWS case study</a>). The decision to rely heavily on Amazon as an infrastructure provider for our project was based on the belief that hosted infrastructure was the way of the future, and that it was best to build technology with that future in mind, assuming that services still at an early stage would mature along with our own work.  Despite a few of the <a title="Meltdown in the Cloud" href="http://sciencereengineered.com/2012/07/03/meltdown-in-the-cloud/">hiccups associated with adopting early-stage technology</a>, I&#8217;m still pretty pleased with the decision to go full steam ahead on cloud computing in general, and with Amazon in particular.</p>
<p><span class='embed-youtube' style='text-align:center; display: block;'><iframe class='youtube-player' type='text/html' width='640' height='390' src='http://www.youtube.com/embed/eAyZMkOTfto?version=3&#038;rel=1&#038;fs=1&#038;showsearch=0&#038;showinfo=1&#038;iv_load_policy=1&#038;wmode=transparent' frameborder='0'></iframe></span></p>
<p>In the interview, we focused more on what Sage Bionetworks has already done than on what we might do with AWS in the future; however, the breadth of offerings from Amazon keeps expanding along with our potential applications.  One service of great interest to us is Amazon&#8217;s recently launched <a href="http://aws.amazon.com/glacier/">Glacier</a> product, which is designed for data archival.  With S3, you are essentially paying Amazon to keep your data continually available and protected from loss; many machines must hold the data live on disk to provide S3&#8217;s level of service.  Therefore, even though Amazon may be operating the service on pretty thin margins, it&#8217;s still a relatively expensive way to store data for some use cases.  Glacier complements S3 because it lets you trade some availability for lower cost: by agreeing to wait 3-5 hours for a data retrieval request to be serviced, you can significantly cut the cost of storing that data.  Biomedical research is full of examples where large amounts of data are actively analyzed for a period of time, but quickly give way to processed versions of the same data that will be used for downstream analyses, e.g., processing a raw genetic sequence into a set of variants called against a reference genome. While the reference genome or the processing algorithm does occasionally change, once the variants are called, the raw data needs to be reprocessed relatively infrequently.  This sort of use case makes Glacier a very interesting proposition.</p>
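<p>To make the S3-versus-Glacier trade-off concrete, here is a minimal Python sketch that compares the monthly storage bill for a hypothetical genomics project with and without archiving its raw reads. The per-GB prices and dataset sizes are illustrative assumptions, not actual AWS rates (which vary by region and change over time), and the sketch ignores any retrieval charges as well as the multi-hour retrieval delay.</p>

```python
from dataclasses import dataclass

# Illustrative prices only (USD per GB per month); these are assumptions,
# not quoted AWS rates.
PRICE_PER_GB_MONTH = {"s3": 0.10, "glacier": 0.01}


@dataclass
class Dataset:
    name: str
    size_gb: float
    tier: str  # "s3" for live access, "glacier" for archival (hours to retrieve)


def monthly_cost(datasets):
    """Total monthly storage cost across datasets, priced by assigned tier."""
    return sum(d.size_gb * PRICE_PER_GB_MONTH[d.tier] for d in datasets)


# Hypothetical sizes: raw reads are rarely touched once variants are called,
# so archive them; the much smaller variant calls stay in S3.
all_in_s3 = [Dataset("raw_reads", 10_000, "s3"), Dataset("variants", 100, "s3")]
tiered = [Dataset("raw_reads", 10_000, "glacier"), Dataset("variants", 100, "s3")]

print(f"All in S3: ${monthly_cost(all_in_s3):,.2f}/month")
print(f"Tiered:    ${monthly_cost(tiered):,.2f}/month")
```

<p>Under these assumed prices the tiered layout is roughly nine times cheaper; the real decision also has to weigh how often the raw data will actually be pulled back out.</p>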
<p>Along with genomics, imaging is another common source of large data volumes in the medical research space. Sage is planning to enter this space shortly by launching the &#8220;Melanoma Hunt&#8221; project, intended to aid in the early detection of melanoma through images of skin lesions captured with mobile phones.  We&#8217;d like to create a publicly accessible database of suspicious and benign images to catalyze the development of better image processing and machine learning algorithms.  Feature extraction from raw images might be performed only occasionally, feeding into downstream work to build a classification model.  The project will also aim to engage the average citizen in the research process, crowdsourcing much of the effort that goes into developing these classifiers. Citizen scientists may learn to classify the images manually, or remove potentially identifying information from the images. The organization of such efforts could benefit immensely from technologies such as <a href="http://aws.amazon.com/mturk/">Mechanical Turk</a>, with data ultimately released to researchers through Synapse.</p>
<p>As the Synapse system itself scales to handle these sorts of projects, we are going to need additional back-end technologies that scale along with it. Given that there will always be a category of application data where we need real-time concurrent updates, we will probably always have some of our services running on RDS.  We&#8217;re already using CloudSearch to support a standard search feature, but there are other sorts of queries where we might need higher query performance and scalability.  We&#8217;ve started looking at DynamoDB (a NoSQL database) and ElastiCache for some of our future needs, or for refactoring current services to support larger scales.  In this type of architecture, we will likely end up using Amazon SQS as a message queue between synchronous and asynchronous logic in Synapse.</p>
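<p>The queue pattern mentioned above can be sketched with the Python standard library: a synchronous request handler commits its write and enqueues a message, while a background worker consumes those messages and does slower follow-up work such as updating a search index. Here <code>queue.Queue</code> stands in for Amazon SQS, and plain dicts stand in for RDS and the search service; all names are hypothetical, and this is not how Synapse itself is implemented.</p>

```python
import queue
import threading

# Hypothetical stand-ins: a dict for the RDS-backed store, a dict for the
# search index, and an in-process queue in place of Amazon SQS.
database = {}
search_index = {}
messages = queue.Queue()


def handle_update(entity_id, payload):
    """Synchronous path: commit the write, enqueue follow-up work, return fast."""
    database[entity_id] = payload
    messages.put({"type": "ENTITY_CHANGED", "id": entity_id})
    return "ok"


def index_worker():
    """Asynchronous path: consume change messages and refresh the index."""
    while True:
        msg = messages.get()
        if msg is None:  # sentinel value tells the worker to stop
            break
        # Keyed by entity ID, so processing a duplicate message is harmless.
        search_index[msg["id"]] = database[msg["id"]]["name"]


worker = threading.Thread(target=index_worker)
worker.start()
for i in range(3):
    handle_update(f"syn{i}", {"name": f"dataset {i}"})
messages.put(None)  # enqueued after all updates, so the worker sees them first
worker.join()
print(search_index)
```

<p>If SQS replaced the in-process queue, one design point would matter: SQS standard queues deliver messages at least once and do not guarantee ordering, so the worker should be idempotent. Writing the index entry keyed by entity ID, as the sketch does, makes a redelivered message harmless.</p>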
<p>We&#8217;ve also been investing recently in automating the deployment of new instances of Synapse.  As the application has grown in complexity, the configuration of the AWS components it&#8217;s built upon has become increasingly difficult to manage manually.  Fortunately, several approaches are possible to automate provisioning these components and installing software onto them.  The one we took was to simply write a program (see <a href="https://github.com/Sage-Bionetworks/Synapse-Stack-Builder">Sage Bionetworks / Synapse Stack Builder</a> on GitHub) using the AWS SDK for Java that automates these tasks; the approach works for us because we are a very Java-centric shop and don&#8217;t have clear boundaries between developers and operations personnel.  You could easily do the same in other languages, or through Amazon&#8217;s CloudFormation templates.  The end result is a blue-green deployment system, where we will be continually building new instances, operating them in staging mode for a test period, and then using Route 53 to manage the cut-over as a staging system is promoted to production.</p>
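<p>The blue-green cut-over described above boils down to a small state machine: build a fresh stack, run it in staging, and point the production name at it only once it has passed its test period, keeping the old stack around for rollback. The Python sketch below models that with an in-memory dict standing in for DNS records; the class and method names are hypothetical, in a real AWS deployment the flip would be a Route 53 record change, and the actual Synapse Stack Builder (written in Java) is not reproduced here.</p>

```python
from dataclasses import dataclass


@dataclass
class Stack:
    """One complete, independently built instance of the application."""
    name: str
    version: str
    passed_tests: bool = False


class BlueGreenRouter:
    """Tracks which stack the production and staging names point at.

    A dict stands in for DNS records; with AWS, each assignment to
    self.records["prod"] would instead be a Route 53 record update.
    """

    def __init__(self, production):
        self.records = {"prod": production, "staging": None}
        self.previous = None

    def stage(self, stack):
        # New stacks are always built fresh and start life in staging.
        self.records["staging"] = stack

    def promote(self):
        candidate = self.records["staging"]
        if candidate is None or not candidate.passed_tests:
            raise RuntimeError("staging stack has not passed its test period")
        self.previous = self.records["prod"]  # kept alive for fast rollback
        self.records["prod"] = candidate
        self.records["staging"] = None

    def rollback(self):
        if self.previous is None:
            raise RuntimeError("no previous production stack to roll back to")
        self.records["prod"], self.previous = self.previous, self.records["prod"]


router = BlueGreenRouter(Stack("blue", "1.0", passed_tests=True))
green = Stack("green", "1.1")
router.stage(green)
green.passed_tests = True  # e.g. after the staging test period succeeds
router.promote()
print(router.records["prod"].name, router.records["prod"].version)
```

<p>Keeping the previous production stack running until the new one has proven itself is what makes the cut-over cheap to reverse: because every stack is built from scratch, a rollback is just pointing the name back.</p>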
<p>Finally, I&#8217;ll end by noting that we have been looking at the cloud computing space more generally.  In particular, we&#8217;ve had a great relationship with Google, which has very generously provided 2,000 virtual cores of support for our <a href="https://synapse.sagebase.org/#BCCOverview:0">public challenge in the predictive modeling of breast cancer</a>, as well as hosting the clinical and genomic data the challenge is built upon.  We&#8217;ve recently taken the <a title="Motivating a challenge" href="http://sciencereengineered.com/2012/07/26/motivating-a-challenge/">&#8220;Tour de France&#8221; strategy</a> of awarding an intermediate stage win to the current best-performing classification model of aggressive vs. nonaggressive disease.  It&#8217;s important to remain flexible in our support for scientific computing; the cloud computing market is still young, and ultimately scientists will move to the computing platforms that provide the mix of compute services appropriate for their applications.  Another force that will catalyze the formation of communities of scientists working with particular technologies is the presence of large, interesting scientific data sets to work with.  If those data sets can be open to the full scientific community, so much the better.</p>
]]></content:encoded>
			<wfw:commentRss>http://sciencereengineered.com/2012/09/04/science-in-the-clouds/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/11744786cf82d08f17b96d11e926a315?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mkellen</media:title>
		</media:content>
	</item>
		<item>
		<title>If I had a billion dollars&#8230;</title>
		<link>http://sciencereengineered.com/2012/08/09/if-i-had-a-billion-dollars-2/</link>
		<comments>http://sciencereengineered.com/2012/08/09/if-i-had-a-billion-dollars-2/#comments</comments>
		<pubDate>Thu, 09 Aug 2012 22:31:51 +0000</pubDate>
		<dc:creator>Michael Kellen</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[open science]]></category>
		<category><![CDATA[science]]></category>

		<guid isPermaLink="false">http://sciencereengineered.com/?p=137</guid>
		<description><![CDATA[In an apparently recurring theme, my thoughts again are running to the incentives that drive human behavior, this time inspired by the recent news that the Russian billionaire Yuri Milner has established a new $3 Million Fundamental Physics Prize.  He&#8217;s &#8230; <a href="http://sciencereengineered.com/2012/08/09/if-i-had-a-billion-dollars-2/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>In an apparently recurring theme, my thoughts are again running to the incentives that drive human behavior, this time inspired by the recent news that the Russian billionaire Yuri Milner has established a new $3 million <a href="Fundamental Physics Prize">Fundamental Physics Prize</a>.  He&#8217;s actually awarded nine of these prizes, a cool $27M to promote theoretical physics.  Certainly that kind of money and publicity could draw a lot of attention to the field, and I love the fact that we now almost have a basketball team&#8217;s worth of physicists who almost make a basketball player&#8217;s salary.</p>
<p>However, is this the best way to spend $27M to shake up and rally support for science?  Of course, Mr. Milner is free to spend his money any way he wishes, but I see some potential problems with his approach.  Quoting from the NY Times article referenced above: &#8220;Mr. Milner personally selected the inaugural group, but future recipients of the Fundamental Physics Prize, to be awarded annually, will be decided by previous winners.&#8221;  I don&#8217;t know how well a Russian billionaire can select the best work in theoretical physics, but let&#8217;s assume he did his due diligence as well as the experts in Stockholm do.  After the first year, the process turns into a bunch of self-anointed experts picking their own colleagues.  There&#8217;s nothing particularly wrong with this, but it&#8217;s not that much different from the Nobel Prize.</p>
<p>The other fact in the article that really caught my attention was the condition that theoretical predictions don&#8217;t need experimental evidence to be considered breakthroughs.  No sitting around for decades waiting for messy, difficult-to-acquire data to roll in here; this prize gets straight to rewarding breakthrough ideas.  According to Milner, &#8220;This intellectual quest to understand the universe really defines us as human beings.&#8221;  What could be wrong here?</p>
<p>Well, I&#8217;m reminded of the Thomas Huxley quote that was posted on my thesis adviser&#8217;s door: &#8220;The great tragedy of Science — the slaying of a beautiful hypothesis by an ugly fact.&#8221;  I think the lack of a requirement for experimental validation for the Fundamental Physics Prize shows a fundamental lack of understanding of what science is.  Science is not philosophy.  It is based on the belief that there is in fact a real world that behaves in a certain way, and that the way to uncover how this universe works is through empirical evidence, not the scientist&#8217;s opinion of the beauty of the theory explaining it.</p>
<p>But maybe I&#8217;m just a cranky science dropout.  Let&#8217;s check the news article again: &#8220;Dr. Arkani-Hamed, for example, has worked on theories about the origin of the <a title="Recent and archival news about Higgs boson." href="http://topics.nytimes.com/top/reference/timestopics/subjects/h/higgs_boson/index.html?inline=nyt-classifier" target="_blank">Higgs boson</a>&#8230; None of his theories have been proved yet. He said several were &#8216;under strain&#8217; because of the new data.&#8221;&#8230; Wow.  Tough break.  Even in the short interval between when these winners were decided and when they were announced, one of the winners&#8217; ideas is &#8220;under strain&#8221;, or in layman&#8217;s terms, &#8220;wrong&#8221;.</p>
<p>I don&#8217;t have the billion dollars to fund a competitor to the Nobel, but I do have the $18 it took to acquire the sciencereengineered domain, and in this world I am the boss.  So, here&#8217;s my proposal for a Nobel alternative:</p>
<p>First of all, I&#8217;m not going after the Physics prize.  I&#8217;m targeting the one for Physiology or Medicine.  But I&#8217;m not giving the power to award it to a group of experts; I&#8217;m going to let patients vote on it.  My guess is that unproven theoretical ideas decades away from experimental validation won&#8217;t make the top spots.  Instead, the award will go to the projects that have the biggest impact on people&#8217;s lives.  Doing this in practice might be difficult, so maybe I&#8217;d pick a different disease every year and go to that patient community to get a more involved and knowledgeable subset of voters.  To make sure the voters are knowledgeable, part of the process of awarding the prize will be having the candidate projects present their work to the lay audience.  I&#8217;d build some sort of online environment where the projects can build presentations of their work, and where patients and scientists can discuss them over the course of several months before the vote.  Of course, the data, code, and other materials that make up the project have to be open and available to those who want to validate the work.</p>
<p>Secondly, I&#8217;m not giving the award to individuals.  This propagates the false belief that science advances due to the unique and rare breakthrough insights of a small group of geniuses (see <a title="On the Shoulders of Giants" href="http://sciencereengineered.com/2012/04/14/on-the-shoulders-of-giants/">On the Shoulders of Giants</a>).  Instead, the award goes to projects, and is shared equally by every member of the project team.  Say, a flat prize of $100,000 per team member: I want to give the winners enough to be a noticeable thank-you, but not so much that they retire!  However, unlike the Nobel, I have no limit on the number of people on the team.  It doesn&#8217;t matter if you&#8217;re the PI, the lab rat doing the pipetting, the data analyst, or the marketing guy putting together the project description for the lay audience.  The award goes to the whole team, identified in alphabetical order.  For $27M, that means the winning team could have as many as 270 people.  That&#8217;s big, but if you&#8217;ve seen some of the massive author lists on four-page journal articles, it&#8217;s not that far off the mark for modern science.</p>
<p>So, that&#8217;s a first pass at the Kellen prize in Physiology or Medicine.  Of course, if there&#8217;s one thing that&#8217;s clear from studies of human behavior, it&#8217;s that incentive structures often motivate people in ways unintended by those who create the incentives.  So, I reserve the right to modify the rules of the prize based on empirical evidence acquired as the prizes unfold.  After all, that&#8217;s what being a scientist means.</p>
]]></content:encoded>
			<wfw:commentRss>http://sciencereengineered.com/2012/08/09/if-i-had-a-billion-dollars-2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/11744786cf82d08f17b96d11e926a315?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mkellen</media:title>
		</media:content>
	</item>
		<item>
		<title>Motivating a challenge</title>
		<link>http://sciencereengineered.com/2012/07/26/motivating-a-challenge/</link>
		<comments>http://sciencereengineered.com/2012/07/26/motivating-a-challenge/#comments</comments>
		<pubDate>Fri, 27 Jul 2012 04:45:54 +0000</pubDate>
		<dc:creator>Michael Kellen</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[open science]]></category>
		<category><![CDATA[science]]></category>

		<guid isPermaLink="false">http://sciencereengineered.com/?p=104</guid>
		<description><![CDATA[In my previous post I introduced the Breast Cancer Challenge Sage is hosting to build predictive models of the disease.  The initial conception of this project was as a winner take all competition, with a clear scoring method and single &#8230; <a href="http://sciencereengineered.com/2012/07/26/motivating-a-challenge/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>In my previous post, I introduced the Breast Cancer Challenge Sage is hosting to build predictive models of the disease.  The initial conception of this project was a winner-take-all competition, with a clear scoring method and a single top model as the winner of the big prize: publication in Science Translational Medicine.  Compared to some of our other attempts to catalyze scientific collaborations, which relied more on preaching to scientists to share data and methods for the greater good of society, this approach seems to have triggered substantially more interest from the community, and action by some of the initial participants.</p>
<p>Our task now is to work out how best to harness this energy to motivate researchers, and also to form a community where people not only compete, but also collaborate and build off each other&#8217;s work effectively.  Many of my recent discussions with the challenge organizers have drawn analogies to the Tour de France, and how multiple dimensions of awards and glory motivate different riders to focus on achieving different objectives in the race.  Every year there are only a handful of guys who can realistically hope to win the yellow jersey, but many other riders who compete to take home other awards.</p>
<p><strong>Multiple Jerseys</strong> &#8211; In the Tour, there&#8217;s not only an overall winner of the yellow jersey, but several other jerseys that reward different riding styles.  The green jersey is awarded on points earned in sprints at stage finishes and at intermediate points along the route, and the polka-dot King of the Mountains jersey goes to the rider who does best on the major climbs of the Tour.  Breast cancer is not really a single disease; in reality, many different molecular defects give rise to a heterogeneous collection of diseases.  A prognostic test, particularly one focused on detailed genetic data, is unlikely to perform well across all types of breast cancer.  It would be interesting to give sub-category awards for particular types of cancer, based on things like estrogen receptor (ER), progesterone receptor (PR), and HER2 status.</p>
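<p>Scoring sub-category awards only requires evaluating each submission separately within each patient subgroup. The Python sketch below picks a winner per subgroup from a table of predictions; the toy data, the ER+/ER- labels, and the use of plain accuracy on an aggressive/non-aggressive label are all illustrative assumptions rather than the challenge&#8217;s actual scoring method.</p>

```python
from collections import defaultdict


def accuracy(preds, members):
    """Fraction of patients in `members` whose outcome was predicted correctly."""
    hits = sum(preds[p["id"]] == p["aggressive"] for p in members)
    return hits / len(members)


def subgroup_winners(patients, predictions):
    """Return {subgroup: best model name}, scoring each subgroup separately.

    patients: dicts with "id", "subgroup" (e.g. receptor status) and the
        observed boolean outcome "aggressive".
    predictions: {model_name: {patient_id: predicted boolean}}.
    """
    by_group = defaultdict(list)
    for p in patients:
        by_group[p["subgroup"]].append(p)
    return {
        group: max(predictions, key=lambda m: accuracy(predictions[m], members))
        for group, members in by_group.items()
    }


# Hypothetical toy data: model_b is better overall (3 of 4 correct),
# but model_a is the better model for the ER+ patients.
patients = [
    {"id": 1, "subgroup": "ER+", "aggressive": False},
    {"id": 2, "subgroup": "ER+", "aggressive": True},
    {"id": 3, "subgroup": "ER-", "aggressive": True},
    {"id": 4, "subgroup": "ER-", "aggressive": True},
]
predictions = {
    "model_a": {1: False, 2: True, 3: False, 4: False},
    "model_b": {1: True, 2: True, 3: True, 4: True},
}
print(subgroup_winners(patients, predictions))
```

<p>The toy example shows the point of a separate &#8220;jersey&#8221;: a model that loses overall can still be the best available model for one kind of breast cancer.</p>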
<p><strong>Stage Winners</strong> &#8211; An awful lot of the tactics of a bike race involve the dynamic between riders who are trying to win an individual stage, and riders or teams that are more concerned with the overall standings.  The glory of a stage win is enough to cause riders to attempt to break away from the peloton even though they have no hope of competing for the overall lead.  With our challenge, we&#8217;d like to reward people for submitting ideas early and allowing others to incorporate and modify their models, instead of waiting until the last minute to submit an entry.  One way to do this would be to define intermediate checkpoints at which we recognize the current leader.  Maybe we could introduce new validation data at set periods to score the models, and then allow contestants to incorporate previous validation data as new training data in subsequent rounds.  Another approach we&#8217;ve discussed is to reward people for time spent on top of the leaderboard and for the amount of improvement they make over the previous best score.  The person who makes many submissions that improve the top score over the course of the competition may have contributed more than someone who sneaks in a marginally better model right at the end of the challenge.</p>
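<p>The &#8220;time on top plus improvement&#8221; idea can be made concrete by replaying the leaderboard&#8217;s submission log in time order. In the Python sketch below, each new best score earns its submitter the margin by which it beat the previous best, and whoever holds the top spot accrues time until they are displaced or the round closes. The log format, the baseline score, and the toy numbers are hypothetical, and how to weight the two credits against each other is left open.</p>

```python
def stage_credits(submissions, baseline, end_time):
    """Replay a submission log and credit time leading and score improvement.

    submissions: (time, contestant, score) tuples; higher scores are better.
    baseline: score of the organizers' example model, the initial bar to beat.
    end_time: when the round closes.
    """
    credits = {}
    best, leader, lead_since = baseline, None, None
    for time, who, score in sorted(submissions):
        entry = credits.setdefault(who, {"time_on_top": 0, "improvement": 0.0})
        if score > best:
            entry["improvement"] += score - best
            if leader is not None:
                # The outgoing leader banks the time they spent on top.
                credits[leader]["time_on_top"] += time - lead_since
            best, leader, lead_since = score, who, time
    if leader is not None:
        credits[leader]["time_on_top"] += end_time - lead_since
    return credits


log = [(1, "alice", 0.60), (3, "bob", 0.62), (4, "carol", 0.61), (9, "alice", 0.63)]
for name, c in stage_credits(log, baseline=0.55, end_time=10).items():
    print(name, c["time_on_top"], round(c["improvement"], 2))
```

<p>In this toy log, bob ends up second on score but led for the longest stretch, while carol&#8217;s submission, which never topped the board, earns nothing; a last-minute marginal win is only a small part of the total credit.</p>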
<p><strong>Visible Leaderboard</strong> &#8211; In the Tour, everyone knows what everyone else is doing.  Riders have their coaches in their ears, giving them information on what other riders are doing, and teams constantly replan strategy as the race unfolds over several weeks.  We&#8217;ve got a <a href="http://validation.bcc.sagebase.org/bcc-leaderboard-public.php">basic leaderboard</a> up giving real-time feedback on the challenge, and it seems to be a good way to engage people.  I&#8217;m sure with more time and resources we could do a better job of making this resource more exciting for the challengers.  Maybe we should pull in contestants&#8217; Synapse profiles and try to connect the contestants to an audience. Math and statistics might not be as sexy as bike racing, but there have got to be some cancer survivors out there who would be interested in watching people compete to better understand the disease.</p>
<p>So what do you think?  Do you have any better ideas on how to organize these scientific challenges?  We are open to experimentation.</p>
]]></content:encoded>
			<wfw:commentRss>http://sciencereengineered.com/2012/07/26/motivating-a-challenge/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/11744786cf82d08f17b96d11e926a315?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mkellen</media:title>
		</media:content>
	</item>
		<item>
		<title>Breast Cancer Predictive Modeling Challenge</title>
		<link>http://sciencereengineered.com/2012/07/22/breast-cancer-predictive-modeling-challenge/</link>
		<comments>http://sciencereengineered.com/2012/07/22/breast-cancer-predictive-modeling-challenge/#comments</comments>
		<pubDate>Sun, 22 Jul 2012 22:37:41 +0000</pubDate>
		<dc:creator>Michael Kellen</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[breast cancer]]></category>
		<category><![CDATA[open science]]></category>
		<category><![CDATA[science]]></category>

		<guid isPermaLink="false">http://sciencereengineered.com/?p=100</guid>
		<description><![CDATA[Way back at the Sage Congress in April, Sage Bionetworks and DREAM announced a jointly sponsored modeling challenge.  The basic idea behind this effort is to try to catalyze better understanding of the disease by framing a scientific challenge for &#8230; <a href="http://sciencereengineered.com/2012/07/22/breast-cancer-predictive-modeling-challenge/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Way back at the Sage Congress in April, Sage Bionetworks and <a href="http://www.the-dream-project.org/">DREAM</a> announced a jointly sponsored modeling challenge.  The basic idea behind this effort is to try to catalyze better understanding of breast cancer by framing a scientific challenge for the entire scientific community to solve.  The starting point for the challenge is a clinical study of breast cancer patients.  For all of these women, we now have a range of full-genome molecular data, as well as detailed clinical data on the cancer and course of treatment.  The immediate scientific goal of this challenge is to see who can build the best model of survival time, segmenting patients into aggressive vs. non-aggressive disease.  If you&#8217;re interested in the scientific details of the challenge, you can read more on the <a href="https://sagebionetworks.jira.com/wiki/display/BCC/Home">contest wiki site</a>, or watch last week&#8217;s <a href="http://www.youtube.com/watch?v=xSfd5mkkmGM">intro video</a>.</p>
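<p>Scoring a model of survival time has to cope with censored patients, whose follow-up ended before any event was observed. A common metric for this is Harrell&#8217;s concordance index: among pairs of patients whose order of outcomes is known, the fraction that the model ranks correctly by risk. The minimal Python sketch below implements that standard definition; treating it as the challenge&#8217;s exact scoring rule is an assumption (the contest wiki has the details), and the toy data are invented.</p>

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times: observed survival or follow-up time per patient.
    events: True if the event (e.g. death) was observed, False if censored.
    risk_scores: model output; higher means a more aggressive predicted course.
    Returns 1.0 for a perfect ranking; random scores give about 0.5.
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if the times differ and the earlier one is a real
        # event; if `a` was censored, we do not know who actually fared worse.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied risk scores get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [5, 10, 15, 20]
events = [True, True, False, True]
risk = [0.9, 0.4, 0.6, 0.5]
print(concordance_index(times, events, risk))
```

<p>Here the third patient is censored at time 15, so the pair it forms with the patient at time 20 is skipped, leaving five comparable pairs, three of which the model orders correctly.</p>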
<p>The higher-level experiment that Sage is running, though, is really on the social structures governing how science is done, and on whether non-traditional incentives and structures can accelerate the discovery process itself.  The traditional way this research would be performed is for a high-powered company or academic group to organize and execute a clinical study: an enormously expensive undertaking, since it involves managing care and collecting data on a statistically significant number of patients.  The data used in our challenge was gathered over 10 years and comes from about 2,000 patients.  Once collected, this data becomes a valuable commodity.  The group running the trial will typically hold on to it while analyzing it, and only release their high-level conclusions in the form of publications, or in support of FDA approval for the sale of a new drug.  The data may be shared only long after its generators believe they have extracted all the value from it, or used as a bargaining chip as a few high-powered labs form closed collaborations.</p>
<p>Sage&#8217;s hypothesis is that getting this data into the public domain as quickly as possible is in the best interests of patients and society as a whole. Last week we finally had a big public launch of the challenge and got a phenomenal response from the community.  So far, over 160 people have registered to attempt the modeling challenges, and over 100 attended last week&#8217;s online launch.  We&#8217;ve already had a few people submit models, and one of them already beats the simple baseline approach we used as an example.  Compared to other research efforts I&#8217;ve seen, the wide-open approach of the challenge seems to have generated far more interest from the community.  We&#8217;ll see by this fall if this turns into better science.</p>
<p>Of course, the idea of framing scientific challenges and posting rewards for their completion is not a new one.  Our partner DREAM has been doing this in the academic setting for 7 years, and there are some interesting commercial sites out there like Kaggle and InnoCentive.  However, I think some of the things we&#8217;re doing in the course of this challenge are pushing the envelope on this format:</p>
<ul>
<li><strong>Generating experimental validation data</strong> &#8211; In parallel with running the challenge, we&#8217;ve identified another 350 or so frozen samples obtained in one of the studies providing the underlying data for the challenge.  Sage has raised funding to generate new molecular data from these samples while the modeling challenge runs. One of the big problems in this space is that statistical models produced in one study don&#8217;t hold up when applied to data not used to generate them.  It will be interesting to see whether the best models from the competition phase hold up and generalize to this new data, which will be used to determine the final winner.</li>
<li><strong>Publishing the winner in a premier journal</strong> &#8211; Science Translational Medicine has agreed to publish an article written by the winner.  Instead of the usual system of blind peer review by two experts, for this article the fact that the challenge is run on a completely open platform will provide a broader and hopefully more rigorous method of ensuring the winning approach is reviewed and understood by the community.</li>
<li><strong>Requiring participants to submit reusable code</strong> &#8211; Computational experts often complain about a lack of access to high-powered data sets, but can be less willing to invest the time to make their own analytical code open to use by others.  Unlike many other challenges, in which analysts simply submit a vector of predictions for the validation data, our challenge requires analysts to submit code that we can run to score the model.</li>
<li><strong>Providing participants dedicated compute space</strong> &#8211; Through a generous donation by Google, we are providing all participants use of a Google virtual machine running on Google&#8217;s new Compute Engine service.  Besides providing raw compute power (currently 2,000 cores dedicated for community use), this approach helps ensure reproducibility by providing commonly configured infrastructure.</li>
</ul>
<p>I&#8217;m really excited to get this project off the ground, and am sure I&#8217;ll have more detailed posts on specific aspects of it over the coming months.</p>
]]></content:encoded>
			<wfw:commentRss>http://sciencereengineered.com/2012/07/22/breast-cancer-predictive-modeling-challenge/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/11744786cf82d08f17b96d11e926a315?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mkellen</media:title>
		</media:content>
	</item>
		<item>
		<title>Meltdown in the Cloud</title>
		<link>http://sciencereengineered.com/2012/07/03/meltdown-in-the-cloud/</link>
		<comments>http://sciencereengineered.com/2012/07/03/meltdown-in-the-cloud/#comments</comments>
		<pubDate>Tue, 03 Jul 2012 21:38:14 +0000</pubDate>
		<dc:creator>Michael Kellen</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[AWS]]></category>
		<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[software engineering]]></category>

		<guid isPermaLink="false">http://sciencereengineered.com/2012/07/03/meltdown-in-the-cloud/</guid>
		<description><![CDATA[One of the perks of working for a small non-profit is that we can simply decide to do things that large companies never would.  This week, Sage Bionetworks is completely closed, with a week-long summer vacation as our bonus &#8230; <a href="http://sciencereengineered.com/2012/07/03/meltdown-in-the-cloud/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>One of the perks of working for a small non-profit is that we can simply decide to do things that large companies never would.  This week, Sage Bionetworks is completely closed, with a week-long summer vacation as our bonus for hitting last year’s company-wide objectives.  Of course, leave it to Mr. Murphy to intrude on what seemed like a great morale-booster.</p>
<p>As our week off approached and people started heading out early for vacation, we got a first warning that shutting down completely is probably not an option if you’re operating an online service, even an early-stage beta that currently supports only a handful of users.  The week before our break we had a brief service outage caused by an expired security certificate on the Crowd server backing our authentication services.  Although we corrected the problem quickly, the incident convinced us that pushing out a new release with only minor bug fixes right before the break was not worth the risk.  We decided to leave everything alone and resume work once everyone was back in town.</p>
<p>Unfortunately, that first incident was only a precursor to the main event.  Friday night, Amazon suffered a <a href="http://www.cmswire.com/cms/customer-experience/amazon-sheds-light-on-aws-outage-016130.php">power outage at their Virginia data center</a>, which also affected a number of much higher-profile sites like Netflix, Heroku, and Instagram.  Saturday morning I woke to the realization that Synapse was completely non-functional.  It took the better part of the day to trace the problem to our AWS RDS service being down and to recover the live system from a backup taken shortly before the event.</p>
<p>So, does this incident call into question our decision to use cloud services, and Amazon in particular, as the foundation of our own service?  What’s the alternative?  In my previous job, my team built a service on a stack of servers we selected ourselves, installed in a local colo facility.  Several months in, after a couple of power outages had taken us down, we determined that this facility did not in fact have redundant power wired to our cage, as they had promised when we moved in.  Instead, they had just plugged us straight into Seattle City Light.  Stung by this, we found a new, more reputable hosting partner in a quality facility that also served the local ABC news affiliate.  They had a big bank of UPS devices, multiple redundant connections to the power grid, and the ability to run the site indefinitely off backup generators in an emergency.  Still, at that facility an electrical fire cut power to our servers and took us down for a couple of days.  At Sage, I could buy physical servers and put them in the data center right next to my office at the Hutch.  It might give me a warm, fuzzy feeling of safety, but I think that’s mostly due to the psychological tendency to imagine that bad things only happen to other people.  I doubt they’d really be any safer than rented servers sitting in an AWS facility in Virginia.</p>
<p>So, I think the lesson here is not that cloud services are unreliable and you should build everything yourself.  Rather, it’s a reminder not to buy into the cloud marketing hype too heavily, and to remember that cloud services are a <a href="http://www.joelonsoftware.com/articles/LeakyAbstractions.html">leaky abstraction</a> over the fact that you are programming someone else’s data center.  Most of the time that abstraction is a useful simplification that lets you work more efficiently, but occasionally, like last Saturday, it breaks down.  What I took away from responding to the crisis wasn’t a desire to move elsewhere; it was a set of improvements to the way we configure and maintain our service that could greatly improve our ability to respond to incidents like this in the future.  There’s no magic here, just a set of engineering tasks that we need to triage against other work, like adding new features or improving documentation, to provide the best experience for our users.  For example:</p>
<ul>
<li>When building a service out of a set of AWS services, it’s best to wire components together through CNAMEs instead of directly to each service’s public name.  We had done this with our app and web tiers running on Elastic Beanstalk, but not with RDS and our CloudSearch instances.  Consequently, we had to change configuration on the production server instead of simply swapping in new services, which slowed our response and increased risk.</li>
<li>We need better monitoring of our services.  More extensive use of CloudWatch metrics on key components, along with a periodic external smoke test of the application, would have alerted us to the problem more quickly.</li>
<li>We need better automation for provisioning new environments.  Some Synapse components require a manual setup process, documented on a wiki.  Like most teams, we try to keep our wiki up to date, but things occasionally get missed.  An automated environment-build script checked into version control is self-documenting, and it stays accurate as long as you continually use it to create new environments.</li>
<li>We should think hard about the number of different components that make up our service.  Some, like the Crowd server, offer relatively little value for us at the cost of making our overall environment more complex to administer and troubleshoot.  We need to continuously watch our environment and make sure we are engineering for reliability and ease of maintenance as we expand functionality.</li>
</ul>
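<p>To make the external smoke test from the monitoring point concrete, here is a minimal Python sketch.  It is hypothetical and not our actual setup: the URL, the expected page text, the five-minute interval, and the use of <code>print</code> as the alert hook are all placeholder assumptions.  The important property is that it runs outside the hosted stack and checks the application end to end, so a failure anywhere underneath (a downed database included) surfaces as a failed probe.</p>

```python
import time
import urllib.request
from typing import Callable, Tuple

# A fetcher returns (HTTP status, response body) for a URL.
Fetcher = Callable[[str], Tuple[int, bytes]]


def http_fetch(url: str) -> Tuple[int, bytes]:
    """Fetch a URL; urlopen raises on connection errors and on 4xx/5xx responses."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.status, resp.read()


def smoke_test(url: str, expected_text: str, fetch: Fetcher = http_fetch) -> Tuple[bool, str]:
    """Run one end-to-end probe: the page must load and contain expected_text."""
    try:
        status, body = fetch(url)
    except Exception as exc:  # timeouts, DNS failures, refused connections, HTTP errors
        return False, f"request failed: {exc}"
    if status != 200:
        return False, f"unexpected status {status}"
    if expected_text.encode("utf-8") not in body:
        return False, "page loaded but expected content was missing"
    return True, "ok"


def monitor(url: str, expected_text: str, interval_s: float = 300,
            alert: Callable[[str], None] = print) -> None:
    """Probe forever from outside the hosting environment, alerting on each failure."""
    while True:
        healthy, detail = smoke_test(url, expected_text)
        if not healthy:
            alert(f"SMOKE TEST FAILED for {url}: {detail}")
        time.sleep(interval_s)
```

<p>Because the fetcher is a parameter, the check logic can be exercised with a fake fetcher and no network access; in practice the <code>alert</code> hook would page someone rather than print.</p>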
<p>In the end, though, I still think the flexibility and development speed we get from building on Amazon’s building blocks is worth the cost.  More days like last Saturday could change my mind about Amazon, but I’d likely go shopping for a more reliable vendor rather than bring more low-level IT maintenance in house.</p>
]]></content:encoded>
			<wfw:commentRss>http://sciencereengineered.com/2012/07/03/meltdown-in-the-cloud/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/11744786cf82d08f17b96d11e926a315?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mkellen</media:title>
		</media:content>
	</item>
		<item>
		<title>Synapse Now Open Source</title>
		<link>http://sciencereengineered.com/2012/05/30/synapse-now-open-source/</link>
		<comments>http://sciencereengineered.com/2012/05/30/synapse-now-open-source/#comments</comments>
		<pubDate>Thu, 31 May 2012 04:10:31 +0000</pubDate>
		<dc:creator>Michael Kellen</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[open science]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[science]]></category>

		<guid isPermaLink="false">http://sciencereengineered.com/2012/05/30/synapse-now-open-source/</guid>
		<description><![CDATA[I am happy to announce that all Synapse source code is now posted on the Sage Bionetworks GitHub site.  Of course, since Sage is a non-profit institute focused on promoting open science you might fairly ask why this is news &#8230; <a href="http://sciencereengineered.com/2012/05/30/synapse-now-open-source/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>I am happy to announce that all Synapse source code is now posted on the <a href="https://github.com/Sage-Bionetworks">Sage Bionetworks GitHub site</a>.  Of course, since Sage is a non-profit institute focused on promoting open science you might fairly ask why this is news now, over a year and a half after we started initial coding.  Why wasn’t the code up on GitHub from the very beginning?</p>
<p>Well, in retrospect maybe we should have.  I doubt it would have hurt anything, and it may have actually facilitated a collaboration or two that we missed out on during the initial stages of the project.  In our first months at Sage, our team didn’t know exactly what we had, what we were doing, or whether people would care at all about our vision.  We picked a set of development tools that were familiar and powerful (Atlassian’s hosted suite) and just focused on prototyping ideas and evaluating base technologies.  At the beginning our co-workers didn’t even pay much attention to what we were doing, even though they were the customers we were supposed to be supporting.  It took some time for the vision of Synapse to form, and for us to start getting people interested in what we were building.</p>
<p>Once the project was underway, adding more functionality always seemed to take precedence over thinking about how to get others to collaborate effectively on development.  It was really only at the Sage Congress last April that we started demoing the product and felt enough traction forming within the community to believe we were seriously onto something.  There’s still an awful lot of new functionality we desperately want to start working on, and it’s quite tempting to dive straight back into coding.</p>
<p>However, we’ve now bitten the bullet and spent the last month focused not just on moving code to Git, but also on structuring our codebase and development practices so that other developers can come on board.  We’ve refactored the codebase into smaller pieces, are putting more effort into developer docs, and are thinking about how to review and incorporate check-ins from external developers.  The first of three new engineers started recently: a great young student on an internship, which gives us a good chance to do a dry run with a fresh set of eyes.  We’ve got more <a href="http://sagebase.org/info/careers.php#24">open positions</a> I’m working hard to fill, but even more ideas than people who can implement them.  Hopefully taking the time to do this right now will let us more fully engage the community on the development front in the future.</p>
]]></content:encoded>
			<wfw:commentRss>http://sciencereengineered.com/2012/05/30/synapse-now-open-source/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/11744786cf82d08f17b96d11e926a315?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mkellen</media:title>
		</media:content>
	</item>
		<item>
		<title>Are You My Data?</title>
		<link>http://sciencereengineered.com/2012/05/21/are-you-my-data/</link>
		<comments>http://sciencereengineered.com/2012/05/21/are-you-my-data/#comments</comments>
		<pubDate>Tue, 22 May 2012 04:51:01 +0000</pubDate>
		<dc:creator>Michael Kellen</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[ethics]]></category>
		<category><![CDATA[informed consent]]></category>
		<category><![CDATA[science]]></category>

		<guid isPermaLink="false">http://sciencereengineered.com/2012/05/21/are-you-my-data/</guid>
		<description><![CDATA[A couple of weeks ago I visited UCSC as part of a workshop titled “Are you my Data?” organized by Jenny Reardon of the UCSC Office of Research.  UCSC is becoming the major center in the US for hosting cancer-related genomics &#8230; <a href="http://sciencereengineered.com/2012/05/21/are-you-my-data/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>A couple of weeks ago I visited UCSC as part of a workshop titled “Are you my Data?” organized by Jenny Reardon of the UCSC Office of Research.  UCSC is becoming the major center in the US for hosting cancer-related genomics data, and organized the workshop to help inform policies governing distribution of this data to the public.  This was a bit of a departure from my usual audience and role, as the event focused more on the social and ethical issues around sharing human health data than on the technical issues I normally face.</p>
<p>The central issue we faced was how to manage data collected from people in clinical trials or other medical settings: in particular, data with broad utility to seed future medical advances through research.  Medical research in this country has a pretty dismal history, ranging from directly harmful experiments like the <a href="http://en.wikipedia.org/wiki/Tuskegee_syphilis_experiment">Tuskegee syphilis experiment</a> to the much more common neglect and dehumanizing treatment described in Rebecca Skloot’s book <a href="http://www.amazon.com/The-Immortal-Life-Henrietta-Lacks/dp/1400052173">The Immortal Life of Henrietta Lacks</a>.  This history led to ethical rules and procedures governing clinical trials, including a strong emphasis on patient privacy that greatly limits access to the data those trials generate.  When enrolling in a clinical trial, patients sign an “informed consent” agreement, in which they are informed about the potential dangers and side effects of participating and give their consent for the trial to be conducted and for their data to be used in certain predefined ways.</p>
<p>However, many at the meeting felt that while concerns about patient privacy and rights are real, a number of the current rules do not serve the public well.  Current consent agreements typically limit who can access the data, to ensure patient privacy and protect patients from those who might use the data against them.  Unfortunately, this arrangement also prevents widespread distribution of the data among researchers who could use that information to advance treatments for diseases.  In fact, in many cases the researchers running trials use narrowly framed consents as a shield to keep data away from their scientific or corporate competitors.  Having a monopoly on data is good for individual scientific careers, but bad for future patients.</p>
<p>If you’re wondering why you should care about something that seems rather abstract and academic, just remember that the odds are very high that you or someone you love will someday come down with one of these diseases.  People usually get involved with groups like <a href="http://www.armyofwomen.org/">Army of Women</a> or the <a href="http://www.michaeljfox.org/">Michael J. Fox Foundation</a> when they or someone they love is touched by a particular disease.  These groups are great, and targeting a particular disease gives them incredible focus and energy.  However, we also think there’s a lot of general-purpose infrastructure that doesn’t need to be reinvented in every disease area, and we hope to partner with these sorts of groups to build it.  It’s a founding belief of Sage Bionetworks that open research will be faster and more effective research, and it’s our job to find ways to make open systems work with human health data.</p>
<p>An example of this infrastructure is an attempt to create a standard legal clause that can be dropped into informed consent agreements, giving patients the ability to opt in to broad sharing of their data for research purposes.  The <a href="http://weconsent.us/">Portable Legal Consent</a> will act much like an open source software license: it will provide a standard way for patients to choose to place their data into the public domain for research purposes under terms that ensure the data will be easily and broadly shared among all researchers, not just those conducting the trial.</p>
<p>Of course, we don’t advocate changing clinical trials without the informed understanding and cooperation of patients.  As people at the meeting discussed the issue, the topic moved from privacy to respect.  Fundamentally, the problem behind past ethical lapses is that researchers stopped thinking of the people in their trials as actual people, and started treating them like any other object of scientific scrutiny.  While it’s probably not feasible or productive to have researchers engaged in continuous dialog with patients, we feel the research community would benefit from more actively engaging the people it is ostensibly trying to help.  It’s not enough to simply ask patients to submit to research; we must find new ways to encourage patients to increase their involvement in medical research, and make that research as open and transparent as possible.</p>
]]></content:encoded>
			<wfw:commentRss>http://sciencereengineered.com/2012/05/21/are-you-my-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/11744786cf82d08f17b96d11e926a315?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mkellen</media:title>
		</media:content>
	</item>
		<item>
		<title>99% Perspiration</title>
		<link>http://sciencereengineered.com/2012/05/06/99-perspiration/</link>
		<comments>http://sciencereengineered.com/2012/05/06/99-perspiration/#comments</comments>
		<pubDate>Mon, 07 May 2012 04:08:18 +0000</pubDate>
		<dc:creator>Michael Kellen</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[science]]></category>
		<category><![CDATA[software-development]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://sciencereengineered.com/?p=88</guid>
		<description><![CDATA[Recently I’ve been having a lot of conversations about how to create incentives for open science.  As we continue to make progress on the technical challenges around open data and collaborative analytics it’s becoming ever more apparent that changing the &#8230; <a href="http://sciencereengineered.com/2012/05/06/99-perspiration/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Recently I’ve been having a lot of conversations about how to create incentives for open science.  As we continue to make progress on the technical challenges around open data and collaborative analytics it’s becoming ever more apparent that changing the motivations and culture of science is the real barrier.  Even at Sage Bionetworks, despite the emphasis on openness and transparency in research as a foundation of the culture, it is still often the case that “papers are the currency of science”. Internal and external projects often require complex politics to break work into “publishable units” and define first and last authorship.</p>
<p>With <a href="http://synapse.sagebase.org/">Synapse</a>, one of the areas we are being pushed to develop more fully is attribution tracking.  The basic argument I hear is “Papers are too coarse-grained and too slow to be published.  We need finer-grained attribution tracking in real time so we can see everyone’s contribution.  That will allow people to get credit for smaller pieces of work, driving faster science.”  When I’ve used the <a href="http://wp.me/p2faIU-y">“GitHub for Biology” analogy</a> to explain Synapse to scientists, the thing many latch onto is the notion that a digital record of their work could be used to drive career advancement, such as tenure decisions.  There’s a lot of interest in features allowing users to pull together the full history of their work and quantify its impact.  For example, what if on every Synapse project page you could get to the list of all people who contributed to generating and processing data before it entered the project, a sort of automatic generation of references?  Knowing that the software would provide complete attribution for downstream work might draw a lot more interest in contributing well-curated data to the Synapse Commons.</p>
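<p>One way to picture that “automatic generation of references” is as a walk over a provenance graph.  The sketch below is purely illustrative: the <code>Entity</code> type, its fields, and <code>upstream_contributors</code> are hypothetical and not part of Synapse.  It simply assumes each entity records who worked on it and which entities it was derived from, then collects everyone upstream with a breadth-first search.</p>

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    """A hypothetical provenance node: a data set, analysis result, or model."""
    name: str
    contributors: List[str]
    derived_from: List[str] = field(default_factory=list)  # names of upstream entities


def upstream_contributors(start: str, graph: Dict[str, Entity]) -> List[str]:
    """Everyone who contributed to `start` or to anything it was derived from.

    Breadth-first walk over derived_from links; the visited set keeps shared
    ancestors from being processed twice. People are listed in the order they
    are first reached, without duplicates.
    """
    visited = {start}
    queue = deque([start])
    credited: List[str] = []
    seen_people = set()
    while queue:
        entity = graph[queue.popleft()]
        for person in entity.contributors:
            if person not in seen_people:
                seen_people.add(person)
                credited.append(person)
        for parent in entity.derived_from:
            if parent not in visited:
                visited.add(parent)
                queue.append(parent)
    return credited


# Hypothetical example: a model built from normalized expression data plus clinical data.
graph = {
    "raw_expression": Entity("raw_expression", ["alice"]),
    "normalized": Entity("normalized", ["bob"], ["raw_expression"]),
    "clinical": Entity("clinical", ["carol"]),
    "model": Entity("model", ["dave", "bob"], ["normalized", "clinical"]),
}
print(upstream_contributors("model", graph))  # ['dave', 'bob', 'carol', 'alice']
```

<p>Note that a flat list like this says nothing about how much each person contributed, which is exactly where the harder attribution questions below begin.</p>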
<p>Of course, this idea can be carried to extremes.  One of the more worrying comments I got recently was that we should try to extend attribution tracking upstream to the point where we could capture the “generation of the key ideas” that could drive future work.  Basically, a scientist wanted to record pure ideas so that any time anyone else actually did work in that area, he could take credit for coming up with the idea.  I had a flashback to an old Onion story about Microsoft patenting “<a href="http://www.theonion.com/articles/microsoft-patents-ones-zeroes,599/">Ones and Zeros</a>”.  Pure ideas are cheap; making them work is hard.  If genius is “1% inspiration and 99% perspiration”, I think attribution clearly needs to be focused on the 99%.</p>
<p>I’m also a little worried that we’re over-emphasizing the notion that work needs to be clearly attributed to a single author.  GitHub, like other software development tools, tracks work not primarily to create attribution but because fine-grained tracking of code changes is often useful for resolving issues and getting work done more efficiently.  People were using version control systems long before GitHub took the lead in connecting your check-in history to your online profile.  Even today, building a profile is a side effect of the actual work of writing software.  Many of the most effective engineers I know contribute far more to a project indirectly, through their interactions with teammates, than directly through lines of code added to the codebase.</p>
<p>Much of a software engineer’s reputation and career advancement comes from the projects he’s worked on and the companies he’s worked for.  The best ones put the overall success of the project before their own need to attach their name to the work, because they get excited by the project’s potential to change the world.  This is not unique to software engineering: I’m sure the engineers on the Apollo or Manhattan projects were also driven by the enormous vision and scope of those projects.  Trying to cut these projects up into “publishable units” where everyone got a few pieces would only have prevented their success.  I feel this is something we desperately need in human health research.  Hopefully, our recently launched <a href="https://sagebionetworks.jira.com/wiki/display/BCC/Home">Sage / DREAM challenge on predictive modeling of breast cancer</a> will help us start to figure out how to create these sorts of driver projects in human health research.</p>
]]></content:encoded>
			<wfw:commentRss>http://sciencereengineered.com/2012/05/06/99-perspiration/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/11744786cf82d08f17b96d11e926a315?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mkellen</media:title>
		</media:content>
	</item>
		<item>
		<title>Sage Congress Recap</title>
		<link>http://sciencereengineered.com/2012/04/26/sage-congress-recap-3/</link>
		<comments>http://sciencereengineered.com/2012/04/26/sage-congress-recap-3/#comments</comments>
		<pubDate>Fri, 27 Apr 2012 04:43:18 +0000</pubDate>
		<dc:creator>Michael Kellen</dc:creator>
				<category><![CDATA[Synapse]]></category>
		<category><![CDATA[sage bionetworks]]></category>
		<category><![CDATA[sage congress]]></category>
		<category><![CDATA[science]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://sciencereengineered.com/2012/04/26/sage-congress-recap-3/</guid>
		<description><![CDATA[Despite my best efforts, our march towards the 2012 Sage Congress and launch of Synapse beta turned into a chaotic crunch during the final months and weeks, as all software projects inevitably seem to.  I still can’t believe I &#8230; <a href="http://sciencereengineered.com/2012/04/26/sage-congress-recap-3/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Despite my best efforts, our march towards the 2012 Sage Congress and launch of Synapse beta turned into a chaotic crunch during the final months and weeks, as all software projects inevitably seem to.  I still can’t believe I encouraged check-ins from the plane and Starbucks the day before my demo.  There’s just something so ethereal about software&#8230; it seems as if it should spring to life beautifully formed as soon as you think of it.  Alas, we still have loads of bugs, features we wish we had, and a crippled build system that needs to be rebuilt.</p>
<p>With a bit of time to decompress, I can now say I&#8217;m really proud of how far our team got, and of the good reception we got from so many at the Congress.  My <a href="http://fora.tv/2012/04/20/Synapse_Pilot_for_Building_an_Information_Commons#chapter_01">full talk</a> is now posted on the Congress website, including the nice bit in the RStudio portion of my demo where I am completely defeated by a podium with a slanted surface and no lip at the bottom.  I guess Isaac Newton wasn’t happy with my quote, and the laws of motion took revenge.</p>
<p>We are starting to get real users kicking the tires now, which is great, but it is also highlighting all the work left to do.  May is now bug-fixing time: no more new features until we solidify what we have.  It’s also planning time at Sage, as we set development priorities for the summer.  If you dig into Synapse and have ideas for how to improve the system, please let us know.  Even better, if you can write code, we are looking to hire good software engineers.  We’ll also be taking steps to start pushing code into the open source community.  Over the past year things have not felt well-formed enough to nucleate any external contributions, but we’re now seeing signs that this is starting to change.</p>
<p>In closing, my favorite talks of the Congress:</p>
<ul>
<li>Larry Lessig on <a href="http://fora.tv/2012/04/20/Ingredients_for_Innovation">Ingredients for Innovation</a> – I wish I could speak half this well.  A powerful talk on copyright issues leading into larger issues of corruption in our political system.</li>
<li>Adrien Treuille on <a href="http://fora.tv/2012/04/21/Adrien_Treuille_-_Leland_Hartwell_Award_Recipient">Crowdsourcing in Science</a> – I wish I could think of something half as cool as his <a href="http://fold.it/portal/">Foldit</a> and <a href="http://eterna.cmu.edu/eterna_page.php?page=me_tab">Eterna</a> games to engage users.  I will start by taking his advice on the magic of forums.</li>
<li>Jamie Heywood on <a href="http://fora.tv/2012/04/21/Discovery_20_-_We_Might_Know_How_to_Solve_the_Problem">PatientsLikeMe</a> and new ways to think about disease research.  A great approach to science, and a cool technology platform.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://sciencereengineered.com/2012/04/26/sage-congress-recap-3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/11744786cf82d08f17b96d11e926a315?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mkellen</media:title>
		</media:content>
	</item>
		<item>
		<title>Synapse Beta Launch</title>
		<link>http://sciencereengineered.com/2012/04/18/synapse-beta-launch/</link>
		<comments>http://sciencereengineered.com/2012/04/18/synapse-beta-launch/#comments</comments>
		<pubDate>Wed, 18 Apr 2012 22:12:42 +0000</pubDate>
		<dc:creator>Michael Kellen</dc:creator>
				<category><![CDATA[Synapse]]></category>

		<guid isPermaLink="false">http://sciencereengineered.com/?p=61</guid>
		<description><![CDATA[As we are nearing the Congress and launch of Synapse beta, we have filmed a quick Synapse Intro movie.  It&#8217;s a 2.5 minute summary of why we are building the platform and the approach we are taking.  Looking forward to &#8230; <a href="http://sciencereengineered.com/2012/04/18/synapse-beta-launch/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>As we near the Congress and the launch of the Synapse beta, we have filmed a quick <a href="http://www.youtube.com/watch?v=ACuPT4i5Jg0">Synapse Intro</a> movie. It&#8217;s a 2.5-minute summary of why we are building the platform and the approach we are taking. Looking forward to a full demo at the 2012 <a href="http://sagecongress.org/WP/2012congress/">Sage Congress</a> on Friday, followed by other exciting announcements from Sage Bionetworks.</p>
]]></content:encoded>
			<wfw:commentRss>http://sciencereengineered.com/2012/04/18/synapse-beta-launch/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/11744786cf82d08f17b96d11e926a315?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">mkellen</media:title>
		</media:content>
	</item>
	</channel>
</rss>
