<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Dead Programs Tell No Tales (or &#8220;We don&#8217;t need no stinkin&#8217; error handling!&#8221;)</title>
	<atom:link href="http://blog.markturansky.com/archives/42/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.markturansky.com/archives/42</link>
	<description>software architecture &#38; engineering, code hints, sometimes philosophy, photography, life, etc.</description>
	<lastBuildDate>Sun, 20 Jun 2010 07:51:38 -0400</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Roger Hayes</title>
		<link>http://blog.markturansky.com/archives/42/comment-page-1#comment-866</link>
		<dc:creator>Roger Hayes</dc:creator>
		<pubDate>Wed, 10 Sep 2008 17:06:32 +0000</pubDate>
		<guid isPermaLink="false">http://blog.markturansky.com/archives/42#comment-866</guid>
		<description>See &quot;fail-stop&quot;,  Schlichting 1983.</description>
		<content:encoded><![CDATA[<p>See &#8220;fail-stop&#8221;,  Schlichting 1983.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: &#187; Code complete doesn&#8217;t mean you&#8217;re done</title>
		<link>http://blog.markturansky.com/archives/42/comment-page-1#comment-134</link>
		<dc:creator>&#187; Code complete doesn&#8217;t mean you&#8217;re done</dc:creator>
		<pubDate>Wed, 13 Feb 2008 11:00:13 +0000</pubDate>
		<guid isPermaLink="false">http://blog.markturansky.com/archives/42#comment-134</guid>
		<description>[...] our case, testing with big numbers revealed concurrency issues that we did not and could not find when developing with simple, smaller tests. Our multi-threaded, [...]</description>
		<content:encoded><![CDATA[<p>[...] our case, testing with big numbers revealed concurrency issues that we did not and could not find when developing with simple, smaller tests. Our multi-threaded, [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dan</title>
		<link>http://blog.markturansky.com/archives/42/comment-page-1#comment-103</link>
		<dc:creator>Dan</dc:creator>
		<pubDate>Thu, 07 Feb 2008 13:22:35 +0000</pubDate>
		<guid isPermaLink="false">http://blog.markturansky.com/archives/42#comment-103</guid>
		<description>I am on Mark&#039;s team.

Paul:
Good point about development-time assertions. They are a typically overlooked strategy in most Java-land projects I&#039;ve seen (my own included!). I think I&#039;m going to make it a point when we get back to do a sweep of the codebase and strengthen our pre/postcondition assertions into actual Assert assertions. I believe we got pretty focused on the &quot;try to work for as long as possible&quot; aspect prematurely. Thanks for the sanity check.

Ram:
Another good point. For some reason, Mark didn&#039;t mention in his post that we relied heavily upon remote debugging and thread dumps to track down these problems. I don&#039;t know what we&#039;d do without the JMX console and jdb... I think it&#039;d be nearly impossible to build a large, stable concurrent system like this without them.</description>
		<content:encoded><![CDATA[<p>I am on Mark&#8217;s team.</p>
<p>Paul:<br />
Good point about development-time assertions. They are a typically overlooked strategy in most Java-land projects I&#8217;ve seen (my own included!). I think I&#8217;m going to make it a point when we get back to do a sweep of the codebase and strengthen our pre/postcondition assertions into actual Assert assertions. I believe we got pretty focused on the &#8220;try to work for as long as possible&#8221; aspect prematurely. Thanks for the sanity check.</p>
<p>Ram:<br />
Another good point. For some reason, Mark didn&#8217;t mention in his post that we relied heavily upon remote debugging and thread dumps to track down these problems. I don&#8217;t know what we&#8217;d do without the JMX console and jdb&#8230; I think it&#8217;d be nearly impossible to build a large, stable concurrent system like this without them.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ram</title>
		<link>http://blog.markturansky.com/archives/42/comment-page-1#comment-97</link>
		<dc:creator>Ram</dc:creator>
		<pubDate>Tue, 05 Feb 2008 22:41:13 +0000</pubDate>
		<guid isPermaLink="false">http://blog.markturansky.com/archives/42#comment-97</guid>
		<description>That&#039;s why you have the thread dump! Thread dumps are always more useful than an exception stack trace. Tools like Lockness make the job of identifying a problem easier. If your whole application model is state-based (read: NIO) with a lot of transitions, you are back to square one, where depending on logs is the best bet, since a prior state affects the behavior of the next transitioned state. These are hard-to-replicate problems.</description>
		<content:encoded><![CDATA[<p>That&#8217;s why you have the thread dump! Thread dumps are always more useful than an exception stack trace. Tools like Lockness make the job of identifying a problem easier. If your whole application model is state-based (read: NIO) with a lot of transitions, you are back to square one, where depending on logs is the best bet, since a prior state affects the behavior of the next transitioned state. These are hard-to-replicate problems.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Paul W. Homer</title>
		<link>http://blog.markturansky.com/archives/42/comment-page-1#comment-96</link>
		<dc:creator>Paul W. Homer</dc:creator>
		<pubDate>Tue, 05 Feb 2008 20:01:55 +0000</pubDate>
		<guid isPermaLink="false">http://blog.markturansky.com/archives/42#comment-96</guid>
		<description>The trick, I&#039;ve always found, is to reduce the time between the problem occurring and the program stopping when in development. Way back, we would use assert statements in C and set the debug flag to true while we were developing. Any problem, no matter how tiny, would grind the whole system to a halt. You can&#039;t move forward until you&#039;ve fixed the base problems. That would force the developers to fix the problems instead of ignoring them.

Problems often cascade, so an earlier ignored problem might actually be part of the cause of a later one. It is always best to deal with the problems in the order they occurred.

That strategy is great for development, but lousy for support. Once in the &#039;wild&#039;, the code should try to work for as long as possible (producing lots of nasty log messages if needed), and there should be plenty of room for work-arounds. There should be a way to get a huge amount of information &#039;easily&#039; from the users once the system has crashed. In that type of case, too much information is never a problem; too little is. Once a problem has been diagnosed, a work-around is the first (and cheapest) approach, while a patch is a fallback.

I&#039;ve always found that mixing between the two strategies based on running environment is the best way to handle the two completely opposite problems.

Paul.
http://theprogrammersparadox.blogspot.com</description>
		<content:encoded><![CDATA[<p>The trick, I&#8217;ve always found, is to reduce the time between the problem occurring and the program stopping when in development. Way back, we would use assert statements in C and set the debug flag to true while we were developing. Any problem, no matter how tiny, would grind the whole system to a halt. You can&#8217;t move forward until you&#8217;ve fixed the base problems. That would force the developers to fix the problems instead of ignoring them.</p>
<p>Problems often cascade, so an earlier ignored problem might actually be part of the cause of a later one. It is always best to deal with the problems in the order they occurred.</p>
<p>That strategy is great for development, but lousy for support. Once in the &#8216;wild&#8217;, the code should try to work for as long as possible (producing lots of nasty log messages if needed), and there should be plenty of room for work-arounds. There should be a way to get a huge amount of information &#8216;easily&#8217; from the users once the system has crashed. In that type of case, too much information is never a problem; too little is. Once a problem has been diagnosed, a work-around is the first (and cheapest) approach, while a patch is a fallback.</p>
<p>I&#8217;ve always found that mixing between the two strategies based on running environment is the best way to handle the two completely opposite problems.</p>
<p>Paul.<br />
<a href="http://theprogrammersparadox.blogspot.com" rel="nofollow">http://theprogrammersparadox.blogspot.com</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Georgi</title>
		<link>http://blog.markturansky.com/archives/42/comment-page-1#comment-95</link>
		<dc:creator>Georgi</dc:creator>
		<pubDate>Tue, 05 Feb 2008 17:49:28 +0000</pubDate>
		<guid isPermaLink="false">http://blog.markturansky.com/archives/42#comment-95</guid>
		<description>Hi Mark,

I had déjà vu right away reading your article. We had exactly the same problem with toArray() in a server; it was a list that stored notification e-mails to be sent in case a fatal error occurred. All program parts and threads can access this interface. A customer called stating that no emails were being sent any more, so we analysed the problem, found the bug pretty fast and fixed it within minutes. It was a one-in-a-million chance: two parts of the program &quot;went fatal&quot; and wanted to send an email simultaneously.

Finding the erroneous part that quickly was only possible because we have logging guidelines and improve the software from time to time. This way, we do not have that many &quot;error&quot; log entries (and almost never a &quot;fatal&quot; one) and can follow them up pretty fast. That is to say: we try to avoid logging to the error channel where possible. (Partly) recoverable actions that occur (e.g. connection errors where you can retry) are logged to the info channel more and more as the program evolves.

Thus I think it is not an option to exit a program (or even worse: to call System.exit() in case you write a framework or module for your program). Just imagine you go to production and forget a System.exit() statement ... .

It will help much more to improve where/what/how to log. If you want to, have a look at http://goit-postal.blogspot.com/2007/01/how-to-successful-log-in-programming.html .

Just my 0,02$, Georgi</description>
		<content:encoded><![CDATA[<p>Hi Mark,</p>
<p>I had déjà vu right away reading your article. We had exactly the same problem with toArray() in a server; it was a list that stored notification e-mails to be sent in case a fatal error occurred. All program parts and threads can access this interface. A customer called stating that no emails were being sent any more, so we analysed the problem, found the bug pretty fast and fixed it within minutes. It was a one-in-a-million chance: two parts of the program &#8220;went fatal&#8221; and wanted to send an email simultaneously.</p>
<p>Finding the erroneous part that quickly was only possible because we have logging guidelines and improve the software from time to time. This way, we do not have that many &#8220;error&#8221; log entries (and almost never a &#8220;fatal&#8221; one) and can follow them up pretty fast. That is to say: we try to avoid logging to the error channel where possible. (Partly) recoverable actions that occur (e.g. connection errors where you can retry) are logged to the info channel more and more as the program evolves.</p>
<p>Thus I think it is not an option to exit a program (or even worse: to call System.exit() in case you write a framework or module for your program). Just imagine you go to production and forget a System.exit() statement &#8230; .</p>
<p>It will help much more to improve where/what/how to log. If you want to, have a look at <a href="http://goit-postal.blogspot.com/2007/01/how-to-successful-log-in-programming.html" rel="nofollow">http://goit-postal.blogspot.com/2007/01/how-to-successful-log-in-programming.html</a> .</p>
<p>Just my 0,02$, Georgi</p>
]]></content:encoded>
	</item>
</channel>
</rss>
