Jul 24, 2009
Sample Chapter is provided courtesy of Prentice Hall
This chapter covers some of the general steps you can take to isolate the cause of a wide range of Ubuntu Server problems and work out their solutions.
Troubleshooting is a topic that is near and dear to me. While there are many other areas of system administration that I enjoy, I don’t think anything compares to the excitement of tracking down the root cause of an obscure problem. Good troubleshooting is a combination of Sherlock Holmes–style detective work, intuition, and a little luck. You might even argue that some people have a knack for troubleshooting while others struggle with it, but in my mind it’s something that all sysadmins get better at the more problems they run into.
Although this chapter is about troubleshooting, a number of common problems can prevent your Ubuntu system from booting or leave it running in an incomplete state. I have moved all of these topics into their own chapter on rescue and recovery, which provides specific steps to fix common problems with the Ubuntu rescue CD. So if you are trying to solve a problem at the moment, check Chapter 12 first to see if I have already outlined a solution. If not, come back here for the more general steps to isolate the cause of your problem and work out its solution.
In this chapter I’m going to discuss some aspects of my general philosophy on troubleshooting that could be applied to a wide range of problems. Then I will cover a few common problems that you might run into and introduce some tools and techniques to help solve them. By the end of the chapter you should have a head start the next time a problem turns up. After all, in many organizations downtime is measured in dollars, not minutes, so there is a lot to be said for someone who can find a root cause quickly.
General Troubleshooting Philosophy
While there are specific steps you can take to address certain computer problems, most troubleshooting techniques rely on the same set of rules. Below I discuss some of these rules, which will help make you a better troubleshooter.
Divide the Problem Space
When I’m faced with an unknown issue, I apply the same techniques as when I have to guess a number between 1 and 100. If you have ever played this game, you know that most people fall into one of two categories: the random guessers and the narrowers. The random guessers might start by choosing 15, then hear that the number is higher and pick 23, then hear it is still higher. Eventually they either luck into the right number or pick so many numbers that only the right number remains. In either case they use far more guesses than they need to. Many people approach troubleshooting the same way: They try solutions at random until one happens to work. Such a person might eventually find the problem, but it takes way longer than it should.
In contrast to the random guessers, the narrowers strategically choose numbers that cut the range of possible answers in half each time. Let’s say the number is 80, for instance; their guesses would go as follows: 50, 75, 88, 82, 78, 80. With each guess, the range of numbers that could be the answer is reduced by half. When people like this troubleshoot a computer problem, they spend their time finding tests that divide the remaining problem space roughly in half. As I go through specific problems in this chapter, you will see this methodology in practice.
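The narrowers' strategy is binary search. The Python sketch below plays the narrower's side of the game. It always guesses the midpoint of the remaining range, rounded down, so for 80 its sequence differs slightly from the hand-picked one above (81 instead of 82, for example), but each guess still halves the range. The function name `bisect_guesses` and its default bounds are illustrative only.

```python
from typing import List


def bisect_guesses(target: int, low: int = 1, high: int = 100) -> List[int]:
    """Return the guesses a 'narrower' makes to find target in [low, high].

    Each guess is the midpoint of the remaining range (rounded down), so
    every "higher" or "lower" answer discards about half of the candidates.
    """
    if not low <= target <= high:
        raise ValueError("target must lie within [low, high]")
    guesses = []
    while True:
        guess = (low + high) // 2
        guesses.append(guess)
        if guess == target:
            return guesses
        if guess < target:
            low = guess + 1   # "higher": the answer lies above this guess
        else:
            high = guess - 1  # "lower": the answer lies below this guess
```

Here `bisect_guesses(80)` returns `[50, 75, 88, 81, 78, 79, 80]`. No target between 1 and 100 needs more than seven guesses, because each guess leaves at most half of the remaining candidates.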
Favor Quick, Simple Tests over Slow, Complex Tests
What I mean here is that as you narrow down the possible causes of a problem, you will often end up with a few hypotheses that are equally likely. Some can be tested quickly, while others take more time. For instance, if a machine can’t seem to communicate with the network, a quick test could be to see if the network cable is plugged in, while a longer test would involve more elaborate software tests on the host. If the quick test isolates the problem, you get the solution that much faster. If you still need to try the longer test, you aren’t out that much extra time.
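As a sketch of this ordering rule, the Python below runs a list of equally likely checks cheapest-first and stops at the first one that fails. The cable test is approximated by reading the Linux sysfs `carrier` attribute, which reports `1` when an interface has a physical link. The function names, the cost figures, and the tuple layout for a check are assumptions made for illustration, not tools from this chapter.

```python
from pathlib import Path
from typing import Callable, List, Optional, Tuple


def link_has_carrier(iface: str, sysfs_root: str = "/sys/class/net") -> bool:
    """Quick test: does the kernel report a physical link on this interface?

    Reads the sysfs 'carrier' attribute, which is "1" when the link is up
    and "0" when it is down (for example, an unplugged cable).
    """
    try:
        return (Path(sysfs_root) / iface / "carrier").read_text().strip() == "1"
    except OSError:
        # The interface does not exist, or it is administratively down, in
        # which case the kernel refuses to report carrier. Either way, no link.
        return False


# Hypothetical check layout: (description, rough cost in seconds, test that
# returns True when the check passes).
Check = Tuple[str, float, Callable[[], bool]]


def first_failing_check(checks: List[Check]) -> Optional[str]:
    """Run equally likely checks cheapest-first; report the first failure."""
    for name, _cost, test in sorted(checks, key=lambda check: check[1]):
        if not test():
            return name  # this check isolated the problem; stop here
    return None  # every check passed; widen the search
```

For instance, registering `lambda: link_has_carrier("eth0")` with a cost of a fraction of a second (the interface name is an assumption; yours may differ) ensures the cable check runs before any slow, host-level software test.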
Favor Past Solutions
Unless you have absolutely prevented a problem from ever happening again, when a symptom you’ve seen before pops up, it will likely have the same solution as last time. Over the years you’ll develop a list of things to try first when you see a particular problem, ruling out the common causes before you move on to more exotic hypotheses. Of course, you will also run into problems you’ve never seen before (that’s part of the fun of troubleshooting), but when you test some of your past solutions first, you will find you solve problems faster.
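One way to make that list explicit is to keep a record of which root cause explained each symptom. The following Python sketch is a hypothetical helper, not a tool from this chapter. It orders candidate causes so that past causes come first, most frequent first, followed by any fresh ideas not already on that list. The class name and the symptom and cause strings are illustrative.

```python
from collections import Counter
from typing import Dict, List


class FixHistory:
    """Remembers which root causes explained each symptom in the past."""

    def __init__(self) -> None:
        # symptom -> Counter mapping each root cause to how often it was the answer
        self._causes: Dict[str, Counter] = {}

    def record(self, symptom: str, cause: str) -> None:
        """Log a solved problem so its cause is tried early next time."""
        self._causes.setdefault(symptom, Counter())[cause] += 1

    def hypotheses(self, symptom: str, fresh_ideas: List[str]) -> List[str]:
        """Past causes first (most frequent first), then new ideas not already listed."""
        seen = self._causes.get(symptom, Counter())
        past = [cause for cause, _count in seen.most_common()]
        return past + [idea for idea in fresh_ideas if idea not in past]
```

For example, after recording "cable unplugged" twice and "bad DNS" once for the symptom "no network", `hypotheses("no network", ["firewall", "bad DNS"])` returns `["cable unplugged", "bad DNS", "firewall"]`.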