Archive for the ‘System Administration’ Category

Introduction to data recovery using open source tools

Sunday, September 19th, 2010

I have an old hard disk with corrupt NTFS volumes. I’m not sure how they became corrupted, but they cannot be fixed using the standard chkdsk /f command. Fortunately, there’s a plethora of open source data recovery tools available. Two such tools are foremost and photorec, which specialize in combing through hard drive partitions to recover files based on header information. They can even recover files after the disk or memory card has been accidentally erased.

Foremost is a Linux command-line tool originally developed by the U.S. Air Force. If you’re using a Debian-based Linux distribution (e.g., Ubuntu), you can grab it using the command sudo apt-get install foremost. Foremost can recover common file types such as txt, jpg, and avi. It was last updated in 2008, which means that its knowledge of file headers is, at best, two years out of date.

Photorec is part of the testdisk suite, which is a set of Linux command-line tools. If you’re using a Debian-based Linux distribution, you can install it using the command sudo apt-get install testdisk. Testdisk not only “tests your disk” but also rebuilds your partition table. This is the tool to use if your hard drive’s master boot record or partition table is corrupt. Photorec, like foremost, recovers files (not just photos) based on file headers. In fact, photorec supports more file types and is more up-to-date than foremost, as evidenced by the fact that I was able to recover more files with photorec than with foremost.
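
To make "recovery based on file headers" concrete, here is a minimal sketch of signature matching. It is not how foremost or photorec are implemented: they scan raw disk blocks and know far more formats. This only inspects the first four bytes of an existing file. The function name sniff_type is made up for illustration.

```shell
# Guess a file's type by comparing its first bytes against a few
# well-known signatures ("magic numbers").
sniff_type() {
    # Read the first 4 bytes as lowercase hex with no separators.
    sig=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    case "$sig" in
        ffd8ff*)  echo jpg ;;      # JPEG starts with FF D8 FF
        89504e47) echo png ;;      # PNG starts with 89 "PNG"
        25504446) echo pdf ;;      # PDF starts with "%PDF"
        504b0304) echo zip ;;      # ZIP local file header "PK" 03 04
        *)        echo unknown ;;
    esac
}
```

Running sniff_type on a recovered file prints one of jpg, png, pdf, zip, or unknown. A carving tool applies the same idea to every block of the disk instead of to the start of a file.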

The problem with both foremost and photorec is that they recover file content but not file names. So you end up with directories of randomly named files with only the file extension preserved. It’s not ideal but it’s still better than not having the data at all.
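
Since only the extension survives, one way to make the recovered pile easier to browse is to group the files by extension. The sketch below is an assumption about a convenient workflow, not a feature of either tool. The name group_by_extension is hypothetical. The destination directory should live outside the source directory, and files that share a name will overwrite one another.

```shell
# Copy every regular file under $1 into $2/<extension>/, leaving the
# originals untouched. Files without an extension go to $2/noext/.
group_by_extension() {
    src=$1
    dest=$2
    find "$src" -type f | while IFS= read -r path; do
        name=${path##*/}                 # strip the directory part
        case "$name" in
            *.*) ext=${name##*.} ;;      # text after the last dot
            *)   ext=noext ;;
        esac
        mkdir -p "$dest/$ext"
        cp -p "$path" "$dest/$ext/"
    done
}
```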

See also:

http://help.ubuntu.com/community/DataRecovery

Distributed Dictionary Attack Solutions

Saturday, June 26th, 2010

We have had the misfortune of being targeted by distributed dictionary attacks on our Linux servers.  A dictionary attack works through a long list of common usernames and passwords, trying to gain a foothold on, and eventually root access to, a password-protected server.

Our servers use utilities such as fail2ban or denyhosts that look for repeated failed login attempts and, once found, direct the firewall service to ban the originating IP addresses.  However, this technique fails when the attack is distributed among thousands of compromised “zombie” computers that are doing the bidding of a malicious hacker.
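
The per-address counting that such utilities rely on can be sketched as follows. This is an illustration of the idea, not the actual logic of fail2ban or denyhosts. The function name failed_login_offenders and the default threshold of 5 are assumptions.

```shell
# Read sshd log lines on stdin and print "count ip" for every address
# with at least $1 failed password attempts (default 5).
failed_login_offenders() {
    threshold=${1:-5}
    awk -v min="$threshold" '
        /Failed password/ {
            # The address follows the last "from" on the line; its
            # position shifts when the line contains "invalid user".
            for (i = NF - 1; i >= 1; i--)
                if ($i == "from") { count[$(i + 1)]++; break }
        }
        END {
            for (ip in count)
                if (count[ip] >= min) print count[ip], ip
        }
    ' | sort -rn
}
```

On Fedora this could be fed with cat /var/log/secure | failed_login_offenders 5. When each zombie makes only one or two attempts, no single address ever reaches the threshold, which is why a distributed attack slips past this kind of check.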

Our log files correctly diagnosed each attempt from an individual IP address, but a new attempt was immediately started from a different IP address.  We were clearly looking at an attack coordinated from a single unknown source.

There are several ways to reduce or thwart these attacks, including:

  • never allow remote root logins (but attacks still occur on non-root user names)
  • have a chroot jail shell in case an attack on a non-root account succeeds
  • change the SSH service to use a non-obvious port (such as port 5022 instead of 22)
  • deactivate password authentication and rely exclusively on authentication keys
  • restrict allowed IP addresses by country of origin
  • only allow certain IP addresses or ranges of addresses to have access

We chose to implement more than one of these solutions, and I wanted to share some techniques we used for our implementation.

For the last rule that only allows certain IP addresses, I wanted to start with a list of valid IP addresses used in the last month.  The following command extracts these IP addresses from our SSH log files, sorts them alphabetically, and then removes duplicates.  Note that this server uses Fedora; you may need to adjust the log path and field number for other Linux distributions.

root# fgrep "Accepted" /var/log/secure* | awk '{print $11}' | sort | uniq
166.77.6.4
205.232.34.1
67.255.5.155
...

The IP addresses from the above script should be added to the file /etc/hosts.allow in the following format:

# hosts.allow   This file contains access rules which are used to
#               allow or deny connections to network services that
#               either use the tcp_wrappers library or that have been
#               started through a tcp_wrappers-enabled xinetd.
#
#               See 'man 5 hosts_options' and 'man 5 hosts_access'
#               for information on rule syntax.
#               See 'man tcpd' for information on tcp_wrappers

# allow local addresses
all: 127.0.0.1
all: 192.168.1.*

# valid IP addresses gathered June 2010
all: 166.77.6.4
all: 205.232.34.1
all: 67.255.5.155
...
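
Rather than typing the entries by hand, the rules could be generated from the log. The following is a sketch under the assumption that the log lines use the usual sshd "Accepted ... from <ip> port ..." layout. The function name accepted_ips_to_rules is hypothetical.

```shell
# Read sshd log lines on stdin and print one "all: <ip>" rule for each
# distinct address that logged in successfully.
accepted_ips_to_rules() {
    awk '
        /Accepted/ {
            # Take the word after the last "from" instead of relying
            # on a fixed field number.
            for (i = NF - 1; i >= 1; i--)
                if ($i == "from") { print $(i + 1); break }
        }
    ' | sort -u | sed 's/^/all: /'
}
```

For example, cat /var/log/secure* | accepted_ips_to_rules prints lines that can be reviewed and then pasted into /etc/hosts.allow. Review matters here: an address that guessed a password successfully would show up in this list too.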

Now disallow all other IP addresses for SSH by editing the file /etc/hosts.deny:

# hosts.deny    This file contains access rules which are used to
#               deny connections to network services that either use
#               the tcp_wrappers library or that have been
#               started through a tcp_wrappers-enabled xinetd.
#
#               The rules in this file can also be set up in
#               /etc/hosts.allow with a 'deny' option instead.
#
#               See 'man 5 hosts_options' and 'man 5 hosts_access'
#               for information on rule syntax.
#               See 'man tcpd' for information on tcp_wrappers
#
# The portmap line is redundant, but it is left to remind you that
# the new secure portmap uses hosts.deny and hosts.allow.  In particular
# you should know that NFS uses portmap!

# deny SSH service except for IP numbers in /etc/hosts.allow file
sshd: all

Restart your SSH service, and your server should now be a bit more secure against distributed dictionary attacks:

root# service sshd restart

An Internet search using keywords from the other solutions mentioned above will teach you how to change the SSH port, disallow password authentication, etc.
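
As a starting point, several of those solutions map to the OpenSSH sshd_config keywords Port, PermitRootLogin, and PasswordAuthentication. The sketch below reads a config file and reports what a keyword is set to. The function name sshd_setting is hypothetical, and the parsing is simplified: it ignores Match blocks and the key=value form.

```shell
# Print the value of keyword $2 in the sshd_config-style file $1.
# sshd uses the first value it finds for a keyword, and keywords are
# case-insensitive, so this prints the first uncommented match.
sshd_setting() {
    awk -v key="$2" '
        /^[ \t]*#/ { next }                       # skip comment lines
        tolower($1) == tolower(key) { print $2; exit }
    ' "$1"
}
```

A quick check might be sshd_setting /etc/ssh/sshd_config PermitRootLogin, where the path is an assumption that can differ between distributions. An empty result means the keyword is not set in that file and sshd falls back to its built-in default.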

Matt worked in academia, corporate research labs, and several technology startup companies prior to GORGES. His expertise is software architecture, database development, and system administration. Matt brings GORGES over 25 years of experience developing fast and robust software on a multitude of platforms and languages.

What Hosting Do I Need?

Tuesday, September 15th, 2009

Choosing a hosting service is important, and there are many choices to make.  Here are some tips to help you make your selection.

The first step is to determine your business requirements.  The criteria should be reliability (or uptime), performance, support, and cost.  Try to estimate the cost of downtime, because that value should factor into your hosting decision.  If a day of downtime costs you thousands of dollars, then reliability is very important.
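
A rough way to put a number on downtime is to turn an advertised uptime percentage into hours per year and multiply by what an hour offline costs you. The sketch below does only that arithmetic. The function name downtime_estimate and the figures in the example are made up for illustration.

```shell
# Print the expected hours of downtime per year for uptime percentage
# $1, and the resulting yearly cost at $2 dollars per hour of downtime.
downtime_estimate() {
    awk -v up="$1" -v rate="$2" 'BEGIN {
        hours = (100 - up) / 100 * 365 * 24
        printf "%.2f hours/year, $%.2f/year\n", hours, hours * rate
    }'
}
```

For instance, downtime_estimate 99.9 500 works out to about 8.76 hours of downtime per year, or roughly $4,380 at a hypothetical $500 per hour.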

The cheapest hosting is to purchase an account on a shared server.  Your domain is one of perhaps hundreds or even thousands that vie for the server CPU, memory, and bandwidth.  If your site is slow, it may be difficult or even impossible to diagnose why since the fault may be with another domain on the same server.

The next level up is a virtual private server (VPS).  In reality you are still sharing the server with other customers, but there are separations between these relatively-independent operating systems so they affect each other less if problems on one arise.  The term “cloud computing” is really just another name for using virtual private servers, although often the cloud computing control panels make it easy and fast to add and remove VPS units as your domain needs change.

If you want the whole server to yourself, then you can host on a dedicated server.  This is all about control: there are no other customers to contend with if you are the only one using the server.  Note that you may need an experienced system administrator to help if you are setting up your own dedicated server.

If your domain outgrows a dedicated server, then you have graduated to a cluster solution.  You will have new challenges regarding sharing session management and your database between multiple servers.  It should also be mentioned that cloud computing providers support clustering with their VPS machines, which is cheaper than a custom-built clustered solution.

At Gorges, we offer shared-server and dedicated-server hosting solutions to our software development clients.  We have two co-location facilities that we use in Ithaca, New York, and our servers are monitored constantly.  Since we do our own hosting, we can add software packages or customize the server configuration as-needed for our clients.

Securing Linux Web Servers

Monday, July 20th, 2009

We are often asked by our software development and hosting customers how we secure our servers.  We have several layers of security protection, and this blog posting will mention some that we implement.

A firewall allows traffic from the outside world on only a few of the TCP/UDP ports.  We obviously have to allow web and e-mail users access to the server, but almost all other ports can be closed to prevent intrusion attempts.  On our newest servers we even prevent FTP and Telnet access, since those protocols rely on unencrypted packets which are easier to intercept and hijack.
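
A default-deny policy of this kind can be sketched with iptables. The function below only prints the commands so they can be reviewed before anyone runs them as root. The name print_firewall_rules and the port list in the example are assumptions, not our actual rule set.

```shell
# Print (do not run) iptables commands that accept new inbound TCP
# connections only on the listed ports and drop everything else.
print_firewall_rules() {
    # Keep loopback traffic and replies to existing connections working.
    echo "iptables -A INPUT -i lo -j ACCEPT"
    echo "iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT"
    for port in "$@"; do
        echo "iptables -A INPUT -p tcp --dport $port -j ACCEPT"
    done
    # Anything not accepted above is dropped.
    echo "iptables -P INPUT DROP"
}
```

For example, print_firewall_rules 22 25 80 443 lists rules for SSH, SMTP, HTTP, and HTTPS while leaving FTP (21) and Telnet (23) closed.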

Every day we have perhaps dozens of “dictionary” attacks that try to gain e-mail or user account access.  A dictionary attack picks a user (for example “root” or “john”) and then goes through a long, long list of possible passwords.  We use two packages, Fail2Ban and DenyHosts, that monitor our log files looking for dictionary attacks; if found, the originating computer is banned from accessing our servers.

When we develop online shopping solutions, we choose to not store credit card numbers online.  We securely pass this information to the credit card processing vendor, and then we only record the order information and the payment confirmation number.  For some web sites with user accounts, we encrypt the user account passwords; therefore, someone gaining access to our user password list would still not gain access to the users’ online accounts.

Some of our hosting customers are concerned about unencrypted web traffic.  We occasionally add a feature that automatically forwards a web page request from non-SSL to SSL mode: the browser is forwarded to a page starting with “https://”, so all traffic between our server and each web browser client is encrypted.

We also have logging records and constant monitoring to help us detect intrusion attempts and help us implement even better security measures.  “Tripwire” software can also alert us when certain files are modified.
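
The core idea behind file-modification alerts is to record checksums once and compare against them later. The sketch below is a simplified illustration using sha256sum from GNU coreutils, not a description of how Tripwire itself works. The function names are hypothetical, and the baseline file should be stored outside the monitored directory.

```shell
# Record a checksum for every regular file under directory $1 into
# baseline file $2.
make_baseline() {
    find "$1" -type f -exec sha256sum {} + | sort -k 2 > "$2"
}

# Print the files that no longer match baseline file $1. Returns
# non-zero if any file changed or disappeared.
check_baseline() {
    sha256sum --quiet -c "$1" 2>/dev/null
}
```

A nightly job could run check_baseline and mail any output to the administrator. An attacker who can rewrite the baseline defeats the check, which is one reason dedicated tools protect their databases.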

Do these basic measures above make us impervious to hackers?  Alas, no.  On two occasions in the last five years we have had hackers penetrate one of our servers.  However, no damage was done and we patched those specific holes quickly.  Security is a cat-and-mouse game, and we strive to stay one step ahead.

Server Control Panel Comparison

Friday, May 22nd, 2009

Control panels are used to configure and maintain servers by both system administrators and users.  Years ago all the server settings for web, e-mail, name service, and other packages were done manually by system administrators.  Nowadays there are powerful control panel packages that configure the settings based on relatively simple interfaces.

We have experimented with several control panels for our Gorges servers.  This blog post is not intended to be an exhaustive list of all control panel software, but rather a summary of our own experiences.

PLESK and CPANEL

These two solutions are terrific, and they present the obscure settings needed to control web/e-mail/etc. packages in a meaningful way.  The biggest drawback is the price.  At Gorges we run our servers “lean” with relatively few customers per server; this maximizes the web page performance.  Adding a commercial control panel to our servers would be costly and our hosting fees would have to rise perhaps unacceptably.

We do not host all the web applications we develop, and we often work with Plesk and CPanel on customer-supplied servers as well as one of our own.  These packages work, but for hosting companies to justify their cost they either overload the servers with domains or use virtual machines to squeeze more clients onto each server box.  You get what you pay for – and it can be truly frustrating when your domain is hosted on a server that has other customer domains saturating the bandwidth and processors.

WEBMIN

Webmin is perhaps the simplest of control panels, and basically just adds web page interfaces for packages.  We used this for a while, but the settings were so low-level that one had to be a system administrator to understand the screens, so the improvement was only marginal since most savvy sys-admins know the text interface already.  The companion package Usermin was perhaps more useful to the customer since it is for configuring e-mail accounts.

VHCS2

We have VHCS2 installed on most of our production servers.  This decision was made almost five years ago, and it took months to both learn all the nuances of how it works and to develop some custom solutions for important-but-missing features such as name service records and backups.  Although we liked VHCS2 at the time, work on this open-source package has apparently stopped, so it is stuck in time while better control panel software has surpassed the supported VHCS2 features.

ISPCONFIG

When we purchased several 64-bit quad-core servers in late 2007, we reviewed available control panel solutions.  The package ISPConfig was selected since it appeared much better than other control packages and was under an open source license (i.e. free for us to install and use).

ISPConfig is not without problems, but we have extended this control panel solution with custom patches for grey-listing and spam filtering, propagating domain name service (DNS) records to our production name servers, and integrating it into our system-wide backup.

SUMMARY

Perhaps the biggest drawback of all control panel solutions is that it is not easy to bypass the panels and do custom configurations for special-needs clients.  It’s pretty obvious that labor costs much more than hardware or bandwidth nowadays, so automating as much of the account setup and maintenance as possible is the key to staying profitable.

As for us, we’ll keep using ISPConfig and passing the cost savings for hosting back to our customers.

Server Backups With Minimal Bandwidth

Friday, April 10th, 2009

Most of our production servers are at our main co-location facility.  All these servers have second ethernet cards so monitoring and between-server file transfers will not interfere with normal web and e-mail traffic.  This “shadow network” is used for our nightly backups.

One server has a lot of disk capacity and is the backup device.  For years we did a monthly/weekly/daily archive of each server over the shadow network.

However, we expanded to a second co-location facility, and we also run a “grid” machine at a third site.  Backups then started eating a lot of bandwidth, and the monthly backups were not completing during the overnight hours.

So what to do?  Web sites consist mostly of files that do not often change, so really most of the files do not have to be backed up daily.

Our solution was to have our backup system mirror the other servers’ files, and then at night only copy over the changed files.  We used the rsync utility to determine and copy the changed files.  This solution also means less processing on the production servers.

Once the files are synchronized on the backup server, a daily archive is created there that compresses the files into a single archive, which is then stored away.

There is also a filtering done on the files so that we do not back up temporary files or non-critical system files.
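
The mirror, filter, and archive steps described above can be sketched as follows. This is a minimal sketch, not our production script. The function name mirror_and_archive, the exclude patterns, and the archive naming are assumptions.

```shell
# Mirror directory $1 into $2, copying only new or changed files and
# skipping temporary files, then pack the mirror into a dated archive
# under directory $3.
mirror_and_archive() {
    src=$1
    mirror=$2
    archives=$3
    mkdir -p "$mirror" "$archives" || return 1
    # -a keeps permissions and timestamps; --delete removes files from
    # the mirror that no longer exist on the source.
    rsync -a --delete \
        --exclude='*.tmp' --exclude='cache/' \
        "$src/" "$mirror/" || return 1
    # The archive is built from the local mirror, so the production
    # server does none of the compression work.
    tar -czf "$archives/backup-$(date +%Y-%m-%d).tar.gz" -C "$mirror" .
}
```

The source can be a remote path in rsync's user@host:/path form, for example mirror_and_archive backup@web1:/var/www /backup/mirror/web1 /backup/archive/web1, where the host and directories are hypothetical.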

The end result is that we use our bandwidth sparingly.  We have backup archives without saturating our Internet connections to get them offsite.

©2012 GORGES - All rights reserved