Wednesday, September 24, 2008

LPT730 Lab #3 - Part 2, The Robot Exclusion Standard

The Robot Exclusion Standard is a voluntary standard by which web spiders and other automated downloading programs can avoid downloading content that's otherwise publicly available. The need for such a standard came about because search engines and other legitimate robot users attempted to download inappropriate content such as a cgi-bin directory containing programming code inappropriate for a search query. While this standard is voluntary, it's a good example of an imperfect solution on the Internet.

Given the current nature of how the Internet communicates, it's highly impractical if not impossible to hide content away from a subset of visitors to your web site. It would take nothing short of a redesign of basic protocols such as HTTP to make this happen. So a cooperative state has evolved where a website author creates a file on the site called robots.txt telling web-crawlers and other robots where they are and are not welcome. Here's an example of a robots.txt file that asks all robots to refrain from downloading any files from the entire web site:
# Tells Scanning Robots Where They Are And Are Not Welcome
# User-agent: can also specify by name; "*" is for everyone
# Disallow: if this matches first part of requested path, forget it
User-agent: * # applies to all robots
Disallow: / # disallow indexing of all pages
Here's an example that asks crawlers to avoid downloading CGI code:
User-agent: *
Disallow: /cgi-bin/
Disallow: /Ads/banner.cgi
Will doing this prevent all unwanted downloading? No. As it's a voluntary standard, some unscrupulous people will download whatever parts of your site they wish to. However, it still makes sense to create the exclusion file because the majority of users will obey it and thus a website owner can save significant money in download bandwidth and headaches by having a website that's under an appropriate load. For more information about the robot exclusion standard, try this FAQ or the references below.
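To make the "voluntary" part concrete, here's a minimal Bash sketch of the check a polite robot might run before fetching a page. The function name is_allowed is my own invention, and the sketch is deliberately simplified: it assumes only "User-agent: *" groups matter, treats each Disallow value as a plain path prefix, and ignores named agents and any extensions to the standard.

```bash
#!/bin/bash
# is_allowed ROBOTS_FILE PATH   (hypothetical helper, not part of any tool)
# Returns 0 if PATH may be fetched, 1 if a "Disallow:" rule in a
# "User-agent: *" group matches the start of PATH.
is_allowed() {
    local robots_file=$1 path=$2
    local line field value in_star_group=0

    while IFS= read -r line || [ -n "$line" ]; do
        line=${line%%#*}                    # drop comments
        [[ $line == *:* ]] || continue      # skip blank or malformed lines
        field=${line%%:*}
        value=${line#*:}
        field=${field//[[:space:]]/}
        # trim leading and trailing whitespace from the value
        value=${value#"${value%%[![:space:]]*}"}
        value=${value%"${value##*[![:space:]]}"}

        case $field in
            [Uu]ser-[Aa]gent)
                # Simplification: only the most recent User-agent line counts
                if [ "$value" = "*" ]; then in_star_group=1; else in_star_group=0; fi
                ;;
            [Dd]isallow)
                # An empty Disallow value means "nothing is disallowed"
                if [ "$in_star_group" -eq 1 ] && [ -n "$value" ] &&
                   [[ $path == "$value"* ]]; then
                    return 1
                fi
                ;;
        esac
    done < "$robots_file"
    return 0
}
```

With the second example file above saved as robots.txt, "is_allowed robots.txt /cgi-bin/test.pl" fails while "is_allowed robots.txt /index.html" succeeds. Nothing forces a robot to run a check like this, which is exactly the weakness of the standard.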

References
--
The Web Robots Page - http://www.robotstxt.org/
Wikipedia - http://en.wikipedia.org/wiki/Robots.txt
Web Developer's Virtual Library - http://www.wdvl.com/Location/Search/Robots.html

Tuesday, September 23, 2008

LPT730 Lab #3 - Part 1, Phishing

The term "phishing" describes the act of trying to fraudulently acquire private and sensitive information from someone for criminal purposes by pretending to be a legitimate entity. A common example of phishing is an email that looks as if it came from your bank, informing you that your bank card has been accessed in some faraway country and that you could be out some money. It's a common tactic for the message to try to prompt a strong emotional reaction (e.g., panic, fear or greed) from a potential victim. The message then points you to a link that, when clicked, displays a web page asking you for your card number and PIN in order to verify your card's activity. But both the email message and the web page are fraudulent. They're designed to look exactly as if they've come from the actual bank. If you enter the information, it won't be long before your account is empty.

Early phishing attempts of this type could be detected by moving the mouse cursor over the link in the email message and looking at the status bar. If the web address displayed wasn't the bank's, you knew you were being lied to. But because today's email messages can have embedded JavaScript (programming code) that alters a browser's status bar, it can be almost impossible to detect a phishing attempt. Phishing doesn't have to occur on your computer. You could just as easily get a voice message from someone claiming to be your bank, leaving a number to call back, and because they use a voice-over-IP (VoIP) phone number and false caller ID information they could appear to be legitimate.
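The "look at the real address" check can be sketched in a few lines of Bash. This is only an illustration under some loud assumptions: the function names url_host and link_matches_host are made up, mybank.example is a placeholder domain, the comparison is case-sensitive, and IPv6 addresses and other URL oddities are ignored. It also does nothing about a status bar that has been tampered with; it only shows what "the address isn't the bank's" means.

```bash
#!/bin/bash
# url_host URL   (hypothetical helper)
# Prints the host part of a URL, without scheme, user info, port or path.
url_host() {
    local rest=${1#*://}      # drop the scheme, if any
    rest=${rest%%[/?#]*}      # drop path, query string and fragment
    rest=${rest##*@}          # drop a misleading "user@" prefix
    rest=${rest%%:*}          # drop the port
    echo "$rest"
}

# link_matches_host URL EXPECTED_HOST   (hypothetical helper)
# Succeeds only if the link's host is EXPECTED_HOST or a subdomain of it.
link_matches_host() {
    local host
    host=$(url_host "$1")
    [[ $host == "$2" || $host == *".$2" ]]
}
```

For example, "link_matches_host http://www.mybank.example.evil.example/login mybank.example" fails, because the host really ends in evil.example even though the bank's name appears at the front of it.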

Some Tips to Help You Avoid Phishing Attacks
  • Don't click on links in an email to go to a website. Use your bookmarks or type a trusted address into your browser's location bar.
  • Don't call the phone numbers that come in emails. Use a number from your paper statement or from the company web site.
  • Update your web browser. Microsoft Internet Explorer 7 and Mozilla Firefox 2 or later contain anti-phishing features. These are the oldest versions you should be using.
For a more complete list of tips try here.

References
---
Anti-phishing working group - http://www.antiphishing.org/
The Phishing Guide - http://www.technicalinfo.net/papers/Phishing.html
Wikipedia - http://en.wikipedia.org/wiki/Phishing
RCMP - http://www.rcmp-grc.gc.ca/scams/phishing_e.htm
Reporting Economic Crime On-Line - http://www.recol.ca/

Monday, September 22, 2008

SPR720 - BASH Scripting Lab

BASH (Bourne Again SHell) is a command processor for POSIX operating systems such as Linux. Not only does it provide a command line interface, it provides scripting support by letting me place commands into a text file for later execution. BASH and its scripts give access to redirection, pipes, regular expressions and the panoply of Linux command line utilities. This is indeed a very powerful set of tools. As the possibilities of the command line open up to me I'm finding very little I can't find out about or get done inside the shell. But when I try to write scripts I find the language to be non-intuitive and just plain difficult. While there are many things that make BASH difficult for me, two stand out:
  • Its variables are untyped or (for practical purposes) just one type - string. In order to use numeric values stored as strings, there are some cryptic work-arounds.
  • 0=TRUE. In every other programming language I've used, it's the opposite. It reminds me of a scene from a movie called "The Gods Must Be Crazy" where in an African tribe, people nodded their heads to indicate "no" and shook their heads to mean "yes". At least once a script I get tripped up by this. I know it's because the Linux exit status for "job well done" is 0, but it's still a drag.
Sometimes the difficulty is compounded when both of these "gotchas" come together in forming conditionals or when evaluating strings and numbers inside the same complex expression. It's enough to make me chew the inside of my cheek. But thankfully the Internet came to my rescue, again. I found some great resources that made the process of scripting bearable, if not enjoyable. I hate to admit it, but without these resources I would not have finished my lab. So my thanks to their creators and contributors.
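For anyone else tripping over the same two gotchas, here's a small sketch of how they look in practice. The function names are invented for this example and aren't from the lab.

```bash
#!/bin/bash
# Gotcha 1: variables hold strings, so arithmetic needs $(( )) or a
# numeric test operator such as -lt.
add_numbers() {
    local a=$1 b=$2
    echo $(( a + b ))     # arithmetic expansion treats the strings as integers
}

# The same two values compared as strings and then as numbers.
compare_both_ways() {
    local a=$1 b=$2
    if [[ $a < $b ]]; then
        echo "string: $a sorts before $b"
    else
        echo "string: $a does not sort before $b"
    fi
    if [ "$a" -lt "$b" ]; then
        echo "number: $a is less than $b"
    else
        echo "number: $a is not less than $b"
    fi
}

# Gotcha 2: an exit status of 0 means success, so "if" treats 0 as true.
is_even() {
    return $(( $1 % 2 ))  # status 0 ("true") when even, non-zero when odd
}
```

Running "compare_both_ways 10 9" reports that 10 sorts before 9 as a string but is not less than 9 as a number, and "if is_even 4; then echo yes; fi" prints yes even though the function returned 0.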

Here's a sample script I did for the lab:

#!/bin/bash
#
# lsname.bash
# This script prints out the longest and shortest names in /etc/passwd
#

lname=""    # longest name seen so far
sname=""    # shortest name seen so far (empty until the first line is read)

# Lowercase names avoid clobbering the shell's own USER, HOME and SHELL.
while IFS=: read -r user pass userid gid gecos home shell
do
    if test ${#user} -gt ${#lname}
    then
        lname=$user
    fi

    # The first name read is always the shortest so far.
    if test -z "$sname" || test ${#user} -lt ${#sname}
    then
        sname=$user
    fi
done < /etc/passwd

echo "Longest name in /etc/passwd is $lname - ${#lname} characters"
echo "Shortest name in /etc/passwd is $sname - ${#sname} characters"


---
Bash Resources
http://tldp.org/LDP/abs/html/index.html
http://wooledge.org:8000/BashFAQ
http://www.tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html

Btw, this post is not meant to be flame bait. I just needed to vent a little.

Wednesday, September 10, 2008

LPT730 Lab #1 - Part 2 - Bill C-61

What is Bill C-61?
Bill C-61 is an amendment to Canada's Copyright Act intended to bring Canadian law into accord with treaty obligations agreed to when Canada joined the World Intellectual Property Organization (WIPO). If you go here you can get a good list of specific "dos", "don'ts" and potential fines. In simple terms, the argument for the bill is that content creators deserve to be fairly compensated for their work and that society should work against consumers who use the Internet and other technologies to freely share copyrighted content. The argument against states that this law's penalties are excessively punitive and would give the content creator the right to charge the consumer again and again for services that add no real value, such as format shifting (e.g., music from a CD to an MP3 on a digital player). To avoid confusion, the bill does allow format shifting for any content without digital rights management (DRM), but if any DRM exists it's illegal to attempt to circumvent it. So in many cases format shifting would be illegal (e.g., copying a movie from a DVD and converting it into a smaller MP4 to watch on a notebook computer while traveling somewhere).

What is with Bill C-61?
Bill C-61 was about to become law but Canada's parliament was recently dissolved because an election was called. This happens to many bills that are excessively controversial. For those trying to get or remain elected this type of issue is a no-win situation. Because there are far more consumers of media than there are producers, it can only lose votes for a politician. Whatever side you come down on, I think it's safe to say that our next government will show equal skill in procrastinating on this issue.

References
C-61 - Text of the act
Wikipedia summary of bill C-61
Michael Geist's Fair Copyright For Canada blog

SYA710 Lab #0 - Valt-X controllers

A Valt-X Desktop/Server Security Controller is a PCI card that stores a pristine system image on a portion of your hard drive that is never accessible from your operating system. Each time the system boots up, the image is copied to the partition of the drive that performs the boot, thus the computer runs the same OS image all the time. You can also have partitions that are left alone for data storage. In other words, this card lets you automatically restore a system to its original, "IT-blessed" state just by re-booting. To test this I booted into Fedora 8 on my test machine and proceeded to use fdisk to wipe out the disk's partition table. Before I could begin to hyperventilate, I rebooted the system and it was as if nothing had happened. There are several system administration scenarios where this kind of insurance would be very valuable, such as after:
  • a virus or malware infection
  • a user installs software against company policy
  • accidental file deletion(s)
While there are comparable solutions that don't involve adding a card to your system and taking up some of your hard disk space, they usually involve more complex administration policies. Be warned though that if you don't have experience in building a good base system image, this card can work against you. If you're just a regular user, you can easily put an infected program onto your boot image and reactivate it every time you re-boot your system.

All in all though, this card is a valuable tool to guard against the malicious forces of the Internet and your users' own mistakes. That translates into fewer headaches and late nights for the IT staff in your organization.

Other products that make similar claims
PC Guarder : Hard Disk data recovery card
Creative IT UK data recovery card
Lenten Technology recovery card

SPR720 Command Lab - Examining core Linux directories and commands

I've spent some time in the last week examining the /bin, /sbin, /usr/bin and /usr/sbin directories on my Ubuntu Linux box and my first conclusion is that a week is not nearly enough time to gain a clear understanding of their contents. Thus I turned to the Internet to get help on the subject and found the following links helpful:

http://www.tuxfiles.org/linuxhelp/linuxdir.html
http://doc.vic.computerbank.org.au/tutorials/linuxdirectorystructure/
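The file counts quoted for each directory below will differ from one system to the next. Here's a minimal sketch of one way to reproduce them. The function name count_entries is my own, and note that it counts everything directly inside the directory (regular files, symbolic links, subdirectories and hidden entries), so the numbers may not exactly match what ls shows.

```bash
#!/bin/bash
# count_entries DIR   (hypothetical helper)
# Prints the number of entries directly inside DIR.
count_entries() {
    local dir=$1
    find "$dir" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' '
}

# Print a count for each of the four directories that exists here.
for dir in /bin /sbin /usr/bin /usr/sbin; do
    if [ -d "$dir" ]; then
        echo "$dir - $(count_entries "$dir") files"
    fi
done
```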

/bin - 106 files
The /bin directory contains programs that provide basic text and file manipulation along with basic system information. These programs are so basic that they are used by both end user and system administrator alike. Examples of programs in this directory include ls, pwd, rm, cp, bash, su, mkdir, rmdir and touch. One thing I notice is that Gnome and KDE (among other graphical shells) are evolving to the point where even these basic programs will not be needed by casual users.

/sbin - 153 files
The /sbin directory contains the most basic and most critical system configuration programs. Anyone who's had to struggle with installing Linux will have had to learn some of these commands. They include mkfs, fsck, parted, hdparm for disk manipulation; ifconfig, ifup, ifdown, iptables, route and others for network management; sysctl, lsmod and modprobe for kernel information and control.

/usr/sbin - 236 files
This directory contains many essential tools for system administration. While they are not essential for getting a system up and running, they are essential for getting it set up properly. User and group manipulation commands include: adduser, addgroup, deluser, delgroup. It also contains the system service programs called daemons, including: cupsd, cron, ntpd, tcpd.

/usr/bin - 1279 files
This directory contains end-user applications and numerous tools and utilities. To start with, just about every application one can invoke from the Gnome menus can be invoked from the command line in this directory. Many supplemental programs related to the "menu-available" applications but which are not available from the menus are in here too. Finally there are many small but powerful and useful utilities. Here's a small list of some of the programs I found interesting:

Sound Control - alsa, alsamixer, amixer, amidi, pacmd, pactl, paplay

Software Installation - apt-get, aptitude

Batch Command Execution - at, batch

System Settings / System Information - charset, charmap, free, getconf, getkeycodes, locale, lspci, peekfd, pmap, w

CD/DVD Creation - cdrdao, cdrecord

Miscellaneous Tools
bc - a powerful calculator
expand - converts tabs to spaces
find - find files
file - tries to determine a file type
fmt - text formatter
fold - text wrapper
head - display the top of a file
tail - display the end of a file
join - joins lines of two files on a common field
ftp, lftp, sftp - file transfer programs
lp - prints files
man - displays system documentation
nice - run a program with modified scheduling priority
passwd - change user password
paste - merge lines of files together
pg - browse pagewise through text files
sort - sorts text files
split - splits a file into pieces
ssh - securely login to another system over the network
watch - runs a program periodically and displays its output
xxd - make a hex dump of a file
yelp - a graphical help browser that can display and navigate info files


Some final notes:

There are many symbolic links in the directories. These serve to give access to the same program under many different names. The reason for doing this is to preserve compatibility between current and older versions of Linux. It also provides some compatibility between the Ubuntu/Debian flavour of Linux and other Linuces and Unices.
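To see which names in a directory are really symbolic links and what they point to, something like the following sketch works. The function name list_symlinks is made up for this example; it only looks at entries directly inside the directory and skips hidden ones.

```bash
#!/bin/bash
# list_symlinks DIR   (hypothetical helper)
# Prints "name -> target" for every symbolic link directly inside DIR.
list_symlinks() {
    local dir=$1 entry
    for entry in "$dir"/*; do
        if [ -L "$entry" ]; then
            echo "${entry##*/} -> $(readlink "$entry")"
        fi
    done
}
```

Calling "list_symlinks /usr/bin" prints one line per link, which makes it easy to spot several names leading to the same program.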

The above lists are not meant to be comprehensive. They represent a snapshot of what I now know and am interested in. Please bear this in mind.

Tuesday, September 9, 2008

LPT730 Lab #1 - Part 1 - Software Patents

Software patents are confusing. A patent is basically a trade between an inventor and the government. The inventor discloses to the government exactly how the invention works and if it is unique, the government grants the inventor a monopoly to manufacture, sell and export the invention for a number of years (usually 20). Patents are designed to promote innovation by allowing the inventor to offset the high cost of inventing something with a guaranteed number of years to recoup costs and return a reasonable profit. While this system has worked very well for many industries, it has proven problematic when applied to software.

The first and most basic problem is that the notion of locking up an innovation for 20 years makes no sense in the software world. Software evolves too quickly to be tied down for 20 years, making many patents useless. An example of this is the patent granted to Unisys for the LZW compression used in GIF graphic image files. Although Unisys granted royalty-free licenses to many groups while the patent was in force (the US patent expired in 2003), it did no good. Unisys was universally hated for enforcing the patent and the software industry moved on to a patent-free technology (PNG).

A second problem is that patent offices have a very inconsistent record in evaluating what software is patentable. A good example is the patent granted to Amazon.com regarding its 1-click technology. This technology is so technically obvious that granting one company exclusivity is clearly unfair to its competitors and indeed to society at large. But the patent was successfully enforced against an Amazon competitor (Barnes and Noble). This success has given rise to a third problem - patent trolls.

Patent trolls are unscrupulous companies who seek out and acquire and/or enforce patents solely on the basis of potential value through litigation. A recent example is a company called NTP which sued Research In Motion (RIM) and Palm Inc. (among others) claiming that its patent on mobile email was being violated, despite systems that existed in the public domain before NTP received its patents. The result was a big payday for NTP and wasted money, wasted time and reduced innovation at RIM. Just as in the Unisys case, the goal of promoting innovation was hurt by the patent - exactly the opposite of what was intended.

Another problem with patents is that they are not universally applied around the world. Each nation has its own set of laws regarding what can be patented and how long a patent will last. For example, the European Community does not allow patents for software while the United States has very broad guidelines regarding what software can be patented.

While there's been much discussion on how to change the patent system to deal more fairly with software, little concrete action has come about in North America. Until changes are made, this jurisdiction will have to endure the extra hassle and inefficiency that comes from regulating obvious innovations.

References
An article from LawMart.com about the differences between copyright, trademark and a patent
Wikipedia Article on Software Patents
Wikipedia Article listing notable Software Patents

Sunday, September 7, 2008

LPT730 Lab #0 - Part 2 - Two Pieces of Software I Regularly Use

MythTV is a homebrew personal video recorder (PVR) that lets your Linux computer greatly enhance your TV viewing experience. While writing this, I'm watching a movie called Adam's Rib starring Spencer Tracy and Katharine Hepburn. I recorded this movie on August 31 knowing I wouldn't have the time to watch it until much later. This kind of time-shifting hasn't been new since the VCR was invented, but for my Myth box this is just the tip of the iceberg. While watching this movie I'm also recording the Belgian Grand Prix and flagging the commercials on another recording made last night. I can also:
  • pause, rewind, play in slow-motion (both recordings and live-TV)
  • skip forward to pass commercials or boring bits
  • skip back to review something I didn't get the first time
  • put multiple TV tuner cards in my computer (currently I have 3) and record many channels at once
  • burn what I record onto a DVD
  • play and archive my DVDs
just to name a few. For a more complete list click here. It's hard to convey how much better TV is when not encumbered by commercials and the strict constraints of a broadcast schedule. I really notice the difference when I visit a friend with regular TV and sit there biding my time, silently reciting some mantra during the commercials or stifling my annoyance at the fact that I just missed some important detail and can't rewind. Such rank inflexibility could only be designed by the advertising community and I see the PVR as a natural and sane response by the viewing public. At this point I would most likely stop watching TV if no PVR were available. Here's a screen capture showing some of my media library.


iTunes on Windows is one of the last proprietary applications I still use. While I've tried gtkpod and YamiPod, I've found them both more hassle than they're worth. You can say what you like about Apple's proprietary business model and expensive hardware, but they design software that satisfies my needs and requires virtually no training or learning curve. Here's a snapshot of my music library in iTunes.


I surmise that one day a team will put together a Linux distribution with this much ease of use as a primary goal. That would be a good thing.

Wednesday, September 3, 2008

LPT730 Lab #0 - Part 1 - A Bit About Myself

I've a degree in geography and skills in developing and using Geographic Information Systems. I'm self taught in some older technologies such as xBase and VB (up to version 6 / pre .Net) and Microsoft Access. My interest in computers and open source is both practical and philosophical.

Computers are likely the most flexible tools for managing information ever invented. I think back to the days before the Internet, Google and Wikipedia (to name a few) and I remember how so many arguments were won by the person who yelled the loudest and longest. Life is saner today but by no means perfect. While these new technologies ease the aggregation and communication of knowledge, they also bring about a new set of problems - information overload for example. Today it's easy to get sidetracked by:
  • superfluous details (too much information)
  • lies and half-truths (false information, misinformation)
  • errors (incorrect information)
  • obsolete information
I find myself constantly developing new techniques and learning skills in order to better navigate around these ever-present distractions. But would I go back to the way it was? No way. From poker to recipes to personal finance to finding the closest beer store that's still open, the Internet has greatly contributed to my knowledge and appreciation of the world.

I like to use open source software (OSS) when I can because it generally does a better job helping me get my work done and almost always for a much lower cost. But an equally important reason is greater freedom of choice. OSS file formats are better documented, letting me more easily migrate my data to another software tool should I need to. OSS gives me access to the source code, so many small changes can be made with my own resources. I may not be able to fix the code myself but I can hire someone to fix a bug or add a small feature and thus my problem gets solved. I have no such recourse with proprietary software. With OSS, no vendor is forcing me to "upgrade" to a new version with "features" that I don't need just so I get a security fix and the vendor can meet their quarterly sales quota. OSS isn't perfect, but I find it's constantly improving and I've never had cause to regret my choice to use it.

Blogging For School

First off, I've never had the desire to blog nor the need to until now. But I've just started the LUX course at Seneca College in Toronto and one requirement is to submit much of my homework through a blog. Other new skills I need to get a handle on quickly are using IRC and publishing on a wiki. None of these are really intellectual back-breakers but it's taxing to learn them all at once. Such are the travails of this middle-aged brain. The name of the program is "Linux/Unix Systems Administration" but I think it might be more accurately entitled "Linux/Unix Systems Administration and Open Source methods of collaboration and productivity". In spite of this curve ball, I'm excited to be here. Rather than lament the fact that there's extra stuff to learn, I choose to buy into the idea that I'm getting two courses for the price of one. If I can handle the work load I'll come out with a really nice set of skills. Here's to thriving in the world of Linux and Open Source.