IT-Consultant: Charles Pegge > Software Design and Development Issues

Protection Considerations


Donald Darden:
Since we are talking about customer data, we might want to turn 90 degrees at this point and think about the manner in which customer records are built up.  I've mentioned databases several times, but only portions of the customer data may actually work well in typical data structures.

Every list represents a type of data structure, usually with an x-y or row-column referencing system.  A phone list, a mailing list, a book list, or a list of expenses are all examples.  These are familiar concepts and translate well into arrays with one or two indexes.  In Excel or another spreadsheet, you frequently break a record up into a number of cells, each holding a separate field.  The alignment is commonly on vertical columns, and we have headers to identify the contents of the individual cells below that point.  You might have header columns named Last Name, First Name, MI, Phone No, Address, City, ST, ZIP, DOB, and so on.  It's a pretty good way to consolidate similar information and categorize it.

But suppose you wanted to expand your basic list to handle customer accounts.  You might want to add business address, shipping address, credit lender, credit card number, expiration code, account number, items bought, items paid for, items shipped, when shipped, returns, issued RMAs, account balance, and so on.

The trouble in some of these cases is that the data is neither static, nor is it necessarily on a one-for-one basis.  The customer may have a long-running account, may have moved a number of times, specified different shipping addresses for different orders, used different credit cards on different purchases, and had multiple transactions, and each transaction involves a number of items, a quantity of each item, weight and shipping costs, different carriers or shipping methods, and on and on.

Giving up on the spreadsheet approach, you may decide to try a TYPE structure and use a whole range of field elements, trying to work out the maximum size of each field, and all the possible fields that the client would ever use for his customer accounts.  Now there are several things wrong with this approach.  First, you are asking the client for the very last word on how many fields, the absolute maximum on the length of every field, and other qualifiers that the client can't possibly know.  Even if you explore all his existing records, the maximums you come up with would still prove to be inadequate at some point in the future.  And his business might change, which would necessitate a lot of redesign work, code revision, and database rework.

More likely, you may decide that one type structure would be inadequate to represent everything.  So you dream up multiple types that each bear part of the information.  You may decide that you will have type structures for the individual, for the account, for each order, for each transaction, another for each shipment, and so on.  You hope to use some common linking mechanism, possibly based on the account number, to pull everything together.

Alright, so let's say you somehow succeed in outlining an approach that looks like it should work and do what you want it to do.  But then the client says he wants to be able to look at a customer's whole history of purchases and prior transactions going back years.  You realize that your model has only taken into account the current and last known information: the data in a flat model would be overwritten by updates and changes, even by new orders.

In looking about for a better model, you might realize that a log file or type of diary forms a written record that reflects changes made and when, and can even indicate who was responsible.  Let's just call this a journal, although the same name is used with a slightly different purpose in some other applications.  You could also call it a ledger if you prefer.

The idea of a journal or ledger is that once written, it is supposed to be inalterable.  (I use inalterable rather than unalterable to indicate that this is a discipline we impose on ourselves rather than an actual physical restraint.)  You merely note any changes through additional entries.  Thus you can begin at the beginning and work to the present, or start from the present and work back to the beginning, or review the state of the account at any point.  You can also review any transactions that had happened up to that point as well.
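As a minimal sketch of that idea, assuming an in-memory Python list stands in for the journal file and each entry is simply a dictionary of the fields that changed (the names `append_entry` and `state_at` are hypothetical, not from the original post), replaying the entries up to a chosen point reconstructs the account as it stood at that moment:

```python
from typing import Dict, List, Tuple

# One journal entry: (sequence number, who made the change, fields changed).
Entry = Tuple[int, str, Dict[str, str]]


def append_entry(journal: List[Entry], who: str, changes: Dict[str, str]) -> None:
    """Append a new entry. Existing entries are never modified."""
    journal.append((len(journal) + 1, who, dict(changes)))


def state_at(journal: List[Entry], upto: int) -> Dict[str, str]:
    """Rebuild the account as it stood after entry number `upto`
    by replaying the entries from the beginning."""
    state: Dict[str, str] = {}
    for seq, _who, changes in journal:
        if seq > upto:
            break
        state.update(changes)  # later entries override earlier values
    return state


journal: List[Entry] = []
append_entry(journal, "clerk1", {"name": "Taylor, John Richard", "city": "Middlesex"})
append_entry(journal, "clerk2", {"city": "Santa Fe"})
append_entry(journal, "web", {"credit#2": "on file"})

print(state_at(journal, 1))  # the account as originally opened
print(state_at(journal, 3))  # current state, including the new credit#2 field
```

Because nothing is overwritten, the `who` field keeps a record of responsibility for each change, and a field such as `credit#2` simply appears from the first entry that introduces it.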

The question is then, if you finally elect to rely primarily on a journal or ledger approach, how do you make this work?

Think of this as though it were a movie film.  The movie film resides in a series of containers, each representing about 15 minutes of playing time when loaded into a projector.  But you don't see the whole 15 minutes at once; you have to wait until a portion of the film is framed in front of a light and focused through a lens in order to be recognized.  If the film has to be stopped, you have a marker to where you are in watching the film, and you have the choice to continue from there or begin again at the beginning.  With some projectors, you can even play it backwards if you choose.

Let's begin with the idea of creating a single frame that will become a part of that long film.  The frame would be representative of any form data that we have collected, which might be typified by the use of some type structures.  We might actually begin our process by describing the forms or the type structures we are going to use, just in case they change later.  Or we might arbitrarily just begin with linked data fields that are simply strings with associative names.  And it could be some combination as well.

Here is an example:  We imagine the user wants to begin with a new account, so we provide that as a menu option.  If that option is picked, we want a minimum set of questions answered and verified so that we can assign a new account number and begin tracking whatever might follow.  So the user indicates that all the data fields are complete, and we want to first validate, or have validated, the information provided.  Then we automatically assign a unique account number.  This will be our first journal entry for this account.  But rather than write the form data to the file alone, we are going to write the field names followed by the data.  And we will encode it to protect the contents as well, using a technique similar to the ones already discussed.
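Here is one hedged sketch of a "field names followed by data" entry.  It assumes 2-byte little-endian length prefixes for each name and value, and uses a simple XOR byte transform purely as a stand-in for whatever encoding technique you actually choose (the XOR offers no real protection).  The functions `pack_fields` and `unpack_fields` and the starting account number are hypothetical:

```python
import itertools
import struct
from typing import Dict

# Hypothetical account-number source; a real system would persist this counter.
_account_numbers = itertools.count(100001)


def encode(data: bytes, key: int = 0x5A) -> bytes:
    """Placeholder encoding: XOR every byte with `key`. NOT secure; a stand-in
    for whichever encoding technique you actually adopt."""
    return bytes(b ^ key for b in data)


decode = encode  # XOR with the same key undoes itself


def pack_fields(fields: Dict[str, str]) -> bytes:
    """Write each field as [name len][name][value len][value], then encode."""
    out = bytearray()
    for name, value in fields.items():
        for part in (name.encode("utf-8"), value.encode("utf-8")):
            out += struct.pack("<H", len(part)) + part  # 2-byte length prefix
    return encode(bytes(out))


def unpack_fields(blob: bytes) -> Dict[str, str]:
    """Decode and split the blob back into a name -> value dictionary."""
    raw = decode(blob)
    fields: Dict[str, str] = {}
    pos = 0
    while pos < len(raw):
        parts = []
        for _ in range(2):  # name, then value
            (n,) = struct.unpack_from("<H", raw, pos)
            pos += 2
            parts.append(raw[pos:pos + n].decode("utf-8"))
            pos += n
        fields[parts[0]] = parts[1]
    return fields


new_account = {
    "account": str(next(_account_numbers)),
    "name": "Taylor, John Richard",
    "phone": "888-555-1234",
}
entry = pack_fields(new_account)
print(unpack_fields(entry))
```

Because each value carries its own name, a later entry can introduce fields the first entry never had without any change to the record layout.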

Now we have many choices at this point:  We could add it to a general ledger, or we could begin a new journal just for this account.  We could even do both, as a means of trying to keep our data safe.  If we do this in a general ledger, we need to mark the point where this new account begins so that we can return to it later.  We can do that as part of the initial account information, then write the whole thing into the new account journal file to start it off.

When the customer adds another credit card, places an order, gives a different shipping address, or whatever, our processes have to allow for this.  In order to manage some of it, we have to let the customer see his account data and make changes.  When the customer confirms the changes, we can write either the individual changes or the whole account record out to the journalling processes again.  But we probably want to keep a copy of the account record with the current information somewhere handy, which would probably be a record-of-accounts file.

The shopping cart is the way we relate to new purchases, and what we do is essentially a checkout and bagger operation, with the transactions going into the journal files.  In our account journal, we can merely append entries to what already exists, since it is all about that one account.  But in our general ledger, we have to recognize that we have a threading situation, since everything else is going in there as well, for all accounts and all other activities.  Now a thread is really more like a linked chain in this situation.  You have a record that has to be embedded among other records that are not related, and you have to point to the last related record and allow a place for a pointer to the next related record.  That is two fields that have to be made part of the journal entry.  Each journal entry also has to have a length field included.  Now if we were not encoding the data, then we could probably get by with ASCIIZ strings, which are null-byte terminated, but our encoded data can accidentally contain a null byte (or whatever other character or character combination we might pick as a terminator), so it is best to have the length clearly marked for the purpose of getting the exact number of characters back.  This is the form our general ledger record might have:

--- Code: ---[prior rec ptr][next rec ptr][rec len][####################]

--- End code ---
The first three square bracket sets represent the prior record pointer or offset in the file, the next record pointer or offset, and the length.  The actual record entry, encoded, is represented by the square brackets with pound signs between.  For the individual account file, you just need the [prior rec ptr][rec len] and the record, or the [prior rec ptr][next rec ptr] and the record.  That is because without intervening records from other sources in that file, one of the fields can be deduced: each record starts right where the previous one ends, so the next offset is this record's offset plus its header plus its length, and the length is the next offset minus this offset minus the header.  On the other hand, if you retain all three sets of brackets, it gives you another form of sanity check on the contents of your files.
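A minimal sketch of that general ledger layout follows, assuming 4-byte little-endian unsigned offsets, a sentinel value of 0xFFFFFFFF for "no related record", and a bytearray standing in for the ledger file.  The `Ledger` class and its in-memory first/last dictionaries are hypothetical; the post instead suggests marking each account's starting point in its initial account information.  Note that filling in the reserved "next" slot of the previous record is the one in-place write this scheme needs:

```python
import struct
from typing import Dict, List, Tuple

HEADER = struct.Struct("<III")  # [prior rec ptr][next rec ptr][rec len]
NONE = 0xFFFFFFFF               # marks "no prior/next related record"


class Ledger:
    """In-memory general ledger. Records for many accounts are interleaved,
    and each account's records are threaded into a doubly linked chain."""

    def __init__(self) -> None:
        self.buf = bytearray()
        self.first: Dict[str, int] = {}  # account -> offset of its first record
        self.last: Dict[str, int] = {}   # account -> offset of its latest record

    def append(self, account: str, payload: bytes) -> int:
        offset = len(self.buf)
        prior = self.last.get(account, NONE)
        # The next pointer starts as NONE: a reserved slot filled in later.
        self.buf += HEADER.pack(prior, NONE, len(payload)) + payload
        if prior == NONE:
            self.first[account] = offset
        else:
            # Fill the previous record's "next" slot (bytes 4..7 of its header).
            struct.pack_into("<I", self.buf, prior + 4, offset)
        self.last[account] = offset
        return offset

    def read(self, offset: int) -> Tuple[int, int, bytes]:
        prior, nxt, length = HEADER.unpack_from(self.buf, offset)
        start = offset + HEADER.size
        return prior, nxt, bytes(self.buf[start:start + length])

    def chain(self, account: str) -> List[bytes]:
        """Walk one account's records forward, skipping unrelated ones."""
        records = []
        offset = self.first.get(account, NONE)
        while offset != NONE:
            _prior, offset, payload = self.read(offset)
            records.append(payload)
        return records


ledger = Ledger()
ledger.append("100001", b"open account")
ledger.append("100002", b"open account")
ledger.append("100001", b"order #1")
print(ledger.chain("100001"))  # [b'open account', b'order #1']
```

In practice each payload would be the encoded record; the length field is what lets `read` recover it exactly, whatever bytes it contains.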

You probably have to deal with other journals as well.  For instance, each order has to deal with existing inventory, orders to the packing and shipping department, notification to the shipping company, the printing of shipping material, and of course charges or payments.

The interesting thing about the journalling approach is that you can invent new pieces as you go along, and you don't have to be concerned about field sizes or anticipating everything.  Suppose a customer wants to add a second credit card to his account, or make payments via PayPal.  You figure out how you want to update the account info and use it, but it does not nullify or alter any accounts that do not have that requirement, and the only effect is that you now might have additional field names like credit#2 that show up in the journal process.  When they come up in the future, your program can reflect them as appropriate.

Donald Darden:
Over the years, I've read many posts from aspiring programmers who wanted to learn how to transition to programming full time.  Back when I got started, being able to program was considered an art, and there was little competition.  You could pretty much call yourself a programmer, and the work came looking for you.  In fact, it was common to see a poster with a chimp's picture and the words: "Two weeks ago I culdn't spell Programmer, now I are one". 

This is not true any more.  Lots of people program now, even if it is just managing a number of applications with script files and processing some simple information on a computer.  So how much programming is involved, and what specific knowledge is required, become the real issues.  And there has been a lot of specialization involved.

The discussion here has been with some of those people in mind, the ones who have asked what it takes to become a programmer.  A lot of people decide that it really requires a continuous and ever deeper study of the art of programming itself.  It is sort of like an aspiring painter who enrolls in one art course after another, always striving to learn new methods, techniques, and ways of achieving effects.  There is no doubt that there is value in doing this, but where is the transition from being a mere student of the art to becoming an artist?

Painters have to know what to paint.  Programmers have to know what to program.  Painters can look around for possible subjects, and be creative, hoping that others will appreciate their finished work.  Others paint on commission, where someone else decides what needs to be painted.  Programmers may face similar choices.  But people don't buy programs for their beauty as static objects, they buy them for their functionality or entertainment value.

If you have followed the earlier discussion, it's probably occurred to you that in an effort to write a business-level application, it would help to know something about the business itself, or something about how businesses operate.  If you aren't into business management, you might have found that some of the discussion touched on things you weren't really aware of, or had failed to consider on your own.  If you have a background in business, you might have been amused by the many factors that were not even considered, such as taxes, commissions, coupons, cash flow, and the various roles that agents have (sales, advertising and promotions, order processing, customer service, tech support, warehouse, quality assurance, buyers, returns, and others).

You might get the idea that maybe you would need to know quite a bit about the needs of the client in order to write a program that would integrate into his operations effectively.  That would be a good thought.  It suggests that perhaps the role of the programmer is not really about programming as much as it is putting the computer to work to benefit the client.  If you are going to be a one-person business, you have to be able to meet the client on his own ground, and the more you know about what he has to deal with, the more you have to offer when discussing your role and function.

You can also consider the team approach, where you join with people who have complementary knowledge and skills.  For instance, if you look at the needs of small businesses that want to grow, and you think there is a future for you there, then either you or someone in your team should have the expertise to help make this happen.  And you may find that it is less about writing new code than finding existing code and processes that would fit right in and work for that client.  It could easily turn out that as the team programmer, you have little to contribute on your own except for your knowledge of what is available and best suited in each instance.

Like a doctor who once had visions of being a successful surgeon, you might find your life relegated to listening to people cough, hearing their complaints, looking at test results, and prescribing medication.  Your future in programming may not be what you envision it to be, because just as the doctor found, you may be outclassed in your preferred area and forced to serve the client's needs rather than sticking to your original goals.

Donald Darden:
Now let's look at some of the many, and often very good, reasons for avoiding the use of any type of data protection, and possibly for not trying to protect your program as well.

The first is the element of trust.   Do you like dealing with excessively suspicious people?  It tends to be offensive, doesn't it?  When you work for a client, you expect him to trust you and your work.  So if you write code that locks him into a dependency on your software to access it, it heightens the need for trust to an extreme degree.  You may have to work hard at building that trust factor in your relationship with your client by being frank and direct, and avoiding any signs of withholding critical information from him.

Another factor is temptation.  There is no doubt that keeping secrets is a form of power, because it gives you a way to avoid oversight and supervision.  Even if you are as honest as the day is long, the client or someone else may question just how sure they can be about whatever it is you are masking with your code.  You may have to deal with accusations and suspicion, which can really damage your relationship with your client.

A third factor is the lack of standards.  At least part of your code will inevitably be noncompliant with established standards.  Now that is not in itself a bad thing, because standards are meant for information exchange and the use of proven, common elements, but it can also cause you to be seen as a renegade or someone who is going against the norm.

A fourth factor, as strange as it may seem, could be a legal one.  It may in fact be against some laws or regulations for you to render data into a form that cannot be easily read by the government.  This is a very murky area at best.  There is always a struggle between what the government wants to know, how much it can legally know, and how much right to privacy you or your company is entitled to.  Just because you feel it's your business, that doesn't stop the government from questioning your need to keep such secrets.  You can already see the government doing its best to secure the right to access records and to be given a back door into methods of public encryption, and it is believed to be watching all manner of communications for threats against the country, the government, or members of the government.

A fifth factor is the matter of audits.  Audits are performed by third parties that validate existing records and transactions.  Audits can be internal or external, and on behalf of the owners or by another party, such as the IRS or other government agency.  An audit could be ordered by a bankruptcy court, or even requested by some owners to ensure that management has been doing a good job.  It could be done by management to ensure that employees have been honest and doing their jobs as intended, or to explain unexpected losses.

There seems to be no doubt that any efforts to conceal data from hackers and unscrupulous employees will run counter to the needs of others who feel entitled to access it for independent verification.  You could take the position that you will help any legitimate claimant to that data to access it, even provide basic tools for the purpose that are not generally available, but whenever someone else is forced to change the way they do things to the way you allow, it creates friction, anger, and often distrust.

The code given earlier in this thread shows how simple the act of encoding and decoding information really is, and there are so many methods available that it may seem strange that it is not done more frequently.  But most business and computing matters benefit from a high degree of openness and cooperation, and hiding so much seems to run counter to the interests of those involved in either.  However, if you really want to protect your data, if it is that important, you may have to think about doing the unthinkable.  And one way to help keep the negative side from getting out of control is to keep it to yourself.

There are probably some fairly happy compromises that can be worked out as well.  For instance, you might be able to arrange to encode records within a standards-based data structure.  People could access the data structure, and the decoding just has to be done internally by the program using the database.  The decode function could be made a DLL that the auditor could call as part of their auditing methods.  It's just something that needs to be thought out.

Another consideration:  Much of a personal record is written in a way that lets you tell if it has been tampered with.  For instance, the name Taylor, John Richard is self-verifying simply because we can read it.  If you saw instead drahciR nhoJ ,rolyaT, you might correctly reason that this is not the original content, and even work out the changes that took place.  But digits are different.  If you saw a number 888-555-1234, or its reverse 432-155-5888, or a rotated variant 488-855-5123, it would be difficult for you or the computer to determine if the number is in fact valid, unless you can determine constraints on that particular type of data.  For instance, if you can deduce that this is probably a telephone number field, you could try to match up the area code with those associated with that address.  But it might be just enough of a change to keep the vast majority of hackers out there from being able to extract and use the information in that file.
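The reversed and rotated phone numbers above can be produced with a small sketch like the following.  It assumes only ASCII digits are rearranged and every other character (dashes, spaces, letters) stays in place; the function names are hypothetical:

```python
from typing import List

DIGITS = "0123456789"


def _refill(text: str, new_digits: List[str]) -> str:
    """Put `new_digits` back into the digit positions of `text`, in order."""
    it = iter(new_digits)
    return "".join(next(it) if c in DIGITS else c for c in text)


def reverse_digits(text: str) -> str:
    """Reverse the order of the digits, leaving punctuation where it was."""
    digits = [c for c in text if c in DIGITS]
    return _refill(text, digits[::-1])


def rotate_digits(text: str, places: int = 1) -> str:
    """Move each digit `places` positions to the right, wrapping the last
    digit(s) around to the front. A negative count undoes the rotation."""
    digits = [c for c in text if c in DIGITS]
    if not digits:
        return text
    k = places % len(digits)
    return _refill(text, digits[-k:] + digits[:-k])


print(reverse_digits("888-555-1234"))  # 432-155-5888
print(rotate_digits("888-555-1234"))   # 488-855-5123
```

Applied to a whole record rather than a single field, `rotate_digits` carries digits across field boundaries, and `rotate_digits(text, -1)` restores the original.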

Suppose you had a complete record for this individual:

--- Code: ---Taylor, John Richard 0123-45-6789 888-555-1234 2212 Eastside Road, Middlesex, NM, 12345

--- End code ---
If you just took all the digits provided in this record and shifted each one a place to the right, into the position of the next digit (with the last digit wrapping around to the first position), then this record would become:

--- Code: ---Taylor, John Richard 5012-34-5678 988-855-5123 4221 Eastside Road, Middlesex, NM, 21234

--- End code ---
The record still seems to pass self-validation, and most people on sight would believe that it is correct as it stands.  But this is another case of hiding in plain sight.  It is very easy to set the contents right, but first you would have to guess the rule of change that was used.  And there are many possible rules that can be used, and testing the validity of each attempt would require substantial work, making it an unattractive prospect.  And we haven't even discussed digit manipulation yet, such as subtracting each digit from 9 and using the result, or adding some offset value, such as beginning with 1 and incrementing it with each digit, and retaining just the last digit of the sum as the replacement digit.
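The two digit manipulations just mentioned can be sketched as follows, assuming, as above, that only ASCII digits are changed and everything else is left alone.  Subtracting from 9 is its own inverse; the incrementing offset is undone by subtracting the same sequence of offsets.  The function names and the default starting offset of 1 are illustrative:

```python
DIGITS = "0123456789"


def nines_complement(text: str) -> str:
    """Replace each digit d with 9 - d. Applying it twice restores the text."""
    return "".join(str(9 - int(c)) if c in DIGITS else c for c in text)


def _offset(text: str, start: int, sign: int) -> str:
    out = []
    offset = start
    for c in text:
        if c in DIGITS:
            # Keep only the last digit of the sum (or difference).
            out.append(str((int(c) + sign * offset) % 10))
            offset += 1  # the offset grows with each digit
        else:
            out.append(c)
    return "".join(out)


def offset_encode(text: str, start: int = 1) -> str:
    return _offset(text, start, +1)


def offset_decode(text: str, start: int = 1) -> str:
    return _offset(text, start, -1)


print(nines_complement("888-555-1234"))  # 111-444-8765
print(offset_encode("888-555-1234"))     # 901-901-8024
```

Combining either of these with a rotation multiplies the number of rules someone would have to guess.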

Take those registration codes that you sometimes get when someone sells you a program that you can download and install on your computer.  The current trend is to get a registration name from you, such as John R. Taylor.  Then they use that exact name to generate a registration key, such as 12804-4BCVD-5KN06-LP5JR.  Don't let the digits and letters fool you.  Many are the result of adding some offset value or indexing into a string of replacement values, such as "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ" or some variant, as a way of obscuring the original source.  They can't keep you from copying the program to another computer, but they can always trace it back to its source and the original transaction because it has to be used with a valid registration name and matching registration key, so John Taylor is now responsible for where it ends up.  And if John Taylor becomes recognized as the source of abuse, his account can be flagged so that he cannot get any upgrades or added support.

The critical ingredient here is that while the two pieces appear unrelated, there is actually a specific relationship between them that only the program provider knows, and the program provider is unlikely to make that secret known, which helps preserve his entitlement to market that program openly and receive some return for it.  People who create methods for generating registration keys have another advantage:  Their methods do not need a complementary decoding method to support them.  That is because all they have to do is take the original source and generate the registration key again, then verify that the two registration keys are exactly the same.
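A hedged sketch of such a one-way key generator follows.  In place of a simple offset, it swaps in HMAC-SHA256 from Python's standard library as the secret relationship, then indexes into the replacement string quoted above to produce a key in the same 5x4 shape.  `SECRET`, the name normalization, and the function names are all assumptions for illustration.  Verification never decodes anything; it regenerates the key and compares:

```python
import hashlib
import hmac

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # string of replacement values
SECRET = b"known-only-to-the-vendor"               # hypothetical vendor secret


def make_key(name: str, groups: int = 4, size: int = 5) -> str:
    """Derive a key shaped like 12804-4BCVD-5KN06-LP5JR from a registration name."""
    # Assumption: ignore case and extra spaces so trivial typing differences match.
    normalized = " ".join(name.upper().split()).encode("utf-8")
    digest = hmac.new(SECRET, normalized, hashlib.sha256).digest()  # 32 bytes
    # Index into the replacement alphabet, one digest byte per key character.
    chars = [ALPHABET[b % len(ALPHABET)] for b in digest[:groups * size]]
    return "-".join("".join(chars[i:i + size]) for i in range(0, groups * size, size))


def verify_key(name: str, key: str) -> bool:
    """No decoding step: regenerate the key from the name and compare."""
    expected = make_key(name).encode("ascii")
    return hmac.compare_digest(expected, key.strip().upper().encode("utf-8"))


key = make_key("John R. Taylor")
print(key)
print(verify_key("John R. Taylor", key))  # True
```

Anyone holding the program can still patch out the call to `verify_key`; what the secret protects is the ability to generate valid keys for new names.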

Donald Darden:
From the prior discussion, it should be recognized that steps to protect the client's data require the client's understanding, cooperation, and agreement.  The pros and possible cons need to be spelled out in some detail, and some provision made for audits, if nothing else.

But when it comes to protecting your own programs, that is a matter that you have to work out for yourself.  When you write programs, your exact relationship with the client becomes a factor in deciding who actually owns the program and the source code.  If it's your source code, then you have to protect it.

It might make sense to keep a copy of your source code on the client's site.  After all, if the client needs a fix or support, having the code there along with the compiler tools means having the tools in place for hands-on or possibly remote support.  You might also have fewer concerns about keeping the exact version for that client intact and available, and about having adequate backups and distribution of those backups.  But all that is then accessible to anyone else having access to the client's premises.

But even if you do not keep the source code on site, just the mere presence of your executable and support files exposes your program to being hacked.  Making your program only work in the presence of the right userid and password is a nice concept, but when someone examines your program intently, they can generally determine how this is actually implemented.  For instance, since your program probably calls on the system via API calls, they can look in your program for where that happens.  They can intercept messages in the message loop as well, and use keystroke loggers and other spyware to uncover what the user is doing that enables the program.

There are generally four points of attack on your program:  First, most people might attempt to examine it as it is stored on the hard drive or other media.  This is the way most novices would tackle the problem.  Second, someone who is more advanced in the art, or who is using available hacker tools, may attempt to examine it in memory after it gets loaded, on the premise that all external encryption has been stripped away by then.  The third approach is to look at the nature of your program, how it has to fit the requirements of the operating system in order to be executed there, and look for ways to unmask what it does through the system calls.  The fourth approach, already mentioned, is an attempt to intercept or monitor what the user is doing in order to activate and use the program.

There is no way to prevent these types of assaults on your program, except to recognize that they are not part of a legitimate business model.  In other words, it does not serve the client to let the program be stolen and used elsewhere, and your continued success is needed by the client to help him stay in business because he might need your services again later.  But someone who might break this rule would be a disgruntled employee or unscrupulous individual who breaks bonds of trust for some monetary or other gain.

Efforts to defeat an abuser have varied.  For instance, you can make your program call home automatically and secure permission to run.  If the program is stolen or being run from a different computer than the one it was originally installed on, that permission can be denied.  The client then has to negotiate to get the program reinstated to your good graces.  Another technique is to link your program to some external device, generally called a dongle.  This unique device has to be present in order for your program to run, and limits your program to the computer where the dongle resides.  Some people use the computer's MAC address as a form of modern dongle, since each one is meant to be unique (though on many systems it can be changed in software).
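A minimal sketch of the MAC-address approach, assuming Python's standard `uuid.getnode()` as the source of the hardware address and a short SHA-256 hash as the stored fingerprint (the function names and the 16-character fingerprint length are hypothetical).  It also illustrates the weakness: the whole check comes down to a single comparison, and `getnode()` falls back to a random number on machines where it cannot find a hardware address:

```python
import hashlib
import uuid
from typing import Optional


def machine_fingerprint(node: Optional[int] = None) -> str:
    """Hash the 48-bit hardware (MAC) address into a short fingerprint.

    uuid.getnode() falls back to a random 48-bit number when it cannot find a
    hardware address, so the fingerprint is not guaranteed to be stable."""
    if node is None:
        node = uuid.getnode()
    return hashlib.sha256(node.to_bytes(6, "big")).hexdigest()[:16]


def may_run(installed_fingerprint: str, node: Optional[int] = None) -> bool:
    """Allow the program to run only on the machine recorded at install time."""
    return machine_fingerprint(node) == installed_fingerprint


installed = machine_fingerprint()  # recorded when the program is installed
print(may_run(installed))
```

A call-home variant would send the fingerprint to your server and let the server, rather than the local comparison, grant or deny permission.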

But if a hacker can analyze your code to the point of finding the decision point where a branch instruction is used to control program access, they can alter the instruction so that the program will continue to run, regardless.  To try and prevent this type of attack, some programmers provide any number of branches, each one possibly validating another qualifier for the program to run, and the hacker may spend a lot of time trying to search out each one and alter it, yet still finding there must be more because the program still won't run.

I encountered one scheme where the programmer adopted his own memory management scheme so that he could obscure how that memory was used and what parts represented data and what parts represented code.  I've seen where programmers store information in reverse order on a hard drive, and where, in the older X86 architecture, they used odd combinations of segment and offset addressing, which made it harder to determine where references were located.

Most concerns about protecting one's programs involve techniques, but in some cases, people want to punish any abuse by wiping out hard drives, causing the data to become corrupted, causing the system to crash, or something of that nature.  My advice is, don't even consider it.  Aside from questions of what criminal acts you might be charged with or liabilities you might incur, it will tarnish your reputation and ruin your prospects.  No company would willingly do business with anyone that walked around with a loaded gun in their hand, ready to shoot, and they would see your program as representing a real threat to their business.

