IT-Consultant: Charles Pegge > Software Design and Development Issues

Protection Considerations

Donald Darden:
The average forum discussion on protection usually centers on protecting a program from being pirated, or keeping original source code and concepts from being exposed.  Some of these views get pretty extreme, such as the guy that figures out how to write a program that says "Hello, Mark!" instead of "Hello, World!" and wants to protect it.

If you program long enough, you get to the point where you recognize that efforts to steal embedded code take a lot of work and experience, and are not the real risk, but that having a marketable program illegally copied and used without due payment is a nightmare to be contended with.

But what is hardly ever discussed is the vulnerability of any data that is needed by the customer, which your program has recourse to.  Take for example any customer accounts that your program might process or access.  If you specify a file format for that data that essentially leaves the data exposed to any other program that can recognize that format, then the customer accounts are vulnerable to being hacked with little fanfare.

Now let's face it:  Customers are often slow to consider these matters themselves.  Our model of centralized processing is that we are safe within the confines of the computer room or building.  But with internet connections, the world at large is now able to try to gain access to our premises.  And not only are our customer's records and information at risk, but often the accounts and information related to other customers and businesses as well.

In the present state of the art, there is little concern for the manner in which programs work and data is exposed inside the PC or mainframe, and much more attention to securing the premises and boundaries through proxy servers, firewalls, and a number of resident programs that monitor for suspicious activity or signature behaviors.

And when these precautions fail, which they have been known to do in the face of determined hackers, the fault is generally heaped on those responsible for system security, rather than being laid as well to the present inability to secure the data within the confines of the computer.

Some indications that this may change are already evident.  You can now buy software that encrypts the entire contents of your hard drive, so that even if the computer itself is stolen (such as a laptop), the information, both programs and data files, is rendered useless without the right password.

There are a number of problems with this approach:  First, it is assumed that there is only one user of the computer in question, so if it is shared, all users have to have access to the same password.  Thus, there is a question of real accountability on an individual basis.  Second, it depends upon the human agent, and this is often the point of least certainty when it comes to security.  People can betray you intentionally, or act in a way that jeopardizes your efforts to protect and secure the information entrusted to them.  Third, it might prove impractical to ever change the password, and it could happen that you lose the employee, then have no way to recover the drive's contents in their absence.

When it comes to securing the customer's data directly, it may not be possible if that customer intends to use other software tools and programs on that data as well.  Then you are forced to accept and use whatever data format is presently in use.  However, if you are contemplating building a vertical application for the customer from scratch, then it might be worth considering a proprietary data format that involves encryption or some method of encoding as a means of securing it from prying eyes and rogue programs.

Let's say you are contemplating offering the customer a secure alternative to his present mode of operation that is cobbled together with Excel and Access and some scripting tools.  You've shown him that his present environment is easy to hack, and vulnerable to everything from a single drive failure to deliberate acts of sabotage and theft.  Not only is he vulnerable, but he could be liable as well, if a court of law found him negligent.  It may not be the rule of thumb today, but a time is likely to come when companies will be liable for the safeguards that they fail to employ to protect their customer data.

You realize that if the customer accepts your secure alternative, it gives you the opportunity to lock that customer into your approach for the foreseeable future.  Any additional programming needs will have to work in conjunction with your model, and most likely you will have a role as the architect or consultant needed to make that happen.  So from your perspective as well as the customer's, there is good reason to plan to go with this model.

The first part of the model involves making the program user specific, but not user dependent.  That is, each authorized user must be able to access and run the program, and there needs to be an audit trail related to each user's activities.  But the user may leave at some point, and other users may need to be added from time to time, and losing an employee should not put the company's programs or data at risk.  There has to be a way to remove an employee's access to both, so that they can no longer make use of either.

The second part of the model means ensuring the validity, privacy, and survivability of the data.  You can adopt a plan that allows the data to be made redundant on a range of drives, and secured with physical backups off premise, and you can use encryption or encoding as methods to protect the privacy of the contents.  The validity of the data depends on human factors and recourse to other sources, so it is really hard to define in detail except in the specifics of each situation.  You get into considerations of what qualifies as trusted sources, digital signatures, private and public encryption keys, methods of data acquisition, and so on.

What I do want to look at briefly is the manner of securing data locally.  The art of hiding information in plain sight is an old one, but most of the history that is involved has been limited by the human agent.  It had to be clever, but not too difficult, because people had to be able to do it by hand.  With computers now able to handle the difficult parts for us at a very rapid rate, it can now be both clever and quite involved in nature.  And there are certain operations of the computer architecture, such as XOR and shift instructions, that are ready made for data conversions.

Have you ever heard of the trick of casting out nines?  You can take any number, such as 3248097, and by a process of adding digits together, reduce the result to a single digit 0 through 8.  For instance, if you add 3+2+4 in this sequence, you get 9, but whenever the total reaches 9 or more you deduct 9, so the result here would be 0.  The next digits, 8+0+9+7, give 24, and adding the running 0 to the digits 2+4 gives 6.  It makes absolutely no difference in which order you take these digits, the outcome will always be 6, which is also the remainder when 3248097 is divided by 9.  This trick was one of many used in the past to help validate whether adding up a column of numbers, or performing addition or multiplication, gave a true result.  It worked because a certain amount of useful information was able to be retained in the process that could be used to verify the original values.  But it was a destructive process, because most of the original information was discarded.  In other words, if I knew that casting out nines gave me a result of 6, I could not use that six to determine the original number of digits, what the individual digits were, or the sequence that they appeared in.
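As a rough illustration of the trick, here is a minimal Python sketch. The function name `cast_out_nines` is made up for this example; it follows the running-total procedure described above and shows how the check digit can be used to test an addition.

```python
def cast_out_nines(number: int) -> int:
    """Reduce a non-negative integer to a check digit 0..8 by casting out nines."""
    total = 0
    for digit in str(number):
        total += int(digit)
        if total >= 9:          # whenever the running total reaches 9 or more, deduct 9
            total -= 9
    return total


# The check digit equals the remainder after dividing by 9.
assert cast_out_nines(3248097) == 3248097 % 9

# Checking an addition: the check digits of the operands, added together and
# reduced again, must equal the check digit of the claimed sum.
a, b = 3248097, 51234
claimed_sum = a + b
assert cast_out_nines(cast_out_nines(a) + cast_out_nines(b)) == cast_out_nines(claimed_sum)
```

A matching check digit does not prove the sum is right (many different numbers share the same digit), which is exactly the loss of information described above.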

Obviously casting out nines is not a good method for sending code, because too much information becomes lost in the process.  Typically, a good encoding process cannot lose information, otherwise it loses value.  However it can gain information and often does.  It can also be restructured so that the code is not in its original sequence.  It can undergo substitution where something else is used in its place - letters and digits can be interchanged, or other symbols employed instead for example.  And it can be divided up into several different messages that can be recombined later.  Another method involves external referencing, where you might have a book, map, or chart that contains the actual result, and only use references in your code.  Not knowing what the external reference is, or not having an exact copy of it, would thwart efforts to read the code, as long as similar references were made to different targeted items rather than all to the same one.

Whatever method is used in encoding information, an exact counter-method has to be employed to unravel it and restore it to original form.  Sometimes the two methods also have to share a password or reference key to complete the process.  The problem with using a password is that it becomes embedded, but a userid/password file could be used to look up or validate a reference key, which would add significant protection.  Remove a userid and password, and a vengeful ex-employee would be no further threat, since they never knew or handled the reference key directly.

One method of encryption that has come into vogue involves prime numbers and the idea that you can have two mathematically related keys, both derived from very large primes, one called the private key and the other the public key.  This form of encryption is best where two or more parties have to be able to exchange data, and you do not trust the other party to hold the private key, but you do trust them with the public key.  The downside with this method is twofold:  First, your computer must be able to process very large numbers with full precision, a process that takes up quite a bit of time, even for a computer.  Second, since the public key and the manner in which the two keys are related are known to the public, then with enough time and processing power it is theoretically possible to determine the private key at some point in the future.  With today's processors, the time required is measured in millennia, but with the power of processors doubling every 18 months or so, and quantum computers on the far horizon, and new strides made in mathematics as well, it may be possible for a breakthrough where prime-based codes of the present will suddenly become as easy to break as reading yesterday's newspapers.

Since the information in a private data set in a computer is not intended to be shared with any outside parties, you can focus on the best ways to protect it without concerns of making a part of that process known to third parties.  You don't need multiple keys, you can adopt a method to suit yourself, and you simply have to address the method and manner of protecting access via your program, and any other programs within your scope.       

Donald Darden:
A secret message can be encrypted in its entirety by one method, or it can be subjected to multiple methods of encryption.  Once the encryption process is complete, it becomes static and unchangeable.  But if you want to encrypt a data base, there is a profound difference - the data base is dynamic, subject to frequent changes, additions, deletions, and restructuring due to various sorts.  The encryption process is not only going to get in the way of all that, it may lead to data corruption and nonrecovery if it is not implemented with some care.

One of the key differences then is that the data base method must be independent of any aspect of the data base, such as its size, record position in the data base, and so on.  Each record, possibly each field in each record, must be recoverable regardless of the treatment of any remaining records (or fields).

There are two techniques that are supportive of this concept, because they avoid any direct alteration of the data itself.  The first is diffusion, which is splitting up each record into multiple parts and distributing the results into various files scattered in different folders on the drive.  Without knowing the exact relationship and sequence for recovering the data from each independent file, putting individual records back together would be difficult.  Adding and maintaining some meaningless files would complicate the problem more, like throwing pieces from one jigsaw puzzle in with another.

The other method involves scrambling the contents of each record in a way to make it unreadable.  But there has to be a rule as to how it was scrambled, so that it can be unscrambled by applying the same rule in reverse.

You can also change the data by modifying each character or each word or dword combination.  The XOR and rotate commands in the ASM and some BASIC languages are particularly good for this, as no bits are actually lost, they are merely flipped or moved, and thus there is a counter process for restoring them.

Keep in mind that the typical PC is filled with thousands of files, and nobody knows what they all are.  Hiding additional files is not a real challenge, and by avoiding names that give away their nature or use, or extensions that indicate what type of file they are or what program is associated with them, you can make it hard to track them down.  And if your method of encryption prevents the contents from being easily read by a person spying on your contents, you have effectively blocked even the most determined hacker from knowing if he has the goods in hand or not.

One of the problems that has to be considered is the nature of the data base file or records, and even the fields.  How are they separated?  Are they null-byte terminated?  Are they separated by tabs?  Do they end in a CRLF?  Or are they all fixed length, and the offsets can be calculated externally?  Are they associated with pointers and length indicators?  Are they referenced by external indexes?  These are important considerations, because you do not want to trample over your boundaries by accident.  Nor do you want to accidentally duplicate any separation data when you change your other characters to something different.  So here are some points to consider:

If you employ fixed length records, then you can scramble the contents of each record as you like.  If you are using null-terminated strings, then you cannot allow any null-byte to be generated as a result of your encryption process.  If you use CRLF to mark the end of a record, you cannot produce a CRLF as part of the string contents (and you may have some problems if you generate either a CR or LF character by itself).  If you intend to use a comma, semicolon, or tab to mark field separations, and you are encrypting on field boundaries, then you cannot allow these characters to result from your encryption method either.

Unfortunately, XOR and rotate instructions are no guarantee that you will not accidentally produce an unacceptable result at certain times.  They are best employed if dealing with fixed length records where you have no restrictions.  In other cases, some form of code substitution or swapping characters about would be the best choice.  Code substitution would take all the allowable characters as a group, and you would use some method of choosing a successor to the present character.  In code swapping, you would just switch the existing characters around so as to scramble the contents.
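To make the idea of code substitution concrete, here is a minimal Python sketch. Everything in it is an assumption chosen for illustration: the `ALLOWED` character set, the name `substitute`, and the rule of stepping `key + position` places along the set. Because every output character is drawn from the same allowable group, no comma, semicolon, tab, CR, LF, or null byte can ever be produced.

```python
import string

# Assumed set of allowable characters: printable ASCII minus the field
# separators (comma and semicolon). Tab, CR, LF and NUL are not in it either.
ALLOWED = "".join(
    c for c in string.digits + string.ascii_letters + string.punctuation + " "
    if c not in ",;"
)


def substitute(text: str, key: int, decode: bool = False) -> str:
    """Replace each character by the one `key + position` places further along
    ALLOWED (wrapping around); decode=True steps backwards instead."""
    size = len(ALLOWED)
    out = []
    for position, char in enumerate(text):
        index = ALLOWED.index(char)      # raises ValueError for a forbidden character
        step = key + position
        if decode:
            step = -step
        out.append(ALLOWED[(index + step) % size])
    return "".join(out)


field = "Mapleton John Edward 071-23-9810"
hidden = substitute(field, key=17)
assert substitute(hidden, key=17, decode=True) == field
assert not any(c in hidden for c in ",;\t\r\n\0")
```

Adding the position to the step means the same letter does not always map to the same successor, which is one of many possible ways of choosing it.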

The techniques that can be employed here are virtually endless.  And this is good, because the chances are that whatever method you adopt, if it is not an obvious first choice, will leave the hacker in the dark.  Only by studying your program in operation and deducing your method, or by observing the data in memory, is he likely to capture any part of it.  And there are easier pickings around out there, so it is like any good lock, it discourages the casual burglar.  Any real theft would almost have to be done by an insider, and you can even prevent most employees from being aware of the safeguards involved.

Let's talk about one record, and show some of the simple things that can be done to render it unreadable:
ACCOUNT RECORD
000123456 Mapleton, John Edward 071-23-9810, VISA 0123-4567-8901 05/10 $129.85 07/23/2007 Linksys Router

The first method just involves dispersing the contents of this record into a number of files, four characters at a time:

--- Quote ---
0001 --> file A
2345 --> file B
6 Ma --> file C
plet --> file D
on,  --> file E
John --> file F
 Edw --> file G
ard  --> file H
071- --> file I
23-9 --> file J
810, --> file K
 VIS --> file L
A 01 --> file M
23-4 --> file N
567- --> file O
8901 --> file P
 05/ --> file Q
10 $ --> file R
129. --> file S
85 0 --> file T
7/23 --> file U
/200 --> file V
7 Li --> file W
nksy --> file X
s Ro --> file Y
uter --> file Z

--- End quote ---
I've made no pretense of scrambling anything, but without telling you the name of each file and the order in which the data is to be recovered, you have been left with the difficult job of deciphering what needs to be done here to recover the data.  It becomes even harder when additional data from other records are added to each file as well.  And the sequence of files could be rotated -- for instance, for the second record, you begin writing to file B through Z, then finally A.  For the third record, you begin writing to file C through Z, then to A and B.
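The dispersal and rotation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each "file" is stood in for by an in-memory list named A to Z, every record is exactly 104 characters (26 chunks of four), and the names `disperse` and `recover` are invented for the example.

```python
import string

CHUNK = 4
NAMES = string.ascii_uppercase          # 26 stand-in "files", A to Z

RECORD = ("000123456 Mapleton, John Edward 071-23-9810, VISA 0123-4567-8901 "
          "05/10 $129.85 07/23/2007 Linksys Router")


def disperse(records):
    """Scatter fixed-length records across the stand-in files, four characters
    at a time, starting one file later for each successive record."""
    files = {name: [] for name in NAMES}
    for number, record in enumerate(records):
        chunks = [record[i:i + CHUNK] for i in range(0, len(record), CHUNK)]
        for slot, chunk in enumerate(chunks):
            files[NAMES[(slot + number) % len(NAMES)]].append(chunk)
    return files


def recover(files, number):
    """Rebuild record `number` by undoing the rotation used in disperse()."""
    return "".join(files[NAMES[(slot + number) % len(NAMES)]][number]
                   for slot in range(len(NAMES)))


second = "000987654 Doe, Jane".ljust(len(RECORD))
files = disperse([RECORD, second])
assert files["A"][0] == "0001"          # first record starts in file A
assert files["B"][1] == "0009"          # second record starts in file B
assert recover(files, 0) == RECORD
assert recover(files, 1) == second
```

Anyone holding the files but not the starting file and the rotation rule has to guess how the pieces line up, which is the point of the scheme.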

For your distracting files, you could make them useful by an overlap process.  Instead of just every four characters, they could begin with an offset of two.  That way they overlap the original set, but are not identical.  In some cases they could even help recover data if any files become corrupted.

--- Quote ---
0001 --> file A             0123 --> file AB
2345 --> file B             456  --> file BC
6 Ma --> file C             Mapl --> file CD
plet --> file D             eton --> file DE
on,  --> file E             , Jo --> file EF
John --> file F             hn E --> file FG
 Edw --> file G             dwar --> file GH
ard  --> file H             d 07 --> file HI
071- --> file I             1-23 --> file IJ
23-9 --> file J             -981 --> file JK
810, --> file K             0, V --> file KL
 VIS --> file L             ISA  --> file LM
A 01 --> file M             0123 --> file MN
23-4 --> file N             -456 --> file NO
567- --> file O             7-89 --> file OP
8901 --> file P             01 0 --> file PQ
 05/ --> file Q             5/10 --> file QR
10 $ --> file R              $12 --> file RS
129. --> file S             9.85 --> file ST
85 0 --> file T              07/ --> file TU
7/23 --> file U             23/2 --> file UV
/200 --> file V             007  --> file VW
7 Li --> file W             Link --> file WX
nksy --> file X             sys  --> file XY
s Ro --> file Y             Rout --> file YZ
uter --> file Z             er00 --> file ZA

--- End quote ---
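To show how the overlapping set could help with recovery, here is a small Python sketch. The helper names `chunks_with_offset` and `rebuild_primary` are assumptions made for the example; the second set starts two characters in and wraps around to the front of the record, as in the last entry (er00) of the table above.

```python
RECORD = ("000123456 Mapleton, John Edward 071-23-9810, VISA 0123-4567-8901 "
          "05/10 $129.85 07/23/2007 Linksys Router")


def chunks_with_offset(record: str, offset: int, size: int = 4):
    """Cut `record` into `size`-character pieces starting `offset` characters in,
    wrapping around so the final piece is completed from the front of the record."""
    rotated = record[offset:] + record[:offset]
    return [rotated[i:i + size] for i in range(0, len(rotated), size)]


def rebuild_primary(overlap, k):
    """Recreate primary chunk k from its two neighbours in the offset-two set:
    the last two characters of overlap[k-1] plus the first two of overlap[k]."""
    return overlap[k - 1][2:] + overlap[k][:2]


primary = chunks_with_offset(RECORD, 0)     # the file A..Z column
overlap = chunks_with_offset(RECORD, 2)     # the file AB..ZA column

assert overlap[0] == "0123" and overlap[-1] == "er00"
# Every primary chunk can be rebuilt from the overlap set alone.
assert all(rebuild_primary(overlap, k) == primary[k] for k in range(len(primary)))
```

So if one of the primary files were damaged, its chunks could still be reassembled from two of the "distracting" files.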
 
Since we've broken the record up into segments of four bytes, we can treat these as DWORDS if we want.  Now PowerBasic gives us a neat command that we can use to reverse the order of every four characters, to make the results less meaningful if chanced upon:  STRREVERSE$().  This is what our final output to the several files would look like:

--- Quote ---
1000 --> file A             3210 --> file AB
5432 --> file B              654 --> file BC
aM 6 --> file C             lpaM --> file CD
telp --> file D             note --> file DE
 ,no --> file E             oJ , --> file EF
nhoJ --> file F             E nh --> file FG
wdE  --> file G             rawd --> file GH
 dra --> file H             70 d --> file HI
-170 --> file I             32-1 --> file IJ
9-32 --> file J             189- --> file JK
,018 --> file K             V ,0 --> file KL
SIV  --> file L              ASI --> file LM
10 A --> file M             3210 --> file MN
4-32 --> file N             654- --> file NO
-765 --> file O             98-7 --> file OP
1098 --> file P             0 10 --> file PQ
/50  --> file Q             01/5 --> file QR
$ 01 --> file R             21$  --> file RS
.921 --> file S             58.9 --> file ST
0 58 --> file T             /70  --> file TU
32/7 --> file U             2/32 --> file UV
002/ --> file V              700 --> file VW
iL 7 --> file W             kniL --> file WX
yskn --> file X              sys --> file XY
oR s --> file Y             tuoR --> file YZ
retu --> file Z             00re --> file ZA

--- End quote ---
Reversing STRREVERSE$() is simple:  Just repeat the operation.  Other techniques are also possible, such as just reversing the two end characters, or the two characters in the middle.  You could even use CVDWD() to change the 4 bytes to a double-word (DWORD) number, then HEX$(,8) to change it to an eight digit hex value.  To convert it back, use VAL("&H"+stringname) and MKDWD$() to put it back as it was.
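For readers without PowerBASIC at hand, the same two ideas can be tried in Python. This is a rough equivalent rather than a translation: the function names are invented here, `struct` stands in for CVDWD()/MKDWD$(), and the four bytes are read in little-endian order as they would be on a PC.

```python
import struct


def reverse_chunk(chunk: str) -> str:
    """Reverse a chunk; applying it twice restores the original."""
    return chunk[::-1]


def chunk_to_hex(chunk: str) -> str:
    """Read four bytes as an unsigned little-endian 32-bit value and format it
    as eight hex digits."""
    (value,) = struct.unpack("<I", chunk.encode("latin-1"))
    return f"{value:08X}"


def hex_to_chunk(hex_text: str) -> str:
    """Inverse of chunk_to_hex()."""
    return struct.pack("<I", int(hex_text, 16)).decode("latin-1")


assert reverse_chunk("John") == "nhoJ"
assert reverse_chunk(reverse_chunk("John")) == "John"
assert chunk_to_hex("John") == "6E686F4A"
assert hex_to_chunk("6E686F4A") == "John"
```

Note that the hex form doubles the size of each chunk, so it trades space for a less recognizable appearance.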

I am not advocating one approach over another.  But as a programmer, whatever concerns you have about protecting your program, you have to consider that your client has that much concern and more about protecting his data, his business, and his reputation.  I would suggest that you at least look into the matter to the extent of making him aware that he is at some risk, and work on your own methods of protecting the data that his livelihood depends upon, and that his customers trust him with.  If you do so, that could help put you at the head of the pack.

Donald Darden:
Working with files also leaves some other telltales that you might want to deal with.  For instance, if the datetime stamps on all the files are recent or pretty much the same, this may help a hacker deduce the ones you are likely using.  So redating the files may help hide the fact that you are using them.  Another clue could be that all the files might reside at the same depth when it comes to the directory tree, so creating a few staggered folder levels can help mask their use.  And of course the filenames should bear little resemblance to each other, even in length or naming conventions.  You also might break up your pattern of how many characters to write so that the several files are of different sizes.

Years ago, I learned that as a soldier, being completely hidden is a more effective defence than just being well protected.  If they can't see you, they are less likely to attack you.  If they know where your cover is, they can and will search for its weak point or call in big guns to root you out or kill you.

This is the philosophy of hiding in plain sight.  You can't completely avoid the evidence that your program and the data files reside on the system, but you can make it so that their presence and relationships to each other are far from obvious.

In an effort to please those that might prefer to use familiar BASIC syntax to effect changes, I also used some fairly obvious techniques to alter the string characters and show how effective this can be.  But there is an even more effective way, which is to use pointers and employ the XOR or rotate steps that were mentioned above.  Now remember, I advised that this should only occur with fixed length data.  Well, if you split the record into fixed length segments, this would work.  So it can be discussed in more detail here.

When using pointers, you can specify the type and size of the data that is associated with that pointer.  Obviously a string pointer points to some aspect of a string.  A STRPTR points to the first character of a string.  A VARPTR, in PowerBASIC, when used with dynamic strings, points to a descriptor of that string, which includes both the STRPTR and the number of characters in the string (the string length, or LEN).

Using the STRPTR, and any offset from the first character, you can reach any portion of the string.  Pretty much what MID$() permits you to do.  But the use of a pointer comes without the safeguards that MID$() has, so you must use it at your own risk.

The advantage is that you could establish a different pointer reference, such as for a DWORD, then set it to point somewhere within the string.  You can then manipulate that part of the string as though it were a DWORD value at that location.  Thus you could rotate it left, rotate it right, or perform an XOR against it with some known value, and this would change that portion of the string in place in memory.  Simple and quick, and with other offsets, you can repeat or perform other acts on the string at will.
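The same in-place manipulation can be imitated in Python, which has no raw pointers, by treating four bytes of a `bytearray` as a 32-bit value. This is a sketch under assumptions: the helper names are invented, and `struct.unpack_from`/`struct.pack_into` play the part of the DWORD pointer. Unlike a real pointer, they raise an error if the offset runs past the end of the buffer.

```python
import struct

MASK32 = 0xFFFFFFFF


def rol32(value: int, count: int) -> int:
    """Rotate a 32-bit value left by `count` bits (a negative count rotates right)."""
    count %= 32
    return ((value << count) | (value >> (32 - count))) & MASK32


def rotate_dword_at(buffer: bytearray, offset: int, count: int) -> None:
    """Rotate the four bytes starting at `offset`, in place."""
    (value,) = struct.unpack_from("<I", buffer, offset)
    struct.pack_into("<I", buffer, offset, rol32(value, count))


def xor_dword_at(buffer: bytearray, offset: int, key: int) -> None:
    """XOR the four bytes starting at `offset` with `key`, in place."""
    (value,) = struct.unpack_from("<I", buffer, offset)
    struct.pack_into("<I", buffer, offset, value ^ (key & MASK32))


text = bytearray(b"Mapleton, John Edward")
original = bytes(text)

rotate_dword_at(text, 5, 3)            # alter bytes 5..8 only
xor_dword_at(text, 5, 0x5389)
assert bytes(text) != original and len(text) == len(original)

xor_dword_at(text, 5, 0x5389)          # undo in the opposite order
rotate_dword_at(text, 5, -3)
assert bytes(text) == original
```

The steps must be undone in reverse order, which is the "exact counter-method" requirement in miniature.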

Using XOR is a very powerful tool because it allows you to incorporate an external value into the process.  This could be your elusive reference key.  If they do not know what the original data was, or have the reference key, then efforts to unravel the code will be made super hard.  But with the power of computers, and large amounts of data to process, it is still conceivable that the reference key can be discovered through exhaustive analysis.

There is a way to extend the reference value way beyond the scope of any efforts to analyze it, and that is by incorporating it mathematically with any equation that produces an irrational number.  This means a number that cannot be expressed as an exact quantity n/m, where n and m are integers.  There are an infinite number of irrational numbers, such as PI or natural e.  If you choose a well known irrational number, you might make it too easy for a hacker to discover your trick.  But you can always find others.

Note that you can produce a number like 1/3 and get a nonterminating fraction like .33333333333333333333.  But this is no good for our purposes, because it repeats, and if it repeats, it is the same as using it over and over again.  You need a nonrepeating fraction, one that never repeats, no matter how far it is extended.  Since the reference key can then be used to generate a nonrepeating fraction, you can make an XOR mask as long as you could possibly need it to be.  By not exposing that reference key, or the manner in which the mask is generated, you have created a coding method that is virtually unbreakable.
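One way to experiment with this idea is sketched below in Python. It is an illustration only, with its own assumptions: the mask is taken from the binary fraction of the square root of the reference key (the square root of any non-square whole number is irrational), computed exactly with integer arithmetic, and the names `sqrt_mask` and `xor_with_mask` are invented here. The scheme has not been reviewed as cryptography.

```python
import math


def sqrt_mask(key: int, length: int) -> bytes:
    """Return the first `length` bytes of the binary fraction of sqrt(key)."""
    if key < 2 or math.isqrt(key) ** 2 == key:
        raise ValueError("key must be a positive non-square integer")
    # isqrt(key * 2**(16*length)) == floor(sqrt(key) * 2**(8*length))
    scaled_root = math.isqrt(key << (16 * length))
    fraction = scaled_root & ((1 << (8 * length)) - 1)
    return fraction.to_bytes(length, "big")


def xor_with_mask(data: bytes, key: int) -> bytes:
    """XOR `data` with a mask of the same length; calling it again restores the data."""
    mask = sqrt_mask(key, len(data))
    return bytes(d ^ m for d, m in zip(data, mask))


# sqrt(2) = 1.6A09E667... in hexadecimal, so its fraction starts 6A 09 E6 67.
assert sqrt_mask(2, 4) == bytes.fromhex("6A09E667")

record = b"000123456 Mapleton, John Edward"
hidden = xor_with_mask(record, 21385)
assert xor_with_mask(hidden, 21385) == record
```

The mask can be made as long as the data requires, although the cost of computing it grows with its length, so a real program would probably generate it once and reuse pieces of it by offset.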

Note that once you establish the reference key and method of creating the XOR mask, you are locked into its continued use.  Your only means of getting away from it is to create a whole new XOR mask, and decrypt your data base with the old mask while re-encrypting it with the new.  You can certainly do this if the need arises, but that one reference key is too valuable to share with anyone.  You need another method for giving people the ability to use the reference key without actually disclosing it to them.

You can do this by having the reference key and XOR method embedded in the program itself.  Then require any user to have an authorized userid and password to access and run the program.  This requires a userid and password management file that only the system administrator can access.  The SA adds or removes users as required as part of the business, and the initial password is set so that it expires on first time use.  The designated user must then set a new password at that time.  The program is able to change the password portion of the SA's file for a user that can successfully log in.  The SA cannot read the password stored, since the program uses its encryption power to render the stored password unreadable and unrecoverable by normal means.
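A minimal Python sketch of such a userid/password store follows. It swaps in a specific technique that the post does not name: a salted one-way hash (PBKDF2 from the standard `hashlib` module), which is one common way to make a stored password unreadable and unrecoverable. The dictionary layout, the function names, and the iteration count are all assumptions for illustration, and a real program would keep the entries in a protected file rather than in memory.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000      # assumed work factor for this sketch


def _hash(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def add_user(users: dict, userid: str, initial_password: str) -> None:
    """SA action: create a user whose initial password expires on first use."""
    salt = os.urandom(16)
    users[userid] = {"salt": salt, "hash": _hash(initial_password, salt), "expired": True}


def remove_user(users: dict, userid: str) -> None:
    """SA action: revoke access; nothing else has to be re-keyed."""
    users.pop(userid, None)


def check_login(users: dict, userid: str, password: str) -> str:
    """Return 'denied', 'must_change' or 'ok'."""
    entry = users.get(userid)
    if entry is None or not hmac.compare_digest(entry["hash"], _hash(password, entry["salt"])):
        return "denied"
    return "must_change" if entry["expired"] else "ok"


def change_password(users: dict, userid: str, old: str, new: str) -> bool:
    """Program action: only a user who can log in may replace the stored hash."""
    if check_login(users, userid, old) == "denied":
        return False
    salt = os.urandom(16)
    users[userid] = {"salt": salt, "hash": _hash(new, salt), "expired": False}
    return True
```

The SA only ever sees a salt and a hash, and removing the entry is enough to lock a departed employee out, since the reference key itself was never handed to anyone.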

Every unique user has their own userid, and only they should know the current password.  Every activity involving the data base can be logged against the specific user.  This gives the owner full accountability of who did what and when.  The log should also be hidden and encrypted so that its journalling of events cannot be compromised.  The absence or corruption of the userid/password file or journal log should be the cause of an alarm, and should also be correctable by the SA in an effort to restore the program and data base to normal use as quickly as possible.

A lot of programmers simply will not go to this length to protect a company's data.  First, they are not prepared to introduce this technology on their own.  Second, the customer probably does not require it, being somewhat unaware or unconcerned with the risks involved.  Third, it takes a lot of work and forethought as to how this can all be done successfully.  You will note that many products may be on the market already that attempt to handle some or all of the requirements set forth for a particular customer, and it may prove more cost effective and easier to go that route.  But this discussion will likely have caused you to think harder on the topic than you had cause to do before.

Donald Darden:
So much for what I consider the foundation for why protection might be necessary, and the general areas where protection might be needed.  Now let's look at some methods for achieving some degree of protection.

First, I want to introduce two functions:  One is named Encode(), and the other is named Decode().  Each is passed a string, and returns a string of the same length.  The purpose of each function should be evident from its name.

The example program code also creates an endless series of strings of various lengths, then fills them randomly with spaces and capital letters.  This is the aa string.  The bb string receives the encoded version of aa after it is processed by Encode().  The IF 1 THEN block lets you see what happens when the contents of aa are encoded into bb.  If you change the IF 1 THEN to IF 0 THEN to suppress this display, the rest of the program will just continue until the initial contents of aa and the Decode(bb) results differ.  Properly done and implemented, this program will just loop because aa will always equal Decode(bb).

--- Code: ---
#COMPILE EXE
#DIM ALL
#DEBUG ERROR ON
#REGISTER NONE

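FUNCTION encode(sourcestr AS STRING) AS STRING
  ' Rotate each overlapping DWORD right by one bit, working left to right.
  LOCAL aa AS STRING
  LOCAL a, b AS LONG
  aa=sourcestr
  b=LEN(aa)-3                 ' number of 4-byte windows in the string
  IF b>0 THEN
    a=STRPTR(aa)              ' address of the first character
    ! mov eax,a
    ! mov ecx,b
ecode:
    ! mov edx,[eax]           ' load the 4 bytes at the current position
    ! ror edx,1
    ! mov [eax],edx           ' store them back in place
    ! inc eax                 ' advance one byte, so the windows overlap
    ! loop ecode
  END IF
  FUNCTION=aa
END FUNCTION

FUNCTION decode(sourcestr AS STRING) AS STRING
  ' Undo encode(): rotate left by one bit, working right to left.
  LOCAL aa AS STRING
  LOCAL a, b AS LONG
  aa=sourcestr
  b=LEN(aa)-3
  IF b>0 THEN
    a=STRPTR(aa)+b-1          ' address of the last 4-byte window
    ! mov eax,a
    ! mov ecx,b
dcode:
    ! mov edx,[eax]
    ! rol edx,1
    ! mov [eax],edx
    ! dec eax                 ' step back one byte
    ! loop dcode
  END IF
  FUNCTION=aa
END FUNCTION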

FUNCTION PBMAIN
  LOCAL aa, bb AS STRING
  LOCAL a, b AS LONG
  RANDOMIZE
  COLOR 15,1
  CLS
  DO
    aa=SPACE$(RND*5000+1)
    FOR a=1 TO LEN(aa)
      IF RND>.85 THEN INCR a
      MID$(aa,a)=CHR$(RND(65,90))   ' A..Z only; 65+RND*26 can round up to 91
    NEXT
    LOCATE 1,1
    INCR b
    PRINT b,LEN(aa)
    bb=encode(aa)
    IF 1 THEN
      PRINT LEFT$(aa,SCREENX-1)
      PRINT LEFT$(bb,SCREENX-1)
      WAITKEY$
    END IF
    IF decode(bb)<>aa THEN
      PRINT
      COLOR 15,1
      PRINT aa
      COLOR 14,2
      PRINT bb
      WAITKEY$
    END IF
  LOOP
END FUNCTION

--- End code ---
Note that Encode() and Decode() perform complementary functions.  One does a rotate right, the other a rotate left.  They agree in the amount of rotation used.  One processes the string from left to right, the other from right to left.  The rotations could have been on byte, word, or dword boundaries, but by using dword (four consecutive bytes) and advancing only one byte at a time, the shifting becomes compounded over portions of the string.  Had I elected to use word or byte boundaries instead, the results would have been different.  I could just as well have begun with rotate left, followed by a rotate right for the decode stage.
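For anyone who wants to follow the logic without a PowerBASIC compiler, here is a rough Python rendering of the same encode/decode pair. It is a sketch, not the original code: it works on `bytes` instead of a dynamic string, the helper `_rotate_at` is invented for the example, and the four bytes are read in little-endian order to match what the `mov edx,[eax]` instruction sees on a PC.

```python
import struct

MASK32 = 0xFFFFFFFF


def _rotate_at(buffer: bytearray, offset: int, right: bool) -> None:
    """Rotate the 32-bit value at `offset` by one bit, in place."""
    (value,) = struct.unpack_from("<I", buffer, offset)
    if right:
        value = ((value >> 1) | (value << 31)) & MASK32
    else:
        value = ((value << 1) | (value >> 31)) & MASK32
    struct.pack_into("<I", buffer, offset, value)


def encode(data: bytes) -> bytes:
    """Rotate right over overlapping 4-byte windows, left to right."""
    buffer = bytearray(data)
    for offset in range(len(buffer) - 3):
        _rotate_at(buffer, offset, right=True)
    return bytes(buffer)


def decode(data: bytes) -> bytes:
    """Rotate left over the same windows, right to left."""
    buffer = bytearray(data)
    for offset in range(len(buffer) - 4, -1, -1):
        _rotate_at(buffer, offset, right=False)
    return bytes(buffer)


sample = b"MAPLETON JOHN EDWARD"
assert decode(encode(sample)) == sample
assert len(encode(sample)) == len(sample)
assert encode(b"ABC") == b"ABC"        # fewer than four bytes: left untouched
```

Because each window overlaps the previous one by three bytes, a byte in the middle of the string is rotated as part of four different windows, which is the compounding effect mentioned above.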

Bit shifting obviously works, but the XOR function really helps mask the original contents.  XOR has to work against some known quantity or value, which has to be exactly the same at each stage of the encode and decode sequence, but can be made to vary between stages.  This would make it much harder to detect the pattern, since it would not only involve an unknown that the outsider would have to discover, but it might not even be a constant.

Donald Darden:
If you are going to include XOR as one of the functions for encoding and decoding your lines of text or whatever, then I suggest just creating a separate function for that purpose and calling it as needed.  The advantage is that XOR forms its own natural complement, unless you decide to make it more complicated.

I am going to include a separate XOR process with the previous example used above.  I am using an embedded constant at the start of each call to the XOR operation, but I rotate it between uses, which changes its effects on any subsequent character codes that it gets XOR'ed with.

--- Code: ---
#COMPILE EXE
#DIM ALL
#DEBUG ERROR ON
#REGISTER NONE


FUNCTION xorcode(sourcestr AS STRING) AS STRING
  LOCAL aa AS STRING
  LOCAL a, b AS LONG
  aa=sourcestr
  b=LEN(aa)
  IF b>0 THEN
    a=STRPTR(aa)
    ! mov esi,a
    ! mov ecx,b
    ! mov edx,21385  'example of a reference key
xorit:
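    ! mov al,[esi]
    ! xor al,dl
    ! mov [esi],al
    ! inc esi        'advance to the next byte of the string
    ! rol edx,1      'rotate the key so each byte sees a different value
    ! loop xorit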
  END IF
  FUNCTION=aa
END FUNCTION

FUNCTION encode(sourcestr AS STRING) AS STRING
  LOCAL aa AS STRING
  LOCAL a, b AS LONG
  aa=sourcestr
  a=STRPTR(aa)
  b=LEN(aa)-3
  IF b>0 THEN
    ! mov eax,a
    ! mov ecx,b
ecode:
    ! mov edx,[eax]
    ! ror edx,1
    ! mov [eax],edx
    ! inc eax
    ! loop ecode
  END IF
  FUNCTION=xorcode(aa)
END FUNCTION

FUNCTION decode(sourcestr AS STRING) AS STRING
  LOCAL aa AS STRING
  LOCAL a, b AS LONG
  aa=xorcode(sourcestr)
  b=LEN(aa)-3
  IF b>0 THEN
    a=STRPTR(aa)+b-1
    ! mov eax,a
    ! mov ecx,b
dcode:
    ! mov edx,[eax]
    ! rol edx,1
    ! mov [eax],edx
    ! dec eax
    ! loop dcode
  END IF
  FUNCTION=aa
END FUNCTION

FUNCTION PBMAIN
  LOCAL aa, bb AS STRING
  LOCAL a, b AS LONG
  RANDOMIZE
  COLOR 15,1
  CLS
  DO
    aa=SPACE$(RND*5000+1)
    FOR a=1 TO LEN(aa)
      IF RND>.85 THEN INCR a
      MID$(aa,a)=CHR$(65+RND*26)
    NEXT
    LOCATE 1,1
    INCR b
    PRINT b,LEN(aa)
    bb=encode(aa)
    IF 1 THEN
      PRINT LEFT$(aa,SCREENX-1)
      PRINT LEFT$(bb,SCREENX-1)
      WAITKEY$
    END IF
    IF decode(bb)<>aa THEN
      PRINT
      COLOR 15,1
      PRINT LEFT$(aa,SCREENX*(SCREENY-1)/2)
      COLOR 14,2
      PRINT LEFT$(bb,SCREENX*(SCREENY-1)/2)
      WAITKEY$
    END IF
  LOOP
END FUNCTION

--- End code ---
The advantage of using multiple encoding methods should be obvious.  But just to spell it out, you are forcing the hacker to deduce the following:
(1)  What method(s) were involved
(2)  How those methods were implemented
(3)  The sequence in which those methods were used
(4)  In the case of XOR, what the constant or source reference was
(5)  For any operations spanning multiple bytes, whether you worked left to right or the reverse, and in what byte, word, or dword sequence
(6)  With multiple-byte encoding methods, where each run started and ended.  If someone attempted to decode a whole file at once, it would fail if the encoding were done one record at a time, even if they got everything else exactly right.
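
Point (6) can be checked with a small Python model of the dword rotation. This is only a sketch under assumptions: the two four-byte records are made-up samples, the dwords are treated as little-endian, and the helper _rot() is a hypothetical name.

```python
import struct

def _rot(buf, i, left):
    """Rotate the little-endian dword at offset i by one bit, in place."""
    (d,) = struct.unpack_from("<I", buf, i)
    d = ((d << 1) | (d >> 31)) if left else ((d >> 1) | (d << 31))
    struct.pack_into("<I", buf, i, d & 0xFFFFFFFF)

def encode(record: bytes) -> bytes:
    buf = bytearray(record)
    for i in range(len(buf) - 3):            # left to right, rotate right
        _rot(buf, i, left=False)
    return bytes(buf)

def decode(record: bytes) -> bytes:
    buf = bytearray(record)
    for i in range(len(buf) - 4, -1, -1):    # right to left, rotate left
        _rot(buf, i, left=True)
    return bytes(buf)

records = [b"ABCD", b"EFGH"]                  # made-up sample records
stored = b"".join(encode(r) for r in records) # each record encoded on its own

# Decoding record by record restores the data.
per_record = decode(stored[:4]) + decode(stored[4:])

# Decoding the file in one pass also rotates the dwords that straddle
# the record boundary, which were never rotated during encoding.
whole_file = decode(stored)
```

Here per_record comes back as the original eight bytes, while whole_file does not, even though the method, direction, and rotation amount are all correct.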

Note that the XOR function I created only works on one byte at a time.  I could have made it work on word or dword references, much as I did with the Encode and Decode functions, but then I would have had to deal with two functions, one to go left to right and the other right to left, and I would also have had to go to extra lengths to determine the current shift state of the XOR reference value as I attempted to undo the encoding done earlier.

If every customer has a different reference value unique to their copy of the program, then they cannot read each other's records.  That is another strength of the XOR method, because it is very key dependent.  You have to have the right version of the program, you have to be able to use the program (remember the userid and password specification earlier?), and only then is the corresponding data extractable from the encoded data files.
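
The key dependence can be sketched in Python as follows. This is a model for illustration only: it assumes each byte is XORed with the low byte of a 32-bit key that is then rotated left by one bit, and the record contents and the second key value are made up.

```python
def xorcode(data: bytes, key: int = 21385) -> bytes:
    """XOR each byte with the low byte of a key that rotates as it goes."""
    out = bytearray(data)
    k = key & 0xFFFFFFFF
    for i in range(len(out)):
        out[i] ^= k & 0xFF                        # like: xor al,dl
        k = ((k << 1) | (k >> 31)) & 0xFFFFFFFF   # like: rol edx,1
    return bytes(out)

record = b"ACCOUNT 1001"       # hypothetical customer record
key_a = 21385                  # reference key from the listing above
key_b = 90210                  # hypothetical key for a second customer

stored = xorcode(record, key_a)

same_key = xorcode(stored, key_a)    # the same call undoes the encoding
other_key = xorcode(stored, key_b)   # a different key yields garbage
```

Because the key restarts from the same value on every call and is consumed one byte at a time, a single function serves for both directions. Only a caller holding key_a gets the record back.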
