Friday, October 08, 2010

advanced wordpad editing explained

so a couple weeks back i mentioned that i was experimenting with a new databending technique that i call advanced wordpad editing. but wtf does that mean, and what is going on in these images?

take that, homer

as i've discussed extensively in the past, when you open a bitmap image in microsoft wordpad and then re-save, it creates a distinctive warping effect i call the wordpad effect. wordpad, thinking the data is inappropriately formatted rich text, tries to "fix" the formatting of the file, corrupting the data in a recognizable (but often surprising) way. but what exactly is going on? pietjepuk666 was the first to solve that puzzle, in the comments:

I've found that Wordpad does at least two things to a binary file; it replaces byte 07 (ascii: BEEP) with a space , and it replaces every lonely 0A or 0D (line feed and carriage return respectively) and also 0B (vertical tab) with the bytes "0A 0D". So the rate of glitching is probably dependent on how dark the picture is, since low bytes like these give dark pixels (i suppose).

this is pretty much right, except in my experience the replacement value is actually 0D 0A, not 0A 0D. (not that it makes much difference here.)

the change from 07 to 20 (space) isn't very noticeable to the eye. it just replaces an almost-totally-black pixel with a slightly-less-black pixel. but every time wordpad replaces a 0A, 0B, or 0D with 0D 0A, it adds one byte to the file. these added bytes are ultimately what cause the wordpad effect. this is why wordpadded images always warp to the right, why the trick breaks fragile formats such as JPG, why darker (but not pitch black) images bend more than lighter ones, and so on.
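to make the byte-level behavior concrete, here is a rough python sketch of the substitutions described above. it is only a model of what wordpad appears to do, not wordpad's actual code. the function name `wordpad_effect` is made up for illustration, and it assumes that an existing 0D 0A pair passes through unchanged (that's my reading of "lonely" in the quote).

```python
def wordpad_effect(data: bytes) -> bytes:
    """Approximate the byte substitutions described in the post."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b == 0x07:
            # BEL becomes a space: same length, slightly lighter pixel
            out.append(0x20)
        elif b == 0x0D and i + 1 < len(data) and data[i + 1] == 0x0A:
            # assumption: an existing 0D 0A pair is not "lonely", so keep it
            out += b"\x0d\x0a"
            i += 1
        elif b in (0x0A, 0x0B, 0x0D):
            # a lone 0A, 0B, or 0D becomes 0D 0A: the file grows by one byte
            out += b"\x0d\x0a"
        else:
            out.append(b)
        i += 1
    return bytes(out)


sample = bytes([0x07, 0x41, 0x0A, 0x0B, 0x0D, 0x0A])
bent = wordpad_effect(sample)
print(len(sample), "->", len(bent))
```

every lone trigger byte pushes everything after it one byte to the right, which is where the warp comes from.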

knowing exactly how the wordpad effect works lets you take control of the process by pre-treating your files before feeding them to wordpad, so that they bend the way you want them to. this can be achieved by altering the file with a hex editor before opening it in wordpad.

the first step, which was also the initial proof of concept, is to strip out any existing 0A, 0B, or 0D bytes from the file. i replace them all with 09, which is close enough that you can't spot it by eye, but you could use 0C, 0E, or whatever. this ensures that the file won't bend anywhere you don't want it to. (don't delete these bytes; that would cause the image to warp in the opposite direction... a sort of anti-wordpad effect, if you will.)
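for anyone who would rather script this step than do it in a hex editor, a minimal sketch might look like the following. the name `neutralize` and the `filler` parameter are made up for illustration. the one important property is that bytes are replaced, never deleted, so the file length stays the same.

```python
TRIGGERS = bytes([0x0A, 0x0B, 0x0D])  # the bytes wordpad expands to 0D 0A


def neutralize(data: bytes, filler: int = 0x09) -> bytes:
    """Replace every 0A, 0B, and 0D with a harmless nearby value."""
    if filler in TRIGGERS:
        raise ValueError("filler must not be one of the trigger bytes")
    # one-for-one substitution keeps the file length unchanged
    table = bytes.maketrans(TRIGGERS, bytes([filler]) * len(TRIGGERS))
    return data.translate(table)


raw = bytes([0x0A, 0x41, 0x0B, 0x0D, 0x42])
clean = neutralize(raw)
print(clean.hex(" "))
```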

the next step is to "add" 0A, 0B, or 0D bytes to the file in specific areas so that it will bend where you want it to. (really you're replacing bytes one at a time, not inserting them; the actual adding comes later, when you open the file in wordpad.) this may sound simple, but it actually involves a lot of math. you need to calculate the y pixel range where you want the file to warp, and convert those values to percentages in order to determine the byte range where you want to work. and if you're working with a multichannel non-interleaved file, you need to do this for each color channel, not to mention calculating where each channel begins and ends. plus, you need to add lots of bytes to get any significant warping: hundreds if not thousands.
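the arithmetic can be sketched in python. this assumes a headerless, 8-bit, non-interleaved raw file, where each channel is stored as one complete width × height plane. the function names, the `header_size` parameter, and the even spacing of planted bytes are all illustrative choices, not a description of how i actually do it by hand. it works in byte offsets directly rather than percentages.

```python
def channel_row_range(width, height, channel, y_start, y_end, header_size=0):
    """Byte range [start, end) covering rows y_start..y_end-1 of one channel."""
    plane = width * height                # bytes per channel, one byte per pixel
    base = header_size + channel * plane  # where this channel's plane begins
    return base + y_start * width, base + y_end * width


def plant_triggers(data: bytes, start: int, end: int, count: int,
                   trigger: int = 0x0D) -> bytes:
    """Overwrite `count` evenly spaced bytes in [start, end) with a trigger."""
    span = end - start
    if not 0 < count <= span:
        raise ValueError("count must be between 1 and the size of the range")
    buf = bytearray(data)
    for k in range(count):
        # overwrite, never insert: the file length must not change yet
        buf[start + (k * span) // count] = trigger
    return bytes(buf)


# example: bend rows 100-199 of the green channel in a 640x480 RGB file
image = bytes([0x80]) * (640 * 480 * 3)
start, end = channel_row_range(640, 480, channel=1, y_start=100, y_end=200)
treated = plant_triggers(image, start, end, count=500)
print(start, end, treated.count(0x0D))
```

spacing the bytes evenly gives a steady slant across the range; clustering them would give a sharper bend in a narrower band.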

another aspect you may have noticed about my advanced wordpad work is the way it doesn't just warp, but also unwarps. the color channels warp out of phase with each other, but then they coalesce back together. this never happens with the standard wordpad effect, so how is it accomplished?

getting ahead of myself

since wordpad warping is caused by added bytes, the obvious way to undo the warping is by removing bytes. to get color channels back in phase, you need to remove (almost) exactly the same number of bytes as will be added to the file. yes, this means you need to track precisely how many 0A/0B/0D bytes are in the file, and delete exactly that many bytes. i typically do all this manually, one byte at a time, counting in my head as i go along. it's tedious, but it works. if you were a programmer, you could probably write scripts to help with this process.
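for the programmers, here is a hedged sketch of that bookkeeping. `expected_growth` estimates how many bytes wordpad will add (assuming, as before, that an existing 0D 0A pair is left alone), and `delete_bytes` removes that many bytes from a region you choose. both names are hypothetical, and the even spread of deletions is just one possible choice.

```python
TRIGGERS = (0x0A, 0x0B, 0x0D)


def expected_growth(data: bytes) -> int:
    """Bytes wordpad should add: one per lone 0A, 0B, or 0D."""
    total = sum(data.count(bytes([t])) for t in TRIGGERS)
    # assumption: each existing 0D 0A pair passes through unchanged,
    # so neither of its two bytes counts toward the growth
    return total - 2 * data.count(b"\x0d\x0a")


def delete_bytes(data: bytes, start: int, end: int, count: int) -> bytes:
    """Delete `count` non-trigger bytes, spread evenly over [start, end)."""
    buf = bytearray(data)
    # never delete a trigger byte: that would change the growth count
    candidates = [i for i in range(start, end) if buf[i] not in TRIGGERS]
    if count > len(candidates):
        raise ValueError("not enough deletable bytes in the range")
    chosen = [candidates[(k * len(candidates)) // count] for k in range(count)]
    for i in reversed(chosen):  # delete from the end so offsets stay valid
        del buf[i]
    # caveat: deleting the byte between a 0D and a 0A would merge them into
    # a pair; this sketch does not guard against that, so re-check the count
    return bytes(buf)


data = b"\x0dAAAA\x0bBBBB"
growth = expected_growth(data)
trimmed = delete_bytes(data, 6, 10, growth)
print(len(data), len(trimmed), expected_growth(trimmed))
```

if the numbers line up, the trimmed file plus the bytes wordpad adds comes out to the original length, which is what puts the channels back in phase.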

once all this is done, the file is finally ready to be run through wordpad. if you did your math wrong, or miscounted, you may end up with something like this:

lily out of phase with herself

oops! i lost count there and my color channels didn't sync up right. (i was eventually able to fix this one.)

working this way raises a number of questions about matters such as whether a "glitch" can be controlled, and so forth. but the most obvious question is probably what the hell is the point of using wordpad if you're also using a hex editor? after all, you could just use the hex editor to insert bytes and forgo using wordpad at all.

so why still use wordpad? at first, i needed to do it that way to prove that i had unlocked the mysteries of wordpad, so to speak. i keep doing it that way because i like the romanticism of using such a thoroughly incorrect application, and because the term "advanced wordpad editing" has a nice ring to it—ironically overtechnical... as if there's anything advanced about microsoft wordpad.


phat_joe said...

As a programmer I of course want to approach this from the "ooh how can I algorithmically mangle these bytes?" But then I think if I am going that far, why not just manipulate the images directly with an actual graphics library. I've done plenty of graphics work in PHP using the gd library. So I think there is a certain charm to keeping it manual.

sephim said...

Does this work with any other medium?

stAllio! said...

for example?

the added bytes will break any file formats that have checksums etc. this could possibly work on wav files but it wouldn't necessarily sound very good and running it through wordpad would take forever.

Depricode. said...

Wow, this is very,very creative, and may i say the final results are utterly mindblowing? I love you're site,and rarely ever see glitches used so effectively, you are not only a super nerd, but a super artist too. Thanks for the great site and brilliant tutorials!

Depricode said...

Woops, said "you're" instead of "your". How embarrassing.

tukk said...

I have a problem: when ever i try to open a glitched raw file in photoshop (cs3) its always black and white. Why do i lose the colors? Even when i put it "non-interleaved" or something...

stAllio! said...

tukk: most likely, you're using the wrong settings, either when creating the file or when reopening it.

when reopening the file you need to manually enter the number of color channels (3 for RGB) and image dimensions. if you accept the default you'll get something like this.

tukk said...

Woohoo! So i changed the number of the channels from 1 to 3 as you said and it worked. Thank you very much for the quick response, your site rocks!

Different55 said...

I tried to count the number of times 0A and all those popped up, but after 100 I stopped and wrote a program.
Here it is
I wrote in php using the substr_count. You can enter your own values to search for, but it automatically gives you the values that wordpad replaces. You can only enter one needle at a time, so I'd just enter some random stuff and look at the instances it generates itself. Your time has just been saved by a bored teen.

Anonymous said...

The link above is broken. The new one is

Anonymous said...

The above link is broken. I haven't made a new one yet.