Wed Apr 25, 2018, 08:44 PM

One Internet Archive Quirk Which May Not Be Relevant

If you remember, a lot of the early web, in which the IA was first launched, consisted of static HTML pages, making it relatively easy to store and compress content. However, if you think the IA is merely gorging down a "copy of everything on the internet" on their budget and without the storage space of the Almighty, then you may have a simplistic view of how the IA works, and how it has worked at different times during its development.

And, oddly, since I was looking for an old picture I tweeted from an IA server location several years back, I found that my photograph wasn't archived....




In any event, as things moved beyond static HTML and storage capacity varied, IA implemented, at various times, different sorts of tricks to deal with either problem.

Since I deal in IP disputes which often hinge on claims of "who was first", one of the tricks I noticed was that IA would skimp on storage space by sometimes making external calls to the existing site for images. If the image wasn't stored at IA, one of two things would happen: (a) you'd get a broken image icon, or (b) if the same filename still existed at the referenced URL with the same last-modified date (which is easy to change on some systems), then when you called up the "archived" version of the page, IA would simply inline the presumed-to-be-the-same image file from the referenced site.
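To make the last-modified point concrete, here is a minimal Python sketch of that kind of comparison. It is not IA's actual code, which I have never seen. The function names are hypothetical, and it assumes the capture time is a 14-digit YYYYMMDDhhmmss stamp in UTC and that the live server reports an HTTP Last-Modified header. Since that header is easy to change on some systems, a result of False proves nothing about whether the file is really the same.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def capture_time(stamp: str) -> datetime:
    """Parse a 14-digit capture stamp (YYYYMMDDhhmmss), assumed to be UTC."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def live_copy_may_differ(stamp: str, last_modified_header: str) -> bool:
    """Return True if the live file claims a modification after the capture.

    Hypothetical helper for illustration only. It trusts whatever date the
    live server reports, which is exactly the weakness described above.
    """
    modified = parsedate_to_datetime(last_modified_header)
    if modified.tzinfo is None:
        # Headers written with "-0000" parse as naive; treat them as UTC.
        modified = modified.replace(tzinfo=timezone.utc)
    return modified > capture_time(stamp)


if __name__ == "__main__":
    # A 2006 capture compared against a file the live server dates to 2015.
    print(live_copy_may_differ("20060315120000", "Wed, 21 Oct 2015 07:28:00 GMT"))
```

The check only compares two dates. Anyone who can set the file date on the live server can make a newer file pass as the old one.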

There was a time, and I haven't checked this lately, when active content (i.e. content generated by scripts or served up from databases) would be handled in a similar way: if the relevant PHP file existed on the live and current server, then it would be invoked to serve up the content.

None of this may be even remotely relevant to the teacup tempest at hand. I am only saying that there are circumstances I have encountered in the course of my career where there had been issues involving "things in the Internet Archive not actually being what they seem to be". That's all. Whether it applies in this instance - I have NO IDEA.

But more importantly:

Joy Reid is a living breathing human being. IMHO if you want to know what sort of person she is, and what sorts of opinions she holds, you don't need to consult the Internet Archive. I would imagine the best way to know what sort of person she is and what sorts of opinions she holds, would be to converse with her.

8 replies, 924 views

Replies to this discussion thread
Arrow 8 replies Author Time Post
Reply One Internet Archive Quirk Which May Not Be Relevant (Original post)
jberryhill Apr 2018 OP
GusBob Apr 2018 #1
unc70 Apr 2018 #2
struggle4progress Apr 2018 #3
greyl Apr 2018 #4
Spider Jerusalem Apr 2018 #5
jberryhill Apr 2018 #6
Azathoth Apr 2018 #7
jberryhill Apr 2018 #8

Response to jberryhill (Original post)

Wed Apr 25, 2018, 08:54 PM

1. Malcolm Nance, speaking on the Stephanie Miller show

Described this as "black propaganda". He said it was pulled on him

He said don't believe every word you read on the Internet

Me, I wouldn't put nothing past the Russians



Response to jberryhill (Original post)

Thu Apr 26, 2018, 01:23 AM

2. I know several ways to fool the archives

Doing research a couple of years ago I discovered ways to "poison" the IA in ways similar to what can/could be done with archives like Google. It too depends on problems with dynamic content.

Lots of other issues can affect IA. I have seen it reported that changing the "robots.txt" file can make archived content disappear.
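As a rough illustration of how a robots.txt rule gets evaluated, here is a small Python sketch using the standard library's urllib.robotparser. The rule text, the example.com URL, and the use of "ia_archiver" as the crawler name are assumptions for the example. It shows only how a crawler would read the file, not what IA does with content it has already archived.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish after the fact.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def crawler_allowed(robots_text: str, user_agent: str, url: str) -> bool:
    """Return True if robots_text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/blog/2006/post.html"
    print(crawler_allowed(ROBOTS_TXT, "ia_archiver", page))
    print(crawler_allowed(ROBOTS_TXT, "some-other-bot", page))
```

The file lives on the current site and can be edited at any time, so whoever controls the domain today controls the rule.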

These kinds of issues affect you when simply trying to "save" a web page on your desktop. In a simple example, when you try to redisplay an online news article, you might get one that was updated in a later edition long after you thought you had "saved" it. The time stamp of your save would not indicate the content had been modified. Something similar can be used to change the historic archive.
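One way to catch that kind of silent change in your own saved copies is to record a hash of the bytes at the moment you save them. This is a minimal Python sketch with hypothetical function names and made-up sample content. It says nothing about how IA itself verifies content.

```python
import hashlib


def content_digest(page_bytes: bytes) -> str:
    """Return the SHA-256 hex digest of the saved page content."""
    return hashlib.sha256(page_bytes).hexdigest()


def has_changed(saved_digest: str, current_bytes: bytes) -> bool:
    """Return True if the current content no longer matches the saved digest."""
    return content_digest(current_bytes) != saved_digest


if __name__ == "__main__":
    original = b"<html><body>First edition of the article</body></html>"
    updated = b"<html><body>Later edition of the article</body></html>"
    digest_at_save = content_digest(original)
    print(has_changed(digest_at_save, original))
    print(has_changed(digest_at_save, updated))
```

Unlike the save time stamp, the digest depends only on the content, so a later edition shows up as a mismatch.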

I have seen such techniques being used deliberately in the wild to re-write news articles. Very 1984. I posted about this at DU several years ago.

BTW these techniques require no hacking of the IA, only a source for the dynamic content. Lots of subtle ways it can be done. Too long and technical to describe at the moment while traveling.



Response to jberryhill (Original post)

Thu Apr 26, 2018, 01:27 AM

3. It seems the disputed archives have disappeared from the Wayback Machine

... Reid's claim that the posts were fraudulent and the result of a hack was met with immediate and widespread skepticism.

The scrutiny only intensified after a representative for the Internet Archive, a nonprofit dedicated to storing old digital content, said Tuesday that the organization could not verify the claim. Links to Reid's old blog were stored in the Wayback Machine, a service run by the nonprofit ...

... But at some point, unbeknownst to the people working at the Archive, the archives were removed from the Wayback Machine via an automated process ...

http://money.cnn.com/2018/04/26/media/joy-reid-hacking-fbi-investigation/index.html



Response to jberryhill (Original post)


Response to jberryhill (Original post)

Thu Apr 26, 2018, 01:32 AM

5. Relevant blog posts are also in the LOC archive (n/t)



Response to Spider Jerusalem (Reply #5)

Thu Apr 26, 2018, 03:45 AM

6. It would operate the same way relative to externally retrieved content



Response to jberryhill (Original post)

Thu Apr 26, 2018, 03:55 AM

7. Where is the archive loading the pages from if not from an archived snapshot?

Her actual blog was taken down a while ago. Any links to her server would presumably 404. Moreover, let's assume the archive was loading pages from her actual site -- then it would be loading pages from the latest version of her site. Which means the first page would have dates from, say, 2010 or whenever it was last updated, which would be immediately apparent to anyone who was loading a 2006 snapshot.

All of this hand-waving about the details of the Wayback Machine is starting to give me flashbacks of the Great Superscript Hunt when suddenly everyone was trying to concoct ways that a 60's typewriter could produce a document with Microsoft Word default settings.



Response to Azathoth (Reply #7)

Thu Apr 26, 2018, 04:18 AM

8. ....which may not be relevant

Last edited Thu Apr 26, 2018, 05:07 AM - Edit history (1)

Do you see those words in my OP?

Yes or no?

I haven't drilled down through every detail in this fundamentally irrelevant controversy about a media figure for whom no one voted to obtain her position.

To the extent I've looked at it at all, my only comment is that I have encountered circumstances in which IA content is not what it seems to be. Whether there are other circumstances, I don't know.

You will also notice a further statement to that effect at the end of my post, which you also seem not to have noticed.

That picture, incidentally, is one I took of an IA server location in Redwood City circa 2011. As you might imagine, it was not at that time what one would consider to be a high-security facility.
